            100Gb/s per Channel
     Scrambler/Descrambler Circuits for
Optical Serializer/Deserializer (SERDES)
  Links using IBM's 8HP, 8XP and 9HP
      SiGe HBT BiCMOS Processes
     Proposal to Advanced Technology Group (NRO/AS&T/ATG)

               (Technical POC: Rick Ridgley)

                     November 19, 2008

                      John F. McDonald
                         Rm CII 6123
               Center for Integrated Electronics
               Rensselaer Polytechnic Institute
                       Troy, NY 12180

                  Office: (518) 276-2919
                  Home: (518) 371-5607
                  FAX:     (518) 276-2990
                  eFAX: (503)-212-9337

     This proposal is concerned with circuit design for high-speed data scramblers for serial
     communications (SERDES), for either computer-to-computer or long-haul applications,
     focusing on achieving the highest possible data rates from a given technology and
     investigating the fastest technologies suitable for this work. An additional goal is
     designing circuits with flexible characteristics that will allow newer systems to coexist
     with older ones. The proposed research encompasses design, simulation, and perhaps
     ultimately fabrication of data scramblers for serializer/deserializer (SERDES) chips
     using novel circuit elements in IBM’s relatively inexpensive 8HP, 8XP and 9HP SiGe
     HBT BiCMOS processes. In 8HP, routine design techniques can produce multiplexed output
     and demultiplexed input bit stream data in the ~80Gb/s range. With more aggressive
     techniques 120Gb/s is possible in 8XP and 9HP. The fT for the 8HP HBT is 210GHz,
     and in this technology 80GHz flip-flops and 3.5ps gate delays are possible in CML
     circuitry. However, 8XP/9HP will offer an HBT with roughly 350GHz fT, which will
     enable 120 to perhaps 160Gb/s. Although many computer-to-computer transmission
     links achieve high throughput on as many as 12 parallel channels, these typically employ
     only 10 or possibly 40Gb/s per channel, usually in WDM mode. Substantially higher
     rates are possible for each channel in 8XP and 9HP. Even higher bit rates, around 240Gb/s,
     may become possible, since TAPO presentations by IBM indicate that the SiGe HBT
     technologies 10HP and 11HP may follow in the future, assuming appropriate applications
     can be found to exploit that speed. Since Internet upgrades tend to occur in factors of 4 per
     generation (due to the high cost of infrastructure), this 160Gb/s goal would be an
     important milestone. Of course short haul interprocessor communications need not track
     Internet generations, but cost favors a circuit with a wider range of applications. Again
     it would be less expensive to produce in SiGe BiCMOS than in compound semiconductor
     processes. It would also permit cointegration of the SERDES with other CMOS circuitry
     facilitating low cost Systems on a Chip (SOC) such as ATP Framing or Message Packing
     Protocol (MPP) circuits. In particular hardware data scrambling would have to keep up
     with such circuits. Another reason to realize the rate of 160 Gb/s is that the faster speed
     possible in a given HBT technology can be sacrificed for lower power. The trade-off of
     current vs. switching speed is roughly quadratic (every factor of 2 speed reduction produces
     a factor of 4 less current and power in CML). A problem for such fast circuitry is that
     metallic interconnections suffer HF loss due to such mechanisms as the “Skin Effect,”
     finite dielectric “Loss Tangent” effects, and Silicon Slow Wave losses (if the substrate is
     a semiconductor). Additionally, other electromagnetic parasitic effects such as coupling,
     line reflection and dispersion can cause signal distortion. Consequently, optical fiber
     communication is preferable to cable, printed circuit clad, or hybrid metallization. But
     this requires fast light modulation and detection. The devices for this must be mounted in
     close proximity to the SERDES. In the limit of extremely fast operation a full integration
     of the light modulation and detection must be sought. Fortunately, Intel has discovered
     that the Drude Effect can be exploited in Si devices for light modulation, and SiGe
     integrated diodes have shown sufficient speed to meet the requirements of the detector.
     Intel has demonstrated 10 Gb/s operation using a Si FET capacitor type of structure.
     However, this approach does not appear promising for 160Gb/s operation: the FET
     capacitor device is very large in its lithographic dimensions. On the other hand the
     SiGe HBT seems particularly appropriate in these dimensions. The vertical HBT
     dimensions (particularly the base thickness) determine the ultimate speed of the device.
     The larger lateral dimensions permitted for the HBT are then compatible with optical in-
     substrate waveguides, mirrors, couplers, and resonators.
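
The speed-for-power trade-off quoted above (halving the speed cuts current and power by about a factor of 4) can be illustrated with a minimal sketch. The quadratic law is taken directly from the rule of thumb in the text; the function name and the sample operating points are illustrative assumptions, not measured data.

```python
def relative_cml_power(speed_fraction: float) -> float:
    """Relative CML current/power when a gate is run at `speed_fraction`
    of its maximum speed, assuming the rough quadratic rule of thumb
    quoted in the text (2x slower -> 4x less current and power)."""
    if speed_fraction <= 0:
        raise ValueError("speed_fraction must be positive")
    return speed_fraction ** 2


if __name__ == "__main__":
    # Hypothetical operating points: full speed, half speed, quarter speed.
    for fraction in (1.0, 0.5, 0.25):
        print(f"speed x{fraction:<4} -> power x{relative_cml_power(fraction):.4f}")
```

Under this assumption, a technology capable of 160 Gb/s that is run at 80 Gb/s would need roughly a quarter of the current.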

     In this contract task we propose to push the frontier for data scramblers for SERDES to
     greater than 100 Gb/s per channel in IBM 8XP and 9HP SiGe HBT BiCMOS processes.

Serial communication circuits are at the heart of networking whether for Internet
applications or for parallel processor message passing. Our flexible bit rate goals are
intended to allow incremental upgrades to existing networks while permitting faster
signaling when the environment can sustain it, reducing downtime and costs. Nodes will
be able to communicate with other nodes at the highest data rate supported by both Tx
and Rx circuits. The high-speed aspect of our research is relevant to maximizing the
utility of existing physical infrastructure such as fiber optic lines. Also, maximizing the
performance in a given technology will save costs, as a cheaper and older technology
might be employed when the absolute maximum serial data rate is not
necessary. For secure communications in any application, in-line scrambling and
descrambling are essential, and so they must operate at the highest speed possible.

Attached is the InfiniBand Link Speed Roadmap.

Figure 1. InfiniBand Link Speed Roadmap showing total rate vs. number of WDM channels.

Based on this road map one can see that with 12 parallel WDM channels at 40 Gb/s per
channel the capacity is approximately half a terabit per second (480 Gb/s). In order to go
faster, above 1 Tb/s, either more channels or higher bit rates per channel are needed. The
40Gb/s number is set by what is achievable with CMOS and current light modulators.
Furthermore 12 WDM channels is a rather bulky structure. While this could be raised to
16 without too much difficulty, raising it to 48 or 64 channels would be more
challenging, hotter, and more bulky. Data synchronization over multiple channels is
harder than over serial channels. SiGe HBT technology offers the chance to quadruple
speeds with the same number of channels. But to accomplish this everything would need
to run 4 times faster and at the same power level.
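
The channel arithmetic in this paragraph can be checked with a short sketch. The 12-channel, 40 Gb/s and 160 Gb/s figures come from the text; the 1 Tb/s target and the function names are assumptions chosen for illustration.

```python
import math


def aggregate_gbps(channels: int, per_channel_gbps: float) -> float:
    """Total link capacity for a number of parallel WDM channels."""
    return channels * per_channel_gbps


def channels_needed(target_gbps: float, per_channel_gbps: float) -> int:
    """Smallest whole number of channels that reaches the target capacity."""
    return math.ceil(target_gbps / per_channel_gbps)


if __name__ == "__main__":
    print(aggregate_gbps(12, 40))       # 12 channels at the CMOS-limited rate
    print(aggregate_gbps(12, 160))      # same channel count at 4x the rate
    print(channels_needed(1000, 40))    # channels for 1 Tb/s at 40 Gb/s each
    print(channels_needed(1000, 160))   # channels for 1 Tb/s at 160 Gb/s each
```

Quadrupling the per-channel rate keeps the channel count at 12 while taking the aggregate from 480 Gb/s to 1920 Gb/s, which is the case the paragraph argues for.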

Higher bit rate networks and switches are essential enabling technologies for parallel
computing. This high rate enhances any STREAM like attributes of data flow in parallel
processor algorithms, but it also reduces the transfer time of large blocks of data. This bit
transfer time together with packing and unpacking, scrambler/descrambler, or encryption
and de-encryption functions compose parts of network latency or delay time for transfer
of packets of information between instruction threads operating on different machines.
In some of the largest machines today this delay can be as high as 10 microseconds. An
algorithm waiting for data flow through such a machine could remain idle for tens of
thousands of cycles. It is not surprising then that these large parallel processors often
perform at only 1-5% of peak capacity. Another way of stating this is that these
machines are 95-99% idle. Many such algorithms have national defense impact. In a world
in which information technology is playing an increasing role in successful military
campaigns, such delays and associated inefficiencies are intolerable. Fortunately there
are ways to mitigate network latency through faster circuits. These permit packets to move
faster, through faster packet preparation circuitry, faster encryption and de-encryption
circuitry, faster bit SERDES, and faster throughput and reconfiguration network
switches. All of these can benefit from the use of SiGe HBT BiCMOS, possibly in
conjunction with 3D chip stacking technology.
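
To make the latency argument concrete, the sketch below converts a network delay into idle processor cycles and compares block transfer times at two line rates. The 10 microsecond delay and the 40 and 160 Gb/s rates are from the text; the 3 GHz processor clock and the 1 MB block size are assumptions picked only for illustration.

```python
def idle_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Processor cycles that elapse while waiting `latency_ns` nanoseconds.
    (ns x GHz = cycles.)"""
    return latency_ns * clock_ghz


def transfer_time_ns(block_bits: float, rate_gbps: float) -> float:
    """Time to serialize a block onto one channel. (bits / Gb/s = ns.)"""
    return block_bits / rate_gbps


if __name__ == "__main__":
    # 10 us latency from the text; 3 GHz clock is an assumed example value.
    print(idle_cycles(10_000, 3.0), "cycles idle")

    block_bits = 8_000_000  # assumed 1 MB block
    for rate in (40, 160):
        print(f"{rate} Gb/s: {transfer_time_ns(block_bits, rate) / 1000:.0f} us")
```

With these assumed values the wait is 30,000 cycles, consistent with the "tens of thousands of cycles" quoted above, and the serialization time of the block drops by the same factor of 4 as the bit rate rises.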

Our work initially started with a goal of having a wide lock-in range to permit greater
backward compatibility and a mix of differing Tx and Rx rates. The thought was that as
designers reach out to 160Gb/s rates, many technologies might not be able to achieve this
goal. Some manufacturers might offer 80, 100 or 120 Gb/s. This flexibility
comes at a performance price. One cost is the lock-in time needed to switch between packets
at differing rates. Another may be excess noise due to the wider bandwidths of the
circuitry involved, which must then support the lock-in range. However, much circuit
research is possible in this area. There are a number of alternatives only just now being
suggested, and as yet unexplored. This has been a rich, classic area for basic circuit
design and analysis research, and it remains so even today.

Historic Review: SERDES Research at Rensselaer

Our earlier work, which was funded by the Naval Research Lab (NRL), had the
goal of achieving a short-haul system with a 20Gb/s NRZ data rate, with the
possibility of reaching a 40Gbps data rate in the same process, whose HBTs have a
50GHz fT. The IBM 5HP SiGe BiCMOS process was used to accomplish these
objectives. Many of the students on this initial project have graduated and gone on to
Sierra Monolithics, where they have produced a commercial 40 Gb/s SERDES using the
next step in the fabrication sequence at IBM, namely the 120 GHz fT 7HP BiCMOS
process. The experience gained in 5HP at RPI clearly made an early commercial product
introduction possible. However, this commercial offering represents only about a 30% capture
of the fT as a bit rate. Further research at RPI in the NRL-sponsored SERDES effort
showed that approximately 80 Gb/s is attainable in 8HP, a roughly 40% capture of fT.
Unexplored in this work is the attendant speed possible in hardware encryption and
de-encryption, so this becomes the target of the present work.
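
The "capture of fT" figure of merit used above is simply the bit rate divided by the transistor fT. A minimal sketch, using only the bit rates and fT values quoted in this proposal (the function names are illustrative):

```python
def ft_capture(bit_rate_gbps: float, ft_ghz: float) -> float:
    """Fraction of the device fT realized as serial bit rate."""
    return bit_rate_gbps / ft_ghz


def bit_rate_for_capture(capture: float, ft_ghz: float) -> float:
    """Bit rate (Gb/s) implied by a target capture fraction."""
    return capture * ft_ghz


if __name__ == "__main__":
    # (process, bit rate in Gb/s, fT in GHz) as quoted in the text.
    for name, rate, ft in (("5HP", 20, 50), ("7HP", 40, 120), ("8HP", 80, 210)):
        print(f"{name}: {rate} Gb/s at fT={ft} GHz -> {ft_capture(rate, ft):.0%}")

    # The 60-80% capture target discussed in this proposal, applied to 8HP.
    for target in (0.6, 0.8):
        print(f"{target:.0%} of 210 GHz -> {bit_rate_for_capture(target, 210):.0f} Gb/s")
```

The computed values (about 33% for the 7HP product and about 38% for 80 Gb/s in 8HP) match the rounded 30% and 40% figures in the text.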

A natural question is what can be done in the next generation of circuitry. IBM has
already sent RPI a circuit simulation model for 8XP, which is an intermediate process. This
proposal is to continue to pursue and evaluate the most advanced processes offered by
IBM, namely the 210 GHz fT 8HP SiGe HBT and the pending 375GHz fT 9HP BiCMOS
process, and to achieve 60-80% capture. It is not clear that the industry will push on to
achieve this goal, since fiber dispersion becomes a problem for long-haul communication.
However, recent advances in ADC and DAC technology, some of which is being
conducted at Rensselaer under residual DARPA TEAM funding, promise to provide
direct adaptive dispersion compensation. Nevertheless, for the important area of short-
haul local message passing computer architecture, such fast circuits would still be very
desirable. Some of the expense of the SiGe technology can be offset by the possibility of
replacing multiple chip package solutions with single chip monolithic SOC’s possible in

The designs we have produced all use current mode logic (CML), which has several
benefits over other logic families. First, all gate input and output signals are differential,
reducing the effects of coupling noise as well as reducing the coupling noise generated.
Second, the current in a given gate remains nearly constant due to the current “steering”
nature of the circuit. This minimizes the switching noise introduced into the power
supply rails. Third, the complement of any logic signal is easily obtainable by
transposing the two outputs of a signal pair. Any three-input combinational circuit, as well
as latches and multiplexers, can be implemented in a single 3-level current tree using our
selected power supply voltage of -3.4V.

          Figure 2. Full Differential Current Mode Logic AND/NAND Circuit

Above is a typical CML gate, which realizes the AND/NAND function. One of the less
desirable but easily overcome characteristics of this logic family is that the differential
input pairs can require different common-mode levels for proper operation. Fortunately,
different common-mode levels can easily be generated at the outputs of the gates via
emitter-followers. The difference-mode voltage of the signals is approximately 500mV,
meaning each single wire swings through a 250mV range. There is a small, but definite
difference in propagation delays for such circuits depending on the level on which the
changing input signal arrives. In the circuit above, the “a” input would propagate to the
outputs slightly faster than the “b” input. The reason is that transistors lower in the
tree must charge the devices above them in the tree. The delay is
approximately proportional to the depth down the tree from the pullup rails, for the
unloaded gate case. This has to be taken into account when gate delays are involved in
the generation of critical timing.
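
The differential signalling described above can be captured in a small behavioral sketch. It is a logic-level model only, not a transistor-level one: the 250 mV per-wire swing is from the text, while the choice of 0 V as the high level and all function names are assumptions made for illustration.

```python
SINGLE_ENDED_SWING_V = 0.25  # per-wire swing quoted in the text


def encode(bit: bool) -> tuple[float, float]:
    """Return (true_wire, complement_wire) voltages for a logic value.
    The high level is taken as 0 V (an assumption); low is 250 mV below."""
    high, low = 0.0, -SINGLE_ENDED_SWING_V
    return (high, low) if bit else (low, high)


def decode(pair: tuple[float, float]) -> bool:
    """A differential receiver only looks at the sign of the difference."""
    return pair[0] > pair[1]


def complement(pair: tuple[float, float]) -> tuple[float, float]:
    """Inversion costs no gate: transpose the two wires of the pair."""
    return (pair[1], pair[0])


def and_nand(a, b):
    """Behavioral AND/NAND: both polarities are available from one gate."""
    q = encode(decode(a) and decode(b))
    return q, complement(q)


def differential_swing_v() -> float:
    """Peak-to-peak difference-mode voltage between logic 1 and logic 0."""
    one, zero = encode(True), encode(False)
    return (one[0] - one[1]) - (zero[0] - zero[1])


if __name__ == "__main__":
    q, q_bar = and_nand(encode(True), encode(False))
    print(decode(q), decode(q_bar), differential_swing_v())
```

Swapping the wires flips the sign of the difference, which is why the 250 mV single-wire swing corresponds to the approximately 500 mV difference-mode swing quoted above.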

Circuits are biased with a constant current at the value which results in the highest
frequency of unity current gain (fT) of the transistors in the tree. Larger transistors have a
higher fT, but they consume more power and so are used sparingly. For most logic
circuits the smallest (1 µm) 5HP transistors are used, which reach a peak fT of 47GHz at
around 0.7 mA. To first order, the peak occurs at approximately the same current density
for all of the curves, 0.7 mA per square micron. Although the curves are tightly bunched for
various emitter areas, the longer emitters do offer some improvement in fT
(approximately 25% between the 1 and 20 micron lengths). At the same time the penalty
for smaller emitter sizes is not great. Smaller emitter sizes require proportionately
smaller total currents at the peak, lowering power dissipation. This current must flow
through the transistors on the current path sensitized through a CML current tree to
accomplish the best speed possible. Interestingly, the prevailing scaling strategy for
bipolar devices (Solomon and Tang scaling) keeps the actual collector current at the peak
of this curve constant even as the current density grows, which means that the power-delay
product (PDP) for a gate in each successive generation improves in proportion to the increase
in fT. Note that 8XP is not a fully scaled device. The base thickness has been scaled but the
lithographic feature sizes are those of 8HP, or 130nm. A fully scaled 9HP has these features
shrunk by roughly a third, to 90nm.
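
The biasing rule above (bias each tree at the current where fT peaks, which scales with emitter area) lends itself to a quick estimate. In this sketch the 0.7 mA per square micron peak density and the 3.4 V supply magnitude are taken from the text; the emitter areas are hypothetical, and the power figure ignores emitter followers and bias circuitry.

```python
PEAK_FT_DENSITY_MA_PER_UM2 = 0.7  # 5HP peak-fT current density from the text
SUPPLY_MAGNITUDE_V = 3.4          # magnitude of the -3.4 V supply from the text


def peak_bias_current_ma(emitter_area_um2: float) -> float:
    """Tail current that places a transistor at its peak fT,
    assuming the peak stays at a fixed current density."""
    return PEAK_FT_DENSITY_MA_PER_UM2 * emitter_area_um2


def tree_power_mw(emitter_area_um2: float) -> float:
    """Static power of one current tree: tail current times supply.
    Emitter followers and bias generators are not included."""
    return peak_bias_current_ma(emitter_area_um2) * SUPPLY_MAGNITUDE_V


if __name__ == "__main__":
    for area in (0.5, 1.0, 2.0):  # hypothetical emitter areas in square microns
        print(f"{area} um^2: {peak_bias_current_ma(area):.2f} mA, "
              f"{tree_power_mw(area):.2f} mW per tree")
```

Because the current is proportional to area, halving the emitter area halves the static power of the tree, at the modest fT penalty described in the paragraph above.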

    Figure 3. The bipolar fT vs. Ic curve for IBM’s 5HP SiGe HBT BiCMOS process.

Figure 4. Four generations of SiGe HBT, fT vs. JC. Note that the current density at
 the peak grows substantially but the actual current into the scaled collector is kept
                    constant in the Solomon and Tang scaling.

Many factors have influenced the direction of our research. All our designs are wafer-
probed, and we have a limited number of signals that can be probed at one time. For this
reason, our designs are fairly monolithic and often require on-chip testing circuitry.
Actually this has become one of the strengths of our group. Ultimately, these circuits
will be operating at speeds that are so fast, there will be no instruments nor probing
hardware capable of observing these signals in the environment where test engineers
conventionally operate. Our test circuits have to be designed with fast in-circuit micro
instrumentation circuits so they can test themselves. This permits the group to look
ahead into the future with our own personal advanced oscilloscopes.

Current Starving VCO

Because circuit speed is often not what simulation predicts at ultra high clock rates, our
clock circuits include variable clock rate ring Voltage Controlled Oscillator (VCO)
designs employing a Voltage Controlled Delay Line (VCDL) element with novel
architectures and methods of control. One VCDL was based on a Tektronix current
mixing strategy patent filed by one of the early students, Hans Greub, in the predecessor
DARPA Fast RISC project in GaAs HBT technology. A later HP patent, called the
“Leap” or leapfrog VCO, built on this earlier work. For example, below is a ring buffer
VCO, which utilizes a feed-forward scheme and averaging to allow each buffer to
“anticipate” the incoming edge, resulting in a 33% increase in operating frequency. The
frequency of the VCO was controlled by varying the amount of current through the CML
buffers, altering the current-dependent value of fT. This afforded a reasonably wide
locking range. Extensions of this idea to more stages might eventually widen this locking
range.

One of the reasons for making test circuits is that many features of high-speed operation,
such as wire parasitics, are not well modeled in the design kits. Issues like substrate losses,
slow substrate waves, and switching noise are hard to capture in large circuit simulations.
Of course for small ones it is easier to be more comprehensive. Our budget includes
some of the costs of CAD tool maintenance to stay on top of these issues.

    Figure 5. Voltage Controlled Oscillator (VCO) based on a “Leap” strategy.

           Figure 6. Layout of Leap 4 Stage VCO illustrating symmetry.

     Figure 7. Eight timing signals based on four stage VCO outputs and their

The four-phase VCO provides up to 8 sequential timing edges, which enable events to
occur only fractions of a current tree delay apart, dividing one VCO cycle very precisely.
By mixing the leap and non-leap feed-forward signals with a voltage-controlled weighting,
the oscillator can be tuned over nearly an octave range. However, the wider the locking
range, the wider the bandwidth of the supporting circuits must be, thereby permitting more
noise to enter the signal band. Having 8 edges in one VCO cycle lessens the demand on
operating frequencies for nearly all of the other circuits in the SERDES. However, the
uniformity of the spacing of these edges, and of the signal transition timing derived from
them, then becomes one of the defining limitations of the approach. Any noise or lack of
circuit symmetry contributes to jitter or duty cycle variations, which can close the “eye of
pulse” for the communication link or compromise the bit error rate (BER) of the system.
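
The relationship between the four-stage ring and its 8 timing edges can be sketched numerically. The sketch assumes an ideal differential ring in which each stage output and its complement contribute one edge, so the edges are evenly spaced; the 5 GHz oscillation frequency is an assumed example value, not a figure from this proposal.

```python
def edge_times_ps(vco_ghz: float, stages: int = 4) -> list[float]:
    """Ideal, evenly spaced edge times within one VCO period.
    Each of the `stages` differential outputs and its complement
    supplies one edge, giving 2 * stages edges per cycle."""
    period_ps = 1000.0 / vco_ghz
    n_edges = 2 * stages
    return [k * period_ps / n_edges for k in range(n_edges)]


def edge_phases_deg(stages: int = 4) -> list[float]:
    """The same edges expressed as phase angles."""
    n_edges = 2 * stages
    return [k * 360.0 / n_edges for k in range(n_edges)]


if __name__ == "__main__":
    print(edge_times_ps(5.0))   # assumed 5 GHz VCO: edges 25 ps apart
    print(edge_phases_deg())    # 0, 45, 90, ... 315 degrees
```

Any deviation of a real edge from these ideal positions appears directly as jitter or duty cycle error in whatever is timed from it, which is the limitation the paragraph above describes.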

Data “Shuffling” Scheme

The four-phase feature of the VCO lends itself to an interleaving transmitter scheme. It
was designed to use an innovative “shuffling” data scheme, which required only a
quarter-frequency clock rate for operation. This extraordinary method allows us to
refrain from dealing with the maximum-rate data signals until the final multiplexer/pad
driver of the transmitter or Tx circuit, requiring only small gains in those circuits.
Nothing else in the Rx or Tx circuit needs to operate at this upper limit in frequency, so if
the Tx final mux works well, the rest of the system can keep up with it.

          Figure 8. Final Mux chopping between interleaved data streams.

The “shuffling” scheme is diagrammed above. Two phases, exactly 90° apart, control
multiplexers to take data from shift registers A, B, C, and D. In our case, we used four
4-bit shift registers to feed the circuit above. A state machine driven by one of the VCO
phases controlled when the registers would be loaded or shifted. Note that the
multiplexer at the bottom left is used to delay the 90° phase signal by the exact amount of
time the data takes to propagate through the latches above it, maintaining the phase
difference so that the final multiplexer can introduce edges exactly between those
introduced by the prior multiplexers which handle the data. The final output multiplexer
has to operate at the final bit rate. Most of the other Tx and Rx circuitry actually operates
at a slower clock rate, although circuit bandwidths need to be high.
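
A behavioral sketch may help show how two quarter-rate clock phases can produce four bits per clock period. The topology modeled here is an assumption inferred from the description, since the figure itself is not reproduced: two first-level 2:1 multiplexers (A/B and C/D) both steered by the 0° phase, and a final 2:1 multiplexer steered by the 90° phase. Mux polarities and all names are hypothetical.

```python
def clock_is_high(phase_deg: int, quarter: int) -> bool:
    """Level of an ideal square-wave clock with the given phase lag,
    sampled in the middle of quarter-period number `quarter` (0..3)."""
    angle = (quarter * 90 + 45 - phase_deg) % 360
    return angle < 180


def mux2(select_high: bool, when_high, when_low):
    """Ideal 2:1 multiplexer."""
    return when_high if select_high else when_low


def serialize(a, b, c, d):
    """Interleave four quarter-rate streams into one full-rate stream.
    Each element of a, b, c, d is held for one full clock period."""
    out = []
    for bit_a, bit_b, bit_c, bit_d in zip(a, b, c, d):
        for quarter in range(4):
            clk0 = clock_is_high(0, quarter)
            clk90 = clock_is_high(90, quarter)
            first_ab = mux2(clk0, bit_a, bit_b)   # changes on 0-degree edges
            first_cd = mux2(clk0, bit_c, bit_d)   # changes on 0-degree edges
            # The final mux switches midway between the first-level edges.
            out.append(mux2(clk90, first_cd, first_ab))
    return out


if __name__ == "__main__":
    print(serialize(["A0", "A1"], ["B0", "B1"], ["C0", "C1"], ["D0", "D1"]))
```

In this model each clock period emits the registers in the order A, C, D, B, so only the final multiplexer output toggles at the full bit rate, and the registers would have to be loaded in a correspondingly permuted order for the serial stream to come out in sequence.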

Hence this final multiplexer becomes the circuit bottleneck for highest speeds, and a great
deal of time and thought is given to how this is accomplished ranging from circuit
schematic, through circuit artwork layout and sizing of the devices to accomplish the
absolute symmetry required for uniformity in the final data transitions. Any variation in
the spacing between transitions or their rise or fall time whether data dependent or not
closes the “eye of pulse” diagram. The search for the ideal circuit symmetry led to the
development of the “Sym Mux” one level multiplexer. This circuit avoids the different
speeds associated with switching transistors at different levels in conventional CML

 Figure 8. “Sym-Mux” Circuit designed to reduce duty cycle variations by forcing
                 all inputs to enter on the same transistor level.

  Figure 9. Chip layout in 5HP SiGe BiCMOS showing the Sym Mux to the right

Perimeter pads are for wafer probing while area array solder bumps are for packaging.
The “Sym-Mux” circuit of Figure 8 is sufficiently uniform to obtain workable eye diagrams
at 20 Gb/s in 5HP, but there is a subtle residual lack of symmetry that occurs when more
than one transistor switch changes at the same time. This second order lack of symmetry
proves more challenging when attempting to operate at 40 Gb/s in 5HP. However this circuit
was fabricated as shown in the chip layout of Figure 9, and exhibited the eye diagram shown
on the left side of Figure 10, at 20 Gb/s. To the right is shown a 50 Gb/s eye generated using IBM 7HP
with an fT that is 2.5 times higher. The right eye was generated without a “Leap” VCO
and lacks the wider tuning range of the circuit on the left, but one can detect a slightly
larger noise in the wider tuning range circuit’s eye.

  Figure 10. Left plot shows eye of a 20 Gb/s pulse for the “Sym Mux” Tx part of
 Figure 8 fabricated in IBM 5HP. Right plot shows a similar plot for 50Gb/s using
 IBM 7HP, which has an approximately 2.5 times higher fT. The circuit for the left
   eye has nearly an octave tuning range, whereas the one on the right does not.

 Figure 11. Simulated histogram of Sym Mux “eye” axis crossings reveals 2 distinct
peaks. The impact of this lack of duty cycle symmetry can easily be simulated at 40
                      Gb/s and is shown in Figure 12 below.

          Figure 12. Simulated “Sym-Mux” eye diagram at 40 Gb/s in 5HP.

Attempting to squeeze the Sym-Mux circuit to achieve the desired 80% capture of fT
reveals that this circuit is not sufficiently symmetric. Figure 11 reveals that the histogram
of simulated eye diagram crossings at 20 Gb/s actually exhibits two peaks separated by
nearly 15% of an eye cycle. This indicates duty cycle variations that, while acceptable at
20 Gb/s, tend to close the eye at 40 Gb/s in 5HP. The reason for this, as has already been
mentioned, is that when two or more transistors switch in the Sym-Mux at the same time
they load each other’s transitions, giving rise to different rise and fall times that are data
dependent. This data dependent duty cycle variation gives rise to much of the observed
jitter seen in Figure 10. This result suggests that VCO noise is not yet a severe problem
and the bulk of the problem is still due to subtle circuit asymmetries that were not
recognized earlier in the research program (or, evidently, by other workers).
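
A rough calculation shows why a crossing split that is tolerable at 20 Gb/s becomes serious at 40 Gb/s. The 15% split at 20 Gb/s is from the text; the sketch assumes, as a simplification, that the split is a fixed time offset set by the circuit and does not shrink when the bit rate is raised.

```python
def unit_interval_ps(rate_gbps: float) -> float:
    """Duration of one bit (one eye cycle) in picoseconds."""
    return 1000.0 / rate_gbps


def split_as_fraction_of_eye(split_ps: float, rate_gbps: float) -> float:
    """Crossing-point split expressed as a fraction of the eye cycle."""
    return split_ps / unit_interval_ps(rate_gbps)


if __name__ == "__main__":
    # 15% of a 50 ps eye cycle at 20 Gb/s.
    split_ps = 0.15 * unit_interval_ps(20)
    print(f"split = {split_ps:.1f} ps")
    for rate in (20, 40):
        fraction = split_as_fraction_of_eye(split_ps, rate)
        print(f"{rate} Gb/s: {fraction:.0%} of the eye cycle")
```

Under this assumption the same 7.5 ps split doubles to about 30% of the eye cycle at 40 Gb/s, before any other jitter source is counted.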

The New Approach

Lack of timing symmetry in circuits is a well-known phenomenon even in CMOS. But at
the speeds typically employed in CMOS these subtle timing problems are easy to
overlook. However, when attempting to capture 80% of the device fT in circuit
performance, every asymmetry counts. This is where the focus of the Rensselaer research
has been: devising novel circuit topologies that can overcome these timing problems. The
current student on the project, Peter Curran, has devised a novel “Mix-Sym” circuit to
address this issue. Since transistors on different levels of CML current trees switch at
different speeds, but transistor dotted collector connections mix their speeds, a circuit
approach has been developed that mixes two levels to obtain an average speed
that is always the same. While this cleans up the eye duty cycle, it does
involve some compromise on speed, since switching is now between the fastest and next
fastest levels of switching speed. Figure 13 shows a “Mix-Sym” OR-NOR.
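
The trade made by the mixing approach can be shown with a toy calculation. It idealizes the mixing as an arithmetic mean of the two level delays, which is an assumption made for illustration and not a circuit result; the delay values are hypothetical.

```python
def conventional_skew_ps(level_delays_ps: list[float]) -> float:
    """Data-dependent timing spread when inputs enter on different levels."""
    return max(level_delays_ps) - min(level_delays_ps)


def mixed_delay_ps(fast_ps: float, next_fastest_ps: float) -> float:
    """Idealized delay when every transition mixes both levels equally."""
    return (fast_ps + next_fastest_ps) / 2


if __name__ == "__main__":
    fast, slower = 3.0, 4.0  # hypothetical top-level and second-level delays
    print("conventional skew:", conventional_skew_ps([fast, slower]), "ps")
    print("mixed delay for every path:", mixed_delay_ps(fast, slower), "ps")
```

Every path then sees the same delay, so the skew vanishes, but that common delay is slower than the fastest level alone, which is the speed compromise noted above.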

                        Figure 13. “Mix-Sym” OR-NOR Circuit Building Block

Figure 14 shows a “Mix-Mux” constructed using symmetric “Mix-Sym” building blocks
that extend the idea in Figure 13 to a two-way multiplexer. In this case the mux requires
all 8 phases available from the 4-stage ring VCO shown in Figure 6 rather than just the
0° or 90° phases shown in Figure 7.

   
      Figure 14. The “Mix-Mux” circuit exhibits improved timing symmetry.

Now the “Mix-Mux” circuit is much more complex than the simple “Sym-Mux” circuit.
It also achieves its symmetry by mixing the fastest signal transitions with slower ones, so
techniques to squeeze even more performance out of the device and circuit become
necessary. Figure 3 shows that with slightly larger emitters it becomes possible to obtain
another 10% from the device at higher currents. Additional circuit tricks to buy back
speed include bridging capacitors, which become important during the signal transitions.

The purpose of this discussion has been to show some of the elements and design
challenges of operating bit serial circuits at 100 Gb/s rates. The focus of the proposed
research will be to insert encryption circuits into the bit serial system and to do so with
minimal circuitry, least power and lowest cost. Additionally we have shown some of
the circuitry needed to test and certify the speed at which these building blocks operate.

Statement of Work for 100 Gb/s Circuitry

1 General

1.1 Introduction
    The Advanced Technology Group (AS&T/ATG) is interested in the development of
    techniques that will enable high-data-rate serial communications. A key component
    needed for this is a scrambler/descrambler circuit that can attain 100 GHz channel
    capability. The specific goal of this effort is to enable the creation of a set of well
    designed building blocks on semiconductor material capable of attaining such speeds.
    This effort will demonstrate that a 100 GHz scrambler/descrambler circuit can be
    designed, and later fabricated, utilizing the IBM 8XP SiGe BiCMOS process.

1.2 Background
    The Intelligence Community (IC) has a need for highly secure data transfers over
    short networks and long hauls. The amount of data has been growing exponentially
    and as such, there is a need for the next generation scrambler/descrambler devices to
    operate at 40 GHz and eventually at 100 GHz.

1.3 Scope
    There is interest in determining what the ultimate speed is for scrambling circuits
    using SiGe HBT devices in the 8HP/8XP/9HP kits primarily by simulation. The
    project will concentrate on examination of a generic building block of a scrambler
    circuit that will be supplied by the government.

    For the simulation to be realistic, not only should this generic individual cell be
    characterized carefully at high speed, with full layout extraction of wire parasitics,
    and over a wide temperature range, but the circuit also needs to be driven as it will be
    driven in a real serial system, by realistic SERDES signals. Additionally, some effort
    will be made to model a large scrambler circuit, even if not entirely realistic in the
    actual scrambler algorithm, to see if other issues like clock distribution, power droop or
    switching noise compromise the circuit performance. The work will be partitioned into
    two years: the first will be simulation using 8HP models, and the second will attempt
    to fabricate and test an 8HP scrambler and descrambler pair, along with further
    simulations with 8XP.
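
    For orientation only, the sketch below shows what a scrambler and descrambler pair does at the bit level, using a generic self-synchronizing (multiplicative) LFSR scrambler. It is a stand-in, not the government-supplied cell of Appendix A, whose algorithm is not described here; the tap positions (6 and 7) and all names are assumptions chosen for illustration.

```python
def scramble(bits, taps=(6, 7)):
    """Self-synchronizing scrambler: each output bit is the input bit
    XORed with earlier OUTPUT bits at the tap delays."""
    history = [0] * max(taps)  # history[0] is the most recent output bit
    out = []
    for bit in bits:
        scrambled = bit
        for tap in taps:
            scrambled ^= history[tap - 1]
        out.append(scrambled)
        history = [scrambled] + history[:-1]
    return out


def descramble(bits, taps=(6, 7)):
    """Matching descrambler: each output bit is the received bit XORed
    with earlier RECEIVED bits at the same tap delays."""
    history = [0] * max(taps)  # history[0] is the most recent received bit
    out = []
    for bit in bits:
        recovered = bit
        for tap in taps:
            recovered ^= history[tap - 1]
        out.append(recovered)
        history = [bit] + history[:-1]
    return out


if __name__ == "__main__":
    data = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    line = scramble(data)
    print(line)
    print(descramble(line) == data)
```

    In hardware each tap is an XOR gate fed from a shift register stage. In the scrambler the XOR chain sits inside a feedback loop that must settle within one bit period, which is one reason a per-cell speed characterization of the kind described above matters.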

2 Applicable Documents
2.1 Compliance Documents
    Not Applicable

2.2 Reference Information and Documents
    A provided circuit as depicted in Appendix A

2.3 Tasks
2.3.1 Year One Design
       The contractor is to design the provided circuit along with the necessary support
       functions in the IBM SiGe kit 8HP/8XP/9HP BiCMOS processes. For this device,
       consisting of the provided circuit and necessary support functions, the contractor
       should consider starting with 40 GHz and proceeding to 100 GHz data rates to discern
       what the ultimate speed can be for the IBM SiGe kit 8HP/8XP/9HP BiCMOS processes.
       For the simulation to be realistic, not only should this generic individual cell be
       characterized carefully at high speed, with full layout extraction of wire parasitics,
       and over a wide temperature range, but the circuit also needs to be driven as it will be
       driven in a real serial system, by realistic SERDES signals. Additionally, some
       effort to model a series of the provided generic cell shall be undertaken to see if
       other issues like clock distribution, power droop, or switching noise compromise
       the circuit performance. The contractor will need to establish a goodness measure
       that defines the probability of 80% or greater that the stated throughput can be
       attained for the IBM SiGe kit 8HP/8XP/9HP BiCMOS processes.

       The IBM 8HP is the existing process and as such, the contractor shall utilize
       8HP as a baseline to design the provided circuit and the necessary support
       functions. In addition to establishing the baseline, the contractor shall forecast any
       speed increase obtained when implementing the provided circuit and the necessary
       support functions in the intermediate 8XP process.

       Items to be tracked as subtasks shall include the following:
       1) Scrambler cell design, simulation in 8HP
       2) Scrambler cell design, simulation in 8XP
       3) SER and DES for testing in 8HP
       4) SER and DES for testing in 8XP
       5) Cell Array Design
       6) Clock Distribution

Requirements that need to be obtained are contained in the following table

Requirements Table
REQUIREMENT                                                    THRESHOLD         GOAL

Operating speed (GHz)                                          40                100
Number of Gates (K)                                            10                200
Series of the provided circuit (number connected in series)    5                 20
Manufacturability (% of confidence)                            80                95
Connection to Conventional Fiber Networks (% of confidence)    80                95

2.3.2 Option Year
      The contractor shall fabricate the provided circuit and the necessary support
      functions using the IBM 8HP BiCMOS process. In addition, the contractor shall
      exercise simulations of the provided circuit and the necessary support functions in
       the extended IBM 8XP BiCMOS process and compare these results to those
       obtained with the 8HP process.

       Items to be tracked as subtasks shall include the following:
       1) SER in 8XP
       2) DES in 8XP
       3) Cell Array Design
       4) Clock Distribution
       5) Data skew study
       6) Signal integrity study (including Skin Effects)
       7) Clock jitter (impact of the new DRO technology with PLL’s)
       8) Voltage Droop Checks

Requirements that need to be obtained are contained in the following table

Requirements Table
REQUIREMENT                                                   THRESHOLD       GOAL

Operating Speed (GHz)                                         40              100
Number of Gates (K)                                           10              200
Series of the provided circuit (number connected in series)   5               20
Manufacturability (% of confidence)                           80              95
Connection to Conventional Fiber Networks (% of confidence)   80              95

2.3.3 Program Management
      The Contractor shall manage the effort in order to accomplish the program
      objectives within the cost, schedule, and performance constraints of the contract.
      Management tasks will include post-award planning/program baselining;
      preparation for and presentation of customer meetings and reviews; and additional
      management activities associated with program related events. The program
      manager will be responsible for all matters related to the performance, scheduling
      and financial controls of the program. If there are any deviations with regard to
      cost and schedule, the COTR will be notified within one week of discovery.
      Expenditures will be reported via CDRL A003. Program risks and their
      mitigations shall be tracked by the contractor. The contractor will host and
      conduct a kickoff meeting within 2 weeks of contract award and host a mid-term
      Technical Interchange Meeting with the government COTR. The kickoff meeting
      will establish the direction and schedule for the program. Upon program
      completion, a final briefing will be presented to review results and next steps for
      the program. Additional brief informal meetings and/or telecons will be held as
      needed to further the effort.

3. Deliverables
3.1 Hardware
       All hardware purchased under this contract is deliverable to the government in
accordance with CDRL AXXX. Leased or borrowed hardware is not deliverable. The
COTR will inspect and accept the deliverables within 30 days of receipt.

3.2 Documentation
       The Contractor shall deliver a final report on the analysis and testing in
accordance with CDRL A003.

3.3 Contract Data Requirements List (CDRL)
       The contractor shall provide the following CDRLs:
A001 – Meeting Package: Meeting packages shall include any presentation materials
prepared by the contractor for meetings, including an updated program schedule. Formal
meetings will include a kick-off meeting, midterm meeting, and a final review.

A002 – Progress Report: Monthly progress reports shall be in Contractor-determined
format, and shall describe technical progress/accomplishments, funding status, issues,
risks, schedule and the status of any action items. Reports are due 14 calendar days after
the last day of each month.

A003 – Final Report: The Final Report (in Microsoft Word format) shall summarize the
objective, approach, accomplishments, and the lessons learned. The report will detail the
results of the tasks in Section 2.3. The report is due 7 calendar days prior to the end of the
base contract and any subsequent period of performance. A draft of the report shall be
provided 14 calendar days prior to the final version.

   4. Other Requirements and Considerations
   4.1 Government Furnished Information
        No GFP or GFE is anticipated for this program other than the provided scrambler
        circuit in Appendix A.
   4.2 Security
   4.2.1 General
           This is an unclassified program.
   4.2.2 Program Protection Plan:
           Not Applicable
   4.2.3 Security at Reviews and Incident Reporting
           Not Applicable
The sponsor has indicated an interest in determining, primarily by simulation, the ultimate
speed of encryption circuits using SiGe HBT devices in the 8HP/8XP/9HP kits. Since
encryption in some cases involves issues of national security (although commercial
encryption systems also exist), the project will concentrate on examining the building
blocks of the encryption circuit. In particular, the sponsor has tendered the circuit shown
in Figure 15 for exploration, primarily by simulation in the first year, using the best
models available. The work would include circuit schematic generation, simulation without
extracted wire parasitics, layout of artwork, design rule checks, layout-versus-schematic
validation, network wire parasitic extraction, and post-layout simulation.

   Figure 15. Generic Encryption Circuit for Design, Simulation and Ultimately
                             Fabrication and Test.

Program Management Plan

For the simulation to be realistic, the generic individual cell must be characterized
carefully at high speed, with full layout extraction of wire parasitics and over a wide
temperature range. The circuit must also be driven as it would be in a real serial system,
by realistic SERDES signals, because the eye of pulse (the common quality measure of
serial bit streams) is partially dictated by the SERDES and its associated clock jitter.
Additionally, some effort will be made to model a large encryption circuit, even if it is
not entirely realistic in the actual encryption algorithm, to see whether other issues such
as clock distribution, power droop, or switching noise compromise the circuit performance.

The work will be partitioned into two years. The first year will be simulation using 8HP
models; the second year will attempt to fabricate and test an 8HP encryption and
de-encryption pair, along with further simulations in 8XP. In the absence of network
extraction, the a priori speed of the shift register chain shown in Figure 15 would be on
the order of 80 GHz in 8HP and greater than 100 GHz in 8XP, but with special interleaving
mux/demux strategies one could coax out additional speed.
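
As a behavioral illustration of what a shift-register-based scrambler and descrambler pair
does, the following Python sketch models a self-synchronizing scrambler. It is only a
sketch under stated assumptions: Figure 15 is not reproduced here, so the register length
(7) and the feedback taps (6 and 7) are hypothetical choices made for illustration and are
not taken from the sponsor's circuit.

```python
def _feedback(state, taps):
    """XOR of the register bits at the given delays (delay 1 = newest bit)."""
    fb = 0
    for t in taps:
        fb ^= (state >> (t - 1)) & 1
    return fb


def scramble(bits, taps=(6, 7), state=0):
    """Self-synchronizing scrambler: out[n] = in[n] ^ out[n-6] ^ out[n-7].

    The taps are an illustrative assumption, not the Figure 15 circuit.
    """
    mask = (1 << max(taps)) - 1
    out = []
    for b in bits:
        s = b ^ _feedback(state, taps)
        out.append(s)
        state = ((state << 1) | s) & mask  # shift the scrambled bit in
    return out


def descramble(bits, taps=(6, 7), state=0):
    """Inverse operation: the register is fed with the received (scrambled) bits."""
    mask = (1 << max(taps)) - 1
    out = []
    for s in bits:
        out.append(s ^ _feedback(state, taps))
        state = ((state << 1) | s) & mask  # shift the received bit in
    return out


data = [1, 0, 1, 1, 0, 0, 1, 0] * 4
recovered = descramble(scramble(data))
```

Because the descrambler register is filled only with received bits, a descrambler started
from an arbitrary state produces correct output once its register has been flushed, that
is, after the first 7 bits in this sketch.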

There are no oscilloscopes that work above 50 GHz, and there are no eye-of-pulse
instruments suitable for testing circuits in excess of 40 Gb/s. All circuits needed to test
the proposed circuits must therefore also be designed and hosted on the actual test chip.
Hence testing of encryption or data scrambler circuits will require SER and DES circuits
that can drive these systems and mux up or demux down the data to more manageable speeds.
One can think of these as interleaved samplers, which is another way to create an
ADC/DAC, fusing together several sample streams (up-sampling) or splitting them apart
(down-sampling). This is also similar to the problem faced by RF designers, who require
up-converter and down-converter circuits.
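
The mux-up and demux-down idea can be sketched behaviorally in Python. This is a minimal
sketch, not the proposed circuit: the function names and the 4-lane, 100 Gb/s example are
assumptions chosen only to show that round-robin interleaving brings each lane down to a
rate that existing instruments can observe.

```python
def demux(serial, lanes):
    """1:N deserializer: lane k receives bits k, k+N, k+2N, ..."""
    return [serial[k::lanes] for k in range(lanes)]


def mux(parallel):
    """N:1 serializer: round-robin interleave of equal-length lanes."""
    return [bit for group in zip(*parallel) for bit in group]


def lane_rate_gbps(line_rate_gbps, lanes):
    """Per-lane bit rate after demultiplexing a serial stream."""
    return line_rate_gbps / lanes


serial = [1, 0, 0, 1, 1, 1, 0, 0]
lanes = demux(serial, 4)          # four slower streams
restored = mux(lanes)             # interleave them back into one stream
per_lane = lane_rate_gbps(100, 4) # 25 Gb/s per lane for a 100 Gb/s stream
```

The sketch assumes the serial stream length is a multiple of the lane count; a hardware
DES would instead rely on frame alignment to decide which bit lands on which lane.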

One should also understand that driving even one foot of coaxial cable at greater than 40
GHz is problematic due to skin effects, so at some point one must incorporate a light
modulator to propagate these 100+ Gb/s signals. A patent disclosure to do this with the
SiGe HBT light modulator has been filed with NRL. Additional funding would be
needed to work on this aspect of the problem.
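
As a rough illustration of the skin-effect concern, the classical skin depth of a good
conductor is sqrt(rho / (pi * f * mu)). The Python sketch below evaluates it assuming a
copper conductor with a textbook room-temperature resistivity of 1.68e-8 ohm-m; the cable
material is not specified in this proposal, so both the material and the constant are
assumptions.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of free space, H/m
RHO_CU = 1.68e-8       # assumed copper resistivity, ohm-m


def skin_depth_m(freq_hz, resistivity=RHO_CU, mu_r=1.0):
    """Classical skin depth in meters: sqrt(rho / (pi * f * mu_0 * mu_r))."""
    return math.sqrt(resistivity / (math.pi * freq_hz * MU_0 * mu_r))


depth_40ghz = skin_depth_m(40e9)  # roughly a third of a micron
depth_10ghz = skin_depth_m(10e9)  # twice the 40 GHz value
```

Under these assumptions the current at 40 GHz is confined to roughly a third of a micron
at the conductor surface, and the depth shrinks as the inverse square root of frequency,
so conductor loss keeps rising as the data rate is pushed higher.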

Hence the following tasks are identified for scheduling purposes, and the completion of
each serves as a milestone.

1)Scrambler cell design, simulation in 8HP
2)Scrambler cell design, simulation in 8XP
3)SER and DES for testing in 8HP
4)SER and DES for testing in 8XP
5)Cell Array Design
6)Clock Distribution
7)Data skew study
8)Signal integrity study (including Skin Effects)
9)Clock jitter (impact of the new DRO technology with PLL's)
10) Voltage Droop Checks

A Gantt chart has been created with these subtasks in mind; it forms the program
management plan.

Staffing of the Program and Cost Explanations

The program scope and available funding are able to support one full time graduate
student, travel, CAD tool support, support staff, and some principal investigator charge-
out (2.5% for Academic Year and 3 weeks of summer support). Some funding is
included for TAPO fabrication and test of an encryption de-encryption pair.

Every effort will be made to staff the project with an American doctoral student. Owing
to the fact that such students are becoming increasingly rare in all graduate schools due to
the high cost of undergraduate degrees today, everything should be done to fund a
multiple year effort to provide stability for the degree.

In addition there is 8% charge-out for Professor Kraft who manages the CADENCE CAD
resources and licenses, the IBM SiGe BiCMOS and CMOS kits, and the testing lab.

Facilities and Support

For the past two decades RPI has been a DARPA Contractor involved in innovative
circuit exploitation of DARPA developed technologies including GaAs H-MESFETs and
HBTs, thin film MCM-D packaging, and SiGe HBT BiCMOS. More recently under
DARPA/SPAWAR TEAM an FPGA was developed in IBM’s 7HP fabrication line
and exhibited typical operating speeds between 3 and 7GHz. Later a 40GHz RF Analog
crossbar switch was developed in IBM’s 8HP line. It could be used to route 80 Gb/s
serial data. Such switches can also be used to route phases in a phased array.
Additionally DARPA TEAM has sponsored SiGe DAC’s and ADC’s. The figure below
shows one of the dozens of test circuits pushing hard on the limits of the Walden chart.

RPI has a high frequency testing lab with equipment for both time domain and frequency
domain characterization of devices and validation of circuit performance: a Tektronix
STD/OP 3 Probe Station and Chuck Controller, an Alessi MH 5 Probe Station, 2
Tektronix 11801C and a Tektronix 11801A Digital Sampling Oscilloscopes, a Tektronix
SM-11 Multi-Channel Unit, several SD-32 and SD-24 sampling heads, a Rohde &
Schwarz SML01 (1.1GHz) Signal generator, an Agilent programmable pulse generator,
many HF probes with various configurations and multiple contacts, a Rohde & Schwarz
FS300 Spectrum analyzer (3GHz), an Agilent E4407B spectrum analyzer (26.5GHz), an
Agilent E5062A network analyzer (3GHz), an Agilent 8654A signal generator
(520MHz), an Agilent 33120 arbitrary waveform generator, an Agilent 8510B 40GHz
network analyzer, and an Agilent 20GHz CW signal generator.
  Figure 16. RPI microwave wafer probe station.

Figure 17. RPI Microwave Test Facilities to 40 GHz
         Figure 18. Karl Suss Probe Station 2000 automated wafer prober.

        Figure 19. SunFire parallel processors and design workstation room.

The microelectronics industry is reaching physical limits and the technical and cost
constraints are limiting growth. To address these needs Rensselaer Polytechnic Institute,
International Business Machines (IBM), and New York State have joined together to
establish a unique research and computational center to provide leadership in
nanotechnology modeling and simulation. This center will focus on reducing time and
costs associated with design to manufacturing and producing new integrated predictive
design tools for nano-scale devices. This center (The Computational Center for
Nanotechnology Innovations or CCNI) will complement other upstate NYS regional
strengths (such as Sematech Headquarters and the Sematech EUV and Advanced
Processing Center) in physical device research and development in nano-electronics. The
CCNI will operate heterogeneous supercomputing systems consisting of massively-
parallel Blue Gene supercomputers, Power-based Linux clusters, and AMD Opteron
processor-based clusters. This diverse set of systems will enable large-scale leading-edge
computational research in both the scientific and technical arenas. It is expected that this
initial hardware and software configuration will provide upwards of 70 TeraFLOPS of
computational power with associated high-speed networking and storage, currently the
largest supercomputer facility in a university. This facility will be available to model
EMP propagation in the structures to be built under this contract, thermal modeling,
stress and strain modeling, and grain growth modeling. The CCNI is currently being
delivered by IBM and collaborator AMD, and is due to be operational in a matter of months.
Our budget includes fees to gain access to this facility. The CCNI has been funded
through the efforts of New York State Senator Joe Bruno, Governor Spitzer, and John E.
Kelley, VP of IBM, and another key graduate of RPI.

  Figure 20. Rensselaer Polytechnic CCNI Supercomputer Center during current
             installation of the 70 TOPS machine (Blue Gene Section).

Biographical Sketch of Principal Investigator (John F. McDonald)

Professional Preparation.

Undergraduate Institution(s)                   Major          Degree & Year
Massachusetts Institute of Technology          EE             BSEE 1963

Graduate Institution(s)                        Major          Degree & Year
Yale University                                EE             M.Eng. 1965
Yale University                                EE             PhD     1969


   1985-          Full Professor, Rensselaer Polytechnic Institute
   1974-1985      Associate Professor, Rensselaer Polytechnic Institute
   1969-1974      Assistant Professor, Yale University
   1968           Lecturer, Yale University
   1964           MTS, Bell Labs

Abbreviated Publications List Related to the Proposal

1. “A 32-Word by 32-Bit Three-Port Bipolar Register File Implemented Using a SiGe
   HBT BiCMOS Technology”, with S. Steidl, I.E.E.E. Journal of Sol. State Circuits, Vol.
   37(#2), Feb. 2002, pp. 228-236 [reprinted in I.E.E.E. collection of papers on SiGe,
   edited by R. Singh, D. Harame, and M. Oprysko, 2004]
2. “SiGe HBT BiCMOS Field Programmable Gate Arrays for Fast Reconfigurable
   Computing,” with B. S. Goda, S. R. Carlough, R. P. Kraft and T. W. Krawczyk, J.
   I.E.E. Proceedings – Comp. and Digital Tech, 147(#3), June 20, 2001, pp. 189-194.
3. “Accurate High-Speed Performance Prediction for Full Differential Current-Mode
   Logic: The Effect of Dielectric Anisotropy,” with A. Garg, Y. L. LeCoz, H. J. Greub,
   R. B. Iverson, R. F. Philhower, P. M. Campbell, C. A. Maier, S. A. Steidl, M. W.
   Ernest, R.P. Kraft, S. R. Carlough, J. W. Perry, T. W. Krawczyk, I.E.E.E.
   Transactions on Computer Aided Design of Integrated Circuits and Systems, Vol.
   18(#2), February, 1999, 212-219.
4. “A 2-GHz Clocked AlGaAs/GaAs HBT Byte-Slice Datapath Chip,” with S. R.
   Carlough, R. A. Philhower, Cliff A. Maier, S. A. Steidl, P. M. Campbell, A. Garg, K.-S.
   Nah, M. W. Ernest, J. R. Loy, T. W. Krawczyk, P. F. Curran, Russell P. Kraft, and H.
   J. Greub, I.E.E.E. Journal of Sol. State Circuits, Vol. 35(#6), June 2000, pp. 885-894.
5. “Integration of CryoCooled Superconducting Analog-to-Digital Converter and SiGe
   Output Amplifier,” with Deepnarayan Gupta, Alan M. Kadin, Robert J. Webber, Irwin
   Rochwarger, Daniel Bryce, William J. Hollander, Young Uk Yim, Channakeshav,
   Russell P. Kraft, Jin-Woo Kim, and John. F. McDonald, IEEE Trans. on Applied
   Superconductivity, Vol. 13, No. 2, June 2003, pp. 477-584.
6. "Ultra High Speed Interleaved A/D Conversion Using an fT Doubler Core in SiGe HBT
   Technology," M. Chu, R. Heikaus, K. Zhou, J.-R. Guo, C. You, J. F. McDonald, R. P. Kraft,
   Proceedings of the 2003 I.E.E.E. Instrumentation and Measurement Technology Conference,
   Vail, CO, May 20-22, 2003, pp. 839-844.

7. “SiGe HBT Microprocessor Core Test Vehicle,” P. M. Belemjian, O. Erdogan, R. P.
   Kraft, and J. F. McDonald, Proc. of the IEEE, Vol. 93, No. 9, Sept. 2005. pp. 1669-

Synergistic Activities

(1) URL

(2) Patent #6022595, 2/8/2000, “Increase in Deposition Rate of Vapor Deposited
Polymer by Electric Field,” with T.-M. Lu, G.-R. Yang, and B. Wang.

(3) NSF ILI award DUE-9850821, “Field Programmable Circuit Boards - A 21st Century
Technique for Teaching Computer Hardware Design,” $71,000, 1998.

List of Collaborators (related to SiGe BiCMOS research)

Greg Freeman (IBM), Bernie Meyerson (VP IBM), Raminderpal Singh (IBM), David
Harame (IBM), Modest Oprysko (IBM)

List of Thesis and Post-Doctoral Advisees (in last five years)

Steve Carlough (IBM), Cliff Maier (AMD), Atul Garg (AMD), Bob Philhower (IBM),
Matt Ernest (INTEL), Tom Krawczyk (Sierra Monolithics), Sam Steidl (Sierra

Total Number of Former Doctoral Students Advised: 35
Total Number of Patents: 13
Total Number of Former Post-Doctoral Students Sponsored: 1
