Random Number Generator
May 1, 2006

Dmitriy Solmonov W1-1
David Levitt W1-2
Jesse Guss W1-3
Sirisha Pillalamarri W1-4
Matt Russo W1-5

Design Manager – Thiago Hersan
Why Random Numbers?
• Real-Time Simulations
• Encryption
• Gambling




Encryption
• Need random numbers for authentication
• Key generation
• Software vs. Hardware
  – Less power/time per number
  – Portable
Gambling
• ePoker Rooms
• SoC Deck Generation
• Other future casino games

Business Plan
• Potential markets
  – Defense and Intelligence Organizations
  – E-Gambling / Casinos
  – Game Consoles
  – Mobile Communication
• License the IP
• Our design will be part of a larger ASIC or GPP design

IBAA Algorithm
• Uses RC4 encryption algorithm
  – Cryptographically secure
  – Deterministic
• 1024-bit number generated
• Internally Updated Seed
  – not user visible = secure



The IBAA Algorithm
 #define ALPHA (8)
 #define SIZE (1<<ALPHA)
 #define ind(x) ((x)&(0x1F))                /* wrap index into the 32-word arrays */
 #define barrel(a) (((a)<<19)^((a)>>13))    /* 32-bit rotate left by 19 */
 uint32 A, B, Y, X;
 uint32 M[32], R[32];                       /* M: internal state, R: results */
 …
 for ( i=0; i<SIZE; i++ ) {
          X = M[ind(i)];
          A = barrel(A) + M[ind(i +16)];
          M[ind(i)] = Y = M[ind(X)] + A + B;
          R[ind(i)] = B = M[ind(Y>>ALPHA)] + X;
 }
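The loop above can be wrapped into a small, compilable sketch. This is a minimal illustration, not the team's reference C code: the `ibaa_state` struct, the function names, and especially the seeding routine (a plain linear congruential fill) are assumptions made only so the example is self-contained. The slides do not say how the 32-bit seed is expanded into M.

```c
#include <stdint.h>

#define ALPHA 8                      /* shift used to derive the second index */
#define SIZE  (1u << ALPHA)          /* 256 loop iterations per run, as on the slide */
#define WORDS 32u                    /* M and R each hold 32 x 32 bits = 1024 bits */
#define ind(x)    ((x) & 0x1Fu)      /* wrap an index into 0..31 */
#define barrel(a) (((a) << 19) ^ ((a) >> 13))

/* Hypothetical container for the generator state (not from the slides). */
typedef struct {
    uint32_t M[WORDS];   /* internal state, updated in place */
    uint32_t R[WORDS];   /* most recent 1024-bit result */
    uint32_t A, B;       /* accumulator and previous result word */
} ibaa_state;

/* ASSUMPTION: illustrative seeding only. Fills M from a 32-bit seed
 * with a simple LCG; the real design's seeding is not shown in the deck. */
static void ibaa_seed(ibaa_state *s, uint32_t seed) {
    s->A = 0;
    s->B = 0;
    for (uint32_t i = 0; i < WORDS; i++) {
        seed = seed * 1664525u + 1013904223u;
        s->M[i] = seed;
        s->R[i] = 0;
    }
}

/* One run of the loop from the slide: 256 iterations over the 32-word arrays. */
static void ibaa_run(ibaa_state *s) {
    uint32_t A = s->A, B = s->B, X, Y;
    for (uint32_t i = 0; i < SIZE; i++) {
        X = s->M[ind(i)];
        A = barrel(A) + s->M[ind(i + 16)];
        s->M[ind(i)] = Y = s->M[ind(X)] + A + B;
        s->R[ind(i)] = B = s->M[ind(Y >> ALPHA)] + X;
    }
    s->A = A;
    s->B = B;
}
```

After `ibaa_seed(&s, seed)` and `ibaa_run(&s)`, the 32 words of `s.R` hold one 1024-bit result, and M has been updated internally for the next run. Two states seeded identically produce identical output, which is the deterministic property listed on the earlier slide.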
Architecture
IBAA Algorithm to Architecture
for ( i=0; i<SIZE; i++ ) {
         X = M[ind(i)];
         A = barrel(A) + M[ind(i +16)];
         M[ind(i)] = Y = M[ind(X)] + A + B;
         R[ind(i)] = B = M[ind(Y>>ALPHA)] + X;
}
• Per iteration: 4 reads from M, 1 write to M, 1 write to R
• Dependencies, feedback, and RAW hazards


Algorithm to Architecture
• Hardware Limits
  – Max. of 2 simultaneous reads from
    memory
• Can’t do better than two stages
• Each stage must take multiple cycles to
  complete

Algorithm to Architecture
• Chosen Timing
  – Addition = 1 cycle
  – Memory Read = 0.5 cycles
  – Memory clock is ½ period out of phase with the main clock
  – Set address and receive data in 1 cycle
• When forwarding is applied, need 4
  cycles per stage

Stage 1
--------------------------------------
M1 = M[i+16]
--------------------------------------
X = M[i] | A = M1 + barrel(A)
--------------------------------------
M3 = M[X] | C1 = (X==i-1)
--------------------------------------
Y1 = A + (C1 ? Y : M3)

Stage 2
------------------------------------
Y = B + Y1
------------------------------------
M4 = M[Yaddr] | C2 = (i==Yaddr)
------------------------------------
B = X + (C2 ? Y : M4)
------------------------------------
M[i] = Y | R[i] = B

[Block diagram: SRAM (M) and SRAM (R); registers X, M1, M2, M3, M4, A, B, Y, Y1; adders; control logic with FSM, counter, and register]
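The forwarding conditions C1 and C2 in the two stages can be checked in software. The sketch below is a behavioral model under stated assumptions, not the team's Verilog: it assumes the write `M[i] = Y` lands only at the end of stage 2, so the next iteration's stage 1 reads M before that write is visible. The function names and the `pending` bookkeeping are illustrative. With forwarding in place, the delayed-write model should match the plain loop word for word.

```c
#include <stdint.h>
#include <stdbool.h>

#define ALPHA 8
#define SIZE  (1u << ALPHA)
#define ind(x)    ((x) & 0x1Fu)
#define barrel(a) (((a) << 19) ^ ((a) >> 13))

/* Reference: the loop as written on the algorithm slide. */
static void run_reference(uint32_t M[32], uint32_t R[32],
                          uint32_t *Ap, uint32_t *Bp) {
    uint32_t A = *Ap, B = *Bp, X, Y;
    for (uint32_t i = 0; i < SIZE; i++) {
        X = M[ind(i)];
        A = barrel(A) + M[ind(i + 16)];
        M[ind(i)] = Y = M[ind(X)] + A + B;
        R[ind(i)] = B = M[ind(Y >> ALPHA)] + X;
    }
    *Ap = A;
    *Bp = B;
}

/* Two-stage model (assumed timing): each iteration's write to M is held
 * back until the following iteration has finished its stage-1 reads. */
static void run_pipelined(uint32_t M[32], uint32_t R[32],
                          uint32_t *Ap, uint32_t *Bp) {
    uint32_t A = *Ap, B = *Bp;
    bool pending = false;               /* is a write to M still in flight? */
    uint32_t pend_idx = 0, pend_Y = 0;  /* its address and data */
    for (uint32_t i = 0; i < SIZE; i++) {
        /* Stage 1 */
        uint32_t M1 = M[ind(i + 16)];
        uint32_t X  = M[ind(i)];
        A = M1 + barrel(A);
        uint32_t M3 = M[ind(X)];                   /* may be stale */
        bool C1 = pending && ind(X) == pend_idx;   /* X == i-1 */
        uint32_t Y1 = A + (C1 ? pend_Y : M3);      /* forward previous Y */
        if (pending) M[pend_idx] = pend_Y;         /* previous stage 2 retires */
        /* Stage 2 */
        uint32_t Y = B + Y1;
        uint32_t Yaddr = ind(Y >> ALPHA);
        uint32_t M4 = M[Yaddr];                    /* M[i] not written yet */
        bool C2 = (Yaddr == ind(i));
        B = X + (C2 ? Y : M4);                     /* forward current Y */
        R[ind(i)] = B;
        pend_idx = ind(i);
        pend_Y = Y;
        pending = true;
    }
    if (pending) M[pend_idx] = pend_Y;             /* drain the last write */
    *Ap = A;
    *Bp = B;
}
```

Running both functions on identical copies of M, A, and B and comparing the resulting M and R arrays gives a quick equivalence check. Removing either forwarding mux from the model leaves a read of stale data whenever the read address hits the in-flight write, which is the RAW hazard the architecture slides refer to.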

Design For Manufacture
Regular Fabrics
Why DFM?

• Ability to print on smaller processes
• Robust Manufacturability
• Sacrifice area, speed and metal layers for a regular design




Regular Fabrics
 Sample Layout:




Lithography Simulations




Hardware
 Adder
• Four adders, each executing 256 times per run
• Hybrid adder
• Fast and low power.


[Figure: 32-bit adder split into four blocks: CS4 (bits 31:28), CS18 (bits 27:10), CS6 (bits 9:4), CS4 (bits 3:0). Inputs A and B, sums S; carries C’[4], C[10], C’[28], and C[32] link the blocks.]
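The block partition in the figure can be illustrated with a behavioral model. The sketch below only shows how a 32-bit sum decomposes into the four blocks (4, 6, 18, and 4 bits) with a carry crossing each boundary. It does not model the internal circuit of the CS blocks, and the function names are hypothetical.

```c
#include <stdint.h>

/* Add one block of `width` bits (width < 32) starting at bit `lo`.
 * ORs the block's sum bits into *sum and returns the block's carry-out. */
static unsigned add_block(uint32_t a, uint32_t b, unsigned lo, unsigned width,
                          unsigned cin, uint32_t *sum) {
    uint32_t mask = (1u << width) - 1u;
    uint32_t fa = (a >> lo) & mask;
    uint32_t fb = (b >> lo) & mask;
    uint64_t t = (uint64_t)fa + fb + cin;     /* at most width+1 bits */
    *sum |= (uint32_t)(t & mask) << lo;
    return (unsigned)(t >> width) & 1u;
}

/* 32-bit add assembled from the four blocks named in the figure. */
static uint32_t add32_blocks(uint32_t a, uint32_t b, unsigned *c32) {
    uint32_t s = 0;
    unsigned c = 0;
    c = add_block(a, b,  0,  4, c, &s);   /* S[3:0],   carry into bit 4  */
    c = add_block(a, b,  4,  6, c, &s);   /* S[9:4],   carry into bit 10 */
    c = add_block(a, b, 10, 18, c, &s);   /* S[27:10], carry into bit 28 */
    c = add_block(a, b, 28,  4, c, &s);   /* S[31:28], carry-out C[32]   */
    if (c32) *c32 = c;
    return s;
}
```

For any inputs, `add32_blocks(a, b, &c)` should equal the native wrapped sum `a + b`, with `c` holding the carry out of bit 31. In hardware the point of the partition is to shorten the carry path; this model propagates the carry sequentially and says nothing about delay.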
32-Bit Adder: First 4 Bits




32-Bit Adder: CS6 Block




32-Bit Adder: CS18 Block




32 Bit Fast Adder




Adder Performance
• Delay: 1.56 ns
• Energy Consumption
  – (worst case switching) : 12.4 pJ
• Power Dissipation
  – (estimating with our switch factor) : 148 μW



SRAM
   Single Bus Cell

   Double Bus Cell




SRAM




Functional Verification
 • Structural Verilog vs. C Code:
   – Generate numbers under equal load
     conditions
   – Compare Numbers
 • Schematic vs. Structural Verilog
   – Under equal inputs, check if port
     outputs match
 • LVS

Verification
• Schematic and Extracted Parasitic spice
  simulations of major blocks
   – Check for clean signals
   – Check delays and rise/fall times
• Extracted Parasitic simulation of critical
  Register-Register Path
   – Signals are clean
   – Delay = 2.1 ns
• Extracted Parasitic simulation of chip clock distribution

Critical Delay




Final Layout




Poly Density: 7.52%
Metal1 Density: 20.85%
Metal2 Density: 19.89%
Metal3 Density: 18.76%
Metal4 Density: 9.36%
Metal5 Density: 6.8%

Analysis
Specifications
 • Pins
   – 36 input pins
      • 32-bit seed input, gen, read, rst, clk
   – 34 output pins
      • 32-bit random output, rdy, done
   – 2 input/output pins
      • vdd, gnd
 • 475 MHz chip speed
 • 436 kHz throughput
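The two figures above can be related with a little arithmetic. The sketch below divides the clock rate by the throughput to get the cycles available per generated number, and compares that with the loop cost implied by the earlier slides (256 iterations, 4 cycles per stage). The function names are hypothetical, and reading the difference as pipeline fill, seeding, and result read-out is an assumption, not something the slides state.

```c
#include <stdint.h>

/* Clock cycles available per generated number: clock rate / throughput.
 * Integer division, so the result is rounded down. */
static uint32_t cycles_per_number(uint32_t clock_hz, uint32_t numbers_per_s) {
    return clock_hz / numbers_per_s;
}

/* Cycles spent in the main loop, ASSUMING one iteration leaves the
 * two-stage pipeline every `cycles_per_stage` cycles once it is full. */
static uint32_t loop_cycles(uint32_t iterations, uint32_t cycles_per_stage) {
    return iterations * cycles_per_stage;
}
```

With the slide values, `cycles_per_number(475000000, 436000)` gives 1089 and `loop_cycles(256, 4)` gives 1024, leaving roughly 65 cycles per number outside the steady-state loop.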
Putting it All Together
(Delay and power entries: Schematic / ExtractRC)

Part         Trans Count           Area (um2)             Density              Prop Delay (ns)   Power (1x) (mW)   Power (Avg) (mW)
                                                                                                 500 MHz           475 MHz
Adders (4)   5,856 (1,464 ea.)     25,200 (6,300 ea.)     0.232                1.45 / 1.56       0.60 / 0.62       0.14 / 0.148
SRAM (M&R)   17,736                51,000                 0.348                0.735 / 0.845     W: 0.51 / 3.25    0.27 / 1.86
             (M=10,458, R=7,278)   (M=35,000, R=16,000)   (M=0.293, R=0.456)                     R: 0.19 / 1.40
Regs (10)    6,400 (640 ea.)       38,400 (3,840 ea.)     0.167                0.220 / 0.275     0.53 / 0.59       0.13 / 0.145
Total        33,371                182,000                0.194                2.1 (475 MHz)     -----             4.1

Performance Comparison

Platform                               Time (ms), ~4,000,000 runs
Intel P4 3.20 GHz (90 nm)                    5,000
W1-2006 475 MHz (180 nm)                     9,000
AMD Opteron Blade 1.005 GHz ()              14,000
ARM Intel XScale 700 MHz ()                125,000
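As a rough cross-check of the W1-2006 row, the run time can be estimated from the 436 kHz throughput on the specifications slide. This assumes one "run" in the table corresponds to one generated number, which the slides do not state explicitly; the function name is hypothetical.

```c
#include <stdint.h>

/* Time in milliseconds to produce `runs` numbers at `numbers_per_s`.
 * Integer division, so the result is rounded down. */
static uint64_t run_time_ms(uint64_t runs, uint64_t numbers_per_s) {
    return (runs * 1000u) / numbers_per_s;
}
```

Under that assumption, `run_time_ms(4000000, 436000)` gives 9174 ms, in line with the 9,000 ms listed for roughly 4,000,000 runs.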


Where to Now?
• ERC, tapeout, etc.
• Thermal noise unit to use as input seed
• On-Chip Bus Interface
• HyperTransport™ Interface



References
• Jenkins, Robert J. “ISAAC”. http://burtleburtle.net/bob/rand/isaac.html
• Chirca, Schulte, Glossner, et al. “A Static Low-Power, High-Performance 32-bit Carry Skip Adder”. http://mesa.ece.wisc.edu/publications/cp_2004-12.pdf
• “CLA and Ling Adders”. http://umunhum.stanford.edu/~farland/notes.html


Questions



