ECE 4100/6100
Advanced Computer Architecture

Lecture 11: DRAM and Storage

Prof. Hsien-Hsin Sean Lee
School of Electrical and Computer Engineering
Georgia Institute of Technology

The DRAM Cell

[Figure: the 1T1C DRAM cell, an access transistor gated by the word line
(control) connecting the bit line (information) to a storage capacitor;
stack capacitor vs. trench capacitor structures shown.
Source: Memory Arch Course, INSA Toulouse]
• Why DRAM?
   – Higher density than SRAM
• Disadvantages
   – Longer access times
   – Leaky; needs to be refreshed
   – Cannot be easily integrated with a standard CMOS logic process
                                                                                      2
One DRAM Bank

[Figure: one DRAM bank. The address feeds a row decoder, which drives one
wordline across the cell array; bitlines run down to the sense amps, then
through the column decoder / I/O gating to the data output.]

3
Example: 512Mb 4-bank DRAM (x4)

[Figure: a x4 DRAM chip with four banks. Bank address BA[1:0] selects one
of four row decoders; row address A[13:0] selects one of 16K wordlines;
each bank is a 16384 x 2048 x 4 array feeding sense amps. Column address
A[10:0] drives the column decoder / I/O gating, producing data out on
D[3:0]. Row and column addresses share the same pins (address
multiplexing). A DRAM page = 2k x 4 = 1KB.]

4
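The capacity and page-size arithmetic behind this part can be checked directly (a small sketch using only the figures on the slide):

```python
# Capacity arithmetic for the slide's 512Mb x4 DRAM:
# 4 banks, each a 16384 x 2048 x 4 array.

banks, rows, cols, width = 4, 16384, 2048, 4
total_bits = banks * rows * cols * width
page_bytes = cols * width // 8      # one open row ("DRAM page"), in bytes

print(total_bits // 2**20, "Mb")    # 512 Mb
print(page_bytes, "B")              # 1024 B = 1KB page
```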
DRAM Cell Array

[Figure: the cell array. Wordlines Wordline0 .. Wordline1023 run in one
direction; bitlines bitline0 .. bitline15 cross them, with a cell at each
intersection.]

5
DRAM Sensing (Open Bitline Array)

[Figure: open-bitline organization. Two DRAM subarrays (WL0-WL127 and
WL128-WL255) sit on either side of a shared sense amp, which compares a
bitline from each subarray.]

6
Basic DRAM Operations

[Figure: write and read waveforms for a 1T1C cell, with cell capacitance
Cm and bitline capacitance CBL.]

• Write '1': the driver pulls BL from its Vdd/2 precharge up to Vdd while
  WL is asserted; the cell charges to Vdd - Vth (one threshold drop across
  the access transistor).
• Read '1': precharge BL to Vdd/2, then assert WL; charge sharing between
  Cm and CBL raises BL to Vdd/2 + Vsignal, where

      Vsignal = Vdd * Cm / (2 * (Cm + CBL))

  The sense amp amplifies this small Vsignal to full rail, which also
  refreshes the cell back to Vdd - Vth.

7
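To get a feel for how small the read signal is, the charge-sharing formula can be evaluated with assumed illustrative capacitances (the 30fF cell and 300fF bitline values are not from the slides):

```python
# Vsignal = Vdd * Cm / (2 * (Cm + CBL)), the read charge-sharing signal.
# Capacitance values below are assumed for illustration only.

def sense_signal(vdd, c_cell, c_bitline):
    """Bitline swing seen by the sense amp after the wordline is asserted."""
    return vdd * c_cell / (2 * (c_cell + c_bitline))

v = sense_signal(1.2, 30e-15, 300e-15)   # 1.2V supply, 30fF cell, 300fF bitline
print(f"{v * 1000:.1f} mV")              # ~55 mV: small, hence the sense amp
```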
DRAM Basics
• Address multiplexing
  – Send the row address when RAS (row address strobe) is asserted
  – Send the column address when CAS (column address strobe) is asserted
• DRAM reads are self-destructive
  – The row must be rewritten (restored) after a read
• Memory array
  – All bits within an array work in unison
• Memory bank
  – Different banks can operate independently
• DRAM rank
  – Chips inside the same rank are accessed simultaneously


                                                             8
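Address multiplexing can be sketched as a bit-field split. This assumes the 512Mb x4 part from the earlier slide (14-bit row, 11-bit column, 2 bank bits); the actual ordering of bank/row/column bits within a physical address is a controller policy choice, not fixed by the DRAM:

```python
# Sketch: splitting a flat address into bank / row / column fields.
# Field widths match the earlier 512Mb x4 example; the bit layout
# (column low, row middle, bank high) is an assumed mapping.

def split_address(addr, col_bits=11, row_bits=14, bank_bits=2):
    col = addr & ((1 << col_bits) - 1)
    row = (addr >> col_bits) & ((1 << row_bits) - 1)
    bank = (addr >> (col_bits + row_bits)) & ((1 << bank_bits) - 1)
    return bank, row, col   # row goes out with RAS, column with CAS

bank, row, col = split_address(0b01_00000000000011_00000000101)
print(bank, row, col)   # 1 3 5
```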
Examples of DRAM DIMM Standards

[Figure: two DIMM organizations side by side. A x72 (ECC) DIMM built from
nine x8 chips carrying data bits D0-D63 plus check bits CB0-CB7, and a x64
(no ECC) DIMM built from eight x8 chips carrying D0-D63.]

9
DRAM Ranks

[Figure: a memory controller driving two ranks. Eight x8 chips providing
D0-D63 form Rank0, selected by chip select CS0; a second set of eight x8
chips forms Rank1, selected by CS1.]

10
DRAM Ranks

[Figure: rank configurations on a 64b data bus.
  • Single rank: eight x8 chips (8 x 8b = 64b)
  • Single rank: sixteen x4 chips (16 x 4b = 64b)
  • Dual rank: two sets of eight x8 chips, each set supplying the full 64b]

11
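The rank-width arithmetic above reduces to one division: a rank must fill the 64-bit data bus, so the chip width determines the chip count.

```python
# Chips per rank: a rank's chips together must span the 64-bit data bus.

def chips_per_rank(chip_width_bits, bus_width_bits=64):
    assert bus_width_bits % chip_width_bits == 0
    return bus_width_bits // chip_width_bits

print(chips_per_rank(8))   # 8  (eight x8 chips per rank)
print(chips_per_rank(4))   # 16 (sixteen x4 chips per rank)
```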
DRAM Organization




Source: Memory Systems Architecture Course, B. Jacobs, Maryland   12
Organization of DRAM Modules

[Figure: a memory controller connected over a channel, a shared
address/command bus plus a data bus, to multi-banked DRAM chips on the
modules.]

Source: Memory Systems Architecture Course, Bruce Jacobs, University of Maryland

13
 DRAM Configuration Example
Source: MICRON DDR3 DRAM




                              14
DRAM Access (Non-Nibble Mode)

[Timing diagram: RAS is asserted while the row address is on the address
bus; CAS is then asserted for each column address, and data appears on the
data bus after each CAS.]

[Figure: the memory controller asserts RAS with the row address and CAS
with the column address; the DRAM module opens the addressed row and
returns data.]

15
DRAM Refresh
• Leaky storage
• Periodic refresh across DRAM rows
• Rows are inaccessible while being refreshed
• Refresh = read a row, and write the same data back

• Example:
  – 4k rows in a DRAM
  – 100ns read cycle
  – Data decays in 64ms

  – 4096 * 100ns = 410µs to refresh all rows once
  – 410µs / 64ms = 0.64% unavailability
                                          16
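The refresh-overhead arithmetic on this slide, made explicit:

```python
# Refresh overhead: 4096 rows, 100ns per row refresh, 64ms retention.

rows = 4096          # 4k rows
t_row = 100e-9       # 100ns read (refresh) cycle per row
t_retain = 64e-3     # cells decay in 64ms

t_all = rows * t_row          # time to refresh every row once
overhead = t_all / t_retain   # fraction of time the DRAM is unavailable
print(f"{t_all * 1e6:.0f}us, {overhead:.2%}")   # 410us, 0.64%
```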
DRAM Refresh Styles
• Bursty
   – Refresh all 4096 rows back to back: 410µs (= 100ns * 4096) of
     unavailability at the start of each 64ms period
• Distributed
   – Spread refreshes evenly: one 100ns row refresh every 15.6µs
     (= 64ms / 4096) throughout each 64ms period

17
DRAM Refresh Policies
• RAS-Only Refresh
   – The memory controller drives a row address and asserts RAS (CAS stays
     deasserted); the DRAM module refreshes that row
   – The controller must track which row to refresh next

• CAS-Before-RAS (CBR) Refresh
   – The controller asserts CAS before RAS with WE# held high; no address
     is involved
   – The DRAM module's internal refresh counter supplies the row address
     and is incremented after each refresh

18
Types of DRAM
• Asynchronous DRAM
   – Normal: Responds to RAS and CAS signals (no clock)
   – Fast Page Mode (FPM): Row remains open after RAS for multiple CAS
     commands
   – Extended Data Out (EDO): Change output drivers to latches. Data can be held on
     bus for longer time
   – Burst Extended Data Out: Internal counter drives address latch. Able to provide
     data in burst mode.

• Synchronous DRAM
   – SDRAM: All of the above with clock. Adds predictability to DRAM operation
   – DDR, DDR2, DDR3: Transfer data on both edges of the clock
   – FB-DIMM: DIMMs connected using point-to-point links instead of a shared bus.
     Allows more DIMMs to be incorporated in server-based systems

• RDRAM
   – Low pin count



                                                                                   19
Disk Storage




               20
Disk Organization

[Figure: disk geometry. Platters (1 to 12) spin at 3600 to 15000 RPM; each
surface holds 5000 to 30000 tracks; each track holds 100 to 500 sectors of
512 bytes; the tracks at the same radius across all platters form a
cylinder.]

21
Disk Organization

[Figure: an arm positions the read/write head, which flies tens of
nanometers above the magnetic surface.]

22
Disk Access Time
• Seek time
   – Move the arm to the desired track
   – 5ms to 12ms
• Rotation latency (or delay)
   – On average, half a revolution; e.g., for a 10,000 RPM disk:
     0.5 / (10,000/60) s = 3ms
• Data transfer latency (or throughput)
   – Tens to hundreds of MB per second
   – E.g., Seagate Cheetah 15K.6 sustains 164MB/sec
• Disk controller overhead

• Use Disk cache (or cache buffer) to exploit locality
   – 4 to 32MB today
   – Come with the embedded controller in the HDD
                                                              23
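The components above can be summed for one 512B sector access. The 5ms seek and the 0.2ms controller overhead below are assumed illustrative values, not figures from the slide:

```python
# Sketch: average access time = seek + rotation + transfer + controller.
# seek_ms and ctrl_ms are assumed values for illustration.

def avg_access_ms(seek_ms, rpm, xfer_bytes, mb_per_s, ctrl_ms=0.2):
    rot_ms = 0.5 / (rpm / 60) * 1000                 # half a revolution on average
    xfer_ms = xfer_bytes / (mb_per_s * 1e6) * 1000   # sector transfer time
    return seek_ms + rot_ms + xfer_ms + ctrl_ms

# 5ms seek, 10,000 RPM, 512B sector, 164MB/s sustained transfer:
print(f"{avg_access_ms(5, 10_000, 512, 164):.1f} ms")   # 8.2 ms
```

Note how the transfer of a single sector (~3µs here) is negligible next to seek and rotation, which is why disk locality matters so much.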
Reliability, Availability, Dependability
• Program faults
• Static (permanent) faults
   – Design flaws
       • e.g., the Pentium FDIV bug: ~$500 million recall
   – Manufacturing defects
       • Stuck-at faults
       • Process variability
• Dynamic faults
   – Soft errors
   – Noise-induced errors
   – Wear-out


                                           25
Solution Space
• DRAM / SRAM
  – Use ECC (SECDED: single-error correction, double-error detection)


• Disks
  – Use redundancy
    •User’s backup
    •Disk arrays




                       26
RAID
• Reliability and performance considerations
• Redundant Array of Inexpensive Disks
• Combine multiple small, inexpensive disk drives
• Break arrays into “reliability groups”
• Data are divided and replicated across multiple disk
  drives
• RAID-0 to RAID-5

• Hardware RAID
   – Dedicated HW controller

• Software RAID
   – Implemented in the OS
                                                         27
Basic Principles
• Data mirroring

• Data striping

• Error correction code




                          28
RAID-1

                           A0                    A0
                           A1                    A1
                           A2                    A2
                           A3                    A3
                           A4                    A4
                        Disk 0               Disk 1
                      (Data Disk)         (Check Disk)
•   Mirrored disks
•   Most expensive (100% overhead)
•   Every write to disk also writes to the check disk
•   Can improve read/seek performance with a sufficient number of controllers

                                                                          29
RAID-10

     A0       A0       A1       A1       A2       A2
     A3       A3       B0       B0       B1       B1
     B2       B2       B3       B3       B4       B4
     B5       B5       C0       C0
    Data     Data     Data     Data     Data     Data
    Disk 0   Disk 1   Disk 2   Disk 3   Disk 4   Disk 5




• Combines data striping on top of RAID-1 mirrored pairs



                                                          30
RAID-2

    A0         A1          A2        A3         aECC0      aECC1      aECC2
    B0         B1          B2        B3         bECC0      bECC1      bECC2
    C0         C1          C2        C3         cECC0      cECC1      cECC2
    D0         D1          D2        D3         dECC0      dECC1      dECC2
  Data       Data       Data        Data        Check      Check      Check
  Disk 0     Disk 1     Disk 2      Disk 3      Disk 0     Disk 1     Disk 2

• Bit-interleaving striping
• Use Hamming Code to generate and store ECC on check disks (e.g.,
  Hamming(7,4))
    – Space: 4 data disks need 3 check disks (75% overhead), 10 data disks
      need 4 check disks (40%), 25 data disks need 5 check disks (20%)
    – The CPU needs more compute power to generate Hamming codes than parity
• Complex controller
• Not really used today!
                                                                                 31
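The space-overhead numbers on this slide follow from the Hamming bound: k check disks cover d data disks when 2^k >= d + k + 1.

```python
# RAID-2 check-disk count from the Hamming bound: 2**k >= d + k + 1.

def check_disks(data_disks):
    k = 1
    while 2 ** k < data_disks + k + 1:
        k += 1
    return k

for d in (4, 10, 25):
    k = check_disks(d)
    print(f"{d} data disks -> {k} check disks ({k / d:.0%} overhead)")
# 4 -> 3 (75%), 10 -> 4 (40%), 25 -> 5 (20%)
```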
RAID-3

   One
 Transfer    A0       A1       A2       A3      ECCa
   Unit      B0       B1       B2       B3      ECCb
             C0       C1       C2       C3      ECCc
             D0       D1       D2       D3      ECCd
            Data     Data     Data     Data     Check
            Disk 0   Disk 1   Disk 2   Disk 3   Disk 0


• Byte-level striping
• Use XOR parity to generate and store parity
  code on the check disk
• At least 3 disks: 2 data disks + 1 check disk
                                                         32
RAID-4
                      A0              B0                 C0     D0       ECC0
                      A1              B1                 C1     D1       ECC1
                      A2              B2                 C2     D2       ECC2
                       A3             B3                 C3     D3       ECC3



                   Data            Data            Data       Data       Check
                   Disk 0          Disk 1          Disk 2     Disk 3     Disk 0
•   Block-level striping
•   Keep each individually accessed unit on one disk
     –   Do not access all disks for (small) transfers
     –   Improved parallelism
•   Use XOR parity to generate and store parity code on the check disk
•   Check info is calculated over a piece of each transfer unit
•   Small read → one read on one disk
•   Small write → two reads and two writes (data and check disks)
     –   New parity = (old data ⊕ new data) ⊕ old parity
     –   No need to read B0, C0, and D0 when read-modify-writing A0
•   Write is the bottleneck, as all writes access the check disk
                                                                                  33
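The small-write rule above, new_parity = (old_data XOR new_data) XOR old_parity, can be sketched on a toy 4-disk stripe with one-byte blocks:

```python
# RAID-4 small-write parity update on a toy stripe of one-byte blocks.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Four data blocks; parity is the XOR of all of them.
stripe = [b"\x0a", b"\x0b", b"\x0c", b"\x0d"]
parity = b"\x00"
for blk in stripe:
    parity = xor_bytes(parity, blk)

# Small write to disk 0: read old data and old parity (two reads),
# then write new data and new parity (two writes). Disks 1-3 untouched.
new_data = b"\xff"
parity = xor_bytes(xor_bytes(stripe[0], new_data), parity)
stripe[0] = new_data

# Invariant: parity still equals the XOR of all data blocks.
check = b"\x00"
for blk in stripe:
    check = xor_bytes(check, blk)
assert check == parity
```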
RAID-5
             A0       B0      C0       D0       ECC1
             A1       B1      C1      ECC0       E0
             A2       B2     ECC4      D1        E1
             A3     ECC3      C2       D2        E2
            ECC2      B3      C3       D3        E3

           Data     Data     Data     Data      Data
           Disk 0   Disk 1   Disk 2   Disk 3    Disk 4

• Block-level striping
• Distributed parity to enable write parallelism. Remove
  bottleneck of accessing parity
• Example: write “sector A” and write “sector B” can be
  performed simultaneously
                                                           34
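The parity placement in the layout above rotates one disk to the left per stripe: stripe 0's parity sits on disk 4, stripe 1's on disk 3, and so on. A minimal sketch of that mapping (the formula is inferred from the figure; real arrays may use other rotation schemes):

```python
# Left-rotating RAID-5 parity placement matching the layout shown.

def parity_disk(stripe, n_disks=5):
    return (n_disks - 1 - stripe) % n_disks

print([parity_disk(s) for s in range(5)])   # [4, 3, 2, 1, 0]
```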
RAID-6
                A0           B0       C0      ECC0p    ECC0q
              ECC1q          B1       C1       D0      ECC1p
              ECC2p     ECC2q         C2       D1       E0
                A1      ECC3p        ECC3q     D2       E1
                A2           B2      ECC4p    ECC4q     E2

              Data          Data     Data     Data     Data
              Disk 0        Disk 1   Disk 2   Disk 3   Disk 4

•   Similar to RAID-5 with “dual distributed parity”
•   ECC_p = XOR(A0, B0, C0); ECC_q = Code(A0, B0, C0, ECC_p)
•   Sustain 2 drive failures with no data loss
•   Minimum requirement: 4 disks
    – 2 for data striping
    – 2 for dual parity
                                                                35

				