CSC 345 Computer Architecture

					     CSC: 345
Computer Architecture

      Jane Huang
       Lecture 5
   Memory Organization
    Error Correction
                 Review of cache
Stallings Question 4.2
For the hex main memory addresses 111111, 666666, and BBBBBB, show the
following information in hex form:
Direct mapped cache: 16-Mbyte main memory, with FFFC words of 32 bits
each. 16-Kword cache with 3FFF words of 32 bits each.
   Show the Tag, Line, and Word values for these addresses:
      Word
      Line
      Tag
Associative cache:
   Address length
   Number of addressable units
   Block size
   Number of blocks in main memory
   Number of lines in cache
   Size of tag
Two-way set associative cache:
   Address length
   Number of addressable units
   Block size
   Number of blocks in main memory
   Number of lines in set
   Number of sets
   Number of lines in cache
   Size of tag
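A rough sketch of how the address fields could be extracted for this exercise. The field widths are assumptions inferred from the sizes quoted above rather than stated on the slide: a 24-bit byte address (16 Mbytes), 4-byte blocks (2-bit word field), a 14-bit line field for the 16K-line direct mapped cache, and a 13-bit set field for the two-way set associative cache. The function names are illustrative only.

```python
# Field widths assumed from the sizes on the slide (not stated explicitly):
# 16 Mbyte memory -> 24-bit address; 4-byte blocks -> 2-bit word field;
# 16K cache lines -> 14-bit line field; two-way set associative -> 13-bit set field.
WORD_BITS = 2
LINE_BITS = 14
SET_BITS = 13


def direct_mapped(addr):
    """Return (tag, line, word) for a direct mapped cache."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word


def associative(addr):
    """Return (tag, word) for a fully associative cache."""
    return addr >> WORD_BITS, addr & ((1 << WORD_BITS) - 1)


def set_associative(addr):
    """Return (tag, set, word) for a two-way set associative cache."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_index = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_index, word


for addr in (0x111111, 0x666666, 0xBBBBBB):
    tag, line, word = direct_mapped(addr)
    print(f"{addr:06X}: tag={tag:02X} line={line:04X} word={word:X}")
```

Under these assumptions the same address splits differently for each organization: only the boundary between the tag and the line or set field moves.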
     Semiconductor Main Memory
Basic element – memory cell
   Exhibit 2 stable states used to represent 0 and 1
   Can be written into (at least once)
   Can be read to sense state
Random Access Memory
   Read and write easily by use of electrical signals
   Volatile – must be provided with a constant electrical supply or else
   data will be lost. (only good for temporary storage).
   DRAM (Dynamic) and SRAM (Static)
                 Dynamic RAM (DRAM)
DRAM made from cells that store data as charge on capacitors. (Charge =
1, no charge = 0)
Capacitors have a tendency to discharge.
DRAMs need periodic refreshing to maintain data storage.

                    Static RAM (SRAM)
SRAM is a digital device.
Binary values stored using traditional flip-flop logic gates.
SRAM holds value as long as power is supplied.


                     SRAM vs. DRAM
Both volatile
DRAM is simpler, smaller, denser, less expensive – but needs refresh
circuitry. (Only worthwhile for larger memories – main memory).
SRAM is faster, more expensive – therefore usually used for smaller cache
memories.
                               ROM
Read-only memory
Contains a permanent pattern of bits, therefore no power source needed to
maintain bit values.
Created like any other integrated chip.
Useful for microprogramming, system programs, function tables etc.
Problems:
   Large fixed cost is incurred whether 1 chip or 1000s of chips are made.
   No room for error.
Programmable ROM
   If only a small number of ROMs of one memory content are needed, a
   good alternative is programmable ROM (PROM)
   PROM can only be written once, but the writing process is performed
   electronically and need not be done at the time of original chip fabrication.
   Provides flexibility and convenience.
Read mostly memory
   EPROM (Erasable programmable read-only memory – erases everything)
   EEPROM (Electrically erasable programmable read-only memory – byte
   level)
   Flash Memory (Uses electrical technology to flash erase one section)
Chip “Art Gallery”
Chip designers often ‘secretly’ add artwork to the chips they design.

Where is Waldo?
“We caught this silicon version of Waldo (that is about 30 microns in size)
hiding among caches, buses, and registers while searching through many
thousands of square microns of complex circuitry with a high-power optical
microscope. Waldo is the first Silicon Creature that we discovered, and this led
to an exhaustive search for more creatures and construction of the Silicon Zoo
gallery.”
 http://www.wired.com/news/print/0,1294,17028,00.html
Chip “Art Gallery”




Daffy Duck
“As we see it, the engineers that designed this wireframe version of Daffy Duck
must have had a very interesting sense of humor. We found it deeply embedded
within the circuitry of a RISC microprocessor, about 1500 microns away from a
similar-style rendition of Waldo. Daffy is about 50 microns in size, making it
necessary to use a high-power (40X to 60X) microscope objective to photograph
the wireframe character.”
 http://www.wired.com/news/print/0,1294,17028,00.html
64-bit ROM
[Figure: a four-input, sixteen-output decoder with inputs X1-X4 selects one of sixteen rows (0000 through 1111); the ROM outputs are Z1-Z4.]

        Input               Output
X1  X2  X3  X4        Z1  Z2  Z3  Z4
0   0   0   0         0   0   0   0
0   0   0   1         0   0   0   1
0   0   1   0         0   0   1   1
0   0   1   1         0   0   1   0
(More rows……)
An example of ROM
Use of a ROM to implement a conversion from Binary to Gray Code
(a 24-bit ROM consisting of 8 words of 3 bits each).

B2 B1 B0    G2 G1 G0
0  0  0     0  0  0
0  0  1     0  0  1
0  1  0     0  1  1
0  1  1     0  1  0
1  0  0     1  1  0
1  0  1     1  1  1
1  1  0     1  0  1
1  1  1     1  0  0

• ROM only performs the read operation.
• A given input always produces the same output.
• Therefore a ROM is just a combinational circuit.
• Also can be viewed as a memory of 2^n words * b bits, where n = the
  number of inputs, and b = the number of outputs.

[Figure: a three-input, eight-output decoder with inputs B2, B1, B0 selects one of eight rows (000 through 111); the selected row drives the outputs G2, G1, G0.]
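The Binary to Gray Code ROM can be modelled as a plain lookup table, which makes the "combinational circuit" point concrete. This is a minimal sketch: the names `ROM` and `read_rom` are illustrative, and the contents are generated with the standard conversion G = B XOR (B >> 1) rather than typed in by hand.

```python
# Build the ROM contents: the word at address B holds the Gray code of B.
# G = B XOR (B >> 1) is the standard binary-to-Gray conversion.
ROM = [b ^ (b >> 1) for b in range(8)]   # 8 words of 3 bits each


def read_rom(address):
    """A ROM read is a pure lookup: same address in, same word out."""
    return ROM[address]


for b in range(8):
    print(f"{b:03b} -> {read_rom(b):03b}")
```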
Chip Logic

 Trade-offs in terms of speed, capacity, and cost.
 Physical arrangement of cells matches logical arrangement.
 Memory array organized into W words of B bits each.
 Example: 16-Mbit chip = 1M 16-bit words.
 One-bit-per-chip organization: data is read/written one bit at a time.
16-Mbit DRAM
Typical 16-Megabit DRAM (4M X 4)
   22-bit address (11 row bits + 11 column bits) multiplexed into the chip.
   Select an entire row using the 11 most significant bits.
   Select a column using the 11 least significant bits.
   Refresh circuitry (DRAM).
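A 4M X 4 chip has 2^22 addressable locations, so the row/column selection can be sketched as a split of a 22-bit address into two 11-bit halves. The function name and the range check are illustrative assumptions; the sketch ignores the timing signals that sequence the two halves over the shared address pins.

```python
ROW_BITS = 11
COL_BITS = 11


def split_dram_address(addr):
    """Split a 22-bit address into (row, column) for a 2048 x 2048 array."""
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range for a 4M-location chip")
    row = addr >> COL_BITS                  # 11 most significant bits
    col = addr & ((1 << COL_BITS) - 1)      # 11 least significant bits
    return row, col


print(split_dram_address(0x000801))
```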
256-Kbyte Memory Organization
In this example a RAM chip contains 1 bit per word.
For 256K 8-bit words we need 8 chips.
Row address simultaneously sent to all 8 chips.
Followed by column address simultaneously sent to all 8 chips.
[Figure: the MAR supplies a 9-bit row address and a 9-bit column address to eight chips. Each chip is a 512-word by 512-bit array with a 1-of-512 row decoder and a 1-of-512 bit-sense column decoder; chips #1 through #8 each drive one bit (1 through 8) of the MBR.]
Group Exercise

  Design a 512K X 4-bit memory using 256 X 256 chips.
  Show how the address would be used to access data.
                      Error Correction
A semiconductor memory system is subject to errors.
Hard failures – permanent physical defects
    Environmental abuse, manufacturing defects, wear.
Soft errors
    Power supply problems, alpha particles.
Need logic for detecting and correcting errors.
Basic technique
    Prior to storing data a code is generated from the bits in the word.
    Code stored alongside the word in memory.
    Code used to identify and correct errors.
    When the word is fetched a new code is generated and compared to the
    stored code.
       No error (normal case)
       Correctable error is detected and corrected.
       Non-fixable error is detected and reported.
                           Hamming Code
[Figure: three intersecting circles A, B, and C. The inner compartments hold the data bits 1, 1, 1, 0; the outer compartments hold the parity bits.]
Assign data bits to the inner compartments.
Fill the remaining compartments with parity bits.
   The total number of 1s in each circle must be even.
   For example: the data bits in A = 1+1+1 = 3. This is odd, therefore
   add an additional 1 as the parity bit for A.

                           Hamming Code
[Figure: the same circles after the data bit in the compartment shared by A and C only has changed from 1 to 0.]
If a bit gets erroneously changed, the 1s in each circle containing that bit
no longer add up to an even number.
Errors are found in A and C, so the bit shared by A and C (and not B) is in
error and can be fixed.
           Single Bit Errors in 8-bit words
8 data bits
The code needs to represent the bit position of the error. For example, if bit # 2
were in error (10011001 → 10011011) we would like the syndrome word to
output a value of 2 (0010). If no errors occurred the code should output 0
(0000).
Therefore code length (K) must be greater than or equal to log2(W) + 1,
where W = word length, i.e. for 8 bits, it must be big enough to represent
numbers 0 – 8, therefore 4 bits are needed.

  Data bits       Single Error Correction
                  Check Bits     % Increase
      8               4               50
      16              5             31.25
      32              6             18.75
      64              7             10.94
     128              8              6.25
     256              9              3.52

Interpreting the syndrome:
•   No errors – code = 0.
•   Exactly one bit set to ‘1’ – error occurred in one of the check bits. No action.
•   More than one bit set to ‘1’ – the numerical value of the syndrome
    indicates the position of the data bit in error.
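The check-bit counts in the table can be reproduced with a short calculation. The sketch below uses the general single-error-correction condition 2^K - 1 >= M + K (M data bits, K check bits), which for power-of-two word lengths gives the same answer as log2(W) + 1. The function name is illustrative.

```python
def check_bits_needed(data_bits):
    """Smallest K with 2**K - 1 >= data_bits + K (single-error correction)."""
    k = 1
    while (1 << k) - 1 < data_bits + k:
        k += 1
    return k


for m in (8, 16, 32, 64, 128, 256):
    k = check_bits_needed(m)
    print(f"{m:4d} data bits: {k} check bits, {100 * k / m:.2f}% increase")
```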
           Single Bit Errors in 8-bit words
Data and check bits arranged into a 12-bit word.
Bit positions numbered from 1 to 12.
Bit positions whose position numbers are powers of 2 are designated
as check bits.
Check bits calculated as follows:

 C1 = D1 ⊕ D2 ⊕ D4 ⊕ D5 ⊕ D7
 C2 = D1 ⊕ D3 ⊕ D4 ⊕ D6 ⊕ D7
 C4 = D2 ⊕ D3 ⊕ D4 ⊕ D8
 C8 = D5 ⊕ D6 ⊕ D7 ⊕ D8

Data and check bits arranged into a 12-bit stored word:

        8 data bits + 4 check bits
                     Calculating check bits
Bit Position  Binary   Type
1             0001     C1            All D’s with a 1 in bit 1
2             0010     C2            All D’s with a 1 in bit 2
3             0011          D1
4             0100     C4            All D’s with a 1 in bit 3
5             0101          D2
6             0110          D3
7             0111          D4
8             1000     C8            All D’s with a 1 in bit 4
9             1001          D5
10            1010          D6
11            1011          D7
12            1100          D8

C1 = D1 ⊕ D2 ⊕ D4 ⊕ D5 ⊕ D7
Each check bit operates on every data bit whose position number has a 1 in
the same bit position as that check bit.
                             Example
  Input word: 00111001
  Data bit D1 is in the rightmost position.
  Calculate check bits:
  C1 = 1 ⊕ 0 ⊕ 1 ⊕ 1 ⊕ 0 = 1
  C2 = 1 ⊕ 0 ⊕ 1 ⊕ 1 ⊕ 0 = 1
  C4 = 0 ⊕ 0 ⊕ 1 ⊕ 0 = 1
  C8 = 1 ⊕ 1 ⊕ 0 ⊕ 0 = 0
  Stored word = 001101001111
• If data bit 3 sustains an error (001101101111), the recalculated check bits are:
  C1 = 1 ⊕ 0 ⊕ 1 ⊕ 1 ⊕ 0 = 1
  C2 = 1 ⊕ 1 ⊕ 1 ⊕ 1 ⊕ 0 = 0
  C4 = 0 ⊕ 1 ⊕ 1 ⊕ 0 = 0
  C8 = 1 ⊕ 1 ⊕ 0 ⊕ 0 = 0
• Calculate syndrome word (stored check bits XOR recalculated check bits):
                   C8   C4   C2   C1
  Stored            0    1    1    1
  Recalculated      0    0    0    1
  Syndrome          0    1    1    0
  0110 = bit position 6.
• D3 resides in bit position 6.
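The worked example can be checked with a small model of the 12-bit word. This is a sketch rather than a production encoder: the word is held as a dictionary from bit position (1 to 12) to bit value, and the names `encode`, `check_bits`, `syndrome`, and `to_string` are illustrative.

```python
# Position of each data bit D1..D8 in the 12-bit word (positions 1..12);
# the power-of-two positions 1, 2, 4, 8 hold the check bits.
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]
CHECK_POSITIONS = [1, 2, 4, 8]


def check_bits(word):
    """Compute each check bit from the data bits in a {position: bit} dict."""
    result = {}
    for c in CHECK_POSITIONS:
        parity = 0
        for pos in DATA_POSITIONS:
            if pos & c:                 # data position has a 1 in this bit
                parity ^= word[pos]
        result[c] = parity
    return result


def encode(data):
    """Encode an 8-bit value (D1 = least significant bit) as {position: bit}."""
    word = {pos: (data >> i) & 1 for i, pos in enumerate(DATA_POSITIONS)}
    word.update(check_bits(word))
    return word


def syndrome(word):
    """XOR stored and recomputed check bits; the result is the error position."""
    fresh = check_bits(word)
    return sum(c for c in CHECK_POSITIONS if word[c] != fresh[c])


def to_string(word):
    """Render the word with position 12 on the left and position 1 on the right."""
    return "".join(str(word[pos]) for pos in range(12, 0, -1))


stored = encode(0b00111001)
print(to_string(stored))                      # word as stored in memory
stored[6] ^= 1                                # flip D3, which lives in position 6
print(to_string(stored), syndrome(stored))    # corrupted word and its syndrome
```

A non-zero syndrome that names a data position can be repaired by flipping that bit back; a syndrome of zero means the stored and recomputed check bits agree.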
                                Double Error Detecting
           Previous example is Single-Error-Correcting code.
           Semiconductor memory is usually equipped with SEC-DED (single-error-
           correcting, double-error-detecting) code. SEC-DED requires an extra bit.

[Figure: six panels (a-f) of the three-circle diagram, with one extra parity bit outside the circles.]
  a. Fill in data bits.
  b. Calculate check bits.
  c. Two errors are introduced.
  d. SEC identifies the …
  e. The extra bit checks for even parity.
  f. The double error is detected!
                      Performance
Access Time (latency)
  Random Access = time taken to perform a read or write.
  Non-random access memory = time to position read-write mechanism
  at desired location.
Memory Cycle Time
  Access time + additional time required before a second access can
  commence.
  Affected by behavior of the system bus not the processor.
Transfer Rate
  Rate at which data can be transferred into or out of a memory unit.
  For random access memory = 1/(cycle time).
  Non random-access memory
  TN = TA + ( N / R)
  TN = Average time to read or write N bits
  TA = Average access time
  N = Number of bits
  R = Transfer rate, in bits per second (bps)
                   Magnetic Disks
Tracks: Hard disk platters arrange data into concentric circles, rather than
one large spiral, as some other media use. Each circle is called a track.
Sectors: The smallest addressable unit on a track. Sectors are normally 512
bytes in size, and there can be hundreds of sectors per track, depending on
location. (Constant bit density – more sectors on outer tracks.)
Heads: The devices used to write and read data on each platter.
Cylinders: Platters on a hard disk are stacked up, and so are the heads.
Concentric circles from each parallel platter form a cylinder. (Think
Stargate!)
http://www.pcguide.com/ref/hdd/geom/tracksDifference-c.html
              Reading and Writing
SEEK: Disk controller sends a command to move the arm over the proper
track. The time this takes is the seek time.
Seek time
     Minimum / Maximum
     Average? Sum of all possible seeks divided by the number of possible
     seeks. What is wrong with this???
Rotation latency (delay)
Time for requested sector to rotate under the head.
Average = halfway around disk. (0.5)
If a disk rotates at 10,000 RPM
Avg Rotation time = 0.5 / 10,000 RPM
                     = 0.5 / (10,000/60) RPS
                     = 0.0030 sec = 3.0 ms.
Transfer time
Time it takes to transfer a block of bits. (typically a sector)
Function of block size, disk size, rotation speed, recording density, etc.
                           Example
What is the average time to read or write a 512-byte sector for a disk? The
advertised average seek time is 5ms, the transfer rate is 40MB/sec, it rotates
at 10,000 RPM, and the controller overhead is 0.1ms. Assume the disk is
idle so that there is no queueing delay. In addition, calculate the time
assuming the advertised seek time is three times longer than the measured
seek time.
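Answer:
Average disk access time = average seek time + average rotational delay +
transfer time + controller overhead.

= 5 ms + 0.5 / (10,000 RPM) + 0.5 KB / (40 MB/sec) + 0.1 ms
= 5 ms + 3.0 ms + 0.013 ms + 0.1 ms = 8.113 ms

If the measured seek time is one third of the advertised value (5 ms / 3 = 1.67 ms):
= 1.67 ms + 3.0 ms + 0.013 ms + 0.1 ms = 4.783 ms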
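The same arithmetic can be written as a small function, which makes it easy to rerun with other disk parameters. This is a sketch under stated assumptions: a sector is taken as 512 bytes, 40 MB/sec as 40 x 10^6 bytes per second, and the function and parameter names are illustrative. Small differences from hand-rounded figures are expected.

```python
def disk_access_ms(seek_ms, rpm, sector_bytes, transfer_bytes_per_s, overhead_ms):
    """Average time (ms) to read or write one sector on an idle disk."""
    rotation_ms = 0.5 / (rpm / 60) * 1000              # half a revolution
    transfer_ms = sector_bytes / transfer_bytes_per_s * 1000
    return seek_ms + rotation_ms + transfer_ms + overhead_ms


advertised = disk_access_ms(5.0, 10_000, 512, 40e6, 0.1)
measured = disk_access_ms(5.0 / 3, 10_000, 512, 40e6, 0.1)
print(f"advertised seek: {advertised:.3f} ms")
print(f"measured seek:   {measured:.3f} ms")
```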
                                RAID
Redundant Array of Independent Disks
Disk storage designers recognized that access times etc. can
only be improved to a certain extent, so additional performance
can be gained by introducing multiple disks.
Introduced possibility of more errors.
RAID: Improve access time + improve reliability.
   Set of physical disk drives viewed by the operating system as a
   single logical drive.
   Data are distributed across the drives of an array.
   Redundant disk capacity is used to store parity information,
   guaranteeing data recoverability in case of a disk failure.
Picture from: http://mst2.lcc.whecn.edu/byeager/whitepapers/raid.pdf
                           RAID Level 0
  Not a true member of the RAID family - does not include redundancy to
  improve performance.
  User and system data distributed across all disks in the array in strips.
  Imagine a large logical disk containing ALL data. This is divided into strips
  that are mapped ‘round robin’ to consecutive disks in the array.
+ If two different I/O requests are pending for two different
  blocks of data – then there is a good chance that the data will be
  on different disks and can be serviced in parallel.
+ If a single I/O request is for multiple logically contiguous
  strips – up to n strips can be handled in parallel.
Data Mapping for RAID Level 0
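The round-robin mapping can be sketched in one line of arithmetic. The function name and the zero-based numbering of strips and disks are assumptions made for illustration; real arrays also choose a strip size, which this sketch leaves out.

```python
def raid0_location(logical_strip, num_disks):
    """Map a logical strip number to (disk index, strip index on that disk)."""
    return logical_strip % num_disks, logical_strip // num_disks


# With 4 disks, logical strips 0..7 land on disks 0, 1, 2, 3, 0, 1, 2, 3.
for strip in range(8):
    disk, row = raid0_location(strip, 4)
    print(f"strip {strip} -> disk {disk}, strip {row}")
```

Consecutive logical strips fall on different disks, which is why a request for up to n contiguous strips can be serviced in parallel.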
                                  RAID Level 1
  Redundancy achieved through duplicating all data.
  Each logical strip is mapped to two physical disks.
+ Read request can be serviced from either available disk.
- Write request requires both disks to be updated – but this can be done in
  parallel. (Slower write dictates overall speed).
+ Recovery from failure is simple!




     Picture from: http://mst2.lcc.whecn.edu/byeager/whitepapers/raid.pdf
                        RAID Level 2
Utilizes parallel access techniques - All disks participate in the execution of
every I/O request.
Spindles of individual drives are synchronized so that each disk head is in the
same position on each disk at any given time.
Data striping – very small strips (single byte or word).
Error correcting code calculated across corresponding bits on each disk, and
the code bits are stored in corresponding bit positions on multiple parity disks.
For Hamming Code – number of parity disks is proportional to the log of the
number of data disks. Array control can detect and fix single bit errors.
For write – all disks must be accessed.
Good choice – only for an environment in which many errors occur –
therefore not used much.
                        RAID Level 3
Similar to RAID 2 – parallel access with data distributed in small strips.
Only requires a single redundant disk because it uses a single parity bit for the
set of individual bits in the same position.
Suppose drives X0-X3 contain data, and X4 contains parity bits:
X4(i) = X3(i) ⊕ X2(i) ⊕ X1(i) ⊕ X0(i)
Redundancy – in the case of disk failure, the data can be reconstructed.
If drive X1 fails – it can be reconstructed as:
X1(i) = X4(i) ⊕ X3(i) ⊕ X2(i) ⊕ X0(i)
Performance – can achieve high transfer rates, but only one I/O request can be
executed at one time. (Better for large data transfers in non transaction-
oriented environments).
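The parity and reconstruction formulas for drives X0-X4 can be tried out on byte strings. This is a minimal sketch: the strip contents are made-up values, and `parity` is an illustrative helper that XORs equal-length byte strings column by column.

```python
from functools import reduce


def parity(strips):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


data = [b"\x0f\x10", b"\xa5\x5a", b"\x33\xcc", b"\xff\x00"]   # X0..X3 (made-up values)
x4 = parity(data)                                             # parity drive X4

# Drive X1 fails: rebuild it from the parity drive and the surviving drives.
survivors = [data[0], data[2], data[3], x4]
rebuilt_x1 = parity(survivors)
print(rebuilt_x1 == data[1])
```

The same helper computes both the parity and the reconstruction, because XOR is its own inverse.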
                        RAID Level 4
Each disk operates independently - Separate I/O requests satisfied in parallel.
Suitable for applications with high I/O request rates and NOT well suited for
those requiring high data transfer rates.
Data striping. (Strips are larger than in lower RAIDs).
Bit-by-bit parity calculated across corresponding strips on each data disk, and
stored in corresponding strip on the parity disk.
Performance – write penalty when I/O request is small size. Write must
update user data + corresponding parity bits.
X4(i) = X3(i) ⊕ X2(i) ⊕ X1(i) ⊕ X0(i)
If X1(i) is changed to X1’(i), the new parity is
X4’(i) = X3(i) ⊕ X2(i) ⊕ X1’(i) ⊕ X0(i)
       = X4(i) ⊕ X1(i) ⊕ X1’(i)
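The small-write shortcut (new parity = old parity XOR old data XOR new data) can be checked against a full recomputation. The strip contents below are made-up values and `updated_parity` is an illustrative name; the point is only that the two results agree without reading X0, X2, or X3.

```python
def updated_parity(old_parity, old_data, new_data):
    """New parity = old parity XOR old data XOR new data (per byte)."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


# Made-up strip contents for drives X0..X3.
x0, x1, x2, x3 = b"\x0f", b"\xa5", b"\x33", b"\xff"
x4 = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(x0, x1, x2, x3))

new_x1 = b"\x5a"
new_x4 = updated_parity(x4, x1, new_x1)

# Same result as recomputing the parity from all four data strips.
full = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(x0, new_x1, x2, x3))
print(new_x4 == full)
```

Each small write therefore costs two reads (old data, old parity) and two writes (new data, new parity), which is the write penalty noted above.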
                          RAID Level 5
Same as RAID 4 – but parity strips distributed across all disks.
Typical allocation uses round-robin.
For an n-disk array, the parity strip is on a different disk for the first n strips.
Avoid potential bottleneck found in RAID 4.
                        RAID Level 6
Two different parity calculations carried out and stored in separate blocks on
different disks.
    Example: XOR and a second independent data check algorithm.
No. of disks required = N + 2 (where N = number of disks required for data).
Provides HIGH data reliability.
Incurs substantial write penalty as each write affects two parity blocks.
                              Homework
Stallings 5.3
Design a 16-bit memory of total capacity 8192 bits using SRAM chips of size 64X1 bit.
Give the array configuration of the chips on the memory board showing all required input
and output signals for assigning this memory to the lowest address space. The design
should allow for both byte and 16-bit word accesses.
Stallings 5.5
Suppose an 8-bit data word stored in memory is 11000010. Using the Hamming algorithm,
determine what check bits would be stored in memory with the data word. Show how you
got your answer.
Stallings 5.6
For the 8-bit word 00111001, the check bits stored with it would be 0111. Suppose when
the word is read from memory, the check bits are calculated to be 1101. What is the data
word that was read from memory?
Stallings 6.3 (Question on RAID)
What is the average time to read or write a 512-byte sector for a disk? The advertised
average seek time is 4ms, the transfer rate is 35MB/sec, it rotates at 8,000 RPM, and the
controller overhead is 0.15ms. Assume the disk is idle so that there is no queueing delay.
                     Challenge Question
CHALLENGE QUESTION – See handout.

				