							Computer Architecture




                          Davies Muche
                                &
                           Mike Li Luo
                          CS521 Spring 2003
                                                      1
  Computer Architecture                       2/25/2003
What is a digital computer?
     A digital computer is a machine composed
     of the following three basic components:
     - Input/Output
     - Central Processing Unit (CPU)
     - Memory




Early Computers
As early as the 1600s, calculating machines that
could do arithmetic operations had been made, but
none had the three basic components of a digital
computer.

In 1823, Charles Babbage undertook the design of
the Difference Engine
 – The machine was to solve 6th-degree
   polynomials to 20-digit accuracy

– It brought the concepts of mechanical control and mechanical
  calculation together in a machine that has the
  basic parts of a digital computer
– He was given 17,000 pounds to construct the machine,
  but the project was abandoned in 1842 (uncompleted)
– In 1834, Babbage conceived the idea of the Analytical
  Machine (after his death, his son Henry tried to build it
  but never succeeded)
– In 1854, George Scheutz built a working Difference
  machine based on Babbage’s design. (This machine printed
  mathematical, astronomical and actuarial tables with unprecedented
  accuracy, and was used by the British and American governments.)



[Figure: Difference Engine No. 1]

Between 1847 and 1849, Babbage designed the
Difference Engine No. 2.

He did not build it.

However, in 1834 Charles Babbage developed
a hypothetical program to solve simultaneous
equations on the Analytical Machine




The John von Neumann Architecture
 consists of five major components (1940s)




A refinement of the von Neumann model, the
 system bus model has a CPU (ALU and
 control), memory, and an input/output unit




The CPU
     CPU (central processing unit) is an older term for processor and
     microprocessor: the central unit in a computer, containing the logic
     circuitry that carries out the instructions of a computer's programs.

     NOTABLE TYPES
     - RISC: Reduced Instruction Set Computer
          - Introduced in the mid-1980s
          - Requires relatively few transistors
          - Capable of executing only a very limited set of
            instructions

     - CISC: Complex Instruction Set Computer
          - Complex CPUs with ever-larger sets of instructions

RISC or CISC “The great Controversy”
    RISC proponents argue that RISC machines are both cheaper
    and faster, and are therefore the machines of the future.

    Skeptics note that by making the hardware simpler, RISC
    architectures put a greater burden on the software. They argue
    that this is not worth the trouble because conventional
    microprocessors are becoming increasingly fast and cheap
    anyway.

    The TRUTH!
    CISC and RISC implementations are becoming more and more
    alike. Many of today's RISC chips support as many instructions
    as yesterday's CISC chips. And today's CISC chips use many
    techniques formerly associated with RISC chips.




                  Under the hood of a typical CPU




 What you need to know about a CPU
Processing speed
- The clock frequency is one measure of how fast a
computer is. However, the time needed to carry out an
operation depends not only on how fast the processor
cycles, but also on how many cycles are required to perform
that operation.
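
The point about clock frequency versus cycle count can be put into numbers. This is a minimal sketch; the two processors and their cycle counts are hypothetical values chosen for illustration, not figures from the slides.

```python
def execution_time_ns(cycles_needed: int, clock_mhz: float) -> float:
    """Time for one operation in nanoseconds: cycles needed x cycle time."""
    cycle_time_ns = 1000.0 / clock_mhz  # one cycle at f MHz lasts 1000/f ns
    return cycles_needed * cycle_time_ns


# Hypothetical processors: A has the faster clock, B needs fewer cycles.
time_a = execution_time_ns(cycles_needed=4, clock_mhz=100.0)
time_b = execution_time_ns(cycles_needed=1, clock_mhz=66.0)

print(f"A (100 MHz, 4 cycles): {time_a:.1f} ns")
print(f"B (66 MHz, 1 cycle):   {time_b:.1f} ns")
```

In this made-up case the processor with the slower clock finishes the operation first, because it needs fewer cycles.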

Voltage requirement
Transistors (electronic switches) in the CPU require some
voltage to trigger them.
- In the pre-486DX66 days, everything was 5 volts
- As chips got faster and power became a concern,
  designers dropped the chip voltage to 3.3 volts
  (external voltage) and 2.9V or 2.5V (core voltage)

More on Voltage Requirements…
Power consumption equates largely with heat generation,
which is a primary enemy in achieving increased performance.
Newer processors are larger and faster, and keeping them cool
can be a major concern.

Reducing power usage is a primary objective for the designers
of notebook computers, since they run on batteries with a
limited life. (They also are more sensitive to heat problems
since their components are crammed into such a small space).

Designers compensate by using lower-power semiconductor
processes and by shrinking the circuit size and die size. Newer
processors reduce voltage levels even more by using what is
called a dual voltage, or split rail, design

   More on Dual Voltage Design …

A split rail processor uses two different voltages.
The external or I/O voltage is higher, typically 3.3V
 for compatibility with the other chips on the
 motherboard.
The internal or core voltage is lower: usually 2.5 to
 2.9 volts. This design allows these lower-voltage
 CPUs to be used without requiring wholesale
 changes to motherboards, chipsets etc.
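
Why a lower core voltage helps can be estimated with the usual first-order model for switching power in CMOS logic, P ≈ C · V² · f. That model is an assumption brought in for this sketch and is not stated in the slides. With capacitance and clock held equal, power scales with the square of the voltage.

```python
def relative_dynamic_power(voltage: float, reference_voltage: float = 5.0) -> float:
    """Switching power relative to the reference voltage.

    Assumes P is proportional to V^2 at equal capacitance and clock frequency.
    """
    return (voltage / reference_voltage) ** 2


# Voltages mentioned in the slides: 5 V, 3.3 V external, 2.9 V and 2.5 V core.
for core_voltage in (5.0, 3.3, 2.9, 2.5):
    share = relative_dynamic_power(core_voltage)
    print(f"{core_voltage:.1f} V: {share:.0%} of the 5 V switching power")
```

Under this model, a 2.5 V core would draw roughly a quarter of the switching power of the same logic at 5 V.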

Power consumption versus speed of some processors




                          MEMORY
Computers have hierarchies of memories that may be classified according to
Function, Capacity and Response Times.
-Function
"Reads" transfer information from the memory; "Writes" transfer information to
the memory:
  -Random Access Memory (RAM) performs both reads and writes.
  -Read-Only Memory (ROM) contains information stored at the
   time of manufacture that can only be read.
  -Programmable Read-Only Memory (PROM) is ROM that can be written once
    at some point after manufacture.

-Capacity
   bit = smallest unit of memory (value of 0 or 1);
   byte = 8 bits.
In modern computers, the total memory may range from, say, 16 MB in a small
personal computer to several GB (gigabytes) in large supercomputers.

More on memory …


 Memory Response
Memory response is characterized by two different measures:
 -Access Time (also termed response time or latency) defines how
 quickly the memory can respond to a read or write request.

 -Memory Cycle Time refers to the minimum period between two
 successive requests of the memory.

 -Access times vary from about 80 ns [ns = nanosecond = 10^(-9)
 seconds] for chips in small personal computers to about 10 ns or less
 for the fastest chips in caches and buffers. For various reasons, the
 memory cycle time is longer than the access time of the memory chips
 (i.e., the time between successive requests is more than the 80 ns
 access time of the chips in a small personal computer).
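
The difference between access time and cycle time shows up in how many requests the memory can serve per second. In this minimal sketch the 80 ns access time comes from the slide, while the 100 ns cycle time and the 4-byte request size are hypothetical values.

```python
def max_requests_per_second(cycle_time_ns: float) -> float:
    """The memory accepts at most one request per cycle time."""
    return 1e9 / cycle_time_ns


def peak_bandwidth_mb_per_s(cycle_time_ns: float, bytes_per_request: int) -> float:
    """Upper bound on sustained transfer rate, in units of 10^6 bytes per second."""
    return max_requests_per_second(cycle_time_ns) * bytes_per_request / 1e6


ACCESS_TIME_NS = 80.0   # from the slide
CYCLE_TIME_NS = 100.0   # hypothetical: cycle time is longer than access time
BYTES_PER_REQUEST = 4   # hypothetical 32-bit transfers

naive = peak_bandwidth_mb_per_s(ACCESS_TIME_NS, BYTES_PER_REQUEST)
actual = peak_bandwidth_mb_per_s(CYCLE_TIME_NS, BYTES_PER_REQUEST)
print(f"If limited by access time: {naive:.0f} MB/s")
print(f"If limited by cycle time:  {actual:.0f} MB/s")
```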
                        The I/O BUS
     A computer transfers data from disk to CPU, from CPU
     to memory, from memory to the display adapter, and so on.
     To avoid having separate circuits between every pair of
     devices, the bus is used.

 Definition:
   The bus is simply a common set of wires that connects all
   the computer devices and chips together




   Different functions for Different wires of the bus

  Some of these wires are used to transmit data.
  Some send housekeeping signals, like the clock pulse. Some transmit a number
  (the "address") that identifies a particular device or memory location

   Use of the address
  The computer chips and devices watch the address wires and respond when their
   identifying number (address) is transmitted; only then can they transfer data
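
The address mechanism can be modelled in a few lines. This is a toy sketch, not a description of any real bus: the Device and Bus classes, the address ranges and the device names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """A device that answers to a range of bus addresses (hypothetical model)."""
    name: str
    base: int
    size: int
    cells: dict = field(default_factory=dict)

    def responds_to(self, address: int) -> bool:
        # The device "watches" the address wires and checks for its own range.
        return self.base <= address < self.base + self.size


class Bus:
    """One shared set of wires: every device sees every address."""

    def __init__(self, devices):
        self.devices = list(devices)

    def _select(self, address: int) -> Device:
        for device in self.devices:
            if device.responds_to(address):
                return device
        raise ValueError(f"no device at address {address:#x}")

    def write(self, address: int, value: int) -> None:
        self._select(address).cells[address] = value

    def read(self, address: int) -> int:
        return self._select(address).cells.get(address, 0)


memory = Device("memory", base=0x0000, size=0x8000)
display = Device("display adapter", base=0x8000, size=0x1000)
bus = Bus([memory, display])

bus.write(0x0010, 42)   # lands in memory
bus.write(0x8004, 7)    # lands in the display adapter
print(bus.read(0x0010), bus.read(0x8004))
```

Only the device whose range contains the address stores or returns the data; the others ignore the transfer.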

Problem!
   Starting with machines that used the 386 CPU, CPUs and memory ran faster than
   other I/O devices

Solution
   - Separate the CPU and memory from all the I/O. Today, memory is only added by
   plugging it into special sockets on the main computer board.

                            Bus Speeds

One option is multiple buses with different speeds; another is
 a single bus that supports different speeds

In a modern PC, there may be a half dozen different bus areas.

There is certainly a "CPU area" that still contains the CPU, memory,
  and basic control logic.

There is a "High Speed I/O Device" area that is either a VESA Local
  Bus (VLB) or a PCI bus




Some Bus Standards
     ISA (Industry Standard Architecture) bus
     In 1987 IBM introduced a new
     Microchannel (MCA) bus
     The other vendors developed an extension
     of the older ISA interface called EISA
     VESA Local Bus (VLB), which became
     popular at the start of 1993

                          More Bus Standards …
The PCI bus was developed by Intel

PCI is a 64 bit interface in a 32 bit package

The PCI bus runs at 33 MHz and can transfer 32 bits of data (four
bytes) every clock tick.

That sounds like a 32-bit bus! However, a clock tick at 33 MHz is 30
nanoseconds, and memory only has a speed of 70 nanoseconds.
When the CPU fetches data from RAM, it has to wait at least three
clock ticks for the data. By transferring data every clock tick, the
PCI bus can deliver the same throughput on a 32 bit interface that
other parts of the machine deliver through a 64 bit path.
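
The PCI figures above can be checked with a little arithmetic. This is a minimal sketch that uses only the numbers on the slide (33 MHz, 4 bytes per tick, 70 ns memory); the constant names are made up here.

```python
import math

PCI_CLOCK_MHZ = 33.0
BYTES_PER_TICK = 4        # 32 bits per clock tick
MEMORY_SPEED_NS = 70.0

tick_ns = 1000.0 / PCI_CLOCK_MHZ                  # length of one clock tick
peak_mb_per_s = PCI_CLOCK_MHZ * BYTES_PER_TICK    # MHz x bytes = 10^6 bytes/s
wait_ticks = math.ceil(MEMORY_SPEED_NS / tick_ns)  # whole ticks spent waiting on RAM

print(f"clock tick: {tick_ns:.1f} ns")
print(f"peak PCI transfer rate: {peak_mb_per_s:.0f} MB/s")
print(f"ticks to wait for 70 ns memory: {wait_ticks}")
```

A 70 ns memory access does not fit into two 30 ns ticks, which is where the "at least three clock ticks" on the slide comes from.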


     Things to know about I/O Bus
Buses transfer information between parts of a computer.
Smaller computers have a single bus; more advanced
computers have complex interconnection strategies.
                 Things to know about the bus
Transaction = Unit of communication on bus.
Bus Master = The module controlling the bus at a
particular time.
Arbitration Protocol = Set of signals exchanged to decide
which of two competing modules will control a bus at a
particular time.
Communication Protocol = Algorithm used to transfer data
on the bus.
Asynchronous Protocol = Communication algorithm that
can begin at any time; requires overhead to notify receivers
that transfer is about to begin.
Things to know about the bus continued …
 Synchronous Protocol = Communication algorithm
 that can begin only at well-known times defined by a
 global clock.
 Transfer Time = Time for data to be transferred
 over the bus in single transaction.
 Bandwidth = Data transfer capacity of bus; usually
 expressed in bits per second (bps). Sometimes
 termed throughput.
 Bandwidth and Transfer Time measure related
 things, but bandwidth takes into account required
 overheads and is usually a more useful measure of
 the speed of the bus.
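
The relation between transfer time, overhead and bandwidth can be sketched as follows. The 32-bit payload, 30 ns transfer time and 60 ns overhead are hypothetical values chosen for illustration.

```python
def effective_bandwidth_bps(payload_bits: int,
                            transfer_time_s: float,
                            overhead_time_s: float) -> float:
    """Bits delivered per second once per-transaction overhead is included."""
    return payload_bits / (transfer_time_s + overhead_time_s)


PAYLOAD_BITS = 32          # hypothetical transaction size
TRANSFER_TIME_S = 30e-9    # hypothetical time on the wires
OVERHEAD_TIME_S = 60e-9    # hypothetical arbitration and protocol overhead

raw = effective_bandwidth_bps(PAYLOAD_BITS, TRANSFER_TIME_S, 0.0)
effective = effective_bandwidth_bps(PAYLOAD_BITS, TRANSFER_TIME_S, OVERHEAD_TIME_S)

print(f"from transfer time alone: {raw / 1e6:.0f} Mbps")
print(f"with overhead included:   {effective / 1e6:.0f} Mbps")
```

With these numbers the overhead cuts the usable rate to a third of what the transfer time alone suggests.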
Supercomputer Architecture
     Background
     Architecture
     Approaches
     Trends
     Challenges



What is parallel computing?
     Use of multiple computers or processors
     working together on a common task.
      – Each processor works on its section of the
        problem
      – Processors are allowed to exchange information
        with other processors
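
A small sketch of this idea: the data is cut into sections, each worker sums its own section, and the partial results are combined at the end. Python threads stand in for processors here purely for illustration; the function names are hypothetical, and on a real machine separate processes or nodes would do the work.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Cut data into contiguous sections of nearly equal size."""
    step = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + step] for i in range(0, len(data), step)]


def parallel_sum(data, workers=4):
    sections = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker handles its own section of the problem.
        partial_sums = list(pool.map(sum, sections))
    # Exchanging information: the partial results are combined.
    return sum(partial_sums)


numbers = list(range(1, 101))
total = parallel_sum(numbers)
print(total)
```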




Why parallel computing?
     Limits of a single computer
      – Available memory
      – Performance
     Parallel computing allows us to
      – Solve problems that don’t fit on a single
        computer
      – Solve problems that can’t be solved in a
        reasonable time

First Supercomputer
     1976: the first supercomputer, the Cray-1
     It had a speed of tens of megaflops (one
     megaflop equals a million floating-point
     operations per second) and a memory
     capacity of 4 megabytes.
     Contributions from Los Alamos Lab and
     Seymour Cray
     Slower than an average PC today

Growing Speed
     The performance of the fastest computers
     has grown exponentially from 1945 to the
     present, averaging a factor of 10 every five
     years
     The first computers performed tens of
     floating-point operations per second; the
     parallel computers of the mid-1990s achieve
     tens of billions of operations per second

Pipeline

     Pipeline: start performing an operation on
     one piece of data while finishing the same
     operation on another piece of data
      – An operation consists of multiple stages.
      – After a set of operands completes a particular
        stage, it moves into the next stage.
      – Then, another set of operands can move into
        the stage that was just vacated.
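
The benefit of keeping every stage busy can be counted in cycles. This is a minimal sketch under simplifying assumptions that are not in the slides: every stage takes exactly one cycle and the pipeline never stalls.

```python
def unpipelined_cycles(n_operations: int, n_stages: int) -> int:
    """Each operation runs through all stages before the next one starts."""
    return n_operations * n_stages


def pipelined_cycles(n_operations: int, n_stages: int) -> int:
    """The first result needs n_stages cycles; after that one result per cycle."""
    if n_operations == 0:
        return 0
    return n_stages + (n_operations - 1)


N_OPERATIONS = 100   # hypothetical workload
N_STAGES = 5         # hypothetical pipeline depth

slow = unpipelined_cycles(N_OPERATIONS, N_STAGES)
fast = pipelined_cycles(N_OPERATIONS, N_STAGES)
print(f"without pipelining: {slow} cycles")
print(f"with pipelining:    {fast} cycles")
print(f"speedup:            {slow / fast:.2f}x")
```

For long runs of operations the speedup approaches the number of stages.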
SuperPipeline

     Superpipeline: perform multiple pipelined
     operations at the same time
      – So, a superpipeline is a collection of multiple
        pipelines that can operate simultaneously.
      – In other words, several different operations can
        execute simultaneously, and each of these operations
        can be broken into stages, each of which is filled all
        the time.
      – So you can get multiple operations per CPU cycle.
      – For example, an IBM POWER4 can have over 200
        different operations “in flight” at the same time.

Sample of superpipeline design




Drawbacks of pipeline
architecture: Pipeline Hazards
     structural hazards: attempt to use the same resource
     in two different ways at the same time
      – e.g., multiple memory accesses, multiple register
        writes
      – solutions: multiple memories, stretch pipeline
     control hazards: attempt to make a decision before
     the condition is evaluated
      – e.g., any conditional branch
      – solutions: prediction, delayed branch
     data hazards: attempt to use an item before it is ready
      – solutions: forwarding/bypassing
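
A data hazard of the read-after-write kind can be spotted by comparing the register an instruction writes with the registers the following instructions read. This is a toy sketch: the Instruction record, the three-instruction program and the one-instruction window are hypothetical, and real hardware does this check in the pipeline itself.

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    text: str
    dest: str        # register written
    sources: tuple   # registers read


def find_data_hazards(program, window=1):
    """Report (producer, consumer, register) where a result is read too soon.

    `window` is how many following instructions are still too close to the
    producer to see its result without forwarding or a stall.
    """
    hazards = []
    for i, producer in enumerate(program):
        for consumer in program[i + 1:i + 1 + window]:
            if producer.dest in consumer.sources:
                hazards.append((producer.text, consumer.text, producer.dest))
    return hazards


program = [
    Instruction("add r1, r2, r3", dest="r1", sources=("r2", "r3")),
    Instruction("sub r4, r1, r5", dest="r4", sources=("r1", "r5")),
    Instruction("or  r6, r7, r8", dest="r6", sources=("r7", "r8")),
]

hazards = find_data_hazards(program)
for producer, consumer, register in hazards:
    print(f"'{consumer}' needs {register} before '{producer}' has written it")
```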
Memory
     In a shared memory
     system, there is one
     large virtual memory,
     and all processors
     have equal access to
     data and instructions
     in this memory.




Memory cont…
     In a distributed memory
     system, each
     processor has a local
     memory that is not
     accessible from any
     other processor.




Differences between the two kinds of
memory
     A software issue, not a hardware one
     The difference determines how different
     parts of a parallel program will
     communicate:
     shared memory with semaphores, etc., or
     distributed memory with message passing.
     Many problems run efficiently on
     distributed memory, BUT software is easier
     to develop for shared memory
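
The two communication styles can be placed side by side. In this sketch Python threads imitate processors: the first function shares one variable guarded by a semaphore, the second shares nothing and sends partial results as messages through a queue. The function names are hypothetical, and a real distributed-memory program would use separate machines and a message-passing library.

```python
import queue
import threading


def run_workers(work, sections):
    threads = [threading.Thread(target=work, args=(s,)) for s in sections]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


def shared_memory_sum(values, workers=4):
    """All workers update one shared total, guarded by a semaphore."""
    total = 0
    guard = threading.Semaphore(1)

    def work(section):
        nonlocal total
        partial = sum(section)
        with guard:            # only one worker may touch the shared total
            total += partial

    run_workers(work, [values[i::workers] for i in range(workers)])
    return total


def message_passing_sum(values, workers=4):
    """Workers keep their data private and send results as messages."""
    mailbox = queue.Queue()

    def work(section):
        mailbox.put(sum(section))   # the only communication is the message

    run_workers(work, [values[i::workers] for i in range(workers)])
    return sum(mailbox.get() for _ in range(workers))


data = list(range(1, 101))
print(shared_memory_sum(data), message_passing_sum(data))
```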
Cache Coherency




Styles of parallel computing
(Hardware Architecture)
     SISD: single instruction stream, single data
     stream
     SIMD: single instruction stream, multiple
     data streams
     MISD: multiple instruction streams, single
     data stream
     MIMD: multiple instruction streams,
     multiple data streams
SISD
            Single Instruction, Single Data




SIMD
        Single Instruction, Multiple Data




MISD
  Multiple Instruction, Single Data




MIMD
     Multiple Instruction, Multiple Data
     (simplest: program controlled message passing)




Two parallel processing
approaches
     SMP: symmetric multiprocessing
      – SMP is the processing of programs by multiple
        processors that share a common operating system and
        memory
     MPP: massively parallel processing
      – MPP is the coordinated processing of a program by
        multiple processors that work on different parts of the
        program, with each processor using its own operating
        system and memory


Current Trend
     OpenMP: an open standard for
     providing parallelization mechanisms on
     shared-memory multiprocessors.
      – Supports C/C++ and Fortran, some of the most
        commonly used languages for writing parallel
        programs.
      – Based on a thread paradigm


OpenMP execution model




New Trend
     Clustering
      – The Widest Definition:
               Any number of computers communicating at any
               distance
      – The Common Definition:
               A relatively small number of computers (<1000)
               communicating at a relatively small distance
               (within the same room) and used as
               a single, shared computing resource

Comparison
     Programming
      – A Program written for Cluster Parallelism can
        run on an SMP right away
      – A Program written for an SMP can NOT run
        on a Cluster right away
     Scalability
      – Clusters are Scalable
      – SMPs are NOT Scalable above a Small
        Number of Processors
Comparison cont..
     One big advantage of SMPs is the Single
     System Image
      – Easier Administration and Support
      – But, Single Point of Failure
     Cluster computing can be used for load
     balancing as well as for high availability



General highlights from Top 500
     The Earth Simulator built by NEC remains the
     unchallenged #1.
     100 systems have peak performance above 1
     TFlop/s, up from 70 systems 6 months ago
     PC clusters are now present at all levels of
     performance
     IBM still leads the list with respect to
     installed performance, ahead of HP and NEC
     Hewlett-Packard stays slightly ahead of IBM
     with respect to the number of systems installed
     (HP 137 and IBM 131)

NEC Earth-Simulator/ 5120 from
Japan




Basic Idea/Component
     Environment Research
     The Earth Simulator consists of 640
     supercomputers that are connected by a
     high-speed network (data transfer speed: 12.3
     GBytes/s). Each supercomputer (1 node) contains
     eight vector processors, each with a peak
     performance of 8 GFlops, and a high-speed memory
     of 16 GBytes. The total number of processors is 5120
     (8 x 640), which translates to a total of
     approximately 40 TFlops peak performance and
     a total main memory of 10 TeraBytes.
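
The totals quoted for the Earth Simulator follow directly from the per-node figures. This is a quick check using only the numbers on the slide; the constant names are made up here, and 1 TB is taken as 1024 GB.

```python
NODES = 640
PROCESSORS_PER_NODE = 8
GFLOPS_PER_PROCESSOR = 8
MEMORY_PER_NODE_GB = 16

total_processors = NODES * PROCESSORS_PER_NODE
peak_tflops = total_processors * GFLOPS_PER_PROCESSOR / 1000
total_memory_tb = NODES * MEMORY_PER_NODE_GB / 1024   # assuming 1 TB = 1024 GB

print(f"processors:       {total_processors}")
print(f"peak performance: {peak_tflops:.2f} TFlops")
print(f"main memory:      {total_memory_tb:.0f} TB")
```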
Hewlett-Packard SuperDome
supercomputer




Terms to know
     flops: Acronym for floating-point operations per
     second, a unit of measurement of the
     performance of a computer. For example, 15 Mflops
     equals 15 million floating-point arithmetic
     operations per second.
     LINPACK: a collection of Fortran subroutines
     that analyze and solve linear equations and linear
     least-squares problems.
     Rmax: 431.70 (maximal LINPACK
                    performance achieved)
     Rpeak: 672.00 (theoretical peak performance)
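
Rmax and Rpeak are often compared as a ratio, which shows how much of the theoretical peak the LINPACK run actually reached. This is a minimal sketch using the two values from the slide; the slide gives no unit, and the ratio does not need one.

```python
R_MAX = 431.70    # maximal LINPACK performance achieved (from the slide)
R_PEAK = 672.00   # theoretical peak performance (from the slide)

efficiency = R_MAX / R_PEAK
print(f"LINPACK efficiency: {efficiency:.1%}")
```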
Challenges
      Faster algorithms
      Good data locality
      Low communication requirement
      Efficient software
      High level problem solving environment
      Changes of architecture



References
     Power consumption of processors - http://www.macinfo.de/hardware/strom.html
     Under the hood - http://www.kids-online.net/learn/clicknov/details/cpu.html
     Difference Machine and Charles Babbage - http://www.cbi.umn.edu/exhibits/cb.html
     John von Neumann - http://ei.cs.vt.edu/~history/VonNeumann.html
     I/O - http://sophia.dtp.fmph.uniba.sk/pchardware/bus.html
     CPU & memory - http://csep1.phy.ornl.gov/guidry/phys594/lectures/lectures.html
     Memory - http://www.howstuffworks.com/computer-memory.htm
     General idea -
     http://www.ccs.uky.edu/~douglas/Classes/cs521-s02/index.html
     http://www.ccs.uky.edu/~douglas/Classes/cs521-s01/index.html
     http://www.ccs.uky.edu/~douglas/Classes/cs521-s00/index.html
     Voltage - http://www.hardwarecentral.com/hardwarecentral/tutorials/19/1/
     CSEP - http://www.ccs.uky.edu/csep/csep.html
     Top500 - http://www.top500.org
     Cray co. - http://www.cray.com/company/h_systems.html
     Definition of terms - http://www.whatis.com




                        Thank You!



