Computer Architecture
Davies Muche
&
Mike Li Luo
CS521 Spring 2003
Computer Architecture 2/25/2003
What is a digital computer?
A digital computer is a machine composed
of the following three basic components:
- Input/Output
- Central Processing Unit (CPU)
- Memory
Early Computers
As early as the 1600s, calculating machines that
could perform arithmetic operations had been built, but
none had the three basic components of a digital
computer.
In 1823, Charles Babbage undertook the design of
the Difference Engine.
– The machine was to evaluate 6th-degree
polynomials to 20-digit accuracy
– The design put the concepts of mechanical control and mechanical
calculation together into a machine that has the
basic parts of a digital computer
– He was given 17,000 pounds to construct the machine,
but the project was abandoned, uncompleted, in 1842
– In 1834, Babbage conceived the idea of the Analytical
Machine (after his death, his son Henry tried to build it
but never succeeded)
– In 1854, George Scheutz built a working difference
engine based on Babbage’s design. (This machine printed
mathematical, astronomical and actuarial tables with unprecedented
accuracy, and was used by the British and American governments.)
[Figure: Difference Engine No.1]
Between 1847 and 1849 Babbage designed the
Difference Engine No.2.
He did not build it.
However, in 1834 Charles Babbage developed
a hypothetical program to solve simultaneous
equations on the Analytical Machine.
The John von Neumann Architecture (1940s)
consists of five major components: input unit, output unit,
memory, arithmetic logic unit (ALU), and control unit
A refinement of the von Neumann model, the
system bus model has a CPU (ALU and
control), memory, and an input/output unit
The CPU
CPU (central processing unit) is an older term for processor and
microprocessor, the central unit in a computer containing the logic
circuitry that performs the instructions of a computer's programs.
NOTABLE TYPES
- RISC: Reduced Instruction Set Computer
  - Introduced in the mid 1980s
  - Requires few transistors
  - Capable of executing only a very limited set of
    instructions
- CISC: Complex Instruction Set Computer
  - Complex CPUs that had ever-larger sets of instructions
RISC or CISC “The great Controversy”
RISC proponents argue that RISC machines are both cheaper
and faster, and are therefore the machines of the future.
Skeptics note that by making the hardware simpler, RISC
architectures put a greater burden on the software. They argue
that this is not worth the trouble because conventional
microprocessors are becoming increasingly fast and cheap
anyway.
The TRUTH!
CISC and RISC implementations are becoming more and more
alike. Many of today's RISC chips support as many instructions
as yesterday's CISC chips. And today's CISC chips use many
techniques formerly associated with RISC chips.
Under the hood of a typical CPU
What you need to know about a CPU
Processing speed
- The clock frequency is one measure of how fast a
computer is. However, the length of time to carry out an
operation depends not only on how fast the processor
cycles, but also on how many cycles are required to perform a
given operation.
Voltage requirement
- Transistors (electronic switches) in the CPU require some
voltage to trigger them.
- In the pre-486DX66 days, everything was 5 volts
- As chips got faster and power became a concern,
designers dropped the chip voltage down to 3.3 volts
(external voltage) and 2.9V or 2.5V core voltage
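The point about processing speed on this slide can be made concrete with a little arithmetic. The sketch below is only an illustration: the two processors and their cycle counts are hypothetical, not taken from the slides. It shows that a chip with a higher clock frequency can still take longer if the operation needs more cycles.

```python
def operation_time_ns(cycles: int, clock_mhz: float) -> float:
    """Time for an operation that needs `cycles` clock cycles, in nanoseconds."""
    cycle_time_ns = 1000.0 / clock_mhz  # one cycle at f MHz lasts 1000/f ns
    return cycles * cycle_time_ns


# Hypothetical processors: B has the faster clock but needs more cycles.
time_a = operation_time_ns(cycles=2, clock_mhz=100.0)  # 2 cycles at 100 MHz
time_b = operation_time_ns(cycles=5, clock_mhz=200.0)  # 5 cycles at 200 MHz
print(f"A: {time_a:.1f} ns, B: {time_b:.1f} ns")
```

Here processor A finishes first even though its clock runs at half the frequency of B.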
More on Voltage Requirements…
Power consumption equates largely with heat generation,
which is a primary enemy in achieving increased performance.
Newer processors are larger and faster, and keeping them cool
can be a major concern.
Reducing power usage is a primary objective for the designers
of notebook computers, since they run on batteries with a
limited life. (They are also more sensitive to heat problems
since their components are crammed into such a small space.)
Designers compensate by using lower-power semiconductor
processes and by shrinking the circuit size and die size. Newer
processors reduce voltage levels even more by using what is
called a dual voltage, or split rail, design.
More on Dual Voltage Design …
A split rail processor uses two different voltages.
The external or I/O voltage is higher, typically 3.3V
for compatibility with the other chips on the
motherboard.
The internal or core voltage is lower: usually 2.5 to
2.9 volts. This design allows these lower-voltage
CPUs to be used without requiring wholesale
changes to motherboards, chipsets etc.
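To see why a lower core voltage matters so much, a common rule of thumb for CMOS logic is that dynamic power scales roughly as P ≈ C·V²·f. That relation is not stated in the slides; it is an assumption brought in here for illustration, and it ignores leakage and any change in clock frequency. Under that assumption, a minimal sketch of the saving from the voltages quoted above:

```python
def relative_dynamic_power(v_new: float, v_old: float) -> float:
    """Ratio P_new / P_old when only the supply voltage changes (P ~ C * V^2 * f)."""
    return (v_new / v_old) ** 2


# Core voltages from the slide, compared with a 3.3 V external voltage.
for core_v in (2.9, 2.5):
    ratio = relative_dynamic_power(core_v, 3.3)
    print(f"core at {core_v} V vs 3.3 V: about {ratio:.0%} of the dynamic power")
```

With these numbers the core would draw roughly 77% (at 2.9 V) or 57% (at 2.5 V) of the dynamic power it would draw at 3.3 V, all else being equal.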
Power consumption versus speed of some processors
MEMORY
Computers have hierarchies of memories that may be classified according to
function, capacity and response time.
- Function
"Reads" transfer information from the memory; "writes" transfer information to
the memory:
  - Random Access Memory (RAM) performs both reads and writes.
  - Read-Only Memory (ROM) contains information stored at the
    time of manufacture that can only be read.
  - Programmable Read-Only Memory (PROM) is ROM that can be written once
    at some point after manufacture.
- Capacity
bit = smallest unit of memory (value of 0 or 1);
byte = 8 bits.
In modern computers, the total memory may range from, say, 16 MB in a small
personal computer to several GB (gigabytes) in large supercomputers.
More on memory …
Memory Response
Memory response is characterized by two different measures:
- Access Time (also termed response time or latency) defines how
quickly the memory can respond to a read or write request.
- Memory Cycle Time refers to the minimum period between two
successive requests of the memory.
Access times vary from about 80 ns [ns = nanosecond = 10^(-9)
seconds] for chips in small personal computers to about 10 ns or less
for the fastest chips in caches and buffers. For various reasons, the
memory cycle time is longer than the access time of the memory chips (i.e.,
the length of time between successive requests is more than the 80
ns speed of the chips in a small personal computer).
The I/O BUS
A computer transfers data from disk to CPU, from CPU
to memory, or from memory to the display adapter, etc.
To avoid having separate circuits between every pair of
devices, the BUS is used.
Definition:
The bus is simply a common set of wires that connects all
the computer devices and chips together
Different functions for different wires of the bus
Some of these wires are used to transmit data.
Some send housekeeping signals, like the clock pulse. Some transmit a number
(the "address") that identifies a particular device or memory location.
Use of the address
The computer chips and devices watch the address wires and respond only when their
identifying number (address) is transmitted; only then can they transfer data.
Problem!
Starting with machines that used the 386 CPU, CPUs and memory ran faster than
other I/O devices.
Solution
- Separate the CPU and memory from all the I/O. Today, memory is only added by
plugging it into special sockets on the main computer board.
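The use of the address described on this slide can be sketched as a toy model. Everything in it is hypothetical (the `Device` and `Bus` classes, the device names and the addresses); it only illustrates the idea that every device sees the same wires and only the addressed one responds.

```python
class Device:
    """A hypothetical device that watches the address wires of a shared bus."""

    def __init__(self, name: str, address: int):
        self.name = name
        self.address = address
        self.received = []

    def watch(self, address: int, data: int) -> bool:
        # Respond only when our own identifying number is on the address wires.
        if address == self.address:
            self.received.append(data)
            return True
        return False


class Bus:
    """A common set of 'wires' shared by every attached device."""

    def __init__(self):
        self.devices = []

    def attach(self, device: Device) -> None:
        self.devices.append(device)

    def write(self, address: int, data: int) -> list:
        # Every device sees the same address and data; return who responded.
        return [d.name for d in self.devices if d.watch(address, data)]


bus = Bus()
disk = Device("disk", address=0x10)
display = Device("display", address=0x20)
bus.attach(disk)
bus.attach(display)

responders = bus.write(0x20, data=0xFF)
print(responders)  # only the display accepts the data
```

A real bus does this matching in hardware with address decoders; the loop over devices here stands in for all devices watching the wires at once.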
Bus Speeds
Either multiple buses with different speeds are used, or a single bus
supporting different speeds is used.
In a modern PC, there may be half a dozen different bus areas.
There is certainly a "CPU area" that still contains the CPU, memory,
and basic control logic.
There is a "High Speed I/O Device" area that is either a VESA Local
Bus (VLB) or a PCI bus.
Some Bus Standards
- ISA (Industry Standard Architecture) bus
- In 1987, IBM introduced a new Micro Channel (MCA) bus
- The other vendors developed an extension of the older ISA interface, called EISA
- VESA Local Bus (VLB), which became popular at the start of 1993
More Bus Standards …
The PCI bus was developed by Intel
PCI is a 64 bit interface in a 32 bit package
The PCI bus runs at 33 MHz and can transfer 32 bits of data (four
bytes) every clock tick.
That sounds like a 32-bit bus! However, a clock tick at 33 MHz is 30
nanoseconds, and memory only has a speed of 70 nanoseconds.
When the CPU fetches data from RAM, it has to wait at least three
clock ticks for the data. By transferring data every clock tick, the
PCI bus can deliver the same throughput on a 32 bit interface that
other parts of the machine deliver through a 64 bit path.
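The throughput argument on this slide can be checked with the numbers it quotes. The sketch below rests on a simplifying assumption that is not spelled out in the slides: the 64-bit path is taken to deliver 8 bytes once per memory access (three clock ticks), while the PCI bus delivers 4 bytes on every tick.

```python
import math

PCI_CLOCK_MHZ = 33        # from the slide
PCI_BYTES_PER_TICK = 4    # 32 bits per clock tick
MEMORY_ACCESS_NS = 70     # memory speed quoted on the slide
WIDE_PATH_BYTES = 8       # a 64-bit path (assumed: one transfer per memory access)

tick_ns = 1000 / PCI_CLOCK_MHZ                       # about 30 ns per tick
wait_ticks = math.ceil(MEMORY_ACCESS_NS / tick_ns)   # whole ticks needed to cover 70 ns

# Bytes per microsecond is numerically equal to MB/s (with 1 MB = 10^6 bytes).
pci_mb_per_s = PCI_BYTES_PER_TICK * PCI_CLOCK_MHZ
wide_mb_per_s = WIDE_PATH_BYTES * PCI_CLOCK_MHZ / wait_ticks

print(f"tick = {tick_ns:.1f} ns, memory wait = {wait_ticks} ticks")
print(f"PCI, 32 bits every tick:       {pci_mb_per_s} MB/s")
print(f"64-bit path, once per access:  {wide_mb_per_s:.0f} MB/s")
```

Under these assumptions the 32-bit PCI bus (132 MB/s) keeps up with, and in fact exceeds, a 64-bit path that has to wait three ticks per access (88 MB/s), which is the sense in which the slide compares the two.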
Things to know about I/O Bus
Buses transfer information between parts of a computer.
Smaller computers have a single bus; more advanced
computers have complex interconnection strategies.
Things to know about the bus
Transaction = Unit of communication on bus.
Bus Master = The module controlling the bus at a
particular time.
Arbitration Protocol = Set of signals exchanged to decide
which of two competing modules will control a bus at a
particular time.
Communication Protocol = Algorithm used to transfer data
on the bus.
Asynchronous Protocol = Communication algorithm that
can begin at any time; requires overhead to notify receivers
that transfer is about to begin.
Things to know about the bus continued …
Synchronous Protocol = Communication algorithm
that can begin only at well-known times defined by a
global clock.
Transfer Time = Time for data to be transferred
over the bus in a single transaction.
Bandwidth = Data transfer capacity of bus; usually
expressed in bits per second (bps). Sometimes
termed throughput.
Bandwidth and Transfer Time measure related
things, but bandwidth takes into account required
overheads and is usually a more useful measure of
the speed of the bus.
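The relationship between transfer time and bandwidth can be illustrated with a short calculation. The transaction size and the timings below are hypothetical values chosen for illustration; only the idea that bandwidth includes overheads comes from the slide.

```python
def effective_bandwidth_bps(payload_bits: int, transfer_time_s: float,
                            overhead_s: float) -> float:
    """Bits delivered per second once per-transaction overhead is included."""
    return payload_bits / (transfer_time_s + overhead_s)


# Hypothetical transaction: 32 bits moved in 30 ns, plus 60 ns of
# arbitration and addressing overhead.
raw = effective_bandwidth_bps(32, 30e-9, 0.0)
effective = effective_bandwidth_bps(32, 30e-9, 60e-9)
print(f"ignoring overhead: {raw / 1e6:.0f} Mbps, with overhead: {effective / 1e6:.0f} Mbps")
```

In this example the overhead cuts the usable rate to a third of what the transfer time alone would suggest, which is why bandwidth is usually the more useful figure.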
Supercomputer Architecture
Background
Architecture
Approaches
Trends
Challenges
What is parallel computing?
Use of multiple computers or processors
working together to do a common task.
– Each processor works on its section of the
problem
– Processors are allowed to exchange information
with other processors
Why parallel computing?
Limits of a single computer
– Available memory
– Performance
Parallel computing allows us to
– Solve problems that don’t fit on a single
computer
– Solve problems that can’t be solved in a
reasonable time
First Supercomputer
1976: the first supercomputer, the Cray-1
It had a speed of tens of megaflops (one
megaflop equals a million floating-point
operations per second) and a memory
capacity of 4 megabytes.
Contributions from Los Alamos Lab and
Seymour Cray
Slower than an average PC today
Growing Speed
The performance of the fastest computers
has grown exponentially from 1945 to the
present, averaging a factor of 10 every five
years
The first computers performed a few tens of
floating-point operations per second; the parallel
computers of the mid-1990s achieve tens of billions of
operations per second
Pipeline
Pipeline: start performing an operation on
one piece of data while finishing the same
operation on another piece of data
– An operation consists of multiple stages.
– After a set of operands completes a particular
stage, it moves into the next stage.
– Then, another set of operands can move into
the stage that was just vacated.
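The stage-by-stage movement described above can be sketched as a small schedule. This is a simplified model under stated assumptions: every stage takes exactly one cycle and there are no stalls; the function name and the choice of 4 items and 3 stages are purely illustrative.

```python
def pipeline_schedule(num_items: int, num_stages: int) -> list:
    """For each clock cycle, list the (item, stage) pairs that are active."""
    total_cycles = num_stages + num_items - 1
    schedule = []
    for cycle in range(total_cycles):
        active = []
        for item in range(num_items):
            stage = cycle - item  # item i enters stage 0 at cycle i
            if 0 <= stage < num_stages:
                active.append((item, stage))
        schedule.append(active)
    return schedule


schedule = pipeline_schedule(num_items=4, num_stages=3)
for cycle, active in enumerate(schedule):
    print(cycle, active)

pipelined_cycles = len(schedule)  # stages + items - 1
unpipelined_cycles = 4 * 3        # each item runs all stages before the next starts
print(pipelined_cycles, "cycles pipelined vs", unpipelined_cycles, "unpipelined")
```

Once the pipeline is full, one result completes per cycle, so the total is stages + items - 1 cycles instead of stages × items.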
SuperPipeline
Superpipeline: perform multiple pipelined
operations at the same time
– So, a superpipeline is a collection of multiple
pipelines that can operate simultaneously.
– In other words, several different operations can
execute simultaneously, and each of these operations
can be broken into stages, each of which is filled all
the time.
– So you can get multiple operations per CPU cycle.
– For example, an IBM Power4 can have over 200
different operations “in flight” at the same time.
Sample of superpipeline design
Drawbacks of pipeline
architecture: Pipeline Hazards
structural hazards: attempt to use the same resource
two different ways at the same time
– e.g., multiple memory accesses, multiple register
writes
– solutions: multiple memories, stretch pipeline
control hazards: attempt to make a decision before
condition is evaluated
– e.g., any conditional branch
– solutions: prediction, delayed branch
data hazards: attempt to use item before it is ready
– solutions: forwarding/bypassing
Memory
In a shared memory system, there is one
large virtual memory, and all processors
have equal access to data and instructions
in this memory.
Memory cont…
In a distributed memory system, each
processor has a local memory that is not
accessible from any other processor.
Difference between the two kinds of
memory
A software issue, not a hardware one
The difference determines how different
parts of a parallel program will
communicate:
shared memory with semaphores, etc., or
distributed memory with message passing.
All problems run efficiently on
distributed memory, BUT software is easier
to develop for shared memory
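The two communication styles named on this slide can be contrasted in a small sketch. It is only an analogy: both versions run as Python threads inside one process, whereas a real distributed-memory program would run on separate processors with a message-passing library. The worker functions and counts are hypothetical.

```python
import queue
import threading

NUM_WORKERS = 4
ITEMS_PER_WORKER = 1000

# Style 1: shared memory. All threads update one counter, guarded by a lock
# (the lock plays the role of a binary semaphore here).
shared = {"total": 0}
lock = threading.Lock()


def shared_worker() -> None:
    for _ in range(ITEMS_PER_WORKER):
        with lock:
            shared["total"] += 1


# Style 2: message passing. Each worker keeps a private count and sends
# a single message with its result; no other state is shared.
mailbox = queue.Queue()


def message_worker() -> None:
    local_total = 0
    for _ in range(ITEMS_PER_WORKER):
        local_total += 1
    mailbox.put(local_total)


def run(target) -> None:
    threads = [threading.Thread(target=target) for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


run(shared_worker)
run(message_worker)
message_total = sum(mailbox.get() for _ in range(NUM_WORKERS))
print(shared["total"], message_total)
```

Both versions arrive at the same total; the difference is that the first coordinates access to common data, while the second exchanges explicit messages.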
Cache Coherency
Styles of parallel computing
(Hardware Architecture)
SISD: single instruction stream, single data
stream
SIMD: single instruction stream, multiple
data streams
MISD: multiple instruction streams, single
data stream
MIMD: multiple instruction streams,
multiple data streams
SISD
Single Instruction, Single Data
SIMD
Single Instruction, Multiple Data
MISD
Multiple Instruction, Single Data
MIMD
Multiple Instruction, Multiple Data
(simplest: program controlled message passing)
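The difference between the SISD and SIMD styles listed above can be shown with a toy step counter. This is a conceptual model only: Python still executes the "lanes" one after another, and the lane width of 4 is a hypothetical choice. The sketch merely counts how many instruction steps each style would issue.

```python
import operator


def sisd(op, a, b):
    """One instruction stream, one data stream: one element per step."""
    results, steps = [], 0
    for x, y in zip(a, b):
        results.append(op(x, y))
        steps += 1
    return results, steps


def simd(op, a, b, lanes=4):
    """One instruction applied to `lanes` data elements per step."""
    results, steps = [], 0
    for i in range(0, len(a), lanes):
        results.extend(op(x, y) for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        steps += 1
    return results, steps


a = list(range(8))
b = [10] * 8
sisd_result, sisd_steps = sisd(operator.add, a, b)
simd_result, simd_steps = simd(operator.add, a, b, lanes=4)
print(sisd_steps, simd_steps)
```

Both produce the same sums, but the SIMD model issues one instruction per group of four elements rather than one per element.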
Two parallel processing
approaches
SMP: symmetric multiprocessing
– SMP is the processing of programs by multiple
processors that share a common operating system and
memory
MPP: massively parallel processing
– MPP is the coordinated processing of a program by
multiple processors that work on different parts of the
program, with each processor using its own operating
system and memory
Current Trend
OpenMP: an open standard for
providing parallelization mechanisms on
shared-memory multiprocessors.
– Supports C/C++ and Fortran, several of the most
commonly used languages for writing parallel
programs.
– Based on a thread paradigm
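OpenMP itself is used from C/C++ and Fortran, so the sketch below is not OpenMP code. It is a rough Python analogy of the thread-based fork-join pattern, using the standard `concurrent.futures` module; the work function, the data and the thread count are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done by one thread of the team on its share of the loop."""
    return sum(x * x for x in chunk)


data = list(range(1000))
num_threads = 4
# Static schedule: thread k takes elements k, k + 4, k + 8, ...
chunks = [data[k::num_threads] for k in range(num_threads)]

# Fork: a team of threads runs the parallel region.
with ThreadPoolExecutor(max_workers=num_threads) as team:
    partials = list(team.map(partial_sum, chunks))

# Join: leaving the `with` block waits for every thread; the master
# thread then combines the partial results (a reduction).
total = sum(partials)
print(total)
```

In an OpenMP program, the splitting of the loop and the final reduction would typically be requested through compiler directives rather than written out by hand.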
OpenMP execution model
New Trend
Clustering
– The Widest Definition:
Any number of computers communicating at any
distance
– The Common Definition:
A relatively small number of computers (<1000)
communicating at a relatively small distance
(within the same room) and used as
a single, shared computing resource
Comparison
Programming
– A Program written for Cluster Parallelism can
run on an SMP right away
– A Program written for an SMP can NOT run
on a Cluster right away
Scalability
– Clusters are Scalable
– SMPs are NOT Scalable above a Small
Number of Processors
Comparison cont..
One big advantage of SMPs is the Single
System Image
– Easier Administration and Support
– But, Single Point of Failure
Cluster computing can be used for load
balancing as well as for high availability
General highlights from Top 500
The Earth Simulator built by NEC remains the
unchallenged #1.
100 systems have peak performance above 1
TFlop/s, up from 70 systems 6 months ago
PC clusters are now present at all levels of
performance
IBM is still leading the list with respect to
installed performance, ahead of HP and NEC
Hewlett-Packard stays slightly ahead of IBM
with respect to the number of systems installed
(HP 137 and IBM 131)
NEC Earth-Simulator/ 5120 from
Japan
Basic Idea/Component
Environment Research
The Earth Simulator consists of 640
supercomputers that are connected by a high-speed
network (data transfer speed: 12.3
GBytes/s). Each supercomputer (1 node) contains
eight vector processors, each with a peak performance
of 8 GFlops, and a high-speed memory of 16
GBytes. The total number of processors is 5120
(8 x 640), which translates to a total of
approximately 40 TFlops peak performance, and
a total main memory of 10 TeraBytes.
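The totals quoted on this slide follow directly from the per-node figures. A quick check, assuming 1 TFlops = 1000 GFlops and 1 TeraByte = 1024 GBytes (the slide does not say which convention it uses):

```python
NODES = 640
PROCESSORS_PER_NODE = 8
PEAK_GFLOPS_PER_PROCESSOR = 8
MEMORY_GB_PER_NODE = 16

total_processors = NODES * PROCESSORS_PER_NODE
peak_tflops = total_processors * PEAK_GFLOPS_PER_PROCESSOR / 1000
total_memory_tb = NODES * MEMORY_GB_PER_NODE / 1024

print(total_processors, "processors")
print(peak_tflops, "TFlops peak")
print(total_memory_tb, "TB of main memory")
```

This gives 5120 processors, 40.96 TFlops peak and 10 TB of memory, matching the rounded figures above.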
Hewlett-Packard SuperDome
supercomputer
Terms you need to know
flops: Acronym for floating-point operations per
second. For example, 15 Mflops equals 15
million floating-point arithmetic operations per
second. It is a unit of measurement of the
performance of a computer.
LINPACK is a collection of Fortran subroutines
that analyze and solve linear equations and linear
least-squares problems.
Rmax: 431.70 (maximal LINPACK
performance achieved)
Rpeak: 672.00 (theoretical peak performance)
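One way to read the two figures together is as an efficiency: the fraction of the theoretical peak that the LINPACK run actually achieved. The slide does not give units for Rmax and Rpeak, so the sketch below only assumes that both are in the same unit; the function name is hypothetical.

```python
def linpack_efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of theoretical peak reached in the LINPACK benchmark."""
    return rmax / rpeak


efficiency = linpack_efficiency(431.70, 672.00)
print(f"{efficiency:.1%} of theoretical peak")
```

For the values on this slide that is about 64% of peak.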
Challenges
Faster algorithms
Good data locality
Low communication requirement
Efficient software
High level problem solving environment
Changes of architecture
References
power consumption of processor - http://www.macinfo.de/hardware/strom.html
Under the hood - http://www.kids-online.net/learn/clicknov/details/cpu.html
Difference Machine and Charles Babbage - http://www.cbi.umn.edu/exhibits/cb.html
John von Neumann - http://ei.cs.vt.edu/~history/VonNeumann.html
I/O - http://sophia.dtp.fmph.uniba.sk/pchardware/bus.html
CPU & memory - http://csep1.phy.ornl.gov/guidry/phys594/lectures/lectures.html
memory - http://www.howstuffworks.com/computer-memory.htm
general idea -
http://www.ccs.uky.edu/~douglas/Classes/cs521-s02/index.html
http://www.ccs.uky.edu/~douglas/Classes/cs521-s01/index.html
http://www.ccs.uky.edu/~douglas/Classes/cs521-s00/index.html
voltage - http://www.hardwarecentral.com/hardwarecentral/tutorials/19/1/
CSEP - http://www.ccs.uky.edu/csep/csep.html
top500 - http://www.top500.org
Cray Co. - http://www.cray.com/company/h_systems.html
definition of terms - http://www.whatis.com
Thank You!