A Simple Computer consists of a
Processor (CPU, Central Processing
Unit), Memory, and I/O
Basic Functional Units of a Computer
• Input – accepts coded information from
human operators, from electromechanical
devices (such as keyboards), or from other
computers over digital communication lines
• The information received is either stored in
the memory or immediately used by the
arithmetic and logic unit (ALU) to perform
the desired operations.
• The results are sent back out through the
output unit.
• All actions are coordinated through the
control unit.
• Information handled by a computer is categorized
as either instructions or data
• Instructions (or machine instructions) are
explicit commands that
– Govern the transfer of information within a
computer as well as between the computer
and its I/O devices.
– Specify the arithmetic and logic operations to
be performed.
• A list of instructions that performs a task is
called a program.
• Usually the program is stored in memory.
• The processor fetches the instructions from
memory, one after another, and performs the
desired operations.
• The computer is completely controlled by the
stored program, except for possible
interruption by an operator or by I/O devices
connected to the machine.
• Data are numbers and encoded characters
that are used as operands by the instructions.
Computer System Organization
Inside the CPU
• Control Unit (CU)
coordinates the sequencing of steps
involved in executing machine instructions
• Arithmetic Logic Unit (ALU)
performs arithmetic and logical operations
• Clock
synchronizes the internal operations of the
CPU with the other system components
• Bus - a group of parallel wires that transfer
information from one part of the computer to
another
– Control Bus
synchronizes the actions of all of the devices
attached to the system bus.
– Address Bus
passes the addresses of instructions and data
between the CPU and memory (or I/O).
– Data Bus
transfers instructions and data between the
CPU and memory (or I/O).
• For the 8086 Processor
– Address Bus – 20 bits
• can access 1 MB (2^20 bytes) of memory
• Addresses defined as $00000-$FFFFF
– Data Bus – 16 bits (16-bit processor)
• A word is 16 bits
• Memory is byte addressable: each byte has its own
address, so a word occupies two consecutive addresses
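The bus-width arithmetic above can be checked with a short sketch. It assumes nothing beyond the 8086 figures on this slide, plus the fact that Intel x86 stores the low-order byte of a word at the lower address (little-endian); the function names are illustrative only.

```python
ADDRESS_BUS_BITS = 20  # 8086 address bus width
DATA_BUS_BITS = 16     # 8086 data bus width (one word)

def address_space_bytes(address_bits):
    """Number of distinct byte addresses reachable with this many address lines."""
    return 2 ** address_bits

def address_range(address_bits):
    """Lowest and highest address, written in the slides' $-hex style."""
    digits = (address_bits + 3) // 4              # hex digits needed
    highest = address_space_bytes(address_bits) - 1
    return f"${0:0{digits}X}-${highest:0{digits}X}"

def word_to_bytes(word):
    """Split a 16-bit word into the two bytes it occupies in byte-addressable
    memory; x86 puts the low-order byte at the lower address (little-endian)."""
    return list(word.to_bytes(DATA_BUS_BITS // 8, "little"))

print(address_space_bytes(ADDRESS_BUS_BITS))    # 1048576 (1 MB)
print(address_range(ADDRESS_BUS_BITS))          # $00000-$FFFFF
print([hex(b) for b in word_to_bytes(0xA69B)])  # ['0x9b', '0xa6']
```

The same function gives 2^32 bytes (4 GB) for the 32-bit address bus of the 386 and later processors.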
More Facts on The 8086 Processor
| Generation | External Data Bus Width (bits) | Internal Register Width (bits) | Address Bus Width (bits) | Numeric Data Processor | L1 Cache | L2 Cache |
|---|---|---|---|---|---|---|
| P1 | 16 | 16 | 20 | External | None | None |
The Intel CPU Family
Notes from Intel Family Chart
• Notice that the 386 through the Pentium 4 are 32-bit
processors (32-bit registers and internal data path, 4 bytes)
• Notice that the 386 and later processors have a 32-bit
address bus, which can access 4 GB (2^32 bytes) of memory
• Clock cycle – the most basic unit of time for machine
instructions
• = the time required for one complete clock
pulse
• Machine instructions require at least 1
clock cycle to execute. Most require more.
• Wait states – empty clock cycles added to
machine execution time (because memory
access time is slower than the speed of
the CPU)
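As a rough illustration of how clock cycles and wait states add up, the sketch below assumes a clock period of 1/frequency and treats every wait state as one extra full cycle. The 10 MHz clock and the 4-cycle instruction are hypothetical numbers, not figures from the slides.

```python
def clock_period_ns(freq_mhz):
    """One clock cycle (one complete clock pulse) lasts 1/frequency."""
    return 1000.0 / freq_mhz  # 1 MHz -> 1000 ns per cycle

def execution_time_ns(cycles, wait_states, freq_mhz):
    """Instruction time = its own clock cycles plus the empty wait-state
    cycles inserted while the CPU waits for slower memory."""
    return (cycles + wait_states) * clock_period_ns(freq_mhz)

# Hypothetical: a 4-cycle instruction on a 10 MHz clock
print(execution_time_ns(4, 0, 10.0))  # 400.0 ns with no wait states
print(execution_time_ns(4, 2, 10.0))  # 600.0 ns with 2 wait states
```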
Instruction Execution Cycle
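• If using a memory operand (e.g., mov ax, [0A69Bh] in NASM
syntax, whose source is the word stored in memory at 0A69Bh;
without the brackets, 0A69Bh would be an immediate value)
– Calculate address of operand
– Place address of operand on address bus
– Wait for memory to get the operand and pass it to the CPU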
The data path of a typical
von Neumann Machine
[Figure: data path showing an A + B computation, the ALU input registers, the ALU input bus, and the ALU output register]
Instruction Execution Cycle
The CPU executes each instruction in a series of
steps:
• Fetch the next instruction from memory into the
instruction register.
• Change the program counter to point to the
following instruction.
• Decode the instruction.
• Fetch any memory operands needed into
CPU registers.
• Execute the instruction.
• Store output operand into a CPU register.
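The cycle above can be traced on a toy machine. This is a minimal sketch assuming a hypothetical accumulator machine whose opcodes (LOAD, ADD, STORE, HALT) and tuple encoding are invented for illustration; it is not the 8086 instruction set. Program and data share one memory, as in the stored-program model.

```python
def run(memory, pc=0):
    """Fetch-decode-execute loop for a hypothetical accumulator machine."""
    acc = 0  # the only data register (accumulator)
    while True:
        ir = memory[pc]           # fetch: copy the instruction into the IR
        pc += 1                   # point the PC at the following instruction
        opcode, address = ir      # decode
        if opcode == "HALT":
            return acc
        if opcode in ("LOAD", "ADD"):
            operand = memory[address]  # fetch the memory operand
            # execute, storing the output operand in a CPU register (acc)
            acc = operand if opcode == "LOAD" else acc + operand
        elif opcode == "STORE":
            memory[address] = acc      # copy the register back to memory
        else:
            raise ValueError(f"unknown opcode {opcode!r}")

memory = [
    ("LOAD", 5),   # 0: acc <- mem[5]
    ("ADD", 6),    # 1: acc <- acc + mem[6]
    ("STORE", 7),  # 2: mem[7] <- acc
    ("HALT", 0),   # 3: stop
    None,          # 4: unused
    30,            # 5: data
    12,            # 6: data
    0,             # 7: result
]
print(run(memory), memory[7])  # 42 42
```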
Execution of von Neumann Machines
Fetching the next instruction while the current one is executing speeds up the machine.
Instructions are stored in prefetch buffers (registers), so they can be accessed more quickly
than by waiting for a fetch from memory. Prefetching divides instruction execution
into two parts: fetching and actual execution.
Pipelining divides instruction execution into many parts, each one handled by a
piece of dedicated hardware, all of which can run in parallel.
• Execution Unit: executes the
instructions
• Bus Interface Unit: accesses memory
and provides I/O
A Five-stage Pipeline
[Figure: A five-stage pipeline, and the state of each stage as a function of time.]
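The stage-versus-time diagram can be generated with a short sketch, assuming an ideal pipeline with no stalls in which each instruction enters stage S1 one cycle after its predecessor and advances one stage per cycle.

```python
def pipeline_table(n_instructions, k_stages):
    """rows[stage][cycle] = 1-based number of the instruction occupying that
    stage in that cycle, or None if the stage is idle. Instruction i enters
    S1 in cycle i and is in stage s during cycle i + s (0-based)."""
    total_cycles = k_stages + n_instructions - 1
    rows = [[None] * total_cycles for _ in range(k_stages)]
    for i in range(n_instructions):
        for s in range(k_stages):
            rows[s][i + s] = i + 1
    return rows

def show(rows):
    """Print one line per stage; '.' marks an idle stage."""
    for s, row in enumerate(rows, start=1):
        cells = " ".join(f"{x:>2}" if x else " ." for x in row)
        print(f"S{s}: {cells}")

show(pipeline_table(5, 5))
```

Reading down any column gives the state of all five stages in that clock cycle; after the pipeline fills, one instruction leaves S5 every cycle.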
How Fast Does This Machine Run?
• Suppose that the cycle time of this machine
is 2 nsec.
• Then it takes 10 nsec for an instruction to
progress all the way through the five-stage
pipeline.
• Does the machine run at 100 MIPS (1/10 nsec)?
• No. At every clock cycle (2 nsec) a new
instruction is completed, so the actual rate
of processing is 500 MIPS (1/2 nsec).
How many cycles are required to
execute n instructions?
(Pipelined Versus Non-Pipelined Systems)
• For a system with k stages:
• In non-pipelined systems, n instructions require
(n*k) cycles to process.
– e.g., with k = 5, 5 instructions require (5*5) = 25 clock cycles
• Using a pipelined system with k pipeline stages,
n instructions require (k + (n-1)) cycles to
complete.
– e.g., with k = 5, 5 instructions require (5 + (5-1)) = 9 clock cycles
(*refer to the five-stage pipeline timing diagram)
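The two cycle-count formulas can be compared directly. This sketch simply encodes n*k and k + (n-1) for an ideal pipeline (no stalls) and prints the resulting speedup, which approaches k as n grows.

```python
def non_pipelined_cycles(n, k):
    """Each instruction passes through all k stages before the next starts."""
    return n * k

def pipelined_cycles(n, k):
    """The first instruction takes k cycles to fill the pipeline;
    after that, one instruction completes every cycle."""
    return k + (n - 1)

for n in (5, 100, 10_000):
    a, b = non_pipelined_cycles(n, 5), pipelined_cycles(n, 5)
    print(n, a, b, round(a / b, 2))  # speedup tends toward k = 5
```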
Pipelining allows a tradeoff between:
– Latency
• How long it takes to execute an instruction
• Latency = kT nanosec (where the cycle time is T nanosec and
the number of stages is k)
– Processor Bandwidth
• How many MIPS the CPU has
• Bandwidth = 1000/T MIPS
*logically we should measure CPU bandwidth in BIPS or GIPS since we are measuring T
in nanosec, but nobody does this.
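Plugging the earlier numbers (k = 5 stages, T = 2 nsec) into the two formulas reproduces both answers from the "How Fast Does This Machine Run?" example: a 10 nsec latency and 500 MIPS, not the naive 100 MIPS.

```python
def latency_ns(k_stages, cycle_time_ns):
    """Time for one instruction to traverse all k stages."""
    return k_stages * cycle_time_ns

def bandwidth_mips(cycle_time_ns):
    """One instruction completes per cycle: 1/T per ns = 1000/T million per second."""
    return 1000.0 / cycle_time_ns

print(latency_ns(5, 2.0), bandwidth_mips(2.0))  # 10.0 500.0
naive_mips = 1000.0 / latency_ns(5, 2.0)        # 100.0, ignores the overlap
```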
IA-32 Processor Pipelining
(6-stage Execution Cycle)
• Bus Interface Unit: accesses memory and provides
I/O
• Code Prefetch Unit: receives instructions from the
BIU and inserts them into a holding area (instruction
queue)
• Instruction Decode Unit: decodes machine
instructions from the prefetch queue and translates
them into microcode.
• Execution Unit: executes the microcode instructions.
• Segment Unit: translates logical addresses into
linear addresses and performs protection checks
• Paging Unit: translates linear addresses into
physical addresses, performs page protection
checks and keeps a list of recently accessed pages
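The Segment Unit and Paging Unit steps can be sketched as two functions. This is a deliberately simplified model: it assumes 4 KB pages and uses a single flat dictionary in place of IA-32's real two-level page directory and page tables. The segment base, limit, and page mapping are hypothetical values, and `ProtectionFault` is an invented stand-in for the processor's protection exceptions.

```python
PAGE_SIZE = 4096  # 4 KB pages

class ProtectionFault(Exception):
    """Hypothetical stand-in for the processor's protection exceptions."""

def segment_unit(base, limit, offset):
    """Logical address (segment base/limit + offset) -> linear address."""
    if offset > limit:  # protection check: offset must lie inside the segment
        raise ProtectionFault(f"offset {offset:#x} exceeds limit {limit:#x}")
    return base + offset

def paging_unit(page_table, linear):
    """Linear address -> physical address via a simplified one-level page table."""
    page, offset_in_page = divmod(linear, PAGE_SIZE)
    if page not in page_table:  # page presence/protection check
        raise ProtectionFault(f"page {page:#x} not mapped")
    return page_table[page] * PAGE_SIZE + offset_in_page

# Hypothetical segment (base 0x10000, limit 0xFFFF); page 0x11 maps to frame 0x2A
linear = segment_unit(0x10000, 0xFFFF, 0x1234)
physical = paging_unit({0x11: 0x2A}, linear)
print(hex(linear), hex(physical))  # 0x11234 0x2a234
```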
• If one pipeline is good, then two pipelines
must be better.
• Parallel paths exist through which different
instructions can be executed in parallel.
• It is possible to start the execution of
several instructions in every clock cycle.
• The logical correctness of programs must
be preserved.
Dual five-stage pipelines with a
common Code Prefetch Unit
The code prefetch unit fetches pairs of instructions together and puts each one into
its own pipeline, complete with its own ALU, for parallel operation.
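Preserving logical correctness means a pair may only be issued together when the second instruction does not depend on the first. The sketch below uses a conservative, simplified rule (no read-after-write or write-after-write conflict on a register) with greedy in-order issue. The `Instr` record and the rule itself are illustrative assumptions, not the actual Pentium pairing rules.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    text: str
    dest: str       # register written
    sources: tuple  # registers read

def can_pair(first, second):
    """Simplified check: the second instruction may enter the other pipeline
    in the same cycle only if it neither reads (RAW) nor overwrites (WAW)
    the first instruction's result."""
    return first.dest not in second.sources and first.dest != second.dest

def issue(program):
    """Greedy in-order dual issue: a list of cycles, each holding 1 or 2 instructions."""
    cycles, i = [], 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            cycles.append([program[i].text, program[i + 1].text])
            i += 2
        else:
            cycles.append([program[i].text])
            i += 1
    return cycles

prog = [
    Instr("add ax, bx", "ax", ("ax", "bx")),
    Instr("add cx, dx", "cx", ("cx", "dx")),  # independent: pairs with the first
    Instr("mov si, cx", "si", ("cx",)),
    Instr("add si, 4", "si", ("si",)),        # reads si just written: must wait
]
print(issue(prog))
```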
Superscalar processor with 5
functional units
Going to four pipelines would duplicate too much hardware.
Instead, use a single pipeline and give it
multiple functional units. This assumes that
the S3 stage can issue instructions faster than
the S4 stage can execute them. (Pentium II)
• So far we have dealt with instruction-level
parallelism.
• There is also processor-level parallelism
– Array processors
Complex Instruction Set Computer (CISC)
• A large number of variable-length
instructions (more than 128)
• Multiple addressing modes
• A small number of internal processor
registers
• Instructions that require varying numbers
of clock cycles to execute
(A Real CISC)
• Over 3000 different instruction forms, each
requiring anywhere from one to six bytes
• Nine different addressing modes are
supported
• The processor has only eight general-purpose
registers
• Instruction execution times range from 2
clock cycles to more than 80 cycles for the
ASCII Adjust for Multiplication (AAM) instruction.
Intel’s i860 RISC Processor
• 82 instructions, each 32 bits in length
• Four addressing modes
• 32 general purpose registers
• All instructions execute in one clock cycle
Why hasn’t RISC won out?
• Backward compatibility (companies have
spent billions of dollars on software for Intel
processors)
• Intel has built CPU cores with a RISC-like
structure that executes the simplest and
most common instructions in a single data
path, while interpreting the more complex
instructions in the usual CISC way.
Design Principles of Modern Computers
• All instructions are directly executed in
hardware
• Maximize the rate at which instructions are
issued
• Instructions should be easy to decode
• Only loads and stores should be able to
reference memory
• Provide plenty of registers
Application Specific Microprocessors
Digital Signal Processors
• Previously, analog signals had to be
handled with discrete circuits (op-amps,
capacitors, inductors, and resistors
forming filters, amplifiers, etc…)
• Now low-cost analog-to-digital and digital-
to-analog converters are available.
• This makes digital signal processing practical.
• DSPs are used to perform repetitive, complex
mathematical computations on the converted
digital signals.
• One computation may require as many as
500,000 add-multiply operations.
• Data and instructions are stored in two different
memory areas, each with its own bus (a Harvard architecture)
• Hardware multipliers and adders are built into
the processor and optimized to perform a
calculation in a single clock cycle.
• Arithmetic pipelining is used so that several
instructions can be operated on at once.
• Hardware DO loops are provided to speed up
repetitive computations
• Multiple (serial) I/O ports are provided for
communication with other processors.
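The repetitive add-multiply work described above is typified by a finite impulse response (FIR) filter, chosen here as an assumed example of a DSP workload. Every output sample is a sum of products, so the inner loop is exactly the multiply-accumulate step that a DSP's hardware multiplier and adder complete in one clock cycle. The 4-tap moving-average coefficients are hypothetical.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of h[k] * x[n - k].
    Each output needs len(coeffs) multiply-accumulate (MAC) operations."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * samples[n - k]  # one multiply-add
        out.append(acc)
    return out

# Hypothetical 4-tap moving-average filter
print(fir_filter([4, 8, 12, 16, 20], [0.25] * 4))  # [1.0, 3.0, 6.0, 10.0, 14.0]
```

A filter with many taps applied to a long signal quickly reaches hundreds of thousands of multiply-add operations, which is why DSPs pair dedicated MAC hardware with hardware loops.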
Applications of DSPs
• Multimedia sound cards (used to
compress speech and music signals)
• DSPs can be reprogrammed (allowing some
sound cards to double as a modem)
• Cellular phones
• Speech and image compression
• Optical character recognition
• Video conferencing
Operating System
• A collection of programs (a large program) that
are used to control the sharing of and interaction
among various computer units as they execute
application programs.
• Performs the tasks required to assign computer
resources to individual application programs.
– Assigning memory and magnetic disk space to
program and disk files
– Moving data between memory and disk units
– Handling I/O operations
Example of How an Operating System Manages
the execution of more than one application
program at the same time
• Application program has been compiled from a high level language
form into machine language form and is stored on disk
• Assume that somewhere in the program a data file must be read, some
computation performed on the data, and the results printed.
– The OS transfers the program file into memory
– When the transfer is complete, execution begins
– When the point in the program is reached where the data file is needed, the
program requests the operating system to transfer the data file
from the disk to memory.
• The OS performs this task and passes execution control back to the
application program, which then proceeds to perform the required
computation.
• When the computation is completed and the results are ready to be
printed, the program again calls on the operating system to perform the output.
Can Multitasking be used for concurrent
execution of application programs?