Processor Construction
ENEL211 Digital Technology
Processor Structures
• So far we have looked at all the fundamental
building blocks of digital electronic circuits.
• Now we are going to look at how to combine
them to build a CPU.
• First we need some insight into how data
moves around inside a processor.
• Design presented here based on WRAMP
microprocessor – very simple!
Lecture Outline
• Processor Operation
– Fetch decode execute cycle
• Datapath building blocks
– Registers (flip flops)
– ALU (combinational logic)
– Memory interface
– Buses
• Control logic (state machine)
• Processor Operations
– Execution of example instructions
Processor Structures
• Introduction
– The CPU must perform three main tasks:
• Communication with memory
– Fetching instructions
– Fetching and storing data
• Interpretation of instructions
• Execution of instructions
[Figure: Processor connected to Memory and I/O by the System Bus]
Von Neumann Architecture
• John von Neumann is credited with
– Stored Program concept
– Logical Organization, the “codes” by which a
fixed system of wiring could solve a great variety
of problems.
• In other words a general purpose computer controlled
by a computer program
– Implicit separation between CPU and storage
• Data and Instructions stored together
Fetch-Decode-Execute Cycle
• The CPU is endlessly looping through these
steps
– Actual steps will vary from processor to processor
• Typical steps
1. instruction fetch & Program Counter update
2. instruction decode & operand load
3. operation execution (control instructions update
Program Counter)
4. memory access
5. register update
Assembly Language
• Processor instructions are binary coded words
that include an operation and some operands.
• Operands may be register names or immediate
values
• Usually we use assembly language to represent
these as mnemonics.
– e.g. to add: add $1, $2, $3
– to load from memory lw $3, 16435
– to jump to another instruction j 7256
Program Counter
• Instructions are stored in sequences.
• Normally the processor executes one
instruction and then the next.
• It uses a special program counter register
to keep track of which instruction it is up to.
– It gets incremented by 1 for each instruction.
• Branch and Jump instructions perform
operations on the pc register.
Instruction Cycle
• Fetch the next instruction
• If everything is normal
– execute the instruction
– increment PC
• Otherwise, halt
• This is a simplification
– varied length instructions
– multiple memory accesses
– pipelining
• Which line could cause HALT = true?
//Load program
PC = startAddress;
HALT = BRANCH = false;
//Instruction cycle
IR = memory[PC];
while (!HALT){
    PC++;
    execute(IR);
    IR = memory[PC];
}
Instruction Cycle
Registers: PC = 1000, IR = j 1000, $3 = 0, $4 = 0, HALT = false
Memory:
1000 lw $3, 1005
1001 lw $4, 1006
1002 add $3, $3, $4
1003 sw $3, 1020
1004 j 984
1005 3
1006 21
1007 12
• Suppose the CPU just executed “j 1000”, the jump instruction. This changed the PC to 1000 and set the BRANCH flag.
Instruction Cycle
Registers: PC = 1000, IR = lw $3, 1005, $3 = 0, $4 = 0, HALT = false
Memory:
1000 lw $3, 1005
1001 lw $4, 1006
1002 add $3, $3, $4
1003 sw $3, 1005
1004 j 984
1005 3
1006 21
1007 12
• Now the instruction register contains the word of data starting at memory location 1000. The 32 bits represent a load word instruction.
Instruction Cycle
Registers: PC = 1001, IR = lw $3, 1005, $3 = 0, $4 = 0, HALT = false
Memory:
1000 lw $3, 1005
1001 lw $4, 1006
1002 add $3, $3, $4
1003 sw $3, 1005
1004 j 984
1005 3
1006 21
1007 12
• The program counter (PC) is incremented before an instruction is executed.
Instruction Cycle
Registers: PC = 1001, IR = lw $3, 1005, $3 = 3, $4 = 0, HALT = false
Memory:
1000 lw $3, 1005
1001 lw $4, 1006
1002 add $3, $3, $4
1003 sw $3, 1005
1004 j 984
1005 3
1006 21
1007 12
• The data stored in the word beginning at memory address 1005 is stored in register $3.
Instruction Cycle
Registers: PC = 1001, IR = lw $4, 1006, $3 = 3, $4 = 0, HALT = false
Memory:
1000 lw $3, 1005
1001 lw $4, 1006
1002 add $3, $3, $4
1003 sw $3, 1005
1004 j 984
1005 3
1006 21
1007 12
• The next instruction is fetched.
Instruction Cycle
Registers: PC = 1002, IR = lw $4, 1006, $3 = 3, $4 = 0, HALT = false
Memory:
1000 lw $3, 1005
1001 lw $4, 1006
1002 add $3, $3, $4
1003 sw $3, 1005
1004 j 984
1005 3
1006 21
1007 12
• The program counter (PC) is incremented before an instruction is executed.
Instruction Cycle
Registers: PC = 1002, IR = lw $4, 1006, $3 = 3, $4 = 21, HALT = false
Memory:
1000 lw $3, 1005
1001 lw $4, 1006
1002 add $3, $3, $4
1003 sw $3, 1020
1004 j 984
1005 3
1006 21
1007 12
• The contents of memory location 1006 are stored in register $4.
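The walkthrough can be reproduced with a small simulation. This is a minimal sketch rather than WRAMP itself: instructions are held as Python tuples instead of encoded 32-bit words, and the `run` helper and its `max_steps` limit are assumptions added so that the loop stops. The jump at address 1004 is written as `j 1000` (the instruction shown in the IR on the first walkthrough slide), and the store uses address 1020.

```python
# Minimal sketch of the fetch-execute loop from the slides.
# Assumption: instructions are stored as tuples, not encoded 32-bit words.

def run(memory, start_address, max_steps):
    """Run the instruction cycle for at most max_steps instructions."""
    regs = {3: 0, 4: 0}          # general purpose registers $3 and $4
    pc = start_address           # PC = startAddress
    ir = memory[pc]              # IR = memory[PC]
    for _ in range(max_steps):   # stands in for while (!HALT)
        pc += 1                  # PC++ happens before execute(IR)
        op = ir[0]
        if op == "lw":           # lw $rd, addr: load a word from memory
            _, rd, addr = ir
            regs[rd] = memory[addr]
        elif op == "sw":         # sw $rs, addr: store a word to memory
            _, rs, addr = ir
            memory[addr] = regs[rs]
        elif op == "add":        # add $rd, $rs, $rt
            _, rd, rs, rt = ir
            regs[rd] = regs[rs] + regs[rt]
        elif op == "j":          # j addr: overwrite the PC
            pc = ir[1]
        ir = memory[pc]          # fetch the next instruction
    return regs, pc


memory = {
    1000: ("lw", 3, 1005),
    1001: ("lw", 4, 1006),
    1002: ("add", 3, 3, 4),
    1003: ("sw", 3, 1020),
    1004: ("j", 1000),           # assumption: jump back to the start
    1005: 3,
    1006: 21,
    1007: 12,
}
regs, pc = run(memory, 1000, 4)
print(regs, pc, memory[1020])
```

After four instructions, $3 holds 24 (3 + 21), $4 holds 21, and the sum has been stored at address 1020.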
Processor Structures
– There are many possible ways of putting a CPU together. However, four main building blocks are used to construct a CPU:
• Registers
• ALUs
• Memory Interface
• Buses
– Together these are sometimes termed the Datapath
– These are controlled by the state machine
Processor Building Blocks
• Registers: local storage within the CPU
• ALU - Arithmetic Logic Unit: performs
arithmetic and logic operations
• Memory Interface: to load instructions and
data
• Buses: connect the other parts together
Registers
– Both general purpose registers and special purpose
registers are often constructed of D-type Flip-Flops,
one for each bit in the register.
[Figure: 32-bit register built from D-type flip-flops (D, Q, Clk, Enable), with a 32-bit Data In, a 32-bit Data Out, and control lines]
Registers (continued)
• In order to minimize connections and circuitry used
to move data from one place to another within the
processor, data paths are shared using a bus system.
• Signals are used to control what device is using the
data path at any given time; these are termed “control
lines”
• A tri-state output device is used, to prevent circuit
malfunction which would occur if two devices were
to drive the bus at the same time.
Registers (continued)
– General purpose registers contained within a register file
– Want to be able to output two operands and receive result at
the same time
[Figure: register file (R0-R15) with 32-bit outputs Aout and Bout, 32-bit input Cin, and select lines SELA, SELB and SELC]
NOTE: SELx are control lines
ALU
– Carries out arithmetic and logic operations as commanded by the control unit
– out = A func B
– All data paths 32 bits!
– ALUoe controls when output from ALU is placed on common data bus
Functions within WRAMP
• Arithmetic: add, addu, sub, subu, mult, multu, div, divu, rem, remu
• Bitwise & misc: sll, and, srl, or, sra, xor, lhi, inc
• Test/set: slt, sltu, sgt, sgtu, sle, sleu, sge, sgeu, seq, sequ, sne, sneu
[Figure: ALU with inputs A and B, a func select, and an output out gated by ALUoe]
An ALU (arithmetic logic unit)
• build an ALU to support the andi and ori
instructions
– we'll just build a 1 bit ALU, and use 32 of them
[Figure: 1-bit ALU with inputs a and b, an operation select, and output result; truth table with columns op, a, b, res]
• Possible Implementation (sum-of-products):
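As a rough behavioural sketch of the idea, a 1-bit AND/OR ALU can be modelled as two gates feeding a 2-to-1 multiplexor, and then replicated 32 times. The encoding of the operation line (0 for AND, 1 for OR) and the function names are assumptions made for illustration; the slide does not give them.

```python
def alu_1bit(op, a, b):
    """1-bit ALU. Assumed encoding: op = 0 selects AND, op = 1 selects OR."""
    and_out = a & b                      # AND gate
    or_out = a | b                       # OR gate
    return or_out if op else and_out     # 2-to-1 multiplexor


def alu_32bit(op, a, b):
    """Use 32 copies of the 1-bit ALU, one per bit position."""
    result = 0
    for i in range(32):
        bit = alu_1bit(op, (a >> i) & 1, (b >> i) & 1)
        result |= bit << i
    return result


print(bin(alu_32bit(0, 0b1100, 0b1010)))  # AND of the two operands
print(bin(alu_32bit(1, 0b1100, 0b1010)))  # OR of the two operands
```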
Different Implementations
• Not easy to decide the “best” way to build something
– Don't want too many inputs to a single gate
– Don't want to have to go through too many gates
– For our purposes, ease of comprehension is important
• Let's look at a 1-bit ALU for addition:
cout = a b + a cin + b cin
sum = a xor b xor cin
[Figure: 1-bit adder with inputs a, b and CarryIn, and outputs Sum and CarryOut]
Full adder circuit
[Figure: full adder circuit with inputs A, B and Cin, and outputs Sum and Cout]
Sum = Cin xor A xor B
Cout = A.B + A.Cin + B.Cin
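The two full adder equations can be checked exhaustively against ordinary addition. This is a small sketch; the function name `full_adder` is chosen here for illustration.

```python
from itertools import product


def full_adder(a, b, cin):
    """Return (sum, cout) for three 1-bit inputs."""
    s = a ^ b ^ cin                           # Sum = Cin xor A xor B
    cout = (a & b) | (a & cin) | (b & cin)    # Cout = A.B + A.Cin + B.Cin
    return s, cout


# Check all eight input combinations: the two output bits together
# must equal the arithmetic sum of the three input bits.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
```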
Building a 32 bit ALU
[Figure: 32-bit ALU built from 32 1-bit ALUs (ALU0 to ALU31) with shared Operation lines; inputs a0..a31 and b0..b31, outputs Result0..Result31; the CarryOut of each stage feeds the CarryIn of the next]
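Chaining the carry from one stage into the next can be sketched as a loop over bit positions. This is an illustrative model only (the names `full_adder` and `ripple_add32` are assumptions); in hardware all 32 stages exist side by side and the carry ripples through them.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, cout)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def ripple_add32(a, b, carry_in=0):
    """Add two 32-bit values one bit at a time, least significant bit first."""
    result = 0
    carry = carry_in
    for i in range(32):
        # The CarryOut of stage i becomes the CarryIn of stage i + 1.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry      # carry is the CarryOut of ALU31


print(ripple_add32(5, 7))
print(ripple_add32(0xFFFFFFFF, 1))
```

The second call wraps around to zero with a final CarryOut of 1, which shows why the number of stages in series sets the worst-case delay.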
What about subtraction (a – b) ?
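• Two's complement approach: just negate b and add.
• How do we negate?
• A very clever solution: a Binvert control line selects either b or its bitwise inverse as the adder input, and CarryIn supplies the extra 1 that completes the two's complement.
[Figure: 1-bit ALU with Binvert, CarryIn and Operation inputs; a multiplexor selects b or inverted b, and a second multiplexor (inputs 0, 1, 2) selects Result; CarryOut output]
[Figure: 32-bit ALU built from ALU0 to ALU31 with shared Binvert, CarryIn and Operation lines; each stage has a Less input (0 for ALU1 to ALU31); ALU31 also produces Set and Overflow]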
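The negate-and-add trick can be sketched behaviourally as follows. Two assumptions are made here: the CarryIn of bit 0 is set to 1 whenever Binvert is 1, and Python's `+` stands in for the chain of 1-bit adders. The name `add_sub32` is illustrative.

```python
MASK = 0xFFFFFFFF  # keep values to 32 bits


def add_sub32(a, b, binvert):
    """Compute a + b (binvert = 0) or a - b (binvert = 1) on 32-bit values."""
    if binvert:
        b = ~b & MASK            # invert every bit of b
    carry_in = binvert           # assumed: CarryIn of bit 0 tied to Binvert
    # a + (not b) + 1 equals a - b in two's complement.
    return ((a & MASK) + (b & MASK) + carry_in) & MASK


print(add_sub32(10, 3, 1))        # 10 - 3
print(hex(add_sub32(3, 10, 1)))   # 3 - 10, i.e. -7 as a 32-bit pattern
```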
Test for equality
• Notice control lines:
000 = and
001 = or
010 = add
110 = subtract
111 = slt
• Note: Zero is a 1 when the result is zero!
[Figure: 32-bit ALU built from ALU0 to ALU31 with Bnegate and Operation control lines, Less inputs, Set and Overflow from ALU31, and a Zero output]
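A behavioural sketch of the five control codes and the Zero output is given below. It models what the ALU computes, not the gate-level circuit; in particular, slt is treated as a signed comparison, which is an assumption, and the names `alu` and `to_signed` are illustrative.

```python
MASK = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Return (result, zero) for a 3-bit control code."""
    a &= MASK
    b &= MASK
    if control == 0b000:
        result = a & b                    # and
    elif control == 0b001:
        result = a | b                    # or
    elif control == 0b010:
        result = (a + b) & MASK           # add
    elif control == 0b110:
        result = (a - b) & MASK           # subtract
    elif control == 0b111:
        # slt: assumed signed comparison, result is 1 or 0
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError("unsupported control code")
    zero = 1 if result == 0 else 0        # Zero is 1 when the result is zero
    return result, zero


# Equality test: subtract and look at the Zero output.
print(alu(0b110, 5, 5))
```

Subtracting two equal operands gives a result of 0, so Zero is 1, which is how the subtract operation doubles as an equality test.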
ALU Conclusion
• We can build an ALU to support the WRAMP
instruction set
– key idea: use multiplexor to select the output we want
– can efficiently perform subtraction using two’s complement
– can replicate a 1-bit ALU to produce a 32-bit ALU
• Important points about hardware
– the speed of a circuit is affected by the number of gates in
series
(on the “critical path” or the “deepest level of logic”)
– Clever changes to organization can improve performance
(similar to using better algorithms in software)
Memory Interface
– Interacts with memory to fetch instructions and read or
write data
– Must have some way of initiating memory read and write cycles
– All data paths 32 bits!
[Figure: Memory connected to the Memory Interface (MI), with read and write control lines and Address, Data in and Data out paths]
Buses—General considerations
– Connect together the components of the processor.
Different numbers of buses can be used to form
different architectures
– Consider first: single bus structure
• only one data transfer can occur at a time
• need extra temporary registers (e.g. T1 & T2)
[Figure: single-bus datapath with Memory, MI, Reg File (R0-R31), T1, T2, PC, and ALU inputs A and B]
To be more specific, consider an
instruction fetch.
• In an instruction fetch, the contents of the program
counter (PC) are used to “point” to a location in memory,
where the next instruction is resident. Then, the contents
of that location are written to the instruction register (IR).
In a single bus system, there will need to be some buffer
register associated with memory, to hold the address,
while the data is fetched. So instruction fetch takes three
cycles:
– PC count onto bus, to be saved in the MAR (memory address
register)
– Data from the memory address to bus, copied into IR
– PC incremented to point at next instruction
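The three cycles can be sketched as three separate uses of the single bus. This is an illustration only: `fetch_single_bus` is a hypothetical helper, the bus is modelled as a plain variable, and memory is a Python dictionary.

```python
def fetch_single_bus(memory, pc):
    """Model an instruction fetch on a single-bus datapath.

    Returns (ir, new_pc, cycles_used).
    """
    # Cycle 1: PC contents go onto the bus and are saved in the MAR.
    bus = pc
    mar = bus

    # Cycle 2: the data at the address in the MAR goes onto the bus
    # and is copied into the IR.
    bus = memory[mar]
    ir = bus

    # Cycle 3: the PC is incremented to point at the next instruction.
    pc = pc + 1

    return ir, pc, 3


ir, pc, cycles = fetch_single_bus({1000: "lw $3, 1005"}, 1000)
print(ir, pc, cycles)
```

Because the bus carries only one value at a time, the address and the instruction cannot move in the same cycle, which is why the MAR is needed.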
Pictorial—Instruction fetch
[Figure: single-bus datapath during an instruction fetch, with Memory, MI, MAR, Reg File (R0-R31), T1, T2, PC, IR, and ALU inputs A and B]
Buses (continued)
• Two bus structure
– This structure is common for microcontrollers and
microprocessors, e.g. 8051. Some data transfers
require several steps and temp registers.
[Figure: two-bus datapath with a Data Bus and an Address Bus connecting Memory, MI, ALU (inputs A and B, output C), T1, T2, Reg File (R0-R31) and PC]
Buses (continued)
• Three bus structure, as used on WRAMP
– Three bus transfers can take place at same time
– Two operand buses and a result bus or an address bus plus
data bus for memory transfers.
[Figure: three-bus datapath with A bus, B bus and C bus connecting Memory, MI, TEMP, ALU (inputs A and B, output C), Reg File (R0-R15), PC and IR]
Some details we leave out of the
drawings, for clarity
• Actually, the routes to/from each element are controlled with
switches, but are not shown. Nevertheless, there is a switch that
controls each dataflow element. For instance, temp_out controls
when the temp register drives the A bus.
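[Figure: three-bus datapath (A bus, B bus, C bus) with Memory, MI, TEMP, ALU, Reg File (R0-R15), PC and IR; the temp_out switch sits between the TEMP register and the A bus]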
Processor Design
• Firstly construct a datapath which will allow
execution of all the instructions
– This is a matter of drawing block diagrams
• Decide on instruction encodings
• Design control logic to perform the operations
– Draw state machines
Control Logic
• The control logic is a state machine that goes
through the fetch, decode, execute states.
• “execute” has many sub states that the
machine may go to depending on the contents
of the instruction.
• The outputs of the state machine are the
signals that control the datapath components.
Instruction Encodings
I-Type instruction
OPcode (4 bits) | Rd (4 bits) | Rs (4 bits) | Func (4 bits) | Immediate (16 bits)
R-Type instruction
OPcode (4 bits) | Rd (4 bits) | Rs (4 bits) | Func (4 bits) | 0000 0000 0000 (12 bits) | Rt (4 bits)
J-Type instruction
OPcode (4 bits) | Rd (4 bits) | Rs (4 bits) | Address/Offset (20 bits)
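Splitting a 32-bit instruction word into these fields is a matter of shifting and masking. The sketch below assumes the fields are laid out from the most significant bit in the order listed; the `decode` helper and the example word are illustrative and do not correspond to any particular WRAMP instruction.

```python
def decode(word):
    """Extract every field that the three formats define from a 32-bit word.

    Assumption: OPcode occupies the top 4 bits, followed by Rd, Rs and
    the remaining fields in the order shown in the encodings.
    """
    return {
        "opcode": (word >> 28) & 0xF,    # bits 31..28
        "rd": (word >> 24) & 0xF,        # bits 27..24
        "rs": (word >> 20) & 0xF,        # bits 23..20
        "func": (word >> 16) & 0xF,      # bits 19..16 (I-Type and R-Type)
        "imm16": word & 0xFFFF,          # bits 15..0  (I-Type)
        "rt": word & 0xF,                # bits 3..0   (R-Type)
        "addr20": word & 0xFFFFF,        # bits 19..0  (J-Type)
    }


fields = decode(0x1234ABCD)
print({name: hex(value) for name, value in fields.items()})
```

Which of the overlapping low-order fields is meaningful depends on the instruction type, which the control logic determines from the OPcode.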
Example Control
[Figure: control state diagram with states Insn Fetch, PC Incr, ALU op, Save $ra and Do Jump; the transitions out of PC Incr are selected by the OPcode (“0000”, “0110”, “0100”)]
Control Signals
• We need to decide which control signals should be set in each state, e.g.
– Insn Fetch
• pc_out, ir_in, mem_read
– PC Incr
• pc_out, alu_out, alu_func = inc, pc_in
– Save $ra
• pc_out, b_out, sel_b = $0, alu_out, alu_func = add, c_in, sel_c = $ra
– Do jump
• imm_20_out, a_out, sel_a = $0, alu_out, alu_func = add, pc_in
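One way to picture the output side of the state machine is as a lookup table from state to the set of asserted signals. The sketch below copies the four states listed above; the dictionary layout, the lower-case state names and the `states_asserting` helper are assumptions for illustration.

```python
# Signals asserted in each state, as listed on the slide.
CONTROL_SIGNALS = {
    "insn_fetch": {"pc_out", "ir_in", "mem_read"},
    "pc_incr": {"pc_out", "alu_out", "alu_func=inc", "pc_in"},
    "save_ra": {"pc_out", "b_out", "sel_b=$0", "alu_out",
                "alu_func=add", "c_in", "sel_c=$ra"},
    "do_jump": {"imm_20_out", "a_out", "sel_a=$0", "alu_out",
                "alu_func=add", "pc_in"},
}


def states_asserting(signal):
    """Return the states (sorted by name) in which a signal is asserted."""
    return sorted(state for state, signals in CONTROL_SIGNALS.items()
                  if signal in signals)


# Which states write a new value into the PC?
print(states_asserting("pc_in"))
```

Only PC Incr and Do Jump assert pc_in, which matches the idea that the PC changes either by incrementing or by a jump.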
Descriptions of each of the control signals
Component | Signal Name | Description
Register File | a_out | Causes the contents of the register selected by sel_a to be output onto the A bus.
Register File | sel_a | Selects which register will be output onto the A bus if a_out is asserted.
Register File | b_out | Causes the contents of the register selected by sel_b to be output onto the B bus.
Register File | sel_b | Selects which register will be output onto the B bus if b_out is asserted.
Register File | c_in | Causes the value from the C bus to be written into the register selected by sel_c.
Register File | sel_c | Selects which register to write the value from the C bus into when the c_in signal is asserted.
ALU | alu_out | Causes the result of the current ALU function selected by alu_func to be output to the C bus.
ALU | alu_func | Defines the current operation that the ALU should perform.
Memory Interface | mem_read | Causes the contents of the memory address specified on the A bus to be read and output onto the C bus.
Memory Interface | mem_write | Causes the value on the B bus to be written into the memory address specified on the A bus.
Signal descriptions (continued)
Component | Signal Name | Description
Program Counter | pc_out | Causes the contents of the PC register to be output onto the A bus.
Program Counter | pc_in | Causes the value on the C bus to be written into the PC.
Instruction Register | imm_16_out | Causes the least significant 16 bits of the IR to be output onto the B bus.
Instruction Register | imm_20_out | Causes the least significant 20 bits of the IR to be output onto the B bus.
Instruction Register | sign_extend | Causes the output from the IR to be sign extended to 32 bits.
Instruction Register | ir_in | Causes the value on the C bus to be written into the IR.
Temp Register | temp_out | Causes the contents of the temporary register to be output onto the A bus.
Temp Register | temp_in | Causes the value on the C bus to be written into the temporary register.
Sample processor operations
• Following are datapath drawings, showing
which paths are used for which items of data,
for common operations.
• Colored lines depict main flow; black lines
depict connections which are inactive.
An instruction fetch
• For an instruction fetch, the contents of the program
counter (PC) must be sent to the memory, and the
contents of the selected memory location loaded into the
instruction register (IR).
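[Figure: three-bus datapath (A bus, B bus, C bus) with Memory, MI, ALU, Reg File (R0-R15), PC and IR; the Address path runs from the A bus to memory and the Data path returns on the C bus]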
An Addition
• An addition takes the contents of two registers, adds them, and places the result in another register, e.g. add $3, $4, $5
• And so, control signals route $4 and $5 to the ALU inputs, and the result from the ALU into $3.
[Figure: datapath with Memory, MI, ALU, PC and registers $3, $4 and $5; $4 and $5 feed ALU inputs A and B, and the ALU output C is written to $3]
A memory operation
• In WRAMP, a memory fetch requires two steps
– compute the effective address
– Apply the address to memory and perform the indicated
operation (read or write)
– Step one: compute effective address
[Figure: three-bus datapath (A bus, B bus, C bus) with Memory, MI, TEMP, ALU, Reg File (R0-R15), PC and IR, highlighting whichever register is used as base]
Memory operation (continued)
• Then, the effective address is applied to A bus and
through the memory interface to the Memory. The data at
that address in memory is placed on the C bus, which is
then routed to the destination register (e.g. $4)
[Figure: three-bus datapath (A bus, B bus, C bus) with Memory, MI, TEMP, ALU, PC, IR and Reg File (R0-R15), with $4 highlighted as the destination register]
Operations… a comparison
• In a comparison, the ALU is used to determine if the
specified condition exists… e.g. sgt Rd, Rs, Rt .
• Rs is applied to input A, Rt is applied to input B and the
output from the ALU is applied to Rd.
[Figure: three-bus datapath (A bus, B bus, C bus) for sgt $4, $5, $6, with Memory, MI, TEMP, ALU (inputs A and B), PC, IR and Reg File (R0-R15), with $4, $5 and $6 highlighted]
Operations…conditional branch
• In a conditional branch, you must test the condition and, if the condition is met, change the contents of the PC from where it is now pointing (the next instruction in sequence) to the branch target (specified by the lower 20 bits of the instruction). E.g. beqz $1, loop
• You must examine the result, and then do the second part…
[Figure: three-bus datapath (A bus, B bus, C bus) with Memory, MI, TEMP, ALU (inputs A and B, output C), Reg File (R0-R15), PC and IR, showing the condition test]
And then the second part, depending upon the result of part one: assume branch taken…
Switches to set: pc_out, imm_20_out, sign_extend*, alu_out, alu_func = add, pc_in.
* = does not matter
[Figure: three-bus datapath (A bus, B bus, C bus) with Memory, MI, TEMP, ALU, Reg File (R0-R15), PC and IR, showing the PC update]
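The two parts of a taken or untaken branch can be sketched as a test followed by an optional PC update. This sketch assumes addresses are 20 bits wide, so the sum is truncated to 20 bits; under that assumption the upper bits produced by sign extension would be discarded anyway. The `beqz` helper and `ADDR_MASK` are illustrative names.

```python
ADDR_MASK = 0xFFFFF   # assumption: 20-bit addresses


def beqz(pc, reg_value, offset20):
    """Return the next PC for 'branch if equal to zero'.

    pc is the already incremented PC, pointing at the next
    instruction in sequence.
    """
    # Part one: test the condition.
    if reg_value == 0:
        # Part two: pc_out and imm_20_out feed the ALU, alu_func = add,
        # and the result is written back with pc_in.
        return (pc + (offset20 & ADDR_MASK)) & ADDR_MASK
    # Branch not taken: the PC keeps pointing at the next instruction.
    return pc


# A 20-bit offset of 0xFFFFD acts as -3 once the sum is truncated.
print(beqz(1003, 0, 0xFFFFD))
print(beqz(1003, 7, 0xFFFFD))
```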
Processor Structures Summary
• Most general purpose computers use a Von Neumann architecture, where instructions and data are stored together in memory separate from the processor.
• The processor works on an instruction fetch-
decode-execute cycle.
• The Program Counter keeps track of which
instruction to load next.
Processor Structures - Summary 2
• CPUs are made up of Registers, ALUs,
Memory Interface and Buses.
• As well as the general purpose registers, there are temporary registers, and the PC is itself a register.
• The Instructions drive control lines to make
the components perform the desired operation.