CPU Architecture
CPU architectures can differ in the following ways:
Number of operands per instruction
Location of the operands
Instruction types
Data types
5.2.1 Design Decisions for Instruction Sets
Things to consider when designing an instruction set:
How long should the instructions be?
What should the format of an instruction be?
What addressing modes should be implemented?
How should the data be stored (high-order byte first, or low-order byte first)?
How many registers should there be and what should they be used for?
Byte order
The bytes in a data item can be arranged in two ways. If we have the address of a multi-
byte data item, the item begins at the address and occupies bytes at increasing address
numbers. So if we have a 4-byte integer whose address is 100, it occupies bytes 100, 101,
102, and 103.
The designers of the CPU need to decide whether to put the high-order byte at the first
address location or to put the low-order byte there.
Example
The integer 12345678 (in hex) can be stored like this (cells 100, 101, 102, and 103,
respectively):
12 34 56 78
or like this:
78 56 34 12
The first method is called big endian because the "big" (high-order) digits are first in
memory. The second method is called little endian because the "little" (low-order) digits
are first in memory.
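The two layouts can be reproduced in a few lines of Python. This is a minimal sketch that
uses the built-in int.to_bytes method; the variable names are illustrative only.

```python
import sys

value = 0x12345678

# Lay the same 4-byte integer out in both byte orders.
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")

print(big.hex(" "))      # 12 34 56 78
print(little.hex(" "))   # 78 56 34 12

# Byte order of the machine running this script: "little" or "big".
print(sys.byteorder)
```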
Byte order is largely irrelevant to the high-level language programmer, because the
compiler hides low-level details like this from you. It matters mainly when you read binary
data (a data file or a network message) on one machine that was written on a machine with a
different byte order.
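Intel x86 CPUs are little-endian. Network protocols such as TCP/IP send multi-byte values,
including addresses, in big-endian order, often called network byte order (reportedly
because it made routing easier, as it did for phone calls).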
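The data-file problem can be sketched with Python's standard struct module, where ">"
selects big-endian and "<" selects little-endian. The four bytes here stand in for data
written by another machine; the variable names are assumptions made for the example.

```python
import struct

# Four bytes as a big-endian machine (or a network protocol) would write them.
data = struct.pack(">I", 0x12345678)

# Reading with the writer's byte order recovers the value.
correct = struct.unpack(">I", data)[0]

# Reading the same bytes as little-endian silently gives a different number.
wrong = struct.unpack("<I", data)[0]

print(hex(correct))  # 0x12345678
print(hex(wrong))    # 0x78563412
```

No error is raised in the second case, which is why the byte order of a binary file
format has to be agreed on in advance.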
5.2.3 Internal storage in the CPU: Stacks versus Registers
CPU designs differ in where they hold operands inside the CPU. There are three basic choices:
A stack architecture
An accumulator architecture
A general-purpose register architecture
Stack architecture
All operands are stored on a stack. Because an instruction can reach only the top elements
of the stack, it is difficult to write programs for such a computer. Arithmetic instructions
are short because only the op code is needed: the operands are always on the top of the
stack. (Instructions that push data onto the stack or pop it off still have to name the data.)
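A toy simulation may make this concrete. The sketch below is not a real instruction set:
the opcode names PUSH, ADD, SUB, and MUL and the function run_stack_machine are
hypothetical, chosen only to show that the arithmetic instructions carry no operands.

```python
def run_stack_machine(program):
    """Run a list of instructions on a toy stack machine; return the top of the stack."""
    stack = []
    for instruction in program:
        opcode = instruction[0]
        if opcode == "PUSH":
            # PUSH is the only instruction here that names its data.
            stack.append(instruction[1])
        else:
            # Arithmetic instructions are just an op code: operands come off the stack.
            right = stack.pop()
            left = stack.pop()
            if opcode == "ADD":
                stack.append(left + right)
            elif opcode == "SUB":
                stack.append(left - right)
            elif opcode == "MUL":
                stack.append(left * right)
            else:
                raise ValueError("unknown opcode: " + opcode)
    return stack[-1]


# (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
print(run_stack_machine(program))  # 20
```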
While modern mainstream CPUs are not usually stack-based, the stack is a very common
data structure that is used in other contexts, so we will take a look at how a stack can be
used to evaluate an arithmetic expression.
One of the best (and most useful) applications of stacks is in evaluating postfix
expressions.
There are three ways to write an arithmetic expression:
INFIX (the way we usually do it)
The operator is between the operands: A + B
PREFIX (the way we call functions!)
The operator is before the operands: +AB
POSTFIX
The operator is after the operands: AB+
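One way to see that the three notations describe the same expression is to store the
expression as a tree and write it out in three different orders. The tuple layout
(operator, left, right) used here is an assumption made for the sketch, not something the
notes prescribe.

```python
def prefix(node):
    """Operator first, then the two operands."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return op + prefix(left) + prefix(right)


def infix(node):
    """Operator between the operands; parentheses keep the grouping explicit."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return "(" + infix(left) + op + infix(right) + ")"


def postfix(node):
    """Both operands first, then the operator."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return postfix(left) + postfix(right) + op


tree = ("*", ("+", "A", "B"), "C")
print(prefix(tree))   # *+ABC
print(infix(tree))    # ((A+B)*C)
print(postfix(tree))  # AB+C*
```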
Rules for conversion from infix to postfix:
1. Convert operations with the highest precedence first
2. After an expression has been converted, treat it as a single operand
3. Remove all parentheses when done
Examples:
1:
A + B - C
(AB+) - C
AB+C-
2:
(A+B)*(C-D)^E*F
(A+B)*(C-D)E^*F
(AB+)*(CD-)E^*F
(AB+)(CD-)E^*F*
AB+CD-E^*F*
3:
A+B*C
A+(BC*)
A(BC*)+
ABC*+
4:
(A+B)*C
(AB+)*C
AB+C*
Notice from the last two examples that even though both have the same three operands and
the same two operators, all in the same order, the postfix forms need no parentheses,
because each infix expression converts to a different postfix expression. Note that the
operator that is applied first appears first (leftmost) in the postfix expression.
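The hand rules above are convenient on paper. A program usually does the conversion in a
single left-to-right pass instead; the sketch below uses the shunting-yard algorithm, which
is a different technique from the hand rules but gives the same results on the four
examples. It assumes single-character operands, the operators + - * / ^, and that ^ is
right-associative; the names to_postfix, PRECEDENCE, and RIGHT_ASSOC are illustrative.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOC = {"^"}  # assumption: exponentiation groups right to left


def to_postfix(infix):
    """Convert an infix string with single-character operands to postfix."""
    output = []
    operators = []  # stack of pending operators and open parentheses
    for ch in infix.replace(" ", ""):
        if ch.isalnum():
            output.append(ch)
        elif ch == "(":
            operators.append(ch)
        elif ch == ")":
            # Emit everything back to the matching open parenthesis.
            while operators[-1] != "(":
                output.append(operators.pop())
            operators.pop()
        else:
            # Emit pending operators that must be applied before this one.
            while operators and operators[-1] != "(":
                top = operators[-1]
                higher = PRECEDENCE[top] > PRECEDENCE[ch]
                equal_left = PRECEDENCE[top] == PRECEDENCE[ch] and ch not in RIGHT_ASSOC
                if higher or equal_left:
                    output.append(operators.pop())
                else:
                    break
            operators.append(ch)
    while operators:
        output.append(operators.pop())
    return "".join(output)


print(to_postfix("A+B-C"))            # AB+C-
print(to_postfix("(A+B)*(C-D)^E*F"))  # AB+CD-E^*F*
print(to_postfix("A+B*C"))            # ABC*+
print(to_postfix("(A+B)*C"))          # AB+C*
```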
So now we know how to convert an infix expression to a postfix expression. The point of
writing an arithmetic expression, however, is to evaluate it, and evaluating a postfix
expression involves the use of a stack.
Procedure to evaluate a postfix expression:
1. Scan the expression from left to right, one token at a time.
2. If the token is an operand (number), push it onto the stack.
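3. If the token is an operator, perform the operation on the top two elements of the
stack, and replace those two elements with the result. The top element is the right-hand
operand and the element below it is the left-hand operand.
4. When the whole expression has been scanned, the single value left on the stack is the
result.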
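The procedure above can be sketched in Python as follows. The sketch assumes tokens are
separated by spaces and that operands are integers; the names evaluate_postfix and
OPERATIONS are illustrative. Note that the first value popped is the right-hand operand.

```python
import operator

OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}


def evaluate_postfix(expression):
    """Evaluate a space-separated postfix expression with integer operands."""
    stack = []
    for token in expression.split():
        if token in OPERATIONS:
            right = stack.pop()   # top of the stack is the right-hand operand
            left = stack.pop()
            stack.append(OPERATIONS[token](left, right))
        else:
            stack.append(int(token))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


print(evaluate_postfix("2 1 + 3 * 8 4 - - 2 1 + ^"))  # 125
```

Because "/" is mapped to true division here, expressions that contain a division produce
a floating-point result.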
Example
((2+1)*3-(8-4))^(2+1)
Postfix:
2 1 + 3 * 8 4 - - 2 1 + ^ = 125
Example
4 - 8 / (1*2^2)
Postfix:
4 8 1 2 2 ^ * / - = 2
Example
6 2 3 + - 3 8 2 / + * 2 ^ 3 + = 52
1. Push 6, 2, 3. Stack is 3, 2, 6 (the top of the stack is listed first)
2. Add 2 + 3 and push 5. Stack is 5, 6
3. Subtract 6 - 5 and push 1. Stack is 1
4. Push 3, 8, 2. Stack is 2, 8, 3, 1
5. Divide 8 / 2 and push 4. Stack is 4, 3, 1
6. Add 3 + 4 and push 7. Stack is 7, 1
7. Multiply 1 * 7 and push 7. Stack is 7
8. Push 2. Stack is 2, 7
9. Raise 7 to the power 2 and push 49. Stack is 49
10. Push 3. Stack is 3, 49
11. Add 49 + 3 and push 52. Stack is 52
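The walk-through above can be reproduced by recording the stack after every token. This
is a minimal sketch: trace_postfix is a hypothetical helper, the stack is reported with
its top element first to match the listing above, and "/" is true division, so the
quotient shows up as a floating-point value.

```python
import operator

OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}


def trace_postfix(expression):
    """Return a list of (token, stack) pairs, with each stack listed top first."""
    stack = []
    snapshots = []
    for token in expression.split():
        if token in OPERATIONS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATIONS[token](left, right))
        else:
            stack.append(int(token))
        snapshots.append((token, stack[::-1]))  # reversed copy: top element first
    return snapshots


for token, stack in trace_postfix("6 2 3 + - 3 8 2 / + * 2 ^ 3 +"):
    print(token, stack)
```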
5.5 INSTRUCTION PIPELINING
The design of a von Neumann computer involves repetition of the fetch-execute cycle:
1. Fetch instruction
2. Decode instruction
3. Fetch operands
4. Carry out instruction
5. Store results
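The five steps can be sketched as a loop. Everything about the machine below is
hypothetical: the (opcode, destination, source1, source2) instruction format, the ADD and
MUL opcodes, and the use of a dictionary as memory are assumptions made for the example.

```python
def run(program, memory):
    """Execute a toy program by repeating the fetch-execute cycle."""
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        instruction = program[pc]                # 1. fetch instruction
        pc += 1
        opcode, dest, src1, src2 = instruction   # 2. decode instruction
        a, b = memory[src1], memory[src2]        # 3. fetch operands
        if opcode == "ADD":                      # 4. carry out instruction
            result = a + b
        elif opcode == "MUL":
            result = a * b
        else:
            raise ValueError("unknown opcode: " + opcode)
        memory[dest] = result                    # 5. store results
    return memory


memory = {"x": 2, "y": 3, "z": 0, "w": 0}
program = [("ADD", "z", "x", "y"), ("MUL", "w", "z", "z")]
print(run(program, memory)["w"])  # 25
```

In this loop each instruction goes through all five steps before the next one is
fetched.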
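A lesson in computer design can be learned from the way people do laundry. Assume that it
takes 30 minutes to wash a load, 40 minutes to dry a load, and 20 minutes to fold a load. If
we did our laundry the way early computers executed instructions, we would wash, dry, and
fold each load completely before starting the next one, so every load would take 90 minutes.
If we instead start washing the next load as soon as the washer is free, the washer, the
dryer, and the person folding all work on different loads at the same time.
This is called pipelining.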
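The laundry timings can be checked with a short simulation. This is a sketch under stated
assumptions: a load waits in a basket if the next machine is busy, each stage handles one
load at a time, and the function names are illustrative.

```python
STAGE_MINUTES = (30, 40, 20)  # wash, dry, fold


def sequential_minutes(loads, stage_minutes=STAGE_MINUTES):
    """Finish each load completely before starting the next one."""
    return loads * sum(stage_minutes)


def pipelined_minutes(loads, stage_minutes=STAGE_MINUTES):
    """Start each stage as soon as the load is ready and the stage is free."""
    stage_free_at = [0] * len(stage_minutes)
    finish = 0
    for _ in range(loads):
        ready = 0  # time at which this load is ready for its next stage
        for s, duration in enumerate(stage_minutes):
            start = max(ready, stage_free_at[s])
            ready = start + duration
            stage_free_at[s] = ready
        finish = ready
    return finish


print(sequential_minutes(4))  # 360
print(pipelined_minutes(4))   # 210
```

After the first load, a finished load comes out every 40 minutes, because the dryer is
the slowest stage.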
Pipelining is a technique used to speed up today's computer systems. It should be called
"assembly lining" because it works just like an automobile assembly line. If you compare
the building of a single automobile to the execution of a single instruction, a non-pipelined
computer works like an assembly line where no work is done on the second automobile until
the first automobile is completely finished. No work is done on a second instruction until the
first instruction is completely finished. With an assembly line, however, as soon as the first
phase of the automobile is completed, it is passed to the next phase, and the worker who
did the first phase can begin working on the first phase of the next automobile. Similarly, in
a computer, when the first phase of an instruction is completed, the results can be passed
on to circuitry that can carry out the second phase, freeing up the first phase circuitry to
begin working on the next instruction in the pipeline. The more phases you can divide an
instruction into, the more instructions you can have in the "pipeline" at once, and the
more work can be done in parallel. In the ideal case this results in a faster computer.
If each phase of the fetch-execute cycle takes a single tick of the clock and the phases
are pipelined, then once the pipeline is full the computer completes one instruction per
clock cycle. Even if the execution of an instruction can be divided into only 2 parts,
pipelining those parts can, at best, nearly double the speed of your CPU!
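The arithmetic behind this claim can be sketched as follows, assuming an ideal pipeline in
which every phase takes exactly one clock cycle and nothing ever stalls. The function
names are illustrative.

```python
def unpipelined_cycles(instructions, stages):
    """Each instruction passes through every stage before the next one starts."""
    return instructions * stages


def pipelined_cycles(instructions, stages):
    """The first instruction takes `stages` cycles; each later one finishes a cycle after."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


def speedup(instructions, stages):
    return unpipelined_cycles(instructions, stages) / pipelined_cycles(instructions, stages)


print(unpipelined_cycles(100, 5))  # 500
print(pipelined_cycles(100, 5))    # 104
print(speedup(1000, 2))            # just under 2
```

As the number of instructions grows, the speedup approaches the number of stages but
never quite reaches it, because the pipeline first has to fill.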
Problems with pipelining
Data dependencies. If instruction b follows instruction a in the pipeline and instruction b
needs the result that instruction a is still computing, instruction b will be stalled
until instruction a is completed.
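A dependency of this kind can be spotted by comparing the register an instruction writes
with the registers the next instruction reads. The sketch below is illustrative only: the
(opcode, destination, sources) tuple format, the register names, and find_dependencies
are all assumptions, and only adjacent instructions are compared.

```python
def find_dependencies(instructions):
    """Return (earlier, later) index pairs where the later instruction reads
    the register written by the instruction immediately before it."""
    pairs = []
    for i in range(1, len(instructions)):
        _, dest, _ = instructions[i - 1]
        _, _, sources = instructions[i]
        if dest in sources:
            pairs.append((i - 1, i))
    return pairs


program = [
    ("ADD", "r1", ("r2", "r3")),  # a: r1 = r2 + r3
    ("MUL", "r4", ("r1", "r5")),  # b: needs r1, which a is still computing
    ("SUB", "r6", ("r7", "r8")),  # independent of a and b
]
print(find_dependencies(program))  # [(0, 1)]
```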
Branching. Conditional branches are also a problem. If we have instructions a, b, c, d,
and e, and c is a conditional branch instruction that branches past instructions d and e,
then whenever the branch is taken we have executed parts of d and e for nothing (and need
to be sure to undo any changes they may have made to registers and data, because they were
never supposed to have been executed at all!).