Computer Architecture
Princess Sumaya University for Technology
Dr. Esam Al_Qaralleh
Instruction Set Architecture (ISA)
Outline
Introduction
Classifying instruction set architectures
Instruction set measurements
Memory addressing
Addressing modes for signal processing
Type and size of operands
Operations in the instruction set
Operations for media and signal processing
Instructions for control flow
Encoding an instruction set
MIPS architecture
Instruction Set Principles and Examples
Basic Issues in Instruction Set Design
What operations, and how many?
Load/store/increment/branch are sufficient for any
computation, but not practical (programs become too long).
How are operands specified, and how many?
Most operations are dyadic (e.g., A <-- B + C); some are
monadic (e.g., A <-- ~B).
How are they encoded into an instruction format?
Instruction lengths should be a multiple of bytes.
Typical Instruction Set
32-bit word
Basic operand addresses are 32 bits long.
Basic operands (such as integers) are 32 bits long.
In general, an instruction may refer to three operands (A <-- B + C).
Challenge: encode operations in a small number of
bits.
Brief Introduction to ISA
Instruction Set Architecture: a set of instructions
Each instruction is directly executed by the CPU’s hardware
How is it represented?
By a binary format since the hardware understands only bits
Field:         opcode | rs | rt | immediate
Width (bits):     6   | 5  | 5  |    16
Options - fixed or variable length formats
Fixed - each instruction encoded in same size field (typically 1
word)
Variable – half-word, whole-word, multiple word instructions are
possible
What Must be Specified?
Instruction Format (encoding)
How is it decoded?
Location of operands and result
Where other than memory?
How many explicit operands?
How are memory operands located?
Data type and Size
Operations
What are supported?
Example of Program Execution
Commands (opcodes):
1: Load AC from memory
2: Store AC to memory
5: Add to AC from memory
Task: add the contents of memory location 940 to the contents of
memory location 941 and store the result at 941
[Figure: fetch and execute steps for each instruction]
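The fetch/execute example above can be sketched as a tiny accumulator-machine simulator. This is only an illustration under stated assumptions: the 16-bit instruction word with a 4-bit opcode and 12-bit address, the hexadecimal reading of addresses 940 and 941, the program location 0x300, and the sample data values 3 and 2 are all hypothetical and not taken from the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Opcodes from the slide: 1 = load AC, 2 = store AC, 5 = add to AC. */
enum { OP_LOAD = 1, OP_STORE = 2, OP_ADD = 5 };

#define MEM_WORDS 4096  /* assumed: 12-bit address space */

/* Run `steps` fetch/execute cycles starting at `pc`; returns the final AC. */
static uint16_t run_accumulator(uint16_t mem[MEM_WORDS], uint16_t pc, int steps)
{
    uint16_t ac = 0;
    for (int i = 0; i < steps; i++) {
        assert(pc < MEM_WORDS);
        uint16_t ir = mem[pc++];        /* fetch */
        uint16_t opcode = ir >> 12;     /* assumed: top 4 bits */
        uint16_t addr = ir & 0x0FFF;    /* assumed: low 12 bits */
        switch (opcode) {               /* execute */
        case OP_LOAD:  ac = mem[addr]; break;
        case OP_STORE: mem[addr] = ac; break;
        case OP_ADD:   ac = (uint16_t)(ac + mem[addr]); break;
        default:       break;           /* other opcodes ignored in this sketch */
        }
    }
    return ac;
}

/* Hypothetical demo: add mem[0x940] to mem[0x941], store the result at 0x941. */
static uint16_t demo_add_940_941(void)
{
    static uint16_t mem[MEM_WORDS];
    mem[0x300] = 0x1940;  /* Load AC from 940 */
    mem[0x301] = 0x5941;  /* Add 941 to AC */
    mem[0x302] = 0x2941;  /* Store AC to 941 */
    mem[0x940] = 3;       /* assumed sample data */
    mem[0x941] = 2;
    run_accumulator(mem, 0x300, 3);
    return mem[0x941];
}
```

Calling `demo_add_940_941()` runs three fetch/execute cycles and returns the value left at location 941.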
Classifying Instruction Set Architecture
Instruction Set Design
The instruction set influences everything
Instruction Characteristics
Usually a simple operation
Which operation is identified by the op-code field
But operations require operands - 0, 1, or 2
To identify where they are, they must be addressed
• Address is to some piece of storage
• Typical storage possibilities are main memory, registers, or a stack
Two options: explicit or implicit addressing
Implicit - the op-code implies the address of the operands
• ADD on a stack machine - pops the top 2 elements of the stack,
then pushes the result
• HP calculators work this way
Explicit - the address is specified in some field of the instruction
• Note the potential for 3 addresses - 2 operands + the destination
Classifying Instruction Set Architectures
Based on CPU internal storage options
AND # of operands
These choices critically affect - #instructions, CPI, and
cycle time
Operand Locations for Four ISA Classes
Example: C = A + B
Stack
  Push A
  Push B
  Add
  • Pop the top two values of the stack (A, B) and push the result onto the stack
  Pop C
Accumulator (AC)
  Load A
  Add B
  • Add AC (A) with B and store the result into AC
  Store C
Register (register-memory)
  Load R1, A
  Add R3, R1, B
  Store R3, C
Register (load-store)
  Load R1, A
  Load R2, B
  Add R3, R1, R2
  Store R3, C
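The stack-machine sequence for C = A + B can be sketched in a few lines of C. The `Stack` type, its fixed capacity, and the function names below are hypothetical; they only illustrate that `Add` names no operands because it implicitly uses the top two stack entries.

```c
#include <assert.h>

#define STACK_MAX 16  /* assumed capacity for this sketch */

typedef struct {
    int data[STACK_MAX];
    int top;          /* number of values currently on the stack */
} Stack;

static void push(Stack *s, int v)
{
    assert(s->top < STACK_MAX);
    s->data[s->top++] = v;
}

static int pop(Stack *s)
{
    assert(s->top > 0);
    return s->data[--s->top];
}

/* Add: pop the top two values and push their sum (implicit operands). */
static void add(Stack *s)
{
    int b = pop(s);
    int a = pop(s);
    push(s, a + b);
}

/* C = A + B as the four-instruction stack sequence. */
static int stack_add(int a, int b)
{
    Stack s = { {0}, 0 };
    push(&s, a);   /* Push A */
    push(&s, b);   /* Push B */
    add(&s);       /* Add    */
    return pop(&s);/* Pop C  */
}
```

The load-store version needs named registers for every operand, which is why its `Add` carries three register fields while the stack `Add` carries none.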
Modern Choice – Load-store Register
(GPR) Architecture
Reasons for choosing GPR (general-purpose registers)
architecture
Registers (stacks and accumulators…) are faster than memory
Registers are easier and more effective for a compiler to use
• (A+B) – (C*D) – (E*F)
– May be evaluated in any order (for pipelining concerns or …)
» But on a stack machine it must be evaluated left to right
Registers can be used to hold variables
• Reduce memory traffic
• Speed up programs
• Improve code density (fewer bits are used to name a register)
Compiler writers prefer that all registers be equivalent
and unreserved
The number of GPR: at least 16
Characteristics That Divide GPR Architectures
Number of operands
Three-operand: 1 result and 2 source operands
Two-operand: 1 operand that is both source and result, and 1 source operand
How many operands are memory addresses: 0 to 3 (two sources + 1 result)
Load-store
Register-memory
Memory-memory
Pros and Cons of the Three Most Common
GPR Computers
(m, n) means m memory addresses and n operands in total
Register-Register: (0,3)
+ Simple, fixed length instruction encoding.
+ Simple code-generation model.
+ Similar number of clocks to execute.
- Higher instruction count.
Memory-memory: (3,3)
+ Most compact.
- Different Instruction size.
- Memory access bottleneck.
Register-Memory: (1,2)
+ Data access without loading first.
+ Easy to encode and yield good density.
- One operand is destroyed.
- Limited number of registers.
Memory Addressing
Memory Addressing Basics
All architectures must address memory
What is accessed - byte, word, multiple words?
Today's machines are byte addressable
Main memory is organized in 32- to 64-byte lines
Big-Endian or Little-Endian addressing
Hence there is a natural alignment problem
An access to an object of size s bytes at byte address A is aligned if
A mod s = 0
A misaligned access takes multiple aligned memory
references
Memory addressing mode influences instruction
counts (IC) and clock cycles per instruction (CPI)
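The alignment rule above (A mod s = 0) translates directly into code. The sketch below is illustrative; the function names are hypothetical, and `aligned_refs` assumes a memory system that can only serve aligned s-byte blocks, so a misaligned s-byte access always straddles exactly two of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An access of `size` bytes at byte address `addr` is aligned if addr mod size = 0. */
static bool is_aligned(uint64_t addr, uint64_t size)
{
    assert(size > 0);
    return addr % size == 0;
}

/* Number of aligned size-byte blocks touched by a size-byte access at addr
 * (assumption: memory only serves aligned blocks of that same size). */
static unsigned aligned_refs(uint64_t addr, uint64_t size)
{
    return is_aligned(addr, size) ? 1u : 2u;
}
```

For example, a 4-byte access at address 8 is aligned, while the same access at address 6 covers bytes 6 to 9 and therefore needs the blocks starting at 4 and at 8.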
Byte Ordering
Idea
Bytes in long word numbered 0 to 3
Which is most (least) significant?
Can cause problems when exchanging binary data
between machines
Big Endian: byte 0 is most significant, byte 3 is least
IBM 360/370, Motorola 68K, SPARC
Little Endian: byte 0 is least significant, byte 3 is most
Intel x86, VAX
Alpha
The chip can be configured to operate either way
DEC workstations are little endian
Cray T3E Alphas are big endian
Byte Ordering Example
union {
unsigned char c[8];
unsigned short s[4];
unsigned int i[2];
unsigned long l[1];
} dw;
c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]
s[0] s[1] s[2] s[3]
i[0] i[1]
l[0]
Byte Ordering on Alpha
Little Endian
f0 f1 f2 f3 f4 f5 f6 f7
c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]
LSB MSB LSB MSB LSB MSB LSB MSB
s[0] s[1] s[2] s[3]
LSB MSB LSB MSB
i[0] i[1]
LSB MSB
l[0]
Print
Output on Alpha:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long 0 == [0xf7f6f5f4f3f2f1f0]
Byte Ordering on x86
Little Endian
f0 f1 f2 f3 f4 f5 f6 f7
c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]
LSB MSB LSB MSB LSB MSB LSB MSB
s[0] s[1] s[2] s[3]
LSB MSB LSB MSB
i[0] i[1]
LSB MSB
l[0]
Print
Output on Pentium:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long 0 == [0xf3f2f1f0]
Byte Ordering on Sun
Big Endian
f0 f1 f2 f3 f4 f5 f6 f7
c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]
MSB LSB MSB LSB MSB LSB MSB LSB
s[0] s[1] s[2] s[3]
MSB LSB MSB LSB
i[0] i[1]
MSB LSB
l[0]
Print
Output on Sun:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf0f1,0xf2f3,0xf4f5,0xf6f7]
Ints 0-1 == [0xf0f1f2f3,0xf4f5f6f7]
Long 0 == [0xf0f1f2f3]
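The printed outputs above depend on the host's byte order. A portable way to read a 32-bit value with a chosen byte order, regardless of the host, is to assemble it from individual bytes. The helper names below are hypothetical; this is a minimal sketch rather than the code that produced the slide output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interpret 4 bytes as a little-endian value: byte 0 is least significant. */
static uint32_t load_le32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Interpret 4 bytes as a big-endian value: byte 0 is most significant. */
static uint32_t load_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 1 if the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

With the bytes f0 f1 f2 f3 from the slides, `load_le32` reproduces the x86 and Alpha value for `i[0]` and `load_be32` reproduces the Sun value.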
Addressing Modes
Immediate
  Add R4, #3
  Regs[R4] <-- Regs[R4] + 3
Register
  Add R4, R3
  Regs[R4] <-- Regs[R4] + Regs[R3]
Register Indirect
  Add R4, (R1)
  Regs[R4] <-- Regs[R4] + Mem[Regs[R1]]
Direct
  Add R4, (1001)
  Regs[R4] <-- Regs[R4] + Mem[1001]
Memory Indirect
  Add R4, @(R3)
  Regs[R4] <-- Regs[R4] + Mem[Mem[Regs[R3]]]
Displacement
  Add R4, 100(R1)
  Regs[R4] <-- Regs[R4] + Mem[100 + Regs[R1]]
Scaled
  Add R1, 100(R2)[R3]
  Regs[R1] <-- Regs[R1] + Mem[100 + Regs[R2] + Regs[R3]*d]
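The operand fetch for each addressing mode listed above can be written as one switch statement. The `Machine` layout, the word-indexed memory, and the parameter names (`base`, `index`, `k` for the constant or displacement, `d` for the scale factor) are assumptions made for this sketch only.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS 32
#define MEM_WORDS 2048  /* assumed small, word-indexed memory */

typedef struct {
    int64_t regs[NREGS];
    int64_t mem[MEM_WORDS];
} Machine;

typedef enum {
    IMMEDIATE, REGISTER, REG_INDIRECT, DIRECT,
    MEM_INDIRECT, DISPLACEMENT, SCALED
} Mode;

static int64_t mem_at(const Machine *m, int64_t addr)
{
    assert(addr >= 0 && addr < MEM_WORDS);
    return m->mem[addr];
}

/* Fetch the source operand selected by `mode`. */
static int64_t operand(const Machine *m, Mode mode,
                       int base, int index, int64_t k, int64_t d)
{
    switch (mode) {
    case IMMEDIATE:    return k;                                  /* #k          */
    case REGISTER:     return m->regs[base];                      /* Rb          */
    case REG_INDIRECT: return mem_at(m, m->regs[base]);           /* (Rb)        */
    case DIRECT:       return mem_at(m, k);                       /* (k)         */
    case MEM_INDIRECT: return mem_at(m, mem_at(m, m->regs[base]));/* @(Rb)       */
    case DISPLACEMENT: return mem_at(m, k + m->regs[base]);       /* k(Rb)       */
    case SCALED:       return mem_at(m, k + m->regs[base]
                                        + m->regs[index] * d);    /* k(Rb)[Ri]   */
    }
    return 0;
}
```

Every mode except immediate and register costs at least one memory access, and memory indirect costs two, which is one reason the mode chosen influences CPI.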
Typical Address Modes (I) and (II)
[Tables: typical addressing modes]
Use of Memory Addressing Mode (Figure 2.7)
Based on a VAX which
supported everything
Not counting Register
mode (50% of all)
Displacement Address Size
Average of 5 programs from SPECint92 and
SPECfp92.
1% of addresses > 16 bits.
[Figure: displacement size distribution; series: integer average, FP average]
Immediate Addressing Mode
10 programs from SPECint92 and
SPECfp92
50% to 60% fit within 8 bits
75% to 80% fit within 16 bits
[Figure: immediate size distribution; series: gcc, spice, TeX]
Short Summary – Memory Addressing
Need to support at least three addressing
modes
Displacement, immediate, and register
deferred (+ register)
They represent 75% to 99% of the addressing
modes in benchmarks
The address size for displacement
mode should be at least 12 to 16 bits (covers 75% to
99% of displacements)
The immediate field size should be at least
8 to 16 bits (covers 50% to 80% of immediates)
Operand Type & Size
Typical types (assume word = 32 bits):
Character: byte, ASCII or EBCDIC (IBM), 4
per word
Short integer: 2 bytes, 2's complement
Integer: one word, 2's complement
Float: one word, usually IEEE 754 these
days
Double-precision float: 2 words, IEEE 754
BCD or packed decimal: 4-bit values packed
8 per word
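The "8 per word" figure for packed decimal follows from storing one decimal digit per 4-bit nibble of a 32-bit word. A minimal sketch, with hypothetical function names and the most significant digit placed in the highest nibble (an assumed layout):

```c
#include <assert.h>
#include <stdint.h>

/* Pack up to eight decimal digits into one 32-bit word, 4 bits per digit. */
static uint32_t pack_bcd8(uint32_t value)
{
    assert(value <= 99999999u);
    uint32_t word = 0;
    for (int shift = 0; shift < 32; shift += 4) {
        word |= (value % 10u) << shift;  /* lowest digit goes in the lowest nibble */
        value /= 10u;
    }
    return word;
}

/* Inverse of pack_bcd8: rebuild the integer from its eight nibbles. */
static uint32_t unpack_bcd8(uint32_t word)
{
    uint32_t value = 0;
    for (int shift = 28; shift >= 0; shift -= 4)
        value = value * 10u + ((word >> shift) & 0xFu);
    return value;
}
```

In this layout the hexadecimal dump of the packed word reads the same as the decimal number, which is the main convenience of BCD.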
Data Access Patterns
Short Summary – Type and Size of
Operand
The future, as we go to 64-bit machines:
Larger offsets, immediates, etc. are likely
Usage of 64- and 128-bit values will
increase
DSPs need accumulating registers wider
than the operand size in memory to aid accuracy in
fixed-point arithmetic
ALU Operations
What Operations are Needed
Arithmetic + Logical
Integer arithmetic: ADD, SUB, MULT, DIV, SHIFT
Logical operation: AND, OR, XOR, NOT
Data Transfer - copy, load, store
Control - branch, jump, call, return, trap
System - OS and memory management
We’ll ignore these for now - but remember they are needed
Floating Point
Same as arithmetic but usually take bigger operands
Decimal
String - move, compare, search
Graphics – pixel and vertex,
compression/decompression operations
Top 10 Instructions for 80x86
load: 22%
conditional branch: 20%
compare: 16%
store: 12%
add: 8%
and: 6%
sub: 5%
move register-register: 4%
call: 1%
return: 1%
The most widely executed instructions are the simple operations of an
instruction set
The top-10 instructions for 80x86 account for 96% of instructions executed
Make them fast, as they are the common case
Control Instructions are a Big Deal
Jumps - unconditional transfer
Conditional Branches
How is condition code set? – by flag or part of the
instruction
How is target specified? How far away is it?
Calls
How is target specified? How far away is it?
Where is return address kept?
How are the arguments passed? Callee vs. Caller
save!
Returns
Where is the return address? How far away is it?
How are the results passed?
Breakdown of Control Flows
Call/Returns
Integer: 19% FP: 8%
Jump
Integer: 6% FP: 10%
Conditional Branch
Integer: 75% FP: 82%
Branch Address Specification
Known at compile time for unconditional and
conditional branches - hence specified in the
instruction
As a register containing the target address
As a PC-relative offset
Consider word-length addresses, registers, and
instructions
Full address desired? Then pick the register option.
• BUT: setup and effective address calculation will take longer.
If you can live with a smaller offset, then PC-relative
works
• PC-relative is also position independent, which simplifies the
linker's job
Returns and Indirect Jumps
Branch target is not known at compile time
Need a way to specify the target
dynamically
Use a register
Permit any addressing mode
Regs[R4] <-- Regs[R4] + Mem[Regs[R1]]
Also useful for
case or switch
Dynamically shared libraries
Higher-order functions or function pointers
Branch Stats - 90% are PC Relative
Call/Return
TeX = 16%, Spice = 13%, GCC = 10%
Jump
TeX = 18%, Spice = 12%, GCC = 12%
Conditional
TeX = 66%, Spice = 75%, GCC = 78%
Branch Distances
Condition Testing Options
PSW: Program Status Word
What Kinds of Compares Do Branches Use?
A large fraction of comparisons are with zero
Direction, Frequency, and Real
Change
Key points: 75% are forward branches
• Most backward branches are loops, taken about 90% of the time
• Branch statistics are both compiler and application dependent
• Any loop optimizations may have a large effect
Short Summary – Operations in the
Instruction Set
Branch addressing should be able to jump to
about 100+ instructions either above or
below the branch
This implies a PC-relative branch displacement of at
least 8 bits
Register-indirect and PC-relative
addressing for jump instructions, to support
returns as well as many other features of
current systems (dynamic allocations)
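The "at least 8 bits" figure follows from the range of a signed field: 8 bits cover offsets from -128 to +127, enough for about 100 instructions in either direction. The sketch below checks whether an offset fits in a given width and computes a PC-relative target. The function names are hypothetical, and the 4-byte instruction size and the PC + 4 base are MIPS-style assumptions rather than something the summary specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if `offset` is representable as a two's complement field of `bits` bits. */
static bool fits_signed(int64_t offset, unsigned bits)
{
    assert(bits >= 1 && bits <= 63);
    int64_t lo = -(INT64_C(1) << (bits - 1));
    int64_t hi = (INT64_C(1) << (bits - 1)) - 1;
    return offset >= lo && offset <= hi;
}

/* PC-relative target, offset counted in instructions.
 * Assumed: 4-byte instructions, offset relative to the next instruction. */
static uint64_t branch_target(uint64_t pc, int64_t offset_instrs)
{
    return pc + 4 + (uint64_t)(offset_instrs * 4);
}
```

A branch 100 instructions away fits in 8 bits, while one 200 instructions away would need a 9-bit field or a register-indirect jump.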
Encoding an Instruction Set
Encoding the ISA
Encode instructions into a binary representation for
execution by CPU
Can pick anything but:
Affects the size of code - so it should be tight
Affects the CPU design - in particular the instruction decode
So it may have a big influence on the CPI or cycle-time
Must balance several competing forces
Desire for lots of addressing modes and registers
Desire to make average program size compact
Desire to have instructions encoded into lengths that will be easy
to handle in a pipelined implementation (multiple of bytes)
3 Popular Encoding Choices
Variable (compact code but difficult to decode)
Primary opcode is fixed in size, but opcode modifiers may exist
Opcode specifies number of arguments, each used as an address field
Best when there are many addressing modes and operations
Uses as few bits as possible, but individual instructions can vary widely in
length
e.g., VAX: integer ADD versions vary between 3 and 19 bytes
Fixed (easy to decode, but lengthy code)
Every instruction looks the same; some fields may be interpreted
differently
Combine the operation and the addressing mode into the opcode
e.g., all modern RISC machines
Hybrid
Set of fixed formats
e.g., IBM 360 and Intel 80x86
Trade-off between size of program and ease of decoding
An Example of Variable Encoding: VAX
addl3 r1, 737(r2), (r3): a 32-bit integer add
instruction with 3 operands needs 6 bytes to
represent it
Opcode for addl3: 1 byte
A VAX address specifier is 1 byte (4 bits: addressing
mode, 4 bits: register)
• r1: 1 byte (register addressing mode + r1)
• 737(r2)
– 1 byte for address specifier (displacement addressing + r2)
– 2 bytes for displacement 737
• (r3): 1 byte for address specifier (register indirect + r3)
Length of VAX instructions: 1 to 53 bytes
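The 6-byte total for the addl3 example is the opcode byte plus, for each operand, one address-specifier byte and any displacement bytes. A small sketch of that arithmetic, with a hypothetical `Operand` type and function names:

```c
#include <assert.h>
#include <stddef.h>

/* One operand: a 1-byte address specifier plus optional displacement bytes. */
typedef struct {
    size_t displacement_bytes;
} Operand;

/* Total encoded length: opcode bytes + (specifier + displacement) per operand. */
static size_t vax_length(size_t opcode_bytes, const Operand *ops, size_t n)
{
    assert(ops != NULL || n == 0);
    size_t total = opcode_bytes;
    for (size_t i = 0; i < n; i++)
        total += 1 + ops[i].displacement_bytes;
    return total;
}

/* addl3 r1, 737(r2), (r3): only the middle operand carries a displacement. */
static size_t addl3_example_length(void)
{
    const Operand ops[] = { {0}, {2}, {0} };  /* r1, 737(r2), (r3) */
    return vax_length(1, ops, 3);
}
```

The same three-operand add with all operands in registers would take 1 + 3 = 4 bytes, which shows how operand choice alone changes instruction length in a variable encoding.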
Short Summary – Encoding the
Instruction Set
Choice between variable and fixed
instruction encoding
Code size matters more than performance: variable
encoding
Performance matters more than code size: fixed encoding
Role of Compilers
Critical goals in ISA from the compiler
viewpoint
What features will lead to high-quality code
What makes it easy to write efficient
compilers for an architecture
Compiler and ISA
ISA decisions are no longer made to ease assembly-language (AL)
programming
Because of HLLs, the ISA is a compiler target today
Performance of a computer will be significantly
affected by compiler
Understanding compiler technology today is
critical to designing and efficiently implementing
an instruction set
Architecture choice affects the code quality and
the complexity of building a compiler for it
Goal of the Compiler
Primary goal is correctness
Second goal is speed of the object code
Others:
Speed of the compilation
Ease of providing debug support
Inter-operability among languages
Flexibility of the implementation: languages
may not change much but they do evolve,
e.g., Fortran 66 ===> HPF
Make the frequent cases fast and the rare case correct
Optimization Observations
Hard to reduce branches
Biggest reduction is often memory
references
Some ALU operation reduction happens
but it is usually a few %
Implication:
Branch, Call, and Return become a larger
relative % of the instruction mix
Control instructions among the hardest to
speed up
How can Architects Help Compiler
Writers
Provide Regularity
Address modes, operations, and data types should be
orthogonal (independent) of each other
• Simplifies code generation, especially multi-pass
• Counterexample: restricting which registers can be used for
certain classes of instructions
Provide primitives, not solutions
Special features that match an HLL construct are often
unusable
What works in one language may be detrimental to
others
How can Architects Help Compiler
Writers (Cont.)
Simplify trade-offs among alternatives
How to write good code? What is good code?
• Metric: IC or code size (no longer sufficient, because of caches and
pipelining)
Anything that makes code sequence performance
obvious is a definite win!
• How many times a variable should be referenced before it is
cheaper to load it into a register
Provide instructions that bind the quantities
known at compile time as constants
Don’t hide compile time constants
• Instructions that work off of something the compiler
thinks could be a run-time determined value handcuff the
optimizer
Short Summary -- Compilers
ISA has at least 16 GPR (not counting FP
registers) to simplify allocation of registers using
graph coloring
Orthogonality suggests all supported addressing
modes apply to all instructions that transfer data
Simplicity – understand that less is more in ISA
design
Provide primitives instead of solutions
Simplify trade-offs between alternatives
Don’t bind constants at runtime
Counterexample – Lack of compiler support for
multimedia instructions
The MIPS Architecture
Expectations for New ISA
Use general-purpose registers, with a load-store architecture
Support displacement (offset size 12 to 16 bits), immediate (size 8 to
16 bits), and register indirect addressing
Support 8-, 16-, 32-, and 64-bit integers and 64-bit IEEE 754
floating-point numbers
Support the following simple instructions: load, store, add, subtract,
move register-register, and, shift, compare equal, compare not equal,
branch (with a PC-relative address at least 8 bits long), jump, call,
return
Use fixed instruction encoding if interested in performance and use
variable instruction encoding if interested in code size
Provide at least 16 general-purpose registers (GPRs) + separate
floating-point registers, be sure all addressing modes apply to all
data transfer instructions, and aim for a minimalist instruction set
MIPS
Simple load-store ISA
Enable efficient pipeline implementation
Fixed instruction set encoding
Efficiency as a compiler target
MIPS64 variant is discussed here
Registers for MIPS
32 64-bit integer GPRs: R0, R1, ... R31,
R0 = 0 always
32 FPRs, used for single or double
precision
For single precision: F0, F1, ... , F31 (32-bit)
For double precision: F0, F2, ... , F30 (64-bit)
Extra status registers, moved via GPRs
Instructions for moving between an FPR
and a GPR
Data Types for MIPS
8-bit bytes, 16-bit half words, 32-bit words, and 64-bit
double words for integer data
32-bit single precision and 64-bit double
precision for FP
MIPS64 operations work on 64-bit integers and
32- or 64-bit floating point
Bytes, half words, and words are loaded into the
GPRs with zeros or the sign bit replicated to fill the 64
bits of the GPRs
All references between memory and either
GPRs or FPRs are through loads or stores
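Filling a 64-bit GPR from a narrower value "with zeros or the sign bit replicated" is zero extension and sign extension. The sketch below is a generic illustration with hypothetical function names, not a description of MIPS hardware; the final conversion to `int64_t` assumes a two's complement target, which holds on all mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the low `bits` bits and fill the upper bits with zeros. */
static uint64_t zero_extend(uint64_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    if (bits == 64)
        return value;
    return value & ((UINT64_C(1) << bits) - 1);
}

/* Keep the low `bits` bits and replicate bit (bits - 1) into the upper bits. */
static int64_t sign_extend(uint64_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    if (bits == 64)
        return (int64_t)value;
    uint64_t low = value & ((UINT64_C(1) << bits) - 1);
    uint64_t sign = UINT64_C(1) << (bits - 1);
    /* XOR then subtract flips the sign bit into a borrow across the upper bits. */
    return (int64_t)((low ^ sign) - sign);
}
```

The byte 0x80 therefore loads as 128 when zero-extended (an unsigned load) and as -128 when sign-extended (a signed load).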
Addressing Modes for MIPS
Data addressing: immediate and displacement
(16 bits)
Displacement: Add R4, 100(R1)
(Regs[R4] <-- Regs[R4] + Mem[100 + Regs[R1]])
Register-indirect: placing 0 in the displacement field
• Add R4, (R1) (Regs[R4] <-- Regs[R4] + Mem[Regs[R1]])
Absolute addressing (16 bits): using R0 as the base
register
• Add R4, (1001) (Regs[R4] <-- Regs[R4] + Mem[1001])
Byte addressable with 64-bit addresses
Mode selection for Big Endian or Little Endian
MIPS Instruction Format
Encode addressing mode into the opcode
All instructions are 32 bits with 6-bit
primary opcode
MIPS Instruction Format (Cont.)
I-Type Instruction
Field:         opcode | rs | rt | immediate
Width (bits):     6   | 5  | 5  |    16
Loads and stores: LW R1, 30(R2); S.S F0, 40(R4)
ALU ops on immediates: DADDIU R1, R2, #3
rt <-- rs op immediate
Conditional branches: BEQZ R3, offset
rs is the register checked
rt unused
immediate specifies the offset
Jump register, jump and link register: JR R3
rs is the target register
rt and immediate are unused but = 011
MIPS Instruction Format (Cont.)
R-Type Instruction
Field:         opcode | rs | rt | rd | shamt | funct
Width (bits):     6   | 5  | 5  | 5  |   5   |   6
Register-register ALU operations: rd <-- rs funct rt, e.g., DADDU R1, R2, R3
Function encodes the data path operations: Add, Sub, ...
Read/write special registers
Moves
J-Type Instruction: jump, jump and link, trap and return from exception
Field:         opcode | offset added to PC
Width (bits):     6   |        26
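The three formats can be packed and unpacked with shifts and masks. The sketch below uses the field widths from the format slides; the function names are hypothetical. The numeric opcode 0x23 for LW and function code 0x2D for DADDU in the usage note come from the MIPS opcode tables, not from these slides.

```c
#include <assert.h>
#include <stdint.h>

/* I-type: opcode(6) | rs(5) | rt(5) | immediate(16) */
static uint32_t encode_i(unsigned opcode, unsigned rs, unsigned rt, uint16_t imm)
{
    assert(opcode < 64 && rs < 32 && rt < 32);
    return ((uint32_t)opcode << 26) | ((uint32_t)rs << 21) |
           ((uint32_t)rt << 16) | imm;
}

/* R-type: opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6) */
static uint32_t encode_r(unsigned opcode, unsigned rs, unsigned rt,
                         unsigned rd, unsigned shamt, unsigned funct)
{
    assert(opcode < 64 && rs < 32 && rt < 32);
    assert(rd < 32 && shamt < 32 && funct < 64);
    return ((uint32_t)opcode << 26) | ((uint32_t)rs << 21) |
           ((uint32_t)rt << 16) | ((uint32_t)rd << 11) |
           ((uint32_t)shamt << 6) | (uint32_t)funct;
}

/* J-type: opcode(6) | offset(26) */
static uint32_t encode_j(unsigned opcode, uint32_t offset)
{
    assert(opcode < 64 && offset < (UINT32_C(1) << 26));
    return ((uint32_t)opcode << 26) | offset;
}

/* Extract `width` bits starting at bit `lsb` (bit 0 = least significant). */
static unsigned field(uint32_t word, unsigned lsb, unsigned width)
{
    assert(width >= 1 && width < 32 && lsb + width <= 32);
    return (word >> lsb) & ((UINT32_C(1) << width) - 1);
}
```

For instance, `encode_i(0x23, 2, 1, 30)` builds the word for LW R1, 30(R2), and `encode_r(0, 2, 3, 1, 0, 0x2D)` builds DADDU R1, R2, R3. Because every field sits at a fixed position, the decoder can read the register numbers in parallel with decoding the opcode, which is the practical benefit of fixed encoding.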
MIPS Instruction Mix
[Figure: instruction mix for SPECint2000]
[Figure: instruction mix for SPECfp2000]