					ECE 4100/6100
Advanced Computer Architecture

Lecture 4: ISA Taxonomy

         Prof. Hsien-Hsin Sean Lee
         School of Electrical and Computer Engineering
         Georgia Institute of Technology
Instruction Set Architecture
• Specification of a microprocessor design

• Interface between user and machine’s functionality

• Good instruction set design principles
   –   Compatibility
   –   Implementability
   –   Programmability
   –   Usability
   –   Encoding efficiency


Main ISA Design Philosophy
• CISC (Complex Instruction Set Computer)



• RISC (Reduced Instruction Set Computer)



• VLIW (Very Long Instruction Word)



• EPIC (Explicitly Parallel Instruction Computing)



CISC
• Complex Instruction Set Computers

• Narrow the “semantic gap” between high-level programming and machine execution
   – Smaller code size (memory was expensive!)
   – Simplify compilation

• Another state machine (controlled by microcode) inside the
  machine

• Examples: x86, Intel 432, IBM 360, DEC VAX



CISC Example: x86
• MOVSD ;; move a doubleword, 1-byte instruction

              MOVSD // m32[ES:EDI] = m32[DS:ESI]


• REP ;; 1-byte prefix to repeat string operations

              REP MOVSD // repeat count held in ECX


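• LOCK ADD ds:[esi+ecx*2+0x67452301], 0xEFCDAB89 ;; 13-byte instruction

              F0 3E 81 84 4E 01 23 45 67 89 AB CD EF

              F0            LOCK prefix
              3E            DS segment-override prefix
              81            opcode (ADD r/m32, imm32)
              84            ModRM: [--][--]+disp32
              4E            SIB: ESI+ECX*2
              01 23 45 67   disp32 = 0x67452301 (little-endian)
              89 AB CD EF   imm32 = 0xEFCDAB89 (little-endian)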
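
The 13-byte encoding can be picked apart mechanically. The sketch below is only an illustration for this one instruction form (two prefixes, one opcode byte, ModRM, SIB, disp32, imm32); it is not a general x86 decoder, and the function name decode_fields is made up for this example.

```python
import struct

# 32-bit general-purpose registers in x86 encoding order (0..7).
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_fields(raw):
    """Split the slide's 13-byte LOCK ADD encoding into its fields.

    Assumes exactly this layout: 2 prefix bytes, 1 opcode byte,
    ModRM, SIB, 4-byte displacement, 4-byte immediate.
    """
    lock, seg, opcode, modrm, sib = raw[:5]
    disp32, imm32 = struct.unpack_from("<II", raw, 5)  # little-endian
    return {
        "prefixes": (lock, seg),
        "opcode": opcode,
        "mod": modrm >> 6,           # 0b10 means a disp32 follows
        "reg": (modrm >> 3) & 7,     # /0 selects ADD for opcode 0x81
        "rm": modrm & 7,             # 0b100 means an SIB byte follows
        "scale": 1 << (sib >> 6),
        "index": REG32[(sib >> 3) & 7],
        "base": REG32[sib & 7],
        "disp32": disp32,
        "imm32": imm32,
    }


fields = decode_fields(bytes.fromhex("F0 3E 81 84 4E 01 23 45 67 89 AB CD EF"))
print(f"{fields['base']}+{fields['index']}*{fields['scale']}"
      f"+{fields['disp32']:#x}, imm {fields['imm32']:#x}")
```

Variable length, optional prefixes, and fields whose presence depends on earlier bytes are what make CISC decoding long and complex compared with a fixed 32-bit format.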
RISC
• Observation made by IBM (John Cocke, Eckert-Mauchly
  Award’85, Turing Award’87, Nat’l Medal of Technology’91,
  Nat’l Medal of Science’94)
   – Few of the available instructions are used

• CISC: “n+1” phenomenon
   – Adding an instruction requiring an extra level of decoding
     logic can slow down the entire ISA

• Reduced Instruction Set Computer
   – Originated at IBM in 1975 as a telephone-switching project
       • To achieve 12 MIPS (300 calls per sec, 20k inst per call)
       • Simple instructions
   – IBM 801 in 1978
   – More compiler effort to gain performance

A Typical RISC
•   Smaller number of instructions
•   Fixed format instruction (e.g., 32 bits)
•   3-address, reg-to-reg arithmetic instructions
•   Single cycle operation for execution
•   Load-store architecture
•   Simple address modes
     – Base + displacement
     – No indirection
•   Simple branch conditions
•   Hardwired control (No microcode)
•   More compiler effort
•   Examples:
     – RISC I and RISC II at Berkeley
     – MIPS (Microprocessor without Interlocked Pipeline Stages) at Stanford
     – IBM RISC Technology, Sun SPARC, HP PA-RISC, ARM
RISC Example: MIPS
R-format (Register-Register)
   31        26 25      21 20        16 15     11 10   6 5           0
        Op       Rs             Rt        Rd      Shamt      Funct       add $1, $2, $3

I-format (Register-Immediate)
   31        26 25      21 20        16 15                           0
        Op       Rs             Rt             immediate                 addi $1, $2, -5

I-format (Load/Store)
   31        26 25      21 20        16 15                           0
        Op       Base           Dest           immediate                 lw $1, 24($9)

I-format (Branch)
   31        26 25      21 20        16 15                           0
        Op       Rs             Rt             immediate                 beq $4, $0, L1

J-format (Jump / Call)
   31        26 25                                                   0
        Op                           target                              j L2
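
The three formats reduce to shifts and ORs. A minimal sketch follows, using the standard MIPS32 opcode/funct values for add, addi, lw, beq, and j. The branch offset and jump target are made-up values, because the slide does not say where L1 and L2 are.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """R-format: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op, rs, rt, imm):
    """I-format: op(6) rs(5) rt(5) immediate(16, two's complement)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(op, target):
    """J-format: op(6) target(26, a word address)."""
    return (op << 26) | (target & 0x03FFFFFF)


add_word = encode_r(0x00, rs=2, rt=3, rd=1, shamt=0, funct=0x20)  # add  $1, $2, $3
addi_word = encode_i(0x08, rs=2, rt=1, imm=-5)                    # addi $1, $2, -5
lw_word = encode_i(0x23, rs=9, rt=1, imm=24)                      # lw   $1, 24($9)
beq_word = encode_i(0x04, rs=4, rt=0, imm=3)    # beq $4, $0, L1 (offset 3 assumed)
j_word = encode_j(0x02, target=0x100)           # j L2 (target 0x100 assumed)

for name, word in [("add", add_word), ("addi", addi_word), ("lw", lw_word),
                   ("beq", beq_word), ("j", j_word)]:
    print(f"{name:5s} {word:#010x}")
```

Every field sits at a fixed bit position, so a decoder can extract register numbers before it even knows which instruction it is looking at.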

CISC vs. RISC
CISC                                                      RISC
Variable-length instructions                              Fixed-length instructions, single-cycle operation
Abundant instructions and addressing modes                Fewer instructions and addressing modes
Long, complex decoding                                    Simple decoding
Contains mem-to-mem operations                            Load/store architecture
Uses microcode                                            No microinstructions; directly decoded and executed by HW logic
Narrower semantic gap (shifts complexity to microcode)    Needs smart compilers, or intelligent hardware to reorder instructions
IBM 360, DEC VAX, x86, Motorola 68030                     IBM 801, MIPS, RISC I, IBM POWER, Sun SPARC

• Some definitions were from the paper by Colwell et al. in 1985
CISC vs. RISC (Reality)
                      CISC                                        RISC
                      IBM 370/168   VAX 11/780    Xerox Dorado    IBM 801    Berkeley RISC I   Stanford MIPS
Year introduced       1973          1978          1978            1980       1981              1983
# instructions        208           303           270             120        39                55
Microcode             54 KB         61 KB         17 KB           0          0                 0
Instruction size      2 to 6 B      2 to 57 B     1 to 3 B        4 B        4 B               4 B
Execution model       Reg-reg,      Reg-reg,      Stack           Reg-reg    Reg-reg           Reg-reg
                      Reg-mem,      Reg-mem,
                      Mem-mem       Mem-mem



    Observation and Controversy
•    “Instruction Sets and Beyond: Computers, Complexity, and Controversy” by Bob
     Colwell (Eckert-Mauchly Award, 2005) and colleagues at CMU; also see the response
     from the RISC camp: Patterson (Eckert-Mauchly Award, 2008) and Hennessy (Eckert-
     Mauchly Award, 2001)

• CISC/RISC classification should *not* be a dichotomy

• Case in point: MicroVAX-32 by DEC, a single chip implementation
   – Subsetting VAX instructions (but still, 175 instructions!)
   – Emulate complex instructions
   – A RISC or a CISC? (Well, it has variable-length instructions, is not a ld/st
      machine, uses microcoded control, and has all VAX addressing modes)
• Effective processor design = CISC experiences + RISC tenets
• RISC features are neither incompatible with CISC nor exclusive to RISC
   – E.g., a large register file (w/ register windows)

• RISC/CISC issues are best considered in light of their function-to-
  implementation level assignment


Modern X86 Machine Design
• CISC outfit
• RISC inside

• E.g., Intel P6/NetBurst/Core, AMD Athlon/Phenom/Opteron

• Each x86 instruction is decoded into one or more “micro-ops” (μops), or
  “RISC-ops”, on-the-fly

• Internal microarchitecture resembles RISC design
  philosophy

• Processor dynamically schedules the μops

• Compiler’s scheduling is still beneficial
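
To make the “CISC outfit, RISC inside” idea concrete, here is a toy sketch of cracking a register-memory instruction into load/compute/store micro-ops. The function name crack, the tuple layout, and the tmp0/tmp1 temporaries are all hypothetical; real decoders are implementation-specific and their micro-op formats are not described in these slides.

```python
def crack(mnemonic, dest, src):
    """Split one two-operand x86-style instruction into RISC-like micro-ops.

    Operands written as "[...]" are treated as memory; anything else is a
    register. Returns a list of tuples:
      ("load", reg, mem), (op, dst, src1, src2), ("store", mem, reg).
    """
    def is_mem(operand):
        return operand.startswith("[")

    uops = []
    if is_mem(src):
        # Memory source: fetch it into a temporary first.
        uops.append(("load", "tmp0", src))
        src = "tmp0"
    if is_mem(dest):
        # Read-modify-write: load, operate on a temporary, store back.
        uops.append(("load", "tmp1", dest))
        uops.append((mnemonic, "tmp1", "tmp1", src))
        uops.append(("store", dest, "tmp1"))
    else:
        uops.append((mnemonic, dest, dest, src))
    return uops


print(crack("add", "eax", "ebx"))    # register form stays a single micro-op
print(crack("add", "[esi]", "eax"))  # memory form becomes load, add, store
```

Once instructions are in this form, the back end only ever sees load/store-style operations on registers, which is what lets it be built along RISC lines.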
Recent ISA Design Trend
• Look at this instruction in MIPS (CISC or RISC?)

       CABS.LE.PS $fcc0, $f8, $f10 ;; |y| ≤ |w|, |x| ≤ |w|?


• Many complex instructions emerged for new apps
   – Viterbi instruction for wireless communication/DSP
   – Sum of absolute differences in SSE (PSADBW) or other DSPs: C = Σ|A_i - B_i|
     for MPEG (motion estimation)

• In embedded domain, code size is critical
• Reducing programming efforts
• Optimizing performance via
   – Specialized hardware (accelerator-based)
   – Co-processor (controlled by main processor)
   – ISA plug-in (flexible)
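
As a reference point for the sum-of-absolute-differences instruction mentioned above, this is what the operation computes in scalar form, and how motion estimation uses it. The helper names sad and best_match and the sample pixel values are illustrative only.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-length pixel blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same length")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def best_match(target, candidates):
    """Index of the candidate block with the smallest SAD against target."""
    return min(range(len(candidates)), key=lambda i: sad(target, candidates[i]))


target = [10, 20, 30, 40]
candidates = [
    [12, 18, 30, 45],   # SAD = 2 + 2 + 0 + 5 = 9
    [10, 21, 30, 40],   # SAD = 1
    [90, 90, 90, 90],   # SAD = 80 + 70 + 60 + 50 = 260
]
print(sad(target, candidates[0]), best_match(target, candidates))
```

A motion-estimation search runs this inner loop for every candidate block, which is why collapsing the subtract, absolute value, and accumulate steps into one instruction pays off.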
VLIW
•   Very Long Instruction Word
     – Originated from microcode compaction
     – Coined by Josh Fisher (Eckert-Mauchly Award, 2003)
•   Compiler will
     –   Perform instruction scheduling (latency-aware)
     –   Pack several independent instructions into a VLIW instruction
•   Issues
     –   Compatibility
     –   Many nop’s
     –   Very complex compiler
             •   Information unavailable at static compile time
             •   Interprocedural optimization is difficult
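
The compiler's packing job, and where the nops come from, can be sketched with a greedy scheduler. This is a simplification under stated assumptions: every operation has unit latency, each register is written only once, and memory ordering is ignored. The function pack_vliw and the operation tuples are hypothetical.

```python
def pack_vliw(ops, width=3):
    """Greedily pack operations into VLIW words of `width` slots.

    ops: list of (dest, sources, text) in program order.
    An operation goes into the earliest word that comes after all of its
    producers and still has a free slot; empty slots are filled with nops.
    """
    ready_at = {}   # register -> index of the first word allowed to read it
    words = []
    for dest, sources, text in ops:
        slot = max((ready_at.get(s, 0) for s in sources), default=0)
        while slot < len(words) and len(words[slot]) >= width:
            slot += 1                      # that word is full, try the next
        while len(words) <= slot:
            words.append([])
        words[slot].append(text)
        ready_at[dest] = slot + 1          # unit latency: usable next word
    return [word + ["nop"] * (width - len(word)) for word in words]


ops = [
    ("t1", ["a"], "t1 = load a"),
    ("t2", ["b"], "t2 = load b"),
    ("t3", ["t1", "t2"], "t3 = t1 + t2"),
    ("t4", ["t3"], "t4 = t3 * 2"),
    ("t5", ["c"], "t5 = load c"),
]
for word in pack_vliw(ops):
    print(" | ".join(word))
```

With this input, the dependent chain t1/t2, t3, t4 forces three words, and four of the nine slots end up as nops. That is the “many nop's” issue in miniature.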


Pioneers
• Culler Scientific
     –   Led by Prof. Glen J. Culler (National Medal of Technology winner 2000, Berkeley Prof. David Culler’s father)
•   Multiflow (Fisher)
     –   Led by Josh Fisher (Eckert-Mauchly Award 2003), John O’Donnell, John Ruttenberg, David Papworth, Bob Colwell
         (Eckert-Mauchly Award 2005), Geoffrey Lowney, etc.
     –   Several Multiflow TRACE were delivered
•   Cydrome (Rau, Yen’s) in the 80’s
     –   Led by Bob Rau (Eckert-Mauchly Award 2002), David Yen, Wei Yen, etc.
     –   Had a working prototype

Modern Processors
• Most DSPs embrace VLIW (e.g., TI C6x, StarCore, ADI TigerSHARC, etc.)
• Transmeta Crusoe (internal VLIW ISA, never publicly released)

Intel/HP EPIC
• Explicitly Parallel Instruction Computing
• A close relative of VLIW (i.e., the compiler holds the key to high
  performance)
• Some new features
   – Stop bits to address compatibility
   – ISA enabling data speculation and control speculation (minimum
     hardware support needed)
   – Fully predicated ISA
   – Rotating registers, RSE (not so new, e.g., MRS in RISC I)
• Lots of ideas from Polycyclic architecture (TRW) and
  Cydrome by the late Bob Rau (Eckert-Mauchly Award, 2002)
                     An Itanium Instruction Bundle
  ld4 r43=[r38]      add r38=16,r38        br.call.sptk b0=printf# ;;
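
An IA-64 bundle is 128 bits: a 5-bit template followed by three 41-bit instruction slots. The sketch below shows only that container layout. The slot values are placeholders, not real encodings of the ld4/add/br.call instructions above, and the template value 0x11 (taken here to mean memory/integer/branch with a stop at the end, matching the trailing ;;) should be checked against the Itanium manual.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slot0, slot1, slot2):
    """Pack a template and three 41-bit slots into 16 little-endian bytes."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    for slot in (slot0, slot1, slot2):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each slot must fit in 41 bits")
    value = (template
             | (slot0 << TEMPLATE_BITS)
             | (slot1 << (TEMPLATE_BITS + SLOT_BITS))
             | (slot2 << (TEMPLATE_BITS + 2 * SLOT_BITS)))
    return value.to_bytes(16, "little")


def unpack_bundle(raw):
    """Inverse of pack_bundle: returns (template, slot0, slot1, slot2)."""
    value = int.from_bytes(raw, "little")
    template = value & ((1 << TEMPLATE_BITS) - 1)
    slots = tuple((value >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
                  for i in range(3))
    return (template,) + slots


# Placeholder slot contents, NOT real instruction encodings.
bundle = pack_bundle(0x11, 0x123, 0x456, 0x789)
print(len(bundle), unpack_bundle(bundle))
```

The template tells the hardware which unit type each slot targets and where the stops fall, so the parallelism is stated in the encoding rather than rediscovered at run time.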


VLIW Tradeoffs
• Plentiful registers, simple encodings, …

• Potentially lower # of transistors than other designs
   – Reduced speculation, OoO not needed
   – Size efficiencies, price, power consumption
   – Is this true for Itanium?

• Drawbacks
   – Backward compatibility and upgradeability are hard
   – Because implementation details are exposed in the ISA

• VLIW is orthogonal to other techniques
   – Pipelining, SMT, and CMP/multi-core can be built on top of any
     processor, including VLIW

Design Philosophy: VLIW vs. Superscalar
[Figure: the same normal source code reaches the same ILP hardware by two paths.
 Superscalar: normal compiler -> RISC object code -> scheduling and operation-
   independence-recognizing hardware (run time)
 VLIW: normal compiler plus scheduling and operation-independence-recognizing
   software (compile time)]
Design Philosophy: VLIW vs. Superscalar
• VLIW
  – Requires less hardware and lower power
  – Programs may need to be recompiled to run correctly when the
    hardware changes even slightly (not always, though)


• Superscalar
  – Object-code compatible
     • Sequential programs can be presented to different
       superscalar implementations of the same ISA
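
Why a VLIW binary can break on a slightly different implementation: the schedule bakes in operation latencies, and there are no hardware interlocks to catch a mismatch. The toy model below assumes a machine with no interlocks, a configurable load latency, and registers that read as 0 until written. The function run_vliw and the tuple-based operations are invented for this illustration.

```python
def run_vliw(words, load_latency, memory):
    """Execute VLIW words one per cycle with no hardware interlocks."""
    regs = {}
    in_flight = []   # (cycle the value becomes visible, dest register, value)
    for cycle, word in enumerate(words):
        # Loads issued load_latency cycles ago become visible now.
        for ready, dest, value in in_flight:
            if ready <= cycle:
                regs[dest] = value
        in_flight = [entry for entry in in_flight if entry[0] > cycle]
        for op in word:
            if op[0] == "load":
                _, dest, addr = op
                in_flight.append((cycle + load_latency, dest, memory[addr]))
            elif op[0] == "add":
                _, dest, a, b = op
                # No interlock: reads whatever is in the register right now.
                regs[dest] = regs.get(a, 0) + regs.get(b, 0)
            # ("nop",) does nothing
    return regs


# Schedule produced by a compiler that assumed a 2-cycle load latency.
schedule = [
    [("load", "r1", "x"), ("load", "r2", "y")],
    [("nop",)],
    [("add", "r3", "r1", "r2")],
]
memory = {"x": 3, "y": 4}
print(run_vliw(schedule, load_latency=2, memory=memory)["r3"])
print(run_vliw(schedule, load_latency=3, memory=memory)["r3"])
```

On the 2-cycle machine the add sees both loaded values. On the 3-cycle machine the same binary reads the registers too early and produces a wrong sum. A superscalar with interlocks would stall instead, which is the object-code compatibility advantage.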



Superscalar or VLIW?
• Reality: the current world is dominated by …
   – x86: Core (quad-issue) & Atom (dual-issue)
   – And ARM (Cortex-A8 is dual-issue; Cortex-A9 is out-of-order)


• VLIW is largely embraced by the DSP camp




Should we continue to teach this Chapter about ISA?



