1. Trends in Microprocessor Architecture

  R05 Chip Multiprocessors (ACS MPhil)
  Robert Mullins
• Computer architecture
• Scaling performance and CMOS
     – Where have performance gains come from?
     – Modern superscalar processors
     – The limits of superscalar processors
• Going parallel
• This course

Chip Multiprocessors (ACS MPhil)   2             2010/11
Computer architecture

“Computer architecture is the interface between what
technology can provide and what the marketplace demands”

“Computer architecture is a science of trade-offs”

Yale Patt

Computer architecture

“Computer architecture is the science and art of
selecting and interconnecting hardware components to
create computers that meet functional, performance
and cost goals”
Mark Hill

“Computer architecture forms the bridge between
application need and the capabilities of the underlying
technologies”
Tilak Agerwala and Siddhartha Chatterjee

Computer architecture
• We cannot architect a new computer without defining
  performance, power and cost goals. The design
  process is all about understanding and making trade-offs.
• What is our target market and what applications will
  we be running?
• The “best” architecture is a moving target
   – The needs of the marketplace change
   – Shifting fabrication technology characteristics
   – New technologies
       • memory, packaging, compiler, languages, ...

Computer architecture

“Computer architects often err by preparing for
yesterday's computations”
Bill Dally

(Easy to make the same error during a PhD!)

Tomorrow's applications and technologies are not easy
to predict!

Historic performance gains

(Figure reproduced from “Computer Architecture: A Quantitative Approach”, Hennessy/Patterson)

Historic performance gains
• Microprocessor performance increased at a
  rate of ~52%/year between 1986-2002
     – ~800X improvement over 16 years
     – How was such an improvement in performance achieved?
     – Is this a reasonable rate of performance growth
       given the advances in fabrication technology?

    Exe. time = Instr. count x CPI x Clock Period
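
As a rough check on the figures above, the sketch below compounds the quoted ~52%/year rate over 16 years and evaluates the execution-time equation for two machines. The function names and the example instruction count, CPI values and clock periods are illustrative assumptions, not figures from the course.

```python
def compound_growth(rate_per_year: float, years: int) -> float:
    """Total speedup from a constant annual improvement rate."""
    return (1.0 + rate_per_year) ** years


def execution_time(instr_count: float, cpi: float, clock_period_s: float) -> float:
    """Exe. time = Instr. count x CPI x Clock Period (in seconds)."""
    return instr_count * cpi * clock_period_s


if __name__ == "__main__":
    # ~52%/year for 16 years (1986-2002), as quoted on the slide.
    print(f"Compound speedup: {compound_growth(0.52, 16):.0f}X")

    # Hypothetical machines running the same program (assumed values).
    old = execution_time(1e9, cpi=2.0, clock_period_s=10e-9)  # 100 MHz clock
    new = execution_time(1e9, cpi=1.0, clock_period_s=1e-9)   # 1 GHz clock
    print(f"Speedup from CPI and clock changes: {old / new:.0f}X")
```

The compound figure lands close to the ~800X quoted, and the second calculation shows how a CPI reduction and a shorter clock period multiply together.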

Historic performance gains
• Technology scaling
     – 7 process generations
     – Scaling provides ~1.4x transistor improvement per generation
     – 10.5X in total
     – (careful, this doesn't automatically translate directly into
       performance gains)

(Figure reproduced with kind permission of Mark Horowitz)

Historic performance gains
• Gates per clock
     – Less logic between
       pipeline registers
     – Reduction from ~100 to
       10 gate delays
     – 10X
• How?
     – Pipelining
        • 5 to 20 stages (~4X)
     – Circuit-level advances
        • e.g. new logic families
        • ~2.5X
(Figure reproduced with kind permission of Mark Horowitz)

Historic performance gains


(Figure reproduced from “CMOS VLSI Design”, Weste/Harris (2005))
Historic performance gains
• IPC/instr. count
     – ~5-8X improvement
       in SPECint/MHz
     – This is despite clock
     – Includes advances
       in compiler
       technology and
       impact of increased
       bus widths
(Figure: improvement in SPECint95/MHz over time. Reproduced with kind
permission of Mark Horowitz)
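
The three sources of improvement listed so far (technology scaling, gates per clock, IPC) can be multiplied together to see whether they account for the overall gain. The sketch below does this under the assumption that the factors are independent and simply multiply, which the slides imply but do not state; the constant and function names are hypothetical.

```python
# Factors quoted on the preceding slides.
PROCESS_GENERATIONS = 7
GAIN_PER_GENERATION = 1.4    # transistor improvement per generation
PIPELINING_GAIN = 4.0        # 5 to 20 pipeline stages
CIRCUIT_GAIN = 2.5           # e.g. new logic families
IPC_GAIN_RANGE = (5.0, 8.0)  # improvement in SPECint/MHz


def technology_gain() -> float:
    """Gain from process scaling alone (about 10.5X)."""
    return GAIN_PER_GENERATION ** PROCESS_GENERATIONS


def gates_per_clock_gain() -> float:
    """Gain from fewer gate delays per pipeline stage (10X)."""
    return PIPELINING_GAIN * CIRCUIT_GAIN


def total_gain_range() -> tuple:
    """Low and high estimates of the combined improvement."""
    clock_gain = technology_gain() * gates_per_clock_gain()
    low_ipc, high_ipc = IPC_GAIN_RANGE
    return clock_gain * low_ipc, clock_gain * high_ipc


if __name__ == "__main__":
    low, high = total_gain_range()
    print(f"Technology scaling: {technology_gain():.1f}X")
    print(f"Gates per clock:    {gates_per_clock_gain():.1f}X")
    print(f"Combined estimate:  {low:.0f}X to {high:.0f}X")
```

Under this assumption the combined range brackets the ~800X improvement quoted earlier.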
Historic performance gains
• How was it possible to maintain and even
  decrease CPI (improve IPC)?
     – Moore's law!
     – How were the additional transistors exploited?

• Intel 386 to Pentium 4
     – 386: 275K transistors (die size = 43mm²)
     – P4: 42M transistors (die size = 217mm²)
           • 5X from increased die size
           • 27X from technology scaling
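
The split between die size and technology scaling can be recomputed from the transistor counts and die areas given above. This is a minimal sketch using only those four numbers; the names are hypothetical, and it assumes "technology scaling" here means the increase in transistors per unit area.

```python
# Figures quoted on the slide.
I386_TRANSISTORS = 275e3
I386_DIE_MM2 = 43.0
P4_TRANSISTORS = 42e6
P4_DIE_MM2 = 217.0


def transistor_ratio() -> float:
    """How many times more transistors the P4 has than the 386."""
    return P4_TRANSISTORS / I386_TRANSISTORS


def die_area_ratio() -> float:
    """Growth in die area from the 386 to the P4."""
    return P4_DIE_MM2 / I386_DIE_MM2


def density_ratio() -> float:
    """Growth in transistors per mm^2 (assumed to reflect process scaling)."""
    return transistor_ratio() / die_area_ratio()


if __name__ == "__main__":
    print(f"Transistor count: {transistor_ratio():.0f}X")
    print(f"Die area:         {die_area_ratio():.1f}X")
    print(f"Density:          {density_ratio():.0f}X")
```

The die-area ratio matches the 5X on the slide. The density ratio computed this way comes out nearer 30X than 27X, so the slide's factors should be read as approximate.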

Historic performance gains

(Figure reproduced from “CMOS VLSI Design”, Weste/Harris (2005))

Modern superscalar processors
• Revision (See Hennessy/Patterson)
     – Significant hardware support for Instruction Level
       Parallelism (ILP) in most commercial processors
           •   Multiple-issue architectures
           •   Deep pipelines, branch prediction, speculative execution
           •   Large on-chip caches (L1/L2/L3)
           •   Out-of-order execution, register renaming
           •   Dynamic memory address disambiguation
           •   SIMD instructions
           •   ...

Limits of superscalar processors
• Cost and complexity of extracting ILP
     – Diminishing returns
     – Increased complexity limits ability to optimise
           • The underlying fabrication technology characteristics
             are becoming more challenging too
     – Increases verification complexity and time
     – Increases time-to-market

Limits of superscalar processors
• Pipeline depth limits
     – Interruptions to the pipeline (branches)
     – Performance of the memory system
     – Clocking overheads (registers/clock skew)
     – Need to balance stages
     – Atomic operations
     – Power cost

(See wiki for discussion and papers)
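
Two of the limits above, clocking overheads and pipeline interruptions, already imply an optimum depth. The toy model below is an assumption for illustration only, not the model used in the course or the papers: the cycle time is the logic delay divided by the depth plus a fixed register/skew overhead, and each pipeline flush is pessimistically charged one cycle per stage. All parameter names and values are hypothetical.

```python
import math


def time_per_instruction(depth: int, t_logic: float, t_overhead: float,
                         flush_rate: float) -> float:
    """Average time per instruction for a pipeline of the given depth.

    t_logic:    total logic delay of the unpipelined datapath
    t_overhead: register and clock-skew overhead added to every stage
    flush_rate: pipeline flushes per instruction (e.g. mispredicted branches)
    """
    cycle_time = t_logic / depth + t_overhead
    cycles_per_instr = 1.0 + flush_rate * depth  # each flush costs ~depth cycles
    return cycle_time * cycles_per_instr


def best_depth(t_logic: float, t_overhead: float, flush_rate: float,
               max_depth: int = 100) -> int:
    """Depth (in whole stages) that minimises time per instruction."""
    return min(range(1, max_depth + 1),
               key=lambda d: time_per_instruction(d, t_logic, t_overhead, flush_rate))


def analytic_optimum(t_logic: float, t_overhead: float, flush_rate: float) -> float:
    """Closed-form minimum of the same model: sqrt(t_logic / (flush_rate * t_overhead))."""
    return math.sqrt(t_logic / (flush_rate * t_overhead))


if __name__ == "__main__":
    # Assumed values, in arbitrary gate-delay units.
    print(best_depth(t_logic=200.0, t_overhead=2.0, flush_rate=0.05))
    print(round(analytic_optimum(200.0, 2.0, 0.05), 1))
```

In this model, larger overheads or more frequent flushes both push the optimum towards shallower pipelines; memory-system and power effects are ignored.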

Limits of superscalar processors

"Coming challenges in microarchitecture and architecture", Ronen et al., 2001

Limits of superscalar processors
• Interconnect versus transistor scaling
     – Smaller transistors = faster/lower power
     – Wires don't scale in the same way ☹
     – Centralised structures don't scale well
     – Pressure to decentralise
     – Consider bypass network between FUs
           • Clustered implementations

Limits of superscalar processors
• Voltage scaling and power limits
     – Voltage scaling has slowed
           • 5V to 1V - gave us 25X power savings
           • 1V to 0.7V (limit at end of CMOS around 2020)
           • Only 2X power savings left from voltage scaling!
     – Sensible power limits already reached
     – Pressure to reduce power consumption
• Process variation complications
     – Fault tolerance requirements in the longer term
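
The voltage-scaling figures above are consistent with the standard CMOS dynamic power relation, in which power is proportional to C x V² x f. The sketch below assumes capacitance and frequency are held fixed so that only the V² term changes; the function name is hypothetical.

```python
def power_saving_from_voltage(v_old: float, v_new: float) -> float:
    """Factor by which dynamic power falls when the supply drops from v_old to v_new.

    Assumes dynamic power is proportional to V^2, with switched capacitance
    and clock frequency unchanged.
    """
    return (v_old / v_new) ** 2


if __name__ == "__main__":
    print(f"5V -> 1V:   {power_saving_from_voltage(5.0, 1.0):.0f}X")
    print(f"1V -> 0.7V: {power_saving_from_voltage(1.0, 0.7):.1f}X")
```

The first case reproduces the 25X saving already banked, and the second gives the roughly 2X that remains.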

Going parallel
• Accept we can make little progress with
  single-thread performance
• Look towards thread-level parallelism
     – Achieve our performance gains in a new way:
     – Rapidly increase the number of cores
           • 2X-3X per generation
     – Don't scale the clock frequency
           • Create simpler more power efficient cores instead

Going parallel

                                   Pawlowski (Intel)

Going parallel
• Going parallel is simple?
     – Replicate existing processor designs to ease
       design process
     – Many applications already exist where thread-level
       parallelism is plentiful
     – We've had 30+ years of experience writing parallel
       programs

Going parallel
• Many new challenges:
     – On-chip and off-chip communication
     – Simpler cores and Amdahl's law
     – Power constrained design
     – Support for the shared-memory paradigm?
     – Synchronization and thread-scheduling support?
     – Everyone must now write scalable and correct
       parallel programs!
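
The interaction between simpler cores and Amdahl's law can be illustrated numerically. The sketch below uses the usual form of Amdahl's law and adds one assumption of its own: every core, including the one running the serial part, has a relative performance `core_perf` compared with a single large core. The parameter names and example values are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int,
                   core_perf: float = 1.0) -> float:
    """Speedup over one baseline core (performance 1.0).

    parallel_fraction: share of the work that can run in parallel (0 to 1)
    cores:             number of cores available
    core_perf:         performance of each core relative to the baseline
    """
    serial_time = (1.0 - parallel_fraction) / core_perf
    parallel_time = parallel_fraction / (cores * core_perf)
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    # 90% parallel code: 16 baseline cores versus 16 half-speed simpler cores.
    print(f"{amdahl_speedup(0.9, 16):.1f}X")
    print(f"{amdahl_speedup(0.9, 16, core_perf=0.5):.1f}X")
```

With simpler cores the serial fraction runs more slowly too, so it dominates the run time sooner than it would on a large core.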

Going parallel
• Power is a first order design constraint
     – Power consumption is already at a sensible limit
       (for many applications we would like to reduce it)
     – We are going to increase the number of cores by
       2-3X per generation
           • Power savings?
                 –   Core shrink (<1.4X)
                 –   Simpler cores (1.4-2X?)
                 –   Some VDD savings
                 –   Need to add “uncore” logic too!
                 –   Techniques for adaptive EPI?
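
A back-of-the-envelope version of this power budget is sketched below. It assumes total power scales with the number of cores and that the savings from the core shrink and from simpler cores multiply; VDD savings, "uncore" logic and adaptive EPI are left out. The function name and the example combinations are hypothetical.

```python
def relative_total_power(core_growth: float, shrink_saving: float,
                         simpler_core_saving: float) -> float:
    """Total chip power after one generation, relative to the previous one.

    core_growth:         factor by which the core count increases (2-3X)
    shrink_saving:       per-core power saving from the shrink (<1.4X)
    simpler_core_saving: per-core power saving from simpler cores (1.4-2X?)
    """
    return core_growth / (shrink_saving * simpler_core_saving)


if __name__ == "__main__":
    # Optimistic case: 2X cores, best-case savings from the slide.
    print(f"{relative_total_power(2.0, 1.4, 2.0):.2f}")
    # Pessimistic case: 3X cores, smaller saving from simpler cores.
    print(f"{relative_total_power(3.0, 1.4, 1.4):.2f}")
```

A result above 1.0 means power grows from one generation to the next, so the remaining techniques in the list would have to make up the difference.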

Going parallel
• Homogeneous multicore?
     – Power consumption may become the ultimate
       limiting factor in the design of multicore processors
     – This may require us to exploit greater numbers of
       specialized accelerators
           • An ASIC implementation of an algorithm may be 10-1000X
             more energy efficient than a software implementation

This course
• Introduction to the challenges of building and
  programming chip multiprocessors
     – Lots to learn from traditional parallel computers,
       but many problems and trade-offs are new
           • New applications
           • The trade-offs on-chip are very different to those when
             designing physically larger parallel machines
           • Power and energy constraints
           • Parallel programming for the masses

This course
1. Trends in microprocessor architecture
2. Introduction to parallel computing
3. Parallel algorithms
4. Chip Multiprocessors (I)
5. Chip Multiprocessors (II)
6. Transactional memory
7. On-chip interconnection networks
8. Manycore research issues
     – Guest lecture
