Intel Itanium (PowerPoint) by haseeb09


Intel Itanium Architecture (64-bit)
                         Overview

   Why develop?
     RISC processing limit of one instruction per cycle,
      predicted by HP in 1989
     Led to HP's development of EPIC (Explicitly Parallel
      Instruction Computing)
           Uses a form of VLIW (Very Long Instruction Word)
     In 1994, HP partnered with Intel to develop a new
      architecture based on EPIC
     IA-64 is born
                           Versions

   Merced
       Codename of the first Intel/HP joint IA-64 chip
       Development problems
            Transistor numbers
            Teams had different priorities
            Unanticipated research
   Itanium
       Official name of Merced
       Released 2001
       Performance was lacking due to development delays
            Nicknamed the "Itanic"
            RISC and CISC performance had increased in the
             meantime thanks to superscalar architectures
                       Versions

   Itanium 2
       Released 2002
       Codenamed McKinley
       Improved on Itanium design
       Outperformed comparable RISC and CISC
        processors
   Madison
       Released 2003
       Basis for all future versions until 2006
                Competing Chips

   UltraSPARC (Scalable Processor Architecture)
       Developed by Sun Microsystems
       RISC Architecture
   SPARC64
       Developed by Fujitsu
       RISC Architecture
   POWER6 (Performance Optimization With
    Enhanced RISC)
       Developed by IBM
       RISC Architecture
                Competing Chips

   Opteron
       Developed by AMD
       x86 Architecture
   Xeon
       Developed by Intel
       x86 Architecture
Intel Itanium Architecture
                Chip Layout

   [Figure: Itanium Architecture Diagram]
               Itanium Specs

   4 Integer ALUs
   4 Multimedia ALUs
   2 Extended-Precision FP Units
   2 Single-Precision FP Units
   2 Load or Store Units
   3 Branch Units
   10-Stage, 6-Wide Pipeline
   32 KB L1 Cache
   96 KB L2 Cache
   4 MB L3 Cache (external)
   800 MHz Clock
                  Itanium Specs

   180 nm Process
   System Bus Bandwidth 2.1 GB/s
       266 MHz
       64 Bits Wide
                 Itanium 2 Specs

   6 Integer ALUs
   6 Multimedia ALUs
   2 Extended-Precision FP Units
   2 Single-Precision FP Units
   2 Load and 2 Store Units
   3 Branch Units
   8-Stage, 6-Wide Pipeline
   32 KB L1 Cache
   256 KB L2 Cache
   3 MB L3 Cache (on die)
   1 GHz Clock initially
       Up to 1.66 GHz on Montvale
                 Itanium 2 Specs

   180 nm Process
       Shrunk to 130 nm in 2003
       Shrunk further to 90 nm in 2006
   System Bus Bandwidth 6.4 GB/s
       400 MHz
       128 Bits Wide
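
The bus figures on the spec slides follow from clock rate times bus width. The sketch below reproduces that arithmetic; the function name `bus_bandwidth_gb_per_s` is hypothetical, and it assumes the quoted MHz values are effective transfer rates (one transfer of the full bus width per listed cycle).

```python
def bus_bandwidth_gb_per_s(clock_mhz, width_bits):
    """Peak bandwidth in GB/s, assuming one full-width transfer per cycle."""
    bytes_per_transfer = width_bits / 8
    return clock_mhz * 1e6 * bytes_per_transfer / 1e9

# Itanium: 266 MHz x 64 bits, quoted on the slide as 2.1 GB/s
itanium = bus_bandwidth_gb_per_s(266, 64)

# Itanium 2: 400 MHz x 128 bits, quoted on the slide as 6.4 GB/s
itanium2 = bus_bandwidth_gb_per_s(400, 128)

print(f"Itanium:   {itanium:.2f} GB/s")
print(f"Itanium 2: {itanium2:.2f} GB/s")
```

Under this assumption the Itanium 2 bus delivers roughly three times the peak bandwidth of the original Itanium bus.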
           Itanium 2 Improvements

   Initially a 180 nm process
       Shrunk to 130 nm in 2003
       Shrunk further to 90 nm in 2006
   Improved Thermal Management
   Clock Speed increased from 800 MHz to 1.0 GHz
   Bus Speed increased from 266 MHz to 400 MHz
   L3 cache moved on die
       Faster access
            IA-64 Pipeline Features

   Branch Prediction
     Predicate registers allow individual instructions to
      be turned on or off, so short branches can be removed
     Compiler can provide branch prediction hints
   Register Rotation
       Supports software-pipelined loops, so loop
        iterations overlap in parallel
   Predication Controls Pipeline Stages
Instruction Set Architecture
                 Registers

 128 Integer Registers
 128 Floating Point Registers
 64 One-Bit Predicates
 8 Branch Registers
                Overview

 RISC architectures approaching processing limit of
  1 instruction per clock cycle
 Explicitly Parallel Instruction Computing (EPIC)
  allows multiple instructions in one cycle
 Implements a form of Very Long Instruction Word
  (VLIW)
 Compiler determines in advance which instructions
  can be executed in parallel
                      VLIW

   Normally, the hardware finds parallelism by checking
    for interdependencies between instructions at run
    time, then resolving them
   This comes at the cost of hardware complexity
   With VLIW, determining which operations can
    execute in parallel is done by the compiler
   Extra scheduling hardware not needed
   Result is less hardware complexity, but greater
    compiler complexity
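
The compiler-side scheduling described on this slide can be sketched as follows. This is a simplified illustration, not how a real IA-64 compiler works: the `(dest, sources)` instruction tuples, the bundle width of 3, and the names `depends` and `schedule` are all assumptions made for the example, and the dependency check is deliberately conservative.

```python
def depends(later, earlier):
    """True if `later` may not share a bundle with `earlier` (conservative)."""
    l_dest, l_srcs = later
    e_dest, e_srcs = earlier
    return (e_dest in l_srcs      # read-after-write
            or l_dest == e_dest   # write-after-write
            or l_dest in e_srcs)  # write-after-read

def schedule(instrs, bundle_width=3):
    """Greedy, in-order packing of instructions into parallel bundles."""
    bundles = [[]]
    for ins in instrs:
        current = bundles[-1]
        if len(current) == bundle_width or any(depends(ins, e) for e in current):
            bundles.append([ins])   # start a new bundle
        else:
            current.append(ins)     # issue alongside the current bundle
    return bundles

# Hypothetical program: (destination register, source registers)
program = [
    ("r1", ("r10",)),
    ("r2", ("r11",)),
    ("r3", ("r1", "r2")),   # needs r1 and r2, so it starts a new bundle
    ("r4", ("r12",)),
]
bundles = schedule(program)
for i, bundle in enumerate(bundles):
    print(i, bundle)
```

All of the dependency checking happens before the program runs, which is the trade described above: simpler hardware in exchange for a more complex compiler.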
            Processor Units

 The processor has 30 functional units in 11 groups
 Each unit can execute a particular subset of the
  instruction set
 Common instructions can be executed by multiple
  units
         Processor Units – cont.
   6 general-purpose ALUs, 2 integer units, 1 shift
    unit
   4 data cache units
   6 multimedia units, 2 parallel shift units, 1 parallel
    multiply, 1 population count
   2 floating-point multiply-accumulate units, 2
    "miscellaneous" floating-point units
   3 branch units
      Processor Units – cont.

 Some of the units are designed for specific tasks,
  to improve performance
 For instance, the floating-point multiply-accumulate
  unit executes a multiply followed by an add as a
  single instruction
 This pattern is very common in scientific processing
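
A dot product shows why multiply-accumulate is so common in scientific code: every loop iteration is one multiply followed by one add. The sketch below only models the operation; `fma` and `dot` are hypothetical helper names, and unlike a hardware fused multiply-add, this Python version rounds after the multiply and again after the add.

```python
def fma(a, b, c):
    """Model of a multiply-accumulate: computes a * b + c as one operation."""
    return a * b + c

def dot(xs, ys):
    """Dot product written as a chain of multiply-accumulate steps."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = fma(x, y, acc)   # one multiply-accumulate per element pair
    return acc

result = dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(result)
```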
            Branch Predication

   Both paths of a branch are executed
   Results of the correct path are kept, the others
    discarded
   Almost every instruction in the IA-64 instruction set
    is predicated (qp field)
   Predicates are stored in special one-bit registers
   One of these registers (p0) is always true, so
    unpredicated instructions always execute
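
The idea can be illustrated by rewriting a branch as two predicated operations. This is a conceptual sketch only: `commit`, `abs_diff_branchy`, and `abs_diff_predicated` are hypothetical names, and `commit` stands in for an instruction whose result is kept only when its qualifying predicate is true.

```python
def abs_diff_branchy(a, b):
    """Conventional version: control flow picks one path."""
    if a > b:
        return a - b
    return b - a

def commit(qp, old, new):
    """Model of a predicated instruction: keep `new` only if qp is true."""
    return new if qp else old

def abs_diff_predicated(a, b):
    """Predicated version: both paths are computed, one result is kept."""
    p1 = a > b      # the compare sets a pair of complementary predicates
    p2 = not p1
    r = 0
    r = commit(p1, r, a - b)   # (p1) r = a - b
    r = commit(p2, r, b - a)   # (p2) r = b - a
    return r

for a, b in [(7, 3), (3, 7), (5, 5)]:
    print(a, b, abs_diff_predicated(a, b))
```

Both subtractions are always computed, but no branch is needed, so there is nothing for the branch predictor to mispredict.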
           Register Renaming

   Sometimes instructions share the same
    register name, but do not depend on each
    other
   This makes it impossible to run the instructions
    in parallel
   In this case, a special technique can be used
    to rename the conflicting registers
   This is also performed by the compiler
     Register Renaming - Example

1.   lw $1, 1024
2.   addi $1, $1, 2
3.   sw $1, 1032
4.   lw $1, 2048
5.   addi $1, $1, 4
6.   sw $1, 2056

• Instructions 4, 5, and 6 are independent of 1, 2,
  and 3, but the processor cannot execute 4 until 3
  has read $1; otherwise 3 would store the wrong value
     Register Renaming - Example

1.   lw $1, 1024
2.   addi $1, $1, 2
3.   sw $1, 1032
4.   lw $2, 2048
5.   addi $2, $2, 4
6.   sw $2, 2056

• Now instructions 4, 5, and 6 can be executed in
  parallel with 1, 2, and 3.
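
A renaming pass of this kind can be sketched in a few lines. This is a minimal illustration, not the method an IA-64 compiler uses: the `(op, dest, sources)` tuples and the names `rename` and `regs` are assumptions, memory addresses are omitted, and every write gets a fresh register (`t0`, `t1`, ...), which is more aggressive than the slide's single rename to `$2` but removes the same false dependency.

```python
from itertools import count

def rename(instrs):
    """Give every register write a fresh name; reads use the latest name."""
    fresh = (f"t{i}" for i in count())
    latest = {}                       # original register -> current name
    out = []
    for op, dest, srcs in instrs:
        # Map sources first, so an instruction reads the value it had before
        # its own write takes effect.
        new_srcs = tuple(latest.get(s, s) for s in srcs)
        new_dest = None
        if dest is not None:
            new_dest = next(fresh)
            latest[dest] = new_dest
        out.append((op, new_dest, new_srcs))
    return out

def regs(instrs):
    """All registers read or written by a list of instructions."""
    used = set()
    for _, dest, srcs in instrs:
        used.update(srcs)
        if dest is not None:
            used.add(dest)
    return used

# The example from the slide; stores have no destination register.
program = [
    ("lw",   "$1", ()),
    ("addi", "$1", ("$1",)),
    ("sw",   None, ("$1",)),
    ("lw",   "$1", ()),
    ("addi", "$1", ("$1",)),
    ("sw",   None, ("$1",)),
]
renamed = rename(program)
for ins in renamed:
    print(ins)
```

After renaming, instructions 1 to 3 and instructions 4 to 6 touch disjoint sets of registers, so the two groups can be scheduled in parallel.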
Intel Itanium Architecture
                           Conclusions

   Several differences from MIPS, for example:
       Larger instruction sizes
       Deeper pipeline
            8 or more stages for IA-64
            5 stages for MIPS
       Larger instruction set
   Pros
       Very Fast FP Units
       Very useful for companies operating large servers
       Supercomputing
            Thunder (LLNL)
                 2nd-fastest supercomputer in the world
                  (TOP500, June 2004)
                 19.94 TFLOPS
                       Conclusions

   Cons
       Very costly
            < $4000 per chip
     Requires very smart compilers, which are very hard to
      develop
     GCC code generation for IA-64 still has bugs
            Fails to compile at times
            May be fixed when new optimizations are introduced
             into mainline GCC
   OVERALL
       Great processor for high end servers
       Not useful for the average user
                           Conclusions

   Future Work
       Tukwila
            65 nm process
            30 MB of on-die cache
            Itanium bus replaced with Intel QuickPath Interconnect
                 Faster data transfer rates
            4 cores
       Poulson
            Will use 32 nm process
            More cores, more parallelism
            Not much known as of yet
       Kittson
            Codename for newest IA-64 project
                 Not much else known, Stay tuned for more!
  Q&A




Questions?
      Thank You




Thanks for listening!

								