					    Computer Architecture and Compilers

                        Greg Steffan


                      ECE Department
                    University of Toronto




Research Overview            1              Steffan
   Trend 1: Range of a Wire in One Clock Cycle

[Figure: range of a wire in one clock cycle, annotated "We are here!"]

Having one big processor will soon be infeasible.
This motivates distributed processing on a chip.

            Some Current Chip Multiprocessors

[Figure: block diagrams of the IBM Cell, AMD Opteron, Intel Yonah, and IBM Power4, built from CPU and PE blocks]

Improved throughput is straightforward.
How can these run one program faster?

          The Dream: Automatic Parallelization

User writes a sequential program -> compiler and runtime system parallelize it -> multiple CPUs execute the program

Easily exploit chip multiprocessors
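
To make the three-stage flow above concrete, here is a minimal hand-written sketch of the transformation an automatic parallelizer would have to perform. Everything in it is an assumption for illustration: `work`, `run_sequential`, and `run_parallel` are hypothetical names, the loop body is invented, and the chunking is done by hand rather than by a compiler or runtime system. It also assumes every iteration is independent, which is exactly the property a real system would need to prove or speculate on. In CPython, threads will not speed up this CPU-bound loop; the sketch only shows the shape of the transformation.

```python
from concurrent.futures import ThreadPoolExecutor


def work(x):
    """Hypothetical loop body; each iteration is independent of the others."""
    return x * x + 1


def run_sequential(data):
    """What the user writes: one plain sequential loop."""
    return [work(x) for x in data]


def run_parallel(data, num_workers=4):
    """Hand-made stand-in for the parallelized version: the same
    iterations, split into contiguous chunks and handed to workers."""
    chunk = max(1, (len(data) + num_workers - 1) // num_workers)
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() yields results in chunk order, preserving sequential order.
        partial = list(pool.map(run_sequential, chunks))
    return [y for part in partial for y in part]
```

The parallel version must return exactly what the sequential one does; preserving that equivalence without asking the programmer for help is the hard part that the rest of the talk refers to.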
                     My Research (Part 1)




• Making parallelization easier
   – For desktop apps, scientific simulations, databases…

• New chip multiprocessor architectures
• New compiler technologies

            towards automatic parallelization
                     Trend 2: Power Density

[Figure: power density in Watts/cm² (log scale, 1 to 1000) versus process technology (1.5 µm down to 0.07 µm) for the i386, i486, Pentium®, Pentium® Pro, Pentium® II, Pentium® III, and Pentium® 4, with reference levels for a hot plate, a nuclear reactor, and a rocket nozzle]

      The Dream: Doing More with Less Power


                    cell phone + PDA + MP3 player + digital camera + TV + ?




   Ultra-Efficient Systems: Chameleon Computing
   • Hardware that can be reconfigured
         – To exactly match application requirements

   • Design custom systems automatically
         – To enable really fast time-to-market

                     My Research (Part 2)

[Figure: an FPGA alongside a processor datapath (PC, instruction memory, register array, extend units, ALU, zero test, data memory)]

• Understanding “soft processor” architecture
     – A processor built out of an FPGA’s reconfigurable logic
     – Revisit processor architecture in this new context

• “Soft Systems” that target FPGAs
     – New processors, compilers, OSs, platforms, applications
 Our Soft Processors vs Altera Nios II Variants

[Figure: average wall clock time (us, 1000 to 9000; lower is faster) versus area (equivalent LEs, 500 to 1900; lower is smaller) for the SPREE processors and the Altera Nios II/e, Nios II/s, and Nios II/f]

Competitive and can dominate (9% smaller, 11% faster)
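
One way to read "dominate" in an area-versus-time plot is Pareto dominance: a design dominates another if it is no worse on both axes and strictly better on at least one. The sketch below is only an illustration of that reading. The `Design` type, the `dominates` function, and the baseline numbers are all hypothetical, not measurements from the talk, and "11% faster" is assumed here to mean 11% lower wall clock time.

```python
from typing import NamedTuple


class Design(NamedTuple):
    name: str
    area_les: float  # area in equivalent LEs (smaller is better)
    time_us: float   # average wall clock time in us (smaller is better)


def dominates(a, b):
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.area_les <= b.area_les and a.time_us <= b.time_us
    better = a.area_les < b.area_les or a.time_us < b.time_us
    return no_worse and better


# Hypothetical baseline; the numbers are invented for illustration only.
baseline = Design("baseline (hypothetical)", 1000.0, 4000.0)

# Candidate that is 9% smaller and, by assumption, takes 11% less time.
candidate = Design(
    "candidate (hypothetical)",
    baseline.area_les * 0.91,
    baseline.time_us * 0.89,
)
```

With these invented numbers, `dominates(candidate, baseline)` is true and the reverse is false, which is the relationship the slide's summary line describes.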
                    The PaCRaT Team
        Parallelization and Customization
             Research at U of Toronto



