					              Computing Media and
                 Languages for
           Space-Oriented Computations

                       (opening remarks)




                                           1
DeHon Dagstuhl 06361
            Big Idea: Matter Computes
       •      Our physical world implements
              computations
       •      Double implication
            1. Computing landscape determined by
               laws of physical world
            2. Understand our physical world in terms of
               the computation it performs
                 •     Control our physical world by programming
                       the computation it performs.
                                                                   2
            Convergence of Concerns
       • Dealing with space as a physical issue
         when implementing modern computing
         components and systems
            – DSM IC, sublithographic effects
       • Realizing shapes/behaviors/properties
         using computations
            – Distributed robotics, programmable matter
       • Programming physical systems
            – Self-assembly, protein networks, …
                                                          3
                         Viewpoint

       • Traditional/mainstream
            – Abstractions, models, algorithms,
              languages
       • Have not adequately dealt with spatial
         issues
            – Either as optimization
            – Or as computational goal
       • Now have several communities
         approaching this from different
         perspectives                             4
                       Week Outline




                                      5
                        Monday
         9:00am Opening Remarks
         9:15am DeHon – Spatial Compute
        11:15am Coore – Amorphous Computing
        12:15pm LUNCH
         1:30pm Goldstein – Programmable Matter
         2:30pm Gruau – Blob Computing
         3:30pm Coffee/Cake
         4:00pm Giavitto – Data Structures as Space
                                                     6
        Challenges and Opportunities for
               Spatial Computing
                         André DeHon
                       <andre@acm.org>


                                         7
                             Message
       • Opportunity
            –   Large and capable computing systems
            –   Continued scaling, primarily in spatial capacity
            –   Performance capabilities from parallelism
            –   Dynamically (re)programmable/adaptive
       • Spatial Challenges
            – Distance=Delay
            – Communications take up space and energy
       • Demands
            – New models/abstractions/algorithms

                                                                    8
            Convergence of Concerns
       • Dealing with space as a physical issue
         when implementing modern computing
         components and systems
            – DSM IC, sublithographic effects
       • Realizing shapes/behaviors/properties
         using computations
            – Distributed robotics, programmable matter
       • Programming physical systems
            – Self-assembly, protein networks, …
                                                          9
                             Outline
     •   Scaling
     •   Spatial vs. Temporal Computation
     •   Ground spatial examples: FPGAs, nanoPLA
     •   Spatial Challenges:
          – Scaling, Interconnect Delay and Requirements
          – Defects, faults, lifetime effects
     • Opportunities
          – Capacity, parallelism, scaling, adaptation
     • Why not: C, VHDL…
     • Design Patterns
     • System Architectures
                                                           10
                       Capacity




                                  11
            Capacity Scaling from Intel




                                          12
          Area Perspective




                             13
      Spatial Capacity Scaling Continues

       • $T\lambda^2$ ($10^{12}\,\lambda^2$) by end of Silicon roadmap
            – Another decade
            – Over Billion gates
       • Molecular-scale promises
            – Two orders of magnitude more than that
       • All 2D; still have the third dimension to
         exploit
            – [paper at NanoNets2006 in 2 weeks]
                                                       14
                        Implication
       • Qualitatively not in the same world we
         were in 1945–1985
       • Orders of magnitude shift in resources
            – Suggest dramatic changes in strategy
       • We have been on exponential curve
            – But…up to about 1990
            – Shrinking same kind of computers down to
              one chip

                                                     15
                       Temporal vs. Spatial




                                              16
                            Example
       • Compute:
                       $Y = Ax^2 + Bx + C$




                                      17
              Temporal Implementation
       • Single Operator
       • Reuse in time
       • Store instructions
       • Store intermediates
       • Communication
         across time
       • One cycle per
         operation                      18
                 Spatial Implementation
       •   One operator for every operation
       •   Instruction per operator
       •   Communication in space
       •   Computation in single cycle
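A minimal Python sketch contrasting the two organizations for Y = Ax² + Bx + C. The five-instruction program (`PROGRAM`) and the register-file dictionary are hypothetical illustrations: `temporal` reuses one operator and spends one cycle per instruction, while `spatial` gives every operation its own operator and reports how many operators and how many levels of logic the single (longer) cycle must cover.

```python
from operator import add, mul

# Hypothetical instruction list for Y = A*x^2 + B*x + C:
# each entry is (destination, operator, source1, source2).
PROGRAM = [
    ("t1", mul, "x", "x"),    # x^2
    ("t2", mul, "A", "t1"),   # A*x^2
    ("t3", mul, "B", "x"),    # B*x
    ("t4", add, "t2", "t3"),  # A*x^2 + B*x
    ("Y", add, "t4", "C"),    # ... + C
]


def temporal(inputs):
    """One shared operator reused in time; instructions and
    intermediates live in memory (the regs dict)."""
    regs = dict(inputs)
    cycles = 0
    for dest, op, src1, src2 in PROGRAM:
        regs[dest] = op(regs[src1], regs[src2])  # one operation per cycle
        cycles += 1
    return regs["Y"], cycles


def spatial(inputs):
    """One operator per operation, wired together in space.
    Returns the result, the operator count, and the logic depth
    (operators on the longest input-to-output path)."""
    values = dict(inputs)
    depth = {name: 0 for name in inputs}
    for dest, op, src1, src2 in PROGRAM:  # listed in dataflow order
        values[dest] = op(values[src1], values[src2])
        depth[dest] = 1 + max(depth[src1], depth[src2])
    return values["Y"], len(PROGRAM), depth["Y"]


example = {"x": 2, "A": 3, "B": 4, "C": 5}
print(temporal(example))  # (25, 5): five cycles on one operator
print(spatial(example))   # (25, 5, 4): five operators, four levels deep
```

The spatial version pays in area (five operators instead of one) and produces the result in one pass whose delay is set by the four-operator critical path; the temporal version pays in time instead.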




                                              19
              Conventional Processors
     • Have temporal organization
     • Large instruction and data
       memory per active
       processing element
     • Economize on instruction memory with
       word-wide SIMD organization
                     [Figure: one instruction shared by a w-bit-wide row of operators (op)]
                                                20
           Conventional Processors

     • Economize on Area
     • Pack Large computation
     • Into small area
     • By storing description of computation
       compactly
     • Reusing small number of processing elements
       in time
     • Trade time for area
     • Absolutely the right thing for 1985 Silicon
          – (and pre-integrated circuits)
                                                 21
                       Early Challenge
       • How do I make my large program fit on
         an economical computer?
            – Can compute with 10K vacuum tubes?
            – Fit in caches that hold 100 instructions
            – 64K address space


       • Heavy sequentialization was a good
         engineering solution…for 1945–1990
                                                    22
         Field-Programmable Gate Array
                     (FPGA)

    k-LUT (typically k=4)
     Compute block
       w/ optional
    output Flip-Flop




   LUT = Look Up Table                   23
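A minimal Python sketch of a k-input look-up table, assuming the common convention that the configuration is a 2^k-bit truth table indexed by the input bits (the bit ordering and the helper name `truth_table` are illustrative assumptions). The optional output flip-flop is left out. Reloading the table changes the function the same block computes.

```python
def truth_table(fn, k):
    """Pack fn's outputs into a 2**k-bit integer: bit i holds fn(inputs)
    where input j is bit j of i (an assumed ordering)."""
    table = 0
    for i in range(1 << k):
        bits = [(i >> j) & 1 for j in range(k)]
        table |= (fn(*bits) & 1) << i
    return table


class LUT:
    """A k-input look-up table configured by a 2**k-bit truth table."""

    def __init__(self, k, table):
        if not 0 <= table < (1 << (1 << k)):
            raise ValueError("table must fit in 2**k bits")
        self.k = k
        self.table = table

    def __call__(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        index = sum((bit & 1) << j for j, bit in enumerate(inputs))
        return (self.table >> index) & 1  # the "look up"


# Reprogramming the same 4-LUT changes the function it computes.
and4 = LUT(4, truth_table(lambda a, b, c, d: a & b & c & d, 4))
xor4 = LUT(4, truth_table(lambda a, b, c, d: a ^ b ^ c ^ d, 4))
print(hex(and4.table), hex(xor4.table))  # 0x8000 0x6996
print(and4(1, 1, 1, 1), xor4(1, 0, 1, 1))  # 1 1
```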
        Field-Programmable Gate Arrays

       • Have spatial computing organization
       • Small instruction area per active
         operator
            – pack more computation on die

       • Bit-level control
            – use more of available ops


                                               24
        Field-Programmable Gate Arrays

       • Put more area into computing
       • Have more compute elements per die
       • Support more computation per cycle
        • Trade area for time
        • With more capacity
             – More applications fit
               spatially
        • More appropriate 2000+              25
                       Component Example
       • Single die in 0.35 μm
              Device          Compute elements    Cycle     Bit Ops/ns
              XC4085XL-09     3,136 CLBs          4.6 ns    682
              Alpha (1996)    2 × 64-bit ALUs     2.3 ns    55.7
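The bit-ops/ns figures follow from dividing bit-level operators by cycle time. A quick Python check, under the assumption (inferred from the numbers, not stated on the slide) that each XC4085XL CLB is counted as one bit operation per cycle:

```python
# Bit-level operations completed per nanosecond = bit operators / cycle time.
# Assumption: one bit op per CLB per cycle, which reproduces the 682 figure.
fpga_bitops_per_ns = 3136 / 4.6        # 3,136 CLBs, 4.6 ns cycle
alpha_bitops_per_ns = (2 * 64) / 2.3   # two 64-bit ALUs, 2.3 ns cycle
ratio = fpga_bitops_per_ns / alpha_bitops_per_ns

print(round(fpga_bitops_per_ns))       # 682
print(round(alpha_bitops_per_ns, 1))   # 55.7
print(round(ratio, 2))                 # 12.25
```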




                                                    26
                      Empirical
                Raw Density Comparison
    [Figure: computational density (ALU bit ops/λ²s) vs. time]
                                                  27
                       Spatial Computing
       • Enabled by high capacity
       • Has a density advantage
       • Now have sufficient capacity to hold
         large range of interesting problems
            – 100,000 bit-level operators on a single chip
            – More on the way
       • Can exploit the kinds of capacities now
         becoming available
                                                         28
                Spatially Programmable




                                  FPGA




                                         29
                       Ground Examples




                                         30
                       Today’s FPGAs
  • 100,000s of LUTs
  • Embedded blocks
       – Many small distributed
         memories
       – Megabits of memory
       – Data rates ~10Tb/s
            • 10–100× over μP
  • Operate 100s of MHz
  • Easily scale up spatially
       – Step and repeat               31
         Simple Nanowire-Based PLA




    NOR-NOR = AND-OR PLA Logic
                                 FPGA 2004   32
                       Tile into Arrays




                                      FPGA 2005   33
                       nanoPLA Capacity
   • 10 μm × 5 μm subarrays
   • Millions on single-layer
     modest die
   • 100 Product Terms per
     subarray
   • Include memory
     blocks
   • Stack in 3D

                                          34
                       Interconnect




                                      35
                 Interconnect Challenge
       • With 100,000 processing elements
         cooperating on a task
            – (can get today with FPGAs)
       • Must communicate
       • Interconnect becomes dominant
            – Area, delay, energy
       • Replaces memory for communications
            – Less heavily studied
                                              36
          Motivating Example: Memories
             (Memory from mux bits)




                                         37
                       Large Memories
       • Build larger memory
            – Simple model: multiplex together more
              cells




                                                      38
                   Delay vs. Memory (1)
  • How does delay grow with memory size (N) ?
        $T_{mem} = T_{decode} + T_{cell} + T_{mux}$
        $T_{mux} = \log(N) \times T_{mux4}$




                                          39
                   Delay vs. Memory (2)
    • $T_{mem} = O(\log(N))$
    • Does this make sense for large N?
         – Speed of light?
    • $T_{mem} = T_{logic} + T_{wire}$
    • 2D memory:
          $T_{wire} = O(\sqrt{N})$
    • $T_{mem} = C_1 \log(N) + C_2 \sqrt{N}$
                                          40
                       Chips >> Cycles
       •   Chips growing
       •   Gate delays shrinking
       •   Wire delays aren’t scaling down
       •   Will take many cycles to cross chip




                                                 41
                       Clock Cycle Radius
• Radius of logic reachable in one cycle (45 nm)
    – Radius of 10 PEs
         • A few hundred PEs
    – Chip side of 600–700 PEs
         • 400–500 thousand PEs
    – 100s of cycles to cross



                                            42
            Communication Expensive
       • What if we just built a crossbar?
            – Interconnect area scales as $N^2$
       • Must exploit typical locality in design to
         reduce area
            – Rent’s Rule: $IO = cN^p$
                 • $0.5 \le p \le 0.75$ typical
            – How well can we engineer low p?
                 • Where does this show up in algorithm/computation design?
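A Python sketch of Rent's Rule, IO = cN^p, for the 100,000-element case mentioned earlier, compared with the N² crosspoints of a full crossbar. The Rent constant `C = 1.0` is a hypothetical value; only the trend with p matters here.

```python
def rent_io(n, c, p):
    """Rent's Rule: IO = c * N**p for a region holding N elements."""
    return c * n ** p


N = 100_000   # processing elements, as on the earlier slide
C = 1.0       # hypothetical Rent constant
for p in (0.5, 0.75):
    io = rent_io(N, C, p)
    print(f"p={p}: IO ~ {io:,.0f}; doubling N scales IO by {2 ** p:.2f}")
print(f"crossbar crosspoints: {N ** 2:,}")
```

Lowering p from 0.75 to 0.5 cuts the boundary IO by more than an order of magnitude at this size, which is why engineering locality into the design pays off.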
                                                              43
                        Optimizing
       • Must exploit physical locality (placement)
            – Reduce wire requirement (reduce p)
            – Reduce distance traveled over wires
       • Gives new meaning to spatial locality
       • Interconnect must show up in our design
            – Run-time management
            – Algorithms
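A minimal Python sketch of why placement matters, assuming two-point nets and Manhattan distance on a grid as a stand-in for wire length (both simplifications). The same 16-stage pipeline is placed once along a snake path that keeps communicating stages adjacent, and once in shuffled positions.

```python
import random


def wirelength(placement, nets):
    """Total Manhattan length of two-point nets.
    placement: {block: (x, y)}; nets: [(block_a, block_b), ...]."""
    total = 0
    for a, b in nets:
        (xa, ya), (xb, yb) = placement[a], placement[b]
        total += abs(xa - xb) + abs(ya - yb)
    return total


# A hypothetical 16-stage pipeline: stage i feeds stage i+1.
blocks = list(range(16))
nets = [(i, i + 1) for i in range(15)]
grid = [(x, y) for y in range(4) for x in range(4)]

# Locality-aware placement: snake through the 4x4 grid.
snake = [(x if y % 2 == 0 else 3 - x, y) for y in range(4) for x in range(4)]
local = dict(zip(blocks, snake))

# Locality-blind placement: the same positions, shuffled.
rng = random.Random(0)
shuffled = grid[:]
rng.shuffle(shuffled)
scattered = dict(zip(blocks, shuffled))

print(wirelength(local, nets), wirelength(scattered, nets))
```

The snake placement achieves the minimum (one grid step per net); any other assignment can only be equal or longer.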

                                                    44
           Clock Cycle Scaling has Ended

   • Up to ~2000, the clock cycle scaled down
        – Architecture scaling: fewer gates/clock
        – Now down to ~10 gates/clock
   • Energy-limited computation
        – Could run a few devices faster…but not all
          of them
   • Variation at nanoscale leads to
     diminishing clock frequencies
   • Future scaling is spatial
     [Figure: probability distribution of delay, old vs. new]


                                                                 45
        Atomic-Scale Physical Effects
       • As our devices approach the atomic
         scale, we must deal with statistical
         effects governing the placement and
         behavior of individual atoms and
         electrons.




                                                46
          Three Atomic-Scale “Problems”
       • Defects: Manufacturing imperfection
            – Occur before operation; persistent
                 • Shorts, breaks, bad contact
       • Faults:
            – Occur during operation; transient
                 • node X value flips: crosstalk, ionizing particles, bad
                   timing, tunneling, thermal noise
       • Operational/lifetime defects:
            – Parts become bad during operational lifetime
                 • Fatigue, electromigration, burnout….
            – …slower
                 • NBTI, Hot Carrier
                                                                            47
                             Message
       • Opportunity
            –   Large and capable computing systems
            –   Continued scaling, primarily in spatial capacity
            –   Performance capabilities from parallelism
            –   Dynamically (re)programmable/adaptive
       • Spatial Challenges
            – Distance=Delay
            – Communications take up space and energy
       • Demands
            – New models/abstractions/algorithms

                                                                48
                       Questions so far?




                                           49
                       Challenge




                                   50
                            Challenge
       • How do we design and program
         scalable spatial solutions?
            – That exploit the capabilities of spatially
              programmable hardware
                 • High computational throughput performing the
                   same computation
                 • High bandwidth to local, distributed memories
                 • Adaptable to problem

                                                                   51
                       Status Quo




                                    52
                       Negative Experience
       The FPGA world has been attempting this for the past decade+
       • Knowing what the hardware can do
         …but fighting the tools to extract it.
            – Because wrong level of spatial knowledge and
              control
                 • None or not abstracted
       • Designs which must be discarded
         (redesigned) when newer/larger/faster
         devices come out
       • Students, software background
            – How to design for this?
            – Poor solutions not exploiting strengths        53
                       Challenge
   • How do we design and program scalable
     spatially programmable solutions?
        – How do we point new developers in the right direction?
        – How do we make design less painful?
        – How do we preserve design investment?
        – How do we improve the tools?



                                                      54
                  Why Use FPGAs/RC?
       • Exploit what hardware can do:
                 Spatial parallelism
                 High BW and low latency access to memory
       • Ride technology curve (avoid NRE)
       • Adapt to change
            – Standards, trends…
       • Adapt to app./deployment requirements
       • Reduce Risk
                           Our design methodology
                           should exploit and support
                           these things!
                                                            55
                  Why Use FPGAs/RC?
  • Exploit what hardware can do:
            Spatial parallelism
            High BW and low latency access to memory
  • Ride technology curve (avoid NRE)
  • Adapt to change
       – Standards, trends…
  • Adapt to app./deployment requirements
  • Reduce Risk
  • Design approaches derived from processors and
    ASICs don’t support them
                    Our design methodology
                    should exploit and support
                    these things!
                                                         56
                       Design for Scalability
       • How long lived is:
            – An application?
            – Before you answer
               • …. don’t forget Y2K
            – A device generation?
            – Remember:
                 • X/A put out new chips every 12-24 mo.
       • Is it good economics to throw away
         designs that often?
                                                           57
          VHDL/Verilog (ASIC Design)
    • Design one level of parallelism
         – Must redesign to scale to new hardware
    • Specifies physical cycles
         – Burden on developer to figure out what should
           go together in a cycle
    • Maybe adequate for the design of one chip
    • Cannot express the implementation
      freedom in the task

                                                      58
                       Fortran / C / C++

       • Single threaded
       • Single memory
       • Implicit communications
       • Thin abstraction over sequential
         machines from the 1960’s
       • Leads designers in wrong direction
       • Gives compilers hard problem
         discovering parallelism/communication
                                                 59
       • We need a better way to design
         spatial computing solutions




                                          60
                       Design Approach
       • How do we organize parallelism in
         applications?
            – System Architectures


       • How do we refine and optimize
         implementations?
            – Design Patterns

                                                   61
                   System Architectures




                                          62
                   System Architectures
       • Disciplines for organizing computation
       • Capture the gross structure



       • Deliberate connection to
                “Software Architectures”
            – Extending and adapting for RC setting
                                                      63
                        Starting Point?
       • You have:
            – Millions of programmable 4-LUTs/ALUs/etc.
            – Thousands of memory blocks
            – Programmable interconnect
       • You get to worry about:
            –   What goes into a clock cycle
            –   Where things are placed
            –   Which algorithms to use
            –   ….
       • Go design an efficient solution….
                                                          64
         Analogy: Planning a Meeting
       • You have:
            – Group of 100s of people
            – Each with some information and expertise
            – Some pictures, diagrams, plots


       • How will you organize interaction to share
         information and expertise?

                                                         65
                       Gross Organizations
       •      Common problem, with a small number of
              general solution types
            1.    Lecture
            2.    Panel
            3.    Breakout Group
            4.    Poster Session
       •      Each solution comes with a number of
              details
            –     Physical layout, interaction types, coordination
                  plan….
                                                                     66
                              Lesson
       • An unconstrained problem is daunting
            – Gives little guidance
       • Small number of useful archetypes
            – Not one
            – Not hundreds
            – With a catalog
                 • We can assess which best addresses our
                   problem
                                                            67
                       Application
       • Unconstrained
            – Resource Sets
            – Parallelism
       • Are hard problems
            – Don’t give guidance
            – Hard to manage correctness
            – Hard to manage details

                                           68
         System Architecture Hypothesis
       • There are a small number of useful
         system architectures
       • These architectures
            – Give guidance for organizing resources
            – Make the design manageable
            – Allow sharing lessons between applications
            – Provide basis for scalability
            – Point toward efficient solutions
                                                         69
                       Unconstrained Model
       • Multithreaded programming
         (equivalently Communicating Sequential
         Processes)
            –   Application is collection of threads
            –   Communicate with each other
            –   May or may not have shared memory
            –   Programmer responsible for
                 •   Synchronization
                 •   Parallelism
                 •   Data layout
                 •   Communications…
       • Like putting all the attendees in a room and
         saying “interact”….
                                                        70
               Architectural Restrictions
       • Sequential Control
            – Data Parallel: all parallel processing
              does the same thing
            – Lock-Step: all parallel processing does
              different things at synchronized times (e.g.,
              VLIW)
            – Bulk Synchronous: periodic barrier
              synchronization
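A minimal Python sketch of bulk-synchronous operation, using the standard library's `threading.Barrier` as the periodic barrier. The workload (an inclusive prefix sum in Hillis-Steele style) is a hypothetical example; each thread stands in for one processing element, and every superstep has a read phase and a write phase separated by barriers. Python threads are not truly parallel here; the point is the synchronization structure.

```python
import threading


def bsp_prefix_sum(values):
    """Inclusive prefix sum computed in bulk-synchronous supersteps.
    One thread per element stands in for one processing element."""
    n = len(values)
    if n == 0:
        return []
    data = list(values)
    barrier = threading.Barrier(n)

    def worker(i):
        step = 1
        while step < n:                  # every worker runs the same supersteps
            left = data[i - step] if i >= step else 0  # read phase
            barrier.wait()               # all reads finish before any write
            data[i] += left              # write phase: only worker i writes data[i]
            barrier.wait()               # all writes finish before the next superstep
            step *= 2

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return data


print(bsp_prefix_sum([1, 2, 3, 4, 5]))  # [1, 3, 6, 10, 15]
```

Because the barriers separate reads from writes, the result is deterministic regardless of how the threads are scheduled.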

                                                            71
          Architectural Restrictions (2)
       • Dataflow interactions
            – Allow multithreaded operation
            – Use data presence for synchronization
       • E.g.
            – Pipe-and-filter / Streaming Dataflow
            – Synchronous Dataflow (SDF)
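In Synchronous Dataflow each actor produces and consumes a fixed number of tokens per firing, so a periodic schedule can be computed before execution by solving the balance equations r[src]·produced = r[dst]·consumed. A minimal Python sketch for a connected graph; the three-actor graph and its rates are hypothetical.

```python
from fractions import Fraction
from math import gcd, lcm


def repetition_vector(edges):
    """Solve the SDF balance equations r[src]*produced == r[dst]*consumed.
    edges: [(src, dst, produced, consumed), ...] for a connected graph.
    Returns the smallest positive integer firing count per actor."""
    rate = {edges[0][0]: Fraction(1)}
    changed = True
    while changed:                       # propagate relative rates along edges
        changed = False
        for src, dst, produced, consumed in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produced / consumed
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consumed / produced
                changed = True
    for src, dst, produced, consumed in edges:
        if rate[src] * produced != rate[dst] * consumed:
            raise ValueError("inconsistent rates: no bounded periodic schedule")
    scale = lcm(*(r.denominator for r in rate.values()))
    counts = {actor: int(r * scale) for actor, r in rate.items()}
    common = gcd(*counts.values())
    return {actor: c // common for actor, c in counts.items()}


# Hypothetical graph: A emits 2 tokens per firing; B consumes 3 and emits 1;
# C consumes 2.
print(repetition_vector([("A", "B", 2, 3), ("B", "C", 1, 2)]))
# {'A': 3, 'B': 2, 'C': 1}
```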


                                                      72
          Architectural Restrictions (3)
       • Regular Communication Patterns
            – Systolic
            – Cellular Automata: regular grid of
              homogeneous FSMs
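A minimal Python sketch of a cellular automaton: a one-dimensional row of identical two-state cells, each updated synchronously from itself and its two neighbours, using the standard elementary-rule numbering. Treating cells beyond the edges as 0 is an assumption of this sketch.

```python
def ca_step(cells, rule):
    """One synchronous step of an elementary (1-D, two-state) cellular
    automaton. Each cell's next state depends only on itself and its two
    neighbours; cells beyond the edges are treated as 0."""
    padded = [0] + list(cells) + [0]
    return [
        (rule >> (padded[i - 1] * 4 + padded[i] * 2 + padded[i + 1])) & 1
        for i in range(1, len(padded) - 1)
    ]


row = [0, 0, 0, 1, 0, 0, 0]
for _ in range(3):
    print("".join(".#"[c] for c in row))
    row = ca_step(row, 90)   # rule 90: next state = left XOR right
```

Every cell reads only the previous row, so all cells could update in the same clock with purely nearest-neighbour wiring.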




                                                    73
          Architectural Restrictions (4)
       • Memory/Data Centric
            – Computation is collection of objects in
              memory
            – Each object triggered by input changes
            – Compute and potentially trigger other
              objects
       • E.g.
            – Repository models
            – GraphStep
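A minimal Python sketch of the memory/data-centric style: computation is a set of objects (graph nodes holding a value) that fire only when their inputs changed, compute, and possibly trigger their successors. Shortest-path relaxation is used as an illustrative workload; this is a hypothetical sketch of the pattern, not the GraphStep API.

```python
import math


def propagate(graph, source):
    """Objects = graph nodes holding a distance. A node fires only when its
    value changed in the previous step, sending candidate distances to its
    successors; the computation ends when no node changes.
    graph: {node: [(successor, weight), ...]} with every node as a key."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0
    active = {source}
    steps = 0
    while active:
        proposals = {}
        for node in active:                        # triggered objects compute
            for succ, weight in graph[node]:
                candidate = dist[node] + weight
                if candidate < proposals.get(succ, math.inf):
                    proposals[succ] = candidate
        active = set()
        for succ, candidate in proposals.items():  # changed objects fire next step
            if candidate < dist[succ]:
                dist[succ] = candidate
                active.add(succ)
        steps += 1
    return dist, steps


graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2), ("d", 5)],
         "c": [("d", 1)], "d": []}
print(propagate(graph, "a"))
```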
                                                        74
          System Architecture Taxonomy




         (Subject to continuing refinement and embellishment)
                                                              75
                System Architecture Taxonomy
                [Figure: taxonomy tree; label: Deterministic]
      • Further down the hierarchy
           – More restricted model
           + More guidance provided
           + More efficient potential implementation
           + More amenable to analysis
                • Enables tools and optimizations
      • Restrictions provide power
                                                       76
         System Architecture Hypothesis
       • There are a small number of useful
         system architectures
       • These architectures
            – Give guidance for organizing resources
            – Make the design manageable
            – Allow sharing lessons between applications
            – Provide basis for scalability
            – Point toward efficient solutions
                                                         77
        Example System Architecture

                            SCORE
                       Streaming Dataflow
                               or
                         Pipe-and-Filter

                                            78
               SCORE / Pipe-and-Filter
     • Basic Idea: Computation is a graph
          – Processing nodes are filters
          – Processing nodes are connected by pipes
            (streams)




                       [Figure: DCT and Encode filters connected by a stream]
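A minimal Python sketch of pipe-and-filter, where each filter is a generator that consumes one stream and produces another, and the pipes are simply the iterators handed from filter to filter. A coarse quantizer and a run-length encoder stand in for the DCT and Encode stages; both are hypothetical simplifications.

```python
from itertools import groupby


# Each filter consumes one stream (an iterator) and produces another.
def quantize(stream, step):
    for sample in stream:
        yield sample // step              # coarse quantizer (hypothetical)


def run_length_encode(stream):
    for value, run in groupby(stream):
        yield value, sum(1 for _ in run)  # (value, run length)


# Pipes are just the iterators passed from one filter to the next.
samples = [0, 1, 2, 3, 4, 5, 9, 10]
pipeline = run_length_encode(quantize(iter(samples), 4))
print(list(pipeline))  # [(0, 4), (1, 2), (2, 2)]
```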
                                                      79
       Scalable Abstraction: Streams
       • Logical abstraction of a persistent
         point-to-point communication link
       • Captures communications structure
            – Explicit producer-to-consumer link-up
            – Expose to Compiler and Runtime System
       • Abstract communications
            – Physical resources or implementation
            – Delay from source to sink
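A minimal Python sketch of a stream as a persistent point-to-point link, using the standard library's `queue.Queue` as a bounded FIFO between two threads. The consumer synchronizes on data presence and the producer stalls when the buffer is full. The `END` marker and the squaring consumer are hypothetical. The buffer capacity plays the role of the abstracted physical resource: it changes timing, not the answer.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker


def run_stream(items, capacity):
    """Connect a producer and a consumer by one point-to-point stream.
    `capacity` is an implementation detail the program never observes."""
    stream = queue.Queue(maxsize=capacity)
    results = []

    def producer():
        for item in items:
            stream.put(item)       # blocks while the FIFO is full
        stream.put(END)

    def consumer():
        while True:
            item = stream.get()    # blocks until a token is present
            if item is END:
                break
            results.append(item * item)

    workers = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


# Same answer whatever the buffering.
print(run_stream(range(5), capacity=1) == run_stream(range(5), capacity=64))  # True
```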
                                                      80
                         Stream Freedom
              [Figure: logical graph transform → quantize → RLE → encode, mapped
               either spatially (tran, qnt, rle, enc all resident) or temporally
               (tran, swap, qnt, swap, rle, swap, enc)]

                                                                          81
                  SCORE Basics
           Stream-Based Compute Model

     • Abstract computation is a dataflow graph
          – Capture potential parallelism
     • Connected with Stream Links
          – Abstracts communications
     • Allow instantiation/modification/destruction of
       dataflow during execution
          – separate dataflow construction from usage



                                                         82
                Virtual Hardware Model

       • Dataflow graph is arbitrarily large
       • Hardware has finite resources
            – resources vary from implementation to
              implementation
       • Dataflow graph is scheduled on the
         hardware
       • Happens automatically (software)
            – physical resources are abstracted in
              compute model                           83
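A crude Python sketch of the virtual hardware idea, assuming a linear pipeline of operators and a fixed number of physical "pages". The hypothetical `schedule` splits the graph into epochs that fit on the hardware, and `run` buffers the intermediate stream in memory between epochs, standing in for a swap. A real runtime would stream data through resident operators and choose partitions more cleverly; this only shows that the logical result is independent of hardware size.

```python
def schedule(operators, pages):
    """Split a pipeline of operators (in dataflow order) into epochs of at
    most `pages` resident operators."""
    return [operators[i:i + pages] for i in range(0, len(operators), pages)]


def run(operators, pages, stream):
    """Execute epoch by epoch; between epochs the intermediate stream is
    buffered in memory while the next group of operators is loaded."""
    data = list(stream)
    for epoch in schedule(operators, pages):
        for op in epoch:
            data = [op(x) for x in data]
    return data


# Hypothetical four-stage pipeline.
ops = [lambda x: x + 1, lambda x: x * 3, lambda x: x - 2, lambda x: x // 2]
small = run(ops, pages=1, stream=range(4))   # four epochs (three swaps)
large = run(ops, pages=4, stream=range(4))   # one epoch, fully spatial
print(small == large, len(schedule(ops, 1)), len(schedule(ops, 4)))  # True 4 1
```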
                   Hardware Abstraction
       • Separate:
            – Logical reconfiguration: the compute graph
              changes
            – Physical reconfiguration: what’s running on the
              reconfigurable hardware changes
       • Model (user program) supports logical
         reconfiguration:
          – new operator, new stream, operator ends
       • Runtime System responsible for physical
         scheduling
       • Abstracts hardware size, which allows scaling
                                                                84
          Typical Device Organization
       • Compute Pages
            – Spatial
            – Temporal (μP)
                 • For infrequent code
       • Memory Blocks
       • Network
       • Scale with available
         capacity

                                         85
      JPEG: Dynamic<->Quasi-static
                               P3-500
                               19M Cycles




                                       86
                        [Markovskiy…/FPGA’02]
         System Architecture Hypothesis
       • There are a small number of useful system
         architectures
       • These architectures
            – Give guidance for organizing resources
                 • Think of pipes-and-filters
            – Make the design manageable
                 • Composable operators, dataflow synchronization
            – Provide basis for scalability
                 • Time-multiplexing of filters
            – Allow sharing lessons between applications
            – Point toward efficient solutions
                 • Set of optimizations amenable to this model      87
                       Design Patterns




                                         88
                       SCORE Lessons
       • Good for many things
       • Does allow scaling
       • Avoided many of the pitfalls of
         “unconstrained” parallel programming
            – Non-determinism, sequential bottlenecks
       • But
            – There are things the hardware could do
              which did not fit well
            – Time multiplexing of a fixed graph is not
              always the best way to scale
                                                        89
                       Patterns
       • Below the gross organization
         (architecture) there are still a large
         number of decisions to make

       • What’s the bag of tricks?



                                                  90
                        Challenge
       • How do we teach new designers to build
         reconfigurable solutions?
            – Undergraduates?
            – Software Programmers interested in power
              of Reconfigurable Computing?
       • What would we need in a language to
         encourage people toward good RC
         solutions? Support what RC does well?
                                                     91
                          Scenario
       • Student:
            – This RC stuff cannot do sorting well
            – Best I can see is to build a memory and
              implement sequential quicksort
       • André sees:
            – Sequential solution
            – Monolithic memory
            – Requires random access
                         Quicksort:
       • André sees:
            – Sequential solution
            – Monolithic memory
            – Requires random access
       • …but, a good RC solution
            – Admits a spatial/parallel implementation
            – Exploits small, fast distributed memories
            – Has regular data access?
                       Quicksort:
       • André sees:
            – Sequential solution
            – Monolithic memory
            – Requires random access
       • Student knows techniques for good
         sequential/processor-based solutions
       • Has not been exposed to techniques
         that exploit spatial parallelism
                         RC Solution #1

• How large/fast does this sort need to be?
• Build a spatial sorting network:

       [Figure: sorting network (from Knuth)]

• Too big, too fast? Consider bit-serial datapath elements.
                       RC Solution #2
    • Often receive data as a sequential stream
    • Can I sort the data as it arrives?
    • Build a systolic solution?
         – Use only local interconnect

       [Figure: systolic array; each cell traps the largest value]



                                                            96
DeHon Dagstuhl 06361
                       RC Solution #2
    • Often receive data as a sequential stream
    • Can I sort the data as it arrive?
    • Build a systolic solution?
         – Use only local interocnnect




                                                                97
DeHon Dagstuhl 06361   [based on Leiserson’s Systolic Priority Queue]
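The systolic idea can be sketched as a cycle-level model. This is a simplified, insert-only sorter in the spirit of the slide ("cell traps largest value"), not Leiserson's full priority queue, which also supports extracting elements; the function name and the one-cell-per-value sizing are assumptions.

```python
from typing import List, Optional, Sequence

def systolic_sort(stream: Sequence[int], n_cells: int) -> List[Optional[int]]:
    """Cycle-level model of a linear systolic sorter (insert only, no extract).

    Each cell keeps the larger of its stored value and its input and passes the
    smaller one to its right neighbour, so cell i ends up holding the (i+1)-th
    largest value. Cells only communicate with their immediate neighbour.
    """
    items = list(stream)
    if len(items) > n_cells:
        raise ValueError("need at least one cell per value")
    stored: List[Optional[int]] = [None] * n_cells
    in_reg: List[Optional[int]] = [None] * n_cells   # value arriving at each cell this cycle
    for x in items + [None] * n_cells:                # extra cycles drain the pipeline
        in_reg[0] = x
        out: List[Optional[int]] = [None] * n_cells
        for i in range(n_cells):                      # all cells fire in the same cycle
            v = in_reg[i]
            if v is None:
                continue
            if stored[i] is None:
                stored[i] = v                         # empty cell captures the value
            else:
                # trap the larger value, pass the smaller one along
                stored[i], out[i] = max(stored[i], v), min(stored[i], v)
        in_reg = [None] + out[:-1]                    # latch outputs into right neighbours
    return stored

print(systolic_sort([3, 1, 4, 1, 5, 9, 2, 6], 8))  # [9, 6, 5, 4, 3, 2, 1, 1]
```

Sorting finishes a fixed number of cycles after the last value arrives, and no wire is longer than one cell, which is why this style scales well when distance costs delay.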
                       There are Solutions
       1.     Sorting Network
       2.     Systolic Priority Queue
       3.     Mesh Sorts, $O(\sqrt{N})$ [Leighton]
       4.     Radix Sort
       •      …but these aren’t the solutions the
              student would see in a classic
              algorithms text
       •      …and sometimes very application-specific
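Of the solutions listed, radix sort is the easiest to show in a few lines. The version below is a generic least-significant-digit (LSD) radix sort, not taken from the talk; the key width and digit width are assumed parameters. Its appeal here is the access pattern: each pass reads the data once in order and appends to a small, fixed number of buckets, rather than making random accesses into one monolithic memory.

```python
from typing import List

def lsd_radix_sort(keys: List[int], key_bits: int = 16, digit_bits: int = 4) -> List[int]:
    """LSD radix sort for non-negative integers below 2**key_bits."""
    if any(not 0 <= k < (1 << key_bits) for k in keys):
        raise ValueError("keys must lie in [0, 2**key_bits)")
    radix = 1 << digit_bits
    mask = radix - 1
    for shift in range(0, key_bits, digit_bits):      # least-significant digit first
        buckets: List[List[int]] = [[] for _ in range(radix)]
        for k in keys:                                # one sequential pass over the data
            buckets[(k >> shift) & mask].append(k)    # stable: keeps arrival order
        keys = [k for bucket in buckets for k in bucket]
    return keys

print(lsd_radix_sort([513, 7, 300, 7, 65535, 0]))  # [0, 7, 7, 300, 513, 65535]
```

Each bucket only ever receives appends, so in a spatial implementation it could map onto a small local FIFO or memory; `digit_bits` trades the number of passes against the number of buckets.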
                       Understanding RC
       • Sequential Computing relies on a certain
         set of idioms/patterns/paradigms
       • Reconfigurable/Spatial Computing
         opens up a new set of patterns
            – Broader set of design “tricks”


       • Need to expose these to new designers
       • Need to nurture and share them
       • This points us towards a study of
         Design Patterns




                       Design Pattern
       • Solution to a recurring problem
       • E.g.
            – Problem: design too big to fit on available
              hardware
            – Solution: time-multiplex large design onto
              small hardware
       • Usually organizing and structuring
         principles
            – In contrast to library elements/IP blocks
                Design Patterns Legacy
       • Alexander – A Pattern Language
       • Floyd – Paradigms of Programming
            – Turing Lecture 1978
       • Gamma, Helm, Johnson, Vlissides
         (Gang of Four) – Design Patterns
            – OO Reuse
            – Stylized way of capturing patterns

                       Lecture Patterns
       • How do we ensure high-quality lectures
         that run in the allotted time?
       • Patterns:
            – Request slides in advance of the meeting
            – Slide Reviews
            – Session moderator
                 • Hand time signals
                 • Timing lights
             Pattern Example:
      Coarse-Grained Time Multiplexing
       • Intent: reduce area required to solve problem
       • Motivation:
            –   Application task graphs can be large
            –   Platforms may be small
            –   Platforms vary in size over time
            –   Want to automatically scale across different
                platform sizes
       • Applicability: graphs with limited feedback
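The core step of this pattern, splitting a task graph into pieces that fit the available hardware and running them one after another, can be sketched as a greedy packer. This is a hypothetical illustration, not SCORE's scheduler: each operator is given an assumed integer area, the graph must be acyclic (a strict reading of "limited feedback"), and the buffering of streams between configurations is left implicit.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set, Tuple

def schedule_pages(areas: Dict[str, int], edges: List[Tuple[str, str]],
                   capacity: int) -> List[List[str]]:
    """Greedily pack an acyclic operator graph into configurations of at most `capacity` area.

    Configurations run in sequence on the same hardware; streams that cross a
    configuration boundary would be buffered in memory between runs.
    Raises graphlib.CycleError if the graph contains a cycle.
    """
    preds: Dict[str, Set[str]] = {op: set() for op in areas}
    for src, dst in edges:
        preds[dst].add(src)
    pages: List[List[str]] = []
    current: List[str] = []
    used = 0
    for op in TopologicalSorter(preds).static_order():  # producers before consumers
        area = areas[op]
        if area > capacity:
            raise ValueError(f"operator {op!r} does not fit on the platform")
        if used + area > capacity:                       # start a new configuration
            pages.append(current)
            current, used = [], 0
        current.append(op)
        used += area
    if current:
        pages.append(current)
    return pages

areas = {"a": 2, "b": 3, "c": 2, "d": 1}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(schedule_pages(areas, edges, capacity=4))  # [['a'], ['b'], ['c', 'd']]
print(schedule_pages(areas, edges, capacity=8))  # [['a', 'b', 'c', 'd']]
```

Only `capacity` changes between the two calls: the same graph is re-targeted to a smaller or larger platform, which is the automatic scaling listed under Motivation.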

                       Hindsight
       • SCORE contained a collection of good,
         complementary patterns




                       SCORE Patterns
       • Area-Time Scaling
            – Coarse-Grained Time Multiplexing
            – Sequential vs. Parallel
       • Communications
            – Streaming Data
       • Synchronization
            – Tagged Data Presence
            – Queues with Backpressure
       • Partial Reconfiguration
            – Fixed-Sized and Std. IO Page
       • Processor-FPGA Integration
            – Streaming Coprocessor
       • Value-added Memories
            – Address Generators
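The Queues with Backpressure pattern can be illustrated with Python's bounded `queue.Queue`: `put()` blocks while the queue is full, so a fast producer is automatically stalled by a slow consumer, and `get()` blocks until data is present. This threading sketch is only an analogy for hardware FIFOs with full/empty handshakes, not SCORE's implementation; the `run_pipeline` name, the `depth` parameter, and the end-of-stream sentinel are assumptions.

```python
import queue
import threading
from typing import List

def run_pipeline(items: List[int], depth: int = 2) -> List[int]:
    fifo = queue.Queue(maxsize=depth)   # bounded FIFO between the two stages
    end_of_stream = object()            # sentinel token marking the end of the stream
    results: List[int] = []

    def producer() -> None:
        for x in items:
            fifo.put(x)                 # blocks while the FIFO is full: backpressure
        fifo.put(end_of_stream)

    def consumer() -> None:
        while True:
            token = fifo.get()          # blocks until a token is present
            if token is end_of_stream:
                break
            results.append(token * token)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(run_pipeline([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

The result does not depend on `depth` or on thread timing; the queue depth only changes how much slack the stages have before one stalls the other, which is the determinism the SCORE Lessons slide credits to this style.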


                       Pattern Space
    • Motivated by things that
         – hardware could do
         – SCORE could not do
    • Class of useful patterns
      is much larger than
         – those used for
           microprocessors
         – those used in SCORE
    • Useful to identify the
      patterns which can be
      exploited
                       Pattern Classes
       •   A-T Tradeoffs
       •   Parallelism
       •   Processor-FPGA
       •   Common Case
       •   Hardware Reuse
       •   Specialization
       •   Partial Reconfig.
       •   Synchronization
       •   Number Representation
       •   Layout and Comm.
       •   Comm. Dynamics
       •   Value-Added Memories
       •   Debug
       •   Energy
       •   Defect/Fault Tolerance
        What Good Is a Pattern Collection?

       • Reference for Designers
            – Language for discussing options, lessons,
              pitfalls
       • Pedagogy – directly expose students to
         problems and solutions
       • Language and Tool Design – should
         support the techniques that work well

                       Design Approach
       • How do we organize parallelism in
         applications?
            – System Architectures
       • How do we refine and optimize
         implementations?
            – Design Patterns



                       Sysarch/DP Message
      • Unconstrained/low-level view of RC is limiting
           – Unnecessarily difficult for new users
           – Designs don’t survive (scale)
      • Systematic Process for Design
      • System Architectures
           – Give structure to decomposing parallelism,
             interaction, and scaling
      • Design Patterns
           – Elements of design
           – Tools should support
                • Automate where possible

                       Overall Message
       • Opportunity
            –   Large and capable computing systems
            –   Continued scaling, primarily in spatial capacity
            –   Performance capabilities from parallelism
            –   Dynamically (re)programmable/adaptive
       • Spatial Challenges
            – Distance = Delay
            – Communications take up space and energy
       • Demands
            – New models/abstractions/algorithms

                   Additional Information
       • <http://ic.ese.upenn.edu/>



