                          Computer Engineering
                   Senior Projects & Research Overview
             An informal overview of past & current projects
                         (students' & my own)

                                   by

                                Al Davis



School of Computing                                      1
    The Engineering Discipline
        Role
          – design and build things
          – change the world around us
              » hopefully for the better
              » hence faced with a continuous ethical dilemma
        Ultimate requirement
          – what we build must work
        Requisite skills
          – science: math, physics, chemistry, materials, CS, …
          – engineering: state of the art, current practice, technology trends,
            manufacturing, testability, maintenance, life cycle costs, …
          – art: creative component that is clearly evident in the great
            engineers




    Computer Engineering
        Design and build computer systems
          – inherently involves both software and hardware design skills
        System software
          – compiler, operating system, device drivers, …
          – as opposed to application specific software
              » applications are the target system “user”
              » hence they are used in design evaluation (pre- and post-build)
        Hardware: possibly many disciplines and levels
          – VLSI chip design: analog and digital circuit aspects
              » CS, EE, physics are the key disciplines
              » yet cooling is a big issue – enter ME aspects
          – board design: CS, EE, and manufacturing issues are dominant
          – system design: balance of HW and SW capabilities



    CE Senior Projects at Utah
        Logistics
          – CE program run jointly by SoC and ECE departments
          – Senior project is capstone project course
              » team based
              » students choose their own project
              » best mechanism to demonstrate your abilities to future
                 employers
          – CE Senior Project is a year long activity
              » at least for the last 2.5 years
              » Spring term of junior year: plan and propose
              » Summer: get parts and start building (optional)
              » Fall term of senior year: build and demonstrate
          – Exit interview feedback
              » rave reviews for being hard, fun, and instructive



    04 Projects
        Satellite Tracking station
        Weaver – an 802.11 remote control vehicle interface
          – camera on car: image and commands to base station via wireless
          – car has autonomous anti-collision capability (infrared)
        GPS Hummer
          – autonomous navigation and anti-collision
          – some AI in route finding since Hummer remembers obstacles that it saw
            previously
        PCI Coprocessor
          – efficient acceleration via PCI add-on
        Jiggawax
          – build your own iPod
        RVI – remote vehicle interface
          – control via web or cell phone
          – control windows, engine, and door locks from RF base station



    05 Projects
        Carputer
          – OBDII car data and 802.11g auto-sync to base station
          – monitor your car or your kids
        IR tag
          – paintball without the mess
        Athlete monitor system
          – real time tracking of position and heart rate to central coaching
            station
          – GPS, RF, and HRM on-athlete
        Inverted pendulum 2-wheeled robot
        Multi-carrier reflectometry
          – finding faults in aircraft wires without tearing the plane apart
        Glider avionics package
          – using accelerometers, GPS, and strain sensors



    Current 06 projects (underway now)
        PEN
          – electronic paper – the only paper you’ll ever buy!
        Recipedia
          – a cook book that talks and listens to you
        GPS tracker
          – use campus ubiquitous wireless to keep track of where things are via your
            cell phone or computer
        OmegaCore
          – a DVR that knows how to remove commercials for you
        NoCPR
          – bathtub drowning prevention
        Tracking Visor
          – virtual reality on your head




    Selected Examples
        Some images to illustrate previous projects




                      Satellite Tracking Station




                       Final dual band antenna
                       on the roof of MEB during
                       demo day




   2 meter (VHF side) antenna specs – students used an antenna design CAD tool




        GPS Hummer




      Controlling direction and speed with transistors




    GPS internals




    A build your own GPS kit from Motorola




                      Autonomous anti-collision
                      system




   Glider Avionics Package (note this ended up being done by a single student as a thesis)




                      Designing an electronic compass is non-trivial
                      especially if you want tilt-compensation
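
To give a feel for why tilt compensation is the hard part, here is a minimal sketch of the usual approach: recover roll and pitch from the accelerometer, then rotate the magnetometer reading back into the horizontal plane before taking the heading. This is not the student's code. The function name, the north-east-down axis convention, and the assumption that a level board reads (0, 0, +1 g) are all choices made for this illustration.

```python
import math

def tilt_compensated_heading(accel, mag):
    """Heading in degrees [0, 360) from 3-axis accelerometer and magnetometer readings.

    Assumed convention (not from the slides): body frame is x forward,
    y right, z down, and a level board reads accel = (0, 0, +1 g).
    """
    ax, ay, az = accel
    mx, my, mz = mag
    # Roll and pitch recovered from the measured gravity vector.
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, ay * math.sin(roll) + az * math.cos(roll))
    # Rotate the magnetic field back into the horizontal plane.
    x_h = (mx * math.cos(pitch)
           + my * math.sin(pitch) * math.sin(roll)
           + mz * math.sin(pitch) * math.cos(roll))
    y_h = my * math.cos(roll) - mz * math.sin(roll)
    return math.degrees(math.atan2(-y_h, x_h)) % 360.0
```

Without the two rotations, any pitch or roll leaks the vertical component of the Earth's field into the heading, which is why a simple two-axis compass drifts as the glider banks.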




 Board
 Schematic




        Power Supply                   Filters and Registers


                       Board Artwork



    Senior Project Synopsis
        This was just a peek
        Just remember
          – if you can imagine it you can usually build it
               » there are some things you just can’t do
               » like a perpetual motion machine
                         which violates the laws of physics
          – all it takes is dedication and time
        Huge diversity of both opportunities and problems
        You might have noticed the world isn’t perfect
          – so help fix it!




    Personal Research Overview
        Past
          – dataflow, VLSI, asynchronous circuits, parallel computing, high
            performance architectures (50% academia, 50% industry)
        Currently there are 4 projects
          – Domain specific architectures
             » target highly constrained embedded systems
             » will highlight the perception processor today
                         have also worked in signal processing and cell phone domains
          – Interconnect driven architecture
              » w/ Rajeev Balasubramonian & students
          – RPU design
              » w/ Erik Brunvand, Pete Shirley, Steve Parker, & students
          – VLSI wire scaling theory
              » w/ Stephanie Forrest & Melanie Moses @ UNM



                      Embedded Computing
                      Characteristics
            Historically
              –   narrow application specific focus
              –   typically cheap, low-power, provide just enough compute
                  power
                  » niche filled by small microcontroller/dsp devices
                  » AND often ASIC component(s)
            New Pressures
              –   world goes bonkers on mobility and the web
                  » expects ubiquitous information access
                  » expects better and cheaper everything
              –   sensors, microphones & cameras become free
                  » so use lots of them
              –   now we’re talking real computing



                      New Look for ECS
            Sophisticated application suites
              –   not single algorithms – e.g.
                  » 3G and 4G cellular handsets
                          multiple channels and multiple encoding models
                          plus the usual DSP stuff
                  »    process what is streaming in from the net
                          includes real time media & web access
                  »    process the sensor, microphone, and camera streams
                          plus network information from the neighborhood
                          since things are starting to happen in groups
              –   wide range of services
                  » dynamic selection
                  »  no single app will do
            Rate of algorithmic change is staggering




                      ECS Economics
            Traditional reliance on the ASIC design cycle
              –   lengthy IC design: > 1 year typical
              –   little re-use
                  »     IP import works but there are many pitfalls
                          HDL code → synthesize → energy-delay inefficiency
                          Macroblock → forces process and layout issues
              –   turning an IC is costly
                  »    even when it works the first time
            ECS product cycles
              –   lifetime similar to a mayfly
              –   need next improved version “real soon now”
            Result
              –   sell monster volumes in a short time or lose




   What is Perception Processing?

            Ubiquitous computing needs natural human interfaces
            Processor support for perceptual applications
              –   Gesture recognition
              –   Object detection, recognition, tracking
              –   Speech recognition
              –   Biometrics
            Applications
              –   Multi-modal human friendly interfaces (our focus)
              –   Intelligent digital assistants
              –   Robotics, unmanned vehicles
              –   Perception prosthetics




         Perception Processing Problem




                         consider always on aspect!!




              Current Processors Inadequate
            Too slow, too much power for embedded space!
              –   2.4 GHz Pentium 4 ~ 60 Watts

              –   400 MHz Xscale ~ 800 mW

              –   10x or more difference in performance but 100x in power

            Inadequate memory bandwidth
              –   Sphinx requires 1.2 GB/s memory bandwidth

              –   Xscale delivers 64 MB/s ~ 1/19th

            Our methodology
              –   Characterize applications to find the problems

              –   Derive acceleration architecture

                  »   History of FPUs is an analogy
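
The ratios quoted above can be checked with a few lines of arithmetic. This is only a sanity check on the slide's own numbers; the variable names are made up here, and 1 GB is taken as 1000 MB.

```python
# Figures taken from the slide; variable names are illustrative.
pentium4_watts = 60.0        # 2.4 GHz Pentium 4
xscale_watts = 0.8           # 400 MHz XScale, 800 mW
sphinx_needs_mb_s = 1200.0   # Sphinx requirement, 1.2 GB/s with 1 GB = 1000 MB
xscale_gives_mb_s = 64.0     # what the XScale delivers

power_ratio = pentium4_watts / xscale_watts
bandwidth_shortfall = sphinx_needs_mb_s / xscale_gives_mb_s

print(f"power ratio: {power_ratio:.0f}x")
print(f"bandwidth shortfall: {bandwidth_shortfall:.2f}x")
```

The power gap works out to 75x, which the slide rounds to two orders of magnitude, and the bandwidth shortfall is 18.75x, which matches the "1/19th" figure.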


                      The Problem w/ GPPs
            caches & speculation
              –   consume significant area and energy
              –   great when they work – a liability when they don’t
            rigid communication model
              –   data moves from memory to registers
              –   register  execution unit  register
              –   inability to support specialized computational pipelines
                  » ASIC advantage
            bottom line
              –   can process anything
              –   but not efficiently in many cases
              –   it’s the von Neumann trap
                  » lots of overhead for almost no work



                      The FaceRec Application




                      FaceRec In Action




                                          Bobby Evans




                      Application Structure
        Image → Flesh tone → Segment Image → Face Detector
              → Neural Net Eye Locator → Eigenfaces Face Recognizer
              → Identity, Coordinates
        Face detectors shown: Rowley (ANN based); Viola & Jones (~200 stage Adaboost)
            Flesh toning: Soriano et al, Bertran et al
            Segmentation: Text book approach
            Rowley detector, voter: Henry Rowley, CMU
            Viola & Jones’ detector: Published algorithm + Carbonetto, UBC
            Eigenfaces: Re-implementation by Colorado State University



                      Application Profile




    Face Recognition Analysis
        Cache
          – small L1D$ → high hit rate
          – L2$ is useless – most L1 misses pass through
        IPC
          – low even with lots of FP execution units
          – Why?
              » load store register & memory ports saturate
                         multiple large matrix traversals are the critical kernel
                         several indirect accesses per operation
               » dominant loop is a SFP inner product
                         no single cycle accumulate

        Implications
          – restructure the code – loop fusion → more temporary reg’s
          – need architectures which move data well
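
As a rough illustration of the loop fusion point, the sketch below computes two inner products against the same matrix row, first with two traversals and then with one fused traversal. The function names and the two-vector setup are invented for this example and are not taken from the FaceRec code.

```python
def separate_passes(row, u, v):
    """Two inner products, each with its own traversal of `row`."""
    dot_u = 0.0
    for i in range(len(row)):
        dot_u += row[i] * u[i]
    dot_v = 0.0
    for i in range(len(row)):
        dot_v += row[i] * v[i]
    return dot_u, dot_v

def fused_pass(row, u, v):
    """Same results with one traversal: row[i] is loaded once and reused."""
    dot_u = 0.0
    dot_v = 0.0
    for i in range(len(row)):
        r = row[i]          # one load feeds both accumulators
        dot_u += r * u[i]
        dot_v += r * v[i]
    return dot_u, dot_v
```

The fused version halves the loads of `row` at the cost of keeping more live temporaries, which is the trade the slide describes as needing more temporary registers.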



                      CMU Sphinx 3.2 Profile




                              Feature Vector = 13 Mel + 1st and 2nd derivatives
                              10 ms of speech is compressed into 39 SP floats
                              iMic possibility
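
The 39-float figure follows from 13 coefficients plus their first and second derivatives (13 x 3). A minimal sketch of that bookkeeping is below. It uses a plain neighbour difference for the derivatives, which is an assumption made for illustration; the window Sphinx actually uses is not given on the slide, and the function names are hypothetical.

```python
def deltas(frames):
    """Difference between the next and previous frame (edges are clamped)."""
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, last)]
        out.append([n - p for n, p in zip(nxt, prev)])
    return out

def feature_vectors(cepstra):
    """Append first and second derivatives to each 13-element frame: 13 * 3 = 39."""
    d1 = deltas(cepstra)
    d2 = deltas(d1)
    return [c + a + b for c, a, b in zip(cepstra, d1, d2)]
```

Each input frame stands for one 10 ms slice of speech, so every 10 ms produces one 39-element vector.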



    Speech Analysis
                         Results
                           – similar to FaceRec
                               » cache
                               » port saturation
                           – big difference
                               » also memory B/W starved
                               » due to language model




  (opt)



                      Simple ASIC Design Example:
                      Matrix Multiply

        def matrix_multiply(A, B, C): # C is the result matrix
          for i in range(0, 16):
             for j in range(0, 16):
                 C[i][j] = inner_product(A, B, i, j)

        def inner_product(A, B, row, col):
          sum = 0.0
          for i in range(0,16):
              sum = sum + A[row][i] * B[i][col]
          return sum




                      ASIC Accelerator Design:
                      Matrix Multiply
        Control Pattern

        def matrix_multiply(A, B, C): # C is the result matrix
          for i in range(0, 16):
             for j in range(0, 16):
                C[i][j] = inner_product(A, B, i, j)

        def inner_product(A, B, row, col):
          sum = 0.0
          for i in range(0,16):
              sum = sum + A[row][i] * B[i][col]
          return sum




                      ASIC Accelerator Design:
                      Matrix Multiply
        Access Pattern

        def matrix_multiply(A, B, C): # C is the result matrix
          for i in range(0, 16):
             for j in range(0, 16):
                 C[i][j] = inner_product(A, B, i, j)

        def inner_product(A, B, row, col):
          sum = 0.0
          for i in range(0,16):
              sum = sum + A[row][i] * B[i][col]
          return sum




                      ASIC Accelerator Design:
                      Matrix Multiply
        Compute Pattern
        def matrix_multiply(A, B, C): # C is the result matrix
          for i in range(0, 16):
             for j in range(0, 16):
                 C[i][j] = inner_product(A, B, i, j)

        def inner_product(A, B, row, col):
          sum = 0.0
          for i in range(0,16):
              sum = sum + A[row][i] * B[i][col]
          return sum




                      ASIC Accelerator Design: Matrix Multiply




   def matrix_multiply(A, B, C): # C is the result matrix
     for i in range(0, 16):
        for j in range(0, 16):
            C[i][j] = inner_product(A, B, i, j)

   def inner_product(A, B, row, col):
     sum = 0.0
     for i in range(0,16):
         sum = sum + A[row][i] * B[i][col]
     return sum



                      How can we generalize?
            Decompose loop into:
              – Control pattern
              – Access pattern
              – Compute pattern
              Programmable h/w acceleration for each pattern
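
One way to picture the decomposition is to split the matrix multiply from the earlier slides into three separate pieces of software. This is only a sketch of the idea in Python, not a description of the hardware; the three function names are invented for the illustration.

```python
N = 16

def control_pattern(n=N):
    """Control: the loop nest, reduced to a stream of (i, j, k) index tuples."""
    for i in range(n):
        for j in range(n):
            for k in range(n):
                yield i, j, k

def access_pattern(i, j, k):
    """Access: which element of each matrix a given iteration touches."""
    return (i, k), (k, j), (i, j)   # index into A, index into B, index into C

def compute_pattern(acc, a, b):
    """Compute: one multiply-accumulate step."""
    return acc + a * b

def matrix_multiply(A, B, C, n=N):
    for i in range(n):              # clear the result before accumulating
        for j in range(n):
            C[i][j] = 0.0
    for i, j, k in control_pattern(n):
        (ar, ac), (br, bc), (cr, cc) = access_pattern(i, j, k)
        C[cr][cc] = compute_pattern(C[cr][cc], A[ar][ac], B[br][bc])
    return C
```

Because the three pieces only meet in the final loop, each one could in principle be handed to its own programmable unit, which is the generalization the slide proposes.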




                      Architecture Family




   Experimental Method
            Measure processor power on
              –   2.4 GHz Pentium 4, 0.13u process
              –   400 MHz XScale, 0.18u process
            Perception Processor
              –   1 GHz, 0.13u process (Berkeley Predictive Tech Model)
              –   Verilog, MCL HDLs
              –   Synthesized using Synopsys Design Compiler
              –   Fanout based heuristic wire loads
              –   Spice (Nanosim) simulation yields current waveform
              –   Numerical integration to calculate energy
            ASICs in 0.25u process
            Normalize 0.18u, 0.25u energy and delay numbers
              –   model = constant field scaling
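
Two steps in this method lend themselves to a short sketch: turning a simulated current waveform into energy by numerical integration, and normalizing numbers across process generations with constant field scaling. The slides name both steps but give no formulas, so what follows is the textbook version under stated assumptions: a constant supply voltage, the trapezoidal rule, and the classic constant field result that delay scales by 1/s and switching energy by 1/s^3. Function and parameter names are hypothetical.

```python
def energy_joules(times_s, currents_a, vdd):
    """Trapezoidal integral of vdd * i(t) dt over a sampled current waveform."""
    total = 0.0
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        total += 0.5 * (currents_a[k] + currents_a[k - 1]) * dt
    return vdd * total

def scale_constant_field(energy, delay, from_um, to_um):
    """Normalize energy and delay from one feature size to another.

    With s = from_um / to_um, dimensions and voltage shrink by 1/s, so
    delay scales by 1/s and switching energy (C * V^2) by 1/s**3.
    """
    s = from_um / to_um
    return energy / s**3, delay / s
```

For example, `scale_constant_field(e, d, 0.25, 0.13)` would bring an ASIC measurement from the 0.25u process onto the same footing as the 0.13u designs.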




                      Benchmarks
            Visual feature recognition
              –   Erode, Dilate: Image segmentation operators
              –   Fleshtone: NCC flesh tone detector
              –   Viola, Rowley: Face detectors
            Speech recognition
              –   HMM: 5 state Hidden Markov Model
              –   GAU: 39 element, 8 mixture Gaussian
            DSP
              –   FFT: 128 point, complex to complex, floating point
              –   FIR: 32 tap, integer
            Encryption
              –   Rijndael: 128 bit key, 576 byte packets
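
To make one of these kernels concrete, here is a minimal direct-form integer FIR filter of the kind the FIR benchmark exercises. It is a generic textbook formulation, not the benchmark source; the function name and the zero-padding of samples before the start are assumptions.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k], integer arithmetic."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, tap in enumerate(taps):
            if n - k >= 0:      # samples before the start are treated as zero
                acc += tap * samples[n - k]
        out.append(acc)
    return out
```

With a 32-element `taps` list this does 32 multiply-accumulates per output sample, the same inner-product shape as the matrix multiply example.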




                      Results: IPC

   Mean IPC =
   3.3x R14K




                      Results: Throughput

   Mean
   Throughput =
   1.75x Pentium
   0.41x ASIC




                      Results: Energy
  Mean
  Energy/packet =
  7.4% of XScale
  5x of ASIC




              Results: Energy Delay Product
  Mean EDP =
  159x better than XScale
  12x worse than ASIC
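
Energy-delay product multiplies the energy a task consumes by the time it takes, so lower is better and a design cannot look good by trading all of one for the other. A small sketch of the metric follows; the function names are made up here and the numbers in the usage line are hypothetical, not measurements from this work.

```python
def energy_delay_product(energy_j, delay_s):
    """EDP in joule-seconds."""
    return energy_j * delay_s

def edp_improvement(baseline, candidate):
    """How many times lower the candidate's EDP is; each argument is (energy, delay)."""
    return energy_delay_product(*baseline) / energy_delay_product(*candidate)

# Hypothetical inputs: 10x less energy and 8x less delay give an 80x better EDP.
print(edp_improvement((10.0, 4.0), (1.0, 0.5)))
```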




                      Perception Results: Summary
            41% of ASIC’s performance
            But programmable!
            1.75 times the Pentium 4’s throughput
            But 7.4% of the energy of an XScale!
            → advanced perceptive embedded systems are
              possible
              –   above results are maximally pessimistic
              –   and as always there are improvements in the works
            Problems
              –   manually intensive design process
              –   requires highly skilled programmer, architect, circuit
                  designer
              –   current effort is to fix this




    Automating the design process
        [Flow diagram; recoverable labels follow]
          – Application Suite (C) → Splitter (with Human Interaction)
          – Host Code (C & ifc) → Host Compiler → Host Object Code
          – opt. Stream Code → Stream Compiler
              » CoProcessor Description, CoProcessor Simulator, CoProcessor Object Code
          – Simulation Analysis & Design Space Explore → add point → Design Track Graph
          – design choice → Synthesize
          – dilation




    DSE Results
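        [Quadrant chart: Power (vertical axis) vs. Performance (horizontal axis),
         divided by a Power Limit line and a Performance Requirement line]
          – No Way Quadrant: over the power limit, under the performance requirement
          – Too “Watty” Quadrant: meets the performance requirement, over the power limit
          – Too Dweeby Quadrant: within the power limit, under the performance requirement
          – Choice Quadrant: within the power limit, meets the performance requirement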
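
The quadrant chart amounts to a two-threshold test on each candidate design. A minimal sketch is below, assuming power is the vertical axis and performance the horizontal one, as the axis labels suggest. The function name and the choice of inclusive comparisons are illustrative.

```python
def classify_design_point(power, performance, power_limit, performance_requirement):
    """Name the quadrant a design point falls in."""
    meets_perf = performance >= performance_requirement
    within_power = power <= power_limit
    if meets_perf and within_power:
        return "Choice"
    if meets_perf:
        return 'Too "Watty"'
    if within_power:
        return "Too Dweeby"
    return "No Way"
```

Design space exploration then reduces to generating points and keeping only those that land in the Choice quadrant.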




                      Conclusions
            Significant benefit
              –   3 forms of parallelism: control, address, execution
              –   program controlled communication patterns
                  » able to mimic ASIC flows
                  » more efficient use of execution units and memory
                      structures
            Results to date (in terms of energy-delay)
              –   2-3 orders of magnitude improvement over GPP
              –   within 1 order of magnitude of an ASIC
              –   while maintaining most of the generality of the GPP
                  approach




                       Thanks!
                      Questions?



