Memory Oriented System-level Optimizations
for Scripting Enabled Embedded Systems




Jiwon Hahn

PhD Qualifying Exam
University of California, Irvine
March 2006
Motivation
▶ Embedded system development
 Growing challenges
       Increasing end-user’s expectation
                More functionality
                Higher performance
                Cheaper
                Smaller
       Very short time-to-market
       Wide gap between available techniques and user
        satisfaction
 Need new tools and methodology!
[Figure labels: eco node; motion sensing; physiological sensing; structural health monitoring; preterm infant monitoring]

Jiwon Hahn, UC Irvine                                                                                2
Strategies

 Speed up the development!
       Need better programming/debugging methodology
        and tools
 Improve the current system’s bottleneck!
       The memory unit is one of the most costly components,
        and affects the system’s performance, power, and
        overall application range
 Maximize the system’s capability!
       Since an embedded system is resource constrained, it
        helps to offload part of the system workload to the host

About My Research

 Framework
       Enhanced programming/debugging methodology
       Host-assisting runtime environment
 Optimization
       Reducing data memory requirements and
        increasing memory utilization
       Power and performance co-optimization




Outline

    Scripting Framework
    Memory-oriented Optimization
    Implementation
    Experimental Platforms
    Summary & Research Plan




Outline

▶ Scripting Framework
      ⊳ Scripting Engine Synthesis
      ⊳ Runtime Environment
      ⊳ Preliminary Results
    Memory-oriented Optimization
    Implementation
    Experimental Platforms
    Summary & Research Plan


Motivating Example
▶ Building a small embedded system
 Application
       temperature sensor
              sense temperature,
              send to the host every 5 min.
 Platform
        TecO particle
                 17 x 35 mm
                 PIC18LF452 at 20 MHz
                 32KB program Flash
                 1.5KB RAM
                 32KB external EEPROM
                 temperature sensor
                 RF interface
                 Etc.
 Hardware
        Solder RF module
 Software (or Firmware)
        no OS support!
        no interactivity
        no partial testing
        Development cycle (repeat):
                 1. Write the FW (C/assembly)
                 2. Compile
                 3. Connect board to the host
                 4. Enter the bootloading mode
                 5. Erase/Load/Verify Program
                 6. Restart the board
                 7. Run

Motivation
▶ Alternative approach: Scripting!
 Environment Setup
       1. Generate the FW (Scripting engine synthesis)
       2. Compile
       3. Connect board to the host
       4. Enter the bootloading mode
       5. Erase/Load/Verify Program
       6. Restart the board
       7. Run
 Scripting (repeat)
       1. Write the script
       2. Connect board to the host
       3. Load & Run
 Scripting Engine Synthesis + Runtime

Motivation
▶ Scripting vs. Traditional Programming

 Aspects                Traditional                     Scripting
 Language               C, Assembly                     Python, Tcl, Perl, …
                        less human readable             higher level
 System Query           No interactivity                Instant feedback
                        need oscilloscope, multimeter
                        to check the status
 System Update          Recompile, reboot required      On-the-fly
 Code Size              5x~10x more lines               Shorter
                        [J. Ousterhout ’98]
 Performance Overhead   None                            Scripting engine-dependent
                                                        (could be None or less)

Related Work
▶ Frameworks for runtime support

 Name         high level             interactivity   reconfigurability   kernel synthesis   hetero. sys.   code size
              (language)
 SOS          no (C)                 no              yes*                yes                yes            20K
 Mate         no (asm-like)          no              yes*                no                 no             39K
 TinyOS       no (nesC)              no              yes                 yes*               no             18K
 Agilla       no (asm-like)          yes             yes*                no                 no             55K
 Pushpin      no (C-subset)          no              yes*                no (berthaOS)      no             34K
 Sensorware   yes* (Tcl)             yes             yes*                no                 no             >237K
 Actornet     yes* (S-expression)    N/A             yes                 no                 no             <128K
 VM*          yes (java)             no              yes*                yes                N/A            25K
 Our work     yes (python-like)      yes             yes                 yes                yes            <17K

Our Framework: Rappit
▶ Overview
 Host
       Rappit S/W
       Example session:
              >> readTemperature()
              52
 Wired/Wireless link
 Target System
       Application Script
       Rappit F/W
              Receive packets
              Interpret the command
              Execute primitives
              Return the result
       Device Drivers (e.g., ADC read)
       H/W Device
 Framework to provide user an integrated scripting environment
  of the host and target systems

Rappit
▶ Scripting engine synthesis
 Input: System Description
       Architecture, Application, Communication
 Code Synthesis, using the Component Library
 Output
       Host: Interactive Language, Compatible Host S/W (Parser, MsgGen, GUI, …)
       Message format
       Target System: Target F/W (Scripting Engine, Primitives, …), Binary Executable
 Example: architecture description
       # example: pin mapping for an RF module
       mcu = MCU(ATmega169)     # instantiate an atmega169 MCU
       import RF                # load a transceiver module
       rf = RF(nRF2401)         # instantiate nRF2401
       rf.CS = mcu.PORTB[0]     # connect the chip select pin
       rf.CE = mcu.PORTB[1]     # connect the chip enable pin
       rf.DR1 = mcu.PORTB[2]    # connect the data ready pin
       rf.CLK1 = mcu.PORTF[1]   # connect the clock pin
       rf.DOUT1 = mcu.PORTF[2]  # connect the data pin
 Example: message format
       # example: packet format
       c_format = src(1),dst(1),msgID(1),opcode(1),arg(3),crc(1)
       r_format = src(1),dst(1),msgID(1),mtype(1),dtype(1),\
                  data(v), crc(1),eop(1)
 Example: synthesized target code
       // part of Scripting engine
       switch (opcode)
       {
         case 0x00:
           val = ADC_read();
         case 0x01:
           RF_send(val);
         case 0x02:
           RF_packetize(val);
         …
       }

       // part of primitives
       char ADC_read(void)
       {
         …
       }
       void RF_send(char pck)
       {
         …
       }

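The command packet layout c_format = src(1),dst(1),msgID(1),opcode(1),arg(3),crc(1) can be illustrated on the host side with a minimal Python sketch. The field widths come from the slide; everything else is an assumption for illustration: the names `pack_command`, `unpack_command` and `xor_checksum` are hypothetical, and the 1-byte check value is a plain XOR because the slide does not say how crc is computed.

```python
import struct

# Field widths in bytes, taken from the slide's c_format:
# src(1), dst(1), msgID(1), opcode(1), arg(3), crc(1)
C_FORMAT = struct.Struct("<BBBB3sB")


def xor_checksum(data: bytes) -> int:
    """Hypothetical 1-byte check value (XOR of all bytes); the slide does not define crc."""
    value = 0
    for byte in data:
        value ^= byte
    return value


def pack_command(src: int, dst: int, msg_id: int, opcode: int, arg: bytes = b"") -> bytes:
    """Build an 8-byte command packet laid out like c_format."""
    if len(arg) > 3:
        raise ValueError("arg field holds at most 3 bytes")
    arg = arg.ljust(3, b"\x00")  # zero-pad the fixed-width argument field
    body = struct.pack("<BBBB3s", src, dst, msg_id, opcode, arg)
    return body + bytes([xor_checksum(body)])


def unpack_command(packet: bytes) -> dict:
    """Split a packet into its fields and verify the (assumed) check value."""
    src, dst, msg_id, opcode, arg, crc = C_FORMAT.unpack(packet)
    if crc != xor_checksum(packet[:-1]):
        raise ValueError("checksum mismatch")
    return {"src": src, "dst": dst, "msgID": msg_id, "opcode": opcode, "arg": arg}


# Opcode 0x00 is the ADC_read case in the scripting-engine switch above.
packet = pack_command(src=0x01, dst=0x02, msg_id=0x07, opcode=0x00)
print(packet.hex(), unpack_command(packet))
```

A fixed 8-byte layout keeps decoding on the node down to indexing into the receive buffer, which fits the "easy to parse at node" goal stated for the host parser.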
Rappit
▶ Runtime environment
 Host
 Target System
[Figure: runtime block diagram. Block labels: GUI, Parser, Optimizer, Msg Generator, Component Library, Packet Manager, Pcktzer/Dispatcher, Pcktzer/Depcktzer, Admission Controller, Pck Buffer, Scripting Engine, Native Routines. Group label: Host Assisting modules. Legend: command, response.]

Rappit
▶ Host assistance
 Script Parsing (Parser)
       “readTemp()” → Host Parser, Msg. generator → “0x4A0x01” → To target node
       • Input: user friendly syntax
       • Output: easy to parse at node
       • Output: compact and efficient representation
 Memory Management (Optimizer)
       Raw script → Script Scheduler, Buffer Mapper → Optimized script → To target node
       • Raw script: written by user
       • Optimized script: minimal script size
       • Optimized script: minimized memory usage
       • Optimized script: minimized runtime overhead
          (Fixed schedule and buffer usage)

Rappit
▶ Scripting examples
 Interactive port-setting
      >> PORTA[2] = 1   # toggle clock
      >> PORTA[2] = 0
      >> PORTA[1] = 1   # set port A pin 1
      >> PORTA[0]       # read input pin
      0
      >> PORTA[2] = 1
      >> PORTA[2] = 0   # toggle clock
      >> PORTA[0]       # read input pin
      1
 System configuration
      >> mcu.sysclock = 1 MHz
      >> uart.baudrate = 9600 bps
      >> rf.power = -5 db
      >> rf.speed = 1 Mbps
      >> rf.config      # query
      {’payload’: 1, ’power’: -5,
       ’speed’: 1000000,
       ’channel’: 100, ’mode’: ’TX’}
 Periodic-task scheduling
      >> s = (every 50 ms: sample())
      >> s.start()
      >> s.stop()

Rappit
▶ Experimental platform
 AVR Butterfly Board
       Atmel ATmega169
       8-bit MCU @ 8MHz, 512B
        EEPROM, 1KB SRAM,
        16KB program flash
       Includes dataflash,
        speaker, sensors, joystick, LCD
       USART serial link at 9600 baud




        AVR Butterfly                     AVR Butterfly w/ Wireless module

Rappit
▶ Experimenting metrics and modality
 Observation Metrics
      Metric                  Unit
      Code size               Bytes
      Execution Speed         Cmds/sec


 Execution Modality
      Modality          Approach      Programming Method
      Native            Compiled      Program the firmware onto the Flash
      Batch             Scripting     Preload a script program onto the RAM
      Interactive       Scripting     Send one line of command to the RAM


Rappit
▶ Preliminary results
 Code size reduction
       61.8 – 66.3% reduction
       Scripting engine consists of a thin layer
       Most reduction in application code size
 Performance overhead
       Batch mode scripting can be faster than native!
       Observed up to 25.7% speed-up

Outline

 Scripting Framework
▶ Memory-oriented Optimization
      ⊳ Memory Optimization
      ⊳ Multi-metric Optimization
 Implementation
 Experimental Platforms
 Summary & Research Plan



Motivating Example
▶ Installing Rappit primitives on Butterfly
 Problem
       Choose primitives
              ADC_read, RF_send, RF_read, SD_write, SD_read, …
       Compile & Install
       Runtime Error!
       Why?
              exceeded 1KB RAM usage
 Problem Analysis
       Static data declared by the primitives:
              static unsigned char sd_buffer[512];
              static unsigned char rf_buffer[30];
              static unsigned char ADC_buffer[30];
              char error_msg1[] = "No SD Card detected!";
              char error_msg2[] = "Card Read Error!";
              …
       [Figure: 1KB SRAM map with .data, .bss, heap and stack; SD_buffer (512B), RF_buffer, ADC_buffer and static strings fill the data area]
 Solution
        Sharing memory space (Memory Sharing)
        Mapping static data to dataflash (Map to dataflash)
        [Figure: 1KB SRAM map after the change; Shared_buffer (600B ?), heap and stack]
 Result
        Increased board capability
        Increased application range

Data Memory Minimization
▶ Assumptions and Approach
 Assumptions
       Optimizing scripts
              script size  buffer size
       Optimizing at runtime
              Need low complexity algorithm
 Approach
          High-level optimization
          Using scheduling and buffer mapping techniques
          Priority on data memory minimization
          Based on model of computation (MoC)
Models of Computation (MoC)

 Synchronous Dataflow (SDF) [E. Lee ’87]
       Extensively used as specification for block-
        diagram based programming environments for
        signal processing
       Special case of dataflow
              No notion of time
              The number of tokens (=data) consumed and produced
               by each actor (=node) during each firing (=invocation)
               cycle is statically fixed.
 Fractional Rate Dataflow (FRDF) [H. Oh, S. Ha ’02]
       Extension of SDF that allows fractional flow of I/O
        samples of the original SDF
Why SDF?

 Formal representation for optimization, simulation
  and analysis
 System-level optimization
       Application flow of various primitives
 Static scheduling
       Minimize runtime overhead for resource constrained
        embedded systems
       Deadlock detection
       Bounding the memory requirements
 Good match for sensor applications
       collect data, process, transmit

SDF
▶ Notations
 SDF graph G = (V, E, p, c)
       V: {v1, v2, … v|V|}
       E: {e1, e2, … e|E|}
                src(e) : source node
                snk(e): sink node
                p(e) : produce rate
                c(e) : consume rate
       Edge e is drawn as: src(e) --p(e)-- e --c(e)--> snk(e)
       T(e,v): topology matrix
              p(e) if v = src(e),
              -c(e) if v = snk(e)
              0 otherwise
 Example chain: v1 --(1) e1 (2)--> v2 --(2) e2 (1)--> v3 --(3) e3 … e|E| (5)--> v|V|

                 v1    v2    v3    …    v|V|
         e1       1    -2     0    …     0
         e2       0     2    -1    …     0
    T =  e3       0     0     3    …
         …                         …
         e|E|     0     0     0    …    -5

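The topology matrix defined above also yields the number of firings per actor in one schedule period: the smallest positive integer vector q with T q = 0 (the balance equations of SDF). A minimal Python sketch, assuming a connected graph and edges given as (src, snk, p, c) tuples; the function names `topology_matrix` and `repetition_vector` are illustrative, not part of Rappit.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def topology_matrix(nodes, edges):
    """Rows are edges, columns are nodes: p(e) at src(e), -c(e) at snk(e), 0 elsewhere."""
    column = {v: i for i, v in enumerate(nodes)}
    matrix = []
    for src, snk, p, c in edges:
        row = [0] * len(nodes)
        row[column[src]] += p
        row[column[snk]] -= c
        matrix.append(row)
    return matrix


def repetition_vector(nodes, edges):
    """Smallest positive integer q with T q = 0, or None if the rates are inconsistent.

    Assumes the graph is connected.
    """
    rate = {nodes[0]: Fraction(1)}  # relative firing rate, propagated edge by edge
    pending = True
    while pending:
        pending = False
        for src, snk, p, c in edges:
            if src in rate and snk not in rate:
                rate[snk] = rate[src] * p / c
                pending = True
            elif snk in rate and src not in rate:
                rate[src] = rate[snk] * c / p
                pending = True
    if len(rate) != len(nodes):
        raise ValueError("graph is not connected")
    if any(rate[src] * p != rate[snk] * c for src, snk, p, c in edges):
        return None  # some edge cannot be balanced
    scale = reduce(lambda a, b: a * b // gcd(a, b), (r.denominator for r in rate.values()))
    counts = [int(rate[v] * scale) for v in nodes]
    common = reduce(gcd, counts)
    return [n // common for n in counts]


# First two edges of the example chain: v1 --(1) e1 (2)--> v2 --(2) e2 (1)--> v3
nodes = ["v1", "v2", "v3"]
edges = [("v1", "v2", 1, 2), ("v2", "v3", 2, 1)]
T = topology_matrix(nodes, edges)
q = repetition_vector(nodes, edges)
print(T, q)
```

A `None` result signals inconsistent rates, in which case no periodic schedule with bounded buffers exists.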
SDF
▶ Example
 Surge Application
       Graph: A (ADC read) --(1) x (1)--> B (RF pack) --(1) y (1)--> C (RF send)
       Actors: A, B, C
       Buffers: x, y
       Schedule: ABC
       Rappit Script (4L):
           every 2048:
               x = ADC.read()
               y = RF.pack(x)
               RF.send(y)

SDF
▶ Example (cont’d)
       Same code in Java (20L) [J. Koshy ’05]:
             SurgePacket sgPkt;
             char eList, eVector;
             byte sHandle;
             sgPkt = new SurgePacket();
             evList = Select.setEventId( eList, Events.TIMEOUT | Events.RADIO_RECV );
             sHandle = Select.requestSelectHandle();
             char val;
             Clock.startTimeout( 2048 );
             while (true) {
                 eVector = Select.select(sHandle, eList);
                 if (Select.eventOccurred( eVector, Events.TIMEOUT )) {
                     val = PhotoSensor.sense();
                     sgPkt.setReading( val );
                     Surge.sendPacket( sgPkt );
                     Clock.startTimeout( 2048 );
                 }
                 else if (Select.eventOccurred( eVector, Events.RADIO_RECV)) {
                     handleRadioEvent( sgPkt ); // if base, forward to uart
                 }
             }



Problem Statements

1. Find the best schedule and buffer mapping
   that minimizes the buffer size requirement
            Goal-oriented
            Previous work
2. Find the best schedule and buffer mapping
   that fits into, and maximizes the utilization of
   a given memory size
            Constraint-driven
            Novel
            Practical
Buffer Mapping Problem
▶ Spatial representation
 Token-lifetime chart (t-chart)
       row: token’s lifetime, produced → placed → consumed
       column: fixed number of token changes caused by firing event

 local
 buffer      A     B     B     C     C      (firings, time →)
   x         t2    t2    t2
             t1    t1
   y                     t4    t4    t4
                   t3    t3    t3

Buffer Mapping Problem
▶ Spatial representation (cont’d)
 Memory-usage profile (m-profile)
       [Figure: memory vs. time over firings A B B C C]
 Metrics
       Msize = 4, Mtotal = 20, Mused = 11, Mwasted = 9, Mutil = 55%
       T=5

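The metric values above are consistent with Mtotal = Msize × T, Mwasted = Mtotal − Mused and Mutil = Mused / Mtotal. A minimal Python sketch that reproduces them is given here under stated assumptions: the graph is A → B → C where A produces 2 tokens on x per firing and B and C each consume and produce one, every token takes one memory unit, and a token occupies its slot from the firing that produces it through the firing that consumes it. The names `EDGES`, `token_lifetimes` and `m_profile_metrics` are illustrative.

```python
from collections import deque

# Assumed example graph: edge -> (producer, tokens produced, consumer, tokens consumed)
EDGES = {"x": ("A", 2, "B", 1), "y": ("B", 1, "C", 1)}


def token_lifetimes(schedule, edges=EDGES):
    """Map each edge to (produced_step, consumed_step) pairs, consumed in FIFO order."""
    queues = {name: deque() for name in edges}
    lifetimes = {name: [] for name in edges}
    for step, actor in enumerate(schedule):
        # The firing actor first consumes its inputs ...
        for name, (_, _, consumer, rate) in edges.items():
            if consumer == actor:
                for _ in range(rate):
                    produced = queues[name].popleft()  # IndexError if the schedule is not admissible
                    lifetimes[name].append((produced, step))
        # ... then produces its outputs.
        for name, (producer, rate, _, _) in edges.items():
            if producer == actor:
                queues[name].extend([step] * rate)
    return lifetimes


def m_profile_metrics(schedule, edges=EDGES):
    """Compute the m-profile metrics for dedicated (unshared) per-edge buffers."""
    steps = range(len(schedule))
    m_size = 0
    m_used = 0
    for spans in token_lifetimes(schedule, edges).values():
        occupancy = [sum(1 for born, died in spans if born <= s <= died) for s in steps]
        m_size += max(occupancy)  # each edge's buffer is sized for its peak
        m_used += sum(occupancy)  # filled cells of the t-chart
    m_total = m_size * len(schedule)
    return {
        "Msize": m_size,
        "T": len(schedule),
        "Mtotal": m_total,
        "Mused": m_used,
        "Mwasted": m_total - m_used,
        "Mutil": m_used / m_total,
    }


metrics = m_profile_metrics("ABBCC")
print(metrics)
```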
Related Work
▶ Data memory optimization based on MoC

 Technique              Group                             Idea
 Optimal Scheduling     [Bhattacharyya et al] in          Buffer minimized by optimal
                        Ptolemy Group                     scheduling, optimize each local
                                                          buffer
 Buffer sharing by      [Bhattacharyya et al] in          Local buffer lifetime is analyzed to
 lifetime analysis      Ptolemy Group, [Ha et al] in      share global buffers
                        PeaCE group, [Ritz et al] in
                        Meyr Group
 Buffer merging         [Bhattacharyya et al] in          Input/output buffer is shared (finer
                        Ptolemy Group                     grain than buffer sharing)
 Model checking         [Geilen et al] in Eindhoven       Reduced the problem to a model-
                        Univ.                             checking problem on the state-space
                                                          of SDF graph
 Etc. (MBRO, PAPS,      [Govindarajan et al] in Gao       Rate-optimal / Vectorization/
 MRSP, …)               Group, [Peperstraete et al],      Application to real-time systems / etc
                        [Goddard et al], [Ade et al] in
                        GRAPE group

Memory Optimization Techniques

1) *Scheduling w/ Unshared Buffer
2) *Buffer Sharing
3) *I/O Buffer Merging
4a) **Fractionizing
4b) Rate Selection (new)
5) Pipelining (new)

* Well established previous work
** Recently proposed

 Memory Optimization Techniques
 ▶ 1) Scheduling with unshared buffer
 Graph: A --(2) x (1)--> B --(1) y (1)--> C

 Schedule 1: A B B C C
     Script                 Expanded
     x = A()                x[0..1] = A()
     repeat 2:              y[0] = B(x[0])
         y = B(x)           y[1] = B(x[1])
     repeat 2:              C(y[0])
         C(y)               C(y[1])
     Buffer requirement: |x| + |y| = 2 + 2 = 4

 Schedule 2: A B C B C
     Script                 Expanded
     x = A()                x[0..1] = A()
     repeat 2:              y[0] = B(x[0])
         y = B(x)           C(y[0])
         C(y)               y[0] = B(x[1])
                            C(y[0])
     Buffer requirement: |x| + |y| = 2 + 1 = 3

     By efficient ordering of actors, buffer requirement is reduced!
     Each edge is directly mapped to its dedicated buffer space

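The effect of actor ordering on the unshared buffer requirement can be checked by counting tokens per edge. The sketch below is only an illustration: it scores a schedule by the sum of per-edge peak token counts and then tries every ordering of one schedule period by brute force. Exhaustive search is a stand-in for illustration, not the scheduling algorithm of this work, and the names `buffer_requirement` and `best_schedule` are hypothetical.

```python
from itertools import permutations

# Graph from the slide: edge -> (producer, tokens produced, consumer, tokens consumed)
EDGES = {"x": ("A", 2, "B", 1), "y": ("B", 1, "C", 1)}


def buffer_requirement(schedule, edges=EDGES):
    """Sum of per-edge peak token counts, or None if a firing finds too few input tokens."""
    count = {name: 0 for name in edges}
    peak = {name: 0 for name in edges}
    for actor in schedule:
        for name, (_, _, consumer, rate) in edges.items():
            if consumer == actor:
                if count[name] < rate:
                    return None  # schedule is not admissible
                count[name] -= rate
        for name, (producer, rate, _, _) in edges.items():
            if producer == actor:
                count[name] += rate
                peak[name] = max(peak[name], count[name])
    return sum(peak.values())


def best_schedule(repetitions, edges=EDGES):
    """Brute-force search over all orderings of one period (small graphs only)."""
    firings = "".join(actor * n for actor, n in repetitions.items())
    candidates = sorted({"".join(order) for order in permutations(firings)})
    scored = [(buffer_requirement(s, edges), s) for s in candidates]
    return min((size, s) for size, s in scored if size is not None)


print(buffer_requirement("ABBCC"), buffer_requirement("ABCBC"))
print(best_schedule({"A": 1, "B": 2, "C": 2}))
```

The search space grows factorially with the number of firings, which is why a low-complexity algorithm is needed when the optimization runs at runtime.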
Memory Optimization Techniques
▶ Comparing 1), 2), 3)
 Graph: A --(2) x (1)--> B --(1) y (1)--> C
 Schedule: A B B C C
       x = A()
       repeat 2:
           y = B(x)
       repeat 2:
           C(y)
 2) Shared Buffer: Data tokens consumed… Reuse the available space!
 3) Merged I/O Buffer: Use the same space for the input/output
       Assuming the token is consumed before output is produced…

 1) Unshared Buffer          2) Shared Buffer         3) Merged I/O Buffer
 x[0..1] = A()               x[0..1] = A()            x[0..1] = A()
 y[0] = B(x[0])              y[0] = B(x[0])           x[0] = B(x[0])
 y[1] = B(x[1])              x[0] = B(x[1])           x[1] = B(x[1])
 C(y[0])                     C(y[0])                  C(x[0])
 C(y[1])                     C(x[0])                  C(x[1])

 Buffer requirement:         Buffer requirement:      Buffer requirement:
 |x| + |y| = 2 + 2 = 4       |x| + |y| = 2 + 1 = 3    |x| + |y| = 2 + 0 = 2

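The three buffer requirements can be derived from token lifetimes. A minimal Python sketch, assuming unit-size tokens and a deliberately simple model: an unshared buffer is sized per edge, a shared buffer is sized for the peak number of live tokens overall, and with merged I/O a token read by an actor that also writes output gives up its slot during that firing. The names `token_spans` and `buffer_sizes` are illustrative, and the published techniques cited in the related work are more involved than this.

```python
from collections import deque

# Graph from the slide: edge -> (producer, tokens produced, consumer, tokens consumed)
EDGES = {"x": ("A", 2, "B", 1), "y": ("B", 1, "C", 1)}


def token_spans(schedule, edges=EDGES):
    """Return (edge, produced_step, consumed_step, consumer) for every token."""
    queues = {name: deque() for name in edges}
    spans = []
    for step, actor in enumerate(schedule):
        for name, (_, _, consumer, rate) in edges.items():
            if consumer == actor:
                for _ in range(rate):
                    spans.append((name, queues[name].popleft(), step, actor))
        for name, (producer, rate, _, _) in edges.items():
            if producer == actor:
                queues[name].extend([step] * rate)
    return spans


def buffer_sizes(schedule, edges=EDGES):
    """Return (unshared, shared, merged) buffer requirements in tokens."""
    spans = token_spans(schedule, edges)
    steps = range(len(schedule))
    producers = {producer for producer, _, _, _ in edges.values()}

    def live(step, edge=None, merged=False):
        count = 0
        for name, born, died, consumer in spans:
            if edge is not None and name != edge:
                continue
            # Merged I/O: the consumed token's slot is reused for the output
            # of the same firing, so it is free from that step on.
            last = died - 1 if merged and consumer in producers else died
            if born <= step <= last:
                count += 1
        return count

    unshared = sum(max(live(s, edge=name) for s in steps) for name in edges)
    shared = max(live(s) for s in steps)
    merged = max(live(s, merged=True) for s in steps)
    return unshared, shared, merged


print(buffer_sizes("ABBCC"))
```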
  Memory Optimization Techniques
  ▶ Comparing 1), 2), 3) (cont’d)

                  1) Unshared Buffer   2) Shared Buffer   3) Merged I/O Buffer
    |x|+|y|   :   4                    3                  2
    Mtotal    :   20                   15                 10
    Mused     :   11                   11                 9
    Mwasted   :   9                    4                  1
    Mutil     :   55%                  73%                90%

  [Figure: t-charts of local buffers x and y vs. time for the three mappings, over firings A B B C C]

Memory Optimization Techniques
▶ 4a) Fractionizing
 Idea:
       Before: w --(1)--> A --(3) x (1)--> B         Schedule: A 3(B)
       After:  w --(1/3)--> A’ --(1) x (1)--> B      Schedule: 2(AB)
       Don’t wait until A produces big chunk of data
       Modify actor A to process only fractional amount of the
        original data at a time
 Trade-off
       Local effect
              Possible time and energy overhead
                       e.g., resource’s access time, packet overhead
       Global effect
              Reduced bottleneck: shorter processing interval of A
              Reduced buffer size: min|x|: 2 → 1

Memory Optimization Techniques
▶ 4b) Rate Selection
 Idea
       Generalize fractionizing
       Allow not only fractions but also multiples
       Each rate is defined as a range, but fixed before the schedule is finalized
       Each actor is modeled with timing and power functions with
        respect to its I/O range
       Example: w -> A -> x -> B with rate ranges (min,max):
        A consumes (1,3), A produces (2,6), B consumes (4,4)
               Schedule1: 2(A)B
               Schedule2: AB
               Schedule3: 2(A)3(B)
 Benefits
       Combines the power of flexibility and static determinism
       Increases buffer reduction opportunity
 Challenge
       Need an efficient way to handle considerably increased
        exploration space at runtime
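
The exploration that rate selection implies can be sketched in a few lines. The snippet below is only an illustration, assuming the example ranges on this slide (A produces between 2 and 6 tokens on x, B consumes exactly 4); the function name `candidate_schedules` and the "fire A first" buffer estimate are assumptions, not part of Rappit.

```python
from math import gcd

def candidate_schedules(produce_range, consume_range):
    """Enumerate every fixed (p, c) choice and its smallest balanced firing counts."""
    results = []
    for p in range(produce_range[0], produce_range[1] + 1):
        for c in range(consume_range[0], consume_range[1] + 1):
            g = gcd(p, c)
            q_a, q_b = c // g, p // g          # balance: q_a * p == q_b * c
            results.append({
                "p": p, "c": c, "q_a": q_a, "q_b": q_b,
                # Tokens held on x if A fires q_a times before B starts (assumed policy).
                "buffer": q_a * p,
            })
    return results

candidates = candidate_schedules((2, 6), (4, 4))
for cand in candidates:
    print(f"p={cand['p']} c={cand['c']}: "
          f"{cand['q_a']}(A) {cand['q_b']}(B), buffer={cand['buffer']}")

# Pick the rate choice with the smallest buffer on x.
best = min(candidates, key=lambda cand: cand["buffer"])
```

With p = 2, 4, and 6 the firing counts match Schedule1, Schedule2, and Schedule3 on the slide. A real explorer would also weigh the timing and power model of each actor.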
Memory Optimization Techniques
▶ 5) Pipelining
 Idea
       Allow multiple actors to fire at once
 Benefits
       Reduced buffer requirement
       Higher memory utilization
       Increased throughput
 Challenges
       Need multiprocessors
       Need to resolve resource conflict
       Need to consider synchronization problem

  Memory Optimization Techniques
  ▶ Comparing 1), 4), 5)
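   1) Unshared Buffer
         Graph: A -> x -> B -> y -> C
          (A consumes 1 and produces 2; B consumes 1 and produces 1; C consumes 1)
         [Figure: timeline of tokens t1-t4 in buffers x and y]
   4) Fractionized / Rate Selected, 5) Pipelined
         Graph: A' -> x -> B -> y -> C
          (A' consumes 1/2 and produces 1; B consumes 1 and produces 1; C consumes 1)
         [Figure: timeline of tokens t1-t4 in buffers x and y with overlapping firings]
   Result
         Buffer Size: 33% reduction
         Utilization: 66.7% → 100%
         Time: 5 → 4 firing units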
  Memory Optimization Techniques
  ▶ Summary
                      0       1       1+2     1+2+3   1+4     1+2+4   1+2+3+4   1+4+5
   M_size             4       3       3       2       2       2       1         2
   M_used             11      10      10      9       8       8       6         8
   M_wasted           9       5       5       1       4       4       0         0
   T                  5       5       5       5       6       6       6         4
   M_utilization      55%     66.7%   66.7%   90%     66.7%   66.7%   100%      100%

   0: None (baseline)   1: Unshared Scheduling   2: Shared Buffer
   3: Merged I/O        4: Fractionized          5: Pipelined

                    t1           t1               t3         t3 
global              t1 
                     4         t1t2t 
                                    2              t2         t3 
                                                                 4      t3  t4          t4 
                    A             B                 C            A           B             C
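
The rows of the summary table are related by simple arithmetic. The sketch below assumes the relation that the table's numbers suggest, namely capacity = M_size × T, M_wasted = capacity - M_used, and M_utilization = M_used / capacity; that formula is inferred from the table rather than stated on the slide.

```python
# Values copied from the summary table; one entry per technique combination.
columns = ["0", "1", "1+2", "1+2+3", "1+4", "1+2+4", "1+2+3+4", "1+4+5"]
m_size = [4, 3, 3, 2, 2, 2, 1, 2]
m_used = [11, 10, 10, 9, 8, 8, 6, 8]
t_units = [5, 5, 5, 5, 6, 6, 6, 4]

def utilization(size, used, time):
    """Return (wasted slots, utilization in percent) for one configuration."""
    capacity = size * time          # buffer slots available over the whole schedule
    return capacity - used, round(100.0 * used / capacity, 1)

rows = {name: utilization(s, u, t)
        for name, s, u, t in zip(columns, m_size, m_used, t_units)}

for name, (wasted, util) in rows.items():
    print(f"{name:>8}: wasted={wasted}, utilization={util}%")
```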
Multi-metric Optimization

 Trade-offs
       From the actor's point of view (local), processing a large
        amount of data at once tends to reduce time and energy overhead
       From the SDF-flow point of view (global), processing a small
        amount of data at once reduces the buffer requirement
 Goal
       Find a Pareto-optimal point within the range of solutions
        that satisfy the constraints
 [Figure: Energy, Data Memory, and Execution Time as functions of the data-flow rate]
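
One way to make the goal concrete is a Pareto filter over candidate design points. The sketch below is illustrative only: the (data memory, execution time, energy) tuples and the constraint limits are made-up numbers, and `pareto_front` is a hypothetical helper, not a Rappit function.

```python
def dominates(a, b):
    """True if point a is no worse than b in every metric and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points, limits):
    """Drop points that violate a constraint, then drop dominated points."""
    feasible = [p for p in points if all(v <= lim for v, lim in zip(p, limits))]
    return [p for p in feasible if not any(dominates(o, p) for o in feasible)]

# Hypothetical (data memory, execution time, energy) per data-flow rate choice.
points = [(1, 9, 9), (2, 6, 7), (3, 6, 8), (4, 4, 5), (8, 3, 4)]
limits = (6, 8, 8)   # assumed memory, deadline, and energy budgets

front = pareto_front(points, limits)
print(front)
```

Lower is better for all three metrics here. Small rates save memory but cost time and energy, so several points typically remain on the front and a cost function has to pick among them.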



 Applying it to Rappit
 ▶ Quasi-static optimization
   Where / When             Rappit Flow          Performed Tasks
   Host, compile-time       Compile              Kernel and primitives compiled and installed
   Host, run-time           Load script          SDF defined
   Host, run-time           Preprocess           Optimization: actor-to-processor assignment,
                                                 actor ordering (scheduling), buffer mapping
   Target, run-time         Load script code     Static schedule loaded
   Target, run-time         Execute              Deterministic execution w/o runtime overhead

Outline

 Scripting Framework
 Memory-oriented Optimization
▶ Implementation
      ⊳ Synthesis Tool
      ⊳ Simulator
      ⊳ Runtime Host-assisting Tool (GUI)
 Experimental Platforms
 Summary & Research Plan


Implementation
▶ Scripting engine synthesis tool
 System Template
       GUI-based check-box approach
       easily capture existing systems
       model new systems for simulation and design
        space exploration
       includes communication description
 Component Library
       binds according to template configuration
       consists of MCU, on-chip devices, off-chip
        peripherals
       each component has I/O pins and driver modules
Implementation
▶ Memory simulator




Implementation
▶ Interactive runtime tool




Implementation
▶ Tool integration

   [Figure: GUI, Parser, Scheduler, Memory Optimizer, Dispatcher, and Node Manager blocks,
    connected to Node 1, Node 2, Node 3, ..., Node N]



Outline

    Scripting Framework
    Memory-oriented Optimization
    Implementation
▶    Experimental Platforms
    Summary & Research Plan




HW Platforms and Real-world Applications

 Eco
       ultra-compact sensor node
       pre-term infant monitoring
       dancing motion detection
 Mini-FDPM
       active laser sensing device
       breast cancer detection
 DuraNode
       real-time data acquisition system
       structural health monitoring
 Butterfly
       low-power, I/O-rich development board
       prototyping (SD-card, speaker, sensors, RF)

Outline

    Scripting Framework
    Memory-oriented Optimization
    Implementation
    Experimental Platforms
▶    Summary & Research Plan




Summary

 A novel scripting framework for embedded
  systems
       Scripting engine synthesis
       Host assisting runtime environment
 Memory optimization techniques
       Comparison of techniques
       Integration and multi-objective problem
 Tool Implementations
       Rappit GUI, memory simulator

Contributions

 Empowered Embedded Systems
       Unleashing the severely constrained embedded
        systems
 SDF Extensions
       Extension of SDF model
       Extending the application area of SDF
 Memory Savings
       Reduced memory requirement by integration of
        policies, including new techniques


Research Plan
▶ finished, ongoing, future work
 Framework
       Language definition*
       Initial implementation and prototyping
       Component library generation*
       Code generation
       Overhead analysis
       Tool integration
       Test on multinode scenario
 Optimization
       Survey and comparison
       Simulator implementation
       Integrating techniques
       SDF extension on rate
       Rate-selection algorithm
       Buffer-mapping protocol
       Cost function modeling of multi-metric optimization
       SDF extension on timing
 Case Study
       AVR butterfly
       mini-FDPM
       eco
       DuraNode

       *with Qiang Xie & Jinfeng Liu

Publications

 Jiwon Hahn, Qiang Xie, and Pai H. Chou, "Rappit: A
  Framework for the Synthesis of Host-Assisted Light-Weight
  Scripting Engines for Adaptive Embedded Systems," in Proc.
  International Conference on Hardware Software Codesign
  and System Synthesis (CODES+ISSS), 2005.
 Jiwon Hahn, Dexin Li, Qiang Xie, Pai H. Chou, Nader
  Bagherzadeh, David W. Jensen, and Alan C. Tribble, "Power
  Reduction in JTRS Radios with ImpacctPro," in Proc.
  IEEE Military Communication Conference (MILCOM),
  2004.


Acknowledgements

 This work is sponsored in part by the National
  Science Foundation grant CCR-0205712 and
  NSF CAREER Award CNS-0448668
 Professor Pai Chou
 Qiang Xie
 Jinfeng Liu




Backup Slides




Scripting Overhead

 Scripting for General Purpose Computers
       Assume unlimited resources
       Full feature scripting engine for convenience
       Slower than system programming language
 Scripting for Embedded Systems
       Limited memory, CPU, power, …
       Need scripting engine optimization
                Host assist
                Language subsetting
                Library subsetting
                Efficient memory usage
       Scripting may be even faster than compiled code!

Rappit
▶ Packet format example
 Command Packet Format
   | Dst. | Msg ID | Opcode | Input[3] | Output[3] | CRC |

   Command Message Format
   | Opcode | In_addr | In_start | In_size | Out_addr | Out_start | Out_size |

 Response Packet Format
   | Src. | Msg ID | Msg Type | Data Type | Payload | CRC | EOP |

   Response Message Format
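
To make the command packet layout concrete, here is a minimal packing sketch. The slide gives only the field order, so everything else is an assumption for illustration: one byte per field, a one-byte XOR checksum standing in for the CRC, and the hypothetical helper names `pack_command` and `unpack_command`.

```python
import struct
from functools import reduce

# Assumed layout: Dst, Msg ID, Opcode, Input[3], Output[3], one byte each.
CMD_FORMAT = "<BBB3B3B"

def checksum(data: bytes) -> int:
    """XOR of all bytes; a stand-in for the real CRC, which the slide does not specify."""
    return reduce(lambda acc, byte: acc ^ byte, data, 0)

def pack_command(dst, msg_id, opcode, inp, out):
    """inp and out are (addr, start, size) triples, as in the command message format."""
    body = struct.pack(CMD_FORMAT, dst, msg_id, opcode, *inp, *out)
    return body + bytes([checksum(body)])

def unpack_command(packet: bytes):
    body, received = packet[:-1], packet[-1]
    if checksum(body) != received:
        raise ValueError("checksum mismatch")
    fields = struct.unpack(CMD_FORMAT, body)
    return {"dst": fields[0], "msg_id": fields[1], "opcode": fields[2],
            "input": fields[3:6], "output": fields[6:9]}

packet = pack_command(dst=2, msg_id=7, opcode=1, inp=(16, 0, 4), out=(32, 0, 4))
decoded = unpack_command(packet)
```

A fixed-width layout like this keeps the target-side depacketizer small, which fits the host-assisted design, but the real field widths may differ.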

 Rappit
 ▶ Scripting engine optimization in code synthesis

  Language subsetting
        e.g., assignment (=), loop (repeat)
  Library subsetting
        customized for target applications and platform
  Full-featured component library
        MCU; RF, SPI, UART, GPIO, Interrupts, ADC;
        Dataflash, LCD, Joystick, Sensor1, Sensor2
  Synthesized subset
        Interrupts, RF, GPIO, UART, ADC, Sensor1



Memory Organizations
▶ Comparing previous work and Rappit
 Previous approaches consider both data and code memory
  minimization, but prioritize code size*
 We mainly focus on data size** minimization
       Previous work
              RAM: Buffer
              On-chip Flash or EEPROM: Application Code*
       Our work
              RAM: Script Code, Buffer**
              On-chip Flash or EEPROM: Primitives, Rappit Kernel
              Data Flash

Rappit
▶ Code size of runtime components
  Host Code (.py)               Lines   Size (KB)
  GUI                           644     21.8
  Cmd                           127     2.87
  Parser & Msg Generator        221     4.97
  Library                       263     6.396
  Packetizer & Depacketizer     82      2.0
  Packet Mgr                    42      0.92
  Total                         1379    38.96

  MCU Code (.c)                 Lines   Size (KB)
  Interpreter                   260     -
  Primitives                    90      -
  Packetizer & Depacketizer     300     -
  Total                         750     1.484




Rappit
▶ Summary of results
 Code size reduction
           Application     Native       Rappit       Reduction
           Reg setting     4.356 KB     1.664 KB     61.8%
           LCD usage       12.45 KB     4.2 KB       66.3%

 Performance overhead components analysis
                         Native     Interactive     Batch
       Communication       1             3             1
       RAM Access          3             1             1
       ROM Access          3             1             1
       Packetization       1             2             2
       Interpretation      1             2             2
       Total cmd/sec      92            4.75          111
       Ratings: 1 = fast, 2 = tolerable, 3 = slow (bottleneck)

Rappit
▶ Subset of primitives

   Device         Primitive       Device    Primitive   Device     Primitive
   MCU            reset           GPIO      set pin     Timer      register fcn
   MCU            power save      GPIO      get pin     Timer      remove fcn
   MCU            initialize      GPIO      clear pin   RTC        set clock
   MCU            get sys clock   USART     TX          RTC        read clock
   MCU            set sys clock   USART     RX          LCD        clear
   RF             INIT            SD        read        LCD        write
   RF             set channel     SD        write       LCD        set contrast
   RF             set power       ADC       read        Joystick   get key
   RF             set frequency   Sensor1   read        Speaker    set volume
   RF             send            Sensor2   read        Speaker    play tone
   RF             receive         Sensor3   read        Speaker    play song




Rappit
▶ Language
 Key                  Usage                                     Example
 import               import methods of each device             from RF import *
 doc, dict            look up documentation and the             RF.__doc__
                      included methods                          RF.__dict__
 open, close          open/close a connection to a target       node1 = open(MCU1, uart1)
                      system                                    node1.close()
 ls                   list all connected instances              ls
 every, start, stop   schedule events with a certain period     s1 = (every 30ms: a += ADC1.read())
                                                                s1.start(); s1.stop()
 repeat               looping                                   repeat 3:
                                                                    SD.write(a)
 def                  define a function as a series of          def readTemperature():
                      methods                                       ...
 =, +                 assign/configure or add a value           a = SD.read(10); a += SD.read(20)


SDF
▶ Strengths and limitations
 Strengths
       Ability to express multi-rate systems, parallelism
       Deadlock detection and scheduling can be determined at
        compile-time
       Bounded memory requirements
       No runtime supervisory overhead
 Limitations
       Lack of conditional control flow
       Does not model asynchronous nodes
       Does not adequately address the real-time nature of
        connections to the outside world
       Does not address data-dependent run times
Superset of SDF
▶ Dynamic dataflow (DDF)
 Allows asynchronous actors whose rates
  are not fixed
 Captures dynamic constructs
          if/else
          for-loop
          do/while loop
          recursion




SDF
▶ Notations
 Firing & Tokens
          $f(n)$: $n$th firing vector
          $tk(n)$: number of live tokens after the $n$th firing
          $tk(n+1) = tk(n) + G \cdot f(n)$
          $f = \sum_{n=0}^{T} f(n)$: firing frequency
          $q = f_{\min}$: firing vector (minimum number of firings)
          $q(\mathrm{src}(e_i)) \times p(\mathrm{src}(e_i)) = q(\mathrm{snk}(e_i)) \times c(\mathrm{snk}(e_i))$: balance equation
 Consistent SDF
        $\mathrm{rank}(G) = |N| - 1$
        $G \cdot q = 0$
 Scheduling
        Given $G$, $tk(0)$, and $q$, find a firing order that satisfies $tk(n) \ge 0$
         and $q = \sum_{n=0}^{T} f(n)$
        Deadlocked if no node can be fired before reaching $q = \sum_{n=0}^{T} f(n)$
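
The balance equation and the scheduling condition can be sketched as below. This is a minimal illustration assuming a connected graph given as (source, produce, sink, consume) tuples; the function names and the greedy firing policy are assumptions, not the scheduler used in Rappit.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def repetition_vector(actors, edges):
    """Solve q(src) * p = q(snk) * c for the smallest integer q; None if inconsistent."""
    q = {actors[0]: Fraction(1)}
    progress = True
    while progress:                      # propagate rates along edges (connected graph assumed)
        progress = False
        for src, p, snk, c in edges:
            if src in q and snk not in q:
                q[snk] = q[src] * p / c
                progress = True
            elif snk in q and src not in q:
                q[src] = q[snk] * c / p
                progress = True
    if any(q[src] * p != q[snk] * c for src, p, snk, c in edges):
        return None                      # balance equations have no non-trivial solution
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (f.denominator for f in q.values()), 1)
    return {actor: int(f * scale) for actor, f in q.items()}

def find_schedule(actors, edges, q, initial_tokens=None):
    """Greedily fire any ready actor until q is reached; None signals deadlock."""
    tokens = list(initial_tokens or [0] * len(edges))
    remaining, order = dict(q), []
    while any(remaining.values()):
        for actor in actors:
            ready = remaining[actor] > 0 and all(
                tokens[i] >= c for i, (_, _, snk, c) in enumerate(edges) if snk == actor)
            if ready:
                for i, (src, p, snk, c) in enumerate(edges):
                    if snk == actor:
                        tokens[i] -= c   # consume inputs, so tk(n) stays >= 0
                    if src == actor:
                        tokens[i] += p   # produce outputs
                remaining[actor] -= 1
                order.append(actor)
                break
        else:
            return None                  # no actor can fire before reaching q
    return order

chain = [("A", 2, "B", 1), ("B", 1, "C", 1)]
q_chain = repetition_vector(["A", "B", "C"], chain)
schedule = find_schedule(["A", "B", "C"], chain, q_chain)
```

A two-actor cycle with no initial tokens returns `None` from `find_schedule`, which is the deadlock case described above; placing one initial token on the feedback edge makes it schedulable.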

SDF
▶ Our extensions
 SDF was previously used in multimedia-oriented
  applications targeting DSPs and FPGAs
 To target more general types of applications,
  non-buffered edges (dummy channels), which only
  denote precedence, should be added
 The produce/consume rate of each actor is
  not given as a fixed value, but as a range
 Add timing (future work)


SDF
▶ Another example
 Extended Surge Application
       Actors: A = ADC read, B = LCD show, C = SD store,
               D = SD read, E = Kernel pack, F = RF send
       Edges (produce, consume):
               a: A -> C (1, 10)     d: C -> D (1, 3)
               e: D -> E (10, 1)     f: E -> F (1, 1)
               b: A -> B and c: B -> C (rates not shown)

       Valid Schedules:
              30(A) 3(B) 3(C) D 10(E) 10(F)      (Flat SAS)
              3(10(A) BC) D 10(EF)               (SAS)
              30(A) 2(BC) BCD 10(EF)             (Non-SAS)
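
The three schedules can be checked by simulating token counts on each edge. The sketch below takes the rates for edges a, d, e, and f from the slide; the rates for b (1, 10) and c (1, 1) are assumptions chosen to be consistent with the firing counts, and `simulate` is a hypothetical helper.

```python
# edge name -> (source, produce, sink, consume)
EDGES = {
    "a": ("A", 1, "C", 10),
    "b": ("A", 1, "B", 10),   # assumed rates
    "c": ("B", 1, "C", 1),    # assumed rates
    "d": ("C", 1, "D", 3),
    "e": ("D", 10, "E", 1),
    "f": ("E", 1, "F", 1),
}

def simulate(schedule):
    """Return (valid, peak tokens per edge) for a flat list of actor firings."""
    tokens = dict.fromkeys(EDGES, 0)
    peak = dict.fromkeys(EDGES, 0)
    for actor in schedule:
        for name, (src, p, snk, c) in EDGES.items():
            if snk == actor:
                if tokens[name] < c:
                    return False, peak      # actor fired without enough input tokens
                tokens[name] -= c
        for name, (src, p, snk, c) in EDGES.items():
            if src == actor:
                tokens[name] += p
                peak[name] = max(peak[name], tokens[name])
    # A valid periodic schedule returns every buffer to its initial (empty) state.
    return all(count == 0 for count in tokens.values()), peak

flat_sas = ["A"] * 30 + ["B"] * 3 + ["C"] * 3 + ["D"] + ["E"] * 10 + ["F"] * 10
sas = (["A"] * 10 + ["B", "C"]) * 3 + ["D"] + ["E", "F"] * 10
non_sas = ["A"] * 30 + ["B", "C"] * 2 + ["B", "C", "D"] + ["E", "F"] * 10

results = {name: simulate(s) for name, s in
           [("flat_sas", flat_sas), ("sas", sas), ("non_sas", non_sas)]}
totals = {name: sum(peak.values()) for name, (valid, peak) in results.items()}
```

Under these assumed rates all three orderings are valid, and the summed peak buffer sizes differ considerably between them, which is the reason schedule choice matters for data memory.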

SDF
▶ Another example (cont’d)
       Script (SAS)
                        enable Timer1, RF, SD, LCD
                        every 2048:
                           repeat 10:
                                 repeat 10:
                                       a = ADC.read()
                                 LCD.show(a)
                                 SD.store(a)
                           repeat 10:
                                 b = SD.read()
                                 repeat 3:
                                       c = Kernel.pack(b)
                                       RF.send(c)

Script-to-SDF Transform

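        User script
       x = A()
       repeat 2:
           y = B(x)
           C(y)

        Resulting SDF
       V = { A, B, C }
       E = { x, y } = {eAB, eBC}
       πinit = A 2(BC)

       Rate ranges (min, max)
       eAB: p(A) = (2, 3), c(B) = (1, 1)
       eBC: p(B) = (1, 1), c(C) = (1, 2)

       Graph: A -x-> B -y-> C, with edge ends annotated min/max:
        A: 2/3, B: 1/1 (in) and 1/1 (out), C: 1/2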
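
The first step of the transform, recovering V and E from the script, can be sketched with a small parser. This is only an illustration assuming one call per line with at most one input variable; `script_to_sdf` and its regular expressions are hypothetical and do not derive the schedule or the rate ranges.

```python
import re

CALL = re.compile(r"(?:(\w+)\s*=\s*)?(\w+)\((\w*)\)$")   # [out =] Actor([in])
REPEAT = re.compile(r"repeat\s+\d+:$")

def script_to_sdf(script):
    """Return (actors in order of appearance, {variable: (producer, consumer)})."""
    actors, edges, producer = [], {}, {}
    for line in script.splitlines():
        text = line.strip()
        if not text or REPEAT.match(text):
            continue                      # loop headers only affect the schedule
        match = CALL.match(text)
        if match is None:
            raise ValueError(f"unsupported line: {text!r}")
        out_var, actor, in_var = match.groups()
        if actor not in actors:
            actors.append(actor)
        if in_var:
            edges[in_var] = (producer[in_var], actor)   # variable becomes an edge
        if out_var:
            producer[out_var] = actor
    return actors, edges

user_script = """
x = A()
repeat 2:
    y = B(x)
    C(y)
"""
V, E = script_to_sdf(user_script)
```

Each script variable maps to one edge, so x becomes eAB and y becomes eBC, matching the sets on the slide.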


Multimetric Optimization
▶ Cost function modeling
 Constraints
       Energy
              Battery lifetime or other source of power budget
       Time
              Deadline in given real-time application
       Memory
              Given memory size for a platform
 Each node is modeled with:
       Pv(c,p): power consumption w.r.t. consume/produce rate (i.e.,
        input/output data size)
       Tv(c,p): execution delay w.r.t. consume/produce rate

