Fast Paths in Concurrent Programs

               Wen Xu, Princeton University
               Sanjeev Kumar, Intel Labs
               Kai Li, Princeton University
         Concurrent Programs
               Message-Passing Style
                       Processes & Channels
                       E.g. Streaming Languages
               Uniprocessors
                       Programming Convenience
                        ─     Embedded devices
                        ─     Network Software Stack
                        ─     Media Processing
               Multiprocessors
                       Exploit parallelism
                       Partition Processes

               [Figure: processes P1-P4 connected by channels C1-C3, partitioned across processors]

               Problem: Compile a concurrent program to run efficiently on a uniprocessor


Intel Labs & Princeton University          Fast Paths in Concurrent Programs                            2
         Compiling Concurrent Programs
               Process-based Approach
                       Keep processes separate
                       Context switch between the processes
                       Small executable
                        ─     Sum of processes
                       Significant overhead
               Automata-based Approach
                       Treat each process as a state machine
                       Combine the state machines
                       Small overhead
                       Large executables
                        ─     Potentially exponential
               One study compared the two approaches and found:
                       Compared to the process-based approach, the automata-based approach generates code that is
                        ─     Twice as fast
                        ─     2-3 orders of magnitude larger in executable size
                       Neither approach is satisfactory
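
The "potentially exponential" size of the automata-based approach can be illustrated with a small sketch. It assumes hypothetical processes with three local states each and counts the worst case, where the combined machine needs one state per tuple of local states. The names `combine` and `process_states` are illustrative and do not come from the talk.

```python
from itertools import product

def combine(machines):
    """Return every combined state for a list of per-process state lists.

    Each combined state is a tuple with one local state per process.
    This is the worst case: a real compiler may prune unreachable tuples.
    """
    return list(product(*machines))

# Hypothetical process with three local states.
process_states = ["idle", "waiting", "running"]

for n in (1, 2, 4, 8):
    combined = combine([process_states] * n)
    separate = n * len(process_states)
    print(f"{n} processes: {separate} separate states, "
          f"{len(combined)} combined states")
```

The process-based count grows as a sum over the processes, while the combined count grows as a product, which matches the size trade-off listed on the slide.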
         Our Work
               Our Goal: Compile Concurrent Programs
                       Automated using a Compiler
                       Low Overhead
                       Small Executable Size



               Our Approach: Combine the two approaches
                       Use process-based approach to handle all cases
                       Use automata-based approach to speed up the
                        common cases



         Outline
               Motivation
               Fast Paths
               Fast Paths in Concurrent Programs
               Experimental Evaluation
               Conclusions




         Fast Paths
               Path: A dynamic execution path in the program
               Fast Path or Hot Path: Well-known technique
                       Commonly-executed Paths (Hot Path)
                       Specialize and Optimize (Fast Path)
               Two components
                       Predicate that specifies the fast path
                       Optimized code to execute the fast path
               Compilers can automate this technique
                       Mostly in sequential programs
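
The two components of a fast path, a predicate and optimized code, can be sketched in a sequential setting as follows. This is a minimal illustration under assumed names (`checksum_general`, `is_fast_case`, `checksum`); the condition `len(data) < 100` is hypothetical and only echoes the style of condition used later in the talk.

```python
def checksum_general(data):
    """General path: works for any iterable of ints."""
    total = 0
    for value in data:
        # Mask each value to a byte and keep the sum within 16 bits.
        total = (total + (value & 0xFF)) & 0xFFFF
    return total

def is_fast_case(data):
    """Predicate that specifies the fast path (hypothetical condition)."""
    return isinstance(data, bytes) and len(data) < 100

def checksum(data):
    if is_fast_case(data):
        # Optimized code: every element of a bytes object is already
        # 0..255, and fewer than 100 of them cannot exceed 16 bits,
        # so both masks can be dropped.
        return sum(data)
    return checksum_general(data)

print(checksum(b"abc"))      # takes the fast path
print(checksum([300, 5]))    # falls back to the general path
```

The specialized branch is only valid because the predicate guarantees the properties it relies on, which is why the two components are specified together.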

         Manually implementing Fast Paths
               To achieve good performance in concurrent programs
                       Start: Insert code that identifies the common case and transfers control to the fast path code
                       Extract and optimize the fast path code manually
                       Finish: Patch up state and return control at the end of the fast path


               Obvious drawbacks
                       Difficult to implement correctly
                       Difficult to maintain

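         Our Approach
               Baseline (Process-based)
               Fast Path (Automata-based)

               [Figure: baseline (process-based) code alongside fast-path (automata-based) code, with three numbered points: (1) Test, (2) Optimized Code, (3) Abort?. Sample code shown in the figure:]
                       a = b;
                       b = c * d;
                       d = 0;
                       if (c > 0)
                          c++;
                       a = c;
                       b = c * d;
                       d = 3;
                       if (c > 0)
                          c++;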




         Specifying Fast Paths
               Multiple processes
                       Concurrent Program
               Regular expressions
                       Statements
                       Conditions (Optional)
                       Synchronization (Optional)
               Support early abort
               Advantages
                       Powerful
                       Compact
                       Hint

               fastpath example {
                 process first {
                   statement A, B, C, D, #1;
                   start      A ? (size<100);
                   follows    B ( C D )*;
                   exit       #1;
                 }
                 process second {
                   ...
                 }
                 process third {
                   ...
                 }
               }
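
To make the regular-expression reading of such a specification concrete, the sketch below checks whether a trace of statement labels stays on the path described for process `first`. It is a simplification, not the ESP compiler's semantics: the trace is assumed to be a string of single-letter labels, the `start` condition is modeled as a plain `size < 100` test, and the function name `on_fast_path` is hypothetical.

```python
import re

# "follows B ( C D )*" written as a regular expression over labels.
FOLLOWS = re.compile(r"B(CD)*")

def on_fast_path(trace, size):
    """Return True if the trace matches start A ? (size<100); follows B (C D)*.

    trace: string of statement labels executed by the process, e.g. "ABCD".
    size:  value tested by the start condition.
    """
    # The path must begin at statement A with the condition satisfied.
    if not trace.startswith("A") or not size < 100:
        return False
    # Everything after A must match the "follows" expression exactly.
    return FOLLOWS.fullmatch(trace[1:]) is not None

print(on_fast_path("ABCDCD", 10))   # B followed by two (C D) repetitions
print(on_fast_path("ABC", 10))      # incomplete (C D) pair
print(on_fast_path("ABCD", 100))    # start condition fails
```

A trace that leaves the expression corresponds to the early-abort case, where control has to go back to the baseline code.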



         Extracting Fast Paths
               Automata-based approach to extract fast paths
                 A Fast Path involves a group of processes
                 Compiler keeps track of the execution point for
                  each of the involved processes
                 On exit, control is returned to the appropriate
                  location in each of the processes
                Baseline: Concurrent. Fast Path: Sequential Code
               Fairness on Fast Path
                       Embed scheduling decisions in the fast path
                        ─     Avoids scheduling/fairness overhead on the fast path
                       Rely on baseline code for fairness
                        ─     Baseline code is always taken a fraction of the time
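
The idea of tracking each process's execution point and returning control to the right location on exit can be sketched as follows. The example assumes two hypothetical processes, a producer and a consumer, whose fast path has been fused into one sequential function; the location names (`at_send`, `after_send`, `at_receive`, `after_receive`) and the abort condition are invented for illustration.

```python
def fast_path(message, resume):
    """Sequential code fused from two hypothetical processes.

    `resume` maps each process name to the baseline location where that
    process continues once the fast path exits. Returns (result, resume);
    result is None when the fast path aborts early.
    """
    resume = dict(resume)  # do not modify the caller's mapping

    # Producer's part: once past this point, its send has happened.
    resume["producer"] = "after_send"

    if len(message) >= 100:
        # Early abort: the consumer has not run yet, so the baseline
        # code must resume it at its receive.
        resume["consumer"] = "at_receive"
        return None, resume

    # Consumer's part, run inline with no context switch.
    result = message.upper()
    resume["consumer"] = "after_receive"
    return result, resume

start = {"producer": "at_send", "consumer": "at_receive"}
print(fast_path("ping", start))
print(fast_path("x" * 100, start))
```

In a compiler the resume locations would be known statically at each exit point; the dictionary here only makes that bookkeeping visible.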

         Optimization on Fast Path
               Enabling Traditional Optimizations
                       Generate and Optimize baseline code
                       Generate Fast path code
                        ─     Fast Paths have exit/entry points to baseline code
                       Use data-flow information from baseline code at the
                        exit/entry point to start analysis and optimize the
                        fast path code
               Speeding up fast path using lazy execution
                       Delay operations that are not needed when fast
                        paths are executed to the end
                       Such operations can be performed if the fast path is
                        aborted
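
Lazy execution on the fast path can be sketched as below: an operation that only the baseline code needs is queued instead of executed, and it runs only if the fast path aborts. The packet fields, the `scratch` entry, and the `limit` check are hypothetical and serve only to show the pattern.

```python
def fast_path(packet, state):
    """Process a packet on the fast path, delaying baseline-only work.

    Returns the payload on success, or None if the fast path aborts.
    """
    deferred = []  # operations delayed until we know they are needed

    # The baseline code would save the header for later stages. On the
    # fast path those stages are fused, so the save is only queued.
    deferred.append(lambda: state.update(scratch=packet["header"]))

    if packet["length"] > state["limit"]:
        # Abort: perform the delayed operations so the baseline code
        # sees the state it expects, then hand control back.
        for operation in deferred:
            operation()
        return None

    # Fast path ran to the end: the delayed operations are never run.
    state["delivered"] += 1
    return packet["payload"]

state = {"limit": 10, "delivered": 0}
print(fast_path({"header": "h1", "length": 4, "payload": "data"}, state))
print(fast_path({"header": "h2", "length": 64, "payload": "big"}, state))
print(state)
```

The saving comes from the common case, where the queued work is dropped; the abort case pays the original cost plus the small overhead of the queue.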

         Experimental Evaluation
               Implemented the techniques in the paper
                       In ESP Compiler
                        ─     Supports concurrent programs


               Two classes of programs
                       Filter Programs
                       VMMC Firmware


               Answer three questions
                       Programming effort (annotation complexity) needed
                       Size of the executable
                       Performance

         Filter Programs
               Well-defined structure
                       Streaming applications
               Use Filter Programs by Proebsting et al.
                       Good to evaluate our technique
                        ─     Concurrency overheads dominate
               Experimental Setup
                       2.66 GHz Pentium 4, 1 GB Memory, Linux 2.4
                       4 Versions of the code
               Annotation Complexity
                       Program sizes: 153, 125, 190, 196 lines
                       Annotation sizes: 7, 7, 10, 10 lines

               [Figure: pipeline of processes P1-P4 connected by channels C1-C3]


         Filter Programs Cont’d
               [Figure: two bar charts comparing four versions of Programs 1-4: Process-based, Automata-based, Process-based with Manual Fast Path, and Process-based with Automatic Fast Path]
                       Performance (axis 0 to 2.5; off-scale bars labeled 9.47, 28.33, 23.52, 4.17)
                        ─     Better Performance than Both
                       Executable Size (axis 0 to 2.5; off-scale bars labeled 5.53, 5.15)
                        ─     Relatively Small Executable
         VMMC Firmware
               Firmware for a gigabit network (Myrinet)
               Experimental Setup
                       Measure network performance between two
                        machines connected with Myrinet
                        ─     Latency & Bandwidth
                       3 Versions of the firmware
                        ─     Concurrent C version with Manual Fast Paths
                        ─     Process-based code without Fast Paths
                        ─     Process-based code with Compiler-extracted Fast Paths
               Annotation Complexity (3 fast paths)
                       Fast Path Specification: 20, 14, and 18 lines
                       Manual Fast Paths in C: 1100 lines total
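
The annotation-complexity figures above can be compared with a few lines of arithmetic. The sketch uses only the line counts stated on the slide; the variable names are illustrative, and raw line counts are a rough proxy for effort.

```python
# Line counts taken from the slide.
fast_path_spec_lines = [20, 14, 18]   # three compiler fast-path specifications
manual_c_lines = 1100                 # manual fast paths in C, total

spec_total = sum(fast_path_spec_lines)
ratio = manual_c_lines / spec_total

print(f"{spec_total} specification lines vs {manual_c_lines} manual lines "
      f"(about {ratio:.0f}x fewer)")
```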


         VMMC Firmware Cont’d
               [Figure: two charts comparing three versions of the firmware: Hand-Optimized C with Manual Fast Paths, Process-based Code, and Process-based with Automatic Fast Paths]
                       Generated Code Size: assembly instructions (axis 0 to 40000)
                       Performance: Latency (axis 0 to 70) against message size in bytes (4, 8, 16, 32, 64, 128, 256, 512)
         Conclusions
               Fast Paths in Concurrent Programs
                       Evaluated using Filter programs and VMMC firmware


               Process-based approach to handle all cases
                       Keeps executable size reasonable


               Automata-based approach to handle only the
                common cases (Fast Path)
                       Avoids high overhead of process-based approach
                       Often outperforms the automata-based code


Questions ?