Architecture Aware Design for a Parallel Object Recognition System
Bor-Yiing Su, Bryan Catanzaro, Tasneem Brutch, Kurt Keutzer
Parallel Computing Lab

This research is supported in part by Microsoft (Award #024263) and Intel (Award #024894) funding and by matching funding by U.C. Discovery (Award #DIG07-10227)




Object Recognition

[Figure: Trained Categories (Apple Logos, Bottles, Giraffes, Mugs, Swans), Image Queries, Object Recognition System, Outputs]

Training

C. Gu, J. Lim, P. Arbelaez, and J. Malik, "Recognition Using Regions," Conference on Computer Vision and Pattern Recognition (CVPR'09), Miami, FL, 2009.

•    Contour feature extraction
     •    Use a 128-dimension histogram to represent each contour feature
     •    Collect contour strength of 8 orientations on a 4 x 4 grid
•    Distance computation
     •    Chi-squared distance between each pair of feature vectors
•    Weight learning
     •    Find discriminative features
     •    Quadratic optimization problem
     •    Solve by simplex algorithm

\[
\begin{aligned}
\min_{w,\,\xi}\quad & \frac{1}{2}\sum_{i=1}^{N_I} \left(w_i^I\right)^2 + C \sum_{t=1}^{T} \xi_t \\
\text{s.t.}\quad & \sum_{i=1}^{N_I} w_i^I \left(d_i^{IK_t} - d_i^{IJ_t}\right) \ge 1 - \xi_t, \quad t = 1, 2, \ldots, T \\
& w_i^I \ge 0, \quad i = 1, 2, \ldots, N_I \\
& \xi_t \ge 0, \quad t = 1, 2, \ldots, T
\end{aligned}
\]

Classification

C. Gu, J. Lim, P. Arbelaez, and J. Malik, "Recognition Using Regions," Conference on Computer Vision and Pattern Recognition (CVPR'09), Miami, FL, 2009.

[Figure: classification pipeline. Left column: Input Image, Contour Detection, Image Segmentation, Feature Extraction. Right column: Trained Data, Classification, Object Bounding Box.]
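
The chi-squared distance named in the Training panel can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `chi_squared_distance`, the factor of 0.5, and the rule of skipping empty bins are assumptions, since the poster does not state which variant of the distance it uses.

```python
def chi_squared_distance(x, y):
    """Chi-squared distance between two non-negative histograms.

    Assumed convention: 0.5 * sum((x_i - y_i)^2 / (x_i + y_i)).
    The poster does not say which variant it uses.
    """
    if len(x) != len(y):
        raise ValueError("histograms must have the same length")
    total = 0.0
    for a, b in zip(x, y):
        denom = a + b
        if denom > 0:  # bins that are empty in both histograms add nothing
            total += (a - b) ** 2 / denom
    return 0.5 * total


if __name__ == "__main__":
    # Two toy 4-bin histograms; the poster's contour features have 128 bins.
    h1 = [1.0, 0.0, 3.0, 0.0]
    h2 = [1.0, 2.0, 1.0, 0.0]
    print(chi_squared_distance(h1, h2))
```

Each bin contributes independently to the sum, which is what makes the computation easy to spread across many threads.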




System Performance

[Figure: Detection Quality, Original Serial Algorithm and Parallel Algorithm. Benchmark: ETHZ shape benchmark]

Speedup by Parallel Implementation on Nvidia Tesla C1060

Classification

Computation       Core i7 (s)    Tesla (s)    Speedup
Contour           236.7          2.243        106x
Segmentation      2.27           0.357        6.36x
Feature           7.97           0.279        28.6x
Hough Voting      84.13          1.688        49.8x
Classification    331            4.567        72.5x

Training on 127 images

Computation       Core i7 (s)    Tesla (s)    Speedup
Feature           543            24.9         21.8x
Distance          1732           2.9          597x
Weight            57             2.16         26.4x
Training          2332           29.96        77.8x

Conclusion

•    The performance of a parallelized computation is influenced by
     •    Parallelization strategy
     •    Underlying hardware architecture
     •    Input data properties
•    We need to understand the trade-offs between different parallelization implementations to optimize the computation
•    Ideally, we should dynamically adjust the parallelization strategy according to the input data properties at runtime

Parallel Pair-wise Distance

•    Widely used to measure the difference between features
•    Parallelization strategies
     •    Inner product based algorithm
               for i = 1 to m
                    for j = 1 to n
                         for s = 1 to k
                              Update(Result(i, j), Vset(i, s), Vset(j, s));
     •    Outer product based algorithm
               for s = 1 to k
                    for i = 1 to m
                         for j = 1 to n
                              Update(Result(i, j), Vset(i, s), Vset(j, s));
•    Cache choices
     •    No cache at all
     •    Use texture memory to cache both vector sets
     •    Use shared memory to cache vector elements
•    Experimental Results
     •    If the number of vector pairs is small, apply the inner product algorithm
     •    If the number of vector pairs is large, apply the outer product algorithm
     •    Always use shared memory to cache vector elements

[Figure: Chi-squared distance computation on GTX 480; Chi-squared distance computation on Tesla C1060]

Parallel Graph Traversal on Images

Bor-Yiing Su, Tasneem Brutch, and Kurt Keutzer, "Parallel BFS Graph Traversal on Images Using Structured Grid", in International Conference on Image Processing (ICIP-2010), Hong Kong, September 2010.

•    Graph representation of an image
     •    Each pixel is represented by a node
     •    Neighborhood relationships between pixels are represented by edges
•    BFS graph traversal algorithm is widely used in region and boundary analysis
•    Parallelization strategies
     •    Transform the BFS traversal problem into a structured grid computation
     •    Parallelize the task queue in the BFS traversal algorithm
     •    Apply the BFS algorithm on partitioned sub-graphs
•    Experimental Results
     •    If the image is complicated, apply the structured grid method on GPU
     •    If the image is simple, apply the graph partition method on CPU

Benchmark: Berkeley segmentation dataset
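
To make the graph view of an image concrete, the following is a minimal serial sketch of BFS over a pixel grid. It is not the parallel method from the cited paper. The function name `bfs_region`, the 4-connected neighborhood, and the "same pixel value" edge rule are assumptions chosen for illustration; the poster only says that neighboring pixels are joined by edges.

```python
from collections import deque


def bfs_region(image, start):
    """Return the set of pixels reachable from `start` by BFS.

    Assumptions for this sketch: pixels are nodes, edges join 4-connected
    neighbors, and an edge is followed only when both pixels hold the
    same value as the start pixel.
    """
    rows, cols = len(image), len(image[0])
    r0, c0 = start
    target = image[r0][c0]
    visited = {start}
    queue = deque([start])  # the BFS task queue
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in visited
                    and image[nr][nc] == target):
                visited.add((nr, nc))
                queue.append((nr, nc))
    return visited


if __name__ == "__main__":
    image = [[1, 1, 0],
             [0, 1, 0],
             [1, 0, 0]]
    print(sorted(bfs_region(image, (0, 0))))
```

The single `queue` is the serial bottleneck here; the strategies listed on the poster either parallelize that queue, split the graph into sub-graphs, or recast the traversal as a structured grid computation.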

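
The two loop orderings in the Parallel Pair-wise Distance panel can be compared with a small serial sketch. The names `pairwise_inner`, `pairwise_outer`, and `chi2_term` are hypothetical, the use of two separate vector sets is an assumption based on the poster's mention of "both vector sets", and the chi-squared update (with a factor of 0.5) is one possible choice of `Update`. The sketch only shows that both orderings yield the same result; it does not model GPU memory behavior.

```python
def pairwise_inner(vset_a, vset_b, update):
    """Inner product ordering: finish each (i, j) pair before the next."""
    m, n, k = len(vset_a), len(vset_b), len(vset_a[0])
    result = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for s in range(k):
                result[i][j] = update(result[i][j], vset_a[i][s], vset_b[j][s])
    return result


def pairwise_outer(vset_a, vset_b, update):
    """Outer product ordering: sweep all (i, j) pairs once per element s."""
    m, n, k = len(vset_a), len(vset_b), len(vset_a[0])
    result = [[0.0] * n for _ in range(m)]
    for s in range(k):
        for i in range(m):
            for j in range(n):
                result[i][j] = update(result[i][j], vset_a[i][s], vset_b[j][s])
    return result


def chi2_term(acc, a, b):
    """Add one chi-squared term, assumed to be 0.5 * (a - b)^2 / (a + b)."""
    if a + b > 0:
        return acc + 0.5 * (a - b) ** 2 / (a + b)
    return acc  # a bin empty in both vectors adds nothing


if __name__ == "__main__":
    set_a = [[1.0, 0.0], [2.0, 2.0]]
    set_b = [[1.0, 2.0], [0.0, 2.0], [3.0, 1.0]]
    inner = pairwise_inner(set_a, set_b, chi2_term)
    outer = pairwise_outer(set_a, set_b, chi2_term)
    print(inner == outer)
```

The orderings differ in data reuse: the inner product form streams through one pair of vectors at a time, while the outer product form touches element `s` of every vector before moving on. That difference is what the poster's cache choices interact with.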
				