
CUDA Lecture 6
Embarrassingly Parallel Computations

Prepared 7/28/2011 by T. O'Neil for 3460:677, Fall 2011,
The University of Akron.
Embarrassingly Parallel Computations
 A computation that can obviously be divided into a
  number of completely independent parts, each of
  which can be executed by a separate process(or).




   No communication or very little communication
    between processes.
   Each process can do its tasks without any interaction
    with other processes.
                                   Embarrassingly Parallel Computations – Slide 2
Embarrassingly Parallel Computation
Examples
 Low level image processing
 Mandelbrot set
 Monte Carlo calculations




                               Embarrassingly Parallel Computations – Slide 3
Topic 1: Low level image processing
 Many low-level image processing operations involve
  only local data, with very limited (if any)
  communication between areas of interest.




                               Embarrassingly Parallel Computations – Slide 4
Partitioning into regions for individual
processes

[Figure: a square region assigned to each process; strips can also be used]




                                  Embarrassingly Parallel Computations – Slide 5
Some geometrical operations
 Shifting
   Object shifted by Δx in the x-dimension and Δy in the
    y-dimension:
                    x′ = x + Δx      y′ = y + Δy
    where x and y are the original and x′ and y′ are the new
    coordinates.
 Scaling
   Object scaled by a factor of Sx in the x-direction and Sy
    in the y-direction:
                     x′ = x Sx      y′ = y Sy


                                    Embarrassingly Parallel Computations – Slide 6
Some geometrical operations (cont.)
 Rotation
   Object rotated through an angle θ about the origin of
    the coordinate system:
         x′ = x cos θ + y sin θ      y′ = −x sin θ + y cos θ
 Clipping
   Applies defined rectangular boundaries to a figure and
    deletes points outside the defined area.
   Display the point (x,y) only if
             xlow ≤ x ≤ xhigh   and   ylow ≤ y ≤ yhigh
    otherwise discard it.

                                  Embarrassingly Parallel Computations – Slide 7
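The formulas above map directly onto code. Below is a minimal plain-C
sketch (an illustration only; the lecture gives just the formulas, and
all names here are hypothetical):

     #include <math.h>

     typedef struct { float x, y; } Point;

     /* shift by (dx, dy) */
     Point shift(Point p, float dx, float dy) {
        return (Point){ p.x + dx, p.y + dy };
     }
     /* scale by (sx, sy) */
     Point scale(Point p, float sx, float sy) {
        return (Point){ p.x * sx, p.y * sy };
     }
     /* rotate through angle theta about the origin */
     Point rotate(Point p, float theta) {
        return (Point){  p.x * cosf(theta) + p.y * sinf(theta),
                        -p.x * sinf(theta) + p.y * cosf(theta) };
     }
     /* clipping test: keep only points inside the rectangle */
     int visible(Point p, float xlo, float xhi, float ylo, float yhi) {
        return xlo <= p.x && p.x <= xhi && ylo <= p.y && p.y <= yhi;
     }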
Notes
 GPU threads are sent starting row numbers
   Since these are related to block and thread IDs, each
    thread should be able to figure its rows out for itself
 Return results are single values
   Not good programming practice, but done to simplify first
    exposure to CUDA programs
 No terminating code is shown




                                     Embarrassingly Parallel Computations – Slide 8
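To make these notes concrete, here is a hedged sketch of how such a
kernel might derive its own coordinates from block and thread IDs
(this is not the lecture's program, which is not reproduced here; the
image dimensions and names are assumptions):

     #define WIDTH  640               /* assumed image size */
     #define HEIGHT 480

     /* Shift an image by (dx, dy); each thread handles one pixel and
        computes its own row and column from block/thread IDs, so no
        starting row numbers need to be sent in. */
     __global__ void shift_image(const unsigned char *in,
                                 unsigned char *out, int dx, int dy) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;   /* column */
        int y = blockIdx.y * blockDim.y + threadIdx.y;   /* row    */
        if (x >= WIDTH || y >= HEIGHT) return;
        int xs = x - dx, ys = y - dy;                    /* source pixel */
        out[y * WIDTH + x] =
           (xs >= 0 && xs < WIDTH && ys >= 0 && ys < HEIGHT)
              ? in[ys * WIDTH + xs] : 0;                 /* black outside */
     }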
Topic 2: Mandelbrot Set
 Set of points in a complex plane that are quasi-stable
  (will increase and decrease, but not exceed some
  limit) when computed by iterating the function
                     zₖ₊₁ = zₖ² + c
  where zₖ₊₁ is the (k+1)th iterate of the complex
  number z = a + bi and c is a complex number giving the
  position of the point in the complex plane.

   The initial value for z is zero.



                                       Embarrassingly Parallel Computations – Slide 9
Mandelbrot Set (cont.)
 Iterations continue until the magnitude of z is greater
  than 2 or the number of iterations reaches an arbitrary limit.
 The magnitude of z = a + bi is the length of the vector
  given by
                     |z| = √(a² + b²)
                                 Embarrassingly Parallel Computations – Slide 10
Sequential routine computing the value of
one point, returning the number of
iterations
     typedef struct
     {
        float real; float imag;
     } complex;

     int cal_pixel(complex c)
     {
        int count = 0, max = 256;        /* iteration limit */
        complex z;
        float temp, lengthsq;

        z.real = 0; z.imag = 0;          /* z starts at zero */
        do {
           /* z = z*z + c on real and imaginary parts */
           temp = z.real * z.real - z.imag * z.imag + c.real;
           z.imag = 2 * z.real * z.imag + c.imag;
           z.real = temp;
           lengthsq = z.real * z.real + z.imag * z.imag;
           count++;
        } while ((lengthsq < 4.0) && (count < max));
        return count;
     }



                                        Embarrassingly Parallel Computations – Slide 11
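The slide shows only cal_pixel; a driver loop like the following would
call it once per pixel (a sketch, with the display dimensions, region
bounds, and color array assumed to be declared elsewhere):

     /* Assumed driver: map each pixel onto the region
        [real_min, real_max] x [imag_min, imag_max]. */
     for (int y = 0; y < disp_height; y++)
        for (int x = 0; x < disp_width; x++) {
           complex c;
           c.real = real_min + x * (real_max - real_min) / disp_width;
           c.imag = imag_min + y * (imag_max - imag_min) / disp_height;
           color[y][x] = cal_pixel(c);   /* iteration count -> color */
        }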
Notes
 Square of length compared to 4 (rather than
  comparing length to 2) to avoid a square root
  computation
 Given the terminating conditions, all of the
  Mandelbrot points must be within a circle centered at
  the origin having radius 2
 Resolution is expanded at will to obtain interesting
  images




                                Embarrassingly Parallel Computations – Slide 12
Mandelbrot set

[Figure: rendering of the Mandelbrot set]
                 Embarrassingly Parallel Computations – Slide 13
Parallelizing Mandelbrot Set
Computation
 Dynamic Task Assignment
   Have processors request new regions after computing
    their previous regions.
   Doesn't lend itself to a CUDA solution.
 Static Task Assignment
   Simply divide the region into a fixed number of parts,
    each computed by a separate processor.
   Not very successful because different regions require
    different numbers of iterations, and hence different times.
   …but it's what we'll do in CUDA.


                                  Embarrassingly Parallel Computations – Slide 14
Dynamic Task Assignment: Work
Pool/Processor Farms

[Figure: work pool of tasks; processors return results and request further tasks]
                    Embarrassingly Parallel Computations – Slide 15
Simplified CUDA Solution
 #include <stdio.h>
 #include <conio.h>
 #include <math.h>
 #include "../common/cpu_bitmap.h"
 #define DIM 1000

 struct cuComplex {
    float r, i;
    __device__ cuComplex(float a, float b) : r(a), i(b) {}
    __device__ float magnitude2 (void) {
       return r*r + i*i;       // squared magnitude; avoids a sqrt
    }
    __device__ cuComplex operator*(const cuComplex& a) {
       return cuComplex(r*a.r - i*a.i, i*a.r + r*a.i);
    }
    __device__ cuComplex operator+(const cuComplex& a) {
       return cuComplex(r + a.r, i + a.i);
    }
 };




                                        Embarrassingly Parallel Computations – Slide 16
Simplified CUDA Solution (cont.)
 __device__ int mandelbrot(int x, int y) {
    // map pixel (x,y) onto [-2,2] x [-2,2] in the complex plane
    float jx = 2.0 * (x - DIM/2) / (DIM/2);
    float jy = 2.0 * (y - DIM/2) / (DIM/2);
    cuComplex c(jx,jy);
    cuComplex z(0.0,0.0);
    int i = 0;
    do {
       z = z * z + c; i++;
    } while ((z.magnitude2() < 4.0) && (i < 256));
    return i % 8;              // fold the count into 8 color classes
 }




                                        Embarrassingly Parallel Computations – Slide 17
Simplified CUDA Solution (cont.)
    __global__ void kernel (unsigned char *ptr) {
       // map from blockIdx to pixel position
       int x = blockIdx.x; int y = blockIdx.y;
       int offset = x + y * gridDim.x;
       // now calculate the value at that position
       int mValue = mandelbrot(x,y);
       // default to opaque black, then set channels by color class
       ptr[offset*4 + 0] = 0; ptr[offset*4 + 1] = 0;
       ptr[offset*4 + 2] = 0; ptr[offset*4 + 3] = 255;
       switch (mValue) {
          case 0: break;                          // black
          case 1: ptr[offset*4 + 0] = 255; break; // red
          case 2: ptr[offset*4 + 1] = 255; break; // green
          case 3: ptr[offset*4 + 2] = 255; break; // blue
          case 4: ptr[offset*4 + 0] = 255;        // yellow
                  ptr[offset*4 + 1] = 255; break;
          case 5: ptr[offset*4 + 1] = 255;        // cyan
                  ptr[offset*4 + 2] = 255; break;
          case 6: ptr[offset*4 + 0] = 255;        // magenta
                  ptr[offset*4 + 2] = 255; break;
          default: ptr[offset*4 + 0] = 255;       // white
                  ptr[offset*4 + 1] = 255;
                  ptr[offset*4 + 2] = 255; break;
       }
    }
                                     Embarrassingly Parallel Computations – Slide 18
Simplified CUDA Solution (cont.)
    int main(void) {
       CPUBitmap bitmap(DIM, DIM);
       unsigned char *dev_bitmap;
       cudaMalloc((void**) &dev_bitmap, bitmap.image_size());
       dim3 grid(DIM, DIM);           // one block per pixel
       kernel<<<grid,1>>>(dev_bitmap);
       cudaMemcpy(bitmap.get_ptr(), dev_bitmap,
                  bitmap.image_size(),
                  cudaMemcpyDeviceToHost);
       cudaFree(dev_bitmap);          // free before display_and_exit,
                                      // which does not return
       bitmap.display_and_exit();
       return 0;
    }




                                     Embarrassingly Parallel Computations – Slide 19
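A usage note: launching a DIM×DIM grid of single-thread blocks keeps
the example simple but leaves each block with only one thread. A
common variant (a sketch of my own, not the lecture's code) gives each
block, say, 16×16 threads and recomputes the pixel index accordingly:

     // Hypothetical launch:
     //    dim3 threads(16, 16);
     //    dim3 blocks((DIM+15)/16, (DIM+15)/16);
     //    kernel_mt<<<blocks,threads>>>(dev_bitmap);
     __global__ void kernel_mt(unsigned char *ptr) {
        int x = threadIdx.x + blockIdx.x * blockDim.x;  // pixel column
        int y = threadIdx.y + blockIdx.y * blockDim.y;  // pixel row
        if (x >= DIM || y >= DIM) return;               // guard partial blocks
        int offset = x + y * DIM;
        int mValue = mandelbrot(x, y);                  // same device routine
        unsigned char v = (unsigned char)(mValue * 32); // grayscale for brevity
        ptr[offset*4 + 0] = v; ptr[offset*4 + 1] = v;
        ptr[offset*4 + 2] = v; ptr[offset*4 + 3] = 255;
     }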
Topic 3: Monte Carlo methods
 An embarrassingly parallel computation named for
  Monaco's gambling resort city; the method's first
  important use was in the development of the atomic
  bomb during World War II.
 Monte Carlo methods use random selections.
   Given a very large set and a probability distribution
    over it,
   draw a set of independent, identically distributed
    samples;
   we can then approximate the distribution using these
    samples.

                                  Embarrassingly Parallel Computations – Slide 20
Applications of Monte Carlo
Method
 Evaluating integrals of arbitrary functions of 6+
  dimensions
 Predicting future values of stocks
 Solving partial differential equations
 Sharpening satellite images
 Modeling cell populations
 Finding approximate solutions to NP-hard problems




                              Embarrassingly Parallel Computations – Slide 21
Monte Carlo Example: Calculating π
 Form a circle within a square, with unit radius so that
  the square has sides 2×2. The ratio of the area of the
  circle to that of the square is given by

       area of circle / area of square = π(1)² / (2×2) = π/4

   Randomly choose points within the square.
   Keep score of how many points happen to lie within
    the circle.
   The fraction of points within the circle will be π/4,
    given a sufficient number of randomly selected samples.
                                   Embarrassingly Parallel Computations – Slide 22
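A minimal sequential sketch of this recipe (my illustration; the slide
gives only the prose description) might look like:

     #include <stdlib.h>

     /* Estimate pi by throwing n random darts at [-1,1] x [-1,1]
        and counting those landing inside the unit circle. */
     double estimate_pi(long n) {
        long in_circle = 0;
        for (long i = 0; i < n; i++) {
           double x = 2.0 * rand() / RAND_MAX - 1.0;
           double y = 2.0 * rand() / RAND_MAX - 1.0;
           if (x * x + y * y <= 1.0) in_circle++;
        }
        return 4.0 * in_circle / n;    /* fraction inside ~ pi/4 */
     }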
Calculating π (cont.)

[Figure: circle of unit radius inscribed in a 2×2 square]
                        Embarrassingly Parallel Computations – Slide 23
Calculating π (cont.)

[Figure: randomly scattered points; those falling inside the circle are counted]
                                                      Embarrassingly Parallel Computations – Slide 24
Relative Error
 Relative error is a way to measure the quality of an
  estimate
 The smaller the error, the better the estimate
 a: actual value
 e: estimated value
 Relative error = |e-a|/a




                                 Embarrassingly Parallel Computations – Slide 25
Increasing Sample Size Reduces Error

              n       Estimate   Rel. Error    1/(2√n)
             10        2.40000      0.23606    0.15811
            100        3.36000      0.06952    0.05000
          1,000        3.14400      0.00077    0.01581
         10,000        3.13920      0.00076    0.00500
        100,000        3.14132      0.00009    0.00158
      1,000,000        3.14006      0.00049    0.00050
     10,000,000        3.14136      0.00007    0.00016
    100,000,000        3.14154      0.00002    0.00005
  1,000,000,000        3.14155      0.00001    0.00002


                                    Embarrassingly Parallel Computations – Slide 26
Next Example: Computing an Integral
 One quadrant of the construction can be described by
  the integral

          ∫₀¹ √(1 − x²) dx = π/4

   Random numbers (x_r, y_r) are
    generated, each between 0
    and 1.
   A point is counted as inside the circle if
    x_r² + y_r² ≤ 1.

                                Embarrassingly Parallel Computations – Slide 27
Example
 Computing the integral

          ∫ (x² − 3x) dx   over [x1, x2]

 Sequential code
    sum = 0;
    for (i = 0; i < N; i++) {
       xr = rand_v(x1, x2);         /* pseudorandom x in [x1, x2] */
       sum += xr * xr - 3 * xr;     /* accumulate f(xr) = xr*xr - 3*xr */
    }
    area = (sum / N) * (x2 - x1);   /* mean of f times interval width */

  The routine rand_v(x1,x2) returns a pseudorandom
  number between x1 and x2.

 The Monte Carlo method is very useful if the function
  cannot be integrated numerically (perhaps because it has
  a large number of variables).
                                    Embarrassingly Parallel Computations – Slide 28
Why Monte Carlo Methods Are Interesting
 Error in a Monte Carlo estimate decreases by the factor
  1/√n
 Rate of convergence is independent of the integrand's
  dimension
   Deterministic numerical integration methods do not
    share this property
   Hence Monte Carlo is superior when the integrand has 6 or
    more dimensions
 Furthermore, Monte Carlo methods are often amenable to
  parallelism
   With p processors, can find an estimate about p times
    faster or reduce the error of the estimate by a factor of √p
                                  Embarrassingly Parallel Computations – Slide 29
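For CUDA specifically, one common approach (a sketch assuming the
cuRAND device API is available; the lecture does not prescribe a
parallel random number generator) gives each thread its own
independent random stream:

     #include <curand_kernel.h>

     /* Each thread draws `trials` points in the unit square and counts
        hits in the quarter circle; the host sums the counts and
        multiplies by 4/N to estimate pi. */
     __global__ void pi_hits(unsigned int *hits,
                             unsigned long long seed, int trials) {
        int id = blockIdx.x * blockDim.x + threadIdx.x;
        curandState state;
        curand_init(seed, id, 0, &state);     /* one subsequence per thread */
        unsigned int count = 0;
        for (int i = 0; i < trials; i++) {
           float x = curand_uniform(&state);  /* uniform in (0,1] */
           float y = curand_uniform(&state);
           if (x * x + y * y <= 1.0f) count++;
        }
        hits[id] = count;
     }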
Summary
 Embarrassingly Parallel Computation Examples
   Low level image processing
   Mandelbrot set
   Monte Carlo calculations
      Applications: numerical integration
      Related topics: Metropolis algorithm, simulated annealing
      To parallelize such applications, we need a good way
       to generate random numbers in parallel.




                                      Embarrassingly Parallel Computations – Slide 30
End Credits
 Based on original material from
   The University of Akron
      Tim O’Neil, Saranya Vinjarapu
   The University of North Carolina at Charlotte
      Barry Wilkinson, Michael Allen
   Oregon State University: Michael Quinn
 Revision history: last updated 7/28/2011.




                                        Embarrassingly Parallel Computations – Slide 31

								