					ECE 1747 Parallel Programming Course Project                   Dec. 2006




                     Parallel Computation of the 2D
                     Laminar Axisymmetric Coflow
                     Nonpremixed Flames

                     Qingan Andy Zhang

                     PhD Candidate
                     Department of Mechanical and Industrial Engineering
                     University of Toronto
Outline
• Introduction
• Motivation
• Objective
• Methodology
• Result
• Conclusion
• Future Improvement
• Work in Progress

Introduction
• Multi-dimensional flame
• Easy to model
• Computationally tractable with detailed sub-models such as chemistry, transport, etc.
• Lots of experimental data
• Resembles turbulent flames in some cases (e.g. flamelet regime)

[Figure: Flow configuration]

Motivation
The run time is expected to be long for:
• Complex chemical mechanism
    ◦ Appel (2000) mechanism (101 species, 543 reactions)
• Complex geometry
    ◦ Large 2D coflow laminar flame (1,000 × 500 = 500,000 grid points)
    ◦ 3D laminar flame (1,000 × 500 × 100 = 50,000,000 grid points)
• Complex physical problem
    ◦ Soot formation
    ◦ Multi-phase problem

Objective
To develop a parallel flame code based on the sequential flame code

• Speedup
• Feasibility
• Accuracy
• Flexibility

Methodology -- Options
• Shared Memory
    ◦ OpenMP
    ◦ Pthreads
• Distributed Memory
    ◦ MPI
• Distributed Shared Memory
    ◦ Munin
    ◦ TreadMarks

MPI is chosen because it is widely used for scientific
computation, it is easy to program, and the cluster is a
distributed-memory system.

Methodology -- Preparation
• Linux OS
• Programming tools (Fortran, Make, IDE)
• Parallel computation concepts
• MPI commands
• Network (SSH, queuing system)

Methodology -- Sequential code

• Sequential Code Analysis
    ◦ Algorithm
    ◦ Dependency
        ◦ Data
        ◦ I/O
    ◦ CPU time breakdown

Sequential code is the backbone for parallelization!

Methodology
Governing equations:
• Continuity equation
• Momentum equation
• Gas species equation
• Energy equation
+ Constitutive relation, initial condition, boundary condition

→ CFD with parallel computation

[Figure: Flow configuration and computational domain]

Methodology
CFD:
• Finite Volume Method
• Iterative process on a staggered grid

Quantities solved (primitive variables):
U, V, P', Yi (i = 1, ..., KK), T
Yi --- mass fraction of the ith gas species
KK --- total number of gas species

If KK = 100, we have to solve 3 + 100 + 1 = 104 equations at each point.
If the mesh is 1000 × 500, we have to solve
104 × 1000 × 500 = 52,000,000 equations in each iteration.
If 3000 iterations are required to reach a converged solution, we have to
solve a total of 52,000,000 × 3000 = 156,000,000,000 equations.

[Figure: Flow configuration and computational domain]

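The equation count on this slide is simple arithmetic, and a short script makes it easy to repeat for other mechanisms or meshes. This is only an illustrative sketch: the function name `total_equations` and its parameter names are hypothetical and do not come from the flame code.

```python
def total_equations(n_species, nz, nr, n_iterations):
    """Count the equations solved, following the slide's reasoning.

    Per grid point: U, V and P' (3 equations), one equation per
    gas species (n_species), and the energy equation for T (1).
    """
    per_point = 3 + n_species + 1
    per_iteration = per_point * nz * nr
    total = per_iteration * n_iterations
    return per_point, per_iteration, total


if __name__ == "__main__":
    per_point, per_iteration, total = total_equations(100, 1000, 500, 3000)
    print(f"equations per point:     {per_point:,}")
    print(f"equations per iteration: {per_iteration:,}")
    print(f"equations in total:      {total:,}")
```

With KK = 100, a 1000 × 500 mesh and 3000 iterations this reproduces the 104, 52,000,000 and 156,000,000,000 figures quoted above.
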
General Transport Equation
Unsteady Term + Convection Term = Diffusion Term + Source Term
• Unsteady: time-variant term
• Convection: caused by flow motion
• Diffusion: for species, molecular diffusion and thermal diffusion
• Source term: for species, chemical reaction

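In symbols, the word equation above is commonly written in the following textbook form. The notation is an assumption, since the slide gives no symbols: \phi is the transported quantity, \rho the density, \mathbf{u} the velocity vector, \Gamma_\phi the diffusion coefficient and S_\phi the source term.

```latex
\underbrace{\frac{\partial (\rho \phi)}{\partial t}}_{\text{unsteady}}
+ \underbrace{\nabla \cdot (\rho \mathbf{u} \phi)}_{\text{convection}}
= \underbrace{\nabla \cdot (\Gamma_\phi \nabla \phi)}_{\text{diffusion}}
+ \underbrace{S_\phi}_{\text{source}}
```

For a species equation, \phi would be the mass fraction Yi and S_\phi the chemical production rate.
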
Mass and Momentum equation
Mass: [equation]
Axial momentum: [equation]
Radial momentum: [equation]

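For orientation, the mass (continuity) equation in axisymmetric cylindrical coordinates has the standard form below. This is the textbook expression, not necessarily the exact form shown on the slide. It assumes U is the axial velocity along z and V the radial velocity along r, matching the axis labels "Z, U" and "R, V" used for the computational domain.

```latex
\frac{\partial \rho}{\partial t}
+ \frac{\partial (\rho U)}{\partial z}
+ \frac{1}{r} \frac{\partial (r \rho V)}{\partial r} = 0
```

For a steady flame the unsteady term vanishes.
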
Species and Energy equation
Species: [equation]
Energy: [equation]
Annotated terms: diffusion of species, chemical reaction, radiation heat transfer

Methodology -- Sequential code

• Start iteration from scratch or from a continued job
• Within one iteration
    ◦ Iteration starts
        • Discretization → get AP(I,J) and CON(I,J)
        • Solve → TDMA or PbyP Gauss Elimination
        • Get new value → update F(I,J,NF) array
        • Do other equations
    ◦ Iteration ends
• End iteration if convergence is reached

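The "Solve" step names TDMA, the tridiagonal matrix (Thomas) algorithm. Below is a minimal, generic sketch of that algorithm in Python for a single line of unknowns. It is not the flame code's solver: the function name `tdma` and the coefficient arrays `a`, `b`, `c`, `d` are hypothetical and are not the AP/CON arrays of the original code.

```python
def tdma(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm.

    Row i reads: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    a[0] and c[-1] are ignored. No pivoting is done, so the matrix
    should be diagonally dominant (usual for finite-volume systems).
    """
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    # Forward elimination
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Back substitution
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # 3x3 example whose exact solution is x = [1, 1, 1]
    print(tdma([0.0, -1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0, 0.0], [1.0, 0.0, 1.0]))
```

In a 2D code a solver of this kind is typically applied line by line, sweeping across the grid within each iteration.
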
Methodology -- Sequential code

[Fig. 1 CPU time for each sub-code summarized after one iteration with radiation included]

Most time-consuming part:
Species Jacobian matrix DSDY(K1,K2,I,J) evaluation

Dependency??

Methodology -- Parallelization
Domain Decomposition Method (DDM) with Message Passing
Interface (MPI) programming

[Figure: Six processes used to decompose the computational domain of 206*102 staggered grid points; axes R, V (radial) and Z, U (axial)]

Ghost points are placed at the subdomain boundaries to reduce
communication among processes!
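
A minimal sketch of how a domain decomposition with ghost points can be indexed is given below. It assumes a one-dimensional split of the 206 axial grid points among six processes and a ghost layer one point wide; the slide states neither the actual layout nor the ghost width, and the helper names `block_range` and `with_ghosts` are hypothetical. No MPI calls are made: the code only computes which indices each rank would own and which extra points it would mirror from its neighbours.

```python
def block_range(n_points, n_procs, rank):
    """Half-open index range [start, stop) owned by `rank`.

    Points are divided as evenly as possible; the first
    (n_points % n_procs) ranks receive one extra point.
    """
    base, extra = divmod(n_points, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def with_ghosts(start, stop, n_points, width=1):
    """Extend an owned range by `width` ghost points on each interior side."""
    return max(0, start - width), min(n_points, stop + width)


if __name__ == "__main__":
    n_points, n_procs = 206, 6  # axial points and process count from the slide
    for rank in range(n_procs):
        own = block_range(n_points, n_procs, rank)
        halo = with_ghosts(*own, n_points)
        print(f"rank {rank}: owns {own}, stores {halo}")
```

Each rank then needs to exchange only its ghost layers with its neighbours once per sweep, rather than requesting individual neighbour values on demand.
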
Cluster Information
• Cluster location: icpet.nrc.ca in Ottawa
• 40 nodes connected by Ethernet
        n1-5 n2-5 n3-5 n4-5  |  n5-5 n6-5 n7-5 n8-5
        n1-4 n2-4 n3-4 n4-4  |  n5-4 n6-4 n7-4 n8-4
        n1-3 n2-3 n3-3 n4-3  |  n5-3 n6-3 n7-3 n8-3
        n1-2 n2-2 n3-2 n4-2  |  n5-2 n6-2 n7-2 n8-2
        n1-1 n2-1 n3-1 n4-1  |  n5-1 n6-1 n7-1 n8-1
• AMD Opteron 250 (2.4 GHz) with 5 GB memory
• Red Hat Enterprise Linux 4.0
• Batch-queuing system: Sun Grid Engine (SGE)
• Portland Group compilers (V 6.2) + MPICH2

Results -- Speedup
Table 1 CPU time and speedup for 50 iterations with the Appel et al. 2000 mechanism

  Processes       Sequential   4        6        12
  CPU time (s)    51313        15254    10596    5253
  Speedup         1            3.36     4.84     9.77

(1) Speedup is good: 9.77 on 12 processes, i.e. about 81%
    parallel efficiency.
(2) CPU time spent on 50 iterations for the original
    sequential code is 51313 seconds, i.e. 14.25
    hours. Too long!

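The speedup row of Table 1 follows directly from the CPU times. The sketch below recomputes it and adds the parallel efficiency (speedup divided by process count). The timing values are taken from Table 1; the function name `speedup_table` and the convention of keying the sequential run as 1 process are illustrative assumptions.

```python
# CPU time in seconds for 50 iterations, keyed by process count (Table 1).
cpu_time_s = {1: 51313, 4: 15254, 6: 10596, 12: 5253}


def speedup_table(times):
    """Return {processes: (speedup, efficiency)} relative to the 1-process run."""
    t_seq = times[1]
    return {p: (t_seq / t, t_seq / t / p) for p, t in times.items()}


if __name__ == "__main__":
    for procs, (speedup, efficiency) in speedup_table(cpu_time_s).items():
        print(f"{procs:2d} processes: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

The efficiency stays above 0.8 for 4, 6 and 12 processes.
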
Results -- Speedup

[Fig. 3 Speedup obtained with different numbers of processes]

Results -- Real application
Flame field calculation using the parallel code (Appel 2000 mechanism)

[Figure panels: Temperature field (in K); OH field (in mole fraction); Benzene field (in mole fraction); Pyrene field (in mole fraction)]

The trend is well predicted!

Conclusion
• The sequential flame code is parallelized with DDM
• Speedup is good
• The parallel code is applied to model a flame using a detailed mechanism
• Flexibility is good, i.e. the geometry and/or the number of processes can be easily changed

Future Improvement
• Optimized DDM
• Species line solver

Work in Progress
• Fixed sectional soot model
    ◦ Adds 70 equations to the original system of equations

Experience
• Keep communication down
• Choose the parallelization method wisely
• Debugging is hard
• I/O

Thanks

Questions?