Laboratoire d’Informatique Fondamentale de Lille
Parallel Cooperative Optimization Research Group


     Evolutionary Algorithms for the
   Protein Structure Prediction Problem
          and Molecular Docking


A.-A. Tantar, N. Melab and E-G. Talbi
           {tantar, melab, talbi}@lifl.fr


                                                INRIA DOLPHIN Project


      Supported by the French Research Agency
       Outline

•   Protein Structure Prediction, Molecular Docking: modeling and complexity analysis
•   Parallel Hybrid Metaheuristics for the PSP Problem
•   Grid experimentation on GRID5000
•   Conclusion and Future Work
          Protein Structure Prediction Problem
[Figure: a protein's amino-acid sequence is mapped, through conformational sampling and energy minimization, to its native conformation]
Protein Structure Prediction (PSP): finding the ground-state conformation (tertiary structure) of a protein, given its amino-acid sequence (the primary structure).
           Molecular Docking
[Figure: RECEPTOR (HIV-1 protease) + LIGAND (XK263 inhibitor) → the HIV-1 protease + XK263 complex]
Molecular Docking: predicting the optimal bound conformation of two molecules that exhibit geometric and chemical complementarity.
          Scoring Energy Function (A)

         Empirical Force Fields – Classical Mechanics
               •   AMBER – Assisted Model Building with Energy Refinement
               •   CHARMM – Chemistry at HARvard Molecular Mechanics
               •   OPLS – Optimized Potentials for Liquid Simulations
               •   GROMOS – GROningen MOlecular Simulation
               •   SYBYL
                       •   D-Score - DOCK
                       •   F-Score - FlexX
                       •   G-Score - Gold
               •   AutoDock/AutoGrid – suite of tools for automated docking
               •   ...

         Interactions, from WEAK to STRONG: London forces (van der Waals), dipole-dipole, stacking, hydrogen bond, ionic interactions

         From recent results, empirical methods OUTPERFORM ab initio methods (!!)

§ Neumaier, A. (1997). Molecular modeling of proteins and mathematical prediction of protein structure. 
SIAM Review, 39(3):407 – 460.
  Scoring Energy Function (B)
§ AutoDock




§ DOCK - SYBYL/D-Score
            Scoring Energy Function (C)
        § GOLD - SYBYL/G-Score




        § FlexX - SYBYL/F-Score




§ Renxiao Wang, Yipin Lu, and Shaomeng Wang, Comparative Evaluation of 11 Scoring Functions for Molecular Modeling, J. Med. Chem., 2003, 46, 2287-2303, DOI 10.1021/jm0203783
             Scoring Energy Function (D)




The force-field terms model atoms as oscillating entities, with forces simulated by springs interconnecting the atoms. This gives a simple concept and fast energy evaluations.
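The spring analogy can be made concrete with a minimal sketch of a harmonic bond-stretching term, E = Σ k_b (r − r_0)², written in the AMBER-style form without a 1/2 factor. The function name, the (i, j, k_b, r_0) bond tuples and the numeric values are hypothetical illustrations. They are not parameters of any particular force field.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

def bond_stretch_energy(coords: Sequence[Vec3],
                        bonds: Sequence[Tuple[int, int, float, float]]) -> float:
    """Sum of harmonic spring terms k_b * (r - r0)**2 over bonded atom pairs.

    Each bond is a hypothetical tuple (i, j, k_b, r0): atom indices, force
    constant and equilibrium length (AMBER-style form, no 1/2 factor).
    """
    energy = 0.0
    for i, j, k_b, r0 in bonds:
        r = math.dist(coords[i], coords[j])  # current bond length
        energy += k_b * (r - r0) ** 2        # penalty grows quadratically with the stretch
    return energy

# Two atoms 0.1 length units beyond an illustrative equilibrium length of 1.5
stretched = bond_stretch_energy([(0.0, 0.0, 0.0), (1.6, 0.0, 0.0)], [(0, 1, 300.0, 1.5)])
```

Real force fields add angle-bending, torsional and non-bonded (van der Waals, electrostatic) terms on top of this bond term.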
   The force-field constants are derived from higher-level calculations (e.g. ab initio) and are difficult to obtain and to fit.
Empirical force fields have the DRAWBACK that their results are not directly comparable with results obtained using a differently parameterized force field.
   RMSD (Root Mean Squared Deviation) is the typical distance measure between conformations.
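Below is a minimal RMSD sketch. It assumes that both conformations list the same atoms in the same order and are already superimposed. In practice an optimal rigid-body superposition (for example the Kabsch algorithm) is usually applied first; that step is omitted here to keep the example short.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

def rmsd(a: Sequence[Vec3], b: Sequence[Vec3]) -> float:
    """Root mean squared deviation between two matched, pre-aligned atom lists."""
    if len(a) != len(b) or not a:
        raise ValueError("conformations must be non-empty and list the same atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))  # sum of squared atom shifts
    return math.sqrt(squared / len(a))
```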
      PSP modeling:
      Encoding of the conformations
            •   Cartesian atomic coordinates representation
            •   amino-acid-based encoding – hydrophobic/hydrophilic models
            •   all heavy atoms coordinates
            •   backbone Cα coordinates
            •   backbone coordinates and side-chain centroid coordinates
            •   torsion-angle based representation

      [Figure: an amino acid and a torsion-angle based chromosome, e.g. 286 | 165 | 316 | ... | 7 | 69]
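The amino-acid-based hydrophobic/hydrophilic encoding is often placed on a lattice. The sketch below uses the classic 2D square-lattice HP energy, which is an assumption for illustration since the slides do not specify a lattice. Each residue is H (hydrophobic) or P (polar). Every pair of H residues that are lattice neighbours but not chain neighbours contributes −1.

```python
from typing import Dict, Sequence, Tuple

Site = Tuple[int, int]

def hp_energy(sequence: str, coords: Sequence[Site]) -> int:
    """Energy of a 2D square-lattice HP conformation (lower is better)."""
    if len(sequence) != len(coords) or len(set(coords)) != len(coords):
        raise ValueError("need one distinct lattice site per residue (self-avoiding walk)")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must occupy adjacent sites")
    index: Dict[Site, int] = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # look only at the +x and +y neighbours so each contact is counted once
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index.get(neighbour)
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy
```

For example, folding "HPPH" into a unit square brings the two H residues into contact, which gives an energy of −1.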
           Molecular Docking:
           Encoding of the conformations
           Model complexity, from high to low:
                •   flexible docking - both the ligand and the receptor are modeled as flexible molecules; limitations may be imposed
                •   partially flexible docking – flexibility is modeled to some extent, by focusing on the smaller molecule or by defining comprehensive regions of significance
                •   rigid docking – extreme simplification; both the ligand and the receptor are rigid entities, and no flexibility is allowed at any point
                +   multiple potential binding sites

AutoDock Representation - string of real-valued genes
     •   three Cartesian coordinates for the ligand translation
     •   four variables defining a quaternion that specifies the ligand orientation
     •   one real value for each ligand torsion

§ Garrett M. Morris et al., Automated Docking Using a Lamarckian Genetic Algorithm and an Empirical Binding Free Energy Function, Journal of Computational Chemistry, Vol. 19, No. 14, 1639-1662, 1998.
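Here is a sketch of how such a real-valued individual could be laid out and randomly initialized: 3 translation genes, 4 quaternion genes and one gene per torsion. Several details are assumptions made for illustration and are not taken from the AutoDock code: the search box, the use of Shoemake's method to draw a uniformly random unit quaternion, the quaternion component order, and the (−180, 180) torsion range.

```python
import math
import random
from typing import List, Sequence

def random_docking_individual(n_torsions: int, box_min: Sequence[float],
                              box_max: Sequence[float], rng: random.Random) -> List[float]:
    """Genes: [tx, ty, tz, q1, q2, q3, q4, torsion_1, ..., torsion_n] (assumed order)."""
    # three Cartesian coordinates for the ligand translation, inside a search box
    translation = [rng.uniform(lo, hi) for lo, hi in zip(box_min, box_max)]
    # four values forming a uniformly random unit quaternion (Shoemake's method)
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    quaternion = [a * math.sin(2 * math.pi * u2), a * math.cos(2 * math.pi * u2),
                  b * math.sin(2 * math.pi * u3), b * math.cos(2 * math.pi * u3)]
    # one real value (degrees) per ligand torsion
    torsions = [rng.uniform(-180.0, 180.0) for _ in range(n_torsions)]
    return translation + quaternion + torsions

ind = random_docking_individual(5, (0.0, 0.0, 0.0), (10.0, 10.0, 10.0), random.Random(42))
```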
 Complexity Analysis
   Levinthal's Paradox
      •   a molecule of 40 residues, with 10 conformations per residue: 10^40 conformations
      •   exhaustive sampling at 10^14 conformations per second: 10^26 seconds, i.e. about 3 × 10^18 years
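The arithmetic behind the estimate, assuming one year ≈ 3.15 × 10^7 s:

$$
\frac{10^{40}\ \text{conformations}}{10^{14}\ \text{conformations/s}} = 10^{26}\ \text{s}
\approx \frac{10^{26}\ \text{s}}{3.15 \times 10^{7}\ \text{s/year}} \approx 3.2 \times 10^{18}\ \text{years}
$$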



10^11 local optima for the [met]-enkephalin pentapeptide

 •   75 atoms
 •   five amino acids (Tyr-Gly-Gly-Phe-Met)
 •   22 variable backbone dihedral angles
         Motivation and objectives

•   PSP and Molecular Docking can be modeled as optimization problems …
•   … which are multi-modal
•   … and require a huge amount of resources for large proteins
    •   Need for hybrid metaheuristics
    •   Need for large-scale parallelism (Grid computing)
•   $$$: $150,000-$250,000 per X-ray structure
•   Study of two hybrid metaheuristics for PSP:
    •   Genetic Algorithms and Simulated Annealing
         Classification of the Existing Methods

•   Energy minimization vs. geometric complementarity approaches

•   Ab initio (de novo) electronic structure calculations
     •   rely on quantum mechanics to determine different molecular characteristics
     •   involve no empirical approximations and require no a priori experimental data
     •   high computational complexity => restricted to small systems
•   Semi-empirical methods
     •   use approximations as a substitute for ab initio techniques
     •   employ simplified models of electron-electron interactions
•   Empirical methods
     •   rely on classical mechanics (molecular mechanics and molecular dynamics)
     •   often the only applicable methods for large molecular systems, e.g. proteins and polymers
     •   treat atoms as indivisible entities instead of dissociating them into electrons and nuclei

•   Continuous vs. discretized (combinatorial) optimization
         Classification – Algorithmic Overview (A)

•   Global Optimization – Energy Landscape Projections, Convex Global Underestimation, etc.
      •   construct a metamodel of the energy surface, avoiding most of the drawbacks caused by the bumpy-funnel structure of the landscape
      •   tend to require less computational time than classical techniques, by a few orders of magnitude
      •   apparently have a lower probability of remaining blocked in kinetic traps
      •   rely on assumptions about the landscape that may not extend to other classes of problems
      •   require the construction of a metamodel, an underestimator, etc., which offers no optimality/correctness guarantee

•   Molecular Dynamics
      •   offers accurate results insofar as the force-field model fits the real energy landscape
      •   can be applied directly: apart from the formalism, there are no parameters to tune
      •   requires extensive computational resources; may only be applied to reduced models
      •   the design and development of the formalism may be far from trivial
      •   given the high computational complexity, stand-alone approaches are impractical

Timescales:
      10^-15 s (femto): bond vibration; one MD step
      10^-12 s (pico): isomerization, water dynamics
      10^-9 s (nano): a long MD run
      10^-6 s (micro): fastest folders
      10^-3 s (milli): typical folders
      10^0 s (seconds): slow folders
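Each femtosecond-scale MD step integrates Newton's equations of motion. The sketch below is a minimal one-dimensional velocity Verlet integrator applied to a single harmonic spring in arbitrary units. Velocity Verlet is a common MD integrator, chosen here as an assumption; the slides do not name one.

```python
from typing import Callable, Tuple

def velocity_verlet(x: float, v: float, force: Callable[[float], float],
                    mass: float, dt: float, n_steps: int) -> Tuple[float, float]:
    """Integrate one particle for n_steps; returns the final position and velocity."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
    return x, v

# Harmonic spring with k = 1 and m = 1: the period is 2*pi time units
x, v = velocity_verlet(1.0, 0.0, lambda pos: -pos, mass=1.0, dt=0.01, n_steps=628)
```

For atomic systems the time step is on the order of a femtosecond. Reaching microsecond folding times therefore takes about 10^9 steps, and seconds take about 10^15.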
         Classification – Algorithmic Overview (B)

•   Monte Carlo / Simulated Annealing / Metropolis-based algorithms
     •   do not require many parameters; adaptive versions partly overcome the drawbacks caused by the initially specified values
     •   offer a performance guarantee for attaining the optimal value (asymptotically)
     •   compared to gradient-based methods, are less prone to getting trapped in local minima
     •   require an infeasible number of sampling points to achieve the performance guarantee
     •   appropriate for local search optimization or for the refinement of specific areas
     •   may employ specific distributions for performing the conformational sampling
     •   depending on the nature of the method, may pose important parallelization problems

•   Evolutionary algorithms – Genetic Algorithms
     •   exhibit strong exploration characteristics but lack intensification capabilities; hybridization approaches tend to be more appropriate
     •   offer a basis for complex parallelization techniques
     •   require implementing several components, each with its own parameters
     •   the design of the operators may need to account for structural aspects (e.g. the crossover operator should not mix angles from different amino acids when performing the recombination)
     •   do not offer any performance guarantee; convergence is only statistical
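The acceptance rule at the heart of Metropolis-based Monte Carlo can be sketched as follows, using the standard Metropolis criterion for minimization. The proposal function and the one-dimensional test energy are hypothetical placeholders for a conformational move and a force-field evaluation.

```python
import math
import random
from typing import Callable, Tuple

def metropolis_accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Always accept downhill moves; accept uphill ones with probability exp(-dE/T)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

def metropolis_walk(x0: float, energy: Callable[[float], float],
                    propose: Callable[[float, random.Random], float],
                    temperature: float, n_steps: int,
                    rng: random.Random) -> Tuple[float, float]:
    """Fixed-temperature Monte Carlo walk; returns the best state seen and its energy."""
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        candidate = propose(x, rng)
        e_candidate = energy(candidate)
        if metropolis_accept(e_candidate - e, temperature, rng):
            x, e = candidate, e_candidate
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

best_x, best_e = metropolis_walk(3.0, lambda s: s * s,
                                 lambda s, r: s + r.uniform(-0.3, 0.3),
                                 temperature=0.01, n_steps=2000, rng=random.Random(1))
```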
      The sketch of an evolutionary algorithm

[Figure: the evolutionary engine. Initialization → Evaluation → Parents → Stop? (if yes: best individual) → Selection → Genitors → Recombination, mutation → Offspring → Evaluation → Replacement → Parents]
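The loop in the sketch can be written generically, with selection, variation and replacement passed in as functions. This is a minimal illustration under several assumptions: minimization, a fixed generation budget as the stop test, and hypothetical toy operators in the usage example. It is not the ParadisEO engine.

```python
import random
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

def evolutionary_engine(init: Callable[[], T],
                        evaluate: Callable[[T], float],
                        select: Callable[[List[T], List[float]], List[T]],
                        vary: Callable[[List[T]], List[T]],
                        replace: Callable[[List[T], List[float], List[T], List[float]],
                                          Tuple[List[T], List[float]]],
                        pop_size: int, max_generations: int) -> Tuple[T, float]:
    """Generic EA loop (minimization)."""
    parents = [init() for _ in range(pop_size)]                # initialization
    scores = [evaluate(p) for p in parents]                    # evaluation
    for _ in range(max_generations):                           # stop? (fixed budget)
        genitors = select(parents, scores)                     # selection
        offspring = vary(genitors)                             # recombination, mutation
        off_scores = [evaluate(o) for o in offspring]          # evaluation
        parents, scores = replace(parents, scores, offspring, off_scores)  # replacement
    best = min(range(len(parents)), key=scores.__getitem__)
    return parents[best], scores[best]                         # best individual

# Usage with toy operators (all hypothetical): minimize x^2 + y^2
rng = random.Random(7)

def sphere(p: List[float]) -> float:
    return p[0] ** 2 + p[1] ** 2

def plus_replacement(parents, scores, offspring, off_scores):
    # keep the pop_size best individuals among parents and offspring
    pool = sorted(zip(scores + off_scores, parents + offspring), key=lambda pair: pair[0])
    kept = pool[:len(parents)]
    return [ind for _, ind in kept], [s for s, _ in kept]

best, best_score = evolutionary_engine(
    init=lambda: [rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)],
    evaluate=sphere,
    select=lambda pop, sc: [pop[min(rng.sample(range(len(pop)), 2), key=sc.__getitem__)]
                            for _ in pop],
    vary=lambda genitors: [[g + rng.gauss(0.0, 0.3) for g in ind] for ind in genitors],
    replace=plus_replacement,
    pop_size=20, max_generations=100)
```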
            Generation of the Initial population
        •    Seeding - information inserted from structural databases
              •   ensures that the population is seeded with the substructures that lead to the optimal conformation
              •   not the desired approach, as structural information may not be available in all cases

        •    Ab initio approach – no initial structural information is given
              •   uniform random initialization of the dihedral angles – the most common option
              •   depending on the structural data, different distributions may be employed to introduce a bias

        •    The size of the population
              •   may have an impact on the convergence rate:
                    •  reduced number of chromosomes -> premature convergence
                    •  larger number of chromosomes -> extra computational time
              •   should be related to the cardinality of the encoding alphabet, the chromosome length, the algorithm parameters, etc.; existing guidelines are empirical and restricted to special cases, with no rigorous specifications


§ Dusan P. Djurdjevic, Mark J. Biggs, Ab Initio Protein Fold Prediction Using Evolutionary Algorithms: Influence of Design and Control Parameters on Performance, DOI 10.1002/jcc.20440, Wiley InterScience, 2005.
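Below is a sketch of the two ab initio initialization options for torsion-angle chromosomes. The uniform variant draws every dihedral angle from [0, 360). The biased variant is a hypothetical example that draws each angle from a von Mises distribution centred on a preferred value. The function names, the centres and the concentration `kappa` are assumptions for illustration.

```python
import math
import random
from typing import List, Sequence

def uniform_population(pop_size: int, n_angles: int, rng: random.Random) -> List[List[float]]:
    """Ab initio start: every dihedral angle drawn uniformly from [0, 360) degrees."""
    return [[360.0 * rng.random() for _ in range(n_angles)] for _ in range(pop_size)]

def biased_population(pop_size: int, preferred_deg: Sequence[float], kappa: float,
                      rng: random.Random) -> List[List[float]]:
    """Hypothetical biased start: angle i follows a von Mises distribution centred
    on preferred_deg[i]; a larger kappa means a stronger bias."""
    return [[math.degrees(rng.vonmisesvariate(math.radians(mu), kappa)) for mu in preferred_deg]
            for _ in range(pop_size)]

rng = random.Random(3)
pop = uniform_population(pop_size=300, n_angles=22, rng=rng)
biased = biased_population(300, [300.0, 140.0], kappa=4.0, rng=rng)
```

The centres 300° and 140° are arbitrary placeholders. In practice such preferences would come from structural statistics.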
Selection and Replacement (A)
Selection pressure
the probability of selecting the best chromosome, relative to the average selection probability of all chromosomes

Selection intensity
the expected average fitness of the population after a selection step, assuming a normalized Gaussian fitness distribution

Selection variance
the expected variance of the population's fitness distribution after a selection step, assuming a normalized Gaussian fitness distribution

Loss of diversity
the percentage of individuals that are not selected during the selection phase

Bias
the absolute difference between a chromosome's expected reproduction probability and its normalized fitness

Spread
the range of possible values for the number of offspring of a given chromosome
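Two of these measures, selection pressure and loss of diversity, are easy to compute directly. The sketch below assumes fitness-proportionate selection on positive fitness values that are to be maximized; energies would first need a transformation such as ranking. The function names are illustrative.

```python
from typing import List, Sequence

def proportional_probabilities(fitness: Sequence[float]) -> List[float]:
    """Fitness-proportionate selection probabilities (maximization, positive fitness)."""
    total = sum(fitness)
    return [f / total for f in fitness]

def selection_pressure(probabilities: Sequence[float]) -> float:
    """Best chromosome's selection probability divided by the average one (1/N)."""
    return max(probabilities) * len(probabilities)

def loss_of_diversity(selected: Sequence[int], pop_size: int) -> float:
    """Fraction of the population (by index) that was never selected."""
    return 1.0 - len(set(selected)) / pop_size

probs = proportional_probabilities([1.0, 2.0, 3.0, 4.0])
```

Here the best chromosome has probability 0.4 against an average of 0.25, so the selection pressure is 1.6.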
Selection and Replacement (B)
Selection strategy

•   fitness-proportionate selection
     •   no bias, but does not guarantee a minimum spread

•   rank-based selection
     •   selection depends on the chromosome's rank, not on its actual fitness value

•   stochastic tournament selection
     •   tournament among randomly chosen individuals
     •   selection pressure can be adjusted by changing the tournament size
     •   offers a constant selection pressure

•   uniform selection
     •   biases selection towards sparsely populated fitness levels rather than towards the fittest chromosomes
     •   may not behave well under high genetic diversity (e.g. in the initial phases)
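A minimal tournament selection sketch, minimizing energy, is shown below. Increasing `k` raises the selection pressure. The optional `p_best` parameter is an assumed way of making the tournament stochastic: the fittest contestant wins only with probability `p_best`, and otherwise another contestant is picked at random.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def tournament_select(population: Sequence[T], energies: Sequence[float], n_select: int,
                      rng: random.Random, k: int = 2, p_best: float = 1.0) -> List[T]:
    """Select n_select individuals by k-way tournaments (lower energy wins)."""
    chosen = []
    for _ in range(n_select):
        # k distinct contestants, sorted from lowest to highest energy
        contestants = sorted(rng.sample(range(len(population)), k), key=lambda i: energies[i])
        if k > 1 and rng.random() >= p_best:
            winner = rng.choice(contestants[1:])   # stochastic tournament: an upset
        else:
            winner = contestants[0]                # the fittest contestant wins
        chosen.append(population[winner])
    return chosen
```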
Selection and Replacement (C)
Replacement strategy

•   Global replacement vs. Local replacement
      •   generational replacement – all parents are replaced by the offspring
      •   uniform replacement – fewer offspring than parents; parents are replaced at random
      •   elitist replacement – fewer offspring than parents; the worst parents are replaced
      •   fitness-based replacement – more offspring than parents; only the best offspring are inserted

•   random, first-in-first-out – the simplest strategies, without strong performance characteristics

•   worst-fit replacement – the least fit chromosome(s) are replaced

•   exponential replacement – chromosomes are ranked from the least fit to the most fit, and replacement follows an associated probability distribution

•   ...
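Worst-fit (elitist) replacement can be sketched as below. The sketch assumes minimization of energy and at most as many offspring as parents. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def replace_worst(parents: Sequence[T], p_energy: Sequence[float],
                  offspring: Sequence[T], o_energy: Sequence[float]) -> Tuple[List[T], List[float]]:
    """Each offspring takes the slot of one of the worst parents; size is unchanged."""
    # parent indices ordered from the worst (highest energy) to the best
    order = sorted(range(len(parents)), key=lambda i: p_energy[i], reverse=True)
    new_pop, new_e = list(parents), list(p_energy)
    for slot, child, e in zip(order, offspring, o_energy):
        new_pop[slot], new_e[slot] = child, e
    return new_pop, new_e
```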
Recombination Operators

•   discrete recombination – exchange of chromosome values; for each locus of the offspring, both parent chromosomes have equal probability of contributing

•   intermediate recombination – offspring obtained by interpolating the parents A and B, with u_i drawn uniformly and independently for each locus
           $\mathrm{Offspring}_i = u_i A_i + (1 - u_i) B_i, \quad u_i \in [-a, 1+a], \quad i \in \{1, \dots, n\}$

•   line recombination – particular case of intermediate recombination, in which a single uniformly generated random variable u is used for all loci
           $\mathrm{Offspring}_i = u A_i + (1 - u) B_i, \quad u \in [-a, 1+a], \quad i \in \{1, \dots, n\}$

•   one/multi-point crossover – one or more cutting points are generated, marking the segments to be combined into offspring

•   uniform crossover – each locus is considered a potential cutting point

•   ...
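The operators above can be sketched on plain real-valued genes, with `alpha` playing the role of a. For torsion angles, a real implementation would also have to handle angle periodicity, which this sketch ignores. The function names are illustrative.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def intermediate_recombination(a: Sequence[float], b: Sequence[float], alpha: float,
                               rng: random.Random) -> List[float]:
    child = []
    for x, y in zip(a, b):
        u = rng.uniform(-alpha, 1.0 + alpha)    # a fresh u_i for every locus
        child.append(u * x + (1.0 - u) * y)
    return child

def line_recombination(a: Sequence[float], b: Sequence[float], alpha: float,
                       rng: random.Random) -> List[float]:
    u = rng.uniform(-alpha, 1.0 + alpha)        # one u shared by all loci
    return [u * x + (1.0 - u) * y for x, y in zip(a, b)]

def uniform_crossover(a: Sequence[T], b: Sequence[T], rng: random.Random) -> List[T]:
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def k_point_crossover(a: Sequence[T], b: Sequence[T], k: int, rng: random.Random) -> List[T]:
    cuts = sorted(rng.sample(range(1, len(a)), k))  # k distinct interior cut points
    child: List[T] = []
    take_a, start = True, 0
    for cut in cuts + [len(a)]:
        child.extend((a if take_a else b)[start:cut])   # copy one segment
        take_a, start = not take_a, cut                 # switch parent at each cut
    return child
```

Line recombination keeps the child on the line through both parents. Intermediate recombination can leave that line because each locus gets its own weight.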
Mutation Operators

•   real-valued mutation – the main parameters to adjust are the mutation step size and the mutation probability, usually chosen as 1/n, with n the number of variables

•   variable step mutation – small-step mutations are produced with high probability, large-step mutations with low probability
           $M_i \leftarrow M_i + s_i \, r_i \, p_i, \quad i \in \{1, \dots, n\}$
           •  $s_i \in \{-1, +1\}$
           •  $r_i = r \, |B_i - A_i|, \quad r \approx 0.1, \quad M_i \in [A_i, B_i]$
           •  $p_i = 2^{-u k}, \quad u \in [0, 1]$, with k a precision constant

•   step-size adaptation mutation – n step sizes / one direction / n directions; does not offer consistent improvements for extremely multi-modal or highly noisy functions

•   ...
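A sketch of the variable step mutation follows. The defaults (r = 0.1, k = 16, mutation probability 1/n) and the clamping of mutated values into [A_i, B_i] are assumptions of this sketch.

```python
import random
from typing import List, Optional, Sequence

def variable_step_mutation(x: Sequence[float], lower: Sequence[float], upper: Sequence[float],
                           rng: random.Random, r: float = 0.1, k: int = 16,
                           p_mut: Optional[float] = None) -> List[float]:
    """M_i <- M_i + s_i * r_i * p_i for each mutated locus (default p_mut = 1/n)."""
    n = len(x)
    p_mut = 1.0 / n if p_mut is None else p_mut
    y = list(x)
    for i in range(n):
        if rng.random() >= p_mut:
            continue                                   # locus left unchanged
        s_i = rng.choice((-1.0, 1.0))                  # random direction
        r_i = r * abs(upper[i] - lower[i])             # maximal step for this variable
        p_i = 2.0 ** (-rng.random() * k)               # small steps are far more likely
        # clamping into [A_i, B_i] is an assumption of this sketch
        y[i] = min(max(y[i] + s_i * r_i * p_i, lower[i]), upper[i])
    return y
```

Because p_i lies between 2^-k and 1, a single step never exceeds r·|B_i − A_i|. Most steps are much smaller than that bound.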
       Guided Mutation - Mutation + LS

[Figure: an individual (∂1 ∂2 ... ∂i ... ∂n) from the GA population undergoes angle mutation (∂i → ∂i'), then local search, yielding the optimized individual (∂'1 ∂'2 ... ∂'n)]
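The mutate-then-refine step can be sketched as follows. In this sketch, a simple stochastic hill climber stands in for the local search; the slides do not specify that method. The optimized angles are returned so that they can be written back into the population, which is the Lamarckian option. All names and step sizes are hypothetical, including the toy energy function.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

def wrap_angle(angle: float) -> float:
    """Map an angle in degrees into [0, 360)."""
    return angle % 360.0

def guided_mutation(angles: Sequence[float], energy: Callable[[Sequence[float]], float],
                    rng: random.Random, step: float = 30.0, ls_iters: int = 200,
                    ls_step: float = 5.0) -> Tuple[List[float], float]:
    """Mutate one dihedral angle, then refine all angles with a stochastic hill climber."""
    x = list(angles)
    i = rng.randrange(len(x))
    x[i] = wrap_angle(x[i] + rng.uniform(-step, step))   # angle mutation
    e = energy(x)
    for _ in range(ls_iters):                            # local search
        j = rng.randrange(len(x))
        candidate = list(x)
        candidate[j] = wrap_angle(candidate[j] + rng.uniform(-ls_step, ls_step))
        e_candidate = energy(candidate)
        if e_candidate < e:                              # keep improvements only
            x, e = candidate, e_candidate
    return x, e

def toy_energy(angles: Sequence[float]) -> float:
    """Hypothetical energy with its minimum when every angle is 0 degrees."""
    return sum(1.0 - math.cos(math.radians(a)) for a in angles)

optimized, optimized_e = guided_mutation([90.0, 200.0, 300.0], toy_energy, random.Random(2))
```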
Termination Criteria

•   preset number of fitness function evaluations / number of generations

•   termination once the evolution stagnates

•   the optimal value has been found (!!)

•   ...
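The termination criteria can be combined in a single stopping test. A sketch for minimization is shown below. It treats "the evolution stagnates" as no improvement over a `patience` window of generations. The parameter names are assumptions.

```python
from typing import Optional, Sequence

def should_stop(generation: int, evaluations: int, best_history: Sequence[float],
                max_generations: Optional[int] = None,
                max_evaluations: Optional[int] = None,
                patience: Optional[int] = None,
                target: Optional[float] = None, tol: float = 1e-9) -> bool:
    """Combine termination criteria (minimization).

    best_history holds the best energy after each generation.
    """
    if max_generations is not None and generation >= max_generations:
        return True                                   # generation budget spent
    if max_evaluations is not None and evaluations >= max_evaluations:
        return True                                   # evaluation budget spent
    if target is not None and best_history and best_history[-1] <= target + tol:
        return True                                   # known optimum reached
    if patience is not None and len(best_history) > patience:
        # evolution has stagnated: no gain over the last `patience` generations
        if best_history[-patience - 1] - best_history[-1] <= tol:
            return True
    return False
```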
Distance - Levenberg-Marquardt
Distance - Diversification

[Figure: for conformations ∂i, ∂j, ..., ∂k of the GA population, the distances between conformations are computed as Root Mean Squared Deviations (RMSD(i,j), RMSD(i,k), RMSD(j,k)); the conformations are then transformed with Levenberg-Marquardt, either towards CONVERGENCE or towards DIVERSIFICATION]
§ Ananth Ranganathan, The Levenberg-Marquardt Algorithm, citeseer.ist.psu.edu/638988.html, June 2004
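The figure measures distances in Cartesian space (RMSD). A cheaper, hypothetical alternative for monitoring diversity is to measure distances directly in torsion-angle space while respecting angle periodicity. The sketch below does this; it is not the Levenberg-Marquardt transformation itself, and the function names are illustrative.

```python
import math
from itertools import combinations
from typing import Sequence

def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def torsion_rms_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """RMS of periodic angle differences between two torsion-angle chromosomes."""
    return math.sqrt(sum(angular_difference(a, b) ** 2 for a, b in zip(x, y)) / len(x))

def mean_pairwise_distance(population: Sequence[Sequence[float]]) -> float:
    """Average distance over all pairs: a simple population-diversity indicator."""
    pairs = list(combinations(population, 2))
    return sum(torsion_rms_distance(p, q) for p, q in pairs) / len(pairs)
```

For example, 10° and 350° are only 20° apart once periodicity is taken into account.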
    Literature examples (A)
•   Dusan P. Djurdjevic, Mark J. Biggs, Ab Initio Protein Fold Prediction Using Evolutionary Algorithms: Influence of Design and Control Parameters on Performance, DOI 10.1002/jcc.20440, Wiley InterScience
     •   Generational Evolutionary Algorithm
     •   Steady-State Evolutionary Algorithm
     •   tournament selection; single-point/multi-point and uniform crossover; uniform-distribution mutation with angles in the range 0-360; termination after a specified number of generations without improvement

•   Christopher D. Rosin (University of California, San Diego), A Comparison of Global and Local Search Methods in Drug Docking, Proceedings of the Seventh International Conference on Genetic Algorithms, ICGA, 1997
     •   Simulated Annealing – 100 tests with 1.5~1.8 billion function evaluations per test, linearly reduced temperature
     •   Solis and Wets' algorithm (a class of local and global search algorithms) – part of the GA+LS hybrid; deviations drawn from a normal distribution; does not require gradient information
     •   Genetic Algorithm+LS – local search applied to only 7% of the population in each generation; stochastic selection, two-point crossover, Cauchy-deviate mutation
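Below is a minimal sketch of the Solis and Wets random local search. A normally distributed deviation, shifted by an adaptive bias, is tried forwards and then backwards. The step size rho grows after repeated successes and shrinks after repeated failures. The constants follow common descriptions of the method but should be treated as assumptions: bias updates 0.4/0.2/0.5, and expansion ×2 or contraction ×0.5 after 4 consecutive successes or failures.

```python
import random
from typing import Callable, List, Sequence, Tuple

def solis_wets(x0: Sequence[float], f: Callable[[Sequence[float]], float],
               rng: random.Random, rho: float = 1.0, max_iters: int = 300,
               expand_after: int = 4, contract_after: int = 4,
               rho_min: float = 1e-6) -> Tuple[List[float], float]:
    """Gradient-free random local search (minimization)."""
    x = list(x0)
    fx = f(x)
    bias = [0.0] * len(x)
    successes = failures = 0
    for _ in range(max_iters):
        if rho < rho_min:
            break
        dev = [rng.gauss(b, rho) for b in bias]         # normal deviate around the bias
        forward = [xi + di for xi, di in zip(x, dev)]
        f_forward = f(forward)
        if f_forward < fx:                               # forward step improved
            x, fx = forward, f_forward
            bias = [0.4 * d + 0.2 * b for d, b in zip(dev, bias)]
            successes, failures = successes + 1, 0
        else:
            backward = [xi - di for xi, di in zip(x, dev)]
            f_backward = f(backward)
            if f_backward < fx:                          # opposite direction improved
                x, fx = backward, f_backward
                bias = [b - 0.4 * d for d, b in zip(dev, bias)]
                successes, failures = successes + 1, 0
            else:                                        # both failed: shrink the bias
                bias = [0.5 * b for b in bias]
                successes, failures = 0, failures + 1
        if successes >= expand_after:                    # adapt the step size
            rho, successes = 2.0 * rho, 0
        elif failures >= contract_after:
            rho, failures = 0.5 * rho, 0
    return x, fx

x_ls, f_ls = solis_wets([3.0, -2.0], lambda p: p[0] ** 2 + p[1] ** 2, random.Random(4))
```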
    Literature examples (B)

•   Garrett M. Morris et al., Automated Docking Using a Lamarckian Genetic Algorithm and an Empirical Binding Free Energy Function, Journal of Computational Chemistry, Vol. 19, No. 14, 1639-1662, 1998.
     •   Monte Carlo – applied on a pre-calculated grid of interaction energies
     •   Simulated Annealing
     •   Genetic Algorithm, Lamarckian Genetic Algorithm

     •   uniform random value initialization in (-180.0, 180.0)
     •   maximum number of generations/function evaluations
     •   proportional elitist selection, constant-size population
     •   two-point crossover with breaks between genes, Cauchy distribution-based mutation
     •   local search at the end of each generation on a user-defined percentage of the population

     •   each method is given ~1.5 million function evaluations / up to 41.5 minutes on a 200 MHz Silicon Graphics MIPS with 128 MB of RAM
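Cauchy-based mutation adds heavy-tailed deviates to the genes. Most changes are small, but occasional large jumps help the search escape local minima. In the sketch below, the location/scale values α = 0 and β = 1 and the per-gene mutation probability are assumptions, not values taken from the cited paper.

```python
import math
import random
from typing import List, Sequence

def cauchy_deviate(alpha: float, beta: float, rng: random.Random) -> float:
    """Cauchy(alpha, beta) variate by inverse-CDF sampling."""
    return alpha + beta * math.tan(math.pi * (rng.random() - 0.5))

def cauchy_mutation(genes: Sequence[float], p_mut: float, rng: random.Random,
                    alpha: float = 0.0, beta: float = 1.0) -> List[float]:
    """Add a Cauchy deviate to each gene with probability p_mut."""
    return [g + cauchy_deviate(alpha, beta, rng) if rng.random() < p_mut else g for g in genes]

rng = random.Random(9)
samples = sorted(cauchy_deviate(0.0, 1.0, rng) for _ in range(2001))
```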
       Outline

•   Protein Structure Prediction (PSP): modeling and complexity analysis
•   Parallel Hybrid Metaheuristics for the PSP Problem
•   Grid experimentation on GRID5000
•   Conclusion and Future Work
     PARAllel and DIStributed Evolving Objects
     http://paradiseo.gforge.inria.fr/

ParadisEO for computational Grids: MO and MOEO built on top of EO, with transparent use of:
•   Evolving Objects framework (EO), European project (Geneura Team, http://eodev.sourceforge.net; INRIA; LIACS)
•   Message passing (MPI: LAM, MPICH; PVM): clusters, networks of workstations
•   Multi-programming (PThreads): shared memory multi-processors (SMP)
•   Parallel distributed computing: clusters of SMPs (CLUMPS)
•   Grid computing: Condor-MW and Globus (MPICH-G2)

Our contributions
•   Multi-Objective EO (MOEO) for the design of multi-objective evolutionary algorithms
•   Moving Objects (MO) for the design of local search algorithms
•   ParadisEO for parallel hybrid metaheuristics

 § S. Cahon, N. Melab and E-G. Talbi. ParadisEO: A Framework for the Reusable Design of Parallel and
 Distributed Metaheuristics. Journal of Heuristics, Elsevier Science, Vol.10(3), pages 357-380, May 2004.
          Contributions
     •   ParadisEO-EO (a framework of Evolutionary Algorithms)
     •   ParadisEO-MOEO: techniques related to multi-objective optimization
     •   ParadisEO-PEO (parallel and distributed metaheuristics)
           •   Parallelism (partitioning solutions at several steps)
           •   Cooperation/Hybridization (algorithms exchange solutions), e.g. island cooperation
     •   ParadisEO-MO: solution-based metaheuristics
           •   Hill Climbing
           •   Simulated Annealing
           •   Tabu search
     http://paradiseo.gforge.inria.fr
              Development of hybrid metaheuristics

[Figure: taxonomy of hybrid metaheuristics: high level (independently cooperating metaheuristics) vs. low level (encapsulated metaheuristics); relay (metaheuristics executed in sequence) vs. coevolutionary (concurrently executing metaheuristics); labels "Trivial", "Inheritance relationships" and the ParadisEO distributed multi-agent model]

Unifying view of the three parallel hierarchical levels
    •   the deployment of concurrent independent/cooperative metaheuristics
    •   the parallelization of a single step of the metaheuristic (based on distribution of the handled solutions)
    •   the parallelization of the processing of a single solution
 E-G. Talbi, « A taxonomy of hybrid metaheuristics », Journal of Heuristics, 2002.
      Design of several levels of parallelization/hybridization

•   Heuristic level: independent walks, multi-start model, hybridization/cooperation of metaheuristics
•   Population / Neighborhood level: parallel evaluation of the neighborhood/population
•   Solution level: processing of a single solution (objective / data partitioning)

[Figure: the three levels arranged along a scalability axis]

Parallel asynchronous hierarchical Lamarckian GA
•   Cooperative GAs (island model)
•   Parallel evaluation of the population
•   Low-level co-evolutionary hybridization
Asynchronous Island Model

[Figure: several E.A. islands exchanging individuals through migration]
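Below is a sketch of one migration step on a ring of islands, minimizing energy. Each island sends copies of its best `rate` fraction to the next island, where they overwrite that island's worst individuals. This sequential, synchronous step is a simplification of the asynchronous model. The ring topology and the function name are assumptions. With the settings reported later in this presentation (MIGRATION_R 15%, POPULATION 300), one call would move 45 individuals per island, and it would be invoked every MIGRATION_STEP generations.

```python
import math
from typing import List, TypeVar

T = TypeVar("T")

def migrate_ring(islands: List[List[T]], energies: List[List[float]], rate: float = 0.15) -> None:
    """One migration step on a ring of islands (minimization, in place)."""
    # pick all emigrants first, so every island sends its pre-migration best
    outgoing = []
    for pop, e in zip(islands, energies):
        m = max(1, math.floor(rate * len(pop)))
        best = sorted(range(len(pop)), key=e.__getitem__)[:m]
        outgoing.append([(pop[i], e[i]) for i in best])
    for src, migrants in enumerate(outgoing):
        dst = (src + 1) % len(islands)              # ring: island i feeds island i + 1
        pop, e = islands[dst], energies[dst]
        worst = sorted(range(len(pop)), key=e.__getitem__, reverse=True)[:len(migrants)]
        for slot, (individual, energy) in zip(worst, migrants):
            pop[slot], e[slot] = individual, energy

islands = [["a", "b"], ["c", "d"]]
island_energies = [[1.0, 5.0], [2.0, 9.0]]
migrate_ring(islands, island_energies, rate=0.5)
```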
The Parallel Evaluation of the Population

[Figure: the E.A. sends each solution out for evaluation in parallel and receives back its full fitness]
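Parallel evaluation of the population follows a master/worker pattern. The sketch below uses Python's `concurrent.futures` to farm out evaluations and collect the full fitness values in population order. The fitness function is a hypothetical stand-in for an expensive energy evaluation. The demo uses threads; a process pool needs a picklable, module-level fitness function and, on spawn-based platforms, an `if __name__ == "__main__":` guard.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, Type

def parallel_evaluate(population: Sequence[Sequence[float]],
                      fitness: Callable[[Sequence[float]], float],
                      max_workers: Optional[int] = None,
                      executor_cls: Type[Executor] = ProcessPoolExecutor) -> List[float]:
    """Farm fitness evaluations out to a worker pool; results keep population order."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(fitness, population))

def square_sum(individual: Sequence[float]) -> float:
    """Hypothetical stand-in for an expensive energy evaluation."""
    return sum(g * g for g in individual)

energies = parallel_evaluate([[1.0, 2.0], [3.0]], square_sum, max_workers=2,
                             executor_cls=ThreadPoolExecutor)
```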
           Parallel Hybrid Simulated Annealing
  •   Synchronous Multi-Start Model
  •   Parallel Neighbourhood Exploration
  •   Gradient Local Search Optimization

      Generate S0
      k := 0
      while Tk > Tthreshold do
          for s := 1 to nbSamples do
              Srand := randomMove( S0 );
              ΔE := eval( Srand ) - eval( S0 );
              if ΔE < 0 then
                  S0 := Srand;
              else
                  S0 := Srand with prob. 1.0 / ( 1.0 + e^(ΔE/Tk) );
              endif
          endfor
          k := k + 1; Adjust Tk;
      endwhile
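Below is a runnable Python transcription of the simulated annealing pseudocode, under several assumptions. A geometric cooling schedule stands in for "Adjust Tk", and the best state seen is tracked and returned. `multi_start` runs the independent starts sequentially, whereas the parallel version would distribute them. The gradient local search step is not included.

```python
import math
import random
from typing import Callable, Iterable, Tuple, TypeVar

S = TypeVar("S")

def simulated_annealing(s0: S, energy: Callable[[S], float],
                        random_move: Callable[[S, random.Random], S], rng: random.Random,
                        t0: float = 10.0, t_threshold: float = 1e-3,
                        cooling: float = 0.95, nb_samples: int = 50) -> Tuple[S, float]:
    s, e = s0, energy(s0)
    best, best_e = s, e
    t = t0
    while t > t_threshold:
        for _ in range(nb_samples):
            candidate = random_move(s, rng)
            e_candidate = energy(candidate)
            delta = e_candidate - e
            # downhill: accept; uphill: accept with prob. 1 / (1 + exp(dE / T)), as in the pseudocode
            if delta < 0 or rng.random() < 1.0 / (1.0 + math.exp(min(delta / t, 700.0))):
                s, e = candidate, e_candidate
                if e < best_e:
                    best, best_e = s, e
        t *= cooling   # "Adjust Tk": an assumed geometric schedule
    return best, best_e

def multi_start(starts: Iterable[S], energy, random_move, rng, **params) -> Tuple[S, float]:
    """Independent runs from several initial states, keeping the best (sequential here)."""
    return min((simulated_annealing(s, energy, random_move, rng, **params) for s in starts),
               key=lambda result: result[1])

best, best_e = multi_start([5.0, -4.0], lambda x: x * x,
                           lambda x, r: x + r.uniform(-0.5, 0.5), random.Random(0))
```

The exponent is capped at 700 so that `math.exp` cannot overflow at very low temperatures. A capped uphill move is then accepted with negligible probability.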
      Outline

•   Protein Structure Prediction (PSP): modeling and complexity analysis
•   Parallel Hybrid Metaheuristics for the PSP Problem
•   Grid experimentation on GRID5000
•   Conclusion and Future Work
            Deployment of ParadisEO-G4
• Lille, Nice-Sophia Antipolis, Lyon, Nancy, Rennes: 400 CPUs
• exclusive reservation – no interference may occur; the processors are completely available during the reservation time.

1. Reservation of the desired nodes
2. Select a master node for the Globus GRID
3. Configure the Globus GRID (certificates, user credentials, xinetd, postgresql, etc.)
4. Deployment and execution – MPICH-G2

[Figure: clusters A, B and C, each with a front-end node (FRONTALE) and machines M1 to M6]

GRID5000: A fully reconfigurable grid! The configuration phase relies on the deployment of pre-built Linux « images » with Globus and MPICH-G2 already installed.
      1L2Y and α-Cyclodextrin

Tryptophan-Cage – Protein Data Bank ID: 1L2Y
α-β-γ Cyclodextrin
       Parallel asynchronous hierarchical
       Lamarckian GA - Cyclodextrin

§ Native energy: 242 kcal mol-1


MAX_GEN              100
POPULATION           300

CXOVER               0.95
CMUTATION            0.05
LS_XOVER             0.15
LS_MUTATION          0.05

XOVER                1.0
MUTATION             0.1

MIGRATION_R          15%
MIGRATION_STEP       5Gen
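To make the migration settings concrete, the sketch below derives the number of emigrants per migration and the generations at which migrations fire from the values in the table. It assumes that MIGRATION_R is a percentage of one island's population and that MIGRATION_STEP counts generations; the slides define neither, and GaParams, migrantsPerEvent and migrationGenerations are hypothetical names.

```cpp
#include <vector>

// Hypothetical bundle of the parameters listed above (names are illustrative).
struct GaParams {
    unsigned maxGen = 100;           // MAX_GEN
    unsigned population = 300;       // POPULATION
    unsigned migrationPercent = 15;  // MIGRATION_R, assumed to be a percentage
    unsigned migrationStep = 5;      // MIGRATION_STEP, assumed to be in generations
};

// Emigrants per migration, assuming MIGRATION_R applies to one island's population.
unsigned migrantsPerEvent(const GaParams& p) {
    return p.population * p.migrationPercent / 100;  // integer arithmetic, no rounding issues
}

// Generations (1-based) at the end of which a migration is triggered.
std::vector<unsigned> migrationGenerations(const GaParams& p) {
    std::vector<unsigned> gens;
    if (p.migrationStep == 0) return gens;  // no periodic migration
    for (unsigned g = p.migrationStep; g <= p.maxGen; g += p.migrationStep)
        gens.push_back(g);
    return gens;
}
```

Under these assumptions the table's values give 45 emigrants per migration and 20 migration events, from generation 5 to generation 100.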
        Parallel Hybrid SA – 1L2Y &
        Cyclodextrin

§ The GA and the SA sample the same number of individuals

[Figure panels: Tryptophan-cage (1L2Y), GA vs. SA; Cyclodextrin, SA]
Cyclodextrin native energy: 242.4 kcal mol-1
Tryptophan-cage native energy: 46.446 kcal mol-1



    Experimental Results
Cyclodextrin       Min        Max        Avg        StDev
GA                 2470       5845.27    3790.56    708.54
SA                 1029.14    4593       2359.26    281.2
Random+LS          1204       6329.91    3132.8     800.85

Lowest energy      GA+LS, no isl.    GA+LS      SA         SA+LS
Cyclodextrin       264.599           164.973    1029.14    904.046
1L2Y               93.5521           57.5447    201.37     86.3106

1L2Y               Min        Max        Avg        StDev
Random Search      1012.54    13471.4    4420.91    1521.67

Execution time     GA+LS, Island    GA+LS, No Isl.    GA+Isl.    GA
Cyclodextrin       31m20s           29m32s            7m58s      7m24s
1L2Y               29m57s           31m17s            7m47s      7m45s
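The Min, Max, Avg and StDev columns are standard summary statistics over a set of energies. A sketch of how one such row can be computed follows; it uses the population standard deviation, whereas the slides state neither the estimator nor exactly which samples (runs or individuals) were summarized. EnergyStats and summarize are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <stdexcept>
#include <vector>

struct EnergyStats {
    double min, max, avg, stdev;
};

// Summarizes a non-empty set of energies (e.g. kcal mol-1).
EnergyStats summarize(const std::vector<double>& energies) {
    if (energies.empty()) throw std::invalid_argument("summarize: no energies");
    const auto mm = std::minmax_element(energies.begin(), energies.end());
    const double n = static_cast<double>(energies.size());
    const double avg = std::accumulate(energies.begin(), energies.end(), 0.0) / n;
    double sumSq = 0.0;
    for (double e : energies) sumSq += (e - avg) * (e - avg);
    // population standard deviation (divide by n, not n - 1)
    return {*mm.first, *mm.second, avg, std::sqrt(sumSq / n)};
}
```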
        Conclusion and Future Work (1)

n   The GA performs better on the considered
    benchmarks, especially on Cyclodextrin
    n   Energies below that of the native conformation
        were obtained
n   The SA results are comparable to those obtained
    by the GA without hybridization
n   The SA offers little intrinsic parallelism …
n   … but it does not get trapped in local minima as
    easily as directed LS (i.e. gradient LS)
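The SA's resistance to local minima comes from its acceptance rule: unlike a gradient-based local search, it sometimes accepts moves that increase the energy. A sketch, assuming the classical Metropolis criterion (the slides specify neither the acceptance rule nor the cooling schedule; metropolisAccept is a hypothetical name):

```cpp
#include <cmath>
#include <random>

// Metropolis acceptance test (assumed, not taken from the slides).
// deltaE = E(candidate) - E(current); temperature must be > 0,
// expressed in the same units as deltaE.
bool metropolisAccept(double deltaE, double temperature, std::mt19937& rng) {
    if (deltaE <= 0.0) return true;  // downhill or neutral moves are always accepted
    std::uniform_real_distribution<double> u(0.0, 1.0);
    // uphill moves are accepted with probability exp(-deltaE / temperature)
    return u(rng) < std::exp(-deltaE / temperature);
}
```

As the temperature decreases, exp(-deltaE / temperature) shrinks and the search gradually behaves like a pure descent; at high temperature it can climb out of the basins where a gradient LS would stop.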
        Conclusion and Future Work (2)

n   Hierarchical multi-stage parallel models
    may prove to be efficient on the considered
    benchmarks
n   Hybridization schemes combining …
    n   … the GA (strong exploration capabilities)
    n   … and the SA (intensification + less prone to
        getting trapped in local minima than gradient
        methods)
ParadisEO-PEO Implementation
   ParadisEO-PEO Evolutionary Algorithm

  peoEA (
       eoContinue< EOT >&      __cont,
       peoPopEval< EOT >&      __pop_eval,
       eoSelect< EOT >&        __select,
       peoTransform< EOT >&    __transform,
       eoReplacement< EOT >&   __replace
  );

  // evaluate the initial population
  pop_eval ( population );

  do {
       eoPop< EOT > offsprings;
       select ( population, offsprings );     // build the offspring population
       transform ( offsprings );              // crossover and mutation
       pop_eval ( offsprings );               // evaluate the offspring
       replace ( population, offsprings );    // merge offspring into the population
  } while ( cont ( population ) );

§   evaluation function: the designed class has to be derived from the peoPopEval class
    §   peoSeqPopEval
    §   peoParaPopEval
§   selection strategy: applied at each iteration to obtain the offspring population
§   transformation operators: crossover operator(s) and mutation operator(s); peoEA
    requires a peoTransform-derived object
    §   peoSeqTransform
    §   peoParaSGATransform
§   replacement strategy: integrates the offspring back into the initial population
§   continuation criterion: maximum number of generations, checkpoints, etc.
ParadisEO-PEO EA 
Representation of the Individuals
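#include <functional>   // std::greater
#include <EO.h>
...

// An individual: one conformation of the molecule, encoded by its torsion angles.
// With std::greater as the comparator, EO treats lower fitness values as better,
// so the energy is minimized.
class Conformation : public EO< eoScalarFitness< double, std::greater< double > > > {
     ...
     // 0-based gene access; the underlying chromosome is indexed from 1
     int operator[]( const int& index ) const { 
            return chromosome->getChromo( index+1 ); 
     }
     ...

     Individu* chromosome;
     ...
     // data shared by all conformations
     static Molecule* molecule;    
     static Hamiltonian* hamiltonian;    
     static Population* population;
};

// serialization used by the middleware to send conformations between nodes
extern void pack( const dockingAtGrid::Conformation& conformation );
extern void unpack( dockingAtGrid::Conformation& conformation );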
ParadisEO-PEO EA 
Representation – Packing & Unpacking (middleware layer)

Pack data to send:

void pack( const dockingAtGrid::Conformation& conformation ) {  
     // flag telling the receiver whether a fitness value follows
     if ( conformation.invalid() ) 
          pack( (unsigned int)0 );  
     else {
          pack( (unsigned int)1 );
          pack( conformation.fitness() );  
     }  
     // then the genes (torsion angles), one by one
     for ( unsigned int index = 0; index < conformation.size(); index++ )
          pack( conformation[ index ] );
}

Unpack received data:

void unpack( dockingAtGrid::Conformation& conformation ) {
     eoScalarFitness<double, std::greater<double> > fitness;  
     unsigned int validFitness;  

     unpack( validFitness );  
     if ( validFitness ) {
          unpack( fitness );
          conformation.fitness( fitness );
     } else {
          // no fitness was sent: the sender's conformation had not been evaluated
     } 
     ...
}
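The packing protocol is: a validity flag, the fitness only when the flag is 1, then the genes. The self-contained sketch below replays that layout on an in-memory byte buffer so that a round trip can be checked locally. Buffer, Conf, packConf and unpackConf are hypothetical stand-ins, not the ParadisEO-PEO middleware API; unlike the code above, the sketch also transmits the number of genes, an assumption made so the receiver does not need to know the molecule in advance.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical message buffer: raw bytes appended on pack, read back in order on unpack.
class Buffer {
public:
    template <typename T> void pack(const T& value) {
        static_assert(std::is_trivially_copyable<T>::value, "raw-byte packing only");
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
        bytes_.insert(bytes_.end(), p, p + sizeof(T));
    }
    template <typename T> void unpack(T& value) {
        static_assert(std::is_trivially_copyable<T>::value, "raw-byte packing only");
        if (pos_ + sizeof(T) > bytes_.size()) throw std::out_of_range("buffer underflow");
        std::memcpy(&value, bytes_.data() + pos_, sizeof(T));
        pos_ += sizeof(T);
    }
private:
    std::vector<unsigned char> bytes_;
    std::size_t pos_ = 0;
};

// Hypothetical, simplified conformation.
struct Conf {
    bool valid = false;       // true once the fitness has been computed
    double fitness = 0.0;
    std::vector<int> angles;  // torsion angles
};

void packConf(Buffer& buf, const Conf& c) {
    buf.pack(static_cast<unsigned int>(c.valid ? 1 : 0));  // validity flag
    if (c.valid) buf.pack(c.fitness);                       // fitness only if valid
    buf.pack(static_cast<unsigned int>(c.angles.size()));   // assumption: gene count sent
    for (int a : c.angles) buf.pack(a);
}

void unpackConf(Buffer& buf, Conf& c) {
    unsigned int validFitness = 0;
    buf.unpack(validFitness);
    c.valid = validFitness != 0;
    if (c.valid) buf.unpack(c.fitness);
    unsigned int n = 0;
    buf.unpack(n);
    c.angles.resize(n);
    for (int& a : c.angles) buf.unpack(a);
}
```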
ParadisEO-PEO EA 
Random chromosome initialization
Each active torsion angle is initialized with a random value within a
   specified angle domain (typically 0..360 degrees).


#include <eoInit.h>
...

// Initialization functor: gives every active torsion angle a random value
class ConformationInit : public eoInit< Conformation > {  
public:

     ...
     void operator()( Conformation& conf ) {
            // genes are indexed from 1 to getNvars()
            for ( unsigned int i = 1; i <= conf.molecule->getNvars(); i++ ) {
                  conf.chromosome->setChromo( i, eo::rng.random( ANGLE_DOMAIN ) );    
            }
     }

     ...
};
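The same initialization can be written with the C++ standard library alone. This sketch assumes integer angles in degrees (operator[] above returns int) and draws them uniformly from [0, angleDomain) with std::uniform_int_distribution instead of EO's eo::rng; randomTorsions is a hypothetical name.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Draws nVars torsion angles uniformly from [0, angleDomain).
std::vector<int> randomTorsions(std::size_t nVars, int angleDomain, std::mt19937& rng) {
    std::uniform_int_distribution<int> angle(0, angleDomain - 1);  // inclusive bounds
    std::vector<int> torsions(nVars);
    for (int& t : torsions) t = angle(rng);
    return torsions;
}
```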
              Asynchronous Island Model
// ---- Parameters ----
// migrations occur periodically, every N generations
eoPeriodicContinue < Conformation > mig_cont ( MIGRATIONS_AT_N_GENERATIONS ); 

// emigrant conformations are selected by stochastic tournament
eoStochTournamentSelect < Conformation > mig_select_one;
eoSelectNumber < Conformation > mig_select ( mig_select_one, NUMBER_OF_MIGRANTS ); 

// immigrants are merged into the population; the worst individuals are discarded
eoDeterministicSaDReplacement < Conformation > mig_elit_replace ( 0.3, 0.0, 0.1, 0.0 );   

// the migration stream follows a ring topology
RingTopology ring_topology;

// ---- Asynchronous island ----
peoAsyncIslandMig < Conformation > island_mig ( 
    mig_cont, mig_select, mig_elit_replace, ring_topology, alg.population, alg.population 
);    

// activated at the end of every generation
checkpoint->add ( island_mig ); 

// ---- EA ----
peoEA < Conformation > eAlg ( checkpoint, pop_eval, select, paraTransform, replacement );    
island_mig.setOwner ( eAlg );
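The migration policy above (tournament selection of emigrants, replacement of the worst individuals, ring topology) can be illustrated in a single process. The sketch is synchronous, whereas peoAsyncIslandMig migrates asynchronously between distributed islands; a plain replace-worst step stands in for eoDeterministicSaDReplacement, whose parameters (0.3, 0.0, 0.1, 0.0) are not explained in the slides; and Ind, Island, stochTournament and ringMigration are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

struct Ind { double fitness; };  // higher fitness is better in this sketch
using Island = std::vector<Ind>;

// Stochastic binary tournament: the fitter of two randomly drawn individuals
// wins with probability `rate`, the other one otherwise.
Ind stochTournament(const Island& isl, double rate, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, isl.size() - 1);
    std::bernoulli_distribution fitterWins(rate);
    const Ind& a = isl[pick(rng)];
    const Ind& b = isl[pick(rng)];
    const Ind& fitter = a.fitness >= b.fitness ? a : b;
    const Ind& other = a.fitness >= b.fitness ? b : a;
    return fitterWins(rng) ? fitter : other;
}

// Island i sends nMigrants emigrants to island (i + 1) % n.
// Precondition: every island is non-empty and holds at least nMigrants individuals.
void ringMigration(std::vector<Island>& islands, std::size_t nMigrants,
                   double tournamentRate, std::mt19937& rng) {
    const std::size_t n = islands.size();
    std::vector<Island> outgoing(n);
    // 1. every island selects its emigrants before any island is modified
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < nMigrants; ++k)
            outgoing[i].push_back(stochTournament(islands[i], tournamentRate, rng));
    // 2. the immigrants replace the worst individuals of the successor island
    for (std::size_t i = 0; i < n; ++i) {
        Island& dest = islands[(i + 1) % n];
        std::sort(dest.begin(), dest.end(),
                  [](const Ind& x, const Ind& y) { return x.fitness > y.fitness; });
        std::copy(outgoing[i].begin(), outgoing[i].end(),
                  dest.end() - static_cast<std::ptrdiff_t>(nMigrants));
    }
}
```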
                The Parallel Evaluation of the Population
// functor that evaluates a single conformation  
ConformationEval conformationEval;


// wraps the evaluation functor and evaluates the population in parallel
peoParaPopEval< Conformation > pop_eval ( conformationEval );
// parallel variation: crossover and mutation, each with its rate
peoParaSGATransform< Conformation > paraTransform ( crossover, XOVER_RATE, mutation, MUTATION_RATE );
…
peoEA < Conformation > eAlg ( checkpoint, pop_eval, select, paraTransform, replacement );
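peoParaPopEval distributes the evaluations to worker nodes. A shared-memory analogue of the same evaluate-everything-then-collect pattern, using std::async with one task per individual, is sketched below. parallelEvaluate is a hypothetical name, and one thread per individual is a simplification: for a 300-individual population, a pool with a fixed number of workers would be the more usual choice.

```cpp
#include <future>
#include <vector>

// Evaluates every individual concurrently and returns the fitness values
// in population order. eval must be safe to call from several threads.
template <typename Individual, typename EvalFn>
std::vector<double> parallelEvaluate(const std::vector<Individual>& pop, EvalFn eval) {
    std::vector<std::future<double>> pending;
    pending.reserve(pop.size());
    for (const Individual& ind : pop)
        pending.push_back(std::async(std::launch::async,
                                     [eval, &ind]() -> double { return eval(ind); }));
    std::vector<double> fitness;
    fitness.reserve(pop.size());
    for (std::future<double>& f : pending) fitness.push_back(f.get());  // wait for each task
    return fitness;
}
```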


 XML Grid Mapping

<?xml version="1.0"?>
<schema>
     <group scheduler="0">
          <!-- SCHEDULER -->
          <node name="0" num_workers="0">
          </node>
          <!-- ALGORITHMS -->
          <node name="1" num_workers="0">
               <runner>1</runner>
               ...
          </node>
          <!-- WORKER NODES -->
          <node name="2" num_workers="1">
          </node>
          <node name="3" num_workers="1">
          ...
     </group>
</schema>
End of story...

				
DOCUMENT INFO
posted: 11/26/2013
pages: 54