METHODS AND BENEFITS OF SIMPLIFICATION IN SIMULATION

Roger J. Brooks
Department of Management Science
The Management School
Lancaster University
Lancaster LA1 4YX, U.K.
roger.brooks@lancaster.ac.uk

Andrew M. Tobias
Operational Research Group
School of Manufacturing and Mechanical Engineering
University of Birmingham
Birmingham B15 2TT, U.K.
a.m.tobias@birmingham.ac.uk

ABSTRACT

Given the widespread acceptance of the importance of simplicity in modelling, the scarcity of research into simplification is surprising. The model used affects all aspects of the project, and a simple model has many advantages. However, simplification is often not attempted. Instead, on the (misguided) assumption that more detailed models are necessarily more accurate and therefore better, the time available is used to build the most complex model possible. This paper briefly describes a number of case studies in which models at different levels of detail were built. In each case, the ways in which the original model could be simplified are set out, along with the benefits that can be obtained by using the simple model. The general factors influencing the extent to which models can be simplified are then discussed. In particular, where the only output required is averages, it may be possible to reduce the model to such a simple version that an analytical solution becomes feasible and the simulation redundant. In other circumstances it may be best to build both a complex model for validation and a simplified model for experimentation.

INTRODUCTION

The choice of model used in a project is critical to the success of the project. However, little guidance is available about how to make this choice. A modelling project can usually be split into the steps of problem formulation, collection and analysis of data, model formulation, model construction, verification and validation, experimentation, analysis of results, conclusions, and implementation of the recommendations. The initial choice of model constitutes model formulation, and this step, along with problem formulation, stands out as being regarded as more of an art than a science. By contrast, the other steps have been the subject of considerable research, resulting in a much more scientific approach. For example, well-established statistical techniques can be applied to the collection and analysis of data, validation, experimentation and the analysis of results. Where the model takes the form of a computer program, software methodologies for the efficient construction and testing of the model can be used. Even for problem formulation, techniques such as Checkland's [1981] soft systems methodology have been developed.

The assessment of the performance of a model should cover its impact on the whole project, the aim being to use the model that gives the best overall performance. We have previously set out the following 11 elements of model performance, which are discussed in Brooks and Tobias [1996]:

Results
1. The extent to which the model output describes the behaviour of interest (whether it has adequate scope and detail).
2. The accuracy of the model's results.
3. The ease with which the model and its results can be understood.

Future use of the model
4. The portability of the model and the ease with which parts of the model can be reused in future models.

Confidence in the model (verification, validation and credibility)
5. The probability of the model containing errors.
6. The accuracy with which the model output fits the historical data.
7. The strength of the theoretical basis of the model, including the quality of input data (the credibility of the model).

Resources required
8. The time and cost to build the model (including data collection, verification and validation).
9. The time and cost to run the model.
10. The time and cost to analyse the results of the model.
11. The hardware requirements (such as computer memory) of running the model.

The relative importance of each of these elements will vary between projects. The actual performance of a given model will also partly depend on how it is used in the project, with trade-offs existing between some of the elements.

Simplification is central to modelling since any model is a simplification of the object it represents. The choice of model can, in one sense, be viewed as choosing the most appropriate simplification. However, the concepts of simplicity and complexity in modelling are rarely defined. We will take complexity to be some combined measure of the number of elements, inter-connections and calculations included within the model [Brooks and Tobias, 1996]. With respect to the model performance elements, a simpler model has a number of potential advantages over a more complex one. It would often be expected to be easier to understand, to contain fewer errors, and to be quicker and easier to build, run and analyse. However, these relationships will not always apply [Brooks and Tobias, 1996].

The arguments for simplicity in Management Science modelling were considered by Ward [1989], who, in particular, suggested several reasons why a client may prefer a simple model and so be more likely to implement the results. The time available for the decision making process is often severely limited, and a simple model gives quicker results which can be more readily assimilated. This allows the client more time to consider the alternative courses of action and more time for implementation. It also requires less effort on the client's part. The results from a simple model are likely to be less specific than those from a complex model and so may allow more scope for incorporating the client's knowledge and preferences. Clients may also associate a quick model with the modeller having a good understanding of the problem, and a simpler model can be more easily explained by the client to third parties.

This paper describes several modelling projects in which simplification of existing models was attempted and discusses the implications of the experiences.

POPULATION GENETICS

This model was built to simulate the population genetics of the self-incompatibility gene in the field poppy, Papaver rhoeas [Lawrence et al., 1978]. The pollen from the plants is dispersed by bees, and the plants have a genetically based mechanism that prevents them from pollinating themselves. The gene exists in a number of different types, called alleles. Each plant carries two alleles, although each pollen grain produced by the plant carries just one of these alleles, chosen at random. When a pollen grain is deposited on a plant, if the pollen grain's allele matches either of the plant's alleles then a reaction occurs that prevents the pollen from fertilising the plant. The aim of the modelling was to investigate the frequency distribution of the alleles in a typical P. rhoeas population in steady state.

The simulation model was built in FORTRAN. A preliminary simulation of a simplified idealised population had been built [Lawrence et al., 1994], but its results were not consistent with samples from natural populations [Campbell and Lawrence, 1981; Lawrence and O'Donnell, 1981]. Several more realistic assumptions were therefore built into the model, including realistic probability distributions for pollen and seed dispersal, the ability of seed to lie dormant in the soil for several years, and two sizes of plant. The sizes of real plants vary considerably; in particular, a proportion of plants tend to be very much larger than the rest of the population and to produce much more seed and pollen. The main output variable recorded was the variance of the allele frequencies. Comparing models with and without the realistic factors enabled their effects to be assessed. Variation in plant size and seed dormancy had a considerable effect on the allele frequency variance, whereas the inclusion of realistic seed and pollen dispersal distributions (compared to uniform dispersal) had only a small effect. The detailed results are contained in Brooks et al. [1996].

In order to identify other possible models, attempts were made to simplify the simulation model. Statistical analysis of the mechanisms contained in the model led to the development of a simple analytical model. This expresses the allele frequency variance as a simple function of the two most important factors, variation in plant size and overlapping generations [Brooks et al., 1997a, 1997b]. The analytical model results closely matched those from the simulations. Alternative scenarios can easily be investigated with this model, and a number of further theoretical results were developed. Three intermediate model types were identified between the analytical and the simulation models.

WHEAT

The SIRIUS wheat model [Jamieson et al., 1998a] is a mechanistic model, based on the known phenology of wheat, that has been successfully tested in a variety of conditions [Jamieson et al., 1998b]. The model simulates the growth of wheat on a daily basis using input data for the characteristics of the wheat variety and soil conditions, as well as daily temperature, precipitation and solar radiation values. The growth of the roots, leaves and then the grain is simulated, as are the internal processes that determine the timing of the different stages of development. A soil sub-model is also included that determines the amount of water and nitrogen in the soil; if these amounts become too small, the growth rate of the plant is reduced. The main variable of interest is usually yield.

SIRIUS combined with stochastic weather generators [Semenov et al., 1998] has been used to predict the effect of climate change on wheat farming in the U.K. at the site scale [Wolf et al., 1996]. The aim of the current modelling work was to extend this to the regional and national scale. As part of this work, a detailed sensitivity analysis of SIRIUS was carried out. Combined with analysis of the internal mechanisms within the model, this again led to the development of a much simpler model, which estimates yield using four simple equations. The relationships that make up the simple model were used as the basis of the upscaling methodology.

MANUFACTURING SYSTEMS

Two discrete event simulation models of manufacturing systems were examined: one built several years ago as part of a student project, and one built by the company in question. Both models were built using the WITNESS simulation package [Lanner]. The objective, in each case, was taken to be identifying ways of improving the throughput of the line, although the original objective of the student project model was not clear. Both models were very detailed and both contained errors. The complexity of the student project model meant that it ran very slowly.

For each model, a revised model was first built by correcting the errors and simplifying the model enough to make it feasible to run. Analytical models were then developed for both systems by calculating the effective capacity of each machine and identifying the bottleneck. These calculations took into account set-ups, breakdowns, rejected parts and the relative proportion of parts processed by each machine. The student project model contained no stochasticity, and the analytical model was able to reproduce the simulation model's throughput exactly. For the other model, a close match was still obtained.

DISCUSSION

Greater computing power and easier-to-use model building software have made it tempting to include a lot of detail in simulation models. Tilanus [1985] found that too much complexity is often given as a reason for the failure of Management Science and Operational Research projects, with the use of a simple model commonly mentioned among the reasons for success.

The projects described here illustrate some of the benefits that may be obtained from a simple model. The main advantage gained in these projects is a much greater understanding of the system. This should always be a primary objective in a simulation project, since pure black-box prediction is of only limited value: its lessons are difficult to apply in the future. A sound understanding of the system, on the other hand, can be used and adapted even when circumstances change. Previous work has indicated that the size of the model and the number of connections may be much more important than the complexity of the detailed calculations in determining ease of understanding [Brooks, 1996].

There can also be unexpected benefits, in that the insights obtained may be applied further. In the population genetics case, for example, the analytical model was used to derive several interesting additional results, such as the expected number of alleles that could be supported by a given population. Similarly, the simplified wheat model was used to develop the upscaling methodology by identifying conditions under which yield would be constant.

Analysis of results is also made much easier. Sensitivity analysis can be easily carried out and alternative scenarios easily investigated. This is particularly the case for the analytical models. For example, the effective capacity equations show immediately how the bottleneck changes when the parameters of one of the machines are changed.

On the other hand, it can be difficult to obtain sufficient confidence in a simple model. Certainly, a model that omits an important factor will tend to result in a seriously flawed understanding. Complex systems may require a complex model [Bunge, 1963]. For both the population genetics and wheat models, confidence in the simplified models comes from their results matching well with those of the detailed models. Both simplified models exclude certain factors that, without detailed testing and analysis of the original simulation model, might have been thought important.

It is also useful to consider the simplification processes here. In the first two examples, it would have been very difficult to derive either the simplified population genetics model or the simplified wheat model without first building the simulation model. This is not just because the simplified models ignore some factors but also because of the complexity of the situation being modelled. In each of the cases, the output variable of interest was either a steady state average or an accumulated value. This tended to allow some of the processes within the model to be averaged, ignored or treated by statistical analysis. However, care is required in averaging the processes in a complex model. Modelling is conducted within a particular experimental frame [Zeigler, 1976], so models built for other purposes, such as detailed analysis of queue lengths, may be harder to simplify. The simplified models will tend to have more limited validity and be less portable than the detailed models.

Considerable effort tends to be required for the simplification process [Rexstad and Innis, 1985]. This certainly applied to the population genetics and wheat models, where the simplified models were produced by sensitivity analysis and examination of the detailed workings of the model. Simplification of existing models is risky since there is no guarantee of success. However, the process itself, even if unsuccessful, should still enhance understanding. It is also a useful undertaking in helping to verify the model [Rexstad and Innis, 1985].

Despite its importance, simplification has received relatively little attention in the modelling literature. Zeigler [1976] distinguished four ways of simplifying a discrete event simulation model, namely dropping unimportant parts of the model, replacing part of the model by a random variable, coarsening the range of values taken by a variable, and grouping parts of a model together. Innis and Rexstad [1983] listed seventeen simplification techniques for general modelling, which they categorised under the modelling steps of hypotheses (identifying the important parts of the system), formulation (specifying the model), coding (building the model) and experiments. Courtois [1985] discusses scaling issues that can lead to model decomposition. However, none of these contributions constitutes a methodology.

CONCLUSIONS

There can be considerable benefits in using a simple simulation model. Often this can only be accomplished by first building a more detailed model and then attempting to simplify it. This is both because it would be difficult to derive the simple model from the problem definition and because it would not be possible to obtain sufficient confidence without first building the detailed model. Although a number of authors [e.g. Pidd, 1998] advocate starting with a simple model and gradually adding detail, simplifying the resulting model is a different process, since the simplified model will often have a different form from the original. The simplified model gives greater understanding and ease of analysis, whereas the detailed model is required for cross-validation to give sufficient confidence [Murdoch et al., 1992]. The process of simplification is risky because it is time-consuming and may not lead to an acceptable simplified model. There is a need both for a simplification methodology and for a better understanding of the circumstances in which simplification is likely to be successful.

REFERENCES

Brooks R.J. 1996, "A Framework for Choosing the Best Model in Mathematical Modelling and Simulation", Ph.D. Thesis, University of Birmingham.

Brooks R.J. and Tobias A.M. 1996, "Choosing the best model: level of detail, complexity and model performance", Mathematical and Computer Modelling, 24(4), pp. 1-14.

Brooks R.J., Tobias A.M. and Lawrence M.J. 1996, "The population genetics of the self-incompatibility polymorphism in Papaver rhoeas. XI. The effects of limited pollen and seed dispersal, overlapping generations and variation in plant size on the variance of S-allele frequencies in populations at equilibrium", Heredity, 76, pp. 367-376.

Brooks R.J., Tobias A.M. and Lawrence M.J. 1997a, "Time series analysis of the self-incompatibility polymorphism. 1. Allele frequency distribution of a population with overlapping generations and variation in plant size", Heredity, 79, pp. 350-360.

Brooks R.J., Tobias A.M. and Lawrence M.J. 1997b, "Time series analysis of the self-incompatibility polymorphism. 2. Frequency equivalent population and the number of alleles that can be maintained in a population", Heredity, 79, pp. 361-364.

Bunge M. 1963, The Myth of Simplicity: Problems of Scientific Philosophy, Prentice-Hall, Englewood Cliffs, N.J.

Campbell J.M. and Lawrence M.J. 1981, "The population genetics of the self-incompatibility polymorphism in Papaver rhoeas. II. The number and frequency of S-alleles in a natural population (R106)", Heredity, 46, pp. 81-90.

Checkland P.B. 1981, Systems Thinking, Systems Practice, John Wiley and Sons, New York.

Courtois P.-J. 1985, "On time and space decomposition of complex structures", Communications of the ACM, 28(6), pp. 590-603.

Innis G. and Rexstad E. 1983, "Simulation model simplification techniques", Simulation, 41(1), pp. 7-15.

Jamieson P.D., Semenov M.A., Brooking I.R. and Francis G.S. 1998a, "Sirius: a mechanistic model of wheat response to environmental variation", European Journal of Agronomy, 8, pp. 161-179.

Jamieson P.D., Porter J.R., Goudriaan J., Ritchie J.T., van Keulen H. and Stol W. 1998b, "A comparison of the models AFRCWHEAT2, CERES-Wheat, Sirius, SUCROS2 and SWHEAT with measurements from wheat grown under drought", Field Crops Research, 55, pp. 23-44.

Lanner Group Ltd., WITNESS User Manual, Redditch, Worcestershire, U.K.

Lawrence M.J. and O'Donnell S. 1981, "The population genetics of the self-incompatibility polymorphism in Papaver rhoeas. III. The number and frequency of S-alleles in two further populations (R102 and R104)", Heredity, 47, pp. 53-61.

Lawrence M.J., Afzal M. and Kendrick J. 1978, "The genetical control of self-incompatibility in Papaver rhoeas", Heredity, 40, pp. 239-253.

Lawrence M.J., O'Donnell S., Lane M.D. and Marshall D.F. 1994, "The population genetics of the self-incompatibility polymorphism in Papaver rhoeas. VIII. Sampling effects as a possible cause of unequal allele frequencies", Heredity, 72, pp. 345-352.

Murdoch W.W., McCauley E., Nisbet R.M., Gurney W.S.C. and De Roos W.M. 1992, "Individual-based models: combining testability and generality", in Individual-based Models and Approaches in Ecology: Populations, Communities and Ecosystems (edited by D.L. DeAngelis and L.J. Gross), Chapman and Hall, New York, pp. 18-35.

Pidd M. 1998, Computer Simulation in Management Science, 4th Edition, Wiley, New York.

Rexstad E. and Innis G.S. 1985, "Model simplification – three applications", Ecological Modelling, 27(1-2), pp. 1-13.

Semenov M.A., Brooks R.J., Barrow E.M. and Richardson C.W. 1998, "Comparison of the WGEN and LARS-WG stochastic weather generators for diverse climates", Climate Research, 10, pp. 95-107.

Tilanus C.B. 1985, "Failures and successes of quantitative methods in management", European Journal of Operational Research, 19, pp. 170-175.

Wolf J., Evans L.G., Semenov M.A., Eckersten H. and Iglesias A. 1996, "Comparison of wheat simulation models under climate change. I. Model calibration and sensitivity analyses", Climate Research, 7(3), pp. 253-270.

Zeigler B.P. 1976, Theory of Modelling and Simulation, John Wiley, New York.
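The self-incompatibility rule described under POPULATION GENETICS is simple to state in code. What follows is a hypothetical Python sketch, not the authors' FORTRAN simulation. The function names, the example genotypes S1 to S4, and the assumption that incoming pollen carries alleles at the population's allele frequencies (random pollen transfer with no spatial dispersal) are all illustrative. Under that assumption, the share of pollen a plant can accept is one minus the summed frequencies of its two alleles, so a plant carrying rare alleles accepts more of the pollen it receives. This is the kind of frequency-dependent effect that shapes the allele frequency distribution studied in the paper.

```python
import random
from collections import Counter


def is_compatible(pollen_allele, plant_alleles):
    """Pollen is rejected if its allele matches either of the plant's two alleles."""
    return pollen_allele not in plant_alleles


def pollen_from(plant_alleles, rng):
    """Each pollen grain carries one of its parent's two alleles, chosen at random."""
    return rng.choice(plant_alleles)


def allele_frequencies(population):
    """Frequency of each allele among all plants (two alleles per plant)."""
    counts = Counter(a for plant in population for a in plant)
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def compatible_fraction(plant_alleles, freqs):
    """Expected share of incoming pollen that can fertilise the plant,
    ASSUMING pollen alleles arrive at the frequencies given in `freqs`."""
    return 1.0 - sum(freqs.get(a, 0.0) for a in set(plant_alleles))


def simulate(population, target, n_grains, seed=1):
    """Monte Carlo check: draw pollen from random donors and count acceptances."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(n_grains):
        donor = rng.choice(population)
        if is_compatible(pollen_from(donor, rng), target):
            accepted += 1
    return accepted / n_grains


# Hypothetical four-plant population; S3 is the commonest allele, S4 the rarest.
population = [("S1", "S2"), ("S1", "S3"), ("S2", "S3"), ("S3", "S4")]
freqs = allele_frequencies(population)
print(compatible_fraction(("S1", "S3"), freqs))    # 0.375 (carries common S3)
print(compatible_fraction(("S2", "S4"), freqs))    # 0.625 (carries rare S4)
print(simulate(population, ("S1", "S3"), 20_000))  # close to 0.375
```

The Monte Carlo function is included only to show that the one-line formula agrees with a direct simulation of the rule. This mirrors, on a toy scale, the paper's practice of checking a simplified model against a more detailed one.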
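The paper does not give the effective-capacity equations used for the analytical manufacturing models (MANUFACTURING SYSTEMS), only the factors they accounted for: set-ups, breakdowns, rejected parts and the proportion of parts each machine processes. The Python sketch below shows one common way such a calculation might be set up. It is an illustrative assumption, not the authors' equations or a WITNESS feature. The assumptions are: availability is MTBF/(MTBF+MTTR); set-up time is spread evenly over a batch; rejects reduce only the rejecting machine's good output (downstream scrap is not propagated upstream); and `share` is the fraction of line output that passes through each machine. All machine names and numbers are hypothetical. The final lines illustrate the point made in the DISCUSSION that such equations show immediately how the bottleneck moves when one machine's parameters change.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Machine:
    """Hypothetical parameters for one machine (all times in hours)."""
    name: str
    cycle_time: float   # processing time per part
    setup_time: float   # time per set-up
    batch_size: int     # parts processed between set-ups
    mtbf: float         # mean time between breakdowns
    mttr: float         # mean time to repair
    reject_rate: float  # fraction of processed parts rejected
    share: float        # ASSUMED fraction of line output routed through this machine

    def effective_capacity(self) -> float:
        """Good parts per hour, allowing for set-ups, breakdowns and rejects."""
        availability = self.mtbf / (self.mtbf + self.mttr)
        time_per_part = self.cycle_time + self.setup_time / self.batch_size
        return availability * (1.0 - self.reject_rate) / time_per_part

    def supported_throughput(self) -> float:
        """Line output per hour this machine alone could sustain."""
        return self.effective_capacity() / self.share


def bottleneck(machines):
    """Return the limiting machine and the line's estimated throughput,
    i.e. the smallest throughput any single machine can support."""
    worst = min(machines, key=lambda m: m.supported_throughput())
    return worst, worst.supported_throughput()


line = [
    Machine("saw", 0.05, 0.5, 50, 40.0, 2.0, 0.02, 1.0),
    Machine("drill", 0.04, 0.25, 25, 20.0, 5.0, 0.0, 1.0),
    Machine("paint", 0.08, 0.0, 1, 100.0, 0.0, 0.05, 0.5),
]
m, rate = bottleneck(line)
print(m.name, round(rate, 2))  # saw 15.56

# What-if: halve the saw's set-up frequency by doubling its batch size.
line[0] = replace(line[0], batch_size=100)
m, rate = bottleneck(line)
print(m.name, round(rate, 2))  # drill 16.0
```

Because each machine is evaluated independently, a what-if change costs one function call rather than a new simulation run. The price is that queueing and blocking effects between machines are ignored. This is consistent with the paper's observation that such simplifications suit steady-state averages such as throughput, but not detailed questions such as queue lengths.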
BIOGRAPHY

Roger Brooks is a lecturer in Management Science at Lancaster University. Before coming to Lancaster he worked on an E.C.-funded project investigating the effects of climate change on European agriculture. He received a Ph.D. from Birmingham University and a B.A. in Mathematics from Oxford University. He is also a chartered accountant.