Review of Probability and Statistics in Simulation (2)

In this review
• Use of Probability and Statistics in Simulation
• Random Variables and Probability Distributions
• Discrete, Continuous, and Discrete and Continuous Random Variables - “Mixed” Distribution
• Expectation and Moments
• Covariance
• Sample Mean and Variance
• Data Collection and Analysis
• Parameter Estimation
• Properties of a “Good” Estimator
----------------------------------------
• Simulation data and output stochastic processes
• Two Types of Statistics in simulation output
• Distribution Estimation
• Confidence Intervals (CI)
• Run Length and Number of Replications

Four Properties of a “Good” Estimator (1)
• Unbiasedness
  – An unbiased estimator has an expected value equal to the true value of the parameter being estimated, i.e., E[estimator] = population parameter
  – For the mean: $E[\bar{X}_I] = \mu$
  – For the variance: $E[S_x^2] = \sigma^2$
  – But $E[S_x] \neq \sigma$: the square root of a sum of numbers is not usually equal to the sum of the square roots of those same numbers

Four Properties of a “Good” Estimator (2a)
• Efficiency
  – The most efficient estimator among a group of unbiased estimators is the one with the smallest variance
  – Ex: Three different estimators’ distributions 1, 2, 3 based on samples of the same size
    [Figure: sampling distributions of estimators 1, 2, and 3; horizontal axis: value of estimator, with the population parameter marked]
  – 1 and 2: expected value = population parameter (unbiased)
  – 3: positively biased
  – Variance decreases from 1, to 2, to 3 (3 is the smallest)
  – Conclusion: 2 is the most efficient, since it has the smaller variance of the two unbiased estimators

Four Properties of a “Good” Estimator (2b)
• Efficiency (continued)
  – Relative efficiency: since it is difficult to prove that an estimator is the best among all unbiased ones, use:
    $\text{Relative Efficiency} = \dfrac{\text{Variance of first estimator}}{\text{Variance of second estimator}}$
  – Ex: Sample mean vs.
sample median
    Variance of sample mean = $\sigma^2/n$
    Variance of sample median = $\pi\sigma^2/(2n)$ (the large-sample value for a normal population)
    $\mathrm{Var}[\text{median}] / \mathrm{Var}[\text{mean}] = \dfrac{\pi\sigma^2/(2n)}{\sigma^2/n} = \pi/2 \approx 1.57$
  – Therefore, the sample median is 1.57 times less efficient than the sample mean

Four Properties of a “Good” Estimator (3)
• Sufficiency
  – A necessary condition for efficiency
  – Should use all the information about the population parameter that the sample can provide, i.e., take into account each of the sample observations
  – Ex: The sample median is not a sufficient estimator because only the ranking of the observations is used and the distances between adjacent values are ignored

Four Properties of a “Good” Estimator (4)
• Consistency
  – Should yield estimates that converge in probability to the population parameter being estimated as n (the sample size) becomes larger
  – That is, as $n \to \infty$, the estimator becomes unbiased and the variance of the estimator approaches 0
  – Ex: X/n is an unbiased estimator of the population proportion p whose variance vanishes as n grows, i.e., X/n is a consistent estimator of p
    Variance: $\mathrm{Var}[X/n] = \frac{1}{n^2}\mathrm{Var}[X] = \frac{1}{n^2}(npq) = pq/n$ (since X is binomially distributed)
    As $n \to \infty$, $pq/n \to 0$

Two Types of Statistics
• Statistics based on observations (observational data)
  – Concerned with the value of each observation but not the time at which the observations are made
  – Collected on a given number of observations
  – Observation: often an “entity”, i.e., any object of interest
  – Value to be observed: duration of certain activities
    e.g., customer (entity, one observation for each entity); waiting time (value observed)
• Statistics on time-persistent variables (time-dependent statistics)
  – Variables that have values defined over time (not at any single observation)
  – Collected over a given period of time
    e.g., number of customers waiting in line

Formulas for Sample Mean and Sample Variance
                   Statistics based on observations                               Statistics for time-persistent variables
Sample mean        $\bar{X}_I = \frac{\sum_{i=1}^{I} x_i}{I}$                      $\bar{X}_T = \frac{\int_0^T x(t)\,dt}{T}$
Sample variance    $S_x^2 = \frac{\sum_{i=1}^{I} x_i^2 - I\bar{X}_I^2}{I - 1}$     $S_x^2 = \frac{\int_0^T x^2(t)\,dt}{T} - \bar{X}_T^2$
• Another useful statistic: the coefficient of variation, $S_x/\bar{X}_I$
• Formally, estimates that specify a single value (parameter) of the population are called point estimates, while estimates that specify a range of values are called interval estimates

Distribution Estimation
• Use collected data to identify (“fit”) the underlying distribution of the population
• Approach
  – Assume the data follow a particular statistical distribution (hypothesis)
  – Apply one or more goodness-of-fit tests to the sample data (inference; see how parameters are estimated)
    • Commonly used tests: Chi-Square test and Kolmogorov-Smirnov test
  – Judge the outcome of the tests: whether the distribution fits under a specified level of statistical significance

Statistical Inference
• Variability of simulation outputs should be considered
• Confidence Interval (CI)
  – Point estimates: single parameters
  – Interval estimates: a probability statement that specifies the likelihood that the parameter being estimated falls within prescribed bounds
  – Simulation (to estimate the population mean $\mu$): by the Central Limit Theorem, the sample mean $\bar{X}_I$ is approximately normally distributed for sufficiently large I (strict independence is not a necessary condition: versions of the CLT also hold for weakly dependent observations)

Confidence Interval (CI)
• Assume $\bar{X}_I$ is normally distributed; then the statistic
  $Z = (\bar{X}_I - \mu)/\sigma_{\bar{X}}$
  is a random variable that is normally distributed with a mean of zero and a standard deviation of one
  – $\bar{X}_I \sim N(\mu, \sigma_{\bar{X}}^2) \Rightarrow Z \sim N(0, 1)$, the standard normal distribution
  – $P[-Z_{\alpha/2} < Z < Z_{\alpha/2}] = 1 - \alpha$, where $Z_{\alpha/2}$ is the value of Z such that the area to its right under the standard normal curve equals $\alpha/2$
  – $\alpha$: “level of significance”
  [Figure: standard normal curve centered at 0, with central area $1 - \alpha$ and area $\alpha/2$ in each tail]

Confidence Interval (CI)
• So we can assert with probability $1 - \alpha$ that:
  $\bar{X}_I - Z_{\alpha/2}\,\sigma_{\bar{X}} < \mu < \bar{X}_I + Z_{\alpha/2}\,\sigma_{\bar{X}}$
  that is, a proportion $1 - \alpha$ of the confidence intervals based on I samples of X should contain (cover) the mean $\mu$
  [Figure: the C.I. drawn as an interval from $\bar{X}_I - Z_{\alpha/2}\,\sigma_{\bar{X}}$ to $\bar{X}_I + Z_{\alpha/2}\,\sigma_{\bar{X}}$, centered at $\bar{X}_I$]
• Note:
  – $I \uparrow$, $1 - \alpha \uparrow$ ($\alpha \downarrow$): the bigger the sample size, the more confident, but the longer the runs
  – $(1 - \alpha) \downarrow$, $I \downarrow$: the less confident, the fewer simulation runs required

Confidence Interval (CI)
• The above formula assumes knowledge of the standard deviation of the mean, $\sigma_{\bar{X}}$, which is usually unknown
• If the sample standard deviation of the mean, $S_{\bar{X}}$, is used to estimate $\sigma_{\bar{X}}$, a similar relationship can be developed using the statistic
  $t = (\bar{X}_I - \mu)/S_{\bar{X}}$
  where t is a random variable having a Student t-distribution with $I - 1$ degrees of freedom
• Hence a $1 - \alpha$ confidence interval for $\mu$ is:
  $\bar{X}_I - t_{\alpha/2}\,S_{\bar{X}} < \mu < \bar{X}_I + t_{\alpha/2}\,S_{\bar{X}}$
  [Figure: the C.I. drawn as an interval from $\bar{X}_I - t_{\alpha/2}\,S_{\bar{X}}$ to $\bar{X}_I + t_{\alpha/2}\,S_{\bar{X}}$, centered at $\bar{X}_I$; the position of $\mu$ is marked “?”, never known]
• If the samples $X_i$ are IID:
  $\sigma_{\bar{X}}^2 = \sigma_X^2 / I$ and $S_{\bar{X}}^2 = S_X^2 / I$, so $\sigma_{\bar{X}} = \sigma_X / \sqrt{I}$ and $S_{\bar{X}} = S_X / \sqrt{I}$

Hypothesis Testing
• Establish the Null Hypothesis H0
  – Basis for comparison (statistical inference)
  – No significant change is present
  – Simulation: base model (baseline), “as is”
• Alternate Hypothesis H1 (or Ha)
  – Changes to the base model (deviation from the base model; can be one-sided or two-sided)
• Experiment
  – A systematic approach that uses test statistics to determine statistically whether H1 should be accepted or rejected
  – H0 is the status quo, so the burden of proof is on H1: “Innocent until proven guilty”

Hypothesis Testing
• Ex:
  H0: the average waiting times under rule A and rule B are the same
  H1: the average waiting time under rule A is less than that under rule B: a one-sided test (greater: one-sided; not the same: two-sided)
  – A two-scenario case
    • Two alternatives: pairwise comparison
  – More than two alternatives
    • A vs. B, B vs. C, C vs.
A - Analysis of Variance (ANOVA)

Two Types of Errors
The true situation may be:
                          H0 is True                           H0 is False
Accept H0 (Reject H1)     Correct Decision                     Incorrect Decision (Type II Error)
Reject H0 (Accept H1)     Incorrect Decision (Type I Error)    Correct Decision
• The probability $\alpha$ ($\beta$) of a Type I error (Type II error)
  – $\alpha$: level of significance of the test
• Ex: A $1 - \alpha$ confidence interval for $\mu$ is
  $\bar{X}_I - t_{\alpha/2}\,S_{\bar{X}} < \mu < \bar{X}_I + t_{\alpha/2}\,S_{\bar{X}}$

Some Statistical Problems in Simulation
• Initial Conditions (IC) & Data Truncation
  – Most simulations start with the system “empty and idle”
  – Need to “warm up” the system so that it reaches a steady state
  – Statistics of system performance are collected only after the warm-up period
  – How to determine the warm-up period: mostly empirically, or use a “long” period before truncating the statistics

Run Length and Number of Replications
• Deciding on the trade-off
• A few long runs
  – Better estimate of the steady-state mean because there is less initial bias
  – But variance may increase due to a reduced sample size
• Many short runs
  – May have bias due to starting conditions
  – But variance may decrease

Run Length and Number of Replications
• How long to run
  – A given time period
    • Convenient, but sample sizes may vary
    • Statistics on observations
  – A given number of entities that enter the system
    • System ends “empty and idle”
    • Statistics on time-persistent variables
  – A given number of entities that depart the system
    • System does not end “empty and idle”
    • Useful especially when routing is complex, e.g., rework
  – Automatic stopping rules
    • Simulation results (statistics collected) are monitored closely (periodically)
    • Stop the simulation once a prescribed criterion (often accuracy) is satisfied
    • An implementation: the batch means method

Number of Replications
• When estimating the variance of an output variable X by the replication method
  – $X \sim N(\mu, \sigma^2)$
  – The number of independent replications required to attain a specified confidence interval for the mean of X is given by
    \[ I = \left( \frac{t_{\alpha/2,\,I-1}\, S_X}{g} \right)^2 \]
    where g is the half-width of
the desired CI
      g: how accurate
      $\alpha$: how confident
      I: variable
  – Implementation of the formula is iterative, because I must first be assumed (a few runs, say 5 or 8) to obtain initial values of t and $S_X$
  – Then test the sufficiency of the initial assumption and determine the additional number of replications

Number of Replications
• Practical use
  1. Select (arbitrarily) a few runs: the initial I
  2. Compute $S_X$
  3. If $\left( t_{\alpha/2,\,I-1}\, S_X / g \right)^2 > I$, make additional runs and go to step 2 with an updated I; otherwise stop
• Two key concepts
  – Confidence interval: the range g
  –
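
The three-step "practical use" procedure for choosing the number of replications can be sketched in Python as follows. This is a minimal illustration rather than the slides' own implementation. The names `run_once`, `t_crit`, and `replicate_until_precise` are hypothetical. The confidence level is fixed at 95% (α = 0.05), t critical values come from a hard-coded table for up to 30 degrees of freedom, and a normal approximation is assumed beyond that.

```python
import math
import random
import statistics

# Two-sided 95% critical values t_{0.025, df} for df = 1..30, copied from a
# standard t-table. Hard-coded (an assumption of this sketch) so that only the
# standard library is needed.
T_TABLE_95 = [
    12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
    2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
    2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042,
]


def t_crit(df):
    """Return t_{alpha/2, df} for alpha = 0.05 (normal approximation if df > 30)."""
    if df <= len(T_TABLE_95):
        return T_TABLE_95[df - 1]
    return statistics.NormalDist().inv_cdf(0.975)


def required_replications(samples, g):
    """Evaluate (t_{alpha/2, I-1} * S_X / g)^2 for the replications run so far."""
    s_x = statistics.stdev(samples)  # sample standard deviation S_X
    return (t_crit(len(samples) - 1) * s_x / g) ** 2


def replicate_until_precise(run_once, g, initial_runs=5, max_runs=10_000):
    """Add replications until the CI half-width is at most g (or max_runs is hit)."""
    samples = [run_once() for _ in range(initial_runs)]   # step 1: initial I
    while len(samples) < max_runs:
        needed = required_replications(samples, g)        # step 2: uses S_X
        if needed <= len(samples):                        # step 3: enough runs
            break
        samples.append(run_once())                        # otherwise add a run
    i = len(samples)
    mean = statistics.fmean(samples)
    half_width = t_crit(i - 1) * statistics.stdev(samples) / math.sqrt(i)
    return mean, half_width, i


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical stand-in for one simulation replication.
    mean, half_width, runs = replicate_until_precise(
        lambda: rng.gauss(10.0, 2.0), g=0.5
    )
    print(f"mean={mean:.3f}, half-width={half_width:.3f}, replications={runs}")
```

The stopping test `needed <= I` is the same condition as "half-width at most g", because $(t S_X / g)^2 \le I$ rearranges to $t S_X / \sqrt{I} \le g$. Adding one run at a time keeps the sketch simple; adding runs in larger batches would follow the same logic.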

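The two columns of the "Formulas for Sample Mean and Sample Variance" slide can also be illustrated with a short Python sketch. It assumes that the time-persistent variable x(t) is piecewise constant (as a queue length is) and is recorded as a list of (time, new value) change points starting at time 0. The function names and the sample data are hypothetical.

```python
def observation_stats(xs):
    """Sample mean and variance of per-entity observations x_1..x_I."""
    n = len(xs)
    mean = sum(xs) / n
    # (sum of x_i^2 - I * mean^2) / (I - 1)
    var = (sum(x * x for x in xs) - n * mean * mean) / (n - 1)
    return mean, var


def time_persistent_stats(changes, t_end):
    """Time-weighted mean and variance of a piecewise-constant x(t) on [0, t_end].

    changes: list of (time, new_value) pairs sorted by time, first at time 0.
    """
    area = 0.0     # integral of x(t) dt
    area_sq = 0.0  # integral of x(t)^2 dt
    next_times = [t for t, _ in changes[1:]] + [t_end]
    for (t, x), t_next in zip(changes, next_times):
        dt = t_next - t          # how long x(t) held the value x
        area += x * dt
        area_sq += x * x * dt
    mean = area / t_end
    var = area_sq / t_end - mean * mean
    return mean, var


if __name__ == "__main__":
    # Waiting times of three customers (one observation per entity).
    print(observation_stats([2.0, 4.0, 6.0]))
    # Number in queue: 0 from t=0, 1 from t=2, 2 from t=5, 1 from t=8; T = 10.
    print(time_persistent_stats([(0, 0), (2, 1), (5, 2), (8, 1)], 10.0))
```

For the sample data, the waiting times give a mean of 4 and a variance of 4, while the queue-length trajectory gives a time-weighted mean of 1.1 and a variance of about 0.49. Each value is weighted by how long it persisted, not by how many times it was observed.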
DOCUMENT INFO

posted: | 8/31/2012 |

language: | English |

pages: | 22 |