EPS Output: A Forecaster's Approach
EPS Training, Edmonton, 8/30/2010

Outline

- EPS problems for the meteorologist
- A simple conceptual model
- Re-phrasing what we did yesterday
- Ensemble examples
- Uncertainty
- Clustering using Principal Component Analysis

Where does the MT fit?

Project Phoenix has demonstrated that by focusing on meteorology, and not on models, in the first 18 to 24 hours, it is very easy to show huge improvements over first-guess SCRIBE forecasts. The impact on day 2 is uneven. How do we determine the point where the forecaster's analysis and diagnosis no longer add value?

Find the ensemble of the day

We already have trouble marrying reality and model output from a handful of models after that initial time. What do we do when confronted with output from 10, 20, or 100 ensemble members? Kain et al. (2002) showed that forecasters may not have a lot of skill at determining the "model of the day". If this is true, how does the forecaster decide which of potentially dozens of ensemble members to select?

The Information Bottleneck

Front end: vast amounts of output that must be disseminated, visualized, and analyzed. Back end: once WE know what's going on, how do we express that to the public (WeatherOffice, public forecasts, SCRIBE)?

Do Users Want Determinism?

We assume that users want uncertainties spelled out in the forecast. What if all they want is to know whether it's going to rain tomorrow? Can I go to the beach?

A New Tool

When you get a new tool, the first place you go is the owner's manual. There isn't one for EPS. We need to write one, and that means that the meteorologists MUST get involved.
This is not just a Services issue. We are users of these outputs, and just as public clients are consulted, so should the meteorologists be.

A Thought Experiment

Take a bag and put in ten pieces of paper, numbered 1 through 10. Ask ten people to draw a piece of paper from the bag, but before they do so, ask them what number they think they'll draw. If 5 out of the 10 say that they think the number will be 3, does that mean that there's a 50% chance that the number drawn will be 3?

Model Space vs. Real Space

[Figure: Venn diagram of Real Space, R, overlapping Model Space, M.] The forecaster's role: evaluate the overlap R ∩ M, then take the necessary steps to maximize that area. Reliability is desired, but it cannot be assumed; links between the two spaces have to be forged, through statistical post-processing and through analysis and diagnosis. Post-processing is based on past performance, and past performance does not necessarily extend to the current situation.

An Example: A Joke

Did you hear the joke about the lost Swiss mountaineers? Completely confused, they reach the top of a peak, and one of them takes out his map and compass and triangulates on three nearby peaks. One of his partners anxiously asks him, "Do you know where we are?" "Yes," says the triangulator. "See that mountain over there? We're right on top of it." If the model and reality disagree, it might be a good idea to go with reality.

The whole basis for creating an EPS in the first place is the notion that when you perturb the model's initial conditions, play with its physics and parameterizations, and alter boundary conditions, if there are any, you get different solutions from the model. In deterministic modeling there are no other solutions: you get one to work with.
The distribution of the solutions is a delta function.

The Solution PDF

[Figure: a hypothetical solution PDF, with several candidate points labelled "Our Solution?" and a "Better solution" near the peak.] In reality, there are an infinite number of solutions that fall into some unknown distribution. We don't know its modality, its height and width, or whether it's skewed. This distribution changes from model run to model run and at each step down the timeline. We don't know where our one deterministic solution fits within this distribution. We assume that it's in a favourable part of the distribution, but that need not be the case. There is no reason that reality must appear within this distribution. We hope that it will, because our models are pretty good, but it doesn't have to!

Sampling the Underlying PDF

That's what we're attempting to do with an EPS: sample the underlying distribution. If we can capture the nuances of the underlying distribution by generating multiple solutions, we can make some statements about probabilities and uncertainties. Only about the solutions, though. We can say nothing about reality!

Some Statistics

Consider a random sample taken from an unknown distribution. It turns out that the maximum likelihood estimator for the mean is the sample's mean. The sample of the underlying PDF represented by the ensemble is not random, yet research has shown that, over time, the ensemble mean is the better solution. The maximum likelihood estimator for the variance is proportional, and very nearly equal, to the sample variance, though it tends to underestimate the true variance. The ensemble spread tends to be under-dispersive, behaviour we expect from the sample variance.

Ensemble Pathways (Modes)

Think of ensemble solutions as pathways down the timeline. When all the solutions are tightly packed (i.e. they have a low variance), we can say that the ensemble is favouring a single pathway; the individual members are moving down the same path, but some move down the centre of the path, some down the right side, some down the left, and some meander along it. If all the ensemble members follow the same path, we can say that there is a 100% probability that the model solution is following that path.

The Fork in the Road

What happens when the paths branch? What if 9 members of a 10-member ensemble go down the right-hand path and only 1 goes down the left? There's a 90% chance of the model solution going down the right path, and a 10% chance of it going down the left. The trap waiting for the forecaster is that he may well take the most simplistic option, blindly following the right path because more of the members are taking it, when in fact the outlier on the left path might be the most interesting, simply because of its extreme nature.

The River Delta

Now imagine the case where each member follows a different path, like a river delta. Each member, no matter how extreme, has an equal chance of being the correct one. This is the rub for the forecaster. On any given day, each ensemble member has the same probability of occurring as the others; they are all based on the same rules of physics. It is only by looking at their output in terms of pathways that we can realistically talk about probabilities.
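The arithmetic of the fork in the road, together with the ensemble mean and spread, can be sketched in a few lines. The member values and path labels below are invented for illustration; in practice the path assignment would come from a clustering step.

```python
import numpy as np

# Ten invented scalar member forecasts (e.g. a QPF amount) and the path
# each member follows; the labels are assumed, not from a real ensemble.
members = np.array([12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 4.2])
paths = np.array(["right"] * 9 + ["left"])

# Ensemble mean and sample spread (the spread tends to be under-dispersive).
mean, spread = members.mean(), members.std(ddof=1)

# Model-space probability of each pathway = fraction of members on it.
labels, counts = np.unique(paths, return_counts=True)
probs = dict(zip(labels, counts / len(paths)))
print(probs)  # the 9-to-1 fork: 90% right-hand path, 10% left
```

Note that, as the text stresses, these are probabilities about the model solutions only, not about reality.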
Hurricane Katrina (from Biswas et al., 2006)

Katrina was the costliest, and one of the five deadliest, hurricanes. It made first landfall near the border of Miami-Dade and Broward counties, and final landfall near the Louisiana/Mississippi border, causing around 1400 fatalities.

Usefulness of the Ensemble Spread

If you watch charts of the ensemble spread, a pattern emerges: a lot of the spread occurs in areas where we know that models will have difficulty, such as strong gradients and rapidly moving systems. Essentially, any area with strong spatial or temporal gradients.

Uncertainty

Without the assumption of reliability, uncertainty is really the degree of agreement, or the lack thereof, among the various ensemble members. From the pathway point of view, the more pathways that exist through model space, the more we are unsure of what the model is really telling us. Uncertainty is then measured by the pathway spread and the probability that the pathway will be well travelled.

10-Member Ensemble Pathways and Their Uncertainty

- All ten members following 1 path. The ensemble spread gives information about the width of the path. Uncertainty: low.
- 2 paths, with 9 members following one path and an outlier on the second. The pathway spread is becoming important. Uncertainty: still low. Most of the time the outlier will be just an outlier, but we have to check to make sure. If the spread is small, not much of a problem; as it increases, so does our uncertainty.
- 2 paths, with 5 members going down each. Uncertainty: moderate. How do we evaluate the two? Our uncertainty grows, especially if the pathway spread is large.
- 10 paths, with one member going down each. Uncertainty: high. All bets are off. Uncertainty is maximized if the pathway spread is large.
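The qualitative pathway-to-uncertainty mapping above can be sketched as a toy classifier. The thresholds and category names are my own paraphrase of the table, not an operational rule:

```python
from collections import Counter

def pathway_uncertainty(path_labels):
    """Map a member-to-path assignment to a qualitative uncertainty level,
    paraphrasing the 10-member table (thresholds are illustrative)."""
    counts = Counter(path_labels)
    n_paths = len(counts)
    top_share = max(counts.values()) / len(path_labels)
    if n_paths == 1:
        return "low"        # all members on one path
    if top_share >= 0.9:
        return "still low"  # a lone outlier; check it anyway
    if n_paths == 2:
        return "moderate"   # e.g. a 5/5 split between two paths
    return "high"           # river delta: all bets are off

print(pathway_uncertainty(["a"] * 10))             # low
print(pathway_uncertainty(["a"] * 9 + ["b"]))      # still low
print(pathway_uncertainty(["a"] * 5 + ["b"] * 5))  # moderate
print(pathway_uncertainty(list("abcdefghij")))     # high
```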
In this case, the ensemble mean and the pathway mean are the same.

Where Do We Add Value?

Short-term: the forecaster dominates, through on-going analysis and diagnosis; the ensembles can play a role by identifying alternate pathways for the meteorologist to explore, or by supporting the analysis and diagnosis he has done.

Medium-term: application of analysis and diagnosis is becoming limited; statistically post-processed forecasts would be driven off the ensemble mean. Higher-probability pathways would be favoured, but there would still be opportunities for the meteorologist to explore lower-probability outliers and intervene when necessary.

Long-term: the ensembles dominate, with very limited intervention by the meteorologist except to quantify uncertainty.

Managing the Data Stream

SPC meteorologists have a tremendous workload. In PNR, we forecast for 52% of the country, an area that gets more severe weather than almost all the other regions combined. We start with the worst SCRIBE forecasts in the country, and we do it with two people sliding, one in Winnipeg and the other in Edmonton. How can we successfully integrate EPS output into the SPC, given its high maintenance, when workloads are already so high?

Reducing Dimensionality

There are many statistical methods for accomplishing this:

- Cluster analysis
- Tubing
- Bayesian techniques
- Factor analysis
- Principal component analysis

While they use different approaches, they all attempt to identify statistically significant pathways, or modes.

Principal Component Analysis

Definition: a procedure for transforming a set of correlated variables into a new set of uncorrelated variables. This transformation is a rotation of the original axes to new orientations that are orthogonal to each other. [Figure: a scatter plot; the blue lines are the two principal components.]
Note that they are orthogonal to each other.

How Do We Calculate Them?

To find the principal components in any dataset, you need to find the eigenvalues and eigenvectors of its covariance or correlation matrix. The eigenvectors and their individual factor loadings define how to transform the data from x, y to the new coordinate system.

Eigenvalues and Eigenvectors

Consider the square matrix A. We say that λ is an eigenvalue of A if there exists a non-zero vector x such that Ax = λx. In this case, x is called an eigenvector (corresponding to λ), and the pair (λ, x) is called an eigenpair for A.

What Kind of Matrix?

The matrix we use for calculating the eigenvalues and eigenvectors can be a number of different things: a matrix of correlation coefficients, or a matrix of covariances. I construct a covariance matrix, which gives a measure of how interrelated the members are. The matrix is real and symmetric: element (1,2) is equal to element (2,1), and so forth. The diagonal elements are the variances of the individual members, and the size of the matrix is the number of ensemble members.

Variance and Covariance

The variance is really a special case of the covariance: it is the covariance of a variable with itself.

Once the Eigenvalues and Eigenvectors Are Calculated

The eigenvectors and their individual factor loadings define how to transform the data from x, y to the new coordinate system. We rank the eigenvectors in order of decreasing eigenvalue: the eigenvector with the highest eigenvalue gives the first principal component, the next highest gives us the second PC, and so on. The eigenvalues are also the variances of the observations along each of the new coordinate axes.
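The recipe above (build the member covariance matrix, eigendecompose it, rank by decreasing eigenvalue) can be sketched with NumPy. The random "fields" here merely stand in for flattened model output, one column per member:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in ensemble: 5 members, each a flattened field of 200 grid points.
# With one column per member, the covariance matrix is 5 x 5: real,
# symmetric, with each member's variance on the diagonal.
fields = rng.standard_normal((200, 5))

C = np.cov(fields, rowvar=False)
assert np.allclose(C, C.T)  # element (1,2) equals element (2,1), etc.

# eigh handles real symmetric matrices; it returns eigenvalues ascending,
# so flip to rank the eigenvectors by decreasing eigenvalue.
evals, evecs = np.linalg.eigh(C)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# The defining relation A x = lambda x holds for the leading eigenpair.
assert np.allclose(C @ evecs[:, 0], evals[0] * evecs[:, 0])

# Projecting the centred data onto the eigenvectors rotates it into the
# new coordinate system; the variances along the new axes are the eigenvalues.
scores = (fields - fields.mean(axis=0)) @ evecs
print(np.var(scores, axis=0, ddof=1))  # matches evals
```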
What We End Up With

We've extracted a set of principal components from our ensemble output. These are orthogonal and are ordered according to the proportion of the variance of the original data that each explains. The goal is to reduce the dimensionality of the problem by retaining a (small) subset of the factors. The remaining factors are considered either irrelevant or nonexistent (i.e. they are assumed to reflect measurement error or noise).

PC Retention

Choosing the number of PCs to retain is a non-trivial exercise, and there is no single method that is entirely successful. Retaining too few PCs results in under-factoring and a loss of signal. Retain too many and noise creeps back in (under-filtering), and you also increase computation times. Keeping in mind that the simplest approach is often the best, I use the Kaiser/Guttman criterion. A normalized eigenvalue lies between 0 and n (the number of members in the ensemble). Since we cannot reduce the dimensionality of the problem to anything less than 1, we use this as the criterion: we retain only those PCs with eigenvalues > 1. Each PC can be thought of as a pathway through model space. The amount of variance explained by each component gives us a measure of how well travelled the path is. It also provides a measure of when we need to move from a deterministic framework to a probabilistic one.

PCA Concerns

PCA explores the linear relationships in the data; non-linear factors are not considered. This shouldn't be a problem, since we're running the algorithm on specific fields (i.e. we're looking at MSL pressures, 500 mb heights, QPFs). There might be a concern if we were comparing 500 mb heights and QPFs (and you can do that with PCA techniques). Sometimes higher-order components are difficult to interpret physically (how do you interpret a negative QPF, for example?).
Since noise is shunted into the higher PCs, each successive component will be more and more noisy.

Varimax Rotation

One lingering problem is that it becomes increasingly difficult to put successive PCs into physical terms. How do you interpret a QPF value that might end up being negative after a coordinate rotation? Our principal components do not exist in real space but in component space, and we need to describe what we see there in physical terms. The solution is to perform yet one more coordinate rotation, this one intended to maximize the variance of the loadings on each PC: the so-called varimax rotation, developed by Kaiser in 1958. The goal is to obtain a clear pattern of factor loadings, characterized by high loadings on some factors and low loadings on others.

Unrotated and Rotated Factor Loadings

Unrotated:

Variable   Factor 1   Factor 2
WORK_1     0.654384   0.564143
WORK_2     0.715256   0.541444
WORK_3     0.741688   0.508212
HOME_1     0.634120  -0.563123
HOME_2     0.706267  -0.572658
HOME_3     0.707446  -0.525602
Expl.Var   2.891313   1.791
Prp.Totl   0.481885   0.2985

Rotated:

Variable   Factor 1   Factor 2
WORK_1     0.862443   0.051643
WORK_2     0.890267   0.110351
WORK_3     0.886055   0.152603
HOME_1     0.062145   0.845786
HOME_2     0.107230   0.902913
HOME_3     0.140876   0.869995
Expl.Var   2.356684   2.325629
Prp.Totl   0.392781   0.387605

For the unrotated case, the factor loadings on the first PC are all approximately the same, while on the second you have a mixture of positive and negative values. After rotation, some loadings are much closer to zero in one PC and maximized in the other, and vice versa; all are now positive. Since the individual factor loadings are now different, so are the eigenvalues, and they are much closer together. Because varimax is an orthogonal rotation, the rotated factors remain uncorrelated, but they no longer successively maximize the explained variance; in exchange, we can interpret them.
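A common SVD-based formulation of Kaiser's varimax criterion can be sketched as follows and applied to the unrotated loadings in the table above. This is a generic implementation, not necessarily the routine that produced the tabled values, so the rotated numbers may differ slightly (and columns may come out reordered or sign-flipped):

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a p x k factor-loading matrix
    (SVD formulation of Kaiser's 1958 criterion)."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L**3 - (gamma / p) * L @ np.diag((L**2).sum(axis=0))))
        R = u @ vt
        d_new = s.sum()
        if d_new < d * (1.0 + tol):  # criterion has stopped improving
            break
        d = d_new
    return loadings @ R

# Unrotated WORK/HOME loadings from the table above.
unrotated = np.array([[0.654384,  0.564143],
                      [0.715256,  0.541444],
                      [0.741688,  0.508212],
                      [0.634120, -0.563123],
                      [0.706267, -0.572658],
                      [0.707446, -0.525602]])

rotated = varimax(unrotated)
# Up to column order and sign, each variable should now load strongly on
# one factor and weakly on the other, as in the rotated table.
print(np.round(np.abs(rotated), 3))
```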
Cool Facts About PCA Ensembles

The principal components map out the relevant ensemble pathways through model space. If there is only one PC (i.e. one pathway), that PC is little different from the ensemble mean. This is good behaviour, since we know that the ensemble mean does produce forecasts that are less wrong; we don't want solutions that show that the ensemble mean has no merit. Differences between the two are likely due to noise: the mean contains it, the PC has it stripped out. Situations where there is more than one PC have multiple pathways, and the ensemble mean should not even be considered. Careful here: too few members may lead to a single PC when more members might produce more PCs. The variance explained by each PC gives a measure of how "well travelled" the pathway is. The PCA analysis should tell the forecaster immediately when to, and when not to, use tools like the mean.
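These claims can be illustrated on synthetic ensembles with one and two pathways; the construction (signal plus noise) and the member counts are illustrative assumptions. Kaiser/Guttman retention on the correlation matrix counts the pathways, and in the single-pathway case the leading PC reproduces the ensemble mean almost exactly:

```python
import numpy as np

rng = np.random.default_rng(3)

def kaiser_retained(members):
    """Kaiser/Guttman retention: keep eigenvalues of the member correlation
    matrix that exceed 1 (normalized eigenvalues sum to n members)."""
    evals = np.sort(np.linalg.eigvalsh(np.corrcoef(members, rowvar=False)))[::-1]
    return evals[evals > 1.0]

npts = 400
signal_a, signal_b = rng.standard_normal((2, npts))

# One-pathway ensemble: 10 members = one common signal plus noise.
one_path = np.stack([signal_a + 0.2 * rng.standard_normal(npts)
                     for _ in range(10)], axis=1)
# Two-pathway ensemble: 6 members on one signal, 4 on another.
two_path = np.stack([signal_a + 0.2 * rng.standard_normal(npts) for _ in range(6)]
                    + [signal_b + 0.2 * rng.standard_normal(npts) for _ in range(4)],
                    axis=1)

print(len(kaiser_retained(one_path)))  # 1 PC: a single pathway
print(len(kaiser_retained(two_path)))  # 2 PCs: don't lean on the mean

# With one pathway, the leading eigenvector weights all members almost
# equally, so the PC1-weighted combination is nearly the ensemble mean,
# with the noise stripped out.
evals, evecs = np.linalg.eigh(np.cov(one_path, rowvar=False))
w = evecs[:, -1]                         # leading eigenvector (member weights)
pc1_field = one_path @ (w / w.sum())     # normalize weights to sum to 1
print(np.corrcoef(pc1_field, one_path.mean(axis=1))[0, 1])  # close to 1
```

The two retained eigenvalues in the second case also report how well travelled each pathway is, via the variance each explains.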
