Statistics and Data

General discussion

The scope is not just advanced statistics such as Monte Carlo design; it also covers the use of basic core statistics, e.g. the experimental design of perturbed-physics experiments, where even undergraduate statisticians could add value. We need to consider aleatory uncertainty in the hazard itself, and to quantify or qualify epistemic uncertainty. Statistical tools for presenting uncertainty include, for example, integration and its alternatives, probabilistic tools for aleatory uncertainty, and sensitivity analysis for epistemic uncertainty.

Different natural-hazards communities have differing levels of statistical sophistication. This is due partly to insufficient communication and collaboration with statisticians, but also to the need to reframe questions (accounting for different lead times, different ways of accounting for social aspects, etc.); we hope to address this with the taxonomy. Statisticians do not immerse themselves enough in the applications of statistics. Can we transfer probabilistic methods from flood risk management to other areas? Landslides and wildfires, for example, generate a lot of data but do not use or analyse much of it.

The remit is to fill gaps in each hazard. We need to look at:
1. Training needs.
2. The 'call' view (i.e. NERC may use parts of the study as a call).
3. Ways to remove bottlenecks to transferring good practice from one area to another.

We could look at providing widely applicable toolsets. Toolsets do not have to address the whole hazard problem all at once; they can address just parts of it. Statisticians can have insight into the transferability of these.

There is a need to look at how statisticians relate to the insurance/reinsurance/economic sectors (reinsurance is particularly relevant). Insurance is an example of an area where the regulator has demanded quantification, whereas historically people in the industry had a lot of experience but possibly not much in the way of quantitative skills. Organising insurance markets is an important part of mitigation, and the system is not working well; it would be good to come up with innovative market mechanisms. NERC is currently funding a scoping study in reinsurance. Alex McNeil and Paul Embrechts wrote a book and are very active in this area, but they do not seem to have much data (the Danish fire-insurance data). Open-source catastrophe ('cat') curve models: Florida has started one; is this the same as AGORA?

There is rarely data on the social and psychological consequences/losses arising from natural hazards, since family break-ups etc. caused by natural hazards can arise even years after the event.

Good Practice Examples

For data access/preservation, the British Atmospheric Data Centre (BADC), http://badc.nerc.ac.uk/home/ (and NERC's other data centres, http://www.nerc.ac.uk/research/sites/data/). Are there equivalents in natural hazards?
1. British Geological Survey, http://www.bgs.ac.uk/data/databases.html
2. Philip Brohan, mining meteorological data from ship logbooks, http://ams.allenpress.com/archive/1520-0477/90/2/pdf/i1520-0477-90-2-219.pdf
3. SHELDUS, http://webra.cas.sc.edu/hvri/products/sheldus.aspx, a natural hazards events and losses database for the US. However, it is not best practice as it is not accurate, e.g. Hurricane Katrina is recorded with zero casualties, and there are other examples of negative casualty counts.
4. FLoAT, http://badc.nerc.ac.uk/data/free/float.html, intended to collate a variety of data collected during the June and July 2007 flood events in the UK.
5. Roger Pielke Jr and team would go and look at the implications of disasters, e.g. effects on infrastructure after disasters, http://sciencepolicy.colorado.edu/about_us/meet_us/roger_pielke/

Environmental Mathematics and Statistics, http://www.nerc.ac.uk/research/programmes/mathsstats/, is an existing/past joint programme between NERC and EPSRC. The programme supported 6 discipline-bridging awards, 5 fellowships, 35 studentships and 10 workshops/courses. It should be repeated (perhaps with more interdisciplinary post-docs).

NB. It was noted that "best practice" is diverse for different problems; better to say something like "best practice alternatives"?

Recommendations to NERC

Recommend that NERC fund statistics itself rather than just building bridges with EPSRC, especially where the statistics is specifically relevant to natural hazards but does not have wide applications elsewhere. A comparison was drawn with engineering, where such gaps (in statistics?) are funded; it was also pointed out that MRC funds statistics. This would help with writing research proposals, i.e. statistics proposals could be submitted to NERC, with NERC scientists, rather than to EPSRC.

Areas of statistics specifically relevant to natural hazards that NERC should fund research into include:

1. Spatio-temporal modelling of extremes. There is a lot of UK expertise in this area. The approach would be "tool making" with a hazard application: maths embedded in the environmental research area. (A marginal, single-site sketch appears after this list.)
2. Computer experiments and experimental design (much of the existing work is medical).
3. Structural error: better ways of quantifying structural uncertainty (a model structural error toolset would be useful).
4. How to observe systems you cannot control (e.g. hydrology: flow gauges cannot be used everywhere). A Royal Statistical Society paper on spatial modelling and measurements was mentioned; not sure of the reference (from Paul Bates / Jim Hall?). Two papers are forthcoming in J. Roy. Stat. Soc. C: C. Keef, J. Tawn and C. Svensson, "Spatial risk assessment for extreme river flows"; E. C. Tassone, M. L. Miranda and A. E. Gelfand, "Disaggregated spatial modelling for areal unit categorical data".
5. Tail dependence (see the sketch after this list).
a. Multivariate threshold exceedance.
b. The normal distribution has zero tail dependence; for other distributions it is significant.
c. Links to Wall Street?
6. Micro-correlations and their detection (see the sketch after this list).
a. In aggregate variables (e.g. sums) correlations grow quickly, but the correlation is not seen in the individual variables.
b. E.g. climate change may introduce correlations between widely disparate things, such as through global sea-level rise everywhere.
c. Relevant to changing boundary conditions.
7. Semi-categorical data and quantifying the uncertainty therein (see the sketch after this list).
a. Data that are not strictly continuous.
b. A result of people on the ground taking lots of measurements, e.g. wildfires, landslides.
c. People tend to discretise small measurements more than large measurements.
d. Difficult to analyse; statistical techniques are needed to assess the uncertainty. Analyses like maximum likelihood start failing with this type of data, but the effect can be difficult to spot, especially for users.
8. Sensitivity analysis that is not model-intensive / works with complex models (see the sketch after this list).
a. Uncertainty in parameters, forcing functions, initial conditions.
b. Does the uncertainty make a difference to decision choices?
c. How to do this for complex, slow models? Complex models have many inputs and outputs and are also computationally expensive.
d. Might want to be more structured than simply sampling the uncertainty, i.e. perturb values efficiently to estimate first- and second-order effects.
e. Various ways of doing this, cheap and expensive, up to joint uncertainty analysis.
f. The GRC group in Italy has done a lot of work on this, but it is mostly model-intensive.
g. Morris method?
h. DOE (design of experiments) methods?
i. Pseudo-random and quasi-random numbers for sampling.
j. Sensitivity analysis can be used for screening and to extract importance measures post hoc, even for several hundred outputs. The nuclear industry has carried out multi-person, multi-year studies.
9. New statistical methods to handle non-stationarity (see the sketch after this list).
a. Hydrological community: there was a Science commentary on this a few months back. People have been choosing to ignore it.
b. Where is the boundary between stationarity and non-stationarity, i.e. what degree of weak stationarity? Information about the degree of stationarity comes from observations and models. In hydrology, a single parameterisation and observational data are generally taken, but other sources of uncertainty are not represented.
c. Recommendations? Statistics may not be able to help much with this, but no one else can either. "New statistical methods for non-stationarity"? "Tests for non-stationarity"? E.g. tests for structural breaks. Depends on scale.
d. Not just a problem in hydromet hazards but also in landslides, wildfires and earthquakes.
e. It is hard to find statistics literature on basic assessment of stationarity, correlations and clustering. It is particularly difficult when there are many zeroes between data points: is the process unequally sampled, or unequally occurring? What is stationary if your second moment does not exist? Powerful methods exist, e.g. multi-resolution wavelet methods (correlation across different frequencies, generalised to non-stationarity), but they are purely descriptive. There is a large body of literature in statistical physics on this, but it is rather sloppy and could use good input on statistical and physical processes.
f. Models are supposed to deal with non-stationarity. Trends could possibly be subtracted using physical models (mechanical filtering), but not if the hazard cannot be modelled physically, e.g. earthquakes. Useful to apply to the footprint, though?
10. How to estimate probabilities from ensembles?
a. People seem to be comfortable integrating over uncertainty in parameters (using perturbed-physics ensembles, PPEs).
b. But for the IPCC multi-model ensemble there are two camps: whether to unify the models into a representative result or leave them as separate results. How much do we trust policy makers to understand? Will they just unify the models themselves? An incredibly difficult area to handle.
c. Probabilities from PPEs: should these be used for forecasting? A PPE should be a calibration exercise, in which information from observations outweighs the initial choices. It is essential to account for structural error, otherwise the calibration is meaningless/misused.
d. Is there really a distinction between structural and parameter uncertainty? One can be transformed into the other. It is a useful device for users (essentially code alteration vs. input alteration), but perhaps we should not stick to this distinction doggedly.

NB. The toxicology community uses "quantitation" instead of "quantification"! We need to be careful about terminology and meaning.
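For area 1, a minimal sketch of the marginal (single-site) building block only, assuming Python with scipy as the working language: fitting a GEV distribution to synthetic annual maxima and reading off a 100-year return level. The data and parameter values are invented for illustration; genuine spatio-temporal modelling of extremes would add dependence across sites and through time on top of this.

```python
# Sketch: a marginal (single-site) starting point for extreme-value work:
# fit a GEV to annual maxima and read off a 100-year return level.
# The data are synthetic; real spatio-temporal work would add dependence in space and time.
from scipy.stats import genextreme

# Synthetic "annual maximum flow" series, 60 years (parameters invented for illustration)
annual_max = genextreme.rvs(c=-0.1, loc=100.0, scale=20.0, size=60, random_state=11)

c_hat, loc_hat, scale_hat = genextreme.fit(annual_max)
return_level_100 = genextreme.ppf(1.0 - 1.0 / 100.0, c_hat, loc=loc_hat, scale=scale_hat)

print(f"fitted shape (scipy convention c = -xi): {c_hat:.2f}")
print(f"fitted location / scale: {loc_hat:.1f} / {scale_hat:.1f}")
print(f"estimated 100-year return level: {return_level_100:.1f}")
```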
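For area 5, a minimal sketch of what tail dependence looks like in practice, assuming Python with numpy: a bivariate Gaussian sample and a bivariate Student-t sample with the same correlation are compared through the empirical conditional exceedance probability P(V > q | U > q) at high quantiles. All parameter choices (correlation, degrees of freedom, quantiles, sample size) are illustrative.

```python
# Sketch: the Gaussian copula has zero asymptotic upper tail dependence,
# whereas a t copula with the same correlation does not.
# All parameter choices (rho, df, q, n) are illustrative.
import numpy as np

rng = np.random.default_rng(42)
n, rho, df = 200_000, 0.7, 3.0
cov = np.array([[1.0, rho], [rho, 1.0]])

# Bivariate Gaussian sample
z_gauss = rng.multivariate_normal([0.0, 0.0], cov, size=n)

# Bivariate Student-t sample: normal scaled by sqrt(df / chi-square)
chi2 = rng.chisquare(df, size=n)
z_t = rng.multivariate_normal([0.0, 0.0], cov, size=n) * np.sqrt(df / chi2)[:, None]

def upper_tail_dep(x, q):
    """Empirical estimate of P(V > q-quantile | U > q-quantile) using ranks."""
    u = np.argsort(np.argsort(x[:, 0])) / len(x)   # pseudo-observations (ranks)
    v = np.argsort(np.argsort(x[:, 1])) / len(x)
    above_u = u > q
    return np.mean(v[above_u] > q)

# The Gaussian value keeps falling as q increases; the Student-t value levels off above zero.
for q in (0.99, 0.999):
    print(f"q={q}: Gaussian {upper_tail_dep(z_gauss, q):.2f}, Student-t {upper_tail_dep(z_t, q):.2f}")
```

This is the contrast noted in 5b: the normal family gives joint extremes that become rarer and rarer relative to marginal extremes, while tail-dependent families do not.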
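For area 6, a small numerical illustration of micro-correlations (all numbers invented): a pairwise correlation of 0.02 is essentially undetectable variable by variable, yet two sums of m = 100 such variables have correlation m*rho / (1 + (m-1)*rho), about 0.67 here.

```python
# Sketch: pairwise correlation 0.02 is invisible variable-by-variable,
# but two sums of 100 such variables are strongly correlated.
import numpy as np

rng = np.random.default_rng(0)
m, rho, n_samples = 100, 0.02, 50_000

# 2m standard-normal variables, every pair correlated at rho
k = 2 * m
cov = np.full((k, k), rho)
np.fill_diagonal(cov, 1.0)
x = rng.multivariate_normal(np.zeros(k), cov, size=n_samples)

s1 = x[:, :m].sum(axis=1)   # aggregate of the first block
s2 = x[:, m:].sum(axis=1)   # aggregate of the second block

theory = m * rho / (1.0 + (m - 1) * rho)          # ~0.67 with these numbers
print("theoretical corr of sums :", round(theory, 3))
print("simulated corr of sums   :", round(np.corrcoef(s1, s2)[0, 1], 3))
print("typical pairwise corr    :", round(np.corrcoef(x[:, 0], x[:, m])[0, 1], 3))
```

This is the mechanism behind 6a: aggregation (sums of losses, regional totals) makes correlations matter that are invisible at the level of the individual variables.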
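For area 7, a sketch of one way maximum likelihood "starts failing" with semi-categorical data, and one way to repair it, using an invented reporting rule: measurements below a threshold are recorded only as "trace" (stored here as zero). Treating those as exact values biases the fit, whereas a likelihood that treats them as the interval [0, d] does not. The distribution, threshold and sample size are all illustrative.

```python
# Sketch: values below a reporting threshold d are recorded only as "trace" (a category);
# the rest are recorded as numbers. A naive fit that treats the data as fully continuous
# is biased; an interval likelihood for the trace values is not.
# The exponential model and the deliberately large threshold are invented for illustration.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
true_mean, d, n = 2.0, 1.5, 5_000
x = rng.exponential(true_mean, size=n)

is_trace = x < d                               # semi-categorical part: only "trace" is recorded
x_reported = np.where(is_trace, 0.0, x)

# Naive MLE of an exponential mean: the sample mean of the reported values
print("naive estimate of mean   :", round(x_reported.mean(), 3))

def neg_loglik(mean):
    lam = 1.0 / mean
    ll_obs = np.sum(np.log(lam) - lam * x_reported[~is_trace])   # exactly reported values
    ll_trace = is_trace.sum() * np.log(1.0 - np.exp(-lam * d))   # P(X < d) for each trace value
    return -(ll_obs + ll_trace)

res = minimize_scalar(neg_loglik, bounds=(0.1, 10.0), method="bounded")
print("interval-likelihood mean :", round(res.x, 3))
print("true mean                :", true_mean)
```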
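For area 8 (and 8g in particular), a simplified Morris-style elementary-effects screening sketch on a cheap toy function standing in for an expensive simulator: each trajectory changes one input at a time, the mean absolute elementary effect mu* ranks inputs, and sigma flags non-linearity and interaction. The toy function, grid, step size and number of trajectories are all invented for illustration.

```python
# Sketch: Morris-style elementary-effects screening for a model with k inputs on [0, 1]^k.
# Each "trajectory" perturbs one input at a time by delta; mu* ranks input importance,
# sigma flags non-linearity / interaction. Everything here is an illustrative toy.
import numpy as np

rng = np.random.default_rng(7)

def toy_model(x):
    # Stand-in for an expensive simulator: x2 matters most, x1 and x4 interact, x3 is noise-level.
    return x[0] + 3.0 * x[1] ** 2 + 0.05 * x[2] + 2.0 * x[0] * x[3]

k, r, delta = 4, 50, 0.25
effects = np.zeros((r, k))

for t in range(r):
    x = rng.integers(0, 4, size=k) / 4.0       # random base point on a coarse grid
    f_base = toy_model(x)
    for i in rng.permutation(k):               # perturb inputs one at a time, in random order
        step = delta if x[i] + delta <= 1.0 else -delta
        x_new = x.copy()
        x_new[i] += step
        f_new = toy_model(x_new)
        effects[t, i] = (f_new - f_base) / step
        x, f_base = x_new, f_new               # continue the trajectory from the perturbed point

mu_star = np.abs(effects).mean(axis=0)         # overall importance
sigma = effects.std(axis=0)                    # non-linearity / interaction signal
for i in range(k):
    print(f"x{i+1}: mu* = {mu_star[i]:.2f}, sigma = {sigma[i]:.2f}")
```

The attraction for 8c/8d is cost: this uses r*(k+1) model runs (here 250), far fewer than a full variance-based analysis would need on a slow model.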
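For area 9c, a sketch of one of the simplest "tests for structural breaks": a CUSUM-type statistic for a change in mean, with a permutation p-value. The series is synthetic with a deliberate mean shift; real hazard records would need care with serial correlation, seasonality and heavy tails, which this toy test ignores.

```python
# Sketch: CUSUM-type test for a single change in mean, p-value by permutation.
# The synthetic series has a deliberate mean shift half-way through.
# Note: permutation assumes exchangeability (no serial correlation); real hazard
# series usually violate this and need block resampling or model-based tests.
import numpy as np

rng = np.random.default_rng(3)
n = 200
y = np.concatenate([rng.normal(0.0, 1.0, n // 2),
                    rng.normal(0.8, 1.0, n // 2)])   # shift of 0.8 sd at mid-series

def cusum_stat(z):
    z = z - z.mean()
    s = np.cumsum(z) / (z.std(ddof=1) * np.sqrt(len(z)))
    return np.max(np.abs(s))

obs = cusum_stat(y)
perm = np.array([cusum_stat(rng.permutation(y)) for _ in range(2000)])
p_value = np.mean(perm >= obs)

print(f"CUSUM statistic: {obs:.2f}")
print(f"permutation p-value for 'no change in mean': {p_value:.4f}")
print("estimated break position (index):", int(np.argmax(np.abs(np.cumsum(y - y.mean())))) + 1)
```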
NERC have started funding many of these areas of statistics (though not specifically in natural hazards) through Knowledge Transfer projects with companies; 20-30 projects were started in early 2009. There have been no KT calls since a few months ago, so we could recommend that these are brought back.

NERC should support data rescue and enhanced access to data through data mining and data repositories. Data are often unavailable, especially commercially sensitive data. It is also very important to preserve the metadata: documentation of context, conditions, limitations and meaning.

Information post-disaster is often not accessible or analysed. We would need the data to be maintained (mistakes corrected where found, etc.), which could be expensive. (Would NERC consider this under their remit? BADC gets core funding.)