The Evolution of Statistics


Visions: The Evolution of Statistics
Edward J. Wegman
Center for Computational Statistics

“Prediction is very hard, especially when it's about the future”
Yogi Berra (Italian-American Philosopher and Baseball Player)

Three Scientific Revolutions of the Twentieth Century
• Quantum Revolution - Unlocking the Secrets of the Atom
• DNA Revolution - Unraveling the Secrets of Life
• Computing Revolution - Extending Human Intellect
• Michio Kaku, CUNY Physicist and Futurist, in his book VISIONS

The Fourth Scientific Revolution of the Twentieth Century
• Statistical Theory and Data Analysis
  – A major impact on the ability of the Earth's societies to care for the billions of people and trillions of dollars that currently inhabit the planet
  – Modern societies could not exist without this 4th scientific revolution

An Unpopular and Perhaps Radical View
• Mathematical statistics as we know it is essentially a completed theory
  – The general principles are well-developed and contemporary statistical theory of traditional parametric and nonparametric methods is largely a solved problem
  – That is not to say there are no valuable niches yet to be developed nor that the application of statistical methodology will not continue for the foreseeable future
  – Mathematical statistics occupies much the same position as Newtonian mechanics. It is a tremendously valuable tool that will continue to be applied in many practical settings and in the framework of which new specialized methods will continue to be developed

Whither Statistics?
• What is the future of statistics and what are the new tools and techniques for statistics?
  – Statistics is essentially a computational science; it is about numbers and counts and measurements and exploring these numbers and counts and measurements in order to make inferences
  – The computing revolution is leading, in my view, to a statistical revolution.
    Perhaps not with statistics as we conventionally understand statistics, but with data analysis and inference
  – There is a reasonable prospect that data analysis and inference will be subsumed under the larger mantle of computer science and that what we understand today as statistics will become a subset of a larger data analysis and inference enterprise

How Data Are Changing

TRADITIONAL STATISTICS                       COMPUTATIONAL STATISTICS
-------------------------------------------  -------------------------------------------
Small to Moderate Sample Size                Large to Very Large Sample Size
I.I.D. Data Sets                             Nonhomogeneous Data Sets
One or Low Dimensional                       High Dimensional
Manually Computable                          Computationally Intensive
Mathematically Tractable                     Numerically Tractable
Well Focused Questions                       Imprecise Questions
Strong Unverifiable Assumptions in           Weak or No Assumptions in
  Relationships (linearity; additivity),       Relationships (nonlinearity);
  in Error Structures (normality)              in Error Structures (distribution free)
Statistical Inference                        Structural Inference
Predominantly Closed Form Algorithms         Iterative Algorithms Possible
Statistical Optimality                       Statistical Robustness

Table 1. Comparison of Traditional and Computational Statistics

How Data Are Changing

DESCRIPTOR   DATA SET SIZE IN BYTES   STORAGE MODE
Tiny         10^2                     Piece of Paper
Small        10^4                     A Few Pieces of Paper
Medium       10^6                     A Floppy Disk
Large        10^8                     Hard Disk
Huge         10^10                    Multiple Hard Disks, e.g. RAID Storage
Massive      10^12                    Robotic Magnetic Tape Storage Silos

Table 2. The Huber Taxonomy of Data Set Sizes

How Data Are Changing
• Consider for example an O(n^2) clustering algorithm applied to a massive data set (n = 10^12). This would require O(10^24) computations, which on a teraflop computer (10^12 computations per second) would require 10^12 seconds, or roughly 3 × 10^4 years. Clearly this is prohibitive.
• Standard ethernet operates at a maximum of 10 megabits per second. That same massive data set would require about 10^6 seconds, or roughly 12 days, to transfer over standard ethernet operating at maximal efficiency.
• The human eye contains approximately 10^7 cones.
  Even with the visualization capability of one observation per cone, our eyes would be hopelessly overloaded. A massive data set would require us to visualize 10^5 observations per cone.

How Data Are Changing
• If gigabyte and larger data sets and O(n^(3/2)) complexity algorithms are problems, then to what extent are these factors appearing in real data?
• The answer is that they appear fairly commonly. Airline booking transactions, point-of-sale commercial purchases, bank transactions, and telephone call records are just a few such commercial databases that one might wish to exploit.
• In the scientific realm, catalogs of celestial objects, data from satellite remote sensing of the Earth, text and multimedia data exploitation from internet usage, ultrasound nondestructive evaluation data, radar data used in air traffic control, and image understanding and exploitation are just a few examples for which data quickly accumulate into the terabyte and higher range.

How Statistics Is Evolving
Statistics, The Guardian of the Scientific Method
or
Statistics: The Tool for Analyzing Data

How Statistics Is Evolving
• The focus of our discipline, I believe, should be on data and inferences to be made from data.
• If the nature of data is changing, then the methods for analyzing and making inferences from those data must correspondingly change.
• Many traditional methods are extremely valuable and will continue to be employed for the foreseeable future.
• However, new data types require new methods and techniques.

How Statistics Is Evolving
• So how will statistics as a methodology and a discipline evolve? I believe many of the traditional dichotomies will become anachronisms.
• The Bayesian versus classical perspective will essentially disappear. Both of these approaches tend to refer to parametric techniques; these are poor at coping with really large-scale data.
• Nonparametric versus parametric techniques still refer to model-based views of data.
  For many purposes models are unnecessary if the data speak in such a compelling fashion.
• If the data are not collected according to probabilistic sampling then both parametric and nonparametric statistical models are essentially irrelevant except as a heuristic tool.

How Statistics Is Evolving
I hope statistics as a discipline will embrace a larger view of the field and will take data, rather than methodology, to be the fundamental common denominator of the discipline. With this view, not only traditional statistics and probability are the focus of the discipline, but also topics like data mining, scientific visualization, image analysis, pattern recognition, databases, and related computational methods become the fundamental features of the discipline.

Data Mining: An Issue for Statisticians
“Despite … somewhat lofty definitions, DM so far has been largely a commercial enterprise. As in most gold rushes of the past, the goal is to ‘mine the miners’. The largest profits are made by selling tools to the miners, rather than doing the actual mining. The concept of DM is used as a device to sell computer hardware and software.”
– Jerry Friedman, 1998

Data Mining: An Issue for Statisticians
Data mining is exploratory data analysis with little or no human interaction using computationally feasible techniques, i.e., the attempt to find interesting structure unknown a priori.
– Wegman, 1997

Data Mining: An Issue for Statisticians
• Traditional Statistical Tools
  – classification and clustering
  – neural networks and genetic algorithms
  – CART
  – nonparametric regression
  – time series: trend and spectral estimation
  – density estimation, including the estimation of bumps and ridges

Data Mining: An Issue for Statisticians
• Other Tools
  – machine learning
  – pattern recognition
  – thinning and binning
  – data visualization
      • scintillation
      • saturation brushing
      • grand tour

Computing, Networks, Distributed Data, Data Access
• Metadata Center (MdC)
  – Automated Creation of Metadata
  – Query and Search
      • Client Browser
      • Expert System for Query Refinement
      • Search Engine
      • Reporting Mechanism

The Evolution of Statistics
This talk was the keynote talk at the conference entitled New Techniques and Technologies in Statistics 98, sponsored by Eurostat, the IASC and the ISI. It was videotaped and a streaming video version is available locally at ftp://www.galaxy.gmu.edu/pub/papers/keynote.asx (keynote.asx is a redirector file and points to the streaming video server in Italy. The Windows Media Player plugin is required.)

The Evolution of Statistics
The paper entitled “Visions: The Evolution of Statistics” will be published in the journal Research in Official Statistics. It is available at ftp://www.galaxy.gmu.edu/pub/papers/visionstheevolutionofstatistics.pdf. This PowerPoint file is available at ftp://www.galaxy.gmu.edu/pub/papers/EvolutionStatistics.ppt
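
The back-of-envelope figures in the How Data Are Changing slides can be checked with a short script. The Python sketch below is not part of the original talk. It assumes one observation per byte, 8 bits per byte with no protocol overhead, and a 365.25-day year. The function names and the nearest-power-of-ten classification rule are illustrative choices, not anything defined by Huber or in the slides.

```python
import math

SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumption: 365.25-day year

# Table 2 (Huber taxonomy): descriptor and nominal data set size in bytes.
HUBER_TAXONOMY = [
    ("Tiny", 10**2),
    ("Small", 10**4),
    ("Medium", 10**6),
    ("Large", 10**8),
    ("Huge", 10**10),
    ("Massive", 10**12),
]


def huber_descriptor(n_bytes):
    """Hypothetical helper: descriptor whose nominal size is nearest on a log10 scale."""
    if n_bytes <= 0:
        raise ValueError("data set size must be positive")
    log_size = math.log10(n_bytes)
    nearest = min(HUBER_TAXONOMY, key=lambda row: abs(math.log10(row[1]) - log_size))
    return nearest[0]


def quadratic_runtime_seconds(n, ops_per_second):
    """Seconds needed for n**2 operations, ignoring constant factors."""
    return n**2 / ops_per_second


def transfer_seconds(n_bytes, bits_per_second):
    """Seconds to move n_bytes over a link, assuming 8 bits per byte and no overhead."""
    return 8 * n_bytes / bits_per_second


def observations_per_cone(n_observations, n_cones=10**7):
    """Observations each cone would have to display (about 10^7 cones per the slides)."""
    return n_observations / n_cones


if __name__ == "__main__":
    n = 10**12  # a "massive" data set, one observation per byte (assumption)
    print("descriptor:", huber_descriptor(n))
    print("O(n^2) on a teraflop machine, years:",
          quadratic_runtime_seconds(n, 10**12) / SECONDS_PER_YEAR)
    print("transfer at 10 megabits per second, days:",
          transfer_seconds(n, 10 * 10**6) / SECONDS_PER_DAY)
    print("observations per cone:", observations_per_cone(n))
```

For a massive data set (n = 10^12) this sketch gives about 3.2 × 10^4 years for the quadratic algorithm, about 9 days for the transfer, and 10^5 observations per cone, which agrees with the figures quoted in the slides to within an order of magnitude.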