Spatial Exploratory Data Analysis (SEDA): Maps, charts and statistical relationships

Overview
• Spatial Exploratory Data Analysis – Introduction
• EDA
• Spatial Analysis
• SEDA
– Maps
– Distributions
– Relationships
• Outline of 2nd computer workshop
• Requirements for 1 page report
• NOT a review of statistics!!!

Introduction: Doing research
• Research is a process. It is iterative. It is messy and full of uncertainty and false steps.
• ‘Scientific process’ (inductive method – from empirical data to theory):
– What are you interested in knowing?
– How can this be formalised (hypotheses)?
– How can this be operationalised (testable model)?
– Gather the most appropriate data
– Validate the model (data analysis; this may or may not include formal statistical testing)
– ‘Laws’ (firm, generalisable conclusions) -> Theory
– In reality you will often use both induction and deduction
• Application of formal statistical techniques (Excel, Minitab, SPSS, SAS, GeoDa, ...) is just a small part of this process

Exploratory Data Analysis
• “Exploratory data analysis is an attitude, NOT a bundle of techniques” (Tukey 1977).
• “Let data speak for themselves”
• “Get a feel for the data”
• Basically inductive (from data to hypotheses, theory, ...)
• Characteristics: concentration on graphical procedures
[Figure: Stem and Leaf Diagram]

Spatial analysis
• In essence geographical problems are about human activities which vary in space
• This does not mean that we should ignore things that do not vary spatially!
• But we are ‘experts’ in looking at a phenomenon spatially
• We need to find out whether or not there is any spatial pattern
• The essence of testing a geographical hypothesis is to find out whether or not there is any plausible reason why the phenomenon varies in space

Spatial analysis
• Quantitative analysis can tell us if patterns we see are (statistically) significant (not “right” or “wrong”, only degrees of uncertainty)
• Is a trend in our sample ‘real’ or is it just a chance occurrence?
• Positive and negative relationships are interesting
• An absence of discernible spatial pattern (i.e. no statistically significant pattern) is also interesting, because it guides you to inquire further
• Does where something happens influence why and/or how it happens?

Exploratory Spatial Data Analysis
• Exploring spatial patterns through maps, histograms, boxplots, scatterplots, ...
• Identify outliers
• Find “hotspots”
• Formulate hypotheses
• Look for statistical relationships between variables
• Search for spatial spillovers

Describing distributions
• Mean
• Standard deviation
• Variance
• Skewness
• Kurtosis
• Median
• Quartiles, Percentiles
• Inter-quartile Range (IQR)
• Maximum, minimum values

Looking for outliers: Boxplot
[Figure: boxplot of the variable “Percent students” for London wards, annotated with the number of observations, the outliers, the upper and lower hinges (1.5 times IQR), the 75th percentile, the median, the IQR and the 25th percentile]

Looking for outliers: Histogram

Mapping

Linked windows / brushing

Distributions and mapping
[Figure: BOXMAP of Percent White British; Standard Deviation Map]

Distributions and mapping
[Figure: maps of Percent Bangladeshi and Percent White British]

Distributions and mapping

Relationships: Choosing a statistic depends on the questions you ask
• What is A like?
• Is A similar to B?
• Is A different from B?
• How much is A better/worse/different than B?
• Are A and B related? [correlation]
• Does A affect B?
• Does A cause B? [regression]

Are A and B related?
• For example, you might want to know if the level of illness in an area is associated with poverty
• Is there a relationship between health and wealth?
• Do increasing poverty levels lead to increasing ill-health?
• Do the variables co-vary consistently across space?
• The easiest approach is to graph one variable against the other and look for associations
• Association is seen in the pattern of points
• The simplest pattern to spot and analyse is a linear relationship (i.e. one that resembles a straight line), although relationships could be curvilinear

Is there a relationship?
How strong? Does one cause the other?

Identifying relationships with scatterplots
[Figure: example scatterplots labelled Strong positive, Strong negative, Random, Negative, No relationship?, Positive]

Relationship between variables (y = dependent; x = independent)
Long-term unemployment share = Constant + 0.3162 * no qualification share + Error term

Correlation between unemployment rate and long-term share of unemployed (standardized data)

Correlation Coefficient
• Now, how to objectively measure the strength and direction of these relationships?

What is correlation?
[Figure annotation: 0.73]
• Correlation statistics allow you to measure the strength and direction of an association between two variables
• Correlation provides a single number (the correlation coefficient) that summarises how closely the two variables vary together (it is a standardised measure of covariance)
• If a relationship is found, the variables are said to be correlated
• Useful for description, but also inferential (significance)

Types of correlation
• Nominal data
– Display: 2-way table
– Direction: not applicable
– Strength: size of Cramér’s V or lambda
– Test: Pearson chi-square or Fisher’s exact test
• Ordinal data
– Display: 2-way table
– Direction: sign of Spearman correlation
– Strength: size of Spearman correlation
– Test: test if Spearman rho = 0
• Interval/Ratio data
– Display: scatterplot
– Direction: sign of Pearson correlation (Spearman if not linear/normal)
– Strength: size of Pearson correlation (Spearman if not linear/normal)
– Test: test if Pearson r = 0 (test Spearman rho if non-normal)

Correlation
• Assumes a linear association between variables
• Pearson’s correlation coefficient (known as r) is the most commonly used measure of the linear relationship between 2 variables
– (Spearman’s rank correlation for non-linear ordered relationships)
• Statistic measuring relationships between variables of interval (continuous) data (e.g. census)
• Census variables are interval data.
  The values are continuous, ranging from 0 to the maximum
• Generally put the ‘explanatory’ (independent) variable on the x-axis
• The variable you want to ‘explain’ (dependent) is on the y-axis

How to interpret Pearson’s correlation?
• The measure is of how tightly the points cluster around an imaginary straight line through the scatterplot
• r is a ‘dimensionless’ number and can only be between 1 and -1
– an r of 1 = perfect positive relationship
– an r of -1 = perfect negative relationship
– an r of 0 = no linear relationship

Rule of thumb for interpreting the magnitude of r
(the extent to which points cluster tightly around the straight line)

Negative range     Description      Positive range
0.00               None             0.00
-0.19 to -0.01     ‘Very weak’      0.01 to 0.19
-0.39 to -0.20     ‘Weak’           0.20 to 0.39
-0.69 to -0.40     ‘Modest’         0.40 to 0.69
-0.89 to -0.70     ‘Strong’         0.70 to 0.89
-0.99 to -0.90     ‘Very strong’    0.90 to 0.99
-1.00              Perfect          1.00

Significance testing
• Can test to see whether r is statistically significant
• Key is the size of r and the size of the sample
• Seeking to reject the null hypothesis (H0) that the correlation coefficient is zero
• The standard test of Pearson’s correlation coefficient assumes that both variables are normally distributed
• (If not, test Spearman’s correlation coefficient)
• Look up r against a table of critical values for the given degrees of freedom. If the absolute value of r is bigger than the critical value, can reject H0
• A statistics package will report a p-value, a measure of significance
• If the p-value is less than 0.05, the correlation is significantly different from zero at the 5% level (95% confidence)
• Can also use a t-statistic, again checking if the critical value is exceeded

Correlation limitations
• With big sample sizes, almost any correlation, however weak, can be statistically significant
• Pearson’s r only captures linear relationships
• Correlation is not causation. A high r may mean any one of these:
– A causes B
– some other factor causes A and B
– B causes A
– it’s just chance.
  Another sample will be different
• Need to use your knowledge, experience and common sense as to the likely underlying process. Is the relationship what you expect? Is it plausible?
• Correlation is only concerned with the direction and strength of the relationship between the values of two variables
• Regression analysis determines the nature of that relationship and enables us to make predictions from it

Statistics Sense
• Danny Dorling’s ‘Five Rules’
• “If you have been concerned about your insecurities with statistics, don’t be - you are normal - just try to use a few more simple facts to strengthen your arguments and try to feel less intimidated about the complex methods.”
• 1. Often there is little point in using statistics
• 2. If you do use statistics, make sure they can be understood
• 3. Do not overuse statistics in your work
• 4. If you find a complex statistic useful, explain it clearly
• 5. Recognise and harness the power of statistics in geography
• (Source: Chapter 21, “Using statistics to describe and explore data”, in Clifford and Valentine (2003) Key Methods in Geography)

Further reading
• Danny Dorling’s chapter in Clifford and Valentine (2003) Key Methods in Geography (chapter 21, “Using statistics to describe and explore data”)
• 2 good stats books without equations!
– Derek Rowntree (1981) Statistics without Tears: An Introduction for Non-Mathematicians (Science MATHEMATICS L5 ROW)
– Michael Wood (2003) Making Sense of Statistics: A Non-Mathematical Approach (Main)
• I strongly recommend David Ebdon, Statistics in Geography (Science GEOGRAPHY D 62 EBD)
• Peter Rogerson (2001) Statistical Methods for Geography (Science GEOGRAPHY D 62 ROG)
• Kenneth Berk and Patrick Carey, Data Analysis with Microsoft Excel (Bartlett ARCHITECTURE BA 4.2 BER)
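
Worked sketch: correlation by hand
The correlation steps described in the slides (compute Pearson’s r, label its magnitude with the rule of thumb, then form a t-statistic for the null hypothesis that the correlation is zero) can be sketched in a few lines of Python. This is a minimal illustration, not the workshop’s method: the function names and the sample figures are hypothetical, the formula t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom is the standard one rather than something stated in the slides, and values that fall between the published bands (e.g. 0.195) are assigned to the lower band as an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y scaled by their spreads."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("x and y must have the same length (at least 3)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t-statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


def describe_strength(r):
    """Label |r| using the rule-of-thumb bands from the slides.

    Assumption: values between bands (e.g. 0.195) go to the lower band.
    """
    size = abs(r)
    if size == 0:
        return "none"
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.70:
        return "modest"
    if size < 0.90:
        return "strong"
    if size < 1.00:
        return "very strong"
    return "perfect"


if __name__ == "__main__":
    # Hypothetical ward-level percentages, invented for illustration only
    no_qualifications = [10, 20, 30, 40, 50]
    long_term_unemployed = [12, 25, 29, 45, 48]
    r = pearson_r(no_qualifications, long_term_unemployed)
    t = t_statistic(r, len(no_qualifications))
    print(f"r = {r:.3f} ({describe_strength(r)}), t = {t:.2f}")
```

The resulting t would then be compared with the critical value for n - 2 degrees of freedom, as the slides describe; a statistics package would report the p-value directly.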
