
Crash Course in Elementary Statistical Methods

Aim: understanding the relationship between two or more variables.

Examples:
(1) how technology, institutions, education or health affect the growth of output and income in a country;
(2) how changes in household income affect children's schooling attainment;
(3) how more funding to schools affects children's performance;
(4) how better health affects the nutrition of poor people, etc.

Elementary Statistical Methods

Regression analysis is a statistical technique that allows the exploration of possible interrelationships between variables.

Variables

Assume we have two variables, x and y, and we want to study the relationship between them:
x = annual income of the family
y = school enrollment of children within the family

Data

Data may be collected at different levels, e.g. family, village, district or country.

Cross-sectional data: observations collected at the same point in time but across different units (families, villages, countries, etc.).

Time-series / panel data: observations collected for the same unit (or units) over different time periods.

Empirical Analysis

Example: estimating the effect of income on educational attainment.

Cross-sectional data on income and enrollment alone may not be enough. There might be important (unobserved) differences between families that obscure the "pure" effect of income on enrollment. Or we might have excluded from the regression some variables that are correlated with family income and that have their own effect on children's enrollment, which gives a biased estimate of the effect.

Biases can arise, for example, as follows: families with higher income tend to have more progressive parents, which biases the estimated effect of income upwards. Once we control for parental attributes, the estimated effect of income on education would be smaller.

Summary statistics

The mean: the sum of all observations of the relevant variable divided by the total number of observations (n):

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i $$

The variance: we want to know whether the different observations lie close to the mean (i.e. whether they are clustered together) or far from it (i.e. whether they are dispersed). To measure this, add up the squared differences of the observations from the mean (squared so that positive and negative differences do not cancel) and average them.

The standard deviation: the square root of the variance. It makes the units comparable to those in which the variable was originally measured.

Correlation: our main goal is to understand whether two (or more) variables move together, i.e. whether they covary.

Interpretation of covariance: if yi tends to exceed its mean when xi exceeds its mean as well, then the covariance will be positive. Similarly, if xi tends to fall short of its mean when yi exceeds its mean, then the covariance will be negative.

Regression analysis

We are interested in the form of the relationship between variables x and y, not just in whether they are correlated. We want to study the marginal impact of x on y: by how much does an increase in x appear to affect y? This is the general question in regression analysis.

Preliminary test: scatter diagrams.

First: decide which is the "causal" variable and which is the variable that is affected by movements of the "causal" variable. Convention: let x stand for the causal (independent) variable and let y stand for the dependent variable.

Second: construct a diagram with the independent variable on the horizontal axis and the dependent variable on the vertical axis. Look at the scatter plot and study the potential relationship.

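The summary statistics defined above (mean, variance, standard deviation, covariance, correlation) can be sketched in a few lines of Python. This is a minimal illustration and not part of the original slides: the function names and the five-family data set are hypothetical, and dividing by n (rather than n − 1) in the variance and covariance is an assumption.

```python
import math


def mean(values):
    """Sum of the observations divided by their number n."""
    return sum(values) / len(values)


def variance(values):
    """Average squared difference from the mean (divides by n: an assumption)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def std_dev(values):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(values))


def covariance(xs, ys):
    """Average product of the two variables' differences from their means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Covariance rescaled by both standard deviations; lies in [-1, 1]."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


# Hypothetical data: family income (x) and children enrolled (y), five families.
income = [10.0, 20.0, 30.0, 40.0, 50.0]
enrollment = [1.0, 1.0, 2.0, 3.0, 3.0]

print("mean income:", mean(income))
print("variance of income:", variance(income))
print("covariance:", covariance(income, enrollment))
print("correlation:", correlation(income, enrollment))
```

In this made-up data set, enrollment tends to be above its mean when income is above its mean, so the covariance and the correlation both come out positive.
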
[Figure: scatter plot]

The basics of regression

Suppose we think that the relationship between x and y can be described as linear. What does this mean? When x changes, y changes in a fixed proportion: every one-unit change in x is associated with the same change in y.

A (linear) equation:

$$ y = \alpha + \beta x $$

where α is the constant and β captures the effect of x on y. When x = 0, y = α, and y increases (or decreases) by the amount β for each additional unit increase (or decrease) in the value of x.

Given a set of observations, regression analysis finds the straight line that is the best fit to the data. The values of α and β are then estimated from that best-fit line. "Best fit" means that the actual data points should not lie very far from the line.

[Figure: "best fit" line with slope β. β > 0 gives an upward-sloping line; β < 0 gives a downward-sloping line.]

Running the regression in a statistical program, e.g. STATA, gives you the optimally chosen value of β, i.e. the slope of the line that best fits the data. β is called the regression coefficient. It tells us about the strength of the influence of x on y: a high value of β implies that a small change in x can bring about a large change in y; a low value of β implies the opposite.

Multivariate regressions

A regression with more than ONE independent variable:

$$ y = \alpha + \beta x + \gamma z $$

where y = children's enrollment, x = family income, z = parental education (γ here denotes the coefficient on z). Interpretation: β now tells us the effect on y of a change in x when the value of z is held constant.

Can the estimated coefficient β be trusted?

Think of a large set of (x, y) observations that might exist, of which what we really have in our hands is only a subset, or sample. Our sample allows us to construct estimates of the α and β of the true relationship that we believe is "out there".

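As a rough sketch of what a statistical program does when it fits the line, the Python below estimates α and β by least squares, which is assumed here to be the "best fit" criterion (the slides do not name one). It then illustrates "holding z constant" by first removing the part of x and y that z explains and regressing what is left. All names and numbers are hypothetical; enrollment is constructed as exactly 1 + 2x + 3z so the true effect of income is known to be 2.

```python
def mean(values):
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Least-squares estimates (alpha, beta) for y = alpha + beta * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta


def residuals(xs, ys):
    """What is left of y after removing the fitted line in x."""
    alpha, beta = fit_line(xs, ys)
    return [y - (alpha + beta * x) for x, y in zip(xs, ys)]


# Hypothetical data: richer families also have more educated parents.
income = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # x
parent_edu = [1.0, 1.0, 2.0, 2.0, 3.0, 4.0]  # z
enrollment = [1 + 2 * x + 3 * z for x, z in zip(income, parent_edu)]  # y

# Regression of y on x alone: beta also picks up the effect of z.
_, beta_simple = fit_line(income, enrollment)

# Holding z constant: strip z out of both x and y, then regress the leftovers.
income_net_of_z = residuals(parent_edu, income)
enrollment_net_of_z = residuals(parent_edu, enrollment)
_, beta_controlled = fit_line(income_net_of_z, enrollment_net_of_z)

print("beta without controlling for z:", beta_simple)
print("beta controlling for z:", beta_controlled)
```

Leaving out parental education makes the income coefficient larger than the true value of 2, while controlling for it recovers 2 (up to rounding error). This is the same upward bias described earlier for families with more progressive parents.
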
Our estimates are random variables: their values depend on which sample we happened to draw from the entire set of observations.

The statistical program will also calculate how precise, or how significant, our estimates are: how confident can we be that our estimated value of β is close to the true β?

Hypothesis testing

The underlying hypothesis that you want to test is called the null hypothesis. Null hypothesis H₀: β = 0. Alternative hypothesis HA: β ≠ 0. We want to test the null hypothesis that family income has NO effect on children's enrollment.

Example: a regression of children's school enrollment on family income gives us an estimate of β. We want to know whether β is significantly different from 0. We form the hypotheses H₀: β = 0 and HA: β ≠ 0. Using the sample data, we compute a test of whether we can reject H₀ or not. Doing this yields a test statistic (the t-value).

To put it simply: if the absolute value of the t-value is greater than 2, we reject the null hypothesis that β = 0. This means that we can be confident that the effect of, e.g., family income on children's schooling is NOT 0. Then look at the estimated β to see whether it is positive or negative.

"Estimate is significant at the 5% level": the null hypothesis is rejected using a rule under which, if the null hypothesis were in fact true, we would wrongly reject it less than 5% of the time. We can therefore be reasonably confident that we have not rejected a true null hypothesis.

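The t-value for the slope can be sketched as follows, assuming the usual least-squares formulas: the estimate of β divided by its standard error. The function name and both data sets are hypothetical. Note that "greater than 2" is a rule of thumb; the exact 5% cut-off depends on the sample size and is somewhat larger than 2 in very small samples.

```python
import math


def slope_t_value(xs, ys):
    """Return (beta, standard error, t-value) for the slope of y = alpha + beta * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = my - beta * mx
    # Sum of squared distances of the data points from the fitted line.
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))
    # n - 2 because two parameters (alpha and beta) were estimated.
    std_error = math.sqrt(rss / (n - 2) / sxx)
    return beta, std_error, beta / std_error


# Hypothetical sample in which enrollment clearly rises with income.
income = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
enrollment = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1, 6.8, 8.1]
beta, se, t = slope_t_value(income, enrollment)
print("beta:", beta, "t-value:", t, "reject H0:", abs(t) > 2)

# Hypothetical sample in which enrollment barely moves with income.
flat_enrollment = [2.0, 3.0, 2.0, 3.0, 2.0, 3.0, 2.0, 3.0]
beta_flat, se_flat, t_flat = slope_t_value(income, flat_enrollment)
print("beta:", beta_flat, "t-value:", t_flat, "reject H0:", abs(t_flat) > 2)
```

The first sample gives a large t-value, so the null hypothesis of no effect is rejected. In the second, the estimated slope is small relative to its standard error and the null hypothesis cannot be rejected.
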
Randomized evaluations

"What would have happened to this person's behavior if she had been subjected to an alternative policy?"
Would she work more if marginal taxes were lower?
Would she earn less if she had not gone to school?
Would she have higher test scores if she had proper textbooks at school?

Example:
$Y_i^T$ = the average test score of children in a given school i if the school has textbooks
$Y_i^C$ = the average test score of children in the same school i if the school has no textbooks
We are interested in the difference $Y_i^T - Y_i^C$, which is the effect of having textbooks for school i.

We will never know the effect of having textbooks on any particular school, BUT we may hope to learn the average effect that it will have on schools: $E[Y_i^T - Y_i^C]$.

What we can observe in the data is the difference between schools that have textbooks and schools that do not:

$$ D = E[Y_i^T \mid \text{school has textbooks}] - E[Y_i^C \mid \text{school has no textbooks}] = E[Y_i^T \mid T] - E[Y_i^C \mid C] $$

The problem is that there may be systematic differences between schools with textbooks and schools without textbooks. For example, schools with textbooks might have better teachers, more money, etc. If we only run a regression of test scores on textbooks, we will not get the "causal" effect, because we are not controlling for other variables such as teachers and funding. The estimate is then said to be biased.

How do we eliminate the bias in the estimate? One way is to decide randomly which schools get textbooks and which do not.

Randomized evaluation: evaluating policy programs, e.g. textbook provision to schools or deworming medicine for children in primary schools. Think of medical experiments: some people are given the drug and some are not. This is the ideal set-up to evaluate the effect of a policy X on an outcome Y.

A sample of N individuals is selected from a population.

This sample is then randomly divided into two groups: (1) a Treatment group and (2) a Control group. The Treatment group is treated with policy X while the Control group is not.

The effect of policy X is measured by the difference in empirical means of Y between the Treatment and Control groups:

$$ \hat{D} = \hat{E}[Y \mid T] - \hat{E}[Y \mid C] $$

(where $\hat{E}$ denotes the empirical mean).

Key: since Treatment has been randomly assigned, the two groups are similar, on average, in their other characteristics, and hence the estimate is unbiased.

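The contrast between a naive comparison and a randomized evaluation can be illustrated with a small simulation. Everything here is made up for illustration: the true textbook effect of 5 points, the unobserved "school quality" variable, and the way scores are generated are all assumptions, not results from the slides.

```python
import random


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
TRUE_EFFECT = 5.0       # assumed effect of textbooks on average test scores
N = 10_000              # number of simulated schools

# Unobserved school quality (better teachers, more money) raises scores.
quality = [rng.random() for _ in range(N)]
score_without_books = [50 + 20 * q + rng.gauss(0, 5) for q in quality]
score_with_books = [s + TRUE_EFFECT for s in score_without_books]

# Non-random world: the better schools are the ones that have textbooks.
has_books = [q > 0.5 for q in quality]
naive_difference = (
    mean([s for s, b in zip(score_with_books, has_books) if b])
    - mean([s for s, b in zip(score_without_books, has_books) if not b])
)

# Randomized evaluation: a random half of the schools gets textbooks.
order = list(range(N))
rng.shuffle(order)
treatment_group = set(order[: N // 2])
randomized_difference = (
    mean([score_with_books[i] for i in range(N) if i in treatment_group])
    - mean([score_without_books[i] for i in range(N) if i not in treatment_group])
)

print("true effect:", TRUE_EFFECT)
print("naive difference:", naive_difference)
print("randomized difference:", randomized_difference)
```

The naive difference mixes the textbook effect with the quality gap between the two kinds of schools, so it comes out well above 5. Under random assignment, quality is balanced across the two groups and the difference in means lands close to the true effect.
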
