POSTED ON: 11/19/2011
AP Course Audit: Manlius Pebble Hill (AP Statistics)

We will cover how to use the graphing calculators each time we encounter a feature that the graphing calculator can accommodate. These features include the following: calculating the mean, calculating the standard deviation, calculating the median, creating scatter plots, creating box plots, linear regression, non-linear regression, 1-sample t and z tests, 2-sample t and z tests, z confidence intervals, t confidence intervals, chi-squared tests for goodness of fit, and chi-squared tests for homogeneity and independence.

Assignments are designed so that each student uses a unique topic or ends up with a unique data set upon which they do an independent calculation. This fosters independence of thought and confidence, and keeps students honest about what they personally understand.

Each day we cover an AP problem in class from one of the released exams. The main goal is a complete understanding of the problem and how it relates to the day’s topic.

All graphical displays that a student creates should be done with the help of Excel or another similar graphical tool.

Projects: Each student must write an article for the school newspaper. Before they can submit an article they must design a survey or experiment, create a sampling plan, gather the data, analyze the data and come to a conclusion given the data set. Then they have to write an article summarizing what they found, along with at least one graphical aid for any reader of that article.

Schedule (columns: Day, Topic, Description, Activity, Assignment, Textbook HW)

QI Quarter I: The Data Analysis Process, Collecting Data & Methods for Describing Data

Day 1
Topic: Variability in Inferential Statistics
Description: We will discuss the concept of what is a ‘typical’ range of values for a measurable quantity. Then we will discuss how that range of values, and how ‘frequently’ those measurements occur, can help to make a decision.
Activity: We will measure the salinity of normal drinking water at our school and then measure the salinity of that same water after a ‘toxic spill’. Then we will try to determine whether the drinking water is contaminated.
Assignment: They will measure the lengths of two different types of leaves from two different trees. Then they will try to give a criterion as to what range of values distinguishes one tree from another.

Day 2
Topic: Types of Data and Simple Graphical Displays
Description: Frequency distribution for categorical data; frequency; relative frequency; bar charts; dot plots.
Activity: We create a survey that we could use to collect frequency information so that we can practice displaying bar charts and dot plots.
Assignment: They pick two of the United States’ top commodities (or two from the student’s home country) from the FAO and track the production and revenue for the past 6 years. Then they describe what they found with a bar chart.
Textbook HW: Sec 1.4 # 1.9, 1.11, 1.19

Day 3
Topic: Sampling Methods and Bias
Description: Why sample? Sample sizes; selection bias; measurement or response bias; non-response bias; conceptual bias; simple random samples; stratified random sampling; cluster sampling; systematic sampling. Why not convenience sampling? Why not volunteer sampling? How important sampling biases are for researchers when designing experiments.
Activity: Designing a survey to determine how many hours students in the upper school at our school spend on homework, along with a discussion about how to do the actual sampling.
Assignment: Exploring sampling methods in the context of farming (plant wilt) and how to get a good sample given some uncontrolled variables.
Textbook HW: Sec 2.2 # 2.5, 2.11, 2.13, 2.25

Day 4
Topic: Statistical Studies: Observation and Experiment
Description: Why do statistical studies? The difference between observational studies and experiments. When you can draw cause and effect relationships between measured quantities. Confounding variables.
Activity: In groups they will pretend that they have just unearthed a new archeological find. Then they will try to list the types of things they may want to learn from the site, along with what types of information from the site would influence a study.
Assignment: Given four different types of statistical studies, each student must determine the goal of the study and what would be enough information to draw a cause and effect relationship between any measured quantities.
Textbook HW: Sec 2.3 # 2.27, 2.33, 2.35, 2.37

Day 5
Topic: Simple Comparative Experiments
Description: The design of a good experiment; an example experiment; randomization; blocking; direct control.
Activity: We will discuss the Stroop effect and each group will design and perform an experiment testing the Stroop effect.
Assignment: Each student will find and evaluate several famous experiments based on the criteria that we discussed in class.
Textbook HW: Sec 2.4 # 2.39, 2.41, 2.43, 2.45

Day 6
Topic: More on Experimental Design
Description: Control groups; placebo; single blind experiments; double blind experiments.

Day 7
Topic: Survey Design
Description: The different tasks of a respondent: comprehension, retrieval from memory, answering the questions. Common stumbling blocks in responding.
Activity: We will discuss the yearly class survey that I give to the students and how to improve the questions in the survey.
Assignment: Each student will read the National Geographic article ‘Opium Wars’ in order to refine their understanding of the concept ‘survey’. They will explore the depth of understanding that the author has of the subject, but compare how small the sample size is in an article like that with what we know about sampling.
Textbook HW: Sec 2.6 # 2.59, 2.61

Day 8
Review: Chapters 1 & 2

Day 9
Exam: Chapters 1 & 2

Day 10
Topic: Displaying Categorical Data: Comparative Bar Charts and Pie Charts
Description: Comparative bar charts; pie charts for categorical data; stacked bar charts.
Activity: Students explore the comparative hunting success of Egrets and Herons via comparative bar charts and frequency data.
Assignment: Each student will compare the top 20 commodities produced by two different countries using comparative bar charts and pie charts.
Textbook HW: Sec 3.1 # 3.3, 3.5, 3.11, 3.15 + Test Corrections

Day 11
Topic: Displaying Numerical Data: Stem and Leaf Plots
Description: How to construct stem and leaf plots; outliers; spread.
Activity: We explore different aspects of stem and leaf plots in order to clarify the construction of these plots.
Assignment: Each student will locate a different real life example of a stem and leaf plot.
Textbook HW: Sec 3.2 # 3.17, 3.21, 3.23

Day 12
Topic: Displaying Numerical Data: Frequency Distributions and Histograms
Description: Histograms for discrete numerical data; histograms for continuous numerical data (with the aid of stem and leaf plots); frequency and relative frequency distributions; examples.
Activity: We revisit the data from measuring the salinity of the school’s drinking water and use histograms to make any arguments clearer and visually appealing.
Assignment: We revisit the students’ measurements of the leaf lengths from two different types of trees and have them describe visually what the differences are.
Textbook HW: Sec 3.3 # 3.25, 3.27, 3.33, 3.35

Day 13
Topic: Displaying Bivariate Numerical Data
Description: How to construct and label a scatter plot; time series plots; trends (linear and non-linear).
Activity: We will create an example of a scatter plot using raw data from the FAO and explore the meaning of the trend in the data. Then we will discuss the implications of the data for the leaders of a given country.
Assignment: Each student will have to make several new scatter plots using raw data from the FAO. They will also have to describe the trends that they see along with any implications of those trends.
Textbook HW: Sec 3.4 # 3.41, 3.43, 3.49, 3.53

Day 14
Topic: Describing the Center of a Data Set Numerically
Description: The difference between the words ‘population’ and ‘sample’; mean; median; proportion of successes; trimming data.
Activity: “Stringing Students Along” is an activity that explores how to sample objects like bank queues to determine center and variability. We look at two different sampling methods for strings of varying length in a bag, and try to determine whether either method shows any sampling bias.
Assignment: Using raw data from the FAO, each student will make several estimations of the center of a data set. They will also have to find a data set for which an average does not make sense.
Textbook HW: Sec 4.1 # 4.5, 4.9, 4.13, 4.15

Day 15
Topic: Describing the Variability in a Data Set
Description: The importance of variability and spread; standard deviation; interquartile range.
Activity: We will revisit the water salinity data and describe the data’s center and variability numerically using the concepts from the past couple of days.
Assignment: They will have to describe the variability of the commodities that they chose last time.
Textbook HW: Sec 4.2 # 4.21, 4.23, 4.25, 4.29

Day 16
Topic: Summarizing a Data Set: Box Plots
Description: How box plots can summarize data; skeletal box plots; modified box plots; outliers; extreme outliers; cost-to-charge ratio.
Activity: “Capture-Recapture” is an activity that demonstrates a method used by naturalists to estimate the size of populations that are hard to count directly. We will simulate the process with Pepperidge Farm goldfish.
Assignment: Activity 4.2 (SADA) is an activity that explores the possible shapes of box plots given different data sets.
Textbook HW: Sec 4.3 # 4.31, 4.33, 4.35, 4.37

Day 17
Topic: Interpreting Center and Variability: Chebyshev’s Rule, the Empirical Rule, and Z-Scores
Description: How to measure ‘distance from the center’ in terms of standard deviations; Chebyshev’s rule; the empirical rule; z-scores.
Activity: “Sampling Pennies” is an activity that acts as an introduction to the concept of a distribution. It also makes use of calculations that estimate the center and variability of a data set. We can then check empirical results against predictions for how many data points are supposed to be in a range.
Assignment: Each student will go back to their ERB scores and find the mean and standard deviation of the population, then compare their score to these. Then they will calculate what percentile score they would have needed in order to get in a certain percentile.
Textbook HW: Sec 4.4 # 4.39, 4.41, 4.43, 4.45
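The Day 17 ideas (z-scores, Chebyshev’s Rule and checking observed counts against a prediction) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical and the scores are invented, not actual ERB data.

```python
import statistics

def z_scores(data):
    """Distance of each value from the mean, in population standard deviations."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return [(x - mean) / sd for x in data]

def chebyshev_lower_bound(k):
    """Chebyshev's Rule: at least 1 - 1/k^2 of any data set lies within k SDs (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

def fraction_within(data, k):
    """Observed fraction of values within k standard deviations of the mean."""
    return sum(abs(z) <= k for z in z_scores(data)) / len(data)

# Hypothetical scores, made up for illustration only.
scores = [52, 61, 64, 67, 70, 70, 73, 75, 78, 90]

if __name__ == "__main__":
    print("Chebyshev bound for k=2:", chebyshev_lower_bound(2))
    print("Observed fraction within 2 SDs:", fraction_within(scores, 2))
```

Chebyshev’s Rule holds for any data set, so the observed fraction should never fall below the bound; the Empirical Rule’s tighter figures (roughly 68%, 95%, 99.7%) apply only when the data look approximately normal.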
Day 18
Extra Day

Day 19
Review: Chapters 3 & 4

QII Quarter II: Bivariate Data, Probability and Distributions

Day 20
Exam: Chapters 3 & 4

Day 21
Topic: Correlation
Description: How to calculate correlation. What correlation means. When a set of bivariate numerical data has a good correlation. What the formula for correlation means. What correlation squared means.
Activity: In class we explore the concept of correlation by looking at GPA scores (for 9th, 10th, 11th and 1st semester senior year) along with SAT scores and ERB scores to see which pair of numerical data sets yields the strongest correlation.
Assignment: Each student must find a linear relationship in a scholarly scientific article and summarize what the linear relationship is.
Textbook HW: Sec 5.1 # 5.1, 5.5, 5.9, 5.11

Day 22
Topic: Linear Regression: Fitting a Line to Bivariate Data
Description: Formula for the y-intercept of the regression line. Formula for the slope of the regression line. Formula for the slope of a regression line that goes through the origin. Examples showing the difference between lines that are known to go through the origin and lines that might not go through the origin. The fact that the regression line goes through the point (average x value, average y value). Dependent versus independent variable. Danger of extrapolation. Absolute zero.
Activity: We generate several data sets for temperature and try to estimate absolute zero.
Assignment: They will measure/ask for the height and weight of 10 family members and calculate the equation of the regression line for their data set. They will have to make a scatter plot of their data and include the regression line. Then they will try to predict the height and weight of future members of their family while avoiding the danger of extrapolation.
Textbook HW: Sec 5.2 # 5.17, 5.19, 5.21, 5.25

Day 23
Topic: Assessing the Fit of a Line
Description: Residuals; predicted values; residual plots; coefficient of determination. How residual plots can uncover curvature in a data set that was previously thought to be straight.
Activity: Students match equations of regression lines to scatter plots that are similar to each other. The scatter plots are created in such a way that only one point changes. The point moves either far away but on the regression line, or far away and perpendicular to the regression line. We also compare the correlations in these instances.
Assignment: Each student looks on data.gov for two numerical data sets that they think will have a linear relationship. Then they create a (clearly labeled) scatter plot for the data, calculate and graph the regression line, calculate and interpret the correlation, and calculate and interpret the standard deviation about the regression line.
Textbook HW: Sec 5.3 # 5.33, 5.35, 5.37, 5.39

Day 24
Topic: Non-Linear Relationships and Transformations
Description: We try fitting a straight line to non-linear data. Then we try changing the regression line to a regression ‘curve’. We revisit the challenge of noticing when data that looks straight is not straight. Then we explore the concept of ‘linearizing’ data. We finally make a list of traditional linearizations.
Activity: In this activity we look at data from the NOAA regarding monthly averages of CO2 over a few decades and try to fit the curve as accurately as possible. Then we plot the actual carbon level versus the predicted carbon level and calculate the correlation between these two, in order to see how good a fit our model is.
Assignment: Each student must go home and keep track of the temperature of a cooling liquid (hot chocolate or tea that they can drink afterwards). Then they will have to try to fit the data with a line while looking for clues as to how the data might not be linear. Then they have to try to find a good non-linear model. Finally they have to check that their non-linear model is a good fit by plotting actual versus predicted values.
Textbook HW: Sec 5.4 # 5.47, 5.49, 5.51, 5.53

Day 25
Topic: Chance Experiments and Events
Description: Chance experiment; sample space; event; simple event; tree; sample space tree; complement of A; ‘or’ versus ‘and’; disjoint.
Activity: We write out the sample space for the sum of the top faces after rolling two dice. Then we try to check this prediction against reality by rolling two dice and adding the numbers on the top faces. We check the relative frequency of the results from actually rolling the dice against the predicted relative frequency.
Assignment: Each student then performs a similar (but simpler) experiment at home by flipping a coin. First, they make a predicted sample space. Then they check actual experimental values against that predicted sample space. They must create a relative frequency histogram of the predicted and actual data sets.
Textbook HW: Sec 6.1 # 6.1, 6.3, 6.5, 6.7

Day 26
Topic: Definition of Probability
Description: Classical definition of probability. Relative frequency definition of probability. Subjective / weighted definition of probability. Then we discuss the main differences between the different definitions of probability by checking the predictions of each one against the other.
Activity: We explore the difference between the classical and relative frequency definitions of probability by writing out the sample space for the result of flipping a plastic bottle cap, then comparing that prediction to the actual results of flipping a bottle cap.
Assignment: Each student performs a similar experiment to the bottle cap experiment at home, but with Hershey Kisses.
Textbook HW: No textbook homework.

Day 27
Topic: Basic Properties of Probability
Description: Probabilities are between … The probability of the whole sample space is … What property do disjoint events have in the context of probability? What is the relationship between an event and its complement in the context of probability? The law of large numbers.
Activity: In this activity we encounter the ‘law of averages’ in its popular, but false form. We classify various statements that use the ‘law of averages’ and try to find what is correct and what is wrong about them. We use this to gain a stronger grasp of a more correct property found in the context of probability, the ‘law of large numbers’.
Assignment: Students then design a test for the false version of the ‘law of averages’ and look at the results of those tests to see if they think the ‘law of averages’ is true or false. This also helps introduce the concept of hypothesis testing.
Textbook HW: Sec 6.3 # 6.15, 6.17, 6.19, 6.21

Day 28
Topic: Conditional Probability
Description: Definition of conditional probability. Why conditional probability is needed. How to use two way tables to help calculate conditional probabilities. When you can use conditional probability.
Activity: In this activity students explore the ‘Monty Hall’ problem. We introduce the problem, solve the problem and then try the problem empirically with cards.
Assignment: Each student will solve theoretically, and try, the following experiment: Three cards are put into a box. One card is red on both sides, one card is green on both sides and one card is red on one side and green on the other. If you get a prize for correctly guessing the color on the other side of the card that you randomly picked from the box, should you always guess the same color, a different color, or are the two strategies the same?
Textbook HW: Sec 6.4 # 6.29, 6.33, 6.35, 6.37

Day 29
Topic: Independence
Description: Formula for independence. Why we need a concept like independence. When we can use the concept of independence. Examples.
Activity: In this activity students investigate the frequency with which push pins fall point down. But they do so in two different ways. The first way is by dropping push pins one at a time. The second way, however, is by dropping 10 push pins at a time. The goal is to see whether the push pin falls in the second method show independence.
Assignment: Each student will do some research on diffraction and comment on whether each electron is acting independently of other electrons in the diffraction experiment.
Textbook HW: Sec 6.5 # 6.41, 6.47, 6.51, 6.57

Day 30
Topic: General Probability Rules
Description: General Addition Rule; General Multiplication Rule; Law of Total Probability; Bayes’ Theorem.
Activity: In this activity we look at the concept of conditional probability in the context of defective and non-defective parts. Students are given two types of bolts from ‘two different machines’ that produce bolts. Each machine has a different success rate of producing non-defective parts. Students then take samples from these bolt collections and compare the theoretical probabilities that they calculated first with the actual frequencies with which a specific type of bolt showed up.
Assignment: In this assignment students explore the concept of (and formula for) conditional probability in the context of medical tests for a disease. Since a medical test can show positive when the individual does NOT have the disease, and since the test can show a negative when the individual DOES have the disease, conditional probability is one of the appropriate tools for dealing with medical tests.
Textbook HW: Sec 6.6 # 6.59, 6.61, 6.63, 6.69

Day 31
Review: Chapters 5 & 6

Day 32
Exam: Chapters 5 & 6

Day 33
Topic: Random Variables
Description: Random variable; discrete random variables; continuous random variables; the difference between discrete and continuous.
Activity: In this activity we examine the concept of ‘streaky behavior’ and what constitutes streaky behavior. First, as a class we look at a real sequence of coin flips versus a made up sequence of coin flips. We try to figure out which is which based on the ‘streakiness’ of the sequence. Then they construct their own real sequence of coin flips and analyze it for streakiness.
Assignment: Each student has to find 20 statistics, 10 of which are from a discrete random variable, and 10 of which are from a continuous random variable.
Textbook HW: Sec 7 # 7.1, 7.3, 7.5, 7.7

Day 34
Topic: Probability Distributions for Discrete Random Variables
Description: Definition of a probability distribution for a discrete random variable. Properties of a probability distribution.
Activity: In this activity we create a probability distribution for the machine bolt activity that we did before, where two different machines create bolts that are defective or non-defective at different rates.
Assignment: Each student must watch a basketball game and keep track of the sequence of shots and whether each shot was made or not. Then each student will have to create a probability distribution function for the random variable ‘number of successful shots in a row’.
Textbook HW: Sec 7.4 # 7.27, 7.29, 7.31, 7.37

Day 35
Topic: Probability Distributions for Continuous Random Variables
Description: Definition of probability density function for continuous random variables. Relationship and difference between continuous and discrete probability distributions. Calculating probabilities using a table for a probability density function of a continuous random variable. Why the area represents probability.
Activity: In this activity we create a probability density function for the continuous random variable pH in different liquids.
Assignment: Each student must make a probability density function for the temperature readings of their house with the goal of being able to distinguish one student’s house from another simply based on the data / calculation / graphics that they make.
Textbook HW: Sec 7.3 # 7.21, 7.23

Day 36
Topic: Mean and Standard Deviation of a Random Variable
Description: Mean of a random variable. Standard deviation of a random variable. Why probability shows up in the formula for the mean. Some example calculations using raw data.
Activity: In this activity we try to measure the ‘length of a mechanical pencil’. The challenge is to measure it with respect to a specific individual and so get a sense of how long a particular person likes the graphite to be when they write.
Assignment: Each student will have to gather a range of gas station information, including number of gallons, total price, price per gallon and time of day. They will have to make a sampling plan, get permission to gather the data, gather the data and then analyze the data. They will have to make a probability distribution function for their data and create a visual aid for their function.
Textbook HW: Sec 7.4 # 7.27, 7.29, 7.31, 7.37

Day 37
Topic: Binomial and Geometric Distribution
Description: When to use the binomial distribution. How to calculate the probabilities associated with the binomial distribution. How the geometric distribution relates to the binomial distribution. When to use the geometric distribution. How to do calculations with the geometric distribution. Examples.
Activity: In this activity we explore the geometric distribution in the context of prizes like ‘cracker jack’ prizes, where they have to buy a certain number of boxes before they get the prize that they want. The students make a calculation for predicting how many tries it will take and then test that prediction empirically.
Assignment: When going to a parking lot, students will have to count how many cars it takes until they get to, say, a Toyota. After repeating this count several times each student will comment on whether they think that their distribution is geometric and what they think the actual proportion of Toyotas in the parking lot was.
Textbook HW: Sec 7.5 # 7.45, 7.51, 7.59, 7.61

Day 38
Topic: Normal Distributions
Description: The general shape of the normal distribution. How to calculate areas using a table. How the z-score relates to calculating areas. How probability notation works with normal distributions. Upper tailed, lower tailed and two-tailed calculations. Symmetry of the normal distribution.
Activity: In this activity we try to measure the length of our classroom. From the data that we collect we calculate the mean and standard deviation. We also calculate a probability distribution function from the data.
Assignment: Each student will have to sample the electric meter readings at home during some time frame. They will have to think about how they will sample the meter. Then after they have gathered the data they need to calculate the mean and standard deviation. They also need to think about whether the data that they have looks normal. Assuming that the data is normal, they will also have to make some guesses as to how many measurements they think will fall in a certain range of values.
Textbook HW: Sec 7.6 # 7.67, 7.69, 7.71, 7.73
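The Day 38 area calculations (z-score, then area under the normal curve) can be done without a table. The sketch below is an illustration only: it uses the standard library’s `math.erf` in place of the textbook table, and the function names and the meter-reading numbers in the demo are hypothetical.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """Lower-tail area P(X <= x) for a normal distribution, via the z-score."""
    z = (x - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_between(lo, hi, mean, sd):
    """Area between lo and hi: difference of two lower-tail areas."""
    return normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)

def expected_count(n, lo, hi, mean, sd):
    """Guess how many of n measurements fall in [lo, hi], assuming normality."""
    return n * prob_between(lo, hi, mean, sd)

if __name__ == "__main__":
    # Hypothetical meter readings: mean 30 kWh, SD 4 kWh, 40 readings.
    print(expected_count(40, 26, 34, mean=30, sd=4))
```

Upper-tail areas follow from symmetry: P(X >= x) is 1 minus the lower-tail area, which for the standard normal equals the lower-tail area at -x.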
Day 39
Topic: Checking for Normality
Description: What does it mean for a data set to look normal? What do you compare to see if a data set is normal? Using correlation between theoretical cumulative probability and actual cumulative probability to determine if data appears to be normal. How to use the correlation table to determine if data appears to be normal or not.
Activity: We check the data from our measurement of the length of our classroom for normality.
Assignment: Each student checks their data from the electric meter readings for normality.
Textbook HW: Sec 7.7 # 7.81, 7.83, 7.85, 7.89

QIII Quarter III: Distributions, Confidence Intervals and Hypothesis Testing

Day 40
Topic: Approximating Discrete Distributions
Description: Sometimes the shape of a discrete distribution is similar to the shape of a continuous distribution. When can we interchange the two distributions? Binomial versus Normal distributions. Examples of checking binomial data for normality. Noting that the two still produce slightly different probabilities. What other distributions are similar to each other?
Activity: In this activity we explore the concept of using a continuous distribution to approximate a discrete distribution in the context of ‘process control’. Each group will try to measure out 1 cup of ‘ice cream’ (pasta shells) in two different ways. First, each group will try to measure one cup with an opaque measuring cup and ‘intuition’. Then each group will use a transparent measuring cup and try to get an exact cup. Then we will average the results from each group and try to determine if either measurement shows a significant advantage over the other. The random variable ‘the number of pasta shells in the measuring cup’ is a discrete random variable. We will compare the results from this random variable with a continuous one.
Assignment: Each student continues the experiment at home by trying to measure out 1 cup of pasta shells by weight to see if that data produces better results than either measuring cup did.
Textbook HW: Sec 7.8 # 7.93, 7.95, 7.97, 7.99
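The Day 40 point that a binomial and a normal distribution give similar but slightly different probabilities can be checked numerically. The following is a minimal sketch under assumptions: the values n = 50 and p = 0.4 are invented for illustration, the function names are hypothetical, and the normal approximation uses the usual continuity correction of 0.5.

```python
import math

def binomial_pmf(k, n, p):
    """Exact probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_prob_between(lo, hi, n, p):
    """Exact P(lo <= X <= hi) for a binomial random variable X."""
    return sum(binomial_pmf(k, n, p) for k in range(lo, hi + 1))

def normal_cdf(x, mean, sd):
    """Lower-tail area of a normal distribution."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def normal_approx_between(lo, hi, n, p):
    """Normal approximation to P(lo <= X <= hi) with continuity correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return normal_cdf(hi + 0.5, mean, sd) - normal_cdf(lo - 0.5, mean, sd)

# Hypothetical parameters, chosen only for illustration.
n, p = 50, 0.4
exact = binomial_prob_between(15, 25, n, p)
approx = normal_approx_between(15, 25, n, p)

if __name__ == "__main__":
    print("exact binomial:", exact)
    print("normal approx :", approx)
```

The continuity correction widens the interval by half a unit on each side because the discrete count 15 occupies the continuous range 14.5 to 15.5.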
Day 41
Topic: Statistics and Sampling Variability
Description: Comparing one measurement from one sample with the mean of one sample out of several samples. Example using the random variable X = ‘# of cars in one family’ compared with Y = ‘average number of cars between two families’. Comparing the distribution for a random variable with the sampling distribution of the sample mean. How the shapes of X and Y do NOT have to be the same.
Activity: In this activity we do a similar activity as in the introduction, except we use the random variable X = ‘number of children that a trustee from our school has’.
Assignment: Each student then does an extension to what we covered that day by taking a look at the random variable X = ‘number of students in one of my classes’. They must create a probability distribution (along with a visual representation for that distribution) for the random variable X and then compare it to the probability distribution for Y = ‘the average number of students in two of my classes’.
Textbook HW: Sec 8.1 # 8.1, 8.3, 8.7, 8.11

Day 42
Topic: The Sampling Distribution of the Sample Mean
Description: Comparing the distribution of single measurements of a random variable X with the distribution of the averages from samples of size N. How the distribution of the averages from several samples of size N can have a different shape than the original distribution. How the distribution of the averages from different samples of size N becomes more ‘Gaussian’ as the size of the sample increases. How the mean of the distribution of the averages from samples of size N gets closer to the actual population mean from which the samples come. How the standard deviation of the averages from samples of size N gets smaller as N increases. The Central Limit Theorem. Using simulation to demonstrate that the Central Limit Theorem works.
Activity: In ‘Cents and the Central Limit Theorem’ we explore how the Central Limit Theorem works. The distribution that we take samples from is the distribution of dates on 100 pennies.
Assignment: Each student will then look up the number of coins minted in different years to help explain why the distribution of dates from pennies that we saw in class looked the way it did.
Textbook HW: Sec 8.2 # 8.17, 8.19, 8.21, 8.23

Day 43
Topic: The Sampling Distribution of the Sample Proportion
Description: Revisiting the definition of the sample proportion of successes. How the distribution of the sample proportion of successes is related to the distribution of the sample mean. An example of a rope making company that makes ropes for two different groups of people. One group uses rope decoratively and the other uses rope to haul cargo. The second group needs the rope to withstand a certain level of force before the rope breaks. How does the rope company determine how well their ropes satisfy the second customer? A calculation of the probability that if you buy 120 ropes at least 110 of them will be able to withstand the required amount of force to haul a load.
Activity: In this activity we look at the proportion of non-Caucasian students at our school to understand the concept of the sampling distribution of the sample proportion. Each group creates a plan to sample students in the hallway for their ethnicity. Then we look at the distribution of the proportion of non-Caucasian students in those samples and compare that distribution to the known proportion of non-Caucasian students.
Assignment: Students then continue this study by looking up the proportion of different ethnic groups in our city and developing a sampling plan for determining the ethnicity from a sample of people (without having to ask them their ethnicity, i.e. simply by watching people). Then they compare their sample to the known/estimated size of the different ethnic groups.
Textbook HW: Sec 8.3 # 8.27, 8.29, 8.31, 8.33
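The Day 43 rope calculation (the chance that at least 110 of 120 ropes hold) can be sketched as follows. The schedule does not state the per-rope success probability, so the value p = 0.9 below is purely an assumption for illustration, and the function names are hypothetical.

```python
import math

def prob_at_least(k, n, p):
    """Exact binomial P(X >= k): at least k of n ropes withstand the load."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def proportion_standard_error(n, p):
    """Standard deviation of the sample proportion for samples of size n."""
    return math.sqrt(p * (1 - p) / n)

def large_sample_ok(n, p, threshold=10):
    """Common rule of thumb for treating the sample proportion as roughly normal."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Assumed per-rope probability of withstanding the force (not given in the source).
p_holds = 0.9
n_ropes = 120

if __name__ == "__main__":
    print("P(at least 110 of 120 hold):", prob_at_least(110, n_ropes, p_holds))
    print("SE of sample proportion    :", proportion_standard_error(n_ropes, p_holds))
    print("Large-sample condition met :", large_sample_ok(n_ropes, p_holds))
```

The event ‘at least 110 of 120’ is the same as ‘sample proportion at least 110/120’, which is how the calculation connects to the sampling distribution of the sample proportion.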
Day 44
Review: Chapters 7 & 8

Day 45
Exam: Chapters 7 & 8

Day 46
Topic: Point Estimation
Description: The definition of a point estimate; true value of a population characteristic; unbiased statistics versus biased statistics; precision versus accuracy.
Activity: In this activity we try to estimate the value of the gravitational constant using several different methods. We try to determine if each method is biased and if the method is valid based on the data.
Assignment: Each student has to compare statistics from both sides of a controversial issue and try to determine if the statistics are consistent or inconsistent with each other.
Textbook HW: Sec 9.1 # 9.1, 9.3, 9.4, 9.7

Day 47
Topic: Large Sample Confidence Interval for a Population Proportion
Description: The definition of a confidence interval; confidence level; 95% confidence interval; large sample confidence interval for the population proportion; standard error; bound on the standard error of the estimation, B, associated with a 95% confidence interval; sample size requirements.
Activity: In a large jar with pennies and quarters, students find an appropriate sample size in order to get at least 10 quarters with 95% confidence in a random sample of coins from the jar. Then they generalize their calculation to predict sample sizes needed in order to get N quarters with 95% confidence in a random sample of coins from a jar.
Assignment: Each student has to find an example of a confidence interval in the news and explain what the statistic is measuring and if they think the interval is a good one.
Textbook HW: Sec 9.2 # 9.11, 9.13, 9.15, 9.17

Day 48
Topic: Confidence Interval for a Population Mean
Description: Assumptions before using a one-sample z confidence interval for a population mean. Sample size requirements before using a one-sample confidence interval for a population mean. Student’s t-distributions versus z distributions. One-sample t confidence intervals for a population mean.
Activity: In this activity we consider the mean salary of executives and determine a sample size to estimate the true population mean salary of executives. Then we look up the salaries of a random sample of executives and compare the mean with the reported mean.
Assignment: Each student extends this activity by finding a statistic whose mean is reported. They then must calculate a sample size to estimate the true population mean for that statistic. After getting a random sample they must then compare the mean that they calculated with the reported mean.
Textbook HW: Sec 9.3 # 9.29, 9.31, 9.33, 9.35

Day 49
Topic: Hypotheses and Test Procedures
Description: A test of hypotheses or test procedure; null hypothesis; alternative hypothesis; the different possible alternative hypotheses.
Activity: In this activity we take several experiments and determine what the null and alternative hypotheses are for each experiment. We also explore why the researchers did not choose different alternative hypotheses.
Assignment: Each student must then find one experiment and describe the null hypothesis and alternative hypothesis.
Textbook HW: Sec 10.1 # 10.1, 10.3, 10.5, 10.7

Day 50
Topic: Errors in Hypothesis Testing
Description: Test procedures; Type I error; Type II error; level of significance. How to choose an alpha level and why we should not make the alpha level smaller than it needs to be.
Activity: In this activity we revisit the ‘cards in box’ problem and calculate the known type I and type II errors. Then we test those predicted values against experimental results that we make in class.
Assignment: Each student must find an example of a treatment with known type I and type II errors.
Textbook HW: Sec 10.2 # 10.11, 10.13, 10.15, 10.17

Day 51
Topic: Large Sample Hypothesis Tests for a Population Proportion
Description: Test statistics; P-value; observed significance level. What the P-value means. How to phrase a response to a given P-value (accept versus fail to reject). Upper tailed tests and lower tailed tests versus two tailed tests. An outline of the steps in a hypothesis testing analysis.
Activity: We make a hypothesis regarding the proportion of boys and girls at our school. Then we decide if our student body is large enough to perform a large-sample hypothesis test for the population proportion of boys (or girls) in the school. Then we take a random sample to estimate the number of boys (or girls) at our school. Then we compare the estimate with the actual number. (Or we can do a Skittles/M&M’s related activity.)
Assignment: Each student then has to make a hypothesis regarding the number of students with black hair at our school. Then they need to decide if our student body is large enough to perform a large-sample hypothesis test for the population proportion of students with black hair. Then they need to take a random sample to estimate the number of students with black hair at our school. Then each student will compare results with other students in class.
Textbook HW: Sec 10.3 # 10.23, 10.25, 10.27, 10.29

Day 52
Topic: Hypothesis Tests for a Population Mean
Description: Z and T confidence intervals when the population standard deviation is known and not known. The definition of degrees of freedom and how to calculate the degrees of freedom in the basic sense. Upper tailed and lower tailed tests versus two tailed tests. The definition of statistically significant.
Activity: We make a hypothesis for the mean SAT score in our school over the past few years. Then we determine how many years and students we need in order to use a hypothesis test for a population mean. Then we take a random sample of SAT scores from the given years and compare it with the reported school mean SAT score.
Assignment: Each student must make a hypothesis for the mean sunrise time in our city. Then they need to determine how many days they would need in order to use a hypothesis test for a population mean. Then they need to design a sampling plan for getting a random sample of days. Then after gathering the data and calculating a sample mean they need to compare their results with one year’s worth of sunrise times. They need to make a box plot of each to show their results visually.
Textbook HW: Sec 10.4 # 10.41, 10.43, 10.45, 10.47

Day 53
Topic: Power and Probability of Type II Error
Description: The definition of the power of a test. Visually how to think about the power of a test. What factors have an effect on the power of a test? When the null hypothesis is true versus when the null hypothesis is false.
Activity: In this activity we compare two test procedures. First, we calculate the probabilities for type I and type II errors in each test procedure. Then we check those probabilities empirically in class.
Assignment: Each student needs to find two test procedures with known type I and type II errors. Then they need to compare the power of each test and describe under what circumstances their conclusion is valid.
Textbook HW: Sec 10.5 # 10.59, 10.61, 10.63, 10.65

Day 54
Review: Chapters 9 & 10

Day 55
Exam: Chapters 9 & 10

Day 56
Topic: Inferences Concerning the Difference Between Two Population or Treatment Means Using Independent Samples
Description: When you might need to use a difference of means. Comparing treatments. Formulas for the difference between sample means using independent samples. Assumptions for using the above formulas.
Activity: In this activity we return to our data from the activity where we measured the constant of gravity using different methods. We use this data to determine if either method shows a significant difference from the other or from the accepted value for the gravitational constant.
Assignment: Each student must clearly state a null hypothesis for the difference between the mean electrical usage in their house when everyone is awake and the mean electrical usage in their house after everyone has gone to bed. They need to create a sampling plan to get a random sample during those times. Then after gathering their data and making their calculations they need to determine if the data shows any significant difference from their prediction. They should speculate about any causes given their conclusion.
Textbook HW: Sec 11.1 # 11.1, 11.5, 11.9, 11.13

57 Inferences The definition In this activity Each student must Sec 11.2 # Concerning the of ‘paired’.
students clearly state a null 11.31, 11.35, Difference Examples of compare the hypothesis for the 11.37, 11.39 Between Two situations that difference difference of the Populations or require paired between the mean temperature of Treatment values. mean listed one floor of his or her Means Using Assumptions weight of family’s house Paired Samples. before making candies with compared with the inferences the same size mean temperature of about the and the mean another floor of difference measured his/her family’s between means weight of house. The when using candies with temperature readings paired samples. the same size. should happen at the Paired t same time. The confidence students should then intervals. comment on how well paired the data sets are. They should also speculate as to any causes given their conclusion. 58 Large Sample Assumptions In this activity Each student must Sec 11.3 # Inferences before making we compare challenge a member 11.41, 1.43, Concerning a inferences the difference of their family to a 11.45, 11.47 Difference about the between the game of basketball Between Two difference proportion of and try the day’s Populations or between two basketball activity at home. Treatment population (or shots made by Proportions treatment) team A and proportions. the Formulas for proportion of the difference basketball between two shots made by population (or team B. Two treatment) teams from proportions. class make a series of basketball shots and keep track of successful shots and misses. They think about what they need to do in order to satisfy the assumptions of the test. They also make a clear statement about what the null hypothesis is in this context. After they make enough shots we compare actual proportion of successes between the two teams. 
59 Chi –Squared What the null In this activity Each student must Sec 12.1 # Tests for hypothesis we look at find data on 12.1, 12.3, Univariate looks like for data from www.data.gov upon 12.5, 12.7 Categorical Data univariate drosophila which they can categorical fruit flies and perform a chi- data. compare squared test for How to create predicted univariate categorical the alternative ratios of data. They must make hypothesis and inherited sure the data satisfies how to notice traits with the assumptions the alternative actual ratios needed in order to hypothesis. of inherited perform the test. Expected traits. This is After they do the versus in conjunction calculation they need observed with AP to explain what the counts. Biology lab. resulting chi-squared Chi – squared value means and if value. the data shows any How to use the significant difference chi – squared between the values to make hypothesized inferences. proportions or not. Chi – squared tables. Assumptions needed in order to make inferences using the chi – squared value. QIV Quarter IV: Chi-Squared Tests, AP Exam and Topics 60 Tests for Two ways In this activity Each student must Sec 12.2 # Homogeneity tables. we compare compare several 12.17, 12.19, and Marginal totals. several basketball teams 12.21, 12.23 Independence in How to different against at least four a Two – Way calculate famous different Table expected authors characteristics (like values for a two against the rebounds, shot way table. following: success proportion, What the null how many etc.) to see if their hypothesis is books got to collection of when using the the NY Times characteristics show chi – squared best-seller significant values and two list, how many differences between – way tables. books became the teams. They need Assumptions movies, how to look for which needed in many books characteristic or team order to make had sequels shows the most inferences and how contribution and in using a chi – many books which direction. 
squared value they have for two ways published. tables. 61 Review: Chapters 11 & 12 62 Exam: Chapters 11 & 12 63 AP Review 64 AP Review 65 AP Review 66 AP Review 67 AP Review 68 AP Exams 69 AP Exams 70 AP Exams 71 AP Exams 72 AP Exams 73 Discrimination 74 Discrimination 75 Chapter 13 76 Chapter 13 77 ANOVA 78 ANOVA 79 ANOVA 80 ANOVA Exploring Data Activity: Food and Agricultural Organization (AP Statistics) Materials: Laptop Statistics is a tool that is meant to analyze and help us understand data. To this end we will need several sources of data. The first source that we will make use of is the Food and Agriculture Organization of the United Nations. Please go to http://faostat.fao.org/ Click on ‘want to register?’ Register yourself at FAO by filling in the information for the following: Name Manlius Pebble Hill for the organization Educational institution for the type of organization USA for the country Check the first column of boxes (and any others that interest you) Use your school email address Make up a password We will use data from this website today and throughout the year. Go to http://www.fao.org/economic/ess/en/ and find the current agricultural yearbook. Find the spread sheets for the following: Total and Agricultural Population (including forestry and fisheries) (A1) Human Development Index and Poverty (G4) Find the definitions for total population, agricultural population, human development index and poverty. Find the units for each of the categories. Copy the columns for 2009 in each of the following categories into a new excel worksheet titled Excel Practice 1 AP Statistics. Make sure the countries match the data in each row. Name of Country total population agricultural population human development index Poverty Prevalence Year Poverty Prevalence was recorded In a new column you are going to calculate the ratio of agricultural population to total population. 
Label the column as such and then in the first row (lining up with the first country) place a formula that looks like ‘=E11/D11’, which should represent the ratio of agricultural population to total population of the first country. In this formula the agricultural population for Afghanistan was in column E row 11 and the total population for Afghanistan was in column D row 11. Then copy the formula in that cell and paste it to the rest of the cells in that column all the way down to the last country. The numbers should all be different and represent each country’s ratio.
Note: all formulas in a cell for Excel should be preceded by an equal sign.
Excel has a list of statistical functions that you can use; these are listed under ‘statistical functions’ in the help search menu. You will be using several of these functions from Excel. Some of these include the following:
=AVERAGE(range of cells) This produces the average of all the numbers that you highlighted.
=MEDIAN(range of cells) This finds the median or middle number of the cells that you highlighted.
=SUM(range of cells) This adds up the numbers in the cells that you highlighted.
=STDEV(range of cells) This produces the sample standard deviation for the cells that you highlighted.
=CORREL(first range of cells, second range of cells) This produces the correlation between the variables represented in the given two ranges of cells [usually two columns or two rows].
Do the following calculations with the data that you copied from the FAO statistical yearbook:
Find the average agricultural population.
Find the median agricultural population.
Find the sum of the agricultural population and compare it to the world agricultural population. What should be true about these numbers?
Find the standard deviation of the agricultural population.
Find the correlation between the column labeled Human Development Index and the column that should represent the ratio ‘agricultural population : total population’ Make the following scatter plots and label the axes and each scatter plot: Human Development Index versus Proportion of Population that Farms Poverty Prevalence versus Proportion of Population that Farms What is the definition of the term ‘human development index’? What is the definition of the word ‘poverty’? What do you notice about the overall trends in each scatter plot? Does it look like there is any relationship between the different variables that you plotted? What do the points on the x axis mean? What would this suggest about what a nation should do to improve its human development index? Would your solution in the previous question automatically reduce poverty in a given country? Why or why not? What is considered the typical trend with respect to the percentage of people that farm? Why does USA’s poverty prevalence not show up in the table? Important aspects of this activity: You should always be able to analyze a data set using Excel even if you don’t remember all the formulas. The key is that you must remember what the formulas mean, when you can use the formulas, what the formulas can (and can’t) do. Taking a course in statistics allows you to become statistically literate, which will allow you to be intelligently informed about the information that you see around you. You will see statistical information pretty much any where you go or in many informative documents that you will see. Often statistical information can help guide decisions that you would have to make in your occupation. The statistical information also can show how your intuition is not always correct. To this end knowing what a statistic means can help you make life choices. This activity demonstrates the process of collecting, displaying, describing, analyzing and drawing conclusions from data. 
This process is the main process of statistics. The charts that we made and the descriptions of trends that we found in the charts are examples of descriptive statistics. The column marked total population is an example of the population of interest. The column called total agricultural population is an example of a sample population. The question that asked you to make a decision based on the trends that you saw in the data is an example of inferential statistics.
(The above activity shows that students interpret statistical results in context.) (This example also makes use of graphical exploration of data.)

Assignment: Each student must make a hypothesis for the mean sunrise time in our city. Then they need to determine how many days they would need in order to use a hypothesis test for a population mean. Then they need to design a sampling plan for getting a random sample of days. Then after gathering the data and calculating a sample mean they need to compare their results with one year’s worth of sunrise times. They need to make a box plot of each to show their results visually.
(Here the box plots incorporate median-based statistics with mean-based analysis.) (This assignment also shows statistical methods of exploring data.)

Activity: Non – Linear Relationships and Transformation (AP Statistics)
Go to the Global Monitoring Division of the National Oceanic and Atmospheric Administration: http://www.esrl.noaa.gov/gmd/index.html
Choose the ‘products’ tab and select search for data. Restrict the search to ‘Carbon Dioxide’ and monthly averages. Then select the data from Ascension Island in the UK. Copy the data in this file and paste it into a spreadsheet. You will have to separate the data in the column by highlighting the data and then choosing the ‘data’ tab and selecting the ‘text to columns’ option. Then choose spaces as the delimiter. Once you have done this create a scatter plot of carbon dioxide levels versus month/year.
Then create an appropriately sized viewing window so that you can see the detail of each month.
Does the data look linear?
What function do you think might help straighten this data set?
When you include the regression line in the scatter plot what sorts of curviness do you notice? Describe two ways that your scatter plot is curvy.
Even with the curviness would you still feel like you could predict the carbon levels at Ascension Island in the UK? What would be a good rule for predicting the carbon levels?
Create the following column called ‘predicted’:
=337.42+(8/60)*I + cos(π*(I-2)/6) + 2*cos(π*I/200)
(Here I appears to be the running month index of each row; in Excel, π is entered as PI().)
Create a scatter plot of carbon level versus predicted.
Find the correlation between ‘carbon level’ and ‘predicted’.
(This assignment shows how students must interpret data in context, and it shows graphical exploration of data as well as numerical approximation of data.)

Sampling and Experimentation

The following is from the syllabus:
Topic: Sampling Methods and Bias
Description: Why sample? Sample sizes. Selection bias. Measurement or response bias. Non-response bias. Conceptual bias. Simple random samples. Stratified random sampling. Cluster sampling. Systematic sampling. Why not convenience sampling? Why not volunteer sampling? How important sampling biases are for researchers when designing experiments.
Activity: Designing a survey to determine how many hours students spend on homework at our school in the upper school. Along with a discussion about how to do the actual sampling.
Assignment: Exploring sampling methods in the context of farming (plant wilt) and how to get a good sample given some uncontrolled variables.

Assignment: Sampling (AP Statistics)
An experimenter requires prior knowledge of a subject before they can enact a test of any significance. A specific experiment comes with the purpose of measuring some quantity. When sampling a population to make the desired measurement the experimenter needs to know what variables affect the quantity that they want to measure.
Consider the following passage from the 1957 yearbook of agriculture on Soils (p 44) “Water is the medium that disperses the protoplasm in the cell. It is a medium by which physical force is effected on the cell wall to bring about expansion and growth. Only a small part of the water taken up by roots from the soils retained in the cells of the plants. Most of the water that is absorbed is conducted to the leaves, where it is lost by evaporation or transpiration. Since the evaporation of 1 gram of water requires 539 calories, the high rate of water loss that takes place from leaves on hot summer days acts as an evaporative cooler. One mature tomato plant in a warm arid climate will transpire a gallon of water in a day. As much as 700 tons of water may be needed to produce 1 ton of alfalfa hay. The water that is transpired by a cornfield in Iowa in a growing season is enough to cover the field to a depth of 13 to 15 inches. The loss of water from plants is controlled by incident light energy, relative humidity, temperature, wind, opening of pores (called stomata) in leaves, and supply of water in soil. Incident light energy is the most important factor because the evaporation of water requires a source of energy. Relative humidity is also important because evaporation takes place much more rapidly in a dry atmosphere than in a humid one. The other factors I mentioned are of a relatively minor consequence. If water loss by transpiration exceeds water intake by the roots, a water deficit develops in the plant, expansion of growing cells ceases, and the plant stops growing. If the water deficit continues the plant wilts. If it becomes too severe, the plant tissues wither and die. By what means can plant cells absorb and retain water when the atmosphere is evaporating it form the leaves and the soil is impeding its entry in to the roots? 
An illustration: When salt is applied to shredded cabbage, the tissue fluids diffuse out of the leaf slices and dissolve the salt making a brine. The cabbage leaves become limp, or flaccid. If the limp leaves are washed free of brine and placed in pure water, they again become stiff or turgid. This exemplifies one of the most fundamental characteristics of the water relationships of plants. It is the diffusion of water through a semipermeable membrane more commonly called osmosis. When two solutions differing in concentration are separated by a membrane impermeable to the dissolved substance, water moves from the solution of lower concentration to the one of higher concentration. “ Suppose you wanted to measure the average number of plants that showed wilt during a day under the current farming system. Suppose that your current sampling method is to start with a random plant, then choose 1 out of every k plants by rows starting at 4 p.m. until you reach 20% of the plants. You will do this each day for a week in order to obtain a relatively random sample of plants from the fields. From this sample you would count the number of plants that showed any wilting, find the average for each day, find the percentage of plants that showed wilt, and then generalize the result to the whole field. Then after looking for any trends you would make a suggestion to either keep the current farming method or modify the farming method. Situation # 1: A significant portion of the field is shaded by larger trees in such way that the shade would influence the incident light hitting 40% of the plants at 4 p.m. Situation # 2: The farmer forgot to put out the watering system two of the nights during the week. Situation # 3: You happened to choose a week where the weather was hitting record highs. (Would the extra wilting that you probably saw be cause to change the farming system?) 
Situation # 4: You happened to choose a week of record high winds so that the water transpired by the plants did not stay under the plants, causing a higher percentage of plants to wilt.
Situation # 5: The field contains two similar looking plants that have different preferred growing temperatures. One of the plants wilts more easily under the normal temperatures for the week that you chose to take the sample.
Situation # 6: The farmer has managed to water the plants in such a way that they are not wilting, but they have also stopped growing.
For two of the above situations do the following:
Describe the problem with the sampling method. Make sure to include why the situation would skew the results from a 1 in k systematic sampling method.
Classify each type of bias that shows up.
Also make sure to include which sampling method could correct the bias and how.
Follow up questions:
Describe the difference between a sampling bias and a cause of wilting.
If you found a significant portion of plants that wilted (say over 10% of the sample) what might be some causes for the wilting?
(These examples show how we explore sampling in an actual experiment. Students have to be able to decide which sampling methods work best for a given farming situation.)

AP Problem: Blocking (AP Statistics)
(This is an AP problem that we go over to explore blocking and random assignment.)

Assignment: Each student must clearly state a null hypothesis for the difference between the mean electrical usage in their house when everyone is awake and the mean electrical usage in their house after everyone has gone to bed. They need to create a sampling plan to get a random sample during those times. Then after gathering their data and making their calculations they need to determine if the data shows any significant difference from their prediction. They should speculate about any causes given their conclusion.
(This is an example of how students get involved in designing experiments on their own.
Here they have to create their own sampling plan and decide how they will measure electrical usage at the different times.)

Assignment: Each student must clearly state a null hypothesis for the difference of the mean temperature of one floor of his or her family’s house compared with the mean temperature of another floor of his/her family’s house. The temperature readings should happen at the same time. The students should then comment on how well paired the data sets are. They should also speculate as to any causes given their conclusion.
(This is another example of how students get involved in designing experiments on their own.)

Anticipating Patterns

Handout: Probability – Things to do in the face of a problem in probability (AP Statistics)
The first thing to look for when doing a problem in probability is to decide which definition of probability the problem requires. The classical definition of probability often follows theoretical predictions. The relative frequency definition of probability often follows strings of events from which an experimenter records the frequency of successes.
Once you know which definition to use, the next big question to always ask is: ‘What is the sample space?’ Directly following this question, as often as you can, you should write out all the outcomes in the sample space (time permitting). Then you should determine the size of the sample space.
Sample spaces for the classical definition of probability look like a finite set listing all the potential outcomes based on the situation. For rolling two dice the sample space is the following:
{ (1,1); (1,2); (1,3); (1,4); (1,5); (1,6);
(2,1); (2,2); (2,3); (2,4); (2,5); (2,6);
(3,1); (3,2); (3,3); (3,4); (3,5); (3,6);
(4,1); (4,2); (4,3); (4,4); (4,5); (4,6);
(5,1); (5,2); (5,3); (5,4); (5,5); (5,6);
(6,1); (6,2); (6,3); (6,4); (6,5); (6,6) }
But the size of the sample space for the classical definition of probability DOES NOT CHANGE.
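The classical sample space for two dice can also be listed mechanically, which is a handy check that no outcome was skipped when writing it out by hand. The following is a minimal Python sketch, not part of the original handout; the event "the two dice sum to 7" is a hypothetical choice made only for illustration.

```python
from itertools import product

# Classical sample space for rolling two six-sided dice:
# every ordered pair (first die, second die).
sample_space = list(product(range(1, 7), repeat=2))

# Hypothetical event chosen for illustration: the two dice sum to 7.
event = [roll for roll in sample_space if sum(roll) == 7]

print("size of sample space:", len(sample_space))
print("size of event:", len(event))
print("size of event / size of sample space:", len(event) / len(sample_space))
```

Because the list is generated from the rules of the situation rather than from recorded rolls, its size stays the same every time the sketch is run.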
Sample spaces for the relative frequency definition of probability look like strings of experiment results. In rolling two dice the sample space might look like { (1,4); (1,6); (5,2); (6,1) }, which has only four rolls, or it might have a string of 200 rolls. But with the relative frequency definition of probability the size of the sample space CAN CHANGE.
Once you have done this you can then proceed to the problem and describe as clearly as possible, in terms of the outcomes, which event the problem focuses on.
The last goal is to determine the size of the event. To do this look at the outcomes in the sample space and circle all the outcomes that belong to the event E for the problem. Once you have done this you can find the quotient
$$P(E) = \frac{\text{size of event space } E}{\text{size of sample space } S}$$
IMPORTANT: Make sure to remember that the classical definition of probability and a previous relative frequency measurement act like the prediction or theory, and that a new relative frequency measurement is like the experiment that tests the theory. If the theory (either from the classical definition or a previous relative frequency measurement) is a good one then the results from the new relative frequency string should agree with the predictions.
The above process is one of the major activities of science. The above process also belongs to any discipline that makes measurements. That is how important statistical analysis is in our society.
Helpful Hints:
Do NOT try to guess the probabilities in a given problem. Instead ALWAYS use the formulas to calculate a probability.
For disjoint events OR means ADD the probabilities.
For independent events AND means MULTIPLY the probabilities.
One thing that can help is if the problem uses the words ‘find the probability of E given that….’ Here you should use the formula for conditional probability.
$$P(E \mid F) = \frac{P(E \cap F)}{P(F)}$$
Notice that the formula for conditional probability still looks like $\frac{\text{size of event space } E}{\text{size of sample space } S}$, with the only difference that the sample space is now F instead of S.
Another observation that can help is if you can divide the sample space into a disjoint collection of sets whose union is the whole sample space. Often a problem will have options that divide that sample space into clear disjoint sets. This often indicates that you should use either the total probability rule or Bayes’ Theorem.
The law of large numbers says that the relative frequencies in a long string of experiments will get close to the actual probabilities of an event. It does not mean that the actual frequencies will ‘level out’, however. This means that when flipping a fair coin the percentage of heads will get closer to 50% as you increase the number of flips, but the actual number of heads minus the actual number of tails can grow to be quite large (53,000 heads and 49,000 tails yields a percentage very close to 52% heads but the difference between the number of heads and tails is 4000).
When estimating probabilities empirically…
It is fairly common practice to use observed long – run proportions to estimate probabilities. The process of estimating probabilities is simple:
Observe a very large number of chance outcomes under controlled circumstances.
Estimate the probability of an event by using the observed proportion of occurrence and by appealing to the interpretation of probability as a long run relative frequency and the law of large numbers.
Two way tables can help keep track of the information concisely.
Keep in mind the concept of independence and conditional probability when looking at the results.
You have to be careful with statements that use the ‘law of averages’, which is different from the law of large numbers.
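Before turning to the law of averages, the coin example above can be checked numerically. The sketch below is an illustration only, not part of the original handout: it recomputes the proportion and the gap for the quoted counts, then simulates fair coin flips to show the proportion of heads settling near 0.5 while the heads-minus-tails gap is free to wander. The function name `flip_summary`, the seed, and the flip counts are arbitrary assumptions.

```python
import random

# Counts quoted in the coin example.
heads, tails = 53_000, 49_000
print("proportion of heads:", heads / (heads + tails))
print("heads minus tails:", heads - tails)

def flip_summary(n_flips, seed=0):
    """Simulate n_flips fair coin flips.

    Returns (proportion of heads, number of heads minus number of tails).
    The seed is fixed only so that repeated runs give the same result.
    """
    rng = random.Random(seed)
    n_heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    n_tails = n_flips - n_heads
    return n_heads / n_flips, n_heads - n_tails

for n in (100, 10_000, 1_000_000):
    proportion, gap = flip_summary(n)
    print(n, "flips:", round(proportion, 4), "heads, gap of", gap)
```

Comparing the rows shows the idea: the proportion is what the law of large numbers describes, while the raw gap between heads and tails is not required to shrink.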
Law of Averages (Bad Version)
For every occurrence in favor of an event E there must be an occurrence that is not in favor of event E.
Law of Averages (Okay Version)
Eventually even unlikely events are bound to happen.
Independence and the Law of Averages
Notice that the law of averages still cannot say the following: ‘If you have flipped 10 tails in a row then it is more likely that the next one will be a heads’. Independence of flips guarantees that each flip of a fair coin shows heads with probability 50%, no matter what came before. What is unlikely is the particular string of 10 tosses that you specifically got (10 tails in a row is just as unlikely as any one particular sequence of 9 tails and 1 head).
(This is an example of a handout I give my students on probability. It includes the basic rules of probability.)

AP Problem: Variability in Inferential Statistics (AP Statistics)
Example 1.2 from Statistics and Data Analysis Second Edition (p 7).
[Figure: histogram titled ‘Contaminant Concentration (in parts per million in well water)’. Horizontal axis: average contamination (the average of five measurements) in parts per million, from 10 to 19. Vertical axis: frequency (averages taken over 200 days), from 0 to 45.]
As part of its regular water quality monitoring efforts, an environmental control board selects five water specimens from a particular well each day. The concentration of contaminants in parts per million (ppm) is measured for each of the five specimens, and then the average of the five measurements is calculated. The histogram above summarizes the average contamination values for 200 days.
Now suppose that a chemical spill has occurred at a manufacturing plant about 1 mile from the well. It is not known whether a spill of this nature would contaminate ground water in the area of the spill and, if so, whether a spill this distance from the well would affect the quality of well water. One month after the spill, five water specimens are collected from the well.
Which of the following average measurements would be convincing evidence that the well water was affected by the spill?
(a) 10 (b) 12 (c) 16 (d) 18 (e) 20
Type of Problem – Bar Charts and Inferential Statistics
Focus 1 – What is a ‘normal’ contaminant level for the well water?
Answer E
Before the spill, the average contaminant concentration varied from day to day. An average of 16 ppm would not have been an unusual value, and so seeing an average of 16 ppm after the spill isn’t necessarily an indication that contamination has increased. On the other hand an average as large as 18 ppm is less common, and an average of 20 ppm is not at all typical of the pre-spill values. Therefore, 20 ppm makes sense as an answer.
(This is an AP problem that we cover on the first day of school that includes variability.)

Topic: Normal Distributions
Description: The general shape of the normal distribution. How to calculate areas using a table. How the z – score relates to calculating areas. How probability notation works with normal distributions. Upper tailed, lower tailed and two – tailed calculations. Symmetry of the normal distribution.
Activity: In this activity we try to measure the length of our classroom. From the data that we collect we calculate the mean and standard deviation. We also calculate a probability distribution function from the data.
Assignment: Each student will have to sample the electric meter readings when they get home during some time frame. They will have to think about how they will sample the meter. Then after they have gathered the data they need to calculate the mean and standard deviation. They also need to think about whether the data that they have looks normal. Assuming that the data is normal they will also have to make some guesses as to how many measurements they think will fall in a certain range of values.
Textbook HW: Sec 7.6 # 7.67, 7.69, 7.71, 7.73
(This is on the syllabus.)
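The area calculations named in the Normal Distributions description (z – scores, lower tailed, upper tailed and in-between areas) can be sketched in Python with the standard library's `statistics.NormalDist` in place of a printed table. This is an illustration only: the classroom-length measurements and the cut-off values 9.8 and 10.2 are made-up numbers, not data from the course.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical classroom-length measurements in meters (made up for illustration).
measurements = [9.8, 10.1, 10.0, 9.9, 10.3, 10.2, 9.7, 10.0, 10.1, 9.9]

x_bar = mean(measurements)
s = stdev(measurements)  # sample standard deviation

# Normal model that uses the sample mean and standard deviation as its parameters.
model = NormalDist(mu=x_bar, sigma=s)

def z_score(x):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - x_bar) / s

lower_tail = model.cdf(9.8)                # P(X < 9.8)
upper_tail = 1 - model.cdf(10.2)           # P(X > 10.2)
middle = model.cdf(10.2) - model.cdf(9.8)  # P(9.8 < X < 10.2)

print("mean and standard deviation:", round(x_bar, 3), round(s, 3))
print("z-scores of the cut-offs:", round(z_score(9.8), 2), round(z_score(10.2), 2))
print("lower tail, middle, upper tail:",
      round(lower_tail, 3), round(middle, 3), round(upper_tail, 3))
```

The three areas add up to 1, and multiplying the middle area by the number of measurements gives a rough guess at how many readings should fall in that range if the normal model is reasonable.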
Topic: The Sampling Distribution of the Sample Mean
Description: Comparing the distribution of the single measurements of a random variable X with the distribution of the averages from samples of size N. How the distribution of the averages from several samples of size N can have a different shape than the original distribution. How the distribution of the averages from different samples of size N becomes more ‘Gaussian’ as the size of the sample increases. How the mean of the distribution of the averages from samples of size N gets closer to the actual population mean from which the samples come. How the standard deviation of the averages from samples of size N gets smaller as N increases. The Central Limit Theorem.
Activity: In ‘Cents and Central Limit Theorem’ we explore how the Central Limit Theorem works. The distribution that we take samples from is the distribution of the dates of 100 pennies.
Assignment: Each student will then look up the number of coins minted in different years to help explain why the distribution of dates from pennies that we saw in class looked the way it did.
Textbook HW: Sec 8.2, 8.17, 8.19, 8.21, 8.23

(This is on the syllabus.)

(This is an AP problem that we go over that explores combining independent random variables.)

Statistical Inference
The syllabus includes detailed coverage of chapters on confidence intervals for a proportion, the difference between two proportions, the mean, the difference between two means, and the slope of the regression line. The syllabus also covers hypothesis testing and chi-squared tests: goodness of fit and tests for homogeneity/independence. See chapters 5 and 9 – 12.

The course draws connections between all aspects of the statistical process including design, analysis, and conclusion.

Projects: Each student must write an article for the school newspaper. Before they can submit an article they must design a survey or experiment, create a sampling plan, gather the data, analyze the data and come to a conclusion given the data set.
Then they have to write an article summarizing what they found along with at least one graphical aid for any reader of that article.

The course teaches students how to communicate methods, results and interpretations using the vocabulary of statistics.

Assignment: Correlation (AP Statistics)
Look up “linear relationships in science” in Google’s Scholarly index and find a pair of quantities that have a linear relationship. Make sure that you can identify the raw data that the scholars used to demonstrate the linear relationship between the variables. Read the article and summarize it, including the following information:
Describe which quantities have a linear relationship.
Include a scatter plot demonstrating the linear relationship.
Calculate the correlation for the raw data the scholars used to demonstrate the linear relationship.
Describe how your calculation corresponds to the results in the paper that you read.

(This is an example of one assignment where each student must look up an existing study that demonstrates a particular statistical relationship between variables. Here they have to summarize the methodology and statistical analysis, and they have to interpret and explain the relationship using statistical vocabulary [here, correlation].)

The course teaches students how to use graphing calculators to enhance the development of statistical understanding through exploring data, assessing models, and/or analyzing data.

We will cover how to use the graphing calculators each time we encounter a feature that the graphing calculator can accommodate. These features include the following: calculating the mean, calculating the standard deviation, calculating the median, creating scatter plots, creating box plots, linear regression, non-linear regression, 1-sample t and z tests, 2-sample t and z tests, z confidence intervals, t confidence intervals, chi-squared tests for goodness of fit, and chi-squared tests for homogeneity and independence.
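The Central Limit Theorem ideas from the sampling distribution unit above (averages from samples of size N vary less as N grows, and centre on the population mean) can also be explored in software alongside the calculator work. The following is a minimal sketch, not part of the original course materials: the skewed starting population, the sample sizes and the number of trials are all assumptions chosen for illustration.

```python
import random
import statistics


def sample_means(population, n, trials, rng):
    """Draw `trials` random samples of size n (with replacement) and return their averages."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(trials)]


# Fixed seed so that every run produces the same simulated data.
rng = random.Random(0)

# Assumed starting distribution: strongly right-skewed, far from 'Gaussian'.
population = [rng.expovariate(1.0) for _ in range(10_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# For each sample size N, record the mean and standard deviation of the averages.
results = {}
for n in (1, 5, 25, 100):
    means = sample_means(population, n, 2000, rng)
    results[n] = (statistics.fmean(means), statistics.stdev(means))

print(f"population mean = {pop_mean:.3f}, population sd = {pop_sd:.3f}")
for n, (mean_of_means, sd_of_means) in results.items():
    # Theory predicts the averages have standard deviation pop_sd / sqrt(N).
    print(f"N = {n:3d}: mean of averages = {mean_of_means:.3f}, "
          f"sd of averages = {sd_of_means:.3f}, "
          f"pop_sd / sqrt(N) = {pop_sd / n ** 0.5:.3f}")
```

Swapping in a different starting population (for example, a list of penny dates) leaves the rest of the sketch unchanged, which is one way to see that the result does not depend on the shape of the original distribution.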
The course teaches students how to use graphing calculators, tables, or computer software to enhance the development of statistical understanding through performing simulations.

We use simulations to help make the central limit theorem clearer and to show that it works independently of the starting distribution.

The course demonstrates the use of computers and/or computer output to enhance the development of statistical understanding through exploring data, analyzing data, and/or assessing models.

Activity: Food and Agriculture Organization (AP Statistics)
Materials: Laptop

Statistics is a tool that is meant to analyze and help us understand data. To this end we will need several sources of data. The first source that we will make use of is the Food and Agriculture Organization of the United Nations (FAO).

Please go to http://faostat.fao.org/
Click on ‘want to register?’
Register yourself at FAO by filling in the following information:
Your name
Manlius Pebble Hill for the organization
Educational institution for the type of organization
USA for the country
Check the first column of boxes (and any others that interest you)
Use your school email address
Make up a password

We will use data from this website today and throughout the year.

Go to http://www.fao.org/economic/ess/en/ and find the current agricultural yearbook. Find the spreadsheets for the following:
Total and Agricultural Population (including forestry and fisheries) (A1)
Human Development Index and Poverty (G4)

Find the definitions for total population, agricultural population, human development index and poverty. Find the units for each of the categories. Copy the columns for 2009 in each of the following categories into a new Excel worksheet titled ‘Excel Practice 1 AP Statistics’. Make sure the countries match the data in each row.
Name of Country
Total population
Agricultural population
Human development index
Poverty Prevalence
Year Poverty Prevalence was recorded

In a new column you are going to calculate the ratio of agricultural population to total population. Label the column as such and then in the first row (lining up with the first country) place a formula that looks like ‘=E11/D11’, which should represent the ratio of agricultural population to total population for the first country. In this formula the agricultural population for Afghanistan is in column E row 11 and the total population for Afghanistan is in column D row 11. Then copy the formula in that cell and paste it into the rest of the cells in that column, all the way down to the last country. The numbers should all be different and represent each country’s ratio.

Note: every formula in an Excel cell must be preceded by an equal sign.

Excel has a list of statistical functions that you can use. These are listed under ‘statistical functions’ in the help search menu. You will be using several of these functions, including the following:
=AVERAGE(range of cells) This produces the average of all the numbers that you highlighted.
=MEDIAN(range of cells) This finds the median, or middle number, of the cells that you highlighted.
=SUM(range of cells) This adds up the numbers in the cells that you highlighted.
=STDEV(range of cells) This produces the sample standard deviation for the cells that you highlighted.
=CORREL(first range of cells, second range of cells) This produces the correlation between the variables represented in the two given ranges of cells [usually two columns or two rows].

Do the following calculations with the data that you copied from the FAO statistical yearbook:
Find the average agricultural population.
Find the median agricultural population.
Find the sum of the agricultural population and compare it to the world agricultural population.
What should be true about these numbers?
Find the standard deviation of the agricultural population.
Find the correlation between the column labeled Human Development Index and the column that should represent the ratio ‘agricultural population : total population’.

Make the following scatter plots and label the axes and each scatter plot:
Human Development Index versus Proportion of Population that Farms
Poverty Prevalence versus Proportion of Population that Farms

What is the definition of the term ‘human development index’?
What is the definition of the word ‘poverty’?
What do you notice about the overall trends in each scatter plot?
Does it look like there is any relationship between the different variables that you plotted?
What do the points on the x axis mean?
What would this suggest about what a nation should do to improve its human development index?
Would your solution in the previous question automatically reduce poverty in a given country? Why or why not?
What is considered the typical trend with respect to the percentage of people that farm?
Why does the USA’s poverty prevalence not show up in the table?
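For students who want to check their spreadsheet work, the Excel steps in this activity (the ratio column, then AVERAGE, MEDIAN, SUM, STDEV and CORREL) can be mirrored in a few lines of code. The following is a minimal sketch, not part of the original activity: the four rows are hypothetical placeholder values, not real FAO figures, and the variable and function names are assumptions made for illustration.

```python
import math
import statistics

# Hypothetical placeholder rows, NOT real FAO data:
# (country, total population, agricultural population, human development index)
rows = [
    ("Country A", 1000, 600, 0.40),
    ("Country B", 2000, 800, 0.55),
    ("Country C", 1500, 300, 0.70),
    ("Country D", 3000, 300, 0.85),
]

total_pop = [row[1] for row in rows]
agri_pop = [row[2] for row in rows]
hdi = [row[3] for row in rows]

# Same idea as filling '=E11/D11' down the column: one ratio per country.
ratio = [a / t for a, t in zip(agri_pop, total_pop)]


def correl(xs, ys):
    """Pearson correlation of two equal-length lists, the quantity Excel's CORREL reports."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


average_agri = statistics.mean(agri_pop)    # like =AVERAGE(range)
median_agri = statistics.median(agri_pop)   # like =MEDIAN(range)
sum_agri = sum(agri_pop)                    # like =SUM(range)
stdev_agri = statistics.stdev(agri_pop)     # sample standard deviation, like =STDEV(range)
r_hdi_ratio = correl(hdi, ratio)            # like =CORREL(range1, range2)

print("ratios:", [round(value, 2) for value in ratio])
print("average:", average_agri, "median:", median_agri, "sum:", sum_agri)
print("sample standard deviation:", round(stdev_agri, 2))
print("correlation (HDI vs ratio):", round(r_hdi_ratio, 3))
```

With real yearbook data, countries that have a missing value in either column would need to be dropped before computing the correlation, just as blank cells need attention in the spreadsheet.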