Melaku Michael
Home Work Check 1
Notes: Section 1-3 (1.1-1.3)

Unit 1: Key Terms
Data: information coming from observations, counts, measurements, or responses.
Statistics: the science of collecting, organizing, analyzing, and interpreting data in order to make decisions.
Population: the collection of all outcomes, responses, measurements, or counts that are of interest.
Sample: a subset of a population.
Parameter: a numerical description of a population characteristic.
Statistic: a numerical description of a sample characteristic.
Descriptive statistics: the branch of statistics that involves the organization, summarization, and display of data.
Inferential statistics: the branch of statistics that involves using a sample to draw conclusions about a population.

Lecture Notes
Data consist of information coming from observations, counts, measurements, or responses. The collection of all observations for a particular variable is called a data set. Statistics is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions.

There are two types of data sets:
1. Population: the collection of all outcomes, responses, measurements, or counts that are of interest.
2. Sample: a subset (part) of a population.

The diagram below illustrates the relationship between a population and a sample. The diagram shows that the SAMPLE is only part of the POPULATION. In most cases, a sample is studied and the results are used to make a reasonably accurate prediction about the entire population. Sample size will be discussed in later sections.

IMPORTANT NOTE: The sample is the part of the population from which information is obtained.

EXAMPLE: In a recent survey, 3002 American adults were asked if they read news on the Internet at least once a week. Six hundred of the adults said yes. Identify the population and the sample.
SOLUTION:
Population: responses from all American adults
Sample: responses of the 3002 American adults surveyed

EXAMPLE: There are 634 students registered for an online math course in Gwinnett County during the summer of 2005. Forty-two of the math students taking statistics were asked to respond to an online survey. Identify the population and the sample.
Population: all 634 students who registered for an online math course
Sample: the 42 math students taking statistics who were asked to respond to the survey

*******************************************************************
A parameter is a numerical description of a population characteristic. A statistic is a numerical description of a sample characteristic.

EXAMPLE: Decide whether the numerical value describes a population parameter or a sample statistic.
A recent survey of a sample of MBAs reported that the average starting salary for an MBA is less than $65,000: SAMPLE STATISTIC (the value comes from a sample of MBAs).
Starting salaries for the 667 MBA graduates from the University of Chicago School of Business increased 8.5% from the previous year: POPULATION PARAMETER (the 8.5% is based on all 667 graduates).

EXAMPLE: Decide whether the numerical value describes a population parameter or a sample statistic. The average for the statistics class on the first test is 86.5. John scored a 93 on the test.
Sample statistic: John's score of 93
Population parameter: the class average of 86.5

********************************************************************
There are two major branches of statistics:
1. Descriptive: involves the organization, summarization, and display of data.
2. Inferential: involves using a sample to draw conclusions about a population.

EXAMPLE: A large sample of men, aged 48, was studied for 18 years. For unmarried men, 60% to 70% were alive at age 65. For married men, 90% were alive at age 65. Which part of the study represents the descriptive branch of statistics?
What conclusions might be drawn from this study using inferential statistics?
SOLUTION: Descriptive statistics involves statements such as "for unmarried men, 60% to 70% were alive at age 65, and for married men, 90% were alive at age 65." A possible inference drawn from the study is that being married is associated with a longer life for men.

IMPORTANT NOTE: If the intent of the study is to examine and explore the information obtained for its own interest only, the study is DESCRIPTIVE. However, if the information is obtained from a sample of a population and the intent of the study is to use that information to draw conclusions about the population, the study is INFERENTIAL.

EXAMPLE: Suppose we want to look at the number of home runs hit last year by the Atlanta Braves. Descriptive statistics would report the number of home runs hit. Inferential statistics would use that number to predict the number of home runs that may be hit next year.

Unit 2: Key Terms
Qualitative data: attributes, labels, or nonnumerical entries.
Quantitative data: numerical measurements or counts.
Nominal level of measurement: data that are qualitative only and are categorized using names, labels, or qualities.
Ordinal level of measurement: qualitative or quantitative data that can be arranged in order, but differences between data entries are not meaningful.
Interval level of measurement: quantitative data that can be ordered and for which meaningful differences between data entries can be calculated. At the interval level, a zero entry simply represents a position on a scale; the entry is not an inherent zero.
Ratio level of measurement: data similar to data at the interval level, with the added property that a zero entry is an inherent zero. A ratio of two data values can be formed so that one data value can be expressed as a multiple of another.

Lecture Notes
There are two types of data: qualitative data and quantitative data. Qualitative data consist of attributes, labels, or nonnumerical entries.
Examples of qualitative data are gender, job title, hair color, eye color, and marital status.

Quantitative data consist of numerical measurements or counts. Examples of quantitative data are age, height, weight, salary, and grade point average.

The table below shows how information can be separated into two data sets. The table shows the model and base price of various vehicles.

Model (qualitative data)    Base Price (quantitative data)
Escort LX                   $11,430
Ranger 4 x 2 XL             $11,485
Contour LX                  $14,460
Taurus LX                   $18,445
Windstar                    $19,380
Explorer XL 4 x 2           $21,560
Crown Victoria              $21,135

The model of car is the qualitative data, and the base price of each model is the quantitative data.

EXAMPLE: More than 20,000 men and women set out to run the 107th Boston Marathon. The race covered 26.2 miles. Thousands of people watched the race in Boston, and millions more watched it on television. The winning time for the men was 2 hours and 10 minutes, and the winning time for the women was 2 hours and 25 minutes. The qualitative data for the Boston Marathon are gender (men and women). The quantitative data for the Boston Marathon are the number of men and women, the distance in miles, and the finishing times.

EXAMPLE: Human beings have one of four blood types: A, B, AB, or O. What kind of data do you receive when you are told your blood type?
ANSWER: Qualitative; the variable is "blood type."

************************************************************************
There are four levels of measurement for data. The level of measurement determines which statistical calculations are meaningful. The four levels of measurement are:
1) Nominal
2) Ordinal
3) Interval
4) Ratio

Data at the nominal level are qualitative only. Data at this level are categorized using names, labels, or qualities. NO mathematical computations can be made at this level. Television network affiliates are examples of data at the nominal level. KATU, KGW, KOIN, and KPDX are just some of the network affiliates.
The call letters of the network affiliates are simply names; we cannot do any computations with this set of data. School names are also examples of data at the nominal level. Dacula High School, Duluth High School, and Phoenix High School are merely names that we can identify.

Data at the ordinal level are qualitative or quantitative. Data at this level can be arranged in order, but differences between data entries are not meaningful. The top 5 TV programs can be listed in order from 1 to 5. An example of this would be:
1. Seinfeld
2. E.R.
3. Veronica's Closet
4. Friends
5. NFL Monday Night Football

The difference between a rank of 1 and a rank of 5 has no mathematical meaning. However, these data are ordinal because the rankings can be listed in order.

Data at the interval level are quantitative. The data can be ordered, and you can calculate meaningful differences between data entries. At the interval level, a zero entry simply represents a position on a scale; the entry is not an inherent zero. An inherent zero is a zero that implies "none." For example, the amount of money you have in a savings account could be zero dollars; that zero represents no money. However, a temperature of 0 degrees Celsius does not represent a condition in which no heat is present. The 0-degree temperature is simply a position on the Celsius scale; it is not an inherent zero.

An example of data at the interval level would be the years in which the New York Yankees won the World Series. They are: 1923, 1927, 1928, 1932, 1936, 1937, 1938, 1939, 1941, 1943, 1947, 1949, 1950, 1951, 1952, 1953, 1956, 1958, 1961, 1962, 1977, 1978, 1996, and 1998. Finding the ratio of the years 1958 and 1923 has no mathematical meaning, so these data are not at the ratio level. However, finding the number of years between the first and last years that the Yankees won the World Series (1998 - 1923 = 75 years) does have mathematical meaning.
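To make the interval-level idea concrete, here is a small Python sketch using the championship years listed above. The variable names are illustrative only, not from the text. It computes differences between years, which are meaningful at the interval level, and deliberately does not treat a ratio of years as meaningful.

```python
# Years the New York Yankees won the World Series, as listed in the notes (through 1998).
yankees_titles = [
    1923, 1927, 1928, 1932, 1936, 1937, 1938, 1939, 1941, 1943, 1947, 1949,
    1950, 1951, 1952, 1953, 1956, 1958, 1961, 1962, 1977, 1978, 1996, 1998,
]

# Interval level: the difference between two years is meaningful.
span = yankees_titles[-1] - yankees_titles[0]
print(f"Years from first to last title: {span}")  # prints: Years from first to last title: 75

# Gaps between consecutive titles are also meaningful differences.
gaps = [later - earlier for earlier, later in zip(yankees_titles, yankees_titles[1:])]
longest = max(gaps)
start = yankees_titles[gaps.index(longest)]
print(f"Longest wait between titles: {longest} years (after {start})")  # 18 years (after 1978)

# A ratio such as 1958 / 1923 could be computed, but it has no meaning here:
# the calendar's starting point is arbitrary, not an inherent zero, so these
# data are not at the ratio level.
```

The same subtraction would work for temperatures in degrees Celsius, and the same caution about ratios applies there.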
Data at the ratio level are similar to data at the interval level, with the added property that a zero entry is an inherent zero. A ratio of two data values can be formed so that one data value can be expressed as a multiple of another. An example of data at the ratio level is shown in the table below:

1997 American League Home Run Totals (by team)
Team           Home runs
Anaheim        161
Baltimore      196
Boston         185
Chicago        158
Cleveland      220
Detroit        176
Kansas City    158
Milwaukee      135
Minnesota      132
New York       161
Oakland        197
Seattle        264
Texas          187
Toronto        147

Using these data, you can find differences and ratios that are meaningful. For example, Seattle hit twice as many home runs as Minnesota did (264 = 2 x 132).

The table below summarizes meaningful operations at the four levels of measurement.

Meaningful operation                                    Nominal   Ordinal   Interval   Ratio
Put data in categories                                  Yes       Yes       Yes        Yes
Arrange data in order                                   No        Yes       Yes        Yes
Subtract data values                                    No        No        Yes        Yes
Determine if one data value is a multiple of another    No        No        No         Yes

The table below indicates the type of data found at each level.

Level of measurement    Type of data
Nominal                 Qualitative
Ordinal                 Qualitative or quantitative
Interval                Quantitative
Ratio                   Quantitative

View the chart on page 11 of your text to see examples and meaningful calculations for the four levels of measurement.

Unit 3: Key Terms
Statistical study: the collecting, analyzing, and reporting of data.
Experiment: an action whose outcome cannot be predicted with certainty.
Simulation: the use of a mathematical or physical model to reproduce the conditions of a situation or process.
Census: a count or measure of an entire population.
Sampling: a count or measure of part of a population.
Survey: an investigation of one or more characteristics of a population.
Simple random sample: a sample in which every member of the population has an equal chance of being selected.
Stratified sample: a sample in which members of the population are separated into groups with similar characteristics, such as age, gender, or ethnicity, and members are then selected from each group.
Cluster sample: a sample in which the unit for sampling is a naturally occurring subgroup; whole subgroups are selected, and every member of a selected subgroup is used.
Systematic sample: a sample in which each member of the population is assigned a number, and members are selected at regular intervals after a starting number is chosen.
Convenience sample: a sample that consists only of the available people.
Placebo: a "dummy" treatment that has no real effect on the outcome being studied.

Lecture Notes
The goal of every statistical study is to collect data and then use the data to make a decision. The process used to collect the data plays the most important role in the validity of your findings. If the process is flawed, your decision could be called into question.

Use the following GUIDELINES when designing a STATISTICAL STUDY:
1. Identify the variable(s) of interest and the population of the study.
2. Develop a detailed plan for collecting data. Make sure the data are representative of the population.
3. Collect the data.
4. Describe the data with descriptive statistics techniques.
5. Make decisions using inferential statistics. Identify any possible errors.

There are several ways to collect data. The focus of the study dictates the best way to collect the data. The following table shows four methods of data collection.

Method: Perform an experiment
Characteristics: A treatment is applied to PART of the population. The other PART of the population is used as a control group and is given NO treatment or a PLACEBO. Responses from both groups are compared. Experiments are often "double blind": neither the researcher nor the subjects know which subjects are receiving the treatment and which are receiving no treatment or a placebo.
Example: Testing the effect of imposing a new marketing strategy in a certain region.

Method: Use a simulation (the use of a mathematical or physical model to reproduce the conditions of a process)
Characteristics: Computers, tables, or calculators are used in the collection of data. Helps the researcher study situations that are impractical or dangerous to create in real life. Saves time and money.
Example: Automobile manufacturers use simulations with dummies to study the effects of crashes on humans.

Method: Take a census (a count or measure of an entire population)
Characteristics: Provides complete information. Costly, difficult to perform, and takes an enormous amount of time.
Example: Determine the population of Gwinnett County.

Method: Use sampling (a count or measure of PART of a population)
Characteristics: Used to predict population parameters. More practical than a census.
Example: Determine the population of a city in Gwinnett County to predict the population of Gwinnett County.

The imitation of chance behavior, based on a model that accurately reflects the experiment under consideration, is called a simulation. We use simulation because it is an effective tool for finding the likelihoods of complex results, because random digits from tables or calculators make it quick to simulate many repetitions, and because it gives good estimates of probabilities.

EXAMPLE: When using simulation for an experiment, follow these steps:
1. State the problem or describe the experiment.
   Toss a coin 10 times. What is the likelihood of a run of at least 3 consecutive heads or 3 consecutive tails?
2. State the assumptions.
   a. A head or a tail is equally likely to occur on each toss.
   b. Tosses are independent of each other.
3. Assign digits to represent outcomes.
   a. One digit simulates one toss of the coin.
   b. Odd digits represent heads; even digits represent tails.
4. Simulate many repetitions. One repetition of 10 digits is shown:
   Digits: 1 9 2 2 3 9 5 0 3 4
   Tosses: H H T T H H H T H T
5. State your conclusions.
   This repetition does produce a run of at least 3 consecutive heads (tosses 5 through 7). Repeating the process many times and recording the proportion of repetitions that contain such a run gives an estimate of the likelihood.

To generate random digits between any two specified values on the TI-83 calculator:
1. Press the MATH button.
2. Arrow over to PRB.
3. Scroll down to randInt.
4. Enter (beginning value, ending value, number of values).

EXAMPLE: Which method of data collection would you use to collect data for each study?
1. A study of the effect of exercise on senior citizens.
   Focus: Effect of exercise on senior citizens.
   Population: Collection of all senior citizens.
   Method of data collection: Experiment
2. A study of the effect of radiation fallout on senior citizens.
   Focus: Effect of radiation fallout on senior citizens.
   Population: Collection of all senior citizens.
   Method of data collection: Sampling
3. A study of the effect of learning statistics online.
   Focus: Effect of learning statistics online.
   Population: Collection of all online statistics students.
   Method of data collection: Sampling (if the population is large)

Sometimes a SURVEY can help with data collection. A survey is an investigation of one or more characteristics of a population. Most surveys are carried out on people by asking them questions. However, the wording of the questions can lead to biased results.

The following chart shows examples of different methods of data collection used in statistical studies.

Statistical study: data collection method (reason)
The effect of an asteroid colliding with Earth: Simulation, because it is impractical to create this situation.
The effect of aspirin on preventing heart attacks: Experiment, because the effect of a treatment (taking aspirin) is being measured.
The weights of all linemen in the National Football League: Census, because teams keep accurate records of all players.
Americans' approval rating of the U.S. president: Sampling, because it would be nearly impossible to talk to every American.

A biased sample is one that is NOT representative of the population. An example of a biased sample would be a sample consisting only of 18- to 22-year-old college students when the statistical study is meant to research the entire 18- to 22-year-old population of the country. Why is this sample biased? Because it includes only college students, while many 18- to 22-year-olds in the country are not in college. When collecting data, it is important to use correct sampling techniques.
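The five-step coin-toss simulation earlier in this unit can also be run on a computer instead of with a random digit table or the TI-83 randInt command. The Python sketch below is one possible way to do it, assuming Python's standard `random` module as the source of random digits; the function names (`digits_to_tosses`, `has_run`, `estimate_run_probability`) are hypothetical helpers, not from the text. It uses the same digit assignment as Step 3 (odd digit = heads, even digit = tails) and repeats Step 4 many times.

```python
import random

def digits_to_tosses(digits):
    """Step 3: one digit per toss; odd digits are heads ('H'), even digits are tails ('T')."""
    return ["H" if d % 2 == 1 else "T" for d in digits]

def has_run(tosses, length=3):
    """Return True if the sequence has `length` or more identical outcomes in a row."""
    run, previous = 0, None
    for toss in tosses:
        run = run + 1 if toss == previous else 1
        previous = toss
        if run >= length:
            return True
    return False

def estimate_run_probability(repetitions=10_000, tosses=10, seed=None):
    """Step 4: simulate many repetitions; return the fraction that contain a run of 3+."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # Comparable to randInt(0,9,10) on a TI-83: ten random digits from 0 to 9.
        digits = [rng.randint(0, 9) for _ in range(tosses)]
        if has_run(digits_to_tosses(digits)):
            hits += 1
    return hits / repetitions

# The single repetition worked by hand in the notes.
example = digits_to_tosses([1, 9, 2, 2, 3, 9, 5, 0, 3, 4])
print("".join(example), has_run(example))  # HHTTHHHTHT True

# Step 5: the estimated likelihood from many repetitions.
print(f"Estimated likelihood: {estimate_run_probability(seed=1):.3f}")
```

As a rough check, 178 of the 1024 equally likely sequences of 10 fair tosses have no run of 3, so the exact likelihood is 1 - 178/1024, about 0.826; the simulated estimate should land near that value, and a different seed gives a slightly different estimate.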
The results that you report can be questionable if the collection process is faulty.

IMPORTANT NOTE: If the data were collected solely because you were looking for a particular outcome, or if the subjects just happened to be your friends, the collection process could be the downfall of your study. Remember that the data should reflect what actually happens, not what you want the numbers to say.

The following table shows the different types of sampling techniques.

Sampling technique: characteristics
Simple random sample: Every member of the population has an equal chance of being selected.
Stratified sample: Members of the population are separated into groups with similar characteristics, and members are selected from each group.
Cluster sample: Subgroups are formed, some subgroups are selected, and each member of a selected subgroup is used in the sample.
Systematic sample: Each member of the population is assigned a number, and members are selected at regular intervals from a starting number.
Convenience sample: Consists only of the available people. This often leads to biased studies.

IMPORTANT NOTE: Keep in mind that the sample you choose will be used to draw conclusions about the entire population. Thus, the sample should be a representative sample; it should reflect as closely as possible the relevant characteristics of the population under consideration.

EXAMPLE: You want to determine the average weight of adult males. The sample should not consist of professional football players.

EXAMPLE: You want to determine the average SAT score for Gwinnett County high schools. The sample should not consist of students from only one high school.

EXAMPLE: You want to determine the opinion of students at your school regarding gun control. Identify the sampling technique you are using if you select the samples listed.
1. You select students who are in your statistics class.
   Sampling technique: Convenience sampling
2. You assign each student a number, and after choosing a starting number, question every 25th student.
   Sampling technique: Systematic sampling

View the table on pages 18-19 of your text to see additional examples of each sampling technique.
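The sampling techniques described in this unit can be sketched in Python on a made-up population. Everything here is hypothetical and chosen only for illustration: 500 numbered students with a grade level of 9 to 12 (the groups for stratifying), split into homerooms of 25 (the naturally occurring clusters). The helper names are illustrative, and the sketch relies on Python's standard `random` module.

```python
import random

# Hypothetical population: 500 students numbered 1-500, each in grade 9, 10, 11, or 12.
population = [{"id": i, "grade": 9 + i % 4} for i in range(1, 501)]

def simple_random_sample(pop, n, rng):
    """Every member has an equal chance of being selected."""
    return rng.sample(pop, n)

def stratified_sample(pop, key, n_per_group, rng):
    """Split into groups sharing a characteristic, then sample randomly within each group."""
    groups = {}
    for member in pop:
        groups.setdefault(member[key], []).append(member)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, n_per_group))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole subgroups and use every member of each chosen subgroup."""
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]

def systematic_sample(pop, step, rng):
    """Members are numbered; pick a random starting point, then take every `step`-th member."""
    start = rng.randrange(step)
    return pop[start::step]

rng = random.Random(2005)  # fixed seed so the run is repeatable
homerooms = [population[i:i + 25] for i in range(0, len(population), 25)]  # 20 clusters of 25

srs = simple_random_sample(population, 20, rng)
strat = stratified_sample(population, "grade", 5, rng)  # 5 students from each of 4 grades
clus = cluster_sample(homerooms, 2, rng)                # 2 whole homerooms of 25 students
syst = systematic_sample(population, 25, rng)           # every 25th student
conv = population[:20]  # convenience: just the first 20 "available" students (often biased)

for name, s in [("simple random", srs), ("stratified", strat), ("cluster", clus),
                ("systematic", syst), ("convenience", conv)]:
    print(f"{name:>13}: {len(s)} students")
```

The convenience line is included only for contrast: nothing random decides who is chosen, which is why the notes warn that it often leads to biased studies.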