Baseball Findings
The statistics behind the game
Harlan Thompson, Sungjin Cho, Ryan Fagan

An Introduction
• Throughout its long history, baseball has been the subject of many statistical studies. It lends itself well to statistics because very careful records are kept of everything that happens in every game.
• The topics that have been studied range from the effect of interleague play on team standings to the role of chance in streaks and slumps.
• Other topics of study include records and predicting the outcomes of games.
• We thought that looking at home runs and salary would be interesting because the great number of home runs hit and the inflation of salaries are both controversial topics.

Home Runs Per Year
How has the total number of home runs in major league baseball changed from year to year?

year  HR/game      year  HR/game
1975  1.3879       1987  2.1216
1976  1.1497       1988  1.501
1977  1.7303       1989  1.463
1978  1.4036       1990  1.575
1979  1.6301       1991  1.6064
1980  1.4658       1992  1.4639
1981  1.2685       1993  1.7778
1982  1.6044       1994  1.6399
1983  1.5674       1995  1.7994
1984  1.547        1996  2.0688
1985  1.7104       1997  2.0459
1986  1.8105       1998  2.084
                   1999  2.2749

Test #1
• We ran a regression with the year as the independent variable and the number of home runs per game as the dependent variable to find out the rate at which the number of home runs in the league is increasing.

[Figure: scatterplot of HR/game against year]

Results

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  1,    23) =   21.69
       Model |  .911664082     1  .911664082           Prob > F      =  0.0001
    Residual |  .966758514    23  .042032979           R-squared     =  0.4853
-------------+------------------------------           Adj R-squared =  0.4630
       Total |   1.8784226    24  .078267608           Root MSE      =  .20502

------------------------------------------------------------------------------
          hr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |   .0264817   .0056862     4.66   0.000     .0147189    .0382445
       _cons |   1.350108    .079607    16.96   0.000     1.185428    1.514787
------------------------------------------------------------------------------

Interpretation
• The 95% confidence interval for the coefficient of year is entirely positive (.0147 to .0382), which shows that the number of home runs per game has been increasing over these years.
• An R2 value of .4853 shows a positive relationship, although not a very strong one: year explains just under half of the variation in HR/game. This could be because many other factors can affect the number of home runs hit, such as weather or injuries to certain players.
• The coefficient of year is .0264817, so each year about .02648 more home runs are hit in each game. Over a 162-game schedule, that works out to a little over 4 more home runs each year (.02648 x 162 = 4.29).

Test #2
• We split up the home run data into 2 separate groups, 1976-1987 and 1988-1999.
• Then we ran a hypothesis test on the two groups to find out if their variances are equal, to determine whether or not we could use a paired t test on the data.
• We used the following hypotheses:
H0: var(HR '76-'87) = var(HR '88-'99)
HA: var(HR '76-'87) not= var(HR '88-'99)

------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
     hr1 |      12    1.584108     .073678     .255228    1.421944    1.746272
     hr2 |      12       1.775    .0807902    .2798653    1.597182    1.952818
---------+--------------------------------------------------------------------
   Comb. |      24    1.679554    .0570527    .2794998    1.561532    1.797577
------------------------------------------------------------------------------

Results
Ho: sd(hr1) = sd(hr2)
F(11,11) observed   = F_obs         = 0.832
F(11,11) lower tail = F_L = F_obs   = 0.832
F(11,11) upper tail = F_U = 1/F_obs = 1.202
Critical values at .05 significance level: (.288, 3.47)
Because the F statistic lies inside this interval, we cannot reject the null hypothesis.

Interpretation
• The variance in home run hitting did not differ significantly between the two periods.
• Therefore we can use these two sets of data in a paired t test to determine whether or not the number of home runs hit has increased.

Test #3
• Because we found that the two groups did not have an appreciable difference in variance, we can use a paired t test to determine whether or not the number of home runs hit per game has risen from the period 1976-1987 to the period 1988-1999.
• So we ran a hypothesis test on the two groups with the following hypotheses:
H0: mean HR ('76-'87) = mean HR ('88-'99)
HA: mean HR ('76-'87) not= mean HR ('88-'99)

Results
Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
     hr1 |      12    1.584108     .073678     .255228    1.421944    1.746272
     hr2 |      12       1.775    .0807902    .2798653    1.597182    1.952818
---------+--------------------------------------------------------------------
    diff |      12   -.1908917    .0665798     .230639   -.3374327   -.0443506
------------------------------------------------------------------------------
Ho: mean(hr1 - hr2) = mean(diff) = 0

Ha: mean(diff) < 0     t = -2.8671    P < t   = 0.0077
Ha: mean(diff) ~= 0    t = -2.8671    P > |t| = 0.0153
Ha: mean(diff) > 0     t = -2.8671    P > t   = 0.9923

Interpretation
• The mean for the years from 1976 to 1987 was 1.584108 HR/game vs. 1.775 HR/game from 1988 to 1999.
• We can reject our null hypothesis because we found t = -2.8671, which is beyond the two-sided .05 critical value (-1.96 under the normal approximation, about -2.20 for a t distribution with 11 degrees of freedom).
• The two-sided p-value is only .0153.
• Therefore, the mean number of home runs per game from 1988 to 1999 was significantly greater than the mean number from '76 to '87.
• So, the number of home runs per game does seem to be increasing over time.

Home Runs by Position
First we looked at last year's home runs by position for each team. The following is a sample of the data we accumulated:

Team      SS HR  1B HR  2B HR  3B HR  C HR  LF HR  CF HR  RF HR  TOT HR
Anaheim       6     36      9     47    14     35     25     34     206
NY Mets       4     22     25     24    13     15     17     18     138
San Fran     20     19     33     10    14     49     12     24     181

Next we calculated the total number of home runs and at bats, as well as the average number of home runs per at bat, from each position for the whole league (in order of performance):

Position       HR/AB       HRs    ABs
First Base     0.051925    752    14737
Left Field     0.047185    629    13098
Right Field    0.0462273   667    14314
Center Field   0.0391384   627    15623
Third Base     0.038288    523    13154
Catcher        0.0348063   381    10681
Shortstop      0.0243771   354    14050
Second Base    0.0239095   300    14535

Do some positions hit significantly more than the average?
• The league average of home runs per at bat is .0384.
• For each position, we used binomial hypothesis tests to test whether or not the number of home runs per at bat from that position differs significantly from the league average.
• For each position:
Ho: HR/AB = .0384
HA: HR/AB not= .0384
(Reject if |z| > 1.96)

Results
SIGNIFICANTLY BETTER (reject null)
• First Base: z = 7.978
• Left Field: z = 5.731
• Right Field: z = 5.104
ABOUT AVERAGE (fail to reject null)
• Center Field: z = 1.127
• Third Base: z = 0.812
• Catcher: z = -1.468
BELOW AVERAGE (reject null)
• Shortstop: z = -8.145
• Second Base: z = -11.143

Interpretation
• So, the tests indicate that first basemen, left fielders and right fielders are significantly above the mean in home run hitting.
• Shortstops and second basemen are significantly below the mean in home run hitting.
• Center fielders, third basemen and catchers are about average.
• This makes sense: the players at positions that require the most mobility (shortstop, second base) would not be expected to be as powerful as those who play positions that require less speed and agility.
• It is interesting that center fielders come out differently from the other outfielders; they do have to have a lot more flexibility and speed.

Does salary affect performance?
• We looked at team salary vs. number of wins to see if the amount of money paid to the players has a significant effect on a team's performance. Below is some of the data we used.

2000 Team               Payroll ($)     Wins
New York Yankees        $114,336,616    87
Los Angeles Dodgers     $105,040,202    86
New York Mets           $99,793,463     94
Boston Red Sox          $97,022,789     85
Atlanta Braves          $94,537,875     95
Cleveland Indians       $90,488,555     90
Arizona Diamondbacks    $87,029,013     85
St. Louis Cardinals     $80,749,563     95
Baltimore Orioles       $80,466,320     74
Texas Rangers           $72,683,709     71
Seattle Mariners        $69,861,939     91
Detroit Tigers          $68,586,561     79
Toronto Blue Jays       $66,814,275     83
Chicago Cubs            $65,297,578     65
Tampa Bay Devil Rays    $65,161,683     69
Colorado Rockies        $64,767,786     83
San Diego Padres        $64,144,989     76
San Francisco Giants    $59,566,105     97
Anaheim Angels          $59,198,764     82
Houston Astros          $58,294,429     72
Philadelphia Phillies   $53,894,196     65
Cincinnati Reds         $53,894,196     85
Oakland Athletics       $42,988,297     91
Chicago White Sox       $42,332,755     95
Milwaukee Brewers       $41,478,423     73
Montreal Expos          $39,477,830     67
Pittsburgh Pirates      $36,273,762     69
Kansas City Royals      $31,807,466     77
Florida Marlins         $30,941,620     79
Minnesota Twins         $23,499,966     69

[Figures: scatterplots of Wins vs. Payroll for 2000, 1999 and 1998]

Regression Results - 2000

  Source |       SS       df       MS              Number of obs =      30
---------+------------------------------           F(  1,    28) =    6.79
   Model |   3171.9669     1   3171.9669           Prob > F      =  0.0145
Residual |  13079.9633    28  467.141547           R-squared     =  0.1952
---------+------------------------------           Adj R-squared =  0.1664
   Total |  16251.9302    29  560.411387           Root MSE      =  21.613

------------------------------------------------------------------------------
 payroll |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    wins |   1.046749   .4017006     2.606   0.015       .2239026    1.869595
   _cons |  -19.44844   32.76286    -0.594   0.558      -86.56012    47.66324
------------------------------------------------------------------------------

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
 payroll |      30    65.30333   23.67301       23.5      114.3
    wins |      30    80.96667   9.991318         65         97

Regression Results - 1999

  Source |       SS       df       MS              Number of obs =      30
---------+------------------------------           F(  1,    28) =   24.74
   Model |  6826.84626     1  6826.84626           Prob > F      =  0.0000
Residual |  7726.42119    28  275.943614           R-squared     =  0.4691
---------+------------------------------           Adj R-squared =  0.4501
   Total |  14553.2675    29  501.836809           Root MSE      =  16.612

------------------------------------------------------------------------------
 payroll |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    wins |   1.225894   .2464638     4.974   0.000       .7210361    1.730753
   _cons |  -50.38485   20.16826    -2.498   0.019      -91.69766   -9.072042
------------------------------------------------------------------------------

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
 payroll |      30       48.79   22.40171       14.7         92
    wins |      30        80.9   12.51578         63        103

Regression Results - 1998

  Source |       SS       df       MS              Number of obs =      30
---------+------------------------------           F(  1,    28) =   33.43
   Model |  4844.58182     1  4844.58182           Prob > F      =  0.0000
Residual |  4057.20604    28  144.900216           R-squared     =  0.5442
---------+------------------------------           Adj R-squared =  0.5279
   Total |  8901.78786    29  306.958202           Root MSE      =  12.037

------------------------------------------------------------------------------
 payroll |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    wins |   .9553504   .1652225     5.782   0.000       .6169076    1.293793
   _cons |  -36.30338   13.56227    -2.677   0.012      -64.08443    -8.52233
------------------------------------------------------------------------------

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
 payroll |      30       41.08   17.52022        8.3       71.9
    wins |      30          81   13.52902         54        114

Interpretation
• The R2 value for the year 2000 (.1952) reflects only a weak relationship, although the coefficient of wins is still significant at the .05 level (P = 0.015). The years 1998 (.5442) and 1999 (.4691) reflect a considerably stronger relationship between total payroll and number of wins.
• The regressions have payroll (in millions of dollars) as the dependent variable. Because the coefficient of the number of wins is roughly 1 for all three years, we can conclude that an additional win costs about a million dollars.

Salary and home run hitting
• Finally, we thought we'd combine these two studies of salary and home run hitting and analyze how the changes in average salary have resulted in changes in the number of home runs hit per person. Exactly how many more home runs are we getting per $1?
• We looked at data from 1969 to 2000.
• We found average salary but we could not find the average number of home runs per player. However, we thought the leader in home run percentage might give some kind of portrayal of the number of home runs being hit.

Salary vs. Home Runs

Year  Salary(thousands)  HR Pct Leader      Year  Salary(thousands)  HR Pct Leader
1969       24.9              9.16           1985      371.6              7.92
1970       29.3              7.95           1986      412.5              7.08
1971       31.5              9.49           1987      412.5              8.8
1972       34.1              7.57           1988      438.7              7.18
1973       36.6             10.2            1989      497.3              8.66
1974       40.8              6.93           1990      597.5              8.9
1975       44.7              7.17           1991      851.5              7.69
1976       51.5              7.81           1992     1028.7              8.99
1977       76.1              8.46           1993     1076.1              8.58
1978       99.9              7.18           1994     1168.3              9.75
1979      113.6              9.02           1995     1110.8             12.3
1980      143.8              8.76           1996     1120               12.29
1981      185.7              8.76           1997     1336.6              9.29
1982      241.5              7.45           1998     1398.8             13.75
1983      289.2              7.49           1999     1611.2             12.48
1984      329.4              6.87           2000     1895.6             10.21

Regression Results

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  1,    30) =   22.14
       Model |  40.3836608     1  40.3836608           Prob > F      =  0.0000
    Residual |  54.7139269    30  1.82379756           R-squared     =  0.4247
-------------+------------------------------           Adj R-squared =  0.4055
       Total |  95.0975877    31  3.06766412           Root MSE      =  1.3505

------------------------------------------------------------------------------
       hrpct |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sal |    .002094    .000445     4.71   0.000     .0011852    .0030029
       _cons |   7.760353   .3369654    23.03   0.000     7.072178    8.448528
------------------------------------------------------------------------------

Interpretation
• The coefficient of salary is .002094 and the entire confidence interval for this value is positive. So it seems that an increase in salary may produce an increase in home run hitting.
• For every additional hundred thousand dollars in average salary, the home run percentage of the leading home run hitter rises by about 0.2 percentage points (.002094 x 100).
• We found an R2 value of .4247, so average salary accounts for about 42% of the variation in the leader's home run percentage.
• However, from 1969 to 1976 salary stayed fairly flat (compared to the inflation today), so this may have hurt our regression, since home runs were increasing at the time, although not as rapidly as recently.
This suggests that home runs and salary may each be increasing over time independently of one another, so there may not be an actual relationship between the two. Further study would be needed to determine whether they are related.
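
The time trend in home runs that this conclusion leans on can be re-derived from the HR/game table near the start of the document. The following is a minimal sketch in Python (standard library only), not the authors' Stata session: the names HR_PER_GAME, ols_line and paired_t are made up for illustration, the year is assumed to be coded as seasons since 1975, and the Test #3 groups are assumed to be paired in order (1976 with 1988, 1977 with 1989, and so on).

```python
from math import sqrt
from statistics import mean, stdev

# HR/game by season, 1975-1999, copied from the "Home Runs Per Year" table.
HR_PER_GAME = [
    1.3879, 1.1497, 1.7303, 1.4036, 1.6301, 1.4658, 1.2685, 1.6044, 1.5674,
    1.547, 1.7104, 1.8105, 2.1216, 1.501, 1.463, 1.575, 1.6064, 1.4639,
    1.7778, 1.6399, 1.7994, 2.0688, 2.0459, 2.084, 2.2749,
]


def ols_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def paired_t(a, b):
    """Return the paired t statistic for the hypothesis mean(a - b) = 0."""
    diffs = [ai - bi for ai, bi in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Assumption: year is coded as seasons since 1975 (0, 1, ..., 24).
years = list(range(len(HR_PER_GAME)))
slope, intercept = ols_line(years, HR_PER_GAME)

# Test #3 groups: 1976-1987 and 1988-1999, paired in order (assumption).
hr1 = HR_PER_GAME[1:13]
hr2 = HR_PER_GAME[13:25]
t_stat = paired_t(hr1, hr2)

print(f"slope = {slope:.7f}, intercept = {intercept:.6f}")
print(f"paired t = {t_stat:.4f}")
```

Run as a script, the printed slope, intercept and t statistic should agree, up to rounding, with the values reported under Test #1 and Test #3 (.0264817, 1.350108 and -2.8671). The same two helpers could be pointed at the salary and HR Pct Leader columns to revisit the final regression.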