Final Assignment 1
Business Statistics Final Assignment
Benjamin W. Kratz
Professor Bruce Busbee
BUSN 5760 Applied Statistics
Webster University at Fort Jackson
October 13, 2008
When running a business, it is imperative to perform four types of statistical analysis: ANOVA testing, Linear
Regression, Correlation Analysis, and Pricing Indexes. By looking at each type of statistical analysis, business
owners and employees can determine how prices and wages will change, along with which variables are the main
influences on the change. With this information, they are able to create models that predict future outcomes
at a stated level of confidence, typically 90-99%.
Businesses use statistical data to answer the “so what?” question. Their goal is to be able to predict how the economy
will change, which variables will have the greatest influence, and to what extent they influence the end cost. ANOVA
testing, Linear Regression, Correlation Analysis, and Pricing Indexes are four key areas of statistical analysis they can
use to determine how the economy will change.
Businesses use ANOVA testing to see if the means of several populations are the same (the null hypothesis) or if they
differ between populations (the research hypothesis) by looking at the variances. An ANOVA will tell you if there
is a statistically significant difference between group means (averages) based on group variances and sample sizes.
When conducting the ANOVA, they look for the total variation by obtaining the sum of the squared differences
between each observation and the overall mean. When calculating the total variation they break the computation
down into two separate components. The first component is the treatment variation (TV) and is computed by taking
the sum of the squared differences between each treatment mean and the total mean. The second component is the
random variation (RV) and is computed by taking the sum of the squared differences between each observation and
its treatment mean. The RV information also indicates the error component. The ANOVA test procedure produces
an F-statistic, which is used to calculate the p-value. The F-statistic is the ratio of the mean square for
treatments (MST) to the mean square for error (MSE):
$$F = \frac{MST}{MSE} = \frac{SST/(k-1)}{SSE/(n-k)}$$
If the null hypothesis is correct, we expect F to be about one, whereas a "large" F indicates a location effect.
How big should F be before we reject the null hypothesis? In statistical hypothesis testing, we use a p-value
(probability value) to decide whether we have enough evidence to reject the null hypothesis and say our research
hypothesis is supported by the data. To find the critical value at the 1 percent level of significance, they can use
the chart found in Appendix B.4 in the textbook titled “Statistical Techniques in Business and Economics” (Lind,
Marchal, and Wathen, 2008).
Chapter 12 of the same textbook provides a great example of the ANOVA by surveying passengers from four
different airlines. The intent is to find if there is a difference in the mean satisfaction level among the four airlines.
The survey included questions on ticketing, boarding, in-flight service, baggage handling, pilot
communication, and so forth. Twenty-five questions offered a range of possible answers: excellent,
good, fair, or poor: A response of excellent was given a score of 4, good a 3, fair a 2, and poor a 1.
These responses were then totaled, so the total score was an indication of the satisfaction with the
flight. The greater the score, the higher the level of satisfaction with the service. The highest possible
score was 100. (Lind, Marchal, and Wathen, 2008).
Table 1: Results from Surveys: (Lind, Marchal, and Wathen, 2008).
Eastern  TWA  Allegheny  Ozark
94       75   70         68
90       68   73         70
85       77   76         72
80       83   78         65
         88   80         74
The null hypothesis and the alternate hypothesis are as follows:
H0: µ1 = µ2 = µ3 = µ4 H1: The mean scores are not all equal
Failing to reject the null hypothesis means that there is no evidence of a difference in the mean scores of the four
airlines. Rejecting the null hypothesis means that there is a difference in at least one pair of mean scores. However, the
initial computation will not denote which data group differs or how many data groupings differ. The F distribution
is used to test the statistical data using a significance level of .01. To formulate the decision rule we look for the
critical value (cv) by using Appendix B.4 (Lind, Marchal, and Wathen, 2008). To find the cv, the degrees of
freedom (df) need to be identified for the numerator, by taking the number of treatments (k) and subtracting 1,
and for the denominator, by taking the total number of observations (n) and subtracting the number of treatments.
Therefore, df for the numerator = k - 1 = 4 - 1 = 3 and df for the denominator = n - k = 22 - 4 = 18. Using 3
and 18, the table gives a cv of 5.09, so if the computed value of F exceeds 5.09 they will reject H0. The
final step is to select the sample, perform the calculations, and make a decision, following the layout in Table 2.
Table 2: ANOVA Computation Layout: (Lind, Marchal, and Wathen, 2008).
Source of Variation  Sum of Squares  df     Mean Square      F
Treatments           SST             k - 1  SST/(k-1) = MST  MST/MSE
Error                SSE             n - k  SSE/(n-k) = MSE
Total                SS total        n - 1
Excel commands for the one-way ANOVA were used to run the data and the results are shown in Table 3.
Table 3: Excel One-way ANOVA Computation Results:
Groups     Count  Sum   Average  Variance
Eastern    4      349   87.25    36.92
TWA        5      391   78.20    58.70
Allegheny  7      510   72.86    30.14
Ozark      6      414   69.00    13.60
Total      22     1664  75.64

Source of Variation  SS        df  MS      F     P-value  F crit
Between Groups       890.68    3   296.89  8.99  0.0007   3.16
Within Groups        594.41    18  33.02
Total                1,485.09  21
The results tell us that the total variation is 1,485.09, the variation between the groups is 890.68, and the variation
within the groups is 594.41. The information needed to decide whether to reject the null hypothesis starts with the
variation between groups, which is 890.68. By taking the variation between groups and dividing it by its df (3), the
mean square (MST) of 296.89 is obtained. Doing the same with the variation within groups and its df (18), the mean
square (MSE) of 33.02 is obtained. There is a large difference between the two mean squares, providing an early
indication that the null hypothesis may be rejected, since the between-group mean square is much larger than the
within-group mean square. To confirm this notion, the formula MST/MSE is used to find the F value: 296.89/33.02 = 8.99.
The calculated F value is then compared to the critical value obtained from Appendix B.4, pg. 789 (Lind, Marchal, and
Wathen, 2008). By taking the df for the denominator (18) and the df for the numerator (3), the critical value of 5.09
is derived. The final step is to compare the F value and the F critical value. Since the F value of 8.99 is greater
than the F critical value of 5.09, H0 is rejected, which indicates that at least one airline's mean satisfaction score
differs significantly from the others. So what? By looking at the P-value of .0007, it is determined that the
probability of finding an F value this large or larger when the null hypothesis is true is very small. Since this is
the case, the likelihood of committing a Type I error is very small.
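The arithmetic behind Table 3 can be checked without the raw survey scores. The following is a minimal Python sketch, not the Excel procedure itself: it assumes only the counts, sums, and variances printed in Table 3 and the critical value of 5.09 quoted from Appendix B.4. The names `GROUPS`, `F_CRITICAL_01`, and `anova_from_summary` are illustrative.

```python
# One-way ANOVA rebuilt from the group summaries reported in Table 3.
# Each entry is (count, sum of scores, sample variance) as printed there.
GROUPS = {
    "Eastern": (4, 349, 36.92),
    "TWA": (5, 391, 58.70),
    "Allegheny": (7, 510, 30.14),
    "Ozark": (6, 414, 13.60),
}

# Critical value quoted in the text for the .01 level with df = (3, 18).
F_CRITICAL_01 = 5.09


def anova_from_summary(groups):
    """Return the ANOVA table entries computed from per-group summaries."""
    n = sum(count for count, _, _ in groups.values())
    k = len(groups)
    grand_mean = sum(total for _, total, _ in groups.values()) / n

    # Treatment variation: squared distance of each group mean from the
    # grand mean, weighted by the group size.
    sst = sum(
        count * (total / count - grand_mean) ** 2
        for count, total, _ in groups.values()
    )
    # Random (error) variation: each sample variance times its own df.
    sse = sum((count - 1) * variance for count, _, variance in groups.values())

    df_treat, df_error = k - 1, n - k
    mst, mse = sst / df_treat, sse / df_error
    return {
        "sst": sst,
        "sse": sse,
        "df": (df_treat, df_error),
        "mst": mst,
        "mse": mse,
        "f": mst / mse,
    }


result = anova_from_summary(GROUPS)
reject_h0 = result["f"] > F_CRITICAL_01
print(f"F = {result['f']:.2f}, df = {result['df']}, reject H0: {reject_h0}")
```

Because the variances in Table 3 are rounded to two decimals, the sums of squares from this sketch can differ from the Excel output in the second decimal place.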
With this data, a customer wanting to travel would be able to know that not all airlines provide the same level of
service with the same satisfaction. By knowing this, the customer would then begin to look more closely at the
survey and try to see what services were mentioned. Were the services that they prefer to use included in the
survey? If so what were the results for those services? By answering these questions, the customer is adding
additional weighted value to the variables, which would result in a new analysis of the data.
To narrow down the influencing variable, more data would be needed to perform a correlation analysis to see
which service or combination of services has the greatest influence on flight satisfaction. The drawback of
performing statistical analysis on such a streamlined data set is that one person's satisfaction is not the
same as another person's. This creates a degree of bias in the data set that can potentially obscure the true nature of
the statistical findings. To reduce some of the bias in the survey, there should be a control factor provided as a
means to account for the biased data within the statistical analysis.
Linear Regression and Correlation Analysis:
While the ANOVA allows businesses to compare several populations, it still leaves questions to
be answered, as noted in the problem with the airlines. So how do businesses answer the question of how the
variables relate to each other? To answer this type of question, a correlation analysis needs to be conducted
to create a model of the data so that, given a known value, we can estimate the resulting value. To know the range of
certainty, the business would also perform a linear regression and compute a confidence interval.
Douglas Lind and associates provide a great example of this in Chapter 13 as they discuss Copier Sales of
America (2008). The example looks at the number of sales calls and copiers sold for 10 salespeople (as seen in Table
4) to see if there is a direct or inverse relationship between the number of calls made and the number of copiers sold.
They look to create a model using correlation analysis to measure the association between two variables.
Table 4: Number of Sales Calls and Copiers Sold for 10 Salespeople:
Sales Representative  Number of Sales Calls  Number of Copiers Sold
Tom Keller            20                     30
Jeff Hall             40                     60
Brian Virost          20                     40
Greg Fish             30                     60
Susan Walch           10                     30
Carlos Ramirez        10                     40
Rich Niles            20                     40
Mike Kiel             20                     50
Mark Reynolds         20                     30
Soni Jones            30                     70
Total                 220                    450
(Table 13-1: Lind, Marchal, and Wathen, 2008)
To do the correlation analysis, it is important to identify the dependent variable and the independent variable. The
dependent variable is the variable that is being predicted and the independent variable is the variable that provides
the basis for estimation. If they were to draw a scatter diagram, as seen in Chart 13-1, the dependent variable would
be on the y-axis and the independent variable on the x-axis. Once plotted, the graph clearly shows that there is some
type of correlation between the number of calls made by a salesperson and the number of copiers sold.
Chart: Sales Calls and Copiers Sold (scatter diagram; x-axis: sales calls, y-axis: copiers sold)
(Chart 13-1: Lind, Marchal, and Wathen, 2008)
So how does this help the business manager? In graphic form, they can quickly show it to the sales
reps as a means to motivate them to increase their calls in an effort to increase sales. For most managers this is not
enough to go on when they want to know what the amount of sales will be if calls are increased. To properly answer
this they need to gain a better understanding of how the two variables relate to each other by computing the coefficient
of correlation (r), which starts with determining how far the values deviate from their means and the products of
those deviations.
Table 5: Deviations from the mean and Their Products:
(Table 13-3: Lind, Marchal, and Wathen, 2008)
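The coefficient of correlation (r) is computed from the sum of the products of the deviations and the sample standard
deviations of the sales calls (sx) and copiers sold (sy) for the 10 salespeople, using the following formula:
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1)\, s_x s_y}$$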
The resulting value can range from -1.00 to 1.00. The closer the value is to -1.00 or 1.00 the stronger the correlation
and the closer to 0.00 the weaker the correlation. A negative value indicates an inverse relationship and a positive
value indicates a direct relationship. Using Excel's descriptive statistics function we can obtain the standard
deviation (s) as seen in Table 6.
Table 6: Descriptive statistics of Sales Calls and Copiers Sold for 10 Salespeople:
Statistic                 Number of Sales Calls  Number of Copiers Sold
Mean                      22.000                 45.000
Standard Error            2.906                  4.534
Median                    20.000                 40.000
Mode                      20.000                 30.000
Standard Deviation        9.189 (sx)             14.337 (sy)
Sample Variance           84.444                 205.556
Kurtosis                  0.396                  -1.001
Skewness                  0.601                  0.566
Range                     30.000                 40.000
Minimum                   10.000                 30.000
Maximum                   40.000                 70.000
Sum                       220.000                450.000
Count                     10.000                 10.000
Confidence Level (95.0%)  6.574                  10.256
The computation of
$$r = \frac{900}{(10 - 1)(9.189)(14.337)} = 0.759$$
indicates a strong positive correlation, where 900 is the sum of the products of the deviations of the Table 4 values
from their means. This does not prove to the manager that making more calls causes more sales, only that the two
variables have some type of relationship.
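The coefficient can be recomputed directly from the Table 4 data. The following is a minimal Python sketch using only the standard library; the function names `sample_std` and `correlation` are illustrative and not taken from the textbook or from Excel.

```python
import math

# Table 4 data: sales calls (X) and copiers sold (Y) for the 10 salespeople.
CALLS = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
COPIERS = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def correlation(x, y):
    """Coefficient of correlation: sum of deviation products over (n-1)*sx*sy."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return cross / ((n - 1) * sample_std(x) * sample_std(y))


r = correlation(CALLS, COPIERS)
print(f"sx = {sample_std(CALLS):.3f}, sy = {sample_std(COPIERS):.3f}, r = {r:.3f}")
```

The standard deviations produced here should agree with the values Excel reports in Table 6.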
The data does not yet tell the manager how many additional sales one additional call will create. To determine this,
a relationship needs to be established using the linear regression equation: Ŷ = a + bX
“Ŷ” is the estimated value of the Y variable for a selected X value
“a” is the Y-intercept (the estimated value of Y when X = 0)
“b” is the slope of the line (the mean change in Ŷ for each change of one unit in the X variable)
“X” is the selected value of the independent variable
The first step is to find the slope of the regression line: b = r (sy / sx) = 0.759 (14.337 / 9.189) = 1.1842. The
second step is to determine the Y-intercept: a = Ȳ - bX̄ = 45 - 1.1842 (22) = 18.9476. With these two values the
manager can now estimate how many copiers a representative who makes 20 calls (X) is expected to sell by calculating
Ŷ = 18.9476 + 1.1842 (20) = 42.6316 copiers. Simply put, for every additional call the sales representative can
expect an increase of about 1.2 copiers sold. However, the equation is not reliable everywhere: the sales calls
ranged from 10 to 40, which limits the use of the equation to this range. If you use fewer than 10 or more than 40
calls, the accuracy lessens. The equation is only a prediction statement and is not
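The slope, intercept, and prediction can be reproduced from the Table 4 data. The following is a minimal Python sketch under stated assumptions: it uses the least-squares form of the slope (sum of deviation products divided by the sum of squared X deviations), which is algebraically equal to r (sy / sx), and the helper names `fit_line` and `predict` are illustrative. Because it does not round intermediate values, the intercept differs from the hand calculation in the fourth decimal place.

```python
# Table 4 data: sales calls (X) and copiers sold (Y) for the 10 salespeople.
CALLS = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
COPIERS = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]


def fit_line(x, y):
    """Return (intercept a, slope b) of the least-squares line Y-hat = a + bX."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx  # same value as r * (sy / sx)
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(calls, intercept, slope, x_min, x_max):
    """Return (estimate, within_range); within_range flags extrapolation."""
    return intercept + slope * calls, x_min <= calls <= x_max


a, b = fit_line(CALLS, COPIERS)
estimate, within_range = predict(20, a, b, min(CALLS), max(CALLS))
print(f"a = {a:.4f}, b = {b:.4f}, estimate for 20 calls = {estimate:.4f}")
```

The `within_range` flag reflects the caution in the text: an estimate for a number of calls outside the observed 10 to 40 range is an extrapolation and should be treated with less confidence.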
To provide some validity to the accuracy of the prediction equation, they calculate the standard error of estimate
to measure the dispersion of the observed values around the line of regression. This is done using the
following equation:
$$s_{y \cdot x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$$
To ease the process, the data has been run through Excel's Data
Analysis program to calculate the regression. The results are shown in Table 7.
Table 7: Regression calculation of Sales Calls and Copiers Sold for 10 Salespeople:
Multiple R 0.759
R Square 0.576
Adjusted R Square 0.523
Standard Error 9.901
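The R Square and Standard Error entries in Table 7 can be recomputed from the Table 4 data. The following is a minimal Python sketch, assuming the usual definition of the standard error of estimate (square root of the sum of squared residuals divided by n - 2); the function name `regression_summary` is illustrative and this is not the Excel Data Analysis routine itself.

```python
import math

# Table 4 data: sales calls (X) and copiers sold (Y) for the 10 salespeople.
CALLS = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
COPIERS = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]


def regression_summary(x, y):
    """Return (r_squared, standard_error) for the least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares: observed Y minus the value on the line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total sum of squares of Y around its mean.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)

    r_squared = 1 - sse / ss_total
    standard_error = math.sqrt(sse / (n - 2))
    return r_squared, standard_error


r_squared, standard_error = regression_summary(CALLS, COPIERS)
print(f"R Square = {r_squared:.3f}, Standard Error = {standard_error:.3f}")
```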
The standard error computes to 9.901, depicting how far from the regression line the data points typically deviate.
With this knowledge, the manager can be 90% certain of their calculations. Therefore, there is a 10% chance their data is
Drawing correlations between variables is not the only thing that is important to businesses and managers.
Profits are important, and to see their true profits businesses use Consumer Price Index (CPI) numbers. CPI
expresses the relative change in the sample value compared to the base period established. Two basic types of data
are needed to construct the CPI: price data and weighting data. The percent change in the CPI is a measure of
inflation. The CPI can be used to adjust for the effects of inflation in wages, salaries, pensions, or regulated or
One weighted index is the Laspeyres Price Index (LPI), which determines a weighted price index using
base-period quantities as weights, as follows:
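$$P = \frac{\sum p_t q_0}{\sum p_0 q_0} \times 100$$
where p_t is the current-period price, p_0 is the base-period price, and q_0 is the base-period quantity.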
Douglas Lind and associates provide a great example of this in Chapter 15, as they talk about the prices for the six
food items shown in Table 8 (2008).
Table 8: Price and Quantity of Food Items in 1995 and 2005:
Item          Price-95  Qty-95  Price-95*Qty-95  Price-05  Price-05*Qty-95
Bread         $0.77     50      $38.50           $0.89     $44.50
Eggs          $1.85     26      $48.10           $1.84     $47.84
Milk          $0.88     102     $89.76           $1.01     $103.02
Apples        $1.46     30      $43.80           $1.56     $46.80
Orange Juice  $1.58     40      $63.20           $1.70     $68.00
Coffee        $4.40     12      $52.80           $4.62     $55.44
Total                           $336.16                    $365.60
(Data from Table 15-3: Lind, Marchal, and Wathen, 2008)
To calculate the LPI, they determine the total amount spent on the six items in the base period, which equals $336.16.
Then they take the 2005 prices and multiply them by the 1995 quantities to establish a weighted value of $365.60. Now
that the two values are calculated, the weighted price index can be computed. The final computed value is 108.8,
indicating that there is an 8.8 percent increase in the cost over the ten-year period.
$$P = \frac{\sum p_t q_0}{\sum p_0 q_0} \times 100 = \frac{365.60}{336.16} \times 100 = 108.8$$
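The Laspeyres calculation can be reproduced from the Table 8 prices and quantities. The following is a minimal Python sketch; the names `BASKET` and `laspeyres_index` are illustrative, and the only inputs are the 1995 prices, 1995 quantities, and 2005 prices listed in the table.

```python
# Table 8 data: (item, 1995 price, 1995 quantity, 2005 price).
BASKET = [
    ("Bread", 0.77, 50, 0.89),
    ("Eggs", 1.85, 26, 1.84),
    ("Milk", 0.88, 102, 1.01),
    ("Apples", 1.46, 30, 1.56),
    ("Orange Juice", 1.58, 40, 1.70),
    ("Coffee", 4.40, 12, 4.62),
]


def laspeyres_index(basket):
    """Return (base cost, current cost, index) using base-period quantities."""
    # Cost of the base-period basket at base-period prices.
    base_cost = sum(p0 * q0 for _, p0, q0, _ in basket)
    # Cost of the same base-period basket at current prices.
    current_cost = sum(pt * q0 for _, _, q0, pt in basket)
    return base_cost, current_cost, current_cost / base_cost * 100


base_cost, current_cost, index = laspeyres_index(BASKET)
print(f"base = {base_cost:.2f}, current = {current_cost:.2f}, LPI = {index:.1f}")
```

Only the prices change between the two sums; the quantities stay fixed at their 1995 values, which is why the index cannot pick up changes in buying patterns.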
The LPI does not reflect any changes in buying patterns that may have occurred over time. To
compensate for this, they can use the Paasche Price Index, which uses current-year quantities to reflect current buying
habits. The problem with using this price index is that it tends to give greater weight to goods whose prices
have decreased. Therefore, Fisher's Ideal Index (FII) tries to balance the effects of the two price indexes
by taking the geometric mean of the two indexes. However, the FII has the same issue as the Paasche Price Index in
that it requires current quantity data for each period being used.
Another use of the CPI is to let employees determine the true value of their current income after inflation. They can
calculate real income (RI) from money income (MI) by using the equation:
$$RI = \frac{MI}{CPI} \times 100$$
To see how it works, they take an annual income of $20,000 from 1982-84 and set it as the base period (CPI equal to
100). Then they take the present-year income of $40,000 and divide it by the current CPI for that year, which is 200.
When they place it into the equation,
$$RI = \frac{40{,}000}{200} \times 100 = 20{,}000$$
the employee will realize that their income has the same purchasing power
as it did in 1982-84 and that the employer has properly adjusted their income to reflect the current CPI. Now this
is not always the case, for example if the CPI were 250 then their RI would be $16,000 indicating that the inflation
of the market has weakened their income/purchasing power by $4,000. This concept is also called deflated income
and is brought to light when labor unions negotiate new contracts for employees.
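The two real income cases in the text can be checked with a few lines of code. The following is a minimal Python sketch; the function name `real_income` is illustrative, and the figures are the $40,000 income and the CPI values of 200 and 250 used in the example.

```python
def real_income(money_income, cpi):
    """Deflate money income by the CPI (base period = 100)."""
    if cpi <= 0:
        raise ValueError("CPI must be positive")
    return money_income / cpi * 100


# Income keeps pace with prices: CPI has doubled and so has income.
ri_at_200 = real_income(40_000, 200)
# Prices have outrun income: CPI is 250 while income is still $40,000.
ri_at_250 = real_income(40_000, 250)
purchasing_power_lost = ri_at_200 - ri_at_250

print(f"RI at CPI 200: {ri_at_200:,.0f}")
print(f"RI at CPI 250: {ri_at_250:,.0f} (loss of {purchasing_power_lost:,.0f})")
```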
Table 9: OCOLA for Army Major Living in Kaiserslautern, Germany:
(Retrieved from: http://perdiem.hqda.pentagon.mil/cgi-bin/cola-oha/o_cola.pl)
Businesses also use the CPI to determine cost-of-living allowance (COLA) increases within management-union
contracts. For instance, the military pays Soldiers overseas a supplement (OCOLA) to offset the cost difference between
the local economy and the US economy. Looking at Table 9, you will see that a Major in the Army living off post
in Kaiserslautern, Germany, with three dependents will receive an additional $39.378 daily to offset the local
economy's 0.32 inflationary index, reflecting the difference between the US CPI and the Euro CPI. The OCOLA
ensures that service members are not penalized in their income's purchasing power because they are serving in
The Producer Price Index (PPI), another price index alongside the CPI, is a vital tool for business owners when they
need to provide daily budget analysis. The PPI reflects the prices charged to the manufacturer for the materials
purchased to produce the end product and is used to calculate if and where they need to adjust their budget in the
future. If the company were a bakery, they would want to know what the PPI is for crude goods to determine if the
current allotted budget will be enough for the next month's production requirement. This is also a good indicator as
to whether the
cost of their product needs to increase or not. The business owners will also be able to determine the ratio of growth
between raw material cost and sales as a means to determine at what point they will need to increase or decrease the
product price and by how much.
Studying statistics is important for any business owner to establish a baseline for becoming successful.
Statistical analysis of their company's cost for goods and services and how they relate to periodic fluctuations in the
economy helps to establish new goals and benchmarks for the company. Without constant statistical review, an owner
may never realize that they need to change the price of their goods or services in order to stay in business or that
they are losing employees because their wages do not support their current cost of living.
Lind, D., Marchal, W., & Wathen, S. (2008). Statistical Techniques in Business & Economics. (3rd ed.). New
Delhi: Tata McGraw-Hill Publishing Company Limited. Pgs. 409-593.
“Overseas Cost of Living”. (2008). Department of Defense Per Diem, Travel and Transportation Allowance
Committee. Retrieved October 6, 2008, from: http://perdiem.hqda.pentagon.mil/cgi-bin/cola-oha/o_cola.pl