STATISTICS IN PRACTICE
An Application Component
Assignment No. 5, Math 256H – Introduction to Statistical Analysis, Fall 2005
Trent University

1. Data


The sample for this report comes from the data of a company that underwrites extended service contracts,
credit protection, and other insurance programs for financial institutions, retailers, and other entities.
For confidentiality, we will refer to this company as Assurance Company. A particular branch of
Assurance handles calls from customers (of the financial institutions, retailers, and other entities) who
need information or service regarding a contract or policy that has been underwritten with Assurance.


For convenience, we will refer to the policy holders as “customers”, and to the organizations that issue
those policies and have them underwritten by Assurance as “clients”.


The data records the time spent by Assurance employees on calls with customers (all times are in
seconds). It is categorized by date and client. The relevant columns are as follows: “Client” is the client,
“Handled” is the number of calls answered, “Talk Time” is the time spent on the phone between an
Assurance employee and a customer, and “Week” is the time period (in 2005) from which the data came.




2. Application


Since Assurance must decide when it is profitable to take on a new client (based on the expected volume
and duration of incoming calls versus the return from the client), it is useful to know whether the call
volume and duration generated by a new client will be similar to those of current clients with a similar
policy type. For example, HBC (Hudson’s Bay Company) and Wall-Mart both offer store-only credit
cards, that is, credit cards usable for purchases only at the respective stores, and both have these
underwritten by Assurance. Suppose Chapters-Indigo wants Assurance to underwrite its new store-only
credit cards and is willing to pay a given monthly amount for this service. Assurance will need to
estimate the expected call volume and duration in order to estimate the costs associated with this new
policy, compare them to the offered amount, and decide whether or not to take on the new policy.


Assurance may wish to estimate the expected call volume and duration from the available data for HBC,
Wall-Mart, and any other similar clients. For this to be a good estimate, however, the call volumes and
durations from HBC and Wall-Mart (and other) customers would need to be roughly the same.




3. Statistical Test


To help Assurance make an informed business decision about underwriting a new store-only credit card,
we want to test whether the average talk time per call is the same among current clients who have
store-only credit cards. (It would also be important to know the expected volume of calls.)


For simplicity, we pick just two such clients, HBC and Wall-Mart, and test whether they have equal
average talk time per call.




4. Sample


We use data collected during the period from January to September 2005, since this data exists for both
HBC and Wall-Mart. The relevant quantity is the ratio of talk time to handled calls. Since the data is
divided into week-long periods, and we have no means of reconstructing the talk time associated with
each individual call, we must perform our test on the average talk time per call per week.


One problem with the data was that not all “weeks” were true 7-day periods. Some weeks were split at
the beginning or end of a month, since the data was also divided by month. To fix this, we recombined
the weeks that spanned a month end so that all weeks were true 7-day periods. Another, much more
serious, problem was that for July two weeks were labelled Jul 24-30 (each with different data), and the
“weeks” Jul 31 and Jul 1-2 were completely missing (July 1st and 2nd would have been days 6 and 7,
while July 31st would have been day 1, of the Sunday to Saturday 7-day period). After consulting the
financial analyst at Assurance, it was decided that for both of the missing periods the particular location
was closed for operation (and thus there were no calls) for the Canada Day and Civic Holiday long
weekends respectively. Because of these irregularities (“week” periods which are not 7 days, and
uncertain data in the case of the Jul 24-30 weeks), we decided to omit the affected data. That is, we
omitted both of the Jul 24-30 weeks, and the Jun 25-30 and Aug 1-6 “weeks”, since there was no way to
reconstruct them as 7-day periods.
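
The reconstruction of 7-day weeks described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout (start date, end date, talk time, handled calls), the function name rebuild_weeks, and the example figures are all hypothetical and are not taken from the Assurance data.

```python
from collections import defaultdict
from datetime import date, timedelta


def rebuild_weeks(records):
    """Merge partial-week records into Sunday-to-Saturday weeks.

    Each record is (start, end, talk_time_seconds, handled_calls), with
    inclusive dates that do not cross a Saturday/Sunday boundary.
    Only weeks whose records add up to exactly 7 days are returned.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # [days covered, talk time, calls]
    for start, end, talk, handled in records:
        # date.weekday() gives Monday=0 ... Sunday=6, so this steps back to Sunday.
        week_start = start - timedelta(days=(start.weekday() + 1) % 7)
        totals[week_start][0] += (end - start).days + 1
        totals[week_start][1] += talk
        totals[week_start][2] += handled
    return {
        week: (talk, handled)
        for week, (days, talk, handled) in sorted(totals.items())
        if days == 7
    }


# Hypothetical records, NOT the Assurance data.
records = [
    (date(2005, 1, 30), date(2005, 1, 31), 10_000, 40),  # Sun-Mon, end of January
    (date(2005, 2, 1), date(2005, 2, 5), 30_000, 110),   # Tue-Sat, start of February
    (date(2005, 6, 26), date(2005, 6, 30), 20_000, 80),  # partial week, no partner
]
rebuilt = rebuild_weeks(records)
```

Under these assumptions, a week that appears twice in the data would accumulate 14 days and be dropped as well, which mirrors the decision to omit both Jul 24-30 records.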


Two samples were then created (one each for HBC and Wall-Mart) by computing the average talk time
per call per week, that is, by dividing “Talk Time” by “Handled” for each of the remaining 35 of the 38
weeks in the period Jan 2 to Sept 24.
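
As a minimal sketch of this step, the weekly ratios and their summary statistics could be computed as below. The function name and the weekly totals are hypothetical, chosen only to show the arithmetic; they are not the Assurance figures.

```python
from statistics import mean, stdev


def weekly_talk_time_per_call(weeks):
    """Return the average talk time per call (seconds) for each week.

    `weeks` is a list of (talk_time_seconds, handled_calls) pairs. Weeks
    with no handled calls are skipped because the ratio is undefined.
    """
    return [talk / handled for talk, handled in weeks if handled > 0]


# Hypothetical weekly totals, NOT the Assurance data.
example_weeks = [(50_000, 200), (61_200, 240), (45_000, 180)]
ratios = weekly_talk_time_per_call(example_weeks)

# Sample mean, sample standard deviation (n - 1 denominator), and sample size.
summary = (mean(ratios), stdev(ratios), len(ratios))
```

Note that each week contributes one observation regardless of how many calls it contained, which is why the test is about the average talk time per call per week rather than per individual call.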


This gave the following results (average talk time per call per week, in seconds):


HBC:        $\bar{x}_1 = 250$, $s_1 = 67.0$, $n_1 = 35$

Wall-Mart:  $\bar{x}_2 = 194$, $s_2 = 68.0$, $n_2 = 35$




5. Results and Preliminary Conclusions


With the above data, we test the hypothesis $H_0: \mu_1 - \mu_2 = 0$. Since $n_1, n_2 > 30$, both
samples are large, so by the Central Limit Theorem we are reasonably well supported in treating the
sample means as approximately normally distributed. Checking the weekly averages for normality would
strengthen this, but is omitted here.


So the test statistic is



$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_{10} - \mu_{20})}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}},$$

where $\mu_{10} - \mu_{20} = 0$ from $H_0$.


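Thus, against the two-tailed alternative $H_1: \mu_1 - \mu_2 \neq 0$, we compute

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_{10} - \mu_{20})}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(250 - 194) - 0}{\sqrt{\dfrac{(67.0)^2}{35} + \dfrac{(68.0)^2}{35}}} = 3.47.$$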


The P-value for z = 3.47 is then


$$P\text{-value} = 2[1 - \Phi(z)] = 2[1 - \Phi(3.47)] = 2[1 - 0.9997] = 0.0006.$$


So, at significance level $\alpha$, we reject $H_0$ for all $\alpha \geq 0.0006$.


That is, we are very confident that the two means $\mu_1$ and $\mu_2$ are not equal.
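
The calculation above can be checked numerically. The following is a sketch only: the function name two_sample_z and its parameter names are our own, and the standard normal distribution function is taken from Python's standard library rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, delta0=0.0):
    """Large-sample two-tailed z test for a difference of two means.

    Returns (z, p_value) for H0: mu1 - mu2 = delta0 against
    H1: mu1 - mu2 != delta0, using the sample standard deviations
    in place of the unknown population values.
    """
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = ((mean1 - mean2) - delta0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Summary statistics reported in Section 4.
z, p_value = two_sample_z(250, 67.0, 35, 194, 68.0, 35)
```

Here z agrees with the hand calculation to two decimal places. The P-value computed this way comes out slightly smaller than 0.0006, because the table value 0.9997 is rounded to four decimals; the conclusion is unaffected.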




6. Validation of Sample and Statistical Test


For our statistical test to be valid (and hence its results useful), we need to be sure that the samples are
representative of the underlying data. Since the data in this case comes from an ongoing process, and the
samples were continuous and inclusive segments† of this data, the samples are representative only if the
values in each sample are stable over time (or at least have a similar periodic behaviour, such that many
periods occur within the time span from which the samples were taken).


We may have some reason to believe that within the period of January – September there is some
observable fluctuation in the data values due to the management of calls at Assurance, customers’
tendencies, etc. However, since both samples were taken from the same time period, any fluctuation over
time should be observed equally in both samples, so this concern becomes significantly less critical.


As for the period between October and December, which is not included in the sample, we have no good
reason to assume that the difference between the average call times of HBC and Wall-Mart policy holders
will differ significantly from the difference in the period from which the sample was taken.


† Before the omission of the “weeks” Jul 1-2, Jul 24-30, and Jul 31.


Suppose, however, that for Christmas HBC offered some incentive for customers to use their credit
cards. With increased usage, more customers might need assistance and call Assurance, and, by the
nature of the incentive program, dealing with these customers might take more time than at other times
of the year. In that case, not having data from the Christmas period would make the sample a poor one.


As for the omission of the weeks containing the “weeks” Jul 1-2 and Jul 31, we justify this by assuming
that these periods do not represent any irregularity between the data for HBC and for Wall-Mart. That is,
the data in these periods behaves much the same between HBC and Wall-Mart as in other periods. Since
these are short periods, this is a reasonably safe assumption.


We must also be sure that the two samples are in fact independent. We assume that Assurance gives no
priority to calls from either HBC or Wall-Mart (i.e. the priority of answering a call is determined solely
by its order, that is, the time it has spent in the queue). We therefore have no good reason to assume any
significant dependence between the two samples. (If the volume and/or duration of calls from HBC
customers, say, suddenly becomes much larger than that from Wall-Mart customers, then further
incoming calls from both HBC and Wall-Mart will be equally delayed.) There may also be some market
dependence between the number of HBC and Wall-Mart policy holders, and thus the volume of calls, but
this is not likely to affect the duration of calls.


We therefore feel validated in using our statistical test for the chosen sample.




7. Final Conclusions and Recommendations


Since the test indicates that the average talk time per call per week differs between HBC and Wall-Mart
customers, and the sample data for this test was compiled and used in a statistically sound fashion, we
conclude that neither the data for HBC nor the data for Wall-Mart will be a good estimate of the data
that Chapters-Indigo would generate. Therefore, Assurance should not make a business decision about
Chapters-Indigo based on the available data from HBC and Wall-Mart.

				