STATISTICS IN PRACTICE
An Application Component
Assignment No. 5, Math 256H – Introduction to Statistical Analysis, Fall 2005
Trent University

1. Data

The sample for this report comes from the data of a company that underwrites extended service contracts, credit protection, and other insurance programs for financial institutions, retailers, and other entities. For confidentiality, we will refer to this company as Assurance Company. A particular branch of Assurance processes calls from customers (of the financial institutions, retailers, and other entities) who need information or service regarding a contract or policy that has been underwritten by Assurance. For convenience, we will refer to the policy holders as “customers”, and to the institutions that issue these policies and have them underwritten by Assurance as “clients”.

The data contains information about the time spent by Assurance employees on calls with customers (all times are in seconds). It is categorized by date and client. The relevant columns are as follows: “Client” is the client, “Handled” is the number of calls answered, “Talk Time” is the time spent on the phone between an Assurance employee and a customer, and “Week” is the time period (in 2005) from which the data came.

2. Application

Since Assurance must decide when it is profitable to take on a new client (weighing the expected volume and duration of incoming calls against the return from the client), it is useful to know whether the new client's call volume and duration will be similar to those of current clients with a similar policy type. For example, HBC (Hudson’s Bay Company) and Wall-Mart both offer store-only credit cards, that is, credit cards usable for purchases only at the respective stores, and both have these cards underwritten by Assurance.
Say Chapters-Indigo wants Assurance to underwrite its new store-only credit cards and is willing to pay Assurance a given monthly amount for this service. Assurance will need to estimate the expected call volume and duration in order to estimate the costs associated with this new policy, compare these costs to the offered amount, and decide whether or not to take on the policy. Assurance may wish to base this estimate on the available data from HBC, Wall-Mart, and any other similar clients. For this to be a good estimate, however, the call volumes and durations from HBC and Wall-Mart (and other) customers must themselves be roughly the same.

3. Statistical Test

As part of Assurance making an informed business decision about underwriting a new store-only credit card, we want to test whether the average talk time per call is the same among current clients who have store-only credit cards. (It would also be important to know the expected volume of calls.) For simplicity, we pick just two such clients, HBC and Wall-Mart, and test whether they have equal average talk time per call.

4. Sample

We use data collected from January to September 2005, since data for this period exists for both HBC and Wall-Mart. The relevant quantity is the ratio of talk time to handled calls. Since the data is divided into week-long periods, and we have no means to reconstruct the talk time of each individual call, we must perform our test on the average talk time per call per week.

One problem with the data was that not all “weeks” were true 7-day periods. Because the data was also divided by month, some weeks were cut short by the beginning or end of a month. To fix this problem, we recombined the parts of each week that spanned a month end, so that all weeks are true 7-day periods.
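The weekly ratios and the month-end reconstruction described above can be sketched as follows. This is only an illustration under stated assumptions: the row layout, the function names `week_start` and `weekly_average_talk_time`, and all of the numbers are hypothetical and are not taken from the Assurance data.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical rows: (client, first day of the reporting period,
# calls handled, total talk time in seconds). Made-up numbers.
rows = [
    ("HBC", date(2005, 1, 23), 400, 100_000),  # full week, Jan 23-29
    ("HBC", date(2005, 1, 30), 120, 30_000),   # partial "week", Jan 30-31
    ("HBC", date(2005, 2, 1), 280, 72_000),    # partial "week", Feb 1-5
]


def week_start(day):
    """Return the Sunday that begins the Sunday-to-Saturday week containing day."""
    # date.weekday() is 0 for Monday through 6 for Sunday.
    return day - timedelta(days=(day.weekday() + 1) % 7)


def weekly_average_talk_time(rows):
    """Merge periods that fall in the same calendar week, then divide the
    total talk time by the total calls handled for each client and week."""
    totals = defaultdict(lambda: [0, 0])  # (client, week) -> [handled, talk time]
    for client, start, handled, talk_time in rows:
        key = (client, week_start(start))
        totals[key][0] += handled
        totals[key][1] += talk_time
    return {
        key: talk_time / handled
        for key, (handled, talk_time) in totals.items()
        if handled > 0  # skip weeks with no calls to avoid dividing by zero
    }


averages = weekly_average_talk_time(rows)
for (client, week), seconds in sorted(averages.items()):
    print(client, week, round(seconds, 1))
```

Summing talk time and handled calls before dividing weights each call equally within the rebuilt week, which is one reasonable reading of “average talk time per call per week”; averaging the two partial-week ratios instead would give a different figure.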
Another, much more serious problem was that for July, two weeks were labelled Jul 24-30 (each with different data), and the “weeks” Jul 31 and Jul 1-2 were completely missing (July 1st and 2nd would have been days 6 and 7, and July 31st day 1, of their respective Sunday-to-Saturday 7-day periods). After consulting the financial analyst at Assurance, we concluded that during both missing periods the location in question was closed (and thus received no calls) for the Canada Day and Civic Holiday long weekends respectively.

Because of these irregularities (“week” periods which are not 7 days, and uncertain data in the case of the Jul 24-30 weeks), we decided to omit the affected data. That is, we omitted both of the Jul 24-30 weeks, and the Jun 25-30 and Aug 1-6 “weeks”, since there was no way to reconstruct them as 7-day periods.

Two samples were then created (one each for HBC and Wall-Mart) by computing the average talk time per call per week, dividing “Talk Time” by “Handled” for each of the remaining 35 of 38 weeks in the period Jan 2 to Sept 24. This gave the following results:

HBC: $\bar{x}_1 = 250$, $s_1 = 67.0$, $n_1 = 35$
Wall-Mart: $\bar{x}_2 = 194$, $s_2 = 68.0$, $n_2 = 35$

5. Results and Preliminary Conclusions

With the above data, we test the hypothesis $H_0: \mu_1 - \mu_2 = 0$ against $H_1: \mu_1 - \mu_2 \neq 0$ (two-tailed). Since $n_1, n_2 > 30$, both samples are large, so we are reasonably well supported in treating the sample means as approximately normally distributed; conducting a test of normality would be best, but is omitted here. The test statistic is

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1^0 - \mu_2^0)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}},$$

where $\mu_1^0 - \mu_2^0 = 0$ from $H_0$. Thus we compute

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1^0 - \mu_2^0)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(250 - 194) - 0}{\sqrt{\dfrac{(67.0)^2}{35} + \dfrac{(68.0)^2}{35}}} = 3.47.$$

The P-value for $z = 3.47$ is then

$$P\text{-value} = 2[1 - \Phi(z)] = 2[1 - \Phi(3.47)] = 2[1 - 0.9997] = 0.0006.$$

So at significance level $\alpha$, we reject $H_0$ for all $\alpha \geq 0.0006$.
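The arithmetic of this two-sample z test can be checked with a short script. This is a minimal sketch using only the summary statistics reported above; the function name `two_sample_z_test` is an illustrative choice, and the normal CDF is evaluated through the standard library's `math.erfc` instead of a printed table, using the identity 2[1 − Φ(|z|)] = erfc(|z|/√2).

```python
import math


def two_sample_z_test(mean1, sd1, n1, mean2, sd2, n2, null_diff=0.0):
    """Return (z, two-sided P-value) for H0: mu1 - mu2 = null_diff,
    using the large-sample z statistic with sample standard deviations."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = ((mean1 - mean2) - null_diff) / standard_error
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Summary statistics from the report: HBC first, Wall-Mart second.
z, p_value = two_sample_z_test(250, 67.0, 35, 194, 68.0, 35)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

With these inputs the sketch gives z ≈ 3.47 and a P-value of about 0.0005, slightly below the 0.0006 obtained from the table value Φ(3.47) ≈ 0.9997; either way, H0 is rejected at any conventional significance level.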
That is, we are very confident that the two means $\mu_1$ and $\mu_2$ are not equal.

6. Validation of Sample and Statistical Test

For our statistical test to be valid (and hence its results useful), we need to be sure that the samples are good estimates of the true data. Since the data is, in this case, from an ongoing process, and the samples were continuous and inclusive segments† of this data, the samples are good estimates only if the data values in each sample are uniform over time (or at least show a similar periodic behaviour, with many periods occurring over the time span from which the samples were taken).

† before the omission of “weeks” Jul 1-2, Jul 24-30, and Jul 30

We may have some reason to believe that within the period of January – September there is some observable fluctuation in data values due to the management of calls at Assurance, customers' tendencies, etc. However, since both samples were taken from the same time period, any fluctuation over time will be observed equally in both samples, so this concern becomes significantly less critical.

As for the period between October and December, which is not included in the sample, we have no good reason to expect the difference between the average call times of HBC and Wall-Mart policy holders to differ significantly from the difference in the period from which the sample was taken. It is possible to imagine an exception: if, say, HBC offered some Christmas incentive for customers to use their credit cards, then with increased usage more customers would need assistance and would call Assurance. By the nature of such an incentive program, dealing with these customers might also take more time than at other times of the year. In that case, not having data from the Christmas period would make for a poor sample.
As for the omission of the weeks attached to “weeks” Jul 1-2 and Jul 31, we justify this by assuming that these times do not represent any irregularities between the data for HBC and for Wall-Mart. That is, the data in these periods behaves much the same between HBC and Wall-Mart as in other periods. Since these are short periods, this is a safe assumption.

We must also be sure that the two samples are in fact independent. We assume that Assurance gives no priority to calls from either HBC or Wall-Mart (i.e. the priority of answering a call is determined solely by its order, that is, its time spent in the queue). So we have no good reason to expect any significant dependence between the two samples. (If the volume and/or duration of calls from HBC customers, say, suddenly becomes much larger than that from Wall-Mart customers, then further incoming calls from both HBC and Wall-Mart will be equally delayed.) There may also be some market dependence between the number of HBC and Wall-Mart policy holders, and thus the volume of calls, but this is not likely to affect the duration of calls. We therefore feel justified in using our statistical test on the chosen sample.

7. Final Conclusions and Recommendations

Since the statistics indicate that the average talk time per call per week is not equal between HBC and Wall-Mart customers, and the sample data for this test was compiled and used in a statistically sound fashion, we conclude that neither the data for HBC nor the data for Wall-Mart will be a good estimate of the potential future data generated by Chapters-Indigo. Therefore, Assurance should not make a business decision about Chapters-Indigo based on the available data from HBC and Wall-Mart.