Platform Calculation

On planning an A/B test to achieve a certain power
These calculations show you the minimum sample size (N) needed for a test to have a certain amount of power.
       N is the total number of users: the sum of those in the treatment and those in the control.

The power of a test is the probability that the test will "see" or detect an effect of a given size.

For example, a test may have 80% power to detect an effect of 2%.
What this means is that if the actual treatment mean is 2% larger or smaller than the control mean,
the null hypothesis that there is no difference (tested at alpha=0.05) has an 80% chance of being rejected.
Of course, if the true means are more than 2% apart then it is even more likely the null hypothesis will be rejected.
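
To make the definition of power concrete, here is a minimal Python sketch. It assumes a two-sided z-test on the difference of two means with an equal split between treatment and control, which may not be exactly what the spreadsheet does internally. The function name power_two_sided and its parameters are illustrative, not part of the calculator.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided(n_total, std_dev, delta, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two means.

    ASSUMPTION: n_total is split equally between treatment and control
    and both groups share the same standard deviation.
    """
    z = NormalDist()
    n_per_group = n_total / 2
    # Standard error of the difference between the two group means.
    std_err = std_dev * sqrt(2 / n_per_group)
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / std_err
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Values taken from the continuous-metric example in this document.
print(power_two_sided(1_965_369, std_dev=5.7, delta=0.023))
print(power_two_sided(1_965_369, std_dev=5.7, delta=0.046))
```

With the first set of inputs the result is slightly above 0.80. Doubling the true difference pushes the power close to 1, which matches the remark that larger true differences are even more likely to be detected.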

When you are planning an experiment or parallel flight, there are two possible scenarios. Your objective may be to launch the change if
         1) the treatment is significantly better than the control or
         2) the treatment is not significantly worse than the control (e.g. for software updates or strategic changes)
In the first case you may need to "find" a positive effect as small as 1% in your OEC (Overall Evaluation Criterion), or primary metric, before you can launch the feature.
In the second case you should set up a "no-go" decision point such that if the treatment mean is less than, say, 0.98*control mean, the feature should not be launched.
In the first case you want a high probability of detecting a Delta of 1%; in the second, of 2%.

Power calculations are most useful in the planning stage, prior to starting an experiment. They should be used cautiously after an experiment is complete.
Also note that power calculations are necessarily approximations. The better your estimate of the standard deviation or (in the case of a binary metric) the proportion, the better the approximation.

To use the spreadsheets, enter values in the appropriate yellow cells.

If you have any questions on the use or interpretation of this spreadsheet, please contact rogerlon@microsoft.com
For future updates to this calculator and other experimentation tools check out http://exp-platform.com/tools.aspx

On using this calculator for A/B/C or MVT experiments
One note: these calculators are set up for simple A/B tests but can be used for one factor with many treatments or for MVTs. Just be aware that the calculator in those cases is giving you the sample size needed for a "head-to-head" comparison of a treatment to the control.
For example, if you have a control and two treatments, with each group receiving 1/3 of the total population, the calculator will give you the sample size needed for the control plus one treatment.
For an MVT it is highly recommended that the allocation of experimental units to the groups (treatments) for each factor be equal, to get sufficient power for each treatment but more importantly for any interactions you are interested in. If you have some factors with more treatments than others, do your power calculations for the factors with the most variants; then the other factors will have sufficient power (provided all factors have an equal allocation to all treatment groups).
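
The A/B/C remark above can be turned into a small calculation. The Python sketch below assumes equal allocation to every group. It treats the calculator's N as the users needed for the control plus one treatment, so each group needs N/2 users. The helper name total_users_equal_groups is hypothetical.

```python
import math


def total_users_equal_groups(head_to_head_n, num_groups):
    """Total users for an experiment with `num_groups` equal-sized groups.

    head_to_head_n: the calculator's N for control plus ONE treatment.
    num_groups:     control plus all treatments (2 for a plain A/B test).
    ASSUMPTION: every group receives the same share of users.
    """
    if num_groups < 2:
        raise ValueError("need a control and at least one treatment")
    per_group = head_to_head_n / 2
    return math.ceil(per_group * num_groups)


# Control plus two treatments, each receiving 1/3 of the population.
print(total_users_equal_groups(1_965_369, 3))
```

For a plain A/B test (two groups) the function returns the calculator's N unchanged. For three groups it returns 1.5 times N, rounded up.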
Calculations of power for OEC metrics that take more than two values
The assumption is that the sample size will be large enough for the central limit theorem to hold, which will be the case for all but the smallest sample sizes for online experiments.

Two alternatives to input of information: enter either the percent change (delta) or the actual change you want to be able to detect. In either case, the size of the treatment and control groups may be the same or different (the two cases below).

A. Enter Percent Change (i.e. what percent change from the current average)

   Case I: Assume the treatment and control have approximately the same number of observations.
   Assuming 2 groups (T/C) with approximately the same number in each group. Also assume a hypothesis test with a 5% Type I error rate.

   Total sample size needed for given values of Average, StdDev and Pct change (Note: Delta and StdDev are in the original units of the metric)

   Average      =  2.3      (input values in the yellow cells)
   Delta as Pct =  1%
   StdDev       =  5.7
   Delta        =  0.023    (computed: Average * Delta as Pct)

   Power            80%            90%          97.5%
   N          1,965,369      2,579,546      3,807,902
   N is total sample size, so split between treatment and control.

   Case II: Assume the smaller group has a percent (q) of the total observations.

   Total sample size needed for given Average, StdDev, Pct change and q

   Average      =  2.3      (input values in the yellow cells)
   Delta as Pct =  1%
   StdDev       =  5.7
   q            =  10%
   Delta        =  0.023    (computed: Average * Delta as Pct)

   Power            80%            90%          97.5%
   N          5,459,357      7,165,406     10,577,505
   N is total sample size, so split between treatment and control.

B. Enter Actual Change (i.e. what change from the current average in the original metric)

   Case I: Assume the treatment and control have approximately the same number of observations.
   Assuming 2 groups (T/C), same number in each group, alpha=.05 for t-test for comparison of two means

   Total sample size needed for given values of StdDev and Delta (Note: Delta and StdDev are in the original units of the metric)

   StdDev       =  5.7
   Delta        =  0.023

   Power            80%            90%          97.5%
   N          1,965,369      2,579,546      3,807,902
   N is total sample size, so split between treatment and control.

   Case II: Assume the smaller group has a percent (q) of the total observations.

   Total sample size needed for given values of StdDev, Delta and q

   StdDev       =  5.7
   Delta        =  0.023
   q            =  10%

   Power            80%            90%          97.5%
   N          5,459,357      7,165,406     10,577,505
   N is total sample size, so split between treatment and control.

Note: since the mean is not specified under B, the percent change for a specified delta cannot be computed from the information given.
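
The spreadsheet's formula is not shown in this document, so the following Python sketch is a reconstruction. It uses the standard large-sample formula N = factor * StdDev^2 / (Delta^2 * q * (1 - q)), where factor is (z_{1-alpha/2} + z_{power})^2. The rounded factors 8, 10.5 and 15.5 are an assumption inferred from the printed table values. All names here are illustrative.

```python
from statistics import NormalDist

# ASSUMPTION: rounded values of (z_{1-alpha/2} + z_{power})^2 at alpha = 0.05,
# inferred from the printed table values rather than stated in the spreadsheet.
ROUNDED_FACTOR = {0.80: 8.0, 0.90: 10.5, 0.975: 15.5}


def exact_factor(power, alpha=0.05):
    """(z_{1-alpha/2} + z_{power})^2 from the standard normal quantiles."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2


def total_sample_size(std_dev, delta, q=0.5, factor=8.0):
    """Total N (both groups) when the smaller group holds fraction q of users."""
    if not 0.0 < q <= 0.5:
        raise ValueError("q must be in (0, 0.5]")
    if delta == 0:
        raise ValueError("delta must be non-zero")
    return factor * std_dev ** 2 / (delta ** 2 * q * (1 - q))


average, pct_change, std_dev = 2.3, 0.01, 5.7
delta = average * pct_change  # 0.023, as in section A

for power, factor in ROUNDED_FACTOR.items():
    case_1 = round(total_sample_size(std_dev, delta, 0.5, factor))
    case_2 = round(total_sample_size(std_dev, delta, 0.1, factor))
    print(f"power={power:.1%}  equal split N={case_1:,}  q=10% N={case_2:,}")
```

With q = 0.5 the denominator term q * (1 - q) is 0.25, which gives the familiar "16 * StdDev^2 / Delta^2 per group" rule of thumb at 80% power. Passing exact_factor(power) instead of a rounded factor gives results close to the table values but not identical to them.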
Calculations of power for OEC metrics that only take two values (e.g. 0 or 1)
The assumption is that the sample size will be large enough for the central limit theorem to hold, which will be the case for all but the smallest sample sizes for online experiments.

Two alternatives to input of information: enter either the percent change (delta) or the actual change you want to be able to detect. In either case, the size of the treatment and control groups may be the same or different (the two cases below).
The average value for this type of metric is the proportion of 0s (or 1s) that occur in the control or treatment. This could be a conversion rate, for example.

A. Enter Percent Change (i.e. what percent change from the current proportion of 0s or 1s)

   Case I: Assume the treatment and control have approximately the same number of observations.
   Assuming 2 groups (T/C) with approximately the same number in each group. Also assume a hypothesis test with a 5% Type I error rate.
   Let p = the proportion of ones in the control (e.g. conversion rate). Values of 0.0 and 1.0 are not permitted.

   Total sample size needed for given values of p and Delta (Note: Delta is the change from the existing proportion you need to detect)

   p               =  0.40     (input values in the yellow cells)
   Delta as Pct    =  0.50%
   sqrt(p*(1-p))   =  0.489898
   Delta           =  0.002    (computed: p * Delta as Pct)

   Power            80%            90%          97.5%
   N          1,920,000      2,520,000      3,720,000
   N is total sample size, so split between treatment and control.

   Case II: Assume the smaller group has a percent (q) of the total observations.

   Total sample size needed for given values of p, Delta and q

   p               =  0.40     (input values in the yellow cells)
   Delta as Pct    =  0.50%
   q               =  20%
   sqrt(p*(1-p))   =  0.489898
   Delta           =  0.002    (computed: p * Delta as Pct)

   Power            80%            90%          97.5%
   N          3,000,000      3,937,500      5,812,500
   N is total sample size, so split between treatment and control.

B. Enter Actual Change (i.e. change from the current proportion)

   Case I: Assume the treatment and control have approximately the same number of observations.
   Assuming 2 groups (T/C), same number in each group, alpha=.05 for t-test for comparison of two means
   Let p = the percent of one value to the total in the control (e.g. percent of users that return in a certain time period). Values of 0% and 100% are not permitted.

   Total sample size needed for given values of p and Delta (Note: Delta is the change from the existing percent you need to detect)

   p               =  40%      (input values in the yellow cells)
   Delta           =  0.2%
   sqrt(p*(1-p))   =  0.489898
   Delta as Pct    =  0.50%    (computed: Delta / p)

   Power            80%            90%          97.5%
   N          1,920,000      2,520,000      3,720,000
   N is total sample size, so split between treatment and control.

   Case II: Assume the smaller group has a percent (q) of the total observations.

   Total sample size needed for given values of p, Delta and q

   p               =  40%
   Delta           =  0.2%
   q               =  20%
   sqrt(p*(1-p))   =  0.489898
   Delta as Pct    =  0.50%    (computed: Delta / p)

   Power            80%            90%          97.5%
   N          3,000,000      3,937,500      5,812,500
   N is total sample size, so split between treatment and control.
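
For a binary metric the same calculation applies once the standard deviation is derived from the proportion as sqrt(p * (1 - p)). The Python sketch below is a reconstruction under that assumption. The factors 8, 10.5 and 15.5 (for 80%, 90% and 97.5% power at alpha = 0.05) are inferred from the printed table values and are not stated in the spreadsheet. The function names are hypothetical.

```python
from math import sqrt


def binary_std_dev(p):
    """Standard deviation of a 0/1 metric with proportion p of ones."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return sqrt(p * (1 - p))


def binary_total_sample_size(p, pct_change, q=0.5, factor=8.0):
    """Total N (both groups) to detect a relative change in a proportion.

    pct_change: relative change, e.g. 0.005 for 0.50% of p.
    q:          fraction of users in the smaller group.
    factor:     ASSUMED rounded (z_{1-alpha/2} + z_{power})^2,
                8.0 / 10.5 / 15.5 for 80% / 90% / 97.5% power.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if pct_change == 0:
        raise ValueError("pct_change must be non-zero")
    variance = p * (1 - p)
    delta = p * pct_change  # absolute change in the proportion
    return factor * variance / (delta ** 2 * q * (1 - q))


print(round(binary_std_dev(0.40), 6))
for factor in (8.0, 10.5, 15.5):
    equal = round(binary_total_sample_size(0.40, 0.005, 0.5, factor))
    unequal = round(binary_total_sample_size(0.40, 0.005, 0.2, factor))
    print(f"factor={factor}: equal split N={equal:,}  q=20% N={unequal:,}")
```

To use section B's inputs instead (an absolute Delta such as 0.2 percentage points), divide Delta by p to get the relative change before calling the function. The guard on p mirrors the sheet's rule that values of 0 and 1 are not permitted, since the variance would then be zero.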

				