# Basic Statistics with Microsoft Excel

Document Sample

```					Basic Statistics
with Microsoft Excel
Helen Dixon
Aim and Objectives
   Aim of today’s course
 To illustrate how Excel can be used to carry
out some basic statistical analyses and tests
   Objectives
 To show you how to use some of the
statistical worksheet functions available within
Excel
 To show you how to use some of the tools
available in the Analysis ToolPak
 To make you aware of the limitations of Excel
Why use Excel?
 Software more accessible
 Previous familiarity with software
 Easy to format output
 Better charting facilities than some
statistical applications
 Easy to use results with other applications
Problems with Excel
   Errors due to rounding, missing data or extreme
values
   Not suitable for very large data sets
   Output labelled or arranged inappropriately
   Need to repeat processes for different variables
or options
   No record of analyses
   Some algorithms are numerically unstable - little
or no information about algorithms employed
   Analysis ToolPak results are not dynamic and
may vary with results generated by functions
Statistical Functions
 Frequency Distributions
 Mean, Median and Mode
 Percentiles and Quartiles
 Deviation and Squared Deviation about
the Mean
 Variance and Standard Deviation
 Covariance and the Correlation Coefficient
Frequency
   Use COUNTIF to count how many times
an item appears in a list
 =COUNTIF(range,   criteria)
   Use FREQUENCY to calculate how often
values occur within a range
 =FREQUENCY(data_array,        bins_array)
   Can also use Histogram tool in Analysis
Toolpak
Mean, Median, Mode
   Use AVERAGE or AVERAGEA to
calculate the arithmetic mean
 =AVERAGE(number1,   number2, etc.)
   Use MEDIAN to return the middle number
 =MEDIAN(number1,   number2, etc)
   Use MODE to return the most common
value
 =MODE(number1,   number2, etc)
Percentiles and Quartiles
   Use PERCENTILE to return the kth percentile of
a data set
 =PERCENTILE(array,    percentile)
 Percentile argument is a value between 0 and 1

   Use QUARTILE to return the given quartile of a
data set
 =QUARTILE(array,        quart)
 Quart is 1, 2, 3 or 4
 IQR = Q3-Q1

   May return different values to statistical package
Variance and Standard Deviation
   Use VAR, VARA, VARP or VARPA to
calculate the variance for a range
 E.g.
=VAR(value1, value 2, etc.)
 Squared deviations about the mean/N or /n-1

   Use STDEV, STDEVA, STDEVP or
STDEVPA to calculate the standard
deviation for a range
 =E.g. =STDEV(value1, value2, etc.)
 Positive square root of variance
Covariance and
the Correlation Coefficient
   Use COVAR to calculate the covariance
 =COVAR(array1,  array2)
 Average of products of deviations for each
data point pair
 Depends on units of measurement
   Use CORREL to return the correlation
coefficient
 =CORREL(array1,   array2)
 Returns value between -1 and +1
   Also available in Analysis ToolPak
Probability
 Numerical measure of the likelihood that
an event will occur
 Some probabilities that can be calculated
using Excel:
 BinomialProbabilities
 Poisson Probabilities
 Hypergeometric Probabilities
 Normal Probabilities
 Exponential Probabilities
Binomial Probabilities
   Use BINOMDIST to compute binomial
distribution probabilities and cumulative
binomial probabilities
 =BINOMDIST(number_s,     trials, probability_s,
cumulative)
 Calculates the probability that a sequence of
independent trials with two possible outcomes
will have a given number of successes
 Cumulative is either TRUE or FALSE
Poisson Probabilities
   Use POISSON to compute Poisson
Probabilities
 =POISSON(x,   mean, cumulative)
 Shows the probability of x occurrences of an
event over a specified interval of time or
space
Hypergeometric Probabilities
   Use HYPGEOMDIST to compute
hypergeometric probabilities
 =HYPGEOMDIST(sample_s,
number_sample, population_s, number_pop)
 Computes the probability of x successes
(sample_s) in n trials (number_sample) when
the trials are dependent
 Similar to Binomial except trials are not
independent – probability of success changes
from trial to trial
 Does not compute cumulative probabilities
Normal Probabilities
   Use NORMSDIST or NORMDIST to
compute the cumulative probability
 =NORMSDIST(z)
 =NORMDIST(x,   mean, standard_dev,
cumulative)
   Use NORMSINV or NORMINV to compute
the z or x value given a cumulative
probability
 =NORMSINV(probability)
 =NORMINV(probability,   mean, standard_dev)
Exponential Probabilities
   Use EXPONDIST to compute exponential
probabilities
 =EXPONDIST(x,    lambda, cumulative)
 x is the random variable
 Lambda is 1/mean
 Useful in computing probabilities for the time it
Analysis ToolPak
 Descriptive Statistics
 Correlation
 Linear Regression
 t-Tests
 z-Tests
 ANOVA
 Covariance
Descriptive Statistics
   Mean, Median, Mode        Range
   Standard Error            Minimum
   Standard Deviation        Maximum
   Sample Variance           Sum
   Kurtosis                  Count
   Skewness                  kth Largest
   Confidence Level for      kth Smallest
Mean
Correlation and Regression
   Correlation is a measure of the strength of linear
association between two variables
   Values between -1 and +1
   Values close to -1 indicate strong negative relationship
   Values close to +1 indicate strong positive relationship
   Values close to 0 indicate weak relationship
   Linear Regression is the process of finding a line of best
fit through a series of data points
   Can also use the SLOPE, INTERCEPT, CORREL and RSQ
functions
t-Tests and z-Tests
 Used to test hypotheses by comparing
means
 If sample means are equal suggests both
samples came from same population
 t-Test – n <30
 Equalor unequal variances or paired test
 Check result using TTEST function
   z-Test – n>30
 Used    for means with known variances
ANOVA: Analysis of Variances
   Compares variances in two or more data sets
   If difference is found it can be assumed that the
means of the data sets are different
   Single Factor – use instead of t-Test for more
than 2 samples
   Two Factor with Replication – useful when data
can be classified along 2 different dimensions
   Two Factor without Replication – as above but
only one observation for each pair
PivotTables
 Use for crosstabulations
 Data must be in tabular format: columns
 Easy to pivot data
 Easy to create PivotCharts
 Can summarise and analyse data without
affecting data source
Final Tips
   Excel only suitable for basic analysis using small
data sets
   Later versions of Excel more reliable than Excel
97
   Check Analysis TookPak results with worksheet
functions
   Check overall results by hand or with dedicated
statistical package

```
DOCUMENT INFO
Shared By:
Categories:
Stats:
 views: 18 posted: 3/25/2010 language: English pages: 23
How are you planning on using Docstoc?