# DESCRIPTIVE STATISTICS WITH EXCEL

Document Sample

```					     DESCRIPTIVE STATISTICS WITH EXCEL

Edward OMEY
edward.omey@hubrussel.be
www.edwardomey.com

1.    STARTING FROM DISCRETE RAW DATA ...................................................... 2
1.1 Calculating the relevant statistics .......................................................................... 3
1.2. Frequency table ...................................................................................................... 4
1.3. Graphs ................................................................................................................... 10
1.3.1. The needle graph ........................................................................................... 10
1.3.2. The EDF (discrete data) ............................................................................... 16
2. STARTING FROM “CONTINUOUS” RAW DATA ............................................. 22
2.1. The basic descriptive statistics ............................................................................ 22
2.2. The frequency table ............................................................................................. 24
2.3. Graphs ................................................................................................................... 26
2.3.1.The histogram................................................................................................. 26
2.3.2. The EDF ......................................................................................................... 28
3. The box plot ................................................................................................................. 32
4. STARTING FROM GROUPED DATA ................................................................... 37

Descriptive Statistics and EXCEL E. Omey                                HUB BBA 2009                                1
1. STARTING FROM DISCRETE RAW DATA
We start from raw data such in TABLE 1. In Table1 we collected data concerning a
discrete variable : the number of ill employees during 100 consecutive days.

TABLE 1
2           4        2         4         7
4           5        2         5         2
9           7        3         9         5
5           5        2         2         1
4           1       10        10         5
4           9        4         5         5
4           8        3        11        10
2           4        5         9         2
9           8        4         4         4
5           8        4         7         5
4           5        8         3         7
7          10        3         4         5
2           1        5        10         6
5           2        4         7         8
4           3        7         3         5
8           3        3         6         4
10            6        7         5         9
2           7        5         7         5
4           5        4         2         3
8           3        4         8         3

Before starting to make any calculations, it is better to put the data in 1 row or in 1
column. We choose to put the data in 1 column: Table 2.

TABLE
2
2
4
9
5
4
4
4
...
9
5
3
3

Descriptive Statistics and EXCEL E. Omey        HUB BBA 2009                2
1.1 Calculating the relevant statistics

To calculate the relevant DESCRIPTIVE STATISTICS, we activate

TOOLS  DATA ANALYSIS

We choose “descriptive statistics”. Saying OK gives:

In the
-    INPUT RANGE: we select all data
-    GROUPED BY: we choose columns
-    OUTPUT RANGE: select one empty cell
-    SUMMARY STATISTICS: we only choose this

We find:

Descriptive Statistics and EXCEL E. Omey    HUB BBA 2009   3
Saying OK, we get:

Column1

Mean                      5,17   the mean value
Standard Error       0,249061    this is s/n
Median                       5
Mode                         4
Standard
Deviation            2,490609    this is s
Sample Variance      6,203131    this is s²
Kurtosis             -0,65576    this characterizes the shape
Skewness             0,474551    this characterizes symmetry/skewness
Range                      10
Minimum                     1
Maximum                    11
Sum                       517    the sum of all data
Count                     100    the number of data

The smallest entry was 1 and the largest is equal to 11.

1.2. Frequency table
Next we calculate the frequencies of the entries 1, 2, …, 11.
In EXCEL these entries are called the BINS.

Descriptive Statistics and EXCEL E. Omey          HUB BBA 2009   4
We choose an empty column and enter the bins:

BINS
1
2
3
4
5
6
7
8
9
10

We select all cells next to the bins:

Now we INSERT a FUNCTION by using the function wizard:

choose this

Descriptive Statistics and EXCEL E. Omey   HUB BBA 2009   5
Choosing “function”, we get the following screen:

As “CATEGORY”, we choose STATISTICAL:

Descriptive Statistics and EXCEL E. Omey   HUB BBA 2009   6
Inside statistical we choose FREQUENCY:

After OK, we get the following screen:

Descriptive Statistics and EXCEL E. Omey   HUB BBA 2009   7
-   In DATA ARRAY:we select all our raw data
-   In BINS ARRAY, we select the cells containing the bins

In the place of OK, now we use 3 buttons:

CTRL and SHIFT and ENTER

Descriptive Statistics and EXCEL E. Omey       HUB BBA 2009   8
As a result, we see the following next to the bins:

BINS
1          3
2         12
3         11
4         20
5         20
6          3
7         10
8          8
9          6
10          6
11          1

EXCEL has calculated that the frequency of the outcome 1 is equal to 3; the frequency of
the outcome 9 is equal to 6 and so on.

Now we can easily calculate relative frequencies (RELFREQ) and cumulative relative
frequencies. We find RELFREQ by dividing by n = 100. We find EDF by cumulating the
relative frequencies:

BINS        freq    relfreq   cum rel freq
1           3      0,03           0,03
2          12      0,12           0,15
3          11      0,11           0,26
4          20        0,2          0,46
5          20        0,2          0,66
6           3      0,03           0,69
7          10        0,1          0,79
8           8      0,08           0,87
9           6      0,06           0,93
10           6      0,06           0,99
11           1      0,01               1
100           1

Descriptive Statistics and EXCEL E. Omey        HUB BBA 2009                 9
1.3. Graphs
1.3.1. The needle graph
The NEEDLE graph is the graph of the relative frequencies. To make it, we select the
relative frequencies and choose the CHART WIZZARD.

We select the first entry of COLUMN and choose NEXT

Descriptive Statistics and EXCEL E. Omey         HUB BBA 2009                 10
Here we choose SERIES :

Descriptive Statistics and EXCEL E. Omey   HUB BBA 2009   11
In the last part of this screen, we choose “CATEGORY (X) AXIS LABEL” and select all
the BINS:

Now we choose NEXT and choose a TITLE and an X-axis name:

Descriptive Statistics and EXCEL E. Omey      HUB BBA 2009              12
Now we choose NEXT and FINISH and get the following result:

Needle graph

0,25
0,2
0,15
Series1
0,1
0,05
0
1    2       3       4        5     6    7   8       9       10   11
absent employees

We delete the irrelevant “SERIES 1”:

Needle graph

0,25
0,2
0,15
0,1
0,05
0
1       2       3        4         5    6       7       8        9    10     11
absent employees

and now we double-click on one of the bars and see:

Descriptive Statistics and EXCEL E. Omey                          HUB BBA 2009        13
Here we choose OPTIONS and set the GAP WIDTH equal to 500:

Descriptive Statistics and EXCEL E. Omey   HUB BBA 2009   14
After OK, we get what we want:

Needle graph

0,25
0,2
0,15
0,1
0,05
0
1     2    3     4      5    6      7    8   9   10   11
absent employees

Descriptive Statistics and EXCEL E. Omey     HUB BBA 2009   15
1.3.2. The EDF (discrete data)

To construct the empirical distribution function, we select the CUMULATIVE
RELATIVE FREQUENCIES and as in the previous paragraph, we use the
CHARTWIZZARD and choose COLUMN.

Now we choose SERIES and put the column of BINS on the horizontal X-axis. We find:

Descriptive Statistics and EXCEL E. Omey      HUB BBA 2009             16
We go to the next step and insert a title:

Descriptive Statistics and EXCEL E. Omey   HUB BBA 2009   17
Now we go to the finish and find:

EDF

1,2
1
0,8
0,6                                           Series1
0,4
0,2
0
1    2 3   4 5   6 7 8      9 10 11

It is necessary to do some “make up”.
- We cut the vertical axis at the value 1. To do this we double click on the vertical axis
and choose “scale” and set the maximum equal to 1:

After OK we get:

Descriptive Statistics and EXCEL E. Omey         HUB BBA 2009                    18
EDF

1
0,8
0,6
Series1
0,4
0,2
0
1    2 3   4 5   6 7 8   9 10 11

Now we remove “series 1” and double click on one of the blue bars. We get the following
screen:

We choose “options”

Descriptive Statistics and EXCEL E. Omey     HUB BBA 2009               19
and we set the gap-width equal to 0.
We find

Descriptive Statistics and EXCEL E. Omey   HUB BBA 2009   20
and after OK, we obtain:

EDF

1

0,8

0,6

0,4

0,2

0
1     2     3    4     5   6   7    8      9    10   11

This step-function is the EDF.

Descriptive Statistics and EXCEL E. Omey       HUB BBA 2009   21
2. STARTING FROM “CONTINUOUS” RAW DATA
We start from the following raw data (for convenience they are present here using
columns)

1,16       2,77         17,54    4,82    0,55
0,19       0,13          5,73   14,12    7,13
0,99      20,62            17     5,7    0,62
11,79       0,48          4,64    6,94    2,27
2,55       1,04          0,56    0,72    1,56
13,65       9,48          1,36    3,11    0,06
3,45      32,37         10,54    0,25   10,19
0,45      22,57          6,81   16,18    5,59
7,88       4,73         10,25    4,31    2,16
5,08       3,69          3,51    0,79    1,36
7,95      19,46         19,94   14,12    3,49
4,92      12,52           4,5    0,85   10,88
3,52       0,24             4     3,1    0,81
3,14       3,94          7,23    3,55    4,71
4,74       2,92          7,11    6,33    0,46
7,06       6,03          9,65    2,49       8
12,35      17,27          2,17    4,96   25,24
18,99       3,45           5,9    4,09   19,11
3,6       9,66          5,06     8,6     0,2
3,53       3,22         29,07    0,51    0,19

2.1. The basic descriptive statistics

As in the previous example, we take TOOLS – DATA ANALYSIS and DESCRIPTIVE
STATISTICS:

then we fill in the form:

Descriptive Statistics and EXCEL E. Omey       HUB BBA 2009              22
and find:

Column1

Mean                    6,7627
Standard Error       0,679842
Median                   4,675
Mode                       0,19
Standard
Deviation            6,798419
Sample Variance      46,21851
Kurtosis             2,385655
Skewness             1,584499
Range                    32,31
Minimum                   0,06
Maximum                  32,37
Sum                     676,27
Count                      100

The smallest value is 0,06 and the largest is given by 32,37.
The range is 32,37 – 0,06 = 32,31.

Descriptive Statistics and EXCEL E. Omey        HUB BBA 2009   23
2.2. The frequency table
We will construct a frequency table with about 10 classes of equal size.
- the range is 32,31
- we choose 10 classes of size 32,31/10 = 3,231.
 we choose size = 3,2
- the lower bound of the first class, we choose 0.

Now we construct a column with the bins as follows:

BINS
0
3,2
6,4
9,6
12,8
16
19,2
22,4
25,6
28,8
32
35,2

Using the FREQUENCY FUNCTION as before, we find the following table:

BINS
0          0
3,2         34
6,4         30
9,6         11
12,8          9
16          3
19,2          6
22,4          3
25,6          2
28,8          0
32          1
35,2          1

Descriptive Statistics and EXCEL E. Omey       HUB BBA 2009    24
The sum (use the sum-symbol!) is given by 100.

BINS
0         0
3,2        34
6,4        30
9,6        11
12,8         9
16         3
19,2         6
22,4         3
25,6         2
28,8         0
32         1
35,2         1
100

The table gives the following information
- there are no observations less than or equal to 0
- there are 34 observations between 0 (not included) and 3.2 (included)
- there are 30 observations between 3.2 (not included) and 6.4 (included)
etc.

We can calculate relative frequencies and cumulative relative frequencies as before:

ABS          REL       CUM REL
BINS        FREQ         FREQ      FREQ
0             0         0                0
3,2            34      0,34             0,34
6,4            30       0,3             0,64
9,6            11      0,11             0,75
12,8             9      0,09             0,84
16             3      0,03             0,87
19,2             6      0,06             0,93
22,4             3      0,03             0,96
25,6             2      0,02             0,98
28,8             0         0             0,98
32             1      0,01             0,99
35,2             1      0,01                1
100         1

Descriptive Statistics and EXCEL E. Omey       HUB BBA 2009                25
2.3. Graphs
2.3.1.The histogram
For the histogram, we select the relative frequencies, then the chart wizard and choose
again for a column graph. We find:

Again we select series and select the bins to be on the horizontal axis:

Descriptive Statistics and EXCEL E. Omey           HUB BBA 2009                 26
We go to the finish and get:

0,4
0,35
0,3
0,25
0,2                                                            Series1
0,15
0,1
0,05
0
0   3,2 6,4 9,6 12,8 16 19,2 22,4 25,6 28,8 32 35,2

We remove “series 1” and by double-clicking on one of the blue bars, we can use the
options to set the gap width equal to 0:
we get:

Descriptive Statistics and EXCEL E. Omey       HUB BBA 2009                 27
and then

0,4
0,35
0,3
0,25
0,2
0,15
0,1
0,05
0
0   3,2   6,4   9,6   12,8   16   19,2 22,4 25,6 28,8   32   35,2

This is the histogram.

2.3.2. The EDF
To construct the EDF, we select both the BINS ànd the CUM REL FREQ and then we
activate the chart wizard and choose XY-scatter:

Descriptive Statistics and EXCEL E. Omey         HUB BBA 2009         28
and then:

Descriptive Statistics and EXCEL E. Omey   HUB BBA 2009   29
Using „next‟ we get:

Descriptive Statistics and EXCEL E. Omey   HUB BBA 2009   30
We can choose “FINISH” and obtain:

1,2

1

0,8

0,6                                                              Series1

0,4

0,2

0
0          10           20           30           40

We remove the “series 1” and cut the vertical axis at 1. We get:

Descriptive Statistics and EXCEL E. Omey       HUB BBA 2009      31
1
0,9
0,8
0,7
0,6
0,5
0,4
0,3
0,2
0,1
0
0          10          20          30          40

This is the EDF.

3. The box plot
The box plot is a graph in which we show
- Q(0) = the minimum (the smallest observation)
- Q(4) = the maximum (the largest observation)
- Q(1) and Q(3): the 1st and the 3rd quartile.

To calculate these quartiles from raw data we use the function QUARTILE.

Descriptive Statistics and EXCEL E. Omey    HUB BBA 2009       32
We get a form and to get the 1st quartile, we enter “1” in the second bar of the table:

We get Q(1) = 2,1675.

The other quartiles can be calculated by replacing “1” by “0”, or “3” or “4”.

We get the following numbers:

Quartiles
0                   0,06
1                2,1675
2                 4,675
3                9,5225
4                 32,37

To construct the box plot, we HAVE TO give the numbers in the follwing order:
Q(1)                2,1675
Q(0)                   0,06
Q(4)                 32,37
Q(3)                9,5225

Now we select the numbers and in the chart wizard we choose STOCK:

Descriptive Statistics and EXCEL E. Omey         HUB BBA 2009                 33
In STCOK, we choose the second picture of the first row:

After „next‟, we get:

Descriptive Statistics and EXCEL E. Omey         HUB BBA 2009   34
In this frame, we choose ROWS in the place of columns, and then get:

When going to the finish, we get the desired box plot:

Descriptive Statistics and EXCEL E. Omey       HUB BBA 2009   35
35
30

25
Series1
20                                                              Series2
15                                                              Series3
Series4
10

5
0
1

We can remove the unnecessary things, use other types of lines and find for example:

35

30

25

20

15

10

5

0

Descriptive Statistics and EXCEL E. Omey         HUB BBA 2009                 36
4. STARTING FROM GROUPED DATA
In many cases we don‟t get the raw data, but the data are presented in a frequency table.
To calculate the relevant statistics here, we use the built-in functions of EXCEL, cf. § 1.2.

We use the following example.

Salary classes
(x1000)               Abs.Freq
] 8,10]                      5
]10,12]                     10
]12,14]                     30
]14,16]                     55
]16,18]                     35
]18, 20]                    29
]20,22]                     24
]22,24]                      7
total              195

The BINS here are the endpoints of the classes. In the terminology of BINS, we can make
the following new table:

BINS        AbsFreq
8         0
10         5
12        10
14        30
16        55
18        35
20        29
22        24
24         7
195

This table can be used to draw a histogram or to draw the EDF as before.

To calculate some of the relevant descriptive statistics, in the frequency table we add a
new column with the CLASS CENTRES. After this construction, we find:

Descriptive Statistics and EXCEL E. Omey        HUB BBA 2009                   37
Salary         Class
classes        Centre        Abs.Freq
] 8,10]                  9          5
]10,12]                 11         10
]12,14]                 13         30
]14,16]                 15         55
]16,18]                 17         35
]18, 20]                19         29
]20,22]                 21         24
]22,24]                 23          7
195

Now we can calculate the mean and the variance.

The mean is given by

Mean = (9 * 5 + 11 * 10 + … + 23 * 7)/195
= sumproduct( centres and frequencies)/ n

The sumproduct function is available in the function wizard of EXCEL.

- we activate the function wizard:

- In the mathematical functions we find the function SUMPRODUCT:

Descriptive Statistics and EXCEL E. Omey      HUB BBA 2009   38
- After OK, we get

- In the first array, we select the class centres, in the second array we select the absolute
frequencies:

Descriptive Statistics and EXCEL E. Omey             HUB BBA 2009                 39
- After OK, we get the result: 3181

- Of course, we need also to divide by 195 to get the mean value 16,3128.

To calculate the variance, we first have to calculate the second moment.

The second moment is given by

Second moment          = (9² * 5 + 11² * 10 + … + 23² * 7)/195
= sumproduct( centers and centers and frequencies)/ n

Now we use the sumproduct function, but now in the first array and in the second array,
we select the class centers, in the 3rd array we select the absolute frequencies. Now we get
the following result:

Descriptive Statistics and EXCEL E. Omey          HUB BBA 2009                 40
- We have to divide the result 53931 by 195 to find the second moment.
Here we obtain the second moment = 276,5692

-The variance is given by
s²      = second moment – (mean)²
= 276,5692 – (16,3128)²
= 10,46112

- The standard deviation is given by the square root of the variance:

s       = sqrt(10,46112)
= 3,234

To calculate other relevant parameters such as median and the quartiles, for GROUPED
data, there are no special functions available in EXCEL.

Descriptive Statistics and EXCEL E. Omey         HUB BBA 2009          41

```
DOCUMENT INFO
Shared By:
Categories:
Stats:
 views: 20 posted: 8/30/2010 language: English pages: 41