# Histograms Worksheets

Document Sample

```					Organization, Summarization, and Presentation of Data
Le Chapter 2.1

This workbook contains 14 worksheets (the last 4 are supplemental)

Topics covered:
frequency tables
histograms
frequency polygons
cumulative relative frequency polygons
Some Excel functions
Histograms with unequal interval widths

Worked examples from the book:
Example 2.1
Example 2.2
Example 2.3
Example 2.4
Some useful Excel functions

30       For the data shown here we illustrate some useful functions in Excel
44       To find these functions you open a cell and either
28        - type in the function name
42        - go to Insert/Paste Function and choose the Function you want
23       (look up Paste functions in Introductoin to Excel given under Lecture 1)
43

6       COUNT(range)
23       MIN(range)
44       MAX(range)
35       AVERAGE(range)
210       SUM(range)
36       MEDIAN(range)
Frequencies using Raw or Detailed Data

cumulative frequency = total numbers up to a certain boundary
relative cumulative frequency = total proportion up to a certain boundary

Weights in Pounds of 57 Children in a Daycare Center         (page 58)
Weights of Children in Daycare
68           63          42          27          30         36            28      32   79         27
22           23          24          25          44         65            43      25   74         51
36           42          28          31          28         25            45      12   57         51
12           32          49          38          42         27            31      50   38         21
16           24          69          47          23         22            43      27   49         28
23           19          46          30          43         49            12

Largest number:                 79                uses maximum function = MAX(range)
Smallest number:                12                uses minimum function = MIN(range)
Difference:                    67

If we used 5 intervals, their width would be:          13.4
If we used 15 intervals, their width would be:          4.5

We chose a width of 10, and use the Frequency command to count.

Cumulative
Chosen        Upper      Cumulative           Relative  Relative                     uses frequency function = frequency(range, upperlimit value)
Intervals      Limit      Frequency Frequency Frequency Frequency
Notes: this gives cumulative frequency up to the limit point
10-19                19.5            5           5      8.8%        8.8%                        also note the use of fixed cell labels
20-29                29.5           24          19     33.3%       42.1%
30-39                39.5           34          10     17.5%       59.6%                Frequency is found by subtracting cumulative frequencies
40-49                49.5           47          13     22.8%       82.5%
50-59                59.5           51           4      7.0%       89.5%
60-60                69.5           55           4      7.0%       96.5%                To construct frequency table without cumulative frequency
70-79                79.5           57           2      3.5%      100.0%
- set up limit column and highlight required frequency column
Total                                        57    100.0%                            - type in = frequency(range, upperlimit range)
- instead of enter, use shift-control-enter
Frequencies and Proportions
Frequencies = numbers of observations in each range
Relative frequencies = proportions of observations in each range

Example 2.1: Deaths by Age: Minnesota, 1987 (p 57)
The following table gives the number of deaths by age for the state of MN in 1987.

Range           Frequency     Relative Frequency
Age             No. of Deaths Proportion of Deaths
Deaths by Age, MN 1987
Less Than 1                564
1-4                         86                                                                      12,000
5-14                       127
10,000
15-24                      490

Number of People
25-34                      667                                                                       8,000
35-44                      806
6,000
45-54                    1,425
55-64                    3,511                                                                       4,000
65-74                    6,932
2,000
75-84                   10,101
85 and Over              9,825                                                                          0
Total                 34,534

Q: Find the proportions of deaths in each age group
Age

Q: Graph the proportions as a bar chart
Frequency chart (histogram)
We can show the data for the weights of 57 children graphically using a histogram.

Weights of Daycare Children

20

18

16

14
Number of Children

12

10   2
2                        8

6

4

2

0
10-19   20-29   30-39   40-49   50-59   60-60   70-79
Weight Range

Practice making a histogram from the weight data

Intervals Frequency
10-19           5                                              1. Highlight the data in cells B35:C41 (don't include the headings)
20-29          19                                              2. Click on the graph icon on the menu bar: Chart Wizard
30-39          10                                              3. Select the column chart type
40-49          13                                              4. Click on 'Next' for preview, Click on 'Next' again
50-59           4                                              5. Add title and axis labels. Under 'Legend' unselect 'Show legend' Click on Next
60-60           4                                              6. Save as an object in this worksheet - you will need to move and resize the graph
70-79           2                                              7. Double-click on any column, go to Options under 'Format Data Series' and reduce gap to 0.
8. Does your histogram resemble the one above? You may need to resize your chart.
Frequency polygon for the weights of 57 children
1. Add two new intervals to the data for weights: 0-9 and 80-89
2. Both of these have frequency = 0
3. Highlight the data in cells A11:B19
4. Select Chart Type: Line
6. Resize as needed

Interval   Frequency
0-9              0                                                                            Frequency Polygon of Weights
10-19              5
20-29             19                                                          20
30-39             10                                                          18
40-49             13                                                          16
14
50-59              4

Frequency
12
60-60              4                                                          10
70-79              2                                                           8
80-89              0                                                           6
4
2
Practice going through these steps to make a                                   0
Frequency polygon                                                                          0-9 10-19 20-29 30-39 40-49 50-59 60-60 70-79 80-89
Weight (lbs)

To impose a frequency polygon over a histogram
1. Add a third column to the data for the frequency polygon - the frequency counts repeated
2. Highlight data and click on the Chart Wizard. Go to the 'Custom Types' tab and select 'Line-Column' chart type
3. Continue with titles, remove legend, as before.

Interval   Frequency                                                                            Histogram and Frequency Polygon
0-9              0          0
10-19              5          5                                                            20
20-29             19         19                                                            18
30-39             10         10                                                            16
40-49             13         13                                                            14
Frequency

50-59              4          4                                                            12
60-60              4          4                                                            10
70-79              2          2                                                             8
80-89              0          0                                                             6
4
2
Practice going through these steps to make a                                                0
Frequency polygon and histogram on the same chart
0-9 10-1920-2930-3940-4950-5960-6070-7980-89

Weight (lbs)
Frequency Table, Histogram and Frequency Polygon for Ex. 2.3
Example 2.3: Effects of Exercise on the Menstrual Cycle (page 60-61)
We obtained the menarchal age (in years) or 56 female swimmers who began their swimming training after they had reached menarche
these served as controls to compare with those who began their training prior to menarche.

Age at Menarche
14.0                           16.1            13.4     14.6                    13.7         13.2           13.7   14.3
12.9                           14.1            15.1     14.8                    12.8         14.2           14.1   13.6
14.2                           15.8            12.7     15.6                    14.1         13.0           12.9   15.1
15.0                           13.6            14.2     13.8                    12.7         15.3           14.1   13.5
15.3                           12.6            13.8     14.4                    12.9         14.6           15.0   13.8
13.0                           14.1            13.8     14.2                    13.6         14.1           14.5   13.1
12.8                           14.3            14.2     13.5                    14.1         13.6           12.4   15.1

Largest number:                                                     16.1
Smallest number:                                                    12.4
Difference = R:                                                     3.7

If we used five intervals, they would have a width of:                                                                  0.74
If we used fifteen intervals, they would have a width of:                                                               0.25

We chose an interval width of 0.5.

Cumulative
Chosen                          Upper         Cumulative           Relative  Relative
Intervals                        Limit         Frequency Frequency Frequency Frequency

12.0-12.4                              12.45               1               1        1.8%            1.8%
12.5-12.9                              12.95               9               8       14.3%           16.1%
13.0-13.4                              13.45              14               5        8.9%           25.0%
13.5-13.9                              13.95              26              12       21.4%           46.4%
14.0-14.4                              14.45              42              16       28.6%           75.0%
14.5-14.9                              14.95              46               4        7.1%           82.1%
15.0-15.5                              15.45              53               7       12.5%           94.6%
15.5-15.9                              15.95              55               2        3.6%           98.2%
16.0-16.4                              16.45              56               1        1.8%          100.0%

Total                                                                   56      100.0%

We can show the data graphically using a histogram or a frequency polygon.

Age at Menarche                                                                                              Age at Menarche

18                                                                                                              18

16                                                                                                              16

14                                                                                                              14
Number of Children

Number of Children

12                                                                                                              12

10                                                                                                              10

8                                                                                                               8

6                                                                                                               6

4                                                                                                               4

2                                                                                                               2
0                                                                                                               0
12.0-    12.5-   13.0-    13.5-   14.0-    14.5-   15.0-   15.5-    16.0-                                     12.0-   12.5-   13.0-    13.5-   14.0-   14.5-   15.0-   15.5-   16.0-
12.4     12.9    13.4     13.9    14.4     14.9    15.5    15.9     16.4                                      12.4    12.9    13.4     13.9    14.4    14.9    15.5    15.9    16.4
Age at Menarche, in Years                                                                                   Age at Menarche, in Years
Frequency Table, Histogram and Frequency Polygon for Ex. 2.4
Example 2.4: Percent saturation of Bile (page 63-64
The following provides data on percentage saturation of bile for 31 male patients.

Percent Saturation of Bile
40           90      88           106                        58
86          112      65           110                        88
111           52      79            78                        73
86           88      87            80                       118
106          137      56            47                        67
66           88     110            74                        57
123

We chose an interval width of 10%.

Chosen      Upper        Cumulative                   Relative
Intervals    Limit        Frequency         Frequency Frequency

40-49%         49.5                     2           2     6.5%
50-59%         59.5                     6           4    12.9%
60-69%         69.5                     9           3     9.7%
70-79%         79.5                    13           4    12.9%
80-89%         89.5                    21           8    25.8%
90-99%         99.5                    22           1     3.2%
100-109%      109.5                    24           2     6.5%
110-119%      119.5                    29           5    16.1%
120-129%      129.5                    30           1     3.2%
130-139%      139.5                    31           1     3.2%

Total                                            31    100.0%

We can show the data graphically using a histogram or a frequency polygon.

Percent Saturation of Bile                                                             Percent Saturation of Bile

9                                                                                          9

8                                                                                          8

7                                                                                          7
Number of Subjects

Number of Subjects

6                                                                                          6

5                                                                                          5

4                                                                                          4

3                                                                                          3
2                                                                                          2
1                                                                                          1
0                                                                                          0

Percent Saturation of Bile, in 10% Intervals                                           Percent Saturation of Bile, in 10% Intervals
Cumulative Relative Frequency Graph for Weights of 57 children
The data in Table 2.3 (pg 66) is used for this example
Cumulative relative frequency should be in decimal notation when used for graphing
This is an x-y scatter plot (with subtype: points joined using straight-lines)
take the upper-boundary of each class on the x-axis and
relative cumulative frequency on the y-axis

upper b.   Cum. Relative freq.
19.5    0.087719298                                                              Cumulative Relative Frequency Graph
29.5    0.421052632                                                   100%
39.5    0.596491228                                                    90%
49.5    0.824561404                   Cumulative Relative frequency    80%
70%
59.5    0.894736842                                                    60%
69.5    0.964912281                                                    50%
40%
79.5               1                                                   30%
20%
10%
0%
0         20        40                                          60           80   100
Weight

This gives us an approximate idea what the graph should look like, with just a few summary points based on the histogram.
If you want to use all the data and get a more exact result, follow these steps.

Step 1) Copy the raw data into a single column.
Step 2) Sort the data. To do this, begin by highlighting the column of data from Step 1.
Then enter menu commands Data / Sort .
Step 3) Create a column that numbers the children, in order, from 1-57.
Step 4) Calculate the cumulative percent.
Step 5) Create a line chart, using the output from Step 2 as the x-axis and using the output from Step 4 as the y-axis.

Step 3       Step 4
Step 1        Step 2        Number      Cum. Pct
Copy           Sort                1        1.75%
68                12          2        3.51%                                                                                                                 Cumulative Distribution of Weights
63                12          3        5.26%
42                12          4        7.02%
120.00%
27                16          5        8.77%
30                19          6       10.53%
36                21          7       12.28%                                                                                               100.00%
Percent with that Weight or Less

28                22          8       14.04%
32                22          9       15.79%
79                23        10        17.54%                                                                                                80.00%
27                23        11        19.30%
12                23        12        21.05%
60.00%
32                24        13        22.81%
49                24        14        24.56%
38                25        15        26.32%                                                                                                40.00%
42                25        16        28.07%
27                25        17        29.82%
31                27        18        31.58%                                                                                                20.00%
50                27        19        33.33%
38                27        20        35.09%
0.00%
21                27        21        36.84%
0          20           40              60          80   100
22                28        22        38.60%
Weight, in Pounds, of 57 Childern
23                28        23        40.35%
24                28        24        42.11%
25                28        25        43.86%
44                30        26        45.61%
65                30        27        47.37%
43                31        28        49.12%
25                31        29        50.88%
74                32        30        52.63%
51                32        31        54.39%
16                36        32        56.14%
24                36        33        57.89%
69                38        34        59.65%
47                38        35        61.40%
23                42        36        63.16%
22                42        37        64.91%
43                42        38        66.67%
27                43        39        68.42%
49                43        40        70.18%
28                43        41        71.93%
36                44        42        73.68%
42                45        43        75.44%
28                46        44        77.19%
31                47        45        78.95%
28                49        46        80.70%
25                49        47        82.46%
45                49        48        84.21%
12                50        49        85.96%
57                51        50        87.72%
51                51        51        89.47%
23                57        52        91.23%
19                63        53        92.98%
46                65        54        94.74%
30                68        55        96.49%
43                69        56        98.25%
49                74        57       100.00%
12                79
In a histogram, the area of the each bar is proportional to the frequency of the corresponding class interval.
When the class widths are equal we usually take the frequency as the height of each bar.
However, when there are unequal class intervals, we need to adjust the height of bars appropriately
so that the area of the bars remain proportional to the frequency of the class intervals.

One way to adjust the height is to take the frequency density of the class interval using the simple formula
frequency density = frequency / class width
When we take the frequency density as the height of the bars, the area of each bar is exactly equal
(and proportional) to the frequency it represents

Another way to adjust the height is to take the relative frequency density of the class interval
using the simple formula
relative frequency density = relative frequency / class width
where relative frequency = frequency /total frequency
When we take the relative frequency density as the height of the bars, the area is still proportional
to the frequency it represents while the TOTAL area of all the bars is exactly equal to 1.

The following worksheets: Histogram 1 and Histogram 2 give examples of histograms with unequal interval widths.
These are available for your information but are not required for this course.
Upper limit                       Frequency Class width Freq. Density relative Freq. rel. freq. Density
0                                  0                         0.00           0.00                0.00
5                                  3         5               0.60           0.06                0.01
10                                  6         5               1.20           0.11                0.02
20                                 10        10               1.00           0.19                0.02
30                                 15        10               1.50           0.28                0.03
40                                 10        10               1.00           0.19                0.02
50                                  5        10               0.50           0.09                0.01
70                                  5        20               0.25           0.09                0.00                                                 (Copy the formulae down if you need more intervals)

total                                       54                           6.05                                  1.00                          0.11

Histogram using freq. density                                                                 Histogram using relative f.d.                                                40
1.6                                                                                  0.03                                                                                    40

Relative frequency density
1.4                                                                                 0.025                                                                                    50
50
Frequency density

1.2
0.02                                                                                    50
1
70
0.8                                                                                 0.015
70
0.6                                                                                  0.01                                                                                    70
0.4                                                                                                                                                                           0
0.005                                                                                     0
0.2
0
0
0
0
-20         0        20   40   60       80      100                                -20            0         20          40     60           80           100
0
0
0
0
0      0                 0
0      0                 0
0     0.6          0.011111
5     0.6          0.011111
5      0                 0
5     1.2          0.022222
10     1.2          0.022222
10      0                 0
10      1           0.018519
20      1           0.018519
20      0                 0
20     1.5          0.027778
30     1.5          0.027778
30      0                 0
30      1          0.018519
40      1          0.018519
40      0                 0
40    0.5          0.009259
50    0.5          0.009259
50      0                 0
50   0.25           0.00463
70   0.25           0.00463
70      0                 0
70      0                 0
0      0                 0
0      0                 0
0      0                 0
0      0                 0
0      0                 0
0      0                 0
0      0                 0
0      0                 0
-5             0                 0
2.5          0.60              0.01
7.5          1.20              0.02
15          1.00              0.02
25          1.50              0.03
35          1.00              0.02
45          0.50              0.01
60          0.25              0.00
60          0.25              0.00
60          0.25              0.00
60          0.25              0.00
80             0                 0
The spreadsheet "histogram 1" contains a table and two charts.

Since the standard chart wizard does not allow us to draw a histogram with unequal class widths,
XY scatter plots have been customised to draw the histograms and the frequency polygons

You can enter the upper class boundaries and the corresponding frequencies in the cells shaded in blue.
In addition, you must also specify a number in the cell shaded in green which gives the lower class boundary
of the first class interval. i.e. where on the x-axis you want the histogram to start.
You can specify up to 10 upper class boundaries. Leave cells at the bottom of the table blank if you have less
than 10 intervals, by selecting the cells and pressing the delete button.
If you need to include more intervals just copy the formulae in columns C to F down the required cells

Click on the other cells in the table to see the formula used to calculate the frequency density,
relative frequency and the frequency density. The table of values used to construct the histogram and
the frequency polygon can be viewed in the cell range P1 to T44

Notice that the shape of the two histograms are the same although the heights are on a different scale.
When you make changes to the table, the frequency polygon may dissappear temporarily. Click on the chart.

A word of caution: To avoid writing over cells containing formulae (unless you want to change or delete them)
you can restrict changes only to the cells that have been shaded.
Raw data in column A and upper limits (cells shaded in blue in column C) to be entered by the user.

DATA         Upper limit    Frequency      Freq. Density
12             0              0                 0
8              5              2                0.4
20            10              1                0.2                                                    Histogram
35            20              8                0.8                                    0.9
12            30              2                0.2                                    0.8
5             40              2                0.2                                    0.7

Frequency density
3             50              0                 0                                     0.6
18            60              1                0.1                                    0.5
12            80              2                0.1                                    0.4
92            100             4                0.2                                    0.3
98            100             0                 0                                     0.2
98           total            22                                                      0.1
12                                                                                     0
79                                                                                          0   20   40    60     80   100   120
34
100
20.3
23         total number of data points
12                           22
19
54
72

10
10
10
0     0
0     0
0   0.4
5   0.4
5     0
5   0.2
10   0.2
10     0
10   0.8
20   0.8
20     0
20   0.2
30   0.2
30     0
30   0.2
40   0.2
40     0
40     0
50     0
50     0
50   0.1
60   0.1
60     0
60   0.1
80   0.1
80     0
80   0.2
100   0.2
100     0
100     0
100     0
100     0
100     0
The sheet "histogram 2" allows you to enter raw data into the column A.
When you specify the lower boundary of the first class in the green cell and the upper class boundaries in the
blue cells in the table, the formula used column D records the frequency counts from the raw data

The formula used to calculate the frequency denisty is in the form a/(b-c) where a is the frequency count
and (b-c) is the class width found from the difference between the class boundaries.
Remember that the formula for the frequency density is the frequency divided by the class width.

If your class intervals are of equal width and you want to use the actual frequency for the height of the bars
then you can enter a new formula in the frequency density column to read the corresponding frequency
from the cell to its immediate left. (eg. in cell E2 you enter the formula =D2 and copy down)

The cell D20 keeps count of the number of data points as you enter data in column A. If this total is
not the same as the total frequency in the table, then the boundaries used may not include all the data points.

If you need less than 10 intervals, fill in the redundant cells in column C of the table with the upper boundary
of the last class interval. (You cannot delete a part of the table because of the way the frequency table is set up)

```
DOCUMENT INFO
Shared By:
Categories:
Stats:
 views: 526 posted: 5/9/2011 language: English pages: 18
Description: Histograms Worksheets document sample
How are you planning on using Docstoc?