JMP Tutorial #3 – Summaries for a Single Numerical Variable by mifei

Data File:  Sleep-time.JMP
Background: These data come from a study comparing the time it takes for
            smokers and non-smokers to fall asleep.
Variables:  Sleep-time.JMP
              > Smoking Status - smoker or non-smoker
              > Sleep Time - time to fall asleep

Three kinds of summary are needed to describe a numerical variable adequately:
a measure of location, a measure of variability, and a measure of shape.

To begin, select Analyze > Distribution. In the Distribution window that appears, place
Sleep Time in the Y, Columns box as follows.




To obtain an extended list of summaries, right-click on the diamond next to Moments and
select Display Options > More Moments.

Measures of Location:

There are several measures of location.




     Number   Description
       1      The total sample size (n = 84).
              Note: JMP uses capital N to denote sample size.

       2      The sample mean for all individuals in the study is 20.48
              minutes to fall asleep.

       3      The sample median is 20.45 minutes. This says 50% of
              the individuals in the study fell asleep in less than 20.45
              minutes and 50% of the individuals took longer. Note also
              that the sample mean and sample median are very close
              in value, indicating that the distribution of times to fall
              asleep is nearly symmetric.

       4      The smallest observed time to fall asleep was 15.0
              minutes.

       5      The largest observed time to fall asleep was 25.8
              minutes. The range is therefore 10.8 minutes (25.8 - 15.0).

       6      The first quartile, or 25th percentile/quantile, is 18.025
              minutes, which says that 25% of the individuals in our
              sample fell asleep in less than 18.025 minutes and 75% of
              the individuals took longer.

       7      The third quartile, or 75th percentile/quantile, is 23.0
              minutes, which says that 75% of the individuals in our
              sample fell asleep in less than 23.0 minutes and 25% of the
              individuals took longer. The interquartile range (IQR) is
              the difference between the third and first quartiles:

              IQR = 23.00 - 18.025 = 4.975 minutes

              This is the range of the middle 50% of the data.
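
The location summaries in the table can also be computed outside JMP. The following is a minimal Python sketch, assuming a small made-up list of sleep times; the values in `times` are hypothetical and are not the Sleep-time.JMP data. Quartile conventions differ between software packages, so quartiles computed this way may not match JMP's Quantiles report exactly.

```python
import statistics

# Hypothetical sleep times in minutes (NOT the Sleep-time.JMP data).
times = [15.0, 17.5, 18.0, 19.5, 20.0, 21.0, 22.5, 23.0, 25.0]

n = len(times)                          # sample size
mean = statistics.mean(times)           # sample mean
median = statistics.median(times)       # sample median (50th percentile)
minimum, maximum = min(times), max(times)
data_range = maximum - minimum          # range = maximum - minimum

# quantiles(..., n=4) returns the three quartile cut points.
# The "inclusive" method treats the sample as containing its own min and max.
q1, q2, q3 = statistics.quantiles(times, n=4, method="inclusive")
iqr = q3 - q1                           # spread of the middle 50% of the data

print(f"n={n} mean={mean:.2f} median={median} range={data_range}")
print(f"Q1={q1} Q3={q3} IQR={iqr}")
```

Comparing `mean` with `median` mirrors the symmetry check made in item 3 of the table.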



Measures of Variability:

There are also several measures of variability, which are described next.




     Number   Description
       1      The sample variance is $s^2 = 9.33$ minutes$^2$. Because it is
              measured in squared units, this quantity has no direct
              interpretation.

       2      The sample standard deviation is $s = 3.054$ minutes.

              We can use Chebyshev's Theorem to say that at least 75% of
              individuals fall asleep between:

              $\bar{x} \pm 2s = 20.49 \pm 2(3.05) = 20.49 \pm 6.10$, or 14.39 min. to 26.59 min.

              In fact, all individuals in our sample had times to fall asleep in
              this range, which illustrates the "at least" part of Chebyshev's
              Theorem.

       3      The standard error of the mean is

              $SE(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{3.05}{\sqrt{84}} = .333$

              The standard error of the mean gives an estimate of the precision of
              our sample mean. As a rule of thumb, the sample mean give or take
              two standard errors gives a range of values that is very likely to cover
              the true population mean $\mu$ (roughly a 95% chance). Here
              this would give the following:

              $\bar{x} \pm 2\,SE(\bar{x}) = 20.49 \pm 2(.33) = 20.49 \pm .66$, or 19.83 min. to 21.15 min.

       4      Range = Maximum - Minimum = 25.8 - 15.0 = 10.8 min.

       5      Interquartile Range (IQR) = 75th Percentile - 25th Percentile = 4.975 min.
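
The arithmetic behind the variability summaries can be checked with a short Python sketch. It uses only the summary values reported in this tutorial (n = 84, mean 20.49, s = 3.054); the variable names are illustrative. Because the table rounds s to 3.05 and the standard error to .33 before multiplying, the interval endpoints printed here can differ from the table in the second decimal place.

```python
import math

# Summary values reported in the tutorial's JMP output.
n = 84          # sample size
mean = 20.49    # sample mean (minutes)
s = 3.054       # sample standard deviation (minutes)

variance = s ** 2   # in minutes squared, about 9.33

# Chebyshev's Theorem: at least 1 - 1/k^2 of the data lie
# within k standard deviations of the mean, for any k > 1.
k = 2
cheb_fraction = 1 - 1 / k ** 2
cheb_low, cheb_high = mean - k * s, mean + k * s

# Standard error of the mean and the "mean give or take 2 SE" rule of thumb.
se = s / math.sqrt(n)
ci_low, ci_high = mean - 2 * se, mean + 2 * se

print(f"variance = {variance:.2f} min^2")
print(f"at least {cheb_fraction:.0%} between {cheb_low:.2f} and {cheb_high:.2f} min")
print(f"SE = {se:.3f}; mean +/- 2 SE: {ci_low:.2f} to {ci_high:.2f} min")
```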




Measures of Shape:

There are two basic visual displays of shape: the histogram and the boxplot. These are
usually displayed horizontally rather than vertically, which is how JMP draws them. Right-click
on Sleep Time and select Display Options > Horizontal Layout to change the graph to
landscape view.

     Number   Description
       1      This is a histogram of the times to fall asleep. The
              distributional shape is almost uniform.

              You can change the number of bins/class intervals used
              to construct the histogram by switching the mouse to
              hand mode, holding down the left mouse button, and
              moving the mouse up and down.

              Disadvantage: Changing the bins may change your
              perception of shape. (** See hand tool below.)

       2      This is an outlier boxplot of the times to fall asleep.
              There do not appear to be any outliers in these data.

       3      This bracket highlights where the most densely packed
              50% of the data lies.




Comment: You can change the number of bins in the histogram by clicking the hand
button on the menu bar and placing the cursor over the graph. Moving the cursor up
while holding down the left mouse button increases the number of bins; moving it down
decreases the number of bins. Click on the arrow icon to turn this feature off.
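
To see why the choice of bins can change the perceived shape, here is a minimal Python sketch. The helper `bin_counts` and the values in `times` are hypothetical (they are not part of JMP and not the Sleep-time.JMP data); the sketch simply counts the same values using two different numbers of equal-width bins.

```python
def bin_counts(values, num_bins):
    """Count values in num_bins equal-width bins spanning min..max.

    Assumes the values are not all identical (bin width must be > 0).
    """
    low, high = min(values), max(values)
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical sleep times in minutes (NOT the Sleep-time.JMP data).
times = [15.0, 15.5, 16.0, 18.0, 18.5, 19.0,
         21.0, 21.5, 22.0, 24.0, 24.5, 25.0]

four_bins = bin_counts(times, 4)   # looks perfectly flat
five_bins = bin_counts(times, 5)   # shows a dip in the middle

print(four_bins)
print(five_bins)
```

With four bins these made-up values look exactly uniform, while five bins suggest a gap near the center, even though the data have not changed.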


Before we discuss measures of shape, consider the following descriptions.

The most common distribution is the normal distribution, which is bell-shaped. A picture
of a normal distribution is given here.




The normal distribution is symmetric because the shape above and below the center is the
same. A distribution that does not have the same shape above and below the center is a
skewed distribution.
                             Pictures of Skewed Distributions

                         Skewed Right               Skewed Left




Kurtosis measures the steepness of the high point relative to the normal distribution.


                                  Understanding Kurtosis

                       Negative Kurtosis          Positive Kurtosis




Interpreting these values for our dataset:

     Number   Description
       1      We have slight skewness to the left because the
              skewness statistic is negative. However, it is very near 0,
              so it is better in this case to say that the distribution is
              nearly symmetric.

       2      The kurtosis is negative, indicating that this distribution is
              less peaked than the normal distribution.
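
The sign conventions for skewness and kurtosis can be illustrated with a short Python sketch. It uses the simple moment-based definitions (third and fourth standardized moments, with 3 subtracted from the kurtosis so that a normal distribution scores 0). This is an assumption made for illustration: JMP's Moments report may apply small-sample adjustments, so its values can differ somewhat from these. The function name and both data lists are hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample adjustment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3              # 0 for a normal distribution
    return skew, excess_kurtosis

# Hypothetical data sets chosen to show the signs, not real sleep times.
flat = [1, 2, 3, 4, 5]             # symmetric and flat
right_tailed = [1, 1, 1, 2, 10]    # one large value pulls the right tail out

flat_skew, flat_kurt = skewness_and_excess_kurtosis(flat)
right_skew, right_kurt = skewness_and_excess_kurtosis(right_tailed)

print(f"flat: skewness={flat_skew:.2f}, excess kurtosis={flat_kurt:.2f}")
print(f"right-tailed: skewness={right_skew:.2f}")
```

A symmetric, flat data set gives skewness 0 and negative kurtosis, which is the same pattern the tutorial reports for the sleep times; a long right tail gives positive skewness.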



Many statistical procedures require that the distribution of all measurements take on a
certain form. For example, an assumption common to many procedures is that the
measurements follow a normal distribution. JMP allows us to visually compare our
estimated or empirical distribution to several common distributions.

To check how closely the data follow a normal distribution, right-click on the header of
the histogram/boxplot chart and select Fit Distribution > Normal.




The histogram now contains a red line showing the best-fitting normal distribution for
these data.

How well do these data fit a normal distribution?

Not very well in this case. Our distribution is more spread out and less peaked than the
normal ideal.

A smoothed histogram can also help in understanding distributional shape. To obtain a
smoothed curve estimate select Smooth Curve from the Fit Distribution pull-out menu.




Here we see that the normal curve and the smoothed histogram curve do not match very
well. One would not characterize the times to fall asleep as having a normal distribution.
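
A rough numerical counterpart to this visual check is to fit a normal distribution and compare how much of the data falls within one standard deviation of the mean with what the fitted normal curve predicts (about 68%). The Python sketch below assumes a made-up, roughly uniform set of times; it is not the Sleep-time.JMP data and is not how JMP itself assesses fit. The helper `fraction_within` is hypothetical.

```python
from statistics import NormalDist

# Hypothetical, roughly uniform sleep times in minutes (NOT the Sleep-time.JMP data).
times = [15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0]

# Best-fitting normal: uses the sample mean and sample standard deviation.
fitted = NormalDist.from_samples(times)

def fraction_within(values, low, high):
    """Fraction of values in the closed interval [low, high]."""
    return sum(low <= v <= high for v in values) / len(values)

low = fitted.mean - fitted.stdev
high = fitted.mean + fitted.stdev

observed = fraction_within(times, low, high)       # share of data within 1 SD
expected = fitted.cdf(high) - fitted.cdf(low)      # share a normal curve predicts

print(f"observed within 1 SD: {observed:.3f}")
print(f"expected under normal: {expected:.3f}")
```

For flat data like these, fewer observations sit near the center than the normal curve predicts, which is the numerical counterpart of "more spread out and less peaked."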

Comparative Displays:
Suppose the goal is to compare/contrast the time to fall asleep between smokers and non-
smokers. This can be done fairly easily in JMP.

To begin, select Analyze > Distribution. Place Sleep Time in the Y, Columns box and
place Smoking Status in the By box. Click OK.
JMP returns the following output.

Notice that the histograms have different scaling on their horizontal axes. To put them on
a common scale, we can make the histogram for non-smokers also run from 14 to
26 minutes. To do this in JMP, select Distributions > Uniform Scaling as shown
below.




The resulting histograms with uniform horizontal axis scaling are shown below.

One of the best ways to compare a single numerical variable (sleep time) across the levels
of a categorical variable (smoking status) is to create side-by-side boxplots.

This is done in JMP by selecting Analyze > Fit Y by X. Place Sleep Time in the Y,
Response box, place Smoking Status in the X, Factor box, and click OK.




JMP first gives just a dotplot of the sleep times. To put boxes over the points, right-click on
the header of the graph, select Display Options, and click Box Plots and Points Jittered.

What do we see in this plot? What are the similarities? What are the differences?
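
The numbers that a side-by-side boxplot draws (minimum, quartiles, median, maximum) can also be tabulated per group. The Python sketch below is illustrative only: the records are invented, so any difference between the groups is an artifact of the made-up values and says nothing about the real study. The helper `five_number_summary` is hypothetical, and its quartile method may not match JMP's.

```python
import statistics

# Hypothetical (smoking status, minutes) records; NOT the Sleep-time.JMP data.
records = [
    ("smoker", 19.0), ("smoker", 21.5), ("smoker", 23.0),
    ("smoker", 24.5), ("smoker", 25.0),
    ("non-smoker", 15.0), ("non-smoker", 16.5), ("non-smoker", 18.0),
    ("non-smoker", 19.5), ("non-smoker", 21.0),
]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a boxplot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Group the sleep times by smoking status.
groups = {}
for status, minutes in records:
    groups.setdefault(status, []).append(minutes)

summaries = {status: five_number_summary(vals) for status, vals in groups.items()}

for status, summary in summaries.items():
    print(status, summary)
```

Reading the two summaries side by side answers the same questions as the plot: where each group is centered, how spread out it is, and how much the groups overlap.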

								