Methodology

					Multiple-choice example




                          1
Solution
           • The mean and
             median would very
             RARELY have
             EXACTLY the same
             value.
           • The median is
             indeed the middle
             score (the 50th
             percentile).
           • The median is
             LESS vulnerable.
           • We have our
             answer.
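
The claim that the median is less vulnerable to extreme scores can be checked with a small sketch. The scores below are invented for illustration and are not taken from the lecture data.

```python
from statistics import mean, median

# Hypothetical scores, invented for illustration only.
scores = [8, 9, 10, 11, 12]
with_outlier = scores + [60]  # the same scores plus one extreme score

# How far does each statistic move when the extreme score is added?
mean_shift = mean(with_outlier) - mean(scores)
median_shift = median(with_outlier) - median(scores)

print(f"mean:   {mean(scores)} -> {mean(with_outlier):.2f}")
print(f"median: {median(scores)} -> {median(with_outlier)}")
```

In this example the single extreme score pulls the mean up by more than 8 points, while the median moves by only half a point.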



                          2
Example




          3
Solution

      • A grouping variable is a
        set of CODE NUMBERS,
        not real scores.
      • Yes.
      • It’s absolutely essential.
      • We have our answer.




                              4
              Last week
1. How to enter data into SPSS.
2. How to obtain various displays and
   graphs of the Caffeine data.




                                        5
      The SPSS Data Editor
   Works in two display modes:

1. VARIABLE VIEW, in which you enter
   information about the variables in your
   data set;
2. DATA VIEW, in which you enter your
   data.

                                             6
      Entering data into SPSS
•   When entering data into SPSS, begin in
    Variable View.
•   Good work in Variable View confers
    benefits both at the stage of data entry
    and when you are viewing the output.




                                               7
      Variable View completed
                            Actually, value labels




• Variable View controls the appearance of Data
  View.
• Variable View also controls the appearance of
  the output. The graphs I produced with SPSS all
  had clear labels, for example.
• Variable View creates your WORKING
  ENVIRONMENT.
                                                     8
 Between subjects experiments
• In the caffeine experiment, each participant
  is tested under only ONE of the conditions
  making up the independent variable.
• In this experiment, the conditions making
  up the independent variable are said to
  vary BETWEEN SUBJECTS, and the
  experiment is said to be of BETWEEN
  SUBJECTS design.

                                              9
   Within subjects experiments

• In WITHIN SUBJECTS experiments, each
  participant is tested under ALL conditions.
• The Stroop experiment, which I described
  in the first lecture, was of within subjects
  design: each participant performed under
  all three conditions.


                                             10
          Grouping variables
• When entering data from a between subjects
  experiment, you need a GROUPING VARIABLE.
• A GROUPING VARIABLE is a column of code
  numbers or VALUES, each number representing
  the condition under which a score was achieved.
• We can let 1 = ‘Placebo’ and 2 = ‘Caffeine’,
  where 1 and 2 are VALUES and ‘Placebo’ and
  ‘Caffeine’ are VALUE LABELS.
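
As a rough illustration of the idea (outside SPSS), a grouping variable and its value labels can be sketched in Python. The scores and the variable names `group`, `score` and `value_labels` are invented for this example.

```python
# Hypothetical data, invented for illustration: one entry per participant.
group = [1, 1, 1, 2, 2, 2]        # grouping variable: code numbers (VALUES)
score = [9, 7, 11, 12, 14, 10]    # the real scores

value_labels = {1: "Placebo", 2: "Caffeine"}  # VALUE LABELS for the codes

# Pair each score with the label of its group rather than the raw code.
labelled = [(value_labels[g], s) for g, s in zip(group, score)]
for label, s in labelled:
    print(f"{label:<9}{s}")
```

The codes themselves carry no numerical meaning; they only record which condition each score belongs to.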


                                               11
Part of Data View
         • In Data View, all the
           scores from the
           experiment appear as
           a SINGLE COLUMN
           of numbers.
         • The Group variable
           appears as a single
           column of CODE
           NUMBERS.

                               12
Seeing the value labels
            • To see the value
              labels in Data View
              (instead of the
              values), click Value
              Labels in the View
              menu.
            • Seeing the value
              labels helps you avoid
              transcription errors
              when inputting data.
                                  13
      Graphs and displays

  We looked at two kinds of graphs:
1. those that are pictures of
   DISTRIBUTIONS;
2. those that SUMMARISE DISTRIBUTIONS
   by picturing the statistics.



                                   14
          Histograms and
        Stem-&-leaf displays

• Histograms and stem-and-leaf displays
  are pictures of DISTRIBUTIONS.




                                          15
A histogram




              16
Stem-and-leaf display




                        17
    Bar charts and box plots


• BAR CHARTS and BOX PLOTS are
  pictures of the STATISTICS of
  distributions.




                                  18
Bar chart (with error bars)


[Figure: bar chart in which the bar heights show the means and the
error bars show the standard deviations.]




                              19
         Types of bar chart
   SPSS offers three types of bar chart:
1. Simple
2. Clustered
3. Stacked

   I do not like stacked bar charts, so I shall
   consider only the Simple and Clustered
   types.
                                              20
  Suppose there are two grouping
     variables in your data set
• We have run another caffeine experiment.
• But this time, we want to include the
  participant’s gender in the data set.
• We add a second grouping variable,
  Gender, where 1 = Male and 2 = Female.
• As well as comparing the performance of
  the Caffeine and Placebo groups, we also
  want to compare the mean scores of the
  female and male participants.
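
A minimal sketch of what two grouping variables make possible: a mean for every combination of condition and gender, which is what a clustered bar chart displays. The rows below are invented for illustration and are not the lecture's data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows (condition code, gender code, score), invented for illustration.
rows = [
    (1, 1, 10), (1, 1, 12), (1, 2, 8), (1, 2, 9),
    (2, 1, 11), (2, 1, 9), (2, 2, 13), (2, 2, 12),
]
condition_labels = {1: "Placebo", 2: "Caffeine"}
gender_labels = {1: "Male", 2: "Female"}

# Collect the scores for every Condition x Gender combination.
cells = defaultdict(list)
for condition, gender, score in rows:
    cells[(condition_labels[condition], gender_labels[gender])].append(score)

# One mean per cell: these are the bar heights of a clustered bar chart.
cell_means = {cell: mean(scores) for cell, scores in cells.items()}
for (condition, gender), m in sorted(cell_means.items()):
    print(f"{condition:<9}{gender:<7}{m:.1f}")
```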
                                         21
Two grouping variables




                         22
Clustered bar chart




                      23
Explanation
      • Here the ‘category
        variable’ is Experimental
        Condition.
      • The ‘Cluster variable’ is
        Gender.
      • It looks as if the males
        are a little better under
        the Placebo condition.
      • It looks as if the females
        are a little better under
        the Caffeine Condition.

                                     24
Boxplots

[Figure: boxplots of the two groups, with callouts marking an extreme
score, the upper quartiles, the whiskers, the medians, the lower
quartiles and an outlier.]



                             25
     Lecture 6

Descriptive statistics
    with SPSS




                         26
         The three most important
         properties of a distribution

1.   Its typical value, AVERAGE or CENTRAL TENDENCY,
     measured by the MEAN, the MEDIAN and the MODE.
2.   The SPREAD or DISPERSION of scores around the
     average value, measured by the STANDARD
     DEVIATION and RANGE STATISTICS such as the
     SIMPLE RANGE, the INTERQUARTILE and the SEMI-
     INTERQUARTILE RANGES.
3.   The SHAPE of the distribution.
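
The measures of central tendency and spread listed above can be sketched with Python's standard `statistics` module. The scores are invented for illustration, and the quartile method used by `quantiles` may differ slightly from the one SPSS applies, so small discrepancies in the quartiles are to be expected.

```python
from statistics import mean, median, mode, stdev, quantiles

# Hypothetical scores, invented for illustration only.
scores = [4, 7, 8, 9, 9, 10, 11, 12, 14, 16]

# Central tendency
m, med, mo = mean(scores), median(scores), mode(scores)

# Spread
sd = stdev(scores)                       # sample SD (divides by n - 1)
simple_range = max(scores) - min(scores)
q1, q2, q3 = quantiles(scores, n=4)      # the three quartile cut points
iqr = q3 - q1                            # interquartile range
semi_iqr = iqr / 2                       # semi-interquartile range

print(f"mean={m}, median={med}, mode={mo}")
print(f"SD={sd:.2f}, range={simple_range}, IQR={iqr}, semi-IQR={semi_iqr}")
```

Shape is not summarised here; a histogram or stem-and-leaf display is the usual first look at it.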



                                                   27
Results of the caffeine experiment




                                 28
 Obtaining descriptive statistics
• Descriptive statistics such as the mean
  and standard deviation can be obtained
  through a variety of SPSS procedures.
• The MEANS procedure gives the means
  and standard deviations of data in different
  groups.
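
What a grouped means report computes can be sketched as follows. This is only an analogy to the SPSS procedure, not its implementation, and the scores are invented for illustration.

```python
from statistics import mean, stdev

# Hypothetical data, invented for illustration only.
value_labels = {1: "Placebo", 2: "Caffeine"}
group = [1, 1, 1, 1, 2, 2, 2, 2]           # grouping variable (codes)
score = [8, 10, 9, 13, 12, 11, 15, 14]     # scores

# For each group: number of cases, mean and standard deviation.
report = {}
for code, label in value_labels.items():
    group_scores = [s for g, s in zip(group, score) if g == code]
    report[label] = (len(group_scores), mean(group_scores), stdev(group_scores))

for label, (n, m, sd) in report.items():
    print(f"{label:<9}N={n}  mean={m:.2f}  SD={sd:.2f}")
```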



                                             29
Finding the Means procedure




                              30
               The Means procedure
The Descriptive Statistics procedure does not allow comparison between
groups and is inappropriate for grouped data.

• Our data set has a grouping variable. The Means option in Compare
  Means is the best one to start with.
• Note that the Descriptive Statistics procedure is only suitable for
  data sets that have no grouping variables.
• Descriptives would only calculate the statistics of ALL the scores.
                                                       31
                The Means dialog box

The variables in
your data set are
listed here initially.
Select an item by
clicking on it: it will
appear in inverse
video.
The central arrows
will become active and
you can use one of
them to transfer
the item to the
appropriate box.
                           Click here to order additional statistics.
                                                                32
                  The Options dialog box
To add extra items,
click the first (to
select it); then, holding
down the Control key,
click on the other
items you want to
include.


Click the black arrow
in the central pillar to
transfer the statistics
to the right hand
panel.



   Click the Continue
   button to return to
   the Means dialog.
                                           33
The SPSS Output Viewer




                         34
              The output
• There are two tables.
• The Case Processing Summary warns you
  if SPSS has failed to include any data in
  the analysis.
• The Report is a table containing the
  statistics you asked for.



                                         35
         The Case Summary




• Clearly, SPSS found no problems with these
  data.
• All the data have been included in the analysis.


                                                     36
                The Report




• The Placebo and Caffeine means are 9.25 and
  11.90, respectively. The corresponding medians
  are 9.00 and 12.50.
• As we saw before, the simple range of values in
  either distribution is 15 hits.
                                               37
        Editing SPSS output
• If you don’t like a table in the Output, you
  can alter its appearance dramatically.
• You can remove information you do not
  need.
• You can change the shape of the table.
• The first move is to double-click on the
  image in the Viewer.

                                                 38
          Changing the table




• Double-click on the item concerned, to produce
  a shaded border.



                                                   39
       The Output Editor
Click on Pivot to
access the Pivot menu.




                           40
The Pivot menu




                 41
A more compact table




                       42
Leaving the Editor

         • Click the File drop-
           down menu (in the
           Editor) and choose
           Close.
         • Click anywhere in the
           Viewer outside the
           rectangle, which will
           then disappear.

                               43
Some revision




                44
             Percentiles
• A PERCENTILE is the VALUE or SCORE
  below which a specified percentage or
  proportion of the distribution lies.
• The 30th percentile is the value below
  which 30% of the distribution lies.
• The 70th percentile is the value below
  which 70% of scores lie.
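
The definition can be checked directly by counting the proportion of scores that fall below a value. The helper `proportion_below` and the scores 1 to 100 are hypothetical, chosen only to make the arithmetic obvious.

```python
def proportion_below(scores, value):
    """Return the proportion of scores that lie strictly below value."""
    return sum(s < value for s in scores) / len(scores)

# Hypothetical distribution: the whole numbers 1 to 100.
scores = list(range(1, 101))

p_below_31 = proportion_below(scores, 31)   # 30 of the 100 scores lie below 31
p_below_71 = proportion_below(scores, 71)   # 70 of the 100 scores lie below 71

print(p_below_31, p_below_71)
```

In this toy distribution, 31 plays the role of the 30th percentile and 71 the role of the 70th.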


                                           45
The 30th and 70th percentiles


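[Figure: two distributions. In the first, 0.30 of the distribution lies
below the 30th percentile (and 0.70 above it). In the second, 0.70 lies
below the 70th percentile (and 0.30 above it).]
                                                    46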
The median is the 50th percentile




[Figure: a distribution divided at the 50th percentile (the median),
with 0.50 of the distribution on either side.]



                                    47
     The 2.5th and 97.5th percentiles
• The distribution of IQ has a mean of 100 and an SD of 15.
• 130 is 2 SDs above the mean.
• Below that value lies approximately 0.95 + 0.025 = 0.975, or 97.5%,
  of the distribution.
• So 130 is approximately the 97.5th percentile.
• Similarly, the value 70 lies 2 SDs below the mean. Only about
  .025 = 2.5% of values lie below that, so an IQ of 70 is approximately
  the 2.5th percentile.

[Figure: normal curve centred on the mean. The central 0.95 (95%) of the
distribution lies between mean – 1.96×SD and mean + 1.96×SD, with
2½% = .025 in each tail.]
                                                                    48
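
The figures for the IQ distribution can be checked against the normal curve itself. The sketch below assumes an exactly normal population with mean 100 and SD 15, and uses `NormalDist` from Python's standard library. It shows that "2 SDs" is a convenient rounding of 1.96 SDs.

```python
from statistics import NormalDist

# Assumed population: exactly normal, mean 100, SD 15.
iq = NormalDist(mu=100, sigma=15)

below_130 = iq.cdf(130)     # proportion below 2 SDs above the mean
below_70 = iq.cdf(70)       # proportion below 2 SDs below the mean

p97_5 = iq.inv_cdf(0.975)   # exact 97.5th percentile (mean + 1.96 SD)
p2_5 = iq.inv_cdf(0.025)    # exact 2.5th percentile (mean - 1.96 SD)

print(f"below 130: {below_130:.4f}, below 70: {below_70:.4f}")
print(f"97.5th percentile: {p97_5:.1f}, 2.5th percentile: {p2_5:.1f}")
```

The exact percentiles come out at about 129.4 and 70.6, so 130 and 70 are close approximations.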
     Populations and samples
• I measure the IQs of 20 people.
• The POPULATION is the reference set
  containing the IQs of ALL people.
• The 20 IQs I have are merely a subset or
  SAMPLE from the population.
• Suppose I calculate their mean and SD.
• I might find that M = 108.0 and SD = 16.1.
                                           49
    Statistics versus parameters

•  STATISTICS are characteristics of
   SAMPLES; PARAMETERS are
   characteristics of POPULATIONS.
• A normal population has two parameters:
1. the mean;
2. the standard deviation.
• The IQ population has a mean of 100
   and an SD of 15.
                                        50
       The meaning of ‘error’
• The population mean and standard deviation are
  100 and 15, respectively; whereas the sample
  mean and standard deviation are 108.0 and
  16.1.
• We usually study samples in order to make
  inferences about the populations from which
  they have been drawn.
• But if the statistics of samples are taken as
  ESTIMATES of parameters, they are likely to be
  in error, to at least some extent.

                                               51
         Sampling variability
• Population characteristics are fixed.
• But the values of the corresponding
  statistics wobble around the values of the
  parameters.
• This ‘wobble’ is known as SAMPLING
  VARIABILITY or SAMPLING ERROR.



                                               52
         Sampling variability
• Suppose I take 25 people’s IQs and
  calculate the mean. It might be 95.1.
• I do this 4000 times, calculating the value
  of the mean each time.
• I command SPSS to draw a histogram of
  the distribution of the MEANS of the 4000
  samples.
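
The sampling exercise can be imitated with a short simulation. This is a sketch under stated assumptions: IQs are drawn from an exactly normal population with mean 100 and SD 15, and the seed is arbitrary, so the particular values will not match the lecture's histogram.

```python
import random
from statistics import mean, stdev

random.seed(1)  # arbitrary seed, chosen only so that the run is repeatable

POP_MEAN, POP_SD = 100, 15   # IQ population parameters
SAMPLE_SIZE = 25             # IQs per sample
N_SAMPLES = 4000             # number of samples drawn

def sample_mean(n):
    """Draw n IQs from the normal population and return their mean."""
    return mean([random.gauss(POP_MEAN, POP_SD) for _ in range(n)])

# One mean per sample: this list is what the histogram would display.
sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(N_SAMPLES)]

print(f"smallest mean: {min(sample_means):.1f}")
print(f"largest mean:  {max(sample_means):.1f}")
print(f"mean of means: {mean(sample_means):.2f}")
print(f"SD of means:   {stdev(sample_means):.2f}")
```

The means scatter around 100 with an SD near 15/√25 = 3, which is the 'wobble' of the sample mean for samples of this size.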


                                                53
Distribution of the sample mean




                              54
Distribution of the sample mean
                • The values of the
                  mean range from 89.9
                  to 110.0.
                • SAMPLING implies
                  SAMPLING
                  VARIABILITY or
                  SAMPLING ERROR.




                                    55
    The laws of large numbers
• A set of theorems in probability theory.
• Their import is that if you make the sample
  size large enough, the mean of the sample
  will be as close as you wish to the value of
  the population mean.
• So if we take the mean and standard
  deviation of 4000 IQs, their values should
  be close to 100 and 15, respectively.
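
A rough simulation illustrates the point. It assumes a normal population with mean 100 and SD 15. The sample sizes and the seed are arbitrary choices for this sketch.

```python
import random
from statistics import mean, stdev

random.seed(2)  # arbitrary seed, chosen only so that the run is repeatable

# Sample statistics for increasingly large samples from the IQ population.
results = {}
for n in (20, 400, 4000, 40000):
    sample = [random.gauss(100, 15) for _ in range(n)]
    results[n] = (mean(sample), stdev(sample))

for n, (m, sd) in results.items():
    print(f"n={n:>6}  mean={m:7.2f}  SD={sd:6.2f}")
```

With larger samples the statistics tend to settle closer to the parameters, although any single small sample may happen to land close by chance.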
                                             56
Distribution of 4000 IQs




                           57
The 2.5th and 97.5th percentiles
• Since this is a very large sample from a
  normal population, we should find that the
  97.5th percentile is close to 130 (2 SDs
  above the mean) and the 2.5th percentile is
  close to 70 (2 SDs below the mean).
• To obtain these percentiles, we can use
  SPSS’s Frequencies procedure.
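
Outside SPSS, the same check can be sketched in Python. The 4000 IQs here are simulated from an assumed normal population (mean 100, SD 15) rather than read from the lecture's data file. `statistics.quantiles` may interpolate slightly differently from SPSS.

```python
import random
from statistics import mean, median, quantiles

random.seed(3)  # arbitrary seed, chosen only so that the run is repeatable

# Simulated stand-in for the data file of 4000 IQs.
iqs = [random.gauss(100, 15) for _ in range(4000)]

# n=40 gives 39 cut points at 2.5%, 5%, ..., 97.5% of the distribution.
cut_points = quantiles(iqs, n=40)
p2_5, p97_5 = cut_points[0], cut_points[-1]

print(f"mean={mean(iqs):.2f}  median={median(iqs):.2f}")
print(f"2.5th percentile={p2_5:.1f}  97.5th percentile={p97_5:.1f}")
```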


                                           58
The Frequencies procedure
             • It’s in the Descriptive
               Statistics menu.
             • Click on
               Frequencies… to
               enter the Frequencies
               dialog box.




                                     59
      The Frequencies dialog



Click on the Display
frequency tables box to
remove the arrow. This
avoids a huge output of
cumulative frequencies.




                               60
The output
     • As you would expect,
       the mean is close to
       100.
     • The mean and
       median have similar
       values: 99.90 to 2
       places of decimals.
     • The 2.5th and 97.5th
       percentiles are close
       to 70 and 130,
       respectively.
                           61
                 Exercises
• I have made the SPSS data file for the 4000 IQs
  available on my Website.
• Command SPSS to draw a histogram of the
  distribution.
• Use the Descriptives procedure to find the mean
  and standard deviation of the distribution.
• I suggest you try the Frequencies procedure to
  obtain the 2.5th and 97.5th percentiles.


                                                62
          The caffeine data
• I have also put the caffeine data on my
  Website.
• Try using the Means procedure to obtain
  descriptive statistics of the Caffeine and
  Placebo distributions.
• Try producing boxplots and a bar chart.
• Introduce a Gender grouping variable into
  the data and obtain a clustered bar chart.
                                               63