# Methodology


```
Multiple-choice example

1
Solution
• The mean and
median would very
RARELY have
EXACTLY the same
value.
• The median is
indeed the middle
score (the 50th
percentile).
• The median is
LESS vulnerable.
• We have our

2
Example

3
Solution

• A grouping variable is a
set of CODE NUMBERS,
not real scores.
• Yes.
• It’s absolutely essential.

4
Last week
1. How to enter data into SPSS.
2. How to obtain various displays and
graphs of the Caffeine data.

5
The SPSS Data Editor
Works in two display modes:

1. VARIABLE VIEW, in which you enter
information about the variables in your
data set;
2. DATA VIEW, in which you enter your
data.

6
Entering data into SPSS
•   When entering data into SPSS, begin in
Variable View.
•   Good work in Variable View confers
benefits both at the stage of data entry
and when you are viewing the output.

7
Variable View completed
Actually, value labels

• Variable View controls the appearance of Data
View.
• Variable View also controls the appearance of
the output. The graphs I produced with SPSS all
• Variable View creates your WORKING
ENVIRONMENT.
8
Between subjects experiments
• In the caffeine experiment, each of the
participants in an experiment is tested
under only ONE of the conditions making
up the independent variable.
• In this experiment, the conditions making
up the independent variable are said to
vary BETWEEN SUBJECTS, and the
experiment is said to be of BETWEEN
SUBJECTS design.

9
Within subjects experiments

• In WITHIN SUBJECTS experiments, each
participant is tested under ALL conditions.
• The Stroop experiment, which I described
in the first lecture, was of within subjects
design: each participant performed under
all three conditions.

10
Grouping variables
• When entering data from a between subjects
experiment, you need a GROUPING VARIABLE.
• A GROUPING VARIABLE is a column of code
numbers or VALUES, each number representing
the condition under which a score was achieved.
• We can let 1 = ‘Placebo’ and 2 = ‘Caffeine’,
where 1 and 2 are VALUES and ‘Placebo’ and
‘Caffeine’ are VALUE LABELS.

11
Part of Data View
• In Data View, all the
scores from the
experiment appear as
a SINGLE COLUMN
of numbers.
• The Group variable
appears as a single
column of CODE
NUMBERS.

12
Seeing the value labels
• To see the value
labels in Data View
(rather than the
values), click Value
Labels in the View
menu.
• Seeing the value
labels helps you avoid
transcription errors
when inputting data.
13
Graphs and displays

We looked at two kinds of graphs:
1. those that are pictures of
DISTRIBUTIONS;
2. those that SUMMARISE DISTRIBUTIONS
by picturing the statistics.

14
Histograms and
Stem-&-leaf displays

• Histograms and stem-and-leaf displays
are pictures of DISTRIBUTIONS.

15
A histogram

16
Stem-and-leaf display

17
Bar charts and box plots

• BAR CHARTS and BOX PLOTS are
pictures of the STATISTICS of
distributions.

18
Bar chart (with error bars)

[Figure: bar chart of the means, with error bars showing the standard deviations]

19
Types of bar chart
SPSS offers three types of bar chart:
1. Simple
2. Clustered
3. Stacked

I do not like stacked bar charts, so I shall
consider only the Simple and Clustered
types.
20
Suppose there are two grouping variables
• We have run another caffeine experiment.
• But this time, we want to include the
participant’s gender in the data set.
• We add a second grouping variable,
Gender, where 1 = Male and 2 = Female.
• As well as comparing the performance of
the Caffeine and Placebo groups, we also
want to compare the mean scores of the
female and male participants.
21
Two grouping variables

22
Clustered bar chart

23
Explanation
• Here the ‘category
variable’ is Experimental
Condition.
• The ‘Cluster variable’ is
Gender.
• It looks as if the males
are a little better under
the Placebo condition.
• It looks as if the females
are a little better under
the Caffeine Condition.

24
Boxplots

[Figure: boxplots, with labels marking an extreme score, the whiskers, the upper quartiles, the medians, the lower quartiles and an outlier]

25
Lecture 6

Descriptive statistics
with SPSS

26
The three most important
properties of a distribution

1. Its typical value, AVERAGE or CENTRAL TENDENCY,
measured by the MEAN, the MEDIAN and the MODE.
2. The SPREAD or DISPERSION of scores around the
average value, measured by the STANDARD
DEVIATION and RANGE STATISTICS such as the
SIMPLE RANGE, the INTERQUARTILE and the
SEMI-INTERQUARTILE RANGES.
3. The SHAPE of the distribution.

27
Results of the caffeine experiment

28
Obtaining descriptive statistics
• Descriptive statistics such as the mean
and standard deviation can be obtained
through a variety of SPSS procedures.
• The MEANS procedure gives the means
and standard deviations of data in different
groups.

29
Finding the Means procedure

30
The Means procedure
The Descriptive Statistics
procedure does not allow
comparison between groups
and is inappropriate for
grouped data.

• Our data set has a
grouping variable. The
Means option in Compare
Means is the best one to
use.
• Note that the Descriptive
Statistics procedure is
only suitable for data sets
that have no grouping
variables.
• Descriptives would only
calculate the statistics of
ALL the scores.
31
The Means dialog box

The variables in
the data set are
listed here initially.
Select an item by
clicking on it: it will
appear in inverse
video.
The central arrows
will come live and
you can use one of
them to transfer
the item to the
appropriate box.
32
The Options dialog box
click the first (to
select it) then, holding
the Control button
down, click on the
other items you want
to include.

Click the black arrow
in the central pillar to
transfer the statistics
to the right hand
panel.

Click the Continue
button to return to
the Means dialog.

33
The SPSS Output Viewer

34
The output
• There are two tables.
• The Case Processing Summary warns you
if SPSS has failed to include any data in
the analysis.
• The Report is a table containing the

35
The Case Summary

• Clearly, SPSS found no problems with these
data.
• All the data have been included in the analysis.

36
The Report

• The Placebo and Caffeine means are 9.25 and
11.90, respectively. The corresponding medians
are 9.00 and 12.50.
• As we saw before, the simple range of values in
either distribution is 15 hits.
37
Editing SPSS output
• If you don’t like a table in the Output, you
can alter its appearance dramatically.
• You can remove information you do not
need.
• You can change the shape of the table.
• The first move is to double-click on the
image in the Viewer.

38
Changing the table

• Double-click on the item concerned, to produce

39
The Output Editor
Click on Pivot to

40

41
A more compact table

42
Leaving the Editor

• Click the File drop-down
menu (in the
Editor) and choose
Close.
• Click anywhere in the
Viewer outside the
rectangle, which will
then disappear.

43
Some revision

44
Percentiles
• A PERCENTILE is the VALUE or SCORE
below which a specified percentage or
proportion of the distribution lies.
• The 30th percentile is the value below
which 30% of the distribution lies.
• The 70th percentile is the value below
which 70% of scores lie.

45
The 30th and 70th percentiles

[Figure: a distribution with the 30th percentile marked (0.30 of the area below it, 0.70 above) and the 70th percentile marked (0.70 below, 0.30 above)]

46
The median is the 50th percentile

[Figure: a distribution split at the 50th percentile (median), with 0.50 of the area on either side]

47
The 2.5th and 97.5th percentiles
• The distribution of IQ has
a mean of 100 and an SD
of 15.
• 130 is 2 SDs above the
mean.
• Below that value lies 0.95
+ 0.025 = 0.975 or 97.5%
of the distribution.
• So 130 is approximately the
97.5th percentile (more exactly,
mean + 1.96×SD = 129.4).
• Similarly the value 70 lies
2 SDs below the mean.
Only .025 = 2.5% of
values lie below that. An
IQ of 70 is approximately
the 2.5th percentile.

[Figure: normal curve centred on the mean, with 0.95 (95%) of the area between mean – 1.96×SD and mean + 1.96×SD, and 2 ½ % = .025 in each tail]

48
Populations and samples
• I measure the IQs of 20 people.
• The POPULATION is the reference set
containing the IQs of ALL
people.
• The 20 IQs I have are merely a subset or
SAMPLE from the population.
• Suppose I calculate their mean and SD.
• I might find that M = 108.0 and SD = 16.1.
49
Statistics versus parameters

•  STATISTICS are characteristics of
SAMPLES; PARAMETERS are
characteristics of POPULATIONS.
• A normal population has two parameters:
1. the mean;
2. the standard deviation.
• The IQ population has a mean of 100
and an SD of 15.
50
The meaning of ‘error’
• The population mean and standard deviation are
100 and 15, respectively; whereas the sample
mean and standard deviation are 108.0 and
16.1.
• We usually study samples in order to make
inferences about the populations from which
they have been drawn.
• But if the statistics of samples are taken as
ESTIMATES of parameters, they are likely to be
in error, to at least some extent.

51
Sampling variability
• Population characteristics are fixed.
• But the values of the corresponding
statistics wobble around the values of the
parameters.
• This ‘wobble’ is known as SAMPLING
VARIABILITY or SAMPLING ERROR.

52
Sampling variability
• Suppose I take 25 people’s IQs and
calculate the mean. It might be 95.1.
• I do this 4000 times, calculating the value
of the mean each time.
• I command SPSS to draw a histogram of
the distribution of the MEANS of the 4000
samples.

53
Distribution of the sample mean

54
Distribution of the sample mean
• The values of the
mean range from 89.9
to 110.0.
• SAMPLING implies
SAMPLING
VARIABILITY or
SAMPLING ERROR.

55
The laws of large numbers
• A set of theorems in probability theory.
• Their import is that if you make the sample
size large enough, the mean of the sample
will be as close as you wish to the value of
the population mean.
• So if we take the mean and standard
deviation of 4000 IQs, their values should
be close to 100 and 15, respectively.
56
Distribution of 4000 IQs

57
The 2.5th and 97.5th percentiles
• Since this is a very large sample from a
normal population, we should find that the
97.5th percentile is close to 130 (2 SDs
above the mean) and the 2.5th percentile is
close to 70 (2 SDs below the mean).
• To obtain these percentiles, we can use
SPSS’s Frequencies procedure.

58
The Frequencies procedure
• It’s in the Descriptive
Statistics menu.
• Click on
Frequencies… to
enter the Frequencies
dialog box.

59
The Frequencies dialog

Click on the Display
frequency tables box to
remove the arrow. This
avoids a huge output of
cumulative frequencies.

60
The output
• As you would expect,
the mean is close to
100.
• The mean and
median have similar
values: 99.90 to 2
places of decimals.
• The 2.5th and 97.5th
percentiles are close
to 70 and 130,
respectively.
61
Exercises
• I have made the SPSS data file for the 4000 IQs
available on my Website.
• Command SPSS to draw a histogram of the
distribution.
• Use the Descriptives procedure to find the mean
and standard deviation of the distribution.
• I suggest you try the Frequencies procedure to
obtain the 2.5th and 97.5th percentiles.

62
The caffeine data
• I have also made the caffeine data
available on my Website.
• Try using the Means procedure to obtain
descriptive statistics of the Caffeine and
Placebo distributions.
• Try producing boxplots and a bar chart.
• Introduce a Gender grouping variable into
the data and obtain a clustered bar chart.
63

```
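
The slides on sampling variability and on the 2.5th and 97.5th percentiles describe a simulation carried out in SPSS. For readers without SPSS, the same idea can be sketched in Python using only the standard library. This is a rough illustration, not the lecturer's procedure: the seed, the variable names and the helper `sample_mean` are assumptions, and the exact numbers printed will differ from those on the slides because the random draws differ.

```python
import random
import statistics

POP_MEAN, POP_SD = 100, 15   # IQ population parameters quoted on the slides
rng = random.Random(1)       # arbitrary seed, used only so runs are repeatable


def sample_mean(n):
    """Mean of one random sample of n IQs drawn from the normal population."""
    return statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(n))


# Take 4000 samples of 25 IQs each and keep the mean of every sample.
# The means 'wobble' around 100: this is sampling variability.
means = [sample_mean(25) for _ in range(4000)]
print("smallest and largest sample mean:", min(means), max(means))
print("SD of the sample means:", statistics.stdev(means))

# One large sample of 4000 IQs: its statistics should sit close to
# the population parameters (the laws of large numbers).
iqs = [rng.gauss(POP_MEAN, POP_SD) for _ in range(4000)]
print("mean and SD of 4000 IQs:", statistics.fmean(iqs), statistics.stdev(iqs))

# quantiles(n=40) returns 39 cut points at 2.5%, 5%, ..., 97.5%,
# so the first and last are the 2.5th and 97.5th percentiles.
cuts = statistics.quantiles(iqs, n=40)
p2_5, p97_5 = cuts[0], cuts[-1]
print("2.5th and 97.5th percentiles of the sample:", p2_5, p97_5)

# Theoretical 97.5th percentile of the population: mean + 1.96 * SD.
exact_p97_5 = statistics.NormalDist(POP_MEAN, POP_SD).inv_cdf(0.975)
print("population 97.5th percentile:", exact_p97_5)
```

The spread of the 4000 sample means should be much narrower than the spread of individual IQs, which is why the means on the slide stay within roughly ten points of 100.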
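
The slides on grouping variables and the Means procedure explain that scores sit in a single column, with a second column of code numbers saying which condition each score belongs to. A minimal Python sketch of that layout follows, assuming the coding 1 = Placebo and 2 = Caffeine from the slides. The scores are invented purely for illustration and are not the caffeine data from the lecture; the names `group`, `score` and `report` are likewise hypothetical.

```python
import statistics
from collections import defaultdict

# Values and their value labels, using the coding given on the slides.
VALUE_LABELS = {1: "Placebo", 2: "Caffeine"}

# Data View layout: one column of code numbers, one column of scores.
# These scores are made up for illustration; they are NOT the lecture's data.
group = [1, 1, 1, 1, 2, 2, 2, 2]
score = [8, 9, 10, 11, 10, 12, 13, 13]

# Split the single column of scores according to the grouping variable.
by_group = defaultdict(list)
for code, value in zip(group, score):
    by_group[code].append(value)

# Per-group statistics, roughly what a grouped report would tabulate.
report = {}
for code, values in sorted(by_group.items()):
    report[VALUE_LABELS[code]] = {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "range": max(values) - min(values),
    }

for label, stats in report.items():
    print(label, stats)

# Ignoring the grouping variable gives one set of statistics for ALL scores,
# which is the limitation the slides attribute to Descriptives.
overall_mean = statistics.mean(score)
print("All scores, mean:", overall_mean)
```

Looking up each code in `VALUE_LABELS` plays the same role as switching on value labels in Data View: the output shows 'Placebo' and 'Caffeine' rather than 1 and 2.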