PSY 513 – Lecture 1

Reliability

Characteristics of a psychological test or measuring procedure



1. Reliability – The extent to which a test or measuring procedure yields the same score for the same person from one administration to the next.



2. Validity – The extent to which scores on a test correlate with some valued criterion – other measures of the same construct, other measures of different constructs, performance on a job or task.



3. Reading level



4. Face validity – The extent to which a test appears to measure what it is intended to measure.



5. Content validity – The extent to which test content corresponds to the content of what the test is designed to measure or predict.



6. Cost



What makes a good test?



A good psychological test is reliable, valid, has a reading level appropriate to the intended population, has acceptable face and content validity, and is cheap.









PSY 513: Lecture 1: Reliability - 1 11/23/11

Scoring a psychological test

Most tests have multiple items. The test score is usually the sum or average of responses to the multiple items.



If the test is a knowledge test, the score is the number of correct responses.

If the test is a personality test, the score is the sum or average of numerically coded responses, e.g., 1s, 2s, . . ., 5s.



Sometimes subtest scores are computed and the overall score will be the sum of scores on subtests.



Occasionally, the overall score will be the result of performance on some task, such as holding a stylus on a revolving disk, as in the Pursuit Rotor task, or moving pegs from holes in one board to holes in another, as in the Pegboard dexterity task.



Invariably the result of the “measurement” of a characteristic using a psychological test is a number – the person’s score on that test, just as the result of measurement of weight is a number – the score on the face of the bathroom scale.



Reliability

Working Definition: The extent to which a test or measuring procedure yields the same score for the same person from one administration to the next in instances when the person has not changed from one time to the next.



Consider the following hypothetical measurements of IQ:

          Highly Reliable Test           Test with Low Reliability
Person    IQ at Time 1   IQ at Time 2    IQ at Time 1   IQ at Time 2
1         112            111             112            105
2         140            141             140            128
3         85             86              85             92
4         106            108             106            100
5         108            107             108            116
6         95             93              95             105
7         117            118             117            110
8         120            121             120            126
9         135            134             135            130



High reliability: Persons' scores will be about the same from measurement to measurement.



Low reliability: Persons' scores will be different from measurement to measurement.



Note that there is no claim that these IQ scores are the “correct” values for Persons 1-9. That is, this is not about whether or not they are valid or accurate measures. It’s just about whether whatever measures we have are the same from one time to the next.



Why do we care about reliability?

We'll return to this question later, but for now think about your bathroom scale and the number it gives you from day to day. What would you prefer – a number that varied considerably from day to day, or a number that, assuming you haven’t changed, was about the same from day to day?


Classical Test Theory: Model of an observed score

Key concepts (the true score and the error of measurement are not actually observable):



Observed score. The score of a person on the measuring instrument.



True score. The actual amount of the characteristic possessed by an individual.

It is assumed to be unchanged from measurement to measurement (within reason).



Error of measurement. An addition to or subtraction from the true score which is random and unique to the person and time of measurement.



In Classical Test Theory, the observed score is the sum of true score and the error of measurement.



Symbolically: Observed Score = True Score + Error of Measurement.



$X_j = T + E_j$, where j represents the time of measurement.



Note that T is not subscripted because it is assumed to be constant across times of measurement.



It is assumed that if there were no error of measurement, the observed score would equal the true score. But, typically, error of measurement causes the observed score to be different from the true score.



So, for a person:

Observed Score at time 1 = True Score + Measurement Error at time 1.

Observed Score at time 2 = True Score + Measurement Error at time 2.



Note again that the true score is assumed to remain constant across measurements.










Conceptualizing reliability

Two possibilities, both requiring measurement at two points in time.



1. Conceptualizing reliability as differences between scores from one time to another.

This is the conceptualization that follows naturally from the Classical Test Theory notions above.



Consider just the differences between the two measures (IQ at Time 1 minus IQ at Time 2).

          Highly Reliable Test                      Test with Low Reliability
Person    IQ at Time 1  IQ at Time 2  Difference    IQ at Time 1  IQ at Time 2  Difference
1         112           111           1             112           109           3
2         140           140           0             140           128           12
3         85            86            -1            85            92            -7
4         106           108           -2            106           100           6
5         108           107           1             108           116           -8
6         95            93            2             95            105           -10
7         117           118           -1            117           110           7
8         120           120           0             120           123           -3
9         135           135           0             135           130           5



The distributions of differences

[Figure: distributions of the differences for the highly reliable test and for the test with low reliability, each plotted on an axis running from -12 to 12.]



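A measure of the variability of the differences could be used as a summary of reliability.

One such measure is the Standard Error of Measurement, abbreviated SEM.

The SEM is the standard deviation of the errors of measurement. Because each difference score equals E1 - E2, the standard deviation of difference scores obtained from two applications of the same test is √2 times the SEM (assuming the two errors are independent and equally variable), so the SEM can be estimated by dividing that standard deviation by √2.

The smaller the SEM, the more reliable the test.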
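The estimate just described can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: it takes the Time 1 and Time 2 scores from the difference table above, computes the standard deviation of the differences, and divides by √2 to estimate the SEM, assuming the errors at the two times are independent and equally variable. The function name `sem_from_retest` is hypothetical.

```python
import math
import statistics

# Time 1 and Time 2 scores from the difference table above (Persons 1-9).
high_t1 = [112, 140, 85, 106, 108, 95, 117, 120, 135]
high_t2 = [111, 140, 86, 108, 107, 93, 118, 120, 135]
low_t1 = [112, 140, 85, 106, 108, 95, 117, 120, 135]
low_t2 = [109, 128, 92, 100, 116, 105, 110, 123, 130]


def sem_from_retest(t1, t2):
    """Return (SD of differences, estimated SEM) for two administrations.

    Each difference is E1 - E2; if the two errors are independent with
    equal variance, var(E1 - E2) = 2 * SEM**2, so SEM = SD(diff) / sqrt(2).
    """
    diffs = [a - b for a, b in zip(t1, t2)]
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return sd_diff, sd_diff / math.sqrt(2)


for label, t1, t2 in [("high", high_t1, high_t2), ("low", low_t1, low_t2)]:
    sd_diff, sem = sem_from_retest(t1, t2)
    print(f"{label}: SD of differences = {sd_diff:.2f}, estimated SEM = {sem:.2f}")
```

As expected, the test with low reliability yields a much larger SEM than the highly reliable test.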

Advantages

1) This conceptualization stems naturally from the Classical Test Theory framework – it is the variability of the Es in the $X_j = T + E_j$ formulation.

2) So it’s easy to understand.



Problems: 1) It's a golf score: smaller is better. Some nongolfers have trouble with such measures.

2) The SEM depends on the response scale. Tests with a 1-7 scale will have larger SEMs than tests that use a 1-5 scale, even though the test items might be identical.

3) It requires that the test be given twice, with no memory of the first test when participants take the 2nd test, a situation that’s hard to create.



It is useful, however, for assessing how much one could expect a person’s score to vary from one time to another.

For example: You miss the cutoff for a program by 10 points. If the SEM is 40, then you have a good chance of exceeding the cutoff the next time you take the test. If the SEM is 2, then your chances of exceeding the cutoff by taking the test again are much smaller.
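The cutoff example can be made concrete with a rough calculation. The sketch below rests on assumptions the notes do not make: the true score does not change, the errors are independent and normally distributed with standard deviation equal to the SEM, so the retest-minus-first-test difference is normal with mean 0 and SD √2 × SEM. It ignores complications such as regression toward the mean, and the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chance_of_gaining(points_needed, sem):
    """Rough probability that a retest score is at least points_needed higher.

    Assumes an unchanged true score and independent, normal errors with
    SD = sem, so (retest - first score) ~ Normal(0, sqrt(2) * sem).
    """
    sd_diff = math.sqrt(2.0) * sem
    return 1.0 - normal_cdf(points_needed / sd_diff)


for sem in (40, 2):
    print(f"SEM = {sem:>2}: chance of gaining 10+ points = {chance_of_gaining(10, sem):.4f}")
```

Under these assumptions the chance is a bit over 40% when the SEM is 40, but only about 2 in 10,000 when the SEM is 2.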


2. Conceptualizing reliability as the correlation between measurements at two time periods.



This conceptualization is based on the fact that if the differences in values of scores on two successive measurements are small, then the correlation between those two sets of scores will be high and positive.



[Figure: scatterplot of Score at Time 2 against Score at Time 1 (variables HIGHREL1 and HIGHREL2). Caption: Correlation between the two administrations of a highly reliable test.]

[Figure: scatterplot of Score at Time 2 against Score at Time 1 (variables LOWREL1 and LOWREL2). Caption: Correlation between the two administrations of a test with low reliability.]



If the measurements are identical from time 1 to time 2, r = 1.



If there is no correspondence between measures at the two time periods, r = 0.



Advantages of using the correlation between two administrations as a measure of reliability:

1) It’s a bowling score – bigger r means higher reliability.

2) It is relatively independent of response scale – item scores on a 1-5 scale are about as reliable as the same items scored on a 1-7 scale.

3) The correlation is a standardized measure. Correlations can range from -1 to +1, but in practice reliability correlations fall between 0 and 1, so it’s easy to conceptualize reliability in an absolute sense – close to 1 is good; close to 0 is bad.



Disadvantages 1) Relationship to Classical Test Theory requires some thought.

2) Assessment requires two administrations.



Conclusion



Most common measures of reliability are based on the conception of reliability as the correlation between successive measures.


Definition of reliability

The reliability of a test is the correlation between the population of values of the test at time 1 and the population of values at time 2, assuming constant true scores and no carryover between the two measurements.





Symbolized as population $r_{XX'}$, or simply as $r_{XX'}$. This is pronounced “r sub X, X-prime”.



Issues associated with the definition



The definition of reliability refers to a situation that most likely is not realizable in practice.



1) If the population is large, vague, or infinite, then it will be impossible to access all the members of the population.

2) The assumption of no carry-over from Time 1 to Time 2 is very difficult to realize in practice, since people remember how they performed or responded on tests. For this reason, it is usually (though not always) not feasible in practice to test people twice to measure reliability.



The bottom line is that the true reliability of a test is a quantity that we’ll never actually know. What we will know is one or more estimates of reliability.

You’ll hear people speak about “the reliability of the test”. Remember that, strictly speaking, they mean “the estimate of the reliability of the test”.



As an aside, recent research has emphasized that the persons tested help determine reliability estimates: persons who respond inconsistently will yield lower estimates of reliability than those who respond consistently.



I’ll use the phrase “true reliability” or “population reliability” to refer to the population value.

I’ll try to remember to use “estimate of reliability” when referring to one of the estimates.



Some facts about reliability from Classical Test Theory

1. Variance of Observed scores = Variance of True scores + Variance of Errors of Measurement

$\sigma^2_X = \sigma^2_T + \sigma^2_E$

2. True reliability = Variance of True scores / Variance of Observed scores

$r_{XX'} = \sigma^2_T / \sigma^2_X$

Neither of these is of particular use in practice, though. They’re presented here for completeness.
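Even if these facts are not directly usable with real data, a small simulation can show that they hang together with the correlational definition. The sketch below is an illustration only: the true-score SD of 15 and error SD of 5 are made-up values, and true scores and errors are drawn from normal distributions. It generates $X = T + E$ at two times and compares $\sigma^2_T / \sigma^2_X$ with the correlation between the two administrations.

```python
import random


def mean(v):
    return sum(v) / len(v)


def variance(v):
    """Sample variance (n - 1 denominator)."""
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


random.seed(513)                 # fixed seed so the run is repeatable
N = 20_000
TRUE_SD, ERROR_SD = 15.0, 5.0    # hypothetical population values

T = [random.gauss(100.0, TRUE_SD) for _ in range(N)]   # true scores
X1 = [t + random.gauss(0.0, ERROR_SD) for t in T]      # X = T + E at time 1
X2 = [t + random.gauss(0.0, ERROR_SD) for t in T]      # X = T + E at time 2

population_rel = TRUE_SD**2 / (TRUE_SD**2 + ERROR_SD**2)   # 225 / 250
print(f"sigma2_T / sigma2_X (population): {population_rel:.3f}")
print(f"var(X1) vs var(T) + sigma2_E:     {variance(X1):.1f} vs {variance(T) + ERROR_SD**2:.1f}")
print(f"var(T) / var(X1) in the sample:   {variance(T) / variance(X1):.3f}")
print(f"r(X1, X2) in the sample:          {pearson_r(X1, X2):.3f}")
```

All three reliability figures should land close to .90, which is the sense in which the variance ratio and the test-retest correlation describe the same quantity.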










Estimates of Reliability

As said above, we never know the true reliability of a test. So we have to get by with estimates of reliability.



Test-retest estimate

Operational Definition



1. Give the test to a normative group.

2. Minimize memory/carryover from the first administration.

3. Give the test again to the same people.

4. Compute the correlation between scores on the two administrations.
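Step 4 is an ordinary Pearson correlation. As a minimal sketch, here it is applied to the hypothetical IQ scores from the first table in these notes; the helper name `pearson_r` is illustrative, and nine people is of course far smaller than a real normative group.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical IQ scores from the first table in these notes (Persons 1-9).
time1 = [112, 140, 85, 106, 108, 95, 117, 120, 135]
time2_high = [111, 141, 86, 108, 107, 93, 118, 121, 134]
time2_low = [105, 128, 92, 100, 116, 105, 110, 126, 130]

print(f"Test-retest estimate, highly reliable test: {pearson_r(time1, time2_high):.3f}")
print(f"Test-retest estimate, low-reliability test: {pearson_r(time1, time2_low):.3f}")
```

Note that even the "low reliability" example gives a correlation of roughly .89, because the spread of IQs across persons is large relative to the Time 1 to Time 2 changes.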



Most straightforward – fits nicely with the conceptual definition of true reliability



Disadvantages



Requires two administrations of the test – more time.



May be inflated by memory/carryover from the first administration to the second.





Advantages



Has good “face” validity.



For performance tests, the test-retest method may be the only feasible method.



For single-item scores, may be the only feasible method.










Parallel Forms estimate

Operational Definition



1. Develop two equivalent forms of the test. The forms should have the same mean and variance.

2. Give both forms to the normative group.

3. Compute the correlation between paired scores.



Note the contrast with the test-retest method: a 2nd administration of the same test as opposed to administration of an equivalent form of the test.



Note that this definition has introduced a new notion – the notion that an equivalent form can “stand in” for the original test when computing the correlation that is the estimate of reliability.

If we give the same test twice, we can be reasonably sure it’s the same test on the second administration as it was on the first.

But giving an equivalent form of the test requires a leap of faith – that the 2nd form is interchangeable with the original form on that second administration.

The notion of alternative measures being used instead of repeated administration of the same measure has implications for other estimates of reliability to be considered shortly.

The key to the success of the parallel forms method is that the two forms be equivalent. Equal means and variances are a primary way of ensuring that equivalence.



Advantages



Don’t have to worry about memory/carryover between two administrations.



Having two forms that can be used interchangeably may be useful in practice.



Disadvantages



Takes more time to develop two forms than it does one.



It may not be possible to develop alternative, equivalent forms.



A low reliability estimate, i.e., a low r between forms, has two interpretations:

1. Low reliability.

2. Forms are not equivalent.










Split-half estimate

“Halving your test and using it two.”



Operational Definition



1. Identify two equivalent halves of the test.

2. Give the test once.

3. Score the halves separately, so that you have two scores for each person – score on 1st half and on 2nd half.

4. Compute the correlation between the 1st Half and 2nd Half scores. Call that correlation rH1,H2.

5. Plug the correlation into the following Spearman-Brown Prophecy Formula:

Split-half reliability estimate = $\frac{2 \, r_{H1,H2}}{1 + r_{H1,H2}}$



Note that the higher the correlation between the two halves, the larger the estimated reliability.
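The formula is easy to compute directly. This short sketch (the function name is illustrative) evaluates the split-half estimate for a few hypothetical half-test correlations, showing that the estimate rises with r(H1,H2) and always exceeds it when the correlation is between 0 and 1, since the whole test is twice as long as either half.

```python
def spearman_brown_split_half(r_h1h2):
    """Split-half reliability estimate from the correlation between two halves."""
    return 2 * r_h1h2 / (1 + r_h1h2)


for r in (0.3, 0.5, 0.7, 0.9):
    print(f"r(H1,H2) = {r:.1f}  ->  split-half estimate = {spearman_brown_split_half(r):.3f}")
```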



An Internal Consistency Estimate



The split-half method is the simplest example of what are called internal consistency estimates of reliability.



The estimate relies on the consistency (correlation) of the two halves, both of which are internal to the test.



The greater the consistency – correlation - of the two halves, the higher the reliability.



Advantages



1. It allows you to estimate reliability in a single setting.

2. Very computerizable. The program that scores the whole test can be programmed to score the two halves and compute a reliability estimate at the same time.



Disadvantages



1. Requires equivalent halves. This may be hard to achieve.

2. A low reliability estimate may be the result of either 1) low reliability or 2) nonequivalence of the halves.

3. Test may not be splittable.

4. Different halving techniques give different estimates of reliability.










Cronbach’s coefficient alpha estimate

Coefficient alpha takes the notion introduced by the split-half technique to its logical conclusion.



Logic



The split-half uses the consistency of two halves to estimate the reliability of the whole – the sum of the two halves.

But it’s surely the case that the particular halves chosen will affect the estimate of reliability. Some will lead to lower estimates. Other possible halves might lead to larger estimates of reliability.

So, the logic goes, why not look at all possible halves, compute a reliability estimate for each possible split, and then average all those reliability estimates?



Coefficient alpha essentially does this, although it is not directly based on halving the test.



Instead, alpha is based on splitting the test into as many pieces as you can, usually into as many items as there are on the test, and computing the correlations among them.



Operational Definition of Standardized Alpha



1. Identify as many equivalent pieces of the test as possible. Let K be the number of pieces identified.

2. Compute the correlations between all possible pairs of pieces.

3. Compute the mean (arithmetic average) of the correlations. Call it r-bar. (r for correlation; bar for mean)

4. Plug K and r-bar into the following formula:

Standardized alpha = $\alpha = \frac{K \, \bar{r}}{1 + (K - 1) \, \bar{r}}$



Relationship to split-half reliability



Coefficient alpha is simply an extension of split-half reliability to more than two pieces.



Note that if K = 2, then there is only one correlation – the correlation between the two halves.

So r-bar is simply rH1,H2.



And the formula for alpha reduces to 2*rH1,H2 / (1 + rH1,H2). This is the split-half formula.
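The reduction can be checked numerically. In this sketch (function names are illustrative), standardized alpha with K = 2 is compared against the split-half formula for a few correlations, and then r-bar is held at a hypothetical .30 while K grows, anticipating the point made later that adding items raises alpha when the average interitem correlation does not drop.

```python
def standardized_alpha(k, r_bar):
    """Standardized alpha from K pieces and the mean correlation among them."""
    return k * r_bar / (1 + (k - 1) * r_bar)


def split_half(r_h1h2):
    """Spearman-Brown split-half estimate."""
    return 2 * r_h1h2 / (1 + r_h1h2)


# K = 2: only one correlation, so r-bar is r(H1,H2) and the two formulas agree.
for r in (0.2, 0.45, 0.8):
    assert abs(standardized_alpha(2, r) - split_half(r)) < 1e-12

# Holding r-bar fixed at a hypothetical .30, more pieces give a larger alpha.
for k in (2, 4, 8, 16):
    print(f"K = {k:>2}, r-bar = .30 -> standardized alpha = {standardized_alpha(k, 0.30):.3f}")
```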



“Regular” alpha vs. Standardized alpha



There is another formula, based on the variances of the pieces and the covariances between them, that is typically computed and reported. If you see alpha reported, it will likely be the variance-based version.

I presented the standardized version here because 1) its formula is easier to follow than the variance-based formula and 2) its value is typically within .02 of the value from the variance-based formula.



SPSS reports both.
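The variance-based formula is not written out in these notes. The sketch below uses the usual textbook form, α = K/(K − 1) × (1 − Σ item variances / variance of total scores), and applies it to the six persons displayed in the hand-computation example that follows. That is for illustration only: six rows are far too few for a meaningful estimate, and the example's correlation matrix is not computed from these rows. The function name `cronbach_alpha` is illustrative.

```python
import statistics


def cronbach_alpha(rows):
    """Variance-based ("regular") coefficient alpha.

    rows: one list of item responses per person.
    alpha = K / (K - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(rows[0])
    item_vars = [statistics.variance(col) for col in zip(*rows)]  # one per item
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Q1-Q4 responses of the six persons shown in the job satisfaction example.
rows = [
    [3, 4, 3, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 1],
    [3, 2, 3, 3],
    [4, 5, 4, 3],
    [4, 4, 3, 2],
]
print(f"variance-based alpha for these six rows: {cronbach_alpha(rows):.3f}")
```

The item variances and the total-score variance must use the same denominator (here n - 1 throughout); mixing denominators would bias the ratio.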






Hand Computation of Standardized Coefficient Alpha

Suppose a measure of job satisfaction has four items.

Q1: I'M HAPPY ON MY JOB.

Q2: I LOOK FORWARD TO GOING TO WORK EACH DAY.

Q3: I HAVE FRIENDLY RELATIONSHIPS WITH MY COWORKERS.

Q4: MY JOB PAYS WELL.



Suppose I gave this "job satisfaction" instrument to a group of 100 employees. Each person responded with extent of agreement to each item on a scale of 1 to 5. Total score, i.e., observed amount of job satisfaction, is either the sum of the responses to the four items or the mean of the four items.



The data matrix might look like the following (TOTAL and MEAN are two different expressions of the scale score):

PERSON   Q1    Q2    Q3    Q4    TOTAL   MEAN
1        3     4     3     3     13      3.25
2        5     4     5     5     19      4.75
3        1     2     1     1     5       1.25
4        3     2     3     3     11      2.75
5        4     5     4     3     16      4.00
6        4     4     3     2     13      3.25
etc      etc   etc   etc   etc   etc     etc



Suppose the correlations between the items were as follows:

      Q1    Q2    Q3    Q4
Q1    1
Q2    .4    1
Q3    .5    .4    1
Q4    .3    .4    .5    1

Obviously, each item correlates perfectly with itself, so the 1s on the diagonal are not used in the computation of alpha.



The average of the interitem correlations is

r-bar = (.4 + .5 + .3 + .4 + .4 + .5) / 6 = 2.5 / 6 = .417



Standardized Coefficient alpha is

$\alpha = \frac{\text{No. items} \times \bar{r}}{1 + (\text{No. items} - 1) \times \bar{r}} = \frac{4 \times .417}{1 + (4 - 1) \times .417} = \frac{1.668}{1 + 1.251} = \frac{1.668}{2.251} = .74$
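The same hand computation can be written out in Python as a check. This sketch simply averages the six off-diagonal correlations from the matrix above and plugs the result into the standardized alpha formula; the dictionary layout is an illustrative choice.

```python
from itertools import combinations

# Off-diagonal inter-item correlations from the Q1-Q4 matrix above.
corr = {
    (1, 2): 0.4, (1, 3): 0.5, (1, 4): 0.3,
    (2, 3): 0.4, (2, 4): 0.4,
    (3, 4): 0.5,
}
k = 4
pairs = list(combinations(range(1, k + 1), 2))   # (1,2), (1,3), ..., (3,4)
r_bar = sum(corr[p] for p in pairs) / len(pairs)
alpha = k * r_bar / (1 + (k - 1) * r_bar)
print(f"r-bar = {r_bar:.3f}, standardized alpha = {alpha:.2f}")
```

With the unrounded r-bar (2.5/6) alpha is about .7407; the hand computation, which rounds r-bar to .417, gives the same .74 to two decimals.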



Notes:



1. Alpha is merely a re-expression of the correlations between the items. The more highly the items are intercorrelated, the larger the value of alpha.

2. Alpha can be increased by adding items, as long as the average of the interitem correlations does not decrease. So any test can be made more reliable by adding relevant items – items that correlate with the other items.

3. Like the split-half reliability estimate, alpha depends on the consistency (correlations) of the pieces of the test, all of which are internal to (part of) the test.


The SPSS RELIABILITY PROCEDURE



Example Data: Items of a Job Satisfaction Scale. 60 respondents. 1=Dissatisfied; 7=Satisfied.

Q27 Q32 Q35 Q37 Q43 Q45 Q50 OVSAT



1.00 5.00 2.00 2.00 1.00 1.00 2.00 2.00

1.00 7.00 6.00 4.00 6.00 2.00 6.00 4.57

7.00 7.00 1.00 7.00 7.00 6.00 7.00 6.00

4.00 6.00 6.00 6.00 6.00 6.00 6.00 5.71

1.00 6.00 5.00 2.00 1.00 1.00 3.00 2.71

3.00 3.00 7.00 6.00 7.00 1.00 6.00 4.71

6.00 7.00 7.00 6.00 6.00 6.00 7.00 6.43

2.00 7.00 3.00 3.00 3.00 1.00 3.00 3.14

6.00 6.00 7.00 6.00 6.00 6.00 6.00 6.14

4.00 6.00 5.00 4.00 4.00 3.00 3.00 4.14

1.00 3.00 6.00 5.00 5.00 6.00 5.00 4.43

1.00 5.00 1.00 1.00 1.00 1.00 1.00 1.57

1.00 5.00 1.00 1.00 1.00 5.00 1.00 2.14

1.00 7.00 2.00 2.00 3.00 3.00 3.00 3.00

7.00 7.00 6.00 7.00 7.00 7.00 7.00 6.86

6.00 4.00 4.00 7.00 6.00 7.00 7.00 5.86

7.00 7.00 7.00 7.00 5.00 7.00 7.00 6.71

7.00 7.00 4.00 4.00 7.00 7.00 6.00 6.00

6.00 5.00 7.00 7.00 6.00 5.00 6.00 6.00

7.00 5.00 5.00 6.00 6.00 2.00 9.00 5.17

3.00 6.00 6.00 3.00 5.00 5.00 5.00 4.71

3.00 7.00 6.00 7.00 4.00 3.00 7.00 5.29

6.00 6.00 7.00 7.00 7.00 6.00 7.00 6.57

3.00 7.00 7.00 7.00 6.00 1.00 7.00 5.43

5.00 7.00 6.00 6.00 7.00 6.00 6.00 6.14

4.00 6.00 6.00 6.00 6.00 3.00 6.00 5.29

5.00 5.00 6.00 5.00 5.00 1.00 5.00 4.57

3.00 6.00 2.00 5.00 6.00 6.00 5.00 4.71

4.00 4.00 2.00 3.00 3.00 2.00 2.00 2.86

7.00 7.00 7.00 7.00 7.00 7.00 7.00 7.00

5.00 6.00 6.00 4.00 7.00 6.00 6.00 5.71

7.00 6.00 4.00 7.00 7.00 5.00 7.00 6.14

4.00 5.00 4.00 5.00 7.00 5.00 7.00 5.29

3.00 7.00 7.00 7.00 6.00 6.00 7.00 6.14

6.00 6.00 6.00 6.00 5.00 5.00 6.00 5.71

4.00 5.00 7.00 4.00 6.00 4.00 7.00 5.29

7.00 7.00 6.00 7.00 7.00 6.00 7.00 6.71

6.00 5.00 2.00 7.00 6.00 6.00 7.00 5.57

3.00 6.00 7.00 5.00 3.00 7.00 6.00 5.29

6.00 6.00 7.00 7.00 6.00 6.00 7.00 6.43

6.00 4.00 5.00 7.00 6.00 6.00 6.00 5.71

4.00 4.00 4.00 6.00 4.00 1.00 2.00 3.57

5.00 5.00 6.00 6.00 7.00 5.00 6.00 5.71

4.00 6.00 6.00 6.00 6.00 6.00 6.00 5.71

5.00 6.00 6.00 6.00 6.00 6.00 6.00 5.86

2.00 2.00 2.00 2.00 2.00 2.00 3.00 2.14

5.00 6.00 6.00 5.00 5.00 6.00 6.00 5.57

2.00 6.00 6.00 5.00 3.00 5.00 6.00 4.71

5.00 6.00 2.00 5.00 5.00 6.00 4.00 4.71

5.00 6.00 7.00 6.00 6.00 7.00 7.00 6.29

1.00 6.00 6.00 2.00 5.00 1.00 5.00 3.71

5.00 6.00 7.00 6.00 6.00 3.00 7.00 5.71

6.00 6.00 6.00 6.00 6.00 6.00 6.00 6.00

7.00 7.00 7.00 7.00 7.00 6.00 7.00 6.86

7.00 1.00 6.00 7.00 3.00 5.00 6.00 5.00

4.00 6.00 6.00 5.00 5.00 6.00 6.00 5.43

1.00 5.00 5.00 5.00 1.00 2.00 5.00 3.43

1.00 6.00 5.00 3.00 5.00 5.00 3.00 4.00

7.00 7.00 7.00 7.00 7.00 5.00 7.00 6.71

4.00 6.00 7.00 7.00 7.00 5.00 7.00 6.14










Analyze -> Scale -> Reliability Analysis …










Alpha and standardized alpha should be approximately equal.







All items should have approximately equal standard deviations. Item 32 is suspect here. In general, items with small standard deviations will tend to suppress reliability.





Look for items with small or negative correlations with the other items. They'll be the most likely candidates for exclusion from the scale. Item 32’s correlations have been highlighted.










Use this column to identify items whose removal would result in an increase in scale reliability, such as Item 32.









I’ve reproduced the display of alpha for the whole scale to make it easier to use the values in the rightmost

column above.
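The two item diagnostics used above can be sketched in Python. This is an illustration, not SPSS's code: the corrected item-total correlation is computed in its usual form, the correlation of an item with the sum of the remaining items, and "alpha if item deleted" is the variance-based alpha recomputed without that item. The five-person data set is hypothetical and built so that item 4 runs opposite to the others.

```python
import statistics


def pearson_r(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """Variance-based alpha: K/(K-1) * (1 - sum of item variances / total variance)."""
    k = len(rows[0])
    item_vars = [statistics.variance(col) for col in zip(*rows)]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def item_total_stats(rows):
    """For each item: (corrected item-total r, alpha if the item is deleted)."""
    stats = []
    for j in range(len(rows[0])):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]          # total without item j
        reduced = [r[:j] + r[j + 1:] for r in rows]   # data with item j removed
        stats.append((pearson_r(item, rest), cronbach_alpha(reduced)))
    return stats


# Hypothetical 1-5 responses: items 1-3 agree, item 4 runs the opposite way.
rows = [
    [1, 1, 2, 5],
    [2, 2, 2, 4],
    [3, 3, 4, 3],
    [4, 4, 4, 2],
    [5, 5, 5, 1],
]
print(f"alpha, all items: {cronbach_alpha(rows):.3f}")
for j, (r_it, a_del) in enumerate(item_total_stats(rows), start=1):
    print(f"item {j}: corrected item-total r = {r_it:+.3f}, alpha if deleted = {a_del:.3f}")
```

Item 4 shows the warning signs described above: a negative corrected item-total correlation and an "alpha if item deleted" far above the alpha for the whole scale.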










Reliability Example

Tests with Right/Wrong Answers

The example below illustrates how reliability analysis would be performed on a multiple choice test in which there was a right and wrong answer to each item.



I chose to enter the raw responses to the items into SPSS from within a Syntax Window.



The DATA LIST command tells SPSS the names of the variables (q1, q2, . . ., q36) and where each is located within a line (columns 1-36). For this example, q36 was an essay question and was not included in the reliability analysis.



The values represent responses marked by test takers as follows:



1=a 2=b 3=c 4=d 9=no answer provided.



DATA LIST /q1 to q36 1-36.
BEGIN DATA.
333331112113322114241221114421423122
311333432212311114341422321112133224
323333431212311424242411331222413225
333333441212321421242411322921423223
313333441232311121241411324121423225
323333141212311121242412321221423225
111411431212213434342421111222433225
211321413212333314142413224121443124
333311112313122412341411222122423221
332333133212311414142411224222423220
323333431212311414242441321221433223
213332131212311134142211221221433225
313333441312322114122411214222333325
323333431212314121242411324221422223
331332412212312114241111311422413221
333133413232311124142214131121433224
313333431212311414242411221121423225
313321432313341324342431311221433224
312321431333212424232111223221433223
323332131213311414242411321421433224
323333441212311124242411321221423225
313333431212311123242411321221421225
331333431212311121242411214312423293
113333412313313121242241221321423324
END DATA.










The following commands "score" each response and put the score for each question into a new variable.



RECODE q1 (3=1) (ELSE=0) INTO q1score.
RECODE q2 (2=1) (ELSE=0) INTO q2score.
RECODE q3 (3=1) (ELSE=0) INTO q3score.
RECODE q4 (3=1) (ELSE=0) INTO q4score.
RECODE q5 (3=1) (ELSE=0) INTO q5score.
RECODE q6 (3=1) (ELSE=0) INTO q6score.
RECODE q7 (4=1) (ELSE=0) INTO q7score.
RECODE q8 (3=1) (ELSE=0) INTO q8score.
RECODE q9 (1=1) (ELSE=0) INTO q9score.
* This question (q10) had two correct answers.
RECODE q10 (2,3=1) (ELSE=0) INTO q10score.
RECODE q11 (1=1) (ELSE=0) INTO q11score.
RECODE q12 (2=1) (ELSE=0) INTO q12score.
RECODE q13 (3=1) (ELSE=0) INTO q13score.
RECODE q14 (1=1) (ELSE=0) INTO q14score.
RECODE q15 (1=1) (ELSE=0) INTO q15score.
RECODE q16 (1=1) (ELSE=0) INTO q16score.
RECODE q17 (2=1) (ELSE=0) INTO q17score.
RECODE q18 (1=1) (ELSE=0) INTO q18score.
RECODE q19 (2=1) (ELSE=0) INTO q19score.
RECODE q20 (4=1) (ELSE=0) INTO q20score.
RECODE q21 (2=1) (ELSE=0) INTO q21score.
RECODE q22 (4=1) (ELSE=0) INTO q22score.
RECODE q23 (1=1) (ELSE=0) INTO q23score.
RECODE q24 (1=1) (ELSE=0) INTO q24score.
* This question (q25) had two correct answers.
RECODE q25 (2,3=1) (ELSE=0) INTO q25score.
RECODE q26 (2=1) (ELSE=0) INTO q26score.
RECODE q27 (1=1) (ELSE=0) INTO q27score.
RECODE q28 (2=1) (ELSE=0) INTO q28score.
RECODE q29 (2=1) (ELSE=0) INTO q29score.
RECODE q30 (1=1) (ELSE=0) INTO q30score.
RECODE q31 (4=1) (ELSE=0) INTO q31score.
RECODE q32 (2=1) (ELSE=0) INTO q32score.
RECODE q33 (3=1) (ELSE=0) INTO q33score.
RECODE q34 (2=1) (ELSE=0) INTO q34score.
RECODE q35 (2=1) (ELSE=0) INTO q35score.
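For readers who want to see the scoring logic outside SPSS, here is a rough Python equivalent of the RECODE commands. The answer key below is transcribed from those commands (q10 and q25 accept two answers); anything else, including 9 for "no answer", scores 0. The names `KEY` and `score_row` are illustrative. Applied to the first two data lines, it should reproduce the 0/1 scores and the TOTSCORE values of 16 and 21 shown for the first two respondents in the listing that follows.

```python
# Answer key transcribed from the RECODE commands: question -> responses scored 1.
KEY = {
    1: {3}, 2: {2}, 3: {3}, 4: {3}, 5: {3}, 6: {3}, 7: {4}, 8: {3}, 9: {1},
    10: {2, 3}, 11: {1}, 12: {2}, 13: {3}, 14: {1}, 15: {1}, 16: {1}, 17: {2},
    18: {1}, 19: {2}, 20: {4}, 21: {2}, 22: {4}, 23: {1}, 24: {1}, 25: {2, 3},
    26: {2}, 27: {1}, 28: {2}, 29: {2}, 30: {1}, 31: {4}, 32: {2}, 33: {3},
    34: {2}, 35: {2},
}


def score_row(line):
    """Score one fixed-column data line (q1 in column 1, ..., q36 in column 36).

    Returns the 0/1 scores for q1-q35; q36 (the essay question) is ignored.
    Any response not in the key, including 9 = no answer, scores 0.
    """
    responses = [int(ch) for ch in line[:35]]
    return [1 if resp in KEY[q] else 0 for q, resp in enumerate(responses, start=1)]


data_lines = [
    "333331112113322114241221114421423122",
    "311333432212311114341422321112133224",
]
for line in data_lines:
    scores = score_row(line)
    print(scores, sum(scores))
```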










The following is a list of the newly created "score" variables. The columns are, in order, Q1SCORE through Q35SCORE, followed by TOTSCORE, the total number of correct answers.



1 0 1 1 1 0 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0 0 0 1 0 0 0 0 1 1 1 1 1 0 1 16

1 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 0 0 0 1 0 1 0 0 1 1 1 0 0 0 0 0 1 1 1 21

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1 1 30

1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 29

1 0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 29

1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 32

0 0 0 0 0 0 1 1 1 1 1 1 0 1 0 0 0 0 0 1 1 1 0 1 0 0 1 1 1 0 1 0 1 1 1 18

0 0 0 1 0 0 1 0 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 1 1 0 0 1 1 1 0 1 0 1 17

1 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 0 0 1 0 1 1 1 1 1 17

1 0 0 1 1 1 0 1 0 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 25

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 30

0 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 0 0 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 26

1 0 1 1 1 1 1 0 1 1 1 1 1 0 0 1 0 0 0 0 1 1 1 1 1 0 0 1 1 0 0 0 1 0 1 21

1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 32

1 0 0 1 1 0 1 0 0 1 1 1 1 1 0 1 0 0 1 1 0 0 1 1 1 0 1 0 1 0 1 0 1 1 1 21

1 0 1 0 1 1 1 0 0 1 0 1 1 1 1 1 1 0 0 1 1 0 1 0 0 0 1 0 1 1 1 0 1 1 1 22

1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 30

1 0 1 1 0 0 1 1 0 1 1 0 1 0 1 0 1 0 0 1 1 1 0 1 1 0 1 1 1 1 1 0 1 1 1 23

1 0 0 1 0 0 1 1 1 1 0 0 0 1 0 0 1 0 1 0 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 21

1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 27

1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 33

1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 32

1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 0 27

0 0 1 1 1 1 1 0 0 1 1 0 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 0 1 25







The RELIABILITY procedure was invoked with the following syntax command. (Obviously, it can also be invoked from a pull-down menu.)

Note that the variables which are assessed are the 1/0 "score" variables, not the original responses.



RELIABILITY
/VARIABLES=q1score q2score q3score q4score q5score q6score q7score q8score
q9score q10score q11score q12score q13score q14score q15score q16score
q17score q18score q19score q20score q21score q22score q23score q24score
q25score q26score q27score q28score q29score q30score q31score q32score
q33score q34score q35score
/FORMAT=NOLABELS
/SCALE(ALPHA)=ALL/MODEL=ALPHA
/STATISTICS=DESCRIPTIVE SCALE
/SUMMARY=TOTAL CORR .










Reliability output from a previous version of SPSS.

****** Method 2 (covariance matrix) will be used for this analysis ******


R E L I A B I L I T Y A N A L Y S I S - S C A L E (A L P H A)





Mean Std Dev Cases



1. Q1SCORE .8333 .3807 24.0

2. Q2SCORE .2500 .4423 24.0

3. Q3SCORE .7083 .4643 24.0

4. Q4SCORE .9167 .2823 24.0

5. Q5SCORE .7917 .4149 24.0

6. Q6SCORE .6250 .4945 24.0

7. Q7SCORE .7500 .4423 24.0

8. Q8SCORE .5417 .5090 24.0

9. Q9SCORE .6250 .4945 24.0

10. Q10SCORE .9583 .2041 24.0

11. Q11SCORE .8750 .3378 24.0

12. Q12SCORE .7500 .4423 24.0

13. Q13SCORE .8750 .3378 24.0

14. Q14SCORE .7500 .4423 24.0

15. Q15SCORE .6250 .4945 24.0

16. Q16SCORE .5417 .5090 24.0

17. Q17SCORE .5000 .5108 24.0

18. Q18SCORE .2500 .4423 24.0

19. Q19SCORE .6250 .4945 24.0

20. Q20SCORE .9167 .2823 24.0

21. Q21SCORE .7917 .4149 24.0

22. Q22SCORE .7500 .4423 24.0

23. Q23SCORE .7500 .4423 24.0

24. Q24SCORE .8333 .3807 24.0

25. Q25SCORE .8750 .3378 24.0

26. Q26SCORE .6667 .4815 24.0

27. Q27SCORE .5833 .5036 24.0

28. Q28SCORE .5000 .5108 24.0

29. Q29SCORE .9167 .2823 24.0

30. Q30SCORE .6667 .4815 24.0

31. Q31SCORE .9167 .2823 24.0

32. Q32SCORE .5000 .5108 24.0

33. Q33SCORE .9167 .2823 24.0

34. Q34SCORE .8333 .3807 24.0

35. Q35SCORE .9583 .2041 24.0

* * * Warning * * * Determinant of matrix is zero
Statistics based on inverse matrix for scale ALPHA
are meaningless and printed as .

This message is printed when the inter-item covariance matrix is singular, which is guaranteed whenever the number of variables exceeds the number of persons (here, 35 items and 24 persons). Alpha is not affected.











R E L I A B I L I T Y A N A L Y S I S - S C A L E (A L P H A)



N of Cases = 24.0



Statistics for       Mean     Variance    Std Dev    N of Variables
Scale             25.1667      28.7536     5.3622         35



Inter-item Correlations    Mean     Minimum    Maximum    Range     Max/Min    Variance
                           .0972    -.3780     .7977      1.1757    -2.1106    .0454








R E L I A B I L I T Y A N A L Y S I S - S C A L E (A L P H A)





Item-total Statistics



           Scale Mean        Scale Variance    Corrected Item-     Squared Multiple    Alpha if
           if Item Deleted   if Item Deleted   Total Correlation   Correlation         Item Deleted



Q1SCORE 24.3333 27.6232 .2463 . .8050

Q2SCORE 24.9167 26.0797 .5486 . .7937

Q3SCORE 24.4583 26.6938 .3844 . .7999

Q4SCORE 24.2500 27.9348 .2477 . .8051

Q5SCORE 24.3750 26.3315 .5285 . .7950

Q6SCORE 24.5417 25.4764 .6075 . .7901

Q7SCORE 24.4167 28.2536 .0647 . .8119

Q8SCORE 24.6250 27.7228 .1440 . .8101

Q9SCORE 24.5417 25.5634 .5890 . .7909

Q10SCORE 24.2083 27.9982 .3304 . .8042

Q11SCORE 24.2917 28.5634 .0211 . .8113

Q12SCORE 24.4167 27.0362 .3308 . .8021

Q13SCORE 24.2917 27.1721 .4166 . .8001

Q14SCORE 24.4167 26.5145 .4486 . .7976

Q15SCORE 24.5417 25.6504 .5707 . .7917

Q16SCORE 24.6250 28.1576 .0624 . .8135

Q17SCORE 24.6667 26.1449 .4495 . .7969

Q18SCORE 24.9167 26.9493 .3503 . .8013

Q19SCORE 24.5417 25.8243 .5342 . .7933

Q20SCORE 24.2500 28.1087 .1888 . .8065

Q21SCORE 24.3750 27.0272 .3604 . .8011

Q22SCORE 24.4167 27.2101 .2921 . .8035

Q23SCORE 24.4167 27.3841 .2536 . .8050

Q24SCORE 24.3333 28.1449 .1148 . .8092

Q25SCORE 24.2917 27.1721 .4166 . .8001

Q26SCORE 24.5000 26.9565 .3130 . .8028

Q27SCORE 24.5833 27.4710 .1949 . .8079

Q28SCORE 24.6667 27.1884 .2449 . .8058

Q29SCORE 24.2500 28.6304 .0144 . .8106

Q30SCORE 24.5000 27.1304 .2774 . .8042

Q31SCORE 24.2500 28.1087 .1888 . .8065

Q32SCORE 24.6667 26.8406 .3122 . .8029

Q33SCORE 24.2500 30.0217 -.4356 . .8208

Q34SCORE 24.3333 27.0145 .4028 . .7999

Q35SCORE 24.2083 28.9547 -.1105 . .8117
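Each column of this table can be reproduced from the raw persons-by-items scores. As a check on the first column, Q19's Scale Mean if Item Deleted (24.5417) is the scale mean (25.1667) minus Q19's item mean (.6250). The sketch below is a minimal Python/NumPy version, assuming a persons x items array and n - 1 variance denominators; the function names are illustrative, not SPSS's. It omits the Squared Multiple Correlation column, which requires inverting the inter-item correlation matrix.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for a persons x items array of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def item_total_statistics(items: np.ndarray) -> list[tuple[float, float, float, float]]:
    """For each item: scale mean, scale variance, corrected item-total r,
    and alpha, all computed with that item deleted."""
    total = items.sum(axis=1)
    rows = []
    for j in range(items.shape[1]):
        rest = total - items[:, j]                       # scale score without item j
        r_corrected = np.corrcoef(items[:, j], rest)[0, 1]
        alpha_deleted = cronbach_alpha(np.delete(items, j, axis=1))
        rows.append((rest.mean(), rest.var(ddof=1), r_corrected, alpha_deleted))
    return rows

# Tiny made-up data set: 6 persons x 4 items scored 0/1.
scores = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
])
print(f"alpha = {cronbach_alpha(scores):.4f}")
for j, (m, v, r, a) in enumerate(item_total_statistics(scores), start=1):
    print(f"item {j}: mean {m:.4f}  var {v:.4f}  corrected r {r:.4f}  alpha if deleted {a:.4f}")
```

In the SPSS table above, Q33 illustrates how these columns are read together: its corrected item-total correlation is negative (-.4356), and its alpha if item deleted (.8208) is higher than the full-scale alpha of .8080.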




Reliability Coefficients 35 items



Alpha = .8080 Standardized item alpha = .7902
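The standardized item alpha can be recovered from two numbers printed earlier in the output: the number of items (k = 35) and the mean inter-item correlation (.0972). Standardized alpha is k times the mean inter-item correlation, divided by 1 + (k - 1) times that mean. A minimal Python check:

```python
def standardized_alpha(k: int, mean_r: float) -> float:
    """Standardized item alpha from the number of items and the mean inter-item r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# k and the mean inter-item correlation are taken from the SPSS output above.
value = standardized_alpha(35, 0.0972)
print(f"standardized alpha = {value:.4f}")
```

This gives .7903; the difference from the printed .7902 comes from the mean correlation being shown to only four decimals.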










The logic behind coefficient alpha



Coefficient alpha is based on a premise that originated with the Spearman-Brown split-half estimate: if different parts of a test correlate highly with each other, then the test as a whole would be likely to correlate highly with itself.
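The split-half idea can be sketched in a few lines: split the items into two halves, correlate the half scores, and step the half-test correlation up to full length with the Spearman-Brown formula, 2r / (1 + r). The odd-even split and the simulated item scores below are assumptions chosen for illustration; any split into comparable halves works the same way.

```python
import numpy as np

def spearman_brown(r_half: float) -> float:
    """Estimated full-test reliability from the correlation between two half-tests."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(items: np.ndarray) -> tuple[float, float]:
    """Odd-even split of a persons x items array; returns (r_half, stepped-up estimate)."""
    odd_half = items[:, 0::2].sum(axis=1)
    even_half = items[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd_half, even_half)[0, 1]
    return r_half, spearman_brown(r_half)

# Simulated data: 200 persons x 10 items, each item = true score + item-specific error.
rng = np.random.default_rng(1)
true_score = rng.normal(size=(200, 1))
items = true_score + rng.normal(scale=1.5, size=(200, 10))

r_half, full = split_half_reliability(items)
print(f"half-test r = {r_half:.3f}, Spearman-Brown full-test estimate = {full:.3f}")
```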



Factors affecting estimates of reliability

There are at least three major factors that will affect the relationship of a reliability estimate to the true

reliability of the test in the population in which the test will be used.



Let’s call the sample of persons upon whom the reliability estimate is based the reliability sample.



1. Variability of the reliability sample relative to variability of the population in which the instrument will

be used.



If the reliability sample is too homogeneous, the reliability estimate will be too small.



On the assumption that you want to report as high a reliability coefficient as possible, this suggests that

you should make the sample from whom you obtain the estimate of reliability as heterogeneous as

possible.



2. Errors of measurement specific to the reliability sample.



Suppose the test requires the reading level of a college graduate, but the reliability sample includes a variety of persons, some of whom are not in college.



This means that some of the people won’t understand some of the items and will guess.



Guessing is represented in Classical Test Theory by large errors of measurement.



So test characteristics that cause large errors of measurement, such as an inappropriate reading level or poor wording, reduce both reliability and estimates of reliability.



3. Consistency of the people making up the reliability sample.



The specific people making up the sample may contribute to the errors of measurement referred to in 2

above.



Some people are more careless (?) or inconsistent (?) than others. If the reliability sample is composed of a

bunch of careless respondents, the reliability estimates will be smaller than if the reliability sample were

composed of consistent responders.



We split a sample into two groups based on the variability of their responses to items within the same

Big Five dimension. Here are the reliability values for the two groups . . .



Group Extraversion Agreeableness Conscientiousness Stability Intellect

Consistent .92 .83 .84 .90 .85

Inconsistent .85 .69 .79 .83 .76
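All three factors can be read off the Classical Test Theory definition of reliability as the proportion of observed-score variance that is true-score variance, var(T) / (var(T) + var(E)). The sketch below uses made-up variances: a homogeneous reliability sample shrinks var(T), while guessing or careless responding inflates var(E), and either change lowers the ratio. It assumes the other variance stays fixed in each comparison.

```python
def reliability(true_var: float, error_var: float) -> float:
    """CTT reliability: share of observed-score variance due to true scores."""
    return true_var / (true_var + error_var)

# Hypothetical variances, chosen for illustration only.
population = reliability(true_var=100.0, error_var=25.0)   # heterogeneous sample
homogeneous = reliability(true_var=25.0, error_var=25.0)   # factor 1: restricted true-score variance
noisy = reliability(true_var=100.0, error_var=75.0)        # factors 2 and 3: guessing, careless responding

for label, value in [("population", population),
                     ("homogeneous sample", homogeneous),
                     ("larger errors of measurement", noisy)]:
    print(f"{label:30s} reliability = {value:.3f}")
```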




The Reliability Ceiling: Why be concerned with reliability?

Goal of research: To find relationships (correlations) between independent and dependent variables.



If we find significant correlations, our work is lauded, published, rewarded.



If we don’t find significant correlations, our work is round-filed.



So, most of the time we want large correlations between the measures we administer.



Of all the tests out there, with which test will your test correlate most highly?



The answer is that a test will correlate most highly with itself.



And reliability is the extent to which a test correlates with itself.



So if reliability is low, that means that a test doesn’t even correlate highly with itself.



That being the case, how could we expect it to correlate highly with any other test?



And the answer is: we couldn’t. If a test doesn’t correlate highly with itself, it won’t correlate highly with any

other test.



The fact that the reliability of a test limits its ability to correlate with other tests is called the reliability ceiling

associated with a test.



Reliability Ceiling Formula.

Suppose X is the independent variable and Y is the dependent variable in the relationship being tested.

Let $r_{XX'}$ and $r_{YY'}$ be the true reliabilities of X and Y, respectively.

Let $r_{T_X T_Y}$ be the correlation between True scores on the X dimension and True scores on the Y dimension.

Then

$$r_{XY} \le r_{T_X T_Y}\sqrt{r_{XX'}\, r_{YY'}}$$

The correlation between observed X and Y scores will be smaller than the correlation between True X and True Y by a factor equal to the square root of the product of the two reliabilities. Unless both reliabilities are 1, this means that the observed correlation will always be smaller in magnitude than a nonzero true correlation.
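A minimal Python sketch of the formula, treating it as an equality (the classical case, in which errors of measurement are uncorrelated). The true correlation and reliabilities are hypothetical. The last function simply rearranges the same formula to estimate the true-score correlation from an observed one, a rearrangement usually called the correction for attenuation.

```python
import math

def observed_r(true_r: float, rxx: float, ryy: float) -> float:
    """Expected observed correlation given the true-score correlation and both reliabilities."""
    return true_r * math.sqrt(rxx * ryy)

def reliability_ceiling(rxx: float, ryy: float) -> float:
    """Largest observed correlation possible, reached when the true-score correlation is 1."""
    return math.sqrt(rxx * ryy)

def corrected_r(obs_r: float, rxx: float, ryy: float) -> float:
    """Rearranged formula: estimated true-score correlation from an observed r."""
    return obs_r / math.sqrt(rxx * ryy)

# Hypothetical values: true r = .50, reliabilities .80 and .70.
r = observed_r(0.50, 0.80, 0.70)
print(f"observed r = {r:.3f}")
print(f"ceiling    = {reliability_ceiling(0.80, 0.70):.3f}")
print(f"corrected  = {corrected_r(r, 0.80, 0.70):.3f}")
```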










Reasons for nonsignificant correlations between independent and dependent variables.

We’ve now covered three reasons for failure to achieve statistically significant correlations.



1. The correlation between true scores is 0, i.e., X and Y are not related to each other.



This means that our theory, which predicts a relationship between X and Y, is wrong.



We must revise our thinking about the X – Y relationship.



From a methodological point of view, this is the only excusable reason for a nonsignificant result.



2. Low power.



There could really be a relationship between True X and True Y, but our sample size is too small for our

statistical test to detect it.



This is inexcusable.



We should always have sufficient sample size to detect the relationship we expect to find.



3. Low reliability.



There could really be a relationship between True X and True Y, but our measures of X and Y are so

unreliable that the observed correlation is not significant.



This is inexcusable.



We should always have measures sufficiently reliable to allow us to detect the relationship we expect to

find.



The above is a good candidate for an essay test question or a couple of multiple choice questions.
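Reasons 2 and 3 interact: unreliable measures shrink the observed correlation, and a smaller correlation needs a larger sample for the same power. The sketch below uses the common Fisher z approximation for the sample size needed to detect a correlation (two-tailed alpha = .05, power = .80). The true correlation of .30 and the reliabilities of .70 are hypothetical.

```python
import math
from statistics import NormalDist

def n_to_detect(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate N needed to detect correlation r (two-tailed), via Fisher's z."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)

true_r = 0.30
attenuated = true_r * math.sqrt(0.70 * 0.70)   # both measures with reliability .70

print(f"N if measures were perfectly reliable: {n_to_detect(true_r)}")
print(f"N with reliabilities of .70 (observed r = {attenuated:.2f}): {n_to_detect(attenuated)}")
```

Under these assumptions, the required sample roughly doubles (about 85 versus about 176), which is why low reliability and low power are listed as separate but related failures.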



Acceptable Reliability

How high should reliability be?

How tall is tall?



Some very general guidelines



Reliability Range Characterization



0 - .6 Poor

.6 - .7 Marginally Acceptable

.7 - .8 Acceptable

.8 - .9 Good

.9+ Very Good

.95+ Too good.








Introduction to Path Diagrams

Symbols

Observed variables are symbolized by squares or rectangles.



[Figure: a rectangle labeled "Observed Variable", with example scores 103, 84, 121, 76, ..., 97, 81.]









Theoretical Constructs, also called Latent Variables, are symbolized by circles or ellipses.



[Figure: an ellipse labeled "Latent Variable / Theoretical Construct", with example scores 106, 78, 115, 80, ..., 93, 83.]









Correlations between variables are represented by double-headed arrows.

[Figure: two panels, each with a double-headed "correlation" arrow: one links two observed variables (rectangles), the other links two latent variables (ellipses). Each variable is shown with a column of example scores.]

"Causal" or "Predictive" relationships between variables are represented by single-headed arrows.

[Figure: four panels with single-headed "causal" arrows, pairing a latent variable with an observed variable, an observed variable with a latent variable, two latent variables, and two observed variables.]

Representation of Classical Test Theory



In equation form: Observed Scores = True Scores + Errors of Measurement, that is, $X_i = T + E_i$.

[Figure: path diagram with single-headed "causal" arrows from T (True Scores) and from E (Errors of Measurement) to X (Observed Scores).]







That is, every observed score is the sum of a person's true position on the dimension of interest and whatever error occurred in the process of measuring. The relationship between T and X is one in which the observed score is said to be a reflective indicator of the true amount.



In terms of the labels of the diagrams . . .

[Figure: True Score (latent variable) and Error of Measurement (latent variable) each send a "causal" arrow to Observed (observed variable). Example scores:]

True Score   Observed   Error of Measurement
   106         103             -3
    78          84             +6
   115         121             +6
    80          76             -4
   ...         ...            ...
    93          97             +4
    83          81             -2
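The decomposition X = T + E can also be simulated. In the Python/NumPy sketch below, the true-score and error standard deviations (15 and 7.5, on a scale centered at 100 like the example scores) are assumptions. Two administrations share the same true scores but have independent errors; their correlation comes out close to var(T) / var(X), the proportion of observed variance that is true-score variance, which is the quantity reliability coefficients estimate.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 10_000
true_sd, error_sd = 15.0, 7.5            # assumed values for illustration

T = rng.normal(100.0, true_sd, n)        # true scores
X1 = T + rng.normal(0.0, error_sd, n)    # first administration: X = T + E
X2 = T + rng.normal(0.0, error_sd, n)    # second administration, independent errors

theoretical = true_sd**2 / (true_sd**2 + error_sd**2)   # 225 / 281.25 = .80
retest_r = np.corrcoef(X1, X2)[0, 1]

print(f"var(T) / var(X), theoretical: {theoretical:.3f}")
print(f"var(T) / var(X), sample:      {T.var() / X1.var():.3f}")
print(f"correlation of X1 with X2:    {retest_r:.3f}")
```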










The Reliability Ceiling considered using Path Notation





SYMBOLICALLY, $r_{XY} \le r_{T_X T_Y}\sqrt{r_{XX'}\, r_{YY'}}$









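[Figure: path diagram. Intelligence (latent) underlies Wonderlic scores (observed) and Academic Ability (latent) underlies GPAs (observed); each observed variable also receives an arrow from its own Error of Measurement. The observed rXY between Wonderlic scores and GPAs is labeled "What we observe"; the true rXY between Intelligence and Academic Ability is labeled "What we want."]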










When reliability is high, variance due to errors of measurement is small, so the observed r will be about equal to the true r.

[Figure: the same path diagram with WPT Scores and GPAs as observed variables and Intelligence and Academic Success as latent variables. The errors of measurement are small, so the observed rXY is close to the true rXY.]









When reliability is low, variance due to errors of measurement is large, making the observed r smaller than the true r.





[Figure: the same path diagram with large errors of measurement. The observed rXY between WPT Scores and GPAs is less than the true rXY between Intelligence and Academic Success.]








