# Studying behavior by yy8Shk


```
Variable, Validity, Reliability

Dr. Yan Liu
Department of Biomedical, Industrial and Human Factors Engineering
Wright State University
Variables
   Definition
   An event, situation, behavior, or individual characteristic that varies
   e.g. cognitive task performance, word length, intelligence
   Types of Variables by Measurement Scales
   Nominal variables
   Values are unordered labels
   color, name
   Ordinal variables
   Possible levels are ordered in sequence
   ranking of preference, satisfaction
   Interval variables
   Differences between values are meaningful; no true zero
   clock time of a flight departure or arrival, temperature (in °C or °F)
   Ratio variables
   Values are real numbers with true zeros
   size, weight
Operational Definitions of Variables
   Operational Definition of Variable
   Definition of the variable in terms of the operations or techniques the researcher
uses to measure or manipulate it
   Why Operational Definitions
   Some variables must be operationally defined so they can be studied empirically
   “cognitive task performance” can be defined as the number of errors made in
detecting a target object on a screen
   The task of operationally defining a variable forces scientists to discuss abstract
concepts in concrete terms
   Operational definitions help us communicate our ideas to others
   Which Operational Definition to Use
   A variety of methods may be available to operationally define a variable
   The decision on which operational definition to use in a study should be based on
the goal of the study and other considerations (e.g. ethics and cost)
Relationship Between Variables
   Types of Relationship in Interval and Ratio Variables
   Positive linear relationship
   Negative linear relationship
   Curvilinear relationship
   No relationship
   Relationships and Reduction of Uncertainty
   Detecting relationships between variables means reducing our uncertainty about
the nature of the variables
   Error variance (random variability)
   Research is aimed at reducing error variance by identifying systematic
relationships between variables

[Figure: example scatterplots. Surviving panel titles: "Perfect Positive Linear Relationship", "Perfect Negative Linear Relationship", "No Relationship"; one further panel label survives only as the fragment "intermediate minimum)"]

Suppose you have surveyed 200 people about whether or not they like shopping, and
100 people said Yes and the remaining 100 said No. What can you conclude from the
information?
When you meet a person, you can only make a random guess whether the person likes
shopping or not, and the chance your answer is correct (either way) is 50%
Suppose you also have asked people to indicate their gender, and found the
relationship between gender and attitude toward shopping as follows.

                          Male    Female
Like shopping?    Yes      30       70
                  No       70       30
Number of participants    100      100

70% of males do not like shopping; 70% of females like shopping.
Therefore, when you predict a person’s attitude toward shopping based on the
person’s gender, you will be correct 70% of the time.

Nonexperimental Methods
   Relationships are studied by making observations or measures of the
variables of interest as behavior occurs naturally
   Observing behavior
   Requires trained personnel to observe and code behavior according to a prearranged
set of criteria
   Asking people to describe their behavior (survey)
   Examining various public records (e.g. census data)
   A relationship between variables is established when the variables
vary together
   Allows observing covariation between variables (“correlational
method”)

   Correlation Does Not Equal Causality
   The correlational method identifies whether two variables are associated but not
how they are associated
   Issues
   Direction of cause and effect
   With the nonexperimental method, it is difficult to determine which variable causes the
other
   The third-variable problem
   Extraneous variables may be causing the observed relationship
   Third variables introduce alternative explanations for the observed relationship
   A known yet uncontrolled third variable is called a confounding variable
   When two variables are confounded, their effects on another variable cannot be
separated

Experimental Methods
   Study cause and effect
   Use controlled observations and measurements to test hypotheses
   Experimental Control
   All extraneous variables are kept constant
   They then cannot be confounding variables or be responsible for the results of the experiment
   Treating participants in all groups in the experiment identically; the only
difference between groups is the manipulated variable of interest
   If you design an experiment to compare the effectiveness of two visualization
techniques, participants in the two settings must have the same technical background
related to using the visualization techniques and be given equivalent tasks, and the
lighting and all other relevant conditions must be the same.

Experimental Methods (Cont’d)
   Randomization
   Used when it is difficult or costly to keep an extraneous variable constant
   Makes the composition of individual characteristics in the different groups virtually
identical
   Assign participants to different groups in a random fashion
   Using a list of random numbers (Appendix C.1 in Cozby’s book)
   Computer-based random number generators (e.g. Excel, Randomizer
(http://www.randomizer.org))
   Other variables that cannot be held constant are also controlled by
randomization
   A random order is used to schedule the sequence of various experimental conditions

Suppose you have recruited 16 participants and need to assign them to 2 groups (8
in each group)

Rule: 8 smallest numbers -> group 1; 8 largest numbers -> group 2

Participant   Random    Group
Order         Number    Assignment
1             56        1
2             57        1
3             51        1
4             10        1
5             69        2
6              9        1
7             75        2
8              5        1
9             78        2
10            90        2
11            79        2
12             4        1
13            79        2
14            48        1
15            57        2
16            82        2
Independent and Dependent Variables
   Independent Variable
   Manipulated by the researcher
   Considered to be the “cause” of the observed relationship
   Dependent Variable
   Studied and measured by the researcher to see if its values depend on the values
of the independent variable
   Considered to be the “effect” of the observed relationship

[Figure: diagram "Independent Variable -> Dependent Variable", and a graph with the independent variable on the horizontal axis and the dependent variable on the vertical axis]
Researchers conducted a study to examine the effect of music on exam scores. They
hypothesized that scores would be higher when students listened to soft music compared
to no music during the exam because the soft music would reduce students’ test anxiety.
One hundred students (50 males, 50 females) were randomly assigned to either the soft
music or no music conditions. Students in the music condition listened to music using
headphones during the exam. Fifteen minutes after the exam began, the researchers asked
the students to complete a questionnaire that measured test anxiety. Later, when the
exams were completed and graded, the scores were recorded.

As hypothesized, test anxiety was significantly lower and exam scores were significantly
higher in the soft music condition compared to the no music condition.

Where are the independent, dependent, mediating, and confounding variables?

   Artificiality of Experiments
   High degree of control and the laboratory setting may sometimes create an
artificial atmosphere that may limit either the questions that can be addressed or
the generality of the results
   Field experiment
   The independent variable is manipulated in a natural setting
   Researcher attempts to control extraneous variables via randomization or
experimental control
   Researcher loses the ability to directly control many aspects of the situation, and thus
the experiment suffers from the possibility of “contamination”
   Ethical and Practical Considerations
   Sometimes the experimental method is not a feasible alternative because
experimentation would be either unethical or impractical

Comparison of Non-Experimental and Experimental Methods

Non-Experimental
   Approach: relationships studied by making observations or measuring variables
   as they exist naturally
   Advantages
   • Allows measure of covariation between variables
   • Behavior can be observed in a natural context
   • Allows us to study participant variables that cannot be manipulated
   Disadvantages
   • Difficult to infer cause and effect
   • Direction and third-variable problem
   • Difficult to control many aspects of the situation

Experimental
   Approach: direct manipulation and control of variables, then response or result
   is observed
   Advantages
   • Reduces ambiguity in interpretation of results regarding cause and effect
   • Attempts to eliminate the impact of all possible confounding third variables
   • Permits greater experimental control
   • Reduces the possible influence of extraneous variables through randomization
   Disadvantages
   • High control may create an artificial atmosphere
   • Can be unethical or impractical

Evaluating Research: Three Validities
   Validity
   “Truth” and accurate representation of information
   Three types of validity to evaluate research
   Construct validity
   Internal validity
   External validity
   Construct Validity
   The adequacy of the operational definition of variables
   The operational definition of a variable truly reflects the theoretical meaning of
the variable
   A measure of social anxiety has construct validity if it measures the social anxiety
construct and not some other variable such as dominance
   Variables that are abstract constructs usually can be measured and manipulated
in a variety of ways but do not have a single perfect operational definition

Evaluating Research: Three Validities (Cont’d)
   Internal Validity
   The ability to draw conclusions about causal relationships from our data
   A study has high internal validity when strong inferences can be made that one
variable caused changes in the other variable
   Strong causal inferences can be made more easily when the experimental
method is used
   External Validity
   The extent to which the results can be generalized to other populations and
settings
   Whether the results can be replicated with other operational definitions of the
variables, with different participants, or in other settings
   Artificiality of laboratory experiments is an issue of external validity
   Field experiments represent one way that researchers try to increase the external
validity of their experiments
   The goal of high internal validity may sometimes conflict with the goal of
external validity
Measurement Error
   Measurement error
   Any deviation from the “true value”
   Systematic Error
   Caused by factors that systematically affect measurement of the variable across
samples
   Tends to be consistently either positive or negative
   Referred to as “bias”
   Can be controlled using strategies such as frequent calibration and
randomization
   Random Error
   Caused by factors that randomly affect measurement of the variable across
samples
   It does not have any consistent effects across the entire sample population
   Referred to as “noise”
   Difficult to control
   Leads to unreliability of measures
   Appropriate to most human factors studies
   Attempts are made to determine the values of the variables of interest yet one is not
able to do so because of various errors in the measurement
   Easy to understand and familiar to most human factors practitioners
   Every measurement is a sum of two components: the true score of the measure
and random error

$X^* = X + \delta$   (Eq. 1)
   $X^*$: the observed score
   $X$: the true score
   $\delta$: random error, with mean 0 and variance $\sigma_\delta^2$

$\sigma_{X^*}^2 = \sigma_X^2 + \sigma_\delta^2$   (Eq. 2)

Reliability of Measures
   Reliability
   The consistency of measures obtained by the same individuals when reexamined with the
same criterion measure on different occasions or with different sets of equivalent items
   A reliable measure of intelligence should yield the same result each time you
administer the intelligence test to the same person. The test would be unreliable if it
measured the same person as average one week, low the next, and bright the next
   Mathematically, the reliability of a measure is defined as the proportion of the
variability in the measure attributable to the true score
$r_{XX} = \sigma_X^2 / \sigma_{X^*}^2 = \sigma_X^2 / (\sigma_X^2 + \sigma_\delta^2)$   (Eq. 3)

[Figure: only the axis values 97 and 103 and the label "Less random error" survive]

Reliability of Measures (Cont’d)
   The Importance of Reliability
   Researchers cannot use unreliable measures to systematically study variables or
the relationships among variables
   Trying to study human behavior using unreliable measures is a waste of time
because the results will be unstable and unable to be replicated
   How to Improve Reliability
   Use careful measurement procedures
   Carefully training observers to collect data or record behavior
   Paying close attention to the way questions are phrased

Assess Reliability
   Use Correlation Coefficient
   Correlation coefficient indicates the strength and direction of a linear
relationship between two random variables
   Pearson product-moment correlation coefficient
   If we have a series of n measurements of X and Y, x_i and y_i (i = 1, 2, ..., n), then the
correlation between X and Y can be estimated using the Pearson product-moment
correlation coefficient, r_{X,Y}. It is the best estimate of the correlation if X and Y are
both normally distributed.

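$r_{X,Y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1) S_X S_Y}$   (Eq. 4)
   $\bar{x}$, $\bar{y}$: sample means of X and Y
   $S_X$, $S_Y$: sample standard deviations of X and Y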

To assess the reliability of a measure, we need to obtain at least two scores on the
measure from multiple individuals

Types of Reliability Estimates
   Test-Retest Reliability
   Alternative-Form Reliability
   Inter-Rater Reliability
   Internal Consistency Reliability

Test-Retest Reliability
   Procedure
   Measure the same individuals at two points in time and then calculate the
correlation coefficient between the first and second test scores
   To test the reliability of an intelligence test, the test is given to a group of people on
one day and again a week later
   The procedure is simple and straightforward
   Suffers from carry-over effects due to memory and/or practice, which can
result in inflated estimates of reliability

Alternative-Form Reliability
   Procedure
   Administer two parallel forms of a test to the same group of individuals
   Two forms of a measuring instrument are considered parallel if an object's true
score is the same for both forms and if both forms produce equal means and
equal variances
   Helps to alleviate carry-over effects
   Difficult to come up with two parallel forms, especially with personality
measures

Inter-Rater Reliability
   Assess Reliability of Rating System
   The extent to which two or more individuals (coders or raters) agree in their
observations
   Two usability experts are asked to give a rating on the usability of a website
according to a sliding rating scale (1 being the worst, 5 being the best). If one expert
gives “1” to the usability of the website, whereas the other gives “5”, then the
inter-rater reliability of the rating scale would be quite low
   Depends on the ability of the raters to be consistent
   Training and education can help enhance inter-rater reliability
   Cohen’s Kappa (κ)
$\kappa = \frac{P_O - P_C}{1 - P_C}$   (Eq. 5)
   $P_O$: observed proportion of agreement
   $P_C$: proportion of agreement predicted by chance

$P_C = \frac{1}{n^2} \sum_i pm_i$   (Eq. 6)
   $pm_i$: product of the ith row and column marginals
   $n$: total number of objects rated

Raters A and B are asked to evaluate the usability of 100 websites using a three-
point Likert scale (1 being the worst, 3 being the best)

                      Ratings of B
                      1     2     3    Row margin
Ratings of A    1    20     5    10    35
                2    10    30    10    50
                3     2     3    10    15
Column margin        32    38    30   100

$P_O = \frac{20 + 30 + 10}{100} = 0.60$
$P_C = \frac{1}{100^2} (35 \cdot 32 + 50 \cdot 38 + 15 \cdot 30) = 0.347$
$\kappa = \frac{0.60 - 0.347}{1 - 0.347} \approx 0.39$

Internal Consistency of Multi-Item tests
   Multi-Item Tests
   Many psychological measures are made up of a number of different questions
(items)
   An intelligence test may have 100 items, a satisfaction questionnaire may have 10
items
   Internal Consistency Reliability
   The extent to which the items of a measuring instrument correlate with one
another
   All items measure the same variable, so they should yield consistent results
   Split-half reliability
   Cronbach’s alpha

Split-Half Reliability
   Procedure
   The questions in the measuring instrument are divided in half, creating two
pseudo-parallel half-tests a and b
   Each of the two half-tests is scored on a number of individuals
   Calculate the correlation between the total scores of a and b, rab
   The half-test reliability is adjusted to estimate the overall reliability of the whole
instrument, rAB , using the Spearman-Brown formula
$r_{AB} = \frac{2 r_{ab}}{1 + r_{ab}}$   (Eq. 7)

   If $r_{ab} = 0.6$, then $r_{AB} = 2 \cdot 0.6 / (1 + 0.6) = 0.75$

Cronbach’s Alpha (α)
   Most popular measure of internal consistency reliability
   Interpreted as the mean of all possible split-half coefficients
$r_{XX} = \left(\frac{k}{k-1}\right)\left(1 - \frac{\sum_{i=1}^{k} S_i^2}{S_X^2}\right)$   (Eq. 8)

   $k$: the number of items
   $S_i^2$: sample variance for item i
   $S_X^2$: sample variance of the total test scores

Cronbach's α of 0.7 is a rule-of-thumb acceptable level of reliability

Calculate Cronbach’s alpha from JMP

Choose Analyze --> Multivariate and specify your continuous columns. From the
Multivariate pull-down menu select Item Reliability --> Cronbach's Alpha

Sources of Psychological Tests
   It is usually wise to use existing measures of psychological
characteristics rather than develop your own
decide which measure to use
   You can compare your findings with prior research that uses the
measures
   You should always report the reliability of any psychological
measure used in your study even if it is an existing one!

Sources of Psychological Tests (Cont’d)
   Mental Measurements Yearbook
   A database containing the most recent descriptive information and critical
reviews of new and revised tests from the Buros Institute's 9th, 10th, 11th, 12th,
13th, 14th, and 15th Yearbooks
   Covers more than 2,200 commercially-available tests in categories such as
personality, developmental, behavioral assessment, neuropsychological,
achievement, intelligence and aptitude, educational, speech & hearing, and
sensory motor
The good news is that you can access it online for free through the WSU library!
>> WSU library online >> Databases >> Mental Measurements Yearbook


```
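
The random-assignment example in the slides (16 participants, 8 per group) can be reproduced with a short sketch. The function names `assign_groups` and `random_assignment` are illustrative and not from the slides, and the sketch assumes that ties are broken by participant order, which is consistent with how participants 2 and 15 (both drew 57) end up in different groups in the table.

```python
import random


def assign_groups(random_numbers, n_groups=2):
    """Rank participants by their random number and cut the ranking into
    equal-sized blocks: the smallest numbers form group 1, and so on."""
    n = len(random_numbers)
    if n % n_groups != 0:
        raise ValueError("participants must divide evenly into groups")
    size = n // n_groups
    # sorted() is stable, so tied numbers keep participant order (assumption).
    order = sorted(range(n), key=lambda i: random_numbers[i])
    groups = [0] * n
    for rank, participant in enumerate(order):
        groups[participant] = rank // size + 1
    return groups


def random_assignment(n_participants, n_groups=2, seed=None):
    """Draw one random number per participant, then assign groups by rank."""
    rng = random.Random(seed)
    numbers = [rng.randint(0, 99) for _ in range(n_participants)]
    return numbers, assign_groups(numbers, n_groups)


# Random numbers taken from the slide table, in participant order.
slide_numbers = [56, 57, 51, 10, 69, 9, 75, 5, 78, 90, 79, 4, 79, 48, 57, 82]
slide_groups = assign_groups(slide_numbers)
```

Ranking random numbers rather than flipping a coin per participant guarantees equal group sizes, which is the property the slide example relies on.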
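
The Pearson product-moment coefficient (Eq. 4 in the slides) is what a test-retest reliability estimate comes down to in practice. The sketch below implements Eq. 4 directly; the function name `pearson_r` and the two score lists are hypothetical illustrations, not data from the slides.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation following Eq. 4:
    sum((x_i - mean_x) * (y_i - mean_y)) / ((n - 1) * S_X * S_Y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (denominator n - 1).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)


# Hypothetical scores of six people on the same test, one week apart.
first_week = [98, 105, 110, 92, 120, 101]
second_week = [100, 103, 112, 95, 118, 99]
test_retest_r = pearson_r(first_week, second_week)
```

A value of `test_retest_r` close to 1 would indicate that people keep roughly the same relative standing across the two administrations.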
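
The Cohen's kappa example in the slides (raters A and B, 100 websites, three-point scale) can be checked numerically. This is a minimal sketch of Eq. 5 and Eq. 6; the function name `cohens_kappa` and the list-of-rows table layout are assumptions made for illustration.

```python
def cohens_kappa(table):
    """Return (P_O, P_C, kappa) for a square agreement table in which
    table[i][j] counts objects rated i by rater A and j by rater B."""
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("agreement table must be square")
    n = sum(sum(row) for row in table)
    row_margins = [sum(row) for row in table]
    col_margins = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Observed agreement: proportion of objects on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement (Eq. 6): products of matching marginals over n^2.
    p_c = sum(r * c for r, c in zip(row_margins, col_margins)) / n ** 2
    kappa = (p_o - p_c) / (1 - p_c)  # Eq. 5
    return p_o, p_c, kappa


# Agreement table from the slide example (rows: rater A, columns: rater B).
ratings = [
    [20, 5, 10],
    [10, 30, 10],
    [2, 3, 10],
]
p_o, p_c, kappa = cohens_kappa(ratings)
```

With the slide data, `p_o` and `p_c` correspond to the 0.60 and 0.347 worked out by hand, and `kappa` rounds to 0.39.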
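
For readers without JMP, the two internal-consistency formulas (Eq. 7 and Eq. 8) are simple enough to compute directly. The sketch below assumes scores are stored as one list of item scores per respondent; the function names and the small data set are hypothetical and only illustrate the arithmetic.

```python
from statistics import variance


def spearman_brown(r_half):
    """Step up a half-test correlation to full-test reliability (Eq. 7)."""
    return 2 * r_half / (1 + r_half)


def cronbach_alpha(item_scores):
    """Cronbach's alpha (Eq. 8). item_scores holds one list per respondent,
    each containing that respondent's score on every one of the k items."""
    k = len(item_scores[0])
    if k < 2 or any(len(resp) != k for resp in item_scores):
        raise ValueError("need at least two items and equal-length rows")
    # Sample variance of each item across respondents.
    item_vars = [variance([resp[i] for resp in item_scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(resp) for resp in item_scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Slide example: half-test correlation of 0.6.
full_test_reliability = spearman_brown(0.6)

# Hypothetical answers of five respondents to a three-item questionnaire.
scores = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 1], [3, 3, 3]]
alpha = cronbach_alpha(scores)
```

`full_test_reliability` reproduces the 0.75 from the slide; `alpha` can then be compared against the 0.7 rule of thumb.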