Posted on: 9/11/2012. Public Domain.
Variable, validity, reliability
Dr. Yan Liu
Department of Biomedical, Industrial and Human Factors Engineering
Wright State University

Variables
  Definition
    An event, situation, behavior, or individual characteristic that varies
    Examples: cognitive task performance, word length, intelligence, etc.
  Types of Variables by Measurement Scales
    Nominal variables: values are unordered labels (e.g., color, name)
    Ordinal variables: possible levels are ordered in sequence (e.g., ranking of preference, satisfaction)
    Interval variables: gaps between values can be derived; no true zero (e.g., time between flight departure and arrival, temperature in °C or °F)
    Ratio variables: values are real numbers with true zeros (e.g., size, weight)

Operational Definitions of Variables
  Operational Definition of a Variable
    Definition of the variable in terms of the operations or techniques the researcher uses to measure or manipulate it
  Why Operational Definitions
    Some variables must be operationally defined so they can be studied empirically
      “Cognitive task performance” can be defined as the number of errors made in detecting a target object on a screen
    The task of operationally defining a variable forces scientists to discuss abstract concepts in concrete terms
    Operational definitions help us communicate our ideas to others
  Which Operational Definition to Use
    A variety of methods to operationally define a variable may be available, each of which has advantages and disadvantages
    The decision on which operational definition to use in a study should be based on the goal of the study and other considerations (e.g.
ethics and cost)

Relationship Between Variables
  Types of Relationship in Interval and Ratio Variables
    Positive linear relationship
    Negative linear relationship
    Curvilinear relationship
    No relationship
  Relationships and Reduction of Uncertainty
    Detecting relationships between variables means reducing our uncertainty about the nature of the variables
    Error variance (random variability)
      Research is aimed at reducing error variance by identifying systematic relationships between variables

[Figure: four scatterplot panels: perfect positive linear relationship; perfect negative linear relationship; curvilinear relationship (quadratic relationship with an intermediate minimum); no relationship]

Suppose you have surveyed 200 people about whether or not they like shopping, and 100 people said Yes and the remaining 100 said No. What can you conclude from this information?
  When you meet a person, you can only make a random guess whether the person likes shopping or not, and the chance your answer is correct (either way) is 50%
Suppose you have also asked people to indicate their gender, and found the relationship between gender and attitude toward shopping as follows.

                            Male   Female
  Like shopping?   Yes       30      70
                   No        70      30
  Number of participants    100     100

  70% of males do not like shopping
  70% of females like shopping
  Therefore, when you predict a person’s attitude toward shopping based on the person’s gender, you will be correct 70% of the time

Nonexperimental Methods
  Relationships are studied by making observations or measures of the variables of interest as behavior occurs naturally
    Observing behavior
      Requires trained personnel to observe and code behavior according to a prearranged set of criteria
    Asking people to describe their behavior (survey)
    Examining various public records (e.g.
census data)
  A relationship between variables is established when the variables vary together
    Allows observing covariation between variables (“correlational method”)

Disadvantages of Nonexperimental Methods
  Correlation Does Not Equal Causality
    The correlational method identifies whether two variables are associated but not how they are associated
  Issues
    Direction of cause and effect
      With a nonexperimental method, it is difficult to determine which variable causes the other
    The third-variable problem
      Extraneous variables may be causing the observed relationship
      Third variables introduce alternative explanations for the observed relationship
      A known yet uncontrolled third variable is called a confounding variable
      When two variables are confounded, their effects on another variable cannot be separated

Experimental Methods
  Study cause and effect
  Use controlled observations and measurements to test hypotheses
  Experimental Control
    All extraneous variables are kept constant
      Variables held constant cannot be confounding variables or be responsible for the results of the experiment
    Participants in all groups in the experiment are treated identically; the only difference between groups is the manipulated variable of interest
    If you design an experiment to compare the effectiveness of two visualization techniques, participants in the two settings must have the same technical background related to using the visualization techniques and be given equivalent tasks, the lighting and all other relevant conditions must be the same, and so on.

Experimental Methods (Cont’d)
  Randomization
    Used when it is difficult or costly to keep an extraneous variable constant
    Makes the individual-characteristic composition of different groups virtually identical
    Assign participants to different groups in a random fashion
      Using a list of random numbers (Appendix C.1 in Cozby’s book)
      Computer-based random number generators (e.g.
Excel, Randomizer (http://www.randomizer.org))
    Other variables that cannot be held constant are also controlled by randomization
      A random order is used to schedule the sequence of various experimental conditions

Suppose you have recruited 16 participants and need to assign them to 2 groups (8 in each group)
  8 smallest random numbers -> group 1
  8 largest random numbers -> group 2

  Participant Order   Random Number   Group Assignment
          1                56                1
          2                57                1
          3                51                1
          4                10                1
          5                69                2
          6                 9                1
          7                75                2
          8                 5                1
          9                78                2
         10                90                2
         11                79                2
         12                 4                1
         13                79                2
         14                48                1
         15                57                2
         16                82                2

Independent and Dependent Variables
  Independent Variable
    Manipulated by the researcher
    Considered to be the “cause” of the observed relationship
  Dependent Variable
    Studied and measured by the researcher to see if its values depend on the values of the independent variable
    Considered to be the “effect” of the observed relationship

[Figure: diagram relating the independent variable to the dependent variable]

Researchers conducted a study to examine the effect of music on exam scores. They hypothesized that scores would be higher when students listened to soft music compared to no music during the exam because the soft music would reduce students’ test anxiety. One hundred students (50 males, 50 females) were randomly assigned to either the soft music or no music conditions. Students in the music condition listened to music using headphones during the exam. Fifteen minutes after the exam began, the researchers asked the students to complete a questionnaire that measured test anxiety. Later, when the exams were completed and graded, the scores were recorded. As hypothesized, test anxiety was significantly lower and exam scores were significantly higher in the soft music condition compared to the no music condition.
Where are the independent, dependent, mediating, and confounding variables?
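
The random-number assignment procedure from the 16-participant example above can be sketched in Python. This is a minimal illustration, not the method used in the slides: the function name `assign_groups`, its parameters, and the seed are assumptions made for the example. It draws one random number per participant, ranks the participants by that number, and places the block with the smallest numbers in group 1, the next block in group 2, and so on.

```python
import random


def assign_groups(n_participants, n_groups=2, seed=None):
    """Randomly assign participants 1..n_participants to equal-sized groups.

    Hypothetical helper: one random draw per participant, then the
    participants with the smallest draws go to group 1, the next block
    to group 2, and so on.
    """
    if n_participants % n_groups != 0:
        raise ValueError("participants must divide evenly into groups")
    rng = random.Random(seed)  # seeded only so the example is repeatable
    draws = {p: rng.random() for p in range(1, n_participants + 1)}
    ranked = sorted(draws, key=draws.get)  # participant ids, smallest draw first
    group_size = n_participants // n_groups
    return {p: rank // group_size + 1 for rank, p in enumerate(ranked)}


assignment = assign_groups(16, n_groups=2, seed=42)
for participant in sorted(assignment):
    print(participant, assignment[participant])
```

Drawing floating-point numbers makes ties (such as the two 57s in the table above) practically impossible, so no tie-breaking rule is needed in this sketch.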

Disadvantages of Experimental Methods
  Artificiality of Experiments
    High degree of control and the laboratory setting may sometimes create an artificial atmosphere that may limit either the questions that can be addressed or the generality of the results
    Field experiment
      The independent variable is manipulated in a natural setting
      Researcher attempts to control extraneous variables via randomization or experimental control
      Researcher loses the ability to directly control many aspects of the situation, and thus the experiment suffers from the possibility of “contamination”
  Ethical and Practical Considerations
    Sometimes the experimental method is not a feasible alternative because experimentation would be either unethical or impractical

Comparison of Non-Experimental and Experimental Methods
  Non-Experimental
    Description
      Relationships studied by making observations or measuring variables as they exist naturally
    Advantages
      •Allows measure of covariation between variables
      •Behavior can be observed in a natural context
      •Allows us to study participant variables that cannot be manipulated
    Disadvantages
      •Difficult to infer cause and effect
      •Direction and third-variable problem
      •Difficult to control many aspects of the situation
  Experimental
    Description
      Direct manipulation and control of variables, then response or result is observed
    Advantages
      •Reduces ambiguity in interpretation of results regarding cause and effect
      •Attempts to eliminate the impact of all possible confounding third variables
      •Permits greater experimental control
      •Reduces the possible influence of extraneous variables through randomization
    Disadvantages
      •High control may create an artificial atmosphere
      •Can be unethical or impractical

Evaluating Research: Three Validities
  Validity
    “Truth” and accurate representation of information
    Three types of validity to evaluate research
      Construct validity
      Internal validity
      External validity
  Construct Validity
    The adequacy of the operational definition of variables
    The operational definition of a
variable indeed reflects the true theoretical meaning of the variable
      A measure of social anxiety has construct validity if it measures the social anxiety construct and not some other variable such as dominance
    Variables that are abstract constructs usually can be measured and manipulated in a variety of ways but do not have a single perfect operational definition

Evaluating Research: Three Validities (Cont’d)
  Internal Validity
    The ability to draw conclusions about causal relationships from our data
    A study has high internal validity when strong inferences can be made that one variable caused changes in the other variable
    Strong causal inferences can be made more easily when the experimental method is used
  External Validity
    The extent to which the results can be generalized to other populations and settings
    Whether the results can be replicated with other operational definitions of the variables, with different participants, or in other settings
    Artificiality of laboratory experiments is an issue of external validity
    Field experiments represent one way that researchers try to increase the external validity of their experiments
  The goal of high internal validity may sometimes conflict with the goal of external validity

Measurement Error
  Measurement error
    Any deviation from the “true value”
  Systematic Error
    Caused by factors that systematically affect measurement of the variable across samples
    Tends to be consistently either positive or negative
    Referred to as “bias”
    Can be controlled using strategies such as frequent calibration and randomization
  Random Error
    Caused by factors that randomly affect measurement of the variable across samples
    Does not have any consistent effects across the entire sample population
    Referred to as “noise”
    Difficult to control
    Leads to unreliability of measures

Additive Error Model
  Additive Error Model
    Appropriate to most human factors studies
      Attempts are made to determine the values of the variables of interest yet one is not
able to do so because of various errors in the measurement
    Easy to understand and familiar to most human factors practitioners
    Every measurement is a sum of two components: the true score of the measure and random error
      $X^* = X + \delta$ (Eq. 1)
        $X^*$: the observed score
        $X$: the true score
        $\delta$: random error, with mean 0 and variance $\sigma_\delta^2$
      $\sigma_{X^*}^2 = \sigma_X^2 + \sigma_\delta^2$ (Eq. 2)

Reliability of Measures
  Reliability
    The consistency of measures obtained by individuals when reexamined with the same criterion measure on different occasions or with different sets of equivalent tasks (Salvendy & Carayon, 1997)
      A reliable measure of intelligence should yield the same result each time you administer the intelligence test to the same person. The test would be unreliable if it measured the same person as average one week, low the next, and bright the next
    Mathematically, the reliability of a measure is defined as the proportion of the variability in the measure attributable to the true score
      $r_{XX} = \sigma_X^2 / \sigma_{X^*}^2 = \sigma_X^2 / (\sigma_X^2 + \sigma_\delta^2)$ (Eq. 3)

[Figure: distributions of observed scores, with axis ticks at 97 and 103; the narrower distribution is labeled “Less random error”]

Reliability of Measures (Cont’d)
  The Importance of Reliability
    Researchers cannot use unreliable measures to systematically study variables or the relationships among variables
    Trying to study human behavior using unreliable measures is a waste of time because the results will be unstable and cannot be replicated
  How to Improve Reliability
    Use careful measurement procedures
      Carefully train observers to collect data or record behavior
      Pay close attention to the way questions are phrased

Assessing Reliability Using the Correlation Coefficient
  Correlation coefficient
    Indicates the strength and direction of a linear relationship between two random variables
  Pearson product-moment correlation coefficient
    If we have a series of n measurements of X and Y, $x_i$ and $y_i$ (i = 1, 2, ..., n), then the sample correlation of X and Y, $r_{X,Y}$, can be estimated using the Pearson product-moment correlation coefficient.
    It is the best estimate of $r_{X,Y}$ if X and Y are both normally distributed.
      $r_{X,Y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1) S_X S_Y}$ (Eq. 4)
        $\bar{x}$, $\bar{y}$: sample means; $S_X$, $S_Y$: sample standard deviations
  To assess the reliability of a measure, we need to obtain at least two scores on the measure from multiple individuals

Types of Reliability Estimates
  Test-Retest Reliability
  Alternative-Form Reliability
  Inter-Rater Reliability
  Internal Consistency Reliability

Test-Retest Reliability
  Procedure
    Measure the same individuals at two points in time and then calculate the correlation coefficient between the first and second test scores
      To test the reliability of an intelligence test, the test is given to a group of people on one day and again a week later
  Advantage
    The procedure is simple and straightforward
  Disadvantage
    Carry-over effects due to memory and/or practice can inflate estimates of reliability

Alternative-Form Reliability
  Procedure
    Administer two parallel forms of a test to the same group of individuals
    Two forms of a measuring instrument are considered parallel if an object's true score is the same for both forms and if both forms produce equal means and equal variances
  Advantage
    Helps to alleviate carry-over effects
  Disadvantage
    Difficult to come up with two parallel forms, especially with personality measures

Inter-Rater Reliability
  Assess Reliability of a Rating System
    The extent to which two or more individuals (coders or raters) agree in their observations
      Two usability experts are asked to give a rating on the usability of a website according to a sliding rating scale (1 being the worst, 5 being the best). If one expert gives “1” to the usability of the website, whereas the other gives “5”, then the inter-rater reliability of the rating scale would be quite low
    Depends on the ability of the raters to be consistent
      Training and education can help enhance inter-rater reliability
  Cohen’s Kappa ($\kappa$)
    $\kappa = \frac{P_O - P_C}{1 - P_C}$ (Eq. 5)
      $P_O$: observed proportion of agreement
      $P_C$: proportion of agreement predicted by chance
    $P_C = \frac{1}{n^2} \sum_i pm_i$ (Eq. 6)
      $pm_i$: product of the ith row and column marginals
      $n$: total number of rated objects

Raters A and B are asked to evaluate the usability of 100 websites using a three-point Likert scale (1 being the worst, 3 being the best)

                        Ratings of B
                        1     2     3    Row margin
  Ratings of A    1    20     5    10       35
                  2    10    30    10       50
                  3     2     3    10       15
  Column margin        32    38    30

  $P_O = (20 + 30 + 10) / 100 = 0.60$
  $P_C = \frac{1}{100^2} (35 \cdot 32 + 50 \cdot 38 + 15 \cdot 30) = 0.347$
  $\kappa = (0.60 - 0.347) / (1 - 0.347) \approx 0.39$

Internal Consistency of Multi-Item Tests
  Multi-Item Tests
    Many psychological measures are made up of a number of different questions (items)
      An intelligence test may have 100 items, a satisfaction questionnaire may have 10 items
  Internal Consistency Reliability
    The extent to which the items of a measuring instrument correlate with one another
    All items measure the same variable, so they should yield consistent results
    Split-half reliability
    Cronbach’s alpha

Split-Half Reliability
  Procedure
    The questions in the measuring instrument are divided in half, creating two pseudo-parallel half-tests a and b
    Each of the two half-tests is scored on a number of individuals
    Calculate the correlation between the total scores of a and b, $r_{ab}$
    The half-test reliability is adjusted to estimate the overall reliability of the whole instrument, $r_{AB}$, using the Spearman-Brown formula
      $r_{AB} = \frac{2 r_{ab}}{1 + r_{ab}}$ (Eq. 7)
      If $r_{ab} = 0.6$, then $r_{AB} = 2 \cdot 0.6 / (1 + 0.6) = 0.75$

Cronbach’s Alpha (α)
  Most popular measure of internal consistency reliability
  Interpreted as the mean of all possible split-half coefficients
    $r_{XX} = \left( \frac{k}{k - 1} \right) \left( 1 - \frac{\sum_i S_i^2}{S_X^2} \right)$ (Eq. 8)
      $k$: the number of items
      $S_i^2$: sample variance for item i
      $S_X^2$: sample variance of the total test scores
  Cronbach's α of 0.7 is a rule-of-thumb acceptable level of agreement
  Calculate Cronbach’s alpha in JMP
    Choose Analyze --> Multivariate and specify your continuous columns.
    From the Multivariate pull-down menu select Item Reliability --> Cronbach's Alpha

Sources of Psychological Tests
  It is usually wise to use existing measures of psychological characteristics rather than develop your own
    Existing measures have reliability and validity data to help you decide which measure to use
    You can compare your findings with prior research that uses the measures
  You should always report the reliability of any psychological measure used in your study even if it is an existing one!

Sources of Psychological Tests (Cont’d)
  Mental Measurements Yearbook
    Published by the Buros Institute of Mental Measurements (BIMM)
    A database containing the most recent descriptive information and critical reviews of new and revised tests from the Buros Institute's 9th, 10th, 11th, 12th, 13th, 14th, and 15th Yearbooks
    Covers more than 2,200 commercially-available tests in categories such as personality, developmental, behavioral assessment, neuropsychological, achievement, intelligence and aptitude, educational, speech & hearing, and sensory motor
  The good news is that you can access it online for free through WSU library!
    >> WSU library online >> Databases >> Mental Measurements Yearbook
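
The reliability formulas given earlier (Eq. 5 to Eq. 8) can also be checked outside JMP. The following is a minimal Python sketch using only the standard library; the function names `cohens_kappa`, `spearman_brown`, and `cronbach_alpha` are illustrative choices, and the item scores passed to `cronbach_alpha` are made-up numbers, not data from the slides. The kappa table is the 100-website example from the inter-rater reliability slide.

```python
from statistics import variance


def cohens_kappa(table):
    """Cohen's kappa (Eq. 5 and Eq. 6) from a square agreement table.

    table[i][j] = number of objects rated i by rater A and j by rater B.
    """
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(len(table))) / n  # observed agreement
    row_margins = [sum(row) for row in table]
    col_margins = [sum(col) for col in zip(*table)]
    p_c = sum(r * c for r, c in zip(row_margins, col_margins)) / n ** 2
    return (p_o - p_c) / (1 - p_c)


def spearman_brown(r_ab):
    """Whole-test reliability from the half-test correlation (Eq. 7)."""
    return 2 * r_ab / (1 + r_ab)


def cronbach_alpha(items):
    """Cronbach's alpha (Eq. 8).

    items: one list of scores per item, each with one score per respondent.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    sum_item_var = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))


# Agreement table from the slide: rows are ratings of A, columns ratings of B.
ratings = [[20, 5, 10],
           [10, 30, 10],
           [2, 3, 10]]
kappa = cohens_kappa(ratings)   # about 0.39, as in the worked example
r_full = spearman_brown(0.6)    # about 0.75, as in the worked example

# Hypothetical scores: 3 items answered by 5 respondents (illustration only).
alpha = cronbach_alpha([[3, 4, 5, 2, 4],
                        [2, 4, 5, 3, 4],
                        [3, 5, 4, 2, 5]])
print(round(kappa, 3), round(r_full, 3), round(alpha, 3))
```

`statistics.variance` computes the sample variance (divisor n - 1), which matches the sample variances $S_i^2$ and $S_X^2$ in Eq. 8.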