"Behavior Rating Scales"
Behavior Rating Scales
• Definition
• Types
• Construction Issues
• Weaknesses
• Strengths
• Selection Considerations
• Specific Scales: Conners’; CBCL; BASC; others

Definition
• Rating scale: any paper-and-pencil device whereby one individual (usually a caretaker such as a parent or teacher, though not excluding peers) assesses the behavior of another individual based on his or her observations of the child or adolescent over an extended period of time (usually more than a month) (Martin, Hooper & Snow, 1986)

Types of Rating Scales
• Range of constructs from general functioning to concrete behaviors
• Personality: Personality Inventory for Children-Revised (PIC-2); Minnesota Multiphasic Personality Inventory-Adolescent (MMPI-A)
• Behavior Checklists: Child Behavior Checklist (CBCL); Conners’ Rating Scales-Revised; Behavior Assessment System for Children (BASC); Devereux Scales of Mental Disorders
• Specific Disorders: Children’s Manifest Anxiety Scales; Beck Depression Inventory; Children’s Depression Inventory

Summary of Construction Issues
• Checklist vs Dichotomy vs Continuum
• Item Choice
• Ability to Sum Scores
• Anchors
• Description of Behavior/Construct

Checklist vs Dichotomy vs Continuum
• Checklists: rater checks the item if the behavior exists; can be used in screening for specific DSM-IV disorders
• Dichotomy: rater indicates whether the behavior exists or does not exist; forced dichotomy; Yes/No
• Continuum: 1 2 3 4 5
  • Reliability increases with more steps (plateau after 11 steps with little gain)
  • An odd number of steps allows for a neutral, middle step, but can create a response set

Item Choice
• Subjectivity of the instrument is a function of the level of analysis, the type of item, and the manner in which it is scaled
• Sufficient number of items to sample the construct
• Face validity of items
• Specificity of behavior: "Is delinquent" vs "Lies; steals; violates curfew"
• Items that are too specific may lead to trivial information, excessive length
• Time frame identified, e.g.
Within the last two weeks…

Various strategies used to develop items and scales
• Factor analysis: placing in a factor items that cluster together
• Empirical keying: using selected items to distinguish one group from another
• Theoretical constructs: using selected items to measure the theoretical constructs underlying the construction of the test
• Content analysis: using experts to select items to measure the trait or diagnostic category of interest

Ability to Sum Scores
• Construction of some tests allows scores to be summed across scales, which increases the reliability of the instrument
• Broad-band factors have higher reliability than narrow-band factors, e.g. Internalizing & Externalizing have higher reliabilities than individual scales such as Social Withdrawal or Aggression

Anchors
• End points on a scale
• Numerical (Likert scale)
• Degrees of agree/disagree
• Adjectives such as good/bad; carefree/anxious; impulsive/reflective
• Actual behavior to typify a type of attitude, such as religion: attends church 1 time per month; 2 times per month; weekly; biweekly. This may be specific to the construct; may not represent equal intervals; it may be difficult to find discrete specific behaviors
• Comparison to norm or product scales

Description of Behavior/Construct
• Scales need to be defined
• Based on theory
• Behaviors which fall under one construct on one test may be assigned to a different construct on another test

Summary of Weaknesses
• Disadvantages
• Considerations for Misuse
• Safeguards

Disadvantages
• Four areas of variation in assessment data summarize the disadvantages of rating scales: source variance, setting variance, temporal variance, and instrument variance (Martin, Hooper & Snow, 1986)

Source Variance
• Primary source of error in rating scale data is the informant
• Knowledge of subject for at least 2 months
• Perceptions of rater
• Tolerance of behavior
• Stress level of respondent
• Choice of informant may slant results
• Internalizing behaviors or low-rate behaviors may not be observed
• May not recognize
the usefulness of the scale
• Reading level of informant (30-40% of the population does not read at a fifth grade level)
• Response bias

Response Bias
• Science identifies truth as the convergence of data
• Respondents may differ in perception, normative life experiences (e.g. urban/suburban; poverty/wealth), response style, and desired outcome: a teacher may want the child in a program; a teacher or parent may not have an objective view in relation to normal peers; a parent may have an ulterior motive such as custody or monetary benefits
• Respondents are sometimes biased without awareness

Reasons for Inadvertent Bias
• Complexity of the mental processes required for a response leads to bias (Cooper, 1981)
1. Observation of the action
2. Observation encoding, aggregation, & storage in short-term memory
3. Short-term memory decay
4. Transfer to long-term storage and aggregation
5. Long-term memory decay
• The above can be influenced by the expectation of the respondent

Reasons for Inadvertent Bias (cont.)
6. Presentation of categories to be rated
7. Observation and impression retrieval from long-term storage
8. Recognition of observations and impressions relevant to rating category
9.
Comparison of observations and impressions to rater’s standards
10. Incorporation of extraneous considerations
11. Making the rating: weighing the behavior

Types of Response Bias
• May be due to respondent’s intentions or characteristic way of responding to an item regardless of content
• Halo effect
• Leniency or severity
• Central tendency or range restriction
• Response acquiescence
• Response deviance
• Social desirability

Halo Effect
• A rater’s failure to discriminate among distinct and independent aspects of a ratee’s behavior (Saal, 1980)
• Cognition: rating a child positively on emotional or behavioral issues because the child is smart
• Socially adept: assuming the child must be emotionally or cognitively adept because of positive social behaviors (always helpful, smiles)
• Other raters may report conflicting information

Leniency or severity
• Occurs when ratings are consistently higher or lower than are warranted
• Inferred when a rater uses predominantly one extreme or the other on the scale
• Cannot be verified unless an independent observation or other party disagrees, e.g. parent sees child as hyperactive while few others see him as such

Central tendency or range restriction
• Rater restricts range of all ratings to average or above or below (may revert to leniency or severity bias)
• Rater may choose the middle response because they feel they do not know the full universe of possible occurrences of the behavior (e.g. "I don’t know how he is with his friends; I only see him at school/home") and therefore cannot rate as Always True/False, etc.

Response acquiescence & response deviance
• Response acquiescence: rater tends to agree with each item
• Response deviance: rater tends to respond in a deviant, unfavorable, uncommon, or unusual way

Social desirability
• Rater interprets the test responses to provide the most favorable view of the child
• Rater may not be aware of the tendency to underrate problematic responses
• Rater may be hesitant to endorse items that suggest the presence of a particular disorder (e.g.
Beck Depression Inventory)

Methods to minimize bias
• Use a lie scale or faking-good scale
• Switch left and right for positive responses
• Use bipolar adjectives
• Response scaling: many problem behaviors occur in all children, so a dichotomy is not adequate (most children yell, cry, hit at least sometimes)
• Provide clear instructions
• Limit number of response categories to reduce confusion, lack of focus, length
• Identify at the beginning what the scales mean and the time frame for rating

Setting Variance
• Interaction with the environment can affect results, i.e. home/school/clinic
• Interventions used
• Consider if instrument is sensitive across settings or specific to one setting

Temporal Variance
• Change in behavior over time
• Medication issues
• Intervention
• Maturation
• Significant events: deaths, divorce, illness, trauma

Instrument Variance
• Sloppy construction
• Definition of construct
• Qualitative technical aspects
• Quantitative: depth of information as well as breadth

Considerations for Misuse
• May be convenient and efficient for assessor, but may not be for the informant
• Provide feedback and explain the instrument
• Inappropriate use of instrument for screening, diagnosis, intervention development, program evaluation
• Choice of an instrument to sway identification of a specific condition

Safeguards
• Aggregate principle: collect data on the same construct over varied settings with varied instruments to increase reliability by controlling the sources of variance
• Test over several time periods
• Use several instruments
• Use several raters

Multi-setting, Multi-source, Multi-instrument Design
• Variations in responses may be due to setting, activity, or rater
• Can lead to hypothesis development

Strengths
• Rating scale is a derivative of the unstructured interview, an evolution of the interview in the direction of increasing structure
• The interview has more variability in interviewers; does not cover all areas; problems may be missed; clients are not always willing and articulate; inaccurate reporting;
reliability and validity may be poor
• Rating scale can identify strengths and weaknesses
• Validate referent’s concern
• Evaluate the severity and range of the concern
• Assess atypical patterns
• Part of multi-source, multi-method evaluation

Strengths (cont.)
• Several assumptions allow for the comparison of raters’ responses:
1) Informants can describe or rate the child
2) Items have the same or similar meaning for all respondents
3) Respondents report their thoughts, feelings, & behaviors openly and honestly
4) Measures have adequate reliability and validity

Strengths (cont.)
• Rating scales can tap behaviors you may not be able to quantify in other tests
• Convenience: time- and cost-efficient for assessor, multiple viewpoints
• Comprehensive scales can ensure a range of problem areas is touched, unlike interviews, which may delve into one problem but miss others
• Structured response format and operationalizing behavior can reduce subjectivity
• Increase ecological validity of the assessment, normal environment

Strengths (cont.)
• Teacher ratings have high predictive power; teacher has formal training, structured setting, comparison to other children
• Biases evidenced between settings or individuals can be used in assessment and intervention: identify the “real” problem (child or referent), parenting style differences, influence of setting
• Some rating scales ask the informant to identify the most problematic/concerning problem
• Child may not be able to interact/respond to assessment, e.g. infants, severely impaired

Strengths (cont.)
• Use of caretaker as informant is a strength in that parents have observed the child since birth; parents are motivated; part of natural environment
• More objective and reliable than projective techniques and interviews; can be less biased than self-report
• Can provide information on strengths as well as concerns

Selection Considerations
• Technical considerations: norms, validity, reliability, constructs sampled, test construction
• Informant, situation, time, client
• Scope of instrument: narrow and/or broad category of behaviors; choose for what you need and want; strengths (competencies) and weaknesses
• Purpose or use: screening, diagnosis, placement, intervention, program evaluation
• Clinical utility: ease of administration; useful clinical information; sensitive to effects of intervention

Specific Scales
• BASC
• CBCL
• Conners’
• Others

BASC-Behavior Assessment System for Children
• Teacher Rating Scale (TRS)
  • Preschool 4-5 yrs (109 items)
  • Child 6-11 yrs (148 items)
  • Adolescent 12-18 yrs (138 items)
• Parent Rating Scale (PRS)
  • Preschool 4-5 yrs (105 items)
  • Child 6-11 yrs (138 items)
  • Adolescent 12-18 yrs (126 items)
• Self-Report Scale
  • Child 8-11 yrs (152 items)
  • Adolescent 12-18 yrs (186 items)
• Each takes about 30 minutes to complete

BASC (cont.)
Scores
• Teacher and Parent scales have a 4-point response (never, sometimes, often, almost always)
• Self-Report has true/false
• T scores and percentile ranks
• Scored by hand on carbonless forms or by computer

BASC (cont.)
Standardization
• 2,084 children ages 6-11 and 1,090 adolescents ages 12-18 for parent scale
• 1,259 children ages 6-11 and 809 adolescents ages 12-18 for teacher scale
• 5,413 children ages 8-11 and 4,448 adolescents ages 12-18 for Self-Report
• Collected 1988-1991, matching 1986 U.S. Census
• Separate norms for males, females, and clinical samples
• About 70% of clinical samples were males with a diagnosis of conduct or behavior disorder

BASC (cont.)
Reliability
• Internal consistency reliabilities for the 3 scales in the school-age sample range from .62 to .95 for TRS, .58 to .94 for PRS, and .61 to .89 for Self-Report
• Interrater reliabilities: PRS are generally low, .35 to .73; TRS from .29 to .70 for preschool and .44 to .93 for school-age; none available for adolescents
• Test-retest: PRS for a 2- to 8-week interval range from .41 to .94; TRS .59 to .95; Self-Report .57 to .81 for children and .67 to .81 for adolescents

BASC (cont.)
Validity
• Construct validity for internalizing and externalizing dimensions of the BASC scales is supported by factor and structural equation analyses
• Criterion-related validity is satisfactory for the 3 scales, as shown by acceptable correlations with other similar measures

BASC (cont.)
• Integrative approach across multiple informants
• Strength is in assessment of children ages 6 to 11 years, particularly in externalizing behaviors
• Separation of Attention & Hyperactivity; Depression & Anxiety
• Limited psychopathology and personality domains
• Comparison across child and adult forms is difficult
• Readability of Self-Report may be too high

Child Behavior Checklist (CBCL), Teacher’s Report Form (TRF) & Youth Self-Report (YSR)
• Parent Rating
  • Preschool 2-3 yrs (99 items)
  • School-age 4-18 yrs (120 items)
• Teacher Rating Form
  • Caregiver/Teacher 2-5 yrs (99 items)
  • School-age 6-18 yrs (120 items)
• Youth Self-Report
  • Ages 11-18 yrs (119 items)
  • Requires 5th grade reading level; about 30 minutes to complete
• Parent and Teacher forms take 10-15 minutes to complete

CBCL, TRF & YSR (cont.)
Scores
• 3-point response (not true, somewhat true or sometimes true, & often true)
• T scores and percentile ranks
• Scored by templates, scannable answer sheets, or computer

CBCL, TRF & YSR (cont.)
Standardization
• 1,200 males and females ages 4-11 and 1,168 adolescents ages 12-18 for parent scale
• 713 children ages 5-11 and 678 adolescents ages 12-18 for teacher scale
• 637 males and 678 females for Self-Report
• Collected 1989, matching 1990 U.S.
Census
• Separate norms for males & females

CBCL, TRF & YSR (cont.)
Reliability
• Internal consistency reliabilities for the parent form from .56 to .92; for teacher .63 to .96; and .59 to .90 (males) & .59 to .89 (females) for Self-Report
• Interrater reliabilities: Parent .26 to .86; Teacher from -.05 to .81; none available for adolescents
• Test-retest: Parent for a 1-week interval range from .63 to .97; Teacher .82 to .95 for males & .43 to .99 for females; Self-Report .47 to .81 for 50 children ages 11 to 18

CBCL, TRF & YSR (cont.)
Validity
• Concurrent validity for parent, teacher, and YSR forms is satisfactory, with acceptable correlations with the Conners’ scales
• Discriminant validity is acceptable for parent and teacher forms and satisfactory for YSR, as shown by significant differences in scores between referred and nonreferred samples

CBCL, TRF & YSR (cont.)
• Does not provide validity scales
• Supports cross-informant assessment
• Low levels of reliability, suggesting caution in their interpretation and application
• Broad-based screening measure rather than a precise measure of disorder

Conners’ Rating Scales-Revised
• Parent and teacher versions are designed for ages 3-17
• Self-report is for ages 12-17 years
• Short forms (about 27 items) and long forms (59-87 items) are available

Conners’ Rating Scales-Revised
Scores
• 4-point response (not true at all, just a little true, pretty much true, & very much true)
• T scores
• Scored by self-scoring sheet or computer scored with interpretive report

Conners’ Rating Scales-Revised
Standardization
• 8,000 individuals drawn from 1993 to 1996 from 45 U.S. states and 10 Canadian provinces
• Norms are provided separately for males and females by age levels
• Does not match U.S.
Census as there are more Euro-Americans than in the general population

Conners’ Rating Scales-Revised
Reliability
• Internal consistency reliabilities for the parent and teacher forms from .73 to .96; for adolescent .75 to .92
• Test-retest: Parent and teacher forms are variable for long and short forms, with better reliabilities for the short form over a 6-8 week retest; self-report form ranges from .72 to .89 between the two forms

Conners’ Rating Scales-Revised
Validity
• Construct validity is satisfactory based on factor analysis used to construct the scales
• Convergent validity is good, with high correlations between long and short forms
• Criterion validity is good, with high correlations between various versions of the scales
• Discriminant validity for parent and teacher forms is good, with significant differences in scores between referred and nonreferred samples

Conners’ Rating Scales-Revised
• Improvement over previous scales
• Standardization samples are small for any age group or gender
• Adequate to good reliability and adequate validity, with informant versions strong in evaluating externalizing problems
• Self-report is useful for measuring general distress

Others
• Devereux Scales of Mental Disorders: good reliability but limited validity; limited in its evaluation of psychopathology; some items include content that is difficult for parents and teachers to evaluate; not clearly aligned to DSM-IV, although this was an objective
• Scales specific to ADHD or other diagnosis: many have limited sample size and limited utility

References
Knoff, H. M. (2002). Best practices in personality assessment. In A. Thomas & J. Grimes (Eds.), Best practices in school psychology IV (Vol. 2). Bethesda, MD: National Association of School Psychologists.
Martin, R., Hooper, S., & Snow, J. (1986). Behavior rating scale approaches to personality assessment in children and adolescents. In H. Knoff (Ed.), The assessment of child and adolescent personality. New York: Guilford Press.
Sattler, J. M. (2002).
Assessment of children: Behavioral and clinical applications (4th ed.). San Diego: Jerome M. Sattler, Publisher, Inc.
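
Worked example: T scores, percentile ranks, and aggregated reliability
The scale summaries above report T scores and percentile ranks, and the Safeguards slide states that aggregating across raters or instruments increases reliability. The sketch below illustrates both ideas with standard textbook formulas. It is not any publisher's scoring procedure: published scales convert raw scores with their own normative tables, the percentile function assumes normally distributed scores, the Spearman-Brown formula assumes the aggregated ratings are parallel (equally reliable), and all function names and numbers are hypothetical.

```python
import math


def t_score(raw, norm_mean, norm_sd):
    """Linear T score: rescales a raw score so the norm group has mean 50, SD 10."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


def percentile_rank_from_t(t):
    """Approximate percentile rank for a T score.

    ASSUMPTION: scores are normally distributed in the norm group.
    Real scales read percentile ranks from normative tables instead.
    """
    z = (t - 50.0) / 10.0
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def spearman_brown(r_single, k):
    """Projected reliability of the average of k parallel ratings.

    ASSUMPTION: the k raters or forms are equally reliable (parallel),
    which is rarely exactly true for parents and teachers.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return k * r_single / (1.0 + (k - 1.0) * r_single)


if __name__ == "__main__":
    # Hypothetical norm group: mean raw score 20, SD 10.
    t = t_score(raw=30, norm_mean=20, norm_sd=10)
    print(f"T score: {t:.1f}")
    print(f"Approximate percentile rank: {percentile_rank_from_t(t):.1f}")

    # Hypothetical single-rater reliability of .50, aggregated over 1 to 4 raters.
    for k in (1, 2, 3, 4):
        print(f"{k} rater(s): projected reliability {spearman_brown(0.50, k):.2f}")
```

Under these assumptions, a raw score one standard deviation above the norm mean gives T = 60, and averaging two raters who each have a reliability of .50 projects to about .67, which is one way to read the aggregate principle numerically.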