Running Head: OPPORTUNITY ACCESS
Differential Item Functioning in Biodata:
Opportunity Access as an Explanation of Gender- and Race-related DIF
Anna Imus, Neal Schmitt,
Brian Kim, Fred Oswald,
Stephanie Merritt, & Alyssa Friede
Michigan State University
The support of the College Board in the conduct of the research described in this article is
acknowledged and appreciated. We also wish to thank all of the undergraduate research
assistants and support staff for their assistance in our research efforts.
Please direct correspondence concerning this article to Neal Schmitt, Department of
Psychology, Michigan State University, East Lansing, MI 48824-1116. Email: firstname.lastname@example.org
Phone: (517) 355-9563.
Investigations of differential item functioning (DIF) have been conducted mostly on ability tests
and have found little evidence of easily interpretable differences across various demographic
subgroups. In this study, we examined the degree to which DIF in biographical data items
referencing academically relevant background, experiences, and interests was related to
differences in judgments about access to these experiences by members of different gender and
race subgroups. DIF in the location parameter was significantly related (r = -.51, p < .01) to
gender differences in perceived accessibility to experience. No significant relationships with
accessibility were observed for DIF in the slope parameter across gender groups or for the slope
and location parameters associated with DIF across Black and White groups. Practical
implications for use of biodata and theoretical implications for DIF research are discussed.
Differential Item Functioning in Biodata:
Opportunity Access as an Explanation of Gender- and Race-related DIF
The validity and utility of ability tests in personnel selection (Hunter & Hunter, 1984;
Schmidt & Hunter, 1998) and in academic prediction contexts (Hezlett et al., 2001) have been
well established. Also well established is the fact that members of some minority groups as well
as females score substantially lower on average on cognitive ability tests (Hough, Oswald, &
Ployhart, 2001; Schmitt, Clause, & Pulakos, 1996). These mean differences are up to one
standard deviation or more in magnitude and, when used to make hiring or admissions decisions
in highly competitive environments, often result in relatively infrequent selection of minority
candidates. This adverse impact remains a concern of all those working in a selection or
admission context as well as our citizenry in general (Grutter v. Bollinger, 2003).
Because of the relatively large adverse impact produced by the use of cognitive ability
tests, researchers and practitioners have experimented with the use of alternative predictors,
usually noncognitive in nature, that display less adverse impact (Sackett, Schmitt, Ellingson, &
Kabin, 2001). Even when these noncognitive predictors are less valid, their low correlation with
cognitive measures and their relevance to motivational or personality determinants of work
performance usually means they add incrementally to the validity of a battery of procedures that
also includes cognitive tests. Such a battery may also display somewhat less adverse impact
(Bobko, Roth, & Potosky, 1999; Sackett & Wilk, 1994) and may be more acceptable to
candidates because a broader array of potentially relevant individual characteristics is assessed.
In this paper, we explore how members of different racial and gender subgroups respond
to one such noncognitive predictor, biodata. Biodata include items that assess the background,
experiences, interests, and hobbies of respondents. Early versions of biographical items were
scored based on their relationship to external criteria such as job performance or grades
(England, 1971). Subsequent efforts to use biodata have focused on the interpretation of these
measures and the construction of scales that measure specific constructs (Mumford, 1999;
Mumford & Stokes, 1992; Stokes, 1999). Reviews of this literature indicate that biodata
measures have validity for prediction of a wide variety of work and academic outcomes and that
mean subgroup differences are usually much smaller than those for cognitive measures.
In the employment arena, reviews of validation research on biodata have reported
substantial validity in the prediction of a variety of relevant work outcomes. Hunter and Hunter
(1984) reported that on average the validity of biodata was .37 against supervisory ratings.
Earlier, Reilly and Chao (1982) reported an average validity of .32 and Asher (1972) reported
that 90% of the validities in studies he reviewed were above .30. Rothstein, Schmidt, Erwin,
Owens, and Sparks (1990) reported an average observed validity of .30 (corrected to .36) across
79 different organizations engaged in customer service jobs. Hence, in the latter case at least,
biodata validity generalized across organizations. Schoenfeldt (1999) reported that biodata items
that appeared post hoc to reflect specific targeted constructs displayed cross-validities against
three of five criteria above .30. Some of the meta-analyses cited involve overlapping data, but the
general conclusion that biodata are valid predictors of job performance outcomes is warranted.
Biodata have also proven useful in predicting academic success. Owens and his
colleagues (Owens, 1976; Owens & Schoenfeldt, 1979; Stokes, Mumford, & Owens, 1989)
developed measures of developmental patterns of life history experiences, collected data from
large groups of students, and reported meaningful relationships with a variety of subsequent
academic and life outcomes. Their life history patterns also appeared to be stable across time.
Oswald, Schmitt, Kim, Ramsay, and Gillespie (2004) found cross-validated correlations of
empirically keyed biodata scales of .37, -.30, and .57 with first year college grade point average,
absenteeism, and a BARS composite of self-rated performance, respectively. Correlations with a
priori biodata scales were less impressive, ranging from -.02 to .24 for GPA, .00 to -.32 against
absenteeism, and .17 to .47 against the BARS composite.
Comparisons of racial groups usually show no evidence of sizable mean differences or
differential prediction for biodata. Reilly and Chao (1982) reported that there were no mean
differences and no validity differences across racial groups. Likewise, in the Rothstein et al.
(1990) study there was no evidence of differential validity across racial subgroups. This was also
true in a study comparing African Americans, Hispanic Americans and Caucasians by Pulakos
and Schmitt (1996). Oswald et al. (2004) reported that race differences on 12 biodata measures
averaged .02 standard deviations lower for African Americans than for Caucasians and that
women scored on average .20 standard deviations higher than males.
In spite of these data and previous claims that there are few demographic differences in
biodata responses (Mitchell, 1994), there have been others who have expressed caution in the
selection of biodata items since there are often subtle differences in the background indicators of
success across groups. One example of such inadvertent “discrimination” is an empirically keyed
item used in research by Pace and Schoenfeldt (1977). Having a Detroit address as opposed to a
suburban address was related to the criterion, but it was also highly correlated with race. In
addition, there is some evidence that different items or keys may be needed for males and
females (Hough, 1986). Validities were significantly higher for women than men in Ritchie and
Boehm’s (1977) study of management potential. Webster, Booth, Graham, and Alf (1978) found
significant gender differences on 7 of 11 biodata items among Navy hospital corps trainees.
Nevo (1976) reported validity coefficients of .36 and .18 for male and female military personnel,
respectively. Owens and Schoenfeldt (1979) also identified subgroups of people, using their
biodata profile technique, that were composed primarily of men or primarily of women.
These studies on gender differences, and to a lesser extent on race differences, suggest
that biodata items have the potential to operate differently across these groups in terms of what
they indicate about the success of respondents. Methods to identify such differentially
functioning items are frequently used in educational contexts with ability tests, but less so with
noncognitive measures like biodata.
Differential Item Functioning
Differential item functioning (DIF) occurs when members of two different groups with
equal standing on a construct respond differently to a given item. When items show DIF, the test
is biased in the sense that scores from the test carry different interpretations for different
groups (Gierl, 2005). DIF that indicates a consistent difference across the entire latent trait
continuum is referred to as uniform DIF. Nonuniform DIF reflects a situation in which the
difference in the probability of a given response between equally able members of the two
groups varies as a function of the level of the latent trait. In this paper, we focus on uniform
DIF. Additionally, analyses that examine DIF across ethnic and gender lines are important
because they help verify that factors unrelated to the construct of interest are not being
captured by the test.
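The distinction between uniform and nonuniform DIF can be made concrete with a small sketch of the two-parameter logistic model. The parameter values below are hypothetical and purely illustrative, not estimates from this study; with a shared slope but shifted locations, the gap between the two groups' response curves keeps the same sign at every trait level — the signature of uniform DIF.

```python
import math

def p_correct(theta, a, b):
    """Two-parameter logistic model: probability of a positive
    response given latent trait theta, slope a, and location b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters: the groups share a slope but differ in
# location, so the curves are shifted horizontally -- uniform DIF.
a = 1.2
b_ref, b_focal = 0.0, 0.5  # focal group needs a higher theta for the same response

for theta in (-1.0, 0.0, 1.0):
    gap = p_correct(theta, a, b_ref) - p_correct(theta, a, b_focal)
    print(round(gap, 3))  # the gap stays positive across the whole trait range
```

Nonuniform DIF would instead require different slopes, so the curves cross and the gap changes sign somewhere along the continuum.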
While DIF analyses are common practice in the refinement of selection tests, few studies
have been able to confirm a priori hypotheses about which items should function differently
across groups (Camilli & Shepard, 1994; Roussos & Stout, 1996; Whitney & Schmitt, 1997).
Given this dilemma, it is also difficult to write items that avoid problems that potentially produce
DIF. At this point, we can eliminate problem items after they have been administered and refine
them for future administrations, but we have few guidelines with respect to writing DIF-free
items prior to administration.
Given the studies indicating possible differences in the information some biodata items
might provide across gender and race groups, it is perhaps surprising that little previous work on
DIF in biodata has been reported. Using log-linear analyses, Whitney and Schmitt (1997) found
that 29 of 107 biodata items exhibited DIF. They found some evidence that items exhibited DIF
in ways that were consistent with a priori hypotheses about cultural differences between the
African American and Caucasian groups they compared. Cultural values related to perceptions of
human nature were related to findings of DIF in biodata items. Removal of items that displayed
DIF did lower the overall mean test score difference between subgroups by about one standard
deviation.
In the current study, we extend the Whitney and Schmitt (1997) work by comparing
gender groups as well as racial groups on a biodata measure (Oswald et al., 2004). Unlike
Whitney and Schmitt, we attempted to build biodata measures around a specific set of constructs;
namely, leadership, adaptability, desire for continuous learning, perseverance, and multicultural
appreciation that might be relevant to the academic and social success of a college student (see
Table 1 for a description of these items). These constructs were arrived at based on a thorough
analysis of the stated objectives that universities include in their mission statements for
undergraduate education at their institutions (Oswald et al., 2004).
Insert Table 1 about here
The bulk of the previous biodata research has examined gender and race differences at
the scale level, though consideration of specific item content might be more easily interpretable
(Pace & Schoenfeldt, 1977) as a more refined approach to understanding subgroup differences.
Further, examination of DIF may reveal that there are items that are psychometrically biased
against or in favor of both groups being compared with little impact on the total scale score. This
would be consistent with findings described above on the lack of large subgroup mean
differences on biodata measures as well as some research on gender differences on ability
measures (Roznowski, 1987). The latter finding would be theoretically important and perhaps
reassuring in a practical sense.
In a study most similar to the one reported in this paper, Stricker and Emmerich (1999)
investigated the role of three variables (i.e., familiarity, interest, and emotional reactions) that are
most commonly cited in the literature on DIF as potential explanations of subgroup differences
in item responses. Ratings of items based on these three constructs on the Advanced Placement
Psychology Test were related to male-female DIF resulting from a Mantel-Haenszel analysis of
the items. Mean differences across males and females in interest and familiarity were related
positively and significantly to Mantel-Haenszel values. Ratings of unpleasantness of the items
were related negatively to the Mantel-Haenszel values. These correlations suggest that these
perceived features of items (i.e., interest, familiarity, and pleasantness) may be the source of
differences in the manner in which males and females respond to these tests.
The items in the biodata test examined here are items related to the experiences of the
respondents; hence the opportunity or access to these experiences should be an important
determinant of individuals’ responses. For the current study, we focus on the exposure that
individuals in different subgroups have had to the experiences reflected in the content of each item.
Focus on a biodata scale (as compared with different testing instruments) suggests that
differences in the accessibility of the experiences reflected in the item content should be
correlated with DIF. More specifically, if biodata items involve things that are differentially
accessible to members of different subgroups, we would expect that answers to these items will
be more (or less) informative with respect to the underlying construct being measured.
We used item response theory to examine DIF across Black and White groups and Male
and Female groups. An item showing DIF indicates that members of the two groups being
compared who have similar status on the construct measured are either more or less likely to
provide a positive response to an item. We attempt to understand DIF by correlating
judgments of accessibility with differences in IRT parameters across subgroups. Toward this
objective, we used one sample of participants to rate the extent to which they felt they had the
opportunity to participate in the activity described by each of the biodata items. DIF was then
calculated based on actual biodata responses from a second group of participants responding to
the items as they were intended to be used. By correlating the degree to which race and gender
subgroups differ in their belief that they had access to needed resources from the first sample
with the actual parameter estimates of the same items based on the biodata responses of the
second sample, we can detect the extent to which differences in accessibility relate to the DIF
displayed by the items. Given below are the research propositions tested:
H1: Differences in the accessibility of experiences between African Americans and
Caucasians will be correlated significantly with the DIF slope parameter estimates when
comparing Blacks versus Whites.
H2: Judged African American-Caucasian differences in the accessibility of the biodata
items will be correlated significantly with the DIF location parameter estimates when
comparing Blacks versus Whites.
H3: Judged Male-Female differences in the accessibility of the biodata items will be
correlated significantly with the DIF slope parameter estimates when comparing Males
versus Females.
H4: Judged Male-Female differences in the accessibility of the biodata items will be
correlated significantly with the DIF location parameter estimates when comparing Males
versus Females.
Method
Two separate samples were used for the current study. Sample 1 provided judgments of
the perceived accessibility of the situation described in each of the biodata items. Sample 2
responded to the biodata items as would normally be the case. This sample was used to assess
DIF across subgroup responses to the biodata items.
Sample 1. A variety of recruiting techniques were used to obtain participants for this
sample. First, we invited individuals from the Psychology Subject Pool at a large Midwestern
university to complete a paper-pencil version of the survey for course credit. We also placed
flyers on public bulletin boards across the university. Finally, we recruited additional minority
members to complete a web-based version, which was identical in all aspects other than the
collection mode in an attempt to accumulate a larger Black male sample, as we desired
approximately equal sample sizes for Males, Females, Blacks, and Whites.
This sample consisted of 150 respondents. Fifty-six percent were female, and 87% were
between the ages of 18 and 21. For the purpose of the current study, we were interested in
African American-Caucasian race differences; therefore, any respondents reporting a different
ethnicity were not included in analyses. Black individuals made up 42% of the sample. Finally,
33% of the sample were freshmen, 32% were sophomores, 17% were in their junior
year, and the remainder reported having attended their institution for four or more years.
Sample 2. Participants who completed the biodata survey were freshmen from two public
universities in the Midwest. The items used for the DIF analyses were collected in conjunction
with a larger paper-and-pencil survey effort. In exchange for participation, students were paid
$20.00. Further, our research team over-sampled Blacks to ensure that we had a large enough
sample from both groups for DIF analyses. The participants for this study included 176 African
Americans and 231 Caucasians. Forty-nine percent of the sample was male, and 99% reported
being either 18 or 19 years old.
Biodata measure. The biodata measure included 42 items assessing background
experiences. This measure was a subset of the instrument developed by Oswald et al. (2004),
designed to predict performance in college. The original test contained 12 dimensional scales,
but we chose a subset of items that focused on the dimensions of multiculturalism, leadership,
adaptability, perseverance, and continuous learning. See Table 1 for a brief explanation of each
dimension. These dimensions were chosen because they seemed most likely to demonstrate
differences in responding across race and/or gender and because they reflected items thought to
be related to the academic and social success of new college students. Participants were asked to
indicate the number of times they had experienced the activity described within the item stem.
All items were measured on a scale from one to five, with higher values on the scales indicating
more experience on the activity addressed by the item.
IRT and DIF analyses are based on the assumption that items in the measure are
unidimensional. The internal consistency of the 42-item scale was considered adequate (α = .86),
but with 42 items, high alphas can be achieved even with very low item intercorrelations.
Therefore, to examine the dimensionality of the scale further, we conducted an exploratory
principal factors analysis. The first factor explained over twice the variance of the second (11.5%
versus 6.8%). We also conducted a confirmatory factor analysis specifying that all items
(grouped in 10 parcels of 4-5 items each) correlate with a single latent factor. This model of item
responses fit the data reasonably well (RMSEA = .08, Comparative Fit Index = .93, and
Nonnormed Fit Index = .92). In summary, with these exploratory and confirmatory analyses, we
concluded that the 42 items were adequately unidimensional to proceed with the IRT and DIF
analyses. Perfect unidimensionality would have precluded any finding of DIF.
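The internal-consistency figure reported above is standard Cronbach's alpha, which can be reproduced with a short sketch; the data below are a hypothetical toy example, not the study's responses.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha: item_scores is a list of per-item score lists
    (one inner list per item, one entry per respondent).
    alpha = k/(k-1) * (1 - sum(item variances)/variance(total scores))."""
    k = len(item_scores)
    item_vars = sum(pvariance(item) for item in item_scores)
    totals = [sum(resp) for resp in zip(*item_scores)]
    return (k / (k - 1)) * (1 - item_vars / pvariance(totals))

# Toy data: 3 items, 4 respondents (illustrative only)
items = [[1, 2, 4, 5], [2, 2, 4, 4], [1, 3, 5, 5]]
print(round(cronbach_alpha(items), 2))
```

As the text notes, alpha rises mechanically with the number of items, which is why the dimensionality of the 42-item set also had to be checked with factor analysis.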
Accessibility measure. The accessibility measure paralleled the items described in the
above biodata measure. The item stem was worded identically. However, participants were
instructed to indicate the extent to which they perceived having the opportunity to participate in
the activity described in each item on a five-point scale ranging from “Always” to “Never.”
Examples of the judgments requested are presented in Table 2. To ensure that there was
agreement among individuals within groups, rwg(j) (James, Demaree, & Wolf, 1984) values
were calculated for Male, Female, White, and Black groups. The agreement index for all four
groups was larger than .90, which exceeded the minimum standard of .70 set forth by James et
al. (1984) for within-group agreement indices and indicated near perfect agreement on
accessibility for this set of judges.
Insert Table 2 about here
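The agreement index described above follows the James, Demaree, and Wolf (1984) formula for J parallel items. A minimal sketch, with toy ratings and the uniform-distribution null variance assumed for a five-point scale:

```python
from statistics import variance

def rwg_j(item_ratings, n_options=5):
    """Within-group agreement rwg(J) (James, Demaree, & Wolf, 1984).
    item_ratings: one list of judges' ratings per item.
    sigma_e2 = (A^2 - 1)/12 is the variance expected under a uniform
    (no-agreement) null for an A-point response scale."""
    J = len(item_ratings)
    sigma_e2 = (n_options ** 2 - 1) / 12.0
    mean_var = sum(variance(r) for r in item_ratings) / J
    ratio = mean_var / sigma_e2
    return (J * (1 - ratio)) / (J * (1 - ratio) + ratio)

# Toy ratings: 2 items, 5 judges, fairly tight agreement (illustrative only)
ratings = [[4, 4, 5, 4, 4], [3, 4, 4, 4, 4]]
print(round(rwg_j(ratings), 2))
```

Observed rating variances well below the uniform-null variance drive the index toward 1.0, which is the pattern the .90+ values in the study reflect.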
DIF analyses were conducted in Parscale 4.1 (Muraki & Bock, 2003) using the two-
parameter polytomous IRT model. The model was estimated twice for race and gender
subgroups, so that differences in slope and location could be examined for African Americans vs.
Caucasians and for Females vs. Males. In the first model, item location parameters were
constrained as constant across both groups, and slope parameters were compared. In the second
model, the slope parameters were constrained to equality across groups allowing comparisons of
the location parameters. In polytomous models, there is a location parameter for each response
option (see Figure 1); the average of the location parameters across the item characteristic curves
for the response functions pertaining to the various response options was used as the item
location parameter. We then calculated the difference in item location by subtracting the
Caucasian parameter from the African American parameter for race and the Male parameter
from the Female parameter for gender analyses; thus positive values indicate higher item
locations for the comparison group (i.e., African Americans, Females), and negative values
indicate higher levels for the reference group (i.e., Caucasians, Males). The contrast in slope was
calculated by dividing the comparison group slope (i.e., African Americans, Females) by the
reference group slope (i.e., Caucasians, Males); thus contrasts < 1.0 mean greater item
discrimination for the reference group, and contrasts > 1.0 mean greater item discrimination for
the comparison group.
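The contrast calculations described above reduce to simple arithmetic on the estimated parameters. A sketch with hypothetical parameter estimates (not values from Table 3):

```python
def location_contrast(comparison_locs, reference_locs):
    """Average the per-option location parameters for each group, then
    subtract reference (Caucasian/Male) from comparison (African
    American/Female). Positive values indicate a higher item location
    for the comparison group."""
    comp = sum(comparison_locs) / len(comparison_locs)
    ref = sum(reference_locs) / len(reference_locs)
    return comp - ref

def slope_contrast(comparison_slope, reference_slope):
    """Ratio of comparison-group slope to reference-group slope; values
    below 1.0 indicate greater item discrimination for the reference group."""
    return comparison_slope / reference_slope

# Hypothetical per-option location estimates for one five-option item
print(round(location_contrast([-1.2, -0.4, 0.5, 1.3],
                              [-1.0, -0.2, 0.7, 1.5]), 2))
print(round(slope_contrast(0.9, 1.2), 2))
```

In this toy case the negative location contrast would indicate a lower item location for the comparison group, and the slope ratio below 1.0 would indicate greater discrimination for the reference group.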
Insert Figure 1 about here
To determine differences in accessibility scores, we calculated Cohen’s d effect sizes. An
effect size for each of the 42 items was calculated that represents the difference in the degree to
which members of different groups felt they had access to the experiences reflected in a given
item, divided by the pooled standard deviation of these groups. Correlations between the accessibility
means of the four subgroups ranged from .86 (African American and Caucasian) to .96 (female
and African American).
In order to determine the extent to which perceived accessibility explained DIF, we
correlated d-values in accessibility for each item with the slope and location contrasts as defined
above. Tests of our hypotheses, then, involve the significance of four correlations: (a) the
correlation between the location contrast parameter and the d in accessibility judgments across
racial groups, (b) the correlation between the location contrast parameter and the d in
accessibility judgments across gender groups, (c) the correlation between the slope contrast
parameter with the d in accessibility judgments across racial groups, and (d) the correlation
between the slope contrast parameter with the d in accessibility judgments for gender groups.
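Each hypothesis test is therefore an ordinary Pearson correlation computed over 42 item-level pairs, with significance judged against the t distribution on n - 2 degrees of freedom. A sketch of both steps; the -.51 value plugged in below is the gender/location correlation this study reports, used only to illustrate the significance test.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

def t_for_r(r, n):
    """t statistic for testing r != 0 with n pairs (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# With n = 42 items, r = -.51 yields |t| of about 3.75, well beyond the
# two-tailed .01 critical value for df = 40 (about 2.70).
print(round(t_for_r(-0.51, 42), 2))
```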
Results
DIF results are presented in Table 3. These values represent the contrasts between
parameters for the two groups as described in the Method section above. As can be seen, a
number of these contrasts are fairly large, and several in each analysis are significantly different.
Insert Table 3 about here
In Table 4, we provide the means, standard deviations, and d values across groups
associated with the judgments of accessibility by members of the different groups. Using d
values of .20 to .50 as indicants of a moderate effect size, we see that seven of the male-female
d-values were of moderate effect size. All seven were positive, indicating that females perceived
higher levels of accessibility to these items. Differences in judgments of accessibility to 19 items
were of moderate effect sizes when comparing Black and White groups. All but three of these d
values were negative, indicating that for most of these items, Black individuals felt they had less
opportunity to experience an item on average.
Insert Table 4 about here
Tests of the significance of the correlations between the IRT contrasts and the d-values
associated with the judged accessibility of the items provide tests of our four hypotheses.
Hypothesis 1 proposed that differences in perceived accessibility of the experiences described in
an item stem would be significantly and positively related to DIF in the slope parameters across
Black and White respondents. The contrasts for the slope parameters were not significantly
related to the accessibility d-values (r = -.01, p > .05) providing no support for Hypothesis 1.
In Hypothesis 2, we proposed that the perceived accessibility of experiences would be
significantly related to the magnitude of the contrasts for the location parameters across the two
race groups. Similar to the findings for the slope parameter, the correlation was relatively low
and nonsignificant (r = .22, p >.05).
Hypothesis 3 was not supported. There was no significant correlation between the
contrast of an item’s slope parameters and the d-values for perceived accessibility (r = .06, p >
.05) across male and female groups.
Hypothesis 4 was supported. The correlation between the contrast of the location
parameter estimates for Males and Females and the d-value for accessibility across Males and
Females was negative, relatively large and statistically significant (r = -.51, p < .01). This finding
can be interpreted as indicating that items showing gender differences in accessibility also
functioned differently across gender groups. The negative correlation indicates that items for
which females reported greater accessibility as compared to males were those that displayed a
smaller DIF when comparing male and female location parameters. This makes sense given that
items with smaller or negative DIF would favor Females (i.e., require less overall status on the
biodata construct to achieve a higher item score) over Males, and that these were the same items
for which Females perceived they had greater accessibility.
Figure 1 has been provided as an example of an item that demonstrated significant DIF
on the location parameter. This figure depicts the item characteristic curve (ICC) for both
Females (see dashed lines) and Males (see solid lines) on a single item. To interpret this plot for
comparison across the two groups, one must first consider the likelihood of an individual at a
given level of ability to select a response option. Next, an assessment is made concerning how
individuals from the groups being compared (e.g., Females vs. Males) differ in terms of the
probability of selecting a response option at the same level of ability. In other words, the goal is
to understand how individuals with the same level of ability on the overall test differ in terms of
the likelihood of selecting each of the response options. For the biodata measure used for this
study, items were scored such that the curve for Option 1 represents the lowest score and the
curve for Option 5 represents the highest score on each item. In Figure 1, we depict the curves
relating status on the underlying construct to the probability that members of both groups chose
each of the response options. In this case, Females had a greater probability of selecting the best
response option (i.e., 5) across all levels of ability. This difference is best illustrated by
comparing the probability of selecting a score of five on the high end of the underlying biodata
continuum between the two groups. For this item, we see that Females with high levels on this
construct had a much higher likelihood of choosing the best response option (i.e., “5”) than
Males of similar status on the construct. In other words, a Male who had the same overall score
on the test as a Female is less apt to choose the best response option. The reverse is true for
Options 1 and 2. In these cases, Males were more likely to select one of these options than were
Females whose status on the construct was the same. The item depicted in this figure was Item
21 (“plan ahead and make a specific schedule of things you need or want to do”), which showed
both a substantial difference in perceived accessibility (d = .42) and a large contrast value in the
location parameters associated with Male and Female groups (-2.02).
Another question that arises in situations in which DIF is used to eliminate items is
whether the mean difference between subgroups is affected by the elimination of such items.
There were 14 items for both the gender and race analyses where significant DIF in the location
parameters was evident. In Table 5, we present the means of Male, Female, African American,
and Caucasian groups on the total 42-item scale as well as the shortened 28-item scales
consisting only of items that did not display DIF for the subgroups being compared. The mean
difference across both gender and race was near zero on both the full scale and DIF-eliminated
scale. Further, removing the items that demonstrated DIF did not materially affect the biodata
scores. This finding is promising, given that we did not expect that there should be group
differences on the composite scale-score. Individual items displaying statistically significant DIF
were either biased for or against both subgroups, with minimal overall effect on total observed
scores.
Insert Table 5 about here
Discussion
In this study, we found partial support for hypotheses that suggested that perceived
differences in accessibility to experiences reflected in biodata items were related to bias in the
item as reflected in IRT DIF indices. For Female-Male comparisons, the extent to which an item
was contaminated due to differences in the perception that one had the opportunity to perform
the described experiences was related to DIF in location parameter estimates across the two
groups. The relationship indicated that items for which females’ access was judged lower than
males’ access were the items on which Female responses were lower than Male responses even
though their status on the underlying construct was identical.
The magnitude and direction of the effect indicate the need for caution in using biodata items to
index the background, experience and interests of females when compared with males. The
absence of a relationship with the slope parameter indicates that any differences between males
and females in the nature of the relationship between item responses and the underlying biodata
construct was not related to perceived differences in accessibility. It should be noted though that
there were few large differences in the slope parameters (see Table 3), so this result may be a
function of the fact that these items do not differ in slopes across gender groups.
The results for race subgroups (Hypotheses 1 and 2) indicated no significant relationship
between perceptions of item accessibility and item location and item slope differences. The
relationship between the location parameter and accessibility perceptions was moderate
(.22) and similar in magnitude to the correlations reported by Stricker and Emmerich (1999).
However, it was positive, which indicates that on the items for which Black access was judged
to be less than White access, White responses were lower than Blacks’ responses to the same
items even though their status on the underlying construct was identical. The number
of items was only 42, so a correlation of .27 would be statistically significant (p < .05); similar
future studies should include at least as many (or more) unidimensional items as we employed to
provide reliable results.
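The item-level analysis above can be sketched as follows. The per-item values here are randomly generated placeholders, not the study's data, and the numpy library is assumed available; the critical value shown is two-tailed, so a directional test would give a slightly smaller threshold:

```python
import math
import numpy as np

# Hypothetical quantities for 42 items: dif_location is the female-male
# contrast in the IRT location parameter; access_diff is the female-male
# difference in mean perceived accessibility for the same item.
rng = np.random.default_rng(0)
access_diff = rng.normal(0.0, 0.2, size=42)
dif_location = -1.5 * access_diff + rng.normal(0.0, 0.3, size=42)

# Correlation of DIF with perceived accessibility across items
r = float(np.corrcoef(dif_location, access_diff)[0, 1])

# Critical |r| for n = 42 items, df = n - 2, two-tailed alpha = .05
df = 42 - 2
t_crit = 2.021  # standard tabled t value for df = 40
r_crit = t_crit / math.sqrt(t_crit**2 + df)   # ≈ 0.30
```

With item-level differences as the unit of analysis, statistical power is governed by the number of items rather than the number of examinees, which is why the text urges at least as many unidimensional items in future studies.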
Much research has established that item format and content issues are critical elements in
test construction. This literature has shown that differences across test format, even aspects as
seemingly trivial as the instructions, can affect the properties of a test, including validity
(Campion, Palmer, & Campion, 1997; McDaniel & Nguyen, 2001; Ployhart & Ehrhart, 2003)
and reliability (Ployhart & Ehrhart, 2003). However, efforts to relate the impact of these factors
to differences in the way the test measures constructs across subgroups have been largely
unsuccessful (e.g., Scheuneman & Gerritz, 1990; Schmitt & Pulakos, 1998). The current study posited that one way to understand differences across subgroups at the item level is to take a theory-driven approach. We acknowledge that many theoretical explanations could be used to account for item-level differences across groups; however, given that biodata items are designed to reference past experiences and interests, it was quite logical to posit a priori that differential opportunity to engage in these experiences across subgroups would produce such differences.
From a practical standpoint, our findings alert those working in the employee selection domain that differences in accessibility across gender groups can bias experience-based items; selection tests utilizing such items should therefore be examined for DIF prior to administration. Items that appear to test takers to be based on experiences they have not had the opportunity to perform could harm members of disadvantaged subgroups.
Finally, it is important to note that researchers who utilize IRT methodology to better understand subgroup differences on tests should see our findings as a meaningful step toward theoretically grounded explanations of subgroup differences that are unrelated to examinees' standing on the construct underlying the test. Further, theories that can support arguments as to why DIF occurs should be articulated and tested using the methods offered here.
References
Asher, J. J. (1972). The biographical item: Can it be improved? Personnel Psychology, 25, 251-
Bobko, P., Roth, P. L., & Potosky, D. (1999). Derivation and implications of a meta-analytic
matrix incorporating cognitive ability, alternative predictors, and job performance. Personnel
Psychology, 52, 561-590.
Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage.
Campion, M., Palmer, D. K., & Campion, J. (1997). A review of structure in the interview
process. Personnel Psychology, 50, 650-702.
England, G. W. (1971). Development and use of weighted application blanks. Minneapolis:
University of Minnesota.
Gierl, M. J. (2005). Using a dimensionality-based DIF analysis paradigm to identify and interpret
constructs that elicit group differences. Educational Measurement: Issues and Practice, 24,
Grutter v. Bollinger (02-241) 539 U. S. 306 (2003) 288 F. 3d 732, affirmed.
Hezlett, S. A., Kuncel, N. R., Vey, M. A., Ahart, A. M., Ones, D. S., Campbell, J. P., & Camara,
W. F. (2001). The predictive validity of the SAT: A meta-analysis. In D. S. Ones & S. A.
Hezlett (Chairs), Predicting performance: The interface of I/O Psychology and educational
research. Symposium conducted at the 16th Annual Convention of the Society for Industrial
and Organizational Psychology, San Diego, CA.
Hough, L. M. (1986). Utility of temperament, biodata, and interest assessment for predicting job
performance: A review and integration of the literature (PDRI Rep. No. 145). Minneapolis:
Personnel Decisions Research Institute.
Hough, L. M., Oswald, F. L., & Ployhart, R. E. (2001). Determinants, detection, and
amelioration of adverse impact in personnel selection procedures: Issues, evidence, and
lessons learned. International Journal of Selection and Assessment, 9, 152-194.
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job
performance. Psychological Bulletin, 96, 72-98.
James, L. R., Demaree, R. G., & Wolf, G. (1984). Estimating within-group interrater reliability
with and without response bias. Journal of Applied Psychology, 69, 85-98.
McDaniel, M. A., & Nguyen, H. T. (2001). Situational judgment tests: A review of practice and
constructs assessed. International Journal of Selection and Assessment, 9, 103-113.
Mitchell, T. W. (1994). The utility of biodata. In G. S. Stokes, M. D. Mumford, & W. A. Owens
(Eds.), Biodata handbook (pp. 485-516). Palo Alto, CA: Consulting Psychologists Press.
Mumford, M. D. (1999). Construct validity and background data: Issues, abuses, and future
directions. Human Resource Management Review, 9, 117-146.
Mumford, M. D., & Stokes, G. S. (1992). Developmental determinants of individual action:
Theory and practice in the application of background data measures. In M. D. Dunnette & L.
M. Hough (Eds.), Handbook of Industrial and Organizational Psychology (Vol.3, pp. 61-
138). Palo Alto, CA: Consulting Psychologists Press.
Muraki, E., & Bock, R. D. (2003). PARSCALE for Windows (Version 4.1). Chicago: Scientific Software International.
Nevo, B. (1976). Using biographical information to predict success of men and women in the
army. Journal of Applied Psychology, 61, 106-108.
Oswald, F. L., Schmitt, N., Kim, B. H., Ramsay, L. J., & Gillespie, M. A. (2004). Developing a
biodata measure and situational judgment inventory as predictors of college success. Journal
of Applied Psychology, 89, 187-207.
Owens, W. A. (1976). Background data. In M. D. Dunnette (Ed.), Handbook of industrial and
organizational psychology (pp. 609-644). New York: Rand McNally.
Owens, W. A., & Schoenfeldt, L. F. (1979). Toward a classification of persons. Journal of
Applied Psychology, 64, 569-607.
Pace, L. A., & Schoenfeldt, L. F. (1977). Legal concerns in the use of weighted applications.
Personnel Psychology, 30, 159-166.
Ployhart, R. E., & Ehrhart, M. G. (2003). Be careful what you ask for: Effects of response
instructions on the construct validity and reliability of situational judgment tests.
International Journal of Selection and Assessment, 11, 1-16.
Pulakos, E. D., & Schmitt, N. (1996). An evaluation of two strategies for reducing adverse
impact and their effects on criterion-related validity. Human Performance, 9, 241-258.
Reilly, R. R., & Chao, G. T. (1982). Validity and fairness of some alternative employee selection
procedures. Personnel Psychology, 35, 1-63.
Ritchie, R. J., & Boehm, V. R. (1977). Biographical data as a predictor of women’s and men’s
management potential. Journal of Vocational Behavior, 11, 363-368.
Rothstein, H. R., Schmidt, F. L., Erwin, F. W., Owens, W. A., & Sparks, C. P. (1990). Biographical data in employment selection: Can validities be made generalizable? Journal of
Applied Psychology, 75, 175-184.
Roussos, L., & Stout, W. F. (1996). A multidimensionality-based DIF analysis paradigm.
Applied Psychological Measurement, 20, 355-371.
Roznowski, M. (1987). Use of tests manifesting sex differences as measures of intelligence:
Implications for measurement bias. Journal of Applied Psychology, 72, 480-483.
Sackett, P. R., Schmitt, N., Ellingson, J. E., & Kabin, M. B. (2001). High-stakes testing in
employment, credentialing, and higher education: Prospects in a post-affirmative action
world. American Psychologist, 56, 302-318.
Sackett, P. R., & Wilk, S. L. (1994). Within-group norming and other forms of score adjustment
in preemployment testing. American Psychologist, 49, 929-954.
Scheuneman, J. D., & Gerritz, K. (1990). Using differential item functioning procedures to
explore sources of item difficulty and group performance characteristics. Journal of
Educational Measurement, 27, 109-131.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel
psychology: Practical and theoretical implications of 85 years of research findings.
Psychological Bulletin, 124, 262-274.
Schmitt, N., Clause, C. S., & Pulakos, E. D. (1996). Subgroup differences in ability as assessed
by different methods. In C. L. Cooper & I. Robertson (Eds.), International review of
industrial and organizational psychology (Vol. 11, pp. 115-140). New York: Wiley.
Schmitt, N., & Pulakos, E. D. (1998). Biodata and differential prediction: Some reservations. In
M. D. Hakel (Ed.), Beyond multiple choice: Evaluating alternatives to traditional testing for
selection. Mahwah, NJ: Lawrence Erlbaum Associates.
Schoenfeldt, L. F. (1999). From dust bowl empiricism to rational constructs in biographical data.
Human Resource Management Review, 9, 147-167.
Stokes, G. S. (1999). Introduction to special issue: The next one hundred years of biodata.
Human Resource Management Review, 9, 111-116.
Stokes, G. S., Mumford, M. D., & Owens, W. A. (1989). Life history prototypes in the study of
human individuality. Journal of Personality, 57, 509-545.
Stricker, L. J., & Emmerich, S. J. (1999). Possible determinants of differential item functioning:
Familiarity, interest, and emotional reaction. Journal of Educational Measurement, 36, 347-
Webster, E. G., Booth, R. F., Graham, W. K., & Alf, E. F. (1978). A sex comparison of factors
related to success in Naval Hospital Corps School. Personnel Psychology, 31, 95-106.
Whitney, D. J., & Schmitt, N. (1997). Relationship between culture and responses to biodata
employment items. Journal of Applied Psychology, 82, 113-129.
Table 1
Description of the Content Assessed by the Biodata Measure
Continuous learning, and intellectual interest and curiosity
Being intellectually curious and interested in continuous learning. Actively seeking new ideas
and new skills, both in core areas of study as well as in peripheral or novel areas.
Appreciation for diversity
Showing openness, tolerance, and interest in a diversity of individuals and groups (e.g., by
culture, ethnicity, religion, or gender). Actively participating in, contributing to, and influencing
a heterogeneous environment.
Leadership
Demonstrating skills in a group, such as motivating others, coordinating groups and tasks,
serving as a representative for the group, or otherwise performing a managing role in a group.
Adaptability and life skills
Adapting to a changing environment (at school or home), dealing well with gradual or sudden
and expected or unexpected changes. Being effective in planning one’s everyday activities and
dealing with novel problems and challenges in life.
Perseverance
Committing oneself to goals and priorities set, regardless of the difficulties that stand in the way.
Goals range from long-term goals (e.g., graduating from college) to short-term goals (e.g.,
showing up for class every day even when the class isn’t interesting).
Table 2
Examples of Accessibility Items
Volunteered to be the spokesperson for a group project you did at school or work.
A. I always had the opportunity to be a spokesperson on a group project if I wanted to
B. I usually had the opportunity to be a spokesperson on a group project if I wanted to
C. I sometimes had the opportunity to be a spokesperson on a group project if I wanted to
D. I rarely had the opportunity to be a spokesperson on a group project if I wanted to
E. I never had the opportunity to be a spokesperson on a group project if I wanted to
Failed to meet responsibilities because you had taken on too much.
A. I always had access to the opportunity to take on a lot of responsibilities if I wanted to
B. I usually had access to the opportunity to take on a lot of responsibilities if I wanted to
C. I sometimes had access to the opportunity to take on a lot of responsibilities if I wanted to
D. I rarely had access to the opportunity to take on a lot of responsibilities if I wanted to
E. I never had access to the opportunity to take on a lot of responsibilities if I wanted to
Encountered problems that take a long time to solve.
A. I always had access to difficult problems
B. I usually had access to difficult problems
C. I sometimes had access to difficult problems
D. I rarely had access to difficult problems
E. I never had access to difficult problems
Table 3
Contrast of DIF Parameter Estimates for Each Item

                 Race                    Gender
Item #     location    slope       location    slope
1 .664** .841 -1.094** .954
2 -.145 .762 -.225 1.074
3 .772** .686* -.043 1.168
4 -.475 .916 .658* 1.100
5 .173 .800 -.670 1.940*
6 .463 1.010 -.796 .785
7 .509* 1.320 -.072 .757*
8 -.589** .905 .563 .860
9 -.110 1.059 .394 1.205
10 .052 .826 -.289 .924
11 -.259* .914 .258 1.317
12 -.080 .841 -.017 .977
13 -.450** .970 -.710** .880
14 .103 1.036 .016 1.561**
15 -.097 .975 -.499** 1.271
16 .265 1.162 -.216 .855
17 .153 .948 .775* .713**
18 .172 .905 .480 .975
19 -.436 .790* 1.588** 1.213
20 .454 1.114 .070 1.141
21 .711 .884 -2.018** 1.170
22 -.086 .968 -.213 1.022
23 -.371* .985 .081 .643**
24 -.647 1.205 .905 .836
25 .310 .891 -.492* 1.560*
26 .582** 1.186 -.029 .945
27 -.480 1.497* -.267 .803
28 .434* .915 -.259 1.026
29 -.179 1.089 .174 .915
30 .327 .999 -.528* 1.254
31 -.496 1.041 .267 1.159
32 .411 .799 -.888** .910
33 -.295 1.624* .366* .728*
34 .033 1.467 .766** .965
35 .115 1.016 .422* 1.260
36 -.767** .814 .949** 1.211
37 .525** 1.409* .149 .712**
38 -.011 1.116 -.089 .622**
39 -.563* .814 .465 1.012
40 .474* 1.365* .300 .852
41 -1.210** 1.394* -.469 .946
42 .044 .696** .338 1.019
Note: * p < .05, ** p < .01.
Table 4
Item-Level Information

                                      Gender                            Race
Item Content                    N   Mean (S.E.)   SD    d        N   Mean (S.E.)   SD    d
1. Attended cultural events, even when F 66 3.11 (.14) 1.14 .15 B 51 3.14 (.16) 1.15 .15
you didn't know whether you would like
them M 54 2.94 (.13) 0.98 W 70 2.97 (.12) 1.01
2. Tried to understand people of different F 66 3.64 (.14) 1.15 .11 B 51 3.59 (.17) 1.20 -.01
cultural beliefs M 54 3.52 (.15) 1.08 W 70 3.60 (.13) 1.06
3. Get along with people from F 66 4.09 (.13) 1.05 .16 B 51 4.10 (.14) .96 .12
backgrounds different than my own M 54 3.93 (.14) 1.04 W 70 3.97 (.13) 1.10
4. Searched for information about other F 66 3.98 (.13) 1.09 .04 B 51 3.73 (.15) 1.08 -.38
regions, countries, or cultures M 54 3.94 (.14) 1.05 W 70 4.13 (.12) 1.03
5. Tried to talk to someone from a F 66 3.48 (.14) 1.14 .21 B 51 3.33 (.16) 1.11 -.10
different country or culture just to learn
about their background M 54 3.26 (.13) 0.96 W 70 3.44 (.12) 1.04
6. Lived with someone who you consider F 65 2.43 (.16) 1.30 .02 B 51 2.47 (.19) 1.35 .07
to be very different from yourself M 54 2.41 (.19) 1.39 W 69 2.38 (.16) 1.33
7. Gone to an event where the purpose F 66 3.05 (.15) 1.20 -.03 B 51 3.18 (.17) 1.23 .16
was to expose people to a new culture M 54 3.07 (.13) 0.95 W 70 3.00 (.12) .99
F 65 3.68 (.14) 1.09 -.21 B 50 3.64 (.15) 1.05 -.24
8. Try different ethnic foods
M 54 3.91 (.15) 1.09 W 70 3.90 (.13) 1.12
9. Volunteered to be the spokesperson for F 66 3.56 (.16) 1.34 -.10 B 51 3.35 (.20) 1.44 -.38
a group project you did at school or
work M 54 3.69 (.15) 1.13 W 70 3.83 (.13) 1.05
10. Took a leadership role in HS and/or F 66 4.02 (.15) 1.25 -.02 B 51 3.75 (.19) 1.32 -.43
organized activity M 54 4.04 (.14) 1.05 W 70 4.24 (.12) .97
11. Been responsible for assigning tasks F 66 3.76 (.14) 1.11 .26 B 51 3.57 (.15) 1.04 -.09
and deadlines for other people M 54 3.46 (.16) 1.16 W 70 3.67 (.14) 1.20
F 65 4.00 (.11) 0.92 .04 B 51 3.84 (.14) .97 -.27
12. Taken charge of a group you were in
M 54 3.96 (.12) 0.87 W 69 4.09 (.10) .82
13. Set the schedule for groups in which F 65 3.75 (.13) 1.02 .32 B 50 3.48 (.14) .99 -.20
you worked M 54 3.43 (.14) 1.02 W 70 3.69 (.12) 1.04
F 65 3.86 (.13) 1.03 -.05 B 50 3.70 (.16) 1.11 -.30
14. Tried to get someone to join an activity
in which you were involved or learning M 54 3.91 (.14) 1.00 W 70 4.00 (.11) .92
F 66 3.77 (.12) 0.99 -.09 B 51 3.71 (.14) .97 -.20
15. Was offered leadership positions
M 54 3.85 (.11) 0.83 W 70 3.89 (.10) .88
16. Selected by a group or club to serve as F 65 3.51 (.15) 1.17 .09 B 50 3.38 (.18) 1.24 -.13
an official representative M 54 3.41 (.14) 1.04 W 70 3.53 (.12) 1.00
17. Adjusted to major changes in your life F 66 3.94 (.13) 1.09 .31 B 51 4.02 (.14) .97 .36
(moving, new school, new job) M 54 3.57 (.17) 1.27 W 70 3.61 (.16) 1.30
18. Felt comfortable in new situations or F 66 3.80 (.12) 0.95 .18 B 51 3.76 (.12) .86 .06
places M 54 3.65 (.11) 0.78 W 70 3.71 (.11) .89
19. Dealt w/ situations that forced you to F 66 3.50 (.14) 1.10 -.07 B 51 3.71 (.15) 1.04 .25
make adjustment in your daily life M 54 3.57 (.16) 1.18 W 70 3.43 (.14) 1.19
20. Failed to meet responsibilities because F 66 2.85 (.13) 1.08 -.15 B 51 2.75 (.14) 1.00 -.26
you had taken on too much M 54 3.00 (.13) 0.93 W 70 3.01 (.12) 1.04
21. Planned ahead and make a specific F 66 4.02 (.11) 0.87 .42 B 51 3.76 (.12) .86 -.11
schedule of things you need or want to
do M 53 3.60 (.15) 1.06 W 69 3.87 (.13) 1.06
Table 4 (cont.)

                                      Gender                            Race
Item Content                    N   Mean (S.E.)   SD    d        N   Mean (S.E.)   SD    d
22. Changed your study habits to improve F 66 3.97 (.11) 0.88 .36 B 51 3.78 (.13) .92 -.08
on a skill or do better in class M 54 3.63 (.14) 1.01 W 70 3.86 (.12) .98
23. Handled multiple projects F 66 3.85 (.11) .86 .14 B 51 3.76 (.12) .89 -.07
simultaneously M 54 3.72 (.13) .92 W 70 3.83 (.11) .90
24. Worked on a serious and relatively F 66 3.65 (.13) 1.09 -.03 B 51 3.84 (.15) 1.05 .30
difficult task and a phone call interrupts
M 54 3.69 (.13) .99 W 70 3.53 (.12) 1.02
25. Did your best when you work on a F 66 4.35 (.09) .71 .23 B 51 4.25 (.10) .74 -.04
project M 54 4.17 (.12) .86 W 70 4.29 (.10) .82
26. Accomplished something you initially F 66 4.08 (.09) .75 .03 B 51 4.08 (.11) .77 .01
thought was very difficult or impossible
M 54 4.06 (.11) .79 W 70 4.07 (.09) .77
27. Finished a project when faced F 66 4.18 (.10) .85 .13 B 51 4.12 (.12) .87 -.04
w/difficult circumstances M 54 4.07 (.11) .80 W 70 4.16 (.09) .79
28. Determined to continue w/ a project F 65 4.21 (.10) .85 .18 B 51 4.12 (.11) .82 -.06
under difficult circumstances
M 54 4.06 (.12) .86 W 69 4.17 (.11) .88
29. Gave up on a task after being told that F 66 2.94 (.18) 1.42 -.13 B 51 2.73 (.19) 1.39 -.36
you were not doing well M 54 3.11 (.17) 1.22 W 70 3.20 (.15) 1.28
F 65 4.23 (.08) .65 .09 B 50 4.16 (.09) .67 -.13
30. Succeeded in a task you are engaged in
M 54 4.17 (.10) .70 W 70 4.25 (.08) .67
31. Encountered problems that take a long F 66 3.68 (.09) .75 .00 B 51 3.69 (.10) .73 -.02
time to solve M 54 3.69 (.11) .77 W 70 3.70(.09) .79
32. Chose classes, projects, or assignments F 66 3.36 (.14) 1.15 .10 B 51 3.25 (.16) 1.11 -.12
simply to learn something new about
different groups, cultures, and customs M 54 3.26 (.13) .94 W 70 3.39 (.12) 1.03
33. Went out and researched a subject on F 66 3.73 (.15) 1.22 .05 B 51 3.55 (.17) 1.22 -.21
your own because you were interested M 54 3.67 (.17) 1.24 W 70 3.8 (.15) 1.22
34. Read materials that pertained to F 65 3.95 (.13) 1.03 -.06 B 51 3.96 (.14) 1.02 -.05
subjects that you are learning about
M 54 4.02 (.13) .96 W 69 4.01 (.12) .99
35. Spent extra time on school assignments F 65 3.38 (.13) 1.06 .01 B 50 3.36 (.14) .98 -.05
so that you could gain a better
understanding of the material or
principles M 54 3.37 (.15) 1.09 W 70 3.41 (.14) 1.14
36. Sought more information about F 65 3.98 (.12) .94 -.14 B 50 3.84 (.13) .90 -.39
something that you found interesting
M 54 4.11 (.12) .90 W 70 4.20 (.11) .91
37. Asked a teacher or classmate questions F 66 3.69 (.13) 1.07 -.12 B 51 3.6 0(.15) 1.05 -.25
that go beyond the text book
M 54 3.81 (.13) .99 W 70 3.86 (.12) 1.01
38. Been so absorbed when learning F 65 3.74 (.12) 1.00 .17 B 50 3.55 (.14) 1.01 -.20
something that you didn't realize how
much time passed. M 54 3.57 (.13) .94 W 70 3.74 (.11) .94
39. Gone out and learned more about F 66 3.68 (.13) 1.04 -.11 B 51 3.55 (.14) .99 -.30
something simply because it seemed
interesting M 54 3.8 (.14) 1.02 W 70 3.86 (.12) 1.04
40. When a textbook or instructor mentions F 66 3.29 (.16) 1.26 .02 B 51 3.14 (.15) 1.10 -.21
another source of info about a topic,
you found out more about it on your
own. M 54 3.26 (.17) 1.25 W 70 3.40 (.16) 1.36
41. Took a class or found an instructor so F 66 3.55 (.15) 1.19 .14 B 51 3.37 (.15) 1.08 -.16
that you could learn more about a
hobby or skill
M 54 3.39 (.13) .98 W 70 3.54 (.13) 1.11
42. Became involved in something just for F 66 3.68 (.14) 1.11 .10 B 51 3.49 (.14) 1.03 -.22
the sake of learning M 54 3.57 (.14) 1.06 W 70 3.73 (.13) 1.12
Note: Positive d values indicate that women and Blacks had higher perceptions of accessibility for a given item.
M = male, F = female, W = White, B = Black.
Table 5
Group Differences in the Biodata Scale Score With and Without the DIF Items

                       Gender                        Race
                 Mean    SD    d-value        Mean    SD    d-value
All items     F  3.10   0.43    0.07       B  3.10   0.41    0.05
              M  3.07   0.40               W  3.08   0.43
DIF items     F  3.12   0.43    0.05       B  3.12   0.40   -0.05
removed       M  3.10   0.44               W  3.14   0.41
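The d-values in Table 5 can be reproduced from the tabled means and SDs together with the group sizes reported in Table 4 (F = 66, M = 54, B = 51, W = 70); a minimal sketch assuming the conventional pooled-SD form of Cohen's d:

```python
import math

def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Cohen's d: standardized mean difference with pooled SD."""
    pooled = math.sqrt(((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2)
                       / (n_1 + n_2 - 2))
    return (mean_1 - mean_2) / pooled

# All items: female vs. male and Black vs. White scale scores (Table 5)
d_gender = cohens_d(3.10, 0.43, 66, 3.07, 0.40, 54)   # ≈ 0.07
d_race = cohens_d(3.10, 0.41, 51, 3.08, 0.43, 70)     # ≈ 0.05
```

Both effects are trivially small at the scale-score level, consistent with the text's point that item-level DIF largely cancels in the total score.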
Figure 1. Example DIF item where Females had higher average ability than Males.