The Coachability and Fakability of Personality Selection Tests Used for Police Selection
Corey E. Miller
University of Akron
Gerald V. Barrett
Barrett & Associates
Correspondence may be addressed to the first author who is now at:
Wright State University
Department of Psychology
Dayton, OH 45435
Paper presented at the 25th annual International Personnel Management Association Assessment
Council conference, Newport Beach, CA.
The use of personality and biodata measures in safety force selection is increasing due to
demands to reduce adverse impact. Gottfredson (1996) suggested that the test developers of the
Nassau County project did not adequately address concerns with faking (for a discussion of the
Nassau County controversy see http://www.ipmaac.org/nassau/; for recent reviews of the faking
literature see Alliger & Dwight, 2000; Ellingson, Sackett, & Hough, 1999; McFarland & Ryan,
2000; Ones & Viswesvaran, 1998; Rosse, Stecher, Miller, & Levin, 1998; Snell, Sydell, &
Lueke, 1999; Vasilopoulos, Reilly, & Leaman, 2000; Viswesvaran & Ones, 1999; Zickar &
Robie, 1999). An issue that has yet to be discussed despite its particular relevance to safety force
selection is coaching.
Training programs for civil service selection tests are common (Ryan, Ployhart,
Greguras, & Schmit, 1998). Workshops on “test strategy” costing hundreds of dollars are
routinely offered to applicants. In fact, developing training sessions for the test can be more
lucrative than developing the test. Some consulting firms develop training seminars for the tests
they have unsuccessfully bid for, then later serve as expert witnesses for the plaintiff (see
Cavanaugh v. Cleveland, 1997). There is such a demand for safety force test training that even
the municipalities themselves are offering programs (see Maurer, Solamon, and Troxtel, 1998).
G. V. Barrett (1997) has cautioned that safety force applicants expect to be able to
prepare for important tests. Wheeler (1998) has cautioned practitioners who administer
personality tests that applicants who have failed a personality test often look for guidance on
how they might improve their scores. Sackett, Burris, and Ryan (1989) have called for research
examining the effect of coaching on personality tests. Even researchers who have suggested
faking is a minor or negligible problem have called for research investigating the effects of
coaching programs (Hough, 1997a, 1997b). We could locate no published research examining
the coachability of personality or biodata measures in the context of safety force selection.
However, published literature does document the coachability of integrity tests (Alliger et al.,
1996). Additionally, Young, White, and Oppler's results (as cited in Zickar & Robie, 1999)
suggest that providing clues to military recruits drastically altered their responses. Young et al.
(as cited in Zickar & Robie, 1999) observed that individuals who received three practice items
and the correct answers to a biodata measure obtained average scores one standard deviation
above those of the honest responders. Participants who were instructed to present themselves in a
“good light” but did not receive the clues obtained average scores one half standard deviation
above the “honest” responders. As coaching programs are commonplace in safety force testing,
research in the context of safety forces is needed.
P. M. Barrett (1998) describes Suffolk County’s recent trouble with coaching programs
for its police selection test. In an attempt to increase the rate of minorities selected and fend off
pressure from the U.S. Justice Department, Suffolk County stopped using cognitive ability tests
and instead used only Erwin’s biodata measure to select its 1988, 1992, and 1996 recruit
classes. The test did not change much from administration to administration, which would make
it susceptible to coaching effects. The top scores did in fact rise from year to year; the top scores
in 1988 ranged from 88.2 to 95.5, but in 1992 the top scores were in the range of 93.2 to 99. The
test has been used by hundreds of police forces across the nation; therefore, the problems faced
by Suffolk County are likely not unique.
As the department could still not meet its minority hiring goals because the test still
had some adverse impact, Suffolk County began an unusual minority recruitment program.
Forty-three minority applicants were offered part-time unpaid positions as cadets and given
police clerk jobs. The cadets were promised officer positions at the end of two and a half years of
volunteering if they could score 70 or higher on the exam. In 1992, ordinary candidates had to
score above 90 just to be considered. As the salary of a Suffolk County patrol officer is $70,000
per year, some of the candidates turned down college or law school admissions to enroll in the
cadet program.
When the time to take the exam approached, most cadets protected their time investment
by taking a preparation course given by Sgt. Bugge, a sergeant currently on the force who had
been involved in the validation of the instrument. Sgt. Bugge had given courses to hundreds of
past applicants for several hundred dollars but made an exception for the minority cadets and
charged only $20. Sgt. Bugge’s training consisted of “hypothetical” questions and “preferred”
answers such as:
Q: How many of your relatives work in law enforcement? A: Three.
Q: Which hobbies do you engage in at least once a year? A: Hunting.
The exam given in 1996 had not changed much from the versions given in 1988 and 1992.
Many of the questions were versions of the “hypothetical” questions given in training. All of the
cadets passed, several scored above 90, and some even earned perfect scores of 100. The
attitudes of the cadets reflected the attitude that G. V. Barrett (1997) described as common to
applicants: that there is a right answer and providing it is not cheating.
Personality or biodata measures used for personnel selection often have a lie scale or
social desirability measure designed to catch dissimulators. Coaching programs might not be a
threat to the validity of personality and biodata measures if lie scales function well. Zickar and
Drasgow (1996) have suggested that social desirability or lie scale training may occur commonly
in a personnel selection context. Therefore the current research also examined the effects of
coaching on lie scales or social desirability measures.
Hypothesis 1: Participants who receive coaching on personality theory and personality test
construction will achieve higher scores on the personality measures than a sample of
participants who did not receive the training.
Hypothesis 2: Participants who receive coaching on social desirability theory and lie scale
construction will achieve lower scores on a lie scale measure than a sample of
participants who did not receive the training.
Hypothesis 3: Participants who receive personality test training will be more likely to be
	selected for the job than participants who do not receive the training.
Do Social Desirability Measures Measure Faking?
Ones, Viswesvaran, and Reiss (1996) completed a meta-analysis of the effects of social
desirability on personality selection measures. They concluded that social desirability did not
attenuate the validity coefficient between personality variables and job performance. However,
they did suggest that research is needed to determine whether applicants bias their responses
according to what they perceive as desirable for the job they are applying for, not the qualities
seen as desirable for society in general. Ones et al. (1996) went on to state that research is needed
to determine if job desirability is distinct from social desirability. Job desirability might explain
why several researchers have concluded that social desirability measures do not measure faking
well (Braun & Costantini, 1970; Braun & La Faro, 1969; Bridgman & Hollenbeck, 1961;
Christiansen, 1998; Ellingson, Sackett, & Hough, 1999; Elliot, Lawty-Jones, & Jackson, 1996;
Eysenck & Eysenck, 1975; Kroger & Turnbull, 1974; Meehl & Hathaway, 1946; Paulhus,
1991; Vincent, Linsz, & Greene, 1966; Zickar & Drasgow, 1996).
Alliger and Dwight (2000) have suggested that social desirability scales may only
represent the extent to which test takers provide accurate but exaggerated information about
themselves, while individuals are likely engaging in the complete fabrication of information in a
personnel selection context. In addition, a large body of research suggests that individuals are in
fact tailoring their answers to what traits they believe are desirable for individuals in the job to
have. Kroger (1967) conceptualizes faking as the enacting of a social role. He found that
personality test takers would respond appropriately to what job the experimenter said he was
studying, even though they were instructed to describe themselves, not to take the role (Kroger,
1967; Kroger & Turnbull, 1970, 1974). A number of studies have suggested that individuals
tailor their responses according to their ideas of what is required for the particular job they are
applying for (Elliott, 1976; Elliot et al., 1996; Frei, 1997; French, 1958; Furnham, 1990; Jeske
& Whitten, 1975; Kaufman, Hakmiller, & Porter, 1959; Kirchner, 1961, 1962; Kluger, Reilly,
& Russell, 1991; Mahar, Cologon, & Duck, 1995; Mudgett, 1999; Stanley & Stokes, 1999;
Vasilopoulos, Reilly, & Leaman, 2000; Velicer & Weiner, 1975; Wesman, 1952).
Two additional studies have shown that job desirability bias does attenuate the validities
of personality selection measures. Pannone (1984) found that faking moderated the validity of a
biodata measure. The measure had an observed validity of .55 for applicants not identified as
fakers by Pannone's faking measure, and an observed validity of .26 for those identified as
fakers. Pannone's faking measure was a single item that referred to the use of a nonexistent piece
of equipment. Anderson et al. (1984) had a series of several items on a self-report measure of
clerical task experience designed to catch fakers. Similar to Pannone's, the items were nonexistent
tasks such as “Determining myopic weights for periodic tables”, “Planning basic entropy
programs”, “Cleaning chartels”, “filling rhetaguards”, and “dusting votres”. Anderson et al.
administered an actual test of typing ability and this inflation measure to a sample of clerical
applicants. Anderson et al. found that scores on this inflation measure correlated .48 with scores
on the biodata measure across 13 different occupational classes. Anderson et al. (1984) found
that the self-report measure of typing ability correlated .27 with actual test scores and explained
7% of the variance while scores on the inflation measure had a -.41 correlation with actual test
scores and explained 16% of the variance above the self-report measure. This led the authors to
conclude that both of these procedures enhance validity. Both Pannone (1984) and Anderson et
al. (1984) studies were excluded from the meta-analysis of Ones et al. (1996) as they did not
measure social desirability.
Kluger and Colella (1993) tried to determine if job desirability is distinct from
social desirability. Kluger and Colella found that job desirability ratings of the test items
predicted mean item score differences between participants instructed to answer honestly
and those instructed to fake, over and above social desirability ratings of the items. This
led the authors to conclude that job desirability plays a unique role in faking, if not the
sole role. Perhaps the reason this study is not as widely cited as it should be is that the results
were not unequivocal. The job desirability ratings obtained by Kluger and Colella were related
to the social desirability ratings. Thus, they had difficulty disentangling the two. This
might have been partially due to the particular job they were studying, that of nursing
home assistant. The traits that are desirable for this particular job are high in social
desirability (e.g., kindness, selflessness, altruism, honesty). The traits that are important
for a police officer are likely less socially desirable. Also, selection of police officers is
much more widespread. Thus, in addition to the fact that a replication with a safety force
job is important to establish generalizability, it would have implications for the entire
literature. These two reasons, coupled with the fact that Kluger and Colella's results were
not unequivocal, suggest that a replication within a safety forces context would make a
contribution to the literature.
Hypothesis 4: Job desirability ratings of the items will predict the item mean score differences
between participants instructed to respond to the measures honestly and those instructed
to fake over and above social desirability ratings.
Participants
There were a total of 298 participants, all of whom were Introductory Psychology students
at a large Midwestern university who received extra credit for their participation. Because the
participants were primarily freshmen beginning college, their education was comparable to that
of police officers, most of whom have some college experience; Bagby et al. (1990) have likewise
suggested that undergraduates are comparable to police officers in terms of education. This is
relevant because introductory psychology courses typically include material on personality and
personality measures. Sixty-six percent of the participants were female. Thirteen and a half
percent of the participants reported themselves to be of African American descent; the remainder
of the participants were 2% Asian, 79.2% Caucasian, 2.3% Hispanic, 0.3% Native American, and
2.6% other. The mean age of the sample was 20.65 years, and the mean years of education
completed was 12.30. Ninety-six and nine tenths percent of the sample had work experience, and
the mean work experience for the entire sample was 5.08 years.
Manipulation One of the Current Study
The goal of the first manipulation of the current research was to determine if it is
possible to construct an effective coaching program for personality and biodata measures. The
present research defined an effective coaching program as one enabling coached individuals to
raise their scores above what their scores actually should be. The current research studied these
objectives in the context of safety force selection testing, specifically for the position of police
officer.
Manipulation one of the study was whether participants received personality measure
training, lie scale training, or a control training. Groups 1 and 2 received both training programs.
Group 1 received the personality measure training first, while Group 2 received the lie scale
training first. Group 3 received personality measure training only, group 4 received lie scale
training only. Group 5 served as the control group and received a caselaw based sexual
harassment prevention training program. Groups 1 through 4 consisted of 31, 35, 33, and 34
participants respectively. Group 5 consisted of 62 participants.
The personality training program explained all five traits in the Big Five personality
factor taxonomy. The trainer presented the various names the trait has been called by different
researchers, gave definitions and synonyms provided by the literature and various test manuals
(not the tests actually administered). The trainer also presented items modeled after those that are
included in established measures to demonstrate how the traits are measured by personality
inventories. It was explained that Extraversion predicts performance in most sales jobs and that
some personality measures assess constructs such as stress tolerance (e.g., the Hogan Personality
Inventory), but that the most important construct is Conscientiousness. The training was 15
minutes in duration.
The lie scale training focused on the theory behind lie scales (i.e., unlikely virtues) and
included items representative of lie scales. The majority of the training consisted of an
explanation of unlikely virtues, mainly that test developers assume that individuals who claim
many virtues that are unlikely, or that most people do not have, are not being honest. It was
explained that test takers should admit to some foibles and human faults in order to appear
believable. Examples were given, such as that someone who admits to not brushing their teeth
after every meal and to having been late to work at least once in their lifetime is more believable
than someone who claims to always brush after every meal and to never have been late to work.
It was also emphasized that some people truly do brush their teeth after every meal and have
never been late to work; however, the measure identifies these individuals as liars. The
training was 15 minutes in duration.
Personality and biodata outcome measures were selected that represent the state of the art
in the field. A biodata measure was administered that measures Conscientiousness, and two
personality measures were administered that measure the Big Five dimensions of personality.
Biodata measures are presumed to be more resistant to social desirability bias and faking
(Mumford & Owens, 1987; Telenson, Alexander, & Barrett, 1983).
The first personality measure administered was the NEO Five-Factor Inventory (NEO-FFI).
The NEO-FFI is a 60-item measure of the Big Five personality factors and a shortened version
of the NEO PI-R. The
manual suggests that the measures may be useful for selecting employees (Costa & McCrae,
1992), and research suggests that the NEO PI-R is useful for personnel selection (Piedmont &
Weinstein, 1994) and for police officer selection in particular (Costa, McCrae, & Kay, 1995). Many faking
studies in the past have used the short form (Furnham, 1997; McFarland & Ryan, 2000; Paulhus,
Bruce, & Trapnell, 1995; Schmit & Ryan, 1993; Schmit et al., 1995). The NEO-FFI items are
arranged in a pattern for ease of hand scoring so that every fifth item is a Conscientiousness item.
This would likely aid fakers, especially those who have just taken a training course; therefore, the
item order was randomized for this administration. Permission from the publisher was obtained
to alter the NEO-FFI in this manner.
The second personality measure administered was the Hogan Personality Inventory,
which is a 206 T/F item measure of the Big Five personality model (Hogan & Hogan, 1992). In
addition to the five primary scales, the items can be formed into five empirical scales. The Hogan
Personality Inventory has been used for safety forces testing. Ryan, Ployhart, and Friedel (1998)
analyzed Hogan Personality Inventory data for 1,700 police applicants who applied for positions
in a Midwestern city. The Hogan Personality Inventory empirical scales of Reliability and Stress
Tolerance were used to select the police officers along with the primary scale of Intellectance.
Reliability identifies individuals who are honest, dependable, and responsive to supervisors.
Stress Tolerance identifies persons who handle stress well. Intellectance measures the degree to
which a person seems creative and analytical. The manual reports internal consistency
reliabilities for the three scales mentioned from .69 to .86, and test-retest estimates from .79 -
.87. One aspect of the Hogan Personality Inventory relevant to the current research is that the
scoring key is held secret by the publisher, and test forms must be sent to the publisher for
scoring. Therefore, in addition to the fact that the training program was not directly applicable to
the Hogan Personality Inventory, the trainer was blind to the scoring scheme. Thus, this measure
served to determine the generalizability of the training.
The third measure was a biodata measure that measured the construct of
conscientiousness, the Conscientiousness Biodata Questionnaire (CBDQ). The CBDQ is a
biodata measure developed by Snell and associates, and has been used in faking research (Frei,
1997; Griffith, 1997). As the CBDQ measures conscientiousness it resembles a personality
measure such as the NEO-FFI or the Hogan Personality Inventory more than what might be
termed a traditional biodata instrument. Using Mael’s (1991) taxonomy of biodata, virtually all
of the CBDQ items would be classified as historical, external, objective, first-hand, discrete,
controllable, noninvasive, and equally accessible. As the measure was not designed specifically
for police officer selection, most of the items are not directly job relevant for police officers. The
bulk of the items would fall in between on Mael’s verifiable-nonverifiable attribute; the items
would be best described as verifiable in principle (Gandy, Outerbridge, Sharf, & Dye, 1989; Stricker,
1987; 1988). Two example items are “When you had an appointment scheduled (ex. hair, doctor,
dentist, eye exam) how often were you a few minutes late?”, and “How likely are you to keep the
manuals and warranties for items that you buy?”. The CBDQ is organized into eight facet scales
of eight items each: Dependable and Reliable, Planful, Organization, Self-discipline, Deliberate
and Rational, High Standards, Attention to Detail, and Particularity. Based on previous research,
the Dependable and Reliable facet scale would be most applicable to the present research.
After the aforementioned personality and biodata measures, all participants received a
ten-item true/false social desirability measure embedded among 20 true/false filler items designed
to appear relevant to the position of police officer, which served to disguise the social desirability
items. Most lie scales used with personality selection measures appear to consist of
approximately ten items (Hansen & McLellan, 1997; Hough, 1998). Sample filler items include
“I have always rooted for the good guy in movies.” and “I think following laws and rules is
important to the good of society.”
The social desirability measure was a short form developed by Strahan and Gerbasi
(1972) using items of the Marlowe-Crowne Social Desirability Scale (Crowne & Marlowe, 1960).
In many past faking studies, participants have been instructed to make as favorable an
impression as possible. Researchers have criticized these instructions as unrealistic
(Hough, 1997a). Although Cunningham, Wong, and Barbee (1994) found that participants
instructed to get as high a score as possible on an honesty test still had statistically significantly
lower scores than a national sample of actual applicants, instructing participants simply to
get as high a score as possible might not be realistic. Groups 1 through 5 were instructed to
answer the above measures as they would if they were applying for a police officer position they
wanted. They were told to assume that it is a job they feel qualified for, and that they want to
ensure the employer makes the right decision, namely hiring them, therefore they should respond
to the measures in a manner that gives them the best chance of being selected. These instructions
were modeled after the instructions of Hough et al. (1990, p. 586): “Imagine you are at the
Military Entrance Processing Station (MEPS) and you want to join the Army. Describe yourself
in a way that you think would ensure that the army selects you.”
Manipulation Two of the Current Study
Group 6 also received the control training like Group 5, but was given different
instructions. Group 6 was instructed to answer the questionnaires honestly. Group 6 consisted of
51 participants. Group 7 was asked to rate the NEO-FFI and CBDQ items on social desirability
and job desirability in a replication and extension of Kluger and Colella (1993). Specifically, they
were asked to respond on a 5-point Likert scale (strongly agree to strongly disagree) to "This
item measures something that is important or desirable for a police officer to have or be" (job
desirable) and "This item measures something people in general would like to have or be"
(socially desirable). The experimenter conducted a short briefing with example items in order to
increase inter-rater reliability. Group 7 consisted of 52 participants.
Means, standard deviations, internal consistency estimates (α), and correlations for the
variables of interest are presented in Table 1. Alpha coefficients are presented along the diagonal.
Item-level data were not provided for the Hogan Personality Inventory; therefore, internal
consistency estimates could not be computed. The Hogan manual (Hogan & Hogan, 1992)
reports internal consistency reliabilities for the three scales of interest from .69 to .86, and
test-retest estimates from .79 to .87. Group 7 is not included in the data presented in Table 1 as
they did not complete any of these measures; therefore, the sample size is 246.
Hypothesis 1: Participants who receive coaching on personality theory and personality
test construction will achieve higher scores on the personality measures than a sample of
participants who did not receive the training.
Hypothesis 1 was supported, as participants who received personality measure training
(groups 1, 2, and 3) scored significantly higher on the Conscientiousness measures. One-way
analyses of variance (ANOVAs) were conducted on all of the above-mentioned
Conscientiousness scales. The results are presented in Table 2. Eleven of the twelve ANOVAs
were significant, and the twelfth approached traditional significance levels. Although the trained
group had a higher mean score on the Hogan Personality Inventory Reliability scale (M = 11.71,
SD = 3.57) than the group that did not receive training (M = 10.67, SD = 3.90), the difference
was not statistically significant, F(1, 191) = 3.78, p = .053. Although the effect of coaching did
not meet traditional significance levels on this scale, the coached group had a statistically
significantly higher mean than the group that received control training and was instructed to
answer honestly, and a large effect size was observed (M = 8.61, SD = 3.14, p < .001, d = .89).
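The reported effect size can be approximately recovered from the means and standard deviations given above. A minimal sketch, assuming a pooled standard deviation computed as if the groups were equal in size (the actual group sizes, which are not reported for this comparison, would shift the value slightly toward the reported d = .89):

```python
import math

def cohens_d(m1, sd1, m2, sd2):
    """Cohen's d using a pooled SD that assumes equal group sizes."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd

# Coached group vs. honest control group on the HPI Reliability scale
d = cohens_d(11.71, 3.57, 8.61, 3.14)
print(round(d, 2))  # -> 0.92, close to the reported d = .89
```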
Hypothesis 2: Participants who receive coaching on social desirability theory and lie scale
construction will achieve lower scores on a lie scale measure than a sample of
participants who did not receive the training.
A higher score on a lie scale is more likely to be considered dishonest; therefore, a
successful training program would lower scores on the lie scale measure. Hypothesis 2 would be
supported if participants who received lie scale training (groups 1, 2, and 4, N = 97) scored
significantly lower on the short form of the Marlowe-Crowne Social Desirability measure than
the participants who did not receive lie scale training (groups 3 and 5, N = 95), as revealed by a
one-way ANOVA. The mean for the trained group was in fact significantly lower (M = 5.88,
SD = 3.02) than the mean for the group that did not receive training (M = 7.08, SD = 2.46),
F(1, 190) = 9.23, p < .01 (Between SS = 70.03, Within SS = 1441.84, Between MS = 70.03,
Within MS = 7.59). The effect sizes are presented in Table 3. The robustness of the effect sizes
shows that the statistically significant effects were also practically significant. Therefore,
Hypothesis 2 was supported.
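Because only two groups are compared here, the one-way ANOVA is equivalent to an independent-samples t-test: the F statistic is the square of t. A quick check on the reported value:

```python
import math

# With two groups, F = t^2, so the reported F(1, 190) = 9.23
# corresponds to t(190) of about 3.04.
F = 9.23
t = math.sqrt(F)
print(round(t, 2))  # -> 3.04
```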
Hypothesis 3: Participants who receive personality test training will be more likely to be
	selected for the job than participants who do not receive the training.
Hypothesis 3 would be supported if an analysis of trends reveals that participants who
received training were more likely to be selected based on the selection measure composite
score. The selection measure composite score was computed by adding the scores for the NEO-
FFI Conscientiousness scale, the CBDQ Dependable and Reliable scale, and the Hogan
Personality Inventory Reliability scale, then subtracting the participant's score on the Marlowe-
Crowne Social Desirability measure. This procedure is modeled after the procedure used to score
PDI's Customer Service Measure and was also used by Hough (1998) and Rosse et al. (1998).
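The composite described above is a simple sum of the three Conscientiousness-related scales minus the lie scale score. A minimal sketch (the function and argument names are mine, not the article's, and the example scores are hypothetical):

```python
def selection_composite(neo_consc, cbdq_dep_rel, hpi_reliability, marlowe_crowne):
    """Sum the three Conscientiousness-related scale scores and subtract
    the Marlowe-Crowne social desirability (lie scale) score."""
    return neo_consc + cbdq_dep_rel + hpi_reliability - marlowe_crowne

# Hypothetical applicant: high conscientiousness scores, low lie-scale score
print(selection_composite(38, 30, 12, 5))  # -> 75
```

Note that a lower lie scale score raises the composite, which is why lie scale training could pay off under this scoring scheme.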
The analysis was modeled after Hough (1998). All participants were rank ordered by the
selection measure composite score. The percentage of applicants selected from each of the
groups was computed for different selection ratios. As in Hough (1998), no statistical tests
were conducted to evaluate this hypothesis. The results of this analysis are presented in Table 4.
The data were broken down into five groups: groups 1 and 2 (both personality measure and lie
scale training, instructed to fake), group 3 (personality measure training only, instructed to fake),
group 4 (lie scale training only, instructed to fake), group 5 (control training, instructed to fake),
and group 6 (control training, instructed to answer honestly). These groups represent 27%, 13%,
14%, 25%, and 20% of the total sample, respectively. If the groups that received training were
represented among those selected disproportionately to their proportion of the entire sample, the
analysis of trends would reveal that the training did in fact aid applicants in reaching the top of
the distribution.
The selection ratios represent what proportion of the total sample would be selected. The
proportion of hypothetical applicants selected is presented for selection ratios of .01 (1%), .05
(5%), .10 (10%), and .20 (20%). Selection was top-down, and in the event of ties all ties were
selected. Therefore the number of individuals selected is actually slightly higher than the
selection ratio in all scenarios.
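The top-down-with-ties rule can be sketched as follows (the scores and the `select_top_down` helper are illustrative, not from the study):

```python
def select_top_down(scores, selection_ratio):
    """Select top-down by composite score; anyone tied with the last
    person above the cutoff is also selected, so the number selected
    can slightly exceed the nominal selection ratio."""
    n_target = max(1, round(len(scores) * selection_ratio))
    ranked = sorted(scores, reverse=True)
    cutoff = ranked[n_target - 1]
    return [s for s in scores if s >= cutoff]

scores = [90, 88, 88, 85, 80, 75, 70, 65, 60, 55]
selected = select_top_down(scores, 0.20)  # nominal 20% of 10 = 2 applicants
print(len(selected))  # -> 3, because of the tie at 88
```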
Perhaps the most informative set of data is the one for a selection ratio of .10, or 10%.
This is the selection ratio presented in the earlier discussion of Hough (1998). Although
individuals who received both personality measure and lie scale training represented only 27%
of the total sample, they were disproportionately represented in the top 10% of the sample,
comprising 35% of it. Perhaps more strikingly, although individuals who received personality
measure training but not lie scale training represented only 13% of the entire sample, they
comprised 46% of the top 10%. Individuals who received lie scale training but not personality
measure training comprised 14% of the entire sample and 12% of the selected sample. Although
individuals who received control training and were instructed to fake comprised 25% of the total
sample, they made up only 7% of the top 10%. Although the group that received control training
and was instructed to be honest comprised 20% of the entire sample, none of these individuals
were selected in any of the scenarios. In fact, none of them were included in the top 40%. It is
clear from these data that the individuals who received instruction moved to the top of the
distribution and individuals who did not receive instruction were pushed to the bottom; thus,
Hypothesis 3 was supported. It appears that personality measure training was more important
than lie scale training, which would not be wholly unexpected as the personality measures made
up the bulk of the items included in the selection instrument.
Hypothesis 4: Job desirability ratings of the items will predict the item mean score
differences between participants instructed to respond to the measures honestly and those
instructed to fake over and above social desirability ratings.
As Kluger and Colella (1993) suggested that student raters had difficulty with reverse-
scored items, this analysis was completed on only the items that were not reverse scored. The
social desirability ratings were not statistically significant predictors of the item mean differences
(R = .05, R² = .00, ns). However, the job desirability ratings did significantly predict the item
mean differences over and above the social desirability ratings (R = .29, ΔR² = .08, p < .05).
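The "over and above" test is a hierarchical regression: enter the social desirability ratings first, then add the job desirability ratings and examine the increment in R². A minimal sketch with simulated ratings (the data, effect sizes, and variable names here are illustrative, not the study's):

```python
import numpy as np

def r_squared(predictors, y):
    """R-squared from an OLS fit of y on the predictors (with intercept)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return 1 - residuals.var() / y.var()

rng = np.random.default_rng(0)
social_des = rng.normal(size=200)   # simulated social desirability ratings
job_des = rng.normal(size=200)      # simulated job desirability ratings
# simulated item mean differences driven mainly by job desirability
mean_diff = 0.6 * job_des + 0.1 * social_des + rng.normal(scale=0.5, size=200)

r2_step1 = r_squared([social_des], mean_diff)                 # step 1: SD only
r2_step2 = r_squared([social_des, job_des], mean_diff)        # step 2: SD + JD
print(round(r2_step2 - r2_step1, 2))  # incremental R-squared for job desirability
```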
Thus, job desirability accounted for a statistically significant amount of variance over and above
social desirability.
This research was concerned with two main questions: Is it possible to coach individuals
to beat personality measures and lie scales, and are social desirability measures sufficient
measures of faking? The first question was conclusively answered yes. Although the second
question was answered less conclusively, this research raises serious questions about using social
desirability measures as a measure of faking. A review of the results is a good point to begin a
discussion of the contributions this research has made to the literature.
The analyses for Hypotheses 1 and 2 showed that trained participants were able to raise their
scores on the personality measures and lower their scores on the lie scale. These results should be
unwelcome news to practitioners using personality measures and/or lie scales for selection purposes.
The information contained in the personality measure training is covered in many introductory
psychology classes, if not in an undergraduate class in personality or in tests and measurement. If an
undergraduate psychology major has not acquired the information in the classroom, she would
have acquired the skills to find it in the published literature. She would be able to
build an effective training program from the published literature alone; obtaining the test manual
would not be necessary. Such a situation has the potential to create difficulties for practitioners.
Analysis 3 was an analysis of trends that examined the effect of faking and training on
the likelihood of being selected for a job. It was clear that coaching and faking would greatly
improve an applicant's chances; in fact, applicants who were honest and did not receive training
had no chance of being selected. These factors are especially pertinent in a safety forces
testing program: typically several hundred applicants apply for each position, many applicants
take the test every time it is offered, many are already being coached, and the applicants interact
with one another in mass testing sessions. If an individual planned on answering honestly, it is quite
doubtful she would continue to do so upon discovering that other applicants had been coached.
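The crowding-out dynamic described above can be illustrated with a toy simulation. All numbers here are hypothetical, not the study's data: one subgroup's score distribution is shifted upward, and its share of the top decile of the combined pool is tallied.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical pool: 200 honest applicants, 50 coached (coached mean shifted +1.5 SD)
honest = rng.normal(loc=0.0, size=200)
coached = rng.normal(loc=1.5, size=50)
scores = np.concatenate([honest, coached])
labels = np.array(["honest"] * 200 + ["coached"] * 50)

# Select the top 10% of the combined pool
cutoff = np.quantile(scores, 0.90)
selected = labels[scores >= cutoff]

# Coached applicants, 20% of the pool, take a disproportionate share of the slots
share_coached = (selected == "coached").mean()
```

Even a modest shift leaves the coached group heavily overrepresented at the top, which is why an honest, untrained applicant's selection probability collapses toward zero.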
Hypothesis 4 was concerned with the sufficiency of social desirability measures as
measures of faking. The accompanying analysis disentangled social desirability from job
desirability: the job desirability ratings were significant over and above the social desirability
ratings. As Kluger and Colella (1993) concluded, this is evidence that job desirability plays a
unique, if not the sole, role in faking. This study, along with Kluger and Colella (1993),
Anderson, Warner, and Spencer (1984), and Pannone (1984), makes a compelling argument for
further research in this area, especially in the context of safety force selection.
Standardized mean differences, or effect sizes, are useful for illustrating the magnitude of
the differences observed. The effect of faking on the scales used in the present research ranged
from .20 to 1.09 (see Table 4). These estimates are roughly in line with Viswesvaran and Ones's
(1999) meta-analytic estimates, which ranged from .48 to .65; as expected, the estimates of the
current research are more variable than the meta-analytic estimates. Stanush's (1997) meta-analysis
found that biodata inventories were more fakable than personality inventories (d = .94 vs. .45), and
Zickar and Robie (1999) found effect sizes ranging from .51 to .73. The results presented in
Table 4 show that the CBDQ acted more like a personality measure than a biodata instrument in
this regard.
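The standardized mean differences discussed here are conventionally computed as the difference between group means divided by a pooled standard deviation. A minimal sketch, using hypothetical group statistics rather than the study's raw data:

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference with a pooled standard deviation."""
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled

# Hypothetical faking vs. honest group statistics (illustrative numbers only)
d = cohens_d(40.2, 6.9, 62, 33.1, 6.8, 51)   # roughly d = 1.04
```

A d near 1.0, as for several scales in Table 4, means the faking group's mean sits about one full standard deviation above the honest group's mean.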
Zickar and Robie (1999) defined the effect of coaching as the effect size of the coached
group instructed to fake relative to the group that received no training and was instructed to answer
honestly; their effect sizes ranged from .92 to 1.13. Using this definition, the current research
found effect sizes ranging from .02 to 1.66 (see Table 4). The effect size for the scale the training
was designed for (NEO-FFI Conscientiousness) was 1.66. To put this in perspective, individuals
who score at the 50th percentile before training will, on average, score at the 95th percentile after
training. Thus, the coaching in the current research was more effective than the coaching in
Zickar and Robie (1999) on the scale it was designed for, and roughly as effective on the other
scales. This is expected, as the training in Zickar and Robie (1999) consisted of only three
practice items. The effect sizes for coaching over and above faking ranged from .28 to .60 (see Table 4).
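The 50th-to-95th-percentile claim follows directly from the standard normal CDF: an average scorer shifted up by d = 1.66 standard deviations lands at Φ(1.66) ≈ .95 of the untrained distribution. This can be checked with nothing but the error function:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# An average (50th percentile) scorer shifted up by d = 1.66 SDs lands at
pct = normal_cdf(1.66) * 100   # about the 95th percentile
```

The same conversion applied to Zickar and Robie's largest coaching effect (d = 1.13) yields roughly the 87th percentile, which underscores the larger practical impact of the present training.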
The final question concerned what strategies fakers actually use. The job
desirability rating manipulation showed that we should be concerned with job desirability bias,
not just social desirability bias. Social desirability measures are questionable measures of faking;
therefore, studies that conceptualize faking as social desirability and use the terms
interchangeably have questionable generalizability.
No research is without limitations. One potential limitation of the current research is its
reliance on student samples, which might limit generalizability to applied situations with
actual applicants (Gordon, Slade, & Schmitt, 1986). However, Dobbins,
Lane, and Steiner (1988) suggested that field studies are generally less able to control
confounding variables than lab studies and that the crux of generalizability is how faithfully the
psychological processes are captured. I suggest that the psychological processes of learning
about personality measures and applying that knowledge, as would occur in an actual coaching
program, were faithfully represented in the present research. One benefit of
conducting this research in the laboratory was the ability to control for the confounding effect of
previous coaching. It would seem virtually impossible to control for all the possible sources of
coaching (co-workers, relatives or friends currently on the force, unions, and test-taking
workshops), as they would likely differ in effectiveness (possibly even having a negative
effect) and would all be unlikely to be disclosed.
However, the results of the present research should not be used to set parameter estimates
of how effective coaching programs might be in actual applicant settings. I suggest that a
sample of participants with no extrinsic motivation to attend to the trainer and no incentive to
apply the knowledge would set the floor for a sample of actual applicants, who would have much
more invested in the training itself and the incentive of a highly coveted, highly paid job.
Suggesting that the study be replicated with an applied sample would be prudent for most
research; however, recommending that researchers coach actual applicants and measure their
performance in an actual selection setting might not be an endeavor advocated by practitioners,
in spite of the obvious scientific benefits. Additionally, the current research included a measure
widely used for safety force selection (the HPI), as well as a measure that appears similar to the
one used in Nassau County.
Simple, short training programs had statistically and practically significant effects, and an
individual with a baccalaureate degree in psychology could develop a similar training program.
Given the prevalence of coaching programs in safety forces testing, practitioners should expect
training programs to be developed and marketed that dramatically increase applicants' scores on
personality tests. In addition, this study showed that job desirability plays a unique, if not the
sole, role in faking. Thus, generalizing from research with social desirability measures to faking
is questionable.
References
Alliger, G. M., & Dwight, S. A. (2000). A meta-analytic investigation of the susceptibility of
integrity tests to faking and coaching. Educational and Psychological Measurement, 60,
Alliger, G. M., Lilienfeld, S. O., & Mitchell, K. E. (1996). The susceptibility of overt and covert
integrity tests to coaching and faking. Psychological Science, 7, 32-39.
Anderson, C. D., Warner, J. L., & Spencer, C. C. (1984). Inflation bias in self-assessment
examinations: Implications for valid employee selection. Journal of Applied Psychology,
Bagby, R. M., Gillis, J. R., Dickens, S. (1990). Detection of faking with the new generation of
objective personality measures. Behavioral Sciences and the Law, 8, 93-102.
Barrett, G. V. (1997). A historical perspective on the Nassau County Police Entrance
Examination: Arnold v. Ballard (1975) revisited. The Industrial Organizational
Psychologist, 35(2), 42-46.
Barrett, P. M. (1998, October 12). Legal Limbo. The Wall Street Journal, 72, A1, A11.
Braun, J. R., & Costantini, A. (1970). Faking and faking detection on the Personality Research
Form, AA. Journal of Clinical Psychology, 26, 516-518.
Braun, J. R., & La Faro, D. (1969). A further study of the fakability of the Personal Orientation
Inventory. Journal of Clinical Psychology, 25, 296-299.
Bridgman, C. S., & Hollenbeck, G. P. (1961). Effect of simulated applicant status on Kuder
Form D occupational interest scores. Journal of Applied Psychology, 45, 237-239.
Cavanaugh v. Cleveland, (1997), Case number 331912, Court of Common Pleas, Cuyahoga
Christiansen, N. D. (1998, April). Sensitive or senseless? Using social desirability measures to
identify distortion. Poster session presented at the 13th annual conference of the Society
for Industrial and Organizational Psychology, Dallas.
Costa, P. T. (1996). Work and personality: Use of the NEO-PI-R in Industrial/Organisational
Psychology. Applied Psychology: An International Review, 45, 225-241.
Costa, P. T., & McCrae, R. R. (1992). NEO-PI-R Professional Manual. Odessa, FL:
Psychological Assessment Resources.
Costa, P. T., McCrae, R. R., & Kay, G. G. (1995). Persons, places, and personality: Career
assessment using the revised NEO Personality Inventory. Journal of Career Assessment,
Crowne, D. P., & Marlowe, D. (1960). A new scale of social desirability independent of
psychopathology. Journal of Consulting Psychology, 24, 349-354.
Cunningham, M. R., Wong, D. T., & Barbee, A. T. (1994). Self-presentation dynamics on
overt integrity tests: Experimental studies of the Reid Report. Journal of Applied
Psychology, 79, 643-658.
Dobbins, G. H., Lane, I. M., & Steiner, D. D. (1988). A note on the role of laboratory
methodologies in applied behavioural research: Don't throw out the baby with the bath
water. Journal of Organizational Behavior, 9, 281-286.
Ellingson, J. E., Sackett, P. R., & Hough, L. M. (1999). Social desirability corrections in
personality measurement: Issues of applicant comparison and construct validity. Journal
of Applied Psychology, 84, 155-166.
Elliott, A. G. P. (1976). Fakers: A study of managers’ response on a personality test. Personnel
Review, 5, 33-37.
Elliot, S., Lawty-Jones, M., & Jackson, C. (1996). Effect of dissimulation on self-report and
objective measures of personality. Personality and Individual Differences, 21, 335-343.
Eysenck, H. J., & Eysenck, S. B. (1975). The Eysenck Personality Questionnaire manual.
London: Hodder & Stoughton.
Frei, R. L. (1997). Fake this test! Do you have the ability to raise your score on a service
orientation inventory? Unpublished doctoral dissertation, The University of Akron,
French, E. G. (1958). A note on the Edwards Personal Preference Schedule for use with basic
airmen. Educational and Psychological Measurement, 18, 109-115.
Furnham, A. (1990). Faking personality questionnaires: Fabricating different profiles for
different purposes. Current Psychology: Research & Reviews, 9, 46-55.
Furnham, A. F. (1997). Knowing and faking one’s five-factor personality score. Journal of
Personality Assessment, 69, 229-243.
Gandy, J. A., Outerbridge, A. N., Sharf, J. C., & Dye, D. A. (1989). Development and Initial
Validation of the Individual Achievement Record. Washington, D. C.: U. S. Office of
Gordon, M. E., Slade, L. A., & Schmitt, N. (1986). The "Science of the Sophomore" revisited:
From conjecture to empiricism. Academy of Management Review, 11, 191-207.
Gottfredson, L. S. (1996). Racial gerrymandering the content of police tests to satisfy the U.S.
Justice Department: A case study. Psychology, Public Policy, and Law, 2, 418-446.
Griffith, R. (1997). Faking of non-cognitive selection devices: Red herring is hard to swallow.
Unpublished doctoral dissertation, University of Akron, Akron, Ohio.
Hansen, T. L., & McLellan, R. A. (1997, April). Social desirability and item content. In G. J.
Lautenschlager (Chair), Faking on non-cognitive measures: The extent, impact, and
identification of assimilation. Symposium conducted at the 12th annual conference of the
Society for Industrial and Organizational Psychology, St. Louis.
Hogan, R., & Hogan, J. (1992). Hogan Personality Inventory Manual. Tulsa, OK: Hogan
Hough, L. M. (1997a, April). Discussant. In G. J. Lautenschlager (Chair), Faking on non-
cognitive measures: The extent, impact, and identification of assimilation. Symposium
conducted at the 12th annual conference of the Society for Industrial and Organizational
Psychology, St. Louis.
Hough, L. M. (1997b). The Millennium for Personality Psychology: New horizons or good old
daze. Applied Psychology: An International Review, 47, 233-261.
Hough, L. M. (1998). Effects of Intentional Distortion in personality measurement and
evaluation of suggested palliatives. Human Performance, 11, 209-244.
Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D., & McCloy, R. A. (1990). Criterion-
related validities of personality constructs and the effect of response distortion on those
validities [Monograph]. Journal of Applied Psychology, 75, 581-595.
Jeske, J. O., & Whitten, M. R. (1975). Motivational distortion of the Sixteen Personality Factor
Questionnaire by persons in job applicant roles. Psychological Reports, 37, 379-382.
Kaufman, R. A., Hakmiller, K. L., & Porter, L. W. (1959). The effects of top and middle
management sets on the Ghiselli Self-description Inventory. Journal of Applied
Psychology, 43, 149-153.
Kirchner, W. K. (1961). "Real-life" faking on the Strong Vocational Interest Blank by sales
applicants. Journal of Applied Psychology, 45, 273-276.
Kirchner, W. K. (1962). "Real-life" faking on the Edwards personal preference schedule by sales
applicants. Journal of Applied Psychology, 46, 128-130.
Kluger, A. N., & Colella, A. (1993). Beyond the mean bias: The effect of warning against faking
on biodata item variances. Personnel Psychology, 46, 763-780.
Kluger, A. N., Reilly, R. R., & Russell, C. J. (1991). Faking biodata tests: Are option keyed
instruments more resistant? Journal of Applied Psychology, 76, 889-896.
Kroger, R. O. (1967). Effects of role demands and test-cue properties upon personality test
performance. Journal of Consulting Psychology, 31, 304-312.
Kroger, R. O., & Turnbull, W. (1970). Effects of role demands and test-cue properties on
personality test performance: Replication and extension. Journal of Consulting and
Clinical Psychology, 35, 381-387.
Kroger, R. O., & Turnbull, W. (1974). Invalidity of validity scales: The case of the MMPI.
Journal of Consulting and Clinical Psychology, 43, 48-55.
McFarland, L. A., & Ryan, A. M. (2000). Variance in faking across noncognitive measures.
Journal of Applied Psychology, 85, 812-821.
Mael, F. A. (1991). A conceptual rationale for the domain and attributes of biodata items.
Personnel Psychology, 44, 763-792.
Mahar, D., Cologon, J., & Duck, J. (1995). Response strategies when faking personality
questionnaires in a vocational selection setting. Personality and Individual Differences,
Maurer, T., Solamon, J., & Troxtel, D. (1998). Relationship of coaching with performance in
situational employment interviews. Journal of Applied Psychology, 83, 128-136.
Meehl, P. E., & Hathaway, S. R. (1946). The K factor as a suppressor variable in the Minnesota
Multiphasic Personality Inventory. Journal of Applied Psychology, 30, 525-564.
Mudgett, B. (1999). Influence of individual differences and job desirable responding on
personality distortion. Poster presented at the 14th annual conference of the Society for
Industrial and Organizational Psychology, Atlanta.
Mumford, M. D., & Owens, W. A. (1987). Methodological review: Principles, procedures, and
findings in the application of background data measures. Applied Psychological
Measurement, 11, 1-31.
Ones, D. S., & Viswesvaran, C. (1998). The effects of Social Desirability and faking on
personality and integrity assessment for personnel selection. Human Performance, 11,
Ones, D. S., Viswesvaran, C., & Reiss, A. D. (1996). Role of social desirability in personality
testing for Personnel Selection: The red herring. Journal of Applied Psychology, 81, 660-
Pannone, R. D. (1984). Predicting test performance: A content valid approach to screening
applicants. Personnel Psychology, 37, 507-514.
Paulhus, D. L. (1991). Measurement and control of response bias. In J. P. Robinson, P. R.
Shaver, & L. S. Wrightsman (Eds.), Measures of personality and social psychological
attitudes (pp. 17-59). San Diego: Academic Press.
Paulhus, D. L., Bruce, M. N., & Trapnell, P. D. (1995). Effects of self-presentation strategies on
personality profiles and their structure. Personality and Social Psychology Bulletin, 21,
Piedmont, R. L., & Weinstein, H. P. (1994). Predicting supervisor ratings of job performance
using the NEO Personality Inventory. The Journal of Psychology, 128, 255-265.
Quigley, A. M. (1998a, April). Putting personality in your organization: Neither pyrrhic victory
nor panacea. In W. J. Campbell (Chair), Personality Testing: Look before you leap.
Symposium conducted at the thirteenth annual conference of the Society of Industrial and
Organizational Psychology, Dallas, TX.
Quigley, A. M. (1998b). U.S. Postal Service and Federal Trade Commission Unite to “Stamp
Out Job Fraud”. The Industrial Organizational Psychologist, 35 (4), 81.
Rosse, J. G., Stecher, M. D., Miller, J. L., & Levin, R. A. (1998). The impact of response
distortion on preemployment personality testing and hiring decisions. Journal of Applied
Psychology, 83, 634-644.
Ryan, A. M., Ployhart, R. E., & Friedel, L. (1998). Using personality testing to reduce adverse
impact: A cautionary note. Journal of Applied Psychology, 83, 298-307.
Ryan, A. M., Ployhart, R. E., Greguras, G. J., & Schmit, M. J. (1998). Test preparation programs
in selection contexts: Self-selection and program effectiveness. Personnel Psychology,
Sackett, P. R., Burris, L. R., & Ryan, A. M. (1989). Coaching and practice effects in personnel
selection. In C. L. Cooper & I. Robertson (Eds.), pp. 145 – 183, International Review of
Industrial and Organizational Psychology. Chichester: John Wiley & Sons.
Schmit, M. J., & Ryan, A. M., (1993). The Big Five in personnel selection: Factor structure in
applicant and nonapplicant populations. Journal of Applied Psychology, 78, 966-974.
Schmit, M. J., Ryan, A. M., Stierwalt, S. L., & Powell, A. B. (1995). Frame-of-reference effects
on personality scale scores and criterion-related validity. Journal of Applied Psychology,
Snell, A. F., Sydell, E. J., & Lueke, S. B. (1999). Towards a theory of applicant faking:
Integrating studies of deception. Human Resource Management Review, 9, 219-242.
Strahan, R., & Gerbasi, K. C. (1972). Short, homogeneous versions of the Marlowe-Crowne
Social Desirability Scale. Journal of Clinical Psychology, 28, 191-193.
Stanley, S. A., & Stokes, G. S. (1999). Controlling faking with test format: An examination.
Poster presented at the 14th annual conference of the Society for Industrial and
Organizational Psychology, Atlanta.
Stanush, P. L. (1997). Factors that influence the susceptibility of self-report inventories to
distortion: A meta-analytic investigation. Unpublished doctoral dissertation, Texas A &
Stricker, L. J. (1987, November). Developing a biographical measure to assess leadership
potential. Paper presented at the annual meeting of the Military Testing Association,
Stricker, L. J. (1988, November). Assessing leadership potential at the Naval Academy with a
biographical measure. Paper presented at the annual meeting of the Military Testing
Association, San Antonio.
Telenson, P. A., Alexander, R. A., & Barrett, G. V. (1983). Scoring the biographical information
blank: A comparison of three weighting techniques. Applied Psychological
Measurement, 7, 73-80.
Vasilopoulos, N. L., Reilly, R. R., & Leaman, J. A. (2000). The influence of job familiarity and
impression management on self-report measure scale scores and response latencies.
Journal of Applied Psychology, 85, 50-64.
Velicer, W. F., & Weiner, B. J. (1975). Effects of sophistication and faking sets on the Eysenck
Personality Inventory. Psychological Reports, 37, 71-73.
Vincent, N. M. P., Linsz, N. L., & Greene, M. I. (1966). The L scale of the MMPI as an index of
falsification. Journal of Clinical Psychology, 22, 214-215.
Viswesvaran, C., & Ones, D. S. (1999). Meta-analysis of fakability estimates: Implications for
personality measurement. Educational and Psychological Measurement, 59, 197-210.
Wesman, A. G. (1952). Faking personality test scores in a simulated employment situation.
Journal of Applied Psychology, 36, 112-113.
Wheeler, J. K. (1998, April). Practical considerations and experiences related to the use of
personality testing in selection. In W. J. Campbell (Chair), Personality Testing: Look
before you leap. Symposium conducted at the 13th annual conference of the Society for
Industrial and Organizational Psychology, Dallas.
Zickar, M. J., & Drasgow, F. (1996). Detecting faking on a personality instrument using
appropriateness measurement. Applied Psychological Measurement, 20, 71-87.
Zickar, M. J., & Robie, C. (1999). Modeling faking good on personality items: An item level
analysis. Journal of Applied Psychology, 84, 551-563.
Table 1 (Continued on Next Page)
Means, Standard Deviations, and Correlations of Variables for the Entire Sample (N = 246)
Variable Mean SD NEO C Reliab. Stress Intell. Dep.&Rel. Planful Organ. Self Disc.
NEO C 36.72 7.81 .90
Reliability 10.66 3.78 .55**
Stress 17.72 5.41 .64** .64**
Intellectance 15.77 5.43 .46** .26** .47**
Dependable & Reliable 31.33 5.31 .72** .51** .54** .46** .80
Planful 27.26 4.56 .61** .41** .51** .49** .70** .63
Organization 31.34 5.37 .64** .43** .46** .45** .64** .67** .75
Self Disclosure 29.39 5.27 .73** .51** .55** .42** .76** .70** .67** .78
Deliberate & Rational 26.82 5.72 .52** .22** .30** .45** .47** .61** .61** .47**
High Standards 26.91 4.17 .24** .06 .04 .29** .22** .24** .17** .26**
Attention to Detail 30.65 4.50 .62** .41** .44** .43** .65** .61** .73** .63**
Particularity 28.04 5.97 .70** .44** .47** .41** .65** .65** .73** .69**
Selection Test 72.63 13.06 .92** .71** .67** .46** .87** .65** .65** .77**
Marlowe-Crowne 6.11 2.75 .60** .54** .58** .37** .56** .55** .53** .57**
Note. Alpha coefficients are presented along the diagonal, except for Hogan PI scales, as item data are not provided by the publisher.
Table 1 (Continued from Last Page)
Means, Standard Deviations, and Correlations of Variables for the Entire Sample (N = 246)
Variable Del.&Rat. High Stan. Att. Detail Part. Sel. Test MC
Deliberate & Rational .73
High Standards .33** .56
Attention to Detail .60** .29** .63
Particularity .50** .25** .63** .80
Selection Test .48** .23** .64** .69**
Marlowe-Crowne .35** .05 .52** .52** .53** .79
Note. Alpha coefficients are presented along the diagonal, except for Hogan PI scales, as item data are not provided by the publisher.
Means, Standard Deviations, and One-Way Analyses of Variance (ANOVAs) for Effects of Training on the Conscientiousness Scale
Control Trained ANOVA
Scale M SD M SD F p
Conscientiousness 36.94 6.82 40.21 6.90 11.00 .001
Reliability 10.67 3.90 11.71 3.57 3.78 .053 *
Intellectance 14.83 5.38 17.70 5.08 14.61 .001
Stress Tolerance 17.74 5.24 20.03 4.32 11.11 .001
Dependable & Reliable 31.18 5.18 33.12 5.33 6.64 .01
Planful 26.88 4.29 28.78 4.77 8.42 .01
Organization 30.81 5.18 33.67 4.87 15.07 .001
Self Discipline 29.35 4.59 31.05 5.72 5.20 .05
Deliberate & Rational 25.90 5.68 29.14 5.20 17.18 .001
High Standards 26.19 4.20 27.39 4.15 4.03 .05
Attention Detail 30.16 4.19 32.49 4.44 14.00 .001
Particularity 28.30 5.17 30.01 6.00 4.50 .05
Note. * Exact value reported; p = .053, not p < .05.
Table 4
Effect Size Estimates of Faking and Coaching
Scale Faking d Coaching d Coaching over Faking d
Conscientiousness 1.06 1.66 .48
Reliability .59 .89 .28
Intellectance .20 .77 .55
Stress Tolerance .90 1.49 .48
Dependable & Reliable .67 1.09 .37
Planful .50 .93 .60
Organization .60 1.23 .57
Self-discipline .70 .99 .33
Deliberate & Rational .34 .97 .60
High Standards -.27 .02 .29
Attention to Detail .57 1.14 .54
Particularity .88 1.12 .31
Marlowe-Crowne 1.09 .48 -.44
Note. Faking = Control Training, Faking Instructions (Group 5, N = 62) vs. Control Training, Honest Instructions (Group 6, N = 51).
Coaching = Personality or Lie Training, Faking Instructions (Group 3 or 4, N = 33 or 34) vs. Control Training, Honest Instructions
(Group 6, N = 51). Coaching over Faking = Personality or Lie Training, Faking Instructions (Group 3 or 4, N = 33 or 34) vs. Control
Training, Faking Instructions (Group 5, N = 62).
Analysis of Trends of Proportion of Participants Selected by Group for Various Selection Ratios
Group Training Instructions Proportion of Total Sample Selected 1% (5) Selected 5% (13) Selected 10% (26) Selected 20% (54)
1, 2 Pers, Lie Fake 27% (66) 20% (1) 38% (5) 35% (9) 37% (20)
3 Pers Fake 13% (33) 40% (2) 38% (5) 46% (12) 30% (16)
4 Lie Fake 14% (34) 40% (2) 24% (3) 12% (3) 7% (4)
5 Control Fake 25% (62) 0% (0) 0% (0) 7% (2) 26% (14)
6 Control Honest 20% (51) 0% (0) 0% (0) 0% (0) 0% (0)
Note. Pers = Personality. Numbers in parentheses indicate the number of individuals.