Assessing Validity of Computerized Voice Stress Analysis
Edinboro University of Pennsylvania
To assess the validity of the computerized voice stress analysis test (CVSA), 16 undergraduates participated in a standard mock crime scenario. Eight
participants committed a mock theft and lied about their involvement and 8 were falsely accused of the theft. All were given a CVSA to assess the
veracity of their denials. Next, participants completed the Defining Issues Test and were then asked what they remembered about the questions from
the CVSA. Accuracy rates of the CVSA and the relationships between accuracy, moral development, and types of questions remembered will be analyzed.
One of the great challenges facing law enforcement is detecting when someone is being deceptive. During an investigation, police officers face the difficult challenge of deciphering who is telling the truth, whether it be a suspect, witness, or even a victim. Methods used have included simple observation of behaviors, physiological measures such as the polygraph, and assessment of changes in the pitch of a suspect’s voice (Vrij et al. 2000). One method that has become increasingly popular is the detection of deception by analyzing changes in the pitch of a suspect’s voice, commonly referred to as voice stress analysis. Recently there has been a revival in the use of voice stress analysis due to advances in computer hardware and software that allow the digitization and computer analysis of the voice. Now referred to as the Computerized Voice Stress Analysis Test, or CVSA, the program works to detect microtremors or modulations in the voice (Nachshon, Elaad, Amsel 1985), which are associated with higher levels of stress, arousal, and “deceptive utterances” (Streeter et al. 1977).

The concept of voice stress analysis has been investigated since the early 1970s, when an instrument known as the PSE (Psychological Stress Evaluator) was developed and marketed for the purposes of lie detection (Horvath 1982). These early analog versions provided a paper printout of vocal changes that were then interpreted by the examiner. More recent applications are now available in computerized form, such as the Digital Voice Stress Analyzer, Truster software, and CVSA. Here the subject speaks into a microphone or telephone and the voice is analyzed by the computer for deception. The developers and distributors of this equipment claim “hundreds of professional studies” have shown their software to be reliable, consistently accurate at a rate of 96% (2001). Several controlled studies were also mentioned in the “Truster” handbook, but neither piece of the literature gave specific citations for the research or made any such research available.

In a recent review by Dr. Donald Krapohl for the American Polygraph Association, none of the published literature he reviewed found the CVSA to have accuracy greater than chance in detecting deception (2003). A 1973 study by Kubis found that “neither the PSE nor the VSA was effective…” The VSA showed an average accuracy of 36% and the PSE showed a 32% accuracy rate in detecting a guilty party. A study by Nachshon, Elaad, and Amsel (1985) addressed the validity of the Psychological Stress Evaluator, reporting the equipment to be 95.65% accurate in identifying truth tellers but not deceivers (33.33%). This most likely represents a truth bias in the test, in which nearly everyone is labeled truthful, leaving it with little value in identifying who is lying.

Research by the Air Force at Rome Laboratory (2001) has found predictable changes in the voice that are associated with personal experiences of stress. However, a variety of factors may result in a suspect feeling stress, only one of which is that they are lying. Thus, the CVSA may be a valid predictor of stress but not of deception.

Some concerns surrounding the use of the CVSA include its reliability and validity. Dr. Palmatier found that the company marketing the CVSA, the National Institute for Truth Verification (NITV), is not only “disinterested in subjecting the CVSA to additional study, but the organization has attempted to thwart research conducted in a field, or real world, setting”. The APA has concluded through research reviews that the CVSA is not accurate at detecting deception, and the AAPP has also found a lack of scientific studies to support the CVSA as a method of discriminating between truth and deception (APA and AAPP 2002). Despite the lack of validating research, 800 known law enforcement agencies are currently utilizing the CVSA, including 22 agencies in Pennsylvania (National Institute for Truth Verification, 2000). Thus, one of the first questions we were interested in was whether the CVSA accurately detects who is being truthful and who is lying.

In addition to the question of the validity of the CVSA, there is also the question of why changes in the voice might signal deception. One theory, proposed by Waid and Orne (1981) to explain similar changes observed during a polygraph test, is the Cognitive Load theory. Cognitive Load theory proposes that “the psychophysiological detection of deception depends on the subject giving larger autonomic responses to questions associated with his or her deception than to appropriate control questions” and that these responses are a function of cognitively processing the question (Waid, Orne, Orne 1981). The idea of judging a person deceptive based on cognitive load is supported by the idea that the questions most relevant to the person will be remembered. The more attention or importance a person gives a question, the more likely it will be recalled later on, and the greater the physiological response. Thus, the more attention or importance a guilty suspect gives to the relevant questions, the larger his/her physiological response to those questions and the more likely he/she will be able to remember them later on.

To test the ability of the CVSA to validly detect deception, the current experiment hypothesized that participants in the experimental group, who were asked to conceal their participation in a mock theft, would, in response to questions about the theft, produce more speech disturbances than those who did not participate in the theft, and that these disturbances would be detected by the Truster VSA system. To explore the underlying mechanisms that might cause differences in physiological processes like vocal pitch and their association with deception, it was also hypothesized that guilty participants would rate the relevant questions asked about the theft as more important than the other question types, whereas innocent subjects will rate the control and directed lie as being …

Methods

Participants
Seventeen undergraduate psychology students at Edinboro University of Pennsylvania participated in this study. Students ranged in age from 18 to 40 and consisted of 10 females and 7 males. Two participants were deemed ineligible due to microphone difficulties and were removed from the study. Participants were recruited through an announcement in their General Psychology class and were given the opportunity to sign up for this study; they were offered extra credit in their psychology course for participating. Students who did not want to participate but wanted to gain the extra credit were offered an alternate exercise.

Apparatus
A laptop computer containing the “Truster” software and equipped with a microphone was used. A book containing an envelope with fake money was used for the mock theft, as well as an envelope containing the condition instructions. The Defining Issues Test was used as a time filler between the CVSA interview and a review of the questions used during the interview.

Procedures
Participants were randomly assigned to the control group (truth) or the experimental group (deceptive) by researcher #2; thus neither researcher #1 nor the CVSA examiner knew which condition the participant was in. The participant was then greeted by researcher #1, led into a small room (containing a desk with a drawer and two chairs), and asked to have a seat and read through the consent form. After the participant had read and signed the form, researcher #1 answered any questions the participant had about the experiment, then presented and explained a filler task (introduced as a “cognitive task”). Researcher #1 then informed the participant that there were additional instructions in the envelope, and that it was imperative that the participant read the instructions before beginning the task. Before leaving the room, researcher #1 informed the participant he/she had five minutes to complete the task, after which she would return. The instructions in the envelope directed the participant on whether or not to take the book, and to deny taking it when asked. If they were successful in their denial of the theft they would receive $3 in cash (Table 1).

After five minutes had elapsed, researcher #1 re-entered the room and thanked the participant for what they had completed. Researcher #1 then told the participant that there was a book in the desk drawer with money inside, intended to pay the participant. Researcher #1 then looked inside the drawer, noted out loud that the book was missing, and asked the participant if he/she had seen anyone else besides the researcher enter the room, if they had opened the drawer at any time during the experiment, and finally, if they themselves had taken the book. After the participant denied taking the book, the researcher informed the participant that the book would have to be found, and that a voice stress analysis examiner would want to ask him/her a few questions to determine whether they had taken the book. Once the participant agreed, he/she was asked to wait while the researcher went to consult with the voice stress analysis examiner.

Researcher #1 returned to the room a minute later and asked the participant to follow her into the CVSA testing room. This room contained the VSA equipment, two chairs and a table, and was monitored and recorded by a video camera. The participant was introduced by researcher #1 and was greeted by the voice stress analysis examiner. The participant was seated and was again asked if they had taken the book. The examiner noted that researcher #1 had accused them of taking the book, which had several dollars in it and was very important to the experiment. Next the participants were instructed on how the VSA worked, including a quick discussion of the autonomic nervous system. Then all of the questions the participant would be asked during the CVSA test were discussed and any confusion about the questions was clarified. A total of 10 different questions were asked during the CVSA exam. The questions were divided into four types: 1) neutral questions that the participant was to answer truthfully, 2) control questions to which the participant was most likely lying, 3) directed lie questions to which the participant was directed to lie, and 4) questions relevant to the theft of the book (Table 2). The CVSA began with the examiner having the participant read a calibration statement to get a baseline of the participant’s vocal patterns. Then each of the 10 questions was asked in a preset order, with the participant verbally responding “Yes, I did do that” or “No, I did not do that” depending on which question was asked. These 10 questions were asked three times with a short break between each set, and the test was then ended.

After the questioning was complete, the participant was led back into the room by researcher #2, who then administered the Defining Issues Test. After this test was complete, researcher #2 interviewed the participant to gain insight as to whether or not the participant remembered the questions asked and how they had responded. They were also asked to rate the importance of each question on a scale from 1 (not important) to 5 (very important), to ascertain their understanding of the relevance of the question. Once the interview was complete, the participant was debriefed by the CVSA examiner and paid three dollars for their time.
Table 1
Instructions for Deceptive Condition: Open the top left-hand drawer of the desk, take the book and place it in your book bag. Then complete the test you were given by Ms. Ober. When she returns, Ms. Ober will look for the book, ask if anyone had come in, and then accuse you of taking it. Neither she nor Dr. Craig will know if you had taken the book or not. You need to convince both of them that you did not take the book. If you convince them you did not take the book, you will receive $3.

Instructions for Truthful Condition: Complete the test you were given by Ms. Ober. When she returns, Ms. Ober will look in the desk for a book, which will be missing. She will ask if anyone had come in and then accuse you of taking it. Neither she nor Dr. Craig will know if you had taken the book or not. You need to convince both of them that you did not take the book. If you convince them you did not take the book, you will receive $3.

Table 2
True Questions –
When were you born?
Where do you go to school?
How old are you?
Relevant Questions –
Did you take the book from the desk?
Did you place the book in your book bag?
Do you have the book with you right now?
Standard Control Questions (Probable Lie) –
Before today, have you done anything illegal, immoral, or broken a rule or regulation?
Before today, have you ever taken something of value that did not belong to you?
Directed Lie Questions –
Have you ever made even one mistake?
Have you ever told a lie?

Results
Two dependent variables were used to test our first hypothesis. The Truster rating of participant reliability was examined using a chi-square test (Table 3). This test was based on the number of truthful and deceptive participants the Truster had rated as either Reliable or Unreliable. The test was not significant, χ2(1, N=15)=1.607, p>.05. Overall, the Truster assessment of reliability correctly identified 33% of the participants, with the same level of accuracy for both deceptive and truthful participants.

An additional measure of deception was calculated in which a participant was deemed deceptive if more than one relevant question was identified as deceptive; otherwise the participant was deemed truthful (Table 4). A chi-square analysis indicated that this new measure did no better than chance at identifying whether a participant was truthful, χ2(1, N=15)=.185, p>.05. Overall, the assessment of guilt based on being deceptive on more than one relevant question yielded an accuracy rating of 56%. More truthful participants were correctly identified (67%) than deceptive participants (44%).

To further examine the ability of the CVSA to detect deception, a t-test was conducted on the number of relevant questions identified as deceptive by the test (Table 4). There was no significant difference (t(15)=-.119, p>.05) between the two validity conditions in the number of relevant questions identified as deceptive, with a mean of M=1.167 (SD=1.169) for truthful participants and M=1.33 (SD=1.80) for deceptive participants.

To test our second hypothesis, a repeated measures ANOVA was run with validity as the between-subjects variable and the importance ratings for the two question types (Relevant and Control-Directed Lie) as the repeated measure. A significant difference was found between the Relevant and Control/Directed Lie questions, with the Relevant questions being rated as more important than the Control/Directed Lie questions, F(1,15)=14.304, MSE=.312, p<.05. The mean importance rating for the Relevant questions was higher (M=3.67, SD=.43) than for the Control-Directed Lie questions (M=2.50, SD=.75). The interaction between validity and importance of question type was not significant (F(1,15)=3.10, p=.10). However, it was close and was in the predicted direction, with deceptive subjects rating the relevant questions as more important than the Directed Lie/Control questions, whereas their truthful counterparts appeared equally concerned about both question types.
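
The two chi-square statistics reported above can be checked directly from the cell counts in Tables 3 and 4. The following is a minimal sketch rather than the analysis actually run for the study: it assumes a Pearson chi-square without continuity correction, treats a "reliable" rating as a truthful call and an "unreliable" rating as a deceptive call, and uses function and variable names that are purely illustrative.

```python
import math

def pearson_chi2_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts.

    Assumption: rows are the actual condition, columns are the test's call.
    Returns the statistic and its upper-tail p-value for df = 1.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For df = 1, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

def accuracy(table):
    """Pooled proportion of correct calls (the diagonal of the table)."""
    (a, b), (c, d) = table
    return (a + d) / (a + b + c + d)

def per_condition_rates(table):
    """Proportion correct among truthful and among deceptive participants."""
    (a, b), (c, d) = table
    return a / (a + b), d / (c + d)

# Rows: truthful, deceptive. Columns: called truthful, called deceptive.
table3 = [[2, 4], [6, 3]]  # Truster reliability rating (Table 3)
table4 = [[4, 2], [5, 4]]  # More than one relevant question flagged (Table 4)

for name, table in (("Table 3", table3), ("Table 4", table4)):
    chi2, p = pearson_chi2_2x2(table)
    truthful_rate, deceptive_rate = per_condition_rates(table)
    print(f"{name}: chi2 = {chi2:.3f}, p = {p:.3f}, "
          f"accuracy = {accuracy(table):.1%}, "
          f"truthful correct = {truthful_rate:.1%}, "
          f"deceptive correct = {deceptive_rate:.1%}")
```

Under these assumptions the sketch reproduces the reported values of 1.607 and .185, both with p above .05. For Table 4 the pooled proportion correct is 8 of 15 (53%), as printed under the table; the 56% quoted in the text appears to correspond to the unweighted mean of the two per-condition rates (66.7% and 44.4%).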
We did not assess the questions remembered because of a ceiling effect encountered during the data analysis, likely due to insufficient time between the CVSA test and question recall. Participants remembered a total of 98% of the questions.

Discussion
The present study found that the CVSA system did not differ significantly from chance in the detection of deception among guilty or innocent participants. Whether based on the program’s final determination of reliability or on evaluations of deception on individual relevant questions, the CVSA did not do better than would be expected by chance. In fact, when the Truster reliability rating was the sole determinant of truth, the accuracy level was 33%. These results call into question the validity of this version of the CVSA as a tool for determining the truthfulness of a suspect.

These results are consistent with prior research on the validity of the VSA, such as Kubis’s (1973) study and the studies reviewed by Krapohl (2003), which concluded that the “validity of the analysis for practical lie detection is questionable” and that accuracy was not significantly greater than chance. They are also consistent with the position of the American Polygraph Association and the American Association of Police Polygraphists, who have found “no credible evidence to validate voice analysis as an effective instrument for determining deception”.

With regard to the second hypothesis, we did not find a significant interaction between validity condition and importance ratings of the question types. However, the differences were approaching significance and were in the predicted direction. Deceptive participants rated the relevant questions as being more important than the Control-Directed Lie questions, with less difference in ascribed importance for truthful participants. Had there been a larger participant sample, the interaction between the importance ratings of the question types and deceptive condition might have been identified.

One of the primary limitations of this study is the small sample size. Due to the amount of time required, roughly 1 hour per participant, and the cost, only 15 participants were used. A follow-up study should be done with a much larger sample size. Although the study was small, the results are applicable to the field of law enforcement in that they add to the number of studies showing the failure of the VSA to be any better than chance at detecting who is being deceptive and who is not. The 800+ law enforcement agencies that are currently utilizing voice stress analysis software should perhaps rethink their confidence in its accuracy before applying its results to their investigations and interrogations.

Several questions stemmed from the completion of this study. We used a standard polygraph protocol of relevant and control questions, a process not advocated by CVSA. In fact, there is no systematic questioning protocol advocated in the CVSA research to this point. Thus, the types of questioning used should be manipulated, with a scenario description used in place of the polygraph-based Control Question Test, to determine whether there are any differences based on test protocols. There is a chance that with a scenario description a greater cognitive load will exist, leading to a better chance that the VSA might pick up on a greater difference in modulations in the voice. Another aspect that should be investigated is the validity of the polygraph and the VSA used together; it should be determined whether either test increases in validity when used with the other, perhaps yielding stronger results than when used alone.

Table 3
                Subject was       Subject was
                found reliable    found unreliable
Truthful        2                 4
Deceptive       6                 3
33% accuracy

Table 4
                        Decision
                        Truthful      Deceptive
Validity  Truthful      4 (66.7%)     2 (33.3%)
          Deceptive     5 (55.6%)     4 (44.4%)
53% accuracy

Figure 1
Mean importance rating (vertical axis, 0.0 to 5.0) for Control questions and Relevant questions, by condition (Truthful, Deceptive).

Special thanks to: Nilanjana Sarker and Myron Nowell for their assistance with the completion of this project.

Presented at the 31st Annual Western Pennsylvania Undergraduate Psychology Conference – Mercyhurst College, Erie PA – April 2003.
References Available Upon Request
The author can be contacted at: firstname.lastname@example.org