Qualitative researchers produce and use a large number of different kinds of records and
documents. These include textual records like reports or minutes, transcripts of unstructured
interviews, evidence transcripts, historical or literary documents, personnel records, field notes,
observation records, newspaper clippings and abstracts. Many researchers also collect non-
textual records like musical scores, photographs, drawings, tape recordings, films, maps and
plans. Consequently, researchers and the developers of CAQDAS have found ways of dealing
with them. In recent years this has meant transcribing textual documents into electronic form
using a word processor and digitizing other records such as images, video and sound recordings.
Digitized recordings are growing in importance in qualitative analysis, especially with the growth
of higher capacity storage media. However, for now textual documents remain the most
important form of data used by qualitative analysts. Even before the advent of CAQDAS,
qualitative researchers transcribed their interview recordings, observations and field notes. This
was more common with interviews, because transcription created an easily accessible copy. The
use of CAQDAS has made an electronic, word-processed copy more attractive than ever.
Most programs have the ability to show the full text associated with the results of searches,
coding and other forms of analysis, provided the text is available at the start in electronic form.
It is not necessary to transcribe all or even any of the information you have collected in your
project in order to analyse it.
The software covered in this book, NVivo, can be used quite productively without a word-
processed copy of the interviews, texts or observations you have collected or recorded. You can
type notes or summaries – or even a full transcription – directly into NVivo's own rich text
editor. For some researchers this is preferable to using a word processor, since, as we shall see,
you can code and annotate as you type. Alternatively you can use a proxy document to represent
the tape recording and code and annotate the proxy. In fact some researchers advocate coding
directly from a tape recording. That way you are more likely to focus on the bigger picture and
not get bogged down in the details of what people have said. This may be possible for some
types of analysis, but for others, such as discourse and conversation analysis, a detailed transcript
is essential.
Nevertheless, no matter what the analytic approach, there are good reasons for transcribing. Most
qualitative researchers find it important to produce a typescript copy of their data for two
reasons. It forces you to read carefully what is recorded on tape or in your notes and it provides
you with an easily readable transcript that can be copied as many times as necessary. Having a
transcript also makes it easier to work in a team, where tasks have to be shared and there has to
be good agreement about the interpretation of the data. A typescript means everyone can read the
texts and everyone can have a copy. These advantages are summarized in Table 1.
Table 1 Advantages of transcripts
A corrective to the limitations of intuition and recollection
Enables repeated and detailed examination of the events of the interaction
Extends the range and precision of the observations that can be made
Permits other researchers to have direct access to the data about which claims are being made
Makes analysis potentially subject to detailed public scrutiny
Helps minimize the influence of personal preconception or analytical bias
Data can be re-used in other investigations and re-examined in the context of new findings.
(Adapted from Heritage 1984: 238)
In NVivo there is an important further advantage of transcribing. If you code a transcript, the
program can retrieve all the coded passages about a particular topic. You can then read the
contents of these passages together, review the range of data coded there and code it to finer
categories. This process of retrieving from documents all the actual text that relates to the same
idea and then reviewing it in order to refine the coding, or even to do further coding, is called
'coding on' and is a central and important feature of qualitative data analysis programs like
NVivo.
There are also drawbacks to transcribing. Not least is the time it takes (or the cost) to do it.
Estimates vary from author to author and depend on what level of detail you transcribe and how
talented the typist is. A common figure is that transcribing takes between two and five times
as long as it takes to collect the data. Most researchers who are competent typists and transcribe
their own interviews find it takes about 4 hours of transcribing time for each hour of interview.
This means work can pile up, especially for lone researchers doing their own transcription. Many
PhD students using qualitative methods have experienced the anxiety brought on in the later
stages of their fieldwork by the growing “pile” of tapes waiting to be transcribed. The only real
advice here, albeit hard to follow, is this: if you can't pay someone to do it for you, keep
transcribing “little and often”.
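To make the workload arithmetic concrete, here is a minimal sketch that simply multiplies hours of recording by the transcription ratios quoted above (two to five hours per hour of data, about four for a competent typist). The project of 20 one-hour interviews, the two-hours-a-day pace and the function names `transcription_hours` and `days_at_pace` are hypothetical, chosen only for illustration; this is a back-of-the-envelope estimate, not a method from the text.

```python
import math


def transcription_hours(recorded_hours: float, ratio: float) -> float:
    """Estimated typing time: hours of recording multiplied by the hours
    of transcription needed per hour of recording (the 'ratio')."""
    if recorded_hours < 0 or ratio <= 0:
        raise ValueError("recorded_hours must be >= 0 and ratio must be > 0")
    return recorded_hours * ratio


def days_at_pace(total_hours: float, hours_per_day: float) -> int:
    """Number of days needed if you transcribe a fixed number of hours a day
    ('little and often'); a partly used final day counts as a whole day."""
    if hours_per_day <= 0:
        raise ValueError("hours_per_day must be > 0")
    return math.ceil(total_hours / hours_per_day)


# Hypothetical project: 20 interviews of one hour each.
recorded = 20 * 1.0

low = transcription_hours(recorded, 2)      # fast end of the 2-5x range
typical = transcription_hours(recorded, 4)  # about 4 hours per interview hour
high = transcription_hours(recorded, 5)     # slow end of the 2-5x range

print(f"Estimate: {low:.0f}-{high:.0f} hours, typically about {typical:.0f} hours")
print(f"At 2 hours a day: about {days_at_pace(typical, 2)} days of transcribing")
```

Even this modest hypothetical project implies weeks of steady typing, which is why a backlog of untranscribed tapes builds up so quickly if transcription is left until fieldwork ends.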
Who should do the transcription?
The choice of who should do the transcription usually comes down to either you, the researcher,
or someone else who is paid to do it. Despite the nature of the activity, which can be tedious,
especially if you are not a good touch typist, there are advantages to doing your own
transcription. If your data are field notes, almost certainly you are the only person able to
interpret them, so there is no choice. But if you have taped conversations or interviews, it may
still help to transcribe them yourself. It gives you a chance to start the data analysis. Careful
listening to tapes or reading of your field notes along with reading and checking of the transcripts
produced means that you become very familiar with their content. Inevitably you start to
generate new ideas about the data. In practice, though, researchers usually do their own
transcription because they have no choice: they have no funds to employ an audio typist, or the
content of the text is such that no-one else can do it. For instance, the interviews may be about a highly
technical subject or, what is often the case with anthropological work, in a language very few
others can understand.
It is not necessary to transcribe all your interviews or field notes. You could, for example,
transcribe only parts. For the rest, you could just type notes and use those for coding, or even code
directly from the tape or your field notes. In some cases you may find that your memory of an
interview or your research diary tells you that at certain points the respondent went off topic and
so these parts can be ignored. Nor is it necessary to transcribe everything you need before
starting analysis. NVivo is very flexible in this respect. You can start setting up nodes in the
program and write memos on them before any transcription at all. Later you can import some
transcribed documents (or even part transcribed documents) and start coding them, then import
further documents (or complete the partial ones) and code them as well.
If you are transcribing tapes yourself, try, if at all possible, to use a proper transcription machine
that plays normal audio-cassettes. Many transcription machines are designed for the mini-tapes
used in dictation machines, but qualitative researchers usually use normal audio-cassette recorders
for taping interviews, so you will need a machine that can play these tapes. Transcription
machines have two facilities that make them superior to simply using an audio-cassette player.
First, they have a foot control that allows you to pause the tape without using your hands. This is
very useful if you are a good typist, and especially a touch typist. Second, when play is restarted
after a pause, the tape rewinds a little, so play starts slightly before the place where you paused.
Typically the length of rewind can be adjusted to match your speed and accuracy of typing, and
how difficult it is to make out what is on the tape. You could use an ordinary audio-cassette
player, but you will find yourself constantly frustrated by having to rewind the tape a little each
time you stop.
Employing someone else to do the transcription, if you can afford it, is a good option, but only if
the tapes are easily understandable or the notes and documents are easy to read. It is best if the
typist you are employing knows something about the subject matter and the context of the
interviews. However, for general subject matter a good audio typist will be fine. The audio typist
may have his or her own transcription machine or you may have to lend yours. Either way, having a
machine matters when the typist is paid by the hour, as anything you can do to make the typing
easier will reduce your costs. Alternatively, you could negotiate with your typist to be paid by
results: work out a reasonable price per hour of tape and apply this to all the work. No matter who you
use, you will still need to check through the document produced against the recording or original
text to eliminate mistakes. However, this is not all lost time as, again, reading the transcript (and
listening to the tape) will be an opportunity to begin your analysis.
Don't forget that the typist will be listening to or reading all your data. As Gregory, Russell and
Phillips (1997) remind us, transcribers are 'vulnerable' persons. If the content of your data is
emotionally loaded and sensitive, you might want to include your transcribers in the scope of
your ethical considerations and offer them some debriefing and support.
OCR and speech recognition software
In recent years two new technologies have become available that can help the transcription
process. If you have some typed or printed documents that you need to get an electronic copy of,
then optical character recognition (OCR) software used with a scanner will help. Provided the
original paper copy is of good quality and uses standard fonts, like Courier for typescript, the
software will work well in producing word-processing files from the paper copies. Some of the
most common packages are OmniPage from Caere and TextBridge Pro 98 from ScanSoft Inc, and
there are both PC and Macintosh versions.
A more recent technology that is just becoming usable by qualitative researchers is speech
recognition software. This software can take speech spoken into a special, high quality
microphone and convert it into a word processed file. With early versions of the software you
had to speak with a mid-Atlantic English accent and say - each - word - separately, with a pause
between each one. Natural speech has very few gaps between words, and recent versions of the
software can recognize such continuous speaking. The new software can also cope with other
versions of English, such as UK English, S.E. Asian and Indian English, as well as a number of
other languages. However, all of them still need to be trained to recognize the speech of one
particular user and need very good quality sound. For these reasons they cannot be used directly
with tape recordings of interviews. However, what some enterprising researchers have done is to
set up a tape player with a pair of headphones with which they can listen to the recording of an
interview. Then as the tape plays they dictate what they hear into their version of the speech
recognition software. This is a little awkward to begin with, but the knack is quickly acquired.
The quality of recognition is not as good as with OCR software, but it is generally good enough
for a first draft transcription that can then be checked properly against the tape. Leading
packages include Dragon Systems' NaturallySpeaking and IBM's ViaVoice. Speech recognition is
a computationally intensive task and all these programs need fairly powerful computers, so check
the system requirements before you buy.
Transcription, especially of interviews, is a change of medium and that introduces some issues of
accuracy. Kvale (1988: 97) warns us to “beware of transcripts”. When moving from the spoken
context of an interview to the typed transcript there are, he suggests, dangers of superficial
coding and decontextualization, of missing what came before and after the respondent's account,
and of missing what the larger conversation was about. As we shall see later, this change of medium is
associated with certain kinds of errors that researchers must watch out for.
No matter how the transcription is produced, whether by OCR, speech recognition or a human
typist, it will need checking against the original recording. Errors arise for a variety of reasons.
First there are simple typing errors, misspellings and so on. Most of these can be picked up using
the spelling checker built into most word processors. Other, often more significant, errors arise
because the transcriber has misheard what was said on the tape. Sometimes this is because the
recording is 'noisy' and it is hard to make out what is said. For instance, the recording may have
been made in a noisy place or may have picked up the sound of the recorder mechanism. In
face-to-face speech humans are very good at filtering out such noises, but a recording does not do
this for us, so we have more difficulty hearing speech over the background. But even where the
sound is good there are many cases where the transcriber has heard one thing whereas the
respondent said something else. Hearing exactly what is said involves understanding and
interpretation. Sometimes the right sound is heard but the interpretation is wrong, as in the
common linguistics example of 'ice cream' and 'I scream', which sound the same. More often than
not, though, it is in the process of interpretation that something different is heard from what was
actually said.
Various things can be done to minimize these errors. It helps to have as good a quality sound as
possible. Recording quality is improved significantly by using a good microphone, such as a
battery-powered lapel microphone. A good quality recorder, such as a mini-disk or high quality
audio-cassette recorder, will help, as will a good transcribing machine with good headphones.
However, some transcription machines can be less sensitive to low voices than normal
audio-cassette players, so despite the advantages of transcription machines outlined above, you
might find it is easier to make out what is on a tape using a good hi-fi cassette deck. But no matter how good the
sound, there is always going to be a need for interpretation and understanding of what is heard.
The best way to reduce errors here is to make sure that the transcriber understands the context
and subject matter he or she is transcribing and is used to the accent, cadence and rhythm of the
speakers. This is one of the biggest advantages of doing your own transcription. You will know
the context of the interview and, we hope, be familiar with the subject matter. Table 2 lists some
examples of the errors of interpretation found by a Canadian researcher using audio typists to
transcribe interviews on trade union activities.
If you are concerned that the transcription may be inaccurate, you could try taking it back to the
respondents to check it with them. Of course you can't expect respondents to remember, word
for word, what they said, but they should be able to pick up any nonsensical interpretations – the
kinds of things they couldn't possibly have said. However, sometimes respondents will disagree
with the transcript, even though it is clear from the recording what they said. What do you do
then? There are two options. You can treat the respondent's statements as new data and try to
find out why the interviewee may have changed her or his opinion. They could be embarrassed
over what was said now that it is frozen on tape, there may have been intervening events that
have altered the situation, they may have had a genuine change in opinion, or they may feel
pressure from peers or authority figures to change their opinions. You could treat the transition in
the opinion as interesting data in itself. The second option applies when the interviewee wants the
previous statement removed and not used. This is the interviewee's right, especially if you have
used a fully informed consent form mentioning the right to withdraw, and you have little option
but to respect it. You could try to convince the interviewee that the change constitutes valid data
in itself, and so treat it as in the first option. But if you are unsuccessful, then you should respect
the wishes of the interviewee and throw away the data.
Table 2 Transcription errors
Transcriber's typed phrase | What interviewee actually said
layer market | labour market
reflective bargaining | collective bargaining
self-support | soft support
the various | those areas
certain kinds of ways of understanding | surface kinds of ways of understanding
you know | the most
general contact | general context
and our | and/or
delegates to hire bodies | delegates to higher bodies
generally | gender lines
new committees | union committees
mixed service | lip service
it runs again | it runs the gamut
as a hole | as a whole
accepted committee | executive committee
denying neglect | benign neglect
was a committee member | Women's committee member
ever meant to | never meant to
inversions to class analysis | conversions to class analysis
it just makes sense | it doesn't make sense
there isn't a provision for day care | there is a provision for day care
the union can take concerted action | the union didn't take a concerted action
there's one thing I can add | there's nothing I can add
there's more discernible actions | there aren't discernible factions
it wasn't like I had to take on new things [domestic chores] | it wasn't like he [spouse] had to take on new things [domestic chores]
it was union activities [that broke up my | it wasn't union activities [that broke up my
From an e-mail from Carl Cuneo, Thu, 16 Jun 1994.
Level of transcription
As noted above, the act of transcription is a change of medium and therefore necessarily involves
some kind of transformation of the data. There are varying degrees to which you can capture
what is in the sound recording and you need to decide what is appropriate for the purposes of
your study. Sometimes just a draft version of what is said is sufficient. This is often the case in
policy and evaluation research, where the salient factual content of what people have said is good
enough for analysis. However, most researchers who are interested at least in respondents'
interpretations of their world need more detail than this. They aim at a transcribed text that looks
like normal text and is a good copy of the words that were used by the respondent. This may
seem unambiguous, but even here there are decisions to be made. Continuous speech is very
rarely in well-constructed sentences of the kind found in written language. Speakers abandon one
line of thought mid-sentence and often take up an earlier one again without following the
grammatical rules used in writing. You may therefore be tempted to 'tidy up' their speech.
Whether you should do this depends on the purpose of your study and whether you intend to
quote passages in your publication or report. Tidy, grammatical transcriptions are easier to read
and hence analyse. If your study is not much concerned with the details of expression and
language use and is more interested in the factual content of what is said, then such tidying up is
acceptable. On the other hand, tidying up clearly loses the feel for how respondents were
expressing themselves, and if that is significant in your study you will need to try to capture it in
the transcription. The downside is that this makes the actual typing more difficult to do. A similar
dilemma arises when respondents speak with a strong accent or use dialect. The most common
practice here is to preserve all the dialect words and regional terms and grammatical expressions,
but not to try to capture the actual sound of the accent by changing the spelling of the words.
Table 3 Examples of different levels of transcription
Just the gist
“90% of my communication is with … the Sales Director. 1% of his communication is with me.
I try to be one step ahead, I get things ready, … because he jumps from one … project to
another. …This morning we did Essex, this afternoon we did BT, and we haven't even finished
(… indicates omitted speech)
Verbatim
“I don't really know. I've a feeling that they're allowed to let their emotions show better. I think
bereavement is part of their religion and culture. They tend to be more religious anyway. I'm not
from a religious family, so I don't know that side of it.”
Verbatim with dialect
“'s just that – one o' staff – they wind everybody up, I mean, – cos I asked for some money –
out o' the safe, cos they only keep money in the safe – 's our money – so I asked for some
money and they wouldn't give it me – an' I snatched this tenner what was mine.”
Discourse
Bashir: Did you ever (.) personally assist him with the writing of his book. (0.8)
Princess: A lot of people.hhh ((clears throat)) saw the distress that my life was in. (.) And they
felt it was a supportive thing to help (0.2) in the way that they did.
(Discourse example from Silverman 1997: 151)
In some cases an even more detailed transcription is necessary. Not only is natural speech often
non-grammatical (at least by written conventions) but it is also full of other phenomena. People
hesitate, they stress words and syllables, they overlap their speech with others and they raise and
lower both volume and pitch in order to add meaning to what they are saying. If your interest is
in the detailed examination of language use, for example if you are doing conversation or
discourse analysis, then you will probably need an even more detailed transcription. This can be
done by adding in special symbols for stress, overlap, pauses etc. Table 3 gives some examples
of different transcription styles and Table 4 shows some of the common transcription conventions.
Table 4 Transcription conventions
Try to have the spelling of words roughly indicate how the words were produced. Often this involves a departure from standard orthography.
→  Arrows in the margin point to the lines of transcript relevant to the point being made in the text.
( )  Empty parentheses indicate talk too obscure to transcribe. Words or letters inside such parentheses indicate the transcriber's best estimate of what is being said.
hhh  The letter 'h' is used to indicate hearable aspiration, its length roughly proportional to the number of 'h's. If preceded by a dot, the aspiration is an in-breath. Aspiration internal to a word is enclosed in parentheses. Otherwise 'h's may indicate anything from ordinary breathing to sighing to laughing, etc.
[  Left-side brackets indicate where overlapping talk begins.
]  Right-side brackets indicate where overlapping talk ends, or mark alignments within a continuing stream of overlapping talk.
° °  Talk appearing within degree signs is lower in volume relative to surrounding talk.
> <  'Greater than' and 'less than' symbols enclose talk that is noticeably faster than the surrounding talk.
(( ))  Words in double parentheses indicate transcriber's comments, not transcriptions.
(0.8)  Numbers in parentheses indicate periods of silence, in tenths of a second; a dot inside parentheses, (.), indicates a pause of less than 0.2 seconds.
:::  Colons indicate the lengthening of the sound just preceding them, proportional to the number of colons.
becau-  A hyphen indicates an abrupt cut-off or self-interruption of the sound in progress indicated by the preceding letter(s) (the example here represents a self-interrupted "because").
He says  Underlining indicates stress or emphasis.
dr^ink  A 'hat' or circumflex accent symbol indicates a marked pitch rise.
=  Equal signs (ordinarily at the end of one line and the start of an ensuing one) indicate a 'latched' relationship, with no silence at all between them.
(From Silverman 1997: 154)
Gregory, D., Russell, C.K. and Phillips, L.R. (1997) Beyond textual perfection: transcribers as
vulnerable persons, Qualitative Health Research, 7: 294-300.
Heritage, J.C. (1984) Garfinkel and ethnomethodology. Cambridge: Polity Press.
Kvale, S. (1988) The 1000-page question, Phenomenology and Pedagogy, 6: 90-106.
Silverman, D. (Ed.) (1997) Qualitative research: theory, method and practice. London: Sage.