WHICH VERSUS THAT
When to use each in subordinate clauses.
To judge from correspondence, and from comments in alt.usage.english from time to time, there
is confusion over which of these words to use when introducing clauses that modify nouns. This is
not surprising, as there has been a shift in usage this century and older style books give different
advice from newer ones. The usage is intimately linked with the distinction which grammarians
make between restrictive and non-restrictive clauses.
A restrictive clause is one which limits, or restricts, the scope of the noun it is referring to. Take
these examples:
The house that is painted pink has just been sold.
The house, which is painted pink, has just been sold.
In the first of these, the clause “that is painted pink” is a restrictive clause, because it limits the
scope of the word “house”, indicating that the writer doesn’t mean all houses, only the one that
has been painted in that particular colour; if you take that clause out, you are left with The house
has just been sold: we no longer know which house is being referred to and the sentence loses
some crucial information. The second example is non-restrictive: the writer is giving additional
information about a house he is describing; the clause “which is painted pink” is here
parenthetical—the writer is saying “by the way, the house is painted pink” as an additional bit of
information which is not essential to the meaning and could be taken out.
Here’s another example:
Another cause of stress is a traumatic event that is out of the ordinary and has a major impact on
the person’s life.
The argument here is that the clause “that is out of the ordinary and has a major impact on the
person’s life” modifies and constrains “event”. It’s not just any event but one specific type of event,
to the extent that the whole block from “event” onwards forms one idea. That makes the clause
restrictive.
Older style guides make two firm points about the difference between the two types of clause:
Restrictive clauses are introduced by that and are not separated from the rest of the
sentence by commas.
Non-restrictive clauses are introduced by which and must be separated by commas from
the rest of the sentence to indicate parenthesis.
The problem is that few people have followed these rules systematically, and you can find lots of
examples where the relative pronoun which is used to start a restrictive clause. The 1965 edition
of Fowler’s Modern English Usage comments:
If writers would agree to regard that as the defining relative pronoun, and which as the non-
defining, there would be much gain both in lucidity and in ease. Some there are who follow this
principle now; but it would be idle to pretend that it is the practice either of most or of the best
writers.
This is even more true today than when he wrote it, and most modern style guides say that either
relative pronoun can be used with restrictive clauses. For example, I found this sentence quoted
approvingly as an example under the equivalent section in “Oxford English”:
A suitcase which has lost its handle is useless.
The clause “which has lost its handle” is certainly restrictive. If you take it out, you are left with “A
suitcase is useless”, obviously a different meaning to that intended. So, according to Fowler’s rule,
the which ought to be that.
Despite the shift in style, there remain some situations in which that is still regarded as preferable
to which, though they’re difficult to tie down. Here are some instances, but don’t take them as a
full list of cases, and they are tendencies, not full-blown rules:
In clauses that follow impersonal constructions, such as it is, that is preferred: “It was the
dog that died”.
Clauses which refer back to the words anything, nothing, something, or everything have a
slight preference for that over which: “Can you think of anything that still has to be done?”
Clauses which follow a superlative also tend to prefer that: “Thank you for the most superb
dinner that I’ve ever eaten”.
In part, it seems probable that this preference is derived from stress and rhythm. The word that
contains “soft” sounds and is usually unstressed, whilst which contains a “harder” initial sound
and is easier to stress. Several writers note that that tends to be preferred in speech, which may be
due to the comparative ease with which that is and similar phrases can be contracted, for example
to that’s, compared with the equivalent expressions using which.
Though you can use which instead of that in restrictive clauses, you can’t do so the other way
round: non-restrictive clauses ought always to start with which. Also, you can’t change the
punctuation rules; it is particularly important to watch this point if you decide to use which in a
restrictive clause, as otherwise your poor reader has no clue at all how you intend the sentence to
be read. Here is a rather artificial example to make the point:
The cup which he stepped on is in the bin.
The cup, which he stepped on, is in the bin.
In the first, you are being told about a specific cup with the special property that it is the one he
stepped on; in the second, the fact that he stepped on it is an ancillary bit of information. My view
is that punctuation is more important than choice of pronoun in such situations. You won’t be
thought wrong if you use that in the first case (and will avoid the thunder of pedants’
condemnation) but you will be justly criticised if you leave out the commas in the second.
A further point worth noting is that the opening pronoun in restrictive clauses is frequently left
out, so that you can say “The cup he stepped on is in the bin”. Again, you can’t do this with non-
restrictive clauses.
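For readers who find a rule easier to follow when it is spelled out mechanically, the conventions described above can be put as a small Python sketch. The function name relative_clause and its parameters are invented purely for illustration, and the sketch handles nothing beyond simple sentences of the cup kind: it is a toy, not a grammar checker.

```python
def relative_clause(head, clause, rest, restrictive, pronoun=None):
    """Join a noun phrase, a relative clause and the rest of a sentence.

    Hypothetical helper: restrictive clauses take "that" by default
    (or "which", or no pronoun at all) and no commas; non-restrictive
    clauses always take "which" and are set off by commas.
    """
    if restrictive:
        if pronoun is None:
            pronoun = "that"
        if pronoun not in ("that", "which", ""):
            raise ValueError("restrictive clauses take 'that', 'which' or no pronoun")
        # An empty pronoun gives the form with the pronoun left out.
        middle = f"{pronoun} {clause}".strip()
        return f"{head} {middle} {rest}."
    if pronoun not in (None, "which"):
        raise ValueError("non-restrictive clauses must start with 'which'")
    return f"{head}, which {clause}, {rest}."


print(relative_clause("The cup", "he stepped on", "is in the bin", restrictive=True))
print(relative_clause("The cup", "he stepped on", "is in the bin", restrictive=True, pronoun=""))
print(relative_clause("The cup", "he stepped on", "is in the bin", restrictive=False))
```

Asking for a non-restrictive clause with that, or with no pronoun, raises an error, which mirrors the one-way nature of the substitution.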
If you wish to write naturally, don’t fuss too much about the usage of that versus
which. Obsessive correction (what has sarcastically been called a “which hunt”) is
best avoided. If your sense of the language is not strong enough to be sure of the
right pronoun, use that for the restrictive cases and which for the others and you
won’t go wrong. (bold by Carlo)
HOW MANY WORDS?
How many in the language and how many does any one person know?
One of the more common questions that arrive for the Q&A section asks how many words there
are in the English language. Almost as common are requests for the average size of a person’s
vocabulary. These sound like easy questions; I have to tell you that they’re indeed easy to ask. But
they’re almost impossible to answer satisfactorily, because it all depends what you mean by word
and by vocabulary (or even English).
What we mean by word sounds obvious, but it’s not. Take a verb like climb. The rules of English
allow you to generate the forms climbs, climbed, climbable, and climbing, the nouns climb and
climber (and their plurals climbs and climbers), compounds such as climb-down and climbing
frame, and phrasal verbs like climb on, climb over, and climb down. Now, here’s the question
you’ve got to answer: are all these distinct words, or do you lump them all together under climb?
That this is not a trivial question can be proved by looking at half a dozen current dictionaries.
You won’t find two that agree on what to list. Almost every word in the language has this fuzzy
penumbra of inflected forms, separate senses and compounds, some to a much greater extent
than climb. To take a famous case, the entry for set in the Oxford English Dictionary runs to
60,000 words. The noun alone has 47 separate senses listed. Are all these distinct words?
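How much the answer depends on the counting rule can be shown with a short Python sketch. The table FORM_TO_HEADWORD and the function count_words are hypothetical, made up for this illustration, and the decision to file every form under climb is just one arbitrary choice among the many a dictionary editor might make.

```python
# Hypothetical mapping from each form mentioned in the text to a headword.
# Lumping everything under "climb" is one arbitrary choice; another
# dictionary might well list "climber" or "climb-down" separately.
FORM_TO_HEADWORD = {
    "climb": "climb",
    "climbs": "climb",
    "climbed": "climb",
    "climbing": "climb",
    "climbable": "climb",
    "climber": "climb",
    "climbers": "climb",
    "climb-down": "climb",
    "climbing frame": "climb",
    "climb on": "climb",
    "climb over": "climb",
    "climb down": "climb",
}


def count_words(forms, lump_together):
    """Count distinct words, form by form or lumped under headwords."""
    if lump_together:
        return len({FORM_TO_HEADWORD.get(form, form) for form in forms})
    return len(set(forms))


forms = list(FORM_TO_HEADWORD)
print(count_words(forms, lump_together=False))  # every form counted separately
print(count_words(forms, lump_together=True))   # all lumped under one headword
```

The same dozen forms count as twelve words under one rule and as a single word under the other, and neither answer is obviously the wrong one.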
And in a wider sense, what do you include in your list of words? Do you count all the regional
variations of English? Or slang? Dialect? Family or private language? Proper names and the
names of places? And what about abbreviations? The biggest dictionary of them has more than
400,000 entries—do you count them all as words? And what about informal and formal names for
living things? The wood louse is known in Britain by many local names—tiggy-hog, cheeselog, pill
bug, chiggy pig, and rolypoly among others. Are these all to be counted as separate words? And,
to take a more specialist example, is Saccharomyces cerevisiae, the formal name for bread yeast,
to be counted as a word (or perhaps two)? If you say yes, you’ve got to add another couple of
million such names to the English-language word count. And what about medical terms, such as
syncytiotrophoblastic or holoprosencephaly, that few of us ever encounter?
The other difficult term is vocabulary. What counts as a word that somebody knows? Is it one
that a person uses regularly and accurately? Or perhaps one that will be correctly recognised—say
in written text—but not used? Or perhaps one that will be understood in context but which the
person may not easily be able to define? This distinction between what linguists call active and
passive vocabularies is hard to measure, and it skews estimates.
The problem doesn’t stop there. English speakers not only know words, they know word-forming
elements, such as the ending -phobia for some irrational fear. A journalist rushing to meet a
deadline might take a word he knows, like Serb, and tack on the ending to make Serbophobia.
He’s just added a word to the language (probably only temporarily), but can he really be said to
have that word in his vocabulary? If nobody ever uses it again, can we legitimately count it? By
reversing the coining process, a reader of the newspaper can easily work out the word’s origin and
meaning. Has the reader also added a word to his vocabulary?
Can you now see why estimates of the total number of words in the English language and in a
person’s vocabulary are so difficult to make, and why they vary so much one from another? David
Crystal, in the Cambridge Encyclopedia of the English Language, suggests that there must be at
least a million words in the language. Tom McArthur, in the Oxford Companion to the English
Language, comes up with a similar figure. David Crystal further says that if you allow all scientific
terms the total could easily reach two million (this doesn’t count the formal names for organisms I
spoke about earlier, just technical vocabulary).
Assessing the size of the vocabulary of an individual is at least as problematical. Take
Shakespeare: you’d think it would be easy to assess his vocabulary. We have the plays and sonnets
and we just have to count the words in them (according to the American Heritage Dictionary,
there are 884,647 of them, made up of 29,066 distinct forms, including proper names). But
estimates of Shakespeare’s vocabulary vary from about 18,000 to 25,000 in various books,
because writers have different views about what constitutes a distinct word.
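The difference between the total number of words and the number of distinct forms is easy to demonstrate on a small scale. The Python sketch below is an illustration only: the function count_forms is invented, and its working definition of a word (a run of letters or apostrophes, ignoring capitals) is an assumption, which is exactly the sort of choice on which the published figures disagree.

```python
import re


def count_forms(text):
    """Return (running words, distinct forms) for a piece of text.

    Assumption: a word is a run of letters or straight apostrophes,
    compared without regard to case. Other definitions give other counts.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return len(words), len(set(words))


line = "To be, or not to be, that is the question"
total, distinct = count_forms(line)
print(total, distinct)
```

Even in one line the two figures part company, because to and be each appear twice.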
It’s common to see figures for vocabulary quoted such as 10,000-12,000 words for a 16-year-old,
and 20,000-25,000 for a college graduate. These seem not to have much research to back them
up. Usually they don’t make clear whether active or passive vocabulary is being quoted, and they
don’t account for differences in lifestyle, profession and hobby interests between individuals.
David Crystal described a simple research project—using random pages from a dictionary—that
suggests these figures are severe underestimates. He concludes that a better average for a college
graduate might be 60,000 active words and 75,000 passive ones. But this method of assessing
vocabulary counts dictionary headwords only; it would be possible to multiply it several-fold to
include different senses, inflected forms, and compounds. Another assessment—of a million-word
collection of American texts—identified about 38,000 headwords. Bearing in mind this was all
general writing, this doesn’t sound so different from David Crystal’s estimates for graduate
vocabularies.
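The arithmetic behind a dictionary sample of this kind can be sketched in a few lines of Python. This is an illustration of the general idea, not David Crystal's actual procedure: the function name estimate_vocabulary is invented, and the page counts are made-up numbers chosen only to show the scaling.

```python
def estimate_vocabulary(known_per_page, total_pages):
    """Scale the average number of known headwords per sampled page
    up to the whole dictionary."""
    if not known_per_page:
        raise ValueError("at least one sampled page is needed")
    average_known = sum(known_per_page) / len(known_per_page)
    return round(average_known * total_pages)


# Invented figures: headwords recognised on five randomly chosen pages
# of a hypothetical 1,500-page dictionary.
sample = [30, 42, 25, 38, 35]
print(estimate_vocabulary(sample, total_pages=1500))
```

An estimate made this way counts headwords only, so it inherits every decision the dictionary's editors made about what deserves a separate entry.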
GENDER-NEUTRAL PRONOUNS
Can one avoid sexist writing?
As English has no gender-neutral pronoun in the singular (it can only be used of objects, not of
people), writers are faced with a knotty problem when they want to speak of one person, but either
don’t want to identify that person by sex, or don’t know what it is. This is a matter of increasing
importance as writers and their readers are becoming more sensitive to the sexist implications of
such language. Various solutions are possible:
Use the male pronoun as the gender-neutral pronoun (Your child should always
be comforted when he cries). This is the traditional solution and the one still advocated in many
style books. However, it is increasingly being seen as unacceptable.
Use both pronouns together, such as he or she or he/she (Ask the first shop assistant
you find whether he or she can tell you the price). Though this may be unexceptionable enough
from the point of view of gender, it’s a messy and ungainly solution stylistically, and one to be
avoided.
Use another pronoun instead, in particular they/their (If that spectator keeps waving
their arms about, someone is going to get hurt). Some people dislike seeing this in print, though it
is increasingly common in speech and informal usage and is rapidly becoming a standard. There
are many historical precedents for it (see, for example, Henry Churchyard’s collection of
examples).
Invent a new pronoun. This would be the ideal solution, but pronouns are part of the
deepest core of our vocabulary and it has been a very long time indeed since a new one has come
into the language. However, over the years a large number of such suggestions have been put
forward, though the only ones seen at all frequently are sie and hir. There are great barriers to
using them, especially unfamiliarity and the lack of any consensus about which to use. But if
enough writers turn to them, they could become mainstream terms in short order.
Alternate male and female forms. Avoid this within one text, as it leads only to
confusion on the part of the reader.
Use the female pronoun instead. Writers do use she as a conscious alternative
relatively frequently. However, it is as open to the arguments about inherent sexism as continuing
to use he for the generic form.
Rephrase the sentence to avoid the need for a pronoun. So instead of The
customer went in search of a mechanic to ask him for advice one could say The customer went in
search of a mechanic to ask for advice. This often works, but if you are writing in the active voice,
the changes to the passive for the circumlocutions can be irksome.
Avoid the pronoun by repeating the noun it replaces. This is sometimes
practicable but, after all, the main reason for using pronouns is to avoid such repetition and you are
then presented with a different problem. In moderation, however, and in combination with other
methods, it can help; see the first sentence of this section, for example, where I have used one
person ... that person.
Use the plural. Again, I did this in the first sentence above when speaking about writers.
This avoided having to say The writer is faced with a knotty problem when he/she wants to ... .
When the context permits, this is the simplest way out of the difficulty.
As you can see, there is no perfect solution. The best options seem to be to use the plural
pronouns them and their in casual or informal writing and to rewrite your text to avoid the problem
in more formal writing.
UNPAIRED WORDS
Accentuating the negative
Re-reading P G Wodehouse’s The Code of the Woosters the other day reminded me of the many
words in English which are the negatives of words whose positive forms are now obsolete or rare.
He spoke with a certain what-is-it in his voice, and I could see that, if not actually disgruntled,
he was far from being gruntled. There are many such unpaired negatives. We can say someone is
unkempt, unruly, disconsolate or uncouth, but we can’t normally say that he is kempt,
ruly, consolate or couth unless we are exploiting the unfamiliar word for humorous effect. It is
very often the negated form which has survived: we seem to find negative words more useful and
so more enduring. It’s clearly not the disapproving or derogatory meaning we seek in these
negatives, as we can describe something as ineffable, unscathed, indomitable, innocent or
innocuous but not the inverse.
The word unkempt has a complicated history. Kempt comes from the Old English word kemb,
“comb”. It seems to have gone out of use about 1600 but to have been reintroduced about 1860.
Its usual and literal negative form was unkembed which survived into the middle of the
nineteenth century. The form unkempt began to be used about 1580 to mean “language that was
inelegant or unrefined”. In the eighteenth century it came to mean specifically “uncombed;
dishevelled”, perhaps influenced by the Flemish equivalent ongekempt, and was used alongside
the older form for about a century, only taking on a stronger sense of “neglected; not cared for” in
the middle of the nineteenth century. Incidentally, the root form of kemb seems to come from a
Germanic form which meant “tooth”, so a comb is named for its teeth; the modern form
uncombed appeared about 1560.
Hardly anyone who uses unruly realises that it was originally formed as the opposite of ruly, an
adjective formed about 1400 from rule (as in rule of law), to mean “law-abiding; disciplined;
orderly”. Someone unruly was ungovernable or disorderly; the modern sense is a weakening of
this. Someone ungainly is now “awkward, clumsy, ungraceful”, a sense which developed about
1600; its opposite gainly, never very common, was formed sometime after 1300 from the
adjective gain, meaning “straight; near”. This was used especially in the phrase the gainest way,
meaning the shortest, most direct route, but it quickly took on a figurative sense when applied to
people of “well-disposed; kindly”, and of “useful; convenient” for objects; the root form is also the
source of our words again and against. Unwieldy comes from an Old English verb wield,
derived from the same Indo-European source meaning “to be strong” as the Latin word from
which we get valiant. It variously meant “rule; govern; command; possess” and “to control;
manage; deal with successfully”. Its adjective wieldy was derived from this latter sense and
applied to persons, not things, in the sense “capable of wielding one’s body or weapon; active,
agile, nimble”. So to be unwieldy was to be clumsy or incapable or infirm. Only later was the word
transferred to the thing being manipulated to give us the modern sense. Our word untoward is
formed from an obsolete medieval sense of toward applied to young people: “promising; making
good progress; moving forward (in ability)”; untoward was first applied only to people in the
sense of “stubborn; intractable; disinclined (to work)” and developed its modern meaning of
“unseemly; inconvenient; perverse” from about 1630.
Strictly speaking, inept doesn’t belong in this list, as there was never a word ept of which it was
the negative form. It was actually formed from the Latin ineptus, “unsuited, absurd, foolish”, at
the beginning of the seventeenth century. However, it just scrapes a place because ept was created
from it in modern times (by E B White of the New Yorker in 1938) and it turns up from time to
time in humorous or deliberately incongruous contexts, as do its derived forms eptly and
eptitude. Similarly, dishevelled comes from the Old French deschevelé and was not derived
from a word shevelled. That word was created from it later by losing its first syllable through a
process called aphesis and had the same sense. However, it was never common and has long since
vanished from the lexicon.
There are a number of other words which begin with in or dis and so look like English negative
formations but which came into the language from French or Latin with their negative prefixes
already present; other examples are dismayed (from the Old French verb desmaier) and
disparate (from the Latin disparatus). Others came across at various times already formed into
pairs, such as mantle and dismantle which came from the equivalent French verbs in about
1400 and 1580 respectively. (The early literal sense of dismantle was to remove one’s cloak or
mantle, and hence to undress; it was later applied figuratively to the process of stripping a fortress
of its defences; all these meanings existed in French before the word came into English. Here,
mantle has not vanished, though it is rarer than its opposite.) Similarly, consolate and
disconsolate were introduced from Latin consolatus and disconsolatus, the former by Caxton in
1489, the latter half a century earlier.
The word couth was once common. It was a form of the Old English word cunnan, “well-known;
familiar” (related to the modern German kennen). So uncouth meant “unknown; unfamiliar”.
Over the years this developed in the spirit of that old Punch cartoon: “Who’s ’e? A stranger? ’Eave
’alf a brick at ’im!”, which encapsulated the notion that what was unfamiliar was also strange,
foreign, suspect and unacceptable. The modern sense of “awkward and uncultured” only
developed in the sixteenth century. The positive form couth went out of use later that century
except in Scotland. It was re-introduced in 1896 by Max Beerbohm as a deliberate and humorous
back-formation from uncouth but has never really become established again in mainstream
English.
Another word in which a modern inverse has taken hold is disabled. This was formed in the
sixteenth century from the verb disable, but the corresponding adjective abled seems not to have
been used at that time. It was created by back-formation in the US in the early 1980s by disabled
people to refer to those not so affected, and it became part of euphemistic phrases like
differently abled. And the term disgruntled that started this exploration is actually not that old.
It is first recorded as an adjective by the OED only in 1847, though its verb disgruntle had first
appeared about 1680. That was the inversion of the even older verb gruntle which was a form of
the verb grunt used for frequently-repeated action; both grunt and gruntle were used of people
expressing a sound “as of discontent, dissent, effort or fatigue” and hence took on something of
the sense of grumble (though the words are unrelated). Though the dis- prefix can often indicate a
negative, here it acts to intensify the root word.
Another group of unpaired words are those ending in the negative suffix -less for which the
corresponding antonyms in -ful do not exist. Examples are ageless, countless, hapless (formed
from the obsolete Old English term hap, “fortune; chance”), leafless, peerless (based on the old
sense of peer as “one’s equal in standing or rank”), toothless and voiceless.
There are, of course, positive words for which no common negative form exists. Sometimes this is
because an appropriate negative term already exists: bad is preferred to ungood unless you are of
an Orwellian disposition or trying to be funny. Other examples with no direct negative forms are
nice and fascinate. Some examples of words ending in -ful that have no forms in -less are awful,
bashful and deceitful.
WHERE IT’S AT
Names for a common symbol
The @ symbol has been a central part of the Internet and its forerunners ever since it was chosen
to be a separator in e-mail addresses by Ray Tomlinson in 1972. From puzzled comments which
surface from time to time in various newsgroups, it appears the biggest problem for many Net
users is deciding what to call it. This is perhaps unsurprising, as outside the narrow limits of
bookkeeping, invoicing and related areas few people use it regularly. Even fewer ever have to find
a name for it, so it’s noted mentally as something like “that letter a with the curly line round it”.
Its use in business actually goes back to late medieval times. An Italian academic, Giorgio Stabile,
a professor of the history of science at La Sapienza University, claimed recently to have found
evidence of its use in the records of Florentine merchants nearly 500 years ago. At that time, it
was either a unit of weight or of volume, representing one amphora, a measure that was based on
the capacity of the standard terracotta jars that were then employed to transport grain and liquid
about the Mediterranean (the capacity of an amphora was one thirtieth of a barrel). The sign was
a handwritten letter A (for amphora), embellished in the typical Florentine script.
Previously, the symbol had been thought to be a contraction for the Latin word ad, meaning “to,
toward, at”; it was thought that in cursive writing the upright stroke of the d had curved over to
the left and extended around the a so that eventually the lower part fused with the a to form one
symbol.
Whatever its source, in northern Europe the symbol seems to have soon adopted its modern sense
of “at the price of”. It was used in accounts or invoices to give the unit price of something (“3 yds
of lace for my lady @ 1/4d a yard”).
Because business employed it, it was put on typewriter keyboards from about 1880 onwards,
though it is very noticeable that the designers of several of the early machines didn’t think it
important enough to include it (neither the Sholes keyboard of 1873 nor the early Caligraph one
had it, giving preference to the ampersand instead). Later it became part of the standard keyboard
set and it was carried over to the standard computer character sets of EBCDIC and ASCII in the
sixties. From there, and especially because of its ubiquity in the Internet, it has spread out across
the networked world, perforce even into language groups such as Arabic, Tamil or Japanese which
do not use the Latin alphabet.
A discussion on the LINGUIST discussion list about names for @ in various languages produced
an enormous response, from which most of the facts which follow are drawn. Some have just
transliterated the English name “commercial at” or “at” into the local language. What is
interesting is that nearly all the languages cited have developed colloquial names for it which have
food or animal references.
In German, it is frequently called Klammeraffe, “spider monkey” (you can imagine the monkey’s
tail), though this word also has a figurative sense very similar to that of the English “leech” (“He
grips like a leech”). Danish has grisehale, “pig’s tail” (as does Norwegian), but more often calls it
snabel a, “a (with an) elephant’s trunk”, as does Swedish, where it is the name recommended by
the Swedish Language Board. Dutch has apestaart or apestaartje, “(little) monkey’s tail” (the “je”
is a diminutive); this turns up in Frisian as apesturtsje and in Finnish in the form apinanhanta.
Finnish also has kissanhäntä, “cat’s tail” and, most wonderfully, miukumauku, “the miaow sign”.
In Hungarian it is kukac, “worm; maggot”, in Russian “little dog”, in Serbian majmun, “monkey”,
with a similar term in Bulgarian. Both Spanish and Portuguese have arroba, which derives from a
unit of weight or volume that Professor Stabile suggests is closely related to that of the amphora—
25lb weight (just over 11kg) or six Imperial gallons (nearly 23 litres). In Thai, the name translates
as “the wiggling worm-like character”. Czechs often call it zavináč which is a rolled-up herring or
rollmop; the most-used Hebrew term is strudel, from the famous Viennese rolled-up apple sweet.
Another common Swedish name is kanelbulle, “cinnamon bun”, which is rolled up in a similar
way.
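The metric equivalents quoted above for the arroba can be checked with a little arithmetic. The Python sketch below is an illustration only: the function names are invented, and the conversion factors are the standard legal definitions of the pound and the gallon, not figures taken from Professor Stabile. On those factors, 23 litres comes out nearer to five Imperial gallons, or six US gallons, than to six Imperial ones, so the volume figure is best treated with some caution.

```python
KG_PER_POUND = 0.45359237             # avoirdupois pound, by definition
LITRES_PER_IMPERIAL_GALLON = 4.54609  # Imperial gallon, by definition
LITRES_PER_US_GALLON = 3.785411784    # US liquid gallon, by definition


def pounds_to_kg(pounds):
    """Convert avoirdupois pounds to kilograms."""
    return pounds * KG_PER_POUND


def litres_to_gallons(litres, litres_per_gallon):
    """Convert litres to gallons of the given size."""
    return litres / litres_per_gallon


print(round(pounds_to_kg(25), 2))                                   # 25 lb in kg
print(round(litres_to_gallons(23, LITRES_PER_IMPERIAL_GALLON), 2))  # 23 litres in Imperial gallons
print(round(litres_to_gallons(23, LITRES_PER_US_GALLON), 2))        # 23 litres in US gallons
```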
The most curious usage, because it seems to have spread furthest from its origins, whatever they
are, is snail. The French have called it escargot for a long time (though more formal terms are
arobase or a commercial), but the term is also common in Italian (chiocciola), and has recently
appeared in Hebrew (shablul), Korean (dalphaengi) and Esperanto (heliko).
In English the name of the sign seems to be most commonly given as at or, more fully,
commercial at, which is the official name given to it in the international standard character sets.
Other names include whirlpool (from its use in the joke computer language INTERCAL) and fetch
(from FORTH), but these are much less common. A couple of the international names have come
over into English: snail is fairly frequently used; more surprisingly, so is snabel from Danish.
Even so, as far as English is concerned at is likely to remain the standard name for the symbol.
But there is plenty of evidence that the sign itself has moved out from the Internet to printed
publications. For a while it seemed likely to become a standard signal for the Internet, though the
popularity of e- compounds seems to have set that trend back somewhat.
Quite a history for a modest little symbol ...