What is corpus linguistics?
Four essential characteristics are:
1. It is empirical, analyzing the actual patterns of use in natural texts.
2. It utilizes a large and principled collection of natural texts, known as a ‘corpus’, as the basis for analysis.
3. It makes extensive use of computers for analysis, using both automatic and interactive techniques.
4. It depends on both quantitative and qualitative analytical techniques.

Why do corpus linguistics?
Corpus linguistics allows us to ask (and answer) previously unanswerable questions. It can provide interesting and meaningful insights into areas of language use where our intuitions often fail us or, worse yet, give us the wrong information.

What are the major issues related to selecting or designing a corpus?
The goals for use and the research questions to be answered shape corpus selection and design. Considerations:
1. Research questions/goals
2. Representativeness (Does the corpus capture the range and variation?)
3. Size (number of texts, number of words, number of words per file)
4. File format (text only, annotation)
5. Header design (SGML, COCOA, XML)
6. Compatibility of file formats
7. Tagging and mark-up
8. Permission and use agreements

(From Randi Reppen, Introduction to corpus design and analysis, paper presented at the Fourth North American Symposium of the American Association for Applied Corpus Linguistics, Indianapolis, 1-3 November 2002.)
Computerized corpora available for B, C and D essay projects
Corpus                           Size                            Variety / content             Registers   Date
Brown                            1 million words                 Written AmE                   A-R         ca 1961
LOB                              1 million words                 Written BrE                   A-R         ca 1961
London-Lund                      500,000 words                   Spoken BrE                    1-12        ca 1959-1973
Frown                            1 million words                 Written AmE                   A-R         1991
FLOB                             1 million words                 Written BrE                   A-R         1991
British National Corpus (BNC)    Spoken 10 M, Written 90 M                                                 1970s-1990s
ACE                              1 million words                 Written AuE                   A-W         1980s
Wellington C                     1 million words                 Spoken & written NZ English               1980s
American National Corpus (ANC)   Spoken 3.8 M, Written 18.3 M                                              1990s
Helsinki C                       1.5 M                           Old English, Middle E, EModE
Kolhapur                         1 million words                 Written Indian E              A-R         1970s-1980s

USE: Uppsala Student English
ESPC: English-Swedish Parallel Corpus
CSPA: Corpus of Spoken Professional American English
ULEC: Uppsala Learner English Corpus
CONCE: A Corpus of 19th-century English
UPenn: Corpus of Middle English Prose
Department of English Uppsala University
Printing: 4 September 2008
UPPSALA UNIVERSITET Engelska institutionen
Version: November 2007 WEN/GER
Computerized corpora available for B, C and D essay projects
The following English language corpora can be used by students of English at the English Department, Uppsala University. Most of the corpora (except those marked *) can be accessed on the computers in Engelska parken. Note that the use of these corpora is allowed only for students enrolled in courses at the English Department, which holds a license to use them for research only. It is illegal to copy them (except for examples excerpted for a specific project) or to transfer them to other computers. A swipe card and personal code are required for entry and for use of the corpora. For the BNC Corpus, an extra code is necessary (obtained from your supervisor). The BNC Corpus is searched with the help of SARA, whereas most of the others can be searched with WordSmith Tools.

Further information about the corpora marked ICAME is provided at http://nora.hd.uib.no/icame/newcd.htm, where you can find the answers to most of your questions. You can also read the information found in the book references below.

Table 1. The Brown corpus and its sisters
Table 2. Text categories in LOB and Brown
Table 3. WSC Wellington Spoken
Figure 1. Organization of The London-Lund Corpus (LLC)
Table 4. Text categories in the London-Lund Corpus
Figure 2. Opening screen of WordSmith Tools
Figure 3. Opening the files
Figure 4. Example of KWIC concordance of “deal*”
Figure 5. KWIC concordance of “though”
Table 5. Acronyms of corpora in the directory “c:\wsmith\wsicame”

(Corpora marked * are not directly available to students but can still be used through special arrangement.)

A. Written present-day English

1. The Brown Corpus. ICAME. American English. 1961. (See Kennedy 1998: 23–27.) 1 million words. Tagged and untagged. 15 text categories with a varying number of 2000-word extracts (‘texts’). A–J contain expository prose and K–R contain fiction. The first three (A–C) consist of newspaper prose.
   A: Press: Reportage, 44 texts
   B: Press: Editorials, 27 texts
   C: Press: Reviews, 17 texts
   D: Religion, 17 texts
   E: Skills and hobbies, 36 texts
   F: Popular lore, 48 texts
   G: Belles lettres, biography, memoirs etc., 75 texts
   H: Miscellaneous (mostly government documents), 30 texts
   J: Learned and academic writing, 80 texts
   K: General fiction, 29 texts
   L: Mystery and detective fiction, 24 texts
   M: Science fiction, 6 texts
   N: Adventure and western fiction, 29 texts
   P: Romance and love story, 29 texts
   R: Humour, 9 texts
Corpus information 1
2. The Lancaster–Oslo–Bergen Corpus (LOB). ICAME. British English. 1961. (See Kennedy 1998: 27–29.) 1 million words. Tagged and untagged. 15 text categories of a varying number of 2000-word extracts (‘texts’). A–J are expository prose and K–R are fiction. The first three (A–C) contain newspaper prose. Modelled on the Brown Corpus (see above) and designed for comparisons between American and British English. The same text categories as in Brown, with some minor differences concerning the number of texts in each category.

3. The Freiburg-Brown Corpus (FROWN). ICAME. American English. 1991. (See Mair 1997.) 1 million words. Modelled on Brown. Suitable for investigations of American-British differences as well as language development over a 30-year span.

4. The Freiburg-LOB Corpus (FLOB). ICAME. British English. 1991. (See Mair 1997.) 1 million words. Modelled on LOB. Suitable for investigations of language development over a 30-year span.

5. The Uppsala Press Corpus (UPC).* British English, 1994. (Available from WEN. See Axelsson 1998: 21–23, 195.) About 276,000 words matching the three newspaper text categories in the Brown and LOB / FROWN and FLOB corpora. Tagged and untagged.

6. The Australian Corpus of English (ACE) (The Macquarie Corpus of Written Australian English). ICAME. 1 million words. Modelled on LOB. 1986. (See Kennedy 1998: 29–31.) Tagged and untagged.

7. The Wellington Corpus (New Zealand). ICAME. 1 million words. Tagged and untagged. Modelled on LOB. 1986–1990. (See Kennedy 1998: 29–31.)

8. The MicroConcord Corpus. Oxford University Press. British English from the 1970s, 80s and early 90s. Two million words, distributed in two ‘collections’ of roughly one million words each: newspaper texts from the Independent (MicroConcord corpus collection a (mca)) and academic texts in various disciplines from journals and books published by Oxford University Press (MicroConcord corpus collection b (mcb)). See copy of content information from the accompanying booklet.

9. The Uppsala Student English Corpus (USE).* (See Axelsson 2000. See also the English Department home page at http://www.engelska.uu.se/.) A collection of student essays from the writing courses in English. Approximately 800,000 words. Suitable for studies of learner English. Partly tagged. Limited access with Margareta Westergren Axelsson as supervisor.

10. American and British newspapers on CD-ROM*:
   The New York Times 1994–1999. About 4 million words.
   The Times 1995.
   Changing Times 1785–1992. Ten million words.
   The Guardian and The Observer 1999.

B. Written and spoken language

1. The British National Corpus (BNC). (See Kennedy 1998: 50–54; Berglund 1997, Using the British National Corpus (a description); Berglund 1999a, Introduction to SARA (instructions for use). See also http://info.ox.ac.uk/bnc) British English. 100 million words, with about 10 million words of spoken language. No specific year. Collected roughly 1975–1990.
2. The BNC Sampler. One million words of written English and one million words of spoken English from the BNC. See information included on the CD.

C. Spoken language

1. The London–Lund Corpus (LLC). ICAME. (See Kennedy 1998: 31–33.) 500,000 words (100 texts of 5000 words each). 1970s. 34 of these 100 texts have been printed as a book (Svartvik & Quirk 1980), under the title A Corpus of English Conversation (CEC).

2. The Lancaster/IBM Spoken English Corpus (SEC). ICAME. (See Kennedy 1998: 36.) 55,000 words. Collected 1984–1987. Prosodically marked. Various tagged versions.

3. The Corpus of London Teenage Language (COLT). ICAME. 500,000 words.

4. The Wellington Spoken Corpus (New Zealand). ICAME. (See Kennedy 1998: 38.) 1 million words. Collected 1988–1993. Tagged and untagged.

5. Corpus of Spoken Professional American-English (CSPA).* See http://www.athel.com/cspa.html

6. Corpus of spoken Ulster Irish.* (Tagged and untagged.) Contact GER for more information.

D. Historical English (See Diachronic corpora, Kennedy 1998: 38–40.)

1. The Helsinki Corpus of English Texts: Diachronic Part. ICAME. 1.5 million words. Texts from Old English (c. 750) to Early Modern English (to c. 1700). Letters, sermons, diaries, legal and official documents, plays and other genres.

2. The Helsinki Corpus of Older Scots. ICAME. 1450–1710. 830,000 words.

3. The Corpus of Early English Correspondence Sampler. ICAME. (http://www.helsinki.fi/doe/projects/ceec) Ca. 1420 – ca. 1680. 450,000 words.

4. The Newdigate Newsletters. ICAME. 17th century. 750,000 words.

5. The Lampeter Corpus of Early English Tracts. ICAME. 1640–1740. 1.1 million words.

6. The CONCE Corpus.* The Corpus of Nineteenth Century English. (See Kytö, Rudanko and Smitterberg 2000.)

7. The Archer Corpus.*

E. Parsed corpora (clause elements analysed)

1. The Polytechnic of Wales Corpus (POW). ICAME. (See Kennedy 1998: 41–42.) 65,000 words spoken by 120 children aged 6–12 years in South Wales. Designed to show the acquisition and development of syntactico-semantic structures in children’s language.

2. Lancaster Parsed Corpus (LOB). ICAME.
3. The Penn-Helsinki Parsed Corpus of Middle English.* (See http://www.ling.upenn.edu.modeng/) Middle English prose text samples annotated for syntactic structure (from the Helsinki corpus, see above). 510,000 words.

References:

Axelsson, Margareta Westergren. 1998. Contraction in British newspapers in the late 20th century. (Information on the UPC corpus to be copied from master copy.)

Axelsson, Margareta Westergren. 2000. USE – The Uppsala Student English Corpus: An instrument for needs analysis. ICAME Journal 24: 155–157.

Axelsson, Margareta Westergren. 2000. USE – The use of a corpus of students' written production in university English teaching. Korpusar i forskning och undervisning (KORFU 99), ed. by Gunilla Byrman, Hans Lindquist and Magnus Levin. Växjö: Reports from Växjö University. Humanities.

Axelsson, Margareta Westergren and Ylva Berglund (forthcoming). The Uppsala Student English Corpus (USE): A multi-faceted resource for research and course development. (To be copied from master copy.)

Berglund, Ylva. 1997. Using the British National Corpus (a description).

Berglund, Ylva. 1999a. Introduction to SARA. (Instructions for use. To be copied from master copy.)

Berglund, Ylva. 1999b. A somewhat annotated guide for further reading (about corpus linguistics). (To be copied from master copy.)

Kennedy, Graeme. 1998. An introduction to corpus linguistics. London and New York: Longman. (Relevant parts of the book to be copied from master copy.)

Kytö, Merja, Juhani Rudanko and Erik Smitterberg. 2000. Building a bridge between the present and the past: A corpus of 19th-century English. ICAME Journal 24: 85–97.

Mair, Christian. 1997. Parallel corpora: A real-time approach to the study of language change in progress. Corpus-based studies in English. Papers from the seventeenth International Conference on English Language Research on Computerized Corpora (ICAME 17), Stockholm, May 15–19, 1996, ed. by Magnus Ljung. 195–209. Amsterdam: Rodopi.

MicroConcord. Academic texts. 1993. Oxford University Press. (Content information to be copied.)

MicroConcord. British newspaper texts. 1993. Oxford University Press. (Content information to be copied.)

Table 1. The Brown corpus and its sisters:

Corpus     Size              Variety                       Text categories                    Date
Brown      1 million words   Written American English      15 text categories labelled A-R   1961
LOB        1 million words   Written British English       15 text categories labelled A-R   1961
FROWN      1 million words   Written American English      15 text categories labelled A-R   1991
FLOB       1 million words   Written British English       15 text categories labelled A-R   1991
ACE        1 million words   Written Australian English    15 text categories labelled A-W   1980s
WC         1 million words   Written New Zealand English   15 text categories labelled A-R   1980s
Kolhapur   1 million words   Written Indian English        15 text categories labelled A-R   1970s
The Lancaster-Oslo/Bergen (LOB) Corpus is a million-word collection of present-day British English texts, compiled under the direction of Geoffrey Leech, University of Lancaster, and Stig Johansson, University of Oslo, in collaboration with Knut Hofland, Norwegian Computing Centre for the Humanities, Bergen. Like its American counterpart, the Brown Corpus (see Francis and Kucera 1979), it contains 500 text samples of approximately 2,000 words distributed over 15 text categories:

Table 2. Text categories in LOB and Brown.

                                                 Number of samples in each category
     Text categories                             Brown Corpus   LOB Corpus
A    Press: reportage                            44             44
B    Press: editorial                            27             27
C    Press: reviews                              17             17
D    Religion                                    17             17
E    Skills, trades and hobbies                  36             38
F    Popular lore                                48             44
G    Belles lettres, biography, essays           75             77
H    Miscellaneous (government documents,        30             30
     foundation reports, industry reports,
     college catalogue, industry house organ)
J    Learned and scientific writings             80             80
K    General fiction                             29             29
L    Mystery and detective fiction               24             24
M    Science fiction                             6              6
N    Adventure and western fiction               29             29
P    Romance and love story                      29             29
R    Humour                                      9              9
     Total                                       500            500
For more details, see the LOB Corpus Manual of Information (Johansson et al 1978). The present manual deals with the tagged versions of the corpus. For more information on sampling and sources of the texts, the user must turn to the original manual.
Manuals and information about the ICAME corpora at: http://nora.hd.uib.no/icame/newcd.htm
The Wellington Corpus: spoken component

Table 3. WSC (Wellington Spoken) CATEGORIES

Code    Text Category               Number of Extracts   Word Target   Words Transcribed
MSN     Broadcast news              36                   24,000        28,929
MST     Broadcast monologue         5                    10,000        11,205
MSW     Broadcast weather           12                   2,000         3,641
MUC     Sports commentary           10                   20,000        26,010
MUJ     Judge's summation           2                    4,000         4,489
MUL     Lecture                     14                   28,000        30,406
MUS     Teacher monologue           8                    12,000        12,496
DPC     Conversation                226                  500,000       500,363
DPF     Telephone conversation      46                   70,000        70,156
DPH     Oral history interview      10                   20,000        21,972
DPP     Social dialect interview    11                   30,000        31,058
DGB     Radio talkback              37                   80,000        84,321
DGI     Broadcast interview         40                   80,000        96,775
DGU     Parliamentary debate        14                   20,000        22,446
DGZ     Transactions and Meetings   80                   100,000       102,332
TOTAL                               551                  1,000,000     1,046,599
Figure 1. Organization of The London-Lund Corpus (LLC)
Table 4. Text categories in the London-Lund Corpus.

S.1-S.3   35 texts   c. 175,000 words   Spontaneous, surreptitiously recorded conversations between intimates and distants.
                                        S.1-2: Conversations between equals
                                        S.3: Conversations between disparates
S.4       7 texts    c. 35,000 words    Conversations between intimates and equals.
S.5       13 texts   c. 65,000 words
S.6       9 texts    c. 45,000 words    Non-surreptitious conversations between disparates.
S.7       3 texts    c. 15,000 words    Surreptitious telephone conversations between personal friends.
S.8       4 texts    c. 20,000 words    Surreptitious telephone conversations between business associates.
S.9       5 texts    c. 25,000 words    Surreptitious telephone conversations between disparates.
S.10      11 texts   c. 55,000 words    Spontaneous commentary: Sport
S.11      6 texts    c. 30,000 words    Spontaneous oration
S.12      7 texts    c. 35,000 words    Prepared but unscripted oration.

WordSmith Tools on Windows XP machines: “wshell.exe”. Note the disk drive and directory: c:\wordsmith. The corpora can be found in c:\wsicame.
Figure 2. Opening screen of WordSmith Tools. WordSmith version 4.
Figure 3. Opening the files. WordSmith version 4.

Example of a KWIC concordance (Key-Word-In-Context):
(Concord button > File > New)
Search: “deal*”

Figure 4. Example of KWIC concordance of “deal*”. WordSmith version 4.
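The idea behind a wildcard KWIC search such as “deal*” can also be tried outside WordSmith Tools. The following is only a minimal Python sketch, not a description of how WordSmith works internally. The function name kwic, the made-up sample sentence and the context width are assumptions chosen for illustration, and the asterisk is simply treated as "any further word characters".

```python
import re

def kwic(text, pattern, width=30):
    """Return (left context, keyword, right context) for each hit.

    `pattern` is a search word that may contain '*' as a wildcard,
    e.g. 'deal*'. This is an illustrative helper, not a WordSmith function.
    """
    # Escape the search word, then turn the escaped '*' into "zero or more word characters".
    regex_text = r"\b" + re.escape(pattern).replace(r"\*", r"\w*") + r"\b"
    regex = re.compile(regex_text, re.IGNORECASE)
    hits = []
    for match in regex.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((left, match.group(0), right))
    return hits

# Made-up sample text (not taken from any of the corpora).
sample = "They made a deal. The dealer dealt the cards and is dealing again."

hits = kwic(sample, "deal*", width=20)
for left, keyword, right in hits:
    # Right-align the left context so that the keywords line up in one column.
    print(f"{left:>20} {keyword} {right}")
```

Each hit is printed with the search word in the middle and a fixed amount of context on either side, which is the layout of the concordance lines shown in this handout.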
Figure 5. KWIC concordance of “though”. WordSmith version 4.

Table 5. Acronyms of corpora in the directory “c:\wsicame”.

Directory   Corpus
ACE         Australian Corpus of English (written)
Brown1      Brown Corpus, format 1 (written)
Brown2      Brown Corpus, format 2 (written)
Browntag    Brown Corpus, tagged version (written)
CEECS       Corpus of Early English Correspondence Sampler (written)
COLT        Corpus of London Teenage Language (spoken)
FLOB        Freiburg-LOB Corpus of British English (written)
Frown       Freiburg-Brown Corpus of American English (written)
Helsinki    Helsinki Corpus of English Texts, Diachronic part (written)
Ice_ea      International Corpus of English, East-African component (written/spoken)
Innsbruc    Innsbruck Computer-Archive of Machine-Readable English Texts (ICAMET)
Kolhapur    Kolhapur Corpus of Indian English (written)
Lampeter    Lampeter Corpus of Early Modern English Tracts (written)
LLC         London-Lund Corpus (spoken)
LOB         Lancaster-Oslo/Bergen Corpus (written)
LOBTAG      Lancaster-Oslo/Bergen Corpus, tagged version (written)
Newdigat    Newdigate Newsletters (written)
Old_Scot    Helsinki Corpus of Older Scots (written)
POW         Polytechnic of Wales Corpus (spoken)
SEC         Lancaster/IBM Spoken English Corpus
WC          Wellington Corpus of Written New Zealand English
WSC         Wellington Corpus of Spoken New Zealand English
The Brown Corpus: Key-word-in-context concordance of “efficient”
N   Concordance
1   he second floor. She was a clever girl, a most efficient secretary. She let him come and go as he
2   usic with ease, in a non-sentimental and ultra-efficient manner. #@# An impressive technician,
3   ge yourself. It is a full scale, small, but efficient house that can become a year 'round retr
4   ving the wishes of the client fairly and in an efficient manner. But as conversation goes on,
5   h to congratulate Inspector Trimmer and his efficient police troops in cleaning the city of
6   t and replaced the cushion. "They are the most efficient". "And the deadliest", Poet comment
7   o the 40,000 people per day who are provided with efficient, reasonably priced transportation in
8   ith our continuing development of new and more efficient mill machinery, a sounder U& S& incom
9   while his body did these quick, appalling, and efficient things. He brushed by the idiotic b
10  t ordinary maids did for housework- and doubly efficient. When the parents emerged from the bed
11  a dealer who has volume enough to afford the most efficient specialized equipment to deliver e
12  out through down here". It all did look very efficient and shipshape. There was no question
13  re quickly, and to keep it running at its most efficient temperature through the proper circulati
14  s in the Far East, he is likely to urge a more efficient mobilization of Vietnamese military,
15  ntly, this heat pump method of warming air was efficient only in areas of mild winters and when
16  , those most industrialized and therefore most efficient of homebuilders, say they save hundreds
17  and improved to obtain increased production on an efficient basis. The area available at Heywood
18  have done on one trip. Even "America's most efficient builder", Bob Schmitt of Berea, hopes
19  n though these nozzles were only about 5 per cent efficient in producing an initial cloud in the
20  large picnic area or camping development is most efficient in shape as a square or rectangle sev
21  increase our production capacity, permit more efficient manufacturing, and substantially redu
22  based on the policy of designing and building efficient machines which will help produce better
23  . #PROBLEMS OF SHIFTING STYLES# The problem of efficient production in textiles is complicated
24  ddles by the relentless slam, slam of the cruelly efficient Hawkinses. Others, badly wounded, gri
25  y J& Packard, states a editorial, was "efficient, pains-taking, self-effacing, loving
26  ronic switches.) The preceding methods allow efficient use of index words and electronic switch
27  armers who, forgetting that birds are the most efficient natural enemies of insects and rodents,
28  But it is plain that a warning system, however efficient, is not enough. In the vulnerable are
29  oward making the motion picture the intricate, efficient time machine that it has remained sin
30  ar is an obstacle to the planning of clear and efficient state-local revenue and expenditure r
31  values- can turn around and develop completely efficient means for controlling people. Thus we
32  out of it, but that won't matter. It looks pretty efficient and that's the important thing". He w
33  haul visitors, would taxis be cheaper? How efficient and necessary are your intra-company veh
The Northern Ireland Corpus of Speech: Key-word-in-context concordance of after + verb + ing (“after *ing”)
WordSmith Tools -- 2002-05-21 13:00:03
N   Concordance   Set   Tag   Word No.   File
1   says the old RM, says he, what got you get to America after being {in your head}. Says he, a big boat,
2   he's *a...* {*Does*, does she sound different after having been there for...} No, she
3   age. You couldn't go to England, or America directly after leaving school, say 14 years of age {yeah}.
4   older men?} Oh, surely. Yes, young fellows after leaving school, and things like that, was
5   he's the same age as me, and she said that she'd just after hearing that somebody seen the papers in
6   ey, eh, they were footed, and put in to rickles {mm}, after footing. *And then...* {*And the
7   y're just questions, like the ones that you were only after asking me there {ahah}, only a little harder
8   hen the bell goes at six you just think you were only after going over, and you get out and up again.
9   with us for a few days at a time, for, to recuperate after doing, taking part in so much, eh, singing
10  d I was with them for nine years {yes}. {PAUSE} Then after getting married I went into the Belfast
11  City Hospital and qualified there {ahah}. And then, after nursing some years there, she nursed for
12  all that handy for them, changing over to a tractor, after being {mm}... When a man gets up,
13  WS26> Yes. {When do you do that?} After % 50, and bring the cows in. {Well, now, do
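A two-word pattern such as “after *ing” can be approximated with a regular expression. The sketch below is an assumption-laden illustration in Python, not the WordSmith search itself: the name find_after_ing is hypothetical, the sample sentence is made up (loosely modelled on the lines above), and the pattern matches any word ending in -ing, so it will also catch nouns such as "spring" or "evening".

```python
import re

# "after", some whitespace, then one word ending in -ing (upper or lower case).
AFTER_ING = re.compile(r"\bafter\s+(\w+ing)\b", re.IGNORECASE)

def find_after_ing(text):
    """Return the -ing words that directly follow 'after' (illustrative helper)."""
    return [match.group(1) for match in AFTER_ING.finditer(text)]

# Made-up sample sentence for illustration only.
sample = ("Then after getting married I went into the hospital. "
          "After lunch she said she was only after going over.")

found = find_after_ing(sample)
print(found)
```

Because the pattern looks only at word endings, every hit still has to be checked by hand to decide whether it is really a verb form, which is the qualitative step that follows the automatic search.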
The Northern Ireland Corpus of Speech: KWIC concordance lines of “place*”

1   d in Portavogie {ahah}. Beautiful wee place. {Ahah. Is that the one
2   and some from Coleraine, and all them places. {Ahah. And how do they
3   place is Enniskillen?} Big place. {Ahah. And what kind of
4   lephant and Castle was a, another Irish place {ahah}, mm, the Harp at Ne
5   ...?} Aye, Kesh is a smaller place {ahah}, now, than this {mm}. Nic
6   ublin, I think Dublin's a beautiful wee place {ahah}. Although I'm, eh, I came
7   Catholic school out here in the first place {ahah}. But then you go down to G
8   no, do you see *where you were at that place*? {*All over {INDISTINCT}
9   co's two different, it's two different places altogether {mm}.
10  Och, some of them is big, big places altogether {PAUSE}.
11  the pits, it was a big, long area of a place. And that was taken out and shook
12  ll), eh, sell ours to some of the meal places, and they get it sent away to
13  ult of the staff, it would just be the place, and there was no money, I suppose
14  re my mother was reared {aye}, in this place. {And your father from a
15  like?} Well, it was hard in places, and easy in pla(ces), there was
The London-Lund Corpus of Spoken English (LLC): KWIC concordance lines of “which”
<40 A> ( . clears throat) ((who)) are ^d\oing LEs# - <41 A> [@:m] ^have to consider which !{p\aper} . to "d\o# . . whether . [dhi] . gra:phology p/aper# <42 A> and I ^wondered
1 B> we`re ^having this meeting of :CSC as:sistant((s)) on the !fourth of Jul/y# <183 B> which is a ^S\/aturday# - <184 B> I`ll ^have about !half a day`s work to look at some/ odd
1 B> from the "^twenty-ninth of J\une# . <194 B> to the ^eighth of Jul/y# . <195 B> on ^ which I can . [@] I can ^spend the !wh\ole of th/at _time# <196 B> on ^those two p/apers# .
1 B> [@] ^I shall g=et# <210 B> [@:] ^scripts from :ten assistant ex:\/aminers# <211 B> which will ^mean . a !couple of days` w\ork# <212 B> ^I shall !get those on about the :eighth
1 ^that`s ((a)) p/\oint# <389 B> as ^w/\ell# <390 B> ^y/\es# <391 A> ^[\m]# - <392 A> ^ which I sup:p=ose# - <393 A> ^one could . repres/\ent# - <394 A> ^quite !r\/easonably# -
want the +organi!z\ation stuff#+ . <400 B> ((^y\eah#)) <401 A> *^[\m]#* <402 A> + which would+ ^sound !quite !g\ood# <403 B> [m]^[h\m]# - - <404 B> ^well I suppose Roy can
<411 B> and [@:] ^what - I`ve got !three or four years more/ of . ex/amining# <412 B> which ^makes *a* <413 A> *^[\m]#* <412 (B> spot of m/oney _for me# - - - <414 A> ^oh w=ell# 1
A Corpus of Formal Spoken American English (CSPA): text extract from a White House press briefing by MIKE MCCURRY, May 13, 1997

VOICE: So that means you'd expect -- since the House is planning to do mark-up tomorrow, the Senate on Thursday, that means you're expecting this to be done this afternoon.

MCCURRY: I wouldn't rule that out, although it may take some additional time. But both sides are working very hard to clarify the understandings needed in order to move forward.

VOICE: Does this mean the President today or tomorrow would sign off on the final -- or did he -- has he signed off --

MCCURRY: We already signed off on the agreement. What they're doing is ratifying and codifying some of the necessary understandings to move to the next level of activity, which is a formal drafting of the budget resolution.
The UPENN corpus of Middle English Prose (UPENN): concordance lines of “[w” (wh-element)
1   te ] [v grinde ] [d greot & hwete 1.1[H [w hwe+der se ] [s ha ] [vt walde ] .
2   ] [p on hire freoliche flesch ] 1.1[Q [w hu ] [s ha ] [vt ferden ] [p
3   ] [d +da +gekyndes of sennes ] , 1.2[Q [w hwannen and hwanne ] [s hie ]
4   7) ( [s +Te hwise ] [vt aski+d ] 1[Q [w we+der ] [s ani +ting ] [vt
5   ah ] [vt loke ] [t nu ] [b biliue ] 1[Q [w hwe+der ] [g +te ] %s-1 x % [vt
6   tah ] [s~ na mon bute ham-seolfen ] 1[Q [w hwet ] [d ham ] [vt stiche+d ]
7   t% [v nuten ] [s +ge ] [t~ neauer ] 1[Q [w hwenne ] . Q]1 {and you shall
8   1,69.252) (V [+ and ] [vt loke ] 1[Q [w hwa+der ] [s +du ] [vt +tenke][b
9   ?} )(ANCRIW,I.48.101) ( [+ Ant ] 1[Q [w hwe+der ] %s p % %vt% [j hwite
10  ,111.421) (V [+ and ] [vt loke ] 1[Q [w hw+a+der ] [s +du ] [at mu+ge ]
11  t nim+t ] [s he ] [d~ none +gieme ] 2[Q [w hwa+der ] %[s hit ]% [vt bie ]
12  1[U [a to ] [v fondi ] [d +te ] 3.1.1[Q [w hwe+der ] [s +tu ] [vt beo ] [j
Extract from the tagged LOB corpus (LOBTAG)
C01, lines 5-13:

^ the_ATI \0BBC's_NP$ dramatised_JJ documentary_NN on_IN Florence_NP Nightingale_NP last_AP night_NN cleverly_RB managed_VBD to_TO suggest_VB the_ATI person_NN behind_IN the_ATI legend_NN ._.
^ while_CS never_RB minimising_VBG the_ATI immensity_NN of_IN her_PP$ work_NN ,_, it_PP3 lifted_VBD the_ATI saintly_JJ halo_NN which_WDTR usually_RB surrounds_VBZ her_PP$ name_NN to_TO reveal_VB a_AT warm_JJ ,_, dedicated_JJ person_NN who_WPR accomplished_VBD most_AP by_IN perseverance_NN and_CC hard_JJ work_NN ._.
^ most_AP stories_NNS of_IN Miss_NPT Nightingale_NP begin_VB and_CC end_VB with_IN her_PP$ work_NN in_IN the_ATI Crimea_NP ._.
^ this_DT one_CD1 started_VBD from_IN that_DT point_NN and_CC devoted_VBD itself_PPL to_IN her_PP$ lifelong_JJ campaign_NN to_TO improve_VB nursing_NN in_IN
Concordance lines of “*ity” in the tagged LOB corpus:
N
1   ,_, make_VB this_DT film_NN in_IN the_ATI streets_NNS of_IN that_DT city_NN and_CC then_RN fail_VB to_TO find_VB anyone_PN in_IN
2   l_JJ ._. ^ this_DT splendid_JJ disc_NN proves_VBZ Joe's_NP$ versatility_NN ,_, which_WDTR is_BEZ going_VBG to_TO make_VB
3   the_ATI Americans_NNPS love_VB to_TO debunk_VB !_! ^ a_AT pity_NN this_DT country_NN has_HVZ n't_XNOT anything_PN comparable_JJ
4   statement_NN :_: *'_*' ^ the_ATI theme_NN will_MD be_BE the_ATI stupidity_NN of_IN war_NN ._. ^ the_ATI allies_NNS made_VBD
5   e_NN will_MD also_RB be_BE the_ATI theme_NN of_IN *'_*' frightened_JJ city_NN ,_, **'_**' with_IN John_NP Gregson_NP and_CC
6   a_AT stomach-heaving_JJ *'_*' sick_JJ **'_**' joke_NN ._. ^ City_NPL cinemas_NNS ._. ^ a_AT ten-year-old_JJB opus_NN by_IN Alfred
7   ATI party_NN spirit_NN ._. **'_**' ^ \0Mr_NPT (_( *'_*' oh_UH ,_, calamity_NN !_! **'_**' )_) Hare_NP can_MD be_BE seen_VBN
8   thirties_CDS *-_*- one_CD1 remembered_VBN for_IN her_PP$ vivacity_NN in_IN musicals_NNS ,_, and_CC the_ATI other_AP
C16 147 for_IN
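The word_TAG format of the tagged LOB extract lends itself to simple automatic processing. The Python sketch below is a minimal illustration under stated assumptions: it takes the last underscore in each token as the boundary between word and tag, it skips tokens without an underscore (such as the "^" that appears at the start of each sentence in the extract), and the function names parse_tagged and words_ending_in are hypothetical. The sample line is copied from the LOBTAG extract in this handout.

```python
def parse_tagged(line):
    """Split a line of word_TAG tokens into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        if "_" not in token:
            # Markers such as "^" carry no tag and are skipped.
            continue
        # Split at the LAST underscore, so ",_," becomes (",", ",").
        word, _sep, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs

def words_ending_in(pairs, suffix, tag=None):
    """Return words with the given ending, optionally restricted to one tag."""
    return [
        word for word, word_tag in pairs
        if word.lower().endswith(suffix) and (tag is None or word_tag == tag)
    ]

# One line from the tagged LOB extract shown in this handout.
line = ("^ while_CS never_RB minimising_VBG the_ATI immensity_NN of_IN "
        "her_PP$ work_NN ,_, it_PP3 lifted_VBD")

pairs = parse_tagged(line)
ity_nouns = words_ending_in(pairs, "ity", tag="NN")
print(pairs[:3])
print(ity_nouns)
```

Restricting the search to one tag (here NN) is what a tagged corpus adds over a plain-text search for “*ity”, which would return every word with that ending regardless of word class.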
Uppsala universitet Engelska institutionen / GER
11.09.2007
Brown-family corpora: text size and composition (“texts” = number of text samples, “text size” = total number of words in each genre/register)

Number of texts

Text category   Genre / Register                    Brown   LOB   ACE   WC    Kolhapur
A               Press: reportage                    44      44    44    44    44
B               Press: editorial                    27      27    27    27    27
C               Press: reviews                      17      17    17    17    17
D               Religion                            17      17    17    17    17
E               Skills, trades and hobbies          36      38    38    38    38
F               Popular lore                        48      44    44    44    44
G               Belles lettres, biography, essays   75      77    77    77    70
H               Miscellaneous                       30      30    30    30    37
J               Learned and scientific writings     80      80    80    80    80
K               General fiction                     29      29    29    126   58
L               Mystery and detective fiction       24      24    15    -     24
M               Science fiction                     6       6     7     -     2
N               Adventure and western fiction       29      29    8     -     15
P               Romance and love story              29      29    15    -     18
R               Humour                              9       9     15    -     9
S               Historical                          -       -     22    -     -
W               Women's fiction                     -       -     15    -     -
                Total                               500     500

Text size (words)

Text category   Genre / Register                    Brown / Frown   LOB / FLOB   ACE         WC          Kolhapur
A               Press: reportage                    88000           88000        88000       88000       88000
B               Press: editorial                    54000           54000        54000       54000       54000
C               Press: reviews                      34000           34000        34000       34000       34000
D               Religion                            34000           34000        34000       34000       34000
E               Skills, trades and hobbies          72000           76000        76000       76000       76000
F               Popular lore                        96000           88000        88000       88000       88000
G               Belles lettres, biography, essays   150000          154000       154000      154000      140000
H               Miscellaneous                       60000           60000        60000       60000       74000
J               Learned and scientific writings     160000          160000       160000      160000      160000
K               General fiction                     58000           58000        58000       252000      116000
L               Mystery and detective fiction       48000           48000        30000       -           48000
M               Science fiction                     12000           12000        14000       -           4000
N               Adventure and western fiction       58000           58000        16000       -           30000
P               Romance and love story              58000           58000        30000       -           36000
R               Humour                              18000           18000        30000       -           18000
S               Historical                          -               -            44000       -           -
W               Women's fiction                     -               -            30000       -           -
                Total                               1,000,000       1,000,000    1,000,000   1,000,000   1,000,000
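The “text size” figures follow from the number of texts if every sample is counted at its nominal length of 2,000 words. A small Python sketch of that arithmetic is given below. The constant and function names are illustrative assumptions, real samples are only approximately 2,000 words long, and the counts are the Brown figures from the table above.

```python
# Nominal sample length in words (an assumption: samples are only approximately this long).
SAMPLE_WORDS = 2000

# Number of texts per category in Brown, copied from the table above.
brown_texts = {
    "A": 44, "B": 27, "C": 17, "D": 17, "E": 36, "F": 48, "G": 75, "H": 30,
    "J": 80, "K": 29, "L": 24, "M": 6, "N": 29, "P": 29, "R": 9,
}

def text_sizes(texts, sample_words=SAMPLE_WORDS):
    """Nominal size in words of each category: number of texts times sample length."""
    return {category: count * sample_words for category, count in texts.items()}

brown_sizes = text_sizes(brown_texts)

print(brown_sizes["G"])            # nominal size of category G
print(sum(brown_texts.values()))   # total number of texts
print(sum(brown_sizes.values()))   # total nominal size in words
```

The same calculation can be repeated with the counts for any of the other corpora to check a column of the table.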