

									BilingBank Database Guide
This guide documents the data on bilingualism and second language
acquisition (SLA) in the TalkBank database. All of these data are available from the TalkBank site. TalkBank is an international system for the exchange
of data on spoken language interactions. The majority of the corpora in TalkBank have
either audio or video media linked to transcripts. All transcripts are formatted in the
CHAT system and can be automatically converted to XML using the CHAT2XML
convertor. TalkBank data dealing with first language acquisition are available from the
CHILDES site.
1. BELC (Spanish-English)
2. Connolly (Japanese-English)
3. CUHK (Chinese-English)
4. DiazRodriguez (Spanish-Various)
5. Dresden (German-English/French/Czech)
6. ESF (Arabic/Finnish/Punjabi/Spanish/Turkish-Dutch/English/French/German/Swedish)
7. FLLOC (English-French)
8. Køge (Turkish-Danish)
9. Langman (Chinese-Hungarian)
10. Liceras
11. PAROLE (Various-English, Various-French)
12. Qatar
13. Reading (English-French)
       The Interviews
       List of Files
14. SPLLOC (English-Spanish)
15. TCD (English-French)
   1. BELC (Spanish-English)
The Barcelona English Language Corpus (BELC) has its origin in the Barcelona Age
Factor (BAF) project. This is a project that examines the effects of age on the acquisition
of English as a foreign language.

The BAF Project began at a moment when the changes in the timing of foreign language
instruction brought about by a new Education Law were being progressively
implemented in both primary and secondary schools around Spain, entailing an earlier
introduction of the foreign language in primary education from grade 6 (11 years) to
grade 3 (8 years). The replacement of the previous curriculum by the new curriculum
took eight years, during which it was possible to find pupils who had begun English
instruction at the age of 11, under the previous curriculum, and pupils who had begun
English instruction at the age of 8, under the new curriculum. In addition to these central
groups, two other age groups were also included in the design of the study, one of
adolescents whose initial age of learning English was 14 and one of adults who began
instruction in English at the age of 18 or older.

The research on age effects on the learning of English as a foreign language was
conducted with students from state schools in Catalonia (Spain). It is important to note
that Catalonia is a bilingual community with a majority language, Spanish, known by
practically the whole population, and a minority language, Catalan, which is the
community language and the language of instruction in the state school system in
Catalonia. English is the first foreign language in most schools and is therefore the third
language of school pupils. It is also important to note that the earlier introduction of
the foreign language entailed a decrease in intensity. That is, whereas English had been
taught for three hours per week under the former curriculum (beginning in grade 6), at the
time of data collection in the new curriculum it was taught for two and a half hours per
week on average from grade 3 to grade 10, and for two hours per week in grades 11 and
12. The approximate amount of instruction in English was about 750 hours under the
former curriculum, distributed over seven years, and about 800 hours, distributed over ten
years, under the new one.

Data were collected at four times: after 200, 416, 726 and 826 hours of instruction
(Time 1, 2, 3, and 4, respectively), though only one of the groups was
available at all four times (see Table 1 below). There were 2063 subjects in total, but it
should be noted that a number of them had had more hours of instruction, either because
of extracurricular exposure or because they had repeated a grade. Only pupils with only school
exposure (OSE) fulfilled the conditions for comparison. Table 1 below indicates the
number of subjects in each group, the age at which they began instruction in English and
each group’s mean chronological age at testing.

Table 1. Characteristics of subjects in the study

                   Group A       Group B       Group C       Group D
                   AO = 8        AO = 11       AO = 14       AO = 18+
Time 1 (200 h.)    AT = 10;9     AT = 12;9     AT = 15;9     AT = 28;9
                   A1 N = 284    B1 N = 286    C1 N = 40     D1 N = 91
                   OSE = 164     OSE = 107     OSE = 21      OSE = 67
Time 2 (416 h.)    AT = 12;9     AT = 14;9     AT = 19;1     AT = 39;4
                   A2 N = 278    B2 N = 240    C2 N = 11     D2 N = 44
                   OSE = 140     OSE = 96      OSE = 4       OSE = 21
Time 3 (726 h.)    AT = 16;9     AT = 17;9     _             _
                   A3 N = 338    B3 N = 296
                   OSE = 71      OSE = 51
Time 4 (826 h.)    AT = 17;9     _             _             _
                   A4 N = 155
                   OSE = 71

AO = age of onset
AT = age at testing
N = number of subjects
OSE = only school exposure
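
The total of 2063 subjects quoted above can be cross-checked against the N values in Table 1. The sketch below simply re-enters the table cells by hand (the dictionary layout and variable names are illustrative, not part of the corpus) and sums them. The cells do add up to 2063, which suggests that the figure counts each group at each collection time separately.

```python
# N and OSE values copied by hand from Table 1, keyed by collection time and group.
# The data structure is only an illustration; it is not a file shipped with BELC.
N = {
    1: {"A": 284, "B": 286, "C": 40, "D": 91},
    2: {"A": 278, "B": 240, "C": 11, "D": 44},
    3: {"A": 338, "B": 296},
    4: {"A": 155},
}
OSE = {
    1: {"A": 164, "B": 107, "C": 21, "D": 67},
    2: {"A": 140, "B": 96, "C": 4, "D": 21},
    3: {"A": 71, "B": 51},
    4: {"A": 71},
}

# Sum every cell across all times and groups.
total_n = sum(sum(groups.values()) for groups in N.values())
total_ose = sum(sum(groups.values()) for groups in OSE.values())

print(total_n)    # 2063, matching the total given in the text
print(total_ose)  # 813 cells' worth of only-school-exposure subjects
```
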
The data included in BELC correspond to those subjects who could be followed
longitudinally and for whom there are two, three or four collection times over a period of
seven years, although not all subjects completed all the tasks (see Table 2).

The files in the TalkBank database cover the four collection times and the four tasks.
The files are grouped in folders by task. Each file name gives first the time (1, 2, 3,
4), then the group (A, B, C), then the task (c, i, n, r), then the subject number (L06, etc.).
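
As a rough illustration of this naming scheme, the following sketch splits a BELC file name into its parts. It rests on several assumptions that the guide does not confirm: that the four fields are simply concatenated with no separators, that the name may carry a `.cha` extension, and that the task letters c, i, n, r stand for composition, interview, narrative and role-play. The example name `1AcL06` and the helper `parse_belc_name` are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed meaning of the task letters, inferred from the four tasks
# described in this section; the guide lists only the letters.
TASKS = {
    "c": "written composition",
    "i": "oral interview",
    "n": "oral narrative",
    "r": "role-play",
}


class BelcName(NamedTuple):
    time: int      # collection time, 1-4
    group: str     # learner group: A, B or C
    task: str      # task description looked up in TASKS
    subject: str   # subject number, e.g. "L06"


# Assumed layout: <time><group><task><subject>, optionally followed by ".cha".
_PATTERN = re.compile(r"^([1-4])([A-C])([cinr])(L\d+)(?:\.cha)?$")


def parse_belc_name(name: str) -> BelcName:
    """Split a file name into time, group, task and subject (hypothetical helper)."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised BELC file name: {name!r}")
    return BelcName(int(match[1]), match[2], TASKS[match[3]], match[4])


print(parse_belc_name("1AcL06"))
```

If the actual file names use a different order or separators, only the regular expression needs to change.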

Written composition. The written composition dealt with a familiar topic: “Me: my past,
present and future”. Students were given a set time (15 minutes), the same for

Oral narrative. The narrative was elicited from a series of six pictures at which the
subjects could freely look before and while they were telling the story in the presence of
the researcher. In the story there are two main protagonists, a boy and a girl, who are
getting ready for a picnic; a secondary character, their mother; and a character that
disappears and later reappears, a dog that gets into the food basket and eats the children's

Oral interview. It was a semi-guided interview that began with a series of questions
about the subject’s family, daily life and hobbies. This constituted a warming-up phase
that helped students feel more at ease. In general, interviewers attempted to elicit as many
responses as possible from the learners, and accepted learner-initiated topics in order to
create as natural and interactive a situation as possible.

Role-play. The role-play task was performed in randomly chosen pairs. In the role-play
one of the students was given the role of the mother/father while the second student was
given the role of the son/daughter. The latter had to ask permission to have a party at
home and both students were asked to negotiate setting, time, activities (music, eating,
drinking), etc. The researcher gave the initial instructions and, when needed, also elicited
talk by reminding learners of topics for discussion or led the task to its completion by
asking about the outcome of the negotiation.

Note: Younger and less proficient learners did not use up all the time they were given because of their language limitations.

Table 2. Spoken tasks performed by BELC longitudinal learners

Subject  T1                 T2                   T3                  T4
         INT   NARR  ROLE   INT   NARR   ROLE    INT  NARR    ROLE   INT   NARR   ROLE
L1                                   
L2                                                                      
L3                                  
L4                                  
L5                                                                        
L6                             
L7                                                                     
L8                                  
L9                                  
L10                                                   
L11                                                                  
L12                                                                  
L13                                                              
L14                            
L15                            
L16                                 
L17                                                   
L18                                                               
L19                                            
L20                                 
L21                                              
L22                                    
L23                                            
L24                                 
L25                                                     
L26                                          
L27                                                                 
 L28                                 
L29                            
L30                                   
L31                            
L32                                 
L33                                 
L34                            
L35                                 
L36                                  
L37                            
L38                                                                         
L39                                                  
L40                                                                       
L41                                                    
L42                                                                  
L43                                                    
L44                                                                       
L45                                                                  
L46                                                                       
L47                                                                   
L48                                                      
L49                                                  
L50                          
L51                                    
L52                                                     
L53                                                                       
L54                                                                         
L55                             

Table 3. Written compositions performed by BELC longitudinal learners.

 Subject       T1        T2           T3           T4
 L01                     
 L02                                 
 L05                                             
 L07                                              
 L10                                             
 L11                                             
 L12                                             
 L13                                              
 L15                    
 L16                                
 L17                                  
 L18                                             
 L19                                             
 L20           
 L21                                  
 L22                    
 L23                                 
 L25                                             
 L27                                
 L28                     
 L29                     
 L31                      
 L36                      
 L38                                              
 L39         
 L40                                  
 L41                     
 L42                                              
 L43                                  
 L44                                               
 L45         
 L46                                              
 L47                                 
 L48                     
 L49                                  
 L50                      
 L52                                              
 L53                                   
 L54                                              

The main results of the BAF Project so far can be found in the volume Age and the Rate
of Foreign Language Learning (see below).

Our research group (GRAL) consists of the following members. Unless otherwise
indicated, all participants are located in the Department of English at the University of

Dr. Mª Luz Celaya
Dr. Natalia Fullana (Language and Literature Education)
Dr. Roger Gilabert (also Universitat Ramon Llull)
Ms. Immaculada Miralpeix
Dr. Joan Carles Mora
Dr. Carmen Muñoz (Coordinator)
Dr. Teresa Navés
Ms. Laura Sánchez
Ms. Raquel Serrano
Dr. Mª Rosa Torras (Dept. of Language and Literature Education)
Dr. Elsa Tragant
Dr. Mia Victori (Dept. of English Philology, UAB)

Research assistants:
Ms. Cristina Aliaga
Ms. Júlia Barón
Ms. Mª Àngels Llanes
Ms. Anna Marsol

Articles that make use of these data should cite:

Muñoz, C. (Ed.). (2006). Age and the Rate of Foreign Language Learning. Clevedon:
Multilingual Matters.
   2. Connolly (Japanese-English)
Steve Connolly
Hazawa, 2-12-11
Nerima-ku, Tokyo Japan 176-0003
(03) 5999-5997

This project was entitled “Peer-to-peer discourse journal writing by Japanese Junior High
School EFL Students” and was submitted as a doctoral thesis. A peer-to-peer “secret”
dialogue journal project, emulating projects by Green and Green (1993) and Worthington
(1997), was set up among 30 Japanese junior high school students at one public
school and 15 students each at two other Tokyo public schools. The project spanned five
terms during which the students exchanged journals weekly in English with partners, who
changed each term. Using names and school names was forbidden in order to maintain a
sense of mystery, and to force the partners to learn as much as permissible about each
other by communicating in English. The supervising teachers did not correct or respond
to the entries; the researcher occasionally scanned them to check for sole use of the L2
and for appropriateness of content.

There were four entries by each partner in Terms 1, 4, and 5. There were six each in
Terms 2 and 3. The 60 secret journal participants were average public middle-school
students from a Tokyo suburb. They entered the seventh grade at around 12 years old,
and were 12-13 at the beginning of the journal project. At the time the project ended, a
year-and-a-half later, the participants were in the ninth grade and were 14-15 years old.

All three schools that participated in the project are average public schools in the same
ward (county), and all three are in close proximity. Schools N and T are within
approximately 2.75 km and 1.75 km, respectively, of school K. The enrollments of the
schools varied. School T had only two classes of eighth graders: they averaged over 37
students per class. School K had three classes that averaged over 32 students per class,
and school N had four classes which averaged over 30 students per class.

Given that the schools are situated in the same ward, the curricula for the three schools
are uniform and are mandated by a combination of the Japanese federal agency
responsible for education (the Ministry of Education or Mombusho) and the ward
education committee. Mombusho provides general educational guidance, while the ward
committee chooses textbooks and makes other day-to-day administrative decisions.

The students attended three 50-minute English classes per week, which were taught by
their Japanese English teachers, based largely on the grammar-translation approach.
Depending on the year of the student, these included 9-18 classes per year that were
team-taught by a Japanese English teacher and a native English speaker, in an effort to
bring more of a communicative approach to the classroom. The curriculum sequence was
dictated by a textbook common to all of the middle schools in the ward.

The purpose of the study was to investigate the pedagogical efficacy of peer-to-peer
dialogue journals. In addition to the journals themselves, three sets of data were collected
and analyzed: a free-writing quiz, a free-speaking quiz, and term-end surveys. The
journals themselves, the free-writing quiz, and the free-speaking quiz were transcribed
using the CHAT format.

In the third term, 290 eighth graders at all three schools took a surprise ten-minute free-
writing quiz. A one-way MANOVA showed that the journal participants statistically
significantly outperformed the journal non-participants on measures of fluency, accuracy,
and syntactic complexity.

In the fourth term, 96 eighth graders at one school took a surprise recorded three-minute
free-speaking quiz. A one-way ANOVA showed that the journal participants statistically
significantly outperformed the journal non-participants on the measure of fluency.

After each of the first four terms, the participants completed written surveys to gauge
their attitudes toward their partners, the activity, and their feelings about their linguistic
improvement. In general, the responses indicated that the participants enjoyed the project,
and they felt that the journal contributed to increases in their writing and reading
proficiencies, less so to their listening and speaking proficiencies. They also felt that on
occasion they learned something from their partners.

After the project, the journals were analyzed, using repeated-measures ANOVAs, for
trends over the five terms in measures of total words and word types per entry, mean
length of utterance (MLU), and for common errors. Only the MLU showed no significant
term-to-term changes over the five terms. The trends were generally down in Terms 2
and 3 (six entries apiece) and back up to Term 1 levels in Terms 4 and 5 (four entries
apiece). The trends did not show marked improvement in any of the measures; however,
the journal participants statistically significantly outperformed the journal
non-participants on both the free-writing and free-speaking quizzes.

This type of activity is one that adolescents enjoy because of their desire to socialize, and
doing so in English probably contributes greatly to linguistic improvement. Furthermore,
because teachers do not intervene at all, the workload on supervising teachers is minimal.

Green, C., & Green, J. M. (1993). Secret friend journals. TESOL Journal, 2(3), 20-23.

Worthington, L. (1997). Let’s not show the teacher: EFL students’ secret exchange
  journals. Forum, 35(3), 2-7.
   3. CUHK (Chinese-English)
Brian MacWhinney
Department of Psychology
Carnegie Mellon University
Pittsburgh, PA 15213

These data were collected and transcribed by students in a class that Brian MacWhinney
taught at Chinese University of Hong Kong in the Spring semester of 2007. They track
Chinese speakers at various ages learning English and, in one case, French.
   4. DiazRodriguez (Spanish-Various)
Lourdes Diaz Rodriguez

This DIAZ corpus contains oral L2 Spanish data from adult learners of Indo-European and Asian
language backgrounds, both semi-spontaneous and experimental. The data were obtained in Barcelona, Spain, under
the umbrella of a research project supervised by Dr. Lourdes Díaz Rodríguez
(Universitat Pompeu Fabra, Spain) and funded by the Spanish Government. A parallel
set of data was gathered in Ottawa, in an instructed foreign-language setting (no immersion in the
language), under the supervision of Prof. J.M. Liceras.

    (a) Semi-spontaneous data were obtained through structured interviews (conducted
        by a Spanish speaking interviewer), the topics being student’s context and
        language contact profile, mainly.
    (b) Experimental data came from structured questionnaires consisting of 1-2 picture
        description tasks, eliciting vocabulary, DPs and verb inflection; 1-3 sets of
        questions requiring the production of interrogative sentences, relative clauses,
        cleft-clauses and repetitions.
Subjects’ mother tongues were German, Swedish, Icelandic, Korean and Chinese.
All data in this set were gathered in Barcelona from volunteer learners of L2/L3 Spanish.
All were interviewed with their consent on school (EOI) and university (UPF) premises.
Their production was audio-taped and later transcribed at the Universitat Pompeu
Fabra, Spain. The research team that took part in the different phases of data
gathering consisted of: P. Álvarez; K. Bekiou; A. Bel; M. Bini; A. Blanco; P. Deza; R.
Fernández Fuertes; B. Laguardia; J. A. Redó; E. Rosado; G. Feliu; A. Ruggia and L.

The research reported was supported by grants from the Spanish Ministerio de
Educación, and Ministerio de Ciencia e Innovación to Dr. Lourdes Díaz Rodríguez from
1995-2000, namely: PB94-1096-C02-01; BFF2000-0928; HUM2006-10235.
   5. Dresden (German-English/French/Czech)
Angelika Kubanek-German
ELL Saxony
University of Braunschweig

The Early Language Learning (“Fruehes Fremdsprachenlernen”) Project was
funded by the Department of Education of Saxony in 2000. A foreign language (English,
French, or Czech) was offered for 4 hours per week to 8- and 9-year-olds, i.e. grades 3 and 4,
instead of the then standard 1 hour per week. A study, commissioned by the Department
of Education and conducted by Angelika Kubanek-German, investigated 12 classes (150
pupils) during the first two years of the program, autumn 2000 to summer 2002. The
overall research project (see preliminary report, Kubanek-German 2003) pursued three
aims:
    1. to assess the linguistic achievement of the children after 2 years of learning,
        contrasting the subgroups: intensive versus standard; and between different
    2. to gain a holistic picture of primary foreign language learning by focusing the
        research activities not only on the foreign language but also on more unexplored
        territory such as cultural awareness and,
    3. as a sub-question, to investigate whether curricular anchored notions of what a
        child can do in the foreign language class are justified, thus expanding on the
        notion of child-orientation (cf. Kubanek-German 2001)

Data in TalkBank are from assessment interviews that lasted 25 minutes and were
composed of three parts. Part 1 (warm up) included themes familiar to the children. Part
2 (water interview) involved questions based on an unfamiliar picture book about the
theme of water. In part 3 (rat search) students used teamwork to solve the “rathunt”
puzzle. Children were interviewed in pairs and the same tasks were used in all three
languages by the same interviewer. For English, there were 20 boys and 18 girls. For
French, 10 boys and 8 girls. For Czech, 16 boys and 16 girls. Data were collected in
Chemnitz, Radebeul, Dresden, and Leipzig.

The English teacher set high objectives in the linguistic domain. The pedagogical style
was rather teacher-centred. She used immediate correction. She was the teacher whose
attitude towards the research project most clearly changed for the better. For her class, there was no
catchment area restriction. Her pupils did very well in the communication test. She was a
trained primary teacher, and had taught Russian at primary level. After 1990, a re-training
for English was offered to those teachers of Russian, including language training in

The French teacher had spent some time in France teaching German as a foreign
language. There was a fear at the inception of the intensive programme that French would
not meet with acceptance on the part of the parents (in contrast to English). However,
after one year, the whole school where she was employed successfully started offering
only intensive French (i.e. in both grade 3 classes); the programme is non-selective. This
teacher supported the less fluent teachers of French in the project. Her approach is
holistic, and she uses a lot of body language. She took the 4th graders to Brittany (classe de
mer) - a long way from Dresden. “It is just fascinating to see how much they can do” is
the statement that best characterises her attitude.

The Czech teacher is a native speaker with training for grammar school, but he had been
teaching at the primary level before the pilot project began. He taught grammar more
explicitly and was concerned about pronunciation. He explained this by stressing the
difficulties of the Czech language. It should be stated, though, that he, as well as the
others, did many songs and dances and rhymes with the class.
   6. ESF (Arabic/Finnish/Punjabi/Spanish/Turkish-Dutch/English/French/German/Swedish)
Wolfgang Klein
Clive Perdue
Max Planck Institut
Nijmegen, Netherlands

    The ESF (European Science Foundation Second Language) Database is a
computerized archive of data collected by research groups of the ESF project in five
European countries: France, Germany, Great Britain, The Netherlands and Sweden. The
project concentrates on the spontaneous second language acquisition of forty adult
immigrant workers living in Western Europe, and their communication with native
speakers in the respective host countries. The target languages are Dutch, English,
French, German and Swedish. For each target language, two source languages were
selected. The corpora are:
    - Dutch L2 and Arabic L1
    - Dutch L2 and Turkish L1
    - English L2 and Punjabi L1
    - English L2 and Italian L1
    - French L2 and Arabic L1
    - French L2 and Spanish L1
    - German L2 and Italian L1
    - German L2 and Turkish L1
    - Swedish L2 and Finnish L1
    - Swedish L2 and Spanish L1
The Dutch, English, and French L2 transcripts have accompanying audio. The German
and Swedish L2 transcripts do not. Biographical information about the informants is
currently in the file. A filename like lsfbe24a.1.cha indicates:

         l subject from the longitudinal group,
         s source language is Spanish,
         f target language is French,
         be the informant's name is Berta,
         2 the session took place in the 2nd data collection cycle,
         4 it was the 4th encounter in that cycle,
         a the activity transcribed is a free conversation (activity code A),
         1 it is the 1st conversation in the encounter,
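
The field breakdown above can be sketched as a small parser. This is only an illustration built from the single example `lsfbe24a.1.cha`: it assumes that every field has the same fixed width as in that example (one-character codes, a two-letter informant abbreviation, single-digit cycle and encounter numbers), which the guide does not state. The names `EsfName` and `parse_esf_name` are hypothetical, and no code tables are included because the guide gives only the values used in the example.

```python
from typing import NamedTuple


class EsfName(NamedTuple):
    group: str         # "l" = longitudinal group in the guide's example
    source: str        # source-language code, "s" = Spanish in the example
    target: str        # target-language code, "f" = French in the example
    informant: str     # two-letter abbreviation of the informant's name
    cycle: int         # data collection cycle
    encounter: int     # encounter within the cycle
    activity: str      # activity code, "a" = free conversation in the example
    conversation: int  # number of the conversation within the encounter


def parse_esf_name(filename: str) -> EsfName:
    """Split an ESF file name into its fields (fixed widths are an assumption)."""
    parts = filename.split(".")
    if len(parts) != 3 or parts[2] != "cha" or len(parts[0]) != 8:
        raise ValueError(f"unexpected ESF file name: {filename!r}")
    stem, number = parts[0], parts[1]
    return EsfName(
        group=stem[0],
        source=stem[1],
        target=stem[2],
        informant=stem[3:5],
        cycle=int(stem[5]),
        encounter=int(stem[6]),
        activity=stem[7],
        conversation=int(number),
    )


print(parse_esf_name("lsfbe24a.1.cha"))
```
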

Publications that use this corpus should cite:

Perdue, C. (ed.) (1993). Adult Language Acquisition. Vol 1: Field Methods. Cambridge
   University Press
   7. FLLOC (English-French)
Florence Myles
Modern Languages
School of Humanities
University of Southampton
Southampton SO17 1BJ

Linguistic Development in Classroom Learners of French: a Cross-sectional Study. This
directory contains sound files and corresponding transcripts from an ESRC-funded one-year
project which ran from October 2001 to September 2002 (ESRC grant
R000234754). One of its aims was to provide a database of learner language for years 9,
10 and 11 of secondary education in the UK context. The Project Director was Florence
Myles and the other team members were Emma Marsden, Rosamond Mitchell and Sarah

Three groups of twenty learners, one in each of years 9, 10 and 11 (i.e. in their 3rd, 4th and
5th year respectively of learning French in the UK educational context; age 13-14, 14-15,
15-16 respectively), in a local secondary school were tested.

A gender-balanced sample from the three year groups, covering the full
ability range as judged by the teachers and the pupils' school grades, was used in
the study. The sample is, however, slightly biased towards the top-ability pupils, as they
are more likely to show signs of further development. The participants were numbered
1-20 for each year group. However, as this was a short-term cross-sectional study, if a cohort
pupil was absent, a replacement pupil carried out the task; replacements were given
random numbers between 60 and 90. This ensured that the number of pupils in each year
that carried out a particular task was always 20. In selecting and involving informants in
the research, the project followed the Recommendations on Good Practice in Applied
Linguistics of the British Association of Applied Linguistics (1994) on the responsibility
of researchers in respecting the privacy of participants, ensuring confidentiality of
personal details and in maintaining openness about the goals of the research.

Four oral tasks were administered to all 60 subjects, on a one-to-one basis with a researcher.
The tasks used were the same for all years, in order to enable a comparison of results.
Moreover, some of the tasks were the same as those used in the 'Progression Project' (to
enable comparisons to be drawn). The tasks were as follows:

Cartoon story (Loch Ness Monster): in this task, learners have to tell a story on the basis
of a series of cartoon pictures. This task was developed and used in the Progression
Project. It also provides valuable information on learners' developing discourse-level
skills. Task Code L

Interrogative elicitation task: this task is an information gap activity in which the subjects
have to find out missing information from the researcher in order to reconstruct a
drawing. The main purpose of this task is to elicit interrogative constructions and
pronominal reference, as well as gender markings. This task was also developed and used
in the Progression Project. Task Code I

Photos task: One-to-one interview with a researcher: this is a directed conversation with a
researcher in which the subject has to respond to a number of questions, as well as ask
questions based on photographs brought by the researcher. The main purpose of this task
is to elicit a wide range of structures, with a particular focus on verbal morphology (past
tense, future). A version of this task was used in the Progression Project, although we
modified it in order to ensure elicitation of a range of temporal reference (as we were
dealing with more advanced learners). Task Code P

Negative elicitation task: learners have to describe a famous person by saying what they
do and do not do (following picture cues), and the researcher has to guess who the
famous person is on the basis of the learner's description and a series of possible
celebrities. Task Code N

All tasks were recorded digitally, and took around 15 minutes each, in a one-to-one
situation with a researcher, making a total of around one hour of spoken language per

Additional Conventions

In this section, we describe some of the general decisions we have taken in the
transcribing of French interlanguage oral data, as well as some of the adaptations we have
made to the CHILDES system, in the context of L2 data. As will become obvious, many
of the decisions were dictated by our research agenda in both the Linguistic Development
and the Progression projects, and our choice to use the automatic morphosyntactic parser.
Although this means that the transcription sometimes deviates from the
actual phonological shape of the words produced by learners, we felt this was not a serious
problem, as other researchers interested in, e.g., phonology can listen to the sound files
as they read the transcripts and add their own level of coding. The data have been
transcribed orthographically. This is necessary in order to use the French
morphosyntactic parser on the completed transcripts, as it will not recognise non-words.
There is no extensive coding of errors, and overlaps are not marked, since they can be
heard in the sound files. Learner utterances have been carefully segmented into distinct
utterances, but this has not been done for the researcher.
If a participant exactly repeats the researcher (or another participant in the case of pair
tasks), it has been coded as follows:

*32N: [^ eng: how do you say he goes]
*ADR: il va
*32N: il@g va@g au cinema
@g is added after every repeated word. @g has been added to the special form marker
file (sf.cut) in the French MOR program. It ensures that the imitation is not
included for analysis by the French morphosyntactic parser, as this could give misleading
information about the current grammar of the learner.
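
As a rough illustration of what the @g marker makes possible, the following Python sketch drops imitated words from a main-tier utterance before any further counting. It is not part of CLAN or MOR: the function name strip_imitations is hypothetical, and simple whitespace tokenisation is assumed.

```python
def strip_imitations(utterance):
    """Return the words of a main-tier utterance, leaving out every word
    that carries the @g (imitation) marker."""
    # Assumption: words are separated by whitespace, as in the examples above.
    return [word for word in utterance.split() if not word.endswith("@g")]


print(strip_imitations("il@g va@g au cinema"))
```

Only the learner's own contribution ("au cinema") remains for analysis.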

In order for the French MOR programme to ignore English, we coded whole English
utterances as follows:

*SAR: [^ eng: yes you begin by asking questions]
*43P: [^ eng: how do you say dog?]

Use of a single English word to complete a French Phrase

If an English word has been used to complete a French phrase, then we have coded the
words as follows:
Noun          @s:d
Adjective     @s:a
Adverb        @s:adv
Preposition   @s:pre
Verb          @s:v
Pronoun       @s:pro
Determiner    @s:det
Conjunction   @s:con

For example:

*28L: il achete le skirt@s:d

These forms are then analysed by the morphosyntactic parser as 'English N', or V, or A,
etc., rather than being ignored, which would produce outputs that do not correspond to the
learner's grammar (e.g. in this example, suggesting that this learner's grammar allows a
determiner to be followed by nothing, as the parser would not recognise 'skirt'). These
special form markers have been added to the sf.cut file in MOR and they have also been
added to the depfile in CLAN (so the files pass through check).
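
The suffix list above can be read as a small lookup table. The sketch below, which is only an illustration and not the MOR implementation, finds the English insertions in an utterance and reports the word class recorded by the transcriber. The names ENGLISH_WORD_CODES and english_insertions are hypothetical.

```python
# Suffixes copied from the list of special form markers above.
ENGLISH_WORD_CODES = {
    "@s:d": "noun",
    "@s:a": "adjective",
    "@s:adv": "adverb",
    "@s:pre": "preposition",
    "@s:v": "verb",
    "@s:pro": "pronoun",
    "@s:det": "determiner",
    "@s:con": "conjunction",
}


def english_insertions(utterance):
    """Return (word, word class) pairs for English words in a French utterance."""
    found = []
    for token in utterance.split():
        # Split "skirt@s:d" into the word and its marker.
        word, separator, code = token.partition("@")
        word_class = ENGLISH_WORD_CODES.get(separator + code)
        if word_class is not None:
            found.append((word, word_class))
    return found


print(english_insertions("il achete le skirt@s:d"))
```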

Indeterminate forms

In beginner datasets, it is often difficult to determine which form a learner has intended,
as learners often produce something very approximate. There are four examples of this
use of indeterminate forms that occur consistently in our data, and we coded them as
follows:

      Definite articles which sound like something between le and la: le@n
      Indefinite articles which sound like something between un and une: un@n
      First person subject pronoun which sounds like something between je and j'ai:
      A verb form which sounds like something between a and est: a@n

These forms have been added to the neo.cut file (see below), and are analysed by the
parser as e.g. definite article, without specifying the gender.

Neologistic verb endings

Our learners also used neologistic verb forms, which were usually non-finite. Each of
these new forms is written on the main tier and then added to a neo.cut file, which is
saved as part of the MOR lexicon. For example:

pren {[scat neo:v:inf]} "prendre"

will be transcribed as pren on the main tier, and analysed by the parser as neo:v:inf.

We have also added a number of words, particularly nouns, to the MOR lexicon.
For example, we added le shopping, le jogging, le badminton, and le t_shirt, so that they
can be recognised and therefore tagged by the parser.

Additionally, the following project-specific conventions were used in order to code
'intended tense', in the context of the Photos task:

In the 'Photos' task, each photoset was designed to elicit a dialogue in the present, past or
future (by referring to holidays just gone - Christmas, forthcoming - summer, and to
hobbies - present). We have therefore coded the data for intended tense use. For example,
in the following sentence, we wanted to be able to know that the infinitive form 'aller'
was produced in a context where a future form would be expected:

*84P: l'ete prochain je aller Marjorca .
would be transcribed as follows:
*84P: l'ete prochain je aller@f Marjorca .
where the following tags have been added to the sf.cut file in MOR:
@p {[scat inf:pres]} for contexts where a present form would be expected
@f {[scat inf:future]} for contexts where a future form would be expected
@c {[scat inf:past]} for contexts where a past form would be expected

This enables the morphosyntactic parser to analyse these forms as v:inf:future|aller, and
therefore to retrieve them easily for analysis.
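
For readers working directly from the main tier rather than from MOR output, the intended-tense tags can also be retrieved with a few lines of Python. This is a minimal sketch under the assumption that the tag is always the final two characters of the word; INTENDED_TENSE and intended_tenses are hypothetical names.

```python
# Tags and their meanings as listed for the Photos task.
INTENDED_TENSE = {"@p": "present", "@f": "future", "@c": "past"}


def intended_tenses(utterance):
    """Return (verb form, expected tense) pairs for tagged infinitives."""
    results = []
    for token in utterance.split():
        for suffix, tense in INTENDED_TENSE.items():
            if token.endswith(suffix):
                # Remove the tag to recover the form the learner produced.
                results.append((token[: -len(suffix)], tense))
    return results


print(intended_tenses("l'ete prochain je aller@f Marjorca ."))
```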
Interrogatives Year 9
Interrogatives Year 10
Interrogatives Year 11
Loch Ness Year 9
Loch Ness Year 10
Loch Ness Year 11
Negatives Year 9
Negatives Year 10
Negatives Year 11
Photos Year 9
Photos Year 10
Photos Year 11
All the files in each directory have a corresponding MOR file in the appropriate
directory. We would like to acknowledge Christophe Parisse's expert guidance in making
some of these adaptations to the French MOR programme.

The files are labelled in the following way:
Soundfiles: 01L9SAR.wav
Transcriptions: 01L9SAR.cha (01 is the number of the student, L is the task code, 9 is the
student's year, SAR is the abbreviation for the researcher)
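
The labelling scheme can be unpacked mechanically. The following Python sketch splits a file name into its parts; the pattern is inferred from the single example above, so the field widths (for instance, Year 10 and Year 11 written with two digits) are assumptions, and parse_file_name is a hypothetical helper.

```python
import re

# Task codes as given in the task descriptions.
TASKS = {
    "L": "Cartoon story (Loch Ness Monster)",
    "I": "Interrogative elicitation",
    "P": "Photos",
    "N": "Negative elicitation",
}

# Assumed layout: student number, task code, year, researcher code, extension.
FILE_NAME = re.compile(
    r"^(?P<student>\d+)(?P<task>[LIPN])(?P<year>\d+)"
    r"(?P<researcher>[A-Z]+)\.(?P<ext>wav|cha)$"
)


def parse_file_name(name):
    """Split a name such as 01L9SAR.cha into its labelled components."""
    match = FILE_NAME.match(name)
    if match is None:
        raise ValueError("unrecognised file name: " + name)
    return {
        "student": match["student"],
        "task": TASKS[match["task"]],
        "year": int(match["year"]),
        "researcher": match["researcher"],
        "kind": "sound file" if match["ext"] == "wav" else "transcript",
    }


print(parse_file_name("01L9SAR.cha"))
```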

Publications using these data should cite:
Myles 2002: Full Report of Research Activities and Results. Linguistic Development in
   Classroom Learners of French.
   8. Køge (Turkish-Danish)
Jens Normann Jørgensen
University of Copenhagen
Copenhagen, DK

These data were collected from adolescent Turkish-Danish bilinguals in the town of Køge
near Copenhagen. The data include interviews in Danish and Turkish and group
discussions in both Danish and Turkish. There are audio files, but they are not yet
available to TalkBank.
   9. Langman (Chinese-Hungarian)
Dr. Juliet Langman
Division of Bicultural-Bilingual Studies
University of Texas at San Antonio
6900 North Loop, 1603 West
San Antonio, TX 78249

    This corpus is made up of 10 files consisting of interviews conducted in 1994 with 11
Chinese immigrants living in Hungary. The bulk of the conversation is in Hungarian,
although in the case of those who speak English there is also English, and in the case of
one transcript (KIN10) there are significant amounts of Chinese (with a Hungarian
translation in a %tra dependent tier). Interviews focused on issues related to their arrival
in Hungary as well as their daily life activities. With the exception of KIN2 and KIN10
none of the participants had had formal training in Hungarian. Interviewers were the
researcher, as well as three different Hungarian undergraduates. Data were collected with
two purposes in mind: the analyses of communicative strategies among adult second-
language learners learning in a nonstructured environment, and the analysis of the
acquisition of morphology of an agglutinative language. The following additional form
markers have been used in the (*) speaker lines of the transcripts:
    @e = English word, e.g., go@e
    @c = Chinese word, e.g., xie@c
    @a = adult-invented word, e.g., pigyilni@a

The following special codes have been used on the %lan tier:
   $MIX       utterances with some form of code-switching or borrowing
   $CHI       utterance in Chinese (used only in KIN10)

    The following special codes have been used on the %rep (repetition) tier to identify:
1. whose speech is repeated
     SRP        self-repetition of immediately previous utterance
     ORP        other repetition of immediately previous utterance
     SRE        self-repetition of an utterance not immediately preceding
     ORE        other repetition of an utterance not immediately preceding
2. the function of the repetition
    MIS         misunderstanding, prompting, asking for clarification
    VAL         validation repetition of previous utterance
    EXP         explanation to ease understanding
    COR         correction and language learning functions
3. the form of the repetition
     PAR        partial
     COM        exact
     TRA        translation
     PLU        repetition including additional information
These three types of codes could be combined as in: %rep: SRP:MIS:PAR
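
A combined %rep code can be decoded by looking up each colon-separated part in the three lists above. The sketch below is only an illustration; decode_rep is a hypothetical helper, and it takes the code itself (without the "%rep:" tier label) as input.

```python
WHO = {
    "SRP": "self-repetition of immediately previous utterance",
    "ORP": "other repetition of immediately previous utterance",
    "SRE": "self-repetition of an utterance not immediately preceding",
    "ORE": "other repetition of an utterance not immediately preceding",
}
FUNCTION = {
    "MIS": "misunderstanding, prompting, asking for clarification",
    "VAL": "validation repetition of previous utterance",
    "EXP": "explanation to ease understanding",
    "COR": "correction and language learning functions",
}
FORM = {
    "PAR": "partial",
    "COM": "exact",
    "TRA": "translation",
    "PLU": "repetition including additional information",
}


def decode_rep(code):
    """Decode a combined code such as SRP:MIS:PAR into its three dimensions."""
    decoded = {}
    for part in code.strip().split(":"):
        if part in WHO:
            decoded["who"] = WHO[part]
        elif part in FUNCTION:
            decoded["function"] = FUNCTION[part]
        elif part in FORM:
            decoded["form"] = FORM[part]
        else:
            raise ValueError("unknown %rep code: " + part)
    return decoded


print(decode_rep("SRP:MIS:PAR"))
```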

    Error coding focused exclusively on morphology and is represented on two separate
tiers, %err and %mor. The %mor tier shows the actual target form for each error marked.
The %err tier marks the types of errors using the following codes:
    $OMI:             omission
    $OMI:PAR          partial omission
    $INS:             insertion
    $INS:PAR          partial insertion
    $SWI              switched form
    $SWI:PAR          partially switched form

Partial support for data collection and analysis was provided through a grant awarded to
Dr. Csaba Pléh, OTKA grant T018173, A magyar morfológia pszicholingvistikai
vizsgálata (The psycholinguistic study of Hungarian morphology).

Publications using these data should cite:

Langman, Juliet. (1998) “Aha” as Communication Strategy: Chinese speakers of Hungarian.
   In Regan, V. (ed.) Contemporary Approaches to Second-language Acquisition in
   Social Context: Crosslinguistic Perspectives. Dublin: University College Dublin
   Press, 32-45.
Langman, Juliet. (1997).     Analyzing second-language learners’ communication
   strategies: Chinese speakers of Hungarian. Acta Linguistica Hungarica 44, 277–299.
Langman, Juliet. (1995-1996). The role of code-switching in achieving understanding:
   Chinese speakers of Hungarian. Acta Linguistica Hungarica, 43, 323–344.
   10.        Liceras
*ZAR: Ahora vas a escuchar, a oir unas frases, escuchar y tú repetir ,por
      favor escuche atentamente y repita, uno, estos niño s cantan bien .
*ZAR: Este niño no es amigo mio .
*ZAR: Tres, todos los sabados cantan en la iglesia ..
*ZAR: Cantan la cancion que leen en el libro .
*ZAR: cinco por que la cantan con alegría ?
*ZAR: Uno, La chica mueve la pierna .
*ZAR: Como se mueve esta perja cuando baila ?
*ZAR: Bailar es divertido pero llorar no lo es .
*ZAR: Cuatro, todos hemos bailado alguna vez en la vida.
*ZAR: Cinco, que contentos estan estos jovenes cuando bailan juntos !
*ZAR: Uno, es Maria ama de casa ?
*ZAR: No creo que cocine todos los dias.
*ZAR: Tres, Elena esta de vacaciones en Varadero.
*ZAR: Cuatro, le gusta pasar las vacaciones en la playa .
*ZAR: Cinco, Maria non quiere pasarlas en la cocina.
*ZAR: Cinco, Maria tambien quiere ir de vacaciones a Varadero.

Josiane       F female
LucAndre      F male
Nicholas      E male
NicholasM     E male
Tristan       F male
ClaireH       E female
ClaireP       F female
Falco         E male
Ginger        E female
Joanna        E female, Polish also
Phillippe     F male

form formaciónpreguntas formulaciónpreguntas
narr narraciónes
cont contestarpreguntas
pers preguntaspersonales
rep repeticiones
comp completaroraciones
comppreg completarpreguntas
role roleplaying
   11.         PAROLE (Various-English, Various-French)
The Corpus PAROLE (PARallèle Oral en Langue Etrangère) was compiled by members
of the Langages research team (Laboratoire LLS) at the Université de Savoie (Chambéry,
France), to investigate the characteristics of different L2 proficiency levels. The
particularity of the corpus is our attempt to incorporate temporal elements of spoken
production in the main transcription line, along with more classic coding of errors and

PAROLE is composed of oral productions by 68 young adult learners of three foreign
languages (English, French, Italian), as well as a benchmark corpus of productions by 27
native speakers performing the same tasks. Transcripts and recordings of three tasks (two
summaries of a video clip immediately after viewing, and a short autobiographical
narrative) will constitute the PAROLE corpus. Task details are provided in the PAROLE
Manual (PAROLE_documents folder).

In addition to the speaking tasks, all the non-native subjects completed a battery of tests
and questionnaires, furnishing complementary data on their L2 knowledge, experience,
motivation for L2 study, and two aspects of language-learning aptitude (nonword
repetition and morpho-syntactic analysis). Test results for the learner subjects are
available in the subject_data file (PAROLE_documents folder), and references for the
tests used are provided in the PAROLE Manual (same folder). Pdf files of the subject
profile and the motivation questionnaires used (English L2 subjects) are also included in
the documents folder.

PAROLE was funded through a global research grant given to the Laboratoire LLS by
the French Ministère de l'Education Nationale, as part of the contrats quadriennaux
between the Ministry and the Université de Savoie for 2003-2006 and 2007-2010. The
Ministry also provided funds for two doctoral students working on the corpus.

We began pre-testing production triggers and assembling test materials in 2003; most of
the French L2 and English L2 subjects were recorded in 2005 and the native speakers in
2006, and transcription work began in earnest in 2006. Due to illness and a shortage of
personnel, the Italian recordings and transcriptions are lagging behind English and
French; the first wave of Italian files should be available on-line by the end of 2008 (and
we apologize for this frustrating delay).

We have attempted to adhere to CHAT conventions as closely as possible; major
innovations concern the scoped timing of "hesitation groups" (unbroken sequences of
hesitation phenomena, such as silent pauses, filled pauses, and certain paralinguistic
noises). We have also made a distinction between words produced in the learners' L1
(coded with the new suffix "@l1"), and words produced in another foreign language
(coded "@s").
See the PAROLE Manual for detailed descriptions of our use of CHAT coding symbols,
occasional additions to the code base, our criteria for utterance delimitation, error coding,
etc. (PAROLE_documents folder).

Participants in the learner corpus (54 females, 14 males):
33 learners of English (24 French-L1, 9 German-L1; average age 21);
12 learners of French (5 Spanish-L1, 3 Chinese-L1, 2 Swedish-L1, 1 Polish-L1, 1
English-L1; average age 23);
23 learners of Italian (all French-L1; average age 19).
Participants in the native-speaker corpus (20 females and 7 males):
9 English-L1 (average age 21);
8 French-L1 (average age 22);
10 Italian-L1 (average age 23).
All participants were enrolled in a French or Italian university (either in a normal or
study-abroad program) at the time of recording. See the subject_data file for detailed
information on each participant (PAROLE_documents folder).

The corpus consists of audio files (.wav format) and transcripts for each participant
performing two short video summary tasks ("task A," "task C"), and one short
autobiographical narrative ("task E"; on-line publication planned in late 2008). Sound
files and transcripts are segmented according to task. All transcripts have been carefully
linked to the digital sound files with bullet points in Sonic Mode. We recommend that
researchers wishing to work with PAROLE organize their files with sound files and
transcripts in the same folder, for optimal comparison between the transcripts and the
productions. Carefully disambiguated tagged files are stored together in a special folder
for each language.
Key to file names (three-digit numbers refer to each subject):
        L2 English learners: 0
        L2 Italian learners: 2
        L2 French learners: 4
        British and NZ English: N0
        North American English: N1
        Italian native-speakers: N2
        French native-speakers: N4
The single letter (a, c, or e) following the subject number indicates which task is
involved: file "010a.cha" is the CHAT transcript for English learner 010 performing task
A (first video description); file "010a.wav" is the sound file corresponding to this
transcript & task; file "010a.pst.cex" is the tagged transcript.
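
The key can be applied programmatically when sorting files. The Python sketch below is a tentative reading of the naming scheme: it assumes that the group code from the key is the first digit of a learner's subject number (as in "010") and that native-speaker subject IDs begin with "N" plus one digit. The exact layout of native-speaker file names is not shown in this guide, so that part in particular is an assumption, and parse_parole_name is a hypothetical helper.

```python
import re

# Group codes from the key to file names.
GROUPS = {
    "0": "L2 English learner",
    "2": "L2 Italian learner",
    "4": "L2 French learner",
    "N0": "British or NZ English native speaker",
    "N1": "North American English native speaker",
    "N2": "Italian native speaker",
    "N4": "French native speaker",
}
KINDS = {".cha": "CHAT transcript", ".wav": "sound file", ".pst.cex": "tagged transcript"}

# Assumed layout: subject ID, task letter (a, c or e), extension.
PATTERN = re.compile(r"^(?P<subject>N?\d+)(?P<task>[ace])(?P<ext>\.cha|\.wav|\.pst\.cex)$")


def parse_parole_name(name):
    """Split a name such as 010a.cha into subject, group, task and file kind."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError("unrecognised file name: " + name)
    subject = match["subject"]
    prefix = subject[:2] if subject.startswith("N") else subject[:1]
    return {
        "subject": subject,
        "group": GROUPS.get(prefix, "unknown"),
        "task": "task " + match["task"].upper(),
        "kind": KINDS[match["ext"]],
    }


print(parse_parole_name("010a.pst.cex"))
```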

All recordings took place in a small, closed classroom or office, without distractions or
interruptions. Video support materials ("triggers") were presented on a portable computer,
and integrated into .html pages that the subject manipulated directly. See PAROLE
Manual for details of interview structure, video presentation, interviewer behavior,
recording equipment, etc.

HILTON, H. E. (forthcoming, 2008) Connaissances, procédures et productions orales en
   L2. AILE.
HILTON, H. E. (forthcoming, 2008) The link between vocabulary knowledge and spoken
   L2 fluency. Language Learning Journal.
OSBORNE, J. (2007) Investigating L2 fluency through oral learner corpora. In M.C.
   Campoy & M.J. Luzón (eds.) Spoken Corpora in Applied Linguistics. Frankfurt:
   Peter Lang, 181-197.
OSBORNE, J. & RUTIGLIANO, S. (2007) Constitution d’un corpus multilingue
   d’apprenants d’une L2: recueil et exploitation des données. In H. Hilton (ed.)
   Acquisition et didactique, Actes de l’atelier didactique, AFLS 2005. Chambéry : LLS,
   Collection Langages, 141-156.
   12.          Qatar
Yun Zhao
Department of Modern Languages
Carnegie Mellon University

This is a corpus of spoken interviews with Qatari learners of English, contributed by Yun

Name            Grade    Nationality Gender     Reading        Language      Average
                                                skills         Usage         English
Sam             12       Qatari       Male      39.61          65.76         52.685
Abe             12       Qatari       Male      62.57          73.67         68.12
Charles         11       Qatari       Male      61.12          80.31         70.715
Tom             12       Qatari       Male      44.35          39.31         41.83
Larry           12       Qatari       Male      93.66          89.91         91.785
Ali (missing)   12       Qatari       Male      47.66          52.76         50.21
Bill            12       Qatari       Male      75.32          99            87.16
Harry           12       Jordanian    Male      Missing        Missing       Missing
Arnold          12       Qatari       Male
Jenny           12       Qatari       Female    77.63          97            87.315
Nancy           12       Qatari       Female    96.81          99            97.905
Lucy            12       Qatari       Female    99             97.2          98.1
Anne            12       Qatari       Female    94.54          87.75         91.145
Alice           11       Qatari       Female    78.46          98.8          88.63
Paula           11       Qatari       Female    53.06          57.4          55.23
Pat             12       Qatari       Female    53.37          84.62         68.995
Tina            12       Qatari       Female    79.65          89.32         84.485
Linda           11       Qatari       Female    66.95          90.51         78.73
Donna           11       Kuwaiti      Female    71.32          91.1          81.21
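
The "Average English" column appears to be the arithmetic mean of the "Reading skills" and "Language Usage" scores. This is an inference from the figures rather than something stated by the contributor. A short Python check on three rows copied from the table illustrates the relationship; the names ROWS, mean_score and matches_reported are hypothetical.

```python
# (reading skills, language usage, reported average) copied from the table.
ROWS = {
    "Sam": (39.61, 65.76, 52.685),
    "Bill": (75.32, 99, 87.16),
    "Lucy": (99, 97.2, 98.1),
}


def mean_score(reading, usage):
    """Arithmetic mean of the two component scores."""
    return (reading + usage) / 2


def matches_reported(reading, usage, reported, tolerance=1e-6):
    """True if the computed mean agrees with the reported average."""
    return abs(mean_score(reading, usage) - reported) < tolerance


for name, (reading, usage, reported) in ROWS.items():
    print(name, matches_reported(reading, usage, reported))
```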
   13.         Reading (English-French)
Brian Richards
Dept. of Arts and Humanities in Education
University of Reading
Bulmershe Court
Earley, Reading RG6 1HY United Kingdom

These data on French foreign language oral interviews were transcribed as part of a study
of the reliability and validity of oral assessment in modern foreign languages in the
General Certificate of Secondary Education (GCSE). GCSE is a public examination normally
taken by school children in the United Kingdom at the age of 16, i.e. after the 11
years of compulsory schooling. The 34 interviews constitute one part of the French oral
examination: the so-called “free conversation.” Here, the French teacher interviews
students about everyday topics such as school, home, family, holidays, future aspirations
and hobbies, and interests. Other parts of the oral examination such as role-plays are not
part of these data. The title of the project was “Oral Assessment in Modern Languages
Project”, funded by the Research Endowment Trust Fund of the University of Reading.

Our analyses have compared lexical and grammatical features of the children’s language
with teachers’ expectations of foreign language learners of this age, and with the language
of French native speakers in a similar interview setting (Chambers & Richards,
1995). We have also compared teachers’ impressionistic assessments of the presence of
qualities specified in the assessment criteria with our own objective counts using the
CLAN software (Richards & Chambers, 1996). We are currently looking at teacher-
student interaction, focusing on the teachers’ accommodation strategies.

The Interviews
Teachers conduct the oral examinations, including the interviews, on set dates and on
topics determined by the official examination board. Only one teacher and one student
are present during each interview, the audio recording being made by the teacher. The
teacher enters assessments on a mark sheet during the interview, and on completion of
the examination the tapes and mark sheets are sent to the examination board. A sample of
tapes is re-marked by a moderator appointed by the examination board and the teachers’
assessments adjusted if necessary. The average length of the interviews is 5 minutes 30
seconds. They range from 3 minutes to 12 minutes.

All 34 participants come from the same all-ability secondary school (11-18 comprehensive
school) in an English-speaking area of South Wales. They are 16 years old and are
native speakers of English who have been learning French for 5 years. All have also spent
at least one year learning Welsh and some have had the opportunity to learn German.

The school is situated in a predominantly working-class area, but the students selected
here cover a wide range of social backgrounds. It should be noted that students with the
weakest performance in French were excluded from this sample because the focus of our
study was the Higher Level examination. This part of the examination, which is taken in
addition to Basic Level, gives students access to the highest grades. Students in the
sample obtained pass grades ranging from Grade A (the highest) to Grade E. No students
with Grades F and G were included.

Two teachers, one female and one male, are involved in the conduct of the interviews.
Neither is a native speaker of French; both are native speakers of British English who
have learned French as a foreign language and have a degree in Modern Languages.

As a condition of using the school’s tapes we promised that the identity of the school,
teachers, and students would not be revealed. We have therefore used pseudonyms for
these. In addition, we have changed the names of all locations mentioned on the tapes, as
well as names of sports teams, and exchange schools in France and Germany. Francine
Chambers, who is a native speaker of French, transcribed the recordings and subsequently
checked the transcripts edited and coded by Brian Richards. Fiona Richards did the final

The following points should be noted:
   1. In transcribing the French language we have followed the CHILDES manual (sections
       4.5.14 and 27.4.1) in dealing with apostrophes and hyphens: apostrophes are
       followed by a space (l’ aim, c’ est); hyphens in compounds are replaced by a plus
       sign (le week+end); dashes between words (est-ce que) are replaced by spaces (est
       ce que).
   2. It is difficult to draw a line between an English accent and a pronunciation error;
       because an assessment criterion of the GCSE examination is whether an utterance
       would be comprehensible to a “sympathetic native speaker,” only those student
       errors that were serious enough to cause a breakdown of communication, or
       which were followed by a teacher correction, were coded. These were transcribed
       in UNIBET on the %err tier.
   3. Some students answer questions in English or insert English words. Where the
       whole utterance is in English, a separate speaker tier for the student (*STE) has
       been created. English words inserted in French are marked with the @e suffix
       (father@e). Students who are also learning German sometimes use German words.
       These are marked with a @g suffix. Both the @e and @g symbols are contained
       in the 00DEPADD file.
   4. Other additions to the 00DEPADD file are: +//? (self-interruption of a question)
       and +..? (question tailing off).
   5. Acknowledgment tokens have been coded as back channels and are marked
       [+ bch]. These can be excluded from MLU and MLT counts using the -s"[+ bch]" switch.
   6. The exclamations and interactional markers used are: “aah,” “euh,” “mm,” and
       “um.” To omit these from analyses they can be placed in an exclude file.
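
Points 5 and 6 describe exclusions that CLAN handles through its own switches and exclude files. Purely as an illustration of the same logic outside CLAN, the Python sketch below skips back-channel utterances and removes the listed interactional markers before counting words. The names FILLERS, BACK_CHANNEL and countable_words are hypothetical, and whitespace tokenisation is assumed.

```python
# Exclamations and interactional markers listed in point 6.
FILLERS = {"aah", "euh", "mm", "um"}
# Postcode used for acknowledgment tokens (point 5).
BACK_CHANNEL = "[+ bch]"


def countable_words(utterance):
    """Return the words that would count towards an utterance-length measure."""
    if BACK_CHANNEL in utterance:
        # Whole utterance is a back channel, so nothing is counted.
        return []
    return [word for word in utterance.split() if word.lower() not in FILLERS]


print(countable_words("euh j' aime le week+end"))
print(countable_words("mm [+ bch]"))
```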
List of Files
In the table below, the fourth column shows the combined total of points obtained by
each student for the tests in Speaking, Listening, Reading, and Writing in the GCSE
examination. A maximum of 7 points is awarded for each of these 4 skills, giving a
possible total of 28 points. The fifth column shows the score for the whole oral test,
including the interview and role-plays.

  Table 1:      Recordings and GCSE Scores

                 File       Sex      Teacher   GCSE Points Oral Test
                 W01.cha    male     male      19          4
                 W02.cha    male     male      17          3
                 W03.cha    female   female    16          2
                 W04.cha    female   female    11          3
                 W05.cha    female   male      16          4
                 W06.cha    male     male      19          4
                 W07.cha    male     male      18          4
                 W08.cha    female   male      22          5
                 W09.cha    male     male      15          4
                 W10.cha    female   female    14          3
                 W11.cha    female   male      20          4
                 W12.cha    male     male      17          4
                 W13.cha    male     female    12          2
                 W14.cha    male     male      12          3
                 W15.cha    male     male      19          4
                 W16.cha    female   female    11          2
                 W17.cha    male     female    16          4
                 W18.cha    female   male      23          6
                 W19.cha    male     male      23          6
                 W20.cha    male     female    12          2
                 W21.cha    male     male      19          5
                 W22.cha    female   female    10          2
                 W23.cha    female   male      20          4
                 W24.cha    male     male      17          4
                 W25.cha    female   male      21          5
                 W26.cha    female   male      14          4
                 W27.cha    female   male      21          5
                 W28.cha    female   male      21          5
                 W29.cha    male     male      21          4
                 W30.cha    female   female    16          3
                 W31.cha    male     male      24          6
                 W32.cha    female   male      25          6
                 W33.cha    male     male      8           7
                 W34.cha    female   male      26          6
Publications using these data should cite:

Chambers, F., & Richards, B. J. (1995). The “free conversation” and the assessment of
   oral proficiency. Language Learning, 11, 6–10.
   14.         SPLLOC (English-Spanish)
Laura Dominguez
University of Southampton

SPLLOC is a corpus of L2 Spanish that has been collected by a team of
researchers at Southampton, Newcastle, and York universities, sponsored by an ESRC
research grant award (2006-2008). The data are also freely available in anonymised form
through the project website for
use by other second language acquisition researchers.

The L2 oral Spanish data have been collected from classroom learners in schools and
universities in England, using a series of specially designed elicitation tasks, including
storytelling, picture description, discussion and individual interview. There were 20
learners at each of 3 levels: beginners (Year 9 students aged 13-14), intermediate students
(A2 students aged 17-18), and fourth year undergraduates. All of them were native
English speakers. Depending on their level, each learner was audiorecorded undertaking
between 3 and 5 oral tasks. They also completed computer-based and paper-based tasks
that provided complementary data on aspects of their Spanish knowledge. For
comparison purposes, small numbers of native speakers were also recorded undertaking
the same tasks. The resulting database contains 290 digital soundfiles (240 learner
recordings, 50 native speaker recordings) that are accompanied by transcripts in
CHILDES format. Some files also have an extra layer of tagging which identifies parts of
   15.         TCD (English-French)
Seán Devitt, F.T.C.D
Senior Lecturer in Education
School of Education
University of Dublin, Trinity College
Dublin 2, Ireland

This project, designed and implemented by Seán Devitt, School of Education, Trinity
College, Dublin, set out to track the development of the means of expressing temporality
by children learning French as a second language in France. The subjects were five
children, aged between eight and twelve, of three different nationalities (Irish, Polish and
Cambodian), who were in primary school in Paris in the early part of 1982. The data
presented here were gathered over a five-month period from March 31 to September 6
1982, during a sabbatical term and a summer holiday.

The five-month stay of the researcher and his family in France was funded by two grants,
one from the National Board for Science and Technology (now Enterprise Ireland) and
one from the Ministère des Affaires Étrangères of the French Government, organized by
the Service Culturel de l'Ambassade de France in Dublin. The French Government,
through the Ministère de l'Education Nationale, provided further support by arranging for
the researcher’s three children to attend school in Paris. The school picked by the
Ministère for Marie and Ann was Ecole rue de la Plaine in the 20th arrondissement of
Paris. [Their older brother, Séamus, was admitted to the nearby Lycée Hélène Boucher.]
The Ministère also helped in locating the other three subjects in nearby schools.

The two Irish subjects, Marie and Ann, were aged 11 and 8. They were the researcher's
daughters, and had been to France twice prior to 1982 for holidays. On one of these
occasions (July-August 1980) they had spent three of the five weeks of their holiday in a
Centre Aéré, a type of holiday camp, which is described below. Neither had studied
French at school and their exposure to French had been minimal apart from on these
visits to France. Their stay in France was planned to be of five months' duration. After
that they were to return to Ireland.

On March 31 the family (parents and three children) arrived in Paris to find that the
apartment they had booked was quite unsatisfactory. Ten days were spent in looking for
proper accommodation. A small apartment was eventually located but did not become
available until April 23. The intervening two weeks were spent with English-speaking
friends in Hermonville, a village some 9 kilometers from Reims. Marie and Ann were
allowed to attend the village school for one of these weeks; the other week coincided
with the Easter holidays.

The language spoken at home was normally English. Contact with French was, therefore,
confined mainly to the school in the first three months. However, further opportunities
for contact were provided by television in the evenings and at weekends, by visits to
friends, and by visits of friends to the apartment. There was one longer visit of three days
(without their parents) to friends in Reims.

The third subject, PPM, was a twelve-year-old Polish boy, an only child. His father had
come to France in 1978 to find work as a plumber; PPM and his mother had remained in
Poland. In October 1981 they came to Paris to visit the father for a few weeks. While
they were there, martial law was declared in Poland and they were unable to return
immediately. By September 1982 (the end of the research period) PPM seemed to have
accepted that he would be staying in France; his mother had not. In Poland PPM would
have been in the first year of secondary school.

PPM had absolutely no knowledge of French before coming to France. Neither had his
mother. Since she presumed that she was to return to Poland at the first possible
opportunity, she did not set about learning it. The family lived in an apartment in an inner
suburb of Paris. The language spoken at home was invariably Polish. At the time of the
recordings PPM had not made friends with French children. At weekends he would go to
the Bois de Vincennes with his father to play ball. He had one Polish friend who had been
in France since he was seven and spoke French fluently. Otherwise he had little contact
outside of school with native French speakers. In school his contact with native French
children also seemed limited.

The fourth and fifth subjects, PCF and CCM, were two Cambodian children, sister and
brother. PCF was nine years old, the youngest of ten children. Her brother, CCM, was
twelve. Some time in 1980 both had fled Cambodia with their parents and three other
siblings. Before that they had attended school but under very difficult conditions, often
having to spend a large part of the day working in the fields. The family spent some
months in refugee camps in Thailand before arriving in France in January 1981. They
stayed a few months with an older brother who had come to Paris some years previously
and had married a French woman. The family then moved to their own apartment.

Neither child had had any contact with French before coming to France. While they were
staying with their brother, he and his wife were an important source of support for
learning French. At home the language spoken was generally Cambodian, with some
Chinese. When their sister-in-law visited the home, or when they visited her home, she
spoke French with them.

Schooling for the five subjects

From April 23, three weeks after their arrival, until the end of June, Marie and Ann
attended the local primary school in their area. Marie was in CM2, Ann in CE2. They
received no special treatment in the form of a special class for foreigners, but were fully
integrated into their classes from the first day. This was specifically requested when
applying for permission for them to attend school in France.

In January 1982, three months after his arrival in France, PPM began to attend the local
primary school, L'Ecole X in V. The school had a special language programme for
foreigners like PPM, involving several hours of French tuition per day. As the children
were felt to be able for it, they were permitted to attend lessons in other subjects, usually
in a class of children a little younger than themselves. PPM was taking this programme at
the time of the first recording in late May. He had just begun to attend Mathematics
lessons in a mainstream class (CM1 for 10-year-olds). He had been in France for seven or
eight months at the time of the first recording.

In April 1981, three months after their arrival in France, PCF and CCM went to a school
in Z, an inner suburb of Paris, where they followed a special programme in French for
foreign children. In February 1982 they changed to Ecole B near their apartment. Here
they were fully integrated into the school, PCF in CM1, CCM in CM2. Both had close
French friends. According to their teachers, both children were very bright and were
performing very well in class. They had been in France for a year and a half at the time of
the first recording.

In France the school day lasts from 8.30 am to 4.30 pm, with a two-hour break for lunch
and two other shorter breaks. Children are free to go home for lunch or to have it in the
cantine. Marie, Ann, and the Polish boy stayed in school for lunch. The two Cambodian
children went home. There was also the option of remaining in school from 4.30 to 6.00
for supervised study, preceded by a short break. Marie and Ann remained for the
supervised study until 6.00.

In the second half of the five-month period, when schools were closed, the two Irish
children were allowed by the Mairie de Paris to attend a Centre Aéré. [A Centre Aéré is a
type of holiday camp that French municipal authorities organize during the summer
months for children up to the age of 16. These centres are usually located in nearby
forests and children are transported there and back by bus from various collection points.
They are very carefully organized and supervised, providing a wide range of physical
activities (football, horse-riding, swimming, etc.), activities to develop manual skills
(macramé, model making, etc.), nature walks, etc. Children are assigned each day (in
groups of about seven) to specially trained moniteurs/monitrices.] From the beginning of
July to the end of August (with a break of ten days at the beginning of August) Marie and
Ann went daily to a Centre Aéré. They had to be at the meeting point by 8.30 am each
morning. Shortly afterwards they were taken by bus to the Centre Aéré. They returned
late in the afternoon and were met by their parents between 6.00 and 6.15. They did not
go to the Centre Aéré for the first ten days of August, because they were unhappy that
their friends were not staying on for August and that they would have a different set of

PPM spent July in Paris, with only minimal contact with French speakers. He spent
August with his parents on the Mediterranean. For CCM and PCF, the two months of the
holidays were spent in Paris or with relatives in the suburbs. During the holidays their
French friends were away and they had little or no contact with French speakers, except
with their sister-in-law.

Frequency and timing of recordings

Because of the extended settling-in time while accommodation was being sought, it was
over three weeks after arrival before the first recordings could be made with Marie and
Ann. At first every possible opportunity was taken to record them in contact with native
speakers. Once the rhythm of school life was established, certain constraints were
imposed and recordings in the school could be made only at intervals of about a week.
Recordings at home continued to be made as often as the opportunity presented itself.
Once the holidays began (beginning of July), all recordings had to be made in the
evenings at home, since Marie and Ann objected strenuously to the idea of recordings
being made at the Centre Aéré.

In the cases of PPM, CCM and PCF recordings began in late May, since it was some time
after the researcher's arrival before they were located as suitable subjects. The recordings
were made in specially designated rooms in the schools on a set day every week, unless
that day happened to be a holiday, when the recording for that week had to be dropped.
Once the holidays began, recordings took place in the homes of the subjects, but at longer
intervals so as not to intrude too much on family life.

A number of Marie and Ann's recording sessions were totally unstructured. For example,
they simply wore the radio-microphone in the canteen or in the playground. Others were
carefully structured, with the children having a particular task to perform, such as filling
out a family tree for someone else in a group, or preparing with friends for a class outing
to a big store.

This wide range of settings for recordings (both those the children were aware of and
those they were unaware of) might be expected to have provided a rich supply of
linguistic output. It did not. Marie and Ann interacted very naturally in many of these
settings with very little or no language. For example, Ann and her friend were filmed
playing with dolls for over two hours during which very little was said. A game of
elastique involving three or four children produced almost no language at all. On other
occasions the structured interactions produced many instances of the same basic
structures. For example, in the session filling out the family tree the question “Comment
s'appelle...” kept recurring. In the preparation for the class outing to the big store, Marie
was not inclined to intervene as the other children became totally taken up in the activity.
These early recordings yielded very sparse data and have generally been disregarded.

For this reason it was decided to fall back on interview-type settings for most of the
remaining recordings, since these seemed to produce much more data. In the case of
PPM, CCM and PCF, this was the solution adopted from the beginning because of the
limited access (about one hour per week). While school lasted, native-speaker peers were
used for the interviews that took place in specially designated rooms in the schools. The
native speakers were given general indications to follow, such as to share information
about how the previous weekend had been spent, or to find out how the subjects had
come to France, or to have them compare their native countries with France.

These interviews with native speaker peers were more or less successful depending on
the person involved. In some cases the native speakers (especially those of about nine
years of age) simply "ran out of steam" and had nothing further to say. Alternatively they
jumped from one topic to another. In general, however, the interview-type setting, in
spite of its limitations, provided the subjects with the opportunity of using French in a
wide variety of discourse types. On certain occasions, and especially during holiday time
when native speaker peers were not available and recordings had to be made in the home,
the researcher conducted the interviews.

There are twelve recordings each of Marie and Ann on their own, and a further four
where they were recorded together. Overall frequency was once every ten to twelve days.
There are eight recordings of PPM, the first five at weekly intervals, and the remaining
three at three to four week intervals. There are eight recordings of PCF, and two of CCM
on their own, and a further two where they were recorded together. Frequency was
similar to that of PPM. The date of each recording is coded through the three digits at the
end of the file name. Thus, 615 means the 15th of June.
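
The date coding described above can be sketched in a few lines of Python. This is only an illustration, not part of the corpus tooling: it assumes that the three digits are one digit for the month followed by two for the day (consistent with 615 meaning the 15th of June), that the year is always 1982, and that any file extension follows the digits. The function name `recording_date` and the file name `marie615.cha` are hypothetical.

```python
import re
from datetime import date
from pathlib import PurePath


def recording_date(file_name: str, year: int = 1982) -> date:
    """Decode the recording date from the last three digits of a file name.

    Assumed layout: one digit for the month, two digits for the day,
    e.g. "615" -> 15 June. The year defaults to 1982, the year in
    which all recordings in this corpus were made.
    """
    # Drop any extension so the digits sit at the end of the stem.
    stem = PurePath(file_name).stem
    match = re.search(r"(\d)(\d{2})$", stem)
    if match is None:
        raise ValueError(f"no three-digit date code at the end of {file_name!r}")
    month, day = int(match.group(1)), int(match.group(2))
    # date() itself rejects impossible combinations such as month 0 or day 32.
    return date(year, month, day)


# Hypothetical file name used purely for illustration.
print(recording_date("marie615.cha").isoformat())
```

Because the recordings run from late April to early September, a single digit is enough for the month; the sketch would need adjusting for any corpus that spans October to December.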


The research was facilitated in every possible way by the principals and teachers in the
three schools concerned. Rooms were made available for recording, arrangements were
made for the subjects to leave their classes, and French children were recruited to take
part in the interviews. On occasion the teachers in rue de la Plaine allowed cameras and
video-recorders into their classrooms for whole class recordings. I would like to thank the
headmaster of Ecole de la Plaine, M. Watier, and the teachers of Marie in CM2, M.
Rubelli and Mme Dutot, and of Ann in CE2, Mlle. Schmidt, for the way they welcomed
and looked after our children. To my wife Ann I owe an enormous debt of gratitude for
her support and encouragement for the project right from the beginning. Without her
constant encouragement it would never have reached this stage.

Above all, I must thank the children who took part in the project: the native children who
so readily agreed to act as interviewers, but especially the five subjects who were
prepared to participate so readily and so fully over the whole period of the project.
Without them there would have been nothing. It is to them, Marie, Ann, PPM, CCM and
PCF, that this body of data is dedicated in a very special way.
