Appendix B: The SignPhon database

B.1 Introduction

This appendix describes the SignPhon database project. Parts of this appendix have also appeared in the SignPhon manual (Crasborn et al. 1998) and in a recent article (Crasborn et al. to appear); it also appeared in Crasborn (2001). We start out by describing the goals and history of the project (B.2-3). The structure of the database and the information that can be stored in it are discussed in sections B.4-5; a more elaborate discussion of all the database fields can be found in the manual. Both the software and the manual are freely available from the SignPhon website (http://www.leidenuniv.nl/hil/sign-lang/signphon2.html). B.6 describes the (selection of) data that were transcribed for the purposes of this thesis. Finally, B.7 discusses our experiences in using the database for phonological research, including several drawbacks of the project and areas for future improvement.

The SignPhon project has been financially supported by two grants from the Dutch Organization of Scientific Research (NWO), project numbers 300-75-009 and 300-98-031. We also received a grant for equipment from the Gratama Foundation. Finally, the Faculty of Arts of Leiden University gave us financial support for both equipment and research assistants. The help of all of these institutions is gratefully acknowledged. We wish to thank Marja Blees, Corine den Besten, Henk Havik, Alinda Höfer, Karin Kok and Annie Ravensbergen for collecting, digitizing and coding data, and Rob Goedemans and Jos Pacilly for technical assistance. In designing the structure of the database we have discussed various linguistic issues with Diane Brentari and Wendy Sandler. With support from NWO, in 1996 we organized a workshop in which we discussed the structure of SignPhon with a number of sign language researchers.
We wish to thank Jean Ann, Diane Brentari, Thomas Hanke, Bob Johnson, Scott Liddell, Chris Miller, Elena Radutzky, Wendy Sandler, Leena Savolainen, Trude Schermer and Ronnie Wilbur for their useful comments and criticism during this workshop.

B.2 Goals

An analysis of the phonology of a language starts at the lowest level: the level of morphemes. We must establish what the set of distinctive units is from which morphemes are constructed. This is a purpose in its own right, leading to the description of lexical forms, but it is also a necessary step before we can investigate the phonology at higher levels, i.e., at the level of complex words and phrases. An investigation into the phonological structure of any language requires insight into the array of phonetic properties and into the role that these properties play in distinguishing the morphemes of the language. Phonological research generally uses three kinds of data sources:

1. dictionaries
2. native user judgments
3. transcriptions of videotaped materials

Dictionaries generally provide illustrations of one prototypical token of each sign. Sometimes, they also offer a transcription of that one token, and information about regional or phonetic variation. Since most SLN dictionaries that have been made so far list the core lexicon of the sign language, consisting of highly frequent items, they provide useful data on the frequency of phonetic characteristics as well. This information is typically hard to access, since even on CD-ROMs, search facilities tend to be limited and are not always linked to an elaborate transcription (e.g., GLOS 1999). Native signer judgments can be used to confirm the realization of a sign found in a dictionary, to distinguish between prototypical and odd realizations of signs, to test the phonological status of non-existing signs, etc. In collecting new material on videotape, some kind of transcription is necessary to categorize what is found.
When we started working on the phonological structure of SLN, we felt that detailed insight into the phonetic and phonological characteristics of the language was not sufficiently available and was not easy to obtain by inspecting the data that had been published in a number of small dictionaries. We therefore decided to build a database, called ‘SignPhon’.

The primary goal of SignPhon is to be a tool for research into the phonetic and phonological structure of sign languages. It is designed to store information about the phonetic and phonological structure of isolated signs. It does not easily allow for transcription of sentences in the way that the SignStream software does (see MacLaughlin, Neidle & Lee 1996, Neidle & MacLaughlin 1998). The information on the phonetic-phonological description of signs is divided over 65 fields. In addition, elementary semantic and morphological information as well as information on the signer and the source of the transcription can be stored in altogether 69 fields. By storing detailed information about a relatively large collection of signs, we enable ourselves (among other things) to establish the inventory of phonetic properties, to decide which of these are distinctive, and to make generalizations on combinations of these properties.

B.3 Design history

We were aware that we would probably not be the first to undertake such a project. After consulting several colleagues we learned that although there are (or have been) quite a number of database projects, most of these have a more general lexicographic goal, implying that the phonological and/or phonetic encoding is usually rather limited. We therefore decided to start from scratch, relying as much as possible on insights into the structure of signs that are available in the theoretical literature on phonetic and phonological features and in encoding systems.
The latter mainly consisted of notation systems, such as KOMVA (NSDSK 1988, which is based on Stokoe notation; Stokoe 1960) and HamNoSys (Prillwitz et al. 1989).

We started designing a database structure in which every record specifies information on signs in isolation. The coding system (which is briefly outlined in B.5) allows us to encode relatively well-known properties of signs, such as movement shape and major location. However, we deliberately chose to also encode finer phonetic detail from various perspectives, using both familiar and new terminology. The goal of including more detailed and redundant information than most notation systems was to facilitate its use by researchers of various theoretical persuasions, as well as by more practically oriented linguists. Among other things, this implies that no systematic choice has been made for either a perceptually or an articulatorily based encoding: some properties are easier to encode perceptually, others are more tractable when we consider the production side. In many cases, both perspectives were included.

During the second half of 1995 we designed the record structure, which was implemented by Onno Crasborn and Rob Goedemans in early 1996, using the software package ‘4D First’. After that, we started entering the first 250 signs into the database. We expected that our first experience with storing data would lead to modification of the record structure and to fine-tuning the values necessary for each field. At the sixth conference on Theoretical Issues in Sign Language Research (TISLR6) in Montreal in 1996, we organized a special meeting with the aim of improving the current state of the database. Various specialists on databases and/or the phonology of sign languages were present at this meeting. Specific questions addressed concerned the content of the database (what are the signs that we should best transcribe?) and the way temporal sequencing in signs could be stored.
We then designed a second version of SignPhon, which was implemented by Onno Crasborn in 1998 using the ‘4th Dimension’ software. Version 2 of SignPhon has been set up as a relational database (a set of connected databases; this is illustrated in Figure B.1), whereas version 1 was a single (‘flat’) database. As for the content, version 2 differs from version 1 with respect to several fields and their codes. In addition, version 2 allows us to encode various stages in time of a sign separately, thus better capturing the dynamic nature of signs. The choice of the number of stages is independent of a phonological analysis in terms of holds and movements, for example (Liddell & Johnson 1984).

SignPhon has not been designed as a language-specific system. Parts of the database have been used in the PhD project of Lodenir Karnopp, who did a study of the phonological acquisition of LIBRAS (Brazilian Sign Language; Karnopp 1999). Victoria Nyst used parts of the database for a first analysis of Uganda Sign Language handshapes (Nyst 1999). The database is presently used for a phonological analysis of Flemish Belgian Sign Language. In principle SignPhon can be used for any sign language. Expanding the database with data from other sign languages would make it a highly powerful tool for crosslinguistic investigations. Using a uniform computerized coding system will make it possible to easily compare data from various languages and exchange data with other researchers.

SignPhon differs from transcription systems like HamNoSys not only in including non-phonetic information (about the morphology, semantics, and background of the signer), but also in being more detailed and more redundant. Orientation of the articulator, for example, is encoded in great detail (both ‘absolute’ and ‘relative’; Crasborn & Van der Kooij 1997) and in addition to specifications of point of contact.
There is also redundancy in the aim to encode both perceptual and articulatory categories. Another difference with transcription systems is that SignPhon allows the user to add categories (or rather values in the fields) without the intervention of the designer of the database, and without the need to design new notation symbols. Most fields accept any (combination of) alphanumeric character(s), and in our experience this has been especially useful, for instance in adding handshape distinctions that we did not foresee.

B.4 Structure of the database

The first version of the database had a so-called ‘flat structure’. The program contained a number of fields (columns), which concerned different descriptive categories (gloss, translation, handshape, movement, etc.). Each sign formed one record (row) with values for all of the fields. This design was very inflexible: it was not easy to describe multiple segments in the sign (e.g., an initial and a final orientation in an orientation change), or to encode that one sign occurred on multiple videotapes, etc.

The new structure was designed with the goal of being more flexible. Especially in the area of the phonetic description in multiple stages in time, we wanted to be able to describe more than just very simple signs, and to avoid gross abstractions enforced by the program. Below we present the new structure of SignPhon in the form of a diagram. It is a relational database design: the information is distributed over different subdatabases, and there need not be a one-to-one correspondence between them. Arrows in the diagram point from ‘one’ to ‘many’: one language has many signs, one sign has multiple sequential stages of phonetic description, one handshape can occur in multiple stages of phonetic description, etc. The ‘articulation’ and ‘compound parts’ parts of the structure have not yet been fully implemented at this time (version 2.22 of the software).
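The one-to-many relations described above can be illustrated with a small sketch. It is not the SignPhon implementation (which uses the ‘4th Dimension’ software); it uses Python's built-in sqlite3 module, and all table names, column names and sample values are hypothetical, chosen only to mirror the relations language → signs → stages of phonetic description, with each stage referring to a handshape by its code.

```python
import sqlite3

# Hypothetical schema: names are illustrative assumptions, not the actual
# SignPhon record structure (see the SignPhon manual for that).
SCHEMA = """
CREATE TABLE languages (language_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE handshapes (handshape_code TEXT PRIMARY KEY, name TEXT);
CREATE TABLE signs (
    sign_id     INTEGER PRIMARY KEY,
    gloss       TEXT,
    language_id INTEGER REFERENCES languages(language_id)
);
CREATE TABLE phonetic_descriptions (
    stage_id         INTEGER PRIMARY KEY,
    sign_id          INTEGER REFERENCES signs(sign_id),
    stage_number     INTEGER,
    handshape_strong TEXT REFERENCES handshapes(handshape_code),
    location_strong  TEXT
);
"""


def build_demo():
    """Create an in-memory database holding one invented sign with two stages."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO languages VALUES (1, 'SLN')")
    conn.execute("INSERT INTO handshapes VALUES ('B', 'B-hand')")
    conn.execute("INSERT INTO signs VALUES (1, 'EXAMPLE', 1)")
    # One sign, two sequential stages, both pointing at the same handshape.
    conn.executemany(
        "INSERT INTO phonetic_descriptions VALUES (?, ?, ?, ?, ?)",
        [(1, 1, 1, "B", "chest"), (2, 1, 2, "B", "neutral space")],
    )
    return conn


def stages_for(conn, gloss):
    """Return (stage number, strong handshape, strong location) per stage of a sign."""
    rows = conn.execute(
        "SELECT p.stage_number, p.handshape_strong, p.location_strong "
        "FROM signs AS s "
        "JOIN phonetic_descriptions AS p ON p.sign_id = s.sign_id "
        "WHERE s.gloss = ? ORDER BY p.stage_number",
        (gloss,),
    )
    return rows.fetchall()
```

In this sketch, `stages_for(build_demo(), "EXAMPLE")` returns one row per stage, which is the kind of multi-stage description that the flat structure of version 1 could not hold in a single record.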
Figure B.1 Macro structure of SignPhon. [Diagram not reproduced. Its boxes are: Languages (information about the language; 3 fields); Compound Parts (constituent parts of compound signs); Articulation (articulatory description in terms of joint states); Signs (semantic and morphological information); Sources (occurrences of the sign on video tapes, in books, etc.); Phonetic Description (phonetic and phonological transcription); Signers (information about the signer); Handshapes (detailed transcription of handshapes).]

In the description of the record structure below it will become clear that not all fields need to be filled for all signs. In addition, some fields are filled only once for all signs, like the fields in the handshape subdatabase. Generally it takes about 20 minutes to encode one sign. The present number of encoded SLN signs is a little over 3,000 (September 2001).

The central unit for usage is the Signs subdatabase. Most frequently, the database will be used to answer questions like ‘how many signs are there which have property X?’. One sign can be found in multiple places (on different video tapes, in different pictures and drawings in books, etc.). Each occurrence can be registered in the Sources subdatabase. Each signer that is identified in the source of a sign is characterized in the Signers subdatabase. The language in which the sign occurs can be specified in the Languages subdatabase.

One sign can be described in multiple phases, i.e., stages in time. We call these stages of Phonetic Description. In SignPhon, as many stages are used as are necessary to completely describe the sign. This is independent of the phonological analysis of the sign: a stage of phonetic description does not correspond to a hold segment in the model of Liddell & Johnson (e.g., 1986, 1989) or to an X slot in the Leiden model (e.g., Van der Hulst 1993, 1995b).
Thus, SignPhon transcription is similar to IPA transcription of speech sounds: it will take some effort to use the system independently of and unbiased by one’s own phonological analysis. More importantly, conventions and agreement will have to be established by intensive use of the system. (Even so, as in spoken language transcription, it is unlikely that perfect agreement between transcribers can ever be achieved, despite explicit guidelines and intensive training; cf. Vieregge 1985.)

Handshapes are described in detail in a separate part of the database, as it would be very inconvenient to have to describe each sign with a B-hand in terms of its selected fingers and their exact position.

Future developments include the addition of the Articulation part, allowing transcription of each degree of freedom of each joint in the arm. The articulation of handshapes, i.e., the state of the joints of the fingers, can already be transcribed in the handshapes section. Further, to enter information on different parts of compounds, a separate subdatabase Compound Parts will be created.

B.5 Description of the fields

In this section, all the fields are listed from the most important subdatabases. The compound parts and articulation databases have not yet been fully implemented; the languages subdatabase only contains a few fields such as the number of signers, and is only useful if several languages are transcribed in the same data file. Below the most important fields of the operational subdatabases are listed with a few brief comments, in order to provide an idea of the amount of detail that is stored for each sign. As was remarked above, not all fields are relevant to all signs.

B.5.1 General fields

The fields mentioned here do not constitute a separate subdatabase in the diagram. They surface in most subdatabases and contain administrative information, keeping track of who entered what and when.
Every subdatabase also contains a REMARKS field, where the user can store an unlimited number of comments.

ENTRY DATE
ENTERER NAME
LAST MODIFICATION DATE
MODIFIER NAME
REMARKS

B.5.2 Signs

This subdatabase stores elementary semantic and morphological information.

GLOSS LANGUAGE NUMBER NUMBER OF STAGES DUTCH TRANSLATION ENGLISH TRANSLATION SEMANTIC FIELD MOTIVATEDNESS MORPHOLOGY, WORD TYPE BORROWING NUMBER OF PARTS SEQUENTIAL COMPOUND COMPOUND PART[1-5]

B.5.3 Phonetic description

The phonetic description subdatabase is the most important subdatabase, because here we encode in great detail the shape of the signs, divided over 46 fields. Most field names speak for themselves (if one is familiar with analyzing the formational structure of signs). Some fields with administrative information have been left out of the list below.

ARTICULATOR NUMBER RELATION STATIC RELATION DYNAMIC ALTERNATING HANDSHAPE STRONG HANDSHAPE WEAK HANDSHAPE CHANGE PALM ORIENTATION STRONG PALM ORIENTATION WEAK FINGER ORIENTATION STRONG FINGER ORIENTATION WEAK ORIENTATION CHANGE LEADING EDGE LOCATION TYPE PLANE LOCATION STRONG LOCATION WEAK LOCATION SYMMETRY CONTACT MOMENT CONTACT PLACE MONO/BI DIRECTIONAL PATH ON PATH PATH SHAPE MOVEMENT DIRECTION MOVEMENT DIRECTION WEAK SEQUENTIALITY REPETITION SPEED INTENSITY SIZE SIGNING SPACE FACIAL EXPRESSIONS FACIAL MOVEMENT ORAL COMPONENT POSTURE HEAD POSTURE BODY

B.5.4 Handshapes

This subdatabase allows the transcription of all the attested handshapes. The handshapes that are used in every individual sign are encoded in the phonetic description of that sign in terms of a code (HANDSHAPE CODE) which refers to this handshape database. The abbreviations MCP, IP, PIP, and DIP refer to the different joints in the fingers and thumb (see Appendix B); these fields include an articulatory description of each handshape. The articulatory information for the rest of the articulator can be entered in the articulation subdatabase.
PICTURE HANDSHAPE CODE NAME OTHER NAMES CLASSIFIER CLASSIFIER MEANING HANDSHAPE REMARKS HAND ALPHABET SELECTED FINGERS SHAPE SELECTED FINGERS SHAPE UNSELECTED FINGERS SPREADING SELECTED FINGERS SPREADING UNSELECTED FINGERS CROSSING THUMB INDEX MCP INDEX PIP INDEX DIP MIDDLE MCP MIDDLE PIP MIDDLE DIP RING MCP RING PIP RING DIP PINKY MCP PINKY PIP PINKY DIP THUMB MCP THUMB IP CONTACT PART POINT OF CONTACT TENSE

B.5.5 Sources

One can distinguish between signs from different sources, although currently for our own research most of them are recorded especially for the SignPhon project. Other sources might be dictionary CD-ROMs, for example.

SOURCE SIGNER DIALECT SWITCH REGISTER CONTEXT COLLECTION

B.5.6 Signers

In the Signers subdatabase, we keep track of as much information about the informants as might be relevant.

SIGNER CODE FIRST NAME SURNAME SEX HANDEDNESS BIRTH YEAR HEARING GRANDPARENTS PARENTS SIBLINGS CHILDREN PARTNER FRIENDS FIRST LANGUAGE SECOND LANGUAGE AGE OF ACQUISITION SCHOOL REGION CURRENT RESIDENCE

A detailed description of the database structure and of the implementation can be found in the SignPhon Manual which is available on the SignPhon web site (http://www.leidenuniv.nl/hil/sign-lang/signphon2.html).

B.6 Data collection

The signs stored in the SignPhon database that were used in this study are all signs in isolation (citation forms). A citation form is the answer to the question: ‘what is the sign for X?’, where ‘X’ was a written Dutch word. The citation form yields a particular prosodic context that may not (often) be found in current signing. However, looking at signs in isolation is a necessary start for a phonological analysis of lexical signs.

The procedure we followed in the recording of the data stored in SignPhon was to offer a written Dutch word to the consultant, who first signed the sign in isolation and then made up a ghost story, situation or sentence containing that sign.
Sometimes the informant did not know of a good sign translation for the written word that was offered; in that case the word was skipped. These signs in isolation and the sentences or stories following them were recorded on videotape. Both the signs in isolation and the sentences have been stored as digitized QuickTime files, but only the former were encoded in the database.[282] The video files were the point of departure for the encoding of the signs in the database. This encoding proved extremely time-consuming. An experienced transcriber needs about 20 minutes to fully describe a single sign. For over two years we benefited from the help of a hearing research assistant who was also involved in the development of the database. Later, two Deaf research assistants pursued the transcription work for several months.

[282] The video files have not been integrated in the SignPhon structure as yet. At this point, the video fragments have to be played back by another application such as QuickTime Player. In the future SignPhon could be linked to a database of digitized video fragments so that for each occurrence in SignPhon (a ‘source’, see below), the actual realization of the sign can be consulted.

The criteria for the set of Dutch words that we used as elicitation material were threefold. Firstly, the set of words should include the set of most frequent (type and/or token) lexemes of SLN. Secondly, no phonological or phonetic criteria could be employed, as one of the goals of the database and of the research project was to look at relative frequency of formal elements. Finally, the words had to originate from different semantic fields, as we acknowledged the fact that in sign languages more than in spoken languages formal elements can be associated with specific semantic fields. These criteria materialized as follows.
Since there was no information on the set of most frequent signs of SLN, and no corpus of (coded) current signing is available to make frequency counts, we started with the signs of one dictionary (KOMVA 1989). We also tried to find out what the criteria were for storing signs in some other dictionaries (e.g., of BSL and ASL), and we realized that translating these dictionaries to Dutch would involve many culture-specific terms. As we did have easy access to frequency counts of words in written Dutch texts, we decided to use the 30,000 most frequent terms of this corpus, the CELEX database (Baayen et al. 1995). We added some items to this list that appear to be specifically frequent in Deaf culture, such as ‘TTY’ (teksttelefoon). We also removed many items from the list, such as grammatical items that do not exist in SLN, because of the difference in grammatical structure. A complete list is given in (B.1).

(B.1) Items that were removed from the list of most-frequent Dutch words

• Words that were already stored in SignPhon
• Words referring to sounds, such as knisperen and tsjilpen
• Archaic or difficult coordination words such as noch... noch, aldus, namelijk
• Local and temporal items starting with er, such as ervoor, erna, erbij
• ‘Difficult’ or specialist terms, such as anemie
• Names, except for country names and city names
• Counting words
• Inflected verbs
• Inflected adjectives
• Diminutives
• Parts of expressions such as: (ter) sprake, (ons) inziens, (als het) ware
• Plurals of count nouns
• Interjections, such as trouwens, he, toch
• Words ending in -heid such as: oneindigheid

B.7 Drawbacks and improvements

In the course of the PhD projects of Els van der Kooij (this thesis) and Onno Crasborn (2001), the main types of queries concerned frequency distributions of formal elements (for instance: handshapes, points of contact, location types), and of combinations of elements (for instance: all handshapes in neutral space or on the head).
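The two query types just mentioned can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are invented, the dictionary keys loosely echo field names from B.5.3 but are not the actual SignPhon identifiers, and the real queries were run inside the database software rather than in Python.

```python
from collections import Counter

# Invented stage records for illustration only; keys and values are assumptions.
STAGES = [
    {"gloss": "SIGN-1", "handshape_strong": "B", "location_type": "head"},
    {"gloss": "SIGN-2", "handshape_strong": "1", "location_type": "neutral space"},
    {"gloss": "SIGN-3", "handshape_strong": "B", "location_type": "neutral space"},
    {"gloss": "SIGN-4", "handshape_strong": "5", "location_type": "body"},
]


def frequency(records, field):
    """Frequency distribution of the values of one field."""
    return Counter(record[field] for record in records)


def handshapes_at(records, location_types):
    """Frequency distribution of strong handshapes, restricted to given location types."""
    return Counter(
        record["handshape_strong"]
        for record in records
        if record["location_type"] in location_types
    )
```

For example, `frequency(STAGES, "handshape_strong")` counts each handshape over all records, while `handshapes_at(STAGES, {"neutral space", "head"})` restricts the count to a combination of elements, in the spirit of ‘all handshapes in neutral space or on the head’.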
The database also turned out to be useful in collecting data for specific investigations. For instance, it can be used to answer the question what proportion of the signs made near the temple are not in the semantic field of ‘mental state or activity’.

A major drawback of the database was its content in relation to the goal and the type of research that we performed with it. All signs that were stored in immense phonetic detail were based on a single phonetic instance articulated by a single signer. Ideally, to carry out phonological analyses, one would want to compare different instances of the same sign, signed by various signers in various contexts. In principle this can be done in SignPhon, even though the need would arise for more sophisticated search facilities than presently implemented: comparing the content of the fields for different signs is a laborious task in the present implementation.

The encoding of signs would also benefit from such search facilities. In the current encoding procedure the encoder has no easy way to check if the sign at hand was already encoded. The database currently warns the encoder if a sign is already present with the same gloss; but as this gloss is just a label for the sign, it might also be the case that the phonetic form itself is already present, but with a different gloss label. At this point it is not possible to automatically compare different records in the database. In the ideal case, one should be able to select a sign and search for similar signs, where the comparison criteria could be chosen from a set of predefined characteristics or composed by the user. Currently one has to do this by searching for a combination of criteria that is featured in the sign that one wants to compare.

To determine the nature and precise content of phonological features the comparison of several instances of a sign is desirable.
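The kind of ‘search for similar signs’ facility wished for above could look roughly like the following sketch. It is not part of SignPhon; the function names, record keys and sample data are all hypothetical. The user picks the comparison criteria (a list of fields), and a record that matches on every chosen field corresponds to the case of the same phonetic form stored under a different gloss.

```python
# Invented records for illustration; keys and values are assumptions.
RECORDS = [
    {"gloss": "GLOSS-A", "handshape_strong": "B",
     "location_strong": "neutral space", "path_shape": "straight"},
    {"gloss": "GLOSS-B", "handshape_strong": "B",
     "location_strong": "neutral space", "path_shape": "straight"},
    {"gloss": "GLOSS-C", "handshape_strong": "5",
     "location_strong": "neutral space", "path_shape": "straight"},
]


def matching_fields(a, b, criteria):
    """Return the chosen fields on which two records have the same value."""
    return [field for field in criteria if a.get(field) == b.get(field)]


def similar_signs(target, records, criteria, min_matches=None):
    """List (gloss, number of matching fields) for records similar to target.

    By default a record must match on all chosen criteria; lowering
    min_matches loosens the comparison.
    """
    if min_matches is None:
        min_matches = len(criteria)
    hits = []
    for record in records:
        if record is target:
            continue  # do not compare the sign with itself
        n = len(matching_fields(target, record, criteria))
        if n >= min_matches:
            hits.append((record["gloss"], n))
    # Best matches first, ties ordered by gloss.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

With criteria `["handshape_strong", "location_strong", "path_shape"]`, a strict search from the first record finds only the record with the identical form, and setting `min_matches=2` also returns the record that differs in handshape alone.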
Given our experience that it takes about 20 minutes to transcribe one sign, it is obvious that building such a corpus of transcriptions in SignPhon would require the investment of a great deal of time and money. Further, when hypotheses about details of a sign’s form become specific and detailed enough to do quantitative research, using transcriptions may not be the best methodology. Actual measurements of the articulation of many different tokens then become a better strategy.

A more fundamental problem also becomes apparent during the design of a notation system or a database like SignPhon, which one could refer to as ‘the database paradox’. In order to design the system, and decide on the categories that can be distinguished by the user, one should already have the results of the research that the transcription system is designed for. Every transcription is an act of analysis, and this holds a fortiori for the design of a transcription system. SignPhon is used (among other purposes) to determine the distinctivity vs. predictability of phonetic features in a sign language. However, one can never know in advance what these properties are, and how many values along a certain scale might be ‘used’ in the language. We tried to address this paradox by including most of the distinctions that we found in the literature on ASL and other sign languages, and by not striving towards the elimination of all redundancy. To reduce the time needed to transcribe items, it was also necessary to limit the number of distinctions that can be made in each field. The database is designed so that in most fields, the user can add values that appear to occur in the language that is being transcribed.
