Lexical Frequency in Sign Languages

Author: Lorena Garrison
Journal of Deaf Studies and Deaf Education

Lexical Frequency in Sign Languages Trevor Johnston* Macquarie University Received April 3, 2011; revisions received June 13, 2011; accepted June 16, 2011

Measures of lexical frequency presuppose the existence of corpora, but true machine-readable corpora of sign languages (SLs) are only now being created. Lexical frequency ratings for SLs are needed because there has been a heavy reliance on the interpretation of results of psycholinguistic and neurolinguistic experiments in the SL research literature; yet these experiments have been conducted without the benefit of such measures. In addition, measures of lexical frequency can guide SL teachers by identifying which signs should be prioritized in early language instruction. I begin with a discussion of lexicalization and sign types in order to explain what constitutes a lexical sign in SLs. I then present the annotation method and results. In the discussion, I raise the potential limitations of previous studies of lexical frequency in terms of the discrimination of lexical signs from other kinds of signs, consistent lemma glossing, part-of-speech tagging, and the systematic treatment of depicting signs. I conclude by cautioning that descriptions of SL grammars that do not accommodate typical mixtures and sequences of signs as shown in data are likely to be unreliable.

*Correspondence should be sent to Trevor Johnston, Department of Linguistics, Macquarie University, Balaclava Road, North Ryde, Sydney 2109, New South Wales, Australia (e-mail: [email protected]).

The study of sign languages (SLs) is important for identifying properties common to all human languages, a central concern of linguistics. Because data from SLs are crucial in establishing the validity of generalizations about human language, which have been based solely on spoken languages (SpLs), it is valid to ask just how reliable the SL data may be. Data used to inform linguistic theory and language description are derived from three main sources: (a) the study of the natural SpL (and the written language, when available) of a speech community in typical usage contexts, and by the comparison of languages with each other; (b) the elicitation of meaning, use, and grammatical acceptability judgments from members of a speech
community on the constructions that are posited for language observed in (a); and (c) the results of experimental studies on language production and perception to test directly or indirectly generalizations generated in (a) and (b), as well as to generate additional generalizations about language and its processing in the brain. In the normal course of events, psychologists and neuroscientists enter the scene ‘‘late’’ in the process of language identification and description as they need to draw on observations and facts gleaned from (a) and (b) before designing experiments. It is thus not surprising that the vast bulk of experiments in the language sciences are conducted using well-described, well-documented languages. Recently identified and poorly documented languages are rarely the subject of these studies. On all three fronts, the study of SLs appears to be an exception. With respect to psycholinguistics broadly understood, ever since the early seminal studies of Klima and Bellugi (1979), experimental studies of SL users have been conducted just as much to establish the facts of SLs as to test claims about language structure and use arrived at through linguistic analysis of normal language output or through intuitions and introspection. Experimental studies of this type have made up a significant proportion of the SL linguistics literature.1 With respect to elicitations, intuitions, and introspection, the vast majority of SL grammatical descriptions have been based on the judgments of extremely small sets of native users, rather than the analysis of representative samples of natural language output. (It goes without saying that relatively few SL linguists have themselves been native signers.) Most grammars and teaching materials are written from this perspective and informed by this type of

© The Author 2011. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

doi:10.1093/deafed/enr036 Advance Access publication on August 12, 2011


Journal of Deaf Studies and Deaf Education 17:2 Spring 2012

language research. With respect to the description and analysis of natural language output (recordings of signing or transcriptions thereof), the examples of this within the SL research literature are few and far between. Datasets have been minuscule; corpora, in the strictest sense of the word, have been virtually nonexistent. In this regard, it is important to remember that a modern linguistic corpus is not simply a dataset upon which a particular linguistic analysis has been based. This is not the sense of “corpus” that is being used in this study. The sense of corpus I intend here is that which constitutes the type of language data identified in (a) above: a collection of texts in a given language that is representative, comprehensive, accessible, machine-readable, and searchable (McEnery, Xiao, & Tono, 2006). How this applies to the creation of SL corpora is discussed in detail in Johnston (2010b).

Why Frequency Is Important

Frequency is important in applied linguistics, language description and linguistic theory, and research into the processing of language in the brain. Uncontroversially, in applied linguistics (for example, in language teaching), lexical frequency is used to inform curriculum design so that L2 learners are taught the most commonly used lexical items first, before being exposed to less frequent or rare vocabulary (Conrad, 2005; Nation, 2001). With respect to language description and linguistic theory, frequency is important because it is only through the identification of recurrent patterns of structure (form/meaning pairings) that one can reliably distinguish collocations, formulaic language, and idioms from the more schematic or grammatical constructions of a language, and then determine the degree of centrality or codification of the latter within the grammatical system, including their obligatoriness (Goldberg, 2006; Wray, 2002; Wulff, 2010). These constructions range in size and specificity from individual morphemes or words, through phrases and clauses, to larger discourse units (Croft & Cruse, 2004). In other words, one needs to identify tokens of relevant symbolic units as instances of symbolic types, as appropriate. At the most basic level, one needs to know the composition of the lexicon of

a language. In lexicography, if nothing else, this means that any basic dictionary of a language should prioritize the highest frequency words of a language. Arriving at these frequencies invariably involves the analysis of corpora of some kind. Frequency also helps us track the association of one construction with another, and the changes in both form and meaning that these elements may be undergoing (e.g., processes of language change such as grammaticalization; Bybee, 2010). With respect to the processing of language in the brain, the lexical frequency of test items is important in psycholinguistic research as it has long been known to affect, for example, tasks involving lexical retrieval and recognition, and phonological and grammatical processing. Indeed, it is a commonplace within psycholinguistics that there are frequency effects that must be taken into account when evaluating language processing tasks, that is, common words need to be identified and controlled for because they are recognized more quickly and more accurately than other words. As Morford and MacFarlane (2003, p. 213) state, “[f]requency effects are so ubiquitous in language processing tasks that no peer-reviewed journal would accept a psycholinguistic study of a spoken language for publication if the study didn’t control for word frequency.” Due to the weight that has been given and still is given to psycholinguistic research into SL processing in SL linguistics (not just for comparing SLs with SpLs but also for establishing the lexical or grammatical status of various aspects of SL structure), measures of lexical frequency in SLs are vital for robust research protocols and the correct interpretation of results. For all these reasons, the question of lexical frequency in the field of SL research is relatively important.

Previous Research

Lexical frequency is operationalized by the frequency of occurrence in written or SpL corpora.
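As a minimal sketch of this operationalization, the following Python fragment counts how often each type occurs in a sequence of tokens and ranks the types by frequency. The gloss labels and the helper name `rank_frequencies` are hypothetical and chosen only for illustration; the per-1,000 normalization is an assumption that makes counts from corpora of different sizes easier to compare, not a convention taken from the studies discussed here.

```python
from collections import Counter


def rank_frequencies(tokens):
    """Return (rank, type, count, per_1000) tuples, most frequent first."""
    counts = Counter(tokens)
    total = sum(counts.values())
    ranked = []
    # most_common() sorts by descending count; ties keep first-seen order.
    for rank, (label, count) in enumerate(counts.most_common(), start=1):
        ranked.append((rank, label, count, 1000 * count / total))
    return ranked


# Hypothetical token sequence: each item is one token, labelled with its type.
sample_tokens = ["HAVE", "LOOK", "GOOD", "HAVE", "DEAF", "LOOK", "HAVE"]
table = rank_frequencies(sample_tokens)

for rank, label, count, per_1000 in table:
    print(f"{rank:>2}  {label:<6} {count:>3}  {per_1000:7.1f} per 1,000 tokens")
```

The reliability of such a ranking depends entirely on how consistently tokens have been assigned to types before counting, which is why the glossing conventions matter more than the counting step itself.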
Not surprisingly, there have only been two previous studies of lexical frequency in SLs because no true machine-readable linguistic corpus of any sign language has been created—or at least commenced—until relatively recently (Johnston, 2010b), for example, corpora of Australian, Dutch, Irish, British, German, French, and Swedish SLs. The first study used a very small database of 4,111 sign


tokens taken from transcripts based on commercially available videotapes of American SL (ASL; Morford & MacFarlane, 2003), with examples of narratives and casual and formal signing styles. The second was considerably larger and used a database of 100,000 sign tokens, representing 7,222 sign types, taken from transcripts from the Wellington Corpus of New Zealand SL (NZSL; McKee & Kennedy, 2006). The corpus was purpose-created and included video recordings of committee meetings, conversations, narratives, speeches, and group discussions on a range of topics. Both studies reported that indexical signs were the most frequent kind of sign. Frozen lexical signs accounted for the bulk of the remaining signs, followed by fingerspellings: 6.9%2 (NZSL) compared with 9.4% (ASL). The ASL study coded for a number of kinds of sign, including so-called classifiers3 (signs that are not frozen lexical signs and that are not indexical signs as defined in these studies). They found that the frequency of use of classifiers, as a subcategory, varied considerably according to genre, being most frequent at 17% of the tokens in the narratives and least frequent at 0.9% in the formal texts. They concluded that genre may be a very important consideration when determining lexical frequency of various kinds of sign, such as classifiers, and not just content signs. (The impact of the topic of source texts on lexical frequency in small corpora, signed or spoken, is a well-known and unremarkable fact: the larger the corpus, the less any particular text or genre can skew the count toward topic-specific content vocabulary.) The NZSL study also coded for and included some nonmanual signals and mimed actions (gestures or enactments of some kind), but it made no specific report of their overall frequency in the data. Both studies observed that few high-frequency lexical signs were grammatical (function) signs when compared with English; rather, they are content signs.
Thus, each concluded their SL—ASL and NZSL, respectively—had a high lexical density because there was a higher proportion of content words to function words in their respective SLs than, say, in English or SpLs generally. (Lexical density is a measure of the ratio of content words to function words in specific text types or in a language as a whole as reflected in a representative corpus.) The NZSL study also observed many more verbs in the top

350 most frequent signs when compared with the top 350 words in English. Other previous research has focused on familiarity ratings as a surrogate for frequency counts. Given the absence of corpus-based lexical frequency counts for SLs, and given their importance for valid psycholinguistic research, researchers have attempted to compensate by using subjective familiarity ratings for lexical items in SLs. This exploits an apparent correlation of word familiarity with objective corpus-based frequency counts in English (Balota, Pilotti, & Cortese, 2001). Subjective familiarity ratings have been created for ASL (as cited in Emmorey, 2002, but unpublished), Spanish SL (Carreiras, Gutiérrez-Sigut, Baquero, & Corina, 2008), and British SL (BSL; Vinson, Cormier, Denmark, Schembri, & Vigliocco, 2008). I return to these measures for comparison after presenting the results of this study.

This Study

This article reports the results of a study of sign frequency in Australian SL (Auslan). It aims to provide a basic account of lexical frequency that can serve as a resource for researchers wishing to conduct psycholinguistic experiments with Auslan users, and as a guide for teachers of Auslan in identifying which signs should be prioritized in early language instruction, for both children and adults. This article also aims to describe the makeup of typical texts in Auslan according to general sign type. In this way, I believe that researchers and educators of Auslan, and indeed other SLs, will have a better sense of how meaning is constructed in native SLs. This study differs from the cited ASL and NZSL studies in two important ways. First, it is based on the analysis of a digital video corpus of an SL, using time-aligned glosses and annotations. Second, it uses, as this article will explain and exemplify, a system of glossing conventions for type/token matching that yields more reliable counts of sign types and thus lexical frequency.
Thus, this article also aims to detail the way in which a lexical frequency study of an SL should be conducted.

Background

It would be impossible to conduct an investigation of lexical frequency in an SL without a clear idea of


what constituted a sign. Moreover, one needs to know if the signs produced in a recorded language sample are conventional symbolic units and, if so, if they are all conventional in the same way or to the same degree. Without a clear notion of what constitutes a token and what types they may be instances of, there is no possibility of conducting a rigorous count. However, a comprehensive and detailed discussion of the question of lexicalization in SLs is beyond the scope of this study. Rather, in this section, I simply summarize the main issues as presented in Johnston and Schembri (1999, 2010), Johnston and Ferrara (2010), and Johnston (2010a), insofar as this is needed to understand the terminology that underlies the coding used in extracting the frequency counts presented in this article.

Lexicalization in SLs

The signs uttered when communicating in an SL are not all of the same kind. Unproblematically, from one point of view, the conventionalized units of an SL can be divided into two broad classes, just as they can in SpLs: an open class of content (or lexical) signs/words and a closed class of function (or grammatical) signs/words. Discovering the language-specific conventionalized semantic content of these forms is an empirical question, often requiring detailed textual analysis and fieldwork. Nonetheless, this is a relatively straightforward procedure. (Surprisingly, most SL researchers do not have access to comprehensive lexical databases of the SL or SLs they deal with, as most SLs have not been well documented in this regard.) From another point of view, however, there is a further word- or sign-level distinction that needs to be made for SLs: a distinction between fully lexical signs and partly lexical signs. The existence of this other dimension in which a distinction needs to be made has proved problematic for SL lexicographers and linguists alike.
In this second-order distinction, a fully lexical sign is roughly equivalent to the commonsense notion of word generally used to refer to the conventionalized minimal form/meaning pairings found in a language (i.e., the free morphemes). Of course, a fully lexical sign may be either a content sign or a function sign. Fully lexical signs constitute the listable lexicon of an SL in the strictest sense of the word. Partly lexical signs, in contrast, are signs that, even though conventionalized at the level of the form and meaning of some or all of their parameters, do not have associated with them in any usage event a meaning that is additional to or unpredictable from the value of those components given the context of the usage event (see Table 1). Lexicalization in SLs essentially occurs when a signed unit acquires a clearly identifiable and replicable citation form that is regularly and strongly associated with a meaning that is more specific than the sign’s componential meaning potential, even when cited out of context; cannot be predicted based on these components alone; or is quite unrelated to its componential meaning potential, that is, it may be arbitrary. There remains a third category: nonlexical signs or gestures. In this context, I mean by gesture any intentional communicative bodily act (both manual and nonmanual) with little or minimal conventionalization of meaning and form (though a shared culture tends to regularize many common gestural forms; see Figure 1). Gestures rely on context to be construed as signs and to be correctly interpreted (e.g., that an instance of the articulation in Figure 1 is actually a dismissive gesture, rather than, say, an attempt to disperse some cigarette smoke). Gestures can fulfill a range of functions in SLs and SpLs: They may act as or substitute for a verb or a noun, they may augment or modify the meaning of nouns and verbs, they may modulate and express the mood or attitude of the speaker, and they may regulate the discourse and interaction. If a mimetic enactment or iconic depiction found in an SL text is similar to the type of production typical of hearing nonsigners in the same culture in a similar communicative situation, it is assumed the act is gestural. Of course, the highly conventionalized gestures found in speech communities are not gestures in this sense; they are signs or, more precisely, emblems (Kendon, 2004).
Within the embedded SL-using community, these emblems are indistinguishable from other conventional lexical signs.

Pointing Signs and Depicting Signs

There are two major kinds of partly lexical signs, deictic (indexical) pointing signs and depicting (“classifier”) signs, and there are subkinds of both. Subkinds of pointing signs can function as pronouns, locatives, determiners, and possessives. Subkinds of depicting signs can show the location of an entity, the displacement of an

entity, the size and shape of an entity, the handling of an entity, or, finally, act as a real or metaphoric ground or point of reference for any of the other kinds. Both pointing and depicting signs have type and token characteristics. The type characteristic of points is the handshape. Pointing signs in Auslan are cited with an extended index finger with the tip of the finger directed toward the referent (a fist or flat handshape with the palm directed to the referent for possessives). However, all other aspects of a pointing sign are unspecified and dependent on spatial mappings within the signing space leading to a multitude of forms (tokens). The type characteristics for depicting signs involve handshape and usually orientation, especially for

Table 1 A fully lexical and a partly lexical sign in Auslan, compared

Fully lexical sign
Corpus gloss: VOTE
Fully lexical meaning: As a noun: 1. The choice you make at an election, or at a meeting where decisions are made. English = vote. 2. An organized process in which people vote to choose a person or group of people to hold an official position or to represent them in government. English = election. As a verb: 1. To make your choice in an election or at a meeting, usually by writing on a piece of paper. English = vote. 2. To choose a person to hold an official position or to represent you in government by voting. English = elect.
Partly lexical meaning: “Put something small into a cylindrical container, or any thing or activity associated with this”
Contextual meanings that complete partly lexical meaning: Only if context forces abandonment of default fully lexical meaning and where context motivates and narrows interpretation to: money-box, put coin in money-box; sewing-kit, put something into sewing-kit; pin-cushion, put pin into pin-cushion; drill-bit, crane lowers drill-bit into wellhead; and so on.

Partly lexical sign
Corpus gloss: DSH(F): describe-as-appropriate
Fully lexical meaning: n/a
Partly lexical meaning: “Eat/put-in-mouth something small from a cylindrical container, or any thing or activity associated with this”
Contextual meanings that complete partly lexical meaning: Only where context motivates and narrows interpretation to: popcorn, eat popcorn; nuts, eat nuts; nibbles, nibble; finger food, eat finger food; pin-in-mouth, take pin from pin-cushion and place in between your lips; and so on.


Figure 1 An example of a gesture.

depictions that show the location of an entity, the displacement of an entity, and the size and shape of an entity. In this article, the term “partly lexical signs” refers to both pointing and depicting signs. Within the context of describing lexical frequency in SLs, it should be evident that depicting signs in particular should not be ignored. In this sense, I agree with Morford and MacFarlane (2003, p. 219) that “classifiers are an important lexical resource for ASL, and so a study that did not include classifiers would not be representative of the ASL lexicon.” Clearly, in order to make proper comparisons between the lexicon of Auslan and the lexicon of other SLs or SpLs, one should identify and count all kinds of signs.

Method

The Auslan Archive and the Auslan Corpus

The data were taken from the Auslan Archive (Johnston, 2008a), which consists of approximately 300 hr of digital video recordings of naturalistic signing by 255 deaf participants, edited into approximately 1,100 video clips suitable for detailed transcription and annotation. Less than 30% of these video clips, representing about 5–10% of the total edited hours available in the archive, have to date (March 2011) been annotated (i.e., tokenized and assigned glosses) using ELAN (a multimedia annotation software program). This subset of annotated data constitutes the study corpus (henceforth simply the “Auslan Corpus”). Frequency counts were based on automatically generated statistics within ELAN, the results

of queries and sorts made in ELAN, or glosses exported into database programs, such as Excel, where they were then processed further. A total of 63,436 signs produced in 360 clips by 109 different signers were examined. The texts in these randomly chosen clips ranged from very brief sessions in which participants identified themselves (giving their age and name, name sign and the explanation for their name sign when available, and the basic details of their childhood and schooling), to much longer retells of prepared narratives based on a previously read text, spontaneous narratives produced during free conversation, descriptions of a cartoon viewed on screen, and responses to a face-to-face questionnaire on issues relating to the deaf community.

The Annotation Schema

Signs were annotated according to the principles outlined in Johnston (2008b, 2010b). Basically, the texts were segmented (tokenized) primarily for manual signs, though some nonmanual signs were also identified. These signs were then assigned an identifying label: a unique gloss-based annotation known as an ID gloss. ID glosses regularize type/token distinctions within a signed text. Essentially, this involves distinguishing fully conventional language-specific signs (defined above as fully lexical signs) from other kinds of manual signs such as partly lexical signs (points and depictions) and nonlexical signs (gestures). The Auslan Corpus annotators included deaf and hearing native signers and professional Auslan interpreters. All fully lexical signs are glossed using an ID gloss as assigned in Auslan Signbank (www.auslan.org.au), an Internet-based dictionary that can receive user feedback. This dictionary has been online since 2004 and the lexical database that lies behind it has informed several editions of the Auslan dictionary since the initial print dictionary in 1987. It has been constantly refined and updated since that time based on user and community feedback.
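To make the effect of ID glossing on type counts concrete, here is a minimal Python sketch. It assumes a hypothetical lookup table that maps context-dependent English glosses onto a single ID gloss; the VOTE entries follow the English equivalents listed in Table 1, while the table itself, the extra gloss GOOD, and the function name `count_types` are illustrative assumptions and are not taken from Auslan Signbank or the Annotation Guidelines.

```python
from collections import Counter

# Hypothetical lookup: several context-dependent glosses, one ID gloss.
# The VOTE entries mirror the English equivalents given in Table 1.
ID_GLOSS = {
    "VOTE": "VOTE",
    "ELECT": "VOTE",
    "ELECTION": "VOTE",
    "GOOD": "GOOD",
}


def count_types(glosses, id_gloss=None):
    """Count tokens per type, optionally mapping each gloss to its ID gloss."""
    if id_gloss is None:
        return Counter(glosses)
    # Unknown glosses pass through unchanged so that no token is dropped.
    return Counter(id_gloss.get(g, g) for g in glosses)


# Six tokens glossed by context-dependent English translation (invented data).
tokens = ["VOTE", "ELECT", "GOOD", "ELECTION", "VOTE", "GOOD"]

raw = count_types(tokens)                 # translation glosses: 4 apparent types
regularized = count_types(tokens, ID_GLOSS)  # ID glosses: 2 types

print(len(raw), dict(raw))
print(len(regularized), dict(regularized))
```

In this toy sample the same four tokens of one sign are split across three apparent types when glossed by translation, which deflates the frequency of the sign and inflates the type count; mapping to an ID gloss before counting avoids both distortions.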
Each annotator had online access to Auslan Signbank and this enabled them to use consistently the assigned ID gloss for each fully lexical sign. Glosses for partly lexical and nonlexical signs are preceded by letter prefixes that distinguish them from fully lexical signs, as described in the Annotation Guidelines (Johnston, 2010c) and summarized in Table 2. Name signs and fingerspelling routines


are also identified with special prefixes and use their own glossing conventions. Of relevance to the discussion of the frequency results presented here is the convention that deictic pointing signs that function as pronouns, determiners, and locatives are further specified with the codes PRO1, PRO2, PRO3, LOC, and DET after the PT. Similarly, deictic possessives are specified with the codes POSS1, POSS2, and POSS3 after the PT.4 All depicting sign glosses are prefixed with DS, which is followed by letter codes for the type-like characteristics of the depiction—L for locative, M for movement and displacement, H for handling, S for size and shape or descriptive, and G for ‘‘ground.’’ The configuration and orientation of the handshape are placed in the added parentheses, for example, the index finger handshape held vertically is coded as (1VERT). Contrasting with this is the second part of the annotation, which comes after a colon. It describes the meaning of the sign, such as the entity referred to, and the type of displacement, movement, location, shape, action, or handling action that is depicted, thus: DSM(1-VERT):YACHT-PASS-MAST-SWAYING. A similar approach is adopted in the annotation for the most common gestures. It should be remembered that multimedia annotation is a difficult and time-consuming activity. Consequently, details are added progressively to files, over time, by more than one annotator (and reviewer). Naturally, one annotates from the general to the particular. For example, at first points, depictions, and gestures are often simply identified (as PT, DS:description, and G:description); it is only later that these are further

specified, as described above. At the time of writing, all points in the corpus have been given a detailed annotation. However, not all the depicting signs and the gestures have been given a detailed treatment. Therefore, I am as yet unable to report in detail about the frequencies of different subkinds of depictions and gestures, nor with regard to their distributions across genres, but I can give global statistics for gestures and depicting signs. Each glossed file was reviewed by at least one other researcher to confirm type identification and consistent ID glossing. If the annotator was not a native signer, the reviewer was. Four native signers have been involved in the process (one of whom is the author). In fact, most files have been reviewed more than once by multiple researchers over months or years (annotations began in 2005) because the Auslan corpus is being constantly enriched with detailed linguistic annotations, for example, tagging for grammatical class (‘‘part of speech’’), spatial and directional modification, movement modification for aspect, eye gaze behavior, and so on. The resulting gloss annotations have almost universal agreement with regard to the identification of fully lexical signs. There was some unavoidable and expected disagreement in grammatical class tagging because one must grammatically parse a text as a part of doing this—a given string (phrase, clause, sentence) may be parsed by different researchers in slightly different ways, yielding alternative grammatical class identification for some signs. Of the subset of annotation files that have grammatical class tagging for ID glosses (see Rank Frequency of Fully Lexical Signs by Grammatical Class section) rates of disagreement ranged from a high of 1 in 10 signs in one

Table 2 Key to prefixing abbreviations used in glossing in the Auslan Corpus

Abbreviation   Meaning
PT:            An indexical or pointing sign
FS:            A fingerspelling
DS:            A depicting sign
NS:            A name sign
G:             A manual gesture
G(NMS):        A global description of a nonmanual gesture involving the face and head when it is the only activity occurring (i.e., there is no co-occurring manual sign) [a]
G(CA):         A manual gesture that is an enactment as part of a period of constructed action [b]

[a] Other tiers within the corpus in ELAN code in detail all nonmanual activity (including mouthings) with or without the co-occurrence of a manual sign.
[b] The exact duration of the constructed action and the character/entity assumed is coded on other tiers in the ELAN file.


difficult text, to 1 in 20 in the more straightforward texts. For similar reasons, the subcategorization of pointing signs can never, and did not, yield complete agreement. These ranges do not affect the general observations being made in this study. Finally, the author conducted the regularization of depicting and gestural glosses described below (see Depicting Signs Revisited section).

Results

Even though at 63,436 tokens (as of March 2011) the Auslan corpus is the second largest SL corpus to be reported on in the literature, it is still relatively small when compared with the millions, tens of millions, and even hundreds of millions of words found in a typical modern linguistic corpus. One must, therefore, be cautious about the types of generalizations that can be made from such a small corpus. Nonetheless, some important observations and comparisons can still be made.

Overall Percentage Distribution of Different Kinds of Signs

The first type of analysis is the frequency of each kind of sign across the corpus. As Table 3 shows, almost one third of all signs are not fully lexical signs. The majority of these are points or depicting signs, but a sizeable proportion (6.5% of all signs) are gestures. One can immediately see, therefore, that any stretch of text in Auslan makes use of significant numbers of gestural elements. These are involved in regulating the flow of the interaction, conveying emotion and attitudes, engaging in enactments or mimetic behavior (the signer acts out something rather than conveying the same information using fully conventionalized lexical signs), or engaging in idiosyncratic minimally conventionalized representations (rather than using some conventionalized elements in a complex depiction).
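A distribution of this kind can be derived mechanically once every token carries a prefixed gloss. The Python sketch below classifies glosses using the prefixes in Table 2 and reports the percentage of tokens of each kind. It is an illustration under stated assumptions: the exact written form of the subcodes (for example "PT:PRO1"), the regular expression, and every sample gloss except the DSM example quoted in the article are hypothetical, and the function names are invented here rather than taken from ELAN or the Annotation Guidelines.

```python
import re
from collections import Counter

# Depicting-sign glosses: "DS", optional type letters (L, M, H, S, G),
# an optional handshape in parentheses, then a colon and a description.
DS_PATTERN = re.compile(
    r"^DS(?P<codes>[LMHSG]*)(?:\((?P<handshape>[^)]*)\))?:(?P<meaning>.*)$"
)


def sign_kind(gloss):
    """Map one gloss to a broad kind of sign using the Table 2 prefixes."""
    if DS_PATTERN.match(gloss):
        return "partly lexical (depicting)"
    if gloss.startswith("PT:"):
        return "partly lexical (pointing)"
    if gloss.startswith(("G:", "G(NMS):", "G(CA):")):
        return "nonlexical (gesture)"
    if gloss.startswith("FS:"):
        return "fully lexical (fingerspelling)"
    if gloss.startswith("NS:"):
        return "fully lexical (name sign)"
    return "fully lexical"  # no prefix: treated as an ID gloss


def kind_distribution(glosses):
    """Return the percentage of tokens belonging to each kind of sign."""
    counts = Counter(sign_kind(g) for g in glosses)
    total = sum(counts.values())
    return {kind: 100 * n / total for kind, n in counts.items()}


# Invented annotations; only the DSM gloss is quoted from the article.
sample = [
    "VOTE", "PT:PRO1", "DSM(1-VERT):YACHT-PASS-MAST-SWAYING",
    "G:DISMISS", "FS:JOHN", "GOOD", "PT:LOC", "VOTE", "GOOD", "VOTE",
]
shares = kind_distribution(sample)
parsed = DS_PATTERN.match(sample[2]).groupdict()

for kind, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{kind:<32} {pct:5.1f}%")
print(parsed)
```

Because the classification rests only on the prefix, it works equally well on files where points, depictions, and gestures have so far been given the general annotation alone (PT, DS:description, G:description) and on files where they have been specified in detail.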
There are strong similarities between the Auslan and ASL data and the differences that do exist could easily be attributed to different glossing and transcription practices.5 For example, it is likely that many of the gestural elements in the ASL texts were not transcribed, except the most common forms like the upturned open hand (glossed as WELL! in the ASL corpus),6 or were transcribed as another type of sign, usually a ‘‘classifier.’’ (A distinction between mimetic

gestures and handling depictions, so-called handling classifiers, is often difficult to make.) However, the differences may also reflect the small sizes of the corpora, especially the ASL corpus. In this respect, the source of the data, that is, the genre (or topic) and the register, may affect not only the frequency of specific content signs but also the kind of signs that occur. Only a suggestive comparison of text type within the Auslan Corpus and between Auslan and ASL is possible, as the categories used in the ASL study (formal, casual, and narrative) actually vary in two dimensions: genre and register. For the purposes of this comparison, the interview/questionnaire and self-identification session were classified as formal texts (containing 38.7% of corpus tokens), the stories or anecdotes produced spontaneously during a period of casual free conversation were classified as casual texts (20.2% of tokens), and the story retells were classified as narratives (41.1% of tokens). Nonetheless, the data do indicate some variation according to these tentative categories (see Table 4). In both SLs, depicting sign use appears to increase from the formal to the casual register and again in the narrative genre, whereas pointing sign use is highest in casual signing, lower in formal texts, and lowest in the narratives. Perhaps surprisingly, given long-held assumptions by SL linguists that formal registers or high prestige forms of SLs are, or are likely to be, more influenced by the majority SpLs than other more casual registers or forms, leading to higher rates of fingerspelling in the former (Deuchar, 1978; Sutton-Spence, Woll, & Allsop, 1990; Woodward, 1973),7 both datasets show the highest rates of fingerspelling in the casual texts. However, given the small sizes of the corpora and the overlapping and tentative categories represented in the dataset, I believe it is premature to conclude that this type of distribution is robust.
It may not persist as the size and representativeness of SL corpora continue to grow. With respect to gestures, in the Auslan data in Table 4, there is a higher percentage of gestures in the formal texts compared with the narratives and the casual texts. This may appear surprising because it might be supposed that one would gesture less in a formal situation, whether signed or spoken. Indeed, Quinto-Pozos and Mehta (2010) report that the

Table 3. Corpus distribution of kinds of signs in Auslan and American Sign Language (ASL)

Sign type | Auslan (n = 63,436), % | ASL^a (n = 4,111), %
Fully lexical (frozen, including numbers) | 65.0 | 73.2
Fully lexical (fingerspelling) | 5.0 | 6.4
Fully lexical (name signs) | 0.2 | 2.3
Partly lexical (pointing/indexical, including possessives) | 12.3 | 13.8
Partly lexical (depicting/classifiers) | 11.0 | 4.2
Nonlexical (gestures, including fragments) | 6.5 | 0.2

^a Data taken from Morford and MacFarlane (2003).

presence and intensity of constructed action (a type of mimetic gesture coded as G(CA)) may be less in formal registers.8 This may well be true of the gestural events considered to be enactments, but in the interviews in the Auslan Corpus many of the gestures used appear to be of a different kind. They tend to express regulation of the interaction by the signer: hedging, qualifying, searching for the right sign, eliciting empathy and agreement, and so on. Because gestures have yet to be comprehensively tagged for subtype in the Auslan Corpus (G:, G(NMS), and G(CA)), I am unable to identify all instances of constructed action separately at this time. With respect to the register itself, examples of texts from unambiguously formal registers, such as talks, addresses to meetings, sermons, and so on, are not currently part of the Auslan Corpus, so it is not yet possible to compare or comment further on this issue. A comparison of Auslan and ASL in this regard is also inconclusive with respect to gesture: either signers of Auslan use far more gesture in all environments than ASL signers, or the way the data have been glossed and annotated in the two datasets is quite different. Larger corpora with linked media files are needed to clarify the situation.

Overall Rank Frequency of Types and Their Apparent Grammatical Class

In the Auslan Corpus, the majority of the signers are right-hand dominant. Of the 63,436 sign tokens in the dataset, approximately 10% were produced by left-hand dominant signers. Because a comparison of subsets of data between right- and left-hand dominant signers conducted for a previous study (de Beuzeville, Johnston, & Schembri, 2009) showed no impact of handedness on the overall percentage of the different kinds of signs or their relative distribution on strong and weak hands, and because the searching, exporting, and sorting routines used in ELAN and other databases for this study are simpler and more reliable without attempting to combine two parallel datasets, I henceforth report on the 55,859 sign tokens from the right-hand dominant signers only. The 55,859 sign tokens in the Auslan dataset represented approximately 6,171 types that ranged in frequency from a high of 50.7 per 1,000 for the first-person indexical point (PT:PRO1) to a low of 0.017 per 1,000 for each of 3,606 hapax legomena (signs that occur only once in the corpus), which are 58% of all types but only 7% of all tokens. Many of

Table 4. Distribution of different kinds of signs in Auslan and American Sign Language (ASL) according to genre and register

Sign type | Auslan^a: Casual (n = 11,485), % | Auslan^a: Formal (n = 22,099), % | Auslan^a: Narrative (n = 23,401), % | ASL: Casual (n = 1,969), % | ASL: Formal (n = 1,363), % | ASL: Narrative (n = 799), %
“Frozen” | 64.1 | 69.4 | 60.7 | 68.5 | 80.2 | 72.2
Fingerspelling | 6.4 | 4.7 | 5.1 | 8.7 | 4.8 | 3.3
Name signs | 0.2 | 0.5 | 0.0 | 4.2 | 0.7 | 0.3
Pointing | 16.1 | 15 | 7.4 | 17.3 | 13.4 | 5.8
Depicting | 7.3 | 1.6 | 21.4 | 1.1 | 0.9 | 17.7
Gestures | 5.9 | 8.8 | 5.4 | 0.2 | 0.1 | 0.6

^a The total number of tokens is lower than in the Auslan Corpus as a whole because some elicited material, which is also in the study dataset, was excluded from this count as it did not fall into these three categories.


Journal of Deaf Studies and Deaf Education 17:2 Spring 2012

the hapax legomena are unlexicalized fingerspellings, gestures, and as yet unregularized depicting signs (regularization is explained in the Discussion section). Table 5 lists the 100 most frequent signs in the Auslan Corpus in rank order (see the Appendix for a list of the top 300 signs). Together they represent 53% of all tokens in the corpus. Deictic (or indexical) pointing signs that appear to function as pronouns, locatives, determiners, and possessives (PRO1, PRO2, PRO3, DET, LOC, POSS1, POSS2, and POSS3) have been identified accordingly. However, second- and third-person points have been collapsed into one category each, partly because there are so few second-person forms and partly because it has yet to be established if there is any formal difference between these two point types (Meier, 1990). Indeed, the grammatical status of index finger pointing signs remains an open question (Cormier, 2010, in press; McBurney, 2002). Thus, there is an argument for considering all index finger pointing signs as one sign type not only for certain kinds of comparisons (e.g., above in the overall distribution of PT signs in the corpus) but also because the subcategories of points may themselves deserve reappraisal (Johnston, 2010d). One can see that 25 of the 100 most frequent signs appear to be function words, 24 appear to be verbs, and 19 appear to be nouns. Of the remaining 32, 16 appear to be adjectives or adverbs, 2 are gestures or “gesture types,” and a significant number (13) are depicting signs. The function and word class of depicting signs are often difficult to determine, especially in the case of handling depictions, which can be difficult to distinguish consistently from gestures. For this reason, the gesture category sign G(CA): has been included in this initial lexical frequency count so that researchers can gain some appreciation of the frequency with which signers engage in apparently nonlexical constructed action or enactment. 
These acts would not be appropriately labeled as depicting signs. Finally, 1 of the top 100 signs has the gloss INDECIPHERABLE. It is actually not a type at all: The label is used for signs that were unidentifiable and unclassifiable. Table 6 lists those signs that appear more than 4 times per 1,000 signs in the Auslan Corpus. They are presented with ASL and NZSL signs for comparison, where the data are available. As can be seen, there are

both similarities and differences, neither of which is surprising. First of all, there is considerable overlap on the very high frequency items, especially signs that would be classed as grammatical (PRO1, PRO2/PRO3, POSS2/3, WHY, WHAT), but also including content signs such as SAY and LOOK, which are likely to be frequent in any language corpus. However, there are differences in specific content signs in this group. This is clearly symptomatic of the participants and of the composition of the corpora (primarily retells, narratives, interviews) and their relatively small size. For instance, the signs SIGN and DEAF are clearly related to the participants and the language they are using; and the signs BOY and WOLF, and FROG and DOG, are due to the retelling of an Aesop fable (The Boy Who Cried Wolf) and a picture story (Frog Where Are You?), respectively. The presence of elicited texts in the dataset also affects the frequency of depicting signs. For example, DSM/L(BENT2):ANIMATE-MOVES/AT is ranked no. 8 in the Auslan Corpus. It is clearly related to the narratives (Hare and Tortoise, Boy Who Cried Wolf, and Frog Story) found in that corpus. With respect to pointing signs, the data across the corpora are comparable, but there are differences. For example, PT:PRO1 occurs 50.7 times per 1,000 tokens in the Auslan Corpus and 56.4 and 67.2 times, respectively, in the ASL and NZSL corpora; and PT:LOC occurs 12.5 and 9.1 times per 1,000 in Auslan and NZSL, respectively. No comparison can be made for PT:DET as neither of the other two corpora identified this sign type separately (the category is not equivalent to the ASL sign glossed THAT). The frequency of all nonpossessive pointing (indexical) signs appears to be lower in Auslan at 111 per 1,000 than in ASL at 135 (Morford & MacFarlane, 2003, p. 221). (There are insufficient data reported for NZSL to make an overall comparison.) This may be explained by the nature of the corpora. 
The Auslan Corpus dataset as it currently exists has a higher percentage of narrative texts than the other two corpora, and as the data in Table 4 discussed above reveal, there appears to be less use of pointing signs in narratives. Nonetheless, across all three corpora, the most frequent sign or sign type is overwhelmingly the pointing sign, by several orders of magnitude.
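
As a minimal check on the normalized rates quoted above, the following Python sketch recomputes them from the raw counts. It assumes only figures given in the text and in Table 5 (55,859 tokens, 6,171 types, 3,606 hapax legomena, 2,837 tokens of PT:PRO1, and 701 tokens of PT:LOC); the helper name `per_thousand` is illustrative and not part of any corpus tool.

```python
# Illustrative check of the normalized frequencies reported in the text.
# All figures come from the text and Table 5; the helper name is hypothetical.

TOTAL_TOKENS = 55_859   # tokens from right-hand dominant signers only
TOTAL_TYPES = 6_171     # unique types in the dataset
HAPAX_TYPES = 3_606     # types that occur exactly once


def per_thousand(count: int, total: int = TOTAL_TOKENS) -> float:
    """Return the frequency of a type per 1,000 tokens."""
    return 1000 * count / total


pt_pro1 = per_thousand(2_837)   # first-person indexical point
pt_loc = per_thousand(701)      # locative point
hapax_rate = per_thousand(1)    # any single hapax legomenon

# Share of all types that are hapax legomena, as a percentage.
hapax_share_of_types = 100 * HAPAX_TYPES / TOTAL_TYPES

print(f"PT:PRO1 per 1,000:   {pt_pro1:.2f}")
print(f"PT:LOC per 1,000:    {pt_loc:.2f}")
print(f"One hapax per 1,000: {hapax_rate:.4f}")
print(f"Hapax share of types: {hapax_share_of_types:.1f}%")
```

Computed this way, PT:PRO1 comes to roughly 50.8 per 1,000 (the value shown in Table 6) and a single hapax to roughly 0.018; the marginally lower figures in the running text (50.7 and 0.017) are consistent with truncation rather than rounding.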

Table 5. Rank frequency of the top 100 types in the Auslan Corpus (representing 53% of all tokens)^a

Rank | ID gloss | n | % Database | Cumulative %
1 | PT:PRO1 | 2,837 | 5.08 | 5.08
2 | G(5-UP):WELL | 1,993 | 3.57 | 8.65
3 | PT:PRO2/PT:PRO3 | 1,818 | 3.25 | 11.90
4 | DEAF1/2 | 825 | 1.48 | 13.38
5 | LOOK | 805 | 1.44 | 14.82
6 | BOY | 707 | 1.27 | 16.09
7 | PT:LOC | 701 | 1.25 | 17.34
8 | DSM/L(BENT2):ANIMATE-MOVES/AT | 592 | 1.06 | 18.40
9 | HAVE | 590 | 1.06 | 19.46
10 | SAME | 572 | 1.02 | 20.48
11 | GOOD | 556 | 1.00 | 21.48
12 | PT:DET | 520 | 0.93 | 22.41
13 | DSM(1/X):ENTITY-MOVES | 430 | 0.77 | 23.18
14 | DSS(BC):CYLINDRICAL/CURVED/CIRCULAR | 423 | 0.76 | 23.93
15 | WHAT | 412 | 0.74 | 24.67
16 | THINK | 408 | 0.73 | 25.40
17 | NOTHING | 407 | 0.73 | 26.13
18 | NOT | 390 | 0.70 | 26.83
19 | DOG1/2 | 387 | 0.69 | 27.52
20 | REAL | 381 | 0.68 | 28.20
21 | PEOPLE1 | 380 | 0.68 | 28.88
22 | ONE | 364 | 0.65 | 29.54
23 | WHY-BECAUSE | 353 | 0.63 | 30.17
24 | SIGN | 351 | 0.63 | 30.80
25 | DSM/L(2/H):ANIMATE-MOVES/AT | 333 | 0.60 | 31.39
26 | G(CA): | 326 | 0.58 | 31.98
27 | SAY | 323 | 0.58 | 32.55
28 | PT:POSS1 | 323 | 0.58 | 33.13
29 | WITH1/2 | 315 | 0.56 | 33.70
30 | DSM/L(5):MANY-MOVE/AT | 294 | 0.53 | 34.22
31 | IN | 279 | 0.50 | 34.72
32 | FROG1/2 | 269 | 0.48 | 35.20
33 | DSS(1):TRACE | 263 | 0.47 | 35.67
34 | WOLF | 250 | 0.45 | 36.12
35 | DSS/L(5):MASS/SHAPE-AT | 248 | 0.44 | 36.57
36 | WANT | 247 | 0.44 | 37.01
37 | SEE | 233 | 0.42 | 37.42
38 | YELL1/2 | 230 | 0.41 | 37.84
39 | KNOW | 226 | 0.40 | 38.24
40 | CAN | 226 | 0.40 | 38.65
41 | GO | 224 | 0.40 | 39.05
42 | YES1/2 | 221 | 0.40 | 39.44
43 | BUT2 | 221 | 0.40 | 39.84
44 | PT:POSS2/PT:POSS3 | 207 | 0.37 | 40.21
45 | CAT1/2 | 206 | 0.37 | 40.58
46 | HEARING2 | 204 | 0.37 | 40.94
47 | DISABILITY | 191 | 0.34 | 41.28
48 | PT:BUOY | 190 | 0.34 | 41.62
49 | TORTOISE | 178 | 0.32 | 41.94
50 | BAD | 170 | 0.30 | 42.25
51 | SCHOOL | 165 | 0.30 | 42.54
52 | DSH(A/S/6):HOLD | 160 | 0.29 | 42.83
53 | FS:IF | 158 | 0.28 | 43.11
54 | CAN-NOT | 157 | 0.28 | 43.39
55 | INDECIPHERABLE^b | 151 | 0.27 | 43.66
56 | ALL | 147 | 0.26 | 43.93
57 | PERHAPS | 145 | 0.26 | 44.19
58 | MAN | 142 | 0.25 | 44.44
59 | WILL | 141 | 0.25 | 44.69
60 | FINISH.6 | 141 | 0.25 | 44.95
61 | GIRL1/2 | 138 | 0.25 | 45.19
62 | WHERE | 131 | 0.23 | 45.43
63 | FEEL | 129 | 0.23 | 45.66
64 | DSL(BC):CYLINDRICAL/CURVED-AT | 129 | 0.23 | 45.89
65 | BIRD | 129 | 0.23 | 46.12
66 | WORK | 128 | 0.23 | 46.35
67 | SOME | 127 | 0.23 | 46.58
68 | TWO | 123 | 0.22 | 46.80
69 | NAME | 123 | 0.22 | 47.02
70 | DIFFERENT | 123 | 0.22 | 47.24
71 | RABBIT | 122 | 0.22 | 47.46
72 | KNOW-NOT | 121 | 0.22 | 47.67
73 | SHEEP | 118 | 0.21 | 47.88
74 | LITTLE | 118 | 0.21 | 48.09
75 | DSL(1):ENTITY-AT | 118 | 0.21 | 48.31
76 | FOR1/2/3 | 117 | 0.21 | 48.52
77 | SEARCH | 115 | 0.21 | 48.72
78 | LADY1 | 113 | 0.20 | 48.92
79 | LAUGH | 112 | 0.20 | 49.12
80 | DAY | 110 | 0.20 | 49.32
81 | CAR1/2 | 108 | 0.19 | 49.51
82 | STOP | 101 | 0.18 | 49.69
83 | ONLY | 101 | 0.18 | 49.88
84 | GO-POINT | 101 | 0.18 | 50.06
85 | ARRIVE | 101 | 0.18 | 50.24
86 | RIGHT1 | 100 | 0.18 | 50.42
87 | OTHER | 100 | 0.18 | 50.60
88 | CHILDREN1 | 100 | 0.18 | 50.77
89 | PAST | 99 | 0.18 | 50.95
90 | CONTINUE | 99 | 0.18 | 51.13
91 | ALWAYS1 | 99 | 0.18 | 51.31
92 | DSS/L(B):SURFACE | 98 | 0.18 | 51.48
93 | ENCOURAGE1 | 97 | 0.17 | 51.66
94 | DSM/L(4):MANY-MOVE/AT | 97 | 0.17 | 51.83
95 | DSM(B):VEHICLE-MOVE | 97 | 0.17 | 52.00
96 | SLEEP | 95 | 0.17 | 52.17
97 | SPEECH | 94 | 0.17 | 52.34
98 | AREA | 93 | 0.17 | 52.51
99 | AGAIN | 93 | 0.17 | 52.67
100 | COME | 92 | 0.16 | 52.84

^a Gray cell = assumed to be a function (grammatical) sign based on the ID gloss.
^b A sign that could not be recognized and not a sign that meant “indecipherable.”
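
The “% Database” and “Cumulative %” columns of Table 5 follow mechanically from the n column, and the cumulative shares in turn yield coverage counts of the kind summarized in Table 7. The Python sketch below illustrates this with the counts for ranks 1 to 16 only. The rule used in `rank_nearest` (taking the rank whose cumulative share lies closest to a threshold) is an assumption, since the text does not state how the cut-offs were chosen, and the function and variable names are hypothetical.

```python
from itertools import accumulate

TOTAL_TOKENS = 55_859  # right-hand dominant signers only (from the text)

# Token counts (n) for ranks 1-16, copied from Table 5.
COUNTS = [2837, 1993, 1818, 825, 805, 707, 701, 592,
          590, 572, 556, 520, 430, 423, 412, 408]

# "% Database": each type's share of all tokens in the dataset.
percent = [100 * n / TOTAL_TOKENS for n in COUNTS]

# "Cumulative %": running total of tokens, expressed as a share.
cumulative = [100 * c / TOTAL_TOKENS for c in accumulate(COUNTS)]


def rank_nearest(threshold: float) -> int:
    """Return the rank whose cumulative share is closest to `threshold` (%).

    ASSUMPTION: 'closest cumulative share' is only one plausible reading
    of how coverage cut-offs might be chosen; the text does not say.
    """
    best = min(range(len(cumulative)),
               key=lambda i: abs(cumulative[i] - threshold))
    return best + 1  # ranks are 1-based


for rank, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(f"{rank:>2}  {p:5.2f}  {c:6.2f}")

print([rank_nearest(t) for t in (5, 10, 15, 20, 25)])
```

Under that assumption, the sketch returns ranks 1, 2, 5, 10, and 15 for the 5% to 25% thresholds, which agrees with the first Auslan column of Table 7.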

Table 6. Rank frequency of types occurring more than four times per 1,000 signs in three corpora^a

Rank | Auslan (n = 55,859) ID gloss | Per 1,000 | ASL (n = 4,111) ID gloss | Per 1,000 | NZSL (n = 100,000) ID gloss | Per 1,000
1 | PT:PRO1 | 50.8 | PT:non-PRO1 | 79.3 | PT:PRO1 | 67.2
2 | G(5-UP):WELL | 35.7 | PT:PRO1 | 56.4 | PT:PRO3 | 36.4
3 | PT:PRO2/PT:PRO3 | 32.5 | PT:POSS1 | 14.8 | GOOD | 14.6
4 | DEAF1/2 | 14.8 | NAME | 13.4 | DEAF | 14.1
5 | LOOK | 14.4 | SIGN | 11.7 | PT:PRO2 | 11.5
6 | BOY | 12.7 | IN | 11.2 | POSS1 | 10.9
7 | PT:LOC | 12.5 | THEN | 10.2 | PT:LOC | 9.1
8 | DSM/L(BENT2):ANIMATE-MOVES/AT | 10.6 | PT:non-POSS1 | 10.0 | ONE | 6.7
9 | HAVE | 10.6 | SCHOOL | 9.2 | SAME | 6.6
10 | SAME | 10.2 | THAT | 9.2 | SCHOOL | 6.5
11 | GOOD | 10.0 | OH-I-SEE | 8.5 | YES | 6.4
12 | PT:DET | 9.3 | BOY | 8.0 | SIGN | 6.2
13 | DSM(1/X):ENTITY-MOVES | 7.7 | HAVE | 8.0 | |
14 | DSS(BC):CYLINDRICAL/CURVED/CIRCULAR | 7.6 | WELL1 (G(5-UP):WELL) | 7.5 | |
15 | WHAT | 7.4 | DEAF | 7.1 | |
16 | THINK | 7.3 | NOT | 7.1 | |
17 | NOTHING | 7.3 | TELL(SAY) | 7.1 | |
18 | NOT | 7.0 | TO | 7.1 | |
19 | DOG1/2 | 6.9 | GO | 6.8 | |
20 | REAL | 6.8 | LOOK | 6.3 | |
21 | PEOPLE1 | 6.8 | WITH | 6.3 | |
22 | ONE | 6.5 | BUT/DIFFERENT | 6.1 | |
23 | WHY-BECAUSE | 6.3 | FATHER | 6.1 | |
24 | SIGN | 6.3 | GROW-UP | 6.1 | |
25 | DSM/L(2/H):ANIMATE-MOVES/AT | 6.0 | NO/NONE(NOTHING) | 5.4 | |
26 | G(CA): | 5.8 | FOR | 5.1 | |
27 | SAY | 5.8 | YES | 5.1 | |
28 | PT:POSS1 | 5.8 | FINISH | 4.9 | |
29 | WITH1/2 | 5.6 | WHAT | 4.9 | |
30 | DSM/L(5):MANY-MOVE/AT | 5.3 | DOG | 4.6 | |
31 | IN | 5.0 | WHERE | 4.6 | |
32 | FROG1/2 | 4.8 | RIGHT(REAL) | 4.4 | |
33 | DSS(1):TRACE | 4.7 | SAY | 4.4 | |
34 | WOLF | 4.5 | WHY | 4.4 | |
35 | DSS/L(5):MASS/SHAPE-AT | 4.4 | YEAR | 4.4 | |
36 | WANT | 4.4 | BANANA | 4.1 | |
37 | SEE | 4.2 | KNOW-NOT | 4.1 | |
38 | YELL1/2 | 4.1 | | | |
39 | KNOW | 4.0 | | | |
40 | CAN | 4.0 | | | |
41 | GO | 4.0 | | | |
42 | YES1/2 | 4.0 | | | |
43 | BUT2 | 4.0 | | | |

ASL = American Sign Language; NZSL = New Zealand Sign Language.
^a The gray cells in the ASL and NZSL lists also occur in the Auslan list (also shown in gray).

Considering the data presented in Tables 5 and 6 together, the numbers of depicting signs in the Auslan data are noteworthy. No depicting signs at all occur within the top 37 signs in ASL (more detailed frequency data beyond this rank were not published), and only one depicting sign occurs within the first 350 ranked signs in NZSL (no. 303 “PCL-PEOPLE-FLOCK,” equivalent to the Auslan no. 30 DSM/L(5):MANY-MOVE/AT) compared with at least 16 depicting signs in Auslan. It seems unlikely that text type alone can explain this discrepancy. (However, as has been noted, the data presented in Table 4 do show a higher incidence of depicting signs in narratives and there are more narratives in the Auslan Corpus.) Glossing and annotation practices, based on assumptions about what constitutes a lexical item, undoubtedly also play a part (see Discussion section).

Distribution of Types in the Corpus

The sorting of the number of tokens of different sign types gives us a rank frequency, as we have seen. However, linguists are not just interested in which types are more frequent relative to other types; the distribution of each type (i.e., how many tokens of each type occur as a percentage of all signs in the corpus) is also important. This information is contained in the “% Database” and “Cumulative %” columns in Table 5. Table 7 presents summary data on the distribution of all signs in the Auslan Corpus and compares them to the NZSL and ASL corpora as well as to the distribution of words in the British National Corpus (BNC). The Auslan data are presented twice: the first column listing all different signs (unique glosses) in the corpus, including partly lexical and nonlexical signs; the second column listing all different fully lexical signs (i.e., unique ID glosses). In the second column, the measure of the percentage of different tokens in the corpus is that of fully lexical signs with respect to other fully lexical signs. The data from the BNC are also presented in several ways in order to highlight similarities and differences. First, spoken English data are presented because the most appropriate form of English to compare with Auslan is spoken conversational English: Like all SLs, Auslan is a quintessentially face-to-face language. 
Data from the entire BNC, which consists of mostly written data with some spoken data, are also provided for further comparison. Finally, the BNC data are presented in lemmatized and unlemmatized forms. (Grouping all the inflected forms of a word, e.g., talk, talks, talked, as instances of a single word or lemma, e.g., TALK, is called lemmatization. This process enables one to treat all the instances of a word, irrespective of inflection, as a single item. The lemma

is used as the headword in a dictionary.) As we shall see, both forms of the lexical data are revealing. One can see from Table 7 that in all languages, only very few signs or words account for a significant proportion of each corpus. For example, across the datasets one finds that only one or two types account for about 5% of all tokens in each respective corpus. Looking at a higher percentage threshold, we see that only 6–15 types in each language account for about 20% of all tokens. The differences are minor at this level and depend on the language, on whether the data have been lemmatized or not, and on whether the type counts listed represent only fully lexicalized signs or include partly lexical and nonlexical signs. That a small number of high-frequency types should account for a significant proportion of all of the words or signs in a corpus is a common pattern cross-linguistically and is unremarkable (Conrad, 2005; Nation, 2001). However, a point of real divergence between SLs and SpLs appears to emerge when one considers the number of types needed to account for 80% or 90% of each corpus. In a written corpus of English, the number of different words begins to increase exponentially after the 80% level. In a spoken corpus of English (much more comparable with an SL), this explosion begins after the 90% level. In the SLs, a rapid increase in the number of types only occurs after the 95% level, and this growth is still not exponential; it merely represents a doubling of types. This difference can only partly be explained by corpus size (the BNC has 100 million tokens, 90% from written texts, 10% from spoken texts). Another explanation lies in the well-known fact, supported by lexicographical research in Auslan (and other SLs) over several decades (e.g., Johnston, 2003b), that SL lexicons are relatively modest in size when compared with those of SpLs, both those with and without writing (Johnston & Schembri, 1999). 
That is, the total number of fully lexical signs in Auslan appears to be comparatively small. At last count (March 2011), there were 3,733 unique fully lexical signs in the Auslan lexical database. Given this relatively small total number, one may thus expect the type counts in an SL to plateau relatively early. Ironically, there also appear to be “too many” uniquely identified sign types for the SLs when we look at all “unique” types for all Auslan signs (left

Table 7

The distribution of types in the Auslan Corpus (cf. ASL, NZSL, and English) No. of different signs or words in the corpusa

% Tokens in corpus

Auslan (all)

Auslan (fully lexical)

5 10 15 20

1 2 5 10

2 6 11 15

25 30 40 50 60 70 80

15 23 44 85 155 280 574

21 29 54 92 150 237 396

85 90 95

908 1,615 3,378

535 764 1,216

100 Of which hapax

6,171 3,606

2,578 933

ASLb

NZSLb

Spoken English (lemmatized)

All English (lemmatized)

Spoken English (unlemmatized)

All English (unlemmatized)

11

1 2 4 6

1 2 4 6

2 3 5 8

1 3 5 9

30 64 116 194 343 665

9 12 22 39 72 166 476

10 17 46 131 329 1,012 2,997

12 17 31 54 94 191 468

15 25 61 176 547 1,538 4,163

829

6,318

868 1,893 6,001

7,313 14,490 39,159

54,532

938,871

1 2 9 15 22

7,222 3,599

a

Data extracted from British National Corpus frequency lists http://ucrel.lancs.ac.uk/bncfreq/flists and http://www.kilgarriff.co.uk/bnc-readme. The lemmatized files only have frequency data for types covering the top 85% of tokens in the British National Corpus. b Data adapted from Morford and MacFarlane (2003) and McKee and Kennedy (2006).

most column in Table 7) and for NZSL signs. Both sign counts at the 100% level conflict with the sizes of dictionaries of these languages, which claim to be comprehensive. That is, there appear to be 6,171 unique types in the Auslan Corpus and 7,222 unique types in the NZSL corpus, numbers that far exceed the entries in their respective dictionaries. The NZSL dictionary, for example, has 4,000 entries. As reported in Johnston (2001), no SL dictionary, including those of Auslan or NZSL, has more than several thousand unique sign entries (rarely more than 4,000, including regional variants). A decade later, this still holds true. Even giving lexical status to the phonological variants that differ most dramatically from their assumed base form, perhaps in more than one way (e.g., using another handshape known to be phonemic in other environments as well as having, say, another movement or location value), would barely double the number of unique sign entries in most databases. The Auslan lexical database, for example, would have approximately 7,000 unique entries if one gave lexical status to the large number of common variants that

have been recorded. McKee and Kennedy (2006) make a point of reporting that only 15 lexicalized signs in the top 500 ranked signs in the NZSL corpus were not already in the dictionary of NZSL and have thus since been added. (By way of comparison, it is estimated that only approximately 50 new entries for fully lexical signs have needed to be created in Auslan Signbank as a result of 4 years of corpus annotation.) Yet McKee and Kennedy make no mention of the balance of 3,207 other hitherto unrecorded signs that are included in the total of unique types within their study corpus. With respect to the Auslan data, the reason for this apparent mismatch is obvious. There are only 2,578 unique fully lexical types in the Auslan Corpus (the second Auslan column in Table 7). This figure is lower than the current type count of 3,733 in the Auslan lexical database. Of course, it is to be expected that a type count from any linguistic corpus will be less than the known lexicon because, except in the relatively few instances in which a corpus throws up neologisms or genuine conventional signs that have simply never been recorded in a dictionary of the language, a corpus should


not have more types (conventional signs) than the known lexicon. So, of the apparently 6,171 unique types in the Auslan Corpus, approximately 3,593 types must be partly lexical and nonlexical signs, particularly depicting signs. Of course, it is not at all surprising that a small corpus would not include tokens of all the words or signs of a language. Indeed, even very large corpora do not have tokens of all the lemmas of a language. What is notable, however, is that in a corpus of Auslan of only 55,859 tokens (a rather small language sample), 2,578 of 3,733 types, or about 70% of the known conventional lexical units of the language, occur at least once.

Rank Frequency of Fully Lexical Signs

Given these observations and caveats, we can now turn to the results for the frequency of the fully lexical signs in an SL. These are the type of signs closest to what is usually thought of as the lexicon of a language. From the previous discussion, it should be evident that these signs only represent a subset of all the signs in an SL corpus and cannot, by themselves, give a full picture of sign frequency. Nonetheless, these data are important: they count and rank the major citable conventional signs of an SL. From Table 8, one can see that there appear to be 24 function signs, 27 verbs, and 28 nouns in the top 100 signs in Auslan. The remaining 21 signs are adverbs or adjectives, save the single indecipherable category. According to the spoken portion of the BNC, the top 100 words in English contain 56 function words, 18 verbs, 5 nouns, 17 adverbs or adjectives, and 4 minor categories. In both the frequency lists I have presented thus far (Tables 5 and 8), there is a much lower presence of function signs in the high-frequency ranks of Auslan (and NZSL and ASL) compared with English and other SpLs and, among the content signs, there are considerably more nouns in the SLs than in English.
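
The arithmetic behind these comparisons is simple enough to verify directly. The following Python sketch uses only figures quoted in the text; the variable names are illustrative, and the split of the remaining 21 Auslan items into 20 adverbs or adjectives plus the one indecipherable category follows the wording above.

```python
# Figures quoted in the text; nothing here is new data.
CORPUS_TYPES_ALL = 6_171       # all unique glosses in the corpus
CORPUS_TYPES_LEXICAL = 2_578   # unique fully lexical ID glosses in the corpus
LEXICON_SIZE = 3_733           # fully lexical signs in the lexical database

# Types that must be partly lexical or nonlexical signs.
partly_or_nonlexical = CORPUS_TYPES_ALL - CORPUS_TYPES_LEXICAL

# Share of the known lexicon attested at least once in the corpus.
lexicon_coverage = 100 * CORPUS_TYPES_LEXICAL / LEXICON_SIZE

# Apparent word-class make-up of the top 100 items.
auslan_top100 = {"function": 24, "verb": 27, "noun": 28,
                 "adverb/adjective": 20, "indecipherable": 1}
english_top100 = {"function": 56, "verb": 18, "noun": 5,
                  "adverb/adjective": 17, "minor": 4}

print(f"Partly lexical or nonlexical types: {partly_or_nonlexical}")
print(f"Lexicon coverage: {lexicon_coverage:.1f}%")

for label, counts in (("Auslan", auslan_top100),
                      ("Spoken English (BNC)", english_top100)):
    total = sum(counts.values())
    shares = ", ".join(f"{name} {100 * n / total:.0f}%"
                       for name, n in counts.items())
    print(f"{label}: {shares}")
```

The subtraction reproduces the 3,593 figure, and the coverage ratio comes to just over 69%, which the text reports as about 70%.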
Rank Frequency of Fully Lexical Signs by Grammatical Class

The ID glosses in the Auslan Corpus are used to identify sign types rather than to serve as the basis of independent or standalone written transcriptions of the source signed text. They are annotations appended to media files. Of course, it is only natural that the most common meaning and use of a sign motivate the choice

of the English word used in the assignment of ID glosses to sign forms. Thus, for many tokens of a given ID gloss, the wording may reflect the probable meaning and use (grammatical class) of the sign. However, ID glosses essentially identify lemmas and, because many of the signs of Auslan (and many other SLs) can function in more than one grammatical role, the grammatical class of a sign cannot be transparently inferred from the ID gloss. (This is why, in the earlier discussion of the data, I qualified such statements with “appear to be,” for example, “appear to be verbs” based on the ID gloss.) Each token thus needs to be separately tagged for grammatical class for more detailed and accurate linguistic analysis. Indeed, the very determination of grammatical class is itself the product of linguistic analysis: it is not simply a prerequisite and it is certainly not a given. Determining grammatical class is not a simple or straightforward procedure (see Method section). Not only is the grammatical class of some kinds of signs, like pointing and depicting signs, still open to question, but the range and type of grammatical sign classes found in Auslan have yet to be rigorously investigated. This is also true of all other SLs (Schwager & Zeshan, 2008). Establishing empirically the type and number of grammatical classes in Auslan and the way these are manifested in the morphosyntax of the language is actually one of the central reasons for the creation of the Auslan Corpus: to make assumptions of grammatical class (and hence the linguistic analyses that flow from them) accountable and empirically grounded. 
By way of comparison, the glossing practices followed in the ASL and NZSL corpus transcriptions were described in only the most general of terms in the respective studies. It is thus impossible to know if unique glosses for apparently fully lexical signs in these corpora actually do represent formationally distinct sign types, thus deserving of separate lexical counts, or if they are better described as contextual translation glosses of formationally identical signs, thus not necessarily appropriately counted as separate lexical items. In other words, because glosses are assigned contextually, the English word used in the gloss often acts as a surrogate marker of sign grammatical class. However, the glossing is ad hoc, so it is not a reliable guide to grammatical class. Without an explicit act of categorization, one should make few assumptions.
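
The effect of gloss type on type counts can be illustrated with a small Python sketch. The annotations below are entirely hypothetical (they are not taken from the Auslan, ASL, or NZSL corpora) and simply assume that each token carries both a contextual translation gloss and an ID gloss.

```python
from collections import Counter

# HYPOTHETICAL annotations: (contextual translation gloss, ID gloss) pairs.
# The glosses are invented for illustration and are not corpus data.
tokens = [
    ("FINISH", "FINISH"),
    ("ALREADY", "FINISH"),
    ("DONE", "FINISH"),
    ("LOOK", "LOOK"),
    ("WATCH", "LOOK"),
    ("DEAF", "DEAF"),
]

# Counting by contextual gloss treats every English wording as its own type.
by_contextual_gloss = Counter(contextual for contextual, _ in tokens)

# Counting by ID gloss groups formationally identical signs under one lemma.
by_id_gloss = Counter(id_gloss for _, id_gloss in tokens)

print(len(by_contextual_gloss), "apparent types by contextual gloss")
print(len(by_id_gloss), "types by ID gloss")
```

In this toy sample, six apparent types collapse to three once tokens are grouped by ID gloss, which is the kind of inflation in type counts that contextual glossing could produce.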

Table 8. Rank frequency of the top 100 fully lexical types in the Auslan Corpus^a

Rank | ID gloss | % Lexical | Cumulative %
1 | DEAF1/2 | 2.24 | 2.24
2 | LOOK | 2.18 | 4.42
3 | BOY | 1.92 | 6.33
4 | HAVE | 1.60 | 7.93
5 | SAME | 1.55 | 9.48
6 | GOOD | 1.51 | 10.99
7 | WHAT | 1.12 | 12.10
8 | THINK | 1.11 | 13.21
9 | NOTHING | 1.10 | 14.31
10 | NOT | 1.06 | 15.37
11 | DOG1/2 | 1.05 | 16.42
12 | REAL | 1.03 | 17.45
13 | PEOPLE1 | 1.03 | 18.48
14 | ONE | 0.99 | 19.47
15 | WHY-BECAUSE | 0.96 | 20.42
16 | SIGN | 0.95 | 21.37
17 | SAY | 0.88 | 22.25
18 | WITH1/2 | 0.85 | 23.10
19 | IN | 0.76 | 23.86
20 | FROG1/2 | 0.73 | 24.59
21 | WOLF | 0.68 | 25.26
22 | WANT | 0.67 | 25.93
23 | SEE | 0.63 | 26.56
24 | YELL1/2 | 0.62 | 27.19
25 | KNOW | 0.61 | 27.80
26 | CAN | 0.61 | 28.41
27 | GO | 0.61 | 29.02
28 | YES1/2 | 0.60 | 29.62
29 | BUT2 | 0.60 | 30.22
30 | CAT1/2 | 0.56 | 30.78
31 | HEARING2 | 0.55 | 31.33
32 | DISABILITY | 0.52 | 31.85
33 | TORTOISE | 0.48 | 32.33
34 | BAD | 0.46 | 32.79
35 | SCHOOL | 0.45 | 33.24
36 | FS:IF | 0.43 | 33.66
37 | CAN-NOT | 0.43 | 34.09
38 | INDECIPHERABLE^b | 0.41 | 34.50
39 | ALL | 0.40 | 34.90
40 | PERHAPS | 0.39 | 35.29
41 | MAN | 0.38 | 35.67
42 | WILL | 0.38 | 36.06
43 | FINISH.6 | 0.38 | 36.44
44 | GIRL1/2 | 0.37 | 36.81
45 | WHERE | 0.35 | 37.17
46 | FEEL | 0.35 | 37.52
47 | BIRD | 0.35 | 37.87
48 | WORK | 0.35 | 38.21
49 | SOME | 0.34 | 38.56
50 | TWO | 0.33 | 38.89
51 | NAME | 0.33 | 39.22
52 | DIFFERENT | 0.33 | 39.56
53 | RABBIT | 0.33 | 39.89
54 | KNOW-NOT | 0.33 | 40.22
55 | SHEEP | 0.32 | 40.54
56 | LITTLE | 0.32 | 40.86
57 | FOR1/2/3 | 0.32 | 41.17
58 | SEARCH | 0.31 | 41.48
59 | LADY1 | 0.31 | 41.79
60 | LAUGH | 0.30 | 42.09
61 | DAY | 0.30 | 42.39
62 | CAR1/2 | 0.29 | 42.68
63 | STOP | 0.27 | 42.96
64 | ONLY | 0.27 | 43.23
65 | GO-POINT | 0.27 | 43.51
66 | ARRIVE | 0.27 | 43.78
67 | RIGHT1 | 0.27 | 44.05
68 | OTHER | 0.27 | 44.32
69 | CHILDREN1 | 0.27 | 44.59
70 | PAST | 0.27 | 44.86
71 | CONTINUE | 0.27 | 45.13
72 | ALWAYS1 | 0.27 | 45.40
73 | ENCOURAGE1 | 0.26 | 45.66
74 | SLEEP | 0.26 | 45.92
75 | SPEECH | 0.25 | 46.17
76 | AREA | 0.25 | 46.42
77 | AGAIN | 0.25 | 46.68
78 | COME | 0.25 | 46.92
79 | OLD | 0.24 | 47.17
80 | NEVER | 0.24 | 47.41
81 | BABY | 0.24 | 47.66
82 | GROUP | 0.24 | 47.89
83 | TALK | 0.24 | 48.13
84 | HAVE-NOT | 0.23 | 48.36
85 | FS:SO | 0.23 | 48.60
86 | FS:TO | 0.23 | 48.83
87 | WINDOW | 0.23 | 49.05
88 | FRIEND | 0.23 | 49.28
89 | WHEN | 0.22 | 49.51
90 | COCHLEAR-IMPLANT | 0.22 | 49.73
91 | REPEAT | 0.22 | 49.95
92 | ON | 0.22 | 50.17
93 | COINCIDENCE | 0.22 | 50.39
94 | HEAR | 0.21 | 50.60
95 | DEER | 0.21 | 50.81
96 | TREE1 | 0.21 | 51.02
97 | BOWLING | 0.21 | 51.23
98 | NIGHT1 | 0.21 | 51.44
99 | MORNING | 0.21 | 51.64
100 | WRONG | 0.20 | 51.85

^a Gray cells = function signs.
^b A sign that could not be recognized and not a sign that meant “indecipherable.”

Part of speech tagging has only been completed for a subsection of the Auslan Corpus, approximately 9,000 tokens. Nonetheless, it is still possible to use these data to compare the ranking of ID glosses without tagging for grammatical class with the ranking of those that have been so tagged. These data are presented in Table 9. The first observation to make concerns the degree of overlap: Only two thirds of the top 100 ID glosses sorted by grammatical class also occur in the uncategorized list. There are two explanations for this difference: one unremarkable, that is, text type, and one that goes to the heart of the nature of the lexicon in SLs, that is, sign multifunctionality. With respect to the first, as is to be expected, source data and text type are clearly having an effect here because the categorization of lemmas by grammatical class has only been done for a subset of the corpus. The frequencies of topic-related signs such as BOY, SHEEP, TORTOISE, SHOUT, GRAZE, and so on vary in each list depending on the number of texts in each reference set that are Aesop’s fables, spontaneous narratives, or interviews. This will change their relative frequencies in an unsurprising way. More importantly, there is the question of multifunctionality. It is common for a given Auslan sign to perform a different grammatical function in different environments without any apparent change in form, for example, as noun, verb, and adjective. (It should be noted, however, that not all signs have this potential.) The tagging for grammatical class within the corpus at each token will, therefore, vary accordingly. The multiple functions of many signs in Auslan will clearly affect rankings after each token is subcategorized. For example, the tagging of DEAF sometimes as an adjective and sometimes as a noun, depending on function in context, has the effect of creating two distinct entries to be ranked. This drops the ranking of DEAF from no.
1 (without grammatical class subcategorization) to no. 10 (as an adjective) and no. 29 (as a noun). A similar pattern can be observed with GOOD, which drops from no. 6 to no. 45 and no. 88, respectively. Indeed, some high-frequency signs that appear to be straightforward function signs (based on the English gloss) disappear from the top 100 list when tokens are subcategorized by grammatical class, for example, WITH1/2 (which can also be used as a verb “go with,” not only as a preposition “with”) or BUT2 (which can also function as an interjection “hold on,” not only as a conjunction “but”). There is no such effect with a set of very high frequency signs that appear equally high in both lists. This is because these signs (LOOK, HAVE, SAY, THINK, and WANT) tend to function primarily or exclusively as verbs. Some form/usage pairs are actually promoted up the frequency rank scale in this process because signs that have multiple and hence less frequent functions have been demoted in the rank scale, vacating the higher ranking slots. A case in point is the sign FINISH.6, originally ranked at no. 43. Its frequency of use, even when subcategorized into two of its major functions (as a verb and as an auxiliary), is sufficient for it to be promoted to no. 24 and no. 28, respectively. Indeed, the example of FINISH.6 is worth looking at in some more detail. When all the uses of this sign and another semantically related sign FINISH.5 are considered across the corpus, the value of subcategorizing by grammatical class is clearly seen (Table 10). The token ratio of the two independent signs for “finish” in the corpus is 2:1 in favor of FINISH.6, and each type has a different pattern of use; for example, 38.6% of FINISH.6 tokens are used as full verbs compared with only 13.6% of FINISH.5 tokens. If we were to take a notional representative sample of 100 tokens of “finish” signs based on this distribution, 32 would be verbs, of which 27 would be FINISH.6 and only 5 would be FINISH.5. Considering these and other distributional facts, the pattern of usage suggests that what could easily have been thought of as two dialectal or regional variants of a simple verb, each perhaps with its own phonological subvariants, may actually represent a much more complex situation of variation and change (grammaticalization; Johnston, Cresdee, & Schembri, 2011). 
One can therefore see that both sorts of measures of lexical frequency—type by ID gloss and then subcategorized by grammatical class—are relevant for linguistic description and analysis.
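The notional sample of 100 ‘‘finish’’ tokens described above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are invented for the example, and it assumes an exact 2:1 token ratio, which is the rounded figure given in Table 10. Under that assumption, the result is about 26 FINISH.6 verbs and about 5 FINISH.5 verbs out of roughly 30, slightly below the figures quoted in the text, which were presumably calculated from unrounded token counts.

```python
# Share of each sign's tokens that function as full verbs (percentages from Table 10).
VERB_SHARE = {"FINISH.6": 38.6, "FINISH.5": 13.6}

# Assumption: an exact 2:1 token ratio in favor of FINISH.6 (the rounded ratio in Table 10).
TOKEN_RATIO = {"FINISH.6": 2, "FINISH.5": 1}


def verbs_in_sample(sample_size=100):
    """Expected number of full-verb tokens per sign in a notional sample."""
    ratio_total = sum(TOKEN_RATIO.values())
    expected = {}
    for gloss, weight in TOKEN_RATIO.items():
        tokens_of_sign = sample_size * weight / ratio_total
        expected[gloss] = tokens_of_sign * VERB_SHARE[gloss] / 100
    return expected


if __name__ == "__main__":
    verbs = verbs_in_sample()
    for gloss, count in verbs.items():
        print(f"{gloss}: {count:.1f} verb tokens")
    print(f"All 'finish' verbs: {sum(verbs.values()):.1f}")
```

The same function could be run with the actual token counts in place of the assumed ratio if those were available.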

Table 9. The frequency of fully lexical types in the Auslan Corpus distinguished by grammatical class
Note: Gray cells = signs that appear in both lists; dashed-boxed cells = signs for comparison.

Table 10. Percentage distribution of two sign types in the semantic area ‘‘finish’’ by grammatical class

Grammatical class    FINISH.6 (%)    FINISH.5 (%)
Verb                 38.6            13.6
Auxiliary            30.8            45.8
Interjection          9.2             5.1
Discourse marker      7.7            10.1
Noun                  6.9             1.7
Conjunction           3.0            10.1
Adjective             3.0             0.0
Adverb                0.8            13.6
Ratio of tokens       2               1

Discussion

The basic lexical frequency statistics presented above suggest that Auslan, like ASL and NZSL, is lexically dense. As Morford and MacFarlane (2003) and McKee and Kennedy (2006) conclude, SLs appear to have many fewer function signs overall, and the function signs that do exist account for a smaller overall proportion of signs in any corpus of these languages. There is certainly also support in the Auslan Corpus data for this general observation. There is a plausible and likely explanation for the small number of fully lexical function signs: It is a long-standing observation in SL linguistics that SLs exploit simultaneous nonmanual and spatial modifications to convey meanings usually encoded in SpLs by separate and sequential morphemes (affixes) and words (function words). Morford and MacFarlane (2003), however, do qualify their observation of lexical density: ‘‘It is important to investigate whether the mixture of lexical and grammatical signs among the most frequent signs in this database is related to database size or is a more general characteristic of ASL’’. Data from two much larger corpora of SLs, NZSL and now Auslan, suggest that the low token frequency of grammatical signs in SLs appears unrelated to corpus size but also that a true measure of lexical frequency in SLs will only emerge when two issues are addressed. First, all corpora, such as they exist, need to be expanded to include a wider sample of registers and genres, in particular, free unplanned conversational data. This will give us a much better picture of the core lexicon.

Second, the treatment of partly lexical signs (pointing signs and depicting signs) and nonlexical signs (gestures) needs to be made as consistent and systematic as possible both within and across corpora to facilitate quantification and comparison. Continued annotation of archived data already collected in some corpus projects will go some way to addressing the first point, as in the Auslan Corpus, but further data collection is also needed in most cases in order to broaden the sample base to make the corpora truly representative. With respect to partly lexical and nonlexical signs, both glossing practices and categorization need to be reconsidered as these have a direct impact on what is counted and then used as a measure of comparison. The data on sign distribution presented in Table 7 could be reevaluated in order to better draw out the similarities and differences between the SL data and the SpL data. I now revisit these signs with this in mind.

Pointing Signs Revisited

Most SpLs have a large number of deictic words that are fully specified phonologically (e.g., he, she, it; here, there; this, that, the, etc.) and can thus, as a class, constitute a large number of tokens and a number of distinct types, that is, formationally distinct deictic (indexical) lexical words in the grammatical classes ‘‘pronoun,’’ ‘‘locatives,’’ ‘‘determiners’’ or ‘‘demonstratives,’’ and ‘‘possessives’’ (Enfield, 2009). In contrast, deictic (indexical) pointing signs in SLs are unspecified phonologically: They are primarily indexic with ‘‘prototypical’’ handshapes rather than categorical handshapes (except for possessives in many SLs). Thus, though the information in Table 7 correctly identifies the frequency of the multiple deictic words along with all the other lexical words in an SpL like English, it does obscure the fact that an SL like Auslan has a single deictic form, the point, that has functions carried out by multiple deictic words in English. If we group the deictic words of English into two broad categories to parallel the SL forms, we can see how Auslan and English behave in terms of fully specified, fully lexical, ‘‘context-independent’’ signs, even if ultimately they have very differently sized conventional lexicons (i.e., numbers of fully lexicalized signs). Table 11, for example, re-presents and recalculates the data in Table 7. The Auslan deictic points have been conflated into two groups: (a) the prototypical pointing signs that use the index finger (PT:INDEX) and (b) the possessive pointing signs that use the flat hand or the fist hand (PT:POSS). The English deictic words are conflated into two categories that map onto these Auslan categories, that is, the deictic words that would be realized by an index pointing sign in Auslan are conflated into the single pointing category, and those that would be realized by a possessive pointing sign in Auslan are conflated into a single possessive category. In this way, the highly context-dependent deictic words and signs in both languages are similarly identified. With each of these two superordinate categories now considered as a type, we then recalculate the percentage of signs within the corpus represented by a given number of types, descending from the most frequent to the hapax legomena.
In Auslan, PT:INDEX is now first ranked as the most frequent sign (over 11% of all tokens), and PT:POSS is the 10th ranked (approximately 1.0% of all tokens); in English, the deictic point equivalent is similarly first ranked (but representing over 20% of all tokens) and the possessive is ninth ranked (approximately 1.2% of all tokens). In Table 11, one can see the effect of this: The single large superordinate category accounts for at least 20% of the tokens that English speakers use. All these tokens are primarily concerned with various forms of deixis. Auslan actually has fewer signs with this primary type of function (just two, the index point and the possessive point), and this is reflected in the larger number of sign types accounting for 20–80% of the corpus in Auslan than in the equivalent bands in English, for example, Auslan has 18 sign types accounting for 30% of the corpus, whereas English has only 4 when looked at from this perspective.

Depicting Signs Revisited

Depicting signs are important in considering lexical frequency. Depicting signs are usually only specified for handshape and, to a lesser extent, orientation and this is manifested in the type characteristics of these signs. Just about every other formal feature of these signs depends on the context-specific conceptualization by the signer of the particular event or entity represented in the depiction. These features are expressed in the token usage event characteristics of these signs. Glossing practices should not obscure the dual type-token characteristics of depicting signs as this could seriously inflate the numbers of hapax legomena and lead to a misleading ranking of sign types. Given that annotators have some freedom in how much detail is given in each depicting sign annotation, many of them appear as unique tokens in the corpus because they do not exactly match each other. However, inspection of the actual video clip of suspected similar depictions often reveals that apparently different glosses essentially describe identical representations in form and meaning, that is, they depict the same kind of event when it is understood, and glossed, sufficiently abstractly. For example, DSM(1-VERT):UPRIGHT-THING-MOVES-ERRATICALLY-LEFT is simply a very ‘‘abstract’’ gloss for DSM(1-VERT):YACHT-PASS-MAST-SWAYING or for DSM(1-VERT):DRUNKARD-STAGGER-PAST. The former annotation could be substituted for the other two; it could be even further simplified or regularized to DSM(1-VERT):ENTITY-MOVES. In this way, the number of low-frequency types and hapax legomena is inflated by ‘‘overspecific’’ glosses for many depicting signs. Many could be assigned to a broader type-like category.


Table 11. Adjusted distribution of types in Auslan and English

                          No. of different types in the corpus
                          (deictics conflated in two type categories)
% Tokens in the corpus    Auslan    Spoken English (lemmatized)
5                         1         1
10                        1         1
20                        6         1
25                        11        2
30                        18        4
40                        37        11
50                        76        26
60                        145       56
70                        270       145
80                        562       453
90                        1,601     <1,893 (unlemmatized)
95                        3,361     <6,001 (unlemmatized)
100                       6,171     <54,532 (unlemmatized)
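The kind of calculation that underlies Table 11 can be sketched as a cumulative coverage count: types are ranked from most to least frequent, and one counts how many are needed to reach a given percentage of all tokens, before and after the deictic glosses are conflated. The sketch below is a minimal illustration, not the procedure actually used for the corpus. The token list is invented, the raw point glosses (PT:PRO1, PT:PRO3, PT:LOC, POSS1) are hypothetical labels, and only the two superordinate categories PT:INDEX and PT:POSS are taken from the text.

```python
from collections import Counter

# Hypothetical mapping of raw point glosses onto the two superordinate categories.
DEICTIC_GROUPS = {
    "PT:PRO1": "PT:INDEX",
    "PT:PRO3": "PT:INDEX",
    "PT:LOC": "PT:INDEX",
    "POSS1": "PT:POSS",
}

# Invented toy token stream (10 tokens); not corpus data.
RAW_TOKENS = (
    ["PT:PRO1"] * 3 + ["PT:PRO3"] * 2 + ["PT:LOC"]
    + ["DEAF"] * 2 + ["LOOK", "HOUSE"]
)


def conflate(gloss, groups=DEICTIC_GROUPS):
    """Replace a deictic gloss with its superordinate category, if it has one."""
    return groups.get(gloss, gloss)


def types_needed(tokens, percent):
    """Number of most frequent types required to cover `percent` of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    for n_types, (_, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= percent * total:
            return n_types
    return len(counts)


if __name__ == "__main__":
    conflated = [conflate(t) for t in RAW_TOKENS]
    for pct in (50, 80, 100):
        print(pct, types_needed(RAW_TOKENS, pct), types_needed(conflated, pct))
```

In this toy example, conflating the points lowers the number of types needed at every coverage level, which is the direction of the effect reported for English in Table 11.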

This is exactly what happens in one aspect of corpus management, which I refer to as the regularization of depicting sign glosses. During regularization, the glosses for the forms that reoccur most are made simpler and more uniform so that their frequency ‘‘as a type’’ can be ranked. Of course, the annotation of the Auslan Corpus is still in its early years and regular ‘‘type-like’’ patterns of use of depicting signs, based on actual usage rather than simply user intuitions, are only now starting to emerge from the data (cf. Engberg-Pedersen, 2010). Due to time and resource constraints, the data presented in Table 7 have been based on a partial regularization of the 20 most frequent types of depictions found in the corpus, such as DSM/L(BENT2):ANIMATEMOVES/AT or DSM(1/X):ENTITY-MOVES. Some of these regularized depictions occur hundreds of times in the corpus, as I have reported (see Table 5). Of those that are as yet unregularized, approximately 950 are depictions that already appear between 2 and 16 times in the corpus, and 1,526 appear as hapax legomena. If regularized, they would together represent no more than a few hundred type-like depictions many of which would represent many tokens. The apparent 6,171 separate types in the corpus are thus likely to be reduced considerably if this is taken into consideration.
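The effect of regularization on type and hapax counts can be illustrated with a small sketch. It is an illustration under stated assumptions rather than the procedure applied to the Auslan Corpus: the glosses are modeled on the examples discussed above, the lookup table that assigns a type-like label to a gloss prefix is hypothetical, and in practice the decision to merge two glosses depends on inspecting the video, not on the gloss string alone.

```python
from collections import Counter

# Hypothetical lookup: gloss prefix -> regularized, type-like description.
TYPE_LABELS = {"DSM(1-VERT)": "ENTITY-MOVES"}

# Toy annotation list modeled on the examples in the text; not corpus data.
GLOSSES = [
    "DSM(1-VERT):YACHT-PASS-MAST-SWAYING",
    "DSM(1-VERT):DRUNKARD-STAGGER-PAST",
    "DSM(1-VERT):UPRIGHT-THING-MOVES-ERRATICALLY-LEFT",
    "DEAF",
    "DEAF",
    "G(5-UP):WELL",
]


def regularize(gloss, type_labels=TYPE_LABELS):
    """Swap an overspecific description for the type-like label of its prefix."""
    prefix, sep, _description = gloss.partition(":")
    if sep and prefix in type_labels:
        return f"{prefix}:{type_labels[prefix]}"
    return gloss  # fully lexical signs and unlisted prefixes are left unchanged


def hapax_count(glosses):
    """Number of types that occur exactly once."""
    return sum(1 for n in Counter(glosses).values() if n == 1)


if __name__ == "__main__":
    regularized = [regularize(g) for g in GLOSSES]
    print("tokens:", len(GLOSSES), "->", len(regularized))
    print("types: ", len(set(GLOSSES)), "->", len(set(regularized)))
    print("hapax: ", hapax_count(GLOSSES), "->", hapax_count(regularized))
```

In the toy data, the token count is unchanged while the three overspecific depictions collapse into one type, so the number of types and of hapax legomena both fall.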

Regularizing depicting sign annotations in this way does not, of course, make any difference to the overall count of depicting signs in the corpus. Naturally, until the regularization is completed, the number of unique glosses (and thus tokens of rarely occurring or single-occurrence ‘‘types’’) at the 90–95% level of corpus sampling will remain inflated. The phenomenon of depicting signs needs to be taken into account in any discussion of SL data. For example, it is misleading to equate each token of these depicting signs with a lexical item within the context of calculating lexical frequency. They are not instances of ‘‘types’’ in exactly the same way as fully lexical signs are. Nonetheless, they should not be ignored in profiling the lexicon and corpus. Indeed, the indexic nature of depicting signs may partly explain why Auslan uses comparatively few primarily overtly deictic signs. It is beyond the scope of this article to discuss this issue at length here, but within a broader semiotic framework depicting signs can be seen to be a type of symbolic indexical sign in just the same way that gestural deictics and SpL deictic words are symbolic indexical signs (Enfield, 2009). Indeed, in both SLs and SpLs the frequency of symbolic indexical signs, as a semiotic type, may actually be comparable (Johnston, 2010a).

Gestures Revisited

Precisely the same sort of issues regarding gloss annotations of gestures will also impact on our account of sign distribution and frequency. Gestures are identified and glossed in both the Auslan and the NZSL corpora. The former uses the prefix G for all gestures, manual or nonmanual, with finer distinctions made elsewhere in the annotation; the latter uses the codes MIME and NMS (for nonmanual sign), respectively. However, some signs are treated as gestures in one corpus and as conventional signs in the other (e.g., compare Auslan G(5-DOWN):PHOOEY with NZSL PHOOEY or Auslan G(5-UP):WELL with NZSL WELL).
On the basis of this alone, fewer signs in the NZSL corpus will be considered gestures than in the Auslan Corpus. More importantly, the vast majority of gestural acts, in either gloss or annotation system, will appear as unique sign tokens because the specification of the meaning of the gesture (i.e., that which is appended to the G or NMS or MIME in each system) will vary
enormously from token to token for exactly the same reason as occurs in depicting signs: the ‘‘precise’’ meaning is context dependent. (Of course, as we have seen above, the entire set of gestures can nonetheless be extracted as a ‘‘type’’ for quantification of their overall occurrence and distribution.) In the Auslan Corpus, many gestural act annotations can be regularized in much the same way as are those for depicting signs. However, because they are nonconventional signs, all except the most common forms, which are used in very much the same way (e.g., G(5-UP):WELL), cannot be usefully or easily aggregated. The few that are most type-like are, naturally, the very forms about which there will be uncertainty as to their degree of conventionalization (e.g., ‘‘phooey’’): Are they essentially gestures, or are they fully lexical signs? Thirteen gesture types account for approximately 2,637 tokens of gestures that occur more than 11 times each, 195 gesture types account for approximately 603 tokens of gestures that occur between 10 and 2 times, and 504 other gesture types are hapax. In other words, a considerable number of these would be expected to be conflated into a much smaller set of recurring gesture types, even though, almost by definition, many more will remain as unique (and thus hapax) than the more conventionalized depicting signs.

Fingerspelling

There are a number of fingerspelling routines that should be considered to be fully lexical signs in Auslan (Johnston & Schembri, 2007) and other SLs (Cormier, Schembri, & Tyrone, 2008; Padden & Gunsauls, 2003). These forms are identified in the annotation conventions and are included in the count of fully lexical signs. To be precise, in the Auslan Corpus, there are approximately 933 fingerspelling types, the vast majority of which (approximately 860) were not considered to be lexicalized fingerspellings.
It was this number of nonlexicalized fingerspellings that was subtracted from the count of unique fully lexical signs presented in Tables 3 and 7.

The Lexical Gap Trap

The corpus data I have presented confirm that a ‘‘lexicon gap’’ between Auslan and English (and, I would suggest, SLs and SpLs generally) is real. However, the observation that Auslan has a small conventional lexicon should not be misinterpreted or taken as a value judgment. First of all, as already noted above with respect to the number of unique sign entries in SL dictionaries published to date, Auslan does not appear to be special in this regard compared with other SLs. Second, whatever it is that signers are using in meaning production, their SLs obviously ‘‘do the job’’ that all languages are asked to do in face-to-face interaction. There is no expressive ‘‘limit’’ in this regard whatsoever for users of an SL. Third, in cultures with writing, especially those using an alphabet or even a syllabary, and with a long history of deaf education, a degree of familiarity with the written majority SpL is almost universal in the deaf community and is expressed through fingerspelling. Individual degrees of bilingualism will determine how much of the majority language lexicon is available and used by any signer and the extent to which it forms part of one single mental lexicon for any individual. Lexically, in terms of conventional sign forms for content words, it may be inaccurate to think of two clearly separate languages and the presence of fingerspelling may actually be a false demarcation marker between the two, an assumption all too readily made by non-deaf people (Padden & Gunsauls, 2003). Signers may experience the two languages, at least in terms of content signs/words, as not quite so categorically separated as external observers assume (Johnston, 2002). In other words, observing a limit on the number of fully lexical signs (conventional lexical signs) may say less about the full range of form/meaning pairings available to users of the language than imagined because a significant percentage of the English lexicon is available to, and used by, many signers. Lexicon size brings us to the final point of potential misinterpretation. SLs are unwritten face-to-face languages.
As we have seen, lexical frequency and sign type counts of SpLs in their face-to-face mode are comparable; for example, in the BNC of spoken English (recall Table 7), 85% of the corpus is made up of just 868 types. This is actually fewer than Auslan, which has 908 types at the 85% level. English speakers and Auslan signers use 6,001 and 3,378 types, respectively, to produce almost 95% of the tokens in their everyday language output. This is not unlike the range estimates for the numbers of words (types) students of a second language need to learn in order to be familiar with around 90–95% of words (tokens) produced in any kind of text, spoken or written, in another language (Nation, 2001). In other words, the huge discrepancy in conventional lexicons occurs in only the 5% of tokens that are swollen with tens of thousands of English word types that occur relatively rarely (the BNC spoken data have 54,532 types, the Auslan data less than one tenth of these). That is, for most of the time, speakers get by with a lexicon whose size is of the same order of magnitude as that of signers. Nonetheless, the observation that there is a relatively small number of conventional, citable, listable, fully lexical signs in Auslan and apparently other known sign languages still holds true. Comparison of Auslan and other SLs with SpLs in communities without writing may reduce these differences somewhat, but it would not make them disappear. Literacy, a written literature, institutional writing-based education, and a technological culture do have the effect of expanding the lexicon of a language even in nontechnical spoken genres (after emerging first in specialist or technical ones) and thus not just in written texts. However, even non-literate small-scale traditional societies have huge inventories of lexemes that name things in the natural and physical world, often running into many thousands or tens of thousands of terms. These nomenclatures form large, if not vast, hierarchies. Basic-level ethnobiological terms rarely number anything less than 1,000–1,200 even in preliterate societies. Although some of these languages appear to have a very small established lexicon in some semantic areas or in some word classes (e.g., commonly used basic verbs in some Papuan languages can number as few as 25 simple root forms), many of these have a large store of ethnobiological terms (Foley, 1986).

Familiarity and Frequency

As mentioned above, previous research has attempted to use familiarity ratings instead of or as a surrogate for measures of lexical frequency in SLs. A comparison of the BSL norms for sign familiarity (Vinson et al., 2008) with the frequency ranking of fully lexical signs in the Auslan Corpus shows only a partial overlap. Of the 300 signs used in the norming study, approximately half occur in the Auslan Corpus even though almost all are recorded in the Auslan lexical database or have a close equivalent therein. The two lexicons align only approximately because, although BSL and Auslan are closely related languages (Johnston, 2003a; McKee & Kennedy, 2000; Woll, 1987), they are not identical and the glossing of the BSL test items has not been made with reference to a lexical database. Nonetheless, downloadable resources published with the article made it possible to view the test items and to modify the glosses where necessary so as to accommodate different glossing decisions and/or minor variant forms. It was unsurprising that half the test items did not occur in the Auslan Corpus. Some do not appear because they were chosen deliberately in the norming study precisely because they were unfamiliar (e.g., the town name BASINGSTOKE and a regional sign for PEOPLE); others do not appear for the well-known reason that small corpora can be very sensitive to register and genre effects. Though the Auslan Archive has a large amount of conversational data, this has yet to be annotated so it does not yet form part of the Auslan Corpus. When selecting familiar signs to include in a norming study, it is inevitable, and appropriate, that many of these signs will be those found in everyday conversational interaction. Many of these signs are unlikely to appear in the story retellings, interviews, and spontaneous narratives that currently form the bulk of the annotated corpus. Nonetheless, some general observations can be made. Even taking genre and register into account, it appears that familiarity and frequency may not be as closely related in SLs as reported for English (Balota et al., 2001). By way of example, only 26 (8.7%) of the 300 signs in the BSL norming set appear in the top 100 ranked fully lexical signs in Auslan; and only 57 (19%) of the BSL signs appear in the top 300 ranked Auslan signs.
Of the 153 BSL norming set signs that appear anywhere in the Auslan Corpus (i.e., irrespective of rank), 127 of these BSL signs actually have a familiarity rating of 5 or higher on the 7-point scale used, where a higher rating indicates a more familiar sign. (The researchers report that the test items favored signs that were highly familiar.) Of these 127 BSL signs with a high familiarity rating, only 39 (29.9%) appear in the 300 most frequent fully lexical signs in Auslan, and only 18 (14.2%) in the top 100. Alternatively, if we look at the 83 BSL test item signs that have the highest familiarity ratings (i.e., 6.0 or higher), only 38 (12.5%) are in the top 300 most frequent fully lexical signs in Auslan, and only 14 (4.7%) in the top 100. From these data, it appears that a rating of high familiarity may not strongly reflect the frequency of use or occurrence of a fully lexical sign in Auslan. Perhaps both the actual selection of signs for a familiarity norming study and the very assessment of an item as being ‘‘very familiar’’ by the subjects are also related to degree and patterns of lexicalization in SLs. A subjective measure of familiarity for a phonologically fully specified, fully lexical ‘‘frozen’’ sign may be very high indeed even though it may be a low-frequency sign, simply because it is very citable and ‘‘memorable’’ (e.g., SCISSORS, ELEPHANT, KANGAROO), perhaps because there are relatively few other candidates contesting for recognition, that is, due to apparently modest lexical inventories. When we look at all signs in a naturalistic stretch of discourse in an SL, that is, not just at the conventionalized citable fully lexical signs but also the depicting signs, the points, and the gestures (i.e., the rank frequency of all signs, not just fully lexical signs), this link of familiarity with frequency is further weakened. However, none of this is to deny that familiarity is, nonetheless, an entirely valid, useful, and independent measure of sign and word status that should be taken independently into account along with iconicity ratings and measures in psycholinguistic research, particularly in assessing the processing of conventional ‘‘lexical’’ signs.

Conclusion

In this study, I have tried to show that measures of lexical frequency in SLs need to be made in ways that yield useful information for linguists and psycholinguists.
Useful and accurate measures depend on (a) the consistent and principled identification of major kinds of signs (e.g., fully lexical, partly lexical, and nonlexical signs), (b) the lemmatization inherent in the practice of ID glossing for lexical signs (type/token matching), (c) the subcategorization of types by grammatical class for detailed corpus-based linguistic research (e.g., grammatical description and understanding language change and grammaticalization), and (d) the regularization of the glossing of depicting signs so as to identify their type-like characteristics while accommodating their specific context-bound realization. In particular, the clear identification of depicting signs (e.g., the DS prefix) enables one to treat them as a group that can then be counted and their distribution in the corpus calculated, an approach also adopted in the ASL and NZSL corpus studies, but using different codes; and the regularization of annotation glosses wherever possible corrects the error of treating each depiction as a unique sign (tokens that are one-member types). Failure to do so may mean that too many specific glosses for ‘‘classifiers’’ appear as types or hapax legomena, as in the ASL and NZSL frequency lists, the only other studies on lexical frequency in SLs yet conducted. These depicting signs (‘‘classifier’’ signs) therefore do not appear in those languages in the ranks of higher frequency signs, when they indeed should, and this gives us an incorrect picture of what are the most frequent signs in SLs. This has consequences in applied linguistics, for example, in the design of curricula for the teaching of SLs. Two extremes need to be avoided: first, fully lexical signs should not be taught to the exclusion of depicting signs and gestures as the latter constitute a significant proportion of all signs produced in normal interaction; second, depicting and pointing signs should not be taught as if they were equivalent to fully lexical signs; this is likely to distort teachers’ and students’ understanding of, and hence appropriate use of, the lexical resources of the language. Lexical and sign type distribution and frequency data also have descriptive and theoretical implications. The data show that a typical SL text is composed of several different kinds of signs, not just fully lexical signs. SL grammars that do not accommodate typical mixtures and sequences of signs as shown in the data are thus likely to be inadequate or misleading because the descriptions may be skewed toward unrepresentative and idealized sentence types.
Overall, I would conclude that the relatively small size of SL corpora and the unique type and token characteristics of points and depicting signs make it difficult to give an unproblematic single measure and ranking of lexical frequency in any SL. Larger, more representative corpora created with both intra-corpus and inter-corpus annotation consistency in mind are needed if robust measures of lexical frequency are to be made available to SL researchers and educators.


Notes

1. The studies are simply too numerous to cite within the limits of this article. An excellent summary of the literature up to the early 2000s can be found in Emmorey (2002).
2. Overall, 14.9% of signs involved fingerspelling; however, of these only 46.4% were fully fingerspelled rather than being one letter ‘‘initializations,’’ which often simply prompt a mouthing. So, the overall fingerspelling rate is probably 6.9%.
3. My preferred terminology for this type of sign is ‘‘depicting sign.’’ I introduce and explain this term in the discussion of lexicalization, below.
4. The numerals 1, 2, and 3 refer to first person, second person, and third person, respectively. Points were categorized on the basis of primary intended function: (1) indexing a discourse participant/referent = PRO, (2) indexing a location = LOC, and (3) signaling that a referent, which was named by another sign immediately before, after, or simultaneously with this point, was known or given in some way (usually by previous mention in the discourse) = DET.
5. Though the NZSL corpus identified depicting signs, of course, using classifier terminology (McKee & Kennedy, 2006, p. 376), the overall distribution of sign types in the corpus was not reported and cannot be calculated based on published data and hence cannot be compared with the Auslan and ASL data.
6. It is interesting to note that approximately 40% of all the gestures in the Auslan Corpus are tokens of one single gesture type: G(5-UP):WELL.
7. Sutton-Spence (1999, p. 370) observes that rates of fingerspelling on a television program (‘‘See Hear!’’) were around 9.8% of all signs and that this was ‘‘higher than would be expected for other registers of BSL.’’ The assumption being that the television program was somewhat formal and prone to English language contact and influence. (The research of Sutton-Spence et al. (1990) actually showed a decline in rates of fingerspelling on the program over a decade presumably because of an apparent weakening of the association of formality and prestige with English and a new positive attitude toward BSL in the deaf community.)
8. The data are taken from a detailed analysis of a single text by two native deaf signers told to three different audiences. The data are as expected insofar as constructed action appears less frequent in the more formal situations, but it is nonetheless an important feature of all the texts in all the situations. As with so much SL research, a large corpus with multiple signers in multiple situations is needed to tease apart individual factors from language-specific and modality-specific factors.

Conflicts of Interest

No conflicts of interest were reported.

Funding

Australian Research Council (DP1094572); the Hans Rausing Endangered Languages Documentation Project, School of Oriental and African Studies, University of London (MDP0088).

Acknowledgements

The author acknowledges the coresearchers, research assistants, and annotators who have contributed to the current body of corpus annotations since 2003 (Adam Schembri, Julia Allen, Donovan Cresdee, Karin Banna, Michael Gray, Dani Fired, Della Goswell, and Gerry Shearim); the hundreds of deaf signers who contributed material to the Auslan Archive; and the current postgraduate students who have contributed to corpus annotations (Lindsay Ferrara, Gabrielle Hodge, and Michael Gray). Thanks also go to Adam Schembri and Kearsy Cormier for their comments on an earlier draft of this article, and to three anonymous reviewers, whose comments together helped improve it. Any remaining errors, as with the ideas expressed in this article, are the author’s.

Journal of Deaf Studies and Deaf Education 17:2 Spring 2012

Appendix

The 300 most frequent types in the Auslan Corpus (71% of all tokens)

Rank  ID gloss
1     PT:PRO1
2     G(5-UP):WELL
3     PT:PRO2/PT:PRO3
4     DEAF1/2
5     LOOK
6     BOY
7     PT:LOC
8     DSM/L(BENT2):ANIMATE-MOVES/AT
9     HAVE
10    SAME
11    GOOD
12    PT:DET
13    DSM(1/X):ENTITY-MOVES
14    DSS(BC):CYLINDRICAL/CURVED
15    WHAT
16    THINK
17    NOTHING
18    NOT
19    DOG1/2
20    REAL
21    PEOPLE1
22    ONE
23    WHY-BECAUSE
24    SIGN
25    DSM/L(2/H):ANIMATE-MOVES/AT
26    G(CA):
27    SAY
28    PT:POSS1
29    WITH1/2
30    DSM/L(5):MANY-MOVE/AT
31    IN
32    FROG1/2
33    DSS(1):TRACE
34    WOLF
35    DSS/L(5):MASS/SHAPE-AT
36    WANT
37    SEE
38    YELL1/2
39    KNOW
40    CAN
41    GO
42    YES1/2
43    BUT2
44    PT:POSS2/PT:POSS3
45    CAT1/2
46    HEARING2
47    DISABILITY
48    PT:BUOY
49    TORTOISE
50    BAD
51    SCHOOL
52    DSH(A/S/6):HOLD
53    FS:IF
54    CAN-NOT
55    INDECIPHERABLE [a]
56    ALL
57    PERHAPS
58    MAN
59    WILL
60    FINISH.6
61    GIRL1/2
62    WHERE
63    FEEL
64    DSL(BC):CYLINDRICAL/CURVED-AT
65    BIRD
66    WORK
67    SOME
68    TWO
69    NAME
70    DIFFERENT
71    RABBIT
72    KNOW-NOT
73    SHEEP
74    LITTLE
75    DSL(1):ENTITY-AT
76    FOR1/2/3
77    SEARCH
78    LADY1
79    LAUGH
80    DAY
81    CAR1/2
82    STOP
83    ONLY
84    GO-POINT
85    ARRIVE
86    RIGHT1
87    OTHER
88    CHILDREN1
89    PAST
90    CONTINUE
91    ALWAYS1
92    DSS/L(B):SURFACE
93    ENCOURAGE1
94    DSM/L(4):MANY-MOVE/AT
95    DSM(B):VEHICLE-MOVE
96    SLEEP
97    SPEECH
98    AREA
99    AGAIN
100   COME
101   OLD
102   NEVER
103   BABY
104   GROUP
105   G(5-DOWN):PHOOEY
106   TALK
107   HAVE-NOT
108   FS:SO
109   FS:TO
110   WINDOW
111   FRIEND
112   WHEN
113   COCHLEAR-IMPLANT
114   REPEAT
115   ON
116   PT:LOC/PRO
117   COINCIDENCE
118   HEAR
119   DSH(BENT5):HANDLE
120   DEER
121   TREE1
122   DSM(2):ANIMATE-FALL
123   BOWLING
124   NIGHT1
125   MORNING
126   WRONG
127   SPRINT
128   HOUSE
129   BED
130   WHO1/2
131   REMAIN1
132   WALK
133   ORAL
134   MONKEY
135   MAKE
136   EAT
137   FINISH.5
138   MUST
139   GET-ATTENTION
140   FS:VILLAGE
141   BETTER
142   TAKE
143   HEARING-AID
144   HEARING
145   HANDICAPPED
146   WAKE
147   SIT-ON
148   MORE
149   GO-TRACE
150   G:UM
151   FS:JAR
152   NOW3
153   PUT
154   CATCH
155   LIKE
156   SHOUT
157   STRONG
158   GIVE
159   BOTH
160   UNDERSTAND1
161   PT:DEM
162   OVER
163   LET’S-SEE
164   TEN
165   PERSON
166   CHAT
167   TEACHER
168   SUCCESS
169   STILL1
170   SLOW
171   PLUS
172   DSM(BC):CYLINDRICAL/CURVED
173   TITLE
174   TIME2
175   G(5-WIGGLE):UMM
176   CLOTHES
177   AND1
178   ABOUT1/2
179   TIME-WATCH
180   THROUGH
181   MEET
182   JOKE
183   FS:DO
184   FROM
185   COMMUNICATE
186   BORN2
187   RUN
188   BEE
189   AUSLAN
190   FOLLOW
191   FIND
192   ENGLAND
193   BELIEVE2
194   UNIT1
195   PLAY
196   MOST
197   LATER
198   GRAZE
199   WATER
200   PARENTS1
201   MOTHER
202   LIFE
203   ACCEPT
204   NEXT
205   HELP
206   EDUCATE
207   BREAK
208   ALRIGHT
209   WRITE
210   FS:BEE
211   FAR
212   YOUNG
213   STORY
214   OR
215   LANGUAGE
216   GROW-UP
217   ESCAPE
218   TOMORROW
219   START
220   FS:WOLF
221   NEED
222   BORING
223   STAND
224   TIME-CLOCK
225   REMEMBER1
226   PARENTS2
227   NO-WAY
228   LOVELY
229   HAPPEN
230   GONE1
231   GO-HOME
232   ATTENTION
233   THREE
234   QUIET
235   NOW1
236   MATTER-NOT
237   HAPPY1
238   FALL
239   CHANGE
240   BLIND
241   BIG
242   SHOULD
243   EASY
244   DOUBT
245   BEFORE1
246   AGREE
247   WANT-NOT
248   DRIVE
249   BALL2
250   BACK
251   TRY
252   PT:
253   PATIENCE
254   FINGERSPELL
255   BEST
256   TEASE
257   LOOK-AFTER
258   BORN1
259   TERRIBLE
260   READY1
261   FS:PIPE
262   FS:BUT
263   FORGET
264   DOOR
265   COMMUNITY
266   SWALLOW
267   LOVE
268   FAMILY1
269   BRICKLAYER
270   BAD-LUCK
271   ANGRY1
272   TABLE
273   LIVE
274   LEARN3
275   HOLD-ON
276   GO-OUT
277   FS:OF
278   FS:LOG
279   FIRST
280   CHOOSE
281   YEAR
282   PRETEND
283   EXCITED
284   CHILD
285   CHAIR2
286   THING
287   SHOCK
288   REGULAR
289   FATHER
290   DSL(B):VEHICLE-AT
291   WORK-OUT
292   WHEELCHAIR2
293   SURPRISED
294   FS:FROG
295   CRY
296   TREE2
297   SOMETIMES
298   NEW
299   HOLE3
300   FS:OWL

[a] A sign that could not be recognized, not a sign that meant “indecipherable.”
