The sentence production test for aphasia


Aphasiology, 2014 Vol. 00, No. 00, 1–34, http://dx.doi.org/10.1080/02687038.2014.893555

Carolyn E. Wilshire, Carolina C. Lukkien, and Bridget R. Burmester

School of Psychology, Victoria University of Wellington, Wellington, New Zealand

Background: Researchers and clinicians have long known that in aphasia, the ability to produce connected speech is poorly predicted by tests of single-word production. Connected speech is most commonly assessed using rating scales, in which the examiner rates the speech on various fluency-related and grammatical well-formedness measures. However, with this method, interrater and test–retest reliability can be poor, and since the intended utterance is not known, the accuracy and appropriateness of the speech content are difficult to measure.

Aims: The aim of the present study was to develop and investigate the validity and usefulness of a new, freely accessible sentence production test (SPT) based on simple pictured event description.

Methods & Procedures: The SPT involves describing simple pictured events. The test pictures represent a range of sentence constructions and lexical items, which elicited high response agreement in healthy controls. The simple automatised scoring procedure generates both general and specific accuracy measures. This article describes the test construction and norming procedure and reports test data from 24 participants with aphasia.

Outcomes & Results: Interrater reliability for the scoring protocol was excellent. The overall sentence score was found to measure unique variance not accounted for by single-picture naming. It was unrelated to fluency measures such as speech rate. Specific scores, such as the closed-class score, measure partially overlapping but qualitatively distinct constructs relative to other speech assessments.

Conclusions: The SPT is quick to administer, easy to score and can be used even when a person’s speech is very limited. It provides a range of measures of sentence production that may prove informative for both clinical and research purposes.

Keywords: Aphasia; Sentence production; Assessment.

Address correspondence to: Carolyn Wilshire, School of Psychology, Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand. E-mail: [email protected] The authors would like to thank Richard Moore for creating the beautiful pictures used in our test materials (for further information, see http://www.artbyrichardmoore.com). Thanks also to Dr Nadine Martin from the Temple University School of Speech and Hearing Sciences, for allowing us to include test data for participants FS, XX, DD, EC and TB. Thanks also to Alana Oakly for her help with the interrater reliability analysis. Finally, we are grateful for all those who helped to collect test data for us, particularly Christina Cameron Jones and Corinne Bareham, who tested and transcribed data for many participants on our behalf. The first and third authors were supported in part during this work by a grant from the Marsden fund of New Zealand [VUW0505; C Wilshire Principal Investigator]. The second author’s work was supported by a Victoria University of Wellington PhD Scholarship (2002–2005) and a Victoria University Doctoral Completion Scholarship (2005–2006). © 2014 Taylor & Francis


Many people with aphasia have particular difficulty producing sentences. However, assessing this difficulty can be challenging. Unlike single-word production, where pictures can be used to elicit specific words, it is much harder to constrain a person’s speech output in order to examine specific kinds of sentences or utterances. Even in tasks where there are clear expectations about the propositions to be expressed (such as recounting a story or describing a pictured scene), there are almost always various ways of expressing them. Consequently, it is not always easy to establish what constitutes “normal” behaviour, let alone define what is impaired. A second difficulty is that connected speech in aphasia may be abnormal across a number of different dimensions, which include not only its informational content, but also its syntactic complexity, grammatical and morphological well-formedness and rate of production. This article presents a new test, which uses a simple picture description task to assess aspects of sentence well-formedness and lexical content. However, before describing the test and its most important precursors, we will briefly consider some of the methods that are currently used to assess connected speech in aphasia. Currently, the most widely used assessments of aphasic connected speech are based on subjective rating scales. For example, in the Boston Diagnostic Aphasia Examination (BDAE) (Goodglass, Kaplan, & Barresi, 2001) and the Western Aphasia Battery (Kertesz, 2006; Shewan & Kertesz, 1980), a speech sample is elicited using open-ended questioning and picture description tasks. It is then rated on various dimensions, which may include articulatory agility, phrase length, grammatical form, melodic line, incidence of paraphasias, information content and completeness of utterances. In both assessments, the rated dimensions are primarily selected for their effectiveness in discriminating fluent from nonfluent aphasia. 
However, since these methods rely heavily on subjective judgements, they demand a considerable amount of skill and training on the part of the examiner. Also, even trained examiners may vary widely in their ratings of the same speech sample, so interrater reliability may be poor (Gordon, 1998). In response to these concerns, several more stringent, free speech scoring protocols have been developed, which enable the examiner to derive numerical scores for aspects of the speech. Some focus primarily on the form of the person’s utterances; one such example is the quantitative production analysis or QPA (Berndt, Wayland, Rochon, Saffran, & Schwartz, 2000; Rochon, Saffran, Berndt, & Schwartz, 2000; Saffran, Berndt, & Schwartz, 1989; for other examples, see Edwards, 1995; Shewan, 1988; Thompson, Shapiro, Tait, Jacobs, Schneider, & Ballard, 1995; Vermeulen, Bastiaanse, & Van Wageningen, 1989; Wagenaar, Snow, & Prins, 1975). In the QPA, the examinee is asked to retell a well-known story such as Cinderella. First, the speech rate, or number of words produced per minute, is calculated. Then, the speech is stripped of fillers, repetitions and other extraneous material, and several other scores are generated. These include the total number of narrative words, numbers of open- and closed-class words, nouns, verbs, pronouns and determiners. There are also protocols for segmenting the speech into sentences, enabling the examiner to derive measures such as the proportion of well-formed sentences and the mean length of utterances. The QPA is capable of quantifying certain specific speech patterns, such as agrammatic speech, a pattern characterised by disproportionate omission of function words and/or other closed-class elements, which is often observed in individuals with Broca’s aphasia (e.g., ‘Mum…Dad… shopping…Friday’).
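For readers who wish to experiment with measures of this kind, the word-count stage of such a protocol can be sketched in a few lines of code. This is a minimal illustration only, not the QPA itself: it assumes the sample has already been transcribed and stripped of fillers and repetitions, and the closed-class word list is an illustrative stand-in, not the QPA’s actual inventory.

```python
# Illustrative closed-class word list (NOT the QPA's official inventory).
CLOSED_CLASS = {
    "the", "a", "an", "is", "are", "was", "were", "to", "of", "and",
    "in", "on", "by", "he", "she", "it", "they", "his", "her", "that",
}

def qpa_measures(utterances, duration_minutes):
    """Compute QPA-style counts from a pre-stripped speech sample.

    utterances: list of utterances, each a list of lowercase narrative words.
    duration_minutes: duration of the original sample, for speech rate.
    """
    words = [w for utt in utterances for w in utt]
    closed = sum(1 for w in words if w in CLOSED_CLASS)
    return {
        "narrative_words": len(words),
        "words_per_minute": len(words) / duration_minutes,
        "closed_class": closed,
        "open_class": len(words) - closed,
        "mean_utterance_length": len(words) / len(utterances),
    }
```

An agrammatic sample such as ‘Mum…Dad…shopping’ would show a markedly low closed-class count relative to its open-class count on measures of this kind.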



Other quantitative scoring protocols focus on speech informativeness or content/communicative effectiveness. One example, developed by Brookshire and Nicholas (1994), involves eliciting a sample of relatively open-ended speech, and then identifying each correct information unit (or CIU) (e.g., Nicholas & Brookshire, 1993, 1995). A CIU is defined as any word that is informative, accurate and relevant to the present speech context (for applications of this procedure, see Gordon, 2008; see also Fink, Bartlett, Lowery, Linebarger, & Schwartz, 2008; Jacobs, 2001; Yorkston & Beukelman, 1980). The more recent story retell procedure takes a similar approach (Doyle et al., 1998, 2000; Doyle, Tsironas, Goda, & Kalinyak, 1996; Hula, McNeil, Doyle, Rubinsky, & Fossett, 2003; McNeil, Doyle, Fossett, Park, & Goda, 2001; McNeil, Doyle, Park, Fossett, & Brodsky, 2002; McNeil et al., 2007). Participants hear a number of short stories, which they subsequently have to retell. The examiner then rates the content of the person’s speech against an inventory of the total propositions in each of the stories. (For other similar approaches, see Menn, Ramsberger, & Estabrooks, 1994; and Yorkston & Beukelman, 1980.) Despite wide differences in their aims and scope, these protocols share one important strength: the scoring criteria for each measure are transparent, and consequently, interrater reliability is much higher than on more subjective rating scales. However, the protocols also have some significant limitations. First, the unconstrained elicitation procedures mean that some individuals may be very effective at avoiding structures or lexical content that is difficult for them. Second, since the appropriateness of lexical elements is not scored (only their incidence), some types of abnormalities, such as paragrammatic substitutions of closed-class elements, may be difficult to detect.
Third, although the scoring systems themselves are reliable, estimates of test–retest reliability are lower: for example, on the various measures from the QPA, Rochon et al. (2000) obtained test–retest intraclass correlations ranging from 0.53 to 0.92. This lack of reliability is likely to reflect the unconstrained nature of the speech samples used, which allow for enormous variability in speaker output from session to session (see also Cameron, Wambaugh, & Mauszycki, 2010, for a recent evaluation of Nicholas and Brookshire’s protocol). Fourth, the scoring is often complex and extremely time consuming. Finally, and perhaps most importantly, many of the procedures cannot be used with severely impaired individuals who are unable to produce the minimum speech sample required for scoring. Ideally, then, these protocols might be complemented by more constrained methods, which assess speech accuracy against a clear expectation of what a nonaphasic person would say. One paradigm that has been used effectively for this purpose is pictured event description, where the participant describes a single pictured event in one sentence. The use of this kind of task has a long history in aphasia research (see e.g., Gleason, Goodglass, Obler, Hyde, & Weintraub, 1980; Helm-Estabrooks & Ramsberger, 1986; Helm-Estabrooks, Fitzpatrick, & Barresi, 1981; Saffran, Schwartz, & Marin, 1980). However, it is only more recently that researchers have begun to develop more rigorous tests, selecting pictures for which clear ‘norms’ have been established in healthy control participants, and adopting detailed, reliable scoring systems.
In the last decade or so, several such protocols have been developed, mostly with the aim of assessing aspects of syntactic competence (see e.g., Bastiaanse, Edwards, & Rispens, 2002; Bastiaanse, Edwards, Mass, & Rispens, 2003; Caplan & Hanna, 1998; Cho-Reyes & Thompson, 2012; Faroqi-Shah & Thompson, 2003; Thompson, Lange, Schneider, & Shapiro, 1997; but see also Whitworth, 1995,


which focuses on thematic role assignment). Many of these tests provide comparative measures of performance across different types of syntactic structures, for example, by varying the verb argument structure of the sentence or its surface form (e.g., active vs. passive voice). Although this is not our aim in the current study, a brief review of this research can provide information about the strengths and limitations of this method, and the most important design and scoring considerations. In Caplan and Hanna’s (1998) task, participants had to use a single sentence to describe a pictured event. The target sentences included five exemplars of each of the following sentence constructions: actives, passives, datives (The woman is giving the rattle to the baby) and dative passives (The ball was thrown to the boy by the man).1 Arrows were used to identify the items to be included in the sentence, with a dot indicating the one to be produced first. An example is shown in panel (a) of Figure 1. Also, the target verb was provided orally by the experimenter. In a group of 55 nonaphasic controls, all but one picture yielded near-ceiling levels of response agreement. The scoring protocol specified: (a) how responses should be divided into individual utterances; (b) in cases where multiple attempts were given, how the ‘best effort’ should be identified for scoring and (c) how each constituent grammatical element and lexical item should be scored (e.g., subject determiner, subject noun, object determiner, indirect object noun, auxiliary, root verb, inflectional affix or preposition). Caplan and Hanna (1998) reported test data from 20 individuals with aphasia—12 with Broca’s aphasia and 8 with fluent aphasia. Both groups performed more poorly as grammatical complexity was increased; the Broca’s group showed a

Figure 1. Example of stimulus pictures from previous picture description tasks. Panel (a) shows an example from Caplan and Hanna (1998). The picture shown here was designed to elicit the sentence The ball was thrown to the boy by the man. Panel (b) shows an example from the NAVS argument structure production tests (Cho-Reyes & Thompson, 2012). This picture is designed to elicit the sentence The man is washing the clothes.

1. A further five pictures were included in the original test to elicit subject–object relative sentences. These were subsequently excluded from the analysis, as they failed to reliably elicit the target construction.


particularly steep drop in content word scores as complexity increased. However, many of these trends were not statistically significant, which may be due to heterogeneity within the participant subgroups themselves. More recently, Faroqi-Shah and Thompson (2003) developed a picture description task to explore whether lexical cues improved sentence production accuracy in different aphasia subgroups. There were 60 stimulus pictures, each depicting a single-object transitive sentence, with equal numbers of reversible and nonreversible sentences (e.g., The runner lifted the skier). As in Caplan and Hanna (1998), an arrow was placed next to the item to be mentioned first. Participants were further prompted with the question, ‘What is he/she doing?’ (for actives), or ‘What is happening to her?’ (for passives). All picture stimuli yielded 100% response agreement when normed on a group of 10 control participants. Several key aspects of the scoring system are worth noting here. First, the best effort response was identified, using criteria similar to those of Caplan and Hanna (1998), and this was then scored as correct or incorrect overall (allowing for appropriate alternatives and phonemic errors). Then the errors themselves were categorised. Error categories included role reversal (agent → patient, or vice versa), grammatical morpheme errors (and whether substitution or omission), preposition errors and non-sentences. Interrater agreement was also reported: 92% for the identification of the best effort, 92% for overall accuracy scoring and 100% for error categorisation. Several recently published aphasia assessments also include carefully constructed picture description tasks. The sentence construction test in the verb and sentence test battery (VAST) comprises 20 pictures depicting 10 transitive and 10 intransitive sentences (Bastiaanse et al., 2002, 2003). A group of older controls scored close to ceiling on the English version of this test.
The scoring system is simple: sentences are scored on a correct/incorrect basis, and various finer aspects of the utterances can then be examined informally (e.g., syntactic well-formedness, lexical content, incidence of nouns/verbs). Scores on this test can then be compared with those on other VAST subtests that examine specific aspects of verb production and comprehension, enabling the researcher to build up a profile of each person’s difficulties, especially with respect to verbs. The recent Northwestern Assessment of Verbs and Sentences (NAVS) includes the argument structure production test, which uses picture description to elicit sentences that vary in verb argument structure (e.g., one argument: The dog is barking; two: The man is washing the car; three: The postman is delivering the package to the man; Cho-Reyes & Thompson, 2012). In some sentences, all arguments are obligatory, and in others, at least one is optional. The target nouns are labelled on the picture; an example is shown in panel (b) of Figure 1. All tests in the assessment were carefully normed, with control participants scoring at or near ceiling on all stimuli, and interrater agreement was also reported to be high. Test data are also reported from 35 individuals with agrammatic aphasia and 24 with anomic aphasia, diagnosed using the Western Aphasia Battery. On the verb argument structure test, responses were scored as correct if they contained the target verb and the correct number of arguments in the right order; substitutions of similar nouns (e.g., man → boy) were not penalised. The agrammatic group produced more incorrect responses overall. Sentence accuracy in both groups declined as the number of verb arguments increased; however, this decline was steeper for the agrammatic group.
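The correctness criterion just described (target verb plus the right number of arguments in the right order, with similar-noun substitutions allowed) is simple enough to express programmatically. The sketch below is our own illustration of that criterion, not the NAVS scoring procedure itself, and the substitution table is purely hypothetical.

```python
# Hypothetical table of semantically similar nouns that are not penalised
# (illustrative only; any real test would define its own equivalences).
ALLOWED_SUBS = {"man": {"boy"}, "woman": {"lady", "girl"}}

def noun_matches(target, produced):
    """A produced noun matches the target exactly or via an allowed substitution."""
    return produced == target or produced in ALLOWED_SUBS.get(target, set())

def score_response(target_verb, target_args, response_verb, response_args):
    """Correct iff the target verb is produced with the right number of
    argument nouns in the right order (similar-noun substitutions allowed).

    target_args / response_args are ordered lists of argument nouns.
    """
    if response_verb != target_verb:
        return False
    if len(response_args) != len(target_args):
        return False
    return all(noun_matches(t, p) for t, p in zip(target_args, response_args))
```

Under this criterion, ‘The boy is washing the car’ would be scored correct for the target The man is washing the car, whereas a role reversal (‘The car is washing the man’) or an argument omission would not.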
Preposition errors (e.g., The woman is giving the gift for the boy) were the most common error in both groups, although the agrammatics did produce disproportionately more argument errors (mostly argument omissions), role reversals and non-


sentences. Again, scores on this test can be compared with those on the NAVS’ other tests, which focus primarily on verb production and comprehension. This battery also includes the sentence production priming test, a picture description task where the participant is provided with the key nouns and root verb, and also an example sentence describing a picture with the elements’ roles reversed (e.g., prime sentence: The cat is chasing the dog; target sentence: The dog is chasing the cat). The participants’ task is to provide a sentence with the same form that describes the target picture. Target sentence types include active and passive transitive sentences, wh-questions (subject: Who is chasing the dog? object: Who is the cat chasing?) and relative clause structures (subject: Pete saw the cat who was chasing the dog; object: Pete saw the dog who the cat was chasing). This task is rather different from regular picture description, as it focuses primarily on sentence transformation, rather than spontaneous sentence generation. However, it is interesting to note that the agrammatic group were considerably less accurate than the anomics with passives and object wh-questions, even though the scoring system did not penalise for noun or verb substitutions. The agrammatics were particularly prone to role reversal errors in these sentences. The primary advantage of these highly constrained picture description tasks is that they can be used to elicit very specific responses, which can then be scored for both informational accuracy and grammatical well-formedness at the same time. Also, by norming the pictures on nonaphasic controls, any substantial divergence from the normed response can be considered ‘impaired’ and scored as such. Also, the scoring protocols are generally quicker to learn and easier to apply than those used to assess open-ended speech. Another advantage of this method is that it can be extended to individuals whose spontaneous speech output is sparse.
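The norming logic that underlies such tests can be illustrated with a short sketch: compute each picture’s modal response across control participants and check it against an agreement criterion. The 80% threshold and the whitespace/case normalisation below are illustrative assumptions, not a prescription.

```python
from collections import Counter

def response_agreement(responses):
    """responses: list of sentence strings, one per control participant.

    Returns the modal (most common) normalised response and the percentage
    of participants who produced exactly that sentence.
    """
    # Normalise case and whitespace so trivially different transcriptions match.
    normalised = [" ".join(r.lower().split()) for r in responses]
    target, count = Counter(normalised).most_common(1)[0]
    return target, 100.0 * count / len(normalised)

def passes_criterion(responses, threshold=80.0):
    """An item passes if its modal response reaches the agreement threshold."""
    return response_agreement(responses)[1] >= threshold
```

A picture for which 9 of 10 controls say ‘The dragon is flying’ would pass an 80% criterion, whereas one eliciting several competing descriptions would be excluded.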
No minimum speech sample is required; as long as the participant is able to produce some speech in response to at least some of the items, it is possible to obtain useful information from the task. Many of the picture description tasks described so far were designed to examine the effects of syntactic complexity on performance, while minimising non-syntactic demands (e.g., by using very high-frequency nouns: Bastiaanse et al., 2003; Caplan & Hanna, 1998; or by providing written noun labels: Cho-Reyes & Thompson, 2012 or auditory verb prompts: Caplan & Hanna, 1998). The reasoning here is that individuals with grammatical encoding deficits should show a decline in performance as syntactic complexity increases. However, one problem is that accuracy appears to decline as syntactic complexity increases in all aphasia subtypes, not just those hypothesised to have a grammatical deficit. For example, Caplan and Hanna (1998) observed a comparable reduction in overall accuracy on passive (relative to active) sentences in their Broca’s and fluent aphasia groups. Similarly, in the baseline, uncued condition in Faroqi-Shah and Thompson’s (2003) picture description task, both the Broca’s and the Wernicke’s groups performed less accurately on reversible passive sentences than on their active counterparts—in fact, the drop in performance was more dramatic for the Wernicke’s cases. Cho-Reyes and Thompson’s (2012) study using the NAVS argument structure production test documented a marked decline in accuracy in their agrammatic group as the number of verb arguments was increased, but it is unclear whether this is an effect of syntactic complexity or a simple consequence of the increased number of noun elements that needed to be produced in the more complex sentences. Nevertheless, we use the picture paradigm in a slightly different way: to explore an individual’s ability to produce words within the context of longer utterances more


generally, irrespective of their particular syntactic form. This approach takes its inspiration from theories of normal sentence production that postulate a close interplay between the process of content word retrieval and syntactic structure generation (e.g., Chang, Dell, & Bock, 2006; Stemberger, 1985). It has long been known that individuals with aphasia, particularly nonfluent aphasia, are more accurate at producing words in isolation than within sentences (Schwartz & Hodgson, 2002; Speer & Wilshire, in press; Williams & Canter, 1982). Other individuals, particularly those with Wernicke’s aphasia, may actually show the opposite pattern (Williams & Canter, 1982). Further, several recent studies suggest that the retrieval of any particular lexical content item may be influenced by the other content items in the utterance (e.g., Freedman, Martin, & Biegler, 2004; Martin & Freedman, 2001; Scott & Wilshire, 2010; Speer & Wilshire, in press). There is a need for measures that can quantify phenomena of this kind, and which can be obtained easily in the course of a more general assessment. The test we have developed provides aggregate measures of overall sentence production accuracy (considering both informational content and syntactic well-formedness), as well as scores for the accurate use of specific elements, such as nouns, verbs and closed-class elements. These various measures can then be compared and contrasted with those from other types of language tasks to gain a richer picture of the person’s cognitive profile. Indeed, the test is intended to be used not just with individuals with agrammatic/Broca’s aphasia, but rather across a range of different aphasic syndromes. Structured pictured event description tasks provide information that is to some extent complementary to that obtainable from more open-ended assessments. 
In picture description tasks, the emphasis is on accuracy of production, rather than on timing, so the information the test generates is largely orthogonal to that obtained from fluency ratings and other measures of speech rate. Indeed, comparisons between these two sets of measures may themselves help us to tease apart some of the factors that influence connected speech in aphasia, including the possible role of speech rate limitations. In this respect, it is interesting to note that a number of recent studies have suggested that the grammatical well-formedness of aphasic utterances may vary significantly depending upon the elicitation context. For example, some individuals with Broca’s aphasia tend to produce more grammatically well-formed utterances in constrained tasks (such as action or picture description) than they do in ‘freer’ more conversational speech tasks (Beeke, Maxim, & Wilkinson, 2007; Beeke, Wilkinson, & Maxim, 2003; Hofstede & Kolk, 1994; Sahraoui & Nespoulous, 2010). It is possible that in these freer tasks, the speaker trades grammatical accuracy for speed, so as to maintain the listener’s interest. If this is the case, then assessments of open-ended speech may underestimate the actual grammatical capabilities of the speaker, and more constrained tasks may be a better way to assess them. Indeed, a comparison of scores in picture description and more open-ended tasks may provide important theoretical insights into the ways in which grammatical form may vary depending on the sociolinguistic context and the speaker’s communicative intent. The materials for our test were designed with the following five considerations in mind. First, since our aim was to examine sentence production in the broadest sense, including both syntactic well-formedness and lexical retrieval in context, we wished to include as many types of syntactic structures as possible and also to incorporate a variety of different lexical content items. 
Second, the stimulus items needed to be sufficiently constrained and well-normed that each person’s response could be evaluated against an expectation of ‘normal’ performance. Third, the assessment needed


to be quick to administer, and the scoring procedures simple and reliable, requiring little or no special training. Fourth, the test should include a sufficient range of materials that can be used to evaluate individuals with very limited output, as well as those with milder speech difficulties, and the stimuli should be available throughout each attempt to minimise the demands on short-term memory. Fifth, since the assessment was intended to measure accuracy rather than fluency, it should be untimed. One further stipulation we made was that the test should not require the use of additional metalinguistic prompts, such as arrows or cues, to constrain the form of the utterance. Such cues constitute an additional task instruction, which must be maintained in working memory and actively utilised to modulate output, skills that may be particularly challenging for those with damage to anterior language regions (see, e.g., Kimberg & Farah, 1993). Also, according to one prominent theory, the order of elements within a planned sentence is determined by their relative salience in the mind of the speaker at the time of initiation (Chang et al., 2006); if so, then any attempt to override this salience gradient through the use of external cues may lead to increased competition for production of the first phrase, and increased likelihood of failure. Our aim was to design the stimulus pictures in such a way that they elicit a single dominant sentence structure without the use of additional cues. The remainder of this article is organised into three sections. In the ‘Development of the SPT’ section, we describe the development of the sentence production test (SPT). In the ‘Administration and scoring of the final test’ section, we summarise the scoring procedures used in the test, and the justifications for them. And finally, in the ‘Test performance of 24 individuals with aphasia’ section, we report preliminary test data from a sample of 24 individuals with chronic aphasia. 
Using these data, we examine interrater reliability for our scoring procedure and also explore relationships between our measures and those from several other widely used assessments.


DEVELOPMENT OF THE SPT

The first phase in the development of the SPT involved testing a sample of participants without aphasia on a large cohort of potentially suitable pictures, chosen with our five primary considerations in mind (see above). Because our aim was to examine lexical as well as grammatical aspects of sentence production, agreement about lexical content was as important as agreement about sentence structure. Starting with an initial cohort of 48 pictures that met these criteria, we then collected response agreement data from 150 nonaphasic individuals: 50 were aged between 18 and 30 years (M = 21.24, SD = 3.94); a further 50 were aged between 31 and 50 years (M = 39.8, SD = 6.62) and the remaining 50 were aged between 50 and 81 years (M = 58.32, SD = 7.16). Each of these participants was asked if they had any history of neurological illness or injury, and only those who responded ‘no’ were selected into the study. Each participant was given a booklet containing 48 black and white line drawings of scenes (10 cm by 10 cm), which were specifically drawn for the study by New Zealand artist Richard Moore. Participants were instructed to describe the picture using a single complete sentence. The target sentences were designed to represent as wide a range of target structures as possible, including intransitives (e.g., The dragon is flying), single-object transitives (e.g., The dog is pushing the pram), double-object/indirect object constructions (e.g., The clown is throwing a ball to the seal), passives (e.g., The boy is being stung by a bee) and embedded sentences (e.g.,


Figure 2. Examples of stimulus scenes piloted in Part 1. Panel (a) depicts The dog is swimming and panel (b) depicts The cat is watching the children play.

The cat is watching the children play). The scenes were selected on the basis of results of an earlier pilot study that utilised a different set of drawings (see Lukkien, 2006, for further details). Appendix A contains a complete list of the target sentences corresponding to each of the pictures that underwent norming, and Figure 2 shows some examples of the pictures. For the analysis of response agreement, the data were collapsed across all three age groups. A total of 36 scenes elicited at least 80% response agreement across all participants (i.e., at least 80% of all participants described the scene using exactly the same sentence). For many of the picture stimuli, response agreement exceeded 90%, and for those just below this level, the alternative responses given were usually very close variants of the target. Nevertheless, levels of agreement were lower for some types of structures than others. Generally, as syntactic complexity increased, response agreement decreased. For example, pictures depicting passive sentences tended to produce lower agreement levels than those depicting active sentences. Nevertheless, three passive sentences and one embedded sentence (The cat is watching the children play) successfully met the agreement criterion. Turning to the lexical content items, response agreement tended to be higher for medium- to low-frequency words with very specific referents (e.g., clown, nurse) than for broader, more common terms (e.g.,


TABLE 1
Sentences depicted in the 20 stimulus pictures chosen for use in the final version of the sentence production test

Target sentence                                    % Response agreement

Intransitives
  The dragon is flying                                         92
  The girl is running                                          98
  The dog is swimming                                          98
  The nuns are praying                                         88
  The cats are sleeping                                        82
  The sheep are skiing                                         92
Single-object constructions
  The cats are playing the piano                               82
  The nurse is feeding a baby                                  96
  The dog is pushing a pram                                    97
  The monkey is eating a banana                                99
  The caterpillar is eating a leaf                             96
  The clown is feeding a baby                                  94
Double-object constructions
  The clown is throwing a ball to the seal                     89
  The angel is throwing a star to the nun                      89
  The fairy is giving a crown to the girl                      89
  The sheep is throwing a carrot to the rabbit                 92
Passives
  The boy is being stung by a bee                              80
  The clown is being bitten by a snake                         85
  The house is being struck by lightning                       83
Embedded sentences
  The cat is watching the children play                        81

girl, man). For verbs, those eliciting the highest name agreement tended to be semantically heavy verbs with very specific, picturable referents (e.g., skiing, flying). From the 36 items that passed the norming procedure, a subset of 20 items was selected that represented all possible syntactic structures and included a range of lexical items, both high and low frequency (see Appendix B for further information). The sentences depicted in the final 20-item set are given in Table 1.2 Of the six items with the lowest response agreement (
