Computers in Human Behavior 48 (2015) 525–534


Computers in Human Behavior journal homepage: www.elsevier.com/locate/comphumbeh

On the utility of pictorial feedback in computer-based learning environments

Albert D. Ritzhaupt a,⇑, William A. Kealy 1

a School of Teaching and Learning, College of Education, University of Florida, 2423 Norman Hall, PO Box 117048, Gainesville, FL 32611, United States

Article info

Keywords: Feedback; Pictures; Learning; Experiments; Computer-based learning environments

Abstract

Extensive research has added to what is known about the nature of feedback and how to best incorporate it into instruction. Yet, many questions related to learner feedback remain unanswered. One problem of practical importance is the utility of incorporating semantically related pictures into the feedback. Decades of research on feedback have largely focused on the use of verbal feedback in written instruction. This research included two experiments. The first experiment (n = 63) addressed the incorporation of pictorial feedback into instruction; the second experiment (n = 69) extended this study through the use of a more ecologically valid intervention. Results suggest that the use of pictures in feedback did not influence learning any more than text-only treatments. A discussion and recommendations for future research are provided.

© 2015 Elsevier Ltd. All rights reserved.

⇑ Corresponding author. Tel.: +1 (352) 273 4180 (O); fax: +1 (352) 392 9193. E-mail address: [email protected] (A.D. Ritzhaupt).
1 Retired.
http://dx.doi.org/10.1016/j.chb.2015.01.037
0747-5632/© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

In recent years, researchers in educational technology have been criticized for holding a one-directional view of the connection between theory and practice, in which basic research questions (theory) precede and give rise to investigations having an applied focus (practice). The concept of developmental research or “design experiments” offers an exciting and useful alternative, whereby practical instructional interventions are rigorously studied for their usefulness in solving authentic problems (Reeves, 2000). One area that has characteristically been approached from a theory-driven perspective is feedback in written instruction. Feedback is a key component in both behaviorist and cognitive theories of learning, and this is reflected in the research questions and methods posed by each (Bangert-Drowns, Kulik, & Kulik, 1991). By contrast, the current research is driven by the practical concern of exploring how approaches to the design and implementation of feedback may yield learning improvement, particularly in computer-based learning environments.

A practical matter of particular concern in our research was the exploration of alternatives to the verbal responses that have characterized the feedback used in most research studies. The current research examined the effectiveness of accompanying verbal feedback with a semantically related image. Hypothetically, such a strategy should improve the efficacy of feedback due to the dual coding (linguistic and imaginal) of the feedback. In Paivio’s (1986) conception of dual coding, the mental processing of words and images occurs through separate but mutually accessible encoding mechanisms. When, in this manner, linguistic material and semantically related imagery are conjointly retained in memory (such as when a text is accompanied by relevant images), the latter are available to serve as a secondary cue for the associated verbal information during recall (Kulhavy, Lee, & Caterino, 1985; Kulhavy & Stock, 1996).

Historically, tests of the conjoint retention hypothesis (CRH) have only studied the impact of dual coding during the acquisition phase of learning. By contrast, the potential benefits of adjunct images during feedback have not been fully explored. Conceivably, the addition of an adjunct picture during feedback should have the same beneficial effect on learning as when it is introduced during the initial study of prose. In both instances, CRH proposes that the adjunct picture serves as a secondary cue for prompting recall of related verbal storage. Moreover, as learners revise their thinking to accommodate the correction provided by verbal feedback, the accompanying image offers information redundancy that is processed through a modally different encoding channel. This theoretically affords a stronger memory trace for the correct material that lessens the chance for proactive interference by the incorrect response.

Numerous studies on the memorial benefits of including a relevant map with a prose passage (Abel & Kulhavy, 1989; Kulhavy, Stock, & Kealy, 1993; Schwartz & Kulhavy, 1981) reveal such


adjuncts only support recall of text information that is semantically related to specific features on the map, especially when such features are represented mimetically as small pictures (Griffin & Robinson, 2000). For this reason, we expected that presenting a picture during feedback that was previously encountered during the reading of a text paragraph would only improve recall for facts relevant in meaning to the picture. By contrast, supplementing verbal feedback with a picture would not enhance subsequent recall for facts in the paragraph that were semantically unrelated to the image.

Supplementation of traditional verbal feedback with adjunct pictures may also offer a strategy for overcoming the interference of a learner’s incorrect prior knowledge with the new correct information. Some research on feedback suggests that providing new information after a brief delay may allow incorrect prior knowledge to have less salience at the time of corrective feedback, a phenomenon referred to as the “delayed retention effect” (DRE). However, from an instructional design perspective, incorporating delayed feedback into instruction may be an inefficient practice. Conceivably, pictorially supplemented immediate feedback may provide encoding of new material with sufficient strength to supplant incorrect prior knowledge.

The absence of research on the use of pictures in feedback is especially surprising given that the value of including pictures in a text to be learned is well documented (Carney & Levin, 2002; Levin, Anglin, & Carney, 1987). More than merely duplicating the information in a text, adjunct pictures can provide an alternate route for accessing and understanding the text (Schallert, 1980). Hence, supplementing feedback with an image may assist learners in revising their understanding of what was read.
Kulhavy and Stock (1989) presented feedback in written instruction as a three-cycle phenomenon involving (1) eliciting a response to a question on what was read, (2) providing corrective feedback to the learner, and (3) again presenting the question that was answered in the first cycle. In our study, we sought to examine the impact of pictures during the first two phases of the feedback cycle as well as during the initial reading of the text to be learned.

An additional area of research interest, reflected in the two studies described herein, is the effectiveness of feedback for constructed responses. Past research on feedback has, by and large, used criterion measures involving performance on multiple choice questions (Morey, 2004). By contrast, our current studies presented participants with prose passages followed by cued recall testing, a task that we believe represents more complex and educationally valid learning compared to recognition of a correct response from a given set of options (i.e., multiple choice).

Our first experiment was undertaken with several hypotheses in mind. First, feedback that was supplemented with pictures would yield more accurate and lengthier constructed responses to recall questions on a subsequent test than feedback that did not have accompanying images. Second, this relatively superior performance would be evident for questions about text information semantically related to the accompanying images but not for ones based on story material that was unrelated to the picture. Third, based on the results of extensive prior research showing the benefits of adjunct pictures on learning (Carney & Levin, 2002), we predicted superior recall performance by those who studied an illustrated story compared to those who viewed text without relevant pictures.
Finally, we speculated that higher confidence reported by learners on the correctness of their responses would correspond with increased time examining feedback, particularly when the response was incorrect (Kulhavy, Stock, Hancock, Swindell, & Hammrich, 1990).

2. Experiment 1

2.1. Method

2.1.1. Design and participants

The study involved four experimental groups that varied in how learning material was presented, text with pictures (P) versus text alone (T), as well as in the nature of the feedback provided: text with pictures (P) versus text alone (T). Accordingly, the four groups were designated as PP, PT, TP, and TT. We also explored the effect of this between-subjects variable across test occasion (i.e., story recall immediately after being read compared to performance following feedback) as well as the relative success in recalling story details related to the pictures shown versus material unrelated to these images. Hence, the study was a 4 Group (PP vs. PT vs. TP vs. TT) × 2 Test Occasion (Test 1 vs. Test 2) × 2 Type Recall (text related vs. unrelated to the adjunct picture) factorial design with Group varied between-subjects while Type Recall and Test Occasion served as repeated measures.

Sixty-three undergraduate education majors at a major university in the southeastern U.S. volunteered for the study, receiving extra course credit for their participation. As participants arrived for a research session, they were randomly assigned to computers containing one of four experimental treatments: an illustrated story followed by post-assessment feedback consisting of the pictures and text studied earlier (PP); an illustrated story followed by feedback consisting of text alone (PT); a text-only story followed by feedback consisting of pictures and text (TP); and a text-only story followed by just textual feedback (TT). This random assignment resulted in the following participation in each experimental group: PP = 17; PT = 15; TP = 14; and TT = 17.

2.2. Materials

Text. A 631-word fictitious story titled “The Roman Town of Albano” was used in the study, which had a Flesch Reading Ease of 62.3 and a Flesch–Kincaid Grade level of 8.5.
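For readers unfamiliar with the two readability indices reported for the story, the standard formulas can be sketched as follows. This is only an illustration: the function names and the example counts are our own assumptions, and the source does not report the sentence or syllable counts of the story or the tool used to obtain the scores.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate U.S. school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical counts chosen for illustration only (not the Albano story).
    words, sentences, syllables = 100, 5, 150
    print(flesch_reading_ease(words, sentences, syllables))
    print(flesch_kincaid_grade(words, sentences, syllables))
```

Both indices depend only on average sentence length and average syllables per word, so they describe surface difficulty rather than the conceptual difficulty of a passage.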
The story consisted of two introductory sentences followed by 12 paragraphs, each containing three sentences. In every case, the first sentence provided a rich description of a prominent structure or landmark in the town (hereafter referred to as a feature) depicted by a picture that accompanied the text of some participants. The second sentence provided information semantically related to the feature discussed in the first sentence. Finally, a third sentence presented information unrelated to the feature and served as a transition between paragraphs as well as a source of information that was semantically unrelated to the feature. An example of a typical paragraph (feature shown in italics) follows: The cemetery had chipped, purple rocks sitting in short, square rows. The cemetery was where the trade unions held their initiations, and each new member received a small tattoo of the union symbol on the palm of the hand. Because there were no waterways, the townspeople traveled about the country in four-wheeled horse chariots. Twenty-four constructed-response questions were created based on the second and third sentences of each paragraph. In this manner, half of the questions were semantically related to the adjunct pictures accompanying the text while the rest were unrelated to these features. For the aforementioned sample paragraph, for instance, we derived the following two questions: Feature: What did each new member of the trade union receive upon initiation? (related to the picture of the cemetery where the initiations took place)


Non-feature: What was the means in Albano for traveling about the country?

Pictures. A professional graphic artist created simple black-and-white drawings depicting the scenes described by the first sentence of each paragraph. For each picture, care was taken to ensure the picture did not inadvertently convey information that would cue answers to questions in a literal way (e.g., a picture of a cemetery that included either a gathering of trade union members or a chariot being driven in the background). Using the previous example of a paragraph from the story, the typical way in which a story paragraph was integrated with an adjunct picture (treatments PP and PT only) is illustrated by Fig. 1.

Programs. Using the text, accompanying pictures, and constructed-response questions, we created four computer programs using Macromedia Authorware 7.0 to form the four treatments. Each program consisted of four screens that provided an overview of the task to be performed and directions for its completion. This was followed by the experimental text presented in 14 screens: one introduced the story, another served as a closing, and 12 screens appeared in between with each screen showing a different paragraph from the story. These screens, from which the criterion questions were derived, were presented in a random order for each participant in an attempt to control for order effects. The four types of programs were loaded, in equal numbers, on 16 computers in a computer lab with ample space between computers to prevent participants’ casual viewing of alternate treatments.

2.3. Procedures

Participants pressed the Tab key to start the program that first presented a brief explanation of the task they would be performing. They then read the story’s introduction followed by the 12 criterion paragraphs presented in random order. Participants clicked a button marked “continue” to progress through the screens at their own pace.
After reading, participants completed three 2-column addition problems on their computers that were designed to clear short-term memory. Participants then read that they would be responding to 24 short-answer questions based on the text just completed.

Fig. 1. An example of one of the twelve three-sentence paragraphs and its accompanying picture used in Experiments 1 and 2.


The onscreen instructions explained that for each answer they would rate the certainty of their response by using their mouse to drag the slider bar of an onscreen slider illustrated in Fig. 2. Participants then saw an example of the slider with the words “How sure are you that your answer was correct?” directly above it. The slider was, in effect, a semantic differential scale with the labels not at all certain and absolutely certain placed at the left and right ends of the slider, respectively. Participants were told to practice using the slider by clicking in the slider bar and, while holding down the mouse button, dragging the bar from left to right. During the actual rating activity, the computer recorded the position of the slider bar the instant the mouse button was released, yielding a rating between 0 (not at all certain) and 1 (absolutely certain) to the eleventh decimal place. Hence, the ratings obtained through this onscreen semantic differential scale were continuous values at an interval level of measurement. This approach allowed statistical analyses of the data not possible with conventional scales that incorporate discrete points of measurement (Kealy, Bakriwala, & Sheridan, 2003).

Once participants completed their practice with the onscreen slider, they clicked the button labeled “Click to Continue” at the bottom of the screen. A new screen then appeared informing them that, after completing each certitude rating, they would view the sentence from the story containing the correct answer. With this feedback, they would be able to determine the accuracy of their response for themselves. Upon clicking the button labeled “Click to begin test,” the 24 constructed-response questions appeared one at a time and in random order. Immediately following the response to each question, a slider appeared on the screen for participants to rate their response certitude for that item. As soon as the rating was made, a screen appeared (see Fig. 3) that showed the sentence from the text containing the target information plus, in the PP and TP treatments, the image accompanying the associated paragraph. A button at the bottom of the screen with the words “Click for the next question” prompted participants to proceed to the next

Fig. 3. An example of text presented to participants for rating the accuracy certitude of a response.

Fig. 2. An example slider bar control used by participants to rate their response accuracy certitude.
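The way a slider position yields a continuous certitude rating between 0 and 1 can be sketched as below. This is a minimal illustration and not the authors' Authorware implementation; the function name and the pixel coordinates are hypothetical.

```python
def slider_rating(x_release: float, x_left: float, x_right: float) -> float:
    """Map the slider position at mouse release to a rating in [0, 1].

    x_left and x_right are the (assumed) pixel positions of the
    "not at all certain" and "absolutely certain" ends of the slider track.
    """
    if x_right <= x_left:
        raise ValueError("x_right must be greater than x_left")
    # Clamp so that releases slightly outside the track still give 0 or 1.
    x = min(max(x_release, x_left), x_right)
    return (x - x_left) / (x_right - x_left)


if __name__ == "__main__":
    # Hypothetical track running from pixel 100 to pixel 500.
    print(slider_rating(300, 100, 500))
```

Because the rating is a proportion of the track length rather than one of a few labeled points, it can be treated as a continuous value in later analyses, which is the property the text emphasizes.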


question when finished examining the feedback. At the moment the button was clicked, the computer presented the next question and simultaneously recorded the amount of time (in hundredths of a second) spent viewing the feedback.

After completing the test questions, participants read a 276-word passage, presented over four computer screens, titled “The Lakes and Fishes of Albano.” This passage was included between the first and second 24-item tests to clear short-term memory and reveal the more durable effects of feedback on recall. Once participants finished the reading, they again completed the constructed-response test, but without any feedback or self-assessment. Participants typically completed the experimental session in roughly 30 min.

3. Results and discussion

Constructed responses of participants were evaluated using a “gist protocol” scheme that varied in the number of points awarded based on a response’s degree of correctness (Schreiber, Verdi, Patock-Peckham, Johnson, & Kealy, 2002). A response that captured the gist of the correct answer received one point whereas a correct response that was more elaborate in nature received two points. For example, a response of “tattoo” to the question “What did each new member of the trade union receive upon initiation?” would earn one point while a response of “a tattoo of the union symbol” or “a tattoo on the palm” would be awarded two points. Two researchers independently scored an identical small sample of the protocols, compared their scores, resolved any differences, and repeated the procedure with another sample until agreement exceeded 90%. A Cronbach’s alpha of .83 was calculated for recall performance data by all participants on the 24 questions, indicating that the criterion measure reflected a high level of internal consistency reliability. An alpha level of .05 was used for all tests of significance.

3.1. Recall performance

We first calculated the descriptive statistics, displayed in Table 1, on data from constructed-response questions. During test 1, recall for feature-related material was greatest for those in the

Table 1
Mean percentage^a of feature- and nonfeature-related recall of prose studied either with or without pictures, both preceding and following feedback consisting of either text alone or text with pictures.

                         Type recall before feedback    Type recall after feedback
Treatment group          Feature      Nonfeature        Feature      Nonfeature
PP (n = 17)      M       .29          .41               .53          .65
                 SD      .18          .19               .23          .22
PT (n = 15)      M       .38          .45               .63          .73
                 SD      .19          .17               .20          .14
TP (n = 14)      M       .26          .46               .59          .69
                 SD      .12          .14               .22          .13
TT (n = 17)      M       .31          .47               .59          .70
                 SD      .23          .21               .22          .15

Note: PP = study of pictorial text followed by pictorial text feedback; PT = study of pictorial text followed by text-only feedback; TP = study of unenhanced text followed by pictorial text feedback; TT = study of unenhanced text followed by text-only feedback.
^a Percentages reflect proportion of 24 possible points for correct responses to 12 questions on feature-related prose or the same number of questions dealing with nonfeature story material.
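The percentage comparisons and across-group means discussed for Table 1 can be recomputed from the tabled values with a short script. This is a sketch under stated assumptions: the dictionary layout and function names are ours, the across-group means are assumed to be weighted by group size, and because the tabled means are rounded to two decimals the results may differ slightly from figures computed on the raw data.

```python
# Means from Table 1 as (test 1, test 2) pairs; n is the group size.
TABLE1 = {
    "PP": {"n": 17, "feature": (0.29, 0.53), "nonfeature": (0.41, 0.65)},
    "PT": {"n": 15, "feature": (0.38, 0.63), "nonfeature": (0.45, 0.73)},
    "TP": {"n": 14, "feature": (0.26, 0.59), "nonfeature": (0.46, 0.69)},
    "TT": {"n": 17, "feature": (0.31, 0.59), "nonfeature": (0.47, 0.70)},
}


def percent_of_test1(before: float, after: float) -> float:
    """Test 2 recall expressed as a percentage of test 1 recall."""
    return 100.0 * after / before


def weighted_mean(recall_type: str, occasion: int) -> float:
    """Mean across groups weighted by n; occasion 0 = test 1, 1 = test 2."""
    total_n = sum(group["n"] for group in TABLE1.values())
    total = sum(group["n"] * group[recall_type][occasion] for group in TABLE1.values())
    return total / total_n


if __name__ == "__main__":
    for name, group in TABLE1.items():
        for recall_type in ("feature", "nonfeature"):
            ratio = percent_of_test1(*group[recall_type])
            print(f"{name} {recall_type}: {ratio:.0f}% of test 1")
    for recall_type in ("feature", "nonfeature"):
        for occasion in (0, 1):
            mean = weighted_mean(recall_type, occasion)
            print(f"{recall_type} test {occasion + 1}: {mean:.2f}")
```

With these rounded inputs, the TP group's feature-related recall on test 2 comes to about 227% of its test 1 level, and the weighted means for test 1 come to about .31 (feature) and .45 (nonfeature). The weighted feature mean for test 2 comes to about .58, which presumably reflects rounding in the tabled values.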

PT group while recall performance on nonfeature material was roughly the same in all four treatment groups. Following feedback on their constructed responses, the PT group again outperformed the other groups on test 2 in feature-related recall as well as in recall for nonfeature story content. Comparison of performance between the two tests shows the expected increases in recall performance for text material either related or unrelated to the key feature mentioned in a paragraph. The largest gains in recall performance were by those in the TP group, whose feature-related recall on test 2 was 227% of their test 1 score (an improvement of roughly 127%). Recall for nonfeature story material on test 2 was roughly 150–160% of test 1 recall in all four treatment groups. Participants in all four treatment groups clearly performed better on nonfeature-related recall than feature recall on both test 1 and test 2. The mean percentage recall across all groups for feature-related story content was .31 on test 1 and .59 on test 2. By contrast, the average percentage recall for nonfeature material across all treatment groups during test 1 and test 2 was .45 and .69, respectively.

Our analysis examined performance of the four groups on recall prior to feedback, first conducting one-way ANOVAs to determine if there were differences between those viewing the text with pictures (condition PT versus PP) as well as between those who viewed the text alone (condition TP versus TT). Results showed no significant differences between either the two groups viewing texts with pictures, F(1, 30) = 2.01, p = .17, or the two text-only groups, F(1, 29) = .47, p = .50. Consequently, we pooled the data of those with identical study conditions, subjecting the data to a 2 Presentation × 2 Type Recall ANOVA. This analysis revealed a significant main effect, F(1, 61) = 57.75, p < .01, d = 1.0, for the type of recall exhibited by participants as well as a significant, F(1, 61) = 4.71, p = .03, d = .57, Presentation × Type Recall interaction.
The latter, illustrated in Fig. 4, suggests overall superior recall for text unrelated to features with slightly improved recall of feature-related material among those who studied texts with pictures. To explore this ordinal interaction further, we conducted a simple effects analysis of Type Recall within each level of Presentation. The analysis showed that differences in the type of story information recalled were significant when the experimental text was illustrated, F(1, 61) = 14.98, p < .01, d = 1.0, as well as when it was unaccompanied by pictures, F(1, 61) = 46.98, p < .01, d = 1.0.

For performance on recall following feedback, data were analyzed as a full factorial design involving the Presentation and Feedback between-subjects variables. This 2 Presentation × 2 Feedback × 2 Type Recall ANOVA revealed only a significant, F(1, 59) = 45.71, p < .01, d = 1.0, main effect for differences in the

Fig. 4. Presentation × Type Recall interaction for recall accuracy on the initial test in Experiment 1.


type of story material recalled where, again, participants demonstrated better recall for information contained in the third sentence of each paragraph (i.e., content not associated with a picture) compared to material presented in the second sentence (text semantically related to a picture). Even so, an analysis of gain scores (that is, the degree of increased recall as a result of feedback) showed no significant, F(1, 59) = 1.10, p = .30, d = .18, difference in improvement between story material either related or unrelated to the key feature of a paragraph.

3.2. Response productivity

Besides the correctness of participants’ recall, we were also interested in the length of their constructed responses. In addition to being a reliable predictor of writing quality (Sadoski, Kealy, Goetz, & Paivio, 1997), the degree of writing productivity in one’s answer is an important consideration in extended forms of constructed response such as those required by essay questions. To examine response productivity, we calculated the mean total number of words generated individually by the four treatment groups during each test occasion for recall either related or unrelated to the pictures. The compiled data, shown in Table 2, reveal that participants in each of the groups generated lengthier responses to feature-related questions than nonfeature-related items on both test occasions. In terms of feature-related questions, the TP participants produced the longest written responses during both tests while the PP group generated the shortest answers. Response production on these items on test 2 was 111% of the test 1 level for the PP group and roughly 119% for the remaining groups. By contrast, response productivity on nonfeature questions remained relatively unchanged between the two tests.
Of interest, the greatest degree of response productivity on both tests for nonfeature questions was achieved by participants in the TP and TT groups whose texts were not accompanied by adjunct pictures.

As with recall accuracy, the analysis of response productivity data during the first test showed no differences between groups viewing the same presentation (i.e., picture present or absent). Differences in response productivity suggested by Table 2 were confirmed in a 4 Group × 2 Test Occasion × 2 Type Recall ANOVA that showed a significant main effect for Type Recall,

F(1, 59) = 83.05, p < .01, d = 1.0. The analysis also revealed a significant main effect for Test Occasion, F(1, 59) = 21.38, p < .01, d = .99, and a significant Type Recall × Test Occasion, F(1, 59) = 18.14, p < .01, d = .99, interaction in which response productivity across the two tests remained essentially the same for nonfeature items but improved markedly following feedback for feature-related questions. Although increases in response productivity were noticeably smaller for the PP group compared to other participants, a one-way ANOVA on gains in word production for feature-related items indicated the differences were not significant, F(3, 59) = .71, p = .55.

Following our strategy for analyzing recall performance, we examined just the response productivity during the second test through a 2 Presentation × 2 Feedback × 2 Type Recall repeated measures ANOVA. This revealed a significant, F(1, 59) = 110.22, p < .01, d = 1.0, main effect for Type Recall and a significant, F(1, 59) = 4.92, p = .03, d = .59, Presentation × Feedback × Type Recall interaction. Fig. 5 suggests differences due to the presence or absence of a picture, either while the story was read or during the presentation of feedback, were evident only for feature-related recall. However, an analysis of simple effects at each level of the Recall Type variable indicated no significant differences between groups in response productivity for either test items related to features, F(1, 59) = 2.22, p = .14, or ones unrelated to features, F(1, 59) = .01, p = .93.

Fig. 5. Presentation × Feedback × Type Recall interaction for response productivity in Experiment 1.
[Recoverable figure labels: Mean Length of (label truncated in the source); Type of Text Recall (Feature, Non-feature); Presentation (Text alone, Text and Pictures); Type Feedback (Text and Pictures, Text Alone).]

Table 2
Feature- and nonfeature-related mean recall production^a for text studied with or without pictures, tested before and after feedback consisting of text alone or text plus pictures.

                         Type recall before feedback    Type recall after feedback
Treatment group          Feature      Nonfeature        Feature      Nonfeature
PP (n = 17)      M       3.29         2.67              3.64         2.92
                 SD      1.26         1.10              1.26         1.06
PT (n = 15)      M       3.49         2.86              4.13         2.95
                 SD      0.92         0.70              1.04         0.82
TP (n = 14)      M       3.63         3.01              4.36         3.07
                 SD      0.84         0.72              0.88         0.69
TT (n = 17)      M       3.36         3.15              4.02         3.14
                 SD      1.26         0.81              1.11         0.78

Note: PP = study of pictorial text followed by pictorial text feedback; PT = study of pictorial text followed by text-only feedback; TP = study of unenhanced text followed by pictorial text feedback; TT = study of unenhanced text followed by text-only feedback.
^a Production refers to the mean number of words per response generated by participants for feature- and nonfeature-related cued recall items on the two testing occasions.

3.3. Feedback study and assessment certitude

A key assumption in this study was that participants’ certainty of self-assessment functions similarly to response certitude: high ratings of certainty would conceivably facilitate more careful study of the feedback provided, yielding improved performance in subsequent testing. Hence, we were interested in validating this hypothesis by collecting assessment certitude ratings (Kealy & Ritzhaupt, 2010) for each constructed response and examining the degree of correlation between these measures and the correctness of responses as well as the amount of time spent studying the feedback provided after a response.

Since our earlier analysis showed significant differences in correctness performance due to the type of recall involved, we first examined whether this distinction was also reflected (see Table 3) in participants’ ratings of certainty on their assessments. On a scale of 0 (not at all certain) to 1 (absolutely certain), mean assessment certitude ratings (with SD in parentheses) across all treatments were .50 (.38) for feature-related recall and .58 (.37) for nonfeature recall. Rating data were entered in a 4 Group (PP vs. PT vs. TP vs. TT) × 2 Type Recall repeated measures ANOVA on assessment


Table 3
Mean ratings on certainty for self-assessment of responses to feature- and nonfeature-related questions on a text studied with or without pictures and the mean duration of study of feedback with or without pictures.

                                             Type recall (SD in parentheses)
Condition        Measure                     Feature          Nonfeature
PP (n = 17)      Assessment certitude^a      .53 (0.37)       .59 (0.36)
                 Feedback study (s)          4.13 (2.52)      3.55 (1.87)
PT (n = 15)      Assessment certitude        .53 (0.38)       .58 (0.37)
                 Feedback study (s)          4.48 (3.11)      3.50 (2.72)
TP (n = 14)      Assessment certitude        .45 (0.40)       .57 (0.39)
                 Feedback study (s)          5.81 (4.25)      4.44 (3.23)
TT (n = 17)      Assessment certitude        .50 (0.38)       .58 (0.36)
                 Feedback study (s)          3.71 (1.89)      3.25 (1.36)

Note: PP = picture-supplemented text studied with imaginally-enhanced feedback; PT = picture-supplemented text studied with verbal feedback alone; TP = text studied with imaginally-enhanced feedback; TT = text studied with verbal feedback alone.
^a Reflects ratings on a scale of 0 (not at all certain) to 1 (absolutely certain).

certitude ratings that showed significantly, F(1, 59) = 18.26, p < .01, d = .99, higher assessment certitude for responses to nonfeature-related questions. No other significant main effects or interactions were evident in the analysis.

Table 3 also reports the mean period of time (in seconds) that participants studied the feedback on a response to a test 1 question before proceeding to the next item. Within every group, feedback to feature-related questions was examined longer than that for nonfeature questions. In particular, participants receiving text feedback with a picture not present during their initial reading of the story (i.e., the TP group) studied the feedback longer than those in the other treatment groups. This was the case for feedback on both feature and nonfeature questions. Analysis of the data through a 4 Group (PP vs. PT vs. TP vs. TT) × 2 Type Recall repeated measures ANOVA revealed significant main effects for Group, F(3, 59) = 3.00, p = .04, d = .68, as well as for Type Recall, F(1, 59) = 42.51, p < .01, d = 1.00. While not significant, F(3, 59) = 2.38, p = .08, d = .57, the Group × Type Recall interaction showed a medium effect size. Hence, separate post hoc analyses of feedback study time were conducted for feature- and nonfeature-related data. Two Bonferroni LSD tests (with a significance level of .05) revealed significantly longer feedback study by the TP group compared to participants in the TT group, but only when the associated questions were feature-related.

To explore the relationship between assessment certitude, feedback study, and improved recall following feedback, Pearson correlations were calculated for each between-subjects group using gain scores and with the feature versus nonfeature distinction included for each variable. Data for each group revealed significant positive correlations between the feature and nonfeature levels for assessment certitude within each treatment group.
Similarly, each group showed a significant relationship between feedback study time for feature and nonfeature questions, which suggests that certitude ratings and study times were reliable measures between subjects. Both the PP and PT groups, in which participants first read text accompanied by images, failed to exhibit any relationship between improved recall and either assessment certitude or feedback study time. Participants in the latter group displayed a significant negative correlation (r = −.59, p = .02) between certitude ratings and feedback study on nonfeature-related items, indicating lengthier study for items on which participants had low certainty of correctness. By contrast, those in the TP group, who initially read text without pictures but viewed them during feedback, exhibited a significant positive correlation (r = .56, p = .04) between feedback study time and increased recall for feature-related story material. A similarly significant correlation between feedback study and

gains on feature-related recall, as well as nonfeature recall, was evident for the TT group. Since they studied feedback significantly less than the TP group, study of feature-related feedback may have had a global effect on recall by TT participants.

Data from Experiment 1 provided limited evidence that including pictures in feedback improves accuracy of recall for information in an accompanying text and promotes greater response productivity when the preceding text lacks adjunct pictures. One important shortcoming of the study was that participants were not able to navigate back and forth through the computer text as one would normally do in a book. Moreover, the text used was both very brief and of limited instructional relevance and value. Perhaps the greatest limitation of the study was that, for each three-sentence paragraph read, the last sentence presented “nonfeature” content, whereas the sentence preceding it was always semantically related to an accompanying picture when one was available. Consequently, the better recall for nonfeature material could be explained as simply the result of a recency effect; content appearing at the end of a paragraph was more memorable than information located in the middle of the text. Our second experiment was designed to address these limitations.

4. Experiment 2

In our second experiment, we enabled the participants to navigate forward and backward in the text (as one could do with a book), and we included a more relevant text based on non-fictitious content (i.e., Australia). Further, we incorporated color representational adjunct pictures in the acquisition phase and during the feedback phase. We believed that the participants would be more motivated to learn the contents based on these necessary changes.
Unlike Experiment 1, the current study included a multiple-choice test as a dependent measure; conceivably learners might perform better on a recognition task as opposed to a recall task, given the greater depth of processing required by the latter. Nevertheless, the availability of the adjunct picture might prove especially useful in this situation (Ritzhaupt, Barron, & Kealy, 2011). We also changed our hypotheses from the first experiment. When considering the various options for providing pictorial feedback, we speculated that, given both verbal and pictorial feedback, learners might be more motivated to study the former and ignore the latter. Hence, we studied a fourth condition (TPQP) in which learners read a text containing adjunct pictures, with each picture appearing again when a semantically relevant question is presented. Conceivably, the picture would aid learners by cueing a response to the related verbal information during the cued-recall task, and consequently result in a higher score during the acquisition phase of learning. Further, we hypothesized that individuals who accessed the adjunct pictures more frequently and for longer durations would perform better than those who did not.

4.1. Method

4.1.1. Design and participants

The study incorporated a 4 Group × 2 Trial factorial design with Group as a between-subjects variable and Trials as a repeated measure. The four groups included the following treatments: TNFP = a study passage consisting of text alone with subsequent feedback incorporating both text and pictures; TPFP = an illustrated study passage with feedback consisting of text and pictures; TPFN = an illustrated study passage with feedback consisting of only text; and a final condition, TPQP, incorporating an illustrated study passage with pictures presented during cued recall followed by feedback consisting of both pictures and text. Additionally, we

A.D. Ritzhaupt, W.A. Kealy / Computers in Human Behavior 48 (2015) 525–534

created two text orders to control for an ordering effect of the text conditions. We intentionally did not include a condition without feedback because years of evidence indicate that this condition does not support learning (Morey, 2004). Participants consisted of 69 undergraduate students (59 female, 10 male) enrolled in a southeastern U.S. university. The participants were primarily education majors enrolled in an introductory educational technology course. During the study, participants were randomly assigned to the four experimental conditions, resulting in the following distribution: TNFP = 20, TPFP = 16, TPFN = 18, TPQP = 15.

4.2. Materials

Text. For the study, we used a 1672-word story titled Discovering Australia (Flesch-Kincaid Grade Level = 14.3) about the geography, history, vegetation, and wildlife of Australia, a topic selected for its likely unfamiliarity among participants. The story consisted of 22 three-sentence paragraphs of nearly the same length (M = 75.1 words, SD = 4.9), with two paragraphs placed side-by-side on each of the 11 computer screens used to present the material to participants. The first two paragraphs of the story served as an introduction while the remaining ones contained the tested information, which was always located in the second sentence of a paragraph. The experimental text has been used in several studies of multimedia learning (Ritzhaupt & Barron, 2008; Ritzhaupt et al., 2011). An example of a typical paragraph follows:

Feral camels that roam Australia’s vast wilderness, gorging on Acacia trees and a juicy plant nicknamed “pig face,” are able to squeeze every little bit of moisture out of their food. These animals are the preferred beasts of burden for tribesmen and hunters, primarily because they do not require large amounts of water as they travel. Despite its large size, the camel has effectively adapted to the country’s arid environmental conditions by having no need to sweat.
An associated picture was selected to represent each paragraph of the text as noted below. A picture associated with the abovementioned paragraph can be seen in Fig. 6. Notice that the picture illustrates a feral camel in its natural habitat, surrounded by Acacia trees.

Pictures. Using the Internet and a systematic validation process, a representational color picture was found for each of the target paragraphs and reviewed for its appropriateness to represent the text. Three researchers independently examined and judged each picture selected for its semantic congruence with the

Fig. 6. Representational adjunct picture to illustrate associated verbal information.


associated text. Additionally, eight expert reviewers were solicited to review the semantic relationship between the text and associated image following a systematic procedure (Ritzhaupt & Barron, 2008). Levin (1981) suggests pictures can serve decorational, representational, organizational, and transformational functions. While decorational pictures serve no purpose and can actually hinder learning, representational images mirror part or all of some related text, and have been found to have moderate effects on learning (Carney & Levin, 2002; Levin, 1981). The images used in this study were purposefully representational in nature, as they were intended to provide context relating to the passages in the Discovering Australia text.

Measures. Twenty constructed-response questions were created from the Discovering Australia text to represent cued recall. In all cases the tested information was derived from the second sentence of each paragraph in the story. No recall items were developed for the introductory paragraphs. In this manner we controlled for the position of target material within the text, which may have inadvertently influenced recall by participants in Experiment 1. An example of the picture-related cued-recall item pertaining to Fig. 6 is “What do Australia’s feral camels eat?”. This item prompts for the recall of picture-related information and can be activated referentially from either the verbal or the pictorial information. The constructed-response items were scored dichotomously (i.e., either right or wrong) by hand. Additionally, 20 multiple-choice items were created based on the narrative, serving as a measure of content recognition. The multiple-choice questions were developed in a consistent format following established guidelines (Gronlund, 1998). Each stem posed one question for learners to consider, and the distractors were written as likely true/false statements with only one correct statement. Each item included four response options, only one of which was correct.
An example item, related to Fig. 6, is shown below:

The feral camels of Australia eat what type of vegetation?
a. Acacia leaves
b. Uluru grasses
c. Eucalyptus trees
d. Spinifex grasses

Note that all distractors were realistic in that they were derived from the experimental text, but only one answer was correct. The data were scored dichotomously using a computer program.

Computer programs. Using the Authorware 7.0 application, the text and images were integrated into four programs that provided participants with an overview of the study, instructions on the program’s use, and the experimental treatment. The story was arranged so that each screen presented two paragraphs side-by-side for a total of 11 screens. Each screen in the story contained forward and back arrows allowing participants to navigate through the story at their own pace. In the upper-right corner of each story screen appeared a counter that counted back from 600 s to let participants know the allotted time remaining for completing the reading. In the three experimental treatments that provided pictures during the reading, there appeared two buttons centered below the text, one labeled “show left” and the other “show right.” For these programs, pressing a button replaced the paragraph on that side of the screen with a representational picture corresponding to the paragraph on the opposite side. In this manner, participants could access and view a picture and its related text simultaneously. As this was done, the computer recorded the number of times a given picture was accessed as well as the duration of viewing time. Finally, immediately below the picture-accessing buttons appeared the current screen number (out of 11 screens). Fig. 7 shows the



Fig. 7. Sample computer program screen illustrating the picture conditions during acquisition.

layout of a typical screen with its photograph on the right side being accessed.

4.3. Procedures

As participants arrived for the study, they were randomly assigned to a computer workstation containing one of the four treatments. Following an overview of the study by the experimenter, participants had an opportunity to ask any questions about the task and to leave the study if so desired. Participants, who remained anonymous, then started the program, which asked for their school major and gender. Following computer-based instructions on how to navigate through the story, they began reading the text on Australia at the same time.

When ten minutes elapsed, participants saw three completed 2-column addition problems. For each problem, participants were instructed to type “Y” or “N” to indicate whether or not the sum was correct. For each response made, the correct answer was indicated. The purpose of this task was to clear the short-term memory buffer so that the subsequent constructed-response questions would be answered through retrieval from long-term storage. Participants then received computer instruction on the recall task, in which they would type a short answer for each of the 20 questions on the story, presented one screen at a time and in random order. Following each response by those in the TNFP and TPFP groups, the screen presented the correct short answer as well as the corresponding picture, located between the question and the participant’s response (see Fig. 8).
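The interpolated verification task and the randomized question order described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the Authorware program used in the study: the function names are hypothetical, "2-column" is assumed to mean two-digit addends, and the way incorrect sums are generated is our own choice.

```python
import random

def make_addition_problem(rng, p_wrong=0.5):
    """Build one completed addition item (hypothetical helper).

    Returns (a, b, shown_sum, is_correct), where shown_sum is either the
    true sum or a deliberately wrong one. Two-digit addends are an assumption.
    """
    a = rng.randint(10, 99)
    b = rng.randint(10, 99)
    is_correct = rng.random() >= p_wrong
    # A wrong sum is offset by a small nonzero amount so it looks plausible.
    shown_sum = a + b if is_correct else a + b + rng.choice([-10, -1, 1, 10])
    return a, b, shown_sum, is_correct

def score_verification(response, is_correct):
    """True when the typed 'Y'/'N' matches whether the displayed sum is right."""
    return response.strip().upper() == ("Y" if is_correct else "N")

def question_order(n_questions, rng):
    """Return question numbers 1..n in a random presentation order."""
    order = list(range(1, n_questions + 1))
    rng.shuffle(order)
    return order

rng = random.Random(2015)  # fixed seed only so the sketch is repeatable
problems = [make_addition_problem(rng) for _ in range(3)]
order = question_order(20, rng)
```

Each participant would see the three items in `problems`, receive the correct answer after every Y/N response, and then work through the 20 recall questions in the sequence given by `order`.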

Fig. 8. Sample computer program screen illustrating the pictorial and verbal feedback.


At the bottom of the screen appeared a button labeled “Click to get the next question” that, when clicked, presented the next question. For those in the TPQP group, the appearance of each question was accompanied by its semantically relevant picture, which remained on the screen during response and feedback. Participants in the TPFN group, meanwhile, saw the correct answer to a question once their response was entered but viewed no accompanying picture. Once the constructed-response testing was over, participants read a 543-word story about New Zealand (Flesch-Kincaid Grade Level = 13.4) that was presented over four computer screens. In this instance participants had 200 s (3.33 min) to complete the reading. After the reading, participants again answered the 20 constructed-response questions about Australia, followed by 20 multiple-choice questions on the text. Finally, the program asked participants to describe any trick or mental strategy used to help them remember what they read. The entire experimental session was typically completed in about a half hour (M = 27.9 min, SD = 2.9).

4.4. Results and discussion

To control for the order in which story information was presented, we created two versions of the computer program for each of the four experimental treatments that rearranged the text screens. This resulted in 35 participants being assigned to one text version and 34 participants to the alternative version. Posttest performance data for the two groups were entered in a one-way ANOVA to determine the effect of text order. The results showed no significant difference between the two text versions, F(1, 69) = .359, p = .551. Consequently, we collapsed the data from the two text versions when analyzing differences between the groups.

4.5. Recall Performance

We first calculated the descriptive statistics, displayed in Table 4, on data from constructed-response questions. Recall performance prior to feedback varied from 33% accuracy to 38% accuracy across the four treatment conditions. After feedback, recall performance ranged from 72% accuracy to 80% accuracy. The TNFP treatment condition resulted in the highest recall performance after feedback (80%) while the TPQP condition resulted in the lowest at 72%. However, the largest change occurred in the TPFN condition, which gained approximately 133% relative to its pre-feedback score. As can be clearly gleaned from the results, all treatment conditions improved substantially after the feedback treatment. Recall performance data for both testing occasions were entered in a 4 Group × 2 Trial repeated measures ANOVA. In this case, and for all inferential statistical analyses in the study, an alpha level of .05 was used for tests of significance. Results showed no performance differences between groups, F(3, 65) = .244, p = .865, η² = .011, but, as reflected by the means shown in Table 4, revealed a significant main effect for Trials, F(1, 65) = 739.9, p < .001, η² = .919. The Group × Trial interaction, F(3, 65) = 2.153, p = .102, η² = .09, while not statistically significant, showed an effect size that accounted for roughly ten percent of the variability in performance.

Table 4
Mean percentage recall of prose preceding and following feedback, by treatment condition.

Treatment group    Prior to feedback    After feedback
                   M      SD            M      SD
TNFP (n = 20)      0.38   0.14          0.80   0.16
TPFN (n = 18)      0.33   0.19          0.77   0.15
TPFP (n = 16)      0.35   0.16          0.78   0.16
TPQP (n = 15)      0.38   0.18          0.72   0.20

Note: TNFP = verbal information without pictures during instruction, and verbal information and feedback with pictures; TPFN = verbal information with pictures, and only verbal information with feedback; TPFP = verbal information with pictures during instruction, and verbal information and pictures with feedback; TPQP = verbal information with pictures during instruction, and a picture during cued recall response, and pictures and verbal information with feedback.

4.6. Recognition Performance

Following the second constructed-response test, participants completed a multiple-choice test on what they read. The recognition items were purposefully arranged after the recall task to avoid a testing effect. Mean percent correct recognition among the four groups TNFP, TPFP, TPFN, and TPQP was .94, .90, .90, and .91, respectively. A one-way ANOVA on the data did not reveal any significant difference between the experimental groups, F(3, 65) = .044, p = .99, η² = .002.

4.7. Duration of Picture Access

Analysis shows a high correlation (r = .86) between the duration of picture study and performance on the constructed-response posttest. Similarly, a strong but somewhat smaller correlation (r = .79) was apparent between the amount of time participants examined pictures and their performance on the multiple-choice test. This suggests that the use of the representational pictures during the acquisition phase supported participants’ learning of the materials.

5. General discussion
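As a rough check on the Experiment 2 recall figures, the relative gains can be recomputed from the Table 4 means, and partial eta squared can be recovered from each reported F ratio as F·df_effect / (F·df_effect + df_error). The sketch below is illustrative only and is not part of the original analysis: the function and variable names are hypothetical, and the Group × Trial interaction is assumed to have (4 − 1)(2 − 1) = 3 effect degrees of freedom.

```python
# Table 4 means (proportion correct): (prior to feedback, after feedback).
table4_means = {
    "TNFP": (0.38, 0.80),
    "TPFN": (0.33, 0.77),
    "TPFP": (0.35, 0.78),
    "TPQP": (0.38, 0.72),
}

def relative_gain(before, after):
    """Percentage change in recall relative to the pre-feedback score."""
    return (after - before) / before * 100

def partial_eta_squared(f_ratio, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return f_ratio * df_effect / (f_ratio * df_effect + df_error)

gains = {group: round(relative_gain(before, after))
         for group, (before, after) in table4_means.items()}
# gains == {'TNFP': 111, 'TPFN': 133, 'TPFP': 123, 'TPQP': 89}

# Reported F ratios for the recall ANOVA, with assumed degrees of freedom.
effects = {
    "Group":         (0.244, 3, 65),
    "Trials":        (739.9, 1, 65),
    "Group x Trial": (2.153, 3, 65),  # assumption: 3 effect df for 4 groups x 2 trials
}
eta_sq = {name: round(partial_eta_squared(f, df1, df2), 3)
          for name, (f, df1, df2) in effects.items()}
# eta_sq == {'Group': 0.011, 'Trials': 0.919, 'Group x Trial': 0.09}
```

Under these assumptions the recomputed values agree with the effect sizes reported in the text (.011, .919, and .09) and with the roughly 133% gain noted for the TPFN condition.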

Results of this study leave unresolved the question of whether adding pictures to verbal feedback has a positive influence on learning. Conceivably, the pictures used in the two studies were too close a semantic match to the accompanying text. In such instances the images might not be considered useful by learners for interpreting what is read onscreen. Supporting this perspective is the fact that there was little difference between groups on performance prior to feedback, regardless of whether or not an image was present in the story during Experiment 2. The group that read a text without pictures, in fact, scored higher on this initial testing than the three groups that viewed an illustrated story.

Another limitation of this research is uncertainty about whether the learners actually attended to the pictorial feedback during the feedback stage of learning. We have strong evidence that the learners did use the pictures during the acquisition phase of learning, as shown by the strong and significant correlations between the duration of viewing and the posttest scores on the dependent measures in Experiment 2. We strongly believe that a follow-up study using eye-tracking technology would overcome this limitation and provide evidence of whether the learners are actually attending to the pictures during the feedback cycle. Another alternative is building interactivity into the feedback treatment, in which the learners have to click on the picture or engage in some other form of human–computer interaction that supports the use of the picture during the feedback cycle of learning.

Another consideration in this research that might be viewed as a limitation relates to the modality of the treatment. We have ample evidence that when presenting a learner with a multimedia message, a modality effect occurs (Mayer, 2001, 2003) and has a positive influence on learning. That is, the message should present the pictures on a visual channel while the verbal message should be on an auditory channel.
A future study might change the



treatment conditions to acknowledge the modality principle from multimedia learning research.

Most of all, a follow-on study of feedback incorporating images is needed that employs pictures which, while semantically related to the accompanying text, do not provide merely redundant information. The use of different types of pictures (e.g., organizational or transformational) might also provide a more durable effect than representational pictures alone. Ideally such images should compel learners to explore the corresponding text to tease out information for acquiring a context for interpreting the image. By the same token, a useful image should be one that learners need to consult to make sense of and mentally elaborate the surrounding text. A good example of this is Fleming’s (1987) recommendation to add questions to the caption of an image that motivate readers to mindfully examine the picture for answers. It is this type of interplay between image and text that offers the likeliest potential for enhancing feedback through pictures. The nature of feedback needs to escape the limitations of a text-only modality and, as Morey (2004) encourages, exploit the capabilities available to multimedia technologies.

With the numerous advances in information and communication technology (e.g., Learning Management Systems) over the past few decades, we are now in a position where the use of pictures in computer-based learning environments is cost-effective and relatively easy to implement. Yet, the use of pictures in providing feedback to learners has been largely ignored in practice and in research. Across our two experiments, we were unable to pinpoint a condition in which the use of pictures in feedback was superior to written instruction alone.
Though our present studies did not show evidence of representational pictures during a feedback cycle having a positive influence on dependent measures of learning, we strongly believe that this research question demands further inquiry. As noted, changing the type and nature of the pictures themselves, as well as using different technologies to determine whether the learners are actually using the pictures, is necessary. Future researchers can expand on our understanding of feedback beyond written instruction. We welcome the future dialog on this important and relevant topic.

References

Abel, R. R., & Kulhavy, R. W. (1989). Associating map features and related prose in memory. Contemporary Educational Psychology, 14, 33–48.
Bangert-Drowns, R. L., Kulik, C. C., Kulik, J. A., & Morgan, M. T. (1991). The instructional effect of feedback in test-like events. Review of Educational Research, 61(2), 218–238.
Carney, R. N., & Levin, J. R. (2002). Pictorial illustrations still improve students’ learning from text. Educational Psychology Review, 14(1), 5–26.
Fleming, M. L. (1987). Displays and communication. In R. M. Gagne (Ed.), Instructional technology: Foundations (pp. 233–260). Hillsdale, NJ: Lawrence Erlbaum.
Griffin, M. M., & Robinson, D. H. (2000). Role of mimeticism and spatiality in textual recall. Contemporary Educational Psychology, 25, 125–149.
Gronlund, N. E. (1998). Assessment of student achievement. Needham Heights, MA: Allyn and Bacon.
Kealy, W. A., Bakriwala, D. J., & Sheridan, P. B. (2003). When tactics collide: Counter effects between an adjunct map and prequestions. Educational Technology Research and Development, 51(2), 17–39.
Kealy, W. A., & Ritzhaupt, A. D. (2010). Assessment certitude as a feedback strategy for learners’ constructed responses. Journal of Educational Computing Research, 43(1), 25–45.
Kulhavy, R. W., Lee, J. B., & Caterino, L. C. (1985). Conjoint retention of maps and related discourse. Contemporary Educational Psychology, 10, 28–37.
Kulhavy, R. W., & Stock, W. A. (1989). Feedback in written instruction: The place of response certitude. Educational Psychology Review, 1(4), 279–308.
Kulhavy, R. W., & Stock, W. A. (1996). How cognitive maps are learned and remembered. Annals of the Association of American Geographers, 86(1), 123–145.
Kulhavy, R. W., Stock, W. A., Hancock, T. E., Swindell, L. K., & Hammrich, P. L. (1990). Written feedback: Response certitude and durability. Contemporary Educational Psychology, 15, 319–332.
Kulhavy, R. W., Stock, W. A., & Kealy, W. A. (1993). How geographical maps increase recall of instructional text. Educational Technology Research and Development, 41(4), 47–62.
Levin, J. R., Anglin, G. L., & Carney, R. N. (1987). On empirically validating functions of pictures in prose. In D. M. Willows & H. A. Houghton (Eds.), The psychology of illustration. Basic research (Vol. I, pp. 51–85). New York: Springer-Verlag.
Levin, J. R. (1981). On functions of pictures in prose. In F. J. Pirozzolo & M. C. Wittrock (Eds.), Neuropsychological and cognitive processes in reading (pp. 203–228). New York: Academic Press.
Mayer, R. E. (2001). Multimedia learning. New York: Cambridge University Press.
Mayer, R. E. (2003). Elements of a science of e-learning. Journal of Educational Computing Research, 29(3), 297–313.
Morey, E. H. (2004). Feedback research revisited. In D. H. Jonassen (Ed.), Handbook of research for educational communications and technology (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.
Paivio, A. (1986). Mental representations: A dual coding approach. New York: Oxford University Press.
Reeves, T. C. (2000). Socially responsible educational technology research. Educational Technology, 40(6), 19–28.
Ritzhaupt, A. D., & Barron, A. (2008). Effects of time-compressed narration and representational adjunct images on cued-recall, content recognition, and learner satisfaction. Journal of Educational Computing Research, 39(2), 161–184.
Ritzhaupt, A. D., Barron, A. E., & Kealy, W. A. (2011). Conjoint processing of time-compressed narration in multimedia instruction: The effects on recall, but not recognition. Journal of Educational Computing Research, 44(2), 203–217.
Sadoski, M., Kealy, W. A., Goetz, E. T., & Paivio, A. (1997). Concreteness and imagery effects in the written composition of definitions. Journal of Educational Psychology, 89, 518–526.
Schallert, D. L. (1980). The role of illustrations in reading comprehension. In R. J. Spiro, B. C. Bruce, & W. F. Brewer (Eds.), Theoretical issues in reading comprehension: Perspectives from cognitive psychology, linguistics, artificial intelligence, and education (pp. 503–524). Hillsdale, NJ: Erlbaum.
Schreiber, J. B., Verdi, M. P., Patock-Peckham, J., Johnson, J. T., & Kealy, W. A. (2002). Differing map construction and text organization and their effects on retention. The Journal of Experimental Education, 70(2), 114–130.
Schwartz, N. H., & Kulhavy, R. W. (1981). Map features and the recall of discourse. Contemporary Educational Psychology, 6, 151–158.