Video Game Experience and Gender as Predictors of Performance and Stress During Supervisory Control of Multiple Unmanned Aerial Vehicles

Proceedings of the Human Factors and Ergonomics Society 59th Annual Meeting - 2015
Author: Charla Boone

Jinchao Lin¹, Ryan Wohleber¹, Gerald Matthews¹, Peter Chiu², Gloria Calhoun³, Heath Ruff⁴, & Gregory Funke³
¹Institute for Simulation & Training, University of Central Florida, Orlando, FL
²University of Cincinnati, Cincinnati, OH
³Air Force Research Laboratory, Wright-Patterson AFB, OH
⁴Infoscitex, Dayton, OH

To keep pace with increasing applications of Unmanned Aerial Vehicles (UAVs), recruitment of operators will need to be expanded to include groups not traditionally engaged in UAV pilot training. The present study may inform this process, as it investigated how video game experience and gender relate to performance of imaging and weapon release tasks in a simulated multi-UAV supervisory control station. Each of 101 participants completed a 60-minute experimental trial. Workload and Level of Automation (LOA) were manipulated. Video gaming expertise correlated with performance on a demanding surveillance task component. Video gamers also placed more trust in the automation in demanding conditions and exhibited higher subjective task engagement and lower distress and worry. Results may encourage recruitment of UAV operators from nontraditional populations. Gamers may have a particular aptitude, and with gaming experience controlled, women show no disadvantage relative to men.

Copyright 2015 Human Factors and Ergonomics Society. DOI 10.1177/1541931215591175

INTRODUCTION

The United States Air Force (USAF) has increasing needs for unmanned aerial vehicle (UAV) operators. The majority of USAF UAV pilots trained annually are recruited from officers with little prior flying experience who complete the Undergraduate RPA Training (URT) course. The remainder are pilots of manned aircraft who are recruited and cross-trained to operate UAVs (Paullin, Ingerick, Trippe, & Wasko, 2011). These challenges suggest the USAF should consider expanding its recruiting efforts to include groups not traditionally engaged in UAV pilot training. Civilian organizations may also increasingly face similar recruitment issues as UAVs become more widely employed for nonmilitary purposes.

UAV operators require a wide range of cognitive and noncognitive attributes (Carretta, Rose, & Bruskiewicz, in press). While there is some overlap between the qualities required for manned and unmanned personnel, UAV operation also requires some specialized aptitudes. One of these is the ability to manage increasingly automated technology. Usage of automation may enable a single operator to supervise multiple UAVs. Although automation could enhance the UAV operator’s ability to manage task demands and improve mission effectiveness, an appropriate level of trust in the automation must be established and maintained (Lee & See, 2004). On the one hand, UAV automation carries the risk of operator over-reliance on the technology, leading to complacency effects (as shown empirically in recent simulations, e.g., Calhoun, Ruff, Draper, & Wright, 2011). On the other, if an operator doubts the automation’s functioning or reliability, under-reliance may result, limiting the potential benefit. Thus, it is important to determine what individual difference factors may impact both performance and reliance on automation.

Several US military reports have identified resilience under stress as important for operators (Chappelle et al., 2014; Williams et al., 2014). UAV operation is free of some of the major stressors that confront traditional pilots, such as physical danger, but more subtle psychological stressors may be prevalent. Factors such as long duty periods and human-machine interface difficulties may produce both stress and fatigue (Ouma, Chappelle, & Salinas, 2011). Another feature of UAV operation is that it may involve considerable workload variation. There may be long periods of monotony as operators monitor the tactical environment, interspersed with episodes of high cognitive workload when, for example, a possible target is detected. During monotonous operations, operators must remain vigilant in potentially fatiguing, low-workload conditions. Conversely, when workload is high, operators must be able to remain calm and allocate their attention among multiple tasks effectively. Panganiban and Matthews (2014) confirmed in a simulated environment that increases in workload produced substantial elevations of subjective distress.

Video gamers constitute a population that may have a high level of aptitude for operating automated systems and UAVs. Exposure to video games (especially action games) appears to be positively associated with a range of relevant sensory, perceptual, and attentional abilities (Spence & Feng, 2010). In a UAV simulation study, McKinley, McIntire, and Funke (2011) found that experienced gamers showed visuospatial attention skills that exceeded those of pilots, and matched pilots in aircraft control skills. Cummings, Clare, and Hart (2010) found that experienced gamers collaborated more effectively with automation than non-gamers in a simulated UAV task. The cognitive strengths of gamers may in part be a consequence of an initial selection bias, if higher-aptitude individuals are more likely to take up gaming. However, experimental studies have confirmed that training on action games directly improves aspects of spatial and/or attentional functioning (Spence & Feng, 2010).

Even so, gaming enthusiasts are sometimes perceived as socially maladjusted. Chappelle et al. (2014) described an Air Force stereotype that sees non-pilot UAV training candidates “as videogamers whose emotional and social disposition was not as well suited for the rigors of aviation and high risk nature of traditional military flying” (p. 3). To dispel the use of “video gamer” as a pejorative label, it is necessary to test whether gaming experience is associated with greater vulnerability to stress under the twin challenges posed by both low and high workload UAV environments.

Another non-traditional population of interest is women. The preponderance of male pilots of manned aircraft in the Air Force may reflect both cultural factors and higher aptitude in men, especially for spatially demanding task components (Carretta, 1997). However, given the somewhat different cognitive demands of manned and unmanned vehicles, gender differences seen in traditional piloting may not generalize to UAV operation. Women may be seen as less resilient than men, although Chappelle, Salinas, and McDonald (2011) found no association between gender and emotional exhaustion in a UAV operator sample. Therefore, examination of gender differences in stress response to the task loads of UAV operation is warranted. It is also important to disentangle gender and gaming experience, as men are more likely to self-identify as serious gamers (Terlecki et al., 2011).

The aim of the present study was to investigate relationships between video gaming experience, gender, performance, and acute stress. It used the ALOA (Adaptive Levels of Autonomy) multi-unmanned vehicle/automation research test bed developed by OR Concepts Applied (for more details see Calhoun et al., 2011). This simulation incorporates an operationally employed routing software/mission planner to provide needed complexity and realism.
The tasks supported by the simulation are designed to represent the cognitive task demands envisioned for a single operator supervising multiple highly autonomous vehicles. The testbed is also unique in that it facilitates experimenter and operator manipulation of automation level and functionality for several types of tasks. We varied workload and level of automation (LOA) to test for associations that generalized across different task configurations. We manipulated demands of several of the tasks simultaneously to configure higher and lower workload missions. Two surveillance tasks were held constant across the two workload conditions. These tasks (Imaging and Weapon Release) provided the primary performance measures.

METHOD

Participants and Experimental Design

A total of 101 college students (43 men and 58 women; mean age = 18.95 years, SD = 1.80 years) participated for course credit. Participants represent the age group and educational level of the enlisted military service core that may be selected for future UAV operations. All participants reported having vision correctable to 20/20, and normal color vision and hearing. None were experienced pilots. Participants reported a mean of 2.70 hours per week (SD = 2.00) experience on games of all types, a mean of 1.96 hours (SD = 1.62) on first-person shooter (FPS) games, and a mean of 2.18 hours (SD = 1.73) on action games. These means refer to the whole sample. Popular games cited by respondents included Call of Duty® (FPS) and Grand Theft Auto (action).

The experiment used a 2 (workload: low vs. high) × 2 (LOA: management-by-consent vs. management-by-exception) factorial, between-group design. Twenty-six participants took part in the high workload/management-by-exception condition; n = 25 for the other conditions.

Overview of Experimental Tasks

Most task types (allocate new surveillance tasks to UAVs, detect new threats on the map, respond to systems status changes/audio prompts, compare digit pairs, and retrieve information) required making inputs in response to monitored aural and visual displays. Three other tasks, re-routing the vehicles and two types of surveillance tasks (Imaging and Weapon Release), employed one of two intermediate LOAs. In each case, the automation recommended a route or surveillance task response that was correct 80% of the time. With management-by-consent, participants either accepted the automation’s recommendation or made a change in the selection. In the other LOA (management-by-exception), the automation acted on its recommendation unless the participant made a different selection before the task timed out (e.g., within 30 s). Response time and accuracy were recorded for all tasks.

The types of tasks presented in both workload conditions were the same; the frequency of the majority of tasks in the 60-min trials, however, differed. Differences in task frequency were used to induce differing levels of workload. There were approximately 6 tasks per minute in the low workload trials versus 14 tasks per minute in the high workload trials (see Table 1). Figure 1 illustrates the testbed with labels showing the primary windows utilized for each task type. This paper focuses on reporting results pertaining to the two surveillance task types (described next) as a function of workload and LOA, in addition to correlations with other moderating variables.

Table 1
Tasks manipulated across low and high workload conditions (frequency in trial)

Task                       Low workload   High workload
Retrieve information            10              80
Respond: visual status          30             240
Respond: audio stream           32             240
Compare digit pairs             10              80
Monitor chat noise              20             180

Surveillance Tasks

Each task was signaled with the addition of a row in the task window that included a counter showing time remaining. Participants had 30 and 20 seconds, respectively, to complete the Imaging and Weapon Release tasks before the row blanked and the task was recorded as a “miss.” Task completion began with row selection, which opened up a window on the left of the display (see Figure 1).
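As a rough illustration of how these time limits interact with the two levels of automation described under Overview of Experimental Tasks, the following Python sketch resolves a single surveillance task. It is not the ALOA testbed's code. The names `resolve_task`, `Outcome`, `LOA`, and `TIME_LIMIT_S` are hypothetical, and the treatment of a timeout under management-by-exception (the recommendation is enacted and counted as followed, rather than logged as a miss) is an assumption drawn from the task description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Time limits (seconds) taken from the text; the dictionary is a hypothetical construct.
TIME_LIMIT_S = {"imaging": 30.0, "weapon_release": 20.0}


class LOA(Enum):
    BY_CONSENT = "management-by-consent"
    BY_EXCEPTION = "management-by-exception"


@dataclass
class Outcome:
    response: Optional[str]    # option that was acted on, or None if nothing was
    followed_automation: bool  # True if the acted-on option was the recommendation
    missed: bool               # True if the task timed out with no action taken


def resolve_task(task: str, loa: LOA, recommendation: str,
                 operator_choice: Optional[str] = None,
                 response_time_s: Optional[float] = None) -> Outcome:
    """Resolve one task under the assumed rules.

    operator_choice=None with a valid response time means the operator
    clicked "Select" without changing the highlighted option.
    response_time_s=None means the operator never responded.
    """
    in_time = response_time_s is not None and response_time_s <= TIME_LIMIT_S[task]
    if in_time:
        choice = recommendation if operator_choice is None else operator_choice
        return Outcome(choice, choice == recommendation, False)
    if loa is LOA.BY_EXCEPTION:
        # Assumption: on timeout the automation acts on its own recommendation.
        return Outcome(recommendation, True, False)
    # Management-by-consent: nothing happens without the operator, so it is a miss.
    return Outcome(None, False, True)
```

Under these assumptions, the only behavioral difference between the two LOAs arises when the operator fails to respond in time, which is one reason neglect and reliance can differ between conditions.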


Figure 1. Multiple vehicle supervisory control ALOA testbed showing windows used for tasks.

In the Imaging task, an image appeared with an overlay of 19-26 green symbols varying in shape (diamonds, squares, circles, and triangles). Participants were required to count the number of diamonds. The automation presented eight options below the image, each with a different number, highlighting (80% correctly) its recommended option. Participants were tasked with selecting the option that corresponded to the number of diamonds (1-8). If the participant agreed with the automation’s recommendation, only the “Select” button needed to be clicked. Alternatively, a different option, followed by the “Select” button, could be clicked.

In the Weapon Release task, the image included detected tanks and the automation (80% correctly) marked hostile tanks with a red square and highlighted either the “Authorize” or “Do Not Authorize” button. (Hostile and friendly tanks differed in width and barrel length.) Participants had to determine if the automation had correctly marked all hostiles and no friendly tanks. Similar to the Imaging task, they could change the recommendation made by the automation.

Procedure

Sessions began with participants’ informed consent and a brief overview. Next, participants completed questionnaires to measure demographics and individual differences. Items included those on their perceived video gaming experience (e.g., mean number of hours played weekly and level of expertise for all games, and action and FPS type games). The Dundee Stress State Questionnaire (DSSQ, short version) was administered both before and after the experimental trial to measure changes in task engagement, distress, and worry (Matthews et al., 2013). Training followed with an explanation of the testbed’s displays and controls. The automation was described as “reliable, but not perfect.” Next, each task type was described and practiced, in turn, in single task vignettes. This was followed by a 15-min training trial where participants were required to complete all the task types. The training trial was repeated if participants failed to complete the primary tasks accurately and within the system-defined timeout limits. Training took approximately 30 min and was followed by a single 60-min experimental trial with the assigned workload and LOA condition. After the trial, participants completed the post-task DSSQ, the NASA-TLX workload measure (Hart & Staveland, 1988), and additional items for trust that are not reported here. The entire session time was approximately 2 hours per participant.

RESULTS

Subjective Response

Mixed-model 2 × 2 × 2 (LOA × workload × pre- vs. post-task) ANOVAs were computed to test the effects of experimental manipulations on subjective states, including distress, engagement, and worry. There was a significant main effect for workload on distress, F(1,97) = 8.52, p < .01, ηp² = .08, and also a significant interaction between workload and pre-post task on distress, F(1,97) = 7.81, p < .01, ηp² = .07. In the high workload condition, participants reported greater distress after the task, compared to the pre-task baseline (see Figure 2). A near-significant interaction between workload and pre-post task on engagement was found, F(1,97) = 3.65, p = .059, ηp² = .04. In the low workload group, participants showed lower task engagement after task exposure, compared to the pre-task baseline. Regarding worry, both workload, F(1,97) = 12.17, p < .01, ηp² = .11, and the pre-post task factor, F(1,97) = 46.14, p < .01, ηp² = .32, exerted significant main effects. Pre- to post-task, worry decreased in all groups, and worry was lower in the high workload condition. The main and interactive effects of LOA were not significant for any subjective state. We also confirmed that NASA-TLX workload (total score) was higher in high workload (M = 57.0) than in low workload conditions (M = 46.2).

[Figure 2 plot: two panels, Engagement and Distress, each showing Pre and Post scores for the High Workload (solid line) and Low Workload (dashed line) conditions.]

Figure 2. Pre to post-task changes in task engagement and distress for different workload conditions. Error bars are standard errors.
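The partial eta squared values reported with each F ratio can be recovered from the F statistic and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies that identity to three of the reported effects as a consistency check; the function name `partial_eta_squared` is illustrative and not part of any analysis software used in the study.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df values positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Reported effects: (label, F, df_effect, df_error)
reported = [
    ("workload on distress", 8.52, 1, 97),
    ("pre-post on worry", 46.14, 1, 97),
    ("task type on neglect", 94.08, 1, 91),
]

for label, f_value, df1, df2 in reported:
    print(f"{label}: {partial_eta_squared(f_value, df1, df2):.2f}")
```

Rounded to two decimals, the computed values agree with the effect sizes given in the text for these three tests.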


Bonferroni-corrected t-tests were used to test for significant differences in subjective states between men and women. Women (M = 20.29, SD = 5.18) were initially less engaged than men (M = 23.21, SD = 4.38), t(99) = -2.99, p < .05. The gender difference in post-task engagement was not significant, t(99) = -1.83, p = .07, but still showed a similar trend wherein women (M = 20.31, SD = 6.33) reported being less engaged than men (M = 22.44, SD = 5.00). In addition, women reported less experience in relation to most computer experience items, including hours spent on video games (t = 4.94, p < .01), game expertise (t = -6.33, p < .01), hours spent on FPS games (t = -4.25, p < .01), FPS games expertise (t = 7.92, p < .01), hours spent on other action games (t = -3.60, p < .01), and other action game expertise (t = -6.45, p < .01).

Gaming expertise correlated fairly consistently with more positive pre-task states (see Table 2). Hours spent on computer use and video games correlated with post-task engagement. A multiple regression found that gender did not predict pre-task engagement with gaming experience controlled.

Table 2
Correlations between gaming experience and pre- and post-task subjective states

                       Distress   Engagement   Worry
Pre-task
  Game hours            -.164       .342**     -.148
  Game expertise        -.293**     .296**     -.129
  FPS hours             -.158       .287**     -.158
  FPS expertise         -.201*      .269**     -.160
  Action hours          -.123       .199*      -.089
  Action expertise      -.294**     .335**     -.216*
Post-task
  Game hours            -.167       .217*      -.125
  Game expertise        -.139       .078       -.100
  FPS hours             -.079       .230*      -.019
  FPS expertise         -.120       .226*      -.132
  Action hours          -.103       .078       -.086
  Action expertise      -.167       .119       -.191

* p < .05, ** p < .01. Game, all video games; FPS, first person shooter game; Action, other action video games.
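To show how the significance markers in Table 2 relate to the size of each correlation, the sketch below approximates a two-tailed p value for a Pearson r using the Fisher z transformation. This is an approximation chosen because it needs only the Python standard library; it is not necessarily the test the authors ran, and it assumes every correlation is based on the full sample of N = 101. The helper names `fisher_z_p_value` and `stars` are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_p_value(r: float, n: int) -> float:
    """Approximate two-tailed p value for H0: rho = 0 via the Fisher z transform."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = math.atanh(r) * math.sqrt(n - 3)  # approximately standard normal under H0
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def stars(r: float, n: int = 101) -> str:
    """Return the significance marker used in the tables (assumes N = 101)."""
    p = fisher_z_p_value(r, n)
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# A few pre-task entries from Table 2
for label, r in [("Game hours / Engagement", 0.342),
                 ("Action hours / Engagement", 0.199),
                 ("Game hours / Distress", -0.164)]:
    print(f"{label}: r = {r:+.3f}{stars(r)}")
```

Because the Fisher z method is approximate, values very close to a significance threshold could be classified differently by an exact t-based test.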


Performance

Three performance metrics for each surveillance task were analyzed. Accuracy was defined as the percentage of correct responses. Reliance was defined as the percentage of trials on which the participant followed the recommendation from the automation. Neglect was defined as the frequency of items that appeared in the task window, but were not opened by the participant. Data were analyzed using 2 × 2 × 2 (LOA × workload × task type) mixed-model ANOVAs, where task type refers to Imaging vs. Weapon Release tasks. Accuracy was lower for the Weapon Release task than for the Imaging task (75.7 vs. 82.3), F(1,91) = 23.91, p < .01, ηp² = .21, and lower for high workload than for low workload (77.1 vs. 80.9), F(1,91) = 5.87, p < .05, ηp² = .06. Reliance was higher for management-by-exception than for management-by-consent (75.6 vs. 72.8), F(1,91) = 5.12, p < .05, ηp² = .05. Neglect was higher for the Weapon Release task than for the Imaging task (8.89 vs. 3.38), F(1,91) = 94.08, p < .01, ηp² = .51, for high compared to low workload (8.38 vs. 3.86), F(1,91) = 19.18, p < .01, ηp² = .17, and for management-by-exception than for management-by-consent (7.11 vs. 5.11), F(1,91) = 4.20, p < .05, ηp² = .04 (see Figure 3).

[Figure 3 plot: three panels, Accuracy (%), Reliance (%), and Neglect, each plotted for the Imaging (IM) and Weapon Release (WR) tasks; legends contrast High vs. Low Workload and By-exception vs. By-consent.]

Figure 3. Task performance in Imaging and Weapon Release tasks for different workload/LOA conditions. Error bars are standard errors.

A single statistically significant gender difference was found: women performed worse than men on the Weapon Release task. Table 3 shows correlations between gaming experience and performance indices on the Weapon Release task. Self-rated expertise tended to be associated with higher accuracy, greater reliance on the automation, and lower neglect. No correlates of Imaging task performance were found. A multiple regression found that gender did not predict Weapon Release performance with gaming experience controlled.

Table 3
Correlations between gaming experience and three performance metrics for Weapon Release Task

                     Accuracy   Reliance   Neglect
Game hours            .235*      .249*     -.218*
Game expertise        .293**     .270**    -.181
FPS hours             .163       .129      -.112
FPS expertise         .316**     .257*     -.252*
Action hours          .177       .191      -.228*
Action expertise      .369**     .331**    -.285**

* p < .05, ** p < .01. Game, all video games; FPS, first person shooter game; Action, other action video games.
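The three performance metrics defined under Performance (accuracy, reliance, and neglect) could be computed from per-task records along the following lines. This Python sketch is only an illustration: the `TaskRecord` fields and the `performance_metrics` function are hypothetical, and the choice to use all presented tasks as the denominator for both percentages is an assumption, since the text does not state the denominators.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TaskRecord:
    opened: bool             # participant selected the row in the task window
    response: Optional[str]  # option finally acted on, or None if there was none
    recommendation: str      # option highlighted by the automation
    correct_answer: str      # ground truth for the task


def performance_metrics(records: List[TaskRecord]) -> Dict[str, float]:
    """Accuracy (%), reliance (%), and neglect (count) for one participant and task type.

    Assumption: percentages are taken over all presented tasks, so tasks
    with no response count against both accuracy and reliance.
    """
    if not records:
        raise ValueError("at least one task record is required")
    n = len(records)
    n_correct = sum(r.response == r.correct_answer for r in records)
    n_followed = sum(r.response == r.recommendation for r in records)
    n_neglected = sum(not r.opened for r in records)
    return {
        "accuracy_pct": 100.0 * n_correct / n,
        "reliance_pct": 100.0 * n_followed / n,
        "neglect_count": float(n_neglected),
    }


example = [
    TaskRecord(True, "A", "A", "A"),    # followed a correct recommendation
    TaskRecord(True, "B", "A", "B"),    # overrode an incorrect recommendation
    TaskRecord(True, "A", "A", "B"),    # followed an incorrect recommendation
    TaskRecord(False, None, "A", "A"),  # never opened: neglected
]
print(performance_metrics(example))
```

Keeping reliance separate from accuracy matters because, with automation that is correct 80% of the time, a participant who always follows the recommendation would reach roughly 80% accuracy without evaluating any image.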

DISCUSSION

The study explored the effect of workload and LOA on participants' performance in a UAV simulation, and investigated video game experience and gender as predictors of stress and performance. We successfully configured ALOA to produce different levels of workload, stress, and reliance on automation. Higher workload tended to produce greater distress, subjective workload, and performance impairments on both surveillance tasks, similar to a previous study using the RESCHU simulator (Panganiban & Matthews, 2014). Participants generally showed an appropriate level of trust (reliance) in the automation of the surveillance tasks (i.e., high but not total). However, there was some variance in reliance according to task parameters. More demanding conditions (Weapon Release task, high workload) tended to reduce reliance, although it is in demanding conditions that trust in the automation may be most important. The lower level of automation (management-by-consent) was also associated with less reliance.


Video gaming experience was predictive of both subjective response and performance on the Weapon Release task, the more demanding of the two surveillance tasks. Self-rated gaming expertise was positively correlated with engagement and negatively correlated with both distress and worry in advance of exposure to the task, suggesting that gamers anticipated a more enjoyable, less stressful experience. However, only certain gaming factors, including FPS expertise and hours, predicted post-task engagement. The higher distress anticipated by those lacking gaming expertise did not materialize post-task.

Gaming experience was also associated with performance on the more demanding Weapon Release task. Those with higher levels of expertise, especially on action and FPS games, were more accurate, relied more on automation, and showed less task neglect. These findings are consistent with previous observations of enhanced performance of gamers on UAV tasks (Cummings et al., 2010; McKinley et al., 2011). Contrary to the negative stereotype described by Chappelle et al. (2014), gamers were no more stress-prone than those lacking gaming experience, and actually sustained task engagement more effectively over time.

Further work is necessary to differentiate alternative explanations for the benefits of gaming. Practice may enhance certain cognitive functions which transfer to simulated UAV operation. Another explanation is that gamers may be a self-selected group who possess high aptitude for a range of complex computer-based tasks irrespective of practice. In any event, gaming expertise may be a useful marker for aptitude for UAV operation. Future studies should control for cognitive and non-cognitive attributes that may be related to performance on UAV tasks to clarify the role played by gaming experience.

Some gender differences were found. Women were initially less engaged than men, but this effect was attenuated by experience with the task. Women also performed more poorly on the Weapon Release task, although no gender difference in reliance on automation was found. It is important to note, however, that all gender differences disappeared when gaming experience was controlled. This suggests that gender differences in performance may be side effects of the greater interest in gaming exhibited by men.

The findings suggest that it may be of value to the Air Force to recruit from non-traditional populations. Video game players, especially those with expertise on FPS or other action games, could be a source of prospective recruits with an aptitude for acquisition of needed skills. We did not observe the marked elevation of fatigue associated with longer-duration operations (Ouma et al., 2011), but the higher engagement of those with expertise in FPS games suggests they might have greater tolerance of fatigue. Furthermore, the present findings do not suggest any intrinsic disadvantage to recruitment of female operators: gender differences may be negligible after gaming experience is controlled.

Future research might address limitations of this research, including testing generalization to military personnel, probing causal mechanisms for the advantages of video gaming, and investigating further the role of individual differences in trust in determining performance. Further investigation is also needed to determine individual differences in fatigue across


longer task durations. We also aim to analyze the present data further to test for moderator effects of workload and LOA.

Acknowledgements

This research was sponsored by AFOSR A9550-13-10016 and 13RH05COR. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of AFOSR or the US Government. The authors appreciate the helpful comments from the anonymous reviewers and Dr. Thomas Carretta.

REFERENCES

Calhoun, G.L., Ruff, H.A., Draper, M.H., & Wright, E.J. (2011). Automation level transference effects in simulated multiple unmanned aerial vehicle control. Journal of Cognitive Engineering and Decision Making, 5, 55-82.
Carretta, T.R. (1997). Group differences on US Air Force pilot selection tests. International Journal of Selection and Assessment, 5, 115-127.
Carretta, T.R., Rose, M.R., & Bruskiewicz, K.T. (in press). Selection methods for remotely piloted aircraft system operators. In N.J. Cooke, L. Rowe, & W. Bennett (Eds.), Remotely piloted aircraft: A human systems integration approach. Hoboken, NJ: Wiley.
Chappelle, W., Salinas, A., & McDonald, K. (2011). Psychological Health Screening of Remotely Piloted Aircraft (RPA) Operators and Supporting Units. School of Aerospace Medicine, Wright-Patterson AFB, OH.
Chappelle, W., Swearingen, J., Goodman, T., Cowper, S., Prince, L., & Thompson, W. (2014). Occupational Health Screenings of US Air Force Remotely Piloted Aircraft (Drone) Operators. School of Aerospace Medicine, Wright-Patterson AFB, OH.
Cummings, M.L., Clare, A., & Hart, C. (2010). The role of human-automation consensus in multiple unmanned vehicle scheduling. Human Factors, 52, 17-27.
Hart, S.G., & Staveland, L.E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In P.A. Hancock & N. Meshkati (Eds.), Human mental workload (pp. 139-183). Amsterdam: North-Holland.
Lee, J.D., & See, K.A. (2004). Trust in automation: Designing for appropriate reliance. Human Factors, 46, 50-80.
Matthews, G., Szalma, J., Panganiban, A.R., Neubauer, C., & Warm, J.S. (2013). Profiling task stress with the Dundee Stress State Questionnaire. In L. Cavalcanti & S. Azevedo (Eds.), Psychology of stress: New research (pp. 49-91). Hauppauge, NY: Nova Science.
McKinley, R.A., McIntire, L.K., & Funke, M.A. (2011). Operator selection for unmanned aerial systems: Comparing video game players and pilots. Aviation, Space and Environmental Medicine, 82, 635-642.
Ouma, J., Chappelle, W., & Salinas, A. (2011). Faces of occupational burnout among U.S. Air Force active duty and National Guard/Reserve MQ-1 Predator and MQ-9 Reaper operators. AFRL-SA-WP-TR-2011-0003, Air Force Research Laboratory, Wright-Patterson AFB, OH.
Panganiban, A.R., & Matthews, G. (2014). Executive functioning protects against stress in UAV simulation. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 58, 994-998.
Paullin, C., Ingerick, M., Trippe, D.M., & Wasko, L. (2011). Identifying best bet entry-level selection measures for US Air Force remotely piloted aircraft (RPA) pilot and sensor operator (SO) occupations (No. HRROFR-11-64). Human Resources Research Organization, Alexandria, VA.
Spence, I., & Feng, J. (2010). Video games and spatial cognition. Review of General Psychology, 14, 92-104.
Terlecki, M., Brown, J., Harner-Steciw, L., Irvin-Hannum, J., Marchetto-Ryan, N., Ruhl, L., & Wiggins, J. (2011). Sex differences and similarities in video game experience, preferences, and self-efficacy: Implications for the gaming industry. Current Psychology, 30, 22-33.
Williams, H.P., Carretta, T.R., Kirkendall, C.D., Barron, L.G., Stewart, J.E., & Rose, M.R. (2014). Selection of UAS personnel (SUPer) phase I report: Identification of critical skills, abilities, and other characteristics and recommendations for test battery development. NAMRU-D Report 15-16. Wright-Patterson AFB, OH: Naval Aeromedical Research Unit.
