A Study of Depth Perception in Hand-Held Augmented Reality using Autostereoscopic Displays

To appear in the International Symposium on Mixed and Augmented Reality (ISMAR) 2014.

Matthias Berning∗, Daniel Kleinert, Till Riedel, Michael Beigl

Karlsruhe Institute of Technology (KIT), TECO, Karlsruhe, Germany
∗e-mail: [email protected]

ABSTRACT

Displaying three-dimensional content on a flat display is bound to reduce the impression of depth, particularly for mobile video see-through augmented reality. Several applications in this domain can benefit from accurate depth perception, especially if there are contradictory depth cues, like occlusion in an x-ray visualization. The use of stereoscopy for this effect is already prevalent in head-mounted displays, but there is little research on its applicability to hand-held augmented reality. We have implemented such a prototype using an off-the-shelf smartphone equipped with a stereo camera and an autostereoscopic display. We designed and conducted an extensive user study to explore the effects of stereoscopic hand-held augmented reality on depth perception. The results show that in this scenario depth judgment is mostly influenced by monoscopic depth cues, but our system can improve positioning accuracy in challenging scenes.

Keywords: Autostereoscopy, mobile devices, depth perception, augmented reality, user study

Index Terms: H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems—Artificial, augmented, and virtual realities; H.5.2 [Information Interfaces and Presentation]: User Interfaces—Ergonomics

1 INTRODUCTION

Depth distortion is still one of the most common perceptual problems found in Augmented Reality (AR) today [8]. It describes the issue that users cannot correctly identify the spatial relation between objects based on their viewpoint, which is especially the case for the combination of real and virtual objects. Depending on the application, this information can be crucial for good performance, e.g. in maintenance tasks, manuals, or when visualizing occluded objects [16].

There is already a large body of research in the area of depth perception for AR, but most of it is focused on head-mounted displays [18, 10, 17, 5]. This is viable for professional applications, but for commercial reasons the current platform of choice for consumer-oriented AR is the mobile hand-held device. Since these devices are limited to video see-through AR and offer only a small field of view, they are even more prone to depth distortion.

The availability of smartphones featuring an autostereoscopic display could enable the use of binocular depth cues in this area. These displays promise a high degree of immersion and increased spatial awareness through a single screen, without the need for additional glasses worn by the user (i.e. polarized or shutter glasses). In combination with the two cameras on the back of the device, it is possible to realize true stereoscopic AR.

Our contribution in this work is an experimental evaluation of the effect of stereopsis on depth perception in mobile hand-held AR. Different virtual objects were embedded in a real scene and displayed to the user. The participants in our study had to align them accurately with a real object. The goals were a) to verify whether the autostereoscopic display can help a user to understand the spatial relation between real and virtual objects and b) to compare the effect to other depth cues.

2 DEPTH PERCEPTION AND STEREOSCOPIC DISPLAYS

According to Cutting [2], the relative importance of different depth cues is determined by the distance of the objects to the observer. There are three different areas: in personal space, from 0-2m directly in front of the observer, binocular disparity provides the most accurate depth judgments. It is the most important depth cue provided by stereoscopic vision and particularly useful to resolve ambiguities created by other perceptual cues. Kytö et al. [9] list some of its special benefits for AR, e.g. the layering of augmentations to increase the information density.

Current autostereoscopic displays are either of the lenticular sheet or the parallax barrier type [13]. Both have in common that a normal display is used and the images for the left and right eye are arranged in interleaved columns. By applying a sheet that structures the emitted light, each image is only visible from a specific angle, which should correspond to the respective eye of the user. Due to this arrangement, stereopsis is only achieved at a specific distance and orientation to the display, which is one of the major drawbacks. Another problem is the conflict between accommodation and convergence, which are directly linked in normal vision: when consuming binocular content on a flat screen, the relative orientation of the eyes changes (convergence), but the focus has to stay fixed on the screen (accommodation). Nevertheless, recent studies have shown that these devices can improve the user experience when consuming content on a hand-held device [14].

To date, off-the-shelf smartphones are only available with parallax barrier displays. They structure light by superimposing fine vertical lines onto the display, blocking every second column for each eye. Through the use of liquid-crystal arrays for this barrier, the stereoscopic function can be controlled dynamically.
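The column-interleaved layout shared by both display types can be made concrete with a short sketch. The following Python snippet is purely illustrative and not part of our prototype, where the display driver performs this conversion in hardware; it assumes a side-by-side frame with the left view in the left half, and the eye-to-column assignment depends on the actual panel.

```python
import numpy as np

def interleave_side_by_side(sbs: np.ndarray) -> np.ndarray:
    """Turn a side-by-side stereo frame (left view in the left half)
    into the column-interleaved layout of a parallax-barrier panel."""
    h, w2 = sbs.shape[:2]
    w = w2 // 2
    out = np.empty_like(sbs)
    out[:, 0::2] = sbs[:, :w]  # even columns: shown to one eye
    out[:, 1::2] = sbs[:, w:]  # odd columns: shown to the other eye
    return out

# Example: a 480x800 RGB side-by-side frame, as produced by a renderer.
frame = np.zeros((480, 800, 3), dtype=np.uint8)
interleaved = interleave_side_by_side(frame)
```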

3 RELATED WORK

Understanding the perception of depth is a widely researched topic in VR and AR, although most of the work is focused on HMDs. The work of Swan et al. [18] is exemplary for the conducted studies and is listed here explicitly because of the survey of the field included in the publication. They conducted two experiments to investigate the effects of stereoscopic vision on depth judgment in head-worn AR. Both showed a main effect of stereoscopy on depth judgment, resulting in greater accuracy.

Dey et al. [3] looked at the effect of different hand-held screen sizes and resolutions on depth perception. In several AR test scenarios, they found a significant effect of resolution on distance estimation. Autostereoscopic displays were not part of their evaluation.

The authors of [1] conducted a Wizard-of-Oz study, using a cutout phone to simulate the effects of binocular vision on the positioning accuracy of a hand behind the device. They show that performance was worse in a monocular setting. A functional prototype using video see-through could not be tested.

The Nintendo 3DS (http://www.nintendo.com/3ds) is a hand-held gaming console that supports stereoscopic AR. There are some games that use this feature, but development on this platform is restricted by the manufacturer. Structured evaluations of the effects on depth perception were not possible with the available software.

The work of Kerber et al. [7] is the most relevant for our own experiments. Similar to our work, they conducted an experiment to determine the depth discrimination ability of participants using a smartphone with an autostereoscopic display. For this purpose, two virtual textured cubes of varying size were displayed above a real table and the subjects had to decide which one was closer. In their analysis, they found no significant effect of the stereoscopic condition on depth perception. Unfortunately, the paper does not clarify how they handled the automatic disparity remapping performed by the phone. As detailed in the next section, this can have a severe effect on the perceived AR experience. Another aspect that is not addressed is the alignment of real and virtual objects, since the tested scenario was deliberately chosen to be very close to VR. In our opinion, based on previous research, it is especially this interaction that could benefit the most from stereoscopic vision, which is why it is the focus of our work.

Mikkola et al. [14] investigated the effect of small autostereoscopic displays on depth perception in static scenes. They found a significant improvement in accuracy and speed of depth judgments when introducing binocular disparity. The addition of secondary depth cues such as shadowing, texture gradient, or depth blur had no measurable effect on this result.

4 PROTOTYPE IMPLEMENTATION

We implemented our prototype on the off-the-shelf smartphone LG Optimus 3D Max (P720), featuring a stereo camera and an autostereoscopic display of the parallax barrier type. The system is based on Android and runs the latest official firmware for this device with version 2.3 of the mobile operating system. Processing resources are limited by the dual-core ARM processor, rated at 1.2GHz, and 1GB of RAM. The 4.3in display has a native resolution of 800 × 480 in landscape orientation, which is halved horizontally when enabling the parallax barrier. The camera pair has a stereo basis of 24mm and a resolution of 5MP each, although we are only using a much lower resolution for tracking. Both hardware units are controlled via the proprietary Real3D API provided by LG, which extends the Android framework to support 3D user interfaces.

The software is based on AndAR (https://code.google.com/p/andar/), which provides a solid architecture including the necessary modules for camera control, tracking, and rendering, mapped to the Android OS.

The basic frame processing is depicted in Figure 1. We extended the framework in several aspects to increase performance while maintaining or even reducing resource consumption. This includes upgrading the tracking library to ARToolKitPlus (https://launchpad.net/artoolkitplus) [19] and replacing the renderer with Rajawali (https://github.com/MasDennis/Rajawali). To maximize throughput in the stereoscopic mode, we limit tracking to the left video stream and calculate the transformation for the virtual objects in the right half based on the camera offset. In addition to the lower resource consumption, this also prevents tracking errors from interfering with the stereoscopic effect.

Figure 1: Basic frame processing pipeline of our stereoscopic AR framework: (1) read camera frame from stereo pair, (2) detect markers in left image, (3) convert frame and draw it to the canvas, (4) transform augmentations for left side and render, (5) apply extrinsic camera transformation and render right side.

We had to find the optimal configuration for the LG Real3D API, which provides fine-grained control over the camera and display hardware. Camera frames are set to be placed in top-bottom arrangement in memory, meaning the left frame is located before the right, allowing for sequential access to each one. By using the YUV image format, the luminance component can be separated easily and forwarded to the tracker. The tracking resolution used is 640 × 480. Camera calibration with the OpenCV package produced the intrinsic camera parameters, as well as the extrinsic transformation matrix M_E from the left to the right camera image.

After our first trials provided an unpleasant viewing experience due to unsatisfactory alignment, we identified the automatic disparity remapping of the phone as the cause. Disparity remapping is used in stereoscopic video recordings to warp the source images so that the perceived depth of the scene is shifted to a range that is comfortable for the user on the employed display [12]. Our hardware, like most other consumer electronics, enables such an algorithm by default. The algorithm interferes with our camera calibration, and its effects on depth perception are neither quantifiable nor correctable, so we disabled it completely by setting the parameter auto-convergence. This is also necessary to ensure that the displayable depth range is not skewed and lies completely behind the display from the user's perspective.

The rendering component takes two inputs: the full video frame and the marker homography H_M for each detected marker in the left image. The output is a side-by-side image with the two perspectives adjacent to each other. First, the video frame is converted on the GPU and applied to a static plane in full width. Next, the virtual objects are transformed according to H_M and the scene is rendered to the left half. After applying the extrinsic matrix M_E, the scene is rendered again on the right half. When the parallax barrier is enabled, the side-by-side image is automatically converted by the display driver. With the described pipeline, stereoscopic AR can be switched on and off instantaneously, enabling direct comparison. In stereo mode, we achieved interactive frame rates for tracking and rendering of 25-30fps and 20-25fps respectively. A sketch of the underlying per-frame operations is given at the end of this section.

During development we encountered some issues resulting from the employed display technology. The accommodation-convergence conflict, typical for autostereoscopic displays, decreased after a short period of use. What remained was the ghosting, which occurs when bright pixels for one eye illuminate dark regions in the other image. Despite these shortcomings, initial users reported an intense feeling of immersion and depth.
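Two steps of this pipeline can be illustrated with a short sketch. This is a reconstruction for illustration only, not the prototype's Android/Java code; the YUV memory layout and the matrix conventions (direction and sign of the extrinsic transform) are assumptions on our part.

```python
import numpy as np

# --- Luminance extraction for the tracker ---------------------------
# Assuming the common Android YUV420 layout, where the Y (luminance)
# plane occupies the first width*height bytes of the frame buffer.
w, h = 640, 480
frame = np.zeros(w * h * 3 // 2, dtype=np.uint8)  # placeholder buffer
luma = frame[:w * h].reshape(h, w)                # fed to the tracker

# --- Right-eye pose from the left-eye pose --------------------------
# M_E: homogeneous extrinsic transform from the left to the right
# camera, e.g. built from the rotation and translation of a stereo
# calibration; the translation here is a placeholder for the 24 mm
# stereo basis of the device.
M_E = np.eye(4)
M_E[:3, 3] = [-0.024, 0.0, 0.0]

# Pose of a detected marker in the left camera frame (identity as a
# stand-in for the tracker output).
T_left = np.eye(4)

# Tracking runs only on the left stream; the right-eye pose is derived
# from it, keeping both eyes consistent and saving CPU time.
T_right = M_E @ T_left
```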

5 EXPERIMENTAL EVALUATION

In order to evaluate its effect on the perception of spatial relations and to compare binocular disparity to other depth cues in hand-held AR, we conducted a user study to quantify a) accuracy and b) completion time of depth positioning tasks between real and virtual objects in a scene.

5.1 Study Design

The most common experiments to study depth perception are verbal report, perceptual matching, and open-loop action-based tasks [18]. They are based on egocentric depth judgments performed by the test subject, which are either reported verbally, matched to a reference distance, or recreated after the observation. In this study we chose perceptual matching, which is an action-based closed-loop task. The reasons for this are twofold: First, these tasks resemble several interactions found in hand-held AR. Second, due to the mismatch between the stereo basis of our device and the human inter-ocular distance, we expected that first-time users would be unable to correctly translate the perceived depth to a real distance.

We designed the task to represent our target area, limited to the personal space (0-2m in front of the user) where stereoscopy should have the largest effect. A schematic overview of the study setup can be found in Figure 3. We displayed different objects, represented by the green cube on the left, floating above the flat surface of an office desk. The task was to match the depth (y-position) of the virtual object with a real reference object, in our case a pink cuboid standing on the right of the desk. The position of the virtual object could be varied parallel to the y-axis, along the dashed line in the schematic. This was done using a vertical swipe gesture on the touchscreen of the phone, which resulted in a relative movement of the object.

To improve tracking stability, we used a marker board with six markers, which was overlaid with a large white virtual rectangle to eliminate additional perceptual cues from the marker and board dimensions. As shown in Figure 2, the white rectangle extended beyond the margins of the marker board, limited by the real object on the right and the table edge in the other directions. The space was limited in the back by a wall. All tests were conducted in our laboratory with static illumination from artificial light sources. Figure 2 gives an impression of the scene displayed on the mobile device.

Figure 2: Experimental setup as displayed on the device. The virtual objects presented in the different test cases are: (a) same-sized object, always placed directly on the table; (b) cube with the same depth as the reference object and a drop shadow on the table; (c) cube with the same depth and textured faces; (d) 10 Euro banknote as an object of known size; (e) sphere with the same diameter as the reference object.

Figure 3: Schematic overview of the experimental setting on an office desk from the participant's point of view. The virtual object was moved along the y-axis to be aligned with the real object.

We wanted to compare the influence of disparity to other depth cues typically found in AR scenarios. Therefore, we devised five different scenes with distinct virtual objects, featuring a variety of depth cues (see Figure 2). The virtual cuboid in the first test case had the same dimensions as the real one and was always placed directly on the surface. This provided the visual height and the retinal size as additional cues. The two cubes were enhanced with a drop shadow and a texture respectively. The banknote is an object of known size. The sphere added no depth information. The tests same, shadow, and texture also provided linear perspective as an auxiliary depth cue. All objects except the banknote had the same dimension along the y-axis as the reference object.

We saw no need to restrict the motion of the participants for our experiment, allowing them to use motion parallax as another depth cue. They were instructed to stay behind a red line, marked with tape on the desk surface, to keep them from switching to a side or top-down view. Only few participants made extensive use of motion parallax during the experiment. Because of the unrestricted motion, the depth of the real object was not controlled and only one position was tested. The distance between the observer and the object varied around 1-2m.

The main subject of our evaluation was the effect of stereopsis on hand-held augmented reality, so each participant performed each test case with and without the stereo condition. Each combination of test case and display condition was repeated four times in a within-subject study with repetition, resulting in 5 × 2 × 4 = 40 individual tests per subject. To counteract ordering effects, the test order was completely randomized for each participant (a possible generation scheme is sketched after Table 1). In addition, the starting position of the virtual object was selected randomly from ten different heights (evenly spaced between 50 and 140mm) and ten different depths (evenly spaced between 30 and 300mm), to reduce learning. This also ensured that the virtual object was placed in front of as well as behind the reference object situated at 200mm. Although relative height in the visual field is also a depth cue, its influence in personal space is regarded as minor.

Each of the 40 tests was started and ended with the press of a physical button located on the side of the device. Since the device was operated in landscape orientation, the button could be operated very similarly to the shutter release of a camera. We recorded the final depth offset between virtual and real object and the task completion time as dependent variables. The latter is corrected to account for the fact that the visibility of the augmentation was lost for short periods when tracking failed. An overview of the experiment variables can be found in Table 1.

Table 1: Dependent and independent variables of the experiment

Independent variables:
- subject ID: 20 (random variable)
- test case: 5 (same, shadow, texture, banknote, sphere)
- display condition: 2 (mono, stereo)
- test repetition: 4 (1..4)
- initial depth: 10 (30, 60, ... 300mm; random variable)
- initial height: 10 (50, 60, ... 140mm; random variable)

Dependent variables:
- depth offset: depthVirtual - depthReal (mm)
- task completion time: duration - timeOfInvisibility (ms)
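To make the procedure concrete, a trial list of this design could be generated as follows. This is an illustrative sketch, not the code used in the study; the variable names are our own.

```python
import random

test_cases = ["same", "shadow", "texture", "banknote", "sphere"]
displays = ["mono", "stereo"]

# 5 test cases x 2 display conditions x 4 repetitions = 40 trials,
# fully randomized per participant to counteract ordering effects.
trials = [(case, display, rep)
          for case in test_cases
          for display in displays
          for rep in range(4)]
random.shuffle(trials)

# Starting pose drawn from ten evenly spaced heights and depths (mm),
# so the virtual object starts both in front of and behind the
# reference object at 200 mm.
heights = range(50, 141, 10)  # 50, 60, ..., 140
depths = range(30, 301, 30)   # 30, 60, ..., 300
plan = [(case, display, rep, random.choice(heights), random.choice(depths))
        for case, display, rep in trials]
```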

5.2 Participants

We recruited 24 participants but had to exclude two of them because of stereo blindness and another two because of technical problems during the experiment. The remaining 20 subjects (four female, aged between 19 and 33, mean 26) had normal or corrected-to-normal vision. Most of them were members of our faculty (students and employees). Although all of them had a technical background, only one participant reported prior experience with autostereoscopic displays.

To monitor possible side effects, every person started by filling out a Simulator Sickness Questionnaire (SSQ) [6]. The device was handed to the participant in a demonstration mode and they were informed about the display, the optimal viewing angle, and distance. After that, they were given some time to familiarize themselves with the hardware, followed by a description and a dry run of the task with a simple cube. All of the 40 variations were tested in one session, each test separated by a black screen with white text indicating the progress. The whole session lasted between 15 and 20 minutes per participant and was finished with the second part of the SSQ. Due to this design, the SSQ cannot distinguish between the two display conditions, but it can serve as an indicator of the general visual strain during the session.

5.3 Results

Figure 4 provides an overview of the acquired data points for the relative depth offset between both objects, with a boxplot for each test case and both display variants. The center line at 0mm would be a perfect alignment, while negative values translate to the virtual object being placed closer to the viewer and positive values further away. The first observation is that test conditions same and shadow were positioned very accurately by all participants. For same there is almost no difference between the two display conditions, with a Mean (M) of 3.7mm (Standard Deviation (SD): 21.8mm) in the monoscopic and 3.1mm (SD: 21.5mm) in the stereoscopic condition. The same is true for test condition shadow, at 4.6mm (SD: 25.5mm) without and 4.6mm (SD: 21.3mm) with stereoscopy. The positioning error of all samples in these two test cases lies in the range of ±5% of the object distance, except for some outliers. The mean of the test case texture is also very close to the center in both conditions (mono: -5.7mm; stereo: -6.5mm), but the standard deviation is much larger in the first case (mono: 63.8mm; stereo: 52.5mm). The depth judgments for the banknote are prone to underestimation in both display conditions (mono: -21.8mm; stereo: -14.3mm). The test case sphere shows a strong overestimation of the distance to the reference object in the stereoscopic (M: 30.0mm, SD: 60.5mm), but not in the monoscopic case (M: 2.2mm, SD: 77.6mm). For all of the test cases, the standard deviation in stereoscopic mode is lower than in monoscopic mode. This could already be an indication that it is more precise, although not necessarily more accurate.

We calculated Spearman's correlation coefficient to estimate whether the initial starting position of the virtual object had an effect on the final depth of the object. The assumption would be that there is a monotonic relationship between those variables, if any. The low correlation between the depth offset and initial height (ρ = −0.049) or initial depth (ρ = 0.249) respectively indicates that there is little influence.

To compare positioning accuracy, we computed the absolute value of the depth positioning error. The mean values for each test condition are shown in Figure 5, together with the 95% confidence interval based on the standard error. To investigate the effect of the different conditions on the depth error, we performed a 2 × 5 repeated measures ANOVA on the absolute error. The results indicate a significant main effect of stereoscopy on the depth error (F1,19 = 5.194, p < .05). On average, the error is 5% lower in the stereo case compared to the monoscopic display condition. Further analysis was conducted on the individual test cases using a paired t-test between the mono and stereo case. This showed a significant improvement only for the test condition banknote (t(79) = 3.56, p < 0.001), where the mean error is reduced by 32%. The effects in the other tests were not significant: same (t(79) = 1.64, ns), texture (t(79) = 0.54, ns), shadow (t(79) = 1.78, ns), and sphere (t(79) = 1.36, ns).
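The reported statistics can be reproduced with standard tools. The sketch below is not our original analysis code; it assumes a hypothetical long-format log trials.csv with one row per trial and the columns subject, test_case, display, repetition, depth_offset, initial_height, and initial_depth.

```python
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

df = pd.read_csv("trials.csv")           # hypothetical trial log
df["abs_error"] = df.depth_offset.abs()  # absolute positioning error

# Monotonic influence of the starting position on the final offset.
print(stats.spearmanr(df.initial_height, df.depth_offset))
print(stats.spearmanr(df.initial_depth, df.depth_offset))

# 2 x 5 repeated measures ANOVA (display condition x test case);
# repetitions are averaged to one value per subject and cell.
print(AnovaRM(df, depvar="abs_error", subject="subject",
              within=["display", "test_case"],
              aggregate_func="mean").fit())

# Paired t-tests per test case between mono and stereo trials,
# assuming trials pair up when sorted by subject and repetition.
for case, grp in df.groupby("test_case"):
    key = ["subject", "repetition"]
    mono = grp[grp.display == "mono"].sort_values(key).abs_error
    stereo = grp[grp.display == "stereo"].sort_values(key).abs_error
    print(case, stats.ttest_rel(mono, stereo))
```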

Figure 4: Boxplot of the relative depth positioning offset in the different test cases and display conditions. Negative values indicate that the virtual object was placed closer to the viewer than the reference object. The accuracy for the first two test cases is very high.

Figure 5: Mean of the absolute depth error and 95% confidence interval. Pairwise comparison between the two display conditions shows a significant improvement for the banknote test case.

Figure 6: Mean value of the task completion time and 95% confidence interval in the different test cases. The y-axis is truncated to improve readability. Overall, the stereo condition seems to be more time consuming.

The mean values for the completion time plotted in Figure 6 show that test conditions banknote and sphere were finished more quickly in the stereo condition. Overall, the effect of stereoscopic vision is either negligible or tends to increase completion time. Consequently, the ANOVA analysis could not show a significant effect of the display condition on the completion time. Using a paired t-test for the individual test cases, we find a considerable increase (19%) in completion time using stereoscopy for the shadow condition (t(79) = −2.5511, p < .05). During the experiment we could observe that the task was usually approached in several phases: in a first phase, the situation was assessed and the initial position estimated. This was followed by a rough positioning and several smaller refinements.

The results of the SSQ over all test conditions showed an increase in eye strain for 45%, blurred vision for 35%, dizziness with closed eyes for 25%, and difficulties to focus for 20% of the subjects. There were also individual reports of degradation for the categories dizziness with open eyes, general discomfort, fatigue, and difficulties to concentrate. All stated changes were only one step, from none to slight or from slight to moderate.

After the individual sessions of the experiment, several participants spontaneously expressed their opinion regarding the display. Four of them preferred the stereoscopic mode, while seven other subjects would choose the monoscopic option. Some others declared to perceive no difference between the two. There were several reports of blurred or low-resolution images, ghosting, flickering, and low brightness, all of which can be traced back to the parallax-barrier display.

6 DISCUSSION

In this study, we compared accuracy and speed of depth perception for monoscopic and stereoscopic hand-held AR. Five different virtual objects had to be aligned with a real object in individual test cases. Each of them provided a unique set of monoscopic depth cues.

Almost all participants were able to complete test cases same and shadow with minimal positioning error, regardless of the display condition. Apparently, the monoscopic cues were so strong that the binocular disparity provided no significant improvement. In retrospect, both test cases were very easy to adjust because of the distinct edge created by the virtual object standing (or being projected) on the table. With this information it was no problem to align them with the same edge on the real object, without the need for other depth cues. Completion time for test case shadow suffered significantly from the stereoscopic condition and took longer to align. One possible reason could be that the subjects needed more time to adjust to the autostereoscopic display. This could also be an indication of confusion resulting from contradicting depth cues.

Only the test case banknote showed a significant improvement of positioning accuracy in the stereoscopic condition. It should be noted that this object did not provide any cues based on linear perspective or same object size. Of the nine participants who ranked the difficulty of the tasks, four named the sphere, three the banknote, and another two the textured cube as the most difficult object. This can also be seen in the notably higher error and completion time for these conditions. Test cases sphere and texture did not show any significant effect of stereoscopy on positioning error. This is similar to the findings of Kerber et al. [7], who investigated minimal depth discrimination distances between virtual cubes using the same hardware. They argue that object size already provides a strong depth cue which conflicts with binocular disparity.

Regarding the relative positioning offset, we could see varying degrees of over- and underestimation of the virtual object's depth. This could be caused by the different colors in the individual test cases and the illumination levels between real and virtual object. In some conditions, the color red is perceived to be nearer than adjacent regions of other colors, and the same holds true for brighter objects. As studied by Guibal and Dresp [4], the effect is dependent on background, contrast, and interaction with other depth cues (e.g. color stereopsis). Since the real object was red and the virtual object was brighter, there could have been contradicting cues. In general, these interactions should be part of further studies, since the background in AR applications is usually dynamic.

In conclusion, our study shows that the use of current autostereoscopic displays in hand-held AR can only improve depth perception significantly if the scene offers no other dominant depth cues. These results are in line with the weak fusion model of depth perception [11], in which all depth cues contribute to depth perception with different weights. Surprisingly, binocular disparity seems to have a much smaller impact than expected in personal space directly in front of the user.

One of the major drawbacks in terms of user experience was the display technology. The SSQ showed that several of the participants were affected by the test procedure, although the effect was minor and could also result from the high level of concentration. We think that the diminished resolution and brightness, as well as the ghosting, could be reduced in future hardware generations, but the accommodation-convergence mismatch is not as easy to overcome. One effect we witnessed during our study was that the participant who declared to use a hand-held autostereoscopic game console regularly performed better than average and also did not show any signs of visual fatigue. We speculate that this could be a sign of adaptation to the technology, but this needs to be evaluated in a long-term study, which has not been done to date.

7 CONCLUSION AND FUTURE WORK

In this paper we presented an approach to improve depth perception in hand-held augmented reality using autostereoscopic displays. We demonstrated the feasibility of implementing such a system on an off-the-shelf smartphone. The relevance of binocular depth cues was studied in an experiment conducted with our working prototype. The results indicate that stereopsis can only lower the positioning error between real and virtual objects significantly if the scene composition offers only weak monoscopic depth cues. It should be noted that in most of our test cases, other indicators for alignment had a much larger influence on the performance. Therefore, it seems reasonable to suggest to designers of AR applications to include different forms of depth indicators. Examples of these are auxiliary augmentations [9], depth blur [15], or artificial holes to overcome occlusion [16]. While coinciding cues should reinforce each other, contradictions are not as easily resolved, eventually leading to an increase in task completion time.

Future studies need to examine other screen types and sizes for hand-held stereoscopic AR, since the current prototype is based on a single device. Especially larger displays could increase the effect of stereoscopy. In addition, the impact should be compared with other depth cues, e.g. color and illumination contrast; the background could also have an influence on perception.

ACKNOWLEDGEMENTS

This work was supported in part by the German Federal Ministry of Education and Research (BMBF) as part of the FRAGMENTS project (grant number 01IS12051).

REFERENCES

[1] K. Čopič Pucihar, P. Coulton, and J. Alexander. Creating a stereoscopic magic-lens to improve depth perception in handheld augmented reality. In Proceedings of the 15th International Conference on Human-Computer Interaction with Mobile Devices and Services - MobileHCI '13, page 448, New York, USA, Aug. 2013. ACM Press.
[2] J. E. Cutting. Reconceiving perceptual space. In Looking into Pictures: An Interdisciplinary Approach to Pictorial Space, pages 215–238. MIT Press, 2003.
[3] A. Dey, G. Jarvis, C. Sandor, and G. Reitmayr. Tablet versus phone: Depth perception in handheld augmented reality. In 2012 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 187–196. IEEE, Nov. 2012.
[4] C. R. Guibal and B. Dresp. Interaction of color and geometric cues in depth perception: When does "red" mean "near"? Psychological Research, 69(1-2):30–40, 2004.
[5] J. A. Jones, J. E. Swan II, G. Singh, E. Kolstad, and S. R. Ellis. The effects of virtual reality, augmented reality, and motion parallax on egocentric depth perception. In Proceedings of the 5th Symposium on Applied Perception in Graphics and Visualization - APGV '08, page 9, New York, USA, Aug. 2008. ACM Press.
[6] R. S. Kennedy, N. E. Lane, K. S. Berbaum, and M. G. Lilienthal. Simulator Sickness Questionnaire: An enhanced method for quantifying simulator sickness. The International Journal of Aviation Psychology, 3(3):203–220, July 1993.
[7] F. Kerber, P. Lessel, M. Mauderer, F. Daiber, A. Oulasvirta, and A. Krüger. Is autostereoscopy useful for handheld AR? In Proceedings of the 12th International Conference on Mobile and Ubiquitous Multimedia - MUM '13, pages 1–4, New York, USA, Dec. 2013. ACM Press.
[8] E. Kruijff, J. E. Swan II, and S. Feiner. Perceptual issues in augmented reality revisited. In 2010 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 3–12, 2010.
[9] M. Kytö, A. Mäkinen, J. Häkkinen, and P. Oittinen. Improving relative depth judgments in augmented reality with auxiliary augmentations. ACM Transactions on Applied Perception, 10(1):1–21, Feb. 2013.
[10] M. Kytö, A. Mäkinen, T. Tossavainen, and P. Oittinen. Stereoscopic depth perception in video see-through augmented reality within action space. Journal of Electronic Imaging, 23(1):011006, Mar. 2014.
[11] M. S. Landy, L. T. Maloney, E. B. Johnston, and M. Young. Measurement and modeling of depth cue combination: In defense of weak fusion. Vision Research, 35(3):389–412, Feb. 1995.
[12] S. Mangiat and J. Gibson. Disparity remapping for handheld 3D video communications. In 2012 IEEE International Conference on Emerging Signal Processing Applications, pages 147–150. IEEE, Jan. 2012.
[13] M. Mehrabi, E. M. Peek, B. C. Wuensche, and C. Lutteroth. Making 3D work: A classification of visual depth cues, 3D display technologies and their applications. In Proceedings of the Fourteenth Australasian User Interface Conference (AUIC 2013), pages 91–100. Australian Computer Society, Inc., Jan. 2013.
[14] M. Mikkola, A. Boev, and A. Gotchev. Relative importance of depth cues on portable autostereoscopic display. In Proceedings of the 3rd Workshop on Mobile Video Delivery - MoViD '10, page 63, New York, USA, Oct. 2010. ACM Press.
[15] T. Ogawa. Blur with depth: A depth cue method based on blur effect in augmented reality. In 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 1–6. IEEE, Oct. 2013.
[16] G. Schall, E. Mendez, E. Kruijff, E. Veas, S. Junghanns, B. Reitinger, and D. Schmalstieg. Handheld augmented reality for underground infrastructure visualization. Personal and Ubiquitous Computing, 13(4):281–291, June 2008.
[17] G. Singh, J. E. Swan II, J. A. Jones, and S. R. Ellis. Depth judgment measures and occluding surfaces in near-field augmented reality. In Proceedings of the 7th Symposium on Applied Perception in Graphics and Visualization - APGV '10, page 149, New York, USA, July 2010. ACM Press.
[18] J. E. Swan II, A. Jones, E. Kolstad, M. A. Livingston, and H. S. Smallman. Egocentric depth judgments in optical, see-through augmented reality. IEEE Transactions on Visualization and Computer Graphics, 13(3):429–442, Jan. 2007.
[19] D. Wagner and D. Schmalstieg. ARToolKitPlus for pose tracking on mobile devices. In Computer Vision Winter Workshop 2007, St. Lambrecht, Austria, 2007.