Proceedings of Seventh International Workshop on Video Processing and Quality Metrics for Consumer Electronics

January 30-February 1, 2013, Scottsdale, Arizona

2D VS. 3D VISUAL QUALITY EVALUATION: THE DEPTH FACTOR

Sin Lin Wu 1,2, Jorge Caviedes 1, Ingrid Heynderickx 2,3

1 Intel Corporation, Chandler AZ, USA
2 Delft University of Technology, Delft, the Netherlands
3 Philips Research Laboratories, Eindhoven, the Netherlands

ABSTRACT

Visual quality evaluation (VQE) for 3D-TV has so far evolved as an extension of the 2D-TV methodology; the same is true for 3D content processing. In this paper, we present our research on these two topics and provide new insights on the unique VQE factor of 3D-TV and how to incorporate it in applications. These insights are based on experiments with 2D and 3D content shown in different viewing modes, with and without enhancement by video processing algorithms known from 2D-TV. The viewing modes are selected for 2D vs. 3D comparisons under their standard viewing conditions: (1) 2D content shown on a 3D-TV in 2D mode (i.e., with the 3D settings turned off and no glasses), (2) 2D content shown on a 3D-TV in 3D mode (i.e., with the 3D settings turned on and with glasses), and (3) 3D content shown on a 3D-TV in 3D mode (i.e., with the 3D settings turned on and with glasses). All our experimental results indicate a statistically significant content effect. It has become apparent that analyzing the content, especially its depth, can be used to partition content and to indicate what kind of processing is more likely to affect quality. We argue that for 3D content, depth is an additional factor to be considered when deciding whether post-processing should be applied to make the content more appealing.

1. INTRODUCTION

Nowadays, 3D-TVs are widely available on the consumer market. Most of these 3D-TVs are stereoscopic, which means that glasses are needed to view the 3D content; only very few are auto-stereoscopic and do not need glasses. To make the introduction of 3D-TVs to the market a success, their image quality needs to be maintained at least at the level of current 2D-TVs. To be able to do so, a methodology to reliably test the quality of 3D-TVs is necessary.

The main difference between 2D-TV and 3D-TV video quality is of course the presence of stereoscopic depth. However, the stereoscopic depth rendered on a 3D-TV is only simulated, and when rendered wrongly it can cause visual discomfort. Hence, the introduction of 3D-TVs to the market was preceded by extensive research on the quality of stereoscopic images and videos, though with evaluation methodologies and content processing mainly evolving from their 2D-TV counterparts. For example, recommendations from the ITU state that ITU-R BT.500 is also applicable to stereoscopic TVs [1]. Some extensions from the evaluation of 2D-TV quality to the evaluation of 3D-TV quality have been proposed in the literature, suggesting that not only picture quality but also depth perception [2-4] and visual comfort [5] should be included in the evaluation.

When stereoscopic depth is rendered correctly and at high quality, the visual experience of stereoscopic movies exceeds that of 2D movies. To render stereoscopic depth correctly, however, the monocular and binocular cues have to be consistent. Although most research focuses on the binocular cue, i.e., stereopsis, monocular cues also contribute to depth perception. This can be noticed with people who are unable to perceive stereopsis; they still get a sensation of depth through the monocular cues. Characterizing depth in a scene by means of monocular cues, resulting in a depth index, has been attempted [6]. That attempt was limited to analyzing depth in the content based on linear perspective. Although linear perspective is a strong monocular cue, several other cues are important (e.g., motion [7], occlusion, shadows, etc.) and hence may be relevant to characterize the perceived depth in a scene.

Enhancing content with video processing towards better perceived quality is well known and broadly applied to 2D content. For 3D content, enhancement may have a similar impact, but so far mostly 2D enhancement algorithms are applied. We believe that simply applying 2D enhancement algorithms to 3D content may be counterproductive. We expect that by analyzing the content, and in particular its depth, more appropriate enhancements can be applied.

In this paper we describe a methodology for evaluating the quality of 3D content. In addition, we describe the setup of our experiment and show that the results indicate content dependency. With these results we motivate the importance of further studying content dependency in order to better optimize video processing for 3D content.

2. EVALUATION METHODOLOGY

The ITU-R BT.2021 recommendation for the evaluation of 3D-TVs prescribes that subjects should be asked to score three factors separately, i.e., picture quality, perceived depth, and visual comfort. Doing so, however, triples the length of an experiment. This is a minor issue for scientific research, and hence the recommendation is very relevant in that respect. From a practical application point of view, however, tripling the length of an experiment for measuring improvements in, e.g., video enhancement algorithms impedes the speed of innovation, and so is not always practically implementable. Therefore, we developed an alternative solution, consisting of a scoring scale with guidelines along the scale [8]. This scoring scale is shown in Table 1. The guidelines are meant to emphasize the importance of depth to the subjects. In the experiment described below, the subjects were asked to score the overall quality of the TVs on a scoring sheet, including the guidelines and two vertical scoring scales placed side-by-side for a comparison of two TVs.

Table 1 The guidelines used in this research

Score  Description
5      Excellent overall quality; all desirable features are at their best, including naturalness, compelling depth, color/sharpness/contrast; no artifacts, no discomfort
4      Good overall quality and desirable features, including depth which is present to the right extent
3      Acceptable overall quality; depth is present but not compelling, artifacts if any have small or no impact
2      Fair overall quality; noticeable depth issues including unnatural/incorrect/artifacts, artifacts are noticeable and annoying
1      Poor or bad overall quality; very poor for most desirable features, serious depth issues or other artifacts very annoying, cause serious fatigue

3. EXPERIMENT

The experiment used a within-subjects design, with viewing mode, video processing (applied or not) and video content as independent variables. The 3D video content was in full HD (i.e., 1920 x 1080 pixels), side-by-side (left and right eye) format. Figure 1 shows screenshots of the four different sources used. They were selected for having different characteristics. Figure 1(a) shows an outdoor scene with good contrast and color, while Figure 1(b) depicts an indoor scene with low contrast. Both scenes have relatively little depth compared to the scenes given in Figure 1(c) and (d).

Figure 1 The sources used in the experiment: (a) Balloon, (b) Mall, (c) PedXing and (d) Suspension.

The scenes were shown to the participants on two calibrated 46 inch Sony® Bravia TVs of the same model, placed side-by-side at a distance of 2 meters from the subjects. One TV displayed the original source video (unprocessed, hereafter referred to as reference), while the other TV displayed the enhanced version of the source video (processed, hereafter referred to as stimulus). The enhancement of the four stimuli was done in real-time by the TVs, adjusting the color, color temperature, gamma and sharpness settings. Due to the proprietary nature of Sony®'s algorithms, no further details on the enhancement can be given.

The reference and stimulus were shown either both in 2D or both in 3D. In the case of 2D, two different modes were used. The first viewing mode was normal 2D viewing, i.e., with the TV in 2D mode and without glasses (hereafter referred to as 2D-2D); the input consisted of the left frames of the 3D content, up-scaled offline. The second mode consisted of viewing 2D content on a 3D-TV, i.e., with the TV in 3D mode and using glasses (hereafter referred to as 2D-3D); the input consisted of a duplicated left frame in a side-by-side format. The 3D content was always viewed in 3D mode and using glasses (hereafter referred to as 3D-3D).

Six subjects participated in the experiment. Since these subjects had to score critical differences consistently, they were selected from a larger pool of subjects based on their ability to distinguish similarly processed from differently processed scenes. All six subjects had normal or corrected-to-normal vision, good stereo vision and no color blindness. The subjects were asked to score both TVs on overall quality using the scoring sheet described above.

The experiment consisted of two sessions. In the first session, the participants scored reference vs. stimulus for the four scenes in 2D-2D mode and in 3D-3D mode. In the second session, the participants scored reference vs. stimulus for the four scenes, but now comparing the 2D-3D mode with the 3D-3D mode. Hence, the subjects had to score in total 8 stimulus pairs per session. Both sessions took place on separate days.
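For illustration, the sketch below shows how such viewing-mode inputs can be derived from one full-HD side-by-side frame. It is a minimal example in Python using OpenCV; the function name, the bilinear up-scaling and the exact packing are our assumptions, not a description of the tool chain actually used to prepare the stimuli.

```python
import cv2
import numpy as np

def prepare_viewing_modes(sbs_frame: np.ndarray) -> dict:
    """Derive the three viewing-mode inputs from one full-HD (1920 x 1080)
    side-by-side 3D frame whose left and right views occupy the two halves.

    Returns:
      '2D-2D' - left view up-scaled to full width (TV in 2D mode, no glasses)
      '2D-3D' - left view duplicated into both halves (TV in 3D mode, glasses)
      '3D-3D' - the original side-by-side frame (TV in 3D mode, glasses)
    """
    h, w = sbs_frame.shape[:2]
    left = sbs_frame[:, :w // 2]        # anamorphic left view (960 x 1080)

    # 2D-2D input: stretch the left view back to the full frame width.
    # Bilinear interpolation is an assumption; the paper only states that
    # the up-scaling was done offline.
    mono = cv2.resize(left, (w, h), interpolation=cv2.INTER_LINEAR)

    # 2D-3D input: pack the same left view into both halves, so the TV's
    # 3D mode presents identical images to both eyes (zero disparity).
    duplicated = np.concatenate([left, left], axis=1)

    return {"2D-2D": mono, "2D-3D": duplicated, "3D-3D": sbs_frame}
```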

4. RESULTS

Figure 2 shows the mean scores and their 95% confidence intervals for the original and enhanced versions of each of the four scenes, separately for the comparison of the 2D-2D mode with the 3D-3D mode. Video processing very clearly improves the quality of the videos for all scenes, both in the 2D-2D mode and in the 3D-3D mode. On average, the quality of the 3D scenes is scored lower than that of the 2D scenes. This is especially true for the scenes Mall and Balloon, the two scenes with limited depth cues. This observation provides a first indication of the impact of scene content on the perceived quality of 3D video.
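The means and confidence intervals in Figures 2 and 3 can be reproduced per condition in a few lines; the sketch below is a generic example using a t-distribution interval over the six subjects' ratings, with hypothetical scores, and is not the authors' analysis script.

```python
import numpy as np
from scipy import stats

def mean_and_ci(scores, confidence=0.95):
    """Mean opinion score and two-sided confidence interval for one
    condition (e.g. 'Balloon, 3D-3D, enhanced') over all subjects.

    A t-distribution is used because of the small number of subjects.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    mean = scores.mean()
    sem = stats.sem(scores)                     # standard error of the mean
    half_width = sem * stats.t.ppf(0.5 + confidence / 2.0, df=n - 1)
    return mean, (mean - half_width, mean + half_width)

# Hypothetical ratings from the six subjects for one condition:
print(mean_and_ci([4, 4, 3, 4, 5, 4]))          # approx. (4.0, (3.34, 4.66))
```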


Figure 2 The mean scores of the four source videos (Balloon, Mall, PedXing, Suspension) with their respective 95% confidence intervals for the comparison between the 2D-2D mode and the 3D-3D mode (vertical axis: score, 1-5).

Figure 3 compares the results of both parts of the experiment. It gives the mean scores and 95% confidence intervals for the original and enhanced versions of the four scenes separately, once comparing the results of the 2D-2D mode with the 3D-3D mode and once comparing the results of the 2D-3D mode with the 3D-3D mode. Note that the data points for the 3D-3D mode are repetitions measured by the same participants on the same scenes, but on two different days. The small deviation in mean score between these repetitions clearly shows that this particular selection of participants was well able to score the quality of scenes consistently, and that the effects of video processing, viewing mode and video content are larger than what could be expected from measurement error. Figure 3 also demonstrates a clear reduction in quality score when the 2D content is shown in 3D mode (as compared to the 2D-2D viewing mode). As a consequence, the mean scores of the 3D content are now higher than those of their 2D counterparts. The latter is especially true for the two video scenes with larger depth cues, i.e., PedXing (Figure 3(c)) and Suspension (Figure 3(d)), confirming the importance of understanding differences between scenes.

Figure 3 Mean scores and 95% confidence intervals for the four source videos: (a) Balloon, (b) Mall, (c) PedXing, and (d) Suspension. Each panel shows the conditions 2D Original, 2D Enhanced, 3D Original and 3D Enhanced (vertical axis: score, 1-5). Session I compared the 2D-2D and 3D-3D modes and Session II compared the 2D-3D and 3D-3D modes.
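A formal check of these effects lends itself to a repeated-measures analysis. The sketch below shows one way the content, viewing-mode and processing effects could be tested; it assumes the ratings are available in a long-format table (one row per subject, scene, mode and processing condition), the file name is illustrative, and it is not the statistical analysis reported in this paper.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format table with columns: subject, scene, mode, processing, score.
# AnovaRM requires a fully balanced design (every subject rated every
# scene x mode x processing cell).
df = pd.read_csv("scores_long.csv")

result = AnovaRM(
    data=df,
    depvar="score",
    subject="subject",
    within=["scene", "mode", "processing"],
).fit()

print(result.anova_table)  # F statistics and p-values for main effects and interactions
```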

5. DISCUSSION

Two different 2D modes were used in the experiment. The main reason for including both was to evaluate the extent to which wearing active glasses affects the quality scores. The comparison of the 2D-2D mode with the 2D-3D mode clearly demonstrates that displaying 2D content on a 3D-TV in 3D mode considerably lowers the quality scores. This is most probably attributable to a loss in brightness as a consequence of wearing glasses and a loss in spatial resolution as a consequence of using a frame-compatible 3D video format (in our case the side-by-side format). We therefore consider the comparison of 3D content with 2D content on a 3D-TV in 3D mode fairer, as the brightness loss due to the glasses and the resolution loss due to the video format are in that case the same. Nonetheless, the lower quality of the 3D content with respect to the 2D content in the 2D mode (so on a 2D-TV) remains an issue to be solved. Auto-stereoscopic displays may solve the brightness loss, and with the introduction of 4K auto-stereoscopic TVs on the market the resolution loss is most probably no longer an issue either.

The results of both parts of the experiment also clearly show content dependency. The behavior of the four different scenes suggests that the amount of monocular depth matters. We believe that by analyzing the depth cues in a scene more extensively and accurately, we can optimize the enhancement to be applied to the source videos. In particular, when no or few depth cues are found, it may actually save bandwidth to display the content in 2D mode, as introducing stereoscopic depth then has no additional value. However, more research is needed to further substantiate these conclusions.
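As a rough illustration of such content partitioning, the sketch below estimates how much stereoscopic depth a side-by-side frame contains from the spread of its disparity map and flags scenes that could instead be shown in 2D mode. It uses only a binocular proxy (whereas monocular cues also matter, as argued above); the block-matching parameters and the threshold are illustrative assumptions and not part of this work.

```python
import cv2
import numpy as np

def has_appreciable_depth(sbs_frame: np.ndarray, flat_threshold: float = 2.0) -> bool:
    """Return True when a side-by-side 3D frame shows appreciable stereoscopic
    depth, judged by the spread of its block-matching disparity map.

    The threshold (in disparity pixels) is an arbitrary example value; in
    practice it would be tuned on representative content.
    """
    h, w = sbs_frame.shape[:2]
    left = cv2.cvtColor(sbs_frame[:, :w // 2], cv2.COLOR_BGR2GRAY)
    right = cv2.cvtColor(sbs_frame[:, w // 2:], cv2.COLOR_BGR2GRAY)

    # Block-matching stereo; numDisparities must be a multiple of 16.
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> pixels

    valid = disparity[disparity > 0]   # ignore unmatched pixels
    if valid.size == 0:
        return False
    spread = np.percentile(valid, 90) - np.percentile(valid, 10)
    return spread > flat_threshold

# Frames classified as 'flat' could then be transmitted and shown in 2D mode.
```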

REFERENCES

[1] ITU-R BT.1438, "Subjective Assessment of Stereoscopic Television Pictures," International Telecommunication Union, March 2000.

[2] R.G. Kaptein, A. Kuijsters, M.T.M. Lambooij, W.A. IJsselsteijn, I. Heynderickx, "Performance Evaluation of 3D-TV Systems," in Proceedings of SPIE-IS&T Electronic Imaging, SPIE vol. 6808, 680819, 2008.


[3] P.J. Seuntiëns, I.E. Heynderickx, W.A. IJsselsteijn, P.M.J. van den Avoort, J. Berentsen, I.J. Dalm, M.T. Lambooij, W. Oosting, "Viewing Experience and Naturalness of 3D Images," in Proceedings of SPIE Three-Dimensional TV, Video, and Display IV, SPIE vol. 6016, 601605, 2005.

[4] M. Lambooij, W. IJsselsteijn, D.G. Bouwhuis, I. Heynderickx, "Evaluation of Stereoscopic Images: Beyond 2D Quality," IEEE Transactions on Broadcasting, vol. 57, no. 2, pp. 432-444, June 2011.

[5] ITU-R BT.2021, "Subjective Methods for the Assessment of Stereoscopic 3DTV Systems," International Telecommunication Union, August 2012.

[6] L. Goldmann, T. Ebrahimi, P. Lebreton, A. Raake, "Towards a Descriptive Depth Index for 3D Content: Measuring Perspective Depth Cues," in Proceedings of VPQM 2012, Scottsdale, AZ, USA, 2012.

[7] M. Lambooij, W.A. IJsselsteijn, I. Heynderickx, "Visual Discomfort of 3D TV: Assessment Methods and Modeling," Displays, vol. 32, no. 4, pp. 209-218, October 2011.

[8] S.L. Wu, J. Caviedes, L. Karam, I. Heynderickx, "Development of a Practical Methodology to Evaluate the 3D Visual Experience," ACM Transactions on Applied Perception, submitted for publication.
