Depth Adjustment for Depth-Image-Based Rendering in 3D TV System

Journal of Information & Computational Science 8: 16 (2011) 4233–4240 Available at http://www.joics.com

Ran LIU a,b,c, Hui XIE a,∗, Guoqin TAI a, Yingchun TAN a, Ruili GUO a, Wenyi LUO a, Xiaoyan XU b, Junling LIU a

a College of Communication Engineering, Chongqing University, Chongqing 400044, China

b College of Computer Science, Chongqing University, Chongqing 400044, China

c Panovasic Technology Co., Ltd, Changhong Group, Chengdu 610031, China

Abstract This paper focuses on how to adjust the perceived depth for depth-image-based rendering (DIBR) in a 3D TV system. The 3D image warping equations for DIBR are introduced first. Our system uses the shift-sensor camera setup, for which the 3D image warping equations simplify to a formula that expresses the horizontal sensor parallax. Then, the geometry of the 3D TV display system is analyzed, and the perceived depth is expressed as a function of the horizontal screen parallax. Finally, the relationship between horizontal sensor parallax and horizontal screen parallax is given. Based on this relationship, we propose a depth adjustment method that controls the amount of horizontal sensor parallax in stereoscopic images by changing the values of camera-space variables during DIBR processing. Subjective evaluations show that the proposed method can generate comfortable stereoscopic images with the different perceived depths that viewers desire.

Keywords: 3D TV; Depth Image Based Rendering; Depth Adjustment; 3D Image Warping; Visual Fatigue

1 Introduction

With the success of the 3D movie "Avatar", 3D TV has become increasingly popular, and there is strong demand for high-quality 3D content. Currently, most popular 3D content ("Avatar" included) is acquired and transmitted in the conventional 3D video format, which consists of two video streams [1, 2]. However, even when content in the conventional format is filmed with high quality (e.g. high definition and precise control of the shooting conditions), it does not suit all viewers. This is because the naturalness and comfort of 3D content can be degraded when the viewing circumstances or the target display change, and, furthermore, depth appreciation differs among groups of people [3]. It is therefore important to give viewers the ability to adjust the reproduction of depth to suit their own preferences [3]. Fortunately, the depth-image-based 3D video format, which consists of a regular 2D color video and an accompanying depth-image sequence with the same spatio-temporal resolution, makes it possible to adjust the perceived depth for different viewers [1, 3, 4, 5]. The key to this advantage is that this kind of 3D video uses so-called depth-image-based rendering (DIBR) techniques to generate the stereoscopic images, instead of delivering the left- and right-eye videos directly as two channels [1, 3, 4].

A well-known problem with 3D video is visual fatigue, and several studies have addressed it [3, 6, 7, 8]. Most of them reduce visual fatigue by changing the shooting conditions and viewing circumstances, which is not applicable to the newly developed depth-image-based 3D video format. Unlike these traditional methods, we reduce visual fatigue through depth adjustment, using the DIBR techniques of the new format. Two main processing steps are closely related to this topic: (1) stereo pair creation, which is performed in camera space; and (2) stereo image display, which takes place in viewer space.

The remainder of this paper is organized as follows. In Section 2, we illustrate how stereo pairs are created using DIBR techniques, and give the simplified 3D image warping equations that express the horizontal sensor parallax. Section 3 investigates the geometry of the 3D TV display system on which the generated images are displayed, and discusses the nonlinear relationship between horizontal screen parallax and perceived depth in detail. By connecting horizontal sensor parallax with horizontal screen parallax, Section 4 provides a depth adjustment method that works solely by changing the values of camera-space variables during DIBR processing. Section 5 discusses the experimental results, and Section 6 concludes the paper.

⋆ Project supported by the Fundamental Research Funds for the Central Universities (No. CDJZR10180013).
∗ Corresponding author. Email address: [email protected] (Hui XIE).

1548–7741/ Copyright © 2011 Binary Information Press December 2011

2 Stereo Pair Creation Using DIBR

As mentioned above, if a 3D TV system adopts the depth-image-based 3D video format, DIBR must be performed at the receiver side to create stereo pairs. This section mainly discusses 3D image warping, which is the core step of DIBR [4, 9]. The procedure of 3D image warping is illustrated in Fig. 1 (for simplicity, we only consider the commonly used shift-sensor camera setup). Assume that $I_l$ is the reference image, whose width is $W_i$ (in pixels), with point $\mathbf{u}_l = [u_l\ v_l\ 1]^T \in I_l$, and that $I_r$ is the destination image of the same size to be synthesized, with point $\mathbf{u}_r = [u_r\ v_r\ 1]^T \in I_r$. Without loss of generality, we assume that the world coordinate system $x_w y_w z_w$ coincides with the camera coordinate system of the first (left) camera. In order to generate both negative and positive parallaxes, a sensor shift is introduced into the camera setup: images $I_l$ and $I_r$ are each shifted horizontally by $h$, in opposite directions, relative to their original positions ($h$ is measured in pixels, and $h > 0$). The equations of 3D image warping can then be written as

$$
\begin{cases}
D_c = u_r - u_l = 2h - \dfrac{f \times s_x \times B}{z_w}, \\[4pt]
v_l = v_r,
\end{cases}
\qquad (1)
$$

where $f$ is the focal length; $s_x$ is the number of pixels per unit length (for example, per millimeter) along the x-axis; $z_w$ is the depth value of point $\mathbf{u}_l$; $D_c$ is usually called the horizontal sensor parallax; and $B$ is the length of the baseline.

Fig. 1: Illustration of shift-sensor camera setup for 3D image warping (a scene point $\mathbf{U} = [x_w\ y_w\ z_w\ 1]^T$ is imaged at $\mathbf{u}_l$ in $I_l$ by camera $C_l$ and at $\mathbf{u}_r$ in $I_r$ by camera $C_r$; both images have width $W_i$ and are shifted by $h$; focal length $f$; baseline $B$)

Note that $B$ is usually set to 65 mm for capturing stereo videos, which equals the average human interpupillary distance. If $I_l$ is taken as the left view, the right view $I_r$ can easily be generated with Eq. (1). If $D_c = 0$, then $z_w = f \times s_x \times B/(2h)$, which is exactly the depth of the ZPS (zero-parallax setting) plane. As $f$, $s_x$ and $B$ are usually constant for a shift-sensor camera setup, the position of the ZPS plane is mainly determined by the choice of $h$.
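The simplified warping of Eq. (1) translates directly into code. The following Python sketch is our own illustration, not the authors' implementation: the function names are hypothetical, and the example values ($f \cdot s_x = 1908.25$ from Section 5, an assumed baseline $B = 0.65$ in the units of the depth data, and an assumed $h = 25$) serve only for demonstration. It also checks that a point lying on the ZPS plane receives zero parallax.

```python
import math


def sensor_parallax(z_w: float, h: float, f_sx: float, baseline: float) -> float:
    """Eq. (1): horizontal sensor parallax D_c = u_r - u_l = 2h - f*s_x*B / z_w, in pixels."""
    if z_w <= 0:
        raise ValueError("depth z_w must be positive")
    return 2.0 * h - f_sx * baseline / z_w


def warp_pixel(u_l: float, v_l: float, z_w: float, h: float,
               f_sx: float, baseline: float) -> tuple[float, float]:
    """Map a left-view pixel (u_l, v_l) with depth z_w to the right view.

    Only the column changes (u_r = u_l + D_c); the row is preserved (v_r = v_l).
    """
    return u_l + sensor_parallax(z_w, h, f_sx, baseline), v_l


def zps_depth(h: float, f_sx: float, baseline: float) -> float:
    """Depth of the ZPS plane, z_w = f*s_x*B / (2h); no finite plane exists when h = 0."""
    if h <= 0:
        return math.inf
    return f_sx * baseline / (2.0 * h)


if __name__ == "__main__":
    F_SX, B, H = 1908.25, 0.65, 25.0   # f*s_x from Section 5; B and h are assumed
    z0 = zps_depth(H, F_SX, B)
    print("ZPS depth:", z0)
    print("parallax on the ZPS plane:", sensor_parallax(z0, H, F_SX, B))
    print("pixel (100, 50) at depth 2*z0 maps to", warp_pixel(100.0, 50.0, 2 * z0, H, F_SX, B))
```

Points deeper than the ZPS plane get $D_c > 0$ and points closer than it get $D_c < 0$, which is why $h$ alone decides where the zero-parallax boundary falls in the scene.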

3 Geometry of 3D TV Display System

Fig. 2: Geometry of 3D TV display system (eyes at $\mathbf{e}_l = [-t_x/2\ 0\ 0]^T$ and $\mathbf{e}_r = [t_x/2\ 0\ 0]^T$, separated by $t_x$; screen of width $W_s$ and height $H_s$ at viewing distance $V$; virtual point $\mathbf{W} = [x_i\ y_i\ z_i]^T$ and its screen projections $\mathbf{x}_l$ and $\mathbf{x}_r$, separated by the screen parallax)

The geometry of the 3D TV display system is illustrated in Fig. 2, in which the world coordinate system $xyz$ is used for all objects in viewer space, and the unit of length is the millimeter (mm). Without loss of generality, we assume that the z-axis passes through the screen center, that the viewer's left eye is located at $\mathbf{e}_l = [-t_x/2\ 0\ 0]^T$, and that the right eye is at $\mathbf{e}_r = [t_x/2\ 0\ 0]^T$ ($t_x > 0$). Under this viewing condition, the origin $o$ is located at the position of the cyclopean eye. With reference to Fig. 2, the following variables are used in deriving the geometry of the 3D TV display system: $W_s$, the screen width; $H_s$, the screen height; $V$, the viewing distance, i.e. the distance from the viewer's eyes to the screen of the 3D TV; and $t_x$, the human interpupillary distance. Let $\mathbf{W} = [x_i\ y_i\ z_i]^T$ denote a "virtual" point in viewer space, i.e. a point that the viewer perceives only stereoscopically when its stereo pair is displayed on the screen. Note that, under the viewing condition shown in Fig. 2, the coordinate $z_i$ always satisfies $z_i > 0$, because the viewer cannot see objects behind him. Observing $\mathbf{W}$ through $\mathbf{e}_l$ and $\mathbf{e}_r$ yields the stereo pair $\mathbf{x}_l = [x_l\ y_l\ z_l]^T$ and $\mathbf{x}_r = [x_r\ y_r\ z_r]^T$, which lie on the screen. From the geometry shown in Fig. 2 we have

$$
d = \frac{D_s \times V}{t_x - D_s}, \qquad (2)
$$

where $d = z_i - V$ and $D_s = x_r - x_l$. The quantity $d$ directly reflects the depth perception that the parallax produces in the viewer's eyes; we call $d$ the perceived depth, and $D_s$ is usually called the horizontal screen parallax, in contrast to the horizontal sensor parallax. From Eq. (2), since $V$ and $t_x$ are constants for a given display configuration, $d$ varies only with $D_s$. The curve of Eq. (2), drawn with the parameters listed in Table 1, is shown in Fig. 3. The values of $W_s$ and $H_s$ are chosen to equal exactly the size of a 59-inch screen, and the viewing distance $V$ is set to $3H_s$, as recommended by [7].

Fig. 3: Plot of the horizontal screen parallax $D_s$ versus the perceived depth $d$ (horizontal axis: $D_s$ from -1400 to 200; vertical axis: $d$ from -4000 to 12000)

Fig. 3 illustrates the nonlinear relationship between $D_s$ and $d$. The curve flattens as $D_s$ tends to $-W_s$ and rises steeply as $D_s$ tends to $t_x$, which may lead to wrongly perceived depth. Only when $D_s$ is near 0 does the curve change approximately linearly. Hence the objects of interest in viewer space should be located near the surface of the screen.

Table 1: Parameter setting for creating the curve shown in Fig. 3

| Parameter | Value |
|---|---|
| Screen width $W_s$ | 1306 mm |
| Screen height $H_s$ | 735 mm |
| Viewing distance $V$ | $3H_s \approx 2200$ mm |
| Human interpupillary distance $t_x$ | 65 mm |
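To make the shape of the curve in Fig. 3 concrete, the sketch below evaluates Eq. (2) with the parameters of Table 1, together with its derivative $\partial d / \partial D_s = V t_x / (t_x - D_s)^2$. The helper names are hypothetical, and the derivative is our own addition, used only to quantify how flat or steep the curve is at a given parallax.

```python
# Display parameters from Table 1, in millimeters.
W_S = 1306.0   # screen width W_s
H_S = 735.0    # screen height H_s (kept for reference; V is about 3 * H_S)
V = 2200.0     # viewing distance
T_X = 65.0     # interpupillary distance t_x


def perceived_depth(d_s: float, v: float = V, t_x: float = T_X) -> float:
    """Eq. (2): d = D_s * V / (t_x - D_s), measured from the screen plane.

    d < 0 means in front of the screen, d = 0 on it, d > 0 behind it.
    """
    if d_s >= t_x:
        # At D_s = t_x the two eye rays are parallel; beyond it they diverge.
        raise ValueError("screen parallax D_s must be smaller than t_x")
    return d_s * v / (t_x - d_s)


def depth_slope(d_s: float, v: float = V, t_x: float = T_X) -> float:
    """Derivative of Eq. (2): dd/dD_s = V * t_x / (t_x - D_s)**2."""
    return v * t_x / (t_x - d_s) ** 2


if __name__ == "__main__":
    for d_s in (-W_S, -500.0, -65.0, 0.0, 30.0, 60.0):
        print(f"D_s = {d_s:8.1f} mm  d = {perceived_depth(d_s):9.1f} mm  "
              f"slope = {depth_slope(d_s):8.2f}")
```

With these values the slope is about 0.08 at $D_s = -W_s$ and about 33.8 at $D_s = 0$, and it grows without bound as $D_s \to t_x$; as $D_s \to -\infty$, $d$ approaches $-V$. This is the flattening and steep rise described above.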

4 Depth Adjustment

As described in the previous sections, the depth $d$ perceived by the viewer is determined by the horizontal screen parallax $D_s$, and the value of $D_s$ is in turn determined by the horizontal sensor parallax $D_c$. The relation between $D_s$ and $D_c$ is illustrated in Fig. 4. Consequently, depth adjustment can be realized by controlling the amount of horizontal sensor parallax in the stereoscopic images. Note that $D_c$ and $D_s$ belong to different spaces: $D_c$ is measured in pixels, while $D_s$ is expressed in millimeters. The same holds for $W_i$ (pixels) and $W_s$ (millimeters). In Fig. 4, the edges of the right view are drawn with dashed lines, which indicates that it is a virtual view generated by DIBR.

Fig. 4: Relation between $D_c$ and $D_s$ (camera space: the left view and its depth image, of width $W_i$, are processed by DIBR into the right view, a point $u_l$ being mapped to $u_r$ with sensor parallax $D_c$; the generated stereo pair is linearly scaled to the display, labeled 1920 × 1080; viewer space: a screen of width $W_s$ and height $H_s$, on which the pair shows screen parallax $D_s$)

From Fig. 4, the conversion of $D_s$ to $D_c$ is given by

$$
D_c = \frac{D_s \times W_i}{W_s}. \qquad (3)
$$
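As a worked example of Eq. (3), we combine, for illustration only, the image width $W_i = 1024$ used in Section 5 with the screen width $W_s = 1306$ mm of Table 1 (a pairing we assume, not one reported by the authors). One pixel of sensor parallax then corresponds to about 1.28 mm on the screen, and the screen parallax $D_s = t_x$ at which Eq. (2) diverges corresponds to roughly 51 pixels:

$$
\frac{W_s}{W_i} = \frac{1306}{1024} \approx 1.275\ \text{mm/pixel}, \qquad
D_s = t_x = 65\ \text{mm} \;\Longrightarrow\; D_c = \frac{65 \times 1024}{1306} \approx 50.96\ \text{pixels}.
$$

Under these assumptions, positive sensor parallax would have to stay well below about 51 pixels to avoid the steep region of Fig. 3.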

Eq. (3) does not depend on the resolution of the screen, because the stereo pair is linearly scaled to the full screen width; hence it applies to screens of different resolutions. Combining Eqs. (1), (2) and (3), i.e. substituting $D_s = D_c W_s / W_i$ from Eq. (3) into Eq. (2) and then $D_c$ from Eq. (1), leads to

$$
d = \frac{D_c W_s V}{t_x W_i - D_c W_s}
  = \frac{(2h z_w - f s_x B)\, W_s V}{t_x z_w W_i - (2h z_w - f s_x B)\, W_s},
\qquad \bigl(t_x z_w W_i \neq (2h z_w - f s_x B)\, W_s\bigr), \qquad (4)
$$

where $z_w$ is the depth value of a point in the left view. Based on Eq. (4), the perceived depth $d$ in viewer space can be calculated from the variables in camera space. For specific display and viewing conditions, viewer-space variables such as $W_s$, $V$ and $t_x$ cannot be adjusted. Thus, depth adjustment can only be realized by changing the values of camera-space variables (e.g. $B$, $f$ or $h$) during DIBR processing. This is why depth-image-based 3D video can more easily provide a safe and comfortable 3D experience than traditional 3D video.
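Eq. (4) can be checked numerically against the step-by-step chain of Eq. (1), Eq. (3) and Eq. (2). The sketch below is a hypothetical illustration (the function names and parameter values are our own choices, not the authors'): it computes $d$ both ways and shows that, for a fixed depth $z_w$, increasing $h$ increases $d$, i.e. moves the perceived point from in front of the screen toward and behind it.

```python
def perceived_depth_from_camera(z_w, h, f_sx, baseline, w_i, w_s, v, t_x):
    """Eq. (4): perceived depth d (mm) of a left-view point with depth z_w.

    d = (2h z_w - f s_x B) W_s V / (t_x z_w W_i - (2h z_w - f s_x B) W_s)
    """
    zc = 2.0 * h * z_w - f_sx * baseline       # equals z_w * D_c
    denom = t_x * z_w * w_i - zc * w_s
    if denom == 0:
        raise ValueError("excluded case t_x z_w W_i = (2h z_w - f s_x B) W_s")
    return zc * w_s * v / denom


def perceived_depth_composed(z_w, h, f_sx, baseline, w_i, w_s, v, t_x):
    """The same quantity computed step by step: Eq. (1), then Eq. (3), then Eq. (2)."""
    d_c = 2.0 * h - f_sx * baseline / z_w      # Eq. (1): sensor parallax, pixels
    d_s = d_c * w_s / w_i                      # Eq. (3) solved for D_s: screen parallax, mm
    return d_s * v / (t_x - d_s)               # Eq. (2): perceived depth, mm


if __name__ == "__main__":
    # Hypothetical setting: Table 1 display, W_i = 1024, f*s_x = 1908.25, assumed B = 0.65.
    params = dict(f_sx=1908.25, baseline=0.65, w_i=1024, w_s=1306.0, v=2200.0, t_x=65.0)
    for h in (0.0, 10.0, 20.0):
        closed = perceived_depth_from_camera(50.0, h, **params)
        chained = perceived_depth_composed(50.0, h, **params)
        print(f"h = {h:4.1f}: d = {closed:8.1f} mm (chained: {chained:8.1f} mm)")
```

In this hypothetical setting the point at $z_w = 50$ (an assumed depth; the paper does not report the depth range of its test images) is perceived in front of the screen at $h = 0$ and behind it at $h = 20$.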

5 Experiment

In this paper, the "Ballet" sequence and its corresponding calibration parameters are used for the depth adjustment test. Due to limited space, only frame 0 (labeled $I_l$) captured by camera 4, along with its depth image (labeled $D$), is shown here for illustration, and frame 0 is taken as the left view. Both $I_l$ and $D$ have a resolution of 1024 × 768 ($W_i = 1024$). The intrinsic parameters of camera 4 satisfy $f \cdot s_x = 1908.25$; in the following experiment, $f = 0.08$ and $s_x = 23853.125$. The purpose of this experiment is to subjectively evaluate the perceived depth of stereo pairs generated by 3D image warping with different parameters. To simplify the adjustment, only parameter $h$ is varied, while the other parameters are kept unchanged during the experiment; the same procedure can be repeated for the other parameters. The experiment was conducted with dozens of viewers, who sat approximately 2.2 m from the screen. The values of the unchanged parameters can be found in Table 1. In our experiment, the value of $h$ started from 0 and was gradually increased to 256 ($W_i/4$) in steps of 25. Correspondingly, viewers should see objects that appear in front of the screen "move" into the screen. The viewers were asked to pay particular attention to the woman and the man in the "Ballet" sequence, whose depths are easier to perceive. Fig. 5 shows the right views synthesized by Eq. (1) with different values of $h$; because of limited space, only the images for $h$ = 0, 100, 200 and 256 are shown. In order to show the influence of different values of $h$ on holes, neither depth-image pre-processing nor hole filling is performed here. As can be seen from Fig. 5, the larger $h$ is, the wider the black margin that appears on the left side of the image, which indicates that a small $h$ is preferable for generating high-quality images. The "margin" effect is caused by the image shift used for setting the ZPS plane, and it appears to be unavoidable for the shift-sensor camera setup.
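The $h$ schedule of the experiment and its effect on the ZPS plane can be sketched as follows. We assume the schedule is 0, 25, 50, up to 250, followed by the final value 256, since the text gives a step of 25 and an end point of $W_i/4$; we also assume $B = 0.65$ (in the units of the Ballet calibration data), the value that Fig. 6 suggests was used for Fig. 5. The names are hypothetical. With $h = 0$, Eq. (1) gives $D_c = -f s_x B / z_w < 0$ for every point, so the whole scene appears in front of the screen; as $h$ grows, the ZPS depth $f s_x B/(2h)$ decreases and every point deeper than it acquires positive parallax, which is consistent with objects "moving into" the screen.

```python
import math

F_SX = 1908.25    # f * s_x of camera 4, as given in the text
BASELINE = 0.65   # assumed baseline, in the units of the Ballet calibration data
W_I = 1024        # image width in pixels


def h_schedule(w_i: int = W_I, step: int = 25) -> list[int]:
    """Assumed schedule: 0, step, 2*step, ... below W_i/4, then W_i/4 itself."""
    h_max = w_i // 4
    return list(range(0, h_max, step)) + [h_max]


def zps_depth(h: float, f_sx: float = F_SX, baseline: float = BASELINE) -> float:
    """ZPS depth z_w = f*s_x*B/(2h); with h = 0 every point has negative parallax."""
    return math.inf if h == 0 else f_sx * baseline / (2.0 * h)


if __name__ == "__main__":
    for h in h_schedule():
        # Points deeper than this plane get D_c > 0 and are perceived behind the screen.
        print(f"h = {h:3d}: ZPS plane at z_w = {zps_depth(h):.3f}")
```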
Another interesting result that can be seen in Fig. 5 is that the holes within the image content (e.g. the holes beside the woman) do not change much when $h$ takes different values. This is because this kind of hole is mainly determined by the parameters $B$ and $f$. Comparing Fig. 6 with Fig. 5(a), one can see that the holes beside the woman are reduced by decreasing $B$ or $f$.

Fig. 5: Right views generated with different values of $h$: (a) $h = 0$; (b) $h = 100$; (c) $h = 200$; (d) $h = 256$

Fig. 6: Right views generated with different values of $B$ or $f$, in contrast to Fig. 5(a): (a) $h = 0$, $B = 0.33$, $f = 0.08$; (b) $h = 0$, $B = 0.65$, $f = 0.04$

The viewers' subjective evaluations of the perceived depth produced by the stereo pairs reveal a wide range of responses. Some viewers could perceive only a small depth range, whereas others could perceive a large one. Some people could see into the screen more easily than out of it, and others could see out of the screen more easily than into it, which is consistent with the description in [7]. Some people are more sensitive to depth changes than others. For most viewers, values of $h$ between 0 and 25 provide a more comfortable 3D experience than other values. This indicates that, for a stereoscopic image on a screen to be viewed comfortably by as many people as possible, the depth range should be minimized. In other words, the primary area of interest should be located near the surface of the screen through an appropriate choice of $h$, and viewers can perceive a more comfortable 3D scene through depth adjustment in a DIBR system.
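The observation that the holes beside the woman hardly change with $h$ but shrink when $B$ or $f$ is reduced can be explained with Eq. (1) under a simplified model, which is our assumption rather than the authors' analysis: the gap opened at a depth edge equals the parallax difference between background and foreground, $D_c(z_{bg}) - D_c(z_{fg}) = f s_x B (1/z_{fg} - 1/z_{bg})$, in which the $2h$ terms cancel. The function names and example depths below are hypothetical.

```python
def sensor_parallax(z_w: float, h: float, f_sx: float, baseline: float) -> float:
    """Eq. (1): D_c = 2h - f*s_x*B / z_w, in pixels."""
    return 2.0 * h - f_sx * baseline / z_w


def disocclusion_width(z_fg: float, z_bg: float, h: float,
                       f_sx: float, baseline: float) -> float:
    """Simplified hole width (pixels) at an edge between a foreground (z_fg) and
    background (z_bg > z_fg) region: D_c(z_bg) - D_c(z_fg) = f*s_x*B*(1/z_fg - 1/z_bg).

    The 2h terms cancel, so the result does not depend on the sensor shift h.
    """
    return sensor_parallax(z_bg, h, f_sx, baseline) - sensor_parallax(z_fg, h, f_sx, baseline)


if __name__ == "__main__":
    # Hypothetical depths; f = 0.08 and s_x = 23853.125 as in the text, B assumed 0.65.
    S_X = 23853.125
    for f, b, h in ((0.08, 0.65, 0.0), (0.08, 0.65, 200.0), (0.08, 0.33, 0.0), (0.04, 0.65, 0.0)):
        width = disocclusion_width(40.0, 100.0, h, f * S_X, b)
        print(f"f = {f}, B = {b}, h = {h}: hole width = {width:.2f} px")
```

In this model, halving $f$ (0.08 to 0.04, as in Fig. 6(b)) halves the gap, and reducing $B$ from its presumed value of 0.65 to 0.33 (Fig. 6(a)) shrinks it in the same ratio, while changing $h$ leaves it unchanged; pixel rounding and hole filling are ignored.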


In the experiment above, the values of $h$ were presented in increasing order. We also conducted the experiment with the values of $h$ presented in random order, and found that viewers perceived the depth changes more easily and could more easily identify the values of $h$ that cause less visual fatigue.

6 Conclusions

This study provides a depth adjustment method for the depth-image-based 3D video format. We express the relations between the perceived depth and the variables in camera space as a mathematical formula. As a result, depth can be adjusted according to this formula by changing the values of camera-space variables during DIBR processing. We also performed experiments to verify the method. The experimental results show that most viewers perceive obvious depth changes when the parameter $h$ (or $B$, $f$) is changed according to the proposed depth adjustment formula. Hence, with this method, viewers can control the reproduction of depth to suit their own preferences. Note that camera calibration parameters, such as the intrinsic and extrinsic matrices, are used for DIBR in our experiment. However, the depth-image-based 3D video formats defined in HDMI 1.4 ("L + depth" and "L + depth + graphics + graphics-depth") do not contain any calibration parameters [10]. How to apply the depth adjustment method to these formats is an important topic for our future research.

References

[1] A. Smolic, K. Mueller, P. Merkle, A. Vetro, Development of a new MPEG standard for advanced 3D video applications, in: Proc. 6th International Symposium on Image and Signal Processing and Analysis, 2009, pp. 410–417.

[2] R. Liu, Q. Zhu, X. Xu, L. Zhi, H. Xie, J. Yang, X. Zhang, Stereo effect of image converted from planar, Information Sciences, 178 (2008) 2079–2090.

[3] D. Kim, K. Sohn, Visual Fatigue Prediction for Stereoscopic Image, IEEE Transactions on Circuits and Systems for Video Technology, 21 (2011) 231–236.

[4] X. Yang, J. Liu, J. Sun, X. Li, W. Liu, Y. Gao, DIBR based view synthesis for free-viewpoint television, in: Proc. 5th 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), 2011, pp. 1–4.

[5] L. Shi, H. Wang, Y. Wang, Depth map noise handling in multi-view stereo matching with two-stage segmentation, Journal of Information and Computational Science, 7 (2010) 989–996.

[6] F. L. Kooi, A. Toet, Visual comfort of binocular and 3D displays, Displays, 25 (2004) 99–108.

[7] ITU, Subjective assessment of stereoscopic television pictures, 2000.

[8] A. Woods, T. Docherty, R. Koch, Image distortions in stereoscopic video systems, Stereoscopic Displays and Applications IV, (1993) 36–48.

[9] P. Lee, Effendi, Nongeometric Distortion Smoothing Approach for Depth Map Preprocessing, IEEE Transactions on Multimedia, 13 (2011) 246–254.

[10] HDMI Licensing LLC, High-Definition Multimedia Interface Specification, Version 1.4, 2010.