STEREOSCOPIC IMAGE GENERATION BASED ON DEPTH IMAGES FOR 3D TV


Liang Zhang, Wa James Tam
Communications Research Centre Canada
3701 Carling Avenue, Ottawa, Ontario, K2H 8S2, Canada
email: [email protected]

ABSTRACT
A depth-image-based rendering system for generating stereoscopic images is proposed. One important aspect of the proposed system is that the depth maps are pre-processed using an asymmetric filter to smooth the sharp changes in depth at object boundaries. In addition to ameliorating the effects of blocky artifacts and other distortions contained in the depth maps, the smoothing reduces or completely removes newly exposed (disocclusion) areas where potential artifacts can arise from the image warping that is needed to generate images from new viewpoints. The asymmetric nature of the filter reduces the amount of geometric distortion that might otherwise be perceived. We present some results to show that the proposed system provides an improvement in the image quality of stereoscopic virtual views while maintaining reasonably good depth quality.

Keywords: Three-dimensional television, stereoscopic image, stereoscopic image generation, depth-image-based rendering, asymmetric filter.

1. INTRODUCTION
Depth-image-based rendering (DIBR) techniques have recently received much attention in the broadcast research community as a promising technology for three-dimensional television (3D TV) systems [1]-[3]. Whereas the classical approach requires the transmission of two streams of video images [4][5], one for each eye, 3D TV systems based on DIBR require a single stream of monoscopic images and a second stream of associated images, usually termed depth images or depth maps, that convey per-pixel depth information. A depth map is essentially a two-dimensional (2D) function that gives the depth, with respect to the camera position, of a point in the visual scene as a function of the image coordinates. Since the depth of every point in an original image is known, a virtual image of any nearby viewpoint can be rendered by projecting the pixels of the original image to their proper 3D locations and reprojecting them onto the virtual image plane. Thus, DIBR permits the creation of novel images, using information from the depth maps, as if they were captured with a camera from different viewpoints. A further advantage of the DIBR approach is that depth maps can be coded more efficiently than two streams of natural images, thereby reducing the bandwidth required for transmission. In this respect, DIBR is suitable not only for 3D TV but also for other 3D applications such as multimedia systems [6].

One disadvantage of the DIBR approach is that, with this type of data representation, one or more “virtual” images of the 3D scene have to be generated at the receiver side in real time. In addition, it is not an easy task to create new virtual images with high image quality. The most significant problem in DIBR is how to deal with newly exposed areas (holes) appearing in the virtual images. Holes are due to the accretion (disocclusion) of portions of objects or background that are visible only from the new viewpoint but not from the original location that was used in capturing the original image. There is no information in the original image for these disoccluded regions and, therefore, they appear empty, like holes, in the new virtual image.

A simple way to “fill” these holes is to map a pixel in the original image to several pixels in the virtual image by simple interpolation of pixel information in the foreground or background. More complex extrapolation techniques might also be used [3]. However, these filling techniques are known to produce visible disocclusion artifacts in the virtual images, whose severity depends on the scene layout. Several approaches have been suggested to deal with these disocclusion artifacts. One approach, termed the layered-depth-image (LDI) [7], uses a set of original images of a scene and their associated depth maps. The images and depth maps store not only what is visible in the original image, but also what is behind the visible surface. Note that while this approach is likely to produce very accurate virtual images, it is more computationally demanding and it requires more bandwidth for transmission. An alternative approach involves pre-processing of the depth maps. Recently, we adopted this latter approach and pre-processed depth maps using a symmetric 2D Gaussian filter, so that the disocclusion artifacts were incrementally removed as the smoothing of depth maps became stronger [8][9]. Experimental results using formal subjective evaluation techniques indicated that this technique (symmetric smoothing) could be used to significantly improve the image quality of novel stereoscopic views, especially when there are blocky artifacts or noise in the depth maps and potential distortions in the newly generated images as a result of disocclusion [8][9]. The notion of smoothing depth maps to remove disocclusion artifacts has been advocated by other authors as well [10].

In this paper, we propose a new system for stereoscopic image generation based on depth images to deal with the disocclusion artifacts in virtual images [11]. Different types of artifacts and distortions that could appear in the virtual images are then experimentally investigated for different system parameters.
Based on this investigation, we present a new concept of asymmetric smoothing of depth maps for DIBR that can reduce artifacts and distortions in the virtual images and improve the image quality. The remainder of this paper is organized as follows. In Section 2, we describe the proposed rendering system. Section 3 is devoted to an experimental investigation with different system setups. In Section 4, we propose the concept of asymmetric smoothing of depth maps. Section 5 provides a discussion of experimental results using natural depth images. Conclusions can be found in Section 6.

2. DEPTH-IMAGE-BASED RENDERING SYSTEM
A flowchart describing the proposed depth-image-based rendering system is illustrated in Fig. 1. This system consists of three parts: (i) pre-processing of depth maps, (ii) 3D image warping and (iii) hole-filling. Note that part (iii) is not necessary if there are no holes to fill as a result of optimal pre-processing of depth maps. In the following, these three parts will be addressed in detail.

Fig. 1. Flowchart of the proposed depth-image-based rendering system.

A. Pre-processing of depth maps
The pre-processing of depth maps involves two issues: choosing the convergence distance Zc, the so-called zero-parallax setting (ZPS) [12], and smoothing the depth maps. There are several methods that can be used to establish a ZPS [12]. In the so-called toed-in approach, the ZPS is chosen by a joint inward rotation of the left-eye and right-eye cameras. In the so-called shift-sensor approach, a plane of convergence is established by a small lateral shift h of the CCD sensors in the pair of parallel cameras. In contrast to these two methods, in the present rendering system the ZPS is chosen by “shifting” the depth map. Without loss of generality, we choose

$$Z_c = \frac{Z_{near} - Z_{far}}{2}$$

as the ZPS plane, where Znear and Zfar are the nearest and the farthest clipping planes of the depth map. In an 8-bit depth map, Znear = 255 and Zfar = 0 (Fig. 2). After this shift, the depth map is further normalized by a factor of 255, so that the values of the depth map lie in the interval [−0.5, 0.5], as required by the image-warping algorithm.
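The shift-and-normalize step can be illustrated with a minimal sketch. It assumes an 8-bit depth map with Znear = 255 and Zfar = 0, as stated above; the function name `normalize_depth` is hypothetical and not taken from the paper.

```python
Z_NEAR = 255  # nearest clipping plane of an 8-bit depth map
Z_FAR = 0     # farthest clipping plane of an 8-bit depth map

def normalize_depth(depth_8bit):
    """Shift an 8-bit depth value by the ZPS plane and scale it by 255."""
    z_c = (Z_NEAR - Z_FAR) / 2          # ZPS plane: 127.5 for 8-bit maps
    return (depth_8bit - z_c) / 255.0   # result lies in [-0.5, 0.5]

# One scanline of raw depth values, from farthest (0) to nearest (255).
row = [0, 64, 128, 255]
normalized_row = [normalize_depth(d) for d in row]
```

Under this reading, points on the ZPS plane map to values near zero, far points to negative values and near points to positive values.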


The second issue in the pre-processing step is to smooth the depth maps. To this end, different filter types can be used. For simplicity, a Gaussian filter

$$g(x, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{x^2}{\sigma^2} \right\}, \quad \text{for } -\frac{w}{2} \le x \le \frac{w}{2}, \qquad (1)$$

is employed, where w is the filter’s window size and σ the standard deviation. The value of σ determines the depth smoothing strength. Let s(x, y) be the depth value in the depth map at pixel (x, y). Then the depth value ŝ(x, y) after smoothing with a Gaussian filter is equal to

$$\hat{s}(x, y) = \frac{\sum_{\upsilon=-w/2}^{w/2} \left\{ \sum_{\mu=-w/2}^{w/2} \big( s(x-\mu,\, y-\upsilon)\, g(\mu, \sigma_\mu) \big)\, g(\upsilon, \sigma_\upsilon) \right\}}{\sum_{\upsilon=-w/2}^{w/2} \left\{ \sum_{\mu=-w/2}^{w/2} g(\mu, \sigma_\mu)\, g(\upsilon, \sigma_\upsilon) \right\}}, \qquad (2)$$

where σµ and συ set the smoothing strength in the horizontal (x) and vertical (y) directions, respectively. It is expected that different values of w and σ have different impacts on the quality of the virtual images generated from the original center image. We will discuss this issue in the next section. In the following experiment, we let the filter’s window size w be equal to 3σ.
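As an illustration of Eqs. (1) and (2), the following is a minimal sketch of the smoothing step in plain Python. It relies on the filter being separable, so a horizontal pass is followed by a vertical pass. The function names, the sampling of the window at integer offsets, and the repetition of border values outside the image are assumptions of this sketch, not details given in the paper.

```python
import math

def gaussian_kernel(sigma):
    """Sampled, normalized version of g(x, sigma) on a window of w = 3*sigma."""
    if sigma <= 0:
        return [1.0]                       # no smoothing in this direction
    half = int(3 * sigma) // 2             # offsets run from -w/2 to w/2
    g = [math.exp(-(x * x) / (sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)
         for x in range(-half, half + 1)]
    total = sum(g)                         # per-axis part of the Eq. (2) denominator
    return [v / total for v in g]

def smooth_line(values, kernel):
    """Filter a 1D list; samples outside the list repeat the border value (assumption)."""
    half = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)   # clamp the index at the border
            acc += weight * values[j]
        out.append(acc)
    return out

def smooth_depth(depth, sigma_h, sigma_v):
    """Separable form of Eq. (2): horizontal pass (sigma_h), then vertical pass (sigma_v)."""
    kh, kv = gaussian_kernel(sigma_h), gaussian_kernel(sigma_v)
    rows = [smooth_line(row, kh) for row in depth]
    cols = [smooth_line(list(col), kv) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

# A small depth map with a sharp vertical edge between far (0.0) and near (1.0).
depth = [[0.0] * 4 + [1.0] * 4 for _ in range(6)]
smoothed = smooth_depth(depth, sigma_h=2, sigma_v=0)
```

Because each kernel is normalized to sum to one, a flat depth map is left unchanged, and smoothing only softens transitions such as the vertical edge in the example.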

Fig. 3. Camera configuration used for generation of virtual stereoscopic images.

B. 3-D image warping
For simplicity, we consider only the commonly used parallel camera configuration for generating virtual stereoscopic images from one center image and its associated depth map for 3D TV (Fig. 3). In this configuration, the vertical coordinate of the projection of any 3D point is the same on the image planes of all three cameras. Let cc be the viewpoint of the original center image, and cl and cr the viewpoints of the virtual left-eye and right-eye images to be generated. Let f be the focal length of the three cameras and tx the baseline distance between the two virtual cameras. Under this camera configuration, a point p with depth Z in the world (with coordinates X, Y, Z) is projected onto the image planes of the three cameras at pixels (xl, y), (xc, y) and (xr, y), respectively. From the geometry shown in Fig. 3, we have

Fig. 2. The test image: “Interview”. The original image is on the top and its associated unprocessed depth map is on the bottom. A lower luminance value in the depth map means that the objects are farther away from the camera.

$$x_l = x_c + \frac{t_x}{2}\,\frac{f}{Z}, \qquad x_r = x_c - \frac{t_x}{2}\,\frac{f}{Z}, \qquad (3)$$

where information about xc and Z is given in the center image and the associated depth map, respectively. Therefore, with formulation (3) for 3D image warping, the virtual left-eye and right-eye images can be generated from the original center image and its depth map by providing the values of the baseline distance tx and the focal length f. Without loss of generality, we choose the focal length f to be equal to one in the experiments. Given the ZPS defined in Section 2 and the pre-processing of depth maps, the value of the baseline distance tx also indicates the depth range appearing in the generated stereoscopic image pair. According to the image warping formulation (3), the disparity (xl − xr) between the rendered left-eye and right-eye images is proportional to the baseline distance tx. A large disparity value indicates that the object point in the real world is far away from the ZPS, while a small value means that the object point is close to the ZPS.
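Formulation (3) can be transcribed almost literally. The sketch below is only an illustration: the function name `warp_coordinates` is hypothetical, the depth Z is assumed to be non-zero, and how the normalized depth-map values are converted to Z is not covered here.

```python
def warp_coordinates(x_c, depth_z, t_x, f=1.0):
    """Return (x_l, x_r) for a center-image pixel column x_c, following Eq. (3)."""
    half_shift = (t_x / 2.0) * (f / depth_z)   # each view moves by half the disparity
    return x_c + half_shift, x_c - half_shift

# A pixel at column 100 with depth Z = 2 and a baseline of 48 pixels (f = 1).
x_l, x_r = warp_coordinates(100, 2.0, 48)
disparity = x_l - x_r                          # equals t_x * f / Z
```

Doubling tx doubles the disparity for the same pixel, which is the proportionality noted above.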

C. Disocclusion and hole-filling
Due to the difference in viewpoints, some areas that are occluded in the original image might become visible in the virtual left-eye or right-eye images. These newly exposed areas, referred to as “disocclusion” in the computer graphics literature, have no texture after 3D image warping because information about the disocclusion area is available neither in the center image nor in the accompanying depth map. We fill in the newly exposed areas by averaging textures from neighboring pixels; this process is called hole-filling.

3. INVESTIGATION OF DIFFERENT SYSTEM SETUPS
This section is devoted to investigating the performance of the proposed rendering system using natural images. As an example, only results with the test image “Interview” (Fig. 2) are shown. The image and its corresponding depth map were generously supplied by Fraunhofer HHI (Heinrich-Hertz-Institut), Germany. In the following investigation, the distance between the virtual left-eye and right-eye cameras is fixed at 48 pixels for illustration.

First, we investigate the performance of this system without smoothing the depth maps. Fig. 4 shows an example of the results for the virtual left-eye image. The depth map without pre-processing is shown in Fig. 4 (a). The image after 3D image warping is illustrated in Fig. 4 (b). In the figure, the white areas, i.e., the holes, are the newly exposed areas. Recall that these holes are produced because information about the previously occluded areas is available neither in the monoscopic images nor in the accompanying depth maps. From the figure, we can see that the newly exposed areas are located mainly along the boundaries of objects and also along the right margin of the whole image. After hole-filling, as shown in Fig. 4 (c), significant texture artifacts appear at object boundaries in the virtual image. Fig. 4 (d) and (e) show the artifacts more clearly by enlarging segments of Fig. 4 (c).

We then evaluate the performance of the proposed system with smoothing of the depth maps. Similar to [8][9], we let σµ and συ of the two Gaussian filters, which are applied separately along the horizontal and vertical directions, respectively, have the same value. We term this process symmetric smoothing. Fig. 5 shows the results for the virtual left-eye image with σµ=συ=30. The depth map after symmetric smoothing is shown in Fig. 5 (a). The image after 3D image warping is illustrated in Fig. 5 (b). White areas represent newly exposed areas. Compared to Fig. 4 (b), we can see from Fig. 5 (b) that the newly exposed areas along object boundaries almost disappear, except for the area at the right margin of the whole image. This can be explained as follows. Due to the smoothing of the depth map, there are no more sharp depth discontinuities. In other words, the disocclusion areas become sparse because of smoothing and even disappear as the smoothing becomes stronger.

Fig. 6 shows the relation between the newly exposed areas and the depth smoothing strength (as determined by σ) in the virtual left-eye image for different baseline distances. The newly exposed areas are represented as the ratio of the number of newly exposed pixels to the total number of pixels in the image. We term this ratio the disocclusion ratio. As the depth smoothing strength increases, the disocclusion ratio decreases gradually until it reaches a constant value. This constant value is simply due to the persistence of newly exposed areas at the image margins. Also, from Fig. 6 it can be seen that the minimum depth smoothing strength needed to reach a constant value of newly exposed areas depends on the baseline distance tx. For the test image “Interview”, it is approximately one quarter of the baseline distance.
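The notions of newly exposed pixels, disocclusion ratio and simple hole-filling can be sketched on a single scanline. This is a simplified illustration under stated assumptions: the per-pixel horizontal shifts are taken as precomputed integers, colliding pixels are resolved in favour of the larger shift (treated here as the nearer object), and each hole is filled with the average of the nearest valid pixels on its left and right, which is one possible reading of “averaging textures from neighboring pixels”. All function names are hypothetical.

```python
def warp_scanline(colors, shifts):
    """Forward-map pixel x to x + shifts[x]; positions never written stay None (holes)."""
    width = len(colors)
    target = [None] * width
    best_shift = [None] * width
    for x, (color, shift) in enumerate(zip(colors, shifts)):
        x_new = x + shift
        if 0 <= x_new < width:
            # Assumption: on a collision, the pixel with the larger shift wins.
            if best_shift[x_new] is None or shift > best_shift[x_new]:
                target[x_new] = color
                best_shift[x_new] = shift
    return target

def disocclusion_ratio(target):
    """Number of newly exposed pixels divided by the total number of pixels."""
    return sum(1 for v in target if v is None) / len(target)

def fill_holes(target):
    """Fill each hole with the average of the nearest valid pixels to its left and right."""
    filled = list(target)
    n = len(target)
    for x, value in enumerate(target):
        if value is not None:
            continue
        left = next((target[i] for i in range(x - 1, -1, -1) if target[i] is not None), None)
        right = next((target[i] for i in range(x + 1, n) if target[i] is not None), None)
        neighbors = [v for v in (left, right) if v is not None]
        filled[x] = sum(neighbors) / len(neighbors) if neighbors else 0.0
    return filled

# Background (luminance 10, shift 0) next to a foreground object (luminance 200, shift 2).
colors = [10, 10, 10, 10, 200, 200, 200, 200]
shifts = [0, 0, 0, 0, 2, 2, 2, 2]
warped = warp_scanline(colors, shifts)   # a two-pixel hole opens at the object boundary
ratio = disocclusion_ratio(warped)
filled = fill_holes(warped)
```

In this toy case the hole is filled with a blend of background and foreground luminance, which is the kind of texture artifact at object boundaries that the section describes.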

Fig. 4. Virtual left-eye image generated without smoothing of the depth map. (a) Depth map without smoothing; (b) Image after 3D image warping. White areas represent newly exposed areas; (c) Image after hole-filling; (d) and (e) Artifacts clearly seen in enlarged segments of the image from (c).

Comparison of Fig. 4 to Fig. 5 shows that simple hole-filling produces significant texture artifacts whereas symmetric smoothing virtually eliminates these artifacts. However, symmetric smoothing still produces some distortion. Specifically, vertically straight object boundaries can now become curved, depending on the depth in neighbouring regions. This can be seen more clearly in Fig. 5 (d) and (e), which show enlarged segments of the image in Fig. 5 (c). We call this type of distortion geometric distortion. Its origin can be explained as follows. Let us examine the table leg in Fig. 5 (d). In the unprocessed depth map, this table leg has the same depth value along its vertical length and, at the bottom of the leg, the legs of the man and the woman, which have relatively large depth values, are in its neighborhood (as can be seen in Fig. 2). After processing, because the depth map is smoothed in the horizontal direction as strongly as in the vertical direction, the bottom of the table leg has a slightly larger depth value than its top (Fig. 5 (a)). This creates a curved table leg after 3D image warping.

Fig. 5. Intermediate steps in the generation of a virtual left-eye image using symmetric smoothing of the depth map with σµ=συ=30. (a) Depth map after symmetric smoothing; (b) Image after 3D image warping. White areas (right margin of image) are newly exposed areas; (c) Image after hole-filling; (d) and (e) Enlarged segments of the image shown in (b). Notice the curved table leg in (d) and the curved vertical lines in (e), even though the overall output is significantly better than that without smoothing [cf. Fig. 4(e)].

4. ASYMMETRIC SMOOTHING OF DEPTH MAP
The analysis of the underlying reason for the geometric distortions in the previous section suggests that the smoothing of depth maps in the horizontal direction should be weaker than the smoothing in the vertical direction, so that vertical objects, e.g., the table leg, have similar depth values throughout after depth smoothing. We call this asymmetric smoothing. The concept of asymmetric smoothing is consistent with known characteristics of the human binocular visual system. The human visual system obtains depth cues from disparity mainly from horizontal rather than vertical differences between the images that are projected to the left and the right eyes. This allows us to filter the depth map more strongly in the vertical than in the horizontal direction. In other words, we can use an asymmetric filter to smooth the sharp depth changes in a manner that overcomes the disocclusion problem while still providing reasonably good disparity cues.

Fig. 7 shows the results of rendering using asymmetric smoothing of the depth map with σµ=10 and συ=90. The depth map after asymmetric smoothing is shown in Fig. 7 (a). The virtual left-eye and right-eye images generated from the original center image (Fig. 2) and the processed depth map are shown in Fig. 7 (b) and (c). Two enlarged segments from the left-eye image are shown in Fig. 7 (d) and (e) for clarity. It can be seen from Fig. 7 (d) and (e) that geometric distortions are strongly reduced compared to Fig. 5 (d) and (e). Also, no texture artifacts appear. In general, asymmetric smoothing results in virtual images that have sharper texture and higher image quality. When viewed on a stereoscopic display, they also create reasonably good and stable depth.
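The argument for asymmetric smoothing can be checked on a toy depth map. The sketch below is an illustration only: the synthetic scene, its depth values, the σ values and all function names are assumptions, and the smoothing is a compact separable Gaussian with w = 3σ and repeated border values. It measures how much the depth along a vertical “table leg” varies from top to bottom after smoothing.

```python
import math

def kernel(sigma):
    """Normalized Gaussian weights exp(-x^2 / sigma^2) on a window of w = 3*sigma."""
    if sigma <= 0:
        return [1.0]
    half = int(3 * sigma) // 2
    g = [math.exp(-(x * x) / (sigma * sigma)) for x in range(-half, half + 1)]
    total = sum(g)
    return [v / total for v in g]

def smooth_line(values, k):
    """1D filtering with border values repeated outside the line (assumption)."""
    half, n = len(k) // 2, len(values)
    return [sum(w * values[min(max(i + j - half, 0), n - 1)] for j, w in enumerate(k))
            for i in range(n)]

def smooth_depth(depth, sigma_h, sigma_v):
    """Horizontal pass with sigma_h followed by a vertical pass with sigma_v."""
    rows = [smooth_line(r, kernel(sigma_h)) for r in depth]
    cols = [smooth_line(list(c), kernel(sigma_v)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

# Hypothetical scene: a vertical "table leg" of constant depth 0.2 in columns 14-16,
# far background (-0.5) beside it in the upper half, near objects (0.5) in the lower half.
HEIGHT, WIDTH = 40, 31
depth = [[0.2 if 14 <= x <= 16 else (-0.5 if y < HEIGHT // 2 else 0.5)
          for x in range(WIDTH)] for y in range(HEIGHT)]

def leg_spread(smoothed, column=15):
    """Difference between the largest and smallest depth along the leg's center column."""
    values = [row[column] for row in smoothed]
    return max(values) - min(values)

spread_sym = leg_spread(smooth_depth(depth, 6, 6))     # symmetric smoothing
spread_asym = leg_spread(smooth_depth(depth, 2, 10))   # weaker horizontal, stronger vertical
```

In this toy scene the depth along the leg varies less with the asymmetric setting, because less of the neighbouring depth leaks in horizontally. A smaller variation along a vertical object corresponds to less bending of that object after warping.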

Fig. 6. Relation between depth smoothing strength (horizontal axis, “Smoothing strength”) and newly exposed areas as a percentage of the total area (vertical axis, “% Newly exposed area”) for the test image “Interview”. Three graphs are shown for the three baseline distances that were used: (a) 48 pixels, (b) 36 pixels, and (c) 20 pixels. Note that in each figure the percentage of newly exposed area decreases with smoothing strength.

Fig. 7. Virtual images generated using asymmetric smoothing of the depth map with σµ=10 and συ=90. (a) Depth map after asymmetric smoothing; (b) Virtual left-eye image; (c) Virtual right-eye image; (d) and (e) Enlarged segments from the image shown in (b). Note that vertical lines are now straight compared to Fig. 5(e).

5. EXPERIMENTAL RESULTS AND DISCUSSIONS
To further evaluate the performance of the proposed rendering system with asymmetric smoothing, experiments with additional natural depth image sequences were carried out.

A. Test depth image sequences
Samples of three additional stereo video sequences and their corresponding depth maps used in the experiments are shown in Fig. 8. From top to bottom are the test depth images “Puppy”, “Soccer” and “Tulips”. The depth maps for the first two sequences were obtained from the same institution that generously provided the source images [Electronics and Telecommunications Research Institute (ETRI), Korea]. These depth maps had 8×8-block depth resolution and were not temporally stable: blocky artifacts appeared and disappeared over time. The depth map of the third sequence, “Tulips”, was estimated using our in-house software for disparity estimation [13]. It had pixel depth resolution with pel accuracy and was relatively stable, although it contained some inaccuracies on the left side of the walking woman. The image size of all of the depth maps was 720×480 pixels.

Fig. 8. Samples of the three image sequences and their corresponding depth maps. From top to bottom are “Puppy”, “Soccer” and “Tulips”.

B. Parameter selection
The depth range for the rendered virtual stereoscopic images was selected so that the images were comfortable to view. Several studies [14] suggest that the maximum depth range that is still comfortable for viewing is 1° of disparity, or approximately 5% of the width of a standard 4×3 image at a viewing distance of 4H (four times the image height). Therefore, we chose a baseline distance of 36 pixels to render the virtual left-eye and right-eye images for our images, which had a spatial resolution of 720×480. For the depth smoothing strength, we chose σµ to be equal to 9, using the empirical relation found in Section 3 that the required depth smoothing strength is approximately equal to one quarter of the baseline distance. In the case of symmetric smoothing, the smoothing strength in the vertical direction was the same as that in the horizontal direction. In the case of asymmetric smoothing, the smoothing strength in the vertical direction was chosen to be five times the value in the horizontal direction. In both cases, the filter’s window size was set to 3 times the depth smoothing strength σ.

C. Experimental results
Figs. 9, 10 and 11 show the rendered left-eye images with a baseline distance of 36 pixels based on the depth images “Puppy”, “Soccer” and “Tulips”. In each figure, three images show the results obtained with no depth smoothing, symmetric depth smoothing and asymmetric depth smoothing. Figs. 12, 13 and 14 show enlarged segments of the original image and the rendered images to allow comparison of the image quality in further detail. In each figure, in addition to the segment from the original image, three corresponding segments are shown: from the rendered image without depth smoothing, from the rendered image with symmetric depth smoothing and from the rendered image with asymmetric depth smoothing. Comparison of segment (b) to segments (c) and (d) in Figs. 12-14 shows that texture artifacts in the rendered image are completely eliminated by depth smoothing. Please compare the flower on the top left in Fig. 12, the “ETRI” logo in Fig. 13 and the boundary of the woman in Fig. 14. Comparing segment (c) to segment (d) in Figs. 12-14, it can be seen that geometric distortions are strongly reduced by asymmetric depth smoothing. Curved boundaries are now straight, e.g., the letter behind the flower on the top left in Fig. 12 and the “ETRI” logo in Fig. 13. These visual comparisons indicate that with asymmetric depth smoothing the quality of the rendered image can be further improved compared to that obtained with symmetric depth smoothing.
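The parameter choices described in Section 5.B follow from simple arithmetic, sketched below. The function name `select_parameters` and its keyword arguments are hypothetical; the numerical rules (5% of the image width, one quarter of the baseline, a factor of five for the vertical direction, w = 3σ) are the ones stated in that section.

```python
def select_parameters(image_width=720, max_disparity_fraction=0.05, vertical_factor=5):
    """Derive the baseline and the smoothing parameters from the image width."""
    baseline = round(image_width * max_disparity_fraction)   # about 5% of the width
    sigma_h = baseline / 4                 # empirical rule: one quarter of the baseline
    sigma_v = vertical_factor * sigma_h    # asymmetric case: stronger vertical smoothing
    return {
        "baseline": baseline,
        "sigma_h": sigma_h,
        "sigma_v": sigma_v,
        "window_h": 3 * sigma_h,           # window size w = 3 * sigma
        "window_v": 3 * sigma_v,
    }

params = select_parameters()
```

For 720-pixel-wide images this gives a baseline of 36 pixels and a horizontal smoothing strength of 9, matching the values used in the experiments.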

Fig. 9. Rendered left-eye images (with a baseline distance of 36 pixels) based on the depth image “Puppy”. From top to bottom are the rendered results with no depth smoothing, symmetric depth smoothing and asymmetric depth smoothing.

Fig. 10. Rendered left-eye images (with a baseline distance of 36 pixels) based on the depth image “Soccer”. From top to bottom are the rendered results with no depth smoothing, symmetric depth smoothing and asymmetric depth smoothing, respectively.

Fig. 11. Rendered left-eye images (with a baseline distance of 36 pixels) based on the depth image “Tulips”. From top to bottom are the rendered results with no depth smoothing, symmetric depth smoothing and asymmetric depth smoothing, respectively.

Fig. 12. Enlarged segments of the image “Puppy”. (a) Original image; (b) rendered image without depth smoothing; (c) rendered image with symmetric depth smoothing; and (d) rendered image with asymmetric depth smoothing. Please compare the area on the top left of the segment.

Fig. 13. Enlarged segments of the image “Soccer”. (a) Original image; (b) rendered image without depth smoothing; (c) rendered image with symmetric depth smoothing; and (d) rendered image with asymmetric depth smoothing. Please compare the “ETRI” logo with respect to geometric distortion.

Fig. 14. Enlarged segments of the image “Tulips”. (a) Original image; (b) rendered image without depth smoothing; (c) rendered image with symmetric depth smoothing; and (d) rendered image with asymmetric depth smoothing.

D. Subjective evaluation
The advantage of asymmetric smoothing over symmetric smoothing with respect to image quality was confirmed by a formal subjective assessment study [15]. Ten viewers rated the image quality of stereoscopic sequences in which the view to one eye consisted of rendered images based on either symmetric or asymmetric smoothing of the depth maps; the other view consisted of the original images. Viewers rated the stereoscopic sequences using the double-stimulus continuous-quality scale method, a standard procedure described in ITU-R Recommendation 500 [16]. Ratings were based on a scale of 0 to 100, ranging from “Bad” to “Excellent”. The strength of smoothing was varied at three levels, “None”, “Mild” and “Strong”, with smoothing parameters as shown in Table 1. In general, asymmetric smoothing involved a level of vertical smoothing that was three times that in the horizontal direction. As shown in Table 2, ratings based on asymmetric smoothing were higher than those based on symmetric smoothing. Not shown in the table is that the depth quality of the stereoscopic images was unaffected by mild smoothing. With strong smoothing, depth quality was reduced but ratings were still significantly higher than the ratings for non-stereoscopic reference images.

Table 1. Parameters (in pixels) used for smoothing depth maps in subjective image quality assessment. H = horizontal direction, V = vertical direction.

Level of Smoothing    Symmetric (σ, w)            Asymmetric (σ, w)
None                  H = 0, 0;   V = 0, 0        H = 0, 0;   V = 0, 0
Mild                  H = 4, 13;  V = 4, 13       H = 4, 13;  V = 12, 41
Strong                H = 20, 61; V = 20, 61      H = 20, 61; V = 60, 193

Table 2. Mean ratings of image quality and standard errors (in parentheses) for the different levels of smoothing, for both symmetric and asymmetric conditions. See main text for details.

Level of Smoothing    Symmetric     Asymmetric
None                  44.8 (8.1)    48.6 (6.6)
Mild                  52.9 (4.4)    58.0 (3.6)
Strong                62.1 (2.5)    68.4 (1.9)

E. Discussions
Depth-image-based rendering has the inherent problem of having to deal with disocclusion areas. Filling in these “holes” so as to create new images with high image quality is not easy. In this paper we propose pre-processing of the depth maps to smooth the sharp changes in depth at object boundaries. In addition to ameliorating the effects of blocky artifacts and other distortions contained in the depth maps that might be caused by noise, depth estimation, or coding of the depth maps, the smoothing reduces or completely removes disocclusion areas where potential texture artifacts can arise from image warping. In a previous experimental study, we found that subjective ratings of image quality in the stereoscopic virtual views can be improved with symmetric smoothing [8][9]. Results presented in this paper demonstrate that asymmetric smoothing provides a significant improvement in image quality over symmetric smoothing by reducing the amount of geometric distortion that might otherwise be present.

In addition to improving the overall image quality of the virtual views, smoothing depth maps can potentially lead to other benefits:
1. Smoothing reduces the contrast of depth maps and, thus, narrows the range of disparities contained in the rendered images. This will lead to increased visual comfort when viewing virtual stereoscopic images that are rendered from depth maps with large disparities.
2. Smoothing reduces the sharp transitions at the edges and borders of objects that are in front of a background and, therefore, smooths out the depth at and near the outlines of objects. Informal observations indicate that this removal of “crispness” at the borders of objects reduces the chance of perceiving the “cardboard effect”, in which objects appear in depth but appear to be flat like a sheet of cardboard.

Nevertheless, smoothing of depth maps, while attenuating some artifacts, will lead to a diminution of the depth resolution contained in the rendered stereoscopic views. Future studies will be required to examine this trade-off more closely, although based on the previous and current studies the benefits appear to outweigh this disadvantage.

6. CONCLUSIONS

In this paper, we propose an algorithm for depth-image-based generation of virtual stereoscopic images. To minimize texture artifacts appearing in the newly exposed (disocclusion) areas of the virtual image, smoothing of the depth maps is proposed. Experimental results indicate that symmetric smoothing can create geometric distortions, causing vertical straight boundaries to become curved. To reduce this distortion, asymmetric smoothing of the depth map is proposed. We have shown that asymmetric smoothing is an improved technique that can significantly reduce geometric distortions while also removing texture artifacts. Reasonably good depth quality can be maintained with this algorithm.

The present results are significant in demonstrating that smoothing of depth maps is beneficial not only in alleviating problems from noise and blocky artifacts in depth maps but also in reducing the percentage of disoccluded areas that have to be filled in the rendering process. This is important because previous studies have worked on the premise that smoothing of areas within objects is good but that smoothing across the borders of objects (intra- vs. inter-regions) is to be avoided because it reduces the depth between objects and their background [17][18]. For this reason, there are suggestions to reduce the smoothing at and around edges of objects, where there tend to be sharp transitions in depth [19]. In the case of depth-image-based rendering, we suggest that, in addition to smoothing within objects, smoothing at borders can be beneficial because of the resulting reduction in the size of the areas that have to be filled. In turn, this reduces the number and visibility of potential artifacts from the rendering and hole-filling process.

Finally, the present investigation has a significant implication for 3D-TV and other stereoscopic display systems that are based on depth-image-based rendering. It is often thought that the spatial resolution of depth maps should be as high as possible, so as to obtain rendered views of the highest quality. The present results suggest that this need not be the case. We have shown that smoothing depth maps before the rendering of new views (a process that effectively reduces the spatial resolution of the depth maps) actually helps improve the image quality of the rendered images.
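The link between sharp depth transitions and the size of the regions that must be filled can be seen in a toy one-dimensional forward warp. The sketch below is only an illustration under stated assumptions, not the rendering algorithm of this paper: each pixel on a scanline is shifted by a per-pixel disparity and rounded to the nearest target pixel, and the names `forward_warp_filled`, `widest_hole`, and `smooth_row`, the disparity values, and the sign convention are all hypothetical. It tracks only the widest run of unfilled target pixels, as one simple proxy for how large a region the hole-filling step has to cover.

```python
import numpy as np

def smooth_row(d, sigma):
    """Gaussian smoothing of a 1-D disparity row with replicated borders."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    k /= k.sum()
    return np.convolve(np.pad(d, radius, mode="edge"), k, mode="valid")

def forward_warp_filled(disparity):
    """Mark which target pixels receive a source pixel when every source
    pixel x moves to round(x + disparity[x])."""
    width = len(disparity)
    targets = np.rint(np.arange(width) + disparity).astype(int)
    filled = np.zeros(width, dtype=bool)
    filled[targets[(targets >= 0) & (targets < width)]] = True
    return filled

def widest_hole(filled):
    """Length of the longest run of unfilled target pixels."""
    widest = run = 0
    for f in filled:
        run = 0 if f else run + 1
        widest = max(widest, run)
    return widest

# Scanline with a sharp disparity step of 10 pixels at x = 50.
sharp = np.where(np.arange(100) < 50, 0.0, 10.0)
smoothed = smooth_row(sharp, sigma=5)

print("widest hole, sharp step:   ", widest_hole(forward_warp_filled(sharp)))
print("widest hole, smoothed step:", widest_hole(forward_warp_filled(smoothed)))
```

In this toy setting the sharp step leaves one wide gap, whereas the smoothed disparity spreads the same stretch over many pixels so that no single gap is wide. A full renderer would also have to handle occlusion ordering and two-dimensional effects, which are omitted here.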

7. REFERENCES

[1] A. Redert, M. Op de Beeck, C. Fehn, W. IJsselsteijn, M. Pollefeys, L. Van Gool, E. Ofek, I. Sexton, P. Surman, “ATTEST – advanced three-dimensional television system techniques”, Proceedings of 3DPVT’02, pp. 313-319, Padova, Italy, Jun. 2002.

[2] J. Flack, P. Harman, S. Fox, “Low bandwidth stereoscopic image encoding and transmission”, Proceedings of SPIE Conference on Stereoscopic Displays and Virtual Reality Systems X, Vol. 5006, pp. 206-214, CA, U.S.A., Jan. 2003.

[3] C. Fehn, “Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3D-TV”, Proceedings of SPIE Conference on Stereoscopic Displays and Virtual Reality Systems XI, Vol. 5291, pp. 93-104, CA, U.S.A., Jan. 2004.

[4] M. Ziegler, L. Falkenhagen, R. Horst, D. Kalivas, “Evolution of stereoscopic and three-dimensional video”, Signal Processing: Image Communication, Vol. 14, pp. 173-194, 1998.

[5] Y. Luo, Z. Zhang, P. An, “Stereo video coding based on frame estimation and interpolation”, IEEE Transactions on Broadcasting, Vol. 49, No. 1, pp. 14-21, 2003.

[6] H. Mitsumine, H. Noguchi, K. Enami, Y. Ninomiya, Y. Yamanoue, S. Yano, A. Hanazato, M. Okui, “Virtual Museum-3-D fine art appreciation system”, IEEE Transactions on Broadcasting, Vol. 42, No. 3, pp. 200-207, Sept. 1996.

[7] J. Shade, S. Gortler, L. He, R. Szeliski, “Layered depth images”, Proceedings of SIGGRAPH’98, pp. 231-242, Jul. 1998.

[8] G. Alain, W. J. Tam, L. Zhang, “Improving stereoscopic image quality of pictures generated from depth maps”, Internal CRC report, Communications Research Centre Canada, Ottawa, Apr. 2003.

[9] W. J. Tam, G. Alain, L. Zhang, T. Martin, R. Renaud, “Smoothing depth maps for improved stereoscopic image quality”, Proceedings of SPIE Conference on Three-dimensional TV, Video, and Display III, Vol. 5599, pp. 162-172, Philadelphia, U.S.A., Oct. 2004.

[10] C. Fehn, “A 3D-TV approach using depth-image-based rendering (DIBR)”, Proceedings of VIIP 03, Benalmadena, Spain, Sept. 2003.

[11] L. Zhang, J. Tam, D. Wang, “Stereoscopic image generation based on depth images”, Proceedings of IEEE Conference on Image Processing, pp. 2993-2996, Singapore, Oct. 2004.

[12] A. Woods, T. Docherty, R. Koch, “Image distortions in stereoscopic video systems”, Proceedings of SPIE Conference on Stereoscopic Displays and Applications, pp. 36-48, San Jose, CA, U.S.A., Feb. 1993.

[13] L. Zhang, D. Wang, A. Vincent, “Reliability measurement of disparity estimates for intermediate view reconstruction”, Proceedings of IEEE Conference on Image Processing, Vol. 3, pp. 837-840, Rochester, NY, U.S.A., Sept. 2002.

[14] I. P. Howard, B. J. Rogers, “Binocular vision and stereopsis”, New York: Oxford University Press, 1995.

[15] W. J. Tam, L. Zhang, “Non-uniform smoothing of depth maps before image-based rendering”, Proceedings of SPIE Conference on Three-dimensional TV, Video, and Display III, Vol. 5599, pp. 173-183, Philadelphia, U.S.A., Oct. 2004.

[16] ITU-R Recommendation BT.500-7, “Methodology for the subjective assessment of the quality of television pictures”, 1974-1997.

[17] P. Belhumeur, D. Mumford, “A Bayesian treatment of the stereo correspondence problem using half-occluded regions”, Proceedings of CVPR’92, pp. 506-512, 1992.

[18] T. Kanade, M. Okutomi, “A stereo matching algorithm with an adaptive window: theory and experiment”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 9, pp. 920-932, 1994.

[19] J. Yin, J. R. Cooperstock, “Improving depth maps by nonlinear diffusion”, Proceedings of the 12th International Conference on Computer Graphics, Visualization and Computer Vision, Plzen, Czech Republic, Feb. 2004.
