Real-Time 2D to 3D Image Conversion Techniques

Miroslav Nikolov Galabov

ISSN: 2319-5967 ISO 9001:2008 Certified International Journal of Engineering Science and Innovative Technology (IJESIT) Volume 4, Issue 1, January 2015

Abstract— This article describes methods developed for 2D-to-3D conversion of images based on motion parallax, on depth cues in still pictures, and on gray-scale and luminance setting for multiview autostereoscopic displays. A 2D-to-3D image conversion technique using the modified time difference (MTD) and the computed image depth (CID) is presented in detail; it makes it possible to convert any type of visual source into 3D images. A method for conversion from 2D to 3D based on gray scale and luminance setting, which does not require a complex motion analysis, is also proposed.

Index Terms— 2D to 3D image conversion, motion parallax, image depth, stereoscopic image.

I. INTRODUCTION

Depending on the number of input images, existing 2D-to-3D conversion algorithms can be categorized into two groups: algorithms based on two or more images and algorithms based on a single still image. In the first case, the two or more input images may be taken either by multiple fixed cameras located at different viewing angles or by a single camera with moving objects in the scene. The depth cues used by the first group are called multi-ocular depth cues. The second group of depth cues operates on a single still image; these are referred to as monocular depth cues. According to the depth cues on which the algorithms rely, they can be classified into the following 12 categories: binocular disparity [1,2,32,42], motion [1,7,8,30,36], defocus [3,9,10], focus [4], silhouette [5], atmosphere scattering [11], shading [12], linear perspective [6], patterned texture [13], symmetric patterns [14], occlusion (curvature, simple transform) [15] and statistical patterns [6].

The conversion of 2D content into 3D content involves creating missing information [33,35,37,40,43]. The process has an automatic aspect, in which parallax is created from other depth cues present in the scene, and an aspect carried out by human operators, which adds a creative dimension to the procedure. Methods developed for 2D-3D conversion may also be used for parallax correction in existing but unsatisfactory stereoscopic content. Although the domain has been explored in detail, generating a depth map from a single image is a problem with an infinite number of solutions, and the proposed methods cannot, therefore, claim to offer universally acceptable solutions [24,25,26,27,34,38,39,41].

II. CONVERSION OF 2D TO 3D IMAGES BASED ON MOTION PARALLAX

The relative motion between the viewing camera and the observed scene provides an important cue to depth perception: near objects move faster across the retina than far objects do. The extraction of 3D structure and camera motion from image sequences is termed structure from motion. The motion may be seen as a form of "disparity over time", represented by the concept of the motion field: the 2D velocity vectors of the image points, induced by the relative motion between the viewing camera and the observed scene. The basic assumptions of structure from motion are that the objects do not deform and that their movements are linear. Suppose that there is only one rigid relative motion, denoted by V, between the camera and the scene. Let P = (X, Y, Z)^T be a 3D point in the conventional camera reference frame. The relative motion V between P and the camera can be described as

$$V = -T - \omega \times P, \qquad (1)$$

where T and ω are the translational velocity vector and the angular velocity of the camera, respectively. The connection between the depth of a 3D point and its 2D motion field is incorporated in the basic equations of the motion field, which combine equation (1) with the perspective projection:


$$v_x = \frac{T_z x - T_x f}{Z} - \omega_y f + \omega_z y + \frac{\omega_x x y}{f} - \frac{\omega_y x^2}{f}, \qquad (2)$$

$$v_y = \frac{T_z y - T_y f}{Z} + \omega_x f - \omega_z x - \frac{\omega_y x y}{f} + \frac{\omega_x y^2}{f}, \qquad (3)$$

where $v_x$ and $v_y$ are the components of the motion field in the x and y directions, respectively; Z is the depth of the corresponding 3D point; and the subscripts x, y and z denote the x-, y- and z-axis components. In order to solve these basic equations for depth, various constraints and simplifications have been developed to reduce the degrees of freedom, leading to different depth-estimation algorithms, each suited to a specific domain. Some of them compute the motion field explicitly before recovering the depth information; others estimate the 3D structure directly, with the motion field integrated into the estimation process. It is worth noting that a sufficiently small average spatial disparity of corresponding points in consecutive frames benefits the stability and robustness of 3D reconstruction from the time integration of long frame sequences. On the other hand, when the average disparity between frames is large, the depth reconstruction can be performed in the same way as for binocular disparity. The motion field equals the stereo disparity map only if the spatial and temporal variations between frames are sufficiently small. We will present some approaches that extract disparity information from a 2D image and use it to construct a 3D image. The description of these approaches also introduces physiological depth cues, such as cues based on the Pulfrich effect [16]. This effect is associated with motion parallax.
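Under the simplest such constraint, a purely translational camera motion (ω = 0), equations (2) and (3) can be inverted directly for the depth Z. The sketch below is a hedged illustration under that assumption rather than any specific published algorithm; the function name, focal length and synthetic flow field are illustrative only.

```python
import numpy as np

def depth_from_translational_flow(vx, vy, x, y, T, f):
    """Depth from the motion field of a purely translating camera (omega = 0).

    With omega = 0, equations (2) and (3) reduce to
        vx = (Tz*x - Tx*f) / Z,    vy = (Tz*y - Ty*f) / Z,
    so Z is obtained here as the least-squares solution of both components.
    """
    Tx, Ty, Tz = T
    num = (Tz * x - Tx * f) * vx + (Tz * y - Ty * f) * vy
    den = vx ** 2 + vy ** 2 + 1e-12            # guard against zero flow
    return num / den

# Synthetic check: fronto-parallel scene at Z = 5, camera translating along x.
f = 500.0                                      # focal length in pixels (assumed)
T = (-0.1, 0.0, 0.0)                           # translation per frame (assumed)
x, y = np.meshgrid(np.arange(-320, 320), np.arange(-240, 240))
Z_true = 5.0
vx = (T[2] * x - T[0] * f) / Z_true            # flow predicted by equation (2)
vy = (T[2] * y - T[1] * f) / Z_true            # flow predicted by equation (3)
print(np.allclose(depth_from_translational_flow(vx, vy, x, y, T, f), Z_true))
```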

Fig.1. Determination of the left and right eye images from a 2D object moving to the right.

Figure 1 shows five temporal positions of a bird flying to the right in front of mountains as the original images and, above them, the same images delayed by two time slots. The original image in time slot 4 is chosen as the left-eye image and the delayed image in time slot 2 as the right-eye image, as depicted below. The eyes rotate until their axes intersect at the present location of the bird, so the location of the bird provides a sensation of depth. However, this depth is illusory, because the speed of the bird has no relation at all to its actual depth. This is further elucidated by the following observation. If the bird flew more slowly, it would be located further to the left in Figure 1, as indicated by the dashed line from the left eye, while the starting position for the right eye would remain the same. In this case the intersection of the axes of the eyes is, of course, further to the left, but also higher up, closer to the mountains. This indicates a larger depth even though the bird is at the same depth as before. This again is an illusory depth; in the present case it requires a correction, which is addressed later. This method of depth generation [18] is based on the so-called modified time difference (MTD).


If the object, such as the car in Figure 2, moves in the opposite direction, to the left, the axis of the left eye is directed toward the earlier position of the car, while the axis of the right eye follows the car to its later position. This is the reverse of the movement to the right. Here, too, a correction according to the speed of the car has to be applied.

Fig.2. Determination of the left and right eye images from a 2D object moving to the left.

The eye movements described above serve only to explain the construction of the left- and right-eye images for the successful generation of 3D images; it is not assumed that the eyes actually behave this way.

Fig.3. Block diagram for the 2D/3D conversion according to the MTD process.

Signal processing for the MTD process is shown in Figure 3. The ADC digitizes the analog input signal, which is converted back to analog form by the DAC at the output. The movement detector provides the direction and the speed of the movement, while the delay time controller provides the speed-dependent correction of the depth. The delay direction controller assigns the earlier (delayed) image to the right eye for a movement to the right and to the left eye for a movement to the left.
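A minimal sketch of this MTD signal chain, assuming grayscale frames held in memory; the crude global motion estimate, the delay limits and the target disparity below are illustrative assumptions, not the circuit of Figure 3.

```python
import numpy as np

def estimate_horizontal_motion(prev_frame, curr_frame, max_shift=8):
    """Crude global horizontal motion in pixels/frame (positive = rightward):
    the column shift that best aligns two consecutive frames."""
    prev = prev_frame.astype(float)
    curr = curr_frame.astype(float)
    best_shift, best_err = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        err = np.mean(np.abs(np.roll(curr, -s, axis=1) - prev))
        if err < best_err:
            best_err, best_shift = err, s
    return best_shift

def mtd_stereo_pair(frames, t, target_disparity=4, max_delay=4):
    """Form a left/right pair from a monocular sequence by the MTD principle:
    the delayed frame goes to the right eye for rightward motion and to the
    left eye for leftward motion, with the delay shortened for fast motion."""
    speed = estimate_horizontal_motion(frames[t - 1], frames[t])
    if speed == 0:                       # no motion: no time-difference depth
        return frames[t], frames[t]
    delay = int(np.clip(round(target_disparity / abs(speed)), 1, max_delay))
    delayed = frames[max(t - delay, 0)]
    if speed > 0:                        # movement to the right
        return frames[t], delayed        # (left-eye image, right-eye image)
    return delayed, frames[t]            # movement to the left

# Usage with an assumed sequence `frames` of shape (T, H, W):
# left_img, right_img = mtd_stereo_pair(frames, t=10)
```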

III. CONVERSION FROM 2D TO 3D BASED ON DEPTH CUES IN STILL PICTURES

The MTD method works only for moving objects. For still images, disparity extraction has to be based on contrast, sharpness and chrominance. Contrast and sharpness are associated with luminance: sharpness correlates with high spatial frequencies, while contrast relates to medium spatial frequencies. Chrominance is associated with the hue and tint of the color. The approach based on these features is called the computed image depth (CID) method [19,20,31], proposed for converting still 2D images into 3D images. When we watch a 2D picture, we generally recognize the far-and-near positional relationship between the objects in the picture from information contained in it. This information can be exploited for 2D-to-3D image conversion, so the CID uses the sharpness and contrast of the input images to compute the far-and-near positional relationship of the objects.


The CID consists of two processes. One is the image depth computation process, which computes the image depth parameters from the contrast, sharpness and chrominance of the input images. The other is the 3D image generation process, which generates the 3D images according to the image depth parameters. Figure 4 shows the basic principle of the CID. First, the sharpness, contrast and chrominance values of separate areas of the input images are detected. Sharpness refers to the high-frequency component of the luminance signal of the input images, contrast to its medium-frequency component, and chrominance to the hue and tint of the color signal. Adjacent areas with similar color are then grouped according to their chrominance values, and the image depth computation operates on these grouped areas.


Fig.4. Determination process for classification of depth as near–middle–far based on contrast, sharpness, and composition.
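As a hedged sketch of the first CID stage, the block-based feature extraction below takes the high-frequency luminance energy as sharpness and the medium-frequency energy as contrast; the Gaussian band separation, the block size and the chrominance measure are illustrative assumptions rather than the exact operators of [19], [20].

```python
import numpy as np
from scipy import ndimage   # assumed available for the Gaussian filtering

def cid_block_features(rgb, block=32):
    """Per-block sharpness (high-frequency), contrast (medium-frequency) and
    mean chrominance of an RGB image, for the CID image depth computation."""
    rgb = rgb.astype(float)
    luma = rgb.mean(axis=2)
    low = ndimage.gaussian_filter(luma, sigma=4)
    mid = ndimage.gaussian_filter(luma, sigma=1) - low    # medium frequencies
    high = luma - ndimage.gaussian_filter(luma, sigma=1)  # high frequencies
    h, w = luma.shape
    feats = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            sl = (slice(i, i + block), slice(j, j + block))
            sharpness = np.abs(high[sl]).mean()
            contrast = np.abs(mid[sl]).mean()
            chroma = rgb[sl].reshape(-1, 3).mean(axis=0) - luma[sl].mean()
            feats.append({"block": (i, j), "sharpness": sharpness,
                          "contrast": contrast, "chroma": chroma})
    return feats

# Adjacent blocks whose chroma vectors are close (e.g. a small Euclidean
# distance under an assumed threshold) would then be grouped before the
# depth parameters are computed.
```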

The image depth computation process uses the contrast and sharpness values. Near objects exhibit higher contrast and higher sharpness than objects positioned farther away, so contrast and sharpness are inversely proportional to depth. Adjacent areas with similar chrominance values are assumed to lie at the same depth; chrominance is thus a measure of the composition of the 2D image. Together, contrast, sharpness and chrominance allow the depth classification far–mid–near depicted in Figure 4. Since contrast and sharpness are inversely proportional to the distance from the camera to the objects, using only these values for the image depth computation often makes the center of the image appear nearer than its sides, top and bottom. The cause is that the focused object is generally positioned at the center of the image, while the ground or floor occupies the bottom of the image; these regions are usually flat, so few contrast and sharpness values can be extracted there. The values are therefore compensated by the image composition: in typical images, the center and the bottom tend to be nearer than the upper part. Each image depth parameter is thus decided by the average of each area's sharpness and contrast values, weighted by the image composition. This compensation generally improves the 3D effect, but it should be adapted to the application. Secondly, the 3D image generation process generates the left- and right-eye images according to the image depth parameter of each grouped area. If the parameter of an area indicates near, the left image is made by shifting the input image to the right and the right image by shifting it to the left; if the parameter indicates far, each image is shifted in the opposite direction. The horizontal shift value of each separated area is proportional to the 3D effect. Furthermore, when the image depth parameters change quickly or frequently, the converted images become hard to watch, so each shift value is adjusted to reduce rapid changes of the image depth parameters between adjacent areas.
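The generation stage can be sketched as follows, with each area shifted horizontally in proportion to its nearness; the Gaussian smoothing of the depth parameters, the disparity limit and the very naive handling of disocclusion holes are illustrative assumptions, not the exact procedure of [19].

```python
import numpy as np
from scipy.ndimage import gaussian_filter   # assumed available for smoothing

def cid_generate_views(image, depth, max_disparity=8, smooth=2.0):
    """Left/right pair from a 2D image and depth parameters in [0, 1]
    (1 = near, 0 = far): near areas are shifted right in the left view and
    left in the right view, far areas the opposite way."""
    h, w = depth.shape
    # Reduce rapid changes of the depth parameters between adjacent areas.
    depth = gaussian_filter(depth.astype(float), sigma=smooth)
    # Signed disparity: positive for near areas, negative for far areas.
    disparity = np.round((depth - 0.5) * 2 * max_disparity).astype(int)
    left = np.zeros_like(image)
    right = np.zeros_like(image)
    xs = np.arange(w)
    for y in range(h):
        xl = np.clip(xs + disparity[y], 0, w - 1)   # shift right when near
        xr = np.clip(xs - disparity[y], 0, w - 1)   # shift left when near
        left[y, xl] = image[y, xs]
        right[y, xr] = image[y, xs]
    return left, right

# Usage with an assumed H x W x 3 image and an H x W depth-parameter map:
# left_img, right_img = cid_generate_views(img, depth_params)
```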


As a result of these processes, 3D images that are easy to watch can be generated. The CID is especially suitable for converting still images, because it does not need any motion of the objects in the images; of course, the CID can also be applied to images with moving objects.

IV. CONVERSION FROM 2D TO 3D BASED ON GRAY SCALE AND LUMINANCE SETTING

In [21] three attractive and successful features for the determination of depth in 2D images are investigated: gray-scale analysis, relative spatial setting, and multiview 3D rendering. A color image is first converted into a single intensity value I with the gray scale

$$I = (I_R + I_G + I_B)/3, \qquad (4)$$


where the right-hand side contains the intensities of the three color components. In Figure 5 and in the block diagram in Figure 6 this step is called gray-scale conversion. The gray scale I is then expanded into I', covering the full range 0–255 of an 8-bit word, by the equation

$$I' = (I - I_{\min}) \cdot 255 / (I_{\max} - I_{\min}). \qquad (5)$$

Fig.5. The gray-scale conversions of a figure: original, grayscale conversion, dynamic contrast enhancement, and grayscale narrow down.

Fig.6. Block diagram for gray-scale conversions.

This is called dynamic contrast enhancement; it is followed by narrowing the gray scale down to the range 0–63. Figure 5 shows the appearance of the image after each of these steps. In the next step the luminance of the entire image is reset by assigning a smaller luminance to the upper portion, with the luminance gradually increasing toward the lower portion.
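A minimal sketch of this pipeline for an 8-bit RGB input; the linear vertical ramp used for the luminance resetting is an illustrative assumption, since the exact shape is not given in [21].

```python
import numpy as np

def grayscale_depth_pipeline(rgb):
    """Gray-scale conversion, dynamic contrast enhancement, narrow-down to
    0-63 and a top-dark/bottom-bright luminance ramp (Figures 5 and 6)."""
    rgb = rgb.astype(float)
    gray = rgb.mean(axis=2)                              # equation (4)
    # Dynamic contrast enhancement, equation (5): stretch to the 8-bit range.
    stretched = (gray - gray.min()) * 255.0 / (gray.max() - gray.min() + 1e-12)
    narrowed = stretched * 63.0 / 255.0                  # narrow down to 0-63
    # Luminance setting: smaller luminance at the top, increasing toward
    # the bottom (assumed here to be a linear ramp).
    h, _ = narrowed.shape
    ramp = np.linspace(0.5, 1.0, h)[:, None]
    return np.clip(narrowed * ramp, 0, 63).astype(np.uint8)

# Usage with an assumed H x W x 3 uint8 image `img`:
# depth_like = grayscale_depth_pipeline(img)
```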

Fig.7. The pixel arrangement for four different views


After application of this setting, the image, with its gray scale increasing toward the bottom, conveys a very impressive sensation of depth (even though the reproduction quality of the figure may be low). This is reminiscent of another depth-enhancing cue in brighter images: rendering objects slightly more bluish the farther away they are.


Fig.8. The 2D image and its depth map in the upper line. The four views are in the lower line.

Counteracting this depth enhancement is any spot, at whatever depth, that reflects light; such a spot induces the sensation of a shorter depth. A 1D median smoothing filter [22,28,29] is used to suppress this effect; after this filtering, the eye in the image appears free of reflections. The last step is multiview rendering for presentation through a slanted array of lenticular lenses. The pixel arrangement for four views, also applied in the present case, is shown in Figure 7. The four views are paired into two stereo views according to the different depths assigned to each pair, as provided by the depth map. For the 2D image on the left in Figure 8, the depth map is shown on the right, with brighter areas indicating smaller depth; the four viewing directions are shown in the lower line.

V. CONCLUSION

A single solution for converting the entire class of 2D images into 3D models does not exist. Combining depth cues enhances the accuracy of the results. Most 2D-to-3D conversion algorithms for generating stereoscopic images, as well as the ad-hoc standards, are based on the generation of a depth map. However, a depth map has the disadvantage that it needs to be fairly dense and accurate; otherwise, local deformations easily appear in the derived stereo pairs. It is therefore also helpful to explore alternatives rather than to confine ourselves to the conventional depth-map-based methods. The 2D-to-3D image conversion technique based on the MTD and the CID makes it possible to convert any type of visual source into 3D images. The proposed method for conversion from 2D to 3D based on gray scale and luminance setting does not require a complex motion analysis. Certain commercial solutions offer fully automated 2D-3D conversion, but the results are generally unsatisfactory, with the exception of very specific cases where the geometry of the scene is subject to strong constraints, movements are linear and predictable, and segmentation is simple. Not all content is equally suited to 2D-3D conversion.

ACKNOWLEDGMENT

The presented article is part of research work carried out in the "Analysis, research and creation of multimedia tools and scenarios for e-learning" project, Contract No: RD-09-590-12/10.04.2013, which is financially supported by the St. Cyril and St. Methodius University of Veliko Turnovo, Bulgaria.


REFERENCES
[1] Trucco, E., Verri, A. Introductory Techniques for 3-D Computer Vision, Chapter 7, Prentice Hall, 1998.
[2] Scharstein, D., Szeliski, R. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms, International Journal of Computer Vision, 47(1/2/3), pp. 7-42, 2002.
[3] Ziou, D., Wang, S., Vaillancourt, J. Depth from Defocus using the Hermite Transform, Proc. International Conference on Image Processing (ICIP 98), Volume 2, pp. 958-962, 1998.
[4] Nayar, S.K., Nakagawa, Y. Shape from Focus, IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 16, Issue 8, pp. 824-831, 1994.
[5] Matsuyama, T. Exploitation of 3D video technologies, Informatics Research for Development of Knowledge Society Infrastructure (ICKS 2004), International Conference, pp. 7-14, 2004.
[6] Battiato, S., Curti, S., La Cascia, M., Tortora, M., Scordato, E. Depth map generation by image classification, SPIE Proc. Vol. 5302, EI2004 Conference "Three-dimensional image capture and applications VI", 2004.
[7] Han, M., Kanade, T. Multiple Motion Scene Reconstruction with Uncalibrated Cameras, IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 25, Issue 7, pp. 884-894, 2003.
[8] Franke, U., Rabe, C. Kalman filter based depth from motion with fast convergence, Intelligent Vehicles Symposium, Proceedings, IEEE, pp. 181-186, 2005.
[9] Pentland, A.P. Depth of Scene from Depth of Field, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 9, No. 4, pp. 523-531, 1987.
[10] Subbarao, M., Surya, G. Depth from Defocus: A Spatial Domain Approach, International Journal of Computer Vision, 13(3), pp. 271-294, 1994.
[11] Cozman, F., Krotkov, E. Depth from scattering, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Proceedings, pp. 801-806, 1997.
[12] Kang, G., Gan, C., Ren, W. Shape from Shading Based on Finite-Element, Proceedings, International Conference on Machine Learning and Cybernetics, Volume 8, pp. 5165-5169, 2005.
[13] Loh, A.M., Hartley, R. Shape from Non-Homogeneous, Non-Stationary, Anisotropic, Perspective Texture, Proceedings, British Machine Vision Conference, 2005.
[14] Shimshoni, I., Moses, Y., Lindenbaum, M. Shape reconstruction of 3D bilaterally symmetric surfaces, Proceedings, International Conference on Image Analysis and Processing, pp. 76-81, 1999.
[15] Redert, A. Creating a Depth Map, Royal Philips Electronics, the Netherlands, 2005.
[16] Lueder, E. 3D Displays, John Wiley & Sons, 2012.
[17] Adelson, S.J. et al. Comparison of 3D displays and depth enhancement techniques, SID 91, p. 25, 1991.
[18] Murata, M. et al. Conversion of two-dimensional images to three dimensions, SID 95, p. 859, 1995.
[19] Murata, M. et al. A real time 2D to 3D image conversion technique using computed image depth, SID 98, pp. 919-922, 1998.
[20] Iinuma et al. Natural stereo depth creation methodology for a real-time 2D to 3D image conversion, SID 2000, p. 1212, 2000.
[21] Kao, M.A., Shen, T.C. A novel real time 2D to 3D conversion technique using depth based rendering, IDW'09, p. 203, 2009.
[22] Oflazer, K. Design and implementation of a single-chip 1D median filter, IEEE Trans. Acoust., Speech, Signal Process., ASSP-31(5), 1983.
[23] Zhang, L. et al. Stereoscopic image generation based on depth images, IEEE International Conference on Image Processing, p. 2993, 2004.
[24] Tam, W.J., Vazquez, C., Speranza, F. Three-dimensional TV: A Novel Method for Generating Surrogate Depth Maps using Colour Information, Proc. SPIE Electronic Imaging - Stereoscopic Displays and Applications XX, 2009.
[25] Cheng, C.C., Li, C.T., Chen, L.G. A 2D-to-3D Conversion System using Edge Information, Proc. IEEE Conf. on Consumer Electronics (ICCE), 2009.
[26] Cheng, F.H., Liang, Y.H. Depth Map Generation based on Scene Categories, SPIE Journal of Electronic Imaging, vol. 18, no. 4, October-December 2009.
[27] Jung, J.I., Ho, Y.S. Depth Map Estimation from Single-View Image using Object Classification based on Bayesian Learning, Proc. IEEE Conf. 3DTV (3DTV-CON), 2010.
[28] Agnot, L., Huang, W.J., Liu, K.C. A 2D to 3D video and image conversion technique based on a bilateral filter, Proc. SPIE Three-Dimensional Image Processing and Applications, volume 7526, Feb. 2010.
[29] Durand, F., Dorsey, J. Fast bilateral filtering for the display of high-dynamic-range images, ACM Trans. Graph., 21:257-266, July 2002.
[30] Konrad, J., Brown, G., Wang, M., Ishwar, P., Wu, C., Mukherjee, D. Automatic 2D-to-3D image conversion using 3D examples from the Internet, Proc. SPIE Stereoscopic Displays and Applications, volume 8288, Jan. 2012.
[31] Saxena, A., Sun, M., Ng, A. Make3D: Learning 3D scene structure from a single still image, IEEE Trans. Pattern Anal. Machine Intell., 31(5):824-840, May 2009.
[32] Da Silva, V. Depth image based stereoscopic view rendering for MATLAB, available at http://www.mathworks.com/matlabcentral/fileexchange/27538-depth-image-based-stereoscopic-view-rendering, 2010.
[33] Dejohn, M., Seigle, D. A summary of approaches to producing 3D content using multiple methods in a single project, Report, In-Three, 2008.
[34] Graziosi, D., Tian, D., Vetro, A. Depth map up-sampling based on edge layers, Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Hollywood, CA, pp. 1-4, 3-6 December 2012.
[35] Ideses, I., Yaroslavsky, L., Fishbain, B. Real-time 2D to 3D video conversion, Journal of Real-Time Image Processing, vol. 2, pp. 3-9, 2007.
[36] Matsumoto, Y., Terasaki, H., Sugimoto, K., et al. Conversion system of monocular image sequence to stereo using motion parallax, Proceedings of SPIE 3012, Stereoscopic Displays and Virtual Reality Systems IV, pp. 108-115, 15 May 1997.
[37] Jebara, T., Azarbayejani, A., Pentland, A. 3D structure from 2D motion, IEEE Signal Processing Magazine, vol. 16, no. 3, pp. 66-83, May 1999.
[38] Weerasinghe, C., Ogunbona, P., Li, W. 2D to pseudo-3D conversion of head and shoulder images using feature based parametric disparity maps, Proc. International Conference on Image Processing, pp. 963-966, 2001.
[39] Choi, C., Kwon, B., Choi, M. A real-time field-sequential stereoscopic image converter, IEEE Trans. Consumer Electronics, vol. 50, no. 3, pp. 903-910, August 2004.
[40] Curti, S., Sirtori, D., Vella, F. 3D effect generation from monocular view, Proc. First International Symposium on 3D Data Processing Visualization and Transmission (3DPVT 2002), 2002.
[41] Kozankiewicz, P. Fast algorithm for creating image-based stereo images, Proc. 10th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, Plzen-Bory, Czech Republic, 2002.
[42] Feng, Y., Jayaseelan, J., Jiang, J. Cue Based Disparity Estimation for Possible 2D-to-3D Video Conversion, Proc. VIE'06, 2006.
[43] Su, G.-M., Lai, Y.-C., Kwasinski, A., Wang, H. 3D Visual Communications, First Edition, John Wiley & Sons, 2013.
AUTHOR BIOGRAPHY

Miroslav GALABOV was born in Veliko Turnovo. He received his M.S.E. degree in Radio Television Engineering from the Higher Naval School N. Vapcarov, Varna, Bulgaria, in 1989. After that he worked as a design engineer at the Institute of Radio Electronics, Veliko Turnovo. From 1992 to 2001 he was an assistant professor at the Higher Military University, Veliko Turnovo. He received his Ph.D. degree in Automation Systems for Processing of Information and Control from the Higher Military University in 1999. Since 2002 he has been an assistant professor, and since 2005 an associate professor, in the Computer Systems and Technologies Department, St. Cyril and St. Methodius University of Veliko Turnovo. He is the author of ten textbooks and over 40 papers. His current interests are in signal processing, 3D technologies and multimedia.

