Classification of stereoscopic artefacts

Atanas Boev, Danilo Hollosi, Atanas Gotchev

MOBILE3DTV Project No. 216503

Abstract: This report aims to overview, describe and categorise stereoscopic artefacts which occur during the delivery of mobile stereoscopic video. First we identify the stages of the 3D video delivery dataflow – content creation, conversion to the desired format, coding/decoding, transmission, and visualisation. In order to predict how an artefact would be interpreted by the human visual system, we match the dataflow stages against the different human visual subsystems – structural, colour, motion and binocular vision – and obtain a table of stereoscopic artefacts. For each artefact we discuss the causes of occurrence, how it affects 3D vision, and the studies referring to it. Then, we identify which artefacts are possible in a mobile 3DTV system, and whether an artefact is possible when 3D content is represented as multi-channel or dense-depth video. Finally, we present a block diagram of a system for simulation of the identified artefacts.

Keywords: 3DTV, mobile video, stereo-video, artefacts, artefact simulation, quality estimation

MOBILE3DTV

D5.1

Executive Summary

Quality estimation is the key factor in the design and optimization of systems for stereoscopic 3D video content delivery. The first step towards an objective quality estimation metric is to identify the artefacts which could arise when dealing with stereoscopic content. This report aims to overview, describe and categorise stereoscopic artefacts which occur during delivery of mobile stereoscopic video.

The following stages of the mobile 3DTV dataflow might be sources of artefacts: 1) content creation or capture, 2) conversion of the captured content to multi-channel or dense-depth video, 3) 3D video coding/decoding, 4) transmission losses, and 5) visualisation on a 3D display. Human 3D vision works by assessing various depth cues – accommodation, binocular depth cues, pictorial cues and motion parallax. As a consequence, any artefact which modifies these cues will impair the perceived quality of a 3D scene. Following that, our classification starts by listing the potential sources of artefacts, and then considers how they would be interpreted by the different visual subsystems – structural, colour, motion and binocular vision.

In 3D video capture we identify noise, aliasing, blur, barrel distortion, pincushion distortion, keystone distortion, vignetting and aberration artefacts caused by the camera, as well as inter-channel distortions such as vertical disparity, depth plane curvature, the cardboard effect and the puppet theatre effect. In addition, there is a group of temporal artefacts such as motion blur and temporal mismatch between channels. The representation format and the required data conversion between multi-channel video and video with dense-depth representation are sources of artefacts which appear only in dense-depth video: temporal inconsistency of the depth map, artefacts caused by disocclusion, and perspective-stereopsis rivalry.

Coding of 3D video is a source of blocking, mosaic patterns, the staircase effect, ringing, colour bleeding and mosquito noise, as well as depth "ringing" artefacts specific to dense-depth video. Various asymmetric stereo-video coding schemes, in which one channel is spatially or temporally downsampled, are sources of cross-distortion artefacts. Transmission causes propagating and non-propagating packet-loss artefacts, noise and jitter; the last two, however, are not characteristic of the DVB-H channel. Finally, limitations of 3D displays create artefacts such as aliasing, view interspersing (also known as ghosting) and accommodation-convergence rivalry. Additionally, the autostereoscopic displays suitable for mobile 3DTV suffer from image flipping (also known as pseudoscopy), the picket fence effect and shear distortion. This last group of artefacts depends greatly on the observation angle.

For each artefact we discuss the causes of occurrence and provide relevant publications. Where possible, a figure exemplifying the artefact is provided as well. Finally, we discuss the possibility of each artefact occurring in a mobile 3DTV system (using a DVB-H channel and a portable autostereoscopic display), and whether it is possible when 3D content is represented as multi-channel or dense-depth video. We organize the artefacts into groups which are likely to be addressed in the same block of an artefact simulation system, and present a block diagram of a channel for simulation of stereoscopic artefacts.


Table of Contents

Executive Summary
Table of Contents
1  Introduction
2  Stereoscopic artefacts and perception of depth
3  Classification of artefacts
4  Stereoscopic artefacts caused by capturing
   4.1  3D video capture
   4.2  Artefacts in the image structure
      4.2.1  Blur by defocusing
      4.2.2  Barrel distortions
      4.2.3  Pincushion distortion
      4.2.4  Moiré effect
      4.2.5  Interlacing
      4.2.6  Noise
   4.3  Colour artefacts
      4.3.1  Chromatic aberration
      4.3.2  Vignetting – decreasing intensity
   4.4  Motion-related artefacts
      4.4.1  Motion blur
      4.4.2  Channel mismatch
   4.5  Binocular artefacts
      4.5.1  Keystone distortion and depth plane curvature
      4.5.2  Cardboard effect
5  Stereoscopic artefacts as related to the representation of 3D content
   5.1  3D video representation
   5.2  Artefacts in the image structure
      5.2.1  Temporal and spatial aliasing in structure and colour
   5.3  Binocular artefacts
      5.3.1  Ghosting by disocclusion
      5.3.2  Depth "bleeding" / depth "ringing"
      5.3.3  Perspective-stereopsis rivalry ("WOW" artefacts)
6  Binocular artefacts caused by coding of 3D content
   6.1  3D video coding
   6.2  Artefacts in the image structure
      6.2.1  Blocking artefacts
      6.2.2  Mosaic patterns
      6.2.3  Staircase effect and ringing
   6.3  Colour artefacts
      6.3.1  Colour bleeding
      6.3.2  Cross-colour artefacts
   6.4  Motion-related artefacts
      6.4.1  Mosquito noise
      6.4.2  Judder
   6.5  Binocular artefacts
      6.5.1  Cross distortion
      6.5.2  Cardboard effect
7  Stereoscopic artefacts caused by transmission of 3D content
   7.1  3D video transmission
8  Stereoscopic artefacts caused by visualisation of 3D content
   8.1  3D video visualisation
   8.2  Artefacts in the image structure
      8.2.1  Flickering
      8.2.2  Resolution limitations
      8.2.3  Spatial aliasing caused by subsampling on non-rectangular grid
   8.3  Colour artefacts
      8.3.1  Contrast range
      8.3.2  Baking and long-term use
      8.3.3  Rainbow artefacts
      8.3.4  Viewing angle dependent colour representation
   8.4  Motion-related artefacts
      8.4.1  Blurring and judder
   8.5  Binocular artefacts
      8.5.1  Shear distortions
      8.5.2  Crosstalk as inter-perspective aliasing and ghosting
      8.5.3  Puppet theatre effect
      8.5.4  Picket fence effect and image flipping
      8.5.5  Lattice artefacts (viewing angle dependent binocular aliasing)
      8.5.6  Accommodation-convergence rivalry
9  Stereoscopic artefacts for different content delivery scenarios
   9.1  Simulation channel for mobile 3DTV artefacts
   9.2  List of mobile 3DTV artefacts
   9.3  Conclusion

1 Introduction

Recently, most of the building blocks of an end-to-end mobile 3DTV system have reached maturity. An ISO/MPEG multiview encoding standard, developed as an amendment to H.264 AVC, is due by the end of 2008 [1], [2]. Various algorithms have been developed for the efficient transmission of video streams over wireless networks [1], [4], and there are 3D displays optimized for mobile use [5], [6], [7]. While the core technologies have been maturing, much remains to be done to optimize the system to deliver the best possible visual output ([28], [29], [30], [31], [32], [34]). Presenting a perceptually acceptable, high-quality 3D scene on a small display is a challenging task.

Quality estimation is the key factor in the design and optimization of any visual content delivery system. All quality metrics aim to closely approximate the quality as perceived by the user. An ideal quality metric should have the following properties: a) perceptual – related to the way the human visual system (HVS) operates; b) objective – providing a numerical representation of the quality as perceived by the user; and c) reliable – able to predict the perceptual quality of a wide variety of content, as perceived by a large number of users. Such a metric is especially needed for stereoscopic 3D video, because stereoscopic artefacts not only produce visually unpleasant results, but are also known to cause eye strain and general discomfort, a condition also known as "simulator sickness" [23]. Previous work on the quality of stereo images [8], [9], [10] does not attempt to quantify the typical distortions that can occur in a stereoscopic video sequence. The first step towards an objective quality estimation metric is to identify the artefacts which could arise in the various usage scenarios involving stereoscopic content. This report aims to overview, describe and categorise the stereoscopic artefacts which occur during delivery of mobile stereoscopic video.

2 Stereoscopic artefacts and perception of depth

The dictionary defines an artefact as "something characteristic of or resulting from a human institution or activity" [11]. Non-natural processes, such as transmitting a 3D scene representation over a communication channel, are a source of artefacts. The data flow in mobile 3DTV content delivery, from creation to observation, is shown in Figure 1.

Artefacts can be created at various stages of the dataflow:

Creation/capture – special care should be taken when positioning cameras or selecting rendering parameters. Unnatural correspondences between the images in a stereo-pair (e.g. vertical disparity) are a source of many types of artefacts [9]. As a perfectly parallel camera setup is practically impossible, rectification is an unavoidable preprocessing stage.

Representation format – different representations of stereo-video exist, multi-channel video and dense-depth representations being among the most widely used [12]. If the representation format differs from the one in which the scene was originally captured, converting between formats is a source of artefacts. Furthermore, some classes of artefacts are common in one format and impossible in another – for example, in dense-depth video disocclusion artefacts are common, while vertical parallax does not occur.

Coding – various coding schemes exist which exploit the temporal, spatial or inter-channel similarities of a 3D video [13]. To minimize transmission cost, "redundant" information is omitted. Algorithms originally meant for single-channel video are often improperly applied to stereo-video, and important binocular depth cues may be lost in the process.

Transmission – in the case of digital wireless transmission, a common problem is burst packet losses [16]. Resilience and error concealment algorithms attempt to mitigate the impact on the video, but if not designed for stereo-video, such algorithms might introduce additional artefacts of their own.

Visualisation – there are various approaches to 3D scene visualization, which offer different degrees of scene approximation [14], [16], [19]. Each family of 3D displays has its own characteristic artefacts, and these artefacts are often scene dependent [43].

Figure 1: Data flow of mobile 3DTV content (Capture → Coding → Transmission → Resilience → Decoding → Visual optimization → Display → Observation)

The human visual system is a set of separate subsystems which operate together in a single process. It is known that spatial, colour and motion information is transmitted to the brain over largely independent neural paths [20]. Vision in 3D, in turn, also consists of different "layers" which provide separate information about the depth of the observed scene [20], [21]. This is true both for perception and cognition – on the perceptual level there are separate visual mechanisms and neural paths, and on the cognitive level there are separate families of depth cues, whose importance varies from observer to observer ([40], [41], [73]). The depth cues used by the different layers of human vision, shown in Figure 2, are as follows:

Accommodation – the ability of the eye to change the optical power of its lens in order to focus on objects at various distances. Accommodation is the primary depth cue at very short distances, where an object is hardly visible with two eyes; its importance quickly decreases with distance. However, information from the other depth-assessing systems is unconsciously used to correct the refraction power and ensure a clear image of the object being tracked. As a result, a discrepancy between accommodation and binocular depth cues leads to the so-called "accommodation-convergence rivalry" (section 8.5.6), which is a major limiting factor for stereoscopic displays.

Binocular depth cues – a consequence of the two eyes observing the scene at slightly different angles. The mechanism of binocular depth estimation has two parts – vergence and stereopsis. Vergence is the process in which both eyes take a position which minimizes the difference between the visual information projected on the two retinae; the angle between the eyes is used as a depth cue. With the eyes converged on a point, stereopsis is the process which uses the residual disparity of the surrounding area to estimate depth relative to the point of convergence. Binocular depth cues are the ones most often associated with "3D cinema". However, binocular vision is quite vulnerable to artefacts – many factors can lead to an "unnatural" stereo-pair being presented to the eyes. As the HVS is not prepared to handle such information, binocular artefacts can lead to nausea and "simulator sickness" [23]. It is worth noting that around 5% of all people are "stereoscopically latent" and have difficulties assessing binocular depth cues [20], [21]. Such people have perfect depth perception; they simply rely mostly on depth cues coming from the other visual "layers".

Pictorial cues – at longer distances, binocular depth cues become less important, and the HVS relies on pictorial cues for depth assessment. These are depth cues that can be perceived even with a single eye – shadows, perspective lines, texture scaling. Even at medium distances, a stereoscopically good image can be "ruined" by missing subtle pictorial details, and the scene then exhibits "puppet theatre" or "cardboard effect" artefacts (see section 4.5).

Motion parallax – the process in which the changing parallax of a moving object is used to estimate its depth and 3D shape. The same mechanism is used by insects, and is commonly known as "insect navigation" [22]. Artefacts in the temporal domain (e.g. motion blur, display persistence) affect the motion parallax depth cues.
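The triangulation underlying stereopsis in a rectified, parallel camera rig can be sketched with the standard formula Z = f·B/d. The function below is an illustrative sketch; the focal length and baseline values are assumptions, not parameters taken from this report:

```python
# Depth from horizontal disparity for a rectified, parallel stereo pair.
# Z = f * B / d : f = focal length in pixels, B = baseline in metres,
# d = disparity in pixels. All values below are illustrative.

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulated depth (metres); larger disparity means a closer point."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# A point with 20 px disparity, cameras 6.5 cm apart, f = 1000 px:
print(depth_from_disparity(20.0, 1000.0, 0.065))  # 3.25 (metres)
```

The formula also illustrates why vertical disparity carries no depth information in this geometry – only the horizontal offset enters the triangulation, which is one reason unnatural vertical disparity is perceived as an artefact rather than as depth.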

Figure 2: Depth perception as a set of separate visual "layers" – accommodation, binocular disparity, pictorial cues and motion parallax – each effective over a different range of distances (from ~10⁻¹ m out to infinity)

Experiments with so-called "random dot stereograms" show that binocular and monocular depth cues are perceived independently [24]. Furthermore, the first binocular cells (cells that react to a stimulus presented to either eye) appear at a late stage of the visual pathways – the V1 area of the brain cortex. At this stage, only the information extracted separately for each eye is available to the brain for deducing image disparity [20]. This observation has led to our assumption that "2D" (monoscopic) and "3D" (stereoscopic) artefacts are perceived independently [33]. Planar "2D" artefacts, such as noise, ringing, etc., are thoroughly studied in the literature [25], [26]. In this report, we concentrate on artefacts which affect stereoscopic perception. However, due to the "layered" structure of the HVS, binocular artefacts might be inherited from other visual "layers" – for example, blockiness is a "purely" monoscopic artefact which can still destroy or modify an important binocular depth cue. As a result, stereoscopic artefacts might be created at various stages of the mobile 3DTV content delivery, and might affect different "layers" of human 3D vision, as shown in Figure 3.

Figure 3: Artefacts caused by various stages of content delivery, affecting various "layers" of human depth perception.

3 Classification of artefacts

In 3D video, many causes might lead to unnatural scene representations. Some of them are specific to content delivery over DVB-H, while others might never occur in this usage scenario. To build a taxonomy of stereoscopic artefacts, we use a top-down approach: first we identify the content delivery stages which might create artefacts, and then we consider if and how these artefacts affect the various stages of human depth perception. Our classification is presented in Table 1. The columns represent the causes of artefacts, coming from the different content delivery stages – capture, representation, coding, transmission and visualization. The rows are groups of artefacts as they are interpreted by the "layers" of human vision – structure, colour, motion and binocular. These layers roughly represent the visual pathways as they appeared during the successive stages of evolution. By structure we denote spatial (and colour-less) vision; it is assumed that during evolution human vision adapted to assess the "structure" (contours and texture) of images [25], and some artefacts manifest themselves as affecting image structure. The colour and motion rows represent colour and motion vision, accordingly. As noted before, all artefacts in the table affect binocular depth perception (and only such artefacts are included). However, the row designated binocular contains artefacts which have meaning only when perceived as a stereo-pair – in other words, artefacts that cannot be perceived with a single eye (e.g. vertical disparity). The mapping is not always straightforward – sometimes one stage of the dataflow causes several types of artefacts, while artefacts created at different stages are perceived in a similar way (e.g. ghosting). As a result, the diagram in Figure 3 cannot be easily translated into a flat table. Some artefacts are listed several times, while some groups of artefacts span multiple cells. Furthermore, some combinations of rows and columns (cause and manifestation of artefacts) are omitted as containing artefacts not related to the usage scenario of mobile 3DTV. A detailed discussion of the cells of the table is given in the following sections of this report.


Table 1 – Classification of stereoscopic artefacts

STRUCTURE
  Capture:
    - optical distortions: blurring by defocusing, barrel distortions, pincushion distortions
    - sensor limitations: interlacing, temporal and spectral aliasing, downsampling, noise introduction
  Representation/Conversion:
    - representation distortions: temporal and spatial aliasing
  Coding:
    - transformation-based (DCT) distortions: blocking artefacts, mosaic patterns, staircase effect, ringing
  Transmission and Error Resilience:
    - channel distortions: data loss, data distortion, jitter
  Visualization:
    - display limitations: flickering, resolution limitations, aspect ratio distortions, display geometry distortions, spatial aliasing by subsampling on a non-rectangular grid

COLOUR
  Capture:
    - optical distortions: chromatic aberration, vignetting (decreasing intensity)
  Coding:
    - cross distortions: cross-colour artefacts, colour bleeding
  Transmission and Error Resilience:
    - channel distortions: data loss leading to wrong colours, data distortion leading to wrong colours, jitter
  Visualization:
    - display limitations: contrast range, colour representation, baking and long-term use, viewing angle dependent colour representation, rainbow artefact

MOTION
  Capture:
    - optical distortions: channel mismatch
    - sensor limitations: motion blur
  Representation/Conversion:
    - representation distortions: temporal and spatial aliasing, line replication
  Coding:
    - interpicture distortions: motion compensation artefacts, mosquito noise, judder
  Transmission and Error Resilience:
    - channel distortions: data loss, data distortion, jitter
  Visualization:
    - display limitations: smearing, blurring and judder

BINOCULAR
  Capture:
    - optical distortions: depth plane curvature, keystone distortion, cardboard effect
  Representation/Conversion:
    - ghosting (caused by disocclusion), "WOW" artefacts
  Coding:
    - cross distortions, cardboard effect, depth "bleeding" / depth "ringing"
  Transmission and Error Resilience:
    - channel distortions: data loss, data distortion
  Visualization:
    - display limitations: shear distortion, crosstalk as inter-perspective aliasing and ghosting, viewing angle dependent binocular aliasing, accommodation-convergence rivalry, lattice artefacts
    - image and depth mismatch: puppet theatre effect, picket fence effect, image flipping (pseudoscopic image)


4 Stereoscopic artefacts caused by capturing

4.1 3D video capture

The first step of 3D video content delivery is 3D video capture (Figure 4). There are three common approaches to capturing 3D video. First, such video can be captured by two or more synchronized cameras in a multi-camera setting. Second, such content can be created from 2D video by applying video processing methods. Third, the video output can be augmented by depth information captured by another sensor. All these approaches have their own advantages and disadvantages, and are sources of specific artefacts.

Figure 4: Image and video processing chain, stage "capture" (Capture → Representation → Coding → Transmission and Error Resilience → Decoding → Visualization), based on [72]

Depending on the camera optics, the captured image might suffer from various geometrical distortions (barrel, pincushion, etc.). In a multi-camera setting, these can result in unnatural disparity relations across the simultaneously captured frames. As a result, an unnatural stereo-pair is presented to the observer, creating artefacts such as depth plane curvature or keystone distortion [38]. Poor synchronization between the cameras results in temporal and object-boundary mismatch, mostly visible on moving objects in the scene. Generating 3D video from 2D content can be affected by disocclusion artefacts, as there is a need to generate observations of the scene from angles not present in the 2D video. Also, the binocular cues created by the conversion process might not be consistent with the other (pictorial and motion) depth cues, in which case the scene would be perceived as pseudoscopic. When using multimodal depth capture, it is possible that the scene and its depth map have mismatching geometry or resolution. In that case, there would be a mismatch between binocular and pictorial depth cues, most pronounced along image contours. The human visual system interprets such mismatches as ghosting ([50], [51], [53], [54]), pseudoscopy or glass-plate artefacts. More details about the stereoscopic artefacts created in the capture process are given in the following sections.

4.2 Artefacts in the image structure

To assure maximum image quality over the whole processing chain, proper capturing is a fundamental necessity. The digital representation of the image, formed by light absorbed on a sensor, should be precise enough to avoid artefacts, which would otherwise affect the following processing steps. The artefacts arising from the camera and affecting the structure of an image can be divided into two groups – artefacts coming from imperfections of the optical parts of the camera, and artefacts coming from the discretisation of information by the CCD sensor. In this chapter we describe a number of artefacts that are introduced into the image structure during capture.

4.2.1 Blur by defocusing

To make sure an object is mapped sharply onto the sensor, it must lie within the camera's depth of field. The size of this region depends on several parameters such as aperture, focal length, object distance and the desired picture size [76]. Objects outside this region are defocused and appear blurred. On one hand this can be used as a production design element; on the other hand, wrong focus on every object leads to blurring of the whole scene. The effect is illustrated in Figure 5. Edges and shapes of objects are lost, and in the worst case it may become impossible to distinguish between foreground and background. Furthermore, depth information may not be derived correctly from the monoscopic or stereoscopic image. Several approaches exist for restoring the focus of a blurred picture via inverse filtering, but most of them cannot be used practically in real-time applications. Thus, monitoring the shooting conditions and properly adjusting the camera parameters is the best way to avoid this kind of distortion [59].

Figure 5: The original baboon picture (left) and its blurred version (right) caused by defocusing
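As a rough illustration of how defocus spreads edge energy, the sketch below applies a separable box filter – a crude stand-in for the lens's circle of confusion. It is an illustrative toy, not the simulation method used in this report:

```python
# Minimal sketch of defocus blur: a separable box filter averages each
# pixel with its neighbours, smearing sharp edges over several pixels.

def box_blur_1d(row, radius):
    """Average each sample with its neighbours within `radius` (clamped at edges)."""
    n = len(row)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(row[lo:hi]) / (hi - lo))
    return out

def box_blur_2d(img, radius):
    """Separable blur: filter the rows, then the columns."""
    rows = [box_blur_1d(r, radius) for r in img]
    cols = [box_blur_1d(list(c), radius) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

# A sharp vertical edge loses contrast after blurring:
edge = [[0.0, 0.0, 1.0, 1.0]] * 4
blurred = box_blur_2d(edge, 1)
print([round(v, 2) for v in blurred[0]])  # [0.0, 0.33, 0.67, 1.0]
```

The smeared edge is exactly what makes foreground/background separation and disparity estimation harder, as described above.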

4.2.2 Barrel distortions

Lens systems are used in cameras to focus on objects over a wide range while ensuring correct mapping onto the CCD sensor. Especially in the low-cost segment, such lens systems introduce geometrical distortions, classified as optical aberrations, into the image. One member of this group is the so-called barrel distortion, shown in Figure 6: the image suffers from decreasing magnification with increasing distance from the optical axis. Although special lens systems, such as fisheye lenses, are available which create this kind of effect intentionally, it leads to an incorrect presentation of the objects in the image. The effect can be suppressed or removed by using high-quality lens systems or by applying inverse filtering with the inverse distortion function.


Figure 6. A rectangular grid, affected by barrel distortion

4.2.3 Pincushion distortion Another type of optical aberration is called pincushion distortion. Here, the magnification increases with increasing distance from the optical axis. Especially at high zoom factors, the image becomes increasingly pillow-shaped, as shown in Figure 7. This is also a common problem of low-cost lens systems. In combination with barrel distortion, the resulting optical artefact is called moustache distortion. To prevent such geometrical distortions it is recommended to use higher-quality lens systems. The artefact can also be compensated by software-based post-processing, e.g. by applying the inverse distortion function.

Figure 7. A rectangular grid, affected by pincushion distortion
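Both barrel and pincushion distortion are commonly described by the first-order radial distortion model, in which a point at radius r from the optical axis is displaced to r(1 + k1·r²). A small sketch (the k1 values are illustrative, not measured for any lens):

```python
import numpy as np

def radial_distort(points, k1):
    """First-order radial distortion: r_d = r * (1 + k1 * r^2).
    k1 < 0 gives barrel distortion (magnification falls off toward the
    edges), k1 > 0 gives pincushion distortion (magnification grows
    toward the edges).  `points` are Nx2 coordinates relative to the
    optical axis."""
    pts = np.asarray(points, dtype=float)
    r2 = np.sum(pts**2, axis=1, keepdims=True)
    return pts * (1.0 + k1 * r2)

corner = [[1.0, 1.0]]                      # a point far from the optical axis
barrel = radial_distort(corner, -0.1)      # pulled toward the centre
pincushion = radial_distort(corner, +0.1)  # pushed away from the centre
```

Inverse filtering, as mentioned above, amounts to applying this mapping with the distortion inverted so that the grid lines become straight again.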


4.2.4 Moiré effect The real world is continuous, but the processing units are digital: continuous scenes are captured by discrete sensors, whose number and density of elements determine the spatial resolution. Improper sampling of the signal can create spatial aliasing, most often perceived as a Moiré pattern. The Moiré effect occurs when periodic textures overlap with other periodic textures of a slightly different spatial frequency or with a small tilt angle. A new, low-frequency texture appears, better described as an intensity modulation. The reason lies in a violation of the Nyquist-Shannon sampling theorem: the line-scanning frequency of the CCD, or the spatial sampling rate given by the distance between the sensor elements, is lower than twice the highest spatial frequency in the image. In such a case, aliasing occurs as shown in Figure 8.

Figure 8: Moiré effect caused by spatial interference: (left) slightly different grid size, (right) same grid size but tilted
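The folding of a too-fine texture into a coarse one can be demonstrated in one dimension: a grating sampled below the Nyquist rate is numerically indistinguishable from a much coarser grating. A short sketch (frequencies chosen purely for illustration):

```python
import numpy as np

# A fine grating at f_sig cycles per unit length, sampled at rate fs.
# Nyquist would require fs > 2 * f_sig = 1.8; we sample at fs = 1.0.
f_sig = 0.9
fs = 1.0
n = np.arange(64)
samples = np.cos(2 * np.pi * f_sig * n / fs)

# The signal folds down to the alias frequency |f_sig - fs| = 0.1:
f_alias = abs(f_sig - fs)
expected = np.cos(2 * np.pi * f_alias * n / fs)
# `samples` and `expected` are numerically identical -- the sampled
# fine grating is indistinguishable from a nine-times-coarser one,
# which is the intensity modulation seen as a Moire pattern.
```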

4.2.5 Interlacing Especially when capturing moving scenes or panning the camera, distortions may occur when the CCD sensor works in interlaced mode. The picture is captured in two steps: first the odd rows are taken by the CCD, then the even rows follow. If an object moves fast enough, the rows are no longer correctly aligned; they differ by an offset, which leads to edge distortions as described in [53] and shown in Figure 9. This distortion can be reduced through local filtering, e.g. with a Gaussian filter. Such smoothing, however, may lead to blurring and to a loss of detail, especially in high-frequency areas.


Figure 9: Interlace distortion caused by camera panning: a) interlaced image and b) fragment
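The row offset described above can be reproduced by weaving two fields that were captured at different instants into one frame. A toy example with an object that moves between the two field exposures (geometry invented for illustration):

```python
import numpy as np

h, w = 6, 10
frame_t0 = np.zeros((h, w)); frame_t0[:, 3] = 1.0  # vertical bar at time t0
frame_t1 = np.zeros((h, w)); frame_t1[:, 5] = 1.0  # the bar has moved by t1

# Interlaced capture: even rows taken at t0, odd rows taken at t1.
interlaced = np.empty_like(frame_t0)
interlaced[0::2] = frame_t0[0::2]
interlaced[1::2] = frame_t1[1::2]
# The bar's edge is now serrated ("combing"): even rows show it at
# column 3, odd rows at column 5 -- the offset seen in Figure 9.
```

Vertical smoothing, e.g. averaging each row with its neighbours, reduces the serration at the cost of vertical resolution, as noted above.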

4.2.6 Noise The generalised term "noise" summarises various causes of artefacts, which can be classified by their origin. In this text we distinguish between temporal, spatial and input-signal noise sources, based on [91], [92], [93] and [94].

Temporal noise sources [93] include reset noise, Johnson noise, 1/f noise, jitter, dark current noise and quantization noise. Reset noise originates from the use of non-ideal electronic components: it is impossible to predict the charge of a capacitor after charging through noisy components such as resistors or MOS transistors, and the amount of noise introduced by real-world electronic components is a function of temperature. Johnson noise is explained by the thermal movement of charge carriers in an electrical conductor, almost independently of the applied voltage. 1/f noise describes low-frequency noise from MOS transistors modulated onto the electrical signals. Timing jitter comes from phase and clock variations when several clocks are used; it appears as stretching and bulging of an electrical signal. The CCD sensor itself introduces noise even when no light falls on it. This is called thermal or dark current noise, caused by the dark current which flows through the sensor elements; as the dark current heats the sensor up, the noise increases further. In Figure 10, dark current noise is illustrated for increasing exposure time. The mapping of a signal to a discrete set of numbers is called quantization; the rounding to these numbers introduces an error well known as quantization noise. Large quantization noise leads to contouring in the image. Finally, the number of electrons produced by the incoming photons on the sensor, as well as the number generated by the dark current, underlie a statistical variation. This is called shot noise and can be attributed to input-signal noise [91], [92], [93].

Variations in the dark current generation, non-uniformities in pixel response, threshold variations, and gain and offset differences over the whole device lead to noise with a fixed spatial distribution on the image. It is described as spatial noise or fixed-pattern noise [92]. Hot spots designate pixels with a very high dark current in comparison to the sensor's average value.

Figure 10: Introduction of dark current noise with increasing time; camera aperture is closed [77]
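Several of these sources can be combined into a toy sensor model: photon arrival and dark-current generation are Poisson processes (shot noise), and the ADC contributes quantization noise. All parameter values below are illustrative and not calibrated to any real CCD:

```python
import numpy as np

rng = np.random.default_rng(0)

def sensor_capture(photons, dark_current, exposure, bits=8, full_well=255):
    """Toy CCD model: photon shot noise and dark-current shot noise are
    Poisson-distributed; the ADC adds quantization noise."""
    signal = rng.poisson(photons)                     # photon shot noise
    dark = rng.poisson(dark_current * exposure,
                       size=np.shape(photons))        # dark current noise
    electrons = np.clip(signal + dark, 0, full_well)
    levels = 2**bits - 1
    return np.round(electrons / full_well * levels)   # quantization

# With the shutter closed (zero photons), a longer exposure accumulates
# more dark-current noise, as Figure 10 illustrates:
short = sensor_capture(np.zeros(10000), dark_current=2.0, exposure=1.0)
long_ = sensor_capture(np.zeros(10000), dark_current=2.0, exposure=20.0)
```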

4.3 Colour artefacts Distortions such as resolution limitations, quantization and blurring may also influence the colour. In addition, several artefacts directly affect the colour and manifest themselves as nonlinearities within optical devices. In this section we focus on these colour artefacts, starting with a description of chromatic aberration and continuing with vignetting (decreasing intensity).

4.3.1 Chromatic aberration Chromatic aberration is a distortion created by the lenses. It appears as coloured fringes along object edges and increases with increasing distance from the optical axis. The reason is that the refraction index of a lens is a function of the wavelength: the shorter the wavelength, the shorter the focal length. This means that the focal point of, e.g., blue light is situated ahead of the focal point of, e.g., red light, as shown in Figure 11.


Figure 11: Chromatic aberration, caused by the wavelength dependency of the lens refraction index (focal points f1, f2, f3 for different wavelengths)

Proper handling of this effect is strongly recommended because the human eye is very sensitive to combinations of colours with a large difference in wavelength. An example is shown in Figure 12: a disturbing blue edge, in combination with a certain amount of blurring, can be seen on the right side of the picture.

Figure 12: Example of chromatic aberration (bottom) in an image (top) [78]
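The wavelength dependence of the focal length follows directly from the lensmaker's equation, 1/f = (n - 1)(1/R1 - 1/R2). A tiny worked example (the refractive indices and curvature are assumed, illustrative values, not measurements of any real glass):

```python
def focal_length(n, curvature=0.01):
    """Thin-lens focal length from the lensmaker's equation,
    1/f = (n - 1) * (1/R1 - 1/R2); `curvature` stands for the surface
    term (1/R1 - 1/R2), here in 1/mm."""
    return 1.0 / ((n - 1.0) * curvature)

# The refractive index is larger for shorter wavelengths:
n_blue, n_red = 1.53, 1.51
f_blue = focal_length(n_blue)
f_red = focal_length(n_red)
# f_blue < f_red: blue light focuses ahead of red light, which is
# exactly the ordering of the focal points sketched in Figure 11.
```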


4.3.2 Vignetting (decreasing intensity) The term vignetting describes a non-uniform distribution of light on the CCD sensor, caused by small apertures. The picture appears as if it had been shot through a hole: the intensity decreases with increasing distance from the optical axis, because beyond a certain off-axis distance the lens is no longer able to collect all the light from the object plane. The size of the lens determines the strength of the artefact.

Figure 13: Intensity decrease caused by vignetting
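Natural vignetting is often approximated by the cos⁴ falloff law: a pixel seen at angle θ off the optical axis receives cos⁴θ of the on-axis illumination. A sketch of the resulting intensity mask (the focal length in pixel units is an assumed value):

```python
import numpy as np

def vignette(h, w, f):
    """Relative illumination under the cos^4 falloff law; f is the
    focal length expressed in pixel units."""
    y, x = np.mgrid[:h, :w]
    r = np.hypot(x - (w - 1) / 2, y - (h - 1) / 2)  # distance from axis
    theta = np.arctan(r / f)                        # off-axis angle
    return np.cos(theta)**4

mask = vignette(101, 101, f=60.0)
# Intensity is 1 on the optical axis and falls off toward the corners,
# reproducing the "shot through a hole" look of Figure 13.
```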


4.4 Motion-related artefacts The previous sections described artefacts in still images or frames. In a video, multiple successive frames are captured at a certain (and limited) frame rate, and a different category of artefacts appears: inter-frame distortions.

4.4.1 Motion blur This kind of artefact is caused by the finite exposure time of each of the frames taken by a CCD sensor: fast-moving objects cannot be mapped sharply onto it. A simplified example is shown in Figure 14, where a white circle moves downwards and appears blurred by motion.

Figure 14: Motion blur distortion

4.4.2 Channel mismatch When using cameras with three CCD sensors to capture the red, green and blue colour channels at high quality, another artefact can occur: if the channels are not properly synchronized, a temporal mismatch between them is the consequence, and the image appears decomposed into its colour channels.

4.5 Binocular artefacts 4.5.1 Keystone distortion and depth plane curvature With a (weakly) convergent camera configuration, unwanted horizontal and vertical parallaxes are introduced (Figure 15), because the two sensors are directed towards slightly different image planes. In the vertical direction this effect is called keystone distortion; it increases with increasing object-camera distance, decreasing convergence distance and decreasing focal length [39]. In the horizontal direction, the gradual variation in magnification leads to a curvature of the depth plane and a wrong representation of reality [37]. A detailed description of this artefact and its origins can be found in [36]. The grid seen from one side is smaller than that seen from the other, as can be seen in Figure 15. In other words, objects in the centre of the screen will appear closer to the viewer than objects in the corners [58].

Figure 15: Keystone distortion (left) and depth plane curvature (right) caused by a (weakly) convergent camera configuration

4.5.2 Cardboard effect The cardboard effect affects the perception of depth. It is a format conversion and visualisation problem, but its origin mainly lies in early stages of the processing chain, such as capturing (lens focal length, object-camera distance, convergence distance) and coding (coarse quantization of disparity or depth values). If depth or disparity is not properly available, the viewer perceives wrong distances between object surfaces and the stereoscopic display, and the size and form of the observed object no longer match [39]. The scene is cut into a few discrete depth planes, so that objects appear unnaturally flat, because only limited information about their depth is available (see Figure 16). There are different approaches to describe and quantitatively measure this effect [56]; however, they are rather rudimentary.


Figure 16: Illustration of the cardboard effect (right) in comparison to its original depth map (left)
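The collapse of a smooth depth range into a few flat layers can be reproduced by coarsely quantizing a dense depth map. A toy sketch of this (the number of planes is an illustrative, deliberately aggressive choice):

```python
import numpy as np

def quantize_depth(depth, planes):
    """Coarsely quantize a dense depth map into a small number of
    discrete depth planes -- a toy version of what aggressive depth
    quantization does.  `planes` is the number of surviving levels."""
    d = np.asarray(depth, dtype=float)
    lo, hi = d.min(), d.max()
    idx = np.round((d - lo) / (hi - lo) * (planes - 1))
    return lo + idx / (planes - 1) * (hi - lo)

ramp = np.linspace(0.0, 255.0, 256)       # a smooth depth gradient
coarse = quantize_depth(ramp, planes=4)   # collapses into 4 flat layers
# Every object inside one layer is rendered at the same depth, so it
# appears as a flat cardboard cut-out, as in Figure 16 (right).
```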


5 Stereoscopic artefacts as related to the representation of 3D content 5.1 3D video representation

(Capture → Representation → Coding → Transmission and Error Resilience → Decoding → Visualization)

Figure 17: Image and video processing chain, stage "representation", based on [72]

Although there are many different formats for encoding 3D video, three main groups have evolved and are currently addressed by international standardization groups: multiview video, where two or more video streams show the same scene from different viewpoints; video-plus-depth, where each pixel is augmented with information about its distance from the camera; and dynamic 3D meshes, where 3D video is represented by dynamic 3D surface geometry [13]. The video-plus-depth format is suitable for multiview displays, as it can be used regardless of the number of views a particular screen provides [60], [61]. Furthermore, video-plus-depth can be efficiently compressed [60]. Recently, MPEG specified a container format for video-plus-depth data, known as MPEG-C Part 3 [62], [63]. On the downside, video-plus-depth rendering requires interpolation of occluded areas, which may be a source of artefacts. This is addressed by using layered depth images (LDI) [12] or multi-video-plus-depth encoding [64].

One straightforward way to represent video-plus-depth is to encode the depth map as a gray-scale picture and place the 2D image and its depth map side by side. This representation allows the whole scene to be encoded using conventional algorithms for 2D video. The intensity of each pixel in the depth map represents the depth of the corresponding pixel in the 2D image. This format is sometimes referred to as 2D+Z, and a typical frame looks like the one shown in Figure 18. Due to its simplicity and versatility, we expect that the 2D+Z video format will be widely used with the first generation of multiview displays. However, visualization of 2D+Z video on a multiview display requires additional computations: based on the depth map provided with the scene, multiple observations should be rendered, and the pixels from these observations should be interleaved as required by the display.
An example of such interleaving (also known as "interdigitation" or "interzigging" [65], [69]) is shown in Figure 19. Additional details about which artefacts affect the various types of 3D video representation are given in the following sections.
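The side-by-side 2D+Z packing just described can be sketched in a few lines (frame dimensions are illustrative):

```python
import numpy as np

# 2D+Z packing: the texture frame and its gray-scale depth map are
# placed side by side, so a conventional 2-D codec can carry both.
h, w = 4, 6
texture = np.arange(h * w, dtype=np.uint8).reshape(h, w)
depth = np.full((h, w), 128, dtype=np.uint8)   # per-pixel depth, 0..255

frame_2d_plus_z = np.hstack([texture, depth])  # one double-width frame

# At the receiver, the two halves are simply split again:
tex_out = frame_2d_plus_z[:, :w]
depth_out = frame_2d_plus_z[:, w:]
```

The codec never needs to know that the right half is depth rather than texture, which is the simplicity argument made above.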


Figure 18. Dense depth map scene representation

Figure 19: Interdigitation of pixels required for a multiview display; each R, G and B subpixel is assigned to one of eight views [65]


5.2 Artefacts in the image structure 5.2.1 Temporal and spatial aliasing in structure and colour Images and video scenes are captured in the three basic colours red, green and blue, which are usually stored at the same resolution. Since the human visual system is less sensitive to colour than to luminance, it is possible to reduce the amount of data needed to present an image at high quality. Typically, a colour space with one luminance and two chrominance components is used, where the luminance component is kept at a higher resolution than the chrominance components [27]. This is usually done by transforming RGB into a different colour space, typically YCbCr or one of its variations such as YUV, taking empirically evaluated weighting factors for each of the basic colours into account. After that, sampling schemes according to the target application and target bit rate are applied to reduce the amount of data. The number of samples required for each (luminance and chrominance) channel strongly depends on factors like display resolution, desired format and available bit rate. The reduction is done by coarse sampling of the chrominance channels. This is shown in Figure 20, where the 4:4:4 sampling scheme is illustrated in comparison to 4:2:0: in the latter, only one sample of each chrominance channel belongs to four samples of the luminance channel in the sampling pattern. Assuming each sample is represented with 8 bits, only 1.5 × 8 bit/pixel = 12 bits/pixel are required instead of 24 bits/pixel. We refer to [76] for a more detailed description of this representation. A loss of information through such sampling is always the consequence and may lead to visible spatial artefacts [81].

Figure 20: 4:4:4 sampling scheme (left) vs. 4:2:0 sampling scheme (right); Y, CR and CB sample positions
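The 4:2:0 reduction and the resulting 12 bits/pixel budget can be sketched directly; here the chroma sample for each 2×2 block is taken as the block average (one common choice; the exact filter varies between implementations):

```python
import numpy as np

def subsample_420(chroma):
    """4:2:0 chroma subsampling sketch: keep one chroma sample per
    2x2 block of luma positions by averaging each block."""
    h, w = chroma.shape
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rng = np.random.default_rng(1)
luma = rng.random((8, 8))        # luminance stays at full resolution
cb = rng.random((8, 8))          # one chrominance channel
cb_420 = subsample_420(cb)       # a quarter of the chroma samples remain

# Bit budget per pixel: 8 (luma) + 2 chroma channels at 1/4 resolution:
bits_per_pixel = 8 + 2 * 8 / 4   # = 12, versus 24 for 4:4:4
```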


Temporal artefacts can also be introduced by sampling [82]. The video frame rate is usually adjusted to reach a desired bit rate or to match a certain video format or standard. However, such an adjustment (i.e. a reduction) is also a source of artefacts, known as temporal aliasing. Another source of artefacts is format conversion, i.e. a change in the representation. For example, line replication is a very simple method to fill in the missing lines when switching from interlaced to progressive mode. However, it leads to visible artefacts and a loss of information, especially in high-frequency areas, which falls under spatial aliasing as well. This is illustrated in Figure 21.

Figure 21: Original image (left) and converted image (right) with line replication artefacts

5.3 Binocular artefacts 5.3.1 Ghosting by disocclusion For 3D image and video scenes, the depth dimension also needs to be represented in a proper way. There are different approaches, such as the combination of an image and its depth map (also named source and depth), and a layered approach, where the base image is complemented by depth information for a number of depth layers. This can be seen in Figure 22.


Figure 22: “Declipse” 3D image format, proposed by Philips, using layered depth representation [83]

When using a plain dense depth representation, disocclusion artefacts, i.e. ghosting in this case, occur when the scene is viewed from different angles, since only a single depth layer is available and no information exists for the areas it occludes.

Figure 23: Ghosting by disocclusion [84]

5.3.2 Depth "bleeding" / depth "ringing" Depth "bleeding" or depth "ringing" is an artefact caused by harsh quantization of the depth map in a dense depth representation. A misalignment between the contours of the image and those of its depth map creates borderline artefacts which bear some resemblance to ringing artefacts in 2D images. The effect is visible when looking with two eyes, and is even more pronounced when the observer moves in front of the display, changing the observation angle.


Figure 24: A stereo-pair, exhibiting depth “ringing” artefacts.

5.3.3 Perspective-stereopsis rivalry ("WOW" artefacts) This distortion occurs when there is an inconsistency between binocular depth cues and perspective. For displays based on WOWvx technology, Philips provides software that allows the user to directly control the depth range of a picture and its position on the optical axis, as can be seen in Figure 25. Built-in firmware is responsible for the real-time conversion between dense-depth and multi-view 3D video. If the conversion parameters are set incorrectly, or depth values are exaggerated, objects appear curved, especially when their dimensions span a wide part of the depth range. An example is shown in Figure 26.


Figure 25: Philips WOWvx parameters: screen offset and depth sprawl along the optical axis, for depth values 0-255 relative to the screen

Figure 26: Object curvature caused by depth cue rivalry. Left: depth as suggested by perspective cues, right: depth as suggested by binocular cues.


6 Binocular artefacts caused by coding of 3D content 6.1 3D video coding

(Capture → Representation → Coding → Transmission and Error Resilience → Decoding → Visualization)

Figure 27: Image and video processing chain, stage "coding", based on [72]

As bandwidth and memory are usually limited in systems and applications, it is necessary to reduce the amount of data needed to describe an image. We distinguish between lossless, near-lossless and lossy compression schemes. The latter always goes along with a loss of information and the introduction of distortions and artefacts, especially when the required compression ratio is high. The artefact types mainly depend on the coding technique used to compress the data [57], [88].

There are many approaches to coding 3D video content, but two of them are the most popular. One is multi-view video coding, where the video is represented by separate streams, and temporal as well as inter-channel correlations are exploited to compress the data. This approach is being standardized as an amendment to H.264/AVC [1], [2]. The other approach is to create a dense depth representation of the video and code the 2D video and the depth channel separately. H.264/AVC can also be used for this representation, as the depth map can be treated as conventional gray-scale video. This approach provides backward compatibility and ensures temporal consistency. In addition, MPEG defined a container format in "ISO/IEC 23002-3 Representation of Auxiliary Video and Supplemental Information" [85], [86], [87].

In this chapter, we first describe artefacts that influence the structure and colour of the image, focusing on blocking, blurring and ringing artefacts as well as colour bleeding and cross-colour artefacts, with a short description of the causes of their appearance. In the second part, a description of binocular artefacts follows.

6.2 Artefacts in the image structure 6.2.1 Blocking artefacts Blocking artefacts arise from the use of block-based coding schemes such as those built on the Discrete Cosine Transform (DCT). They can be defined as discontinuities at the borders of the blocks of a reconstructed frame, coming from the individual treatment of each block in the coding stage according to its content. They become visible when the error introduced through coarse quantization is greater than the masking ability of the human visual system [72]. The human eye is more sensitive to low spatial frequencies, so blocking artefacts are easily identified in smoothly textured areas. The origin of blocking artefacts lies in the separation of the data into blocks and their processing by block transforms, whose coefficients are intentionally quantized. The quantisation levels are set by the targeted compression ratio, and the luminance and chrominance channels are commonly treated individually. Coarser quantisation suppresses more DCT coefficients, and this lost information causes the blocking artefacts. Figure 28 illustrates the effect. Due to the coarse quantization, a loss of spatial detail is a further consequence, visible as blurring after the reconstruction of the image. Blocking artefacts in the colour channels can also cause colour bleeding.

Figure 28: Example for “pure” blocking artefacts (generated)
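The mechanism can be demonstrated on a single 8×8 block: transform, quantize coarsely, and reconstruct. The quantizer step below is an illustrative, deliberately aggressive value, not taken from any codec:

```python
import numpy as np

N = 8
k = np.arange(N)
# Orthonormal 8x8 DCT-II basis matrix (rows are frequencies).
C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0, :] *= 1 / np.sqrt(2)
C *= np.sqrt(2 / N)

def dct2(b):  return C @ b @ C.T    # forward 2-D DCT of one block
def idct2(b): return C.T @ b @ C    # inverse 2-D DCT

# A smooth gradient block, coarsely quantized in the transform domain:
block = np.outer(np.linspace(0.0, 1.0, N), np.ones(N)) * 100.0
q = 40.0                                  # coarse quantizer step
coeffs = np.round(dct2(block) / q) * q    # most coefficients collapse to 0
recon = idct2(coeffs)
# `recon` deviates from `block`; when each block of a frame is treated
# independently like this, the per-block errors do not match at the
# block borders and become visible as blocking artefacts.
```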

6.2.2 Mosaic patterns The reduction of the high-frequency components in the horizontal and vertical directions can cause another distortion, well known as mosaic patterns. It leads to "pixelation" of the image, especially when the dominant orientation of the content differs between neighbouring blocks [79]. This is another situation where the DCT basis functions become visible, as shown in Figure 29.


Figure 29: Example for mosaic patterns, most evident on the character forehead. Taken from [26]

6.2.3 Staircase effect and ringing Discontinuities at the block borders lead to a staircase effect, especially at diagonal edges tilted by a certain angle. The explanation lies in the representation of the high-frequency components during coding, as they are very important for edge information. Due to coarse quantization, diagonal edges appear as stairs, as shown in Figure 30; in addition, the orientation of patterns or objects within a scene cannot always be well approximated by the separable DCT. The ringing artefact is explained in a similar manner, as a consequence of the Gibbs phenomenon, and the two artefacts often occur together. High-contrast areas become the source of ripples and shimmering near edges, due to coarse quantization of the high-frequency components, as shown in Figure 31.


Figure 30: Illustration of the staircase effect

Figure 31: Example of ringing artefacts

6.3 Colour artefacts 6.3.1 Colour bleeding Colour bleeding can be described as a smearing of chrominance information along high-contrast chrominance edges and can be seen as the chrominance equivalent of blurring in the luminance channel. It results from coarse, or even zero, quantization of the higher-order AC transform coefficients of the colour channels. Due to the colour sub-sampling schemes, this kind of distortion has an annoying influence on the colour information of the whole macroblock [79]; it is also named chrominance ringing. An example is shown in Figure 32.


Figure 32: Example of colour bleeding – an image, harshly compressed using JPEG (left) and a detail, exemplifying colour bleeding artefacts (right)

6.3.2 Cross-colour artefacts As mentioned in the short description of the luminance and chrominance representation schemes, it is also possible to interleave the information in one channel. However, this leads to cross-colour and cross-luminance artefacts, especially for highly detailed video scenes [72].

6.4 Motion-related artefacts Video coding techniques utilize interframe and intraframe predictive coding of macroblocks [79]. With interframe prediction, motion prediction errors lead to abrupt jumping of macroblocks within the scene. With intraframe block coding, additional high-frequency distortions, similar to ringing, are introduced at the block boundaries.

6.4.1 Mosquito noise The mosquito noise effect can be described as an additive overlap of ringing and motion compensation artefacts over time. It is caused by the varying coding of the same macroblocks in successive frames, and it appears near high-contrast edges as luminance and chrominance fluctuations in smoothly textured areas. It usually overlaps with ringing distortions caused by coarse quantization of high spatial frequencies and with motion compensation artefacts. Mosquito noise also affects stationary areas characterized by high spatial frequencies within a moving scene; flickering in the luminance and chrominance channels may be observed.


6.4.2 Judder This kind of artefact is mostly visible in teleconference systems and videophone applications, as the bandwidth necessary for transmission is a function of the change in the video content. Because the bandwidth of such applications is usually limited, the achievable temporal sampling rate is limited as well: especially in fast-moving scenes, the image sequence is cut into discrete "snapshots" to fit the temporal bandwidth of the source. This is perceived as image flipping in the direction of movement.

6.5 Binocular artefacts 6.5.1 Cross distortion It has been demonstrated that, in stereo images, blur in one channel of a stereo-pair is masked by a sharper image in the other channel of the pair [24]. This property of the HVS is exploited in asymmetric stereo-video coding for reducing the bandwidth of the coded stream. Cross distortions occur when the bit budgets of the right and left pictures presented to an observer are not equal. This leads to a decreased overall quality of the image or sequence, which the human visual system fortunately tries to compensate. Reference sources such as [75] claim that there is a nonlinear relation between the difference in quality presented to each eye and the resulting overall perceptual quality, whereby the overall quality is ranked mostly towards the better image [74], [75]. However, if the difference in quality between the left and right images becomes too big, a wrong or even distorted depth is perceived [44].

6.5.2 Cardboard effect The cardboard effect, as described in section 4.5.2, can be caused not only in the capturing stage but also in the coding stage. Aiming at effective depth coding, high quantization levels reduce the depth range to a finite and limited number of planes. As a consequence, the resulting stereoscopic image appears unnaturally flat. This effect can be reduced with proper interpolation algorithms or by spending more bits on the representation of the depth map.

7 Stereoscopic artefacts caused by transmission of 3D content 7.1 3D video transmission

(Capture → Representation → Coding → Transmission and Error Resilience → Decoding → Visualization)

Figure 33: Image and video processing chain, stage "transmission", based on [72]

The transmission of an image or a sequence of video frames can introduce heavy distortions, up to a total dropout. Artefacts due to transmission errors are sparse and highly variant in terms of occurrence, duration and intensity, and at very low bit rates they may be masked by compression impairments. In general, channel transmission always suffers from noise, depending on the technology used and the surrounding environment.

One commonly used method in video compression is motion compensation, where some frames are described in terms of a transformation of a reference frame. There are three major frame types, according to the references used: intra-coded frames (I-frames), predictive frames using one reference image (P-frames), and bi-directional frames, which use two or more references (B-frames). Usually, I-frames are transmitted regularly, e.g. every half a second, but irregular transmission is also possible in newer video coding standards like H.264 [34]. Missing or distorted data in a transmitted video stream results in errors and artefacts, depending on which frame type is affected. Data loss in a decoded I-frame will affect every following frame decoded on the basis of this reference; such artefacts heavily influence the image structure and colour. Data loss in P- and B-frames will also affect the image structure, but will additionally introduce strong motion artefacts, as these frames contain both motion estimation data and image information; image flipping and object jumping may be experienced. As P- and B-frames are decoded on the basis of previous frames, such errors usually persist only for a short period of time, until the next intact reference. The appearance of artefacts caused by distorted depth information strongly depends on the transmission and concealment approach chosen.
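How far a loss propagates through a group of pictures can be sketched with a toy decoder model (the frame pattern and loss position below are invented for illustration, and B-frames are omitted for brevity):

```python
# Toy error-propagation model: a P-frame is decoded from its reference,
# so an error hitting any frame persists until the next intact I-frame
# resets the prediction chain.
gop = ['I', 'P', 'P', 'P', 'I', 'P', 'P']
lost = {1}                       # the first P-frame is lost in transit

corrupted = []
state = False
for i, frame_type in enumerate(gop):
    if frame_type == 'I':
        state = (i in lost)      # an intact I-frame resets the decoder
    else:
        state = state or (i in lost)   # P-frames inherit reference errors
    corrupted.append(state)
# corrupted == [False, True, True, True, False, False, False]:
# the error persists until the next I-frame arrives.
```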

8 Stereoscopic artefacts caused by visualisation of 3D content


8.1 3D video visualisation

(Capture → Representation → Coding → Transmission and Error Resilience → Decoding → Visualization)

Figure 34: Image and video processing chain, stage "visualisation", based on [72]

Recently, advances in display technology have allowed the mass production of screens which provide a stereoscopic effect to the user without the need for glasses. Such displays are known as "autostereoscopic" and provide "left" and "right" images separately targeted to each eye of the observer. An overview of various types of multiview displays can be found in [14], [16]. It is expected that multiview displays utilizing lenticular sheets or parallax barriers will provide the first generation of 3D displays for widespread use [15], [16], [18].

Not all currently available 3D display technologies are suitable for mobile use. The limitations of a mobile device, such as screen size, CPU power and production costs, impose constraints on the choice of a suitable 3D display technology. Another important factor is backward compatibility: a mobile 3D display should be switchable back to a "2D mode" when 3D content is not available. At the moment, there are only a few vendors with announced prototypes of 3D displays targeted at mobile devices [66], [67], [68]. All of them are two-view, TFT-based autostereoscopic displays [55].

Various technical limitations of the displays are sources of artefacts [45], [46], [47]. Imperfect optical separation between the "left" and "right" images creates inter-channel crosstalk, which is perceived as ghosting. Often, the colour components of one addressable pixel are not seen from the same angle, but are instead redirected to different eyes of the observer; as a result, each eye perceives an image formed by a colour grid specific to the display topology [69]. If the stereoscopic pair is incorrectly resampled for this new grid, a specific combination of aliasing and colour bleeding is perceived [69], [70].
If the proper observation angle for an autostereoscopic display is too narrow, there is high probability that the observer is not in the correct position for perceiving the stereoscopic image. In this case, the observer perceives mono- or pseudoscopic images.

8.2 Artefacts in the image structure

8.2.1 Flickering

The minimum rate at which individual images fuse into a moving scene is around 25 frames per second. To reduce flicker without increasing the number of pictures, frames are presented in interlaced mode: odd and even lines of a frame are presented separately, so that a field rate of around 50 Hz results for PAL and SECAM, and 60 Hz for NTSC. Still, some flickering remains noticeable, and another artefact, called twitter, is introduced that may disturb the viewer. It is caused by interference when fine stripes are shown in interlaced mode, similar to a Moiré pattern. This kind of distortion can already be avoided during capture, e.g. by recommending that people do not wear clothes with such high-frequency textures. For progressive presentation of interlaced content, de-interlacing techniques are used, but they lead to resolution reduction and a loss of spatial detail.
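The field splitting and recombination involved in interlaced presentation can be sketched in a few lines (our illustration; the function names are not from the report). When the two fields are captured at different instants, the interleaved lines disagree, which is exactly the combing/twitter described above.

```python
def split_fields(frame):
    """Split a progressive frame (a list of lines) into even and odd
    fields, as done for 50/60 Hz interlaced presentation."""
    return frame[0::2], frame[1::2]

def weave(even_field, odd_field):
    """Weave de-interlacing: interleave the two fields back into one
    progressive frame.  With motion between the fields, adjacent lines
    disagree and are perceived as combing/twitter."""
    frame = []
    for e, o in zip(even_field, odd_field):
        frame += [e, o]
    return frame
```

`weave(*split_fields(frame))` reproduces the original frame only when both fields come from the same instant; otherwise a de-interlacer must interpolate, trading twitter for a loss of spatial detail.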

8.2.2 Resolution limitations

Display resolution for a given size is limited due to technological constraints. This is especially relevant to mobile devices: content for these devices needs to be downsampled to fit the screen resolution, prior to delivery or at the device. Interpolation and decimation algorithms have been designed to change the spatial resolution; however, a loss of information and spatial detail is unavoidable in the case of decimation, and a trade-off is always necessary. Up- and downscaling of still images and videos is sometimes necessary to fit the screen size. In addition to blurring or reduced spatial resolution due to imperfect resampling algorithms, a mismatch between the width and height of an object can be caused as well.
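The two pitfalls named above can be illustrated with generic routines (our sketch, not the report's algorithms): a uniform scale factor avoids the width/height mismatch, and an averaging pre-filter limits aliasing at the cost of spatial detail.

```python
def fit_to_screen(src_w, src_h, scr_w, scr_h):
    """Uniform scale factor that fits the source frame on the screen
    without distorting the width/height ratio of objects."""
    scale = min(scr_w / src_w, scr_h / src_h)
    return round(src_w * scale), round(src_h * scale)

def decimate2(row):
    """2:1 decimation of one line with a two-tap averaging pre-filter
    to limit aliasing; some spatial detail is unavoidably lost."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]
```

For example, fitting 1280×720 content onto a 480×320 screen yields 480×270 (letterboxed), rather than the anamorphic 480×320 that would distort object proportions.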

8.2.3 Spatial aliasing caused by subsampling on a non-rectangular grid

State-of-the-art autostereoscopic 3D displays use a slanted optical filter for spatial multiplexing of multiple views. The filter might be a slanted lenticular sheet [70] or a slanted parallax barrier [71]. Both approaches remove the picket fence effect, create a smooth transition between the views, and at the same time balance the horizontal vs. vertical resolution of a view. The filter defines a particular light penetration direction for each sub-pixel. The native resolution of the screen is reduced by a factor of M to provide frames/images from M different angles of view. From a particular observation angle, most of the sub-pixels are masked. Due to the slant of the filter, visible sub-pixels appear on a non-rectangular grid, as shown in Figure 19. Subsampling 3D video on that grid can cause spatial aliasing, affecting the image structure and colour and manifested by a loss of detail and increased graininess [69].
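The non-rectangular visibility pattern can be demonstrated with an illustrative interleaving map (assumed for demonstration, not the formula of any specific display): the view index cycles horizontally and is shifted by the slant on every row, so the sub-pixels visible from one view form a sheared lattice.

```python
def visible_subpixels(width, height, views, slant=1):
    """Illustrative sub-pixel-to-view map for an M-view display with a
    slanted filter: the view index cycles along a row and is offset by
    `slant` sub-pixels per row.  Returns, per view, the (x, y) positions
    of the sub-pixels visible from that view."""
    vis = {}
    for y in range(height):
        for x in range(width):
            v = (x + y * slant) % views
            vis.setdefault(v, []).append((x, y))
    return vis

grid = visible_subpixels(8, 2, views=4)
# The sub-pixels of view 0 sit at x = 0, 4 on row 0 but x = 3, 7 on
# row 1 - a non-rectangular grid, on which naive subsampling aliases.
```

Each view receives only width × height / M sub-pixels, which is the resolution reduction by the factor M described above.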

8.3 Colour artefacts

8.3.1 Contrast range

The contrast range is defined as the luminance ratio between the brightest and darkest colour that the system is capable of producing. In LCD panels the contrast range is reduced, since pixels in the "off" state are not able to completely block the backlight. As a result, the system is incapable of producing very dark colours.
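The reduced contrast range can be put into numbers with a toy backlight-leakage model (the luminance and leakage figures are illustrative, not measured values):

```python
def contrast_ratio(backlight_nits, leakage):
    """LCD contrast range: 'off' pixels leak a fraction of the
    backlight, so black is never fully black and the ratio of the
    brightest to the darkest producible luminance is bounded."""
    white = backlight_nits
    black = backlight_nits * leakage  # residual light in the "off" state
    return white / black
```

With a 500 nit backlight and 0.1 % leakage, the contrast range is capped at 1000:1 regardless of the content.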

8.3.2 Baking and long-term use

Baking of patterns into the display, appearing as ghosts, is a well-known problem, especially with CRTs. It can also occur with LCDs, although the origin is different. When high currents are applied to the liquid crystals over a long time (dark state), e.g. for log-on windows, their ability to switch back to zero current (bright state) is reduced, so ghosting patterns appear on the display during normal use as well. The whole panel also becomes darker after a while for the same reason, compounded by weakening of the backlight. LCD and especially OLED-based displays are also prone to pixel defects (also known as "dead" pixels) appearing after prolonged use.

8.3.3 Rainbow artefacts

Rainbow artefacts originally arise from crosstalk between the chrominance and luminance channels in composite video signals. They manifest themselves as interlaced colour stripes in regions with high intensity [89]. This effect can be mitigated by detecting image regions susceptible to the artefact and applying a suitable low-pass filter to the chrominance channel in those regions [89], [90]. An example of rainbow artefacts in composite video is given in Figure 35a.


Figure 35: Rainbow artefacts (adapted from [89]), a) in composite video and b) on autostereoscopic display

In autostereoscopic displays the views are multiplexed on a sub-pixel level – the colour components of one addressable pixel are redirected towards different observation angles. As the separation of the observation zones of each sub-pixel is not perfect, crosstalk between the colour channels is also possible. The bitmap presented in Figure 35b consists only of "black" and "white" pixels, but creates a colourful image due to crosstalk between neighbouring views. Some displays use a special optical "wavelength filter" in order to mitigate this effect [71].
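The chrominance low-pass mitigation mentioned for composite video can be sketched in one dimension (our toy three-tap filter, not the region detectors or filters of [89], [90]):

```python
def lowpass3(chroma):
    """Three-tap moving average over a chrominance line - the kind of
    low-pass that suppresses luma-chroma crosstalk stripes.  Edge
    samples are replicated so the output has the input's length."""
    padded = [chroma[0]] + chroma + [chroma[-1]]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3
            for i in range(1, len(chroma) + 1)]
```

A high-frequency chroma pattern such as [0, 3, 0, 3] (the kind that excites rainbow stripes) is flattened towards its local mean, at the cost of chroma resolution in the filtered region.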

8.3.4 Viewing angle dependent colour representation

The LCD technology cannot ensure that the picture is seen equally well from any viewing angle. Usually, a good impression is provided only within a limited range. A number of filters and glass plates are mounted in front of the backlight, which produces a kind of spot effect in the horizontal direction, as illustrated in Figure 36. The image on an LCD also looks brighter when viewed at extreme angles from the top, and darker from the bottom. As the perceived colour is a function of the brightness of each of the three colour components, colour variations appear across the screen surface.


Figure 36: Reduced viewing angle (2 × α) caused by the layer stack of an LCD across the display width: horizontal filter, glass plates, liquid crystals, glass plates, vertical filter, colour filter and backlight

In plasma devices this distortion is negligibly small, because the gas cells are light emitters themselves and therefore provide a much wider viewing angle.

8.4 Motion-related artefacts

8.4.1 Blurring and judder

Content with a certain frame rate sometimes needs to be adjusted to the frame rate of the display, and additional blurring and judder may be introduced into the pictures. For example, to watch movies in the PAL format on an NTSC display, the frame rate needs to be adjusted from 50 to 60 Hz, and the missing pictures are generated by interpolation. The other way round, some pictures need to be omitted to fit the standard, which may cause judder.
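A minimal nearest-frame retimer (our sketch; real converters blend or motion-compensate, which trades the repetition below for blur) makes both failure modes visible: upconversion repeats frames, downconversion drops them.

```python
def retime(frames, src_fps, dst_fps):
    """Nearest-frame rate conversion.  Upconversion (dst > src) repeats
    source frames - uneven motion; interpolating instead would blur.
    Downconversion (dst < src) drops frames - judder."""
    n_out = round(len(frames) * dst_fps / src_fps)
    return [frames[min(int(i * src_fps / dst_fps), len(frames) - 1)]
            for i in range(n_out)]
```

Converting five PAL frames to 60 Hz repeats the first frame; converting six NTSC frames to 50 Hz silently discards the last one.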

8.5 Binocular artefacts

8.5.1 Shear distortions

This distortion occurs with displays designed for a fixed viewing position. The stereoscopic image appears to follow the observer when the observer changes viewing position: the same images are presented to the eyes regardless of the viewing position, which introduces wrong head parallax and may result in wrong perception of object distances [8], [36]. By using head or eye tracking to assess the viewing position of the user, this effect can be minimized or even totally suppressed.

8.5.2 Crosstalk as inter-perspective aliasing and ghosting

Crosstalk is perceived as shadows or copies of the original images at different places (ghosts) and as double contours. This inter-perspective aliasing, described as crosstalk and ghosting, comes from imperfect image separation in the display device and depends on the technology used. Especially in CRTs, phosphor persistence across the alternating left and right images leads to ghosting through leakage. An investigation has objectively evaluated the relation between phosphor afterglow and shutter leakage [53]. Crosstalk is a function of contrast, disparity values and screen parallax: the higher these are, the more crosstalk is introduced [49]. Another source of this artefact might be an incorrect head position in combination with linear polarization techniques. Crosstalk tends to be the artefact with the most influence on the perceived image quality and three-dimensional sensation. One approach for dealing with this artefact is to use circular polarization techniques [52].
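A first-order crosstalk model (a common simplification, not the measurement model of [49] or [53]) mixes a fraction of the unintended view into each eye's image; ghost contours appear exactly where the left and right images differ, i.e. where screen parallax is large.

```python
def perceived(left, right, leak):
    """First-order crosstalk: each eye receives a fraction `leak` of the
    unintended view.  Where left and right pixels differ (large screen
    parallax), the leaked fraction is visible as a ghost contour."""
    l_seen = [(1 - leak) * l + leak * r for l, r in zip(left, right)]
    r_seen = [(1 - leak) * r + leak * l for l, r in zip(left, right)]
    return l_seen, r_seen
```

With 10 % leakage, a white edge present only in the right image shows up in the left eye at 10 % intensity, which is the double contour the text describes; for identical left/right content (zero parallax) the same leakage is invisible.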

8.5.3 Puppet theatre effect

The puppet theatre effect is caused by inconsistency between binocular and perspective depth cues. In a 2D scene, the human brain relies on the perspective cues: if two objects have the same absolute size on the display, the one which appears closer is perceived as smaller. When binocular and perspective depth cues suggest different depths, the binocular ones usually take precedence, and the depth-scaled object is perceived to have a wrong (usually miniaturized) apparent size. The most common manifestation of the effect is when images of people (objects with familiar size) appear as small puppets [56]. The perceptibility of the puppet theatre effect is correlated with prior knowledge about the object size – the more familiar an object is, the more susceptible it is to the effect. In less common cases an object can appear with exaggerated size as well. In dense-depth 3D video the same causes can also introduce the cardboard effect [44]. A method for objective evaluation of the puppet theatre effect is introduced in [42].
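The miniaturization can be illustrated with a simplified size-scaling relation (our sketch, not the geometrical model of [56] or [42]): for a fixed angular size on the retina, perceived linear size grows with perceived distance, so binocular cues that pull an object closer also shrink its judged size.

```python
import math

def apparent_size(angular_size_deg, perceived_dist):
    """Perceived linear size of an object subtending a fixed visual
    angle, as a function of its perceived distance (simple trigonometry;
    an Emmert's-law style scaling).  If binocular cues place the object
    closer than perspective suggests, it is judged smaller - the puppet
    theatre effect."""
    return 2 * perceived_dist * math.tan(math.radians(angular_size_deg) / 2)
```

A figure subtending the same visual angle is judged four times smaller when binocular disparity places it at 0.5 m instead of the 2 m that the perspective cues suggest.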

8.5.4 Picket fence effect and image flipping

Some autostereoscopic displays use a parallax barrier for spatial multiplexing of the images intended for each eye of the observer. For some observation angles the gaps between the pixels are predominantly visible. When the observer moves laterally in front of the screen, the effect is perceived as vertical banding (brighter and darker vertical stripes) over the image. An example of the picket fence effect is shown in Figure 37. Image flipping is perceived when the object is represented by a limited number of observations (views) and there is a sharp transition between the visibility zones of each view. The object is seen as suddenly changing its rotation, compared to the continuous head parallax seen in real-world scenes. Both the picket fence effect and image flipping are caused by the optical filter used for spatial multiplexing of the views, and both effects can be reduced by introducing a slant of the optical filter with respect to the pixels on the screen [70]. Tracking the position of the user with respect to the screen can also help to reduce these artefacts [8], [36].


Figure 37: Picket fence artefacts

8.5.5 Lattice artefacts (viewing angle dependent binocular aliasing)

As discussed in Sub-section 8.2.3, the visible sub-pixels of a multiview display with a slanted optical filter appear on a non-orthogonal grid. This grid is not the same for different observation positions, i.e. the position of the visible sub-pixels changes with the observation angle. This is seen as variable brightness across neighbouring sub-pixels, which changes with the observation angle, and objects look as if observed through a thin lattice. Typically, the eyes of the observer are in the observation zones of two different views and see two different sets of sub-pixels. These two sets should create two different images, which makes the stereoscopic effect possible. However, the sub-pixels of these images are placed on two different non-orthogonal lattices, and the result is perceived as a binocular artefact. The differences in the lattices are perceived as disparities, and a ghost image of a thin lattice, floating above the display, is created. The effect is mostly visible when the screen displays 2D images or objects of a 3D scene which appear close to the screen surface. For objects with pronounced apparent depth, the inter-view crosstalk prevails and the "lattice" is less visible. In Figure 38 we give two stereoscopic pairs which illustrate the artefact as it is seen by each eye of a user of a multiview display.



Figure 38: Stereoscopic pairs exhibiting lattice artefacts, a) and b) on a slanted lenticular sheet, c) and d) on a slanted parallax barrier. Notably, the position of the grid changes with the observation angle.


8.5.6 Accommodation-convergence rivalry

Recall that in human vision two oculo-motor systems are responsible for providing a clear binocular image. One is accommodation, i.e. the change of the ocular focus, which is driven by blur in the retinal image. The other is vergence, the change of ocular alignment in response to retinal disparity. These two are also linked by a reflex, and a change in one system invokes a change in the other: when the eyes converge on a certain position, their lenses automatically focus on the position where the object is expected to reside [20]. Most 3D displays project the images meant for the different eyes onto one plane and separate them by temporal or spatial multiplexing [14]. The disparities presented to the eyes of the observer suggest various apparent depths of the presented objects, and in order to fuse those disparities, the eyes align accordingly. However, the focal length required to keep those images in focus is always the same – the distance between the observer and the display. Due to the accommodation-convergence reflex, the eyes try to focus on the apparent depth of the objects instead of the real one. The result is that objects appearing close to the observer appear out of focus. The effect is proportional to the difference between the real and apparent depth of the images, and is more pronounced for apparent depths suggesting objects close to the eyes of the observer. The diameter of the pupil also changes the focus of the retinal image and affects the accommodation reflex. As a result, in brighter scenes the accommodation-convergence rivalry is less pronounced. The effect of scene brightness on the rivalry is thoroughly studied in [80].
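The mismatch can be quantified with similar-triangle geometry (our sketch; sign convention assumed here: positive screen parallax means uncrossed disparity, object behind the screen). The eyes converge at the distance computed below, while accommodation stays locked at the viewing distance.

```python
def vergence_distance(viewing_dist_cm, eye_sep_cm, screen_parallax_cm):
    """Distance at which the eyes converge for a given on-screen
    parallax, from similar triangles through the screen plane.
    Parallax 0 -> convergence at the screen; negative (crossed)
    parallax -> in front of it; positive (uncrossed) -> behind it.
    Accommodation, meanwhile, remains at viewing_dist_cm."""
    return viewing_dist_cm * eye_sep_cm / (eye_sep_cm - screen_parallax_cm)
```

At a 40 cm viewing distance with 6.5 cm eye separation, a crossed parallax equal to the eye separation makes the eyes converge at 20 cm, half-way to the screen, while the focus demand stays at 40 cm; this growing gap is the rivalry described above.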

9 Stereoscopic artefacts for different content delivery scenarios

9.1 Simulation channel for mobile 3DTV artefacts

Not all of the artefacts described in the previous sections are likely to affect a mobile 3DTV system. Some of them would never occur on a mobile device due to the technology used (e.g. LCD display, DVB-H transmission). Some fall beyond the scope of our project, for example contrast range and colour representation problems of the display, as these are addressed by the display manufacturer. In this chapter we identify the most common artefacts to be expected in the data-flow of mobile 3DTV content. The perceptibility of each artefact can be estimated through subjective tests. The material for such tests should exhibit various artefacts with different amounts of impairment. When doing comparative tests, it is important that the impaired videos are versions of the same video stream. The test material can be generated by using an artefact simulation channel. Such a channel should be able to introduce an arbitrary combination of artefacts into a video, with a controlled amount of impairment for each artefact. The artefacts are identified stage by stage. Where possible, the artefacts in a stage are organized in groups. Each group consists of artefacts that have similar origins and are likely to be addressed in one signal processing block of the artefact simulation channel. A detailed description of each group is presented in Section 9.2. An example artefact simulation channel is presented in Figure 39. The first block simulates artefacts caused by sensor limitations. The degraded scene observation is then sent to a block which simulates geometric distortions such as the ones caused by the camera optics. The next two blocks add global spatial and temporal differences between the video channels, simulating artefacts caused by the multi-camera topology and temporal misalignment. The next two blocks simulate spatial and temporal artefacts caused by coding. Then, transmission losses are simulated in the encoded stream. For the case of dense-depth video representation, format conversion artefacts are added. Finally, visualisation artefacts are added, either independent of the position of the observer or, alternatively, for a given observation position. A similar processing order might be used for an artefact mitigation channel as well.

Sensor → Optical calibration (capture, each camera) → Inter-camera calibration → Temporal calibration (capture, inter-channel) → Image filter → Temporal filter (coding) → Channel simulation (transmission) → Format conversion → Visualisation (static) → Visualisation (dynamic, given the position of the observer)

Figure 39: Artefact simulation/mitigation channel
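The staged channel of Figure 39 amounts to a composition of per-stage simulators, each taking a video and returning an impaired version. The sketch below (our illustration; stage names and the toy stages are hypothetical) shows this composition pattern, which also makes it easy to enable an arbitrary combination of artefacts with controlled impairment.

```python
def simulation_channel(stages):
    """Compose per-stage artefact simulators into one channel, in the
    block order of Figure 39.  Each stage maps a video to an impaired
    video; omitting a stage omits its artefact."""
    def run(video):
        for stage in stages:
            video = stage(video)
        return video
    return run

# Toy stages operating on a list of frame (pixel) values:
add_noise = lambda v: [x + 0.1 for x in v]   # sensor-limitation stand-in
quantize  = lambda v: [round(x) for x in v]  # coding stand-in

channel = simulation_channel([add_noise, quantize])
```

Because every stage has the same interface, the same composition can be reused in reverse roles for an artefact mitigation channel, as the text notes.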

9.2 List of mobile 3DTV artefacts

Video in a mobile 3DTV channel is likely to be encoded in one of the two most widespread representations – multi-view video, or single-channel video augmented by a dense depth representation. The groups of artefacts likely to appear in each of the two representations are listed in Table 2.


Table 2 – Stereoscopic artefacts in a mobile 3DTV system

| Data-flow stage | Group | Artefact type | Multiview | Dense depth |
|---|---|---|---|---|
| Capture | Sensor | Noise, aliasing | X | X |
| Capture | Optical | Blur, barrel distortion, pincushion, keystone distortion, vignetting, aberration | X | X |
| Capture | Inter-camera | Vertical disparity, depth plane curvature | X | |
| Capture | Inter-camera | Cardboard effect, puppet theatre effect | X | X |
| Capture | Temporal | Motion blur | X | X |
| Capture | Temporal | Temporal mismatch | X | |
| Coding | Transformation based | Blocking, mosaic patterns, staircase effect, ringing, colour bleeding | X | X |
| Coding | Transformation based | Depth bleeding/ringing | | X |
| Coding | Temporal | Motion compensation artefacts, mosquito noise | X | X |
| Coding | Temporal and spatial | Cross-distortion | X | |
| Transmission | | Data loss, propagating and non-propagating errors | X | X |
| Representation and format conversion | | Temporal inconsistency of depth estimation | | X |
| Representation and format conversion | | Disocclusion | | X |
| Representation and format conversion | | Perspective-stereopsis rivalry ("WOW" artefacts) | | X |
| Visualisation | Static observer | Aliasing caused by non-rectangular grid | X | X |
| Visualisation | Static observer | View interspersing (ghosting) | X | X |
| Visualisation | Static observer | Binocular aliasing | X | X |
| Visualisation | Static observer | Accommodation-convergence rivalry | X | X |
| Visualisation | Moving observer | Picket fence effect | X | X |
| Visualisation | Moving observer | Image flipping (pseudoscopy) | X | X |
| Visualisation | Moving observer | Shear distortion | X | X |
| Visualisation | Moving observer | Angle dependent colour representation | X | X |

The capturing process for mobile 3DTV video is similar to the one for a 3DTV system targeting large displays. One thing which separates a video broadcast system from a video conferencing one is that capture for the former is done off-line and non-real-time, so significant processing power might be spent on producing the best output possible. The stages of the digital capture process which produce artefacts are the following:
- Hardware sensor limitations – various sources of noise, and aliasing during the sensor readout and the data processing required for noise removal and colour interpolation.
- Optical distortions – caused by the imperfect optics of each capturing camera, as a result of missing or poor camera calibration. In this group are blur, barrel distortion, pincushion, keystone distortion, vignetting, and artefacts caused by aberration.
- Inter-camera distortions – 3D video is most often captured by more than one camera, and imprecise camera positioning or bad inter-camera calibration is a source of artefacts. Artefacts such as the cardboard effect and the puppet theatre effect are most likely caused by bad inter-camera calibration. Vertical disparity and depth plane curvature artefacts also fall into this group, but are possible only in multi-channel video representation. However, even if the resulting 3D video uses a dense depth map, bad inter-camera calibration will result in poor quality of the depth map.
- Temporal distortions – artefacts that occur over a group of frames; the signal processing block needs to collect a number of frames in order to simulate/mitigate them. Motion blur is the most common artefact here. Temporal mismatch is possible only in a multi-channel video setup.

While the visibility of coding artefacts is quite well studied for the 2D case, their impact on 3D vision is yet to be determined. Transform-caused artefacts come from the transforms and quantisation used for compressing the video stream. Blocking, mosaic patterns, staircase effect, ringing, and colour bleeding artefacts are in this group. All of them are well visible, and as they overlay structural changes on the image, they might destroy depth cues and even create misleading ones. Depth bleeding and depth ringing are artefacts specific to coding the depth map of a scene, and as such they exist only in dense-depth-based 3D video representations. Notably, such artefacts can be mitigated by using structural information from the 2D scene. Temporal coding artefacts appear as a result of transform/quantisation over time. Temporal inconsistency such as mosquito noise is the most common artefact in this group. Artefacts caused by imprecise motion prediction are also possible. This group of artefacts can appear both in multi-view and in dense-depth 3D video.


Cross-distortion is an artefact caused by asymmetrical stereo-video coding. The asymmetry might be either in the spatial domain (one channel with lower resolution) or in the temporal domain (one channel with lower frame rate). The effect of spatial or temporal subsampling of one channel is not yet thoroughly studied. Asymmetrical coding is typically applied to multi-view video only. The presence of artefacts generated in the transmission stage depends very much on the coding algorithms used and on how the decoder copes with channel errors. In DVB-H transmission the most common are burst errors [100], which result in packet losses distributed in tight groups. In MPEG-4-based encoders packet losses might result in propagating or non-propagating errors, depending on where the error occurs with respect to the I frames and on the ratio between I and P frames. Error patterns of the DVB-H channel can be obtained by field measurements and then used for simulation of channel losses [100], [101], [102].
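A two-state Gilbert-Elliott model is a common way to reproduce such clustered losses in simulation; the sketch below is our illustration with illustrative parameters, not the field-measured error patterns of [100], [101], [102].

```python
import random

def gilbert_elliott(n, p_gb, p_bg, loss_in_bad=1.0, seed=0):
    """Two-state Gilbert-Elliott loss process: the channel moves from
    'good' to 'bad' with probability p_gb and back with p_bg; packets
    are lost (True) only while the channel is 'bad', so losses cluster
    in tight groups - a burst-error pattern like that of DVB-H."""
    rng = random.Random(seed)
    bad = False
    losses = []
    for _ in range(n):
        bad = (rng.random() < p_gb) if not bad else (rng.random() >= p_bg)
        losses.append(bad and rng.random() < loss_in_bad)
    return losses
```

With a small p_gb and a moderate p_bg, most packets arrive but the lost ones come in runs of average length 1/p_bg; feeding such a pattern to the decoder shows whether an error hits before or after an I frame and thus whether it propagates.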

Format conversion artefacts occur during the conversion from the dense-depth representation used for broadcast to the multiview one needed by the display. Most common here are disocclusion artefacts, which are more pronounced when rendering observations at angles far from the central observation point, and less pronounced when layered depth images are used. Perspective-stereopsis rivalry occurs if the conversion over-exaggerates the depth levels in the depth map. Temporal inconsistency of the depth estimation creates artefacts similar to mosquito noise and depth ringing.
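Why disocclusions are inherent to this conversion can be seen in a minimal 1-D sketch of depth-image-based rendering (our toy version; real renderers process pixels in depth order and inpaint the holes): shifting each pixel in proportion to its depth leaves target positions that no source pixel maps to.

```python
def warp_with_depth(colors, depths, shift_scale):
    """Minimal 1-D depth-image-based rendering: each pixel moves by an
    amount proportional to its depth, as when synthesising a side view
    from centre view + depth map.  Positions nothing maps to are
    disocclusions (None), which the renderer must then fill."""
    out = [None] * len(colors)
    for x, (c, d) in enumerate(zip(colors, depths)):
        nx = x + round(shift_scale * d)
        if 0 <= nx < len(out):
            out[nx] = c  # later (right-hand) pixels overwrite earlier ones
    return out
```

In the test case below, the jump in depth between the second and third pixels opens a hole at the position the foreground pixels vacate; larger rendering angles (larger shift_scale) open larger holes, matching the observation above.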

Artefacts in the visualisation of mobile 3DTV are caused by limitations of the display technology used. The mobile 3DTV system in our project will utilize an autostereoscopic display. As such displays use spatial multiplexing of the channels, the visibility of all artefacts depends on the position of the observer. In any case, knowing the observation angle and the distance between the observer and the display helps in both simulation and mitigation of such artefacts. Still, two groups can be identified, based on how evident the artefact is when the observer is not moving:
- Static observer – these artefacts are seen regardless of the observation angle. The following artefacts fall into this group: aliasing caused by sub-sampling on a non-rectangular grid, view interspersing, binocular interspersing and accommodation-convergence rivalry. All of these are differently pronounced for different observation angles and distances, but are always perceived as the same type of artefact.
- Moving observer – these artefacts are most obvious for a moving observer, for example the Moiré-like pattern seen on an autostereoscopic display exhibiting the picket fence effect, or the unnatural image parallax causing shear distortion. Others appear only for some observation angles, such as image flipping and angle-dependent colour representation. The artefacts in this group are very difficult to simulate, but much easier to mitigate for a given position of the observer.


9.3 Conclusion

Objective estimation of the subjective quality of 3D video is of crucial importance for the successful deployment of such systems. In this report we identified and described the artefacts which could affect a 3D scene. We discussed how the different stages of mobile 3DTV content delivery could affect the subsystems of human 3D vision. Aiming to recognize all possible artefacts, we matched the stages of the 3D video data-flow against the subsystems of human 3D vision and created an extensive list of artefacts which might affect any 3D video distribution system. Some of the artefacts in our list are rare or not likely to occur in mobile 3DTV content distribution. In the last chapter we discussed which artefacts could affect a mobile 3DTV system featuring H.264 AVC encoding, a DVB-H transmission channel and a portable autostereoscopic display. The occurrence of some artefacts depends on the selected 3D video representation – multi-channel video or dense depth representation – and we examined which of the identified artefacts might appear in each. We organized the artefacts in groups, and based on these groups we proposed a processing channel for artefact simulation. Such simulation will be used to subjectively estimate the user acceptance of artefacts.


[1] ISO/IEC JTC1/SC29/WG11, ―Study Text of ISO/IEC 14496-10:2008/FPDAM 1 Multiview Video Coding‖, Doc. N9760, Archamps, France, May 2008. [2] ISO/IEC JTC1/SC29/WG11, ―Joint Multiview Video Model (JMVM) 8‖, Doc. N9762, Archamps, France, May 2008. [3] Yao Wang, S. Wenger, Jiantao Wen, A. Katsaggelos, ―Error resilient video coding techniques,‖ IEEE Signal Processing Magazine, pp. 61-82, vol. 17, no. 4, July 2000. [4] Z. Tan and A. Zakhor, ―Error control for video multicast using hierarchical FEC,'' in Proc. of the Int. Conf. on Image Processing, Kobe, Japan, October 1999, vol. 1, pp. 401-405. [5] G. J. Woodgate, J. Harrold, ―Autostereoscopic display technology for mobile 3DTV applications‖, in Proc. SPIE Vol.6490A-19, 2007 [6] Sharp Laboratories of Europe, http://www.sle.sharp.co.uk/research/optical_imaging/3d_research.php

website,

[7] S.Uehara, T.Hiroya, H. Kusanagi; K. Shigemura, H.Asada, ―1-inch diagonal transflective 2D and 3D LCD with HDDP arrangement‖, in Proc. SPIE-IS&T Electronic Imaging 2008, Stereoscopic Displays and Applications XIX, Vol. 6803, San Jose, USA, January 2008 [8] L. Meesters, W. IJsselsteijn, P. Seuntiëns, ―A survey of perceptual evaluations and requirements of three-dimensional TV,‖ IEEE Trans. Circuits and Systems for Video Technology, vol. 14, No. 3, 2004, pp. 381 – 391. [9] W. IJsselsteijn, P. Seuntiens and L. Meesters, ―Human factors of 3D displays‖, in (Schreer, Kauff, Sikora, edts.) 3D Video Communication, Wiley, 2005. [10] A.Boev, A. Gotchev, K. Egiazarian, A. Aksay and G. Akar, ―Towards compound stereovideo quality metric: a specific encoder-based framework‖. Proc. of the IEEE Southwest Symposium on Image Analysis and Interpretation (SSIAI 2006). [11] Encyclopedia Britannica online, Merriam-Webster’s online dictionary, available at http://www.britannica.com/dictionary?book=Dictionary&va=artefact [12] A. Alatan, Y. Yemez, U. Gudukbay, X. Zabulis, K. Muller, C. Erdem, C. Weigel, A., ―Scene Representation Technologies for 3DTV—A Survey,‖ Circuits and Systems for Video Technology, IEEE Transactions on , vol.17, no.11, pp.1587-1605, Nov. 2007 [13] A. Smolic, K. Mueller, N. Stefanoski, J. Ostermann, A. Gotchev, G.B. Akar, G. Triantafyllidis, A.Koz, ―Coding Algorithms for 3DTV—A Survey,‖ Circuits and Systems for Video Technology, IEEE Transactions on , vol.17, no.11, pp.1606-1621, Nov. 2007 [14] L. Onural, T. Sikora, J. Ostermann, A. Smolic, M. R. Civanlar and J. Watson, ―An Assessment of 3DTV Technologies,‖ NAB Broadcast Engineering Conference Proceedings 2006, pp. 456-467, Las Vegas, USA, April 2006. [15] P. Benzie, J. Watson, P. Surman, I. Rakkolainen, K. Hopf, H. Urey, V. Sainov, C. von Kopylow, "A Survey of 3DTV Displays: Techniques and Technologies," Circuits and Systems for Video Technology, IEEE Transactions on , vol.17, no.11, pp.1647-1658, Nov. 
2007 [16] Lin, C., Ke, C., Shieh, C., and Chilamkurti, N. K. 2006. The Packet Loss Effect on MPEG Video Transmission in Wireless Networks. In Proceedings of the 20th international Conference

51

MOBILE3DTV

D5.1 Classification of stereoscopic artefacts

on Advanced information Networking and Applications - Volume 1 (Aina'06) - Volume 01 (April 18 - 20, 2006). AINA. IEEE Computer Society, Washington, DC, 565-572. [17] A. Goldsmith, Wireless Communications. Cambridge University Press, 2005. [18] P. Surman, K. Hopf, I. Sexton, W.K. Lee, R. Bates, ―Solving the 3D problem - The history and development of viable domestic 3-dimensional video displays‖, In (Haldun M. Ozaktas, Levent Onural, Eds.), Three-Dimensional Television: Capture, Transmission, and Display (ch. 13), Springer Verlag, 2007 [19] Pastoor, ―3D displays‖, in (Schreer, Kauff, Sikora, edts.) 3D Video Communication, Wiley, 2005. [20] Wandell, B.A., Foundations of vision, Sinauer Associates, Inc, Sunderland, Massachusetts, USA, 1995. [21] D. Chandler, ―Visual Perception (Introductory Notes for Media Theory Students)‖, MSC portal site, University of Wales, Aberystwyth, available at http://www.aber.ac.uk/media/sections/image.html [22] M. Wexler and J. Boxtel, ―Depth perception by the active observer―, Trends in Cognitive Sciences, 9, 431-438, Sept, 2005 [23] M. McCauley and T. Sharkey, ―Cybersickness: Perception of Self-Motion in Virtual Environments‖ in Presence: Teleoperators and Virtual Environments, 1(3), 311-318., 1992. [24] Julesz, B. Foundations of Cyclopean Perception, The University of Chicago Press, Chicago, 1971. [25] M. Yuen, ―Coding Artefacts and Visual Distortions‖, in (H. Wu. K. Rao eds), Digital Video Image Quality and Perceptual Coding, ISBN 9780824727772 , CRC Press, 2005 [26] M. Yuen and H. R. Wu, ―A survey of MC/DPCM/DCT video coding istortions,‖ Signal Processing, vol. 70, no. 3, pp. 247–278, Nov. 1998. [27] Z. Wang, , A. Bovik, H. Sheikh and E. Simoncelli, ―Image quality assessment: From error visibility to structural similarity‖, IEEE Trans. Image Processing, vol. 13, No. 4, 2004, pp. 600612 [28] ITU-T P.910, Subjective video quality assessment methods for multimedia applications, Recommendation ITU-T P.910, ITU Telecom. 
Standardization Sector of ITU, September 1999 [29] S. Jumisko-Pyykkö and J. Häkkinen, ―Evaluation of subjective video quality of mobile devices‖, in MULTIMEDIA ’05: Proceedings of the 13th annual ACM international conference on Multimedia. New York, NY, USA: ACM Press, pp. 535–538, 2005. [30] S. Winkler and C. Faller, ―Audiovisual quality evaluation of low- bitrate video‖, in Proceedings of SPIE Human Vision and Electronic Imaging, vol. 5666, San Jose, CA, January 16–20, pp. 139–148, 2005 [31] J. Häkkinen, M. Liinasuo, J. Takatalo, and G. Nyman, ―Visual comfort with mobile stereoscopic gaming‖, Proceedings of SPIE, vol. 6055, p. 60550A, 2006.

52

MOBILE3DTV

D5.1 Classification of stereoscopic artefacts

[32] Joly, N. Montard, and M. Buttin, "Audio-visual quality and interactions between television audio and video", Signal Processing and its Applications, Sixth International Symposium on, vol. 2, 2001.
[33] M. Halle, "Autostereoscopic displays and computer graphics", International Conference on Computer Graphics and Interactive Techniques, 2005.
[34] T. Ebrahimi and F. Pereira, The MPEG-4 Book, Prentice Hall PTR, 2002.
[35] H. Knoche, J. D. McCarthy, and M. A. Sasse, "Can small be beautiful?: assessing image resolution requirements for mobile tv", in MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia, New York, NY, USA: ACM Press, pp. 829–838, 2005
[36] AJ. Woods, T. Docherty and R. Koch, "Image distortions in stereoscopic video systems", in Proceedings of the SPIE Vol. 1915, Stereoscopic Displays and Applications, San Jose, California, 23 September 1993
[37] JY. Son, Y. Gruts, J. Chun et al., "Distortion analysis in stereoscopic images", in Optical Engineering Vol. 41, Society of Photo-Optical Instrumentation Engineering, Seoul, Korea, 1 March 2002
[38] LB. Stelmach, WJ. Tam, F. Speranza et al., "Improving the visual comfort of stereoscopic images", in Proceedings of the SPIE-IS&T Electronic Imaging Vol. 5006, Stereoscopic Displays and Virtual Reality Systems X, Ottawa, Canada, 30 May 2003
[39] MJ. Meesters, WA. IJsselsteijn and PJ. Seuntiens, "A survey of perceptual evaluations and requirements of three-dimensional TV", in Circuits and Systems for Video Technology, IEEE Transactions on, Vol. 14, Issue 3, pp. 381–391, March 2004
[40] IP. Howard and BJ. Rogers, "Binocular Vision and Stereopsis", Oxford University Press, New York, Oxford, 1995
[41] S. Pastoor, "Human factors of 3D imaging: Results of recent research at Heinrich-Hertz-Institut Berlin", 2nd International Display Workshop, Hamamatsu, pp. 69–72, 1995
[42] H. Yamanoue, "The relation between size distortion and shooting conditions for stereoscopic images", in Journal of the SMPTE, pp. 225–232, 1997
[43] K. Hopf, "An autostereoscopic display providing comfortable viewing conditions and high degree of telepresence", in IEEE Transactions on Circuits and Systems for Video Technology 10, pp. 359–365, 2000
[44] A. Schertz, "Source coding of stereoscopic television pictures", IEEE International Conference on Image Processing and its Applications, pp. 462–464, 1992
[45] AM. Ariyaeeinia, "Distortions in stereoscopic displays", in SPIE Vol. 1669, Stereoscopic Displays and Applications III, 30 June 1992
[46] M. Lambooij, W. IJsselsteijn and I. Heynderickx, "Visual discomfort in stereoscopic displays: a review", in Proceedings of the SPIE-IS&T Electronic Imaging Vol. 6490, Stereoscopic Displays and Virtual Reality Systems XIV, 5 March 2007
[47] S. Pastoor, "Human factors of 3D displays in advanced image communication", in Displays 14, pp. 150–157, 1993


[48] P. Surman, I. Sexton, R. Bates, WK. Lee, M. Craven and KC. Yow, "Beyond 3D television: The multi-modal, multi-viewer TV-system of the future", in SID/SPIE FLOWERS 2003, pp. 208–210, 2003
[49] S. Pastoor, "Human factors of 3D imaging: Results of recent research at Heinrich-Hertz-Institut Berlin", 2nd International Display Workshop, Hamamatsu, pp. 69–72, 1995
[50] AJ. Chang, HJ. Kim, JW. Choi et al., "Ghosting reduction method for color anaglyphs", in Proceedings of the SPIE-IS&T Electronic Imaging Vol. 6903, Stereoscopic Displays and Virtual Reality Systems XIX, 29 Feb 2008
[51] A. Woods and S. Tan, "Characterising sources of ghosting in time-sequential stereoscopic video displays", in Proceedings of the SPIE 4660, pp. 66–77, 2002
[52] J. Konrad, B. Lacotte and E. Dubois, "Cancellation of image crosstalk in time-sequential displays of stereoscopic video", in IEEE Transactions on Image Processing 9, pp. 897–908, 2000
[53] AJ. Woods, S. Stanley and L. Tan, "Characterizing sources of ghosting in time-sequential stereoscopic video displays", in Proceedings of the SPIE-IS&T Electronic Imaging Vol. 4660, Stereoscopic Displays and Virtual Reality Systems IX, 23 May 2002
[54] AJ. Woods and T. Rourke, "Ghosting in anaglyphic stereoscopic images", in Proceedings of the SPIE-IS&T Electronic Imaging Vol. 5291, Stereoscopic Displays and Virtual Reality Systems XI, 21 May 2004
[55] I. Sexton and P. Surman, "Stereoscopic and autostereoscopic display systems", in IEEE Signal Processing Magazine, pp. 85–99, 1999
[56] H. Yamanoue, M. Okui and F. Okano, "Geometrical analysis of puppet-theater and cardboard effects in stereoscopic HDTV images", in Circuits and Systems for Video Technology, IEEE Transactions on, Vol. 16, Issue 6, pp. 744–752, June 2006
[57] K. Masaoka, A. Hanazato, M. Emoto et al., "Spatial distortion prediction system for stereoscopic images", in Journal of Electronic Imaging Vol. 15(1), 1 January 2006
[58] VV. Petrov and KA. Grebenyuk, "Optical correction of depth plane curvature image distortion", in Proc. of SPIE Vol. 6637, XV International Symposium on Advanced Display Technologies, 22 May 2007
[59] U. Schmidt, "Professionelle Videostudiotechnik", Axel Springer Verlag, Berlin, 4th edition, May 2005
[60] C. Fehn, P. Kauff, M. Op de Beeck, F. Ernst, W. IJsselsteijn, M. Pollefeys, L. Van Gool, E. Ofek, and I. Sexton, "An evolutionary and optimized approach on 3D-TV", in Proc. Int. Broadcast Conf., Amsterdam, The Netherlands, Sep. 2002, pp. 357–365.
[61] C. Fehn, "3D-TV using depth-image-based rendering (DIBR)", in Proc. Picture Coding Symp., San Francisco, CA, USA, Dec. 2004.
[62] Text of ISO/IEC FDIS 23002-3 Representation of Auxiliary Video and Supplemental Information, ISO/IEC JTC1/SC29/WG11, Jan. 2007, Doc. N8768, Marrakech, Morocco.
[63] Text of ISO/IEC 13818-1:2003/FDAM2 Carriage of Auxiliary Data, ISO/IEC JTC1/SC29/WG11, Jan. 2007, Doc. N8799, Marrakech, Morocco.



[64] C. Fehn, N. Atzpadin, M. Muller, O. Schreer, A. Smolic, R. Tanger and P. Kauff, "An Advanced 3DTV Concept Providing Interoperability and Scalability for a Wide Range of Multi-Baseline Geometries", Image Processing, 2006 IEEE International Conference on, pp. 2961–2964, 8–11 Oct. 2006
[65] "X3D-23 Users' Manual", NewSight GmbH, Carl-Pulfrich-Str. 1, 07745 Jena, 2006
[66] Sharp Laboratories of Europe website, http://www.sle.sharp.co.uk/research/optical_imaging/3d_research.php
[67] G. J. Woodgate and J. Harrold, "Autostereoscopic display technology for mobile 3DTV applications", in Proc. SPIE Vol. 6490A-19 (Stereoscopic Displays and Applications XVIII), 2007
[68] S. Uehara, T. Hiroya, H. Kusanagi, K. Shigemura and H. Asada, "1-inch diagonal transflective 2D and 3D LCD with HDDP arrangement", in Proc. SPIE-IS&T Electronic Imaging 2008, Stereoscopic Displays and Applications XIX, Vol. 6803, San Jose, USA, January 2008
[69] J. Konrad and P. Agniel, "Artefact reduction in lenticular multiscopic 3-D displays by means of anti-alias filtering", in Proc. SPIE Stereoscopic Displays and Virtual Reality Systems, vol. 5006, pp. 336–347, Jan. 2003
[70] C. Van Berkel and J. Clarke, "Characterisation and optimisation of 3D-LCD module design", in Proc. SPIE Vol. 2653, Stereoscopic Displays and Virtual Reality Systems IV (Fisher, Merritt, Bolas, eds.), pp. 179–186, May 1997
[71] A. Schmidt and A. Grasnick, "Multi-viewpoint autostereoscopic displays from 4D-vision", in Proc. SPIE Photonics West 2002: Electronic Imaging, vol. 4660, pp. 212–221, 2002
[72] A. Punchihewa and D.G. Bailey, "Artefacts in Image and Video Systems: Classification and Mitigation", Institute of Information Sciences & Technology, Massey University
[73] D.B. Diner, "A new definition of Orthostereopsis for 3-D Television", IEEE International Conference on Systems, Man and Cybernetics, pp. 1053–1058, October 1991
[74] A. Aksay, C. Bilen and G. Bozdagi Akar, "Subjective evaluation of effects of spectral and spatial redundancy reduction on stereo images", in Proc. 13th European Signal Processing Conference, EUSIPCO-2005, Turkey, Sept. 2005
[75] WJ. Tam, "Image and depth quality of asymmetrically coded stereoscopic video for 3DTV", Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6), 23rd Meeting: San Jose, California, USA, 21–27 April 2007
[76] U. Schmidt, "Professionelle Videostudiotechnik", Axel Springer Verlag, Berlin, 4th edition, May 2005
[77] "Temperatur rauschen iso400 R7308112", Wikimedia Commons, published under the terms of the GNU Free Documentation License, available online at http://de.wikipedia.org/wiki/Bild:Temperatur_rauschen_iso400_R7308112_wp.png
[78] "Chromatic aberration (comparison)", Wikimedia Commons, published under the terms of the GNU Free Documentation License, available online at http://de.wikipedia.org/wiki/Bild:Chromatic_aberration_%28comparison%29.jpg


[79] M. Yuen and HR. Wu, "A survey of hybrid MC/DPCM/DCT video coding distortions", Signal Processing 70, pp. 247–278, 1998
[80] R. Suryakumar, "Study of the dynamic interactions between vergence and accommodation", PhD Thesis, Faculty of Science, U. Waterloo, Ontario, Canada, 2005.
[81] FX. Coudoux and M. Gazalet, "Reduction of Color Bleeding for 4:1:1 Compressed Video", in IEEE Transactions on Broadcasting, Vol. 51, No. 4, December 2005
[82] G. de Haan, "An overview of flaws in emerging television displays and remedial video processing", in IEEE Transactions on Consumer Electronics, Vol. 47, No. 3, August 2001
[83] "Technology, content creation notes", White paper, Philips Electronics Nederland, 2006. Available online at http://www.business-sites.philips.com/3dsolutions/Downloads/Index.html
[84] C. Theobald, S. Wuermlin, E. de Aguiar and C. Niederberger, "New Trends in 3D Video", in EUROGRAPHICS 2007, Tutorial 5, 2007.
[85] ISO/IEC JTC1/SC29/WG11, "Text of ISO/IEC FDIS 23002-3 Representation of Auxiliary Video and Supplemental Information", Doc. N8768, Marrakech, Morocco, January 2007.
[86] ISO/IEC JTC1/SC29/WG11, "Text of ISO/IEC 13818-1:2003/FDAM2 Carriage of Auxiliary Data", Doc. N8799, Marrakech, Morocco, January 2007.
[87] ITU-T and ISO/IEC JTC 1, "Advanced video coding for generic audiovisual services", ITU-T Rec. H.264 and ISO/IEC 14496-10 AVC, 2003, most recent version: 2005.
[88] P. Bourdon, B. Augereau, C. Olivier and C. Chatellier, "MPEG-4 compression artefacts removal on color video sequences using 3D nonlinear diffusion", in IEEE International Conference on Acoustics, Speech, and Signal Processing 2004, Vol. 3, pp. 729–732, 2004
[89] L. Chang, YP. Tan and HC. Chua, "Detection and Removal of Rainbow Effect Artifacts", Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 1, pp. I-297–I-300, Sept. 2007
[90] JW. Lee, HS. Le, RH. Park and S. Kim, "Reduction of Dot Crawl and Rainbow Artifacts in the NTSC Video", Consumer Electronics, IEEE Transactions on, vol. 53, no. 2, pp. 740–748, May 2007
[91] A.J. Blanksby, M.J. Loinaz, D.A. Inglis and B.D. Ackland, "Noise performance of a color CMOS photogate image sensor", IEEE Int. Electron Devices Meeting 97 Tech. Dig., pp. 205–208, 1997.
[92] R. Costantini and S. Süsstrunk, "Virtual Sensor Design", Proc. IS&T/SPIE Electronic Imaging 2004: Sensors and Camera Systems for Scientific, Industrial, and Digital Photography Applications V, vol. 5301, pp. 408–419, 2004.
[93] H. Tian, B. Fowler and A. El Gamal, "Analysis of Temporal Noise in CMOS Photodiode Active Pixel Sensor", IEEE Journal of Solid-State Circuits, vol. 36, no. 1, pp. 92–101, Jan. 2001.
[94] H. Wach and E.R. Dowski, Jr., "Noise modeling for design and simulation of computational imaging systems", Proc. SPIE, Visual Information Processing XIII, vol. 5438, pp. 159–170, July 2004.



[95] M. Zwicker, A. Vetro, S. Yea, W. Matusik, H. Pfister and F. Durand, "Resampling, Antialiasing, and Compression in Multiview 3-D Displays", IEEE Signal Processing Magazine, ISSN: 1053-5888, Vol. 24, Issue 6, pp. 88–96, November 2007
[96] E. Martinian, A. Behrens, J. Xin, A. Vetro and H. Sun, "Extensions of H.264/AVC for Multiview Video Compression", IEEE International Conference on Image Processing (ICIP), ISSN: 1522-4880, pp. 2981–2984, October 2006
[97] C. Fehn, "3D-TV using depth-image-based rendering (DIBR)", in Proc. Picture Coding Symp., San Francisco, CA, USA, Dec. 2004.
[98] Text of ISO/IEC FDIS 23002-3 Representation of Auxiliary Video and Supplemental Information, ISO/IEC JTC1/SC29/WG11, Jan. 2007, Doc. N8768, Marrakech, Morocco.
[99] J. Häkkinen, M. Pölönen, J. Takatalo and G. Nyman, "Simulator sickness in virtual display gaming: a comparison of stereoscopic and non-stereoscopic situations", in Proc. MobileHCI '06, vol. 159, ACM, New York, NY, 2006
[100] J. Poikonen and J. Paavola, "Error Models for the Transport Stream Packet Channel in the DVB-H Link Layer", Proc. ICC 2006, Istanbul, Turkey, 2006.
[101] CELTIC-WINGTV project website (http://www.celtic-initiative.org/Projects/WING-TV/default.asp)
[102] COST207, Digital land mobile radio communications (final report), Commission of the European Communities, Directorate General Telecommunications, Information Industries and Innovation, 1989, pp. 135–147


Mobile 3DTV Content Delivery Optimization over DVB-H System

MOBILE3DTV - Mobile 3DTV Content Delivery Optimization over DVB-H System - is a three-year project which started in January 2008. The project is partly funded by the European Union 7th RTD Framework Programme in the context of the Information & Communication Technology (ICT) Cooperation Theme. The main objective of MOBILE3DTV is to demonstrate the viability of the new technology of mobile 3DTV. The project develops a technology demonstration system for the creation and coding of 3D video content, its delivery over DVB-H, and its display on a mobile device equipped with an auto-stereoscopic display. The MOBILE3DTV consortium is formed by three universities, a public research institute and two SMEs from Finland, Germany, Turkey, and Bulgaria. Partners span diverse yet complementary expertise in the areas of 3D content creation and coding, error resilient transmission, user studies, visual quality enhancement and project management. For further information about the project, please visit www.mobile3dtv.eu.

Tuotekehitys Oy Tamlink (FINLAND) – Project coordinator
Tampereen Teknillinen Yliopisto (FINLAND) – Visual quality enhancement, Scientific coordinator
Fraunhofer Gesellschaft zur Förderung der angewandten Forschung e.V. (GERMANY) – Stereo video content creation and coding
Technische Universität Ilmenau (GERMANY) – Design and execution of subjective tests
Middle East Technical University (TURKEY) – Error resilient transmission
MM Solutions Ltd. (BULGARIA) – Design of prototype terminal device

MOBILE3DTV project has received funding from the European Community’s ICT programme in the context of the Seventh Framework Programme (FP7/2007-2011) under grant agreement n° 216503. This document reflects only the authors’ views and the Community or other project partners are not liable for any use that may be made of the information contained therein.