From 3D video communications to immersive shared experiences

Author: Gwen Singleton
Instituto de Telecomunicações Instituto Politécnico de Leiria


Pedro A. Amado Assuncao
SceneNet Workshop, 19 October, 2015
© 2005, it - instituto de telecomunicações. All rights reserved.

Outline
§  3D/Multiview visual communications
§  Multiview video
§  The communication chain
§  Coding and Transmission
§  Human factors
§  Quality of Experience
§  Conclusion

3D visual communications

§  To deliver visual information perceived as realistic scenes of dynamic three-dimensional spaces
§  To interact with different elements in the scene
§  To get the feeling of being there… (Quality of Experience)
§  The level of success of 3D technologies depends on their capability to meet these requirements (Human Factors)

Multiview video

Evolution towards higher resolution and free viewpoint

[Figure: number of views (vertical axis) versus resolution (horizontal axis)]

# Views   720 x 576   1920 x 1080   3840 x 2160, 7680 x 4320
N         FTV         HD-FTV        UHD-FTV
2         3DTV        3D-HDTV       UHD-FTV
1         TV          HDTV          UHD

Multiview video

[Figure: scene geometry (number of depths) versus number of views. Geometry axis: 1 view + 1 depth, 1 view + N depths, 3D model + texture. Views axis: TV (1 view), multiview (N views), super multiview, holoscopic / light field (∞ views).]

3D Visual Scene - stereo

Not really new…

§  Stereoscopic images appeared in the 19th century
§  The Stereoscope: 1838 (Charles Wheatstone)

3D Visual Scene - Multiview

Same time instant
Two views → stereoscopic pair

Multiview video + depth

[Figure: frames arranged along a time axis and a view axis]

3D Visual Scene – Multiview capture

Copyright © 2010 ITE, IPSJ, IEIJ, IEEJ, IEICE, NII. JSPS. All Rights Reserved.

Multiview + depth

§  Virtual views
§  Free viewpoint, 2D or 3D
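A minimal sketch of how a virtual view can be obtained from one view plus depth. It assumes rectified, parallel cameras, so that each pixel shifts horizontally by a disparity d = f·B/Z (focal length f in pixels, baseline B, depth Z). This is the usual depth-image-based rendering idea, shown on a single scanline; the function name, parameters and hole handling are illustrative assumptions, not taken from the slides or from any reference software.

```python
def synthesize_row(texture, depth, focal_px, baseline, hole=None):
    """Warp one scanline to a virtual camera shifted by `baseline`.

    texture: pixel values of the reference view
    depth:   per-pixel depth Z (same units as baseline)
    Returns the virtual scanline; positions nobody maps to stay `hole`.
    """
    width = len(texture)
    out = [hole] * width
    out_depth = [float("inf")] * width  # z-buffer: nearest pixel wins
    for x in range(width):
        disparity = round(focal_px * baseline / depth[x])
        xv = x - disparity
        if 0 <= xv < width and depth[x] < out_depth[xv]:
            out[xv] = texture[x]
            out_depth[xv] = depth[x]
    return out


# Far background (Z = 100) with a near object (Z = 10) at x = 3 and x = 4.
row = synthesize_row(
    texture=[10, 20, 30, 40, 50, 60],
    depth=[100, 100, 100, 10, 10, 100],
    focal_px=10,
    baseline=1,
)
print(row)
```

The near object moves by one pixel and covers part of the background, while the area it uncovers stays empty. Such holes are one reason why rendering for free navigation is listed as a challenge: they have to be filled from other views or by inpainting.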

MPEG reference framework (2015)

Super Multiview and Free Navigation

§  Super multiview (80+ views). Challenge: coding efficiency
§  Sparse views. Challenge: rendering for free navigation

Light Field: Plenoptic function

The set of light rays passing through any point in space

Plenoptic function (7D), from plenus (complete or full) + optic

P(θ, φ, λ, t, V_X, V_Y, V_Z)

§  Position: (V_X, V_Y, V_Z)
§  Direction: (θ, φ)
§  Wavelength: λ
§  Time: t

Plenoptic camera

Tradeoff between spatial resolution and directional resolution

Plenoptic Image

§  A large set of micro-images
§  Increased redundancy
§  Low disparity, very high resolution sensor (~4x 2D resolution)

Plenoptic camera raw image

Computational photography

Example: compute “refocused” image
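One common way to compute a refocused image is shift-and-add: each view of the light field is shifted in proportion to its viewpoint offset and the results are averaged. The sketch below shows this on single scanlines. It assumes the micro-images have already been rearranged into sub-aperture views; `refocus`, `offsets` and `slope` are illustrative names, not part of any camera SDK.

```python
def refocus(views, offsets, slope):
    """Average scanlines after shifting each one by slope * offset pixels.

    views:   list of scanlines, one per viewpoint
    offsets: viewpoint position of each scanline (e.g. -1, 0, 1)
    slope:   disparity (pixels per unit offset) of the plane to bring into focus
    """
    width = len(views[0])
    out = []
    for x in range(width):
        samples = []
        for row, u in zip(views, offsets):
            xs = x + round(slope * u)
            if 0 <= xs < width:
                samples.append(row[xs])
        out.append(sum(samples) / len(samples) if samples else 0.0)
    return out


# A bright point whose position moves by 1 pixel per unit of viewpoint offset.
views = [
    [0, 0, 9, 0, 0, 0, 0],  # u = -1
    [0, 0, 0, 9, 0, 0, 0],  # u = 0
    [0, 0, 0, 0, 9, 0, 0],  # u = +1
]
in_focus = refocus(views, offsets=[-1, 0, 1], slope=1)
out_of_focus = refocus(views, offsets=[-1, 0, 1], slope=0)
```

With `slope=1` the three copies of the point line up and the point is sharp; with `slope=0` it is spread over three pixels, which corresponds to focusing on a different depth plane.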

Plenoptic (Light Field) cameras

New standardization activity: JPEG PLENO (2015)

3D Displays - holographic

Light-field displays by Holografika: basic rules

§  2D: light beams are emitted in all directions from each point on the screen surface
§  3D: a light emitting surface, where light beams are emitted from each point in a controlled way
§  Optical modules project light beams with different angles of incidence onto a screen
§  Direction selective screen
§  … reconstruction instead of views
§  The projected module image is not a 2D view

Viewing angles of 3D displays

HoloVizio displays

[Slide: HoloVizio monitor specifications. Legible items: HoloVizio 80WLT (earlier: 96ND, 128WD, 128WLD); 30” (16:10); … degrees FOV; freedom in 3D experience; … valid zones; … repeated views; equivalent image resolution … x 768 (WXGA); DVI inputs.]

Continuous motion parallax

Remarks

§  Video formats are rapidly evolving to UHD with many views + geometric information (e.g. depth)

§  Free view navigation and continuous motion parallax require a huge number of views – impossible without virtual views

§  Mobile, non-structured 3D content creation is not a specific concern of standardization bodies (e.g. MPEG), but it is not constrained by them either.

§  Holographic displays seem to be the most promising technology.
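A back-of-the-envelope calculation helps to see why "UHD with many views" is demanding. The sketch below computes the uncompressed bitrate of 3840 x 2160 video. The frame rate (30 fps) and the sample format (8-bit 4:2:0, i.e. 12 bits per pixel) are assumptions chosen for illustration, and the 80 views follow the "80+ views" figure quoted for super multiview.

```python
def raw_bitrate_bps(width, height, fps, bits_per_pixel, views=1):
    """Uncompressed video bitrate in bits per second."""
    return width * height * fps * bits_per_pixel * views


# Assumed format: 30 fps, 8-bit 4:2:0 (12 bits per pixel on average).
one_view = raw_bitrate_bps(3840, 2160, fps=30, bits_per_pixel=12)
eighty_views = raw_bitrate_bps(3840, 2160, fps=30, bits_per_pixel=12, views=80)

print(f"{one_view / 1e9:.2f} Gbit/s per view")
print(f"{eighty_views / 1e9:.1f} Gbit/s for 80 views")
```

Under these assumptions a single view is already close to 3 Gbit/s before compression, so capturing and transmitting every view needed for continuous motion parallax is impractical; synthesising virtual views from fewer transmitted views plus depth reduces that load.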

Coding and Transmission


3D/Multiview video communication chain

[Figure: communication chain linking users at both ends]

Coding (compression) of visual information

•  2 basic principles
   •  Remove redundancy (statistical)
   •  Remove irrelevancy (perceptual)

•  2 types of result
   •  Less data needed to transmit the scene
   •  Coding distortion → Perceived or not ?
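The two principles can be illustrated with a toy predictive coder. Predicting each sample from the previous reconstructed one removes statistical redundancy (the residuals are small), and quantising the residual discards detail that is assumed to be irrelevant, at the cost of coding distortion. This is a deliberately simplified DPCM sketch with hypothetical names; it is not the scheme of any particular standard.

```python
def dpcm_encode(samples, step):
    """Predict each sample from the previous reconstruction, quantise the residual.

    Returns (levels, reconstruction). `levels` is what would be entropy coded;
    `reconstruction` is what the decoder would obtain from those levels.
    """
    levels, recon = [], []
    prediction = 0
    for s in samples:
        level = round((s - prediction) / step)  # quantised residual
        levels.append(level)
        prediction = prediction + level * step  # decoder-side reconstruction
        recon.append(prediction)
    return levels, recon


samples = [100, 102, 104, 103, 105]
levels, recon = dpcm_encode(samples, step=3)
errors = [abs(s - r) for s, r in zip(samples, recon)]
print(levels, recon, errors)
```

After the first sample the levels are tiny compared with the original values, which is the "less data" result. The non-zero reconstruction errors are the coding distortion; whether a viewer perceives them is the open question on the slide.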

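Evolution: from MPEG-2 (1994) to HEVC (2013)

MPEG-2 Encoder (1994)

[Figure: MPEG-2 encoder block diagram, with the blocks affected by perceptual factors (HVS) highlighted]

HEVC Encoder (20 years later)

[Figure: HEVC encoder block diagram, with the blocks affected by perceptual factors (HVS) highlighted]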

H.264/AVC → HEVC (e.g. Intra prediction)

§  H.264/AVC: 8 angular modes + 1
§  HEVC: 33 angular modes + 2

10 years evolution !
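To make the idea of intra prediction modes concrete, the sketch below predicts a square block from its already decoded top and left neighbours and selects the mode with the smallest sum of absolute differences (SAD). Only three simple modes are shown (DC, horizontal, vertical). The angular modes of H.264/AVC and HEVC follow the same principle with more directions; the function names and the SAD criterion are illustrative assumptions, since real encoders use rate-distortion decisions.

```python
def intra_predict(top, left, mode):
    """Build an n x n prediction from neighbouring reconstructed pixels."""
    n = len(top)
    if mode == "vertical":  # copy the row above downwards
        return [list(top) for _ in range(n)]
    if mode == "horizontal":  # copy the left column rightwards
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":  # flat block at the mean of the neighbours
        dc = round((sum(top) + sum(left)) / (2 * n))
        return [[dc] * n for _ in range(n)]
    raise ValueError(f"unknown mode: {mode}")


def sad(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(b - p) for rb, rp in zip(block, pred) for b, p in zip(rb, rp))


def best_mode(block, top, left, modes=("dc", "horizontal", "vertical")):
    """Pick the mode whose prediction is closest to the block."""
    return min(modes, key=lambda m: sad(block, intra_predict(top, left, m)))


# A block of vertical stripes continues the row above it exactly.
top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [[10, 20, 30, 40] for _ in range(4)]
chosen = best_mode(block, top, left)
```

More available directions mean that more textures can be predicted with a small residual, which is one source of the coding gain between the two generations of codecs.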

Video compression technology evolution (10 years): the same underlying principles

[Figure: H.264/AVC and HEVC compared side by side]

MV-HEVC: Coding of multiview video (2014)

•  The spatio-temporal prediction structure was expanded to include additional inter-view modes
•  High level of coding dependency → Constrains interactivity, free navigation.
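The effect of coding dependency on interactivity can be sketched as a graph problem: to display one view, the decoder must also decode every view it is predicted from, directly or indirectly. The dependency tables below are hypothetical examples, not the prediction structure of any specific MV-HEVC configuration.

```python
def views_to_decode(target, references):
    """Return the set of views that must be decoded to display `target`.

    references maps a view id to the list of views it is predicted from.
    """
    needed, stack = set(), [target]
    while stack:
        view = stack.pop()
        if view not in needed:
            needed.add(view)
            stack.extend(references.get(view, ()))
    return needed


# Hypothetical inter-view chain: each view is predicted from its neighbour.
chain = {0: [], 1: [0], 2: [1], 3: [2]}
# Hypothetical simulcast: every view is coded independently.
simulcast = {0: [], 1: [], 2: [], 3: []}

print(views_to_decode(3, chain))      # all four views are required
print(views_to_decode(3, simulcast))  # only view 3 is required
```

Inter-view prediction improves compression, but in the chained example switching to view 3 requires receiving and decoding four streams instead of one, which is the constraint on free navigation noted above.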

3D-HEVC: Coding of Multiview video + depth (2015)

•  Joint texture-depth coding: depth parameters inferred from texture
•  View synthesis optimisation: depth coding rate vs synthesis quality

Communication Network

•  No longer a simple link between two compatible devices
•  Cloud network functions
   •  Interconnection
   •  Processing
   •  Adaptation
   •  Delivery
   •  Store
   •  …

e.g., splitting streams – MDC, multipath

§  Robust transmission
§  Dynamic transcoding
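A minimal sketch of stream splitting in the spirit of multiple description coding (MDC): the frame sequence is divided into two descriptions (even and odd frames) that could travel over different paths. If one description is lost, the receiver still plays the other at half the frame rate and repeats frames to conceal the gaps. Temporal splitting and frame repetition are assumptions chosen for simplicity; practical MDC schemes are more elaborate.

```python
def split_descriptions(frames):
    """Split a sequence into two descriptions: even and odd frames."""
    return frames[0::2], frames[1::2]


def merge_descriptions(even, odd, total):
    """Rebuild `total` frames; a lost description is passed as None.

    Missing frames are concealed by repeating the last available frame.
    At least one description must have arrived.
    """
    out = []
    for i in range(total):
        src, other = (even, odd) if i % 2 == 0 else (odd, even)
        if src is not None:
            out.append(src[i // 2])
        elif out:
            out.append(out[-1])      # conceal by frame repetition
        else:
            out.append(other[0])     # very first frame lost: use the next one
    return out


frames = list("ABCDEF")
even, odd = split_descriptions(frames)
both = merge_descriptions(even, odd, len(frames))
odd_lost = merge_descriptions(even, None, len(frames))
```

When both paths deliver, the original sequence is recovered. When the odd description is lost the result is `A A C C E E`: degraded, but playback continues, which is the robustness that splitting is meant to provide.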

e.g., hybrid networks - expanding 2D, enabling 3D / multiview

[Figure labels: Content Server (Video + Depth); 2D TV Broadcast; Video Stream; IP Network; Depth Stream; 2D TV users; 3D TV users; Free navigation; Virtual views; Quality monitoring ?; 3D Video Quality Monitor (NR Models)]

Remarks

§  Standard video codecs have been based on the same coding paradigm for decades: predictive coding

§  Human perceptual factors embedded in video codecs have not evolved substantially over the years; other factors, such as psychological, emotional and cognitive ones, have mostly been left out of these technologies

§  Quality of experience (QoE) in delivery services has received a lot of attention at the receiver side, but new QoE-aware technology is required to cope with new forms of content creation/sharing and social interaction.

Beyond pixels and bits


Moving forward

Moving from the concept of scene capture, coding and delivery to a different one: an evolution from a technology-based approach to a user-driven approach.

Known evidence about the user: the brain fusion function in stereoscopic video

Two images, merged in the brain to form a single view

Besides the signals, the perceived quality (including depth) also depends on elements that exist outside the signals (e.g. viewing conditions) and on brain processing functions.

From depth perception to immersion

The feeling of being there…

The challenge: technology for full immersion, or the art of misleading the 5 senses? And a lot more…

The user experience – multisensory brain

Vision captures sound

•  The ventriloquist effect
•  The McGurk effect: https://www.youtube.com/watch?v=G-lN8vWm3m0

Sound changes vision

•  Motion/bounce illusion: http://www.michaelbach.de/ot/mot_bounce/index.html

Current audio and video coding standards do not exploit the multisensory nature of our brain!

Visual attention

Visual experience depends critically on attention

•  Visual attention models make it possible to rank visual content by its relevance from the user's point of view.
•  They are derived from eye-fixation maps and/or cognitive functions related to psychological or neurophysiological aspects

Saliency maps from different visual attention models


Integration of Visual Attention models in 3D Multimedia

§  Perceptual coding (using RoI)
§  Summarization (relevance vs dissimilarity)
§  Quality evaluation (towards QoE)
§  Content retargeting (UHD → mobile)
§  Unequal Error Protection (wireless networks)
§  Rendering (error concealment, visual discomfort)
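As an illustration of the first item, perceptual coding using regions of interest, the sketch below turns a saliency map into a per-block quantisation parameter (QP) map: salient blocks get a lower QP (finer quantisation, more bits) and non-salient blocks a higher one. The linear mapping, the parameter names and the saliency values are assumptions for illustration; the 0 to 51 clamp matches the QP range of 8-bit H.264/AVC and HEVC.

```python
def qp_map(saliency, base_qp, max_offset, qp_min=0, qp_max=51):
    """Map per-block saliency in [0, 1] to a quantisation parameter.

    saliency 1.0 -> base_qp - max_offset (best quality)
    saliency 0.5 -> base_qp
    saliency 0.0 -> base_qp + max_offset (coarsest quantisation)
    """
    qps = []
    for row in saliency:
        qps.append([
            min(qp_max, max(qp_min, round(base_qp + max_offset * (1 - 2 * s))))
            for s in row
        ])
    return qps


# Hypothetical 2 x 3 block-level saliency map.
saliency = [
    [0.0, 0.5, 1.0],
    [0.0, 0.0, 0.5],
]
qps = qp_map(saliency, base_qp=32, max_offset=4)
print(qps)
```

The same saliency map could drive other items in the list, for example by assigning stronger error protection to the blocks that received the lowest QP.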

§  Emotional response obtained from watching a visual scene
§  Measured as Electrodermal Activity (EDA)
§  Useful for: content creation, emotional states, QoE monitoring, customer satisfaction, …

[Figure 1. Example of phasic EDA response to an emotional event. Vertical axis: EDA in μS (8.0 to 11.0); horizontal axis: seconds from the event.]

Source: “How Visual Discomfort Affects 3DTV Viewers’ Emotional Arousal”, Miguel Barreda-Ángeles, Romuald Pépion, Emilie Bosc, Patrick Le Callet, and Alexandre Pereda-Baños, 3DTV-Con 2014

Perceived Quality: Type of content, Emotional, Cognitive and Conative factors

[Figure: comparison of two video sets (LIVEVQDB and Proposed) on the ratings Interesting, Familiar, Appealing, Watch again and Share video, and on the six basic emotions Anger, Disgust, Fear, Happiness, Sadness and Surprise.]

Source: “Evaluating the Role of Content in Subjective Video Quality Assessment”, M. Mirkovic, P. Vrgovic, D. Culibrk, D. Stefanovic, A. Anderla, Hindawi, Scientific World Journal, 2014

Remarks

Reaching high levels of QoE is always a goal. To measure the QoE, besides evaluating technological and low-level perceptual parameters, one must consider higher-level cognitive and emotional processes and, eventually, even social aspects of the experience (e.g. the context of creation, sharing and consumption).

Conclusion

To reach the next breakthrough, deeper knowledge about perception, emotions, psychology, etc. must be integrated into multimedia technology.

This is no longer for engineers only.

Thanks!
