Instituto de Telecomunicações Instituto Politécnico de Leiria
From 3D video communications to immersive shared experiences
Pedro A. Amado Assuncao
SceneNet Workshop, 19 October, 2015
© 2005, it - instituto de telecomunicações. All rights reserved.
Outline
§ 3D/Multiview visual communications
§ Multiview video
§ The communication chain
§ Coding and Transmission
§ Human factors
§ Quality of Experience
§ Conclusion
3D visual communications
§ To deliver visual information perceived as realistic scenes of dynamic three-dimensional spaces
§ To interact with different elements in the scene
§ To get the feeling of being there…
Quality of Experience
§ The level of success of 3D technologies depends on their capability to meet these requirements
Human Factors
Multiview video
Evolution towards higher resolution and free viewpoint
[Figure: number of views versus resolution]
Resolution axis: 720 x 576, 1920 x 1080, 3840 x 2160, 7680 x 4320
N views: FTV, HD-FTV, UHD-FTV
2 views: 3DTV, 3D-HDTV, UHD-FTV
1 view: TV, HDTV, UHD
Multiview video
[Figure: scene geometry (# depths) versus # views]
# depths, from ∞ down to 0: 3D model + Texture; 1 view + N depths; 1 view + 1 depth; TV
# views, from 1 to ∞: TV; Multiview; Super Multiview; Holoscopic, Lightfield
3D Visual Scene - stereo
Not really new...
Stereoscopic images appeared in the 19th century
The Stereoscope: 1838 (Charles Wheatstone)
3D Visual Scene - Multiview
Same time instant
Two views → stereoscopic pair
Multiview video + depth
[Figure: views arranged along two axes, time and view]
3D Visual Scene – Multiview capture
Copyright © 2010 ITE, IPSJ, IEIJ, IEEJ, IEICE, NII. JSPS. All Rights Reserved.
Multiview + depth
Virtual views
Free viewpoint
2D or 3D
MPEG reference framework (2015)
Super Multiview and Free Navigation
80+ views
Sparse views
Challenge: Coding efficiency
Challenge: Rendering for Free Navigation
Light Field: Plenoptic function
The set of light rays passing through any point in space
Plenoptic function (7D)
Plenus (complete or full) + Optic
P(θ, φ, λ, t, V_X, V_Y, V_Z)
• Position (V_X, V_Y, V_Z)
• Direction (θ, φ)
• Wavelength (λ)
• Time (t)
Plenoptic camera
Tradeoff between spatial resolution and directional resolution
Plenoptic Image
• A large set of micro-images
• Increased redundancy
• Low disparity, very high resolution sensor
• ~4x 2D resolution
Plenoptic Camera
Raw Image
Computational photography
Example: compute “refocused” image
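One common way to illustrate how a "refocused" image is computed is shift-and-sum over sub-aperture views. The block below is a minimal sketch, assuming the raw micro-images have already been rearranged into a 4D array `lf[u, v, y, x]`. The function name `refocus`, the `slope` parameter, and the integer-pixel shifts with wrap-around are illustrative simplifications, not the processing of any particular camera.

```python
import numpy as np

def refocus(lf, slope):
    """Shift-and-sum refocusing of a 4D light field.

    lf    : array of shape (U, V, H, W), one grayscale sub-aperture view per (u, v)
    slope : pixels of shift per unit of view offset; selects the focal plane
    """
    U, V, H, W = lf.shape
    uc, vc = (U - 1) / 2.0, (V - 1) / 2.0   # position of the central view
    out = np.zeros((H, W), dtype=np.float64)
    for u in range(U):
        for v in range(V):
            dy = int(round(slope * (u - uc)))
            dx = int(round(slope * (v - vc)))
            # np.roll wraps around at the borders (a simplification)
            out += np.roll(lf[u, v], shift=(dy, dx), axis=(0, 1))
    return out / (U * V)

# Synthetic scene: one bright point that moves by 1 pixel per view
lf = np.zeros((3, 3, 9, 9))
for u in range(3):
    for v in range(3):
        lf[u, v, 4 + (u - 1), 4 + (v - 1)] = 1.0

in_focus = refocus(lf, -1.0)     # shifts cancel the point's disparity
out_of_focus = refocus(lf, 0.0)  # plain average of the views, point is spread out
```

Changing `slope` moves the plane on which the shifted views align, which is why one raw capture can yield images focused at different depths.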
Plenoptic (Light Field) cameras
New standardization activity: JPEG PLENO (2015)
3D Displays - holographic
2D vs 3D light emission
• 2D: light beams are emitted in all directions from each point on the screen surface (also true for the outer surface of volumetric …)
• 3D: light emitting surface, where light beams are emitted from each point in a controlled way (… separation is achieved in a different way)
Light-field displays (by Holografika)
• Optical modules project light beams with different angles of incidence onto a screen
• The points of the screen are hit by multiple beams under various angles of incidence
• Holographic screen with a direction selective property
• … reconstruction instead of views
• … angle geometry determined
• The projected module image is not a 2D view
Viewing angles of 3D displays
HoloVizio displays
… angle HoloVizio monitor
HoloVizio 80WLT (earlier: 96ND, 128WD, 128WLD)
• … pixel, 30” (16:10)
• … degrees FOV
• … freedom in 3D experience
• … valid zones
• … repeated views
• Equivalent image resolution: … x 768 (WXGA)
• … colors
• … DVI inputs
Continuous motion parallax
Remarks
§ Video formats are rapidly evolving to UHD with many views + geometric information (e.g. depth)
§ Free-view navigation and continuous motion parallax require a huge number of views, which is impossible without virtual views
§ Mobile, non-structured 3D content creation is not a specific concern of standardization bodies (e.g. MPEG), but it is not precluded by them either.
§ Holographic displays seem to be the most promising technology.
Coding and Transmission
3D/Multiview video communication chain
[Figure: communication chain linking users to users]
Coding (compression) of visual information
• 2 basic principles
  • Remove redundancy (statistical)
  • Remove irrelevancy (perceptual)
• 2 types of result
  • Less data needed to transmit the scene
  • Coding distortion → Perceived or not?
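The two principles can be illustrated on a single row of pixels. The block below is a minimal sketch, not a real codec: prediction from the left neighbour stands in for redundancy removal (lossless, lower entropy), and uniform quantisation stands in for irrelevancy removal (lossy, introduces distortion). All function names and the test signal are hypothetical.

```python
import numpy as np

def entropy_bits(symbols):
    """Empirical zeroth-order entropy in bits per symbol."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def dpcm_residual(row):
    """Predict each sample from its left neighbour; keep the prediction error."""
    row = np.asarray(row, dtype=np.int64)
    return np.diff(row, prepend=0)   # the first sample is predicted from 0

def dpcm_reconstruct(residual):
    """Invert dpcm_residual exactly (no information is lost)."""
    return np.cumsum(residual)

def quantize(values, step):
    """Uniform quantisation: the lossy step that discards small differences."""
    return np.round(np.asarray(values) / step).astype(np.int64) * step

# A smooth ramp: neighbouring samples are highly correlated
row = np.arange(0, 256, 2)
residual = dpcm_residual(row)

bits_before = entropy_bits(row)        # every value is different
bits_after = entropy_bits(residual)    # almost every residual is the same

coarse = quantize(row, step=8)
max_error = int(np.abs(coarse - row).max())   # the coding distortion
```

Whether an error of this size is perceived or not is exactly the question the slide raises; the sketch only shows where the distortion comes from.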
Evolution
From MPEG-2 (1994) to HEVC (2013)
MPEG-2 Encoder (1994) Perceptual factors (HVS)
HEVC Encoder (20 years later) Perceptual factors (HVS)
H.264/AVC → HEVC (e.g. Intra prediction)
H.264/AVC: 8 angular modes + 1 (DC)
HEVC: 33 angular modes + 2 (DC and planar)
10 years of evolution!
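The idea behind intra prediction can be sketched with the three simplest modes that both standards share in some form: vertical, horizontal and DC. The block below is a simplified illustration, assuming the neighbouring pixels above and to the left of the block are already reconstructed. It omits reference smoothing, the full set of angular modes and rate-distortion optimisation, and the function names are hypothetical.

```python
import numpy as np

def intra_candidates(top, left):
    """Build three simple intra predictions for an N x N block.

    top  : the N reconstructed pixels directly above the block
    left : the N reconstructed pixels directly to its left
    """
    top = np.asarray(top, dtype=np.float64)
    left = np.asarray(left, dtype=np.float64)
    n = top.size
    return {
        "vertical": np.tile(top, (n, 1)),              # copy the row above downwards
        "horizontal": np.tile(left[:, None], (1, n)),  # copy the left column rightwards
        "dc": np.full((n, n), (top.sum() + left.sum()) / (2 * n)),
    }

def best_intra_mode(block, top, left):
    """Pick the mode with the smallest sum of absolute differences (SAD)."""
    block = np.asarray(block, dtype=np.float64)
    costs = {name: float(np.abs(block - pred).sum())
             for name, pred in intra_candidates(top, left).items()}
    return min(costs, key=costs.get), costs

# A block made of vertical stripes is predicted perfectly from the row above
top = np.array([10, 20, 30, 40])
left = np.array([10, 10, 10, 10])
block = np.tile(top, (4, 1))
mode, costs = best_intra_mode(block, top, left)
```

More angular modes give the encoder more directions to try, so the residual left to code is smaller; the principle is unchanged between the two standards.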
Video compression technology evolution (10 years): the same underlying principles
[Figure: H.264/AVC and HEVC]
MV-HEVC: Coding of multiview video (2014)
• The spatio-temporal prediction structure was expanded to include additional inter-view modes
• High level of coding dependency → constrains interactivity and free navigation.
3D-HEVC: Coding of Multiview video + depth (2015)
• Joint texture-depth coding - depth parameters inferred from texture
• View synthesis optimisation: depth coding rate vs synthesis quality
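Why depth coding errors matter for synthesis quality can be shown with the standard disparity relation for two parallel pinhole cameras, d = f · B / Z. The block below is a rough sketch under that assumption. It is not the view synthesis optimisation used in 3D-HEVC, and the function names and numeric values are hypothetical.

```python
def disparity_px(depth_m, focal_px, baseline_m):
    """Horizontal disparity (pixels) of a point at depth_m between two parallel cameras."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m

def synthesis_shift_error(depth_m, depth_error_m, focal_px, baseline_m):
    """Pixel displacement in the synthesised view caused by a depth coding error."""
    exact = disparity_px(depth_m, focal_px, baseline_m)
    coded = disparity_px(depth_m + depth_error_m, focal_px, baseline_m)
    return abs(exact - coded)

# Hypothetical setup: focal length 1000 px, cameras 10 cm apart
FOCAL_PX, BASELINE_M = 1000.0, 0.1

near_error = synthesis_shift_error(1.0, 0.05, FOCAL_PX, BASELINE_M)  # object at 1 m
far_error = synthesis_shift_error(4.0, 0.05, FOCAL_PX, BASELINE_M)   # object at 4 m
```

The same depth error displaces near objects far more than distant ones, which is one reason to spend depth bits according to their effect on the synthesised view rather than on the depth map itself.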
Communication Network
• No longer a simple link between two compatible devices
• Cloud network functions
  • Interconnection
  • Processing
  • Adaptation
  • Delivery
  • Store
  • …
e.g., splitting streams – MDC, multipath
Robust transmission
Dynamic transcoding
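Stream splitting with MDC (multiple description coding) can be sketched in its simplest temporal form: even and odd frames travel as two descriptions, possibly over different paths, and either one alone still yields a watchable sequence. The block below is a toy illustration with hypothetical function names. Real MDC schemes operate on coded streams, and the concealment here is plain frame repetition.

```python
def split_descriptions(frames):
    """Temporal MDC: even-indexed frames in one description, odd-indexed in the other."""
    return frames[0::2], frames[1::2]

def merge_descriptions(even, odd, total):
    """Rebuild the sequence; a lost description is passed as None.

    Missing frames are concealed by repeating the previous frame
    (or the next one for the very first frame). Assumes total >= 2.
    """
    if even is None and odd is None:
        raise ValueError("both descriptions lost")
    out = [None] * total
    if even is not None:
        out[0::2] = even
    if odd is not None:
        out[1::2] = odd
    for i in range(total):
        if out[i] is None:
            out[i] = out[i - 1] if i > 0 else out[1]
    return out

frames = ["f0", "f1", "f2", "f3", "f4"]
even, odd = split_descriptions(frames)

both_paths = merge_descriptions(even, odd, len(frames))    # full quality
odd_lost = merge_descriptions(even, None, len(frames))     # half frame rate, still playable
```

With both descriptions the sequence is restored exactly; with one, playback degrades gracefully instead of stopping, which is the robustness the slide refers to.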
e.g., hybrid networks - expanding 2D
Enabling 3D / multiview
[Figure labels: 2D TV Broadcast; Content Server (Video + Depth); Video Stream; Depth Stream; IP Network; 2D TV users; 3D TV users; Virtual views; Free navigation; Quality monitoring?; 3D Video Quality Monitor; NR Models]
Remarks
§ Standard video codecs have been based on the same coding paradigm for decades: predictive coding
§ The human perceptual factors embedded in video codecs have not evolved substantially over the years; other factors, such as psychological, emotional and cognitive ones, have remained largely absent from these technologies
§ Quality of experience (QoE) in delivery services has received a lot of attention at the receiver side, but new QoE-aware technology is required to cope with new forms of content creation/sharing and social interaction.
Beyond pixels and bits
Moving forward
Moving from the concept of scene capture, coding and delivery to a different one: an evolution from a technology-based approach to a user-driven approach.
Known evidence about the user: the brain fusion function in stereoscopic video
Two images, merged in the brain to form a single view
Besides the signals, the perceived quality (including depth) also depends on elements that exist outside the signals (e.g. viewing conditions) and on brain processing functions.
From depth perception to immersion
The feeling of being there…
The challenge: technology for full immersion, or the art of misleading the 5 senses? And a lot more...
The user experience – multisensory brain
Vision captures sound
• The ventriloquist effect
• The McGurk effect https://www.youtube.com/watch?v=G-lN8vWm3m0
Sound changes vision
• Motion/bounce illusion. http://www.michaelbach.de/ot/mot_bounce/index.html
Current audio and video coding standards do not exploit the multisensory nature of our brain!
Visual attention
Visual experience depends critically on attention
• Visual attention models make it possible to rank the relevance of visual content from the user's point of view.
• They are derived from eye-fixation maps and/or cognitive functions related to psychological or neurophysiological aspects
Saliency maps from different visual attention models
Integration of Visual Attention models in 3D Multimedia
§ Perceptual coding (using RoI)
§ Summarization (relevance vs dissimilarity)
§ Quality evaluation (towards QoE)
§ Content retargeting (UHD → mobile)
§ Unequal Error Protection (wireless networks)
§ Rendering (error concealment, visual discomfort)
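As one illustration of the quality evaluation item, a saliency map can weight the error at each pixel so that distortion in attended regions counts more. The block below is a minimal sketch of a saliency-weighted PSNR. The function name and the weighting scheme are hypothetical, and this is not a standardised metric.

```python
import numpy as np

def saliency_weighted_psnr(reference, distorted, saliency, peak=255.0):
    """PSNR in which each pixel's squared error is weighted by its saliency."""
    ref = np.asarray(reference, dtype=np.float64)
    dis = np.asarray(distorted, dtype=np.float64)
    w = np.asarray(saliency, dtype=np.float64)
    if w.min() < 0 or w.sum() == 0:
        raise ValueError("saliency must be non-negative and not all zero")
    w = w / w.sum()                          # normalise the weights to sum to 1
    wmse = float((w * (ref - dis) ** 2).sum())
    return float("inf") if wmse == 0 else 10.0 * np.log10(peak ** 2 / wmse)

# One distorted pixel in an otherwise perfect 4 x 4 image
ref = np.zeros((4, 4))
dis = ref.copy()
dis[0, 0] = 10.0

uniform = np.ones((4, 4))                              # plain PSNR
on_error = np.zeros((4, 4)); on_error[0, 0] = 1.0      # viewer looks at the error
elsewhere = np.zeros((4, 4)); elsewhere[3, 3] = 1.0    # viewer looks away from it

psnr_uniform = saliency_weighted_psnr(ref, dis, uniform)
psnr_on_error = saliency_weighted_psnr(ref, dis, on_error)
psnr_elsewhere = saliency_weighted_psnr(ref, dis, elsewhere)
```

The same weights could drive the other items in the list, for example by assigning finer quantisation or stronger error protection to salient regions.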
§ Emotional response obtained from watching a visual scene
Measured as Electrodermal Activity (EDA)
Useful for
• Content Creation
• Emotional states
• QoE monitoring
• Customer satisfaction
• …
Figure 1. Example of phasic EDA response to an emotional event (vertical axis: μS, from 8.0 to 11.0; horizontal axis: seconds from the event)
Source: "How Visual Discomfort Affects 3DTV Viewers' Emotional Arousal", Miguel Barreda-Ángeles, Romuald Pépion, Emilie Bosc, Patrick Le Callet, and Alexandre Pereda-Baños, 3DTV-Con 2014
Perceived Quality: Type of content, Emotional, Cognitive and Conative factors
[Figure: comparison of the LIVEVQDB and Proposed video sets. Panels: Interesting, Familiar, Appealing, Watch again, Share video. Emotions: Anger, Disgust, Fear, Happiness, Sadness, Surprise]
Source: "Evaluating the Role of Content in Subjective Video Quality Assessment", M. Mirkovic, P. Vrgovic, D. Culibrk, D. Stefanovic, A. Anderla, Hindawi, Scientific World Journal, 2014
Remarks
Reaching high levels of QoE is always a goal. To measure QoE, besides evaluating technological and low-level perceptual parameters, one must consider higher-level cognitive and emotional processes and, eventually, even social aspects of the experience (e.g. the context of creation, sharing and consumption).
Conclusion
To reach the next breakthrough, deeper knowledge about perception, emotions, psychology, etc. must be integrated into multimedia technology.
This is no longer for engineers only.
Thanks!