
Interactive Free Viewpoint 3D TV Rendering Platform
Hendrik Boer
Eindhoven University of Technology, Electronic Systems Group
[email protected]

Master Thesis Embedded Systems, November 2010

Abstract—This paper concerns the development of the rendering platform for the iGLANCE project, enabling interactive Free Viewpoint 3D TV (FTV). It describes the development of an architecture for the FTV rendering platform and the mapping of an interactive FTV algorithm onto it. Such an algorithm allows the viewer to select and interactively change the viewing position. The FTV rendering platform contains an FTV processor based on an image signal processor template from Silicon Hive, which combines regular RISC operations with SIMD operations. Several optimizations have been researched and applied, e.g., vectorization and loop optimizations, which reduced the operation and cycle count of the algorithm by factors of 6 and 13, respectively. The developed architecture for the platform has been realized using FPGA technology. It supports full HD video processing, albeit at an effective frame rate of 0.5 frames per second.

Index Terms—3D TV, free viewpoint, FTV, mapping, rendering platform, Silicon Hive, vectorization

I. INTRODUCTION

3D TVs allow depth perception of displayed scenes. This stereoscopic effect is obtained by presenting a slightly different image of a 3D scene to the left and the right eye of a viewer. Commercially available 3D TV systems currently support a single, static viewpoint. Interactive Free Viewpoint 3D TV (FTV) allows the viewer to select and interactively change the viewing position. The iGLANCE project aims at making this interactive free viewpoint selection possible in broadcast 3D TV media. The iGLANCE project started in October 2008 and researches methods of receiving and rendering free-viewpoint 3D TV sequences. The project focuses on the reception and decoding of multi-view video streams on a decoding platform and on free viewpoint video interpolation on an FTV rendering platform. One of the key components of this system is view interpolation and its related challenges. iGLANCE also aims to actively contribute to the standardization process for future 3D TV, and to facilitate, by a pertinent demonstration, the mass adoption and the commercial deployment of this 3D TV system. The iGLANCE project is part of the European MEDEA+ program. It has been set up and labelled within the framework of EUREKA (E! 2365) to ensure Europe's continued technological and industrial competitiveness in this sector. Silicon Hive B.V. is one of the European partners within the iGLANCE project [1]. Silicon Hive B.V. licenses flexible system solutions with a focus on designing architectures and building blocks of parallel multi-processors.

The solutions and parallel processor technology are embedded and applied in high-performance camera systems, video systems, and wireless communication systems. Within the iGLANCE project, Silicon Hive is investigating whether its processor technology can be extended to 3D video algorithms. This includes the development of an architecture for the FTV rendering platform and the mapping of the FTV algorithm from iGLANCE partner Eindhoven University of Technology, Video Coding & Architectures (VCA) research group. Related work with respect to the development of the FTV rendering platform and the mapping of the FTV algorithm is described in Section II. The iGLANCE demonstrator, of which the FTV rendering platform is part, is discussed in Section III. In Section IV the FTV algorithm is described. Next, Section V describes the development of the architecture for the FTV rendering platform. Section VI describes the mapping of the FTV algorithm and its challenges. The paper ends with conclusions and future work in Section VII.

II. RELATED WORK

Free viewpoint algorithms include (multi-)view image interpolation, image and depth map warping, and their interpolation [2]. Various techniques exist for Image Based Rendering (IBR), of which light field [3] and lumigraph rendering [4] are two well-known methods. The latest FTV algorithms use depth maps to improve rendering quality. This rendering technique is called Depth Image Based Rendering (DIBR) [5]. Proposed real-time rendering architectures suggest the use of commercially available Graphics Processing Units (GPUs) [6][7]. In this work we focus on the development of an application-specific architecture; in a consumer electronics context, with its strict requirements on energy efficiency and cost, the use of GPUs is not suitable [8][9]. Reference [10] proposes a hardware-oriented DIBR algorithm and its associated hardware architecture with a processing capability of 25 SDTV (720 x 576 pixels) frames per second (fps) in the left and right channels simultaneously. The iGLANCE FTV rendering platform should process 30 full High Definition (HD) (1920 x 1080 pixels) fps, which is today's resolution standard in television technology. The required processing capability of the iGLANCE FTV rendering platform is thus roughly triple that achieved in [10].
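As a rough check, the factor of three follows from comparing the required pixel rates:

\[
\frac{1920 \times 1080 \times 30}{720 \times 576 \times 25 \times 2} \approx \frac{62.2~\text{Mpixels/s}}{20.7~\text{Mpixels/s}} \approx 3.
\]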

In [11] an architecture is proposed which achieves real-time multi-view rendering of full HD at 60 Hz using an FPGA. A real-time interactive ray-space rendering approach is proposed and implemented on an FPGA platform in [12]. The mentioned architectures implement different methods and algorithm structures than those used in the iGLANCE project. Furthermore, the developed architecture for the iGLANCE FTV rendering platform contains an application-specific processor. This allows more flexibility through programmability, as opposed to the approaches in [11] and [12], for which the applications are implemented in hardware.

III. IGLANCE DEMONSTRATOR

The objective of the iGLANCE project is to design a demonstrator system that can make use of the infrastructure available today, and to design and implement a hardware solution for real-time decoding of video streams and processing of the free viewpoint algorithm. Figure 1 defines the iGLANCE transmission chain, showing its main components.

Fig. 1. The iGLANCE transmission chain components: multi-camera capture (camera views v0-v8 and their depth maps), 3D video generation, H.264 encoding, coded transmission, H.264 decoding, free viewpoint interpolation driven by user input, and a 3D panel.

A. Overview

Each video stream frame contains 2D texture and its respective depth map, as shown in Figure 2. These video streams are encoded with H.264. Decoding of the video streams is done on the iGLANCE decoding platform. This platform contains a specially developed multi-stream decoding System on Chip (SoC), targeting new 3D TV applications. The decoded video streams are packed and transmitted to the iGLANCE FTV rendering platform through an HDMI channel, as illustrated in Figure 3. The free viewpoint interpolation is performed on the FTV rendering platform. The FTV rendering platform contains an FPGA, which allows fast development cycles for the demonstrator system. The (interpolated) views are shown on a 3D panel, which is connected through HDMI.

Fig. 2. A 2D image (left) and its corresponding depth image (right), using the Ballet sequence from Microsoft Research [13]

B. Scenarios

The iGLANCE project targets the consumer and healthcare application fields. Consumer scenario FTV Bypass demonstrates a full HD at 30 Hz per eye decoding chain (see Figure 3a), or a full HD side-by-side 60 Hz decoding chain as illustrated in Figure 3b. The FTV interpolation on the FTV rendering platform is bypassed in this scenario. This is what the market is currently asking for: a short-term deployment of 3D@home.

Fig. 3. The iGLANCE scenarios: full HD frame packaging and interpolation on the FTV rendering platform. Panels a-d cover consumer scenario FTV Bypass, consumer scenario FTV, and the healthcare scenario; each chain consists of H.264 decoding, free viewpoint interpolation, and a 3D panel.

Consumer scenario FTV demonstrates the feasibility of free viewpoint selection based on two views. The decoding chain is full HD side-by-side of texture and depth, and frame-sequential of left and right camera views, as illustrated in Figure 3c. The FTV algorithm interpolates the views for the left and right eye according to the interactively chosen viewpoint. The interpolated views for the left and right eye are packed side-by-side in a full HD frame and sent to the 3D panel.


Fig. 4. The FTV algorithm pipeline: the left and right camera depth maps (Dleft, Dright) are warped and combined, followed by median filtering and disocclusion inpainting; the texture maps (Tleft, Tright) are inverse warped, their disocclusions are dilated, and the results are blended into the interpolated texture map, after which remaining disocclusions are inpainted. Dleft indicates the depth map of the left camera view and Tleft the texture map of the left camera view.

The 3D panel takes care of scaling the side-by-side views for the left and right eye to full HD, displaying the views at 30 Hz per eye. The healthcare scenario has been defined for healthcare applications, but is classified as "nice to have", due to the lack of an HD auto-stereoscopic panel on the market. The healthcare scenario shows how to deal with the picture format in order to package several SD (640 x 360 pixels) views in a single HD frame, and how to interpolate several views. In this paper we focus on consumer scenario FTV (Figure 3c).

C. Target platform

The FTV rendering platform is developed on the Gladiator system. The Gladiator system can be broadly divided into two sub-systems: the host board and the Gladiator board (FPGAs). The host is a Cirrus Logic board, which contains an ARM processor. The host board is used for configuring and controlling the Gladiator board. The Gladiator board contains two Xilinx Virtex-5 FPGAs. These two FPGAs are used for processing logic tiles developed by Silicon Hive. The communication between the two FPGAs is facilitated through 350 single-ended connections. FPGA1 receives HDMI frames through an HDMI receiver IC on the Gladiator board, while FPGA2 interfaces with the HDMI transmitter IC to send the (interpolated) views to the 3D panel. Each FPGA has two external memory banks containing two 32-bit DDR2 2 GB SDRAMs. The Gladiator system furthermore contains an LCD interface, a 1024 x 768 pixels TFT LCD, a camera module interface, an Ethernet interface and two USB ports. The Gladiator system has been successfully used in previous projects within Silicon Hive, for example for the demonstration of various camera solutions. The time-to-market is shortened by using this target platform and already available IP blocks instead of developing a dedicated iGLANCE FTV rendering platform.

IV. ALGORITHM

The algorithm is a combination of well-known techniques from the literature [5]. It consists of two steps: the generation of the depth map at the interpolated position and the generation of the texture map at the interpolated position. This section explains the algorithm in more detail and describes the partitioning of the algorithm into control-plane and data-plane parts.

A. Overview

Figure 4 illustrates the algorithm pipeline, which consists of generating a depth map and a texture map at the interpolated position. DIBR algorithms are based on warping a camera view to another view [14]. The depth map at the interpolated position is therefore created by warping the two camera view depth maps to the interpolated view. Figure 5 illustrates the warping of the depth map from the left camera view. The warped depth maps are combined to get the depth map at the interpolated position, by selecting the smallest depth (closest to the camera) for every point. Due to rounding errors and introduced disocclusions, the generated depth map contains cracks, as visible in Figure 6a.

Fig. 5. Warping of the left camera view depth map

Fig. 6. Median filtering fills in empty pixels and smoothens the depth image while preserving edges. (a) Combined warped depth map before median filtering. (b) Combined warped depth map after median filtering.
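As an illustration of the combination step, the C sketch below keeps, for every pixel, the smaller of the two warped depth values, i.e., the point closest to the camera. The function name and the way holes are encoded (a reserved depth value) are assumptions of this sketch, not taken from the iGLANCE code base.

#include <stddef.h>
#include <stdint.h>

#define DEPTH_HOLE 0u  /* assumed marker for pixels that received no warped sample */

/* Combine two warped depth maps into the depth map at the interpolated
 * position by selecting, per pixel, the smallest depth value (closest to
 * the camera). Pixels that are holes in both maps remain holes. */
static void combine_depth_maps(const uint8_t *warped_left,
                               const uint8_t *warped_right,
                               uint8_t *combined, size_t num_pixels)
{
    for (size_t i = 0; i < num_pixels; i++) {
        uint8_t dl = warped_left[i];
        uint8_t dr = warped_right[i];

        if (dl == DEPTH_HOLE) {
            combined[i] = dr;                  /* only the right map has a sample (or both are holes) */
        } else if (dr == DEPTH_HOLE) {
            combined[i] = dl;                  /* only the left map has a sample */
        } else {
            combined[i] = (dl < dr) ? dl : dr; /* smallest depth, i.e. closest point, wins */
        }
    }
}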

The interpolated depth map can also have errors where the background is visible but foreground is expected. These errors are corrected by a 3x3 median filter. Median filtering also smoothens the depth map while preserving the edges of objects. Figure 6 illustrates the combined depth map after warping and the effectiveness of median filtering. After median filtering, there still might be some holes in the depth map. These are the overlapping disocclusions from the warped camera view depth maps; for these pixels there is no information from either the left or the right camera view [15][16]. The disocclusions are filled by determining, for every pixel of the disoccluded area, the closest non-disoccluded neighbor in 8 directions. The disocclusion is filled by copying the maximum depth value (farthest from the camera) of these 8 neighbors to the disoccluded pixel. This method is based on the assumption that the disocclusions have a depth equal to the surrounding background. Figure 7 shows the result of filling the disocclusions within the depth map.
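A minimal sketch of this depth-map inpainting step, assuming an 8-bit depth map with a reserved hole value; the function name and hole encoding are illustrative, not the actual implementation. For every hole, the nearest non-hole pixel is located in each of the eight directions, and the largest (most distant, i.e., background) of those depth values is copied in.

#include <stdint.h>

#define DEPTH_HOLE 0u  /* assumed marker for disoccluded pixels */

/* Fill disocclusions in the combined depth map: for every hole, find the
 * nearest non-hole pixel in 8 directions and copy the maximum depth
 * (farthest from the camera, i.e. background) of those neighbors. */
static void inpaint_depth_holes(const uint8_t *in, uint8_t *out,
                                int width, int height)
{
    static const int dx[8] = { 1, -1, 0,  0, 1,  1, -1, -1 };
    static const int dy[8] = { 0,  0, 1, -1, 1, -1,  1, -1 };

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            uint8_t d = in[y * width + x];
            if (d != DEPTH_HOLE) {
                out[y * width + x] = d;   /* valid pixel: copy unchanged */
                continue;
            }
            uint8_t background = DEPTH_HOLE;
            for (int dir = 0; dir < 8; dir++) {
                int px = x, py = y;
                /* step along this direction until a valid pixel or the border */
                do {
                    px += dx[dir];
                    py += dy[dir];
                } while (px >= 0 && px < width && py >= 0 && py < height &&
                         in[py * width + px] == DEPTH_HOLE);
                if (px >= 0 && px < width && py >= 0 && py < height &&
                    in[py * width + px] > background)
                    background = in[py * width + px];  /* keep the largest depth found */
            }
            out[y * width + x] = background;  /* stays a hole if no direction found a sample */
        }
    }
}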

Fig. 7. Disocclusions are filled in with background depth. (a) Depth map containing disocclusions. (b) Disocclusions filled in.

From the interpolated depth map an inverse projection is done for every pixel to the original left and right camera views. Figure 8 illustrates the inverse warped texture map from the original left view. Due to ill-defined edges and contours in the depth and texture maps, ghosting errors may occur. These mainly occur when the foreground is much closer to the camera than the background; the unsharpness of the edges contributes to this artifact. The textures of background and foreground are smeared at the edges, and textures of the foreground may be projected to places in the background (see Figure 9). This is visible as a ghost contour in the background. The ghosting errors are solved by enlarging the disocclusions of the left and right projected texture images before the blending process. The accompanying disadvantage is that more disocclusions might occur.

Fig. 8. Inverse warped texture map from the original left camera view using the interpolated depth map. Green pixels are due to rounding errors and disocclusions. (a) Texture map containing disocclusions. (b) Final texture map without disocclusions.

Fig. 9. Removing ghost contours by dilation. (a) With ghost contours. (b) Ghosting erased.

The resulting texture images are blended together to make a full interpolated texture map. The blending is done by taking a weighted average of the projected left and right texture images. The weight depends on the distance from the interpolated position to the left and right camera positions. After the blending process there still might be some holes left. These are areas that cannot be viewed from either of the two reference camera views. They are uncovered areas of background; therefore the depth information is used to fill these disocclusions more accurately. For every disoccluded pixel the algorithm searches in eight directions for the nearest non-disoccluded pixel. The disoccluded pixel is filled with the weighted average of those pixels that have the largest depth. Figure 10 shows the result of filling the disocclusions within the texture map.

Fig. 10. Final texture map at the interpolated position

For each iteration of the algorithm one view is interpolated. The current solution is to execute the algorithm twice to produce the textures for a viewpoint, containing the left and right eye (interpolated) textures.
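As an illustration of the blending weights, the sketch below assumes a linear weighting between the two camera positions; the function and variable names are not taken from the actual code.

#include <stdint.h>

/* Blend the left and right projected texture values for one pixel.
 * alpha is the normalized position of the interpolated viewpoint between
 * the left (alpha = 0) and right (alpha = 1) cameras, so the camera that
 * is closer to the interpolated position gets the larger weight. */
static uint8_t blend_pixel(uint8_t tex_left, uint8_t tex_right, float alpha)
{
    float w_right = alpha;        /* proportional to the distance from the left camera  */
    float w_left  = 1.0f - alpha; /* proportional to the distance from the right camera */
    float blended = w_left * (float)tex_left + w_right * (float)tex_right;
    return (uint8_t)(blended + 0.5f);  /* round to the nearest integer value */
}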


B. Partitioning

The FTV algorithm is partitioned into a control-plane and a data-plane part. The parameters for the FTV algorithm, which are stream- and/or viewpoint-dependent, are calculated within the control-plane partition. These parameters are used for warping the depth maps, inverse warping of the texture maps, and blending the texture maps. The parameters have to be recalculated upon a stream and/or viewpoint change. The control-plane partition is mapped on the ARM host processor. The algorithm functions illustrated in Figure 4 are part of the data-plane partition. These functions use the parameters calculated in the control-plane partition. The data-plane partition is mapped on a Silicon Hive processor, implemented in FPGA1 of the Gladiator board. To meet the real-time constraints of the iGLANCE consumer scenario FTV, the interpolation of a viewpoint should be done within 1/30 second to demonstrate full HD at 30 Hz per eye. The FTV algorithm throughput requirement is therefore 62.2 Mpixels/sec.
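The requirement follows from producing one full HD frame's worth of interpolated pixels (the left- and right-eye views packed side by side, as in Figure 3c) every 1/30 second:

\[
1920 \times 1080~\tfrac{\text{pixels}}{\text{frame}} \times 30~\tfrac{\text{frames}}{\text{s}} \approx 62.2~\text{Mpixels/s}.
\]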

Fig. 11. Simplified Gladiator hardware schematic: FPGA1 and FPGA2 connected through an inter-FPGA interface, each with 2 x 2 GB DDR2 SDRAM; an HDMI receiver (HDMI in) connected to FPGA1, an HDMI transmitter (HDMI out) connected to FPGA2, and a memory expansion connector to the host board. Hardware components that are not used for the iGLANCE project are omitted from the schematic.

V. FTV RENDERING PLATFORM ARCHITECTURE

This section describes the development of the architecture for the FTV rendering platform, according to the iGLANCE scenarios and the requirement derived in the previous section. Figure 11 shows a simplified hardware schematic of the Gladiator board. The so-called memory expansion connector connects to the Cirrus Logic host board. In the current architecture, FPGA2 is not used except for passing signals from FPGA1 through the FPGA interconnect to the HDMI transmitter IC. The target in this paper is to develop the FTV rendering platform using one FPGA; Silicon Hive lacks an automated mapping that distributes a single system design over multiple FPGAs. To use both FPGAs, two system designs would have to be developed and connected to each other at board level, which is outside the scope of this paper. The complete architecture is described using the Hive System Description (HSD) language.

A. HSD

HSD is a language in which system components, connections, properties of a system and their hierarchy can be declared. In this context, a system is considered as a number of Silicon Hive processors, a host processor, buses, external memories and custom devices. All system components are connected via a bus or are directly connected. The system description is used to generate the system-specific part of the Hive Run-Time (HRT) API, including bus address mappings and connectivity. In a system, the host processor controls all other processors, using HRT. The system description is also used to generate a system simulator and RTL for FPGA and final silicon. The tool hivesc (Hive System Compiler) is used to compile a system description and to generate HRT support and a system simulator for it.

B. Architecture overview

Figure 12 shows the developed architecture for the FTV rendering platform. The starting point was an existing architecture developed on the same target platform, used to demonstrate several camera solutions. The main components of the developed architecture are the host, the FTV processor, external memory, buses, and the HDMI-in and -out interfaces.

1) Host: The host is a processor that controls the system. The host processor can upload to and execute programs on the FTV processor, load and store data into memories, and send and receive data to and from FIFOs. The host implements the control-plane of the FTV algorithm and the FTV Application Programming Interface (FTV-API). The API allows iGLANCE project partners to map application-level software onto the FTV rendering platform. The next section discusses the FTV-API in more detail. The host is connected to the configuration bus of the system, which allows it to control and support the different components of the system. The AMBA High-performance Bus (AHB) of the ARM processor is connected through an SRAM Memory Controller (SMC) interface to the Gladiator board. On the FPGA, this is converted into the internal Core Input/Output (CIO) bus protocol. Clock Domain Crossings (CDC) are used to cross the different clock regions. The configuration bus and the memory bus are connected through a CIO converter, such that the host can access external memory. A CIO converter interconnects CIO connections with different data widths. The FIFO within the FTV processor is connected to the configuration bus through a FIFO adapter, which transforms the bus protocol into a FIFO protocol. The FIFO is used by the host to send and receive tokens.

2) FTV processor: The FTV processor should process 62.2 Mpixels/sec to meet the real-time constraints. The starting point is a scalar processor. However, the FTV algorithm is based on floating-point arithmetic, which gets emulated by the compiler as a series of simpler fixed-point operations. These can be executed on an integer Arithmetic Logic Unit (ALU), but this significantly increases the number of operations.


Fig. 12. FTV rendering platform architecture. The ARM host connects through AHB2SMC/SMC2CIO and clock domain crossings (CDC) to a 32-bit CIO configuration bus; a 256-bit CIO memory bus connects the FTV processor (with its FIFO, FIFO adapter and DMA), CIO converters (32->256 and 256->64), the Xilinx DDR2 controllers (DDR2 A and DDR2 B, via CIO-to-Xilinx bridges), and the HDMI-in and HDMI-out interfaces (HDMI IN / HDMI OUT).

In addition to the actual operations, the fixed-point arithmetic implementation requires, for example, shifting after a fixed-point multiplication to make sure the result of the multiplication does not exceed the available number of bits. The introduction of floating-point function units avoids the emulation by the compiler. The out-of-the-box FTV algorithm requires 1404 operations per pixel on a VLIW processor supporting floating-point arithmetic. When processing 62.2 Mpixels/sec, this is equivalent to 87.3 GOperations/second (GOPS). When using a five-issue-slot processor at a frequency of 50 MHz, the maximum throughput is about 0.25 GOPS. To improve the throughput, a Video Signal Processor (VSP) from the HiveFlex VSP2500 series was considered. This is an embedded C-programmable multi-processor IP optimized for video signal processing. Analysis of the FTV algorithm showed that little advantage would be taken of its block-processing feature. The main part of the FTV algorithm is line/column or pixel based, which makes an Image Signal Processor (ISP) from the HiveFlex ISP2000 series a good candidate. An ISP2400 processor is proposed, which is a scalable processor based on the VLIW architecture template, combining regular RISC operations with Single Instruction Multiple Data (SIMD) operations. The ISP2400 processor supports operations that operate on vectors; apart from the standard vectors, it also supports operations that work on wide vectors and on slices of vectors, which are particularly useful for pixel processing. The start configuration of the FTV processor, which is based on an ISP2400 template, contains a seven-issue-slot VLIW supporting 16-way SIMD operations. The FTV processor can be adapted in a later stage with FTV-algorithm-specific functions and/or operations.
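To illustrate the emulation overhead on an integer ALU, a fixed-point multiplication needs an extra shift (and typically rounding) to bring the result back into range. The following Q1.15 example is a generic sketch and is not specific to the Silicon Hive tool flow:

#include <stdint.h>

/* Multiply two Q1.15 fixed-point numbers on an integer ALU.
 * The 32-bit product is in Q2.30 format, so it must be shifted right by
 * 15 bits (with rounding) to fit a Q1.15 result again. On a floating-point
 * function unit this would be a single multiply. Saturation of the
 * -1.0 * -1.0 corner case is omitted for brevity. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;  /* Q2.30 intermediate */
    product += 1 << 14;                         /* rounding           */
    return (int16_t)(product >> 15);            /* back to Q1.15      */
}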

3) External memory: The received views from the iGLANCE H.264 decoder and the interpolated views are stored in external DDR2 memory. The external memory is connected to a 256-bit wide memory bus, which allows the FTV processor to load and store a vector from/to external memory using a single bus transfer. The FTV processor can access external memory through a Direct Memory Access (DMA) unit. The external DDR2 memory is controlled through a Xilinx DDR2 controller, which is indirectly connected to the memory bus.

4) HDMI interfaces: The HDMI interfaces are configured by the host through the configuration bus. The HDMI interfaces configure the external HDMI receiver/transmitter ICs through an I2C bus. The HDMI-in interface captures the received pixels from the HDMI receiver IC and stores them into external memory. The HDMI-out interface loads the pixels from external memory and transmits them through the HDMI transmitter IC to the 3D panel. To support full HD at 60 fps, the HDMI interfaces run at 150 MHz. The rest of the system operates at a clock frequency of 50 MHz. The development of the HDMI-in interface is described in the Appendix.

C. Pixel Format

The available throughput of the memory bus is 1.6 Gbytes/sec (256 bits @ 50 MHz). If the internal pixel format were RGB, the bus would be utilized for 70%, without taking the loading/storing of possible intermediate interpolation results into account.

This is due to receiving and storing full HD video at 60 fps, while the FTV processor reads the received frames from memory and writes back the interpolated views, and the HDMI-out interface reads the interpolated views from memory to output them to the 3D panel. To avoid the memory bus becoming the bottleneck, the YUV420 pixel format is used inside the FTV rendering platform. This requires the HDMI-in interface to convert the received RGB pixel format to YUV420, and vice versa for the HDMI-out interface. Furthermore, the FTV algorithm should be able to process the YUV420 pixel format. With the available IP blocks and FTV algorithm, only the HDMI-in interface needed adaptation to allow the YUV420 pixel format within the architecture, which halves the bus utilization. The views are stored in external memory according to the planar image format, in which each color component of a pixel is stored at a separate place (block) in memory. This allows the FTV processor to load and store vectors containing elements of one color component; these vectors can be used directly by the FTV algorithm. A camera view depth map contains the depth of the corresponding pixel in the texture, represented by an 8-bit value. However, to transmit the depth information through the HDMI channel from the H.264 decoder to the FTV rendering platform, the depth information is not packed. This means that of the 24 bits of one HDMI transfer, only 8 bits are used to represent the depth information. The HDMI-in interface takes advantage of this knowledge and stores only the 8 bits representing the depth information to external memory while omitting the remaining 16 bits. The number of bus transfers for the depth information within the architecture is therefore decreased by a factor of three.

D. FTV-API

The FTV-API is the interface offered by the FTV rendering platform that allows the application-level software to control it. The FTV-API consists of ANSI C functions. The offered functions to control the FTV rendering platform are start, stop, pause and resume. Furthermore, the FTV-API offers functions to initialize the platform through an init function and to shut it down through a quit function. The FTV algorithm is controlled by providing parameters which are iGLANCE scenario-, dataset- and viewpoint-dependent. Depending on the selected iGLANCE scenario, the FTV algorithm has to interpolate two or five views. The FTV-API offers a corresponding initialization function for each iGLANCE scenario. The frame format between the H.264 decoder and the FTV rendering platform should be specified through the init dataset function. This includes specification of the frame, texture and depth dimensions, watermarks of odd and even frames, and the number of views within one frame. The FTV-API then defines buffers within external memory according to these dimensions and provides the addresses to the FTV processor and to the HDMI interfaces. Viewpoint-dependent parameters of the FTV algorithm should be specified through the set-viewpoint function. This includes parameters for warping the depth maps, inverse warping of the texture maps, and blending the texture maps.
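A hypothetical C view of the FTV-API as described above; the exact function names, parameter structures, field names and scenario identifiers are illustrative assumptions and do not come from the iGLANCE sources.

/* Illustrative FTV-API sketch (all names and types are assumptions). */

typedef enum {
    FTV_SCENARIO_BYPASS,      /* consumer scenario FTV Bypass */
    FTV_SCENARIO_LEFT_RIGHT,  /* consumer scenario FTV        */
    FTV_SCENARIO_HEALTHCARE   /* healthcare scenario          */
} ftv_scenario_t;

typedef struct {
    int frame_width, frame_height;      /* full HD frame dimensions            */
    int texture_width, texture_height;  /* texture dimensions within the frame */
    int depth_width, depth_height;      /* depth map dimensions                */
    int watermark_odd, watermark_even;  /* watermarks of odd and even frames   */
    int views_per_frame;                /* number of views within one frame    */
    int num_buffers;                    /* in-/output buffers to allocate      */
} ftv_dataset_t;

typedef struct {
    /* warping, inverse warping and blending parameters for one viewpoint */
    float warp_left[12], warp_right[12];
    float blend_weight_left, blend_weight_right;
} ftv_viewpoint_t;

int ftv_init(ftv_scenario_t scenario);
int ftv_init_dataset(const ftv_dataset_t *dataset);
int ftv_set_viewpoint(const ftv_viewpoint_t *viewpoint);
int ftv_start(void);
int ftv_pause(void);
int ftv_resume(void);
int ftv_stop(void);
int ftv_quit(void);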

The FTV-API is implemented on the ARM host processor. It implements the functions described above and communicates parameters through HRT functions to the FTV processor's memory, as illustrated in Figure 13.

Fig. 13. FTV application hierarchy: on the host side, the FTV init functions and parameter packaging layers sit on top of the FTV-API and HRT; on the kernel side runs the FTV application.

The host application, which is part of the application-level software, embeds the FTV init functions and the parameter packaging layers. The FTV init functions are part of the control-plane partition and calculate the parameters required by the FTV data-plane. These parameters are communicated to the FTV processor through the FTV-API. The parameter packaging layer transforms the parameters calculated by the FTV init functions into predefined parameter structures accepted by the FTV-API. Uploading of the FTV parameters to the FTV processor's memory is done through the configuration bus. The FTV processor has to be idle while these parameters are uploaded, otherwise the result could be incorrect interpolation. To check whether the FTV processor is idle, a synchronizing FIFO is introduced between the host and the FTV processor. The communication through this FIFO is based on a producer-consumer method. The FTV-API writes to a predefined memory location within the FTV processor to signal that the host has new parameters available, and regularly polls the FIFO for available tokens. After interpolating a complete view, the FTV processor checks whether this memory location is set. If set, the FTV processor produces a token and stalls until a token is produced by the FTV-API. The FTV-API consumes the token and uploads the parameters to the FTV processor's memory. After uploading the parameters, the FTV-API produces a token, after which the FTV processor continues. The communication between the FTV processor and the HDMI-in interface is based on the same producer-consumer method. The HDMI-in interface produces a token when it has stored an input frame. The FTV processor consumes two tokens (left and right camera view) before starting interpolation, and produces two tokens after interpolation. The producer-consumer method prevents the FTV processor and the HDMI-in interface from working on the same buffer. The number of buffers can be specified through the FTV-API function init dataset, which allocates these buffers. The communication between the FTV processor and the HDMI-out interface is implemented through the configuration bus. The FTV processor updates the buffer address containing the interpolated views at the HDMI-out interface, and the HDMI-out interface sends these interpolated views to the 3D panel. The next section explains the main loop of the FTV processor, taking the communication between the host and the HDMI interfaces into account.
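A sketch of the parameter-upload handshake as seen from the host side. The helper functions are hypothetical placeholders (the real implementation goes through HRT), and whether the host or the processor clears the pending flag is an assumption made here.

/* Hypothetical host-side helpers; the actual accesses use HRT calls. */
extern void fifo_write_token(void);          /* produce a token to the FTV processor          */
extern int  fifo_poll_token(void);           /* non-blocking: returns 1 if a token is waiting */
extern void fifo_read_token(void);           /* consume an available token                    */
extern void ftv_mem_write_flag(int value);   /* set/clear host_ctrl in FTV processor memory   */
extern void ftv_mem_write_params(const void *params, unsigned size);

/* Upload new parameters while the FTV processor is idle (stalled). */
static void upload_parameters(const void *params, unsigned size)
{
    ftv_mem_write_flag(1);            /* announce that new parameters are pending          */
    while (!fifo_poll_token())        /* wait until the processor has finished the current */
        ;                             /* view and produced a token                         */
    fifo_read_token();                /* consume it: the processor is now stalled          */
    ftv_mem_write_params(params, size);
    ftv_mem_write_flag(0);            /* clear the pending flag (assumed to be host-side)  */
    fifo_write_token();               /* release the FTV processor                         */
}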

VI. MAPPING

This section describes the mapping of the control- and data-plane partitions of the FTV algorithm and the main loop of the FTV processor. The mapping of the data-plane partition includes a description of the applied optimizations, including vectorization and its challenges.

A. Control-plane mapping

The control-plane partition of the FTV algorithm is mainly implemented through the FTV-API. The FTV-API is implemented as a Finite State Machine (FSM), which is a behavioral model composed of a finite number of states, transitions and actions. The FSM is introduced to prevent undefined and/or incorrect FTV processing. Figure 14 shows the structure of the FTV-API.

Fig. 14. Structure of the FTV-API: from Init, a scenario-specific initialization (InitBypass, InitLeftRight or Init5to9Views) is followed by the InitDataSet, SetViewpoint, Start, Pause, Resume, Stop and Quit states.

This structure helps the application-level software programmer to use the FTV-API. A function call to a state that is not reachable and/or enabled from the current state returns an error code to inform the application-level software. The SetViewPoint function call is only allowed if the iGLANCE consumer scenario FTV or the iGLANCE healthcare scenario is selected, in which views are interpolated. The FTV-API pause and resume functions are empty on the FTV rendering platform. This is based on the assumption that upon a pause of the iGLANCE demonstrator system, the iGLANCE decoder platform sends no new decoded views to the FTV rendering platform. The FTV processor is thus automatically paused, because the HDMI-in interface does not produce tokens when no input frames are received. The HDMI-out interface repeatedly sends the last interpolated views to the 3D panel. The light blue states in Figure 14 affect the state of the FTV processor, which is described in the next section.

B. FTV main loop

The FTV main loop is implemented on the FTV processor and controls the FTV algorithm and its buffers. The main loop is explained using the pseudo code for the iGLANCE consumer scenario FTV, in which left and right views are interpolated. For the other iGLANCE scenarios, similar main loops are developed.

    Wait for Start
    Produce token to HDMI-in
    Produce token to HDMI-in
    while host_ctrl != STOP do
        Consume token from HDMI-in
        Consume token from HDMI-in
        Interpolate(viewpoint - left eye)
        Interpolate(viewpoint - right eye)
        Update buffer address HDMI-out
        Produce token to HDMI-in
        Produce token to HDMI-in
        if host_ctrl == set then
            Produce token to Host
            Consume token from Host
        end if
    end while

The variable host_ctrl is declared in the FTV processor's local memory. The FTV processor produces two tokens to the HDMI-in interface such that two frames are received and stored in external memory. The FTV processor then consumes two tokens such that the left and right camera views are available. The left and right eye views are interpolated according to the selected viewpoint. After interpolation, the HDMI-out interface is updated to output the interpolated views to the 3D panel. Again, two tokens are produced to the HDMI-in interface. Before starting the interpolation of the next frame, the FTV-API can upload new parameters. The host_ctrl variable is checked, and if it has been set by the FTV-API, the FTV processor produces a token and stalls until a token can be consumed. The FTV-API is able to upload FTV parameters while the FTV processor is stalling. To meet the real-time requirements, double or triple buffering of the input can be applied, such that the HDMI-in interface and the FTV processor are not busy-waiting. The number of buffers for the in- and output is a parameter of the FTV-API function init dataset.
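A minimal sketch of how the input buffers could be cycled so that the HDMI-in interface and the FTV processor never touch the same buffer; the buffer-management details below are assumptions for illustration, not the actual implementation.

/* Simple ring of input frame buffers shared between the HDMI-in interface
 * (producer) and the FTV processor (consumer). With num_buffers set to 2
 * (double buffering) or 3 (triple buffering) through the init dataset
 * function, producer and consumer never operate on the same buffer. */
typedef struct {
    unsigned num_buffers;  /* allocated through the FTV-API           */
    unsigned produced;     /* frames stored by the HDMI-in interface  */
    unsigned consumed;     /* frames consumed by the FTV processor    */
} frame_ring_t;

/* Buffer the producer may fill next; valid only while
 * produced - consumed < num_buffers (otherwise the producer waits). */
static unsigned producer_slot(const frame_ring_t *r)
{
    return r->produced % r->num_buffers;
}

/* Buffer the consumer may read next; valid only while produced > consumed. */
static unsigned consumer_slot(const frame_ring_t *r)
{
    return r->consumed % r->num_buffers;
}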


TABLE I
FTV ALGORITHM PERFORMANCE PER FUNCTION (OPERATION AND CYCLE COUNT)

                           Out-of-the-box             Vectorization              Vectorization
                           operation count            operation count            cycle count
Function                   Total        Per pixel     Total        Per pixel     Total        Per pixel
Warp depth map - left      64,471,235   82            33,438,478   43            14,906,258
Warp depth map - right     62,928,931   80            33,438,478   43            14,906,258   19
Combine depth map          13,216,965   17            245,760