Light Field Compressed Sensing Over A Disparity-Aware Dictionary


Jie Chen, Lap-Pui Chau

The authors are with the School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore, 639798 (e-mail: {JChen5, ELPChau}@ntu.edu.sg).

Abstract—Light field (LF) acquisition faces the challenge of extremely bulky data. Available hardware solutions usually compromise the sensor resource between spatial and angular resolutions. In this work a compressed sensing framework is proposed for the sampling and reconstruction of a high resolution light field based on a coded aperture camera. First, a light field dictionary based on perspective shifting is proposed for sparse representation of the highly correlated light field. Then, two separate methods, i.e., Sub-Aperture Scan (SAS) and Normalized Fluctuation (NF), are proposed to acquire/calculate the scene disparity, which is used during the light field reconstruction with the proposed disparity-aware dictionary. Finally, a hardware implementation of the proposed light field acquisition/reconstruction scheme is carried out. Both quantitative and qualitative evaluations show that the proposed methods produce state-of-the-art performance in both reconstruction quality and computation efficiency.

Index Terms—perspective shifting; compressed sensing; sparse representation; light field

I. INTRODUCTION

The light field (LF) is a function that describes the intensities of light rays in all possible propagation directions. Conventional cameras use a converging lens to record a 2D projection of the high dimensional LF data. This inevitably loses a lot of information, especially each light ray's directional information. Light field cameras are designed to resolve the integration process and capture this extra directional information, which is useful in many applications such as refocusing, depth inference [1] [2], and 3D reconstruction [3] [4]. Available light field acquisition hardware solutions usually share the sensor resolution between the spatial and angular dimensions, which greatly reduces the resolution of both. The theory of Compressed Sensing [5], [6] states that accurate reconstruction is possible for signals that are compressible in a certain transform basis, with fewer measurements than the Nyquist-Shannon sampling theorem requires. This makes it possible to capture an LF with a resolution much higher than the camera sensor resolution, since the LF data is intrinsically sparse in both the spatial and angular dimensions.

A. Our Contribution

In this work a compressed sensing framework is proposed for the sampling and reconstruction of a high resolution light field based on a coded aperture camera.

We propose a novel light field dictionary based on perspective shifting of a normal image dictionary. A sub-aperture vignetting matrix is incorporated to model the light field integration over the imaging lens. The designed shifting process makes the dictionary atoms disparity-aware, which benefits the light field reconstruction in both reconstruction quality and computational simplicity.

Two different light field sensing methods that take full advantage of such an LF dictionary are proposed. A Sub-Aperture Scan (SAS) process is introduced to quickly capture the perspective disparity flow of the scene during the camera's focus-detection process, before the final image shot is taken. Another method, Normalized Fluctuation (NF), is subsequently proposed to calculate the disparity from aperture-modulated images. SAS and NF have their respective advantages in terms of hardware implementation and reconstruction quality, which will be explained in detail in this article. The disparity information for each image patch is used to extract the corresponding dictionary segment during light field reconstruction. This greatly boosts computation efficiency, and experiments show that it increases reconstruction quality as well, especially for cases with low measurement numbers.

Finally, a hardware implementation of the proposed light field acquisition/reconstruction scheme is carried out. A Liquid Crystal on Silicon (LCoS) device is used as the aperture modulator. An additional optical design is also proposed to integrate the whole scheme into one opto-mechanical module that best serves our design ideology. Both quantitative and qualitative evaluations show that the proposed methods produce state-of-the-art performance in both reconstruction quality and computation efficiency.

B. Limitations

The proposed LF dictionary is based on perspective shifting of the center view; it therefore has limitations in describing irregular patterns with occlusions and specularities. However, since the LF is reconstructed from overlapping patches, overlapped areas are effectively averages of different patches with different disparities, so some irregular patterns can still be reconstructed. Nevertheless, this remains a limitation of the proposed method. Both the SAS and NF methods proposed herein have limited performance with dynamic scenes, since the disparity information is sensitive to scene content change. This limitation can be alleviated by shortening the exposure time of each scan; however, this deteriorates the output image quality.


II. RELATED WORK

A. Light Field Acquisition

The theoretical principles of light field imaging were introduced over a century ago [7], [8]; however, only with recent technological development has it become practical and popular. One technical challenge in recording the light field is the extremely large amount of information it contains; the other is the hardware design needed to resolve the light ray integration. A micro-lens array is useful for diverging the light rays at a micro scale, so that a converging light ray's original directional information can be recorded by the camera sensors closely behind it [9]. The famous Lytro camera falls into this category [10]. Apart from a micro-lens array, a spatial modulation mask can also be inserted closely in front of the sensor for an angularly modulated sampling of the light field [11]. The images generated with this method are generally of high quality. However, since the sensor resource has to be shared between the spatial and angular dimensions, the final output image resolution is greatly compromised.

Apart from the camera sensor plane, spatial modulation can also be performed near the system aperture. Such designs and their performance analysis can be found in [12]–[15]. In [16], the authors use a Liquid Crystal on Silicon device as the spatial modulator, which increases the efficiency of aperture coding in terms of light transmission rate and resolution. In [17] Manakov et al. introduced a kaleidocamera for light field capturing. The camera uses two parallel mirrors to form a kaleidoscope cavity, which physically copies a 3 × 3 sub-aperture image collage. The system is designed as an add-on to existing DSLR cameras and lenses. Since it is based on spatial copying, it also suffers from the resolution limit, as the camera sensor has to be shared between the 9 views. Other light field acquisition hardware includes the Pelican camera [18], where a monolithic multi-camera array system is used, and the HTC One M8's Duo Camera, which uses a combination of software and an additional lens that collects rudimentary depth information for light field reconstruction. These technologies generally produce much lower imaging quality; however, their low cost makes them competitive in mobile applications.

B. Light Field Reconstruction

The post-processing and reconstruction of light field data is necessary for almost every light field camera, and it is therefore a critical procedure that decides the quality of the final output of the system. Even for cameras that share sensor resources between the dimensions [10], the raw output tends to be dim and distorted, which requires intensity and contrast adjustment as well as interpolation [19] to rectify the final output. The kaleidocamera [17], which starts from 3 × 3 copies of the sub-aperture views, needs interpolation to higher angular resolution based on optical flow between adjacent views before the views can be used for refocusing or other light field applications.

The hope of capturing the light field with both high spatial and angular resolution has given rise to much new research, notably on applying compressed sensing techniques to light field imaging. Ashok et al. first attempted to exploit the spatio-angular correlations inherent in the light field [13]. Babacan et al. [15] proposed to reconstruct the light field using a coded aperture camera based on a Bayesian reconstruction algorithm. Kamal et al. [20] used compressed acquisitions from a camera array modulated by random convolution to reconstruct the LF; a set of directional wavelet transforms is used for the reconstruction to exploit the sparsity in the angular domain. Other LF compressed sensing applications are also found in hyperspectral imaging [21] and compressive focal stack imaging [22]. Attention has also been paid to sparsity in transformed domains; for example, Shi et al. proposed an algorithm that is optimized for sparsity in the continuous Fourier spectrum to reconstruct the light field [23]. Marwah et al. proposed a light field sparse representation framework in [11] where they reconstruct the light field based on sparse coding over a trained LF dictionary, using measurements from a coded sensor-plane camera. The dictionary learning method and its coding algorithm are extremely computationally intensive (it takes more than 10 hours on a 24-core processor to learn the dictionary, and 18 hours for the reconstruction of a light field with spatial resolution 480 × 270 and angular resolution 5 × 5 on an 8-core workstation). The compressed sensing and reconstruction scheme proposed in this work can produce better LF reconstruction quality with much lower computation cost.

III. LIGHT FIELD DICTIONARY VIA PERSPECTIVE SHIFTING

A. The Shifting of the Central View Atom

Synthetic dictionaries are used in various image processing applications [24]. In this section, we propose a semi-synthetic dictionary training method, in which the central view atoms are trained from natural image patches, while the other LF views are perspective-shifted versions of the central view. Fig. 1(a) shows the typical layout of a 4D light field [25]. Each small square represents a separate stereo viewing point, from which one image is hypothetically taken for each LF view. In the figure, the red and blue arrows show the displacements of each view with respect to the central view in the horizontal and vertical directions respectively, and the displacement values are recorded in the matrices $M_H$ and $M_V$. For a 5 × 5 light field, $M_H$ and $M_V$ take the following form:

$$M_H = \begin{bmatrix} -2 & -2 & -2 & -2 & -2 \\ -1 & -1 & -1 & -1 & -1 \\ 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 \\ 2 & 2 & 2 & 2 & 2 \end{bmatrix}, \quad M_V = \begin{bmatrix} -2 & -1 & 0 & 1 & 2 \\ -2 & -1 & 0 & 1 & 2 \\ -2 & -1 & 0 & 1 & 2 \\ -2 & -1 & 0 & 1 & 2 \\ -2 & -1 & 0 & 1 & 2 \end{bmatrix}. \qquad (1)$$
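For concreteness, the displacement matrices of Eqn. (1) can be generated for any odd angular resolution. The following NumPy sketch is our own illustration (not code from the paper) and simply reproduces the layout above; the shift of Eqn. (2) is shown for an arbitrary example disparity.

```python
import numpy as np

def view_displacements(n=5):
    """Displacement matrices M_H and M_V of Eqn. (1) for an n x n light field
    (n odd), with the central view at offset 0."""
    offsets = np.arange(n) - n // 2          # e.g. [-2, -1, 0, 1, 2] for n = 5
    MV, MH = np.meshgrid(offsets, offsets)   # MH is constant along each row, MV along each column
    return MH, MV

MH, MV = view_displacements(5)
# Eqn. (2): per-view shift of a scene point with unit disparity dp (dp = 1.5 is just an example)
dp = 1.5
Sd = dp * (MH + MV)
```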


Fig. 1: (a) a 5×5 LF view configuration: red and blue arrows represent relative perspective shifting distances in the horizontal and vertical directions for each view. (b) shows four of the LF atoms in DLF. Each row is a different atom pattern; each column has a different adjacent view disparity. (c) shows three sample atoms that move patterns from outside of the central patch, which are suitable for describing irregular patterns under occlusions and specularities.

Due to the linear spatial shifting relations among the views, the disparity of the same scene point in different views can be expressed as (with respect to the central view):

$$S_d(dp, v) = dp \times (M_H(v) + M_V(v)), \quad v = 1, 2, \ldots, V, \qquad (2)$$

where $dp$ is the unit disparity value between adjacent views. $dp$ is a parameter related to the light field camera's aperture dimension and the scene point's distance to the camera. Once the camera settings are fixed, $dp$ is inversely related only to the scene depth. $v$ is the LF view index, and $V$ is the total number of views, which is 5 × 5 for the case in Figs. 1 and 2. If we assume that pixels in a small patch area around that scene point have the same disparity (depth), then we can predict the patch appearance in all other views by simply shifting the central view. According to Eqn. (2), the shifting distance is $S_d(dp, v)$ for view $v$ with unit disparity $dp$. Based on this idea, an LF dictionary can be created by shifting the atoms of a normal image dictionary for different LF views ($v$) at different disparity values ($dp$) [25]. It is expected that the resultant dictionary atoms are able to sparsely represent the LF data. The dictionary perspective-shifting process is described in detail as follows:
• First, applying the KSVD method [26], a global dictionary $D^c \in \mathbb{R}^{N^c \times K^c}$ is trained from 10 benchmark natural images, which are decomposed into overlapping patches of $\sqrt{N^c} \times \sqrt{N^c}$ pixels. $K^c$ is the atom number.
• Second, for a certain unit disparity value $dp$ and a certain atom in $D^c$, an array of smaller patches ($\sqrt{N^v} \times \sqrt{N^v}$ pixels, with $N^v \le N^c/4$) is extracted from the $\sqrt{N^c} \times \sqrt{N^c}$-pixel atom, with extraction center locations decided by $S_d$ defined in Eqn. (2). As illustrated in Fig. 2(a), the gray squares are the 1st, 5th, and 25th views to be extracted. The extracted patches are then vectorized and concatenated into one vector, which becomes a new atom of the LF dictionary, $d_{LF} \in \mathbb{R}^{(N^v \cdot V) \times 1}$. The vectorization and concatenation process is illustrated in Fig. 2(c).
• Third, the previous step is repeated for a vector of different disparity values $dp$ which span the LF dictionary's disparity dimension: $dp \in \{dp(1), dp(2), \ldots, dp(S)\}$. $S$ is the total number of different unit disparity values. Fig. 2(b) shows the extraction process for a different $dp$ value ($dp''$).
• Last, the second and third steps are repeated for all $K^c$ atoms in $D^c$ to create the final LF dictionary: $D_{LF} \in \mathbb{R}^{(N^v \cdot V) \times (S \cdot K^c)}$.

Still in Fig. 2(a) and (b), notice that as $dp$ increases, the array of black dots expands accordingly; they stand for the extraction centers of the different views. The view shifting/extraction process applies bilinear interpolation when the patch centers fall on sub-pixel positions, and $d_k^c$ (shown as a solid rectangle) is padded with symmetrically repeated patterns along its borders in case the shifted patch reaches outside of its area (the padded area is shown in dotted rectangles). Some sample dictionary atoms are shown in Fig. 1(b). The shifting process moves some patterns from outside of the central patch into the center at certain disparities. This gives the dictionary flexibility to describe irregular patterns such as occlusions and specularities. Some of these example atoms are shown in Fig. 1(c). The semi-synthetic construction proposed herein makes the LF dictionary disparity-aware, which becomes a major advantage during the LF reconstruction coding.
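The steps above can be summarized by a minimal NumPy/SciPy sketch, which is our own illustration under stated assumptions: `scipy.ndimage.shift` with linear interpolation and reflective padding stands in for whatever bilinear interpolation and symmetric padding the authors used, and the sign convention of the shift may need flipping depending on how the views are indexed.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def lf_atoms_from_central(atom_c, MH, MV, disparities, nv=8):
    """Perspective-shift one trained central-view atom (sqrt(Nc) x sqrt(Nc))
    into a block of LF atoms: for every unit disparity dp, shift the atom
    according to Eqn. (2) for each view, crop the central nv x nv patch,
    and concatenate all V view patches into one column vector."""
    nc = atom_c.shape[0]
    lo = (nc - nv) // 2                       # crop window of the view patch
    cols = []
    for dp in disparities:
        views = []
        for dh, dv in zip(MH.ravel(), MV.ravel()):
            # linear interpolation for sub-pixel shifts, symmetric border padding
            shifted = nd_shift(atom_c, (dp * dv, dp * dh), order=1, mode='reflect')
            views.append(shifted[lo:lo + nv, lo:lo + nv].ravel())
        cols.append(np.concatenate(views))    # one LF atom of length nv*nv*V
    return np.stack(cols, axis=1)             # (Nv*V) x S block contributed by this atom
```

Stacking the blocks produced for all $K^c$ central-view atoms side by side yields the full $D_{LF}$ of size $(N^v \cdot V) \times (S \cdot K^c)$.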



Fig. 3: Juxtaposed sub-matrices $V_G(v)$ ($v = 1, 2, 3, 4$), with the vignetting pattern for each sub-aperture on their respective diagonals.


Fig. 2: Formation of the proposed LF dictionary by perspective shifting. (a) shows the extraction process of LF atoms ($\sqrt{N^v} \times \sqrt{N^v}$, in gray) from a globally trained image dictionary atom ($\sqrt{N^c} \times \sqrt{N^c}$) with adjacent disparity value $dp'$. The area within dotted lines is a symmetric replication of the pattern near the border in solid lines. (b) shows the situation when the adjacent disparity is at another, increased value $dp''$. (c) shows the structure of the LF dictionary $D_{LF}$. Each column is a concatenation of vectorized view patches. Different brightness for each LF atom represents a different disparity.

B. The Vignetting Pattern Matrix

For most cameras, the light volume that reaches the image sensor tends to decrease from the image center to the corners, a phenomenon known as vignetting [27]. Classical radiometry shows that the irradiance from the aperture of a lens onto a point on the film is equal to the following weighted integral of the radiance coming through the lens:

$$I(x, y) = \frac{1}{F^2} \iint L_f(x, y, s, t) \cos^4\theta \, ds \, dt, \qquad (3)$$

where $F$ is the separation between the exit pupil of the lens and the camera sensor, $L_f$ is the continuous light field, and $I$ is the camera sensor image. Four parameters are used to indicate a light ray: $(s, t)$ are coordinates on the lens aperture plane, and $(x, y)$ are coordinates on the camera sensor plane. $\theta$ is the angle between ray $(x, y, s, t)$ and the sensor plane normal. $\cos^4\theta$ is the falloff factor that represents the reduced effect of rays striking the sensor from oblique directions [10].

In this paper we aim at the reconstruction of a discrete light field $L$ from a limited number of measurements of the camera sensor. We change the continuous light field $L_f(x, y, s, t)$ in Eqn. (3) to a discrete one, $L(x, y, v)$, where $v = 1, 2, \ldots, V$ is the LF view index. The resulting light field integration equation becomes:

$$I(x, y) = \frac{1}{F^2} \sum_v L(x, y, v)\, M_V(v) \cos^4\theta. \qquad (4)$$

$M_V(v)$ is the modulation mask value for view $v$. For a conventional camera without a modulation mask, $M_V(v) = 1$ for all views $v = 1, \ldots, V$. Eqn. (4) can be written in matrix form:

$$I = \Phi L, \qquad (5)$$

where $L$ is the vectorized form of the light field, with the different LF views directly concatenated. $\Phi \in \mathbb{R}^{N^v \times (N^v \cdot V)}$ is the projection matrix that sums all the LF views together. $\Phi$ can be expressed as the juxtaposition of $V$ sub-matrices $\Phi(v)$, $v = 1, 2, \ldots, V$, with

$$\Phi(v) = V_G(v) \cdot M_V(v), \qquad (6)$$

and it has two functions:
1) incorporating the sub-aperture vignetting effect: Fig. 3 shows the configuration of the vignetting matrix $V_G$. Each sub-matrix $V_G(v)$ ($v = 1, 2, 3, 4$) holds its corresponding sub-aperture's vignetting pattern on its diagonal;
2) modulating the sub-aperture signal: an aperture modulation mask is installed on the lens aperture plane in this work, and the modulation coefficient $M_V(v)$ multiplies the light rays of that sub-aperture view.

The vignetting pattern for each sub-aperture view is related to the camera parameters and the aperture shape.
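As a sanity check of the forward model in Eqns. (4)-(6), a coded-aperture measurement can be simulated by weighting each LF view with its mask value and its (calibrated) vignetting pattern and summing over views. The sketch below is our own illustration; the variable names are ours, and the $1/F^2 \cos^4\theta$ falloff is assumed to be folded into the calibrated vignetting maps.

```python
import numpy as np

def coded_aperture_measurement(L, mask, vignette):
    """Simulate Eqns. (4)-(6): one coded-aperture measurement I = Phi * L.
    L:        (H, W, V) discrete light field with V sub-aperture views
    mask:     (V,) aperture modulation coefficients M_V(v)
    vignette: (H, W, V) calibrated per-view vignetting patterns V_G(v),
              assumed to absorb the 1/F^2 cos^4(theta) falloff"""
    return np.sum(L * vignette * mask[None, None, :], axis=2)
```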

Fig. 4: Sub-aperture images with vignetting. The first row shows the aperture shapes, the second row the sub-aperture images with vignetting, and the third row the calibrated vignetting pattern for each sub-aperture view. Brighter pixels indicate a higher transmission rate.

Numerous efforts have been made to model the vignetting pattern of a camera, most of which adopt a polynomial parametric model such as in [28] and [14]. In this work, however, we choose to calibrate the vignetting pattern by experimental measurement as a non-parametric model. By directing a white LED light source into the camera sensor, each sub-aperture view's vignetting pattern can be easily recorded. After normalization, the vignetting pattern can be directly used as the diagonal of $V_G(v)$. Shown in Fig. 4 are the calibration results for a simple 4 × 4 light field.

C. Sensing Matrix Optimization

The choice of a good measurement matrix has been studied in many recent works. Starting from a random matrix, which is generally considered to provide a powerful frame for compressed sensing, the authors in [29] proposed to optimize the sensing matrix for a given dictionary so as to make the atoms as orthogonal to each other as possible. In our proposed system, the sensing matrix has a specific structure (as shown in Fig. 3), which constrains the matrix elements to be related to each other; therefore the optimization method in [29] is not suitable for our system. According to our experiments, mask patterns generated from a uniform random process with a large neighborhood correlation produce outstanding reconstruction performance. The mask's total neighborhood correlation is defined as:

$$R_n = \sum_{i,j \in \Omega_{i,j}} \frac{M_V \cdot f[M_V, (i, j)]}{\|M_V\|_2^2 \, \|f[M_V, (i, j)]\|_2^2}, \qquad (7)$$

where $\Omega_{i,j}$ is the 3×3 neighborhood around the mask element $(i, j)$, and $f[A, \vec{b}\,]$ is an operator that translates a matrix $A$ in the direction specified by $\vec{b}$.
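As an illustration of Eqn. (7), the sketch below (ours, not the authors') scores a candidate mask by its total neighborhood correlation; `np.roll` is used as the translation operator $f$, i.e., with wrap-around, which is a simplifying assumption. The selection procedure itself is described next.

```python
import numpy as np

def neighborhood_correlation(mask):
    """Eqn. (7): total correlation between a candidate mask and its translates
    over the 3x3 neighborhood (np.roll approximates the translation operator f)."""
    denom0 = np.sum(mask ** 2)
    score = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            shifted = np.roll(np.roll(mask, di, axis=0), dj, axis=1)
            score += np.sum(mask * shifted) / (denom0 * np.sum(shifted ** 2))
    return score

# keep, among N_R random candidates, the one with the largest R_n
rng = np.random.default_rng(0)
candidates = [rng.random((5, 5)) for _ in range(10000)]
best_mask = max(candidates, key=neighborhood_correlation)
```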


For a single measurement, the mask pattern is selected from $N_R$ randomly generated candidates ($N_R$ should be large enough; $N_R = 10000$ in our experiment), and among these $N_R$ masks the one with the maximum $R_n$ is chosen as a candidate mask pattern. When multiple measurements are taken, the masks used for each should be as orthogonal to each other as possible. We choose among a pool of $N_P$ ($N_P = 1000$ in our experiment) mask candidates, and the next mask pattern is chosen from these candidates according to the criterion:

$$l_0 = \arg\min_l \sum_{k=1}^{K_I} \frac{M_V^k \cdot M_V^l}{\|M_V^k\|_2^2 \, \|M_V^l\|_2^2}, \quad l = 1, \ldots, N_P, \qquad (8)$$

where $M_V^k$, $k = 1, \ldots, K_I$ are the masks already chosen, and $l_0$ is the new mask to be added from the mask pool of $N_P$ candidates.

IV. LIGHT FIELD ACQUISITION AND RECONSTRUCTION

A. Perspective Flow Calculation

In order to take advantage of the proposed disparity-aware LF dictionary, rudimentary disparity/depth information of the scene needs to be calculated. In this work, we propose two methods to achieve this goal.

1) Disparity from Sub-Aperture Scan (SAS): By opening only a sub-section of the lens aperture to allow light to pass through, a sub-aperture image $I_A$ can be captured; and by changing the sub-aperture location, a sequence of images $I_A^v$ ($v = 1, 2, \ldots$) that observes different stereo angles can be captured by the camera. Since only a small part of the aperture is opened for each image, a vignetting rectification procedure needs to be carried out before they can be used for further processing. With the assumption of intensity conservation among different LF sub-view images, applying the chain rule for differentiation, we have:

$$\frac{\partial \tilde{I}_A^v}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial \tilde{I}_A^v}{\partial y}\frac{\partial y}{\partial v} + \frac{\partial \tilde{I}_A^v}{\partial v} = 0, \qquad (9)$$

where $\tilde{I}_A^v(x, y)$ is the vignetting-rectified pixel intensity at $(x, y)$ of the sub-aperture image with view index $v$. $\partial \tilde{I}_A^v/\partial x$ and $\partial \tilde{I}_A^v/\partial y$ are the image's horizontal and vertical gradients respectively, and $\partial \tilde{I}_A^v/\partial v$ is the gradient across the scan sequence. Using Horn's method [30], the horizontal and vertical optical flow $P_X = \partial x/\partial v$ and $P_Y = \partial y/\partial v$ can be estimated. $P_X$ and $P_Y$ are then combined to estimate the unit disparity $dp$:

$$P = \mathrm{sign}(P_X)\max(|P_X|, |P_Y|). \qquad (10)$$

The max operator is adopted out of concern for texture-less regions: in such regions the estimation might fail in certain directions, and the max operator helps use other directions as a remedy. The perspective flow $P$ is then discretized and mapped into a new $S$-bin image $\Theta(P)$, where $\Theta(\cdot)$ denotes the mapping operation. Based on our assumption that pixels in a small patch area share the same disparity, we set the patch disparity to be equal to the disparity of the pixel with the largest intensity gradient inside that patch, since optical flow estimation is more precise in areas with strong intensity variation.

2) Disparity from Normalized Fluctuation (NF): Instead of using binary sub-aperture masks, optimized random mask patterns can also be used to calculate the disparity map. Suppose image $I_M^k$ is the image taken with aperture modulation mask $M_V^k$; according to Eqn. (5):

$$I_M^k = \Phi^k L. \qquad (11)$$

First, the measurements (at least 2) need to be normalized by their mask's mean transmission rate:

$$\tilde{I}_M^k = \frac{I_M^k}{\left(\sum_{v=1}^{V} M_V^k(v)\right)/V}, \qquad (12)$$

and then, still using the assumption of intensity conservation:

$$\frac{\partial \tilde{I}_M^k}{\partial x}\frac{\partial x}{\partial k} + \frac{\partial \tilde{I}_M^k}{\partial y}\frac{\partial y}{\partial k} + \frac{\partial \tilde{I}_M^k}{\partial k} = 0. \qquad (13)$$

Horn's method [30] is again used to estimate the terms $\partial x/\partial k$ and $\partial y/\partial k$ between two normalized measurements $\tilde{I}_M^{k_1}$ and $\tilde{I}_M^{k_2}$. However, $\partial x/\partial k$ and $\partial y/\partial k$ need to be normalized again before they can be used as a disparity map. This time they are normalized by the modulation mask's horizontal and vertical summation deviation:

$$P_X = \frac{\partial x}{\partial k} \Big/ \sum_v M_H(v)\big(M_V^{k_1}(v) - M_V^{k_2}(v)\big), \qquad (14)$$

$$P_Y = \frac{\partial y}{\partial k} \Big/ \sum_v M_V(v)\big(M_V^{k_1}(v) - M_V^{k_2}(v)\big). \qquad (15)$$
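A minimal sketch of the NF pipeline is given below, combining Eqns. (12)-(15) with the flow combination of Eqn. (10). It is our own illustration: the small Horn-Schunck routine is a generic stand-in for "Horn's method [30]", not the authors' implementation, and the masks, MH, and MV are assumed to share the same angular layout (e.g. 5 × 5 arrays).

```python
import numpy as np
from scipy.ndimage import uniform_filter

def horn_schunck(im1, im2, alpha=10.0, n_iter=100):
    """Minimal Horn-Schunck optical flow, a stand-in for Horn's method [30]."""
    Iy, Ix = np.gradient(im1)
    It = im2 - im1
    u = np.zeros_like(im1)
    v = np.zeros_like(im1)
    for _ in range(n_iter):
        u_bar = uniform_filter(u, 3)
        v_bar = uniform_filter(v, 3)
        t = (Ix * u_bar + Iy * v_bar + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)
        u = u_bar - Ix * t
        v = v_bar - Iy * t
    return u, v            # horizontal and vertical flow fields

def nf_disparity(I1, I2, mask1, mask2, MH, MV):
    """Perspective flow from two aperture-modulated measurements."""
    I1n = I1 / mask1.mean()                  # Eqn. (12): normalize by mean transmission
    I2n = I2 / mask2.mean()
    px, py = horn_schunck(I1n, I2n)          # dx/dk and dy/dk via Eqn. (13)
    d = mask1 - mask2
    px = px / np.sum(MH * d)                 # Eqn. (14)
    py = py / np.sum(MV * d)                 # Eqn. (15)
    return np.sign(px) * np.maximum(np.abs(px), np.abs(py))   # Eqn. (10)
```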

We now compare the two methods introduced to calculate the perspective flow. The Sub-Aperture Scan obviously produces better flow quality; however, it also relies heavily on the correctness of the vignetting correction pattern. The normalized fluctuation approach produces slightly worse flow estimation, but the difference is still acceptable, and since it uses optimized mask patterns throughout the whole process, it produces higher-quality reconstruction outputs. This will be validated in Section VI.

B. Light Field Reconstruction

To recover the light field data $L$ from its measurements $X$, Eqn. (5) poses an inverse problem in which the number of measurements ($N^v \cdot K$) is significantly smaller than the number of unknowns ($N^v \cdot V$). Since light field data has a strong spatio-angular correlation, it can be sparsely represented by a few signal examples [11]. We choose a sparse coding technique for this inverse problem. With the proposed perspective-shifted LF dictionary $D_{LF}$, the LF acquisition process in Eqn. (5) can be rewritten as:

$$X = \Phi L = \Phi D_{LF} \alpha. \qquad (16)$$

Since $D_{LF}$ comprises perspective-shifted atoms at different disparities, the dictionary can be divided into multiple segments: $D_{LF} = [D_{LF}^1, D_{LF}^2, \ldots, D_{LF}^S]$. The atoms that have been shifted by the same disparity are grouped into one segment. The coding coefficient $\alpha$ can also be divided into multiple segments, one for each dictionary segment: $\alpha = [\alpha^1, \alpha^2, \ldots, \alpha^S]$. Based on our assumption that pixels in a small image patch have the same disparity, the coding coefficient vector $\alpha$ should have non-zero elements over only one dictionary segment. To be specific, for coefficient segments $\alpha^i$, $i = 1, 2, \ldots, S$, whose indices differ from the calculated perspective flow map, $i \neq \Theta(P)$, we set $\alpha^i = 0$; and for $i = \Theta(P)$, the following optimization problem is solved:

$$\min \ \|\alpha^{\Theta(P)}\|_0 \quad \text{subject to} \quad \|X - \Phi D_{LF}^{\Theta(P)} \alpha^{\Theta(P)}\|_2 \le \varepsilon. \qquad (17)$$

To solve Eqn. (17), the greedy Orthogonal Matching Pursuit (OMP) algorithm can be applied in a similar manner to its use in image denoising [31]. When OMP is directly applied to $D_{LF}$, which has an extremely large dimension ($(N^v \cdot V) \times (S \cdot K^c)$), the coding process is extremely time consuming. However, thanks to the perspective flow detection process, the actual LF dictionary used for coding is reduced to a much lower dimension ($(N^v \cdot V) \times K^c$). This results in a much more efficient LF reconstruction algorithm, both in terms of computational complexity and reconstruction quality, as we show in Section VI.
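The per-patch decoding step can be sketched as follows; this is our own illustration (not the authors' code), using scikit-learn's OrthogonalMatchingPursuit as the OMP solver. The dictionary is assumed to be organized as S contiguous disparity segments of equal width, and the sparsity level n_nonzero is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def reconstruct_patch(x, Phi, D_LF, seg_index, n_segments, n_nonzero=8):
    """Solve Eqn. (17) for one patch: run OMP over only the dictionary segment
    selected by the quantized perspective flow Theta(P) of that patch."""
    K = D_LF.shape[1] // n_segments                   # atoms per disparity segment
    D_seg = D_LF[:, seg_index * K:(seg_index + 1) * K]
    A = Phi @ D_seg                                   # effective sensing dictionary
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero, fit_intercept=False)
    omp.fit(A, x)                                     # x: stacked measurement values of the patch
    return D_seg @ omp.coef_                          # reconstructed patch over all V views
```

Because only one segment of width $K^c$ is ever passed to the solver, the coding cost per patch is independent of the number of disparity levels $S$.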

V. A HARDWARE IMPLEMENTATION

A hardware implementation of the LF acquisition system based on a coded aperture is introduced in this section. A complete system schematic is shown in Fig. 5. The system comprises the following components:
1) 2× relay lenses (aspherized achromatic lenses with 50 mm EFL);
2) 2× right-angle prisms;
3) a polarizing beam-splitter (designed wavelength 420-680 nm);
4) an LCoS spatial modulator (1280×1024 pixels).

Fig. 5: Proposed camera system for light field acquisition.

Light from the scene is first collected and converged by the front imaging lens, and then passes through a relay lens and becomes collimated. The polarizing beam-splitter (PBS) bends s-polarized light towards the LCoS (Liquid Crystal on Silicon) module and lets p-polarized light pass through. The LCoS is a reflective device that can change the polarization angle of incident light beams. By setting the corresponding pixel values on the LCoS, the polarization angle at that specific aperture location can be dynamically changed. When the reflected light beam passes through the PBS again, the different polarization angles translate to different transmission rates. Finally, the second relay lens converges the light and focuses it onto the camera sensor (or the DSLR viewfinder).

By changing the pattern displayed on the LCoS, the system aperture can be changed. By showing only a small fraction of the whole aperture, parallax is maximized. For SAS, the whole scanning process is fast: in our experimental set-up, 20 ms is used for one image. Theoretically, two scans are enough for the disparity calculation, which takes only 40 ms. For a better disparity estimation, more pictures can be taken, which of course takes more time. According to our experiments, five quick scans suffice to produce a good enough perspective flow estimation, which takes around 0.1 s. This sub-aperture scanning process is somewhat similar to most DSLR cameras' auto-focus function (which in essence is also a scanning process); when the scanning is finished, the subsequent image(s) are taken using optimized aperture masks, and all the measurements are later used, together with the disparity flow estimation, for the LF reconstruction. The hardware set-up of the LF camera is shown in Fig. 6; it is later used for the experimental evaluation of the proposed LF acquisition and reconstruction scheme.

Fig. 6: The hardware implementation for the proposed system.

VI. EVALUATION AND ANALYSIS

In order to evaluate the efficiency of the proposed LF acquisition and reconstruction scheme, experiments are designed and carried out both on simulated LF data and on images acquired with the designed hardware. Following the steps in Section III-A, we train a perspective-shifted LF dictionary $D_{LF}$ with the following parameters:
• train the global image dictionary $D^c \in \mathbb{R}^{256 \times 400}$ (image patch size $\sqrt{N^c} \times \sqrt{N^c} = 16 \times 16$, and dictionary atom number $K^c = 400$) based on multiple natural images. The reason for setting 16 × 16 as the patch size is the later shifting process: a 16 × 16 image patch leaves enough margin for a smaller 8 × 8 patch to shift across. The atom number 400 is chosen to guarantee dictionary redundancy;
• shift and extract patches of size $\sqrt{N^v} \times \sqrt{N^v} = 8 \times 8$ from each atom in $D^c$ with disparity value $dp$;
• go through all adjacent-view disparity values $dp$ in the 13 × 1 vector [−3.0, −2.5, −2.0, −1.5, −1.0, −0.5, 0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0] to create an LF dictionary $D_{LF} \in \mathbb{R}^{(64 \cdot 25) \times (13 \cdot 400)} = \mathbb{R}^{1600 \times 5200}$.

A. Evaluation of the Sub-Aperture Scan Process

We evaluate how different numbers of sub-aperture scans affect the final LF reconstruction quality. The LF data Buddha [32] is used for the evaluation, and the results are shown in Fig. 7. As expected, the reconstruction quality grows with the number of aperture scans: more scans lead to a better estimation of the disparity, and therefore a more precise choice of the LF dictionary segment for each patch. Also obvious from Fig. 7, the reconstruction quality grows with the number of measurements. It is difficult to tell which factor has more impact on the final reconstruction quality, but it is possible to conclude that a scan number of 5 ensures a high reconstruction quality (which allows a 20 millisecond exposure time for each sub-aperture image), and that the reconstruction quality grows slowly after 4 measurements.

Fig. 7: Evaluation of the effect of the aperture scan number and the measurement number on the final reconstruction quality for SAS. The data is collected on three LF images Buddha, Dragons, and Maria with 5×5 views.

B. Measurement Mask Pattern Evaluation

According to Section III-C, a measurement mask is chosen from a pool of uniformly distributed random values with a large neighborhood correlation. In this subsection we validate this choice. Experiments are carried out on the LF data Buddha, Dragons, and Maria. 70 uniformly distributed random mask patterns are generated for each LF data set; these masks are then used to evaluate the LF acquisition and reconstruction process. The SAS method is used with 5 sub-aperture scans, and only 1 measurement is used for the reconstruction. The reconstruction PSNR is shown in Fig. 8. As can be seen, the reconstruction quality (vertical axis) generally increases with the neighborhood correlation value (horizontal axis). This experiment therefore validates the optimization method we used for choosing an efficient coding mask pattern.

Fig. 8: Evaluation of the modulation mask quality with respect to the mask elements' total neighborhood correlation defined in Eqn. (7) for the LF images Buddha, Dragons, and Maria respectively in (a), (b), and (c).

C. Quantitative Evaluation of the Light Field Reconstruction

In this sub-section we carry out a quantitative evaluation of the light field reconstruction algorithms herein proposed and compare them with several other state-of-the-art methods. The reconstruction methods we choose to compare are listed below. All the coding masks used are optimized using the method introduced in Section III-C, and the reconstructions are all based on the proposed perspective-shifted LF dictionary $D_{LF} \in \mathbb{R}^{1600 \times 5200}$.
1) Coded Aperture (CA) [13]: during LF acquisition, an optimized aperture mask is installed at the lens aperture plane, which codes each sub-aperture view at a constant rate. During reconstruction, the proposed LF dictionary is used without disparity information.
2) Coded Sensor-Plane (CS) [11]: during acquisition, an optimized mask is installed at a plane closely in front of the sensor, which codes each directional light ray with a different constant for each pixel. During reconstruction, the proposed LF dictionary is used without disparity information.
3) Global LF Dictionary (GD): following the method in [11], a global LF dictionary is used which is trained directly from LF patches (dictionary available from the authors' website, with patch size 9×9 and total atom number 2500). The LF is reconstructed using the coded sensor-plane method without disparity information.
4) Proposed Sub-Aperture Scan method (SAS): a sub-aperture scan is performed, during which 5 sub-aperture images are taken for the calculation of the disparity information (introduced in Section IV-A1). The disparity is used to extract a dictionary segment for each patch.
5) Proposed Normalized Fluctuation method (NF): during acquisition, an optimized mask is installed at the lens aperture plane. The NF method introduced in Section IV-A2 is used for the disparity calculation, which is used to choose the right dictionary disparity segment during LF reconstruction.

Experiments have been carried out on a 4-core Intel i5 workstation with 8 GB RAM.


A complete table of the reconstruction performance in terms of PSNR and computation time is listed in TABLE I. Simulated LF data Maria, Buddha [32], and Dragons [11] are used for the tests. The average performance over the three is drawn in Fig. 10. As can be seen from the figure, the proposed LF acquisition and reconstruction methods SAS and NF show obvious advantages over the state-of-the-art methods. Several conclusions can be drawn:
• Both the SAS and NF methods herein proposed produce much higher reconstruction quality, especially when the measurement number is less than 4.
• The SAS method produces outstanding reconstruction quality using only one measurement (the aperture scan process can be considered a preliminary setting-up procedure, similar to the auto-focus detection on a conventional DSLR that leads to a final shot). SAS produces much better reconstruction than its competing methods (4 dB+) when only one measurement is taken.
• The NF method, although it only works with at least two images, produces the highest reconstruction quality over the whole measurement range with a good margin.
• Both SAS and NF are much quicker than CA and CS (more than 1/5 of CA and 1/20 of CS), and demand much less working memory for computation.
• Comparing GD and SAS, in which both coded sensor-plane methods are used, the proposed disparity-aware LF dictionary performs much better than GD.

We further look into the global LF dictionary GD [11], which is trained directly from LF patches (with patch size 9×9 and total atom number 2500). Disparity values for each LF atom are calculated using the method introduced in [3], and the distribution over all 2500 atoms is drawn in Fig. 9. It can be seen that: 1) the distribution is concentrated near zero, and there are no atoms with disparity values larger than 1.5 or smaller than -1; 2) there are three obvious peaks. Analyzing the angular structures and composition of the LF scenes, we believe the three peaks belong to the background (negative disparities), the foreground (positive disparities), and texture-less regions (near-zero disparities), respectively. The global training in [11] seeks to minimize the representation error for the majority of LF patches, which have smaller disparities. However, for the minority with larger disparity values, whose information is generally considered more critical for the LF restoration, their role is largely neglected. This explains the superiority of the proposed LF dictionary, as it places equal importance on the entire disparity range.

Fig. 9: Disparity value distribution of the GD in [11]. Dictionary downloaded from the authors' website.

To generalize, the proposed dictionary structure and its reconstruction scheme have a great advantage over [11] in the following aspects:
1) Much lower dictionary training complexity: in [11], the dictionary training process takes more than 10 hours on a 24-core processor; the proposed method only has the complexity of training a normal 2D image dictionary.
2) Much faster reconstruction: as only a fraction of the dictionary is used for the reconstruction, the computational complexity is dramatically reduced, to the order of several minutes on a single-core processor, as compared to [11], where a spatial 480×270 and angular 5 × 5 LF reconstruction takes 18 hours on an 8-core workstation with parallel implementation.
3) Disparity is strictly preserved during the reconstruction, which leads to higher reconstruction fidelity.

Fig. 10: LF reconstruction performance comparison for the competing methods. Both reconstruction PSNR and computation time are evaluated. The curves and bars are averages over the three synthetic LF data sets Buddha, Maria and Dragons.

D. Quantitative Evaluation of Disparity Restoration

The most critical information in light field data is the disparity information across all perspective views. Therefore, the restoration of this information is an important evaluation criterion for LF reconstruction. We use the method introduced in [3] to calculate the LF disparity. Instead of using all disparity values, we only select scene points whose edge confidence value Ce is above a threshold (in our experiment, set to 0.05). Any disparity estimation below that threshold can be considered inaccurate [3], so such points are excluded from this evaluation. The results are shown in Table II. For each entry, the first value is the disparity root mean square error (RMSE) in pixels for the method in [11] based on GD, and the second value is the RMSE of our proposed method. As can be seen, the disparity RMSE of our method is on average 0.2 pixels smaller than that of [11]. Fig. 11(b) shows the reconstructed disparity for the light fields Maria and Dragons. Column (a) gives the ground truth disparity calculated from the original LF data using the method in [3]. This experiment validates the efficiency of the proposed scheme in LF disparity restoration.
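The confidence-masked disparity RMSE used in Table II can be computed as in the short helper below; this is our own illustrative code, with hypothetical argument names, not the evaluation script used by the authors.

```python
import numpy as np

def masked_disparity_rmse(disp_est, disp_gt, confidence, threshold=0.05):
    """RMSE (in pixels) between estimated and ground-truth disparity maps,
    restricted to points whose edge confidence exceeds the threshold."""
    valid = confidence > threshold
    err = disp_est[valid] - disp_gt[valid]
    return float(np.sqrt(np.mean(err ** 2)))
```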


TABLE I: Quantitative evaluation of LF reconstruction for the synthesized LF data Maria, Buddha [32], and Dragons [11] with different numbers of measurements. The five rows per data set correspond to: (1) the coded aperture method (CA); (2) the coded sensor-plane method based on our proposed LF dictionary (CS); (3) the method in [11] based on a global LF dictionary (GD); (4) the sub-aperture scan method using 5 scan images (SAS); (5) the normalized fluctuation method (NF). Each entry gives the reconstruction PSNR in dB, with the computation time in seconds in parentheses.

Maria (741 × 741)
  Meas. No.:  1             2             3             4             5             6             7
  CA:   28.05 (1498)  30.78 (1943)  31.38 (2219)  32.19 (1498)  32.30 (1722)  32.33 (2001)  32.45 (2236)
  CS:   28.02 (5106)  29.95 (5450)  30.85 (5865)  31.15 (6024)  31.42 (6144)  31.54 (6220)  31.64 (6361)
  GD:   28.73 (2702)  29.25 (2812)  29.29 (3208)  29.38 (3173)  29.44 (3050)  29.41 (3141)  29.42 (3084)
  SAS:  32.45 (268)   32.52 (287)   32.42 (297)   32.58 (299)   32.59 (393)   32.56 (560)   32.55 (804)
  NF:   /             34.51 (272)   34.70 (268)   34.89 (278)   34.86 (275)   34.92 (278)   34.93 (276)

Buddha (768 × 768)
  Meas. No.:  1             2             3             4             5             6             7
  CA:   29.58 (1226)  32.66 (1219)  33.21 (1293)  33.56 (1428)  33.68 (1638)  33.75 (1922)  33.79 (2238)
  CS:   30.04 (3682)  31.97 (4925)  32.75 (4927)  32.96 (4597)  33.17 (4435)  33.28 (5071)  33.36 (4843)
  GD:   31.41 (1953)  32.19 (2046)  32.26 (2084)  32.43 (2192)  32.53 (2264)  32.53 (2314)  32.54 (2403)
  SAS:  33.64 (217)   33.65 (240)   33.56 (251)   33.61 (263)   33.55 (360)   33.53 (535)   33.54 (722)
  NF:   /             33.35 (273)   33.55 (272)   33.75 (272)   33.70 (270)   33.76 (264)   33.82 (263)

Dragons (593 × 840)
  Meas. No.:  1             2             3             4             5             6             7
  CA:   27.56 (735)   30.64 (762)   31.16 (849)   31.84 (972)   31.91 (1168)  31.95 (1368)  32.02 (1734)
  CS:   28.16 (2610)  29.99 (3209)  30.63 (3136)  30.85 (3448)  31.06 (3545)  31.18 (3221)  31.26 (3382)
  GD:   29.37 (1737)  30.07 (2014)  30.13 (1832)  30.24 (1925)  30.31 (2045)  30.34 (1989)  30.35 (2043)
  SAS:  31.93 (195)   32.00 (210)   31.97 (214)   32.04 (227)   31.99 (305)   31.98 (421)   32.00 (575)
  NF:   /             31.84 (182)   32.11 (169)   32.47 (174)   32.43 (177)   32.55 (176)   32.63 (172)

Average
  Meas. No.:  1             2             3             4             5             6             7
  CA:   28.40 (1153)  31.36 (1308)  31.92 (1454)  32.53 (1299)  32.63 (1509)  32.67 (1764)  32.75 (2069)
  CS:   28.74 (3799)  30.64 (4528)  31.41 (4643)  31.65 (4690)  31.88 (4708)  32.00 (4837)  32.08 (4862)
  GD:   29.83 (2131)  30.50 (2291)  30.56 (2375)  30.68 (2430)  30.76 (2453)  30.76 (2481)  30.77 (2510)
  SAS:  32.68 (227)   32.72 (246)   32.65 (254)   32.74 (263)   32.71 (353)   32.69 (505)   32.70 (700)
  NF:   /             33.23 (242)   33.45 (236)   33.70 (241)   33.66 (241)   33.74 (240)   33.79 (237)


Fig. 11: LF disparity reconstruction for the light fields Maria and Dragons. (a) and (c) are the disparity maps calculated using [3] based on the original LF data; (b) and (d) show the disparity reconstructed with the NF method using only 2 measurements. Areas in black are pixels with confidence value Ce ≤ 0.05.

E. Qualitative Evaluation of the Reconstructed Light Field

In this sub-section we give visual demonstrations of the proposed LF acquisition/reconstruction schemes and of the output of the designed LF camera. To demonstrate the reconstruction details, we zoom in on some occlusion areas of the reconstructed LF from different perspective viewpoints. The details are shown in Fig. 12 for the reconstructed LFs Buddha and Dragons. Fig. 13 and 14 show 9 sub-view images from the reconstructed 5 × 5 LF data captured using our camera. The LF images are reconstructed using only 2 measurements with the NF method. Red dotted lines are added across the views to demonstrate the disparity.

TABLE II: Quantitative evaluation of LF disparity reconstruction for the synthesized LF data Maria, Buddha [32], and Dragons with different numbers of measurements. For each entry, the first value is the disparity root mean square error (RMSE) in pixels for the method in [11] based on GD, and the second value is the RMSE of our proposed method.

  Meas. No.:   2              3              4              5              6              7
  Buddha:   0.649 / 0.551  0.641 / 0.557  0.631 / 0.518  0.619 / 0.521  0.622 / 0.517  0.622 / 0.515
  Maria:    0.576 / 0.358  0.570 / 0.397  0.563 / 0.334  0.557 / 0.331  0.560 / 0.329  0.559 / 0.327
  Dragons:  0.835 / 0.532  0.817 / 0.538  0.808 / 0.510  0.800 / 0.514  0.797 / 0.507  0.798 / 0.505
  Average:  0.678 / 0.472  0.668 / 0.492  0.660 / 0.445  0.651 / 0.446  0.652 / 0.442  0.652 / 0.440

Fig. 15 and Fig. 16 are examples of refocusing based on the reconstructed LF data. A video demonstration can be found at: http://www3.ntu.edu.sg/home/elpchau/workdemo.mp4.

VII. CONCLUSION

In this work a compressed sensing framework is proposed for the sampling and reconstruction of a high resolution light field based on a coded aperture camera system. First, an LF dictionary based on perspective shifting is proposed for sparse representation of the highly correlated light field. Then, two separate methods, i.e., Sub-Aperture Scan (SAS) and Normalized Fluctuation (NF), are proposed to calculate the scene disparity, which is used during the LF reconstruction with the proposed disparity-aware dictionary. Finally, a hardware implementation of the proposed LF acquisition/reconstruction scheme is carried out. Both quantitative and qualitative evaluations show that the proposed methods produce state-of-the-art performance in both reconstruction quality and computation efficiency.


Fig. 14: 9 of the reconstructed 5 × 5 angular views with measurements taken using the proposed LF camera. The NF method is used with only 2 measurements.

Fig. 12: LF reconstruction details for Buddha (top) and Dragons. Areas in red rectangles are zoomed-in for better observation. Blue rectangles highlight perspective differences between LF views. NF method is used with 2 measurements. Reconstruction PSNR 33.35 for Buddha, 31.84 for Dragons.

Fig. 13: 9 of the reconstructed 5 × 5 angular views with measurements taken using the proposed LF camera. The NF method is used with only 2 measurements.

ACKNOWLEDGMENT

The research was partially supported by the ST Engineering-NTU Corporate Lab through the NRF corporate lab@university scheme.


Fig. 15: Refocusing demonstration on the reconstructed light field based on the synthetic data Buddha. 5 sub-aperture scans and only 1 extra measurement are used for the reconstruction. The reconstruction PSNR is 33.64 dB. Details in the red and blue rectangles in (a), (b) are shown zoomed-in in (c), (d) respectively.

REFERENCES

[1] S. Wanner and B. Goldluecke, "Globally consistent depth labeling of 4D light fields," in CVPR. IEEE, 2012.
[2] T. E. Bishop and P. Favaro, "Plenoptic depth estimation from multiple aliased views," in ICCV Workshops. IEEE, 2009, pp. 1622–1629.
[3] C. Kim, H. Zimmer, Y. Pritch, A. Sorkine-Hornung, and M. H. Gross, "Scene reconstruction from high spatio-angular resolution light fields," ACM Trans. Graph., vol. 32, no. 4, p. 73, 2013.
[4] S.-C. Chan, K.-T. Ng, Z.-F. Gan, K.-L. Chan, and H.-Y. Shum, "The plenoptic video," IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 12, pp. 1650–1659, 2005.
[5] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
[6] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.



Fig. 16: Refocusing demonstration on the reconstructed LF based on measurements from our proposed LF camera. The NF method is used, with 2 measurements. Details in the red and blue rectangles in (a), (b) are shown zoomed-in in (c), (d) respectively.


[7] F. E. Ives, "Parallax stereogram and process of making same," US Patent 725,567, Apr. 14, 1903.
[8] G. Lippmann, "La photographie intégrale," Académie des Sciences, vol. 146, pp. 446–451, 1908.
[9] E. H. Adelson and J. Y. Wang, "Single lens stereo with a plenoptic camera," IEEE Transactions on Pattern Analysis and Machine Intelligence, 1992.
[10] R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan, "Light field photography with a hand-held plenoptic camera," Computer Science Technical Report CSTR, 2005.
[11] K. Marwah, G. Wetzstein, Y. Bando, and R. Raskar, "Compressive light field photography using overcomplete dictionaries and optimized projections," ACM Transactions on Graphics, vol. 32, no. 4, p. 46, 2013.
[12] T. Georgiev and C. Intwala, "Light field camera design for integral view photography," Adobe Systems, Inc., 2006.
[13] A. Ashok and M. A. Neifeld, "Compressive light field imaging," in Proc. SPIE, 2010.
[14] C.-K. Liang, T.-H. Lin, B.-Y. Wong, C. Liu, and H. H. Chen, "Programmable aperture photography: multiplexed light field acquisition," ACM Transactions on Graphics (TOG), vol. 27, no. 3, p. 55, 2008.
[15] S. D. Babacan, R. Ansorge, M. Luessi, P. R. Matarán, R. Molina, and A. K. Katsaggelos, "Compressive light field sensing," IEEE Transactions on Image Processing, vol. 21, no. 12, 2012.
[16] H. Nagahara, C. Zhou, T. Watanabe, H. Ishiguro, and S. K. Nayar, "Programmable aperture camera using LCoS," in Computer Vision–ECCV 2010. Springer, 2010.
[17] A. Manakov, J. Restrepo, O. Klehm, R. Hegedus, E. Eisemann, H.-P. Seidel, and I. Ihrke, "A reconfigurable camera add-on for high dynamic range, multispectral, polarization, and light-field imaging," ACM Transactions on Graphics, vol. 32, no. 4, pp. 47–1, 2013.
[18] K. Venkataraman, D. Lelescu, J. Duparré, A. McMahon, G. Molina, P. Chatterjee, R. Mullis, and S. Nayar, "PiCam: an ultra-thin high performance monolithic camera array," ACM Transactions on Graphics, vol. 32, no. 6, 2013.
[19] T. E. Bishop, S. Zanetti, and P. Favaro, "Light field superresolution," in IEEE International Conference on Computational Photography. IEEE, 2009, pp. 1–9.
[20] M. Hosseini Kamal, M. Golbabaee, and P. Vandergheynst, "Light field compressive sensing in camera arrays," in IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2012, pp. 5413–5416.
[21] X. Lin, Y. Liu, J. Wu, and Q. Dai, "Spatial-spectral encoded compressive hyperspectral imaging," ACM Transactions on Graphics, vol. 33, no. 6, p. 233, 2014.
[22] X. Lin, J. Suo, G. Wetzstein, Q. Dai, and R. Raskar, "Coded focal stack photography," in IEEE International Conference on Computational Photography. IEEE, 2013, pp. 1–9.
[23] L. Shi, H. Hassanieh, A. Davis, D. Katabi, and F. Durand, "Light field reconstruction using sparsity in the continuous Fourier domain," ACM Transactions on Graphics, vol. 34, no. 1, p. 12, 2014.

[24] G. Yu, G. Sapiro, and S. Mallat, "Solving inverse problems with piecewise linear estimators: from Gaussian mixture models to structured sparsity," IEEE Transactions on Image Processing, vol. 21, no. 5, pp. 2481–2499, 2012.
[25] J. Chen, A. Matyasko, and L.-P. Chau, "A light field sparse representation structure and its fast coding technique," in International Conference on Digital Signal Processing. IEEE, 2014, pp. 214–218.
[26] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, 2006.
[27] S. F. Ray, Applied Photographic Optics: Lenses and Optical Systems for Photography, Film, Video, Electronic and Digital Imaging. Focal Press, 2002.
[28] S. Lyu, "Estimating vignetting function from a single image for image authentication," in Proceedings of the 12th ACM Workshop on Multimedia and Security. ACM, 2010, pp. 3–12.
[29] J. M. Duarte-Carvajalino and G. Sapiro, "Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization," IEEE Transactions on Image Processing, vol. 18, no. 7, pp. 1395–1408, 2009.
[30] B. K. Horn and B. G. Schunck, "Determining optical flow," Artificial Intelligence, 1981.
[31] M. Elad and M. Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Transactions on Image Processing, vol. 15, no. 12, 2006.
[32] S. Wanner, S. Meister, and B. Goldluecke, "Datasets and benchmarks for densely sampled 4D light fields," in Vision, Modeling & Visualization. The Eurographics Association, 2013.

Jie Chen received the B.S. and M.Eng. degrees from the School of Optical and Electronic Information, Huazhong University of Science and Technology, China. He is currently pursuing the Ph.D. degree at the School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore. His research interests are image processing, image sparse representation and its applications, and computational photography.

Lap-Pui Chau received the B.Eng. degree in Electronic Engineering from Oxford Brookes University and the Ph.D. degree in Electronic Engineering from The Hong Kong Polytechnic University, in 1992 and 1997, respectively. He joined Nanyang Technological University in 1997. His research interests include fast signal processing algorithms, image and video processing, Mocap processing and human motion analysis. He was General Chair of the IEEE International Conference on Digital Signal Processing (DSP 2015) and the International Conference on Information, Communications and Signal Processing (ICICS 2015). He was Program Chair of the International Conference on Multimedia and Expo (ICME 2016), Visual Communications and Image Processing (VCIP 2013) and the International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS 2010). He was the chair of the Technical Committee on Circuits & Systems for Communications (TC-CASC) of the IEEE Circuits and Systems Society from 2010 to 2012. Since 2004, he has served as an associate editor for IEEE Transactions on Multimedia, IEEE Signal Processing Letters, IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Broadcasting, The Visual Computer (Springer) and the IEEE Circuits and Systems Society Newsletter. He was an IEEE Distinguished Lecturer from 2009 to 2015.
