Advanced motion estimation and motion compensated de-interlacing

E.B. Bellers and G. de Haan
Philips Research Laboratories, Television Systems Group
Prof. Holstlaan 4, 5656 AA Eindhoven, The Netherlands
TEL: +31-40-2744285, FAX: +31-40-2742630

Key words: de-interlacing, motion compensation, motion estimation, sequential scan conversion, generalized sampling theorem

Abstract: This paper describes a new high quality de-interlacing algorithm applying motion estimation and compensation techniques. First, a comparison between two recently introduced de-interlacing concepts will be presented. One method is based on a generalized sampling theorem and the other uses time-recursion. The new algorithm aims at combining the benefits of both.

1 INTRODUCTION

Historically, interlacing was introduced as a compromise between quality and required bandwidth. A major drawback of the interlaced scanning format on current bright, high-resolution displays is line flicker and the serration of moving edges. In the literature, several de-interlacing algorithms have been proposed to eliminate these artifacts, or to serve as a basis for other scan rate conversions.

Delogne et al. [1] recently proposed an advanced motion estimation and de-interlacing technique based upon a generalization of the sampling theorem. For an assumed velocity, the motion compensated correlation between two successive frames is calculated by applying motion compensated vertical-temporal filters. As this can be repeated for any assumed velocity, it is possible to find the velocity for which this correlation has a maximum. This method is elegant, but requires a constant velocity over a three-field period, which is a serious drawback for sequences with acceleration or with covering and uncovering, both of which are rather common in natural sequences. The authors of the original algorithm noticed this problem themselves, which resulted in an approach in which they maximize the correlation between a reconstructed frame and a successive field, as described by Vandendorpe et al. [2]. This reduces the constant velocity constraint to a two-field period.

Wang et al. [3], somewhat earlier, proposed an alternative high-quality, time-recursive de-interlacing concept, which does not impose a constant velocity constraint. This de-interlacing proposal also requires motion estimation and aims at the highest performance level.

This paper compares the time-recursive technique proposed by Wang with the Vandendorpe method, applying the 3D Recursive Search (3D-RS) block matcher of [5] to both algorithms, and introduces an advanced new de-interlacing algorithm that aims at combining the advantages of the other methods. The motion estimator used for all algorithms is the same, apart from the error function, which depends on the algorithm under test.

Section 2 starts with a description of this 3D-RS block matcher. Section 3 focuses on the time-recursive and generalized sampling theorem based de-interlacing algorithms. In section 4, the new algorithm is presented, and section 5 compares the experimental results obtained with these algorithms. Finally, conclusions are drawn in section 6.

2 THE 3D RECURSIVE-SEARCH BLOCK MATCHER

The high-quality and efficient 3D-RS block matcher of [5] is used in the algorithms presented. This algorithm evaluates a small number of candidate vectors per block of pixels, with quarter-pixel accuracy. Furthermore, due to the inherent smoothness constraint, it yields very coherent vector fields that closely correspond to the true motion of objects. This makes the method also suitable for scan rate conversion. This section briefly summarizes its characteristics.

In block-matching motion estimation algorithms, a displacement vector (or motion vector) $d(b_c, n)$ is assigned to the center $b_c = (x_c, y_c)^t$, with $t$ for transpose, of a block of pixels $B(b_c)$ in the current field $n$ by searching a similar block within a search area $SA(b_c)$, also centered at $b_c$, but in the previous field $n-1$. This similar block has a center which is shifted with respect to $b_c$ over the displacement vector $d(b_c, n)$. To find $d(b_c, n)$, a number of candidate vectors $C$ are evaluated, applying an error measure $e(C, b_c, n)$ to quantify block similarity. Figure 1 demonstrates the procedure. The block of pixels (positions) is defined by:

$$B(b_c) = \left\{ (x, y) \;\middle|\; x_c - \tfrac{W}{2} \le x \le x_c + \tfrac{W}{2} \;\wedge\; y_c - \tfrac{H}{2} \le y \le y_c + \tfrac{H}{2} \right\} \tag{1}$$

with $W$ and $H$ the block width and block height respectively (see footnote 1), and $(x, y)$ the spatial position in the image. The candidate vectors are selected from the candidate set $CS(b_c, n)$, which is defined by:

$$CS(b_c, n) = \left\{ d\!\left(b_c - \begin{pmatrix} W \\ H \end{pmatrix}, n\right) + U_1(b_c, n),\;\; d\!\left(b_c - \begin{pmatrix} -W \\ H \end{pmatrix}, n\right) + U_2(b_c, n),\;\; d\!\left(b_c + \begin{pmatrix} 0 \\ 2H \end{pmatrix}, n-1\right) \right\} \tag{2}$$

1. In our experiments, W was set to 8 pixels and H to 8 frame lines.

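Figure 1: Illustration of block-matching of the dark areas. A block $B(b_c)$ of width $W$ and height $H$, centered at $(x_c, y_c)$ in image $n$, is matched within the search area $SA$ in image $n-1$ using a candidate vector $C$.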

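where the update vectors $U_1(b_c, n)$ and $U_2(b_c, n)$ are selected from an update set $US$, defined by:

$$US(b_c, n) = US_i(b_c, n) \cup US_f(b_c, n) \tag{3}$$

with the integer updates $US_i(b_c, n)$ defined by:

$$US_i(b_c, n) = \left\{ \begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}, \begin{pmatrix}0\\-1\end{pmatrix}, \begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}-1\\0\end{pmatrix}, \begin{pmatrix}0\\2\end{pmatrix}, \begin{pmatrix}0\\-2\end{pmatrix}, \begin{pmatrix}2\\0\end{pmatrix}, \begin{pmatrix}-2\\0\end{pmatrix}, \begin{pmatrix}0\\3\end{pmatrix}, \begin{pmatrix}0\\-3\end{pmatrix}, \begin{pmatrix}3\\0\end{pmatrix}, \begin{pmatrix}-3\\0\end{pmatrix} \right\} \tag{4}$$

and the fractional updates $US_f(b_c, n)$, necessary to realize sub-pixel accuracy, defined by:

$$US_f(b_c, n) = \left\{ \begin{pmatrix}0\\0.25\end{pmatrix}, \begin{pmatrix}0\\-0.25\end{pmatrix}, \begin{pmatrix}0.25\\0\end{pmatrix}, \begin{pmatrix}-0.25\\0\end{pmatrix}, \begin{pmatrix}0\\0.5\end{pmatrix}, \begin{pmatrix}0\\-0.5\end{pmatrix}, \begin{pmatrix}0.5\\0\end{pmatrix}, \begin{pmatrix}-0.5\\0\end{pmatrix} \right\} \tag{5}$$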

From these equations it can be concluded that the candidate set consists of spatial and spatio-temporal 'prediction vectors' from a 3D neighbourhood and an updated prediction vector. This implicitly assumes spatial and/or temporal consistency. The updating process involves updates added to either of the spatial predictions. Figure 2 shows where the spatial and spatio-temporal prediction vectors are located relative to the current block.

The displacement vector $d(b_c, n)$ resulting from the block-matching process is the candidate vector $C$ which yields the minimum value of the error function $e(C, b_c, n)$:

$$d(b_c, n) = \left\{ C \in CS(b_c, n) \;\middle|\; e(C, b_c, n) \le e(V, b_c, n) \;\; \forall\, V \in CS(b_c, n) \right\} \tag{6}$$

The error function (which will be given in the following subsections) is a cost function of the luminance values $f(x, n)$, with spatial position $x = (x, y)^t$, of the pixels in the current block and those calculated with the aid of the candidate vector. This error function is different for the Wang and the Vandendorpe approach; the two are called the Time-Recursive (TR) error function and the Transversal Generalized-Sampling-Theorem (TGST) error function respectively.

Figure 2: Positions, relative to the current block, from which the prediction vectors are taken in the 3D RS block-matcher. Blocks in the current field supply the spatial predictions; a block in the previous field supplies the spatio-temporal prediction.

2.1 TR-error function

The TR-error function is a cost function of the luminance values of the pixels in the current block and those of the shifted block from the previous field, summed over the block $B(b_c)$. A common choice, which we also use, is the Sum of the Absolute Differences (SAD). The TR-error function is defined by:

$$e_{TR}(C, b_c, n) = SAD_{TR}(C, b_c, n) = \sum_{x \in B(b_c)} \left| f(x, n) - f_{out}(x - C, n-1) \right| \tag{7}$$

with $f_{out}(x, n-1)$ the de-interlaced output frame $n-1$.
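The candidate selection of (6) with the SAD of (7) could be sketched as follows. This is only an illustration under simplifying assumptions that are not part of the paper: candidates are integer vectors, both images are treated as full frames (the interlace structure is ignored), and the function names `sad_tr` and `best_candidate` are hypothetical.

```python
import numpy as np

def sad_tr(cur, prev_out, center, cand, W=8, H=8):
    """SAD between the block around `center` in `cur` and the block displaced
    over `cand` in `prev_out`, following the form of eq. (7).
    Assumption of this sketch: `cand` = (dx, dy) holds integers and all
    indices stay inside the arrays."""
    xc, yc = center
    dx, dy = cand
    # Block bounds are inclusive on both sides, as in eq. (1).
    cur_blk = cur[yc - H // 2: yc + H // 2 + 1, xc - W // 2: xc + W // 2 + 1]
    prev_blk = prev_out[yc - H // 2 - dy: yc + H // 2 + 1 - dy,
                        xc - W // 2 - dx: xc + W // 2 + 1 - dx]
    return int(np.abs(cur_blk.astype(np.int64) - prev_blk.astype(np.int64)).sum())

def best_candidate(cur, prev_out, center, candidates):
    """Return the candidate with the lowest SAD, as in eq. (6)."""
    return min(candidates, key=lambda c: sad_tr(cur, prev_out, center, c))

# Synthetic test: the current image is the previous output shifted over (2, 1).
rng = np.random.default_rng(0)
prev_out = rng.integers(0, 256, size=(32, 32))
true_d = (2, 1)
cur = np.roll(prev_out, shift=(true_d[1], true_d[0]), axis=(0, 1))
candidates = [(0, 0), (1, 0), (-1, 0), (2, 1), (0, 3)]
best = best_candidate(cur, prev_out, (16, 16), candidates)
```

In the actual estimator the candidates come from the set in (2) and have quarter-pixel accuracy, so the shifted block would have to be interpolated rather than sliced.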

2.2 TGST error function

The TGST error function is calculated using a generalization of the sampling theorem. From the sampling theorem, it is known that a bandlimited signal with maximum frequency $f_s/2$ can be reconstructed exactly if this signal is sampled with a frequency of at least $f_s$. As early as 1956, Yen [6] showed a generalization of this theorem. Yen proved that any signal that is bandlimited to $0.5 f_s$ can be reconstructed exactly from $N$ independent sets of samples, each sampled with frequency $f_s/N$. This theorem can effectively be used to perform motion estimation and de-interlacing, as also presented by Vandendorpe [2] and Delogne [1]. This method will be addressed as the TGST approach.

An application of the generalized sampling theorem is to calculate an error function for motion estimation by comparing a field generated from the previous and pre-previous image with the most recent field. Figure 3 shows that samples from the previous field and the pre-previous field are shifted over the motion vector in order to create two sets of samples for field number $n$. An appropriate filter that matches the desired interpolator can reconstruct the expected samples. Motion estimation tries to minimize the difference between the generated field and the current field $n$. The calculation of the expected samples is explained in the papers of Vandendorpe [2] and Delogne [1]. Kalker [7] shows a generalization of this concept which does not require the translation via the Fourier domain.

Figure 3: Motion estimation with the generalized sampling theorem. Vertical position ($y-1$ to $y+4$) is plotted against field number ($n-2$ odd, $n-1$ even, $n$ odd); the legend distinguishes existing samples, motion compensated samples and expected samples.

According to Kalker [7], the expected samples for the odd field $f_o(x, n)$, for vertical motion only (see footnote 2), are calculated by:

$$f_o(x, n) = \sum_k f\!\left(x - \begin{pmatrix}0\\2k+1\end{pmatrix}, n-1\right) \cdot h_1(k) + \sum_m f\!\left(x - \begin{pmatrix}0\\2m\end{pmatrix}, n-2\right) \cdot h_2(m) \tag{8}$$

with $f_o(x, n)$ the sample from the odd field with field number $n$ at position $(x, y)^t$, $f(x, n-1)$ the sample from field $n-1$ at position $(x, y)^t$, $h_1(k)$ and $h_2(m)$ the desired filter impulse responses, which model the shift due to motion as well as the interpolator, and $(g)_o$ the odd field of $g$. Note that

$$f\!\left(\begin{pmatrix}x\\y\end{pmatrix}, n-2\right) = 0 \quad (y \bmod 2 = 0), \qquad f\!\left(\begin{pmatrix}x\\y\end{pmatrix}, n-1\right) = 0 \quad (y \bmod 2 = 1) \tag{9}$$

In the z-domain, equation (8) is rewritten into:

$$F_o(z, n) = \left( F(z, n-1) \cdot H_1(z) + F(z, n-2) \cdot H_2(z) \right)_o \tag{10}$$

with

$$H_1(z) = \sum_k h_1(k)\, z^{-(2k+1)}, \qquad H_2(z) = \sum_m h_2(m)\, z^{-2m} \tag{11}$$

Assume that the complete frame at $n-2$, $F_p(z, n-2)$, is available, from which field $n-2$ is extracted. Consequently, the following equation is valid:

$$F_o(z, n-2) = \left( F_p(z, n-2) \right)_o \tag{12}$$

Field $n-1$ can be reconstructed by shifting the samples from frame $n-2$ over the motion vector, applying the desired interpolator, and extracting the desired field samples. So,

$$F_e(z, n-1) = \left( F_p(z, n-2) \cdot H(z) \right)_e \tag{13}$$

where $H(z)$ describes the motion over one field period and the desired interpolation in the z-domain. Field $n$ can now also be reconstructed by shifting the samples from $F_p(z, n-2)$ over twice the motion vector. This results in:

$$F_o(z, n) = \left( \left( F_p(z, n-2) \cdot H(z) \right) \cdot H(z) \right)_o = \left( F_p(z, n-2) \cdot H^2(z) \right)_o \tag{14}$$

2. Horizontal motion is irrelevant for this explanation, since it can be solved with simple sample rate conversion theory. Therefore, it is set to zero for clarity.

Using the following set of characteristics:

$$\begin{aligned}
X_o(z, n) &= \sum_{k \text{ odd}} x(k, n) \cdot z^{-k} \\
X_e(z, n) &= \sum_{k \text{ even}} x(k, n) \cdot z^{-k} \\
\left( X(z, n) \cdot Y(z, n) \right)_e &= X_o(z, n) \cdot Y_o(z, n) + X_e(z, n) \cdot Y_e(z, n) \\
\left( X(z, n) \cdot Y(z, n) \right)_o &= X_o(z, n) \cdot Y_e(z, n) + X_e(z, n) \cdot Y_o(z, n)
\end{aligned} \tag{15}$$

equation (13) results in:

$$F_e(z, n-1) = F_e(z, n-2) \cdot H_e(z) + F_o(z, n-2) \cdot H_o(z) \tag{16}$$

and equation (14) results in:

$$F_o(z, n) = F_o(z, n-2) \cdot \left( H^2(z) \right)_e + F_e(z, n-2) \cdot 2 H_o(z) \cdot H_e(z) \tag{17}$$

Solving equation (16) for $F_e(z, n-2)$ and substituting the result in (17) finally gives the desired expression:

$$F_o(z, n) = a(z)\, F_o(z, n-2) + b(z)\, F_e(z, n-1) \tag{18}$$

with

$$a(z) = H_e^2(z) - H_o^2(z), \qquad b(z) = 2 H_o(z) \tag{19}$$
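The step from (16) and (17) to (18) can be written out explicitly. The lines below are only a restatement of those equations; the one extra ingredient is $(H^2(z))_e = H_e^2(z) + H_o^2(z)$, which follows from the product rules in (15):

$$\begin{aligned}
F_e(z, n-2)\, H_e(z) &= F_e(z, n-1) - F_o(z, n-2)\, H_o(z) && \text{(from (16))} \\
F_o(z, n) &= F_o(z, n-2) \left( H_e^2(z) + H_o^2(z) \right) + 2 H_o(z) \left( F_e(z, n-1) - F_o(z, n-2)\, H_o(z) \right) && \text{(from (17))} \\
&= \left( H_e^2(z) - H_o^2(z) \right) F_o(z, n-2) + 2 H_o(z)\, F_e(z, n-1)
\end{aligned}$$

Comparing the last line with (18) gives the coefficients $a(z)$ and $b(z)$ of (19).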

Similar expressions can be derived for the calculation of an even field. If the motion vector and the desired interpolator are known, expression (19) yields the filter coefficients. Assuming a bilinear interpolator and a shift over $1 - \alpha$, $H$ is modelled as:

$$H(z) = \alpha + (1 - \alpha)\, z^{-1} \tag{20}$$

Consequently,

$$H_o(z) = (1 - \alpha)\, z^{-1}, \qquad H_e(z) = \alpha \tag{21}$$

and

$$a(z) = \alpha^2 - (1 - \alpha)^2 z^{-2}, \qquad b(z) = 2 (1 - \alpha)\, z^{-1} \tag{22}$$

If the velocity equals, e.g., 0.5 pixels per field, then $\alpha = \tfrac{1}{2}$, and the luminance sample $f(x, n)$ can be calculated according to:

$$f(x, n) = f\!\left(x + \begin{pmatrix}0\\1\end{pmatrix}, n-1\right) + \tfrac{1}{4}\, f(x, n-2) - \tfrac{1}{4}\, f\!\left(x + \begin{pmatrix}0\\2\end{pmatrix}, n-2\right) \tag{23}$$

This is illustrated in figure 4.

Figure 4: Calculation of the expected sample by applying a form of the generalized sampling theorem. Vertical position ($y-1$ to $y+4$) is plotted against field number ($n-2$, $n-1$, $n$); the calculated sample is formed with weights 1/4, 1 and -1/4, and the legend distinguishes existing samples, motion compensated samples and expected samples.
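Relation (18) with the bilinear coefficients of (22) can be checked numerically on a single image column. The sketch below is an illustration only: it uses a synthetic random column, represents each z-domain polynomial as an array whose index $k$ holds the coefficient of $z^{-k}$, and the helper names `tgst_estimation_filters` and `parity_part` are hypothetical.

```python
import numpy as np

def tgst_estimation_filters(alpha):
    """Taps of a(z) and b(z) from eq. (22); index k holds the coefficient of z^-k."""
    a = np.array([alpha ** 2, 0.0, -(1.0 - alpha) ** 2])
    b = np.array([0.0, 2.0 * (1.0 - alpha)])
    return a, b

def parity_part(seq, parity):
    """Keep the samples whose index has the given parity (0 = even, 1 = odd)
    and set all other samples to zero, i.e. extract one field of a column."""
    out = np.array(seq, dtype=float)
    out[(np.arange(out.size) % 2) != parity] = 0.0
    return out

alpha = 0.5                                   # velocity of 0.5 pixels per field
h = np.array([alpha, 1.0 - alpha])            # H(z) = alpha + (1 - alpha) z^-1, eq. (20)

rng = np.random.default_rng(0)
frame_n2 = rng.standard_normal(16)            # synthetic column of the frame at n-2
frame_n1 = np.convolve(frame_n2, h)           # frame shifted over one field period
frame_n = np.convolve(frame_n1, h)            # frame shifted over two field periods

a, b = tgst_estimation_filters(alpha)
# Right-hand side of eq. (18): a(z) F_o(z, n-2) + b(z) F_e(z, n-1)
predicted = (np.convolve(a, parity_part(frame_n2, 1))
             + np.convolve(b, parity_part(frame_n1, 0)))
# Left-hand side of eq. (18): the odd field of the frame at n
expected = parity_part(frame_n, 1)
```

Under these assumptions `predicted` and `expected` agree to rounding error for any $\alpha$, because (18) is an exact polynomial identity once the motion model $H(z)$ holds for both field periods.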

So, for a given candidate vector, $f(x, n)$ can be calculated, which can then be matched with the current field. The same set of candidates as defined in section 2 is used to find the motion vectors. The best vector selection is based on minimization of the SAD according to:

$$e_{TGST}(C, b_c, n) = SAD_{TGST}(C, b_c, n) = \sum_{x \in B(b_c)} \left| f(x, n) - \left( \sum_k h_1(k)\, f\!\left(x - C(x, n) - \begin{pmatrix}0\\2k+1\end{pmatrix}, n-1\right) + \sum_m h_2(m)\, f\!\left(x - 2C(x, n) - \begin{pmatrix}0\\2m\end{pmatrix}, n-2\right) \right) \right| \tag{24}$$

From figure 4 and expression (24), it is concluded that the motion vector is assumed constant for all pixels within the filter aperture. So, motion is considered to be constant over a 2-field period and spatially over the filter length, which defines the regional vertical-temporal uniform motion constraint of the TGST method.

3 THE DE-INTERLACING METHODS

The de-interlacing algorithm used in the Wang approach will be called 'TR de-interlacing' in this paper. The Vandendorpe de-interlacer is referred to as the 'TGST de-interlacer'.

3.1 The TR de-interlacer

In [3], a time-recursive de-interlacing algorithm is proposed in which the interpolated pixels are determined by motion compensating the previously found de-interlaced output signal:

$$f_{outTR}(x, n) = \begin{cases} f(x, n) & (y \bmod 2 = n \bmod 2) \\ f_{outTR}(x - d(x, n), n-1) & \text{(else)} \end{cases} \tag{25}$$

where $f(x, n)$ is the interlaced input (luminance) signal and $f_{out}(x, n)$ the de-interlaced output. The motion vector $d(x, n)$ is obtained by a 'block erosion' process [8], which uses several block vectors $d(b_c, n)$ in the direct environment of $x$. Note that $y \bmod 2 = n \bmod 2$ is true for original lines only.

Because of the sub-pixel accuracy, the pixels interpolated in the current frame are generally based in part on pixels that were interpolated when the previous field was de-interlaced. As an implication, errors originating in an output frame can propagate into later output frames. This is inherent to the recursive approach, and the most important drawback of this method.

To prevent serious propagation errors, several solutions have been described in [3]. In particular, the median filter is recommended to solve this problem. Consequently, equation (25) changes to:

$$f_{outTR}(x, n) = \begin{cases} f(x, n) & (y \bmod 2 = n \bmod 2) \\ \operatorname{median}\!\left( f_{outTR}(x - d(x, n), n-1),\; f\!\left(x - \begin{pmatrix}0\\1\end{pmatrix}, n\right),\; f\!\left(x + \begin{pmatrix}0\\1\end{pmatrix}, n\right) \right) & \text{(else)} \end{cases} \tag{26}$$

Although this is a very effective method, it introduces alias in the de-interlaced image.
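The median-protected recursion of (26) might look as follows in code. This is a minimal sketch under assumptions that are not in the paper: a single global integer motion vector replaces the per-pixel, sub-pixel accurate vector field, frame borders are handled by mirroring and clamping, and the function name `tr_deinterlace` is hypothetical.

```python
import numpy as np

def tr_deinterlace(field, prev_out, n, d=(0, 0)):
    """Time-recursive de-interlacing with median protection, after eq. (26).

    field    : full-height array; only lines with y % 2 == n % 2 hold input samples
    prev_out : previous de-interlaced output frame
    d        : one global integer motion vector (dx, dy), a simplification of
               this sketch
    """
    height, width = field.shape
    dx, dy = d
    out = field.astype(float).copy()
    for y in range(height):
        if y % 2 == n % 2:
            continue  # original line: copied unchanged from the input
        # Nearest original lines above and below (mirrored at the frame border).
        y_up = y - 1 if y - 1 >= 0 else y + 1
        y_dn = y + 1 if y + 1 < height else y - 1
        for x in range(width):
            # Motion compensated position in the previous output, clamped to the frame.
            yp = min(max(y - dy, 0), height - 1)
            xp = min(max(x - dx, 0), width - 1)
            mc = prev_out[yp, xp]
            out[y, x] = np.median([mc, field[y_up, x], field[y_dn, x]])
    return out

# Static vertical ramp: the previous output predicts the missing lines exactly.
frame = 10.0 * np.arange(8)[:, None] + np.arange(6)[None, :]
n = 0
field = np.where((np.arange(8) % 2 == n % 2)[:, None], frame, 0.0)
out = tr_deinterlace(field, frame, n)
```

On this ramp the motion compensated sample lies between its vertical neighbours, so the median passes it through on all interior lines. On the bottom line both neighbours are the same mirrored line, and the median returns that line instead, which shows how the median can override the recursive prediction.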

3.2 The TGST de-interlacer

The de-interlacer proposed in [2] by Vandendorpe is illustrated in figure 5.

Figure 5: De-interlacing using a generalization of the sampling theorem. Vertical position ($y-1$ to $y+4$) is plotted against field number ($n-2$, $n-1$, $n$); the legend distinguishes existing samples, motion compensated samples and calculated samples.

The missing samples in field $n$ of figure 5 can now be calculated according to:

$$f_e(x, n) = \sum_k f\!\left(x - \begin{pmatrix}0\\2k+1\end{pmatrix}, n\right) h_1(k) + \sum_m f\!\left(x - \begin{pmatrix}0\\2m\end{pmatrix}, n-1\right) h_2(m) \tag{27}$$

Both the original samples from the current field and the original samples from the previous field are used to calculate the missing samples in the current field. In the z-domain, the missing samples can be calculated according to:

$$F_e(z, n) = \left( F_p(z, n-1) \cdot H(z) \right)_e = F_o(z, n-1) \cdot H_o(z) + F_e(z, n-1) \cdot H_e(z) \tag{28}$$

and

$$F_o(z, n) = \left( F_p(z, n-1) \cdot H(z) \right)_o = F_o(z, n-1) \cdot H_e(z) + F_e(z, n-1) \cdot H_o(z) \tag{29}$$

Solving $F_e(z, n)$ from equations (28) and (29) results in:

$$F_e(z, n) = a(z) \cdot F_e(z, n-1) + b(z) \cdot F_o(z, n) \tag{30}$$

with

$$a(z) = H_e(z) - \frac{H_o^2(z)}{H_e(z)}, \qquad b(z) = \frac{H_o(z)}{H_e(z)} \tag{31}$$

If we again assume a bilinear interpolator and a shift over $1 - \alpha$:

$$H(z) = \alpha + (1 - \alpha)\, z^{-1} \tag{32}$$

the filter coefficients are defined by:

$$a(z) = \alpha - \frac{(1 - \alpha)^2}{\alpha}\, z^{-2}, \qquad b(z) = \frac{1 - \alpha}{\alpha}\, z^{-1} \tag{33}$$

As an example, consider the situation of a motion of 0.5 pixels per field and a bilinear interpolator. Then $\alpha = \tfrac{1}{2}$, and the missing samples can be calculated according to:

$$f(x, n) = f\!\left(x + \begin{pmatrix}0\\1\end{pmatrix}, n\right) + \tfrac{1}{2}\, f(x - d(x, n), n-1) - \tfrac{1}{2}\, f\!\left(x - d(x, n) + \begin{pmatrix}0\\2\end{pmatrix}, n-1\right) \tag{34}$$

This is illustrated in figure 6.

Figure 6: Calculation of the missing samples using generalized sampling. Vertical position ($y-2$ to $y+3$) is plotted against field number ($n-2$, $n-1$, $n$); the calculated sample is formed with weights 1/2, -1/2 and 1, and the legend distinguishes existing samples, motion compensated samples, the calculated sample and the samples still to be calculated.
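The coefficients of (33) can be evaluated for the quarter-pixel values of $\alpha$, and relation (30) can be checked on a synthetic column in the same way as for the estimator. The sketch below is an illustration only; the helper names are hypothetical, each array index $k$ holds the coefficient of $z^{-k}$, and the triplet ordering (previous field, current field, previous field) is chosen to match the coefficient sets quoted in section 4.1.

```python
import numpy as np

def gst_deinterlace_filters(alpha):
    """Taps of a(z) and b(z) from eq. (33); index k holds the coefficient of z^-k."""
    a = np.array([alpha, 0.0, -(1.0 - alpha) ** 2 / alpha])
    b = np.array([0.0, (1.0 - alpha) / alpha])
    return a, b

def coefficient_triplet(alpha):
    """Return (z^-2 tap on the previous field, tap on the current field,
    z^0 tap on the previous field)."""
    a, b = gst_deinterlace_filters(alpha)
    return (a[2], b[1], a[0])

def parity_part(seq, parity):
    """Keep the samples whose index has the given parity (0 = even, 1 = odd)."""
    out = np.array(seq, dtype=float)
    out[(np.arange(out.size) % 2) != parity] = 0.0
    return out

triplets = {alpha: coefficient_triplet(alpha) for alpha in (0.25, 0.5, 0.75, 1.0)}

# Numerical check of eq. (30) for alpha = 0.25 on a synthetic column.
alpha = 0.25
h = np.array([alpha, 1.0 - alpha])            # H(z) of eq. (32)
rng = np.random.default_rng(1)
frame_prev = rng.standard_normal(16)          # column of the frame at n-1
frame_cur = np.convolve(frame_prev, h)        # column of the frame at n
a, b = gst_deinterlace_filters(alpha)
predicted = (np.convolve(a, parity_part(frame_prev, 0))
             + np.convolve(b, parity_part(frame_cur, 1)))
expected = np.append(parity_part(frame_cur, 0), 0.0)   # padded to equal length
```

For $\alpha = 0.25$ the taps come out as -2.25, 3 and 0.25, so differences between nearby samples are strongly amplified for shifts of this size.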

The missing samples are determined by the original samples of the current and the previous field only. The motion estimator also uses original samples only. Therefore, errors, due to incorrect motion vectors, will not propagate, which is a major advantage compared to the TR-algorithm.

The de-interlaced output is defined by:

$$f_{outTGST}(x, n) = \begin{cases} f(x, n) & (y \bmod 2 = n \bmod 2) \\ \displaystyle \sum_k f\!\left(x - d(x, n) - \begin{pmatrix}0\\2k+1\end{pmatrix}, n\right) h_1(k) + \sum_m f\!\left(x - d(x, n) - \begin{pmatrix}0\\2m\end{pmatrix}, n-1\right) h_2(m) & \text{(else)} \end{cases} \tag{35}$$

4 RECURSIVE GST MOTION ESTIMATION AND DE-INTERLACING ALGORITHM

Delogne et al. [1] proposed a motion estimation algorithm based on a generalization of the sampling theorem, which requires uniform motion over a 3-field period. Vandendorpe et al. [2] proposed a similar algorithm which requires motion to be uniform over a 2-field period. The 3D RS block matcher [5] is not restricted in this sense, which is an advantage as motion is generally not uniform. It is therefore expected that the TR approach is advantageous in case of non-uniform motion. However, in the case of uniform motion, the algorithms of Delogne et al. and Vandendorpe et al. are expected to perform well.

The de-interlacer described in subsection 3.2 uses information from the past (field $n-1$) and from the current field $n$ for calculating the interpolated lines. Generally, the samples from the current field are highly correlated with the samples from the missing lines. Consequently, it is advantageous to use information from the current field. The TR de-interlacer does not use information from the current field for calculating the interpolated lines, but uses information from previous fields only (see footnote 3).

The combination of a 'recursive' motion estimator, which uses the current field and the previous de-interlaced field (which is generated using data from the previous field and pre-previous field only), and the de-interlacer of section 3.2 is expected to outperform the previous approaches and will be addressed as the recursive GST method (RGST). Note that, due to the choice of the de-interlacer, the estimator uses information from 3 successive fields only (in contrast with the TR method, which uses information from the complete history). Consequently, severe error propagation as with the TR method cannot occur.

The de-interlaced image is median filtered prior to the TR motion estimation. As a consequence, serious errors of the de-interlacer will not deteriorate the motion estimator. Without protection of the estimator, the result is a 'self-fulfilling prophecy'. Due to wrong or inaccurate motion vectors, the de-interlacer generates a wrong de-interlaced image, which is used by the motion estimator. The motion estimator estimates motion between this 'wrong' image and the input, which results in a wrong motion vector. As a consequence, the next de-interlaced image is incorrect as well. The protection of the estimator by a median prevents these errors from propagating.

3. However, the median filter used to prevent errors from propagating can also introduce information from the current field, but is only meant as an 'escape'.

4.1 RGST with selective median

The median protection of the estimator does not prevent de-interlacing errors, but it prevents severe errors from propagating. For near-critical velocities, shifted samples are mapped close to original samples. The difference between these sample values greatly influences the interpolation. As a consequence, this difference becomes 'boosted', which also boosts the noise level. Therefore, inaccuracies can occur, which result in undesired artefacts for which a remedy is required.

The accuracy of the motion estimator is 0.25 pixel. Together with the proposed bilinear interpolator, filter coefficients can be calculated according to Kalker et al. [7]. The filter coefficients that result from this are either {-2.25, 3, 0.25}, {-0.5, 1, 0.5}, {-0.083, 0.333, 0.75} or {0, 0, 1}, with the middle coefficient referring to the current field and the other coefficients to the previous field. The filter with coefficients {-2.25, 3, 0.25} has a high gain at high frequencies. As a consequence, the resulting samples might be 'over-corrected' due to a wrong motion vector, or just peaked due to inaccuracies. Consequently, the de-interlaced sequence might show regions that are extremely boosted compared to the rest of the image. The estimator is also negatively influenced by this effect. As a remedy against this phenomenon, a median filter is activated when this filter is selected, according to:

$$f_{outRGSTseM}(x, n) = \begin{cases} \operatorname{median}\!\left( f_{outTGST}(x, n),\; f\!\left(x - \begin{pmatrix}0\\1\end{pmatrix}, n\right),\; f\!\left(x + \begin{pmatrix}0\\1\end{pmatrix}, n\right) \right) & \text{(boosting filter selected)} \\ f_{outTGST}(x, n) & \text{(else)} \end{cases} \tag{36}$$

This approach will be denoted as the RGST with selective median. In order to allow correct motion estimation of vertical high spatial frequencies, the selective median is also used for the estimator (see footnote 4). The architecture of this new algorithm is shown in figure 7.
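For a single output sample, the selection in (36) reduces to a few lines. The sketch below is illustrative only; the function name and the boolean flag are hypothetical, and how the 'boosting filter' is detected (for instance from the fractional part of the motion vector) is left to the caller.

```python
def rgst_selective_median(gst_value, above, below, boosting_filter_selected):
    """Per-sample form of eq. (36).

    gst_value : sample produced by the GST de-interlacer, f_outTGST(x, n)
    above     : original sample one line above in the current field
    below     : original sample one line below in the current field
    boosting_filter_selected : True when the high-gain filter was used
    """
    if boosting_filter_selected:
        # Median of three values: the middle element after sorting.
        return sorted((gst_value, above, below))[1]
    return gst_value

# An over-corrected sample is limited by its vertical neighbours ...
limited = rgst_selective_median(300, 100, 120, True)
# ... but passes unchanged when one of the other filters was selected.
unchanged = rgst_selective_median(300, 100, 120, False)
```

Restricting the median to the boosting filter keeps the GST output untouched for the other shifts, at the cost of leaving those cases unprotected.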

5 EVALUATION

Several tools can be used to evaluate the de-interlaced results, ranging from objective measurements to subjective evaluation. We preferred to use the objective measurement based on the mean-square-error, since it is used in the papers of the described de-interlacing methods. However, it is not always a reliable indicator. New tools which better reflect the relation between the measurement and the perception are still desired.

4. A proposal with a selective median for the estimator only was not found to be an interesting option.

Figure 7: Proposed de-interlacing architecture. Blocks shown: interlaced input, field memories, median, line memory, motion estimator, GST de-interlacer, mux and de-interlaced output.

5.1 Tools

Several sequences with different characteristics were processed in order to evaluate the discussed methods. As an objective measure, the interlaced Mean-Square-Error, $MSE_i$, is calculated according to:

$$MSE_i(n) = \frac{1}{N_{MW}} \sum_{x \in MW} \left( f_{orig}(x, n) - f(x - d(x, n), n-1) \right)^2 \tag{37}$$

with $MW$ indicating the Measurement Window, $N_{MW}$ the number of samples within the measurement window, and $f_{orig}(x, n)$ the original samples in field $n$. All field lines within $MW$ contribute to the interlaced MSE.

For sequences that originate from movie material, the de-interlaced images can be compared with the original, resulting in a real MSE, defined by:

$$MSE(n) = \frac{1}{N_{MW}} \sum_{x \in MW} \left( f_{orig}(x, n) - f_{out}(x, n) \right)^2 \tag{38}$$

with $f_{out}(x, n)$ the output of the chosen de-interlacer: $f_{outTR}(x, n)$, $f_{outTGST}(x, n)$, $f_{outRGST}(x, n)$ or $f_{outRGSTseM}(x, n)$. The real MSE cannot be calculated for video-camera material, since no progressive original is available. The MSE scans all frame lines within $MW$, instead of the field lines only as for the interlaced MSE. The MSE has the advantage that it can also be applied to judge the performance in case of critical velocities, whereas this is not reflected in the $MSE_i$.

As an additional criterion, the Motion Trajectory Inconsistency, MTI [4], will be calculated:

$$MTI(n) = \frac{1}{N_{MW}} \sum_{x \in MW} \left( f(x - d(x, n), n-1) - f_{out}(x, n) \right)^2 \tag{39}$$

As with the MSE, all frame lines within $MW$ contribute to the MTI.

A low MTI score indicates a high correlation between the previous de-interlaced image and the currently calculated de-interlaced image; more specifically, it is a measure for the temporal consistency along the motion trajectory. As indicated in [4], a problem with this measure is that a good score is a necessary but insufficient constraint. Switching the output to zero forces the MTI to low values, while the picture is seriously degraded. However, a lower MTI coupled to an almost stable (interlaced) MSE is a strong indication of quality improvement. These measurements together form a useful tool to evaluate the alternative algorithms.
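The measures (38) and (39) could be computed along the following lines. This is a sketch under assumptions that are not part of the paper: the measurement window is the whole array, motion is one global integer vector applied with wrap-around at the borders, the previous image passed to `mti` is taken to be the previous de-interlaced frame, and all function names are hypothetical.

```python
import numpy as np

def mean_square(a, b):
    """Mean squared difference over the whole array (the measurement window here)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.mean(diff ** 2))

def motion_compensate(frame, d):
    """Shift `frame` over the integer vector d = (dx, dy) so that
    result[y, x] = frame[y - dy, x - dx]; borders wrap around (sketch only)."""
    dx, dy = d
    return np.roll(frame, shift=(dy, dx), axis=(0, 1))

def mse(f_orig, f_out):
    """Real MSE in the form of eq. (38)."""
    return mean_square(f_orig, f_out)

def mti(prev_frame, f_out, d):
    """Motion Trajectory Inconsistency in the form of eq. (39)."""
    return mean_square(motion_compensate(prev_frame, d), f_out)

rng = np.random.default_rng(2)
prev_frame = rng.integers(0, 256, size=(8, 8)).astype(float)
d = (1, 0)
original = motion_compensate(prev_frame, d)   # pure translation between frames

# A perfect de-interlacer scores zero on both measures.
perfect = (mse(original, original), mti(prev_frame, original, d))

# An output that is switched to zero in every frame also reaches MTI = 0,
# although the picture is lost; only the MSE reveals this.
zeros = np.zeros_like(original)
degenerate = (mse(original, zeros), mti(zeros, zeros, d))
```

The second case reproduces the caveat above: a low MTI by itself says nothing about picture quality, which is why it is read together with the MSE.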

5.2 Results

The sequences used in the evaluation are Renata (ample vertical detail, horizontal velocities), Mobile (both horizontal and vertical velocities, including critical velocities), Shopping (ample vertical and horizontal detail, with critical velocities), RenataSpeed (same as Renata, but accelerated 3 times), Tokyo (slow vertical and horizontal motion) and Bicycle (rotation). For calculation of the real MSE, the sequences Tokyo and Bicycle are used. Snapshots of these sequences are shown in figure 8.

The results are categorized into two groups: one with near-uniform motion (Mobile, Shopping and Tokyo) and one with non-uniform motion (Renata, RenataSpeed and Bicycle). The transversal de-interlacer of [2] was expected to perform well for images with uniform motion, since the constraint of uniform motion over a 2-field period is valid in this case. The recursive de-interlacer was expected to perform better in situations where this constraint is invalid. The results are shown in the plots for MSE and MTI in figures 9, 10 and 11. Table 1 shows the results in both MSEi and MTI improvement with respect to the TR algorithm.

Figure 8: Images from the used sequences: a) Renata, b) Mobile, c) Shopping, d) RenataSpeed, e) Tokyo, f) Bicycle. Images b, c and e have nearly uniform motion, whereas a, d and f show typical non-uniform motion.

Figure 9: MSEi (a, b) and MTI (c, d) results against frame number, for uniform and non-uniform motion, for TR (- -), TGST (-.-), RGST (...) and RGST with selective median (-), measured for the frames 5 to 29.

Figure 10: Percentage MSE improvement compared to the TR method for TGST, RGST and RGST with selective median; a) interlaced MSE, b) real MSE. (R=Renata, M=Mobile, S=Shopping, RS=RenataSpeed, B=Bicycle, T=Tokyo).

Figure 11: Percentage MTI improvement compared to the TR method for TGST, RGST and RGST with selective median. (R=Renata, M=Mobile, S=Shopping, RS=RenataSpeed, B=Bicycle, T=Tokyo).

Table 1: MSEi / MTI with respect to the TR method

Sequence       TGST         RGST         RGST sel median
Renata         + / ++       ++ / ++      ++ / ++
Mobile         + / -        ++ / -       ++ / +
Shopping       + / ---      +++ / -      +++ / +
RenataSpeed    --- / -      - / -        ++ / +
Bicycle        --- / ---    --- / ---    - / -
Tokyo          ++ / ++      ++ / ++      +++ / ++

Some observations from these results:

• In all situations, the RGST with selective median outperforms the one without selective median, both for the MSE (see footnote 5) and for the MTI. The improvement is partly due to the elimination of the median in the estimator, which allows vertical high frequencies to be tracked, and partly caused by the protection at the output due to the selective median.

• Similarly, the RGST outperforms the TGST in all situations, which is mainly due to removing the uniform motion constraint.

• In the plots of figures 10 and 11 and in table 1 it can be observed that the sequence Bicycle performs worst in all situations compared to the TR method. Bicycle is a sequence with complex motion, but without high vertical frequencies. It is therefore not surprising that even the RGST with selective median has no advantage without the median (in most cases), but has the disadvantage of not having a median for all velocities. Consequently, there is not enough protection available. So, for complex motion sequences without vertical high frequencies, the TR method is advantageous.

• The MTI performance indicator as plotted in figures 9c and 9d shows a significant improvement in favor of the RGST with selective median for uniform motion sequences, and on average a similar score in case of non-uniform motion sequences. Since the RGST with selective median also outperforms the TR method in terms of MSEi, it can be concluded that this algorithm results in a significant improvement for de-interlacing.

• It is also interesting to note that for the TR approach the MTI value is about 75% of the MSEi value, which indicates that the median filter, which is used to prevent errors from propagating, seems to be a must. (If the median filter were not used, the MTI value would be half the MSEi value.)

• As shown in table 1, some sequences show an improvement in MSEi and a worse MTI compared to the TR method. Since no thorough subjective evaluation has been conducted yet, it is too early for conclusions in these situations.

• Generally, the TR method outperforms the TGST method in case of non-uniform motion.

• The TR algorithm generally also performs very well in terms of MTI, which was expected, since the recursive algorithm stimulates temporal consistency along the motion trajectory. This algorithm inherently also performs noise reduction, which also contributes positively to the MTI.

• The 'boosting filter' as mentioned in section 4 is selected for near-critical velocities, which are detected in the Mobile and Shopping sequences. As a result, the MTI increases. The selective median solves this problem, as also indicated in figure 11.

5. The trend in the real MSEs resembles that of the interlaced MSEs, which indicates that the interlaced MSE is also a valuable measure indicating performance improvements.

6 CONCLUSIONS

Two interesting algorithms for motion estimation and de-interlacing, those of Wang et al. [3] and Vandendorpe et al. [2], have been compared. The TGST was found to be superior (in terms of MSEi) for sequences with uniform motion, whereas the recursive algorithm of Wang was found to be superior for sequences with non-uniform motion. The MTI score is best for the Wang approach in both cases. The impact on subjective perception is as yet unclear. The mutual weight of the MSEi and MTI in the subjective perception remains to be investigated.

The best aspects of both algorithms were joined in order to create a new motion estimation and de-interlacing algorithm, which neither demands uniform motion nor suffers from error propagation. The RGST with selective median is found to be superior to the earlier methods, since the interlaced MSE was decreased in both cases, with a similar MTI for sequences with non-uniform motion (compared to the TR method) and a lower MTI for sequences with uniform motion. The robust approach of the TR algorithm of Wang, combined with the GST de-interlacing, which uses original samples only, seems a very interesting basis for further improvements.

References

[1] P. Delogne, L. Cuvelier, B. Maison, B. Van Caillie and L. Vandendorpe, 'Improved Interpolation, Motion Estimation and Compensation for Interlaced Pictures', IEEE Tr. on Image Processing, Vol. 3, No. 5, September 1994, pp. 482-491.

[2] L. Vandendorpe, L. Cuvelier, B. Maison, P. Quelez and P. Delogne, 'Motion-compensated conversion from interlaced to progressive formats', Signal Processing: Image Communication 6, Elsevier 1994, pp. 193-211.

[3] F.M. Wang, D. Anastassiou and A.N. Netravali, 'Time-Recursive Deinterlacing for IDTV and Pyramid Coding', Signal Processing: Image Communication 2, Elsevier 1990, pp. 365-374.

[4] G. de Haan and P.W.A.C. Biezen, 'Time-recursive de-interlacing for high-quality television receivers', Proc. of the Int. Workshop on HDTV and the Evolution of Television, November 1995, Taipei, Taiwan, pp. 8B25-8B33.

[5] G. de Haan and P.W.A.C. Biezen, 'Sub-pixel motion estimation with 3-D recursive search block matching', Signal Processing: Image Communication 6, Elsevier 1994, pp. 229-239.

[6] J.L. Yen, 'On Nonuniform Sampling of Bandwidth-Limited Signals', IRE Tr. on Circuit Theory, Vol. CT-3, December 1956, pp. 251-257.

[7] A.A.C. Kalker, 'Motion Estimation and Compensation for Interlaced Video', to be published in IEEE Tr. on Image Processing.

[8] G. de Haan, P.W.A.C. Biezen, H. Huijgen and O.A. Ojo, 'True-Motion Estimation with 3-D Recursive Search Block Matching', IEEE Tr. on Circuits and Systems for Video Technology, Vol. 3, No. 5, October 1993, pp. 368-379.