Video Processing and Compression


Video
(A Sequence of Temporal Images)

I(x,y,t)

Video
• Temporal Sequence of Images
  – I(x,y,t)
  – t = time
• Terminology
  – Frame
    • an image in the sequence, I(x,y,t=c)
    • frames per second (fps): the number of frames displayed per second
  – Field
    • Frames are often composed of two fields: even and odd
    • This procedure is called interlacing

Interlacing
[Figure: even and odd fields laid out on a timeline from 0 to 1. Say it takes time t = 1 to capture a frame: the odd field is captured in t/2 time and the even field in the other t/2. The frame is the composite of the odd and even fields; this is the "interlace" effect.]

Interlacing
• Legacy from analogue video
  – Fields were used due to bandwidth limitations of analogue transmission
  – Trades off spatial resolution for temporal resolution
• 30 frames per second
  – means 60 fields per second
  – some monitors transmit video frames in interlaced format

Terminology
– Progressive Scan Video Cameras
  • Can capture a whole frame in time t
  • Although the format can still be interlaced
    – You will not get the "interlace" effect
    – that is, the odd and even fields are captured at the same time
– Progressive Scan
  • Common on new models of camcorders
– Topics in this lecture (other than de-interlacing) will assume no interlacing in the frame

Terminology
• Critical Fusion Frequency
  – The temporal frame rate at which your eye does not see "flicker" in a video sequence
  – Generally considered to be around 24 fps
• Temporal Aliasing
  – Effect of a low frame rate
    • Jerky image
    • "Flicker"

Some Simple Video Processing
• De-interlacing
• Optical Flow / Motion Estimation
• Object Tracking

De-interlacing
• Frame is interlaced (suffers from the interlace effect)
• De-interlacing
  – Create two new frames (same size as the original)
  – One using only even scan lines
    • Interpolate the "odd" scan lines
  – One using only odd scan lines
    • Interpolate the "even" scan lines
• Produces 60 frames per second
  – De-interlaced frames have lower vertical resolution
• Common pre-processing technique
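A minimal sketch of this scheme, assuming frames are NumPy arrays with scan lines along the first axis; the average-of-neighbouring-lines interpolation is just one simple choice, not part of the lecture:

```python
import numpy as np

def deinterlace(frame):
    """Split one interlaced frame into two de-interlaced frames.

    One output keeps the even scan lines and interpolates the odd ones;
    the other keeps the odd scan lines and interpolates the even ones.
    Missing lines are filled by averaging the lines above and below.
    """
    h = frame.shape[0]
    even = frame.astype(np.float32).copy()
    odd = frame.astype(np.float32).copy()

    # Even-field frame: overwrite the odd rows with interpolated values.
    for y in range(1, h, 2):
        above = even[y - 1]
        below = even[y + 1] if y + 1 < h else even[y - 1]
        even[y] = (above + below) / 2.0

    # Odd-field frame: overwrite the even rows with interpolated values.
    for y in range(0, h, 2):
        above = odd[y - 1] if y - 1 >= 0 else odd[y + 1]
        below = odd[y + 1] if y + 1 < h else odd[y - 1]
        odd[y] = (above + below) / 2.0

    return even.astype(frame.dtype), odd.astype(frame.dtype)
```

Applied to every frame of a 30 fps interlaced clip, this yields the 60 de-interlaced frames per second noted above, at reduced vertical resolution.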

De-interlacing Example
[Figure: a frame with the interlaced effect is split into even "fields" and odd "fields", and the missing scan lines of each are interpolated.]

Optical Flow
• Estimating the motion field from image sequences
• Idea: the change in image intensity over time is related to motion in the scene, such that:
  I(x,y,t-1) = I(x+u, y+v, t)
• Assumes
  – a constantly illuminated scene
  – fairly simple scene motion and object properties

3D Motion -> 2D Motion Field
[Figure: an object/scene point M(t) moves to M(t+dt), i.e. by dM, with 3D velocity V = dM/dt. Projected onto the image plane this gives a displacement (dx,dy), with u = dx/dt and v = dy/dt. The set of (u,v) vectors is called "optical flow".]

Motion Field
[Figure: frames I(x,y,t-1) and I(x,y,t), with a motion vector (u,v) linking corresponding points.]

I(x,y,t-1) = I(x+u, y+v, t)

[u,v] represents the projection of the 3D motion.

Optical Flow Example
[Figure: frames I(x,y,t-1) and I(x,y,t) with a "needle representation" of the motion vectors, often shown at a "sparse" resolution.]

Matching Approach to Optical Flow (Motion Estimation)
• For each (x,y) in I(t-1)
  – Construct a template about (x,y)
  – Search a neighborhood of size W about (x,y) in I(t)
  – Find the best match
    • Using correlation, SAD, SAM, etc.
  – The best match is at location (x', y')
  – Record the motion vector
    • V(x,y) = (x' – x, y' – y)
• Loop over all (x,y)

Match-based Motion Estimation
[Figure: in I(x,y,t-1), create a template around (x,y); in I(x,y,t), search a window W about (x,y), perform template matching, and find the best match (x', y'). The motion vector is u = x' - x, v = y' - y.]
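A minimal sketch of this block-matching search, assuming grayscale NumPy frames; the template half-size and search radius are the "magic numbers" mentioned later in the lecture, and SAD is used as the match measure:

```python
import numpy as np

def motion_vector(prev, curr, x, y, tsize=4, search=8):
    """Estimate the motion vector at (x, y) from frame prev to frame curr.

    A (2*tsize+1)^2 template around (x, y) in the previous frame is
    compared, using the sum of absolute differences (SAD), against every
    candidate position within +/- search pixels in the current frame.
    Assumes (x, y) lies at least tsize pixels inside the image border.
    """
    h, w = prev.shape
    template = prev[y - tsize:y + tsize + 1,
                    x - tsize:x + tsize + 1].astype(np.int32)

    best_sad, best_uv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ys, xs = y + dy, x + dx
            # Skip candidates whose window falls outside the image.
            if ys - tsize < 0 or xs - tsize < 0 or ys + tsize >= h or xs + tsize >= w:
                continue
            window = curr[ys - tsize:ys + tsize + 1,
                          xs - tsize:xs + tsize + 1].astype(np.int32)
            sad = np.abs(window - template).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_uv = sad, (dx, dy)

    return best_uv  # (u, v) = (x' - x, y' - y)
```

Looping this over a grid of (x,y) positions gives the sparse "needle" field from the example slide; the brute-force search costs (2*search+1)^2 SAD evaluations per position, which is why the template and window sizes matter so much for speed.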

Aperture problem
[Figure: actual motion vs. observed image; an edge seen through a small aperture in frames I(t-1) and I(t).]
• The motion perceived is only the normal component (perpendicular to the edge)
• The component parallel to the edge cannot be estimated

Optical Flow Uses
• Motion vectors give insight into the 3D scene
  – Used to help segment moving objects
  – Determine direction of objects
  – Other high-level analysis (take computer vision)
• Describes camera motion
  – Often called camera "ego" motion
[Figure: characteristic flow-field patterns for de-zoom, zoom, and translation (or rotation).]

Optical Flow
• Many of the assumptions of optical flow are invalid
  – Constant intensity
    • changing illumination is perceived as motion
    • objects do not have constant reflectance
  – Slow-moving objects
    • we often have small objects with lots of occlusion/de-occlusion
  – Sufficient aperture
    • as demonstrated, we may encounter the aperture problem
• Even with invalid assumptions, it still gives reasonable results
• Note: the template size and search window are "magic numbers"

Object Tracking
1. Locate the object
2. Track this object over time

Generally assume a fixed camera

Two phases
• "Lock on"
  – Find the object to track
  – Use recognition techniques
  – User selects an initial starting point
• Tracking based on an initial template
  – Construct a template around the object
  – Find the template in the next image
  – Perform template matching within a window

Tracking
[Figure: a template T tracked across frames I(t), I(t+1), I(t+2).]
• Feature-based tracking
  – Construct a template around the object
    • This fixes the size of the template!
  – Find the template in the next image
    • Perform template matching within a window about the current center
  – Update the template using the best match in advancing frames
    • This allows the template to adapt over time
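A minimal sketch of this template tracker with the adaptive update, assuming grayscale NumPy frames and a user-supplied starting center; the patch half-size, search radius, and SAD score are illustrative choices:

```python
import numpy as np

def track(frames, cx, cy, half=16, search=24):
    """Track an object selected at (cx, cy) in the first frame.

    The template is a (2*half+1)^2 patch around the center; in each new
    frame it is matched (SAD) within +/- search pixels of the current
    center, and then updated from the best match so it adapts over time.
    """
    template = frames[0][cy - half:cy + half + 1,
                         cx - half:cx + half + 1].astype(np.int32)
    path = [(cx, cy)]

    for frame in frames[1:]:
        h, w = frame.shape
        best_sad, best_xy = None, (cx, cy)
        for y in range(max(half, cy - search), min(h - half - 1, cy + search) + 1):
            for x in range(max(half, cx - search), min(w - half - 1, cx + search) + 1):
                patch = frame[y - half:y + half + 1,
                              x - half:x + half + 1].astype(np.int32)
                sad = np.abs(patch - template).sum()
                if best_sad is None or sad < best_sad:
                    best_sad, best_xy = sad, (x, y)

        cx, cy = best_xy
        # Update the template from the best match so it adapts over time.
        template = frame[cy - half:cy + half + 1,
                         cx - half:cx + half + 1].astype(np.int32)
        path.append((cx, cy))

    return path
```

The "lost object" check from the next slide would plug in here as a threshold on the best SAD before accepting the match and updating the template.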

Tracking Issues
• Occlusion
  – Object moves / becomes occluded
  – Need to threshold matches to determine when the object has been "lost"
  – Requires a new "lock on" phase
• Multiple objects
  – What happens when two similar objects "cross over"?
• Object deforms over time
  – Object exhibits substantial change
    • Lighting effects, specularity, pose
• Tracking is a hard problem!

Tracking
• Presented a simple tracking skeleton
  – Often want tracking to be performed in real time
  – This is a fast and simple brute-force approach
• Extensions
  – Speed up using prediction of motion based on previous estimates
    • See Kalman filters
  – Use better models than just template-based
    • Training set, template set
    • Deformable templates
    • Affine templates
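As a sketch of the prediction idea, a constant-velocity extrapolation (a crude stand-in for a proper Kalman filter; the function name is made up) centers the next search window on where the object is expected to be:

```python
def predict_center(prev_pos, curr_pos):
    """Constant-velocity prediction of the next object center.

    Assumes the object keeps the velocity observed between the last two
    frames; the next search window is centered on the extrapolated
    position instead of the last known one.
    """
    vx = curr_pos[0] - prev_pos[0]
    vy = curr_pos[1] - prev_pos[1]
    return (curr_pos[0] + vx, curr_pos[1] + vy)

# Example: the object moved from (100, 80) to (104, 82),
# so search near the predicted center (108, 84).
print(predict_center((100, 80), (104, 82)))
```

Because the window is centered where the object is expected, its radius (and hence the matching cost) can be reduced.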

Other Video Processing
• Digital Re-mastering
  – Film converted to digital format
  – Allows us the opportunity to process the pixels
• Some examples
  – Remove scratches in film
    • Median filters
    • Median filters over time (use the t axis)
  – Digital Effects
    • Post-video processing
    • Digital artists
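A minimal sketch of a median filter over time for scratch/dust removal, assuming the frames are stacked into a (T, H, W) NumPy array and the content is roughly static or already aligned:

```python
import numpy as np

def temporal_median(frames, radius=1):
    """Median filter along the time axis of a (T, H, W) frame stack.

    A scratch or dust speck usually appears in only one frame, so the
    per-pixel median over a small temporal window removes it while
    leaving static or slowly changing content largely untouched.
    """
    T = frames.shape[0]
    out = np.empty_like(frames)
    for t in range(T):
        lo, hi = max(0, t - radius), min(T, t + radius + 1)
        out[t] = np.median(frames[lo:hi], axis=0)  # cast back to the input dtype
    return out
```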

Video Compression (MPEG)


Redundancies
• Three basic types of redundancies
  – Coding Redundancy
  – Interpixel Redundancy
    • Spatial-Temporal Redundancy
    • Temporal Frame Coherence
  – Psycho-visual Redundancy
• Image/Video compression
  – Reduce one or more of these redundancies

Image Compression (Remember JPEG)
[JPEG encoder pipeline:]
• Divide f(x,y) into 8x8 blocks and subtract 128 (normalize to the range –128 to 127)
• FDCT (Forward DCT) on each block
• Quantize the DCT coefficients via a quantization "table" T(u,v):
  C'(u,v) = round(C(u,v)/T(u,v))
• Differential coding of the DC component
• "Zig-zag" order the AC coefficients and run-length encode (RLE) them
• Huffman encode -> JPEG bitstream
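A minimal sketch of the per-block DCT and quantization steps above, using SciPy's DCT; the flat quantization table is a placeholder (real JPEG uses the standard luminance/chrominance tables), so this is illustrative rather than spec-compliant:

```python
import numpy as np
from scipy.fft import dctn, idctn

def encode_block(block, T):
    """DCT-encode and quantize one 8x8 block, as in the JPEG pipeline."""
    shifted = block.astype(np.float32) - 128.0       # normalize to [-128, 127]
    C = dctn(shifted, norm="ortho")                  # forward 2-D DCT
    return np.round(C / T).astype(np.int32)          # C'(u,v) = round(C(u,v)/T(u,v))

def decode_block(Cq, T):
    """Dequantize and inverse-DCT a block (the rounding above is the lossy step)."""
    return np.clip(idctn(Cq * T, norm="ortho") + 128.0, 0, 255).astype(np.uint8)

# Placeholder quantization table: a single constant step size.
T = np.full((8, 8), 16.0)
block = np.random.randint(0, 256, (8, 8)).astype(np.uint8)
print(decode_block(encode_block(block, T), T))
```

The differential DC coding, zig-zag ordering, RLE, and Huffman stages then operate on the quantized coefficients that encode_block returns.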

Video Compression
• Motion-JPEG
  – I(x,y,t): each frame (t) is encoded as a JPEG
  – Reduces spatial redundancies within the frame
• MPEG
  – Exploits temporal coherence
  – The image does not change much from frame to frame
  – If it does change, it is due to some small motion in the image

MPEG
• Three Frame Types
  – I frame
    • Intra-encoded Frame
    • Fully encoded JPEG frame
  – P frame
    • Predictive Frame
  – B frame
    • Bi-directional Predictive Frame

MPEG
• Each frame is divided into 8x8 blocks
  – The 8x8 blocks will be DCT encoded
• We also create 16x16 logical "macro-blocks"
  – These will be used for motion estimation
[Figure: a frame partitioned into 8x8 blocks and 16x16 macro-blocks.]

Predictive Frame
• Determine a motion vector for each macro-block in the P-frame
• The motion vector points to the most "similar" macro-block in a previous I-frame or P-frame
  – Often referred to as Motion Compensation (MC)
• Encode the motion vector (u,v) for each macro-block
• Calculate the residual error (difference of the macro-blocks)
• Encode the residual as four 8x8 DCT blocks (like JPEG)
  – The residual DCT values are quantized (compression)
[Figure: a macro-block in I(t) predicted from its best-matching macro-block in I(t-1).]
NOTE: for background, you get a (0,0) motion vector and no residual [good compression!]
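A minimal sketch of motion compensation for a single 16x16 macro-block, assuming grayscale NumPy frames; the ±16 pixel search range and SAD measure are illustrative, and the returned residual is what would be split into four 8x8 blocks for the DCT/quantization path:

```python
import numpy as np

def compensate_macroblock(ref, curr, bx, by, search=16):
    """Find the best-matching 16x16 block in the reference frame.

    (bx, by) is the top-left corner of a macro-block in the current
    frame. Returns the motion vector (u, v) pointing into the reference
    frame and the residual (current block minus the MC prediction).
    """
    h, w = ref.shape
    block = curr[by:by + 16, bx:bx + 16].astype(np.int32)

    best_sad, best_uv = None, (0, 0)
    for v in range(-search, search + 1):
        for u in range(-search, search + 1):
            x, y = bx + u, by + v
            # Skip candidate blocks that fall outside the reference frame.
            if x < 0 or y < 0 or x + 16 > w or y + 16 > h:
                continue
            candidate = ref[y:y + 16, x:x + 16].astype(np.int32)
            sad = np.abs(block - candidate).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_uv = sad, (u, v)

    u, v = best_uv
    prediction = ref[by + v:by + v + 16, bx + u:bx + u + 16].astype(np.int32)
    residual = block - prediction
    return best_uv, residual  # residual -> four 8x8 DCT blocks, then quantized
```

For a static background macro-block the best match is at (0,0) with a near-zero residual, which is the cheap case noted on the slide.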

MPEG Sequence with P-frames
[Figure: I -> P -> P -> P -> P, each P-frame predicted from the previous I- or P-frame.]

B-Frames (Bi-directional Predictive Frames)
• Can use previous or future P or I frames for motion compensation
  – YES, it uses a future frame
  – Not for real-time encoding
• Requires the frames to be transmitted out of order
  – and/or buffering
[Figure: sequence I, P, B, P, P; the B-frame is predicted from the anchor frames on either side of it.]
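Because a B-frame needs a future anchor, the encoder emits that anchor first. A sketch of display order versus transmission (coded) order for the short sequence drawn above, with assumed frame labels:

```python
# Display order of the sequence sketched above: the B-frame sits between
# two P-frames and is predicted from both of them.
display_order = ["I0", "P1", "B2", "P3", "P4"]

# Transmission/decoding order: P3 must arrive before B2, since B2 cannot be
# decoded until both of its reference frames are available.
coded_order = ["I0", "P1", "P3", "B2", "P4"]

print(display_order, coded_order)
```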

Group of Pictures (GOP)
• A sequence of I-, P-, and B-frames forms a logical GOP
  – A GOP always starts with an I-frame
• Note that if you lose the initial I-frame in "transmission"
  – The whole GOP is corrupted
• Common GOP formats
  – IPPPP
  – IBBPBBPBBPBBPBB
[Figure: two consecutive GOPs, each starting with an I-frame followed by P- and B-frames.]

MPEG Decoding
[Decoder block diagram: bitstream -> Input Buffer -> MC Decoder and (Zig-Zag/DeQuant -> IDCT); the motion-compensation path selects Forward MC (previous frame store), Backward MC (future frame store), or Bidirectional MC; the Adder combines the MC prediction with the IDCT output and sends the result to the Display.]

Motion Compensation
• Provides most of MPEG's compression
• Relies on temporal coherence
• Finding a good motion vector is essentially a search problem
• Evaluating the motion vector can be a bit tricky
  – Often trade speed for accuracy
• MC is what makes MPEG asymmetric
  – Harder to encode than to decode

Exhaustive MC Search
• Brute-force calculation of MC
  – Using SAD, SAM
• The most obvious and easiest solution
• Encoding time is related to the size of the search window
• Although time consuming, it is also embarrassingly parallel

MPEG
• DCT quantization of I-frames and residuals controls frame fidelity
• Note that error propagates as we have more and more predictive frames (B- and P-frames)
• The resulting bit-rate is scene dependent!
• For more MPEG details see:
  – www.mpeg.org

Useful Tool
• Berkeley MPEG encoder on the unix machines
• mpeg_encode
  – utility to convert pgm/ppm files to MPEG
  – usage:
    • mpeg_encode param_file
  – Param file
    • Describes the input files and the MPEG format
    • See the class web page for an example

Summary Video Processing
• Processing
  – De-interlacing
    • Pre-processing step to remove the interlacing effect
  – Optical Flow
    • Motion vector estimation
  – Object Tracking
    • Temporal template matching
    • Constrained search window
• Video Compression – MPEG
  – Powerful compression exploiting temporal coherence
  – I, P, and B frames
    • P and B frames use motion estimation for macro-blocks
    • Encode a residual error

Active Research Areas
• Optical Flow
  – Computationally expensive process
  – Research into faster techniques
  – Dense samples
• Tracking
  – Tracking is a hot topic
  – Recognition over temporal frames
  – Multiple objects
    • Especially multiple interacting objects
  – New approaches to finding the object
  – Staying locked on in the face of occlusions

Active Research Areas
• Video Encoding – hot topics
  – Video Codecs
    • new techniques, often for non-real-time apps
  – Constant bit-rate for MPEG
  – Robustness to transmission errors
  – VOD (video on demand)
  – Streaming technology
  – Motion compensation strategies
  – MPEG-4
    • Wrapper for all types of data (not just video)