Video Processing and Compression
Video
• Temporal Sequence of Images
  – I(x,y,t)
  – t = time
• Terminology
  – Frame
    • an image in the sequence, I(x,y,t=c)
  – frames per second (fps)
    • number of frames displayed per second
  – Field
    • Frames are often composed of two fields: even and odd
    • This procedure is called interlacing
Interlacing
• Say it takes time t=1 to capture a frame
  – the odd field is captured in the first t/2
  – the even field is captured in the second t/2
• The frame composite of the odd and even fields shows the "interlace" effect
Interlacing
• Legacy from analogue video
  – Fields were used due to bandwidth limitations of analogue transmission
  – Trades off spatial resolution for temporal resolution
• 30 frames per second
  – means 60 fields per second
  – some monitors display video frames in interlaced format
Terminology
• Progressive Scan Video Cameras
  – Can capture a whole frame in time t
    • Although the output format can still be interlaced
  – You will not get the "interlace" effect
    • that is, the odd and even fields are captured at the same time
  – Progressive scan is common on new models of camcorders
• Topics in this lecture (other than de-interlacing) will assume no interlacing in the frame
Terminology
• Critical Fusion Frequency
  – The temporal frame rate at which your eye no longer sees "flicker" in a video sequence
  – Generally considered to be around 24 fps
• Temporal Aliasing
  – Effect of a low frame rate
    • Jerky image
    • "Flicker"

Some Simple Video Processing
• De-interlacing
• Optical Flow/Motion Estimation
• Object Tracking
De-interlacing
• Frame is interlaced
  – (suffers from the interlace effect)
• De-interlacing
  – Create two new frames (same size as the original)
  – One using only the even scan lines
    • Interpolate the "odd" scan lines
  – One using only the odd scan lines
    • Interpolate the "even" scan lines
• Produces 60 frames per second
  – De-interlaced frames have lower vertical resolution
• Common pre-processing technique
De-interlacing Example
[Figure: a frame with the interlace effect is split into its even and odd "fields", and the missing scanlines of each are interpolated.]
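The scheme above can be sketched in a few lines of Python/NumPy. The simple line-averaging interpolation is an assumption for illustration; real de-interlacers use smarter filters:

```python
import numpy as np

def deinterlace(frame):
    """Create two full frames from one interlaced frame: one keeps the
    even scan lines and interpolates the odd ones, the other keeps the
    odd scan lines and interpolates the even ones (line averaging)."""
    h = frame.shape[0]
    even = frame.astype(np.float64).copy()
    odd = frame.astype(np.float64).copy()

    def fill(img, rows):
        # Replace each row in `rows` with the average of its vertical
        # neighbors (clamped at the image border).
        for y in rows:
            lo = y - 1 if y - 1 >= 0 else y + 1
            hi = y + 1 if y + 1 < h else y - 1
            img[y] = 0.5 * (img[lo] + img[hi])

    fill(even, range(1, h, 2))   # even-field frame: interpolate odd lines
    fill(odd, range(0, h, 2))    # odd-field frame: interpolate even lines
    return even, odd
```

Applied to every frame of a 30 fps interlaced stream, this yields the 60 (lower-vertical-resolution) frames per second described above.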
Optical Flow
• Estimating the motion field from image sequences
• Idea: the change in intensity over time is related to motion in the scene such that:
  – I(x,y,t-1) = I(x+u, y+v, t)
• Assumes
  – a constantly illuminated scene
  – fairly simple scene motion and object properties
3D Motion → 2D Motion Field
[Figure: an object/scene point M(t) moves by dM to M(t+dt); its velocity V = dM/dt projects onto the image plane as a displacement (dx,dy).]
• u = dx/dt, v = dy/dt
• The set of (u,v) vectors is called the "optical flow"
Motion Field
[Figure: the motion vector (u,v) maps a point in I(x,y,t-1) to its new location in I(x,y,t).]
• I(x,y,t-1) = I(x+u, y+v, t)
• [u,v] represents the projection of the 3D motion
Optical Flow Example
[Figure: frames I(x,y,t-1) and I(x,y,t), with a "needle representation" of the motion vectors, often shown at a "sparse" resolution.]
Matching Approach to Optical Flow (Motion Estimation)
• For each (x,y) in I(t-1)
  – Construct a template about (x,y)
  – Search a neighborhood of size W about (x,y) in I(t)
  – Find the best match
    • using correlation, SAD, SSD, etc.
  – Best match is at location (x', y')
  – Record the motion vector
    • V(x,y) = (x' - x, y' - y)
  – Loop
Match-based Motion Estimation
[Figure: in I(x,y,t-1), create a template around (x,y); in I(x,y,t), search a window W about (x,y), perform template matching, and find the best match (x', y').]
• Motion vector: u = x' - x, v = y' - y
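The per-pixel loop above can be sketched for a single location as an exhaustive SAD search (the template half-size and window half-size are the "magic numbers" the slides mention; their values here are arbitrary):

```python
import numpy as np

def block_match(prev, curr, x, y, tsize=8, w=7):
    """For the template centered at (x, y) in the previous frame, search
    a (2w+1)^2 neighborhood in the current frame and return the motion
    vector (u, v) = (x'-x, y'-y) of the best match, scored by SAD."""
    t = tsize
    tmpl = prev[y-t:y+t+1, x-t:x+t+1].astype(np.int64)
    best, best_uv = None, (0, 0)
    for v in range(-w, w + 1):
        for u in range(-w, w + 1):
            yy, xx = y + v, x + u
            if (yy - t < 0 or xx - t < 0 or
                    yy + t + 1 > curr.shape[0] or xx + t + 1 > curr.shape[1]):
                continue  # candidate window falls off the image
            cand = curr[yy-t:yy+t+1, xx-t:xx+t+1].astype(np.int64)
            sad = np.abs(cand - tmpl).sum()  # sum of absolute differences
            if best is None or sad < best:
                best, best_uv = sad, (u, v)
    return best_uv
```

Running this at every (x,y) of I(t-1) produces the dense (u,v) flow field; running it only on a sparse grid produces the "needle" visualizations shown earlier.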
Aperture Problem
[Figure: the actual motion of an edge from I(t-1) to I(t), versus the motion observed through a small aperture.]
• The motion perceived is only the normal component
• The component parallel to the edge cannot be estimated
Optical Flow Uses
• Motion vectors give insight into the 3D scene
  – Used to help segment moving objects
  – Determine the direction of objects
  – Other high-level analysis (take computer vision)
• Describes camera motion
  – Often called camera "ego" motion
[Figure: characteristic flow fields for zoom out, zoom in, and translation (or rotation).]
Optical Flow
• Many of the assumptions of optical flow are invalid
  – Constant intensity
    • changing illumination is perceived as motion
    • objects do not have constant reflectance
  – Slow-moving objects
    • we often have small objects with lots of occlusion/disocclusion
  – Sufficient aperture
    • as demonstrated, we may encounter the aperture problem
• Even with invalid assumptions, it still gives reasonable results
• Note: "magic numbers" for the template size and search window
Object Tracking
1. Locate the object
2. Track this object over time
• Generally assumes a fixed camera
Two Phases
• "Lock on"
  – Find the object to track
  – Use recognition techniques, or
  – the user selects the initial starting point
• Tracking based on the initial template
  – Construct a template around the object
  – Find the template in the next image
  – Perform template matching within a window
Tracking
[Figure: the template T is matched in successive frames I(t), I(t+1), I(t+2).]
• Feature-based tracking
  – Construct a template around the object
    • This fixes the size of the template!
  – Find the template in the next image
    • Perform template matching within a window about the current center
  – Update the template using the best match in advancing frames
    • this allows the template to adapt over time
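The steps above can be sketched as a small tracking loop. This is a minimal sketch assuming grayscale frames and SAD matching; the template/window half-sizes are illustrative defaults:

```python
import numpy as np

def track(frames, cx, cy, tsize=8, w=12):
    """Follow an object through `frames` by template matching.

    The template is re-cut from each best match so it can adapt over
    time, as described in the slides. Returns the object's path as a
    list of (cx, cy) centers."""
    t = tsize
    tmpl = frames[0][cy-t:cy+t+1, cx-t:cx+t+1].astype(np.int64)
    path = [(cx, cy)]
    for frame in frames[1:]:
        best, bx, by = None, cx, cy
        for v in range(-w, w + 1):        # search window about current center
            for u in range(-w, w + 1):
                yy, xx = cy + v, cx + u
                if (yy - t < 0 or xx - t < 0 or
                        yy + t + 1 > frame.shape[0] or
                        xx + t + 1 > frame.shape[1]):
                    continue
                sad = np.abs(frame[yy-t:yy+t+1, xx-t:xx+t+1].astype(np.int64)
                             - tmpl).sum()
                if best is None or sad < best:
                    best, bx, by = sad, xx, yy
        cx, cy = bx, by
        # Update the template from the best match so it adapts over time.
        tmpl = frame[cy-t:cy+t+1, cx-t:cx+t+1].astype(np.int64)
        path.append((cx, cy))
    return path
```

A practical tracker would also threshold `best` to detect when the object is "lost" (see Tracking Issues below); that check is omitted here for brevity.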
Tracking Issues
• Occlusion
  – Object moves away or becomes occluded
  – Need to threshold matches to determine when the object has been "lost"
  – Requires a new "lock on" phase
• Multiple objects
  – what happens when two similar objects "cross over"?
• Object deforms over time
  – Object exhibits substantial change
    • Lighting effects, specularity, pose
• Tracking is a hard problem!
Tracking
• Presented a simple tracking skeleton
  – Often want tracking to be performed in real time
  – This is a fast and simple brute-force approach
• Extensions
  – Speed up using prediction of motion based on previous estimates
    • See Kalman filters
  – Use better models than just template matching
    • Training set, template set
    • Deformable templates
    • Affine templates
Other Video Processing
• Digital re-mastering
  – Film is converted to a digital format
  – Allows us the opportunity to process the pixels
• Some examples
  – Removing scratches in film
    • Median filters
    • Median filters over time (use the t axis)
  – Digital effects
    • Post-video processing
    • Digital artists
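The median-over-time idea is compact enough to show directly; a sketch assuming a stack of registered grayscale frames:

```python
import numpy as np

def temporal_median(frames):
    """Per-pixel median along the time axis of a (t, h, w) frame stack.

    A scratch or dust spot visible in only one frame is an outlier
    along t and is rejected by the median."""
    return np.median(frames, axis=0)
```

This only works well where the scene is static (or has been motion-compensated); on moving regions the temporal median blurs the motion.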
Video Compression (MPEG)
Redundancies
• Three basic types of redundancies
  – Coding redundancy
  – Interpixel redundancy
    • Spatial-temporal redundancy
    • Temporal frame coherence
  – Psycho-visual redundancy
• Image/video compression
  – Reduce one or more of these redundancies
Image Compression (Remember JPEG)
• f(x,y) is divided into 8x8 blocks
• f(x,y) - 128 (normalize to the range -128 to 127)
• FDCT (forward DCT) on each block
• Quantize the DCT coefficients via a quantization "table" T(u,v):
  – C'(u,v) = round(C(u,v)/T(u,v))
• DC component: differential coding
• AC coefficients: "zig-zag" order, then RLE
• Huffman encode → JPEG bitstream
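The quantization step, which is where JPEG discards information, is one line in each direction; a sketch with an arbitrary uniform table for illustration:

```python
import numpy as np

def quantize(C, T):
    """JPEG-style quantization of a block of DCT coefficients:
    C'(u,v) = round(C(u,v) / T(u,v))."""
    return np.round(C / T).astype(np.int64)

def dequantize(Cq, T):
    # Decoder side: multiply back. The rounding loss in quantize()
    # is what makes the scheme lossy (and compressive).
    return Cq * T
```

Larger table entries T(u,v) (typically used for high frequencies) drive more coefficients to zero, which is what the zig-zag/RLE stage then exploits.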
Video Compression
• Motion-JPEG
  – I(x,y,t)
  – Each frame (t) is encoded as a JPEG
  – Reduces spatial redundancies within each frame
• MPEG
  – Exploits temporal coherence
  – The image does not change much from frame to frame
  – If it does change, it is due to some small motion in the image
MPEG
• Three frame types
  – I frame
    • Intra-encoded frame
    • Fully encoded JPEG-style frame
  – P frame
    • Predictive frame
  – B frame
    • Bi-directional predictive frame
MPEG
• Each frame is divided into 8x8 blocks
  – The 8x8 blocks will be DCT encoded
• We also create 16x16 logical "macro-blocks"
  – These will be used for motion estimation
[Figure: 8x8 blocks grouped into 16x16 macro-blocks.]
Predictive Frame
• Determine a motion vector for each macro-block in the P-frame
  – The motion vector points to the most "similar" macro-block in a previous I-frame or P-frame
  – Often referred to as Motion Compensation (MC)
• Encode the motion vector (u,v) for each macro-block
• Calculate the residual error (difference of the macro-blocks)
  – Encode the residual as four 8x8 DCT blocks (like JPEG)
  – The residual DCT values are quantized (compression)
[Figure: a macro-block in I(t) matched against I(t-1).]
• NOTE: for static background you get a (0,0) motion vector and no residual [good compression!]
MPEG Sequence with P-frames
[Figure: I P P P P — each P-frame is predicted from the frame before it.]
B-Frames (Bi-directional Predictive Frames)
• Can use previous or future P or I frames for motion compensation
  – YES, it uses a future frame
  – Not for real-time encoding
• Requires the frames to be transmitted out of order
  – and/or buffering
[Figure: I P B P P — the B-frame is predicted from both a previous and a future reference frame.]
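The three B-frame prediction modes can be sketched for one motion-compensated block (mode names here are illustrative; the references are assumed to already be motion-compensated):

```python
import numpy as np

def b_frame_predict(prev_ref, next_ref, mode="bi"):
    """B-frame prediction sketch: predict a block from the past
    reference, the future reference, or their average (bidirectional).
    The encoder keeps whichever mode gives the smallest residual."""
    if mode == "forward":
        return prev_ref.astype(np.int64)
    if mode == "backward":
        return next_ref.astype(np.int64)
    # bidirectional: average the two motion-compensated references
    return (prev_ref.astype(np.int64) + next_ref.astype(np.int64)) // 2
```

Averaging two references often halves the residual for smoothly changing content, which is why B-frames compress so well despite needing out-of-order transmission.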
Group of Pictures (GOP)
• A sequence of I-, P-, and B-frames forms a logical GOP
  – A GOP always starts with an I-frame
• Note that if you lose the initial I-frame in "transmission"
  – The whole GOP is corrupted
• Common GOP formats
  – IPPPP
  – IBBPBBPBBPBBPBB
[Figure: two consecutive GOPs, e.g. I P B P P | I P B P P, each starting with an I-frame.]
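The out-of-order transmission that B-frames force can be sketched as a reordering of the display-order GOP (frame labels here are illustrative strings):

```python
def transmission_order(gop):
    """Reorder a display-order GOP for transmission: a B-frame needs
    both of its anchor frames before it can be decoded, so each run of
    B-frames is sent after the I/P frame that follows it in display
    order."""
    out, pending_b = [], []
    for f in gop:
        if f.startswith("B"):
            pending_b.append(f)      # hold until the next anchor is sent
        else:                        # I or P anchor frame
            out.append(f)
            out.extend(pending_b)
            pending_b = []
    return out + pending_b
```

The decoder buffers each anchor ("future frame store" in the decoder diagram) so the held-back B-frames can be reconstructed and displayed in the original order.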
MPEG Decoding
[Block diagram: Input Buffer → ZZ and DeQuant → IDCT → Adder → Display; the MC Decoder drives the Forward MC (previous frame store), Backward MC (future frame store), and Bidirectional MC paths that feed the Adder.]
Motion Compensation
• Provides most of MPEG's compression
• Relies on temporal coherence
• Finding a good motion vector is essentially a search problem
• Evaluating the motion vector can be a bit tricky
  – Often trade speed for accuracy
• MC is what makes MPEG asymmetric
  – Harder to encode than to decode
Exhaustive MC Search
• Brute-force calculation of MC
  – Using SAD, SSD
• The most obvious and easiest solution
• Encoding time is related to the size of the search window
• Although time consuming, it is also embarrassingly parallel
MPEG
• DCT quantization of I-frames and residuals controls frame fidelity
• Note that error propagates as we have more and more predictive frames (B- and P-frames)
• The resulting bit-rate is scene dependent!
• For more MPEG details see:
  – www.mpeg.org
Useful Tool
• Berkeley MPEG encoder on unix machines
• mpeg_encode
  – utility to convert pgm/ppm files to MPEG
  – usage:
    • mpeg_encode param_file
  – Param file
    • Describes the input files and MPEG format
    • See the class web page for an example
Summary: Video Processing
• Processing
  – De-interlacing
    • Pre-processing step to remove the interlacing effect
  – Optical flow
    • Motion vector estimation
  – Object tracking
    • Temporal template matching
    • Constrained search window
• Video compression
  – MPEG
    • Powerful compression exploiting temporal coherence
    • I, P, and B frames
      – P and B frames use motion estimation for macro-blocks
      – Encode a residual error
Active Research Areas
• Optical flow
  – Computationally expensive process
  – Research into faster techniques
  – Dense samples
• Tracking
  – Tracking is a hot topic
  – Recognition over temporal frames
  – Multiple objects
    • Especially multiple interacting objects
  – New approaches to finding the object
  – Staying locked on in the face of occlusions
Active Research Areas
• Video encoding hot topics
  – Video codecs
    • new techniques, often for non-real-time apps
  – Constant bit-rate for MPEG
  – Robustness to transmission errors
  – VOD (video on demand)
  – Streaming technology
  – Motion compensation strategies
  – MPEG-4
    • Wrapper for all types of data (not just video)