Speech, Audio, Image, and Video Coding Douglas L. Jones ECE 497 Spring 2000 4/11/00
Source Coding
D.L. Jones
1
Outline
• Speech Coding
• Speech Recognition
• Audio Coding
• Image Coding
• Video Coding
Goals for this Lecture
• Learn basic principles underlying speech, audio, image, and video coding
• Understand why coding is important
• Understand some roles of media processing in embedded system design
Speech Coding
• Speech coding is important for
  – reduced data rate in digital communications, such as cell-phones
  – reduced memory and storage requirements for digital answering machines, speech databases, etc.
Speech Processing Parameters
• Typical sampling rate: 8 kHz
• Typical quantization: 8-bit mu-law or A-law
• Base rate: 64k bits per second (64 kbps)
• Compressed rates range from 2.4 - 32 kbps
• Modern compression methods based on ADPCM or LPC
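The base rate is just the sampling rate times the bits per sample. A minimal sketch of that arithmetic and of mu-law companding follows. It assumes mu = 255 and uses the ideal continuous companding curve; telephone codecs use a piecewise-linear approximation of it. The function names are illustrative only.

```python
import math

SAMPLE_RATE_HZ = 8000   # typical speech sampling rate
BITS_PER_SAMPLE = 8     # 8-bit mu-law or A-law
MU = 255                # assumed companding parameter

def base_rate_bps(sample_rate_hz=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE):
    """Uncompressed bit rate in bits per second."""
    return sample_rate_hz * bits

def mu_law_compress(x, mu=MU):
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)
```

Companding spends more of the 8-bit range on small amplitudes, where most speech samples lie, so quiet passages keep usable resolution.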
ADPCM
• Adaptive Differential Pulse Code Modulation (ADPCM) is an old compression standard that remains the best method for half-rate (32 kbps) compression
• ADPCM is a “waveform compression” method that tries to preserve the actual speech signal waveform as much as possible.
• ADPCM codes the difference signal; that is, d(n) = x(n) - x(n-1)
• Since the speech waveform is primarily a low-frequency signal, the difference is usually small, so it requires fewer bits to represent
• Adaptive DPCM adjusts quantization step size to track difference amplitude
• A short adaptive prediction filter also reduces size of difference
• ADPCM at half rate (32 kbps) sounds almost indistinguishable from 64 kbps speech
Linear Prediction Coding
• Linear Prediction Coding (LPC) is a fundamentally different, “model” based approach to speech coding
• Based on “acoustic tube” model of human speech production
• Models short speech segments as either white noise (unvoiced) or an impulse train (voiced) input to an all-pole (IIR) filter
LPC-10
• Input amplitude, voiced/unvoiced, pitch period, and 10th-order filter coefficients are computed for 20-30 ms blocks
• Instead of transmitting speech, send only the filter coefficients and other parameters!
• Rerun filter at the receive end to reconstruct speech
• Produces artificial-sounding but understandable speech reconstructions at rates as low as 2400 bits/sec
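A minimal sketch of the analysis and synthesis steps, assuming the autocorrelation method with the Levinson-Durbin recursion (one common way to fit the all-pole filter; the slides do not say which method LPC-10 uses). Pitch and voicing estimation and coefficient quantization are omitted, and the function names are illustrative.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n - k], for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return predictor coefficients a[0..order-1] and the final error energy,
    where x(n) is predicted as sum over k of a[k] * x(n - 1 - k)."""
    a = [0.0] * (order + 1)          # a[0] is unused padding
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

def synthesize(a, excitation, gain=1.0):
    """Run the excitation through the all-pole filter defined by a."""
    out = []
    for n, e in enumerate(excitation):
        past = sum(a[k] * out[n - 1 - k]
                   for k in range(len(a)) if n - 1 - k >= 0)
        out.append(gain * e + past)
    return out
```

For a 10th-order model the encoder would call `levinson_durbin(autocorrelation(block, 10), 10)` on each block, and the decoder would call `synthesize` with an impulse train (voiced) or noise (unvoiced) as the excitation.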
Enhanced LPC Methods
• LPC-10 achieves excellent compression, but insufficient quality for most telephony applications
• Enhanced LPC methods have been developed with higher rates and performance
• LPC-based approaches dominate speech coding for rates at and below 16 kbps
RELP
• Residual Excited Linear Prediction (RELP) retains and sends residual (prediction error) as well
• Sending residual back through prediction model reconstructs original waveform (in the absence of quantization)
• Rate remains fairly high, since residual requires many bits
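The reconstruction property can be checked directly. In this sketch the predictor coefficients are arbitrary illustrative values rather than ones fitted to speech, and the residual is left unquantized.

```python
def prediction_residual(x, a):
    """e(n) = x(n) - sum over k of a[k] * x(n - 1 - k)."""
    res = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - 1 - k]
                   for k in range(len(a)) if n - 1 - k >= 0)
        res.append(x[n] - pred)
    return res

def resynthesize(residual, a):
    """Feed the residual back through the prediction model."""
    out = []
    for n, e in enumerate(residual):
        pred = sum(a[k] * out[n - 1 - k]
                   for k in range(len(a)) if n - 1 - k >= 0)
        out.append(e + pred)
    return out
```

Apart from floating-point rounding, `resynthesize(prediction_residual(x, a), a)` returns `x` for any coefficients `a`. A good predictor only makes the residual smaller and so cheaper to code.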
CELP
• Code Excited Linear Prediction (CELP) selects an excitation sequence from a “codebook” of possible choices
• Transmit code indicating selection, rather than residual
• Greatly reduced rate, only modest loss in performance
• There are many flavors of CELP; the better lower-rate methods are based on this concept
• Cell-phones tend to use rates from about 4.8 to 9.6 kbps
• Quality is noticeably inferior to telephone, but deemed acceptable
• Allows 3-6 times as many users in a cell!
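A minimal sketch of the codebook search at the heart of CELP, assuming a random Gaussian codebook and a plain squared-error criterion. Real coders add perceptual weighting and a pitch (adaptive) codebook, which are omitted here. The codebook size, the vector length, and all names are hypothetical.

```python
import random

def synthesize(excitation, a):
    """All-pole synthesis: y(n) = e(n) + sum over k of a[k] * y(n - 1 - k)."""
    out = []
    for n, e in enumerate(excitation):
        past = sum(a[k] * out[n - 1 - k]
                   for k in range(len(a)) if n - 1 - k >= 0)
        out.append(e + past)
    return out

def make_codebook(entries=16, length=40, seed=0):
    """Hypothetical codebook of random excitation vectors."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)]
            for _ in range(entries)]

def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the entry that best matches target."""
    best = None
    for index, code in enumerate(codebook):
        y = synthesize(code, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        gain = sum(t * v for t, v in zip(target, y)) / energy  # optimal gain
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best
```

Only the index and gain are transmitted per block (4 bits for the index with this 16-entry codebook), which is why the rate falls so far below RELP.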
Hardware note ...
• Speech coding/decoding is the primary reason for DSP uP in digital cell-phones!
• DSP uP is nearly ideal for speech coding algorithms (ASIC wouldn’t be better)
• Since it’s there anyway, DSP uP also used for many other functions
Speech Recognition
• Speech recognition is expected to become a very important component of many future embedded systems
• Convenient, natural user interface for
  – very small embedded systems (e.g., wristwatch cell-phone, Palm-Pilot X)
  – non-critical systems (e.g., car radio, windshield wipers)
Speech Recognition Methods
• Modern speech recognition is based on short-time spectral analysis
• Spectral estimates usually constructed from linear prediction followed by further processing
• Hidden Markov Models (HMMs) perform statistical comparison with database of words and language models
System Requirements
• Memory and computational requirements:
  – Small vocabulary, isolated word recognition
    • a few MIPS and kBs
  – Large vocabulary, continuous speech
    • 100s of MIPS, 100s of MBs
Audio Coding
• Quality expectations considerably higher than with speech
• 16-bit, 44.1 kHz stereo is CD standard
• Modern audio coding methods (e.g., mp3) based on perceptual coding tricks
  – Exploit limitations of human hearing to reduce rate while minimizing audible artifacts
• Split signal into different frequency bands according to sensitivities of human hearing
• Exploit “masking” to remove data from inaudible bands due to loud neighbors
• Shape quantization noise to lie in masked regions
• Obtain near-CD quality at 128-256 kbps
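To put the 128-256 kbps figure in context, the uncompressed CD rate and the implied compression ratios can be worked out directly. The helper names here are illustrative only.

```python
def pcm_bit_rate(bits_per_sample, sample_rate_hz, channels):
    """Uncompressed PCM rate in bits per second."""
    return bits_per_sample * sample_rate_hz * channels

# 16-bit, 44.1 kHz stereo: 1,411,200 bits per second
CD_RATE_BPS = pcm_bit_rate(16, 44_100, 2)

def compression_ratio(coded_rate_bps, source_rate_bps=CD_RATE_BPS):
    """How many times smaller the coded stream is than the source."""
    return source_rate_bps / coded_rate_bps
```

At 128 kbps the ratio is about 11, and at 256 kbps about 5.5.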
Image Coding
• Many emerging embedded system applications
  – Digital cameras
  – Security (e.g., fingerprint ID)
  – Medical record storage
• Image usually acquired with a CCD imaging sensor
Requirements
• Typical image ~ 512x512 pixels, 3 colors each at 8 bits
• Or binary black-and-white
• Two types of compression
  – Lossless: maximum compression ratios of 2-3
  – Lossy: high quality with compression ratios of 10-30
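The storage these figures imply is easy to work out. This sketch uses the typical image size quoted above; the function names are illustrative.

```python
def raw_image_bytes(width=512, height=512, channels=3, bits=8):
    """Uncompressed size of the image in bytes."""
    return width * height * channels * bits // 8

def compressed_bytes(raw_bytes, ratio):
    """Size after compressing by the given ratio."""
    return raw_bytes / ratio

def bits_per_pixel(total_bytes, width=512, height=512):
    """Average number of bits spent on each pixel."""
    return total_bytes * 8 / (width * height)
```

The raw image is 786,432 bytes (24 bits per pixel). Lossless coding at ratios of 2-3 leaves roughly 260-390 kB, and lossy coding at ratios of 10-30 leaves roughly 26-79 kB.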
Image Compression Standards
• Binary images:
  – JBIG/FAX standards
  – Primarily based on run-length coding (i.e., number of black or white pixels in succession)
• 8-bit images:
  – JPEG standard: DCT-based
  – Emerging standards wavelet based (EZW, SPIHT, JPEG-2000)
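Run-length coding of one binary scan line can be sketched as below. This shows only the run-length step; practical fax coders go on to entropy-code the run lengths. Representing a line as (first pixel, list of runs) is an assumption made for this example.

```python
def run_length_encode(row):
    """Encode a row of 0/1 pixels as (first_pixel, run_lengths)."""
    if not row:
        return None, []
    runs = []
    current, count = row[0], 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)        # close the finished run
            current, count = pixel, 1
    runs.append(count)
    return row[0], runs

def run_length_decode(first_pixel, runs):
    """Rebuild the row by alternating between black and white runs."""
    row, pixel = [], first_pixel
    for count in runs:
        row.extend([pixel] * count)
        pixel = 1 - pixel
    return row
```

For example, the row `[0, 0, 0, 1, 1, 0, 0, 0, 0, 1]` encodes as `(0, [3, 2, 4, 1])`. The gain comes from long runs, which are common in scanned text.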
Principles of JPEG
• Image segmented into 8x8 blocks of pixels
• 2-D Discrete Cosine Transform (DCT) computed for each block
• Most of these frequency components are typically very small and can be coarsely quantized or discarded
• Quantized data is entropy-coded
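The transform and quantization steps for a single 8x8 block can be sketched as follows. As a simplification, this assumes one uniform quantizer step for every coefficient, whereas JPEG uses a table with a different step per frequency. The entropy-coding stage is omitted.

```python
import math

N = 8  # JPEG block size

def _scale(k):
    """Orthonormal DCT scaling factor."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def _basis(i, k):
    """DCT-II cosine for sample position i and frequency index k."""
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))

def dct_2d(block):
    """Forward 2-D DCT of an N x N block (list of rows)."""
    return [[_scale(u) * _scale(v) *
             sum(block[x][y] * _basis(x, u) * _basis(y, v)
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct_2d(coeffs):
    """Inverse 2-D DCT."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] *
                 _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

def quantize(coeffs, step):
    """Round each coefficient to the nearest multiple of step."""
    return [[round(c / step) for c in row] for row in coeffs]

def dequantize(levels, step):
    """Map quantizer levels back to coefficient values."""
    return [[q * step for q in row] for row in levels]
```

On a smooth block, most quantized levels come out as zero, and those zeros are what the entropy coder turns into savings.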
JPEG Characteristics
• At compression rates of 1 bit per pixel, quality loss is usually small
• Below about 0.5 bpp, blocking artifacts begin to appear; much below this is usually unacceptable
Emerging Methods
• New methods based on wavelets are emerging
• Frequency decomposition by successive subband filtering
• Small coefficients discarded
• Artifacts generally less objectionable
• Exploitation of tree structure and dependencies yields further compression
• JPEG-2000 standard will be based on these methods
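Successive subband filtering and coefficient discarding can be sketched in one dimension. This example assumes the Haar wavelet, the simplest choice; practical image coders use longer filters, work in two dimensions, and add the tree-structured coding, none of which is shown.

```python
import math

ROOT2 = math.sqrt(2.0)

def haar_forward(signal, levels):
    """Split repeatedly into a low band (kept for the next level) and a
    high band. The signal length must be divisible by 2 ** levels."""
    approx, details = list(signal), []
    for _ in range(levels):
        half = len(approx) // 2
        low = [(approx[2 * i] + approx[2 * i + 1]) / ROOT2 for i in range(half)]
        high = [(approx[2 * i] - approx[2 * i + 1]) / ROOT2 for i in range(half)]
        details.append(high)
        approx = low
    return approx, details

def haar_inverse(approx, details):
    """Undo haar_forward, coarsest level first."""
    approx = list(approx)
    for high in reversed(details):
        out = []
        for s, d in zip(approx, high):
            out.append((s + d) / ROOT2)
            out.append((s - d) / ROOT2)
        approx = out
    return approx

def discard_small(details, threshold):
    """Zero every detail coefficient whose magnitude is below threshold."""
    return [[d if abs(d) >= threshold else 0.0 for d in band]
            for band in details]
```

A piecewise-constant signal such as `[4, 4, 4, 4, 8, 8, 8, 8]` needs only one nonzero detail coefficient after three levels, which is the kind of sparsity that thresholding exploits.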
Video Coding
• Embedded system examples:
  – HDTV
  – Satellite TV
  – Set-top boxes
  – Security systems
  – Multimedia devices
Motion-Based Coding Methods
• Modern video coding methods exploit frame-to-frame similarities to further compress video
• Similar to JPEG, except that motion-compensated difference frames are coded with DCT
• Motion vectors encode change in location of blocks
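One way to find a motion vector is exhaustive block matching. This sketch assumes a sum-of-absolute-differences (SAD) cost and a small search window; the block size, the search range, and the function names are illustrative choices, not values taken from any standard.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))

def find_motion_vector(cur, ref, bx, by, size=4, search=2):
    """Return ((dx, dy), cost) for the best match within +/- search pixels."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip displacements that fall outside the reference frame.
            if not (0 <= by + dy and by + dy + size <= height and
                    0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost
```

The encoder sends the vector, then codes only the difference between the current block and the matched reference block. That difference is what the DCT stage receives.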
Video Coding Standards
• MPEG-2 and MPEG-4 are leading standards for high (television) quality video coding
• H.263 is primary standard for low-rate video coding (video-phones)
• Compression ratios of 30-50 with good quality are usually obtained
Summary
• Source coding essential to reduce memory requirements, bandwidth of multimedia data
• Complex DSP algorithms obtain great data reductions with little loss in quality
• Coding algorithms have characteristics common to other DSP computations
• Source coding likely to play increasingly important role in many embedded systems