Speech, Audio, Image, and Video Coding

Author: Janis Paul
Speech, Audio, Image, and Video Coding
Douglas L. Jones
ECE 497 Spring 2000
4/11/00

Source Coding

D.L. Jones

Outline
• Speech Coding
• Speech Recognition
• Audio Coding
• Image Coding
• Video Coding

Goals for this Lecture
• Learn basic principles underlying speech, audio, image, and video coding
• Understand why coding is important
• Understand some roles of media processing in embedded system design

Speech Coding
• Speech coding is important for
  – reduced data rate in digital communications, such as cell-phones
  – reduced memory and storage requirements for digital answering machines, speech databases, etc.

Speech Processing Parameters
• Typical sampling rate: 8 kHz
• Typical quantization: 8-bit mu-law or A-law
• Base rate: 64k bits per second (64 kbps)
• Compressed rates range from 2.4 to 32 kbps
• Modern compression methods based on ADPCM or LPC
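The base rate follows directly from the sampling rate and word length, and mu-law companding is what makes 8 bits per sample sufficient. The sketch below is illustrative only: it uses the continuous mu-law curve with mu = 255 (an assumption; deployed telephony codecs use a segmented approximation of this curve), and the function names are hypothetical.

```python
import math

MU = 255  # assumed mu-law parameter for this sketch

def mulaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y: float) -> float:
    """Invert mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize_8bit(y: float) -> int:
    """Uniformly quantize a companded value to a signed code in -127..127."""
    return max(-127, min(127, round(y * 127)))

# Base rate: samples per second times bits per sample.
sample_rate_hz = 8000
bits_per_sample = 8
base_rate_bps = sample_rate_hz * bits_per_sample
print(base_rate_bps)  # 64000
```

Small amplitudes get finer quantization steps than large ones after companding, which matches how speech amplitudes are distributed.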

ADPCM
• Adaptive Differential Pulse Code Modulation (ADPCM) is an old compression standard that remains the best method for half-rate (32 kbps) compression
• ADPCM is a “waveform compression” method that tries to preserve the actual speech signal waveform as much as possible.

• ADPCM codes the difference signal; that is, d(n) = x(n) - x(n-1)
• Since the speech waveform is primarily a low-frequency signal, the difference is usually small, so it requires fewer bits to represent
• Adaptive DPCM adjusts quantization step size to track difference amplitude

• A short adaptive prediction filter also reduces size of difference
• ADPCM at half rate (32 kbps) sounds almost indistinguishable from 64 kbps speech
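The idea can be sketched as a toy adaptive DPCM codec with 4-bit codes (4 bits × 8 kHz = 32 kbps). This is a minimal illustration, not the standardized algorithm: the step-size multipliers (1.5 and 0.9), the initial step, and the function names are all assumptions, and the predictor is simply the previous reconstructed sample rather than a short adaptive filter. Predicting from the reconstructed sample (instead of the true x(n-1)) keeps encoder and decoder in step.

```python
import math

def _update(pred, step, code, levels):
    """Shared encoder/decoder state update (hypothetical adaptation rule)."""
    pred = pred + code * step      # reconstructed sample = next prediction
    if abs(code) >= levels - 1:    # large codes: difference is outgrowing the step
        step *= 1.5
    elif abs(code) <= 1:           # small codes: step is larger than needed
        step *= 0.9
    return pred, max(step, 1e-4)

def adpcm_encode(samples, bits=4, step=0.02):
    levels = 2 ** (bits - 1) - 1   # codes lie in -levels..levels
    pred, codes = 0.0, []
    for x in samples:
        code = max(-levels, min(levels, round((x - pred) / step)))
        codes.append(code)
        pred, step = _update(pred, step, code, levels)
    return codes

def adpcm_decode(codes, bits=4, step=0.02):
    levels = 2 ** (bits - 1) - 1
    pred, out = 0.0, []
    for code in codes:
        pred, step = _update(pred, step, code, levels)
        out.append(pred)
    return out

# A 200 Hz tone sampled at 8 kHz, coded with 4 bits per sample.
signal = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
decoded = adpcm_decode(adpcm_encode(signal))
print(max(abs(a - b) for a, b in zip(signal, decoded)))
```

Because the decoder repeats exactly the same state updates as the encoder, only the codes need to be transmitted.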

Linear Prediction Coding
• Linear Prediction Coding (LPC) is a fundamentally different, “model” based approach to speech coding
• Based on “acoustic tube” model of human speech production
• Models short speech segments as either white noise (unvoiced) or an impulse train (voiced) input to an all-pole (IIR) filter

LPC-10
• Input amplitude, voiced/unvoiced, pitch period, and 10th-order filter coefficients are computed for 20-30 ms blocks
• Instead of transmitting speech, send only the filter coefficients and other parameters!
• Rerun filter at the receive end to reconstruct speech

• Produces artificial-sounding but understandable speech reconstructions at rates as low as 2400 bits/sec
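The receive-side step (“rerun filter”) can be sketched as follows: build an excitation from the voiced/unvoiced flag and pitch period, then drive an all-pole filter with the transmitted coefficients and gain. The function names and the single coefficient used in the demo are hypothetical; LPC-10 would use ten coefficients per block.

```python
import random

def excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Impulse train for voiced blocks, white noise for unvoiced blocks."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def lpc_synthesize(coeffs, gain, excite):
    """All-pole filter: s[n] = gain * e[n] + sum_k coeffs[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excite):
        acc = gain * e
        for k, a in enumerate(coeffs):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out

# One 20 ms block at 8 kHz (160 samples), voiced, pitch period 80 samples (100 Hz).
block = lpc_synthesize([0.9], gain=1.0, excite=excitation(160, voiced=True))
print(block[:3])
```

Only the coefficients, gain, voicing flag, and pitch period describe each block, which is why the bit rate is so low.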

Enhanced LPC Methods
• LPC-10 achieves excellent compression, but insufficient quality for most telephony applications
• Enhanced LPC methods have been developed with higher rates and performance
• LPC-based approaches dominate speech coding for rates at and below 16 kbps

RELP
• Residual Excited Linear Prediction (RELP) retains and sends residual (prediction error) as well
• Sending residual back through prediction model reconstructs original waveform (in the absence of quantization)
• Rate remains fairly high, since residual requires many bits
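The reconstruction property is easy to check numerically. In this minimal sketch (hypothetical function names, arbitrary example coefficients), the residual is the prediction error e(n) = x(n) - Σ a_k x(n-k), and feeding it back through the same predictor recovers x(n) up to floating-point rounding when nothing is quantized.

```python
def lpc_residual(x, coeffs):
    """e[n] = x[n] - sum_k coeffs[k] * x[n-1-k]  (prediction error)."""
    res = []
    for n in range(len(x)):
        pred = sum(a * x[n - 1 - k] for k, a in enumerate(coeffs) if n - 1 - k >= 0)
        res.append(x[n] - pred)
    return res

def reconstruct(res, coeffs):
    """Run the residual back through the prediction model."""
    out = []
    for n, e in enumerate(res):
        pred = sum(a * out[n - 1 - k] for k, a in enumerate(coeffs) if n - 1 - k >= 0)
        out.append(e + pred)
    return out

x = [0.0, 0.5, 0.9, 0.7, 0.1, -0.4]
coeffs = [1.2, -0.5]  # example 2nd-order predictor, chosen arbitrarily
residual = lpc_residual(x, coeffs)
print(reconstruct(residual, coeffs))
```

Once the residual is quantized the reconstruction is no longer exact, and the residual still needs roughly one value per sample, which is why the rate stays high.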

CELP
• Code Excited Linear Prediction (CELP) selects an excitation sequence from a “codebook” of possible choices
• Transmit code indicating selection, rather than residual
• Greatly reduced rate, only modest loss in performance

• There are many flavors of CELP; the better lower-rate methods are based on this concept
• Cell-phones tend to use rates from about 4.8 to 9.6 kbps
• Quality noticeably inferior to telephone, but deemed acceptable
• Allows 3-6 times as many users in a cell!
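The codebook selection can be sketched as an analysis-by-synthesis search: pass each candidate excitation through the LPC filter, fit the best gain, and keep the entry with the smallest squared error. This is a simplified illustration under stated assumptions; the tiny codebook, the coefficient, and the function names are hypothetical, and practical CELP coders add refinements such as perceptual weighting that are omitted here.

```python
def synthesize(coeffs, excite):
    """All-pole LPC filter driven by an excitation sequence."""
    out = []
    for n, e in enumerate(excite):
        acc = e + sum(a * out[n - 1 - k] for k, a in enumerate(coeffs) if n - 1 - k >= 0)
        out.append(acc)
    return out

def search_codebook(target, codebook, coeffs):
    """Return (index, gain) of the codebook entry that best matches target."""
    best_index, best_gain, best_err = None, 0.0, float("inf")
    for i, code in enumerate(codebook):
        y = synthesize(coeffs, code)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        gain = sum(t * v for t, v in zip(target, y)) / energy  # least-squares gain
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if err < best_err:
            best_index, best_gain, best_err = i, gain, err
    return best_index, best_gain

codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 1, -1]]
coeffs = [0.5]
target = [0.5 * v for v in synthesize(coeffs, codebook[2])]
print(search_codebook(target, codebook, coeffs))
```

Only the winning index and gain are transmitted; the decoder holds the same codebook and reruns the filter.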

Hardware note ...
• Speech coding/decoding is the primary reason for the DSP microprocessor (DSP µP) in digital cell-phones!
• DSP µP is nearly ideal for speech coding algorithms (an ASIC wouldn’t be better)
• Since it’s there anyway, the DSP µP is also used for many other functions

Speech Recognition
• Speech recognition is expected to become a very important component of many future embedded systems
• Convenient, natural user interface for
  – very small embedded systems (e.g., wristwatch cell-phone, Palm-Pilot X)
  – non-critical systems (e.g., car radio, windshield wipers)

Speech Recognition Methods
• Modern speech recognition is based on short-time spectral analysis
• Spectral estimates usually constructed from linear prediction followed by further processing
• Hidden Markov Models (HMMs) perform statistical comparison with database of words and language models
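The statistical comparison can be illustrated with the forward algorithm: each word has its own HMM, and the recognizer picks the word whose model assigns the observation sequence the highest likelihood. This is a toy sketch under strong assumptions: the two word models, their probabilities, and the discrete observation symbols (standing in for quantized spectral features) are all invented for illustration.

```python
def forward_likelihood(model, observations):
    """Return P(observations | model) using the forward algorithm."""
    pi, A, B = model["pi"], model["A"], model["B"]
    n = len(pi)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    for obs in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs]
                 for j in range(n)]
    return sum(alpha)

def recognize(models, observations):
    """Pick the word whose HMM best explains the observations."""
    return max(models, key=lambda word: forward_likelihood(models[word], observations))

# Two hypothetical left-to-right word models over symbols {0, 1}.
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "yes": {"pi": [1.0, 0.0], "A": left_to_right, "B": [[0.9, 0.1], [0.1, 0.9]]},
    "no":  {"pi": [1.0, 0.0], "A": left_to_right, "B": [[0.1, 0.9], [0.9, 0.1]]},
}
print(recognize(models, [0, 0, 1, 1]))
```

Practical recognizers work with log-probabilities and continuous feature vectors, but the compare-against-every-model structure is the same.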

System Requirements
• Memory and computational requirements:
  – Small vocabulary, isolated word recognition
    • a few MIPS and kBs
  – Large vocabulary, continuous speech
    • 100s of MIPS, 100s of MBs

Audio Coding
• Quality expectations considerably higher than with speech
• 16-bit, 44.1 kHz stereo is CD standard
• Modern audio coding methods (e.g., mp3) based on perceptual coding tricks
  – Exploit limitations of human hearing to reduce rate while minimizing audible artifacts

• Split signal into different frequency bands according to sensitivities of human hearing
• Exploit “masking” to remove data from inaudible bands due to loud neighbors
• Shape quantization noise to lie in masked regions
• Obtain near-CD quality at 128-256 kbps
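To put the 128-256 kbps figure in context, a short calculation compares it with the uncompressed CD rate implied by the 16-bit, 44.1 kHz stereo format. The variable names are only for illustration.

```python
bits_per_sample = 16
sample_rate_hz = 44_100
channels = 2

# Uncompressed CD audio rate in bits per second.
cd_rate_bps = bits_per_sample * sample_rate_hz * channels

# Compression ratio at each end of the quoted coded-rate range.
ratios = {kbps: cd_rate_bps / (kbps * 1000) for kbps in (128, 256)}

print(cd_rate_bps)
for kbps, ratio in ratios.items():
    print(f"{kbps} kbps -> about {ratio:.1f}:1")
```

So perceptual coding at these rates corresponds to roughly a 5:1 to 11:1 reduction relative to the CD bit stream.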

Image Coding
• Many emerging embedded system applications
  – Digital cameras
  – Security (e.g., fingerprint ID)
  – Medical record storage
• Image usually acquired with a CCD imaging sensor

Requirements
• Typical image ~ 512x512 pixels, 3 colors each at 8 bits
• Or binary black-and-white
• Two types of compression
  – Lossless: maximum compression ratios of 2-3
  – Lossy: high quality with compression ratios of 10-30
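These figures translate into storage as follows. The sketch simply multiplies out the typical image dimensions and applies the quoted compression ratios; the helper name is hypothetical.

```python
width, height = 512, 512
channels, bits_per_channel = 3, 8

# Uncompressed size of the typical color image, in bytes.
raw_bytes = width * height * channels * bits_per_channel // 8

def compressed_bytes(ratio):
    """Size after compression by the given ratio."""
    return raw_bytes / ratio

print("raw:", raw_bytes, "bytes")
for label, ratio_range in (("lossless", (2, 3)), ("lossy", (10, 30))):
    sizes = [round(compressed_bytes(r)) for r in ratio_range]
    print(label, sizes, "bytes")
```

An uncompressed image takes 768 KiB, so the difference between lossless and lossy coding decides how many images fit in a small embedded memory.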

Image Compression Standards
• Binary images:
  – JBIG/FAX standards
  – Primarily based on run-length coding (i.e., number of black or white pixels in succession)
• 8-bit images:
  – JPEG standard: DCT-based
  – Emerging standards wavelet based (EZW, SPIHT, JPEG-2000)
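Run-length coding of a binary image row can be sketched in a few lines. This is only the basic idea, with hypothetical function names: the row is stored as its first pixel value plus the lengths of the alternating runs, and the additional coding that the standards apply to the run lengths is not shown.

```python
def rle_encode(row):
    """Return (first_pixel, run_lengths) for a row of 0/1 pixels."""
    if not row:
        return None, []
    runs, current, length = [], row[0], 0
    for pixel in row:
        if pixel == current:
            length += 1
        else:
            runs.append(length)       # close the finished run
            current, length = pixel, 1
    runs.append(length)
    return row[0], runs

def rle_decode(first_pixel, runs):
    """Rebuild the row; runs alternate between the two pixel values."""
    row, pixel = [], first_pixel
    for length in runs:
        row.extend([pixel] * length)
        pixel = 1 - pixel
    return row

row = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
first, runs = rle_encode(row)
print(first, runs)  # 0 [3, 2, 4, 1]
```

Documents with long white or black stretches produce few, long runs, which is where the compression comes from.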

Principles of JPEG
• Image segmented into 8x8 blocks of pixels
• 2-D Discrete Cosine Transform (DCT) computed for each block
• Most of these frequency components are typically very small and can be coarsely quantized or discarded
• Quantized data is entropy-coded
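The transform-and-quantize steps can be sketched on a single block. This is a minimal illustration, not the JPEG standard itself: it uses a direct orthonormal 2-D DCT, one uniform quantizer step for every coefficient (an assumption; JPEG uses a per-frequency quantization table), a made-up smooth test block, and omits entropy coding.

```python
import math

N = 8

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct, unoptimized form)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, step):
    """Coarse uniform quantization; small coefficients become zero."""
    return [[round(c / step) for c in row] for row in coeffs]

# A smooth gradient block, typical of natural image regions.
block = [[100 + 2 * x + y for y in range(N)] for x in range(N)]
quantized = quantize(dct_2d(block), step=16)
nonzero = sum(1 for row in quantized for v in row if v != 0)
print(nonzero, "of", N * N, "coefficients survive quantization")
```

For smooth blocks almost all of the 64 coefficients quantize to zero, and the entropy coder then represents those zeros very cheaply.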

JPEG Characteristics
• At bit rates of about 1 bit per pixel (bpp), quality loss is usually small
• Below about 0.5 bpp, blocking artifacts begin to appear; much below this is usually unacceptable

Emerging Methods
• New methods based on wavelets are emerging
• Frequency decomposition by successive subband filtering
• Small coefficients discarded
• Artifacts generally less objectionable

• Exploitation of tree structure and dependencies yields further compression
• JPEG-2000 standard will be based on these methods
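Subband filtering and coefficient discarding can be sketched in one dimension with the Haar wavelet. Haar is chosen here only because it is the simplest case (an assumption; the methods named above use longer filters and work in two dimensions), and the function names and threshold are hypothetical.

```python
def haar_analysis(signal):
    """Split an even-length signal into low-pass and high-pass half-bands."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) / 2 for a, b in pairs]    # local averages
    high = [(a - b) / 2 for a, b in pairs]   # local differences
    return low, high

def haar_synthesis(low, high):
    """Invert haar_analysis."""
    out = []
    for l, h in zip(low, high):
        out += [l + h, l - h]
    return out

def discard_small(coeffs, threshold):
    """Zero out coefficients whose magnitude is below the threshold."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]

signal = [4, 6, 10, 12, 8, 8, 0, 2]
low, high = haar_analysis(signal)
approx = haar_synthesis(low, discard_small(high, threshold=1.5))
print(low, high)
print(approx)
```

Applying `haar_analysis` again to the low band gives the successive decomposition; the high bands of smooth regions are mostly small and can be dropped with little visible effect.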

Video Coding
• Embedded system examples:
  – HDTV
  – Satellite TV
  – Set-top boxes
  – Security systems
  – Multimedia devices

Motion-Based Coding Methods
• Modern video coding methods exploit frame-to-frame similarities to further compress video
• Similar to JPEG, except that motion-compensated difference frames are coded with DCT
• Motion vectors encode change in location of blocks
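Finding a motion vector can be sketched as an exhaustive block-matching search: for a block in the current frame, try every displacement within a small window in the previous frame and keep the one with the smallest sum of absolute differences (SAD). The block size, search range, test frames, and function names are assumptions for illustration; real encoders use faster search strategies.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between a current and a reference block."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))

def best_motion_vector(ref, cur, cx, cy, size=4, search=2):
    """Exhaustive search; returns ((dx, dy), cost) for the block at (cx, cy)."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(ref, cur, rx, ry, cx, cy, size)
                if best_cost is None or cost < best_cost:
                    best, best_cost = (dx, dy), cost
    return best, best_cost

# Previous frame, and a current frame whose content moved one pixel right.
ref = [[x * x + 10 * y for x in range(8)] for y in range(8)]
cur = [[row[max(x - 1, 0)] for x in range(8)] for row in ref]
print(best_motion_vector(ref, cur, cx=2, cy=2))  # ((-1, 0), 0)
```

With a good match the difference block is nearly zero, so the DCT stage has very little left to code beyond the vector itself.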

Video Coding Standards
• MPEG-2 and MPEG-4 are leading standards for high (television) quality video coding
• H.263 is primary standard for low-rate video coding (video-phones)
• Compression ratios of 30-50 with good quality are usually obtained

Summary
• Source coding essential to reduce memory requirements, bandwidth of multimedia data
• Complex DSP algorithms obtain great data reductions with little loss in quality
• Coding algorithms have characteristics common to other DSP computations
• Source coding likely to play increasingly important role in many embedded systems
