Speech, Audio, Image, and Video Coding

Author: Janis Paul

Speech, Audio, Image, and Video Coding
Douglas L. Jones
ECE 497, Spring 2000
4/11/00

Source Coding

D.L. Jones

1

Outline
• Speech Coding
• Speech Recognition
• Audio Coding
• Image Coding
• Video Coding

Goals for this Lecture
• Learn basic principles underlying speech, audio, image, and video coding
• Understand why coding is important
• Understand some roles of media processing in embedded system design

Speech Coding
• Speech coding is important for
  – reduced data rate in digital communications, such as cell-phones
  – reduced memory and storage requirements for digital answering machines, speech databases, etc.

Speech Processing Parameters
• Typical sampling rate: 8 kHz
• Typical quantization: 8-bit mu-law or A-law
• Base rate: 8 kHz x 8 bits = 64,000 bits per second (64 kbps)
• Compressed rates range from 2.4 to 32 kbps
• Modern compression methods are based on ADPCM or LPC
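
As a rough illustration of the parameters above, the following sketch applies the continuous mu-law curve (mu = 255) followed by uniform 8-bit quantization, and computes the base rate. This is an assumption-laden simplification, not the bit-exact segmented encoding used by telephony standards; all function names are hypothetical.

```python
import math

MU = 255  # companding parameter commonly used with 8-bit mu-law

def mulaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto the logarithmic mu-law curve in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def encode_8bit(x: float) -> int:
    """Compand, then quantize uniformly to a code in 0..255."""
    return int(round((mulaw_compress(x) + 1) / 2 * 255))

def decode_8bit(code: int) -> float:
    """Map a code in 0..255 back to an approximate sample value."""
    return mulaw_expand(code / 255 * 2 - 1)

SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
base_rate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # 64 kbps

if __name__ == "__main__":
    for x in (0.01, 0.1, 0.5):
        print(x, encode_8bit(x), decode_8bit(encode_8bit(x)))
    print("base rate (bps):", base_rate_bps)
```

The logarithmic curve spends more of the 256 codes on small amplitudes, which is why quiet speech survives 8-bit quantization better than it would with uniform steps.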

ADPCM
• Adaptive Differential Pulse Code Modulation (ADPCM) is an old compression standard that remains the best method for half-rate (32 kbps) compression
• ADPCM is a “waveform compression” method that tries to preserve the actual speech signal waveform as much as possible

• ADPCM codes the difference signal; that is, d(n) = x(n) - x(n-1)
• Since the speech waveform is primarily a low-frequency signal, the difference is usually small, so it requires fewer bits to represent
• Adaptive DPCM adjusts the quantization step size to track the difference amplitude

• A short adaptive prediction filter also reduces the size of the difference
• ADPCM at half rate (32 kbps) sounds almost indistinguishable from 64 kbps speech
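
The ideas above (code the difference, adapt the step size) can be sketched as a toy 4-bit adaptive DPCM coder. This is only an illustration under stated assumptions: the predictor is simply the previous reconstructed sample, and the step-adaptation rule (grow by 1.5, shrink by 0.9) is made up for the example and is not taken from any ADPCM standard.

```python
import math

def _adapt(step: float, code: int, levels: int) -> float:
    """Hypothetical rule: widen the step after large codes, narrow it after small ones."""
    step *= 1.5 if abs(code) >= levels // 2 else 0.9
    return min(max(step, 1e-4), 1.0)

def adpcm_encode(samples, bits=4, step=0.02):
    levels = 2 ** (bits - 1)          # codes lie in [-levels, levels - 1]
    pred, codes = 0.0, []
    for x in samples:
        d = x - pred                  # difference signal
        code = max(-levels, min(levels - 1, int(round(d / step))))
        codes.append(code)
        pred += code * step           # track what the decoder will reconstruct
        step = _adapt(step, code, levels)
    return codes

def adpcm_decode(codes, bits=4, step=0.02):
    levels = 2 ** (bits - 1)
    pred, out = 0.0, []
    for code in codes:
        pred += code * step
        out.append(pred)
        step = _adapt(step, code, levels)
    return out

# 200 Hz tone sampled at 8 kHz, coded with 4 bits per sample (4 x 8000 = 32 kbps)
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(80)]
codes = adpcm_encode(tone)
decoded = adpcm_decode(codes)
rate_bps = 4 * 8000

if __name__ == "__main__":
    print("max error:", max(abs(a - b) for a, b in zip(tone, decoded)))
```

Because the encoder updates its prediction from the quantized code rather than from the true input, encoder and decoder stay in lockstep and quantization error does not accumulate.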

Linear Prediction Coding
• Linear Prediction Coding (LPC) is a fundamentally different, “model” based approach to speech coding
• Based on the “acoustic tube” model of human speech production
• Models short speech segments as either white noise (unvoiced) or an impulse train (voiced) input to an all-pole (IIR) filter
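
One common way to estimate the all-pole filter for a short segment is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which estimation method is intended, so the sketch below is an assumption; the function names are hypothetical.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite segment."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] with x(n) ~ sum_k a[k] * x(n-k).

    Returns (coefficients, final prediction error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:                # silent or perfectly predicted segment
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

# A decaying exponential is the impulse response of a one-pole filter with pole 0.9
segment = [0.9 ** n for n in range(200)]
coeffs, err = levinson_durbin(autocorr(segment, 2), 2)

if __name__ == "__main__":
    print(coeffs, err)   # expect roughly [0.9, 0.0]
```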

LPC-10
• Input amplitude, voiced/unvoiced, pitch period, and 10th-order filter coefficients are computed for 20-30 ms blocks
• Instead of transmitting speech, send only the filter coefficients and other parameters!
• Rerun the filter at the receive end to reconstruct speech

• Produces artificial-sounding but understandable speech reconstructions at rates as low as 2400 bits/sec
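
The receive-side step (rerunning the filter from the transmitted parameters) might look like the sketch below. The parameter names and the unit-variance noise source are assumptions for illustration; the bit budget per block simply multiplies the 2400 bits/sec rate by the 20-30 ms block lengths quoted above.

```python
import random

def lpc_synthesize(a, gain, voiced, pitch_period, n_samples, seed=0):
    """Drive an all-pole filter y(n) = gain*e(n) + sum_k a[k]*y(n-1-k).

    e(n) is an impulse train (voiced) or white noise (unvoiced).
    """
    rng = random.Random(seed)
    out = []
    for n in range(n_samples):
        if voiced:
            excitation = 1.0 if n % pitch_period == 0 else 0.0
        else:
            excitation = rng.gauss(0.0, 1.0)
        feedback = sum(a[k] * out[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        out.append(gain * excitation + feedback)
    return out

# Bits available per block at 2400 bits/sec for 20 ms and 30 ms blocks
RATE_BPS = 2400
bits_per_block = {ms: RATE_BPS * ms // 1000 for ms in (20, 30)}

voiced_frame = lpc_synthesize([0.5], gain=1.0, voiced=True, pitch_period=4, n_samples=6)

if __name__ == "__main__":
    print(voiced_frame)
    print(bits_per_block)
```

With only a few dozen bits per block, every parameter (gain, pitch, voicing flag, ten coefficients) has to be quantized very coarsely, which is consistent with the artificial sound noted above.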

Enhanced LPC Methods
• LPC-10 achieves excellent compression, but insufficient quality for most telephony applications
• Enhanced LPC methods have been developed with higher rates and performance
• LPC-based approaches dominate speech coding for rates at and below 16 kbps

RELP
• Residual Excited Linear Prediction (RELP) retains and sends the residual (prediction error) as well
• Sending the residual back through the prediction model reconstructs the original waveform (in the absence of quantization)
• Rate remains fairly high, since the residual requires many bits
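
The reconstruction claim above can be checked numerically: compute the residual with the predictor, then feed it back through the same model. This is a minimal sketch with arbitrary example coefficients and no quantization; the function names are hypothetical.

```python
def prediction_residual(x, a):
    """e(n) = x(n) - sum_k a[k] * x(n-1-k)."""
    res = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        res.append(x[n] - pred)
    return res

def reconstruct_from_residual(res, a):
    """y(n) = e(n) + sum_k a[k] * y(n-1-k); equals x(n) when e is unquantized."""
    y = []
    for n in range(len(res)):
        pred = sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        y.append(res[n] + pred)
    return y

signal = [0.0, 1.0, 0.5, -0.25, 0.75, 0.1]
predictor = [0.9, -0.2]              # arbitrary example coefficients
residual = prediction_residual(signal, predictor)
rebuilt = reconstruct_from_residual(residual, predictor)

if __name__ == "__main__":
    print(max(abs(s - r) for s, r in zip(signal, rebuilt)))
```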

CELP
• Code Excited Linear Prediction (CELP) selects an excitation sequence from a “codebook” of possible choices
• Transmit a code indicating the selection, rather than the residual
• Greatly reduced rate, only modest loss in performance

• There are many flavors of CELP; the better lower-rate methods are based on this concept
• Cell-phones tend to use rates from about 4.8 to 9.6 kbps
• Quality is noticeably inferior to telephone, but deemed acceptable
• Allows 3-6 times as many users in a cell!
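
The codebook selection at the heart of CELP can be sketched as an exhaustive search: synthesize each candidate excitation, fit the best gain, and keep the entry with the smallest squared error. The random codebook, its size (128 entries of 40 samples), and the one-pole filter are assumptions for illustration; practical coders add perceptual weighting and adaptive codebooks that are omitted here.

```python
import random

def synth(excitation, a):
    """All-pole synthesis: y(n) = e(n) + sum_k a[k] * y(n-1-k)."""
    out = []
    for n, e in enumerate(excitation):
        out.append(e + sum(a[k] * out[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0))
    return out

def celp_search(target, codebook, a):
    """Return (index, gain, error) of the codebook entry that best matches target."""
    best = (None, 0.0, float("inf"))
    for idx, code in enumerate(codebook):
        y = synth(code, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        gain = sum(t * v for t, v in zip(target, y)) / energy   # least-squares gain
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if err < best[2]:
            best = (idx, gain, err)
    return best

rng = random.Random(1)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(128)]
lpc = [0.9]
target = [0.7 * v for v in synth(codebook[5], lpc)]   # built from entry 5 on purpose

index, gain, error = celp_search(target, codebook, lpc)
index_bits = (len(codebook) - 1).bit_length()         # 7 bits select one of 128 entries

if __name__ == "__main__":
    print(index, round(gain, 3), index_bits)
```

Only the index and a quantized gain need to be sent for the excitation, instead of one residual value per sample as in RELP.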

Hardware note ...
• Speech coding/decoding is the primary reason for a DSP microprocessor (DSP µP) in digital cell-phones!
• A DSP µP is nearly ideal for speech coding algorithms (an ASIC wouldn’t be better)
• Since it’s there anyway, the DSP µP is also used for many other functions

Speech Recognition
• Speech recognition is expected to become a very important component of many future embedded systems
• Convenient, natural user interface for
  – very small embedded systems (e.g., wristwatch cell-phone, Palm-Pilot X)
  – non-critical systems (e.g., car radio, windshield wipers)

Speech Recognition Methods
• Modern speech recognition is based on short-time spectral analysis
• Spectral estimates are usually constructed from linear prediction followed by further processing
• Hidden Markov Models (HMMs) perform a statistical comparison with a database of words and language models
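
The statistical comparison can be illustrated with the HMM forward algorithm: score an observation sequence under each word model and pick the most likely word. The two toy word models and the discrete symbols 0/1 are invented for this sketch; real recognizers work on continuous spectral features and usually in the log domain.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) via the forward algorithm for a discrete-output HMM."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][o]
            for s in range(n_states)
        ]
    return sum(alpha)

# Two hypothetical left-to-right word models over symbols {0, 1}
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
MODELS = {
    "A": {"start": [1.0, 0.0], "trans": LEFT_TO_RIGHT, "emit": [[0.9, 0.1], [0.1, 0.9]]},
    "B": {"start": [1.0, 0.0], "trans": LEFT_TO_RIGHT, "emit": [[0.1, 0.9], [0.9, 0.1]]},
}

def recognize(obs):
    """Return the word whose model assigns the highest likelihood to obs."""
    scores = {w: forward_likelihood(obs, **m) for w, m in MODELS.items()}
    return max(scores, key=scores.get), scores

word, scores = recognize([0, 0, 1, 1])

if __name__ == "__main__":
    print(word, scores)
```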

System Requirements
• Memory and computational requirements:
  – Small vocabulary, isolated word recognition
    • a few MIPS and kBs
  – Large vocabulary, continuous speech
    • 100s of MIPS, 100s of MBs

Audio Coding
• Quality expectations are considerably higher than with speech
• 16-bit, 44.1 kHz stereo is the CD standard
• Modern audio coding methods (e.g., mp3) are based on perceptual coding tricks
  – Exploit limitations of human hearing to reduce rate while minimizing audible artifacts

• Split the signal into different frequency bands according to the sensitivities of human hearing
• Exploit “masking” to remove data from bands made inaudible by loud neighbors
• Shape quantization noise to lie in masked regions
• Obtain near-CD quality at 128-256 kbps
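
To put the quoted rates in context, the short calculation below derives the uncompressed CD rate from the 16-bit, 44.1 kHz stereo format and the compression ratios implied by 128 and 256 kbps. It is plain arithmetic on the figures above.

```python
BITS_PER_SAMPLE = 16
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2

cd_rate_bps = BITS_PER_SAMPLE * SAMPLE_RATE_HZ * CHANNELS   # uncompressed CD audio

# Compression ratio implied by each coded rate
ratios = {kbps: cd_rate_bps / (kbps * 1000) for kbps in (128, 256)}

if __name__ == "__main__":
    print("CD rate (bps):", cd_rate_bps)
    for kbps, ratio in ratios.items():
        print(f"{kbps} kbps -> about {ratio:.1f}:1")
```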

Image Coding
• Many emerging embedded system applications
  – Digital cameras
  – Security (e.g., fingerprint ID)
  – Medical record storage
• Image usually acquired with a CCD imaging sensor

Requirements
• Typical image ~ 512x512 pixels, 3 colors each at 8 bits
• Or binary black-and-white
• Two types of compression
  – Lossless: maximum compression ratios of 2-3
  – Lossy: high quality with compression ratios of 10-30
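
The storage implied by these figures is easy to work out. The calculation below uses only the numbers on this slide (512x512 pixels, 3 colors at 8 bits, and the quoted compression ratios).

```python
WIDTH, HEIGHT = 512, 512
COLORS, BITS_PER_COLOR = 3, 8

raw_bits = WIDTH * HEIGHT * COLORS * BITS_PER_COLOR
raw_bytes = raw_bits // 8

def compressed_bytes(ratio: float) -> float:
    """Approximate size in bytes after compressing by the given ratio."""
    return raw_bytes / ratio

if __name__ == "__main__":
    print("raw:", raw_bytes, "bytes")
    for label, ratio in (("lossless 2:1", 2), ("lossless 3:1", 3),
                         ("lossy 10:1", 10), ("lossy 30:1", 30)):
        print(f"{label}: about {compressed_bytes(ratio):,.0f} bytes")
```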

Image Compression Standards
• Binary images:
  – JBIG/FAX standards
  – Primarily based on run-length coding (i.e., number of black or white pixels in succession)
• 8-bit images:
  – JPEG standard: DCT-based
  – Emerging standards are wavelet-based (EZW, SPIHT, JPEG-2000)
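
Run-length coding of one binary scan line can be sketched as follows. The convention that every line starts with a (possibly zero-length) white run is an assumption made for the example; the actual standards also entropy-code the run lengths, which is omitted here.

```python
def rle_encode(row):
    """Encode a list of 0 (white) / 1 (black) pixels as alternating run lengths.

    The first run always counts white pixels, so a row that starts with black
    begins with a run of length 0.
    """
    runs, current, count = [], 0, 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current, count = pixel, 1
    runs.append(count)
    return runs

def rle_decode(runs):
    """Expand alternating white/black run lengths back into pixels."""
    row, value = [], 0
    for length in runs:
        row.extend([value] * length)
        value ^= 1
    return row

line = [0] * 20 + [1] * 3 + [0] * 41
runs = rle_encode(line)

if __name__ == "__main__":
    print(runs, "replaces", len(line), "pixels")
```

Mostly-white scanned pages produce a handful of long runs per line, which is where the compression comes from.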

Principles of JPEG
• Image is segmented into 8x8 blocks of pixels
• The 2-D Discrete Cosine Transform (DCT) of each block is computed
• Most of the resulting frequency components are typically very small and can be coarsely quantized or discarded
• Quantized data is entropy-coded
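
The transform-and-quantize steps can be sketched on a single 8x8 block. For simplicity this example uses one uniform quantization step for every coefficient, which is an assumption; JPEG itself uses a per-frequency quantization table, and the entropy-coding stage is left out.

```python
import math

N = 8

def _scale(k):
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def _basis(i, k):
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))

def dct2(block):
    """Orthonormal 2-D DCT-II of an 8x8 block."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coef):
    """Inverse of dct2."""
    return [[sum(_scale(u) * _scale(v) * coef[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

def quantize(coef, step):
    return [[int(round(c / step)) for c in row] for row in coef]

def dequantize(q, step):
    return [[v * step for v in row] for row in q]

# A smooth gradient block: typical of image regions without sharp edges
block = [[100 + 2 * x + 3 * y for y in range(N)] for x in range(N)]
coef = dct2(block)
q = quantize(coef, step=16)
nonzero = sum(1 for row in q for v in row if v != 0)
approx = idct2(dequantize(q, step=16))

if __name__ == "__main__":
    print("nonzero coefficients:", nonzero, "of", N * N)
```

For a smooth block almost all quantized coefficients are zero, so the entropy coder has very little to encode.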

JPEG Characteristics
• At compression rates of 1 bit per pixel, quality loss is usually small
• Below about 0.5 bpp, blocking artifacts begin to appear; much below this is usually unacceptable
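
To relate bits per pixel to file size, the calculation below assumes a 512x512 image with 8 bits per pixel before compression (an illustrative choice; color images start from more bits per pixel).

```python
WIDTH, HEIGHT = 512, 512
ORIGINAL_BPP = 8   # assumed 8-bit source image

def coded_size_bytes(bpp: float) -> float:
    """Compressed size in bytes at a given number of bits per pixel."""
    return WIDTH * HEIGHT * bpp / 8

def compression_ratio(bpp: float) -> float:
    return ORIGINAL_BPP / bpp

if __name__ == "__main__":
    for bpp in (1.0, 0.5):
        print(f"{bpp} bpp: {coded_size_bytes(bpp):,.0f} bytes, ratio {compression_ratio(bpp):.0f}:1")
```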

Emerging Methods
• New methods based on wavelets are emerging
• Frequency decomposition by successive subband filtering
• Small coefficients are discarded
• Artifacts are generally less objectionable

• Exploitation of tree structure and dependencies yields further compression
• JPEG-2000 standard will be based on these methods
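
Successive subband filtering and coefficient discarding can be illustrated in one dimension with the Haar wavelet, the simplest possible filter pair. The choice of Haar is an assumption made for brevity (the slides name no particular wavelet), and the tree-based coding of EZW or SPIHT is not attempted here.

```python
import math

S = 1 / math.sqrt(2)

def haar_step(signal):
    """Split an even-length signal into low-pass and high-pass half-length bands."""
    low = [(signal[i] + signal[i + 1]) * S for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * S for i in range(0, len(signal), 2)]
    return low, high

def haar_decompose(signal, levels):
    """Repeatedly split the low band; returns (coarsest low band, detail bands, finest first)."""
    low, bands = list(signal), []
    for _ in range(levels):
        low, high = haar_step(low)
        bands.append(high)
    return low, bands

def haar_reconstruct(low, bands):
    for high in reversed(bands):
        merged = []
        for l, h in zip(low, high):
            merged.extend([(l + h) * S, (l - h) * S])
        low = merged
    return low

def discard_small(bands, threshold):
    """Zero every detail coefficient whose magnitude is below the threshold."""
    return [[c if abs(c) >= threshold else 0.0 for c in band] for band in bands]

signal = [4, 4, 4, 4, 10, 10, 10, 10]
low, bands = haar_decompose(signal, levels=3)
nonzero_details = sum(1 for band in bands for c in band if c != 0.0)
rebuilt = haar_reconstruct(low, bands)

if __name__ == "__main__":
    print("nonzero detail coefficients:", nonzero_details)
```

A piecewise-constant signal needs only one detail coefficient here, which shows how smooth regions concentrate their energy in a few coefficients.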

Video Coding
• Embedded system examples:
  – HDTV
  – Satellite TV
  – Set-top boxes
  – Security systems
  – Multimedia devices

Motion-Based Coding Methods
• Modern video coding methods exploit frame-to-frame similarities to further compress video
• Similar to JPEG, except that motion-compensated difference frames are coded with the DCT
• Motion vectors encode the change in location of blocks
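
Finding a motion vector for one block can be sketched as an exhaustive block-matching search that minimizes the sum of absolute differences (SAD). The block size, search range, and the convention that the vector points from the current block to its match in the reference frame are assumptions for this example.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a shifted reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(size) for x in range(size))

def best_motion_vector(cur, ref, bx, by, size=4, search=2):
    """Exhaustively search +/- `search` pixels; returns (dx, dy, cost)."""
    h, w = len(ref), len(ref[0])
    best = (0, 0, float("inf"))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + size <= h and
                      0 <= bx + dx and bx + dx + size <= w)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if cost < best[2]:
                best = (dx, dy, cost)
    return best

# Synthetic frames: the current frame is the reference shifted by (2, 1)
W = H = 12
ref = [[y * W + x for x in range(W)] for y in range(H)]
cur = [[ref[(y + 1) % H][(x + 2) % W] for x in range(W)] for y in range(H)]
vector = best_motion_vector(cur, ref, bx=4, by=4)

if __name__ == "__main__":
    print(vector)
```

Once the vector is known, only the (ideally near-zero) difference between the block and its motion-compensated prediction needs to be transformed and coded.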

Video Coding Standards
• MPEG-2 and MPEG-4 are leading standards for high (television) quality video coding
• H.263 is the primary standard for low-rate video coding (video-phones)
• Compression ratios of 30-50 with good quality are usually obtained

Summary
• Source coding is essential to reduce the memory requirements and bandwidth of multimedia data
• Complex DSP algorithms obtain great data reductions with little loss in quality
• Coding algorithms have characteristics common to other DSP computations
• Source coding is likely to play an increasingly important role in many embedded systems
