Source coding for ISDB-T

Author: John Jefferson
ISDB-T seminar in Brazil

Seminar #6

Source coding for ISDB-T
30th March, 2005
Digital Broadcasting Expert Group (DiBEG)
Yasuo TAKAHASHI (Toshiba)
Yoshiharu DEWA (SONY)

Contents
1. Outline of source coding
2. Video source coding
 2.1 Outline of video coding
 2.2 Video compression and coding
 2.3 Display format
3. Audio source coding
 3.1 Audio input format
 3.2 Main parameters of audio coding
 (reference) Outline of AAC
4. H.264 coding (note 1)
(reference) Answer for CPqD question (note 2)
(note 1) The text for this section is prepared separately in Word format.
(note 2) This document is prepared separately.

1. Outline of source coding

Layered Structure for Digital Broadcasting

Application layer (theme of seminar #6)
• Video encode, audio encode, data encode, multiplex (MPEG-2)
• Commonality and interoperability → MPEG-2 standard

Transmission layer
• FEC/interleaving, modulation
• Optimized for each transmission system → satellite/terrestrial/cable
• Satellite: single carrier, 8PSK/QPSK
• Terrestrial (TV/audio): OFDM, QAM/(DQPSK)
• Cable: single carrier, 64QAM
• Satellite audio: CDM

Digital Broadcasting Standard in Japan

Source coding (source coding and MUX systems are common to every system)
• Video/audio coding (STD-B32): theme of seminar #6
• Data broadcasting (STD-B24)
• Multiplex (STD-B32, STD-B10)
• RMP (STD-B25)

Transmission coding and receiver (transmission systems are different)
• Satellite TV (STD-B20)
• Terrestrial TV (STD-B31)
• Receiver, satellite/terrestrial TV (STD-B21)
• Terrestrial audio (STD-B29, STD-B30)
• Satellite audio (STD-B41, STD-B42)
• Cable TV (JCL SPC-001, JCTEA STD-004)

Note: Cable transmission system standards are defined by another consortium.

2. Video source coding

2.1 Outline of video coding 2.2 Video compression and coding 2.3 Display format

Outline of video coding for DTTB
In Japan, HDTV has been under development since the 1980s, and an analog HDTV trial service, named MUSE, has already started. Because of this, the video coding system for DTV must support many video formats and must be able to change the video format according to the display aspect ratio.
For these reasons, the video coding specification has the following features:
(1) Video coding system: adopts the most popular system, MPEG-2
(2) Supports many types of video format: 480i/480p/1080i/720p
(3) Specifies the relationship between video source and display aspect ratio
The video coding system is specified in ARIB STD-B32 Part 1 (note).
(note) The video coding system for LDTV is specified separately in ARIB STD-B24.

2.1 Outline of video coding
(1) Compression system: MPEG-2 (MP@HL)
(2) Video format

No. of lines   No. of pixels   Quality
1080i          1920*1080       HDTV (interlace)
720p           1280*720        HDTV (progressive)
480p           720*480         SDTV (progressive)
480i           720*480         SDTV (interlace)

D terminal: D1: 480i, D2: 480p, D3: 1080i, D4: 720p

Video signal parameters

Number of lines                       525                525                750                1125
Number of active lines                483                483                720                1080
Scanning system                       Interlaced         Progressive        Progressive        Interlaced
Frame frequency                       30/1.001 Hz        60/1.001 Hz        60/1.001 Hz        30/1.001 Hz
Field frequency                       60/1.001 Hz        -                  -                  60/1.001 Hz
Aspect ratio                          16:9 or 4:3        16:9               16:9               16:9
Line frequency fH                     15.750/1.001 kHz   31.500/1.001 kHz   45.000/1.001 kHz   33.750/1.001 kHz
Sampling frequency
  Luminance signal                    13.5 MHz           27 MHz             74.25/1.001 MHz    74.25/1.001 MHz
  Color-difference signals            6.75 MHz           13.5 MHz           37.125/1.001 MHz   37.125/1.001 MHz
Number of samples per line
  Luminance signal                    858                858                1650               2200
  Color-difference signals            429                429                825                1100
Number of samples per active line
  Luminance signal                    720                720                1280               1920
  Color-difference signals            360                360                640                960

Filter characteristics, line synchronizing signal and field synchronizing signal: see Figs. 1, 2, 4, 7 and 8 of the standard for the 525-line systems, and Figs. 3, 5, 6, 9 and 10 for the 750-line and 1125-line systems.

(ARIB STD-B32 Part 1, chapter 2.4 )
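The sampling frequencies listed in the video signal parameters are tied to the line frequency by a simple relation: sampling frequency = total samples per line × line frequency. The following Python sketch checks that relation for the four formats. The dictionary layout and the function name are illustrative only and are not taken from the standard.

```python
from fractions import Fraction

# name: (total luminance samples per line,
#        line frequency in Hz,
#        luminance sampling frequency in Hz), values copied from the table
FORMATS = {
    "525i": (858, Fraction(15_750_000, 1001), Fraction(13_500_000)),
    "525p": (858, Fraction(31_500_000, 1001), Fraction(27_000_000)),
    "750p": (1650, Fraction(45_000_000, 1001), Fraction(74_250_000_000, 1001)),
    "1125i": (2200, Fraction(33_750_000, 1001), Fraction(74_250_000_000, 1001)),
}


def sampling_frequency(samples_per_line, line_frequency):
    """Sampling frequency implied by the line structure (exact arithmetic)."""
    return samples_per_line * line_frequency


for name, (samples, f_line, f_sample) in FORMATS.items():
    # Each format must satisfy the relation exactly.
    assert sampling_frequency(samples, f_line) == f_sample, name
    print(name, float(f_sample) / 1e6, "MHz")
```

Exact fractions are used so that the 1/1.001 factor does not introduce rounding error into the comparison.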

Actual video bit rate

No. of lines   Profile    Actual bit rate
1080i          MP@HL      BS: 12-24 Mbps, DTTB: 8-20 Mbps
720p           MP@H-14    BS: 4-24 Mbps, DTTB: 4-20 Mbps
480p           MP@H-14    BS: 4-24 Mbps, DTTB: 4-20 Mbps
480i           720*480    1.5-15 Mbps
240p           720*480    0.2-4 Mbps

2.2 Video compression and coding
Video compression and coding block diagram
[Figure: video input → prediction error signal → DCT → quantization → variable length coding → coded data. Local decoding loop: inverse quantization → inverse DCT → prediction memory 1 and prediction memory 2 → motion compensation (intra, forward, bidirectional or backward prediction). Motion vector detection → motion vector → variable length coding → coded data.]

(ARIB STD-B32 Part 1, chapter 4.1 )

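Principle of DCT
The TV picture (720 × 480 pixels) is divided into DCT blocks of 8*8 pixels. Each DCT block (a_ij) is converted into frequency coefficients (A_mn) by a two-dimensional frequency conversion (normalization factors omitted):

$$A_{mn} = \sum_{i}\sum_{j} a_{ij}\, \cos\frac{(2i+1)m\pi}{2N}\, \cos\frac{(2j+1)n\pi}{2N}$$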

Discrete Cosine Transform (DCT)
The two-dimensional DCT coefficients F(u, v) for N × N pixels f(x, y) are defined as follows:

$$F(u,v) = \frac{2\,C(u)\,C(v)}{N} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}$$

where

$$C(u),\, C(v) = \begin{cases} \dfrac{1}{\sqrt{2}} & \text{for } u, v = 0 \\ 1 & \text{for } u, v \neq 0 \end{cases}$$

Inverse Discrete Cosine Transform (IDCT)

$$f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\,C(v)\,F(u,v)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}$$

(ARIB STD-B32 Part 1,chapter 4.1 )
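The DCT and inverse DCT definitions can be checked numerically. The following is a minimal, unoptimized Python sketch that evaluates both sums directly for an N × N block (N = 8 in the example) and confirms that a flat block produces only a DC coefficient. The function names `c`, `basis`, `dct2` and `idct2` are illustrative; real encoders use fast factorizations rather than this direct form.

```python
import math


def c(k):
    """Normalization factor C(u) or C(v): 1/sqrt(2) for index 0, else 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def basis(n, i, k):
    """Cosine term cos((2i + 1) k pi / (2N))."""
    return math.cos((2 * i + 1) * k * math.pi / (2 * n))


def dct2(f):
    """Forward transform: f[x][y] -> F[u][v], direct evaluation of the sum."""
    n = len(f)
    return [[2 * c(u) * c(v) / n
             * sum(f[x][y] * basis(n, x, u) * basis(n, y, v)
                   for x in range(n) for y in range(n))
             for v in range(n)]
            for u in range(n)]


def idct2(F):
    """Inverse transform: F[u][v] -> f[x][y], direct evaluation of the sum."""
    n = len(F)
    return [[2 / n
             * sum(c(u) * c(v) * F[u][v] * basis(n, x, u) * basis(n, y, v)
                   for u in range(n) for v in range(n))
             for y in range(n)]
            for x in range(n)]


# A flat 8 x 8 block: all of its energy ends up in F(0, 0).
flat = [[100.0] * 8 for _ in range(8)]
coeffs = dct2(flat)
print(round(coeffs[0][0], 6))
```

For the flat block, F(0, 0) = (2 · ½ / 8) · 64 · 100 = 800 and every other coefficient vanishes, which is why smooth picture areas compress well after quantization.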

Motion vector compensation
[Figure: macroblocks numbered 1 to 16 in the present picture, and the corresponding macroblocks in the next picture.]
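Motion vector detection can be illustrated with block matching. The sketch below is one possible approach, a full search that minimizes the sum of absolute differences (SAD); the search method, the block size, the search range and the names `sad` and `find_motion_vector` are all assumptions made for illustration, since the way an encoder finds its motion vectors is an implementation choice.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + j + dy][bx + i + dx])
               for j in range(size) for i in range(size))


def find_motion_vector(cur, ref, bx, by, size=16, search=4):
    """Full search over displacements within +/- search pixels.
    Returns ((dx, dy), cost) for the best match."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference picture.
            if not (0 <= bx + dx and bx + dx + size <= width
                    and 0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


# Synthetic example: the current picture is the reference shifted by (2, 1).
ref = [[x + 100 * y for x in range(32)] for y in range(32)]
cur = [[ref[min(y + 1, 31)][min(x + 2, 31)] for x in range(32)]
       for y in range(32)]
print(find_motion_vector(cur, ref, 8, 8, size=8, search=4))
```

Only the prediction error that remains after displacing the reference block by the detected vector is passed on to the DCT, which is where the bit rate saving comes from.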

Inverse quantization and variable length coding
Inverse quantization and variable length coding shall comply with ITU-T Rec. H.262. Note that the order of output data of a variable length coder shall be one of the following (u: horizontal index, v: vertical index):

Zigzag scan

v\u   0   1   2   3   4   5   6   7
0     0   1   5   6  14  15  27  28
1     2   4   7  13  16  26  29  42
2     3   8  12  17  25  30  41  43
3     9  11  18  24  31  40  44  53
4    10  19  23  32  39  45  52  54
5    20  22  33  38  46  51  55  60
6    21  34  37  47  50  56  59  61
7    35  36  48  49  57  58  62  63

Alternate scan

v\u   0   1   2   3   4   5   6   7
0     0   4   6  20  22  36  38  52
1     1   5   7  21  23  37  39  53
2     2   8  19  24  34  40  50  54
3     3   9  18  25  35  41  51  55
4    10  17  26  30  42  46  56  60
5    11  16  27  31  43  47  57  61
6    12  15  28  32  44  48  58  62
7    13  14  29  33  45  49  59  63

ARIB STD-B32 Part 1, chapter 4.1
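The first of the two scan orders (the zigzag scan) can be generated rather than stored, by walking the anti-diagonals u + v = d in alternating directions. The Python sketch below builds the table this way and uses it to reorder a block into a one-dimensional sequence. The function names are illustrative, and the alternate scan has no comparably simple rule, so a decoder would normally hold both as lookup tables.

```python
def zigzag_table(n=8):
    """Return table[v][u] = position of coefficient (u, v) in the zigzag scan."""
    def key(point):
        u, v = point
        d = u + v
        # Odd diagonals are walked with v increasing,
        # even diagonals with u increasing.
        return (d, v if d % 2 else u)

    order = sorted(((u, v) for v in range(n) for u in range(n)), key=key)
    table = [[0] * n for _ in range(n)]
    for index, (u, v) in enumerate(order):
        table[v][u] = index
    return table


def scan(block, table):
    """Reorder a 2-D block[v][u] into the 1-D sequence defined by table."""
    n = len(block)
    out = [0] * (n * n)
    for v in range(n):
        for u in range(n):
            out[table[v][u]] = block[v][u]
    return out


table = zigzag_table()
for row in table:
    print(" ".join(f"{value:2d}" for value in row))
```

Because low-frequency coefficients come first in this order, the long runs of zeros left by quantization collect at the end of the sequence, which suits run-length based variable length coding.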

Signal configuration
[Figure: hierarchy of the coded video signal. Video sequence → GOP (for example B B I B B P B B I) → picture → slice → macroblock (16 × 16 pixels) → block (8 × 8 pixels).]

ARIB STD-B32 Part 1, chapter 4.2

(note)
1. A video sequence is the highest syntactic configuration for video coding and refers to a series of images that comprise a video signal.
2. A GOP consists of I-pictures (pictures encoded using only current picture information), B-pictures (pictures encoded using current, past and future picture information) and P-pictures (pictures encoded using current and past picture information), and contains at least one I-picture.
3. A picture refers to a single image.
4. A slice consists of an arbitrary number of macroblocks in the same horizontal row.
5. A macroblock consists of a luminance signal of 16 × 16 pixels and two spatially corresponding color-difference signals of 8 × 8 or 16 × 8 pixels.
ARIB STD-B32 Part 1, chapter 4.2
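To give a feel for the sizes involved, the number of 16 × 16 macroblocks per picture can be computed from the picture dimensions. The sketch below assumes that a dimension which is not a multiple of 16 (such as 1080 lines) is rounded up to the next multiple for coding; the function name is illustrative.

```python
import math


def macroblocks(width, height, mb_size=16):
    """Return (columns, rows, total) of macroblocks covering the picture."""
    cols = math.ceil(width / mb_size)
    rows = math.ceil(height / mb_size)
    return cols, rows, cols * rows


for name, (w, h) in {"480i/480p": (720, 480),
                     "720p": (1280, 720),
                     "1080i": (1920, 1080)}.items():
    print(name, macroblocks(w, h))
```

A 720 × 480 picture therefore contains 45 × 30 macroblocks, and each row of 45 macroblocks can be divided into one or more slices.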

2.3 Desirable display formats on 4:3 and 16:9 aspect ratio monitors

(1) 16:9 aspect ratio program 1: C and D are equal to A and B, respectively (including cases in which C and D are not transmitted)
 – Monitor with 4:3 aspect ratio: The program is displayed in letterbox format.
 – Monitor with 16:9 aspect ratio: The program is displayed as is.

(2) 16:9 aspect ratio program 2: D is set to 3/4 of B (excluding a fake 16:9 aspect ratio program in which side panels are added to a 4:3 aspect ratio program)
 – Monitor with 4:3 aspect ratio: The program is displayed over the entire screen (480 × 720) of the monitor. Note that side panels are discarded.
 – Monitor with 16:9 aspect ratio: The program is displayed as is. The gray area indicates two cases: one in which this area contains a real picture, and one in which the area consists of a black panel.

(3) 4:3 aspect ratio program: C and D are equal to A and B, respectively (including cases in which C and D are not transmitted)
 – Monitor with 4:3 aspect ratio: The program is displayed as is.
 – Monitor with 16:9 aspect ratio: The program is displayed with side panels. With the 525i television system, appropriate changes are made to the monitor's deflection system to allow the program to be displayed.

(4) 4:3 aspect ratio program in letterbox format: C is set to 3/4 of A
 – Monitor with 4:3 aspect ratio: The program is displayed as is.
 – Monitor with 16:9 aspect ratio: The program is displayed after being scaled in the vertical direction by 4/3, 2, or 3 to produce 480, 720, or 1080 valid lines, respectively. With the 525i television system, appropriate changes are made to the monitor's deflection system to allow the program to be displayed.

Note:
A: vertical_size_value (sequence_header)
B: horizontal_size_value (sequence_header)
C: display_vertical_size (sequence_display_extension)
D: display_horizontal_size (sequence_display_extension)

ARIB STD-B32 Part 1, chapter 5.1
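The four cases above amount to a small decision rule on the values A, B, C and D. The Python sketch below is one hypothetical way a receiver could encode that rule. It reads the condition for case (2) as "D is 3/4 of B", which is an assumption about the intended wording, and it takes the program aspect ratio as a separate input. The function name, the `BEHAVIOUR` table and the shortened descriptions are illustrative and are not receiver requirements.

```python
# Case number -> (behaviour on a 4:3 monitor, behaviour on a 16:9 monitor).
BEHAVIOUR = {
    1: ("letterbox", "as is"),
    2: ("full screen, side panels discarded", "as is"),
    3: ("as is", "side panels added"),
    4: ("as is", "scaled vertically by 4/3, 2 or 3"),
}


def display_case(program_aspect, a, b, c=None, d=None):
    """Classify a program into cases (1) to (4).

    a, b: vertical_size_value, horizontal_size_value
    c, d: display_vertical_size, display_horizontal_size
          (None when not transmitted, which is treated as equal to a, b)
    """
    c = a if c is None else c
    d = b if d is None else d
    if program_aspect == "16:9":
        if (c, d) == (a, b):
            return 1
        if 4 * d == 3 * b:      # assumed reading: D = 3/4 of B
            return 2
    elif program_aspect == "4:3":
        if (c, d) == (a, b):
            return 3
        if 4 * c == 3 * a:      # C = 3/4 of A
            return 4
    raise ValueError("combination not covered by the four cases")


case = display_case("4:3", 480, 720, c=360, d=720)
print(case, BEHAVIOUR[case])
```

In the letterbox case the picture occupies 360 of the 480 lines, so the vertical factors 4/3, 2 and 3 give 480, 720 and 1080 lines respectively.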

3. Audio source coding

3.1 Audio input format 3.2 Main parameters of audio coding (reference) Outline of AAC

3.1 Audio Input Signal
(1) The sampling frequency for audio signals shall be 32 kHz, 44.1 kHz, or 48 kHz.
(2) To configure stereophonic signals (consisting of two or more audio signals to achieve a three-dimensional reproduction of sound), the sampling timing for all signals shall be the same.
(3) The number of quantized input bits shall be 16 or more.
(4) The maximum number of audio input channels shall be five, in addition to the channel used to enhance low frequencies.

ARIB STD-B32 part 2 Chapter 2

Audio Input Format

Parameter: Restriction
• Audio mode, possible audio modes: Monaural, stereo, multichannel stereo (3/0, 2/1, 3/1, 2/2, 3/2, 3/2+LFE) (Note 1), 2-audio signals (dual monaural), multi-audio (3 or more audio signals) and combinations of the above
• Audio mode, recommended audio mode: Monaural, stereo, multichannel stereo (3/1, 3/2, 3/2+LFE) (Note 2), 2-audio signals (dual monaural)
• Emphasis: None

(Note 1) Number of channels to front/rear speakers. Example: 3/1 = 3 front + 1 rear, 3/2 = 3 front + 2 rear
(Note 2) LFE = Low frequency enhancement channel
ARIB STD-B32 part 2 Chapter 5.1

Main parameters of audio coding

Parameter                        Restriction
Bit stream format                AAC Audio Data Transport Stream (ADTS)
Profile                          Low Complexity (LC) profile
Max. number of coded channels    5.1 channels (Note) max. per ADTS
Max. bit rate                    As per ISO/IEC 13818-7

(Note) 5 channels + LFE channel

ARIB STD-B32 part 2 Chapter 5.2

Main parameters of audio coding
(1) Bit stream format: ADTS, which has a header in each frame
(2) Profile of AAC (Advanced Audio Coding): the LC profile is adopted, for the following reasons.
The LC profile was initially adopted for use with BS/broadband CS digital broadcasting based on the following factors:
(a) As a result of the AAC audio quality assessment test conducted by ARIB in June 1998, the LC and SSR profiles were found to meet the ITU-R broadcasting quality criteria, or the criteria required by BS/broadband CS digital broadcasting, at 144 kbps/2 channels or more.
(b) It was pointed out that SSR profile-specific features were not effective for BS/broadband CS digital broadcasting.
(c) It was pointed out that the audio quality of the LC profile could improve through optimization and technical advances in encoders beyond the year 2000, when BS digital broadcasting would begin.
ARIB STD-B32 Part 2 Operational guideline Commentary

(continued)
(d) Based on the premise that BS digital broadcasting would begin in 2000, it was pointed out that it would be possible to develop encoders and receivers for the LC profile, but difficult to do so for the MAIN profile.
(e) There is a significant difference in chip costs between the MAIN and LC profiles.
(f) There are technical problems to be solved for the MAIN profile.
We have decided to adopt the LC profile for terrestrial digital television broadcasting and terrestrial digital audio broadcasting as well, for the above reasons and in view of consistency with BS/broadband CS digital broadcasting.
No restrictions have been introduced in relation to the maximum bitrate. In terms of the standard, the maximum bitrate for the AAC format is 288 kbps/channel when the sampling frequency is 48 kHz.

ARIB STD-B32 Part 2 Operational guideline Commentary
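The figure of 288 kbps per channel at 48 kHz is consistent with a limit of 6144 bits per channel for each frame of 1024 samples. The short Python sketch below reproduces the arithmetic. The constant 6144 is our understanding of the per-channel limit in ISO/IEC 13818-7 and should be treated as an assumption to be checked against that standard; the function name is illustrative.

```python
FRAME_LENGTH = 1024          # audio samples per AAC frame and channel
MAX_BITS_PER_FRAME = 6144    # assumed per-channel limit per frame


def max_bitrate_per_channel(sampling_frequency_hz):
    """Maximum bit rate in bit/s: bits per frame times frames per second."""
    return sampling_frequency_hz * MAX_BITS_PER_FRAME // FRAME_LENGTH


for fs in (32_000, 44_100, 48_000):
    print(fs, "Hz ->", max_bitrate_per_channel(fs) / 1000, "kbps per channel")
```

Under the same assumption the limit scales with the sampling frequency, giving 192 kbps at 32 kHz and 264.6 kbps at 44.1 kHz.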

ADTS configuration and multiplexing

Input audio mode: ADTS configuration and multiplexing
• Monaural, stereo: comprises one ADTS.
• Multichannel stereo (3/0, 2/1, 3/1, 2/2, 3/2, 3/2+LFE): comprises one ADTS.
• 2-audio signals (dual monaural): comprises one ADTS. (Note)
• Multiple audio signals other than dual monaural (2/0+2/0): comprises the same number of ADTSs as audio streams (languages), multiplexed in the MPEG-2 systems layer.

(Note) Dual monaural is defined as two monaural audio channels that can be simultaneously reproduced by a single ADTS.

ARIB STD-B32 Part 2 Chapter 5.2.3
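Because every ADTS frame starts with its own header, a receiver can read the profile, sampling frequency and channel configuration from any frame. The following Python sketch parses the first bytes of an ADTS header according to the commonly documented bit layout (12-bit syncword, ID, layer, protection_absent, 2-bit profile, 4-bit sampling frequency index, private bit, 3-bit channel configuration, and a 13-bit frame length further on). The function name, the dictionary keys and the example bytes are illustrative; field semantics should be checked against ISO/IEC 13818-7.

```python
# Sampling frequency indices for the three rates allowed in 3.1.
SAMPLE_RATES = {3: 48000, 4: 44100, 5: 32000}
PROFILES = {0: "Main", 1: "LC", 2: "SSR"}


def parse_adts_header(data):
    """Decode selected fields from the first 7 bytes of an ADTS frame."""
    if len(data) < 7:
        raise ValueError("an ADTS header needs at least 7 bytes")
    if data[0] != 0xFF or (data[1] & 0xF0) != 0xF0:
        raise ValueError("syncword 0xFFF not found")
    index = (data[2] >> 2) & 0x0F
    return {
        "mpeg2": bool(data[1] & 0x08),              # ID bit, 1 = MPEG-2 AAC
        "protection_absent": bool(data[1] & 0x01),  # 1 = no CRC follows
        "profile": PROFILES.get((data[2] >> 6) & 0x03, "reserved"),
        "sampling_frequency": SAMPLE_RATES.get(index),  # None if not listed
        "channel_configuration": ((data[2] & 0x01) << 2) | (data[3] >> 6),
        "frame_length": ((data[3] & 0x03) << 11) | (data[4] << 3)
                        | (data[5] >> 5),
    }


# Hand-built example header: MPEG-2 AAC, LC profile, 48 kHz, 2 channels,
# frame length 371 bytes.
example = bytes([0xFF, 0xF9, 0x4C, 0x80, 0x2E, 0x7F, 0xFC])
print(parse_adts_header(example))
```

The frame length field lets a demultiplexer step from one header to the next, which is how it can resynchronize after a transmission error.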

(reference) Outline of AAC
• AAC (Advanced Audio Coding)
 – MPEG-2, non-backward compatible (focus on audio quality)

• 3 profiles
 – Main profile
 – Low Complexity (LC) profile, to reduce hardware complexity
 – Scalable Sampling Rate (SSR) profile, with selectable coding of 4 bands

AAC encoder block diagram
[Figure: AAC (advanced audio coding) encoder. Signal path: input time signal, pre-processing, filter bank, TNS, intensity/coupling, prediction (using the quantized spectrum of the previous frame), M/S, scale factors, quantizer, noiseless coding, bitstream formatter, 13818-7 coded audio stream. The perceptual model and the rate/distortion control process (iteration loops) control these blocks. Legend: data, control.]

Preprocessing
• This is optional for SSR profile
• Four-band PQF: 24, 18, 12, 6 kHz output at 48 kHz
• Gain control: time resolution 0.7 ms at 48 kHz

Filter bank
• A Modified Discrete Cosine Transform (MDCT/IMDCT)
• Output: 1024 frequency lines (Fs = 48 kHz)
 – Frequency resolution: 23 Hz
 – Time resolution: 2.7 ms
• Adaptive window shape
 – Sine and Kaiser-Bessel windows
 – Start and stop windows for short-window coding
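The filter bank figures follow from the sampling frequency. The Python sketch below reproduces them under two assumptions: the 23 Hz value is the spacing of 1024 lines over the band from 0 to Fs/2, and the 2.7 ms value corresponds to a short block of 128 samples. The function names and the default arguments are illustrative.

```python
def frequency_resolution_hz(fs, lines=1024):
    """Spacing of the MDCT lines across the band 0 .. fs/2."""
    return (fs / 2) / lines


def time_resolution_ms(fs, block=128):
    """Duration of one short block (assumed to be 128 samples)."""
    return 1000 * block / fs


print(frequency_resolution_hz(48_000))       # 23.4375
print(round(time_resolution_ms(48_000), 2))  # 2.67
```

Switching between long and short blocks trades one resolution for the other: long blocks for stationary signals, short blocks for transients.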

Prediction
• This is optional for main profile
• Increases redundancy reduction
• Second-order backward-adaptive prediction filter
• Predicts the MDCT coefficients up to 16 kHz using the results of the previous frame

Quantization and Coding
• Non-uniform quantization (1.5 dB steps)
• Huffman coding of spectral coefficients
• Noise shaping by amplification of spectral regions (49 regions)
• Huffman coding of differential scale factors
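The 1.5 dB step corresponds to a scale factor resolution of 2^(1/4) in amplitude, and the non-uniformity comes from a 3/4 power law applied before rounding. The Python sketch below illustrates both in a simplified form. The scale factor offset used in real bitstreams is left out, and the rounding constant 0.4054 is the value commonly quoted for AAC encoders, treated here as an assumption; the function names are illustrative.

```python
import math

ROUNDING_OFFSET = 0.4054  # assumed encoder rounding constant


def step_db():
    """Amplitude step of one scale factor increment, in dB."""
    return 20 * math.log10(2 ** 0.25)


def quantize(x, scale_factor):
    """Power-law quantization of one spectral value (simplified)."""
    gain = 2 ** (scale_factor / 4)
    magnitude = int((abs(x) / gain) ** 0.75 + ROUNDING_OFFSET)
    return magnitude if x >= 0 else -magnitude


def dequantize(q, scale_factor):
    """Inverse of quantize(), up to the quantization error."""
    gain = 2 ** (scale_factor / 4)
    value = abs(q) ** (4 / 3) * gain
    return value if q >= 0 else -value


print(round(step_db(), 3))
q = quantize(1000.0, 0)
print(q, round(dequantize(q, 0), 1))
```

Raising the scale factor of a spectral region makes its quantization coarser in 1.5 dB steps, which is how the encoder shapes the noise under the masking threshold.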

Noiseless Coding
• Dynamic spectral sections
• A single Huffman codebook per section
• A total of 12 codebooks

Perceptual Model
• Close approximation of frequency-domain masking effects, spectral shaping of quantization noise
• Optimized for minimization of audible artifacts

TNS (Temporal Noise Shaping)
• Predicts the MDCT coefficients and flattens the frequency spectrum.

Joint Stereo Coding
• M/S (Mid/Side)
 – Matrixed stereo coding of L-R and L+R
• Intensity/Coupling
 – Combines the high-frequency signals of a channel pair into a single channel in order to reduce high-frequency redundancy.
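M/S coding replaces the left and right channels by their sum and difference. A minimal Python sketch follows, assuming the usual convention M = (L + R)/2 and S = (L − R)/2; the function names are illustrative, and in a real encoder the matrixing is applied to MDCT coefficients and switched per band.

```python
def ms_encode(left, right):
    """Convert left/right values to mid/side values."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right values from mid/side values."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [1.0, 0.5, -0.25]
right = [1.0, -0.5, -0.25]
mid, side = ms_encode(left, right)
print(mid, side)
```

When both channels carry nearly the same signal, the side channel is close to zero and needs very few bits.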

Test results
• Subjective test 1
 – Conducted according to ITU-R Recommendation BS.1116
 – Main profile (bitrate 320 kbit/s per 5 ch) satisfies the requirement of broadcast quality.
 – Main profile coding at 320 kbps corresponds to MPEG-2 Layer 2 coding at 640 kbps.
 – Main profile (bitrate 128 kbit/s per 2 ch) satisfies the requirement of broadcast quality.

Test results (cont'd)
This section is reserved for figures derived from the ARIB paper.
[Fig. 1 Results of AAC listening test (higher bit rate). Vertical axis: diff grade + 95% confidence interval (0 to -4). Series: LC 192 kbps/2ch (○), LC 144 kbps/2ch (▽), LC 128 kbps/2ch (△), LC 96 kbps/2ch (□), anchor codec (×). Test items: Cast, Harp, Pipe, Glock, Ornet.]
[Fig. 2 Results of AAC listening test (lower bit rate). Vertical axis: score + 95% confidence interval (1 to 5). Series: LC 80 kbps/2ch (▼), LC 48 kbps/2ch (■), LC 32 kbps/2ch (●), AM simulated data (+). Test items: cla, aba, m-sp, f-sp, uru, mj+aba, mj+cla, mj, fj+cla, fj.]

Test results (cont'd)
• Subjective test 2
 – Conducted by ARIB in conformance with ITU-R Recommendation BS.1116
 – Both LC and SSR profile coding satisfy the broadcast quality requirement recommended by ITU-R at 144 kbps per 2 channels or more.
 – Both LC and SSR profile coding satisfy “EBU indistinguishable quality” at 144 kbps per 2 channels or more.

4. H.264 coding

For this section, the “Operational guideline for H.264” is prepared separately in Word format.

END of seminar #6