Binaural Reproduction of 22.2 Multichannel Sound with Flat Panel Display-Integrated Loudspeaker Frame for Home Use

The Journal of the Institute of Image Information and Television Engineers (映像情報メディア学会誌), Vol. 68, No. 10, pp. J447–J456 (2014)

Copyright © 2014 by The Institute of Image Information and Television Engineers

Special Issue: Display; mainly from IDW '13

Paper


Kentaro Matsui†,††, Satoshi Oishi†, Takehiro Sugimoto (member)†, Satoshi Oode (member)†, Yasushige Nakayama (member)†, Hiroyuki Okubo (member)†, Hiroshi Sato†††, Koji Mizuno†††, Yuichi Morita†††, Shuichi Adachi††

Abstract

NHK has developed a 22.2 multichannel sound system for 8K Super Hi-Vision (SHV), an ultra high-definition TV. The system consists of 24 spatially arranged audio channels, including two low frequency effect channels, for reproducing three-dimensional spatial sound. To respond to the various viewing circumstances of SHV in homes, we have also developed several reproduction methods to virtually reproduce 22.2 multichannel sound with fewer loudspeakers. In this paper, we propose binaural reproduction of 22.2 multichannel sound with multiple loudspeakers integrated into a flat panel display, which makes it possible to experience 22.2 multichannel sound without installing 24 discrete loudspeakers.

Key words: Super Hi-Vision, binaural reproduction, head-related transfer function, flat panel display, loudspeaker frame

1. Introduction

8K Super Hi-Vision (SHV) is under development as a next-generation television system. It features an ultra-high definition image with 7680 × 4320 pixels, which is 16 times the number of pixels of current high definition television, and three-dimensional (3-D) spatial sound provided by a 22.2 multichannel sound system1). The 22.2 multichannel sound system consists of 24 spatially arranged audio channels and can reproduce an immersive and realistic 3-D sound field that conveys a sense of reality to the listeners2). Test broadcasting of SHV is planned for 2016, and its broadcasting service via satellite will start in 2020. However, the way of viewing SHV depends heavily on increasingly diverse consumer lifestyles3)4), and in many cases it would not be easy to install all 24 loudspeakers in each home. Therefore, offering a variety of options to virtually reproduce 22.2 multichannel sound with a smaller number of loudspeakers is an important task to accommodate various viewing circumstances. In this respect, binaural reproduction is a useful method that can perceptually imitate sound image localization anywhere with only two or a few more loudspeakers.

The idea of binaural reproduction over loudspeakers was first put forward by Bauer in the 1960s5), and it was first formulated by Atal and Schroeder6)7), who invented the concept of crosstalk cancellation for converting binaural signals for listening using loudspeakers. Since then, many researchers have improved the theory and devised sophisticated algorithms for binaural reproduction with two or more loudspeakers8)–18). We have studied binaural reproduction of 22.2 multichannel sound with frontal loudspeakers because placing loudspeakers only at the periphery of a display is one practical solution for home reproduction19)20).

This paper is organized as follows. First, an overview of 22.2 multichannel sound is given in Section 2. Binaural reproduction over loudspeakers is then outlined in Section 3. In Section 4, binaural reproduction of 22.2 multichannel sound with a loudspeaker frame integrated into a flat panel display (FPD), one of the typical viewing devices for SHV, is proposed. The paper concludes with a discussion of future work.

Appeared in the 20th International Display Workshops
Received February 27, 2014; Revised July 18, 2014; Accepted August 18, 2014
† NHK Science and Technology Research Laboratories (1–10–11 Kinuta, Setagaya-ku, Tokyo 157–8510, Japan)
†† Keio University (3–14–1 Hiyoshi, Kohoku-ku, Yokohama-shi, Kanagawa 223–8522, Japan)
††† Foster Electric Co., Ltd. (1–1–109 Tsutsujigaoka, Akishima-shi, Tokyo 196–8550, Japan)


2. 22.2 multichannel sound system

The performance requirements for the audio format for use with an ultra high-definition TV (UHDTV), which are excerpted from Rec. ITU-R21), are as follows:
i) The sound image can be reproduced in all directions around the listener, including the elevation direction.
ii) The sensation of 3-D spatial impressions that augments a sense of reality, which is related to ambience and envelopment, can be significantly enhanced over 5.1 surround.
iii) For applications with accompanying picture, the directional stability of the frontal image should be maintained over the entire area of high-resolution large-screen digital imagery.
iv) Excellent sound quality can be maintained over a wide listening area.
v) Backward compatibility with the 5.1 channel sound system and conventional two-channel sound system can be ensured.
vi) Live recording, mixing, and transmission are possible.

The 22.2 multichannel sound system is designed to fulfill these requirements. The system consists of 22 audio channels arranged in three spatial layers: a top layer with nine channels, a middle layer with ten channels, and a bottom layer with three channels. In addition, there are two channels for low frequency effects (LFE). The two LFE channels are recommended to be arranged symmetrically in the bottom layer. The schematic of the audio channel arrangement is illustrated in Fig. 1, and its labels are listed in Table 1. Eight channels in each of the top and middle layers are arranged radially and symmetrically to fulfill requirements i) and ii). Two additional front channels overlapping the display's position in the middle layer are intended to stabilize the localization of sound images on the display. An overhead channel in the top layer is used to reproduce sound images over the listening area, whereas the three channels in the bottom layer reproduce sounds on the ground. These channels enhance 3-D spatial impressions, especially in the height direction. Finally, the two LFE channels improve the bass impression.
Fig. 1 Audio channel arrangement of 22.2 multichannel sound system (top layer: 9 channels; middle layer: 10 channels; bottom layer: 3 channels; LFE: 2 channels).

Table 1 Channel number, label, and name of 22.2 multichannel sound system.

No.  Label  Channel name          No.  Label  Channel name
1    FL     Front Left            13   TpFL   Top Front Left
2    FR     Front Right           14   TpFR   Top Front Right
3    FC     Front Center          15   TpFC   Top Front Center
4    LFE1   LFE-1                 16   TpC    Top Center
5    BL     Back Left             17   TpBL   Top Back Left
6    BR     Back Right            18   TpBR   Top Back Right
7    FLc    Front Left Center     19   TpSiL  Top Side Left
8    FRc    Front Right Center    20   TpSiR  Top Side Right
9    BC     Back Center           21   TpBC   Top Back Center
10   LFE2   LFE-2                 22   BtFC   Bottom Front Center
11   SiL    Side Left             23   BtFL   Bottom Front Left
12   SiR    Side Right            24   BtFR   Bottom Front Right

The relationship between the number and placement of loudspeakers and the sensation of spatial impressions has been experimentally examined22). So far, it has been confirmed that the listening area that maintains high spatial impressions broadens as the number of loudspeakers increases, and that its shape can be approximately estimated on the basis of the maximum angle between two adjacent loudspeakers. From this perspective, the audio channel arrangement of 22.2 multichannel sound can be described as having a wider listening area than other existing multichannel audio formats.

Requirements v) and vi) concern flexibility in the audio format and system compatibility with other audio formats. They are essential for practical reproduction systems and have been systematically studied and evaluated in our laboratories. The details and related information are given in the literature2)23).
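As a small illustration of the layer structure described in this section, the labels of Table 1 can be grouped by layer in a few lines of Python. The prefix rule used here (Tp for the top layer, Bt for the bottom layer, LFE for the low frequency effect channels, everything else in the middle layer) is an assumption read off the labels, not a rule stated in the paper, and the names CHANNELS, layer_of, and layer_counts are hypothetical.

```python
# Channel numbers and labels as listed in Table 1.
CHANNELS = {
    1: "FL", 2: "FR", 3: "FC", 4: "LFE1", 5: "BL", 6: "BR",
    7: "FLc", 8: "FRc", 9: "BC", 10: "LFE2", 11: "SiL", 12: "SiR",
    13: "TpFL", 14: "TpFR", 15: "TpFC", 16: "TpC", 17: "TpBL", 18: "TpBR",
    19: "TpSiL", 20: "TpSiR", 21: "TpBC", 22: "BtFC", 23: "BtFL", 24: "BtFR",
}


def layer_of(label):
    """Classify a channel label by its prefix (assumed convention)."""
    if label.startswith("LFE"):
        return "lfe"
    if label.startswith("Tp"):
        return "top"
    if label.startswith("Bt"):
        return "bottom"
    return "middle"


def layer_counts(channels=CHANNELS):
    """Count how many channels fall into each layer."""
    counts = {"top": 0, "middle": 0, "bottom": 0, "lfe": 0}
    for label in channels.values():
        counts[layer_of(label)] += 1
    return counts
```

Under this grouping the counts agree with the text: nine top, ten middle, and three bottom channels plus two LFE channels, 24 in total.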

3. Binaural reproduction

3.1 HRTF

A head-related transfer function (HRTF) is a transfer function that describes how a given sound wave from a specific position reaches the entrance to the listener's external ear canal through reflection and diffraction on the head, pinna, and torso. HRTFs for the left and right ears contain all the directional properties of sound propagation. Thus, by adding these properties to a sound source, a sound image can be artificially localized at a specific position through headphones. This method is mathematically formulated in the time domain as a convolution of an audio signal and head-related impulse responses (HRIRs), the time-domain equivalent of HRTFs, and it is called “binaural reproduction” (see Fig. 2).

Fig. 2 Binaural reproduction over headphones.

3.2 Binaural reproduction over loudspeakers

Unlike listening with headphones, where binaural signals are directly provided to the left and right ears, binaural reproduction over loudspeakers requires a means of handling the acoustic crosstalk that occurs as sound propagates from the loudspeakers to the listener's ears. Fig. 3 illustrates this unwanted acoustic phenomenon. Sound propagating from the left (right) loudspeaker to the contralateral right (left) ear is acoustic crosstalk to be cancelled, whereas sound propagating to the ipsilateral left (right) ear is intended and equalized.

Fig. 3 Acoustic crosstalk.

Fig. 4 is a schematic diagram illustrating binaural reproduction over two stereophonic loudspeakers. Sound propagation characteristics, which are typically represented by HRTFs, and compensation filters for crosstalk cancellation are represented by the 2 × 2 matrices G and H, respectively. Matrix H is designed to match the audio signals at the listener's ears (control points) with the intended binaural signals X. This process is formulated as follows:

$$X = GHX \tag{1}$$

where

$$X = \begin{bmatrix} x_l \\ x_r \end{bmatrix} \tag{2}$$

$$G = \begin{bmatrix} g_{ll} & g_{lr} \\ g_{rl} & g_{rr} \end{bmatrix} \tag{3}$$

$$H = \begin{bmatrix} h_{ll} & h_{lr} \\ h_{rl} & h_{rr} \end{bmatrix} \tag{4}$$

Therefore, the compensation filters for crosstalk cancellation work as inverse filters:

$$H = G^{-1} \tag{5}$$

Fig. 4 Binaural reproduction over two stereophonic loudspeakers.

There are several approaches to computing compensation filters, including convolution-based processing in the time domain18), processing in the frequency domain using a regularization parameter8)–16), and processing based on control theory17). In our study, the processing is done in the frequency domain, taking the computational time and load into account20). In this case, the compensation filters are acquired by computing inverse matrices for each discrete frequency bin $\omega_k$:

$$H(\omega_k) = G^{-1}(\omega_k) \tag{6}$$

where k indicates a discrete frequency index. They are then transformed into the time domain by using an inverse discrete Fourier transform and implemented as finite impulse response (FIR) filters.

3.3 Expansion to multiple-loudspeaker systems

The concept of crosstalk cancellation can be easily expanded to multiple-loudspeaker systems. Because the number of control points can increase in proportion to the number of loudspeakers, using multiple loudspeakers for binaural reproduction is an effective way to broaden the listening area. In the case of multiple-loudspeaker systems, Eqs. (2) to (4) are revised as follows:

$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix} \tag{7}$$

$$G = \begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1n} \\ g_{21} & g_{22} & \cdots & g_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ g_{m1} & g_{m2} & \cdots & g_{mn} \end{bmatrix} \tag{8}$$

$$H = \begin{bmatrix} h_{11} & h_{12} & \cdots & h_{1m} \\ h_{21} & h_{22} & \cdots & h_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ h_{n1} & h_{n2} & \cdots & h_{nm} \end{bmatrix} \tag{9}$$

where m and n indicate the number of control points and loudspeakers, respectively. If there are more loudspeakers than control points, i.e., m < n, Eq. (1) is underdetermined and has an infinite number of solutions. In such a case, the least-norm solution is derived, taking the system stability into consideration. In contrast, if the number of loudspeakers is less than the number of control points, i.e., m > n, Eq. (1) is overdetermined and has no exact solution, so the least-squares method is used to obtain an approximate solution. Note that the number of listening positions is half the number of control points since humans have two ears.
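The per-bin inversion and its extension to m control points and n loudspeakers can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses NumPy's Moore-Penrose pseudo-inverse, which coincides with the ordinary inverse for square full-rank matrices, gives the least-norm solution when m < n, and gives the least-squares solution when m > n. It applies no regularization, and it adds a modeling delay (an assumption) so that the resulting FIR filters are causal. The function name design_ctc_filters and its parameters are hypothetical.

```python
import numpy as np


def design_ctc_filters(g_ir, n_fft=None, delay=None):
    """Design crosstalk-cancellation FIR filters by per-bin (pseudo-)inversion.

    g_ir  : array of shape (m, n, L); g_ir[i, j] is the impulse response
            from loudspeaker j to control point (ear) i.
    n_fft : FFT length (assumed default: 4 * L).
    delay : modeling delay in samples (assumed default: n_fft // 2).
    Returns an array of shape (n, m, n_fft) holding the filters h[j, i].
    """
    m, n, length = g_ir.shape
    if n_fft is None:
        n_fft = 4 * length
    if delay is None:
        delay = n_fft // 2

    # Transfer matrices G(omega_k), arranged as a stack of (m, n) matrices.
    G = np.moveaxis(np.fft.rfft(g_ir, n=n_fft, axis=-1), -1, 0)

    # Pseudo-inverse for every frequency bin: stack of (n, m) matrices.
    H = np.linalg.pinv(G)

    # Linear-phase term that delays the filters so they become causal.
    k = np.arange(G.shape[0])
    H = H * np.exp(-2j * np.pi * k * delay / n_fft)[:, None, None]

    # Back to the time domain with an inverse DFT: one FIR filter per pair.
    return np.fft.irfft(np.moveaxis(H, 0, -1), n=n_fft, axis=-1)
```

With measured responses, near-singular frequency bins would normally call for regularization, as in the frequency-domain methods cited above; the plain pseudo-inverse is used here only to keep the sketch short.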

4. Binaural reproduction of 22.2 multichannel sound

4.1 FPD-integrated loudspeaker frame

As described in Section 1, placing loudspeakers alongside the display is a practical way to virtually reproduce 22.2 multichannel sound at home. For such use, the loudspeakers should be as compact as possible and fit into the room interior design so as not to obstruct television viewing while retaining their performance. FPD-integrated loudspeaker frames with many small loudspeakers installed along the edge of the FPD chassis can accommodate these requirements.

Two types of loudspeaker frames were experimentally manufactured. Fig. 5 shows a 12-loudspeaker frame integrated into an 85-inch liquid crystal display (LCD). As shown in the picture, five dynamic loudspeaker units are installed at even intervals in each of the horizontal sides, and one unit is installed in the middle of each vertical side of the frame. This arrangement corresponds to a mapping of the front channels in the 22.2 multichannel sound system and so is thought to be a minimal yet sufficient configuration at this time. The diameter of each loudspeaker unit is 7 cm. In spite of its small size, the maximum sound pressure level (SPL) is 92 dB. Furthermore, the corrugated edge, shown in Fig. 6, suppresses third harmonic distortion. Four subwoofer units are installed on each rear panel of the vertical sides. The subwoofer units are used to improve the frequency response of the frontal units by feeding them the low-frequency components of all 22.2 channel signals.

Fig. 5 12-loudspeaker frame integrated into 85-inch LCD.

Fig. 6 Corrugated edge.

On the other hand, the 145-inch plasma display panel (PDP) shown in Fig. 7 has 108 dynamic loudspeaker units, comprising units for binaural reproduction and units for future expandability. The loudspeaker units for binaural reproduction, which have white diaphragms in Figs. 7 and 8, are installed in the middle and at the top of each vertical side of the frame, and their number has been doubled to increase the overall output sound level and reduce the load on each unit. This arrangement is nearly the minimum configuration for binaural reproduction. The diameter of each loudspeaker unit is 7 cm, and the maximum sound pressure level is 96 dB. The frame is equipped with bass reflex ports on the rear of the enclosure to increase the efficiency at low frequencies.

4.2 Binaural reproduction with FPD-integrated loudspeaker frame

The front channels in the 22.2 multichannel sound system, except for the FLc, FRc, and FC channels overlapping the display's position, were assigned to loudspeaker units on the loudspeaker frame. The perception of the three excepted channels was produced by the pair-wise amplitude panning method. Our recent research revealed that a vertical pair of loudspeakers gives better directional stability of the frontal phantom sound images than a horizontal pair of loudspeakers on a medium-sized display. Therefore, the upper and lower loudspeaker units in the horizontal sides of the loudspeaker frame were used for the amplitude panning. The side and back channels were synthesized as phantom sound sources at specific channel positions by using all of the integrated loudspeaker units for binaural processing. The two LFE channels were reproduced separately by subwoofers.

The HRIRs for crosstalk cancellation, which are represented by G in Eq. (1), were measured in an acoustic anechoic chamber at NHK Science and Technology Research Laboratories. As shown in Fig. 9, measurements were made directly by using a dummy head and the loudspeaker frames. The listening positions at which HRIRs were measured were distributed at 0.05 m intervals on a line at a distance of 1.5 times the display height, which is 1.5 m from the display in the case of the 85-inch LCD. This arrangement took the typical television viewing position and natural movements of the body into consideration. The HRIRs for synthesizing the phantom sound sources, which were represented by X in Eq. (1), were also measured in an acoustic anechoic chamber with the same dummy head and loudspeakers mounted at specific channel positions 1.3 m from the listening position.

Logarithmic time-stretched pulses at a bit depth of 24 bits and a sampling rate of 48 kHz were used in both measurements. The pulses had a length of $2^{17}$ samples. The responses recorded with the dummy head microphone were convolved with inverse filters of the time-stretched pulses and the measuring system and truncated with a rectangular window to obtain 1024-point HRIRs for crosstalk cancellation and 256-point ones for synthesis.

Fig. 10 shows examples of the HRIRs for crosstalk cancellation measured using the 12-loudspeaker frame, and Fig. 11 shows their frequency magnitude responses. HRTFs from the left and right loudspeaker units in the middle of the vertical sides to the left and right ears at the center of the listening positions are taken as examples. Small ripples in the tails of the impulse responses are inferred to be reflections from the loudspeaker frame. They should also be compensated for and were included as parts of the HRIRs. Fig. 12 shows examples of the HRIRs for synthesis, and Fig. 13 shows their frequency magnitude responses. HRTFs from the SiL and SiR channels, i.e., from immediately lateral to the listening position, to the left and right ears are taken as examples.

Fig. 7 Loudspeaker frame integrated into 145-inch PDP.

Fig. 8 Number of loudspeaker units has been doubled.

Fig. 9 HRTF measurement in acoustic anechoic chamber.
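The signal flow for the side and back channels described in this section (convolution with the HRIRs for synthesis, followed by the compensation filters for crosstalk cancellation) can be sketched as follows. This is only an illustration under assumptions: plain time-domain convolution with NumPy is used, HRIRs are stored as (2, L) arrays in left/right order, and the function names synthesize_binaural and apply_ctc are hypothetical. The amplitude panning of the FLc, FRc, and FC channels and the LFE routing are not modeled.

```python
import numpy as np


def synthesize_binaural(channel_signals, hrirs):
    """Sum the channel signals convolved with their left/right HRIRs.

    channel_signals : dict mapping a channel label to a 1-D signal.
    hrirs           : dict mapping a channel label to an array of shape
                      (2, L) holding the left and right HRIRs.
    Returns an array of shape (2, N + L - 1) with the binaural signals.
    """
    n_sig = max(len(s) for s in channel_signals.values())
    n_ir = max(hrirs[label].shape[1] for label in channel_signals)
    out = np.zeros((2, n_sig + n_ir - 1))
    for label, signal in channel_signals.items():
        for ear in range(2):
            y = np.convolve(signal, hrirs[label][ear])
            out[ear, :len(y)] += y
    return out


def apply_ctc(binaural, h_ir):
    """Feed binaural signals through the compensation filters.

    binaural : array of shape (2, N), left and right binaural signals.
    h_ir     : array of shape (n_speakers, 2, L); h_ir[j, i] filters
               binaural signal i for loudspeaker j.
    Returns an array of shape (n_speakers, N + L - 1).
    """
    n_spk, n_ears, length = h_ir.shape
    out = np.zeros((n_spk, binaural.shape[1] + length - 1))
    for j in range(n_spk):
        for i in range(n_ears):
            out[j] += np.convolve(binaural[i], h_ir[j, i])
    return out
```

For real-time use, block-based frequency-domain convolution would be the more likely choice; direct convolution is used here only because it makes the structure easy to follow.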

Fig. 10 Example of HRIRs for crosstalk cancellation. (a) from left loudspeaker unit to left/right ear (b) from right loudspeaker unit to left/right ear. (Axes: sample vs. magnitude.)

Fig. 11 Frequency magnitude responses of HRIRs for crosstalk cancellation. (a) from left loudspeaker unit to left/right ear (b) from right loudspeaker unit to left/right ear. (Axes: frequency in Hz vs. magnitude in dB.)

Fig. 12 Example of HRIRs for synthesis. (a) from SiL channel to left/right ear (b) from SiR channel to left/right ear. (Axes: sample vs. magnitude.)

Fig. 13 Frequency magnitude responses of HRIRs for synthesis. (a) from SiL channel to left/right ear (b) from SiR channel to left/right ear. (Axes: frequency in Hz vs. magnitude in dB.)

4.3 Performance evaluation

The performance of crosstalk cancellation was quantitatively evaluated through an objective experiment using the 12-loudspeaker frame. The experimental conditions were about the same as those of the HRTF measurement, i.e., the 12-loudspeaker frame was set at the same position as shown in Fig. 9, and a dummy head was set at the center of the listening positions, 1.5 m from the loudspeaker frame. The compensation filters for crosstalk cancellation were cascaded with the 12-loudspeaker frame, and unit impulse signals, instead of binaural signals, were fed to the compensation filters to measure the impulse responses.

Fig. 14 shows the measured impulse responses at each of the control points, i.e., at the left and right ears of the dummy head, and Fig. 15 shows their frequency magnitude responses. For reference, Fig. 16 shows theoretical impulse responses generated using a computer simulation, and Fig. 17 shows their frequency magnitude responses. In the figures, the terms “left” and “right” indicate that a unit impulse signal was fed from the “left” or “right” input terminal instead of the “left” or “right” binaural signal; therefore, unit impulse signals with some delays should be observed at the ipsilateral ears and silent signals at the contralateral ears. We can see from the figures that the responses observed at the ipsilateral ears approximate the envelope of the intended all-pass properties to some extent, and that crosstalk occurring at the contralateral ears is suppressed to −15 dB or less at almost all frequencies. The measured performance of crosstalk cancellation degrades slightly at high and low frequencies compared with that of the simulation. This is mainly because these frequencies are outside the frequency range of the loudspeaker units.

Fig. 14 Measured impulse responses at control points. (a) left ear (b) right ear. (Axes: sample vs. magnitude.)

Fig. 15 Measured frequency magnitude responses at control points. (a) left ear (b) right ear. (Axes: frequency in Hz vs. magnitude in dB.)

Fig. 16 Impulse responses at control points in simulation. (a) left ear (b) right ear. (Axes: sample vs. magnitude.)

Fig. 17 Frequency magnitude responses at control points in simulation. (a) left ear (b) right ear. (Axes: frequency in Hz vs. magnitude in dB.)

The performance of crosstalk cancellation was quantified using the channel separation measure24) defined as

$$J = \frac{P_{ips}}{P_{con}} \tag{10}$$

where $P_{ips}$ and $P_{con}$ represent the power of the signals recorded at the ipsilateral and contralateral ears, respectively. Table 2 lists the channel separation in octave bands centered at the following standard frequencies: 125 Hz, 250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz, and 16 kHz. It indicates that the compensation filters provide 18 dB or more channel separation in all frequency bands. We can safely say from these results that the compensation filters work well for performing not only equalization but also crosstalk cancellation with some processing delays.

Table 2 Channel separation.

Octave band (Hz)   Channel separation (dB)
125                18.1
250                27.5
500                37.3
1k                 37.4
2k                 40.0
4k                 40.7
8k                 34.5
16k                27.2
broadband          31.7

The stability of the compensation filters for crosstalk cancellation was evaluated by referring to the condition number, which is an objective measure of sensitivity to noise or numerical errors. The experimental conditions were the same as those in the previous experiment. The degree of ill-conditioning was evaluated through the condition numbers of the inverse matrices for each discrete frequency bin for a varying number of loudspeaker units in the 12-loudspeaker frame used for binaural processing. The loudspeaker unit arrangements are shown in Fig. 18. The loudspeaker unit arrangement for binaural reproduction in the 108-loudspeaker frame corresponds to the four-unit arrangement. Fig. 19 shows the condition number as a function of frequency and the number of loudspeaker units. We can see from the figure that the condition number decreases roughly in inverse proportion to the number of loudspeaker units and that the peaks seen in the two-unit configuration gradually become rounded. The results imply that the stability of the compensation filters, and thus that of the synthesized phantom sound sources, is improved by using multiple loudspeakers for binaural processing.

Fig. 18 Loudspeaker unit arrangements (two-unit, four-unit, six-unit, and 12-unit).

Fig. 19 Condition number vs. frequency and number of loudspeaker units.

The signal processing was implemented on general-purpose computers equipped with an ASIO audio interface. In theory, the processing can be done sample by sample with no latency; however, a few system buffers were used to maintain stable continuous audio streaming. The buffer size is currently set to 1024 samples at a sampling rate of 48 kHz, which is the same as the length of the measured HRIRs and is equivalent to a latency of about 21 milliseconds. The equivalent amount of delay was added to the audio signals fed to the front channels to compensate for the latency and synchronize them with the side and back channels.
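The evaluation quantities used in this section can be sketched in a few lines of NumPy. The following is a hedged illustration, not the measurement code used in the paper: channel separation is computed over the whole band only (no octave-band filtering), the conversion of J to decibels as 10 log10(J) is an assumption consistent with the units of Table 2, the condition number is taken as the 2-norm condition number of the transfer matrix in each frequency bin, and all function names are hypothetical.

```python
import numpy as np


def channel_separation_db(ipsi, contra):
    """Broadband channel separation J = P_ips / P_con, expressed in dB."""
    p_ips = np.sum(np.square(ipsi))
    p_con = np.sum(np.square(contra))
    return 10.0 * np.log10(p_ips / p_con)


def condition_numbers(g_ir, n_fft=1024):
    """2-norm condition number of the transfer matrix in each frequency bin.

    g_ir : array of shape (m, n, L) of impulse responses from loudspeaker j
           to control point i. Returns an array of length n_fft // 2 + 1.
    """
    G = np.moveaxis(np.fft.rfft(g_ir, n=n_fft, axis=-1), -1, 0)
    return np.linalg.cond(G)


def buffer_latency_ms(buffer_size, sample_rate):
    """Latency in milliseconds introduced by one buffer of audio."""
    return 1000.0 * buffer_size / sample_rate
```

For the buffer setting quoted above, buffer_latency_ms(1024, 48000) evaluates to 1024 / 48000 s, i.e., roughly 21.3 ms.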

5. Conclusion

This paper gave an overview of 22.2 multichannel sound as the audio format for use with UHDTV. To meet the needs of home reproduction systems, binaural reproduction with FPD-integrated loudspeaker frames was proposed and two experimental systems were manufactured. Of these two experimental systems, the 12-loudspeaker frame was first exhibited at the 2014 NAB Show. Since then, it has been demonstrated at various event sites. Through such experimental operations, it will be rolled out in the near future as a practical prototype of home reproduction systems for 22.2 multichannel sound.

One of our important tasks is to evaluate the system performance objectively and subjectively in various viewing circumstances of SHV. So far, it has been confirmed through objective experiments that the robustness and stability of the synthesized phantom sound sources against disturbances, such as the movement of the listener's head, are enhanced at the center of the listening positions (sweet spot) by using multiple loudspeakers for binaural processing. However, a sufficiently broad listening area for viewing SHV has not been attained yet. It is also important to make the overall processing more sophisticated and reduce the computational load with the aim of putting the system into practical use in the near future.

[References]

1) Y. Shishikui, Y. Fujita, and K. Kubota: “Super Hi-Vision - the star of the show!”, EBU Tech. Rev., pp. 4–16 (2009)
2) K. Hamasaki, T. Nishiguchi, R. Okumura, Y. Nakayama, and A. Ando: “A 22.2 Multichannel Sound System for Ultrahigh-Definition TV (UHDTV)”, SMPTE Mot. Imag. J., 117, 3, pp. 40–49 (2008)
3) I. Sawaya: “Multichannel Sound Reproduction System in the Home”, NHK STRL R&D, 128, pp. 11–17 (2013) in Japanese
4) T. Sugimoto, K. Matsui, and H. Okubo: “A Loudspeaker Array Frame Reproducing 22.2 Multichannel Sound for Super Hi-Vision Flat Panel Display”, NAB Proceedings 2012, pp. 16–21 (2012)
5) B. B. Bauer: “Stereophonic Earphones and Binaural Loudspeakers”, J. Audio Eng. Soc., 9, 2, pp. 148–151 (1961)
6) B. S. Atal and M. R. Schroeder: “Apparent Sound Source Translator”, US Patent 3,236,949
7) M. R. Schroeder: “Digital Simulation of Sound Transmission in Reverberant Spaces”, J. Acoust. Soc. Am., 47, 2(1), pp. 424–431 (1970)
8) P. A. Nelson, F. Orduña-Bustamante, and D. Engler: “Experiments on a System for the Synthesis of Virtual Acoustic Sources”, J. Audio Eng. Soc., 44, 11, pp. 990–1007 (1996)
9) H. Tokuno, O. Kirkeby, P. A. Nelson, and H. Hamada: “Inverse Filter of Sound Reproduction Systems Using Regularization”, IEICE Trans. Fundamentals, E80-A, 5, pp. 809–820 (1997)
10) O. Kirkeby, P. A. Nelson, and H. Hamada: “Local sound field reproduction using two closely spaced loudspeakers”, J. Acoust. Soc. Am., 104, 4, pp. 1973–1981 (1998)
11) O. Kirkeby and P. A. Nelson: “Digital Filter Design for Inversion Problems in Sound Reproduction”, J. Audio Eng. Soc., 47, 7/8, pp. 583–595 (1999)
12) T. Takeuchi and P. A. Nelson: “Optimal source distribution for virtual acoustic imaging”, ISVR Tech. Rep., 288, University of Southampton (2000)
13) S. Miyabe, M. Shimada, T. Takatani, H. Saruwatari, and K. Shikano: “Multi-Channel Inverse Filtering with Selection and Enhancement of a Loudspeaker for Robust Sound Field Reproduction”, in Proc. IWAENC 2006, pp. 1–4 (2006)
14) N. Kamado: “Sound Field Reproduction Integrating Multi-Point Sound Field Control and Wave Field Synthesis”, Doctoral Dissertation (2012)
15) H. Hokari, Y. Furumi, and S. Shimada: “A Study on Loudspeaker Arrangement in Multi-Channel Transaural System for Sound Image Localization”, in Proc. AES 19th International Conference (2001)
16) M. R. Bai, C. W. Tung, and C. C. Lee: “Optimal design of loudspeaker arrays for robust cross-talk cancellation using the Taguchi method and the genetic algorithm”, J. Acoust. Soc. Am., 117, 5, pp. 2802–2813 (2005)
17) T. Samejima, Y. Sasaki, I. Taniguchi, and H. Kitajima: “Robust transaural sound reproduction system based on feedback control”, Acoust. Sci. & Tech., 31, 4 (2010)
18) Y. Huang, J. Benesty, and J. Chen: “On Crosstalk Cancellation and Equalization With Multiple Loudspeakers for 3-D Sound Reproduction”, IEEE Signal Processing Letters, 14, 10, pp. 649–652 (2007)
19) K. Matsui and A. Ando: “Binaural Reproduction of 22.2 Multichannel Sound over Loudspeakers”, 129th Conv. Audio Eng. Soc., Prepr. 8272 (2010)
20) K. Matsui and A. Ando: “Binaural Reproduction of 22.2 Multichannel Sound with Loudspeaker Array Frame”, 135th Conv. Audio Eng. Soc., Prepr. 8954 (2013)
21) Rec. ITU-R BS.1909: “Performance requirements for an advanced multichannel stereophonic sound system for use with or without accompanying picture” (2012)
22) I. Sawaya, S. Oode, A. Ando, and K. Hamasaki: “Size and Shape of Listening Area Reproduced by Three-dimensional Multichannel Sound System with Various Number of Loudspeakers”, 131st Conv. Audio Eng. Soc., Prepr. 8510 (2011)
23) K. Hamasaki, K. Hiyama, T. Nishiguchi, and K. Ono: “Advanced Multichannel Audio Systems with Superior Impression of Presence and Reality”, 116th Conv. Audio Eng. Soc., Prepr. 6053 (2004)
24) J. J. Lopez and A. Gonzalez: “Experimental Evaluation of Cross-Talk Cancellation Regarding Loudspeaker's Angle of Listening”, IEEE Signal Processing Letters, 8, 1 (2001)

Kentaro Matsui received the B.E. and M.E. degrees in information engineering from Nagoya University, Aichi, Japan, in 1996 and 1998, respectively. He joined NHK in 1998. Since 2001, he has been with the NHK Science and Technology Research Laboratories. His research interests include head-related transfer functions and acoustic signal processing. Since 2013, he has been enrolled in a doctoral course at Keio University.

Satoshi Oishi received the M.E. degree from Waseda University, Tokyo, Japan, in 2005. He joined NHK in 2005. Since 2010, he has been engaged in the research and development of audio signal processing at NHK Science and Technology Research Laboratories.

Takehiro Sugimoto received the B.E. and M.E. degrees in electronic engineering from the University of Tokyo, Tokyo, Japan, in 1999 and 2001, respectively. He received the Ph.D. in information processing from Tokyo Institute of Technology, Tokyo, Japan, in 2013. He joined NHK in 2001 and has been working on the development of acoustic transducers and audio coding. He is also engaged in MPEG standardization.


Satoshi Oode received the M.S. degree from Tokyo Institute of Technology, Tokyo, Japan, in 1999. He joined NHK in 1999. Since 2001, he has been engaged in the research and development of multichannel audio at NHK Science and Technology Research Laboratories.

Yasushige Nakayama received the B.E. and M.E. degrees from the University of Iwate, Iwate, Japan, in 1992 and 1994, respectively. He joined NHK in 1994. He has mainly been engaged in the research and development of a three-dimensional sound system for television. He is currently a senior research engineer at NHK Science and Technology Research Laboratories.

Hiroyuki Okubo received the M.E. degree from Meiji University, Tokyo, Japan, and joined NHK in 1992. He is currently with the Science and Technology Research Laboratories of NHK. He has been engaged in the research and development of 22.2 multichannel audio systems for Super Hi-Vision.

Hiroshi Sato joined Foster Electric Co., Ltd. in 1990. Since 2004, he has been engaged in research and development of electronic circuits at the Strategic Research & Development Div., Product Research & Development Dept.

Koji Mizuno joined Foster Electric Co., Ltd. in 1991. Since 2011, he has been engaged in research and development of loudspeaker units at the Strategic Research & Development Div., Product Research & Development Dept.

Yuichi Morita joined Foster Electric Co., Ltd. in 2003. Since then, he has been engaged in research and development of loudspeaker systems and audio signal processing at the Strategic Research & Development Div., Product Research & Development Dept.

Shuichi Adachi received the M.E. and Ph.D. degrees in electrical engineering from Keio University, Yokohama, Japan, in 1983 and 1986, respectively. From 1986 to 1990, he was employed by Toshiba Research and Development Center. In 1990, he joined the Department of Electrical and Electronic Engineering, Utsunomiya University, as an associate professor, and in 2002 he became a professor. From 2003 to 2004, he was a visiting researcher at the Engineering Department of Cambridge University. Since 2006, he has been a professor in the Department of Applied Physics and Physico-Informatics, Keio University. His current research interests include system identification theory and its application to real systems such as acoustic, automobile, aerospace, and biomedical systems.
