HOW DOES MY 3D VIDEO SOUND LIKE? IMPACT OF LOUDSPEAKER SET-UPS ON AUDIOVISUAL QUALITY ON MID-SIZED AUTOSTEREOSCOPIC DISPLAY

Dominik Strohmeier, Satu Jumisko-Pyykkö

Technische Universität Ilmenau, Helmholtzplatz 2, 98693 Ilmenau, Germany
Tampere University of Technology, Korkeakoulunkatu 6, 33101 Tampere, Finland

ABSTRACT

In this paper, we examine the optimum loudspeaker set-up for audiovisual environments that use a 15" autostereoscopic display to present video. By varying the number of loudspeakers and their distance from the listening point, we performed subjective assessment tests on four different set-ups with 32 participants. We also measured simulator sickness to examine possible side effects of the visual presentation. As a test environment, we chose the MPEG-4 based audiovisual IAVAS player, and content was created using virtual rooms. Our results show that four loudspeakers at a distance of one meter from the listener offer the most pleasant experience of audiovisual quality, and that the result is not significantly affected by visual discomfort.

Index Terms: Audiovisual quality, multimedia quality, 3D, multichannel audio, autostereoscopic display

1. INTRODUCTION

New 3D multimedia products are taking their first steps into consumer markets. For example, 3D cinema already exists in several countries, and various autostereoscopic displays, laptops, and mobile and head-mounted devices are available as well. In addition, much research effort is currently devoted to bringing 3D multimedia to various application fields, such as 3DTV and mobile 3DTV. Producing quality that meets end users' requirements is a challenging task in multimedia environments, as quality perception is determined by a complex interaction of audio and video variables. The complexity of these perceptions may increase further when the depth dimension is introduced.

Currently, the multimodal impacts on experienced quality are relatively unexplored. Subjective quality evaluation is used to measure the quality of various systems and to identify the critical quality and impairment factors for product development and objective modeling purposes [1]. Typically, the quality of different media, e.g. audio and video, is studied separately and in different research communities, even though the final product may contain both media. In the field of multidimensional audiovisual quality, previous work has examined the impact of different loudspeaker set-ups on experienced quality in wide-screen or virtual reality environments [2]. No previous work has been reported in the context of mid-size 3D screens. This study examines the impact of loudspeaker set-ups on audiovisual quality when video is presented on mid-size autostereoscopic screens.

2. AUDIOVISUAL QUALITY

Multichannel audio in audiovisual set-ups

Providing optimal audiovisual quality often raises the problem of selecting a set-up, and the literature gives conflicting advice. For example, the recommendations of the International Telecommunication Union, which are widely used in engineering communities, take very different viewpoints depending on which medium they emphasize [1,3,4]. For audiovisual assessment purposes, we need to combine the audio and video set-ups and therefore fulfill the requirements for both audio and video. The problem in combining set-ups is that this "may cause conflicts between the two sets of requirements" [5]. Hence, audiovisual set-ups have to be selected carefully to find a compromise that offers an optimum set-up for the given audiovisual application, e.g. 3DTV.

Comparing studies on virtual reality applications shows that the selected audio set-ups differ greatly with the selected video device. Reiter [6] tried to answer the question for "audiovisual applications using large screens". In his test, Reiter used a 4:3 format projection screen 2.72 m wide. He found that the number of loudspeakers is a critical factor, and his results show significant differences in audiovisual quality perception between set-ups. Kuhlen et al.
[7] examined different sound set-ups for CAVE-like displays and finally developed a spatial sound system using four loudspeakers. More sophisticated approaches try to combine Wave Field Synthesis with video-conferencing [2] or stereoscopic multi-view displays [8]. However, whether the device is a two-dimensional video screen or a three-dimensional multi-view stereo display, the goal of the applications is always to create immersive environments for the user.

Perception of audiovisual quality

The combination of two or more modalities is more than the simple sum of the individual sensory channels. Multimodal integration is described as a complex process at all levels of sensory processing, but the process is not yet understood in depth [9,10]. In audiovisual perception, visual and auditory stimuli can affect each other when they are integrated. Both modalities can detect small discrepancies in the stimuli or, on the other hand, can improve multimodal quality by integrating the single stimuli. Recent research focuses on these discrepancies, which can severely degrade the users' quality experience. The perceived quality of one modality can influence the perceived quality of another modality, especially when the discrepancies are pronounced [11]. However, matching stimuli are able to enhance the perceived quality. In three-dimensional applications, they can enhance presence, the user's feeling of being there [12]. Depth perception has been identified as the most influential feature in this respect [13].

Displays that create a three-dimensional impression might introduce sickness symptoms that reduce presence and the feeling of depth. To measure the level of sickness, Kennedy et al. [14] developed a simulator sickness questionnaire (SSQ). The SSQ includes 16 questions covering various aspects of nausea, oculomotor symptoms, and disorientation, and the questions are answered on a 4-point scale. This measure has been used for conventional CRT displays and HMDs with various presentation modes, including television content and computer games [15,16].

3. RESEARCH METHOD

Participants - A total of 32 participants, aged between 20 and 54 years (median 25), took part in the experiment. The majority of them were male (25 male, 7 female). Most of them had participated in quality assessment experiments before (20 at least once, 11 regularly). In addition to their experience of listening tests, 4 participants had technical knowledge of audiovisual quality factors.

Procedure - The experiment started with demographic data collection and sensorial screening (normal hearing and visual acuity, incl. myopia, hyperopia and color vision). Prior to the test, participants completed a short training and quality anchoring session in which the quality range and the stimuli were introduced.
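Simulator sickness was measured in this study with the SSQ of Kennedy et al. [14]. The following is a minimal sketch of how SSQ subscale scores are commonly computed from raw subscale sums. The weights are those usually quoted for the SSQ and are an assumption here, since the paper does not list them; the function name and input layout are hypothetical, and the item-to-subscale mapping is deliberately not reproduced.

```python
# Weights commonly quoted for the SSQ of Kennedy et al. [14].
# Assumption: the paper itself does not state which weights were used.
SSQ_WEIGHTS = {"nausea": 9.54, "oculomotor": 7.58, "disorientation": 13.92}
SSQ_TOTAL_WEIGHT = 3.74


def ssq_scores(raw_sums):
    """Turn raw subscale sums (items rated 0-3) into weighted SSQ scores.

    raw_sums is a hypothetical input format: a dict with the keys
    "nausea", "oculomotor" and "disorientation", each holding the sum
    of the item ratings that belong to that subscale.
    """
    scores = {name: raw_sums[name] * weight
              for name, weight in SSQ_WEIGHTS.items()}
    # The total score weights the sum of the three raw subscale sums.
    scores["total"] = sum(raw_sums[name] for name in SSQ_WEIGHTS) * SSQ_TOTAL_WEIGHT
    return scores
```

Comparing such scores before and after the viewing session is one plausible way to check whether the visual presentation caused discomfort.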

We applied Absolute Category Rating as the method for the experiment. Test items 10 seconds long were presented one by one and rated independently and retrospectively on a nominal (yes/no) and an 11-point numerical quality scale. Only the extremes of the scales were labeled [17]. After each item, the participants rated it by answering three questions about acceptance (yes/no), overall quality (0-10) and quality of localization (0-10). The total length of the assessment was approximately 22 minutes.

Stimulus - We used four different set-ups [cf. Fig. 1] to compare audio set-ups with the autostereoscopic screen. We studied 4- and 5-loudspeaker set-ups at distances of one and two meters from the listening point. The stimuli were rendered as virtual rooms using MPEG-4. We chose a virtual classroom for the test. Visually, it provides enough space for unhindered movement while interacting with the scene. The virtual classroom was equipped with other typical accessories to help with orientation and to give a better idea of dimensions. Avatars were used to represent the sound source. We used two different audio contents (Music: drum and bass; Speech: male voice with an explanatory, teacher-like tone). The audio files were encoded in AAC and included in the BIFS files. In the virtual room, we rendered room acoustics using the perceptual approach of MPEG-4.

Presentation conditions - The tests were conducted under controlled laboratory conditions in the listening lab. We added additional, non-working loudspeakers to the listening room to hide visual cues from the participants. All stimulus material was presented on a Sharp AL3DU autostereoscopic display. The viewing distance of the participants was adjusted to approximately 0.5 m, but we allowed limited movement so that the participants could find their best position for depth impression. The sound pressure level was 75 dBA at the listening point. All stimuli were presented twice in random order.
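A randomized presentation order of this kind could be generated as in the sketch below. It assumes that the stimulus set is simply the crossing of loudspeaker count, distance and audio content; the paper does not state the exact number of test items, so the resulting playlist length is an assumption, and all names are hypothetical.

```python
import itertools
import random

# Factors taken from the text; their full crossing is an assumption.
LOUDSPEAKERS = (4, 5)            # number of loudspeakers
DISTANCES_M = (1, 2)             # distance from the listening point in meters
CONTENTS = ("music", "speech")   # audio contents
REPETITIONS = 2                  # every stimulus is presented twice


def presentation_order(seed=None):
    """Return a shuffled playlist of (loudspeakers, distance, content) tuples."""
    rng = random.Random(seed)  # a seed makes the order reproducible
    conditions = list(itertools.product(LOUDSPEAKERS, DISTANCES_M, CONTENTS))
    playlist = conditions * REPETITIONS
    rng.shuffle(playlist)
    return playlist
```

Drawing a fresh order per participant, for example by seeding with a participant number, would spread order effects across the panel.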

Fig. 1. The four different loudspeaker set-ups under assessment: a) 4-channel loudspeaker set-up with two varying distances, b) (in brackets) 5-channel loudspeaker set-up with two varying distances.

4. RESULTS

Acceptance of quality - Acceptability was measured to check that the presented quality reaches the level of useful quality. This measure can be used when examining the quality of novel applications to connect quality preferences to actual use. For all set-ups, the presented quality was rated acceptable in at least 80% of all presentations, showing that the presented quality was very high in general [see Fig. 2].
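The share of acceptable presentations per set-up can be derived from the yes/no answers as sketched below. The data layout, the set-up labels and the function name are hypothetical; the sketch only illustrates the calculation and does not reproduce the study's data.

```python
from collections import defaultdict


def acceptance_rates(ratings):
    """Compute the share of 'yes' answers per set-up.

    ratings is a hypothetical list of (setup_label, accepted) pairs,
    one pair per rated presentation, with accepted being True or False.
    """
    counts = defaultdict(lambda: [0, 0])  # setup -> [accepted, total]
    for setup, accepted in ratings:
        counts[setup][0] += int(accepted)
        counts[setup][1] += 1
    return {setup: yes / total for setup, (yes, total) in counts.items()}


# Made-up example ratings for two of the four set-ups.
example = [("4ch_1m", True), ("4ch_1m", True), ("4ch_1m", True), ("4ch_1m", False),
           ("5ch_2m", True), ("5ch_2m", False)]
rates = acceptance_rates(example)
```

A set-up would meet the 80% level mentioned above if its rate is at least 0.8.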

Fig. 3. Impact of loudspeaker set-ups on satisfaction with overall audiovisual quality

Quality of localization - The loudspeaker set-ups had an impact on the quality of localization when averaging across the audio contents [Fr=31.3, df=3 p.05, ns; difference to others p
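The statistic Fr with df=3 reported for the four set-ups is consistent with a Friedman test, in which each participant's ratings are ranked across the set-ups. The sketch below computes the basic Friedman statistic in plain Python, without tie correction; the table layout and function names are hypothetical, and it is not claimed to match the exact procedure the authors used.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Return (Fr, df) for a table with one row per participant
    and one column per condition (here: loudspeaker set-up)."""
    n = len(table)       # participants
    k = len(table[0])    # conditions
    rank_sums = [0.0] * k
    for row in table:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    fr = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return fr, k - 1
```

With four set-ups, k - 1 gives the three degrees of freedom quoted in the text; Fr is then compared against a chi-square distribution with that many degrees of freedom.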
