MICROSOFT CONFIDENTIAL – for discussion purposes only. © 2015 Microsoft Corporation. All rights reserved.
Author: Lillian Lloyd

Effects (APO/DSP vendor)

• Provide value-added features, e.g. acoustic echo cancellation (AEC) and automatic gain control (AGC)
• Implemented as COM objects that run in user mode
• A proxy APO fronts hardware DSP processing; Windows provides a default proxy APO (MsApoFxProxy.dll)
• Three possible locations for an APO:
  o Stream Effect (SFX): one instance of the effect for every stream
  o Mode Effect (MFX): applied to all streams that are mapped to the same mode
  o Endpoint Effect (EFX): applied to all streams that use the same endpoint; always applied, even to RAW
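The placement rules above can be modeled in a few lines of C. This is a minimal sketch, not a Windows API: the enum, flag, and function names are hypothetical, and the model assumes, as the EFX bullet implies, that a RAW stream bypasses SFX and MFX while the endpoint's single EFX still processes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model (not a Windows API): modes a capture stream can request. */
typedef enum { MODE_RAW, MODE_DEFAULT, MODE_SPEECH, MODE_COUNT } fx_mode;

/* Bit flags for the three APO locations. */
enum { FX_SFX = 1u << 0, FX_MFX = 1u << 1, FX_EFX = 1u << 2 };

/* Which APO locations process a stream opened in the given mode.
   Assumption: RAW bypasses SFX and MFX; EFX is always applied. */
unsigned fx_locations_for_mode(fx_mode mode)
{
    if (mode == MODE_RAW)
        return FX_EFX;
    return FX_SFX | FX_MFX | FX_EFX;
}

typedef struct { size_t sfx, mfx, efx; } fx_instance_count;

/* Counts APO instances on one endpoint for a set of open streams:
   one SFX per non-RAW stream, one MFX per distinct non-RAW mode,
   and a single EFX shared by every stream on the endpoint. */
fx_instance_count count_fx_instances(const fx_mode *streams, size_t n)
{
    fx_instance_count c = {0, 0, 0};
    bool mode_seen[MODE_COUNT] = {false};

    for (size_t i = 0; i < n; i++) {
        assert(streams[i] < MODE_COUNT);
        unsigned loc = fx_locations_for_mode(streams[i]);
        if (loc & FX_SFX)
            c.sfx++;
        if ((loc & FX_MFX) && !mode_seen[streams[i]]) {
            mode_seen[streams[i]] = true;
            c.mfx++;
        }
    }
    c.efx = (n > 0) ? 1 : 0;
    return c;
}
```

With two default-mode streams, one speech-mode stream, and one RAW stream on the same endpoint, the model yields three SFX instances, two MFX instances, and one EFX.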

Expose all audio effects, including beamforming, noise suppression, and echo cancellation, via FX_Stream_CLSID, FX_Mode_CLSID, and FX_Endpoint_CLSID APOs

• Describes the number of microphones and each microphone's position, type, angle, and so on
• Reported by the audio driver to Windows through KSPROPERTY_AUDIO_MIC_ARRAY_GEOMETRY
• Very important for the Windows Speech platform enhancement pipeline
• Descriptor
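The geometry descriptor is a fixed header followed by one coordinate entry per microphone, so the driver has to report a variable-length buffer. The sketch below redeclares the layout of KSAUDIO_MIC_ARRAY_GEOMETRY and KSAUDIO_MICROPHONE_COORDINATES from ksmedia.h with portable types (and a flexible array member where the WDK header declares `[1]`) so it compiles outside the WDK; the struct and function names are hypothetical, the field layout and units (millimeters, 1/10000 radian) are as we understand the header, and all numeric values are illustrative only. Verify against the WDK before relying on any of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirrors KSAUDIO_MICROPHONE_COORDINATES: position in millimeters,
   angles in 1/10000 radian. */
typedef struct {
    uint16_t usType;            /* 0 = omnidirectional (KSMICARRAY_MICTYPE_OMNIDIRECTIONAL) */
    int16_t  wXCoord, wYCoord, wZCoord;
    int16_t  wVerticalAngle, wHorizontalAngle;
} mic_coordinates;

/* Mirrors KSAUDIO_MIC_ARRAY_GEOMETRY: header plus one entry per microphone. */
typedef struct {
    uint16_t usVersion;         /* assumed 0x0100 */
    uint16_t usMicArrayType;    /* 0 = linear (KSMICARRAY_MICARRAYTYPE_LINEAR) */
    int16_t  wVerticalAngleBegin, wVerticalAngleEnd;
    int16_t  wHorizontalAngleBegin, wHorizontalAngleEnd;
    uint16_t usFrequencyBandLo, usFrequencyBandHi;   /* Hz */
    uint16_t usNumberOfMicrophones;
    mic_coordinates KsMicCoord[];                     /* usNumberOfMicrophones entries */
} mic_array_geometry;

/* Bytes needed to report a geometry with n microphones. */
size_t mic_array_geometry_size(uint16_t n)
{
    return offsetof(mic_array_geometry, KsMicCoord) + (size_t)n * sizeof(mic_coordinates);
}

/* Builds a hypothetical two-mic linear array centered on the origin along X.
   Caller frees the result. */
mic_array_geometry *make_two_mic_linear(int16_t spacing_mm)
{
    assert(spacing_mm > 0);
    mic_array_geometry *g = calloc(1, mic_array_geometry_size(2));
    if (g == NULL)
        return NULL;

    g->usVersion = 0x0100;
    g->usMicArrayType = 0;               /* linear */
    g->wHorizontalAngleBegin = -15708;   /* about -pi/2 rad */
    g->wHorizontalAngleEnd = 15708;      /* about +pi/2 rad */
    g->usFrequencyBandLo = 80;           /* illustrative working band */
    g->usFrequencyBandHi = 7500;
    g->usNumberOfMicrophones = 2;

    for (int i = 0; i < 2; i++) {
        mic_coordinates *m = &g->KsMicCoord[i];
        m->usType = 0;                   /* omnidirectional */
        m->wXCoord = (int16_t)(i == 0 ? -spacing_mm / 2 : spacing_mm / 2);
    }
    return g;
}
```

A real property handler would typically answer a size query with `mic_array_geometry_size(n)` first and fill the caller's buffer only when it is large enough.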

• Speech mode specifies:
  o The application expects speech-recognition-specific signal processing at the lowest latency
  o The hardware's preferred sample rate for wideband speech (such as 16 kHz)
• Speech mode must be supported if the OEM pipeline is used

#define STATIC_AUDIO_SIGNALPROCESSINGMODE_SPEECH \
    0xfc1cfc9b, 0xb9d6, 0x4cfa, 0xb5, 0xe0, 0x4b, 0xb2, 0x16, 0x68, 0x78, 0xb2
DEFINE_GUIDSTRUCT("FC1CFC9B-B9D6-4CFA-B5E0-4BB2166878B2", AUDIO_SIGNALPROCESSINGMODE_SPEECH);
#define AUDIO_SIGNALPROCESSINGMODE_SPEECH DEFINE_GUIDNAMED(AUDIO_SIGNALPROCESSINGMODE_SPEECH)
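The brace-initializer values in STATIC_AUDIO_SIGNALPROCESSINGMODE_SPEECH and the string in DEFINE_GUIDSTRUCT describe the same GUID. The sketch below checks that correspondence and shows how a component might test whether a requested mode is Speech mode. It is a portable stand-in, not Windows code: `win_guid` only mirrors the GUID field layout, and the helper names are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Portable stand-in for the Windows GUID layout. */
typedef struct {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} win_guid;

/* Same values as STATIC_AUDIO_SIGNALPROCESSINGMODE_SPEECH. */
static const win_guid SPEECH_MODE_GUID = {
    0xfc1cfc9b, 0xb9d6, 0x4cfa, {0xb5, 0xe0, 0x4b, 0xb2, 0x16, 0x68, 0x78, 0xb2}
};

/* Field-by-field comparison of two GUIDs. */
bool guid_equal(const win_guid *a, const win_guid *b)
{
    if (a->Data1 != b->Data1 || a->Data2 != b->Data2 || a->Data3 != b->Data3)
        return false;
    return memcmp(a->Data4, b->Data4, sizeof a->Data4) == 0;
}

bool is_speech_mode(const win_guid *mode)
{
    return guid_equal(mode, &SPEECH_MODE_GUID);
}

/* Writes the uppercase string form, e.g. "FC1CFC9B-B9D6-...";
   out_size must be at least 37 (36 characters plus the terminator). */
void guid_format(const win_guid *g, char *out, size_t out_size)
{
    assert(out_size >= 37);
    snprintf(out, out_size,
             "%08" PRIX32 "-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X",
             g->Data1, (unsigned)g->Data2, (unsigned)g->Data3,
             g->Data4[0], g->Data4[1], g->Data4[2], g->Data4[3],
             g->Data4[4], g->Data4[5], g->Data4[6], g->Data4[7]);
}

/* True if the GUID's uppercase string form equals s (case-sensitive). */
bool guid_matches_string(const win_guid *g, const char *s)
{
    char buf[37];
    guid_format(g, buf, sizeof buf);
    return strcmp(buf, s) == 0;
}
```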

• Mic gain is critical to the Cortana experience
• Default Mic Gain is the OEM-recommended mic gain for customers to use with Cortana
• HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech_OneCore\AudioInput\MicWiz\DefaultDefaultMicGain
• The registry key is set only for integrated mic arrays
• The registry key is set only if the device meets or exceeds the Standard metrics
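The two conditions for provisioning the default mic gain combine into a single predicate. This is a minimal sketch: the device struct, the three-level metrics enum, and the function names are hypothetical stand-ins for an OEM's provisioning tooling; only the registry path comes from the slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of measuring a device against the Standard metrics. */
typedef enum {
    METRICS_BELOW_STANDARD,
    METRICS_MEETS_STANDARD,
    METRICS_EXCEEDS_STANDARD
} metrics_result;

/* Hypothetical description of a capture device. */
typedef struct {
    bool is_integrated_array;   /* integrated mic array, not an external mic */
    metrics_result metrics;
} capture_device;

/* Registry path from the slide, as a C string (backslashes escaped). */
const char *default_mic_gain_path(void)
{
    return "HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Speech_OneCore"
           "\\AudioInput\\MicWiz\\DefaultDefaultMicGain";
}

/* The default mic gain is provisioned only for integrated mic arrays
   that meet or exceed the Standard metrics. */
bool should_set_default_mic_gain(const capture_device *dev)
{
    assert(dev != NULL);
    return dev->is_integrated_array && dev->metrics >= METRICS_MEETS_STANDARD;
}
```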

Terminal Type                Code    I/O  Description
Input Undefined              0x0200  I    Input Terminal, undefined type.
Microphone                   0x0201  I    A generic microphone that does not fit under any of the other classifications.
Desktop microphone           0x0202  I    A microphone normally placed on the desktop or integrated into the USB monitor.
Personal microphone          0x0203  I    A head-mounted or clip-on microphone.
Omni-directional microphone  0x0204  I    A microphone designed to pick up voice from more than one speaker at relatively long ranges.
Microphone array             0x0205  I    An array of microphones designed for directional processing using host-based signal processing algorithms.
Processing microphone array  0x0206  I    An array of microphones with an embedded signal processor.
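The terminal type table above maps directly onto a lookup. A minimal sketch, with hypothetical names: it returns the table's name for a code and flags the two array types, since an array reports a microphone geometry while a single microphone does not need one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Input terminal types from the table above. */
typedef struct { uint16_t code; const char *name; } terminal_type;

static const terminal_type INPUT_TERMINAL_TYPES[] = {
    {0x0200, "Input Undefined"},
    {0x0201, "Microphone"},
    {0x0202, "Desktop microphone"},
    {0x0203, "Personal microphone"},
    {0x0204, "Omni-directional microphone"},
    {0x0205, "Microphone array"},
    {0x0206, "Processing microphone array"},
};

/* Returns the name for a terminal type code, or NULL if it is not in the table. */
const char *terminal_type_name(uint16_t code)
{
    size_t n = sizeof INPUT_TERMINAL_TYPES / sizeof INPUT_TERMINAL_TYPES[0];
    for (size_t i = 0; i < n; i++)
        if (INPUT_TERMINAL_TYPES[i].code == code)
            return INPUT_TERMINAL_TYPES[i].name;
    return NULL;
}

/* True for the two microphone array types (host-processed or embedded DSP). */
bool terminal_is_mic_array(uint16_t code)
{
    return code == 0x0205 || code == 0x0206;
}
```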

Flowchart: speech pipeline selection
Decision points: Is the mic an array? (A single microphone does not require a microphone geometry.) Is the mic geometry exposed? Is Speech mode supported? Are AEC and NS exposed? Is Raw mode supported?
Outcomes: Run OEM pipeline in speech mode; run MS pipeline in raw mode; run MS pipeline in default mode.

• Driver Configuration Verification Tool: OEMVerificationWin10x86.exe
• Recorder and sound files
• Score Utility: OEMScoreUtilityx64.exe

MICROSOFT CONFIDENTIAL – for discussion purposes only. © 2015 Microsoft Corporation. All rights reserved. Shared with Partners under NDA.

• Good acoustic design is a function of many parameters other than just microphone design, and is highly dependent on the device integration and usage

Figure: OEM processing and the Microsoft speech pipeline feeding Cortana (blocks shown: mic EQ and gain, beamforming, automatic gain control, noise suppression, multi-channel echo canceling, voice activation, speech recognizer, acoustic models)

• Microsoft recommends two or more microphones
• Benefits:
  o Sound source localization
  o Reduction of ambient noise
  o Partial de-reverberation, because most indirect paths are attenuated
  o Reduced effects of electronic noise
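Why two or more microphones help can be illustrated with the simplest array algorithm, delay-and-sum beamforming. This is a textbook technique used purely for illustration, not the beamformer in the Microsoft or OEM pipelines, and the function name is hypothetical. For mic spacing d, speed of sound c ≈ 343 m/s, sample rate f_s, and arrival angle θ from broadside, the steering delay is d·sin θ / c · f_s samples; for d = 0.1 m at 16 kHz and endfire arrival that is about 4.7 samples, so practical implementations use fractional delays, whereas this sketch takes an integer delay to stay short.

```c
#include <assert.h>
#include <stddef.h>

/* Two-microphone delay-and-sum beamformer with an integer steering delay.
   A talker in the steered direction reaches mic B `delay` samples after
   mic A, so A is delayed by the same amount before averaging:
       out[n] = (a[n - delay] + b[n]) / 2
   Sound from the steered direction adds coherently; sound from other
   directions and uncorrelated noise partially cancel. For n < delay there
   is no aligned A sample, so B is passed through unchanged. */
void delay_and_sum_2mic(const float *a, const float *b, float *out,
                        size_t len, size_t delay)
{
    assert(delay < len);
    for (size_t n = 0; n < len; n++) {
        if (n < delay)
            out[n] = b[n];
        else
            out[n] = 0.5f * (a[n - delay] + b[n]);
    }
}
```

Steered correctly, the output reproduces the talker's signal; two anti-phase inputs at zero delay cancel completely, the extreme case of the noise reduction listed above.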

Target characteristics (for reference)

Microphone array geometry       Elements  Microphone type                        NG, dB  NGA, dB  DI, dB
Linear, small                   2         Unidirectional                         -12.7   -6.0     7.4
Linear, big                     2         Unidirectional                         -12.9   -6.7     7.1
Linear, 4 el                    4         Unidirectional                         -13.1   -7.6     10.1
L-shaped                        4         Unidirectional                         -12.9   -7.0     10.2
Linear, 4 el (second geometry)  4         Integrated                             -12.9   -7.3     9.9
Good                            1         Integrated omnidirectional microphone  0       0        4.5

• Covers a quiet office or cubicle with good sound capture
• Speaker is less than 0.6 meters from the microphone

Small two-element array

Big two-element microphone array

• Covers a quiet office or cubicle with good sound capture
• Speaker is less than 2 meters from the microphone

Linear four-element microphone array

L-shaped four-element microphone array
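The coverage figures on the two-element and four-element slides can be read as a simple selection rule. A minimal sketch, with hypothetical names: picking the array with the fewest elements whose stated coverage includes the talker distance is an assumption, and real choices also depend on device integration and usage, as noted earlier.

```c
#include <assert.h>

/* Array classes from the coverage slides (quiet office or cubicle). */
typedef enum {
    ARRAY_TWO_ELEMENT,    /* small or big two-element array: talker < 0.6 m */
    ARRAY_FOUR_ELEMENT,   /* linear or L-shaped four-element array: talker < 2 m */
    ARRAY_NOT_COVERED     /* beyond the distances stated on the slides */
} array_class;

/* Fewest-element array class whose stated coverage includes a talker
   at distance_m meters. */
array_class array_for_talker_distance(double distance_m)
{
    assert(distance_m >= 0.0);
    if (distance_m < 0.6)
        return ARRAY_TWO_ELEMENT;
    if (distance_m < 2.0)
        return ARRAY_FOUR_ELEMENT;
    return ARRAY_NOT_COVERED;
}
```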

Circle microphone array geometry

• Important to ensure the temporal relationship between the signals of the microphones
• Important for beamforming and the sound source localizer
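One way to check the temporal relationship between two microphone channels is to estimate their relative lag with cross-correlation, for example on a recording of a click or other test signal. This is a generic technique shown as a sketch, not a Microsoft test procedure, and the function name is hypothetical. A lag of k samples at sample rate f_s is a phase offset of 360·f·k / f_s degrees at frequency f; one sample at 16 kHz is already 22.5 degrees at 1 kHz, which is why channel timing matters for beamforming.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Estimates by how many samples channel b lags channel a, searching lags
   in [-max_lag, max_lag] for the largest cross-correlation
       sum over n of a[n] * b[n + lag].
   Positive result: b is late relative to a. The sum is not normalized for
   overlap length, which is acceptable when max_lag is small relative to len. */
long estimate_lag(const float *a, const float *b, size_t len, long max_lag)
{
    assert(max_lag >= 0 && (size_t)max_lag < len);
    long best_lag = 0;
    double best = -DBL_MAX;

    for (long lag = -max_lag; lag <= max_lag; lag++) {
        double sum = 0.0;
        for (long n = 0; n < (long)len; n++) {
            long m = n + lag;
            if (m < 0 || m >= (long)len)
                continue;               /* no overlapping sample */
            sum += (double)a[n] * b[m];
        }
        if (sum > best) {
            best = sum;
            best_lag = lag;
        }
    }
    return best_lag;
}
```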

Figure: Phase response matching (x-axis: frequency, Hz)