MICROSOFT CONFIDENTIAL – for discussion purposes only. © 2015 Microsoft Corporation. All rights reserved.
Effects (APO/DSP vendor)
• Provide value-added features, e.g. AEC, AGC
• Implemented as a COM object that runs in user mode
• Proxy APO for hardware DSP; Windows provides a default proxy APO (MsApoFxProxy.dll)
• Three different locations for an APO:
  o Stream Effect (SFX): an instance of the effect for every stream
  o Mode Effect (MFX): applied to all streams that are mapped to the same mode
  o Endpoint Effect (EFX): applied to all streams that use the same endpoint; always applied, even to RAW
Expose all audio effects, including beamforming, noise suppression, and echo cancellation, via the FX_Stream_CLSID, FX_Mode_CLSID, and FX_Endpoint_CLSID APOs
• Describes the number, position, type, angle, and so on of the microphones
• Reported to Windows by the audio driver through KSPROPERTY_AUDIO_MIC_ARRAY_GEOMETRY
• Very important for the Windows Speech platform enhancement pipeline
• Descriptor
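As an illustration of the kind of data a geometry descriptor carries, the sketch below fills in positions for a simple linear array. The `MicCoordSketch` type and `fill_linear_array` helper are hypothetical names introduced here; they are not the driver structures behind KSPROPERTY_AUDIO_MIC_ARRAY_GEOMETRY. The millimeter unit and the choice of the y axis as the left/right axis are also assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-microphone position record (NOT the real driver type).
   Units are assumed to be millimeters relative to the array center. */
typedef struct {
    int16_t x_mm;
    int16_t y_mm; /* assumed left/right axis */
    int16_t z_mm;
} MicCoordSketch;

/* Place `count` microphones on the y axis, centered on the origin and
   `spacing_mm` apart. Returns 0 on success, -1 on invalid arguments. */
int fill_linear_array(MicCoordSketch *out, size_t count, int16_t spacing_mm)
{
    if (out == NULL || count == 0 || count > 64 || spacing_mm < 0) {
        return -1;
    }
    /* Total span between the first and last element. */
    int32_t span = (int32_t)(count - 1) * spacing_mm;
    for (size_t i = 0; i < count; ++i) {
        out[i].x_mm = 0;
        out[i].y_mm = (int16_t)((int32_t)i * spacing_mm - span / 2);
        out[i].z_mm = 0;
    }
    return 0;
}
```

For a two-element array with 100 mm spacing, this places the elements at y = -50 mm and y = +50 mm.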
• Speech mode specifies:
  o The application expects speech-recognition-specific signal processing at the lowest latency
  o The hardware-preferred sample rate for wideband speech (such as 16 kHz)
• Speech mode support is required when using the OEM pipeline

#define STATIC_AUDIO_SIGNALPROCESSINGMODE_SPEECH \
    0xfc1cfc9b, 0xb9d6, 0x4cfa, 0xb5, 0xe0, 0x4b, 0xb2, 0x16, 0x68, 0x78, 0xb2
DEFINE_GUIDSTRUCT("FC1CFC9B-B9D6-4CFA-B5E0-4BB2166878B2", AUDIO_SIGNALPROCESSINGMODE_SPEECH);
#define AUDIO_SIGNALPROCESSINGMODE_SPEECH DEFINE_GUIDNAMED(AUDIO_SIGNALPROCESSINGMODE_SPEECH)
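A portable sketch of how a list of supported processing modes could be checked for Speech mode is shown below. It uses the GUID value from the definition above, but `GuidSketch`, `guid_equal`, and `supports_speech_mode` are hypothetical names introduced for illustration; a real driver or client would use the Windows GUID type and the system headers instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the Windows GUID type, so the example
   compiles without Windows headers. */
typedef struct {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
} GuidSketch;

/* FC1CFC9B-B9D6-4CFA-B5E0-4BB2166878B2, copied from the slide. */
static const GuidSketch SPEECH_MODE = {
    0xfc1cfc9bu, 0xb9d6, 0x4cfa,
    { 0xb5, 0xe0, 0x4b, 0xb2, 0x16, 0x68, 0x78, 0xb2 }
};

/* Field-by-field comparison; avoids depending on struct padding. */
int guid_equal(const GuidSketch *a, const GuidSketch *b)
{
    return a->data1 == b->data1 &&
           a->data2 == b->data2 &&
           a->data3 == b->data3 &&
           memcmp(a->data4, b->data4, sizeof a->data4) == 0;
}

/* Returns 1 if the mode list contains the Speech mode GUID, else 0. */
int supports_speech_mode(const GuidSketch *modes, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (guid_equal(&modes[i], &SPEECH_MODE)) {
            return 1;
        }
    }
    return 0;
}
```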
• Mic gain is key to the Cortana experience
• The default mic gain is the OEM-recommended mic gain for customers to use with Cortana
• HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech_OneCore\AudioInput\MicWiz\DefaultDefaultMicGain
• The registry value is set only for integrated mic arrays
• The registry value is set only for devices that meet or exceed the Standard metrics
Terminal Type                 Code    I/O  Description
Input Undefined               0x0200  I    Input Terminal, undefined Type.
Microphone                    0x0201  I    A generic microphone that does not fit under any of the other classifications.
Desktop microphone            0x0202  I    A microphone normally placed on the desktop or integrated into the USB monitor.
Personal microphone           0x0203  I    A head-mounted or clip-on microphone.
Omni-directional microphone   0x0204  I    A microphone designed to pick up voice from more than one speaker at relatively long ranges.
Microphone array              0x0205  I    An array of microphones designed for directional processing using host-based signal processing algorithms.
Processing microphone array   0x0206  I    An array of microphones with an embedded signal processor.
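The terminal type codes in the table can be captured as constants, as in the sketch below. The enumerator and function names are hypothetical, chosen for this example; only the numeric codes come from the table.

```c
#include <stdint.h>

/* Input terminal type codes from the table above.
   The identifier names are illustrative, not taken from any header. */
enum {
    TERMINAL_INPUT_UNDEFINED      = 0x0200,
    TERMINAL_MICROPHONE           = 0x0201,
    TERMINAL_DESKTOP_MICROPHONE   = 0x0202,
    TERMINAL_PERSONAL_MICROPHONE  = 0x0203,
    TERMINAL_OMNI_MICROPHONE      = 0x0204,
    TERMINAL_MICROPHONE_ARRAY     = 0x0205,
    TERMINAL_PROCESSING_MIC_ARRAY = 0x0206
};

/* Returns 1 if the code is one of the input terminal types listed above. */
int is_listed_input_terminal(uint16_t code)
{
    return code >= TERMINAL_INPUT_UNDEFINED &&
           code <= TERMINAL_PROCESSING_MIC_ARRAY;
}

/* Returns 1 for the two array types: host-processed (0x0205)
   and embedded-processor (0x0206). */
int is_microphone_array(uint16_t code)
{
    return code == TERMINAL_MICROPHONE_ARRAY ||
           code == TERMINAL_PROCESSING_MIC_ARRAY;
}
```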
[Flowchart: speech pipeline selection]
Decisions, in order from Start: Is the mic an array? (No: a single microphone does not require a microphone geometry.) Is the mic geometry exposed? Is Speech mode supported? Are AEC and NS exposed? Is Raw mode supported?
Outcomes: Run OEM pipeline in speech mode; Run MS pipeline in raw mode; Run MS pipeline in default mode.
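One possible reading of the selection flowchart is sketched below. The branch wiring is an assumption, because the arrows of the original diagram are not recoverable from the text: here the OEM pipeline is chosen only when every preceding check passes, and any failed check falls through to the Raw mode question. `CaptureCaps`, `Pipeline`, and `select_pipeline` are hypothetical names.

```c
/* Capability flags corresponding to the flowchart's questions
   (hypothetical structure; 0 = no, nonzero = yes). */
typedef struct {
    int is_array;         /* Is the mic an array? */
    int geometry_exposed; /* Is the mic geometry exposed? */
    int speech_mode;      /* Is Speech mode supported? */
    int aec_ns_exposed;   /* Are AEC and NS exposed? */
    int raw_mode;         /* Is Raw mode supported? */
} CaptureCaps;

typedef enum {
    PIPELINE_OEM_SPEECH, /* Run OEM pipeline in speech mode */
    PIPELINE_MS_RAW,     /* Run MS pipeline in raw mode */
    PIPELINE_MS_DEFAULT  /* Run MS pipeline in default mode */
} Pipeline;

Pipeline select_pipeline(const CaptureCaps *c)
{
    /* A single microphone does not require a geometry, so the
       geometry check only applies to arrays. */
    int geometry_ok = !c->is_array || c->geometry_exposed;

    /* ASSUMPTION: all checks must pass to reach the OEM pipeline. */
    if (geometry_ok && c->speech_mode && c->aec_ns_exposed) {
        return PIPELINE_OEM_SPEECH;
    }
    /* ASSUMPTION: every "No" branch leads to the Raw mode question. */
    return c->raw_mode ? PIPELINE_MS_RAW : PIPELINE_MS_DEFAULT;
}
```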
• Driver Configuration Verification Tool
  o OEMVerificationWin10x86.exe
• Recorder and Sound files
• Score Utility
  o OEMScoreUtilityx64.exe
MICROSOFT CONFIDENTIAL – for discussion purposes only. © 2015 Microsoft Corporation. All rights reserved. Shared with Partners under NDA.
• Good acoustic design is a function of many parameters other than just microphone design, and is highly dependent on the device integration and usage
[Diagram: speech capture pipeline]
OEM: Mic EQ, Gain
Microsoft speech pipeline: Beamforming, Automatic Gain Control, Noise Suppression, Multi-channel Echo Canceling, Voice Activation, Speech Recognizer, Acoustic Models
Cortana
• Microsoft recommends two or more microphones
• Benefits:
  o Sound source localization
  o Reduction of ambient noise
  o Partial de-reverberation, because most indirect paths are attenuated
  o Reduced effects of electronic noise
Target Characteristics (for reference)

Microphone array                  Elements  Type            NG, dB  NGA, dB  DI, dB
Linear, small                     2         unidirectional  -12.7   -6.0     7.4
Linear, big                       2         unidirectional  -12.9   -6.7     7.1
Linear, 4 el                      4         unidirectional  -13.1   -7.6     10.1
L-shaped                          4         unidirectional  -12.9   -7.0     10.2
Linear, 4 el, second geometry     4         integrated      -12.9   -7.3     9.9
Good omnidirectional microphone   1         integrated      0       0        4.5
• Covers a quiet office or cubicle with good sound capture
• Speaker is less than 0.6 meters from the microphone
Small two-element array
Big two-element microphone array
• Covers a quiet office or cubicle with good sound capture
• Speaker is less than 2 meters from the microphone
Linear four-element microphone array
L-shaped four-element microphone array
Circle microphone array geometry
• Important for preserving the temporal relationship between the signals from the microphones
• Important for beamforming and the source localizer
Frequency (Hz)
PHASE RESPONSE MATCHING
250