Quarterly Progress and Status Report. Voice fundamental frequency tracking

Author: Dana Stevenson
Dept. for Speech, Music and Hearing

Quarterly Progress and Status Report

Voice fundamental frequency tracking

Risberg, A. and Moller, A. and Fujisaki, H.

journal: STL-QPSR
volume: 1
number: 1
year: 1960
pages: 003-005

http://www.speech.kth.se/qpsr


VOICE FUNDAMENTAL FREQUENCY TRACKING

Several systems for F0-tracking have been tried in the past in an effort to construct relatively simple instrumentation for use in vocoders and for phonetic research. Our experience supports the general view that any system will work fine on some voices, and particularly well in sustained portions of speech or singing. No simple system, however, has been considered reliable enough for vocoder usage, and all systems have had the weakness of being sensitive to hum, noise, and static from the voice channel. The most common remaining error, in the case of a high-quality speech input, is the tendency towards synchronization on harmonics or the temporary indication of a subharmonic.

We have tried various non-linear systems for regenerating or enhancing the voice fundamental, in combination with the following prefiltering:

(1) A fixed low-pass filter optimally selected for the particular speaker.

(2) A low-pass filter or a band-pass filter continuously controlled by the measured output of the F0-meter.

(3) Three band-pass filters combined with logic for selecting, as the input to the frequency-measuring stage, the output of the band-pass filter of lowest center frequency carrying a signal above a certain threshold value.

None of these has functioned to our satisfaction. System number 1 is as good as either of the other two. System number 2 is subject to errors due to the delay in the frequency measurement and to starting errors. System number 3 is sensitive to switching transients and to unfavorable phase combinations of signals from two band-pass channels.

A substantial gain in accuracy has recently been obtained in an experimental setup which is similar to system number 3 above in some respects.

The basic idea is to avoid time-variable filtering, to incorporate one complete frequency counter in each of the band-pass channels, and to decide which channel provides the lowest frequency measure. This measure is selected as the most probable F0. Errors due to synchronization on overtones are avoided provided one of the channels carries a sufficiently pure fundamental. Successful results have been reached with a two-channel system. A three-channel system comprising three complete F0-meters and a minimum selector is under construction. The basic system is illustrated in Fig. 1-3.

[Fig. 1-3. Proposed scheme for F0-... (caption truncated in the source). Block diagram; legible block labels: SPEECH SIGNAL; LP ... c/s; CIRCUIT FOR EMPHASIS OF THE FUNDAMENTAL FREQUENCY; BP 230-550 c/s; BP 130-... c/s; BP 70-130 c/s; THRESHOLD AMPLITUDE; FREQUENCY VOLTAGE CONVERTER; GATE; DC VOLTAGE PROPORTIONAL TO F0.]
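The selection rule described above (one frequency counter per band-pass channel, the lowest reading taken as F0) can be sketched in a few lines. This is an illustration only, not the circuit of Fig. 1-3: the digital band-pass sections and the zero-crossing counter stand in for the analog filters and frequency-voltage converters, the upper edge of the middle channel (230 c/s) is an assumption, all function names are hypothetical, and the sketch returns a single estimate for the whole signal rather than a running DC voltage.

```python
import math

# Band edges in c/s. 70-130 and 230-550 are legible in Fig. 1-3; the upper
# edge of the middle channel (230) is an assumption.
CHANNELS = [(70.0, 130.0), (130.0, 230.0), (230.0, 550.0)]


def bandpass(x, fs, f_lo, f_hi, passes=2):
    """Digital stand-in for an analog band-pass: a second-order section run `passes` times."""
    f_c = math.sqrt(f_lo * f_hi)               # geometric center frequency
    q = f_c / (f_hi - f_lo)
    w0 = 2.0 * math.pi * f_c / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0           # the middle numerator term is zero
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    for _ in range(passes):
        y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
        for xn in x:
            yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
            x2, x1, y2, y1 = x1, xn, y1, yn
            y.append(yn)
        x = y
    return x


def count_frequency(x, fs):
    """Frequency counter: positive-going zero crossings per second."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if a < 0.0 <= b)
    return crossings * fs / (len(x) - 1)


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def track_f0(x, fs, threshold=0.1, settle=0.2):
    """Lowest counted frequency among channels whose level exceeds the threshold."""
    skip = int(settle * fs)                    # drop the filter start-up transient
    candidates = []
    for f_lo, f_hi in CHANNELS:
        y = bandpass(x, fs, f_lo, f_hi)[skip:]
        if rms(y) >= threshold:                # amplitude gate
            candidates.append(count_frequency(y, fs))
    return min(candidates) if candidates else None


def harmonic_tone(f0, fs, duration, amplitudes):
    """Sum of harmonics of f0; amplitudes[0] belongs to the fundamental."""
    return [sum(a * math.sin(2.0 * math.pi * (k + 1) * f0 * n / fs)
                for k, a in enumerate(amplitudes))
            for n in range(int(fs * duration))]


fs = 8000.0
x = harmonic_tone(100.0, fs, 1.0, (1.0, 2.0, 1.0))   # second harmonic dominates
print(count_frequency(x, fs))    # unfiltered counter locks to the second harmonic
print(track_f0(x, fs))           # minimum selector gives roughly 100 c/s
```

On this synthetic tone the unfiltered counter reads close to 200 c/s, whereas the lowest channel carries a sufficiently pure fundamental and the minimum selector returns a value near 100 c/s.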

The success of this system, or of any other frequency-measuring system, will depend on the actual presence of a voice fundamental of an amplitude which may not be much less than that of any harmonic. Special attention has therefore been devoted to the initial stage for enhancing the voice fundamental. A few simple non-linear systems have been tested recently. Speech material of 10 seconds each from 4 male and [count illegible] female speakers was processed by the various methods. Narrow-band spectrograms of the results were studied and evaluated with regard to the relative intensity of the voice fundamental. The percentage of pitch periods which were judged to require only moderate filtering in the following stages was counted.

The following results were obtained:

Voice channel 50-3000 c/s input (per cent of pitch periods)

Method                          Male    Female
Direct                           49       85
Half-wave rectification          35, 67, 50, 47 (phase 1 and phase 2; assignment to phase and speaker group not recoverable)
Full-wave rectification          83       22
Rectified single side band       87       82

These results do not pertain to the overall performance of a complete F0-meter. The half-wave rectification is phase sensitive. The direct unprocessed speech provides the best material for female voices, which is due to the natural prominence of their fundamental. Full-wave rectification tends to produce a frequency multiplication. In these instances the second harmonic is highly boosted, which accounts for the low figure of merit, 22 %, for female voices. Rectified single side band provides the best results. The shortcomings of the single side-band processing are mostly due to instances in which the speech wave either was of low intensity or was dominated by the voice fundamental.
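The remark on frequency multiplication can be checked numerically. The sketch below is an illustration under stated assumptions, not a model of the circuits tested: the rectifiers are ideal, and a pure sinusoid stands in for a speech wave dominated by its fundamental. The function names are hypothetical.

```python
import math


def harmonic_amplitude(x, cycles, k):
    """Amplitude of the k-th harmonic of a signal spanning `cycles` whole periods."""
    n = len(x)
    re = sum(v * math.cos(2.0 * math.pi * k * cycles * i / n) for i, v in enumerate(x))
    im = sum(v * math.sin(2.0 * math.pi * k * cycles * i / n) for i, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / n


def half_wave(x):
    """Ideal half-wave rectifier: negative half-periods are removed."""
    return [max(v, 0.0) for v in x]


def full_wave(x):
    """Ideal full-wave rectifier: negative half-periods are inverted."""
    return [abs(v) for v in x]


cycles, samples_per_cycle = 10, 80
tone = [math.sin(2.0 * math.pi * i / samples_per_cycle)
        for i in range(cycles * samples_per_cycle)]

# Print the amplitude at the fundamental and at the second harmonic.
for name, y in (("direct", tone),
                ("half-wave", half_wave(tone)),
                ("full-wave", full_wave(tone))):
    print(name,
          round(harmonic_amplitude(y, cycles, 1), 3),
          round(harmonic_amplitude(y, cycles, 2), 3))
```

For this input the full-wave rectifier leaves no component at the fundamental and a second harmonic of about 0.42, while the half-wave rectifier retains half of the fundamental. This is consistent with the low full-wave figure for voices with a prominent fundamental.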

These findings conform with results from a theoretical analysis made by H. Fujisaki (1). At low signal levels the rectifier characteristic approximated a square-law function, which accounts for a second-power dependency of the amplitude of the modulation products on the input signal amplitude. In case the input signal has a flat spectrum envelope and the harmonics are linearly related in phase, it may be shown that the ratio of the fundamental to the second harmonic after the SSB-rectification becomes (N-1)/(N-2), where N is the number of harmonics present in the input. An input band consisting of two harmonics is thus optimal, and it has been shown that the presence of a formant structure will favorably influence the ratio of fundamental to second harmonic.

A separate low-pass channel plus frequency counter incorporated in a pitch meter should favorably supplement the part of the system fed from a single side-band input.
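The ratio (N-1)/(N-2) quoted above can be verified with a short numerical sketch. The assumptions are ours and not taken from the cited analysis: the SSB signal is represented by its complex envelope (a sum of N equal-amplitude harmonics with phases proportional to harmonic number), and the rectifier is an ideal square-law detector, so its output is the squared magnitude of that envelope. The function name and parameters are hypothetical.

```python
import cmath
import math


def difference_amplitudes(n_harmonics, first=1, phase_slope=0.0,
                          f0=100.0, fs=8000.0, periods=10):
    """Amplitudes at f0 and 2*f0 after square-law detection of an SSB band."""
    n_samples = int(round(periods * fs / f0))
    harmonics = range(first, first + n_harmonics)
    env_sq = []
    for n in range(n_samples):
        t = n / fs
        # Complex envelope: flat spectrum, phase linear in harmonic number.
        z = sum(cmath.exp(1j * (2.0 * math.pi * k * f0 * t + phase_slope * k))
                for k in harmonics)
        env_sq.append(abs(z) ** 2)             # square-law detector output

    def amplitude(freq):
        # Single-frequency Fourier sum over a whole number of periods.
        acc = sum(v * cmath.exp(-2j * math.pi * freq * n / fs)
                  for n, v in enumerate(env_sq))
        return 2.0 * abs(acc) / n_samples

    return amplitude(f0), amplitude(2.0 * f0)


for n_harm in (2, 3, 5):
    a1, a2 = difference_amplitudes(n_harm)
    print(n_harm, round(a1, 6), round(a2, 6))
```

Under these assumptions the component at d times F0 has amplitude 2(N-d), because N-d pairs of harmonics lie d harmonics apart and add in phase. With N = 5 the ratio of fundamental to second harmonic is 4/3, and with N = 2 the second-harmonic term vanishes, in agreement with the statement that a band of two harmonics is optimal.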

Other alternatives exist in parallel systems based on a different selection of the input voice band, e.g., high-pass filtering before the SSB-operation. Inverse-filtering methods might produce a raw material for frequency counting competing with the SSB-methods. A requirement is then that the base band in the F0-region shall be intact. Preliminary studies have given promising results but further studies are needed.

A. Risberg, A. Møller, H. Fujisaki

(1) Fujisaki, H.: "Theoretical studies on pitch extension and formant tracking", internal STL-report, Aug. 20 (1960).
