Quarterly Progress and Status Report. The OVE III speech synthesizer

Dept. for Speech, Music and Hearing

Quarterly Progress and Status Report

The OVE III speech synthesizer Liljencrants, J.

journal: STL-QPSR
volume: 8
number: 2-3
year: 1967
pages: 076-081

http://www.speech.kth.se/qpsr

STL-QPSR 2-3/1967

IV. SPEECH SYNTHESIS

A. THE OVE III SPEECH SYNTHESIZER*

J. Liljencrants

The OVE III speech synthesizer has been developed at the Speech Transmission Laboratory in 1966 and 1967. The basic terminal analog has a block diagram, Fig. IV-A-2, similar to that of the earlier OVE II system (1). The constructional details are however entirely different, especially since the synthesis parameters are digitally controlled from a small computer. The computer programs handle all communication between the operator and the synthesizer as well as other input/output devices.

General hardware layout

The overall equipment configuration used for synthesis is shown in Fig. IV-A-1. The computer is a CDC 1700 with an 8K, 16 bit memory with 1.1 µsec cycle time. Attached to it is a disk storage and a general I/O interface. This interface handles digital transmission between the computer and the laboratory built hardware. It also gives analog output to the monitor oscilloscope. The control table contains pushbuttons with indicator lamps and a number of rotary knobs coupled to digital encoders. Natural and synthetic speech samples may be compared using a filter bank spectrum analyzer with digitized output (2).

Terminal analog synthesizer

The outlines of the synthesizer are given in Fig. IV-A-2. Some of its key elements may be described closer:

Control amplifiers/attenuators

The control principle is similar to that of the MIT synthesis system (3), i.e. the analog circuits are directly controlled in a step-wise manner by digital signals. The complete absence of analog control signals is a major promotor of long-time stability. As a consequence,

* Paper to be presented at the 1967 Conference on Speech Communication and Processing, Cambridge, Mass., Nov. 6-8, 1967.

Fig. IV-A-1. Block diagram of complete synthesis setup.

Fig. IV-A-3. a. One stage of a gain control module. Each module contains three cascaded stages. b. Formant (pole) circuit with frequency and bandwidth controls. c. Pitch pulse generator. d. Antiformant (zero) circuit.


Pitch pulse generator

The oscillator shown in Fig. IV-A-3c contains a large hysteresis trigger. Its square wave output is integrated. The resultant triangular wave is fed to the control amplifier and returned to the trigger. As soon as one of the trigger levels is reached the slope will be reversed. At point B the peak to peak voltage then is constant and equal to the trigger hysteresis. Thus at point C it is proportional to the inverse of the control gain. Here instead the waveform slope is constant. It is now easily understood that the frequency must be proportional to the gain A.

The square wave starts a 100 µsec one-shot whose output is used for vowel and nasal excitation after appropriate analog pulse shaping. Part of this shaping is done in the "KH" network which corrects the spectrum for the lacking influence of F5 and higher formants not simulated in the vowel branch (see Ref. (4)).
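The proportionality between pitch frequency and control gain stated for the oscillator can be written out in a few steps. This is only a sketch: the symbols $V_H$ (trigger hysteresis), $s$ (magnitude of the constant slope at point C) and $T$ (oscillation period) are names introduced here for illustration, and a symmetric triangular wave with equal rising and falling slopes is assumed.

$$
V_{C,\mathrm{pp}} = \frac{V_H}{A}, \qquad
\frac{T}{2} = \frac{V_{C,\mathrm{pp}}}{s} = \frac{V_H}{s\,A}, \qquad
f = \frac{1}{T} = \frac{s}{2\,V_H}\,A .
$$

With $s$ and $V_H$ fixed by the circuit, the frequency $f$ scales linearly with the gain $A$, which is what allows a digitally stepped gain to set the pitch.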

Digital buffer

The digital control signals are buffered in 14 flip-flop registers with 6 bits each. These are in turn loaded from the computer one at a time using a de-multiplexer. The computer output to load one register will then have to be 10 bits in parallel. Four of these are interpreted as an address or parameter name while the other six give the parameter value. With this arrangement it is necessary to transmit data only when a parameter has to be changed.
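The register addressing described above can be illustrated with a small Python sketch. The text gives only the field widths (4 address bits, 6 value bits) and the address codes 1 to 14; the placement of the address in the upper four bits, and the names `pack_word` and `RegisterBank`, are assumptions made here for illustration and do not describe the actual interface.

```python
from typing import Dict, Optional

ADDRESS_BITS = 4   # selects one of the 14 registers (from the text)
VALUE_BITS = 6     # parameter value held by each register (from the text)


def pack_word(address: int, value: int) -> int:
    """Pack an address and a value into one 10-bit output word.

    Assumed layout: address in the upper 4 bits, value in the lower 6.
    """
    if not 1 <= address <= 14:
        raise ValueError("address code must be in 1..14")
    if not 0 <= value < (1 << VALUE_BITS):
        raise ValueError("value must fit in 6 bits")
    return (address << VALUE_BITS) | value


class RegisterBank:
    """Software mirror of the flip-flop registers (hypothetical helper)."""

    def __init__(self) -> None:
        self.registers: Dict[int, int] = {}  # address -> last value sent

    def update(self, address: int, value: int) -> Optional[int]:
        """Return the word to transmit, or None if the value is unchanged."""
        if self.registers.get(address) == value:
            return None
        word = pack_word(address, value)
        self.registers[address] = value
        return word


bank = RegisterBank()
print(bank.update(2, 10))  # first write is transmitted
print(bank.update(2, 10))  # unchanged value needs no transmission
```

Because each register holds its value until it is overwritten, only changes have to be sent, which is the point made in the text.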

Control parameters

In Table IV-A-I the synthesizer control parameters are listed together with specifications.

Table IV-A-I. OVE III control parameters.

Name  Address code  Data bits no.  Increment  Range  Remarks
F0    1             0-5                              Pitch fundamental
F1    2             0-5                              Vowel formants
F2    3             0-5
F3    4             0-5
A0    5             0-5                              Vowel level
AC    6             3-5                              Fricative level
AH    5             0-2                              Aspirative level
AN    7             4-5                              Nasal level
FN    7             0-3                              Nasal formant
FH    8             2-5                              F4 and part of KH
K0    9             0-5                              Fricative antiformant
K1    10            0-5                              Fricative formants
K2    11            0-5
B1    12            1-2                              Vowel formant bandwidths
B2    12            4-5
B3    12            0
B4    12            3
01    13            0-5                              For optional addenda to the circuits
02    14            0-5

Control program

The present first version of the synthesis control program is primarily intended for initial synthesis strategy evaluation. The data input is thus not yet automatized but taken from the typewriter. To ease the manual operations required a special language has been developed.

All control parameters for a specific time instant are packed into 8 machine words in the computer. This sample also contains information used by the program. Especially there is a pseudo-parameter "DR" giving the time duration between the present sample and the previous one. Later, in controlling the synthesizer, the program interpolates linearly between adjacent samples for each time increment. This way a comparatively small number of samples has to be specified, typically two or three per phoneme in connected synthetic speech.

To build up a sample string the operator can choose among a set of one-character control statements preceded by pertinent sample numbers.

Among the operations possible are the insertion or removal of single or groups of samples in the string, listing and alteration of parameters within a sample, and a number of copying operations. Standard sample groups can be loaded by typing pseudo-phonetic characters.

Since all essential I/O equipment is operated using the computer interrupt system the building up of the sample string can be monitored continuously from the oscilloscope.
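The linear interpolation between adjacent samples mentioned earlier in this section can be sketched in Python as follows. The report does not give the size of a time increment, the unit of DR, or the rounding rule, so this sketch assumes that DR counts time increments and that interpolated values are rounded to the nearest integer; the names `Sample` and `interpolate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class Sample:
    dr: int                 # duration since the previous sample, in time increments (assumed unit)
    params: Dict[str, int]  # control parameter values at this instant


def interpolate(samples: List[Sample]) -> Iterator[Dict[str, int]]:
    """Yield one frame of parameter values per time increment.

    Assumes every sample specifies the same set of parameters.
    """
    if not samples:
        return
    yield dict(samples[0].params)
    for prev, cur in zip(samples, samples[1:]):
        for step in range(1, cur.dr + 1):
            frac = step / cur.dr
            yield {
                name: round(prev.params[name] + frac * (cur.params[name] - prev.params[name]))
                for name in cur.params
            }


# Two samples, four time increments apart, are enough to describe a glide.
string = [Sample(dr=0, params={"F1": 10}), Sample(dr=4, params={"F1": 30})]
for frame in interpolate(string):
    print(frame)
```

Only the end points of each transition are stored; the intermediate frames are generated at output time, which is why a few samples per phoneme can suffice.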

It is also possible to make the computer repetitively output control to the synthesizer in time-sharing with these functions. This gives an immediate auditory feedback to the operator.

From the manual control table the operator can select parameters to be displayed and move the plot along the time axis with a rotary knob.

With another knob he can select points in time to get the corresponding output to the synthesizer. Simultaneously the output spectrum is analyzed and displayed on the screen. The plots may be recorded for future reference, see Fig. IV-A-4.

The oscilloscope plot can also be changed to show sections and comparisons for synthetic and natural sounds, Fig. IV-A-5. The natural reference spectra are then pre-recorded on the disk storage using the spectrum analyzer.

The time coordinate is again taken from the control table knobs.

Fig. IV-A-4. Computer plot of some control parameters for the synthesis of /sa:/. Characters indicate: left, parameter names; top, time (seconds); right, frequency (kHz); and bottom, sample number.

The programs occupy approximately half the computer memory while the rest is available for storage of samples. This memory space is sufficient for approximately 30 seconds of speech corresponding to a storage bit rate of about 2,000 bits/sec. Considering the scheme used this is a high figure, due partly to inefficient packing oriented to simplify the programming. Also, to preserve unlimited flexibility at this initial stage of development there is a big redundancy in the data stored. Some future work will be devoted to minimizing the storage volume needed, important in computer vocal response applications.
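The quoted storage figures can be checked with a few lines of arithmetic. The sketch below assumes that "8K" means 8192 words and that exactly half of the memory holds samples; the report only says "approximately half", so the results are rough estimates.

```python
MEMORY_WORDS = 8 * 1024   # 8K memory, assumed to mean 8192 words
WORD_BITS = 16            # word length given in the text
WORDS_PER_SAMPLE = 8      # one sample occupies 8 machine words
SPEECH_SECONDS = 30       # approximate capacity given in the text

sample_area_words = MEMORY_WORDS // 2               # assumed: exactly half for samples
storage_bits = sample_area_words * WORD_BITS        # bits available for samples
bit_rate = storage_bits / SPEECH_SECONDS            # bits per second of speech
sample_capacity = sample_area_words // WORDS_PER_SAMPLE
samples_per_second = sample_capacity / SPEECH_SECONDS

print(f"storage: {storage_bits} bits, about {bit_rate:.0f} bits/sec")
print(f"capacity: {sample_capacity} samples, about {samples_per_second:.1f} samples/sec")
```

Under these assumptions the estimate comes out slightly above 2,000 bits/sec, in line with the figure stated in the text.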

References:

(1) Fant, G. and Mártony, J.: "Instrumentation for Parametric Synthesis (OVE II)", STL-QPSR No. 2/1962, pp. 18-24.

(2) Liljencrants, J.: "A Filter Bank Speech Spectrum Analyzer", paper A27 in Proc. Vth ICA, Liège, Sept. 1965.

(3) Tomlinson, R. S.: "SPASS - an Improved Terminal Analog Speech Synthesizer", MIT, RLE, QPR No. 80, Jan. 1966.

(4) Fant, G.: "Acoustic Analysis and Synthesis of Speech with Applications to Swedish", Ericsson Technics, 15, No. 1 (1959), pp. 3-108.