Learning Pronunciation Rules for English Graphemes Using the Version Space Algorithm

Howard J. Hamilton and Jian Zhang

Dept. of Computer Science, University of Regina
Regina, Saskatchewan, Canada, S4S 0A2
[email protected]

Abstract
Pronunciation rules can be learned from word-pronunciation pairs using the Version Space Algorithm. The LEP-G (Learning English Pronunciation for Graphemes) program, written in Prolog, has learned a significant portion of English pronunciation rules.

1 Introduction

We describe a technique for learning pronunciation rules based on the Version Space algorithm. In particular, we describe how to learn pronunciation rules for a representative subset of the English graphemes. The present work is part of an overall project (LEPW) to learn how to translate English words to the International Phonetic Alphabet (IPA), a system of symbols representing all individual sounds that occur in any spoken human language [6]. The task of learning to translate English words to IPA symbols involves recognizing the graphemes of a word, separating the word into syllables, distinguishing open and closed syllables, classifying the stresses of each syllable in a word, learning pronunciation rules for graphemes, and accumulating the pronunciation rules that have been learned [8]. In this paper, we present a learning procedure called LEP-G (Learning English Pronunciation for Graphemes) that learns English pronunciation rules from examples in the form of word-pronunciation pairs. With the pronunciation rules obtained by LEP-G, an English-to-IPA translator can translate not only existing English words found in dictionaries but also new words, such as tuple, pixel, and deque, which are not found in dictionaries. The approach is equally applicable to other phonetic languages; in this paper, we concentrate on learning pronunciation rules for English.


Related work includes [1] and [3], both of which use pronunciation rules to drive English-to-IPA translators. These rules were accumulated via a manual or semi-automated process. We are attempting to automate this process completely using symbolic machine learning techniques. In [7], the same problem is tackled using neural network techniques, with good results.

A phoneme is the smallest unit of a spoken language that distinguishes meanings. A grapheme is a sequence of one or more letters that represents a single phoneme. For example, the graphemes in the word watch are w, a, and tch. We use the IPA to represent phonemes. For example, cat (pronounced [c, ash, t]) and cut (pronounced [c, invert v, t]) are distinguished by a difference in their middle phonemes. Graphemes in English may have one, two, or three letters; e.g., ght is a three-letter grapheme. Some graphemes represent vowel sounds, while others represent consonant sounds. An open syllable ends with a vowel, and a closed syllable ends with a consonant [4]. The syllable date in the word candidate is an open syllable, and the syllable can in the same word is a closed syllable.

We selected the following 12 graphemes for the learning experiment: a, e, i, o, u, b, c, d, ar, au, or, and gh. This subset was chosen to include all the single-vowel graphemes, which are the most difficult graphemes for which to choose a pronunciation, because each of the vowel graphemes represents more than one sound, while most of the consonant graphemes represent only one sound. There are three consonant graphemes in the list, which were chosen alphabetically. We also chose four two-letter vowel graphemes and one two-letter consonant grapheme.

For each grapheme, there is at least one corresponding IPA symbol. The main idea is to capture the relationship between a grapheme and one of its IPA symbols and to record this relationship in rule form. A relationship is described as a set of conditions on the syllable containing the grapheme. The LEP-G learning algorithm is based on the Version Space algorithm (VSA) [5] with our modifications.

For each grapheme and its target IPA symbol, we input a set of positive and negative examples. LEP-G chooses, from the version space, the single hypothesis that is consistent with these examples, and this hypothesis becomes one of the pronunciation rules.


2 A Descriptive Example

Suppose we want to learn when to pronounce the grapheme a with the sound denoted by the IPA symbol [ei] from a series of positive and negative examples. In this case, a positive example is a word that contains the grapheme a and whose IPA symbol for it is [ei]; a negative example is a word that contains the grapheme a but whose IPA symbol for it is not [ei]. The words cake, name, and ate provide positive examples, and the words map, banana, and cat provide negative examples. We restrict the set of negative examples to words that include the grapheme to be pronounced; i.e., house is not a negative example for learning to pronounce the grapheme a because house does not contain a. The format of the examples is:

[IPA, GrB, GrA, SylType, PartOfS, Stress, NumSyl]

where IPA is the IPA symbol to be learned, GrB is the grapheme before the target grapheme, GrA is the grapheme after the target grapheme, SylType tells whether the syllable is open or closed, PartOfS identifies the part of speech (e.g., noun or verb), Stress tells whether the syllable is stressed or unstressed, and NumSyl is the number of syllables in the word. The word map has the input form:

[ash, m, p, closed, noun, stressed, one-syllable]

There is no indication in the input form of whether an example is positive or negative. Whether an example is positive is determined by the learning procedure, depending on the learning task; e.g., map is a positive example for learning to pronounce a as [ash], but a negative example for learning to pronounce a as [ei]. The hypothesis space for this learning problem is the set of tuples containing the seven fields given above, where each field contains either a specific value or a question mark (?). A specific value in a field denotes a required condition for that field, whereas a question mark means no restriction. Let Pi denote a positive example and Ni denote a negative example, where i is an index over the number of positive or negative examples. The mixed stream of positive and negative examples is: P1, P2, N1, P3, N2, P4, N3.

cake: [ei, c, k, open, noun, stressed, one-syllable]
name: [ei, n, m, open, noun, stressed, one-syllable]
map: [ash, m, p, closed, noun, stressed, one-syllable]
ate: [ei, empty, t, open, verb, stressed, one-syllable]
banana: [schwa, b, n, open, noun, unstressed, multi-syllable]
candidate: [ei, d, t, open, noun, stressed, multi-syllable]
cat: [ash, c, t, closed, noun, stressed, one-syllable]
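To make this representation concrete, each example can be encoded as a fixed-length tuple in which the symbol ? marks an unconstrained field. The sketch below is our own illustration in Python (LEP-G itself is a Prolog program; names such as ANY and covers are ours, not the paper's); it encodes the example stream above and shows the test of whether a hypothesis covers an example.

    # Illustrative sketch only; field order follows the paper's
    # [IPA, GrB, GrA, SylType, PartOfS, Stress, NumSyl] format.
    ANY = "?"  # unconstrained field

    # The example stream P1, P2, N1, P3, N2, P4, N3 from above,
    # tagged with whether each word is positive for the target [ei].
    examples = [
        ("cake",      True,  ("ei",    "c",     "k", "open",   "noun", "stressed",   "one-syllable")),
        ("name",      True,  ("ei",    "n",     "m", "open",   "noun", "stressed",   "one-syllable")),
        ("map",       False, ("ash",   "m",     "p", "closed", "noun", "stressed",   "one-syllable")),
        ("ate",       True,  ("ei",    "empty", "t", "open",   "verb", "stressed",   "one-syllable")),
        ("banana",    False, ("schwa", "b",     "n", "open",   "noun", "unstressed", "multi-syllable")),
        ("candidate", True,  ("ei",    "d",     "t", "open",   "noun", "stressed",   "multi-syllable")),
        ("cat",       False, ("ash",   "c",     "t", "closed", "noun", "stressed",   "one-syllable")),
    ]

    def covers(hypothesis, example):
        """A hypothesis covers an example if every constrained field matches."""
        return all(h == ANY or h == e for h, e in zip(hypothesis, example))

    # The rule learned in this section covers all positives and no negatives.
    rule = ("ei", ANY, ANY, "open", ANY, "stressed", ANY)
    assert all(covers(rule, ex) == positive for _, positive, ex in examples)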

Figure 1: Initial Version Space. GH1 = [?,?,?,?,?,?,?]; SH1 = [ei,c,k,open,noun,stressed,one-syllable].

When the VSA is applied to this problem, the initial general hypothesis is [?, ?, ?, ?, ?, ?, ?]. The first positive example, P1 = [ei, c, k, open, noun, stressed, one-syllable], is used as the initial specific hypothesis. For simplicity, we assume that the first example will always be positive. The initial version space is shown in Figure 1.

The second example is P2 = [ei, n, m, open, noun, stressed, one-syllable], which indicates that the current specific hypothesis (SH1), [ei, c, k, open, noun, stressed, one-syllable], should be generalized because it is too specific to include this positive example. We make minimal changes to the old specific hypothesis to create a more general hypothesis of which both P1 and P2 are instances. In this case, we change only the second and third fields of SH1, because P1 and P2 differ in only these fields. This generalization is indicated with an arrow from SH1 to SH2 at the bottom of Figure 2, which shows all of the version space that is examined for this learning problem.

The next example is N1 = [ash, m, p, closed, noun, stressed, one-syllable], which requires us to make the general hypothesis more specific. To specialize the hypothesis space, we not only require that changes be minimal, but also that a new general hypothesis is not a generalization of the negative example N1, or of any other general hypothesis, and that a new general hypothesis has no field which is more specific than the corresponding field of the current specific hypothesis. Thus, the three conditions under which a new, more specific hypothesis is not generated from field i are:

Condition 1: the ith fields of Nj and the specific hypothesis are the same.
Condition 2: the ith fields of the current general hypothesis and the specific hypothesis are the same.
Condition 3: the ith field of Nj is equal to ?.

If none of these conditions applies, we generate a new general hypothesis from the ith field by changing the ith field of the current general hypothesis to match the ith field of the specific hypothesis.
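The three conditions translate directly into a field-by-field test. Continuing the illustrative Python sketch (again with our own names, not the paper's Prolog), the specialization step for one negative example can be written as follows; borrowing a field value from the current specific hypothesis mirrors the construction just described.

    ANY = "?"

    def specialize(general_h, specific_h, negative):
        """Minimally specialize general_h so that it excludes the negative example.

        For each field i, a new general hypothesis is produced unless one of the
        three conditions above applies; the new hypothesis copies general_h and
        sets field i to the value of the current specific hypothesis.
        """
        new_hypotheses = []
        for i, (g, s, n) in enumerate(zip(general_h, specific_h, negative)):
            if n == s:      # Condition 1: negative and specific hypothesis agree here
                continue
            if g == s:      # Condition 2: general and specific hypothesis agree here
                continue
            if n == ANY:    # Condition 3: the negative example leaves this field open
                continue
            candidate = list(general_h)
            candidate[i] = s
            new_hypotheses.append(tuple(candidate))
        return new_hypotheses

    # Processing N1 against GH1 and the current specific hypothesis SH2
    # yields GH2 = [ei,?,?,?,?,?,?] and GH3 = [?,?,?,open,?,?,?].
    GH1 = (ANY,) * 7
    SH2 = ("ei", ANY, ANY, "open", "noun", "stressed", "one-syllable")
    N1  = ("ash", "m", "p", "closed", "noun", "stressed", "one-syllable")
    print(specialize(GH1, SH2, N1))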

Figure 2: Final Version Space. The specific hypotheses form the chain SH1 = [ei,c,k,open,noun,stressed,one-syllable] (from P1), SH2 = [ei,?,?,open,noun,stressed,one-syllable] (after P2), SH3 = [ei,?,?,open,?,stressed,one-syllable] (after P3), and SH4 = [ei,?,?,open,?,stressed,?] (after P4). The general hypotheses GH1-GH12 are generated from the negative examples N1-N3; hypotheses that are more specific than SH4, more general than GH9 and GH11, or identical to GH9 are pruned away. The solution is [ei,?,?,open,?,stressed,?].

Let us compare N1 with each of SH2 and GH1, field by field. The first field of N1 is ash, of SH2 is ei, and of GH1 is ?. None of the three conditions applies here, and therefore a new general hypothesis GH2 = [ei, ?, ?, ?, ?, ?, ?] is generated by replacing the ? of GH1 with ei and copying all other fields. The second field of N1 is m, but the second field of SH2 is ?. By the second condition, no new general hypothesis is produced from this field. The third fields of N1, SH2, and GH1 also satisfy condition 2, and therefore no new general hypothesis is generated. The fourth field of N1 is closed, of SH2 is open, and of GH1 is ?. This situation does not match any of the three conditions, so a new hypothesis GH3 = [?, ?, ?, open, ?, ?, ?] is generated by replacing the ? of GH1 with open and copying all the rest of the fields. The last three fields of N1 and SH2 are the same, which satisfies condition 1, and therefore no new hypotheses are generated from these fields.

As further versions of hypotheses are generated, LEP-G prunes away some hypotheses. For example, GH10 is a more general hypothesis than GH9, because the sixth field of GH10 is ?, while that of GH9 is stressed, and therefore GH10 is pruned away. Such pruning is needed only on the few occasions when new general hypotheses are more general than other new general hypotheses. LEP-G continues generating new versions of hypotheses until all the examples have been examined. The final version space is the one already shown in Figure 2. The pronunciation rule found is: if a occurs within an open, stressed syllable, it should be pronounced [ei].
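The pruning test used here, discarding a general hypothesis that is more general than another, needs only a field-wise comparison. A possible sketch in the same illustrative Python style (not the paper's implementation):

    ANY = "?"

    def more_general_or_equal(h1, h2):
        """True if h1 covers every tuple that h2 covers (field-wise check)."""
        return all(a == ANY or a == b for a, b in zip(h1, h2))

    def strictly_more_general(h1, h2):
        return more_general_or_equal(h1, h2) and h1 != h2

    # GH10 is pruned because it is more general than GH9: its sixth field
    # is ? where GH9 requires stressed.
    GH9  = ("ei", ANY, ANY, "open", ANY, "stressed", ANY)
    GH10 = ("ei", ANY, ANY, "open", ANY, ANY,        ANY)
    assert strictly_more_general(GH10, GH9)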

Figure 3: Upper Portion of the Final Version Space with MVSA. The general hypotheses are GH1 = [ei,?,?,?,?,?,?], GH2 = [ei,?,?,open,?,?,?], GH3 = [?,?,?,open,?,?,?], GH4 = [ei,?,?,open,?,?,one-syllable], and GH5 = [ei,?,?,open,?,stressed,?], which is the solution.

Our explanation so far uses the ordinary VSA. However, general hypotheses beginning with ? are never used in the final solution, because the first field matches any negative example. If we use a modified Version Space algorithm (MVSA) that restricts the initial general model to [ei, ?, ?, ?, ?, ?, ?] and then perform the operations for the examples as above, the final version space is as shown in Figure 3. Since the specific hypothesis space is the same as before, it is not shown in Figure 3. Note that both methods have converged to the same hypothesis. The exact sequence of hypotheses for MVSA is somewhat different from the leftmost chain in Figure 2, because each negative example interacts with a different level of general hypothesis.

Some of the advantages of the modified VSA algorithm are clear from the examples just given. With MVSA, the version space is smaller than with VSA: fewer nodes (5 instead of 12) are present and the graph is less complicated. As well, fewer levels are required in the version space (4 instead of 5), and the maximum width of the version space, which measures the maximum storage requirement, is reduced (2 instead of 5). Although the gains are small with this simple example, they become correspondingly greater with more complicated version spaces.

3 Method

Our method is the Version Space algorithm [5] with three modifications. First, we avoid generating hypotheses with an unconstrained value for the IPA symbol by choosing [IPA, ?, ?, ..., ?] as the most general hypothesis, instead of [?, ?, ?, ..., ?] (see Section 2). Secondly, we restrict the number of specific hypotheses to one.

When the current example is positive, generalization takes place and a new set of specific hypotheses is generated. Each new specific hypothesis must be minimally changed from the old one, and it must also be a generalization of the input positive example. Suppose the current specific hypothesis is [ei, m, d, open, verb, stressed, one-syllable] and the input positive example is [ei, c, k, open, noun, stressed, one-syllable]. If minimal changes are made, the possible new specific hypotheses are:

[ei, ?, d, open, verb, stressed, one-syllable]
[ei, m, ?, open, verb, stressed, one-syllable]
[ei, m, d, open, ?, stressed, one-syllable]

none of which is a generalization of the input positive example. Therefore, the minimally changed specific hypothesis for this example is [ei, ?, ?, open, ?, stressed, one-syllable]: wherever differences exist (here in the second, third, and fifth fields), we must generalize the field to ?. Thus, it is impossible to have more than one specific hypothesis at a time. For this reason, we simplify the VSA by omitting all operations involving more than one specific hypothesis, such as pruning away new specific hypotheses that are generalizations of other specific hypotheses.

Thirdly, we avoid pruning specific hypotheses that match a negative example. Since, by the first modification, the initial most general hypothesis is of the form [IPA, ?, ?, ?, ?, ?, ?], the IPA value of all relevant specific hypotheses is fixed. Also, by definition, all negative examples must have an IPA value different from that in the positive examples. Thus, specific hypotheses will never match any negative examples, and we need not prune for this case.
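Because there is only one specific hypothesis, the generalization step for a positive example reduces to replacing every differing field with ?. In the same illustrative Python style (our names, not the paper's), the step and the worked example above look like this:

    ANY = "?"

    def generalize(specific_h, positive):
        """Minimally generalize the single specific hypothesis to cover the positive example."""
        return tuple(s if s == p else ANY for s, p in zip(specific_h, positive))

    # The example from the text: the differing second, third, and fifth fields become ?.
    sh  = ("ei", "m", "d", "open", "verb", "stressed", "one-syllable")
    pos = ("ei", "c", "k", "open", "noun", "stressed", "one-syllable")
    assert generalize(sh, pos) == ("ei", ANY, ANY, "open", ANY, "stressed", "one-syllable")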

Grapheme | Run | IPA Symbol Learned | Grapheme Before | Grapheme After | Openness | Part of Speech | Stress of Syllable | # of Syllables | Max # of Hypotheses (MVSA) | Max # of Hypotheses (VSA) | Type of Solution
a  |  1 | ei       | ?     | ?     | open   | ? | stressed | ? | 2 | 5  | unique
a  |  2 | ash      | ?     | ?     | closed | ? | stressed | 1 | 2 | 6  | m.u.b.
e  |  3 |          | ?     | ?     | closed | ? | stressed | 1 | 2 | 3  | s.u.b.
e  |  4 | ii       | ?     | empty | open   | ? | stressed | 1 | 6 | 20 | m.u.b.
i  |  5 | i        | ?     | ?     | closed | ? | ?        | ? | 1 | 2  | unique
i  |  6 | ai       | ?     | ?     | open   | ? | stressed | ? | 1 | 2  | unique
o  |  7 | open o   | ?     | ?     | closed | ? | stressed | 1 | 1 | 2  | s.u.b.
o  |  8 | o        | ?     | ?     | open   | ? | stressed | ? | 2 | 3  | s.u.b.
u  |  9 | inv      | ?     | ?     | closed | ? | stressed | 1 | 2 | 4  | s.u.b.
u  | 10 | u        | ?     | ?     | open   | ? | stressed | 1 | 1 | 2  | s.u.b.
b  | 11 | b        | ?     | ?     | ?      | ? | ?        | ? | 1 | 1  | unique
b  | 12 | silent   | m     | empty | closed | ? | ?        | ? | 3 | 12 | m.u.b.
c  | 13 | k        | empty | u     | ?      | ? | ?        | ? | 3 | 4  | unique
c  | 14 | k        | ?     | l     | ?      | ? | stressed | ? | 2 | 3  | unique
c  | 15 | s        | ?     | ?     | open   | ? | ?        | ? | 2 | 3  | unique
d  | 16 | d        | ?     | ?     | ?      | ? | ?        | ? | 1 | 1  | unique
ar | 17 | ar       | ?     | ?     | ?      | ? | stressed | ? | 1 | 2  | unique
au | 18 | ash      | l     | gh    | ?      | ? | stressed | ? | 1 | 2  | s.u.b.
or | 19 | open o r | ?     | ?     | ?      | ? | stressed | ? | 1 | 2  | unique
gh | 20 | silent   | ?     | empty | closed | ? | ?        | ? | 3 | 4  | unique

Table 1: Summary of Results and Statistics

4 Results

When implemented in Prolog, LEP-G learned to translate 12 English graphemes into IPA symbols and to generate 20 pronunciation rules. Table 1 summarizes the results of running the LEP-G program. In this table, the 1st column gives the grapheme, the 2nd column gives the run number, and the remaining columns give the learned values for the tuple fields. The 1st line of the table corresponds to the example discussed in Section 2, i.e., the rule that when the grapheme a occurs in an open, stressed syllable it is pronounced [ei]. Complete output and a diagram of the version space for each example are presented in [2].

Table 1 also gives statistics for the results. The 10th column gives the maximum number of hypotheses. The 11th column records the number of active examples, i.e., examples which cause changes to the version space. The last column tells what type of solution we obtained, where unique indicates that a single hypothesis was found, s.u.b. indicates that the solution hypothesis space has a single upper bound, and m.u.b. indicates that the solution hypothesis space has multiple upper bounds.

Let us now discuss the most interesting result obtained, which is based on Runs 11 and 12, where LEP-G attempted to produce a pronunciation rule for the grapheme b, which has two possible pronunciations:

Case 1: [b] in basic, cube, rub, blue, blackboard, and abacus
Case 2: [silent] in aplomb, bomb, climb, comb, thumb, and coxcomb

For Run 11, we used Case 1 as the positive examples and Case 2 as the negative examples. The result is [b, ?, ?, ?, ?, ?, ?], which is overly general because it matches all the words in Case 2, where b should be pronounced [silent]. Although, according to the VSA, [b, ?, ?, ?, ?, ?, ?] does not match the negative examples (since all negative examples have the value [silent] in the first field), it is not a correct pronunciation rule for the grapheme b when b follows the grapheme m. In Run 12, we used Case 2 as the positive examples and Case 1 as the negative examples. The solution is [silent, m, empty, closed, ?, ?, ?], which means that if b follows m and nothing follows it, then b is silent.

On the basis of this result, we concluded that LEP-G should accumulate all the rules as the learning process progresses and rearrange them according to their specificity. More specific rules should have higher priority. In general, among the pronunciation rules for a grapheme, we expect the most specific rule to be for the IPA symbol that has the fewest occurrences (for that grapheme) in the dictionary. When pronunciation rules are used in an English-to-IPA translator, the rule which has more fields with specific values should have higher priority, as sketched below.
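One way to realize this ordering is to sort the accumulated rules by the number of constrained fields and apply the first rule that matches. The sketch below is our own illustration (Python rather than the authors' Prolog; the helper names and the thumb example are ours), using the two rules learned for b in Runs 11 and 12.

    ANY = "?"

    def specificity(rule):
        """Count the constrained (non-?) condition fields; rule[0] is the IPA symbol."""
        return sum(1 for field in rule[1:] if field != ANY)

    def pronounce(context, rules):
        """Apply the most specific matching rule first. The context holds the six
        condition fields [GrB, GrA, SylType, PartOfS, Stress, NumSyl]."""
        for rule in sorted(rules, key=specificity, reverse=True):
            if all(c == ANY or c == v for c, v in zip(rule[1:], context)):
                return rule[0]  # IPA symbol of the winning rule
        return None

    # The two rules learned for the grapheme b.
    rules_for_b = [
        ("b",      ANY, ANY,     ANY,      ANY, ANY, ANY),
        ("silent", "m", "empty", "closed", ANY, ANY, ANY),
    ]

    # In thumb, b follows m, nothing follows it, and the syllable is closed,
    # so the more specific rule fires and b is silent.
    print(pronounce(("m", "empty", "closed", "noun", "stressed", "one-syllable"), rules_for_b))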

5 Conclusions

So far, LEP-G has learned 12 graphemes and produced 20 pronunciation rules from 20 groups of English words. There are a total of 64 different graphemes in English. The experiment in which LEP-G learned pronunciation rules for 12 graphemes strongly suggests that learning the other 52 graphemes is feasible. As well, learning pronunciation rules for all of English seems possible, because all English words can be decomposed into individual graphemes. The learning program LEP-G does not store a pronouncing dictionary in its database; instead, it accumulates the pronunciation rules that it has learned. This approach is space efficient and allows the pronunciation of unseen words.

References

[1] H. Elovitz, R. Johnson, A. McHugh, and J. Shore. Letter-to-sound rules for automatic translation of English text to phonetics. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-24:446-459, 1976.

[2] H. J. Hamilton and J. Zhang. Learning pronunciation rules for English graphemes using the Version Space algorithm. Technical Report 93-2, Department of Computer Science, University of Regina, Regina, Sask., December 1993. 107 pages.

[3] D. H. Klatt. The Klattalk text-to-speech system. In Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pages 1589-1592, 1982.

[4] Ian R. A. Mackay. Phonetics: The Science of Speech Production. Pro-Ed, Austin, Texas, 1987.

[5] T. M. Mitchell. Generalization as search. Artificial Intelligence, 18:203-226, 1982.

[6] Geoffrey K. Pullum and William A. Ladusaw. Phonetic Symbol Guide. The University of Chicago Press, Chicago, 1986.

[7] T. Sejnowski and C. Rosenberg. Parallel networks that learn to pronounce English text. Complex Systems, 1:145-168, 1987.

[8] Jian Zhang. Automatic learning of English pronunciation for words. Graduate Student Scholarly Research, Proceedings Supplement, 1993.
