Weighted Finite-State Transducers
Alexander Seward, Centre for Speech Technology, KTH, Stockholm, Sweden
Why Weighted Finite-State Transducers?

1. Efficiency and generality of classical automata algorithms
   - Efficient algorithms for a variety of problems (e.g. string matching, compilers, parsing, pattern matching, process industry, design of control systems in aircraft).
   - General algorithms: rational operations, optimizations.
2. Weights
   - Handling uncertainty: text, handwritten text, speech, image, biological sequences.
   - Increased generality: finite-state transducers, multiplicity/nondeterminism.
3. Applications
   - Text: pattern matching, indexation, compression.
   - Speech: large-vocabulary speech recognition, speech synthesis.
   - Image: image compression, filters.

(Credits to M. Mohri)
[Figure: speech passes through a DECODER equipped with an FST, producing recognized speech.]
What is a Weighted Finite-State Transducer (WFST)?

Example arcs: A→B / 0.2, B→A / 0.6, D→E / 0.8

A finite-state machine where each arc is a weighted transduction consisting of an input, an output, and a probability/weight. Simply put: a translation device. A WFSA (weighted finite-state acceptor) is a transducer without output.
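The definition above can be made concrete with a minimal sketch (class and method names are illustrative, not from any WFST library): arcs carry an input symbol, an output symbol, and a weight; in the tropical semiring, weights along a path add and alternative paths combine with min().

```python
import math

class WFST:
    def __init__(self):
        self.arcs = {}      # state -> [(in_sym, out_sym, weight, next_state)]
        self.start = 0
        self.finals = {}    # state -> final weight

    def add_arc(self, src, isym, osym, weight, dst):
        self.arcs.setdefault(src, []).append((isym, osym, weight, dst))

    def transduce(self, inputs):
        """Outputs and total weight of the cheapest accepting path
        for the given input sequence (brute-force search)."""
        best = (None, math.inf)
        stack = [(self.start, 0, [], 0.0)]   # state, position, outputs, weight
        while stack:
            state, pos, out, w = stack.pop()
            if pos == len(inputs):
                if state in self.finals and w + self.finals[state] < best[1]:
                    best = (out, w + self.finals[state])
                continue
            for isym, osym, aw, dst in self.arcs.get(state, []):
                if isym == inputs[pos]:
                    stack.append((dst, pos + 1, out + [osym], w + aw))
        return best

t = WFST()
t.add_arc(0, "A", "B", 0.2, 1)   # the example arcs from the slide
t.add_arc(1, "B", "A", 0.6, 2)
t.add_arc(2, "D", "E", 0.8, 3)
t.finals[3] = 0.0
out, weight = t.transduce(["A", "B", "D"])   # ['B', 'A', 'E'], 0.2 + 0.6 + 0.8
```

The brute-force search is only for illustration; real toolkits use shortest-path algorithms over the semiring.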
WFSTs in recognition
- Input: "I want a ticke..[#noise]..Boston from New York"
- An FST trained on acoustic and language corpora: the [#noise] must be "t...to"!
WFSTs in recognition
- Input: "The bare was bear naked?"
- An FST trained on a language corpus: "The bear was bare naked"
Recognition Cascade (simplified)

- I: input feature vectors
- H: HMM (feature vectors → CD-HMMs)
- C: context-dependency model (CD-HMMs → transcription symbols)
- L: lexicon (transcription symbols → words)
- G: grammar (words → words)

Use weighted FST composition to compose the parts into one.
Weighted FST operations
- Best-path
- Difference
- Weight pushing
- Closure
- Equivalence
- Label pushing
- Compaction
- Hadamard product
- Reversal
- Composition
- Inversion
- Epsilon removal
- Concatenation
- Minimization
- Topological sort
- Connection
- Projection
- Union
- Determinization
- Pruning
Language model WFSA

Models a priori weights for different word sequences (n-grams).

[Figure: a six-state WFSA over 4 fictitious words (w, x, y, z), with weighted arcs such as w/0, w/2, x/2, y/4, y/5, z/1, z/2 and final weight 5/0.]
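The idea behind the LM WFSA can be sketched as a bigram model: an acceptor whose states are word histories and whose arc weights are negative log probabilities, so the weight of a sentence is the sum of its arc weights. The four fictitious words match the slide; the probabilities themselves are invented for illustration.

```python
import math

# Invented bigram probabilities over the slide's four fictitious words.
bigram_prob = {
    ("<s>", "w"): 0.5, ("<s>", "y"): 0.5,
    ("w", "x"): 0.25, ("w", "z"): 0.75,
    ("y", "z"): 1.0,
    ("z", "x"): 0.5, ("z", "</s>"): 0.5,
    ("x", "</s>"): 1.0,
}

def lm_weight(words):
    """Total tropical weight (-log probability) of a word sequence."""
    total, prev = 0.0, "<s>"
    for word in words + ["</s>"]:
        total += -math.log(bigram_prob[(prev, word)])
        prev = word
    return total

w_sentence = lm_weight(["w", "z", "x"])
```

A cheaper total weight corresponds to a more probable word sequence under the model.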
LM WFSA, minimized

Models a priori weights for different word sequences (n-grams).

[Figure: the minimized equivalent of the WFSA above, again over the 4 fictitious words (w, x, y, z), with arcs such as w/0, x/0, y/1, z/0, w/7 and final weight 3/7.]
Pronunciation knowledge

Use different weights to model the likelihood of pronunciations!
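A tiny sketch of this idea: each pronunciation variant's probability becomes a tropical weight (-log p) on its lexicon arc, so the more likely pronunciation costs less. The word, variants, and probabilities below are invented for illustration.

```python
import math

# Invented pronunciation variants with probabilities.
variants = {"either": [("iy dh er", 0.7), ("ay dh er", 0.3)]}

# Convert each probability to a tropical weight (-log p).
weighted_lexicon = {
    word: [(phones, -math.log(p)) for phones, p in prons]
    for word, prons in variants.items()
}
```

In a full lexicon transducer these weights would sit on the arcs of the corresponding pronunciation paths.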
Lexicon transducer

Some phonetically similar words (Swedish): talad, talare, talarna, talande, talfel, talför.

[Figure: L.fsm, a nondeterministic lexicon transducer with 36 states. Each path reads a phone sequence (e.g. T A: L A D) and outputs epsilon on most arcs, emitting the word label on the final arc (e.g. D:talad, E0:talare, A:talarna, E0:talande, L:talfel, R:talför).]
Equivalent lexicon transducer, deterministic

[Figure: L2.fsm, the determinized equivalent of the lexicon above: shared prefixes (T A: L ...) are merged, and the word labels (talad, talare, talarna, talande, talfel, talför) are emitted on the distinguishing final arcs.]

36 → 13 states
Weighted determinization

[Figure: a nondeterministic weighted transducer (arcs such as a:ba/4, b:aa/3, d:c/5, a:b/3, b:a/2, c:c/4, a:eps/2) and its determinized equivalent (arcs such as c:b/4, a:eps/3, b:eps/2, d:ba/8, c:a/4, d:aa/8, a:c/0).]
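The core of the algorithm can be sketched for a weighted *acceptor* over the tropical semiring (the slide's example determinizes a transducer, which additionally tracks residual output strings; this sketch keeps only residual weights). Deterministic states are sets of (state, residual-weight) pairs: on each input symbol the smallest achievable weight is emitted on the new arc, and the leftovers stay in the subset as residuals. The example machine is invented.

```python
import math
from collections import defaultdict

def determinize(arcs, start, finals):
    # arcs: state -> {symbol: [(weight, next_state), ...]}
    initial = frozenset([(start, 0.0)])
    det_arcs, det_finals = {}, {}
    queue, seen = [initial], {initial}
    while queue:
        subset = queue.pop()
        by_sym = defaultdict(list)
        for state, residual in subset:
            for sym, alist in arcs.get(state, {}).items():
                for w, nxt in alist:
                    by_sym[sym].append((residual + w, nxt))
        for sym, pairs in by_sym.items():
            w_min = min(w for w, _ in pairs)     # weight emitted on the arc
            residuals = {}
            for w, nxt in pairs:
                residuals[nxt] = min(residuals.get(nxt, math.inf), w - w_min)
            nxt_subset = frozenset(residuals.items())
            det_arcs.setdefault(subset, {})[sym] = (w_min, nxt_subset)
            if nxt_subset not in seen:
                seen.add(nxt_subset)
                queue.append(nxt_subset)
        final_w = min((r + finals[q] for q, r in subset if q in finals),
                      default=math.inf)
        if final_w < math.inf:
            det_finals[subset] = final_w
    return det_arcs, det_finals, initial

# Two nondeterministic "a" arcs leave state 0; determinization merges them.
nfa = {0: {"a": [(1.0, 1), (3.0, 2)]},
       1: {"b": [(2.0, 3)]},
       2: {"b": [(0.0, 3)]}}
det_arcs, det_finals, initial = determinize(nfa, 0, {3: 0.0})
```

The path "a b" keeps its best weight min(1 + 2, 3 + 0) = 3 in the deterministic machine: the "a" arc emits 1 and the residual 2 on state 2 is resolved by the later "b" arc.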
Weight pushing

[Figure: a weighted acceptor before and after weight pushing. Arc weights (e.g. x:x/10, y:y/11, y:y/12, z:z/8, z:z/9, w:w/10, w:w/11) are pushed toward the initial state, leaving arcs such as x:x/0, y:y/1, y:y/2, z:z/0, z:z/1, w:w/2, w:w/3 while total path weights are preserved.]
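Weight pushing can be sketched for a small acyclic acceptor (this machine is invented; the general algorithm uses a single-source shortest-distance routine over the semiring). Each state gets a potential d(q), the shortest tropical distance to a final state, and every arc is reweighted to w + d(next) - d(source).

```python
import math

# Invented acyclic weighted acceptor.
# arcs: state -> [(symbol, weight, next_state)]; finals: state -> weight
arcs = {0: [("x", 10.0, 1), ("y", 11.0, 1), ("z", 9.0, 2)],
        1: [("w", 2.0, 3)],
        2: [("w", 3.0, 3)]}
finals = {3: 0.0}

def distance(q):
    """Shortest (tropical) distance from q to a final state."""
    best = finals.get(q, math.inf)
    for _, w, nxt in arcs.get(q, []):
        best = min(best, w + distance(nxt))
    return best

# Reweight each arc by the potentials: w' = w + d(next) - d(source).
pushed = {q: [(sym, w + distance(nxt) - distance(q), nxt)
              for sym, w, nxt in alist]
          for q, alist in arcs.items()}
```

After pushing, every state has an outgoing arc of weight 0, and each complete path keeps its original total weight once the initial weight distance(0) = 12 is accounted for; this is what makes pushed machines prune well during decoding.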
Weighted composition

- A: (x, z, z, x) → (y, x, x, x), weight 1 + 4 + 4 + 3 = 12.
- B: (y, x, x, x) → (z, y, y, y), weight 3 + 6 + 5 + 5 = 19.
- C = A ∘ B: (x, z, z, x) → (z, y, y, y), total weight 4 + 10 + 9 + 8 = 31 = 12 + 19.

[Figure: the transducers A (arcs such as x:y/1, z:x/4, x:x/3, y:y/5, y:x/2), B (arcs such as y:z/3, x:y/5, x:y/6, x:eps/2), and their composition C (arcs such as x:z/4, z:y/9, z:y/10, x:y/8, x:y/9, y:z/8).]
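The mechanics can be sketched for epsilon-free weighted transducers in the tropical semiring: a state of A ∘ B is a pair (a_state, b_state), an arc exists when A's output symbol equals B's input symbol, and the two arc weights add. (The slide's example also contains eps outputs, which require an extra epsilon filter not shown here; the machines below are invented for illustration.)

```python
def compose(a_arcs, b_arcs):
    # arcs: state -> [(in_sym, out_sym, weight, next_state)]
    c_arcs = {}
    for qa, alist in a_arcs.items():
        for qb, blist in b_arcs.items():
            for ai, ao, aw, an in alist:
                for bi, bo, bw, bn in blist:
                    if ao == bi:                 # A's output feeds B's input
                        c_arcs.setdefault((qa, qb), []).append(
                            (ai, bo, aw + bw, (an, bn)))
    return c_arcs

# Invented two-state transducers.
A = {0: [("x", "y", 1.0, 1)], 1: [("z", "x", 4.0, 0)]}
B = {0: [("y", "z", 3.0, 1)], 1: [("x", "y", 6.0, 0)]}
C = compose(A, B)
# Reading "x z" costs 1 + 4 in A and 3 + 6 in B (on A's outputs);
# in C the single path costs (1 + 3) + (4 + 6) = 14.
```

As in the slide's worked example, the total weight of a composed path is the sum of the two machines' path weights.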
Lexicon + Grammar: L ∘ G

Vocabulary (Swedish): han (Eng: he), hand (Eng: hand), handdukar (Eng: towels), dukar (Eng: set the table).
Phone sequences: H A N, H A N D, H A N D U: K A R, D U: K A R.

Sequence "han dukar" → H A N D U: K A R (Eng: he sets the table), the same phone sequence as "handdukar".

[Figure: the composed L ∘ G transducer with weighted arcs such as N:han/1.25, D:hand/4, D:handdukar/2.5, D:dukar/2, and epsilon-output arcs (e.g. H:eps/0, A:eps/0) elsewhere.]
HMMs as WFSTs

[Figure: word HMMs for B[A:]R, R[A:]T, and V[A]L encoded as a single WFST. Input symbols are distribution indices (@1 ... @7), arc weights are transition probabilities (e.g. @1:eps/0.6, @2:eps/0.45, @4:eps/0.5), the word label is emitted on one arc per path (e.g. @1:B[A:]R/1, @4:R[A:]T/1, @6:V[A]L/1), and eps:eps arcs join the word models.]
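A left-to-right word HMM of the kind in the figure can be sketched as a WFST builder: input symbols are distribution indices (standing in for the slide's @1, @2, ...), weights are -log transition probabilities, and the word label is emitted on the last forward arc. The stay probability here is an invented illustration value.

```python
import math

def hmm_wfst(state_ids, word, stay_prob=0.6):
    stay = -math.log(stay_prob)        # weight of staying in the state
    move = -math.log(1.0 - stay_prob)  # weight of advancing
    arcs = []                          # (src, input, output, weight, dst)
    for i, sid in enumerate(state_ids):
        arcs.append((i, sid, "<eps>", stay, i))        # self-loop
        out = word if i == len(state_ids) - 1 else "<eps>"
        arcs.append((i, sid, out, move, i + 1))        # advance
    return arcs

hmm = hmm_wfst(["@1", "@2", "@3"], "B[A:]R")
```

Composing such word models with the context-dependency, lexicon, and grammar transducers yields the recognition cascade described earlier.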
Context-Dependency Modeling

Example: "...var i det..."

- Bigrams + triphones (no cross-word): v a r i d e, modeled with monophones, diphones, and within-word triphones.
- Trigrams + cross-word triphones: v a r i d e → x[v]a, v[a]r, a[r]i, r[i]d, i[d]e, d[e]x.