University of Oslo: Department of Informatics
Finite State Automata
Jonathon Read
9 October 2012
INF4820: Algorithms for AI and NLP
Previously
- Regular expressions are powerful tools for describing infinite sets of strings
- The fundamental operations are:
  - Matching characters, wildcards (.) and anchors (^ and $)
  - Disjunction (| and [ ])
  - Quantification (?, *, + and {n, m})
- Precedence can be enforced with brackets (( and ))
- More complex operations include capturing groups
Some examples
- /^a (fox|wolf)$/ ⇒ { a fox, a wolf }
- /^f[aio]x$/ ⇒ { fax, fix, fox }
- /^(fox( |$))+/ ⇒ { fox, fox fox, fox fox fox, . . . }
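The three examples can be checked mechanically. This is a small sketch using Python's `re` module; the non-matching candidates ("a cat", "fux", "fo") are illustrative additions, not part of the slide. It writes "a space or the end of the string" as `( |$)`, because inside a character class `$` matches a literal dollar sign.

```python
import re

# Each slide pattern, paired with a few candidate strings to test against it.
candidates = {
    r"^a (fox|wolf)$": ["a fox", "a wolf", "a cat"],
    r"^f[aio]x$": ["fax", "fix", "fox", "fux"],
    r"^(fox( |$))+": ["fox", "fox fox", "fox fox fox", "fo"],
}

# Keep only the candidates that each pattern matches.
results = {
    pattern: [s for s in strings if re.search(pattern, s)]
    for pattern, strings in candidates.items()
}

for pattern, matched in results.items():
    print(f"/{pattern}/ => {matched}")
```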
Today
- Describing regular expressions with finite state automata
- Using finite state automata to process strings
- State-space search
- Finite state transducers
- Applications in gaming artificial intelligence
Finite state automata (FSAs)
- Mathematical models of computation
- Can be in any one of a finite number of states
- Change state in response to triggering conditions, defined as a set of labelled transitions from state to state
Defining finite state automata
(Jurafsky & Martin 2009)
$Q = \{q_0, q_1, \ldots, q_{N-1}\}$: a finite set of $N$ states
$\Sigma$: a finite input vocabulary
$q_0$: the start state
$F$: the set of final states, $F \subseteq Q$
$\delta(q, i)$: the transition function between states. Given a state $q \in Q$ and an input symbol $i \in \Sigma$, $\delta(q, i)$ returns a new state $q' \in Q$
Sheeptalk
/baa+!/ ⇒ { baa!, baaa!, baaaa!, baaaaa!, . . . }
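The 5-tuple maps naturally onto a small data structure. Below is a minimal sketch, instantiated for sheeptalk. The class name `DFA`, its field names, and the choice to encode $\delta$ as a dict (a missing key meaning "no transition") are all assumptions made for this illustration. States are numbered 0 to 4, with 4 final, matching the transition table for this language.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """A deterministic FSA as the 5-tuple (Q, Sigma, q0, F, delta)."""
    states: frozenset      # Q
    vocabulary: frozenset  # Sigma
    start: int             # q0
    finals: frozenset      # F
    delta: dict            # (q, i) -> q'; a missing key means no transition

    def __post_init__(self):
        # Check the side conditions stated in the definition.
        if self.start not in self.states:
            raise ValueError("q0 must be in Q")
        if not self.finals <= self.states:
            raise ValueError("F must be a subset of Q")
        for (q, i), q_next in self.delta.items():
            if q not in self.states or q_next not in self.states or i not in self.vocabulary:
                raise ValueError(f"bad transition {(q, i)} -> {q_next}")


# Sheeptalk /baa+!/: start state 0, final state 4, a self-loop on state 3.
sheep = DFA(
    states=frozenset(range(5)),
    vocabulary=frozenset("ba!"),
    start=0,
    finals=frozenset({4}),
    delta={(0, "b"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3, (3, "!"): 4},
)
print(sheep.delta[(3, "a")])  # the a+ loop stays in state 3
```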
State transition tables
- Rows represent states (i.e. $q \in Q$)
- Columns represent possible input from the vocabulary $\Sigma$
- Cells are transitions given a state and input (i.e. $\delta(q, i)$)
- Final states are denoted using a colon
         Input
State    b   a   !
0        1   ∅   ∅
1        ∅   2   ∅
2        ∅   3   ∅
3        ∅   3   4
4:       ∅   ∅   ∅
Recognising deterministic FSAs
D-Recognise
Input: input string, state-transition-table, final-states
Output: accept or reject

    current-state ← 0
    index ← 0
    repeat
        if end of input then
            if current-state is in final-states then
                return accept
            else
                return reject
            end
        else if state-transition-table[current-state, input[index]] is empty then
            return reject
        else
            current-state ← state-transition-table[current-state, input[index]]
            index ← index + 1
        end
    until stop
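D-Recognise translates almost line for line into Python. This sketch is an illustration, not a reference implementation. The table is a dict keyed by `(state, symbol)`, an absent key plays the role of an empty cell, and state 0 is the start state, as in the pseudocode. The names `d_recognise` and `SHEEP` are introduced only for this example.

```python
def d_recognise(tape, table, finals):
    """Deterministic recognition, following D-Recognise step by step."""
    state, index = 0, 0
    while True:
        if index == len(tape):                       # end of input
            return "accept" if state in finals else "reject"
        key = (state, tape[index])
        if key not in table:                         # empty cell: no transition
            return "reject"
        state = table[key]                           # follow the unique arc
        index += 1                                   # consume one symbol


# Sheeptalk transition table (state 4 is final).
SHEEP = {(0, "b"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3, (3, "!"): 4}

for s in ["baa!", "baaaa!", "ba!", "baa!!"]:
    print(s, d_recognise(s, SHEEP, {4}))
```

Each cell holds at most one state, so the loop never has to choose between arcs. Every step is determined by the current state and the next input symbol.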
Non-deterministic FSAs
/b((aa)+|(aaa)+)!/ ⇒ { baa!, baaa!, baaaa!, baaaaaa!, . . . }
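To see why this language invites non-determinism, it helps to check which counts of `a` it admits. This small sketch uses Python's `re.fullmatch`, which requires the pattern to cover the whole string:

```python
import re

PATTERN = re.compile(r"b((aa)+|(aaa)+)!")

# Which counts of "a" between "b" and "!" does the language allow, for n = 1..9?
accepted_counts = [n for n in range(1, 10) if PATTERN.fullmatch("b" + "a" * n + "!")]
print(accepted_counts)  # → [2, 3, 4, 6, 8, 9]
```

After reading `baa`, a left-to-right machine cannot yet tell whether it is inside `(aa)+` or `(aaa)+`. A string with five `a`s fits neither branch.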
Abstract approaches to searching
Heuristic
- Look ahead at input beyond the current index
- Try to work out (guess) which branch to take
Parallel
- Assume an unlimited number of CPUs, etc.
- Copy the state and remaining input, and search all branches
Backtracking
- Keep track of choice points
- Follow each branch, returning to choice points on failure
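The parallel approach can be simulated on a single CPU by advancing every live branch together. In this simulation, the set of current states stands in for the copies. This is a sketch under stated assumptions. The NFA below, with states 0 to 7 and final state 7, is a hypothetical encoding of /b((aa)+|(aaa)+)!/ constructed for illustration. State 1 is the choice point on `a`.

```python
# NFA for /b((aa)+|(aaa)+)!/: (state, symbol) -> set of next states.
NFA = {
    (0, "b"): {1},
    (1, "a"): {2, 4},                                  # choice point: (aa)+ or (aaa)+
    (2, "a"): {3}, (3, "a"): {2}, (3, "!"): {7},       # the (aa)+ branch
    (4, "a"): {5}, (5, "a"): {6}, (6, "a"): {4}, (6, "!"): {7},  # the (aaa)+ branch
}
FINALS = {7}


def parallel_recognise(tape, nfa, start, finals):
    """Advance every live branch at once by tracking the set of current states."""
    current = {start}
    for symbol in tape:
        # Each surviving branch follows every arc labelled with this symbol.
        current = {nxt for q in current for nxt in nfa.get((q, symbol), ())}
        if not current:          # every branch has died
            return False
    return bool(current & finals)


print(parallel_recognise("baaaaaa!", NFA, 0, FINALS))
```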
Recognising non-deterministic FSAs
ND-Recognise
Input: input string, state-transition-table, final-states
Output: accept or reject

    agenda ← {⟨0, 0⟩}
    repeat
        current-state, index ← pop(agenda)
        if index is at the end of input then
            if current-state is in final-states then
                return accept
            end
        else
            for next-state in state-transition-table[current-state, input[index]] do
                agenda ← agenda ∪ {⟨next-state, index + 1⟩}
            end
        end
        if agenda is empty then
            return reject
        end
    until stop
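ND-Recognise becomes a backtracking search if the agenda is a stack. In this sketch a Python list popped from the end gives depth-first order; a queue would give breadth-first order instead. The NFA is the same hypothetical encoding of /b((aa)+|(aaa)+)!/ (states 0 to 7, final state 7) and is assumed purely for illustration.

```python
# NFA for /b((aa)+|(aaa)+)!/: (state, symbol) -> set of next states.
NFA = {
    (0, "b"): {1},
    (1, "a"): {2, 4},                                  # choice point
    (2, "a"): {3}, (3, "a"): {2}, (3, "!"): {7},
    (4, "a"): {5}, (5, "a"): {6}, (6, "a"): {4}, (6, "!"): {7},
}
FINALS = {7}


def nd_recognise(tape, table, finals):
    agenda = [(0, 0)]                       # search states: (FSA state, input index)
    while agenda:                           # an empty agenda means every branch failed
        state, index = agenda.pop()         # LIFO pop: resume the latest choice point
        if index == len(tape):              # this branch has consumed all input
            if state in finals:
                return "accept"
        else:
            for next_state in table.get((state, tape[index]), ()):
                agenda.append((next_state, index + 1))
    return "reject"


print(nd_recognise("baaa!", NFA, FINALS))
```

Every arc consumes one symbol, so each branch ends after at most `len(tape)` steps, and the search always terminates.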
ε-transitions
Arcs that do not consume input are called ε-transitions
ε-transitions: Concatenation
ε-transitions: Closure (*)
ε-transitions: Union (|)
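A recogniser can handle ε-transitions by extending each set of states with its ε-closure, meaning every state reachable through ε-arcs alone. This is a minimal sketch. Using `None` as the ε label and the particular state numbering below are assumptions of the example. The machine builds b((aa)+|(aaa)+)! from a union (state 1), two closure loops (states 4 and 8), and concatenation arcs into state 9.

```python
EPS = None  # label used for ε-arcs in this sketch

# (state, label) -> set of next states; final state 10.
EPS_NFA = {
    (0, "b"): {1},
    (1, EPS): {2, 5},                                # union: enter either branch
    (2, "a"): {3}, (3, "a"): {4},
    (4, EPS): {2, 9},                                # loop for another "aa", or move on
    (5, "a"): {6}, (6, "a"): {7}, (7, "a"): {8},
    (8, EPS): {5, 9},                                # loop for another "aaa", or move on
    (9, "!"): {10},
}


def eps_closure(states, nfa):
    """All states reachable from `states` using only ε-arcs."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for nxt in nfa.get((q, EPS), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def eps_recognise(tape, nfa, start, finals):
    current = eps_closure({start}, nfa)
    for symbol in tape:
        moved = {nxt for q in current for nxt in nfa.get((q, symbol), ())}
        current = eps_closure(moved, nfa)
    return bool(current & finals)


print(eps_recognise("baaa!", EPS_NFA, 0, {10}))
```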
Deterministic and non-deterministic FSAs
Are non-deterministic FSAs more powerful? No: every non-deterministic FSA has a deterministic equivalent (Hopcroft and Ullman, 1979). However, non-deterministic FSAs are often more compact and easier to read. Given a non-deterministic FSA with $n$ states, its deterministic equivalent can have up to $2^n$ states.
Hopcroft, J. E. and Ullman, J. D. (1979). Introduction to Automata Theory, Languages and Computation. Addison-Wesley.
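The usual way to build the deterministic equivalent is the subset construction: each DFA state is a set of NFA states. The sketch below applies it to a hypothetical 8-state NFA for /b((aa)+|(aaa)+)!/, encoded for this illustration only. In this case the result has only 9 reachable states. The $2^n$ figure is a worst case, not what every automaton produces.

```python
from collections import deque

# NFA for /b((aa)+|(aaa)+)!/: (state, symbol) -> set of next states.
NFA = {
    (0, "b"): {1},
    (1, "a"): {2, 4},
    (2, "a"): {3}, (3, "a"): {2}, (3, "!"): {7},
    (4, "a"): {5}, (5, "a"): {6}, (6, "a"): {4}, (6, "!"): {7},
}
FINALS = {7}


def determinise(nfa, start, finals, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start_set = frozenset({start})
    dfa, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        subset = queue.popleft()
        for symbol in alphabet:
            target = frozenset(n for q in subset for n in nfa.get((q, symbol), ()))
            if not target:
                continue  # no arc: the empty (dead) subset is left implicit
            dfa[(subset, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_finals = {s for s in seen if s & finals}
    return dfa, start_set, dfa_finals, seen


def dfa_accepts(tape, dfa, start, finals):
    state = start
    for symbol in tape:
        if (state, symbol) not in dfa:
            return False
        state = dfa[(state, symbol)]
    return state in finals


dfa, start, dfa_finals, dfa_states = determinise(NFA, 0, FINALS, "ba!")
print(len(dfa_states))  # → 9
```

The six subsets reached by reading `a`s form a cycle that effectively counts `a`s modulo 6, the least common multiple of the two branch lengths.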
Finite state transducers (FST)
Finite state automata represent the strings in a language. Finite state transducers are a type of FSA that map between pairs of strings. They can serve several functions:
- Recogniser: given a pair of strings, determine whether the pair is in the string-pair language
- Generator: represent how to construct output in the string-pair language
- Translator: read in one string and output another string
- Relator: compute relations between sets
Defining finite state transducers
(Jurafsky & Martin 2009)
$Q = \{q_0, q_1, \ldots, q_{N-1}\}$: a finite set of $N$ states
$\Sigma$: a finite input vocabulary
$\Delta$: a finite output vocabulary
$q_0$: the start state
$F$: the set of final states, $F \subseteq Q$
$\delta(q, i)$: the transition function
$\sigma(q, i)$: the output function, giving the set of possible output strings for each state and input
Translating Sheeptalk
/baa+!/ becomes /bæ+!/
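Transducer arcs (input:output, where a:ε outputs nothing; q4 is final):
- q0 → q1 on b:b
- q1 → q2 on a:æ
- q2 → q3 on a:ε
- q3 → q3 on a:æ
- q3 → q4 on !:!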
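Used as a translator, the sheeptalk transducer reads one string and writes another. This sketch assumes a deterministic transducer, with one arc per (state, input) pair and one output string per arc. It therefore simplifies $\sigma(q, i)$, which in general returns a set of outputs. The arc placement follows the list above. Swapping which of the first two `a` arcs emits ε would give the same mapping.

```python
# Arcs: (state, input symbol) -> (next state, output string); "" is an ε output.
SHEEP_FST = {
    (0, "b"): (1, "b"),
    (1, "a"): (2, "æ"),
    (2, "a"): (3, ""),
    (3, "a"): (3, "æ"),
    (3, "!"): (4, "!"),
}


def translate(tape, fst, start=0, finals=frozenset({4})):
    """Return the output string, or None if the input is not in the language."""
    state, output = start, []
    for symbol in tape:
        if (state, symbol) not in fst:
            return None                      # no arc for this input
        state, out = fst[(state, symbol)]
        output.append(out)
    return "".join(output) if state in finals else None


print(translate("baaa!", SHEEP_FST))
```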
Exercise: Hunden snakker (The dog speaks)
Draw a transducer for Norwegian and English dogs, i.e. /(vo(ff|v)( |$))+/ becomes /(woof( |$))+/
Controlling agent behaviour with FSAs
An FSA can control agents in games. For example, the ghosts in Pac-Man have four behaviours:
1. Wander the maze
2. Chase Pac-Man
3. Run away from Pac-Man
4. Return to the centre
Each of these behaviours depends on certain triggering events.
Ghost FSA
States: q0 wander maze; q1 chase Pac-Man; q2 run from Pac-Man; q4 return to centre
Transitions (triggering events):
- q0 → q1: sight Pac-Man
- q1 → q0: lose sight of Pac-Man
- q0 → q2 and q1 → q2: Pac-Man eats power-up
- q2 → q0: power-up wears off
- q2 → q4: is eaten by Pac-Man
- q4 → q0: is in centre
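In code, a behaviour FSA like this is often just a lookup table from (state, event) to the next state. This is a sketch only. The arcs are read from the ghost FSA transition list, and the rule that an event with no arc leaves the ghost where it is reflects an assumption about game AI. A recogniser, by contrast, would reject in that situation. The names `GHOST` and `run_ghost` are hypothetical.

```python
# (state, triggering event) -> next state.
GHOST = {
    ("wander maze", "sight Pac-Man"): "chase Pac-Man",
    ("chase Pac-Man", "lose sight of Pac-Man"): "wander maze",
    ("wander maze", "Pac-Man eats power-up"): "run from Pac-Man",
    ("chase Pac-Man", "Pac-Man eats power-up"): "run from Pac-Man",
    ("run from Pac-Man", "power-up wears off"): "wander maze",
    ("run from Pac-Man", "is eaten by Pac-Man"): "return to centre",
    ("return to centre", "is in centre"): "wander maze",
}


def run_ghost(events, state="wander maze"):
    """Return the sequence of states the ghost passes through."""
    trace = [state]
    for event in events:
        # Assumption: an event with no arc from the current state is ignored.
        state = GHOST.get((state, event), state)
        trace.append(state)
    return trace


print(run_ghost(["sight Pac-Man", "Pac-Man eats power-up",
                 "is eaten by Pac-Man", "is in centre"]))
```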
Summary
- Modeling sequences of words with finite state automata (FSAs)
- FSAs are models of computation that can be in a finite number of states
- We define transitions that change the state of the machine
- Non-deterministic FSAs contain choice points, where for some combination of state and input there is more than one possible action
- These can be handled with a backtracking search
- Finite state transducers produce output in response to input
Next week
- Probability theory: terminology and notation
- Estimating probability of words using corpora
- Handling unseen sequences
- Applications in natural language processing