Finite State Automata

University of Oslo: Department of Informatics
Finite State Automata
Jonathon Read
9 October 2012
INF4820: Algorithms for AI and NLP
Author: Alaina Cross

Previously

- Regular expressions are powerful tools for describing infinite sets of strings
- The fundamental operations are:
  - Matching characters, wildcards (.) and anchors (^ and $)
  - Disjunction (| and [ ])
  - Quantification (?, *, + and {n, m})
- Precedence can be enforced with brackets (( and ))
- More complex operations include capturing groups

Some examples

- /^a (fox|wolf)$/ ⇒ { a fox, a wolf }
- /^f[aio]x$/ ⇒ { fax, fix, fox }
- /^(fox[ $])+/ ⇒ { fox, fox fox, fox fox fox, ... }
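The example patterns can be tried out with Python's re module. This is only a sketch: the names below are illustrative, and the third pattern is written with ( |$) on the assumption that [ $] on the slide means "a space or the end of the string" (inside a character class, Python treats $ as a literal dollar sign).

```python
import re

# Patterns from the examples above. The third is adapted (an assumption):
# "( |$)" stands in for the slide's "[ $]".
A_FOX_OR_WOLF = re.compile(r"^a (fox|wolf)$")
FAX_FIX_FOX = re.compile(r"^f[aio]x$")
REPEATED_FOX = re.compile(r"^(fox( |$))+")


def matches(pattern, text):
    """Return True if the compiled pattern matches somewhere in text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for text in ["a fox", "a wolf", "a dog"]:
        print(repr(text), matches(A_FOX_OR_WOLF, text))
    for text in ["fax", "fix", "fox", "fex"]:
        print(repr(text), matches(FAX_FIX_FOX, text))
    for text in ["fox", "fox fox fox", "foxy"]:
        print(repr(text), matches(REPEATED_FOX, text))
```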

Today

- Describing regular expressions with finite state automata
- Using finite state automata to process strings
- State-space search
- Finite state transducers
- Applications in gaming artificial intelligence

Finite state automata (FSAs)

- Mathematical models of computation
- Can be in any one of a finite number of states
- Change state in response to triggering conditions: sets of labelled transitions from state to state

Defining finite state automata

(Jurafsky & Martin 2009)

Q = {q0, q1, ..., qN−1}   a finite set of N states
Σ                         a finite input vocabulary
q0                        the start state
F                         the set of final states, F ⊆ Q
δ(q, i)                   the transition function between states. Given a state q ∈ Q and an input symbol i ∈ Σ, δ(q, i) returns a new state q′ ∈ Q
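One way to hold this five-part definition in code is sketched below. The class name FSA, its field names, and the toy machine are all illustrative assumptions rather than anything taken from the slides; δ is stored as a dictionary from (state, symbol) pairs to states.

```python
from dataclasses import dataclass


@dataclass
class FSA:
    states: frozenset      # Q
    alphabet: frozenset    # Σ
    start: int             # q0
    finals: frozenset      # F
    delta: dict            # δ as a dict: (state, symbol) -> next state

    def is_well_formed(self):
        """Check q0 ∈ Q, F ⊆ Q, and that δ only mentions known states and symbols."""
        if self.start not in self.states or not self.finals <= self.states:
            return False
        return all(
            q in self.states and i in self.alphabet and target in self.states
            for (q, i), target in self.delta.items()
        )

    def step(self, state, symbol):
        """Return δ(state, symbol), or None when no transition is defined."""
        return self.delta.get((state, symbol))


# Hypothetical three-state machine that accepts only the string "ab".
toy = FSA(
    states=frozenset({0, 1, 2}),
    alphabet=frozenset({"a", "b"}),
    start=0,
    finals=frozenset({2}),
    delta={(0, "a"): 1, (1, "b"): 2},
)
```

Storing δ as a partial mapping means an undefined transition simply has no entry, which corresponds to an empty cell in a state transition table.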

Sheeptalk

/baa+!/ ⇒ { baa! baaa! baaaa! baaaaa! . . . }

State transition tables

- Rows represent states (i.e. q ∈ Q)
- Columns represent possible input from the vocabulary Σ
- Cells are transitions given a state and input (i.e. δ(q, i))
- Final states are denoted using a colon

         Input
State    b    a    !
0        1    -    -
1        -    2    -
2        -    3    -
3        -    3    4
4:       -    -    -

(- marks an empty cell, i.e. no transition)

Recognising deterministic FSAs

D-Recognise
Input: input string, state-transition-table, final-states
Output: accept or reject

current-state ← 0; index ← 0;
repeat
    if end of input then
        if current-state is in final-states then
            return accept;
        else
            return reject;
        end
    else if state-transition-table[current-state, input[index]] is empty then
        return reject;
    else
        current-state ← state-transition-table[current-state, input[index]];
        index ← index + 1;
    end
until stop;
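D-Recognise can be sketched in Python as follows. The function and variable names are illustrative; the table is the Sheeptalk state transition table written as nested dictionaries, with missing entries standing in for empty cells.

```python
# Sheeptalk transition table: state -> {input symbol -> next state}.
# A missing entry plays the role of an empty cell.
SHEEPTALK_TABLE = {
    0: {"b": 1},
    1: {"a": 2},
    2: {"a": 3},
    3: {"a": 3, "!": 4},
    4: {},
}
SHEEPTALK_FINALS = {4}


def d_recognise(text, table, finals, start=0):
    """Return True (accept) or False (reject) for a deterministic FSA."""
    state = start
    for symbol in text:
        next_state = table[state].get(symbol)
        if next_state is None:
            # Empty cell: no transition for this state and input.
            return False
        state = next_state
    # End of input: accept only if we stopped in a final state.
    return state in finals


if __name__ == "__main__":
    for text in ["baa!", "baaaa!", "ba!", "baa"]:
        print(text, d_recognise(text, SHEEPTALK_TABLE, SHEEPTALK_FINALS))
```

The for loop replaces the explicit index of the pseudocode; each symbol is consumed exactly once, so the running time is linear in the length of the input.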

Non-deterministic FSAs

/b((aa)+|(aaa)+)!/ ⇒ { baa! baaa! baaaa! baaaaaa! ... }

Abstract approaches to searching

Heuristic
- Look ahead at input beyond the current index
- Try to work out/guess which branch to take

Parallel
- Assume unlimited number of CPUs etc.
- Copy state and remaining input, search all branches

Backtracking
- Keep track of choice points
- Follow each branch, returning to choice points on failure
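The parallel strategy can be imitated on a single processor by tracking the set of all states the machine could be in after each input symbol. The sketch below is one such simulation; the names are illustrative, and the example machine is a non-deterministic Sheeptalk variant assumed here for illustration (state 2 has two arcs labelled a).

```python
# Non-deterministic transition table: state -> {symbol -> set of next states}.
NFA_SHEEPTALK = {
    0: {"b": {1}},
    1: {"a": {2}},
    2: {"a": {2, 3}},   # choice point: stay in 2 or move on to 3
    3: {"!": {4}},
    4: {},
}
NFA_SHEEPTALK_FINALS = {4}


def parallel_recognise(text, table, finals, start=0):
    """Follow every branch at once by keeping a set of current states."""
    current = {start}
    for symbol in text:
        current = {
            next_state
            for state in current
            for next_state in table[state].get(symbol, set())
        }
        if not current:
            # Every branch has died out.
            return False
    return bool(current & finals)
```

Because all branches advance in lockstep, no choice points need to be remembered and the input is read only once.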

Recognising non-deterministic FSAs

ND-Recognise
Input: input string, state-transition-table, final-states
Output: accept or reject

agenda ← {⟨0, 0⟩};
repeat
    current-state, index ← pop(agenda);
    if end of input and current-state in final-states then
        return accept;
    end
    for next-state in state-transition-table[current-state, input[index]] do
        agenda ← agenda ∪ {⟨next-state, index + 1⟩};
    end
    if agenda is empty then
        return reject;
    end
until stop;
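A Python sketch of ND-Recognise follows. The names and the state numbering of the example machine for /b((aa)+|(aaa)+)!/ are assumptions made for illustration. The agenda is a list used as a stack, which gives a depth-first (backtracking) search; unlike the pseudocode, the sketch explicitly skips the lookup of input[index] once the input is exhausted.

```python
# Assumed state numbering for /b((aa)+|(aaa)+)!/:
# states 2-3 count a's in pairs, states 4-6 count a's in triples.
NFA_TABLE = {
    0: {"b": {1}},
    1: {"a": {2, 4}},          # choice point: pairs branch or triples branch
    2: {"a": {3}},
    3: {"a": {2}, "!": {7}},
    4: {"a": {5}},
    5: {"a": {6}},
    6: {"a": {4}, "!": {7}},
    7: {},
}
NFA_FINALS = {7}


def nd_recognise(text, table, finals, start=0):
    """Agenda-based search over (state, index) pairs."""
    agenda = [(start, 0)]
    while agenda:
        state, index = agenda.pop()      # LIFO pop: depth-first search
        if index == len(text):
            if state in finals:
                return True
            continue                     # dead end: back to a choice point
        for next_state in table[state].get(text[index], ()):
            agenda.append((next_state, index + 1))
    return False                         # agenda is empty: reject
```

Popping from the front of the agenda instead (for example with collections.deque and popleft) would turn the same loop into a breadth-first search.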

ε-transitions

Arcs that do not consume input are called ε-transitions

ε-transitions: Concatenation

ε-transitions: Closure (*)

ε-transitions: Union (|)
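To recognise strings with a machine that contains ε-arcs, it helps to compute the set of states reachable without consuming any input. The sketch below is illustrative only: the empty string is assumed as the label for ε-arcs, and the example machine is a hypothetical union construction for a|b.

```python
EPSILON = ""  # assumption: the empty string labels arcs that consume no input


def epsilon_closure(states, table):
    """Return all states reachable from `states` using only ε-arcs."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for next_state in table.get(state, {}).get(EPSILON, set()):
            if next_state not in closure:
                closure.add(next_state)
                stack.append(next_state)
    return closure


def recognise_with_epsilon(text, table, finals, start=0):
    """Set-of-states recognition that follows ε-arcs before and after each symbol."""
    current = epsilon_closure({start}, table)
    for symbol in text:
        moved = {
            next_state
            for state in current
            for next_state in table.get(state, {}).get(symbol, set())
        }
        current = epsilon_closure(moved, table)
        if not current:
            return False
    return bool(current & finals)


# Hypothetical union machine for a|b: a new start state 0 has ε-arcs into
# both sub-machines, and both sub-machines have ε-arcs to a shared final state 5.
UNION_A_OR_B = {
    0: {EPSILON: {1, 3}},
    1: {"a": {2}},
    2: {EPSILON: {5}},
    3: {"b": {4}},
    4: {EPSILON: {5}},
}
UNION_FINALS = {5}
```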

Deterministic and non-deterministic FSAs

Are non-deterministic FSAs more powerful? No, every non-deterministic FSA has a deterministic equivalent (Hopcroft and Ullman, 1979). But they are easier to read: given a non-deterministic FSA with n nodes, its deterministic equivalent can have up to 2^n nodes.

Hopcroft, J. E. and Ullman, J. D. (1979). Introduction to Automata Theory, Languages and Computation. Addison-Wesley.
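The deterministic equivalent can be built with the subset construction, in which each deterministic state is a set of non-deterministic states (hence the bound of 2^n). The following is a minimal sketch for machines without ε-arcs; the function names and the example machine (a non-deterministic Sheeptalk variant) are assumptions for illustration.

```python
def determinise(table, finals, start=0):
    """Subset construction for a non-deterministic FSA without ε-arcs.

    Returns (dfa_table, dfa_start, dfa_finals); DFA states are frozensets
    of NFA states, and only reachable subsets are built.
    """
    start_set = frozenset({start})
    dfa_table = {}
    dfa_finals = set()
    pending = [start_set]
    while pending:
        subset = pending.pop()
        if subset in dfa_table:
            continue                      # already expanded
        if subset & finals:
            dfa_finals.add(subset)
        row = {}
        symbols = {sym for state in subset for sym in table.get(state, {})}
        for sym in symbols:
            target = frozenset(
                nxt for state in subset for nxt in table.get(state, {}).get(sym, ())
            )
            row[sym] = target
            pending.append(target)
        dfa_table[subset] = row
    return dfa_table, start_set, dfa_finals


def accepts(text, dfa_table, dfa_start, dfa_finals):
    """Run the resulting deterministic machine on text."""
    state = dfa_start
    for sym in text:
        if sym not in dfa_table[state]:
            return False
        state = dfa_table[state][sym]
    return state in dfa_finals


# Assumed example: non-deterministic Sheeptalk with a choice point in state 2.
NFA = {0: {"b": {1}}, 1: {"a": {2}}, 2: {"a": {2, 3}}, 3: {"!": {4}}, 4: {}}
DFA_TABLE, DFA_START, DFA_FINALS = determinise(NFA, {4})
```

For this small machine only five subsets are reachable, far fewer than the worst case; the exponential blow-up only appears for particular machines.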

Finite state transducers (FST)

Finite state automata represent the strings in a language. Finite state transducers are a type of FSA that map between pairs of strings. They can have several functions:

Recognisers: Given a pair of strings, determine whether they are in the string pair language
Generators: Represent how to construct output in the string pair language
Translators: Read in one string and output another string
Relators: Compute relations between sets

Defining finite state transducers

(Jurafsky & Martin 2009)

Q = {q0, q1, ..., qN−1}   a finite set of N states
Σ                         a finite input vocabulary
Δ                         a finite output vocabulary
q0                        the start state
F                         the set of final states, F ⊆ Q
δ(q, i)                   the transition function
σ(q, i)                   the output function giving the set of possible output strings for each state and input

Translating Sheeptalk

/baa+!/ becomes /bæ+!/

[Figure: transducer with states q0 to q4 and arcs labelled b:b, a:æ, a:ε, a:æ and !:!]
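A deterministic transducer of this kind can be sketched as a table that pairs each transition with the string it emits. The arrangement of arcs below is an assumption, since only the arc labels are given here: b:b from q0 to q1, a:æ from q1 to q2, a:ε from q2 to q3, a:æ looping on q3, and !:! from q3 to q4. The function name is also illustrative.

```python
# Assumed arc layout: state -> {input symbol -> (next state, output string)}.
# The empty string stands for ε on the output side.
FST_TABLE = {
    0: {"b": (1, "b")},
    1: {"a": (2, "æ")},
    2: {"a": (3, "")},                  # a:ε consumes an a and emits nothing
    3: {"a": (3, "æ"), "!": (4, "!")},
    4: {},
}
FST_FINALS = {4}


def transduce(text, table, finals, start=0):
    """Translate text, or return None if the input is not accepted."""
    state = start
    output = []
    for symbol in text:
        arc = table[state].get(symbol)
        if arc is None:
            return None
        state, emitted = arc
        output.append(emitted)
    return "".join(output) if state in finals else None
```

With this layout an input containing n occurrences of a yields n - 1 occurrences of æ, so every string in /baa+!/ is mapped to a string in /bæ+!/.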

Exercise: Hunden snakke (The dog talks)

Draw a transducer for Norwegian and English dogs, i.e. /(vo(ff|v)[ $])+/ becomes /(woof[ $])+/

Controlling agent behaviour with FSAs

An FSA can control agents in games, e.g. ghosts in Pac-Man have four behaviours:

1. Wander the maze
2. Chase Pac-Man
3. Run away from Pac-Man
4. Return to the centre

Each of these behaviours depends on certain triggering events.

Ghost FSA

[Figure: state diagram of the ghost FSA. States: q0 wander maze, q1 chase Pac-Man, q2 run from Pac-Man, q4 return to centre. Arc labels: sight Pac-Man, lose sight of Pac-Man, Pac-Man eats power-up (two arcs), power-up wears off, is eaten by Pac-Man, is in centre.]
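A ghost controller along these lines can be sketched as a table from behaviours and events to new behaviours. The wiring of the arcs is partly an assumption: in particular, "power-up wears off" is assumed to lead back to chasing, and events with no arc from the current behaviour are assumed to leave it unchanged. All names are illustrative.

```python
# Assumed wiring of the ghost FSA: behaviour -> {event -> next behaviour}.
GHOST_TRANSITIONS = {
    "wander": {"sight Pac-Man": "chase", "Pac-Man eats power-up": "run"},
    "chase": {"lose sight of Pac-Man": "wander", "Pac-Man eats power-up": "run"},
    "run": {"power-up wears off": "chase", "is eaten by Pac-Man": "return"},
    "return": {"is in centre": "wander"},
}


def next_behaviour(state, event, transitions=GHOST_TRANSITIONS):
    """Return the new behaviour; unknown events keep the current one (assumption)."""
    return transitions[state].get(event, state)


def run_ghost(events, start="wander"):
    """Feed a sequence of events to the ghost and record each behaviour."""
    state = start
    history = [state]
    for event in events:
        state = next_behaviour(state, event)
        history.append(state)
    return history


if __name__ == "__main__":
    print(run_ghost([
        "sight Pac-Man",
        "Pac-Man eats power-up",
        "is eaten by Pac-Man",
        "is in centre",
    ]))
```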

Summary

- Modelling sequences of words with finite state automata (FSAs)
- FSAs are models of computation that can be in a finite number of states
- We define transitions that change the state of the machine
- Non-deterministic FSAs contain choice points, where for some combination of state and input there is more than one possible action
- These can be handled with a backtracking search
- Finite state transducers produce output in response to input

Next week

- Probability theory: terminology and notation
- Estimating probability of words using corpora
- Handling unseen sequences
- Applications in natural language processing
