String Matching with Finite Automata
Working Mechanism
Idea of this approach is to build finite automata to scan text T for finding all occurrences of pattern P. This approach examines each character of text exactly once to find the pattern. Thus it takes linear time for matching but preprocessing time may be large.
Finite automata is defined by tuple M = {Q, S, q0, F, d}, where
Q = Set of states in finite automata
S = Set of input symbols
q0 = Initial state
F = Set of final states
d = Transition function defined as d : Q x S → Q
For example :
Q = {0, 1, 2, 3, 4, 5},
q0 = 0,
F = {2, 3},
S = {a, b},
And transition function :
d | a | b |
0 | 1 | 3 |
1 | 2 | 3 |
2 | 2 | 4 |
3 | 1 | 5 |
4 | 2 | 5 |
5 | 5 | 5 |
Graphically we can represent this finite automaton as shown in Fig. 8.3.1.
Finite automata are widely used in compilers and text processors for string matching. Given the string S over alphabet set S. Finite automata,
- Starts with input state q0
- Reads the input string character by character and changes the state according to transition function.
- It accepts the string if a finite automaton ends up in one of the
final / accepting states. - It rejects the string if a finite automaton does not end up in final state.
Let us trace some string for above given finite automata. Consider the string S = abababaaabab. Following table shows the transitions on every input symbol.
Input | a | b | a | b | a | b | a | a | a | b | a | b | |
State | 0 | 1 | 3 | 1 | 3 | 1 | 3 | 1 | 2 | 2 | 4 | 2 | 4 |
After scanning entire string, finite automata is in final state, so string is accepted by the automata. Consider the string S = babababb
Input | b | a | b | a | b | a | b | b | |
State | 0 | 3 | 1 | 3 | 1 | 3 | 1 | 3 | 5 |
The state 5 is not accepting state, so string is not accepted by given finite automata.
Let us extend this concept of finite automata for pattern matching.
Let us consider the text T = t1, t2, t3…tn and pattern P = p1, p2, p3…pm.
The string of length m goes through m + 1 states, numbered as
0, 1, 2, m.
State 0 will be initial / start state and state m will be the only
accepting / final state. If first k characters of pattern match with the text, FA will be in kth state.
Next character tj + k matches with pk + 1, implies first k + 1 characters are matched and FA goes in (k + 1)st state.
\[ delta \] (k, pk + 1) = k + 1
If next character tj + k does not match with pk + 1, then FA enters in one of the 0 to k state
Keep shifting pattern right till there is a match or p is exhausted.
If FA reaches to state m, pattern is found and computation stops. Search time for this approach is linear i.e. O(n). Each character in text is examined exactly once
Algorithm
Algorithm for pattern matching using finite automata is shown below
Algorithm FINTE_AUTOMATA(T, P)
// T is text of length n
// P is pattern of length m
state ← 0 // Initial state is 0
for i ← 1 to n do
state ← δ(state, t¬i) // Return the new state after reading character ti
if state == m then
print “Match found on position”, i - m + 1
end
end
Complexity Analysis
- Construction of finite automata takes O(m3 |Σ|) time in worst case, where m is the length of pattern and |S| is number of input symbols.
- This approach examines each character of text exactly once to find the pattern. So it takes linear time for matching. But pre-processing time may be large. Constructing automata is the difficult part for pattern matching.
- It runs fairly in O(n) time without considering time spend to build transition function.
- Finite automata for few patterns are shown in Fig.8.3.2 :
Examples
Example: Construct and show simulation of finite automata for matching the pattern P = abb for text T = ababbaababba
Solution:
Finite automata that can accept the string abb is shown in Fig. P. 8.3.1.
Moves made by the FA to accept the string abb are shown in Fig 8.3.3 :
Transition table for above finite automata is given as,
Input State | a | b |
q0 | q1 | q0 |
q1 | q1 | q2 |
q2 | q1 | q3 |
q3 | q1 | q0 |
Simulation of given data on this FA is shown in Table P.8.3.2.
a | b | a | b | b | a | a | b | a | b | b | a | Comment |
a | b | b | ||||||||||
a | b | b | ||||||||||
a | b | b | Match found | |||||||||
a | b | b | ||||||||||
a | b | b | ||||||||||
a | b | b | ||||||||||
a | b | b | ||||||||||
a | b | b | ||||||||||
a | b | b | Match found | |||||||||
a | b | b |