String Matching with Finite Automata

Working Mechanism

Idea of this approach is to build finite automata to scan text T for finding all occurrences of pattern P. This approach examines each character of text exactly once to find the pattern. Thus it takes linear time for matching but preprocessing time may be large.

Finite automata is defined by tuple M = {Q, S, q0, F, d}, where  

Q =   Set of states in finite automata

S =   Set of input symbols

q0 =   Initial state

F =   Set of final states

d =   Transition function defined as d : Q x S → Q

For example :

Q = {0, 1, 2, 3, 4, 5},

q0 = 0,

F = {2, 3},

S = {a, b},

And transition function :

dab
013
123
224
315
425
555

Graphically we can represent this finite automaton as shown in Fig. 8.3.1.

Finite automata are widely used in compilers and text processors for string matching. Given the string S over alphabet set S. Finite automata,

  • Starts with input state q
  • Reads the input string character by character and changes the state according to transition function.
  • It accepts the string if a finite automaton ends up in one of the
    final / accepting states.
  • It rejects the string if a finite automaton does not end up in final state.

Let us trace some string for above given finite automata. Consider the string S = abababaaabab. Following table shows the transitions on every input symbol.

Input abababaaabab
State0131313122424

After scanning entire string, finite automata is in final state, so string is accepted by the automata. Consider the string S = babababb

Input babababb
State031313135

The state 5 is not accepting state, so string is not accepted by given finite automata.

Let us extend this concept of finite automata for pattern matching.

Let us consider the text T = t1, t2, t3…tn and pattern P = p1, p2, p3…pm.

The string of length m goes through m + 1 states, numbered as
0, 1, 2, m.

State 0 will be initial / start state and state m will be the only
accepting / final state. If first k characters of pattern match with the text, FA will be in kth state.

Next character tj + k matches with pk + 1, implies first k + 1 characters are matched and FA goes in (k + 1)st state.

\[ delta \] (k, pk + 1)   =   k + 1

If next character tj + k does not match with pk + 1, then FA enters in one of the 0 to k state

Keep shifting pattern right till there is a match or p is exhausted.

If FA reaches to state m, pattern is found and computation stops. Search time for this approach is linear i.e. O(n). Each character in text is examined exactly once

Algorithm

Algorithm for pattern matching using finite automata is shown below

Algorithm  FINTE_AUTOMATA(T, P)
// T is text of length n
// P is pattern of length m

state ← 0    // Initial state is 0
for  i ← 1 to n do
    state ← δ(state, t¬i)    // Return the new state after reading character ti
    if state == m then
        print “Match found on position”, i - m + 1
    end
end

Complexity Analysis

  • Construction of finite automata takes O(m3 |Σ|) time in worst case, where m is the length of pattern and |S| is number of input symbols.
  • This approach examines each character of text exactly once to find the pattern. So it takes linear time for matching. But pre-processing time may be large. Constructing automata is the difficult part for pattern matching.
  • It runs fairly in O(n) time without considering time spend to build transition function.
  • Finite automata for few patterns are shown in Fig.8.3.2 :
FA accepting string a2n-1
FA accepting string a2n
FA accepting string (ab)n

Examples

Example: Construct and show simulation of finite automata for matching the pattern P = abb for text T = ababbaababba

Solution:

Finite automata that can accept the string abb is shown in Fig. P. 8.3.1.

Moves made by the FA to accept the string abb are shown in Fig 8.3.3 :

Transition steps

Transition table for above finite automata is given as,

Input Stateab
q0q1q0
q1q1q2
q2q1q3
q3q1q0

Simulation of given data on this FA is shown in Table P.8.3.2.

ababbaababbaComment
abb          
 abb         
  abb       Match found
   abb       
    abb      
     abb     
      abb    
       abb   
        abb Match found
         abb 

 

Leave a Reply

Your email address will not be published. Required fields are marked *