String Matching with Finite Automata

Working Mechanism

Idea of this approach is to build finite automata to scan text T for finding all occurrences of pattern P. This approach examines each character of text exactly once to find the pattern. Thus it takes linear time for matching but preprocessing time may be large.

Finite automata is defined by tuple M = {Q, S, q0, F, d}, where  

Q =   Set of states in finite automata

S =   Set of input symbols

q0 =   Initial state

F =   Set of final states

d =   Transition function defined as d : Q x S → Q

For example :

Q = {0, 1, 2, 3, 4, 5},

q0 = 0,

F = {2, 3},

S = {a, b},

And transition function :


Graphically we can represent this finite automaton as shown in Fig. 8.3.1.

Finite automata are widely used in compilers and text processors for string matching. Given the string S over alphabet set S. Finite automata,

  • Starts with input state q
  • Reads the input string character by character and changes the state according to transition function.
  • It accepts the string if a finite automaton ends up in one of the
    final / accepting states.
  • It rejects the string if a finite automaton does not end up in final state.

Let us trace some string for above given finite automata. Consider the string S = abababaaabab. Following table shows the transitions on every input symbol.

Input abababaaabab

After scanning entire string, finite automata is in final state, so string is accepted by the automata. Consider the string S = babababb

Input babababb

The state 5 is not accepting state, so string is not accepted by given finite automata.

Let us extend this concept of finite automata for pattern matching.

Let us consider the text T = t1, t2, t3…tn and pattern P = p1, p2, p3…pm.

The string of length m goes through m + 1 states, numbered as
0, 1, 2, m.

State 0 will be initial / start state and state m will be the only
accepting / final state. If first k characters of pattern match with the text, FA will be in kth state.

Next character tj + k matches with pk + 1, implies first k + 1 characters are matched and FA goes in (k + 1)st state.

\[ delta \] (k, pk + 1)   =   k + 1

If next character tj + k does not match with pk + 1, then FA enters in one of the 0 to k state

Keep shifting pattern right till there is a match or p is exhausted.

If FA reaches to state m, pattern is found and computation stops. Search time for this approach is linear i.e. O(n). Each character in text is examined exactly once


Algorithm for pattern matching using finite automata is shown below

// T is text of length n
// P is pattern of length m

state ← 0    // Initial state is 0
for  i ← 1 to n do
    state ← δ(state, t¬i)    // Return the new state after reading character ti
    if state == m then
        print “Match found on position”, i - m + 1

Complexity Analysis

  • Construction of finite automata takes O(m3 |Σ|) time in worst case, where m is the length of pattern and |S| is number of input symbols.
  • This approach examines each character of text exactly once to find the pattern. So it takes linear time for matching. But pre-processing time may be large. Constructing automata is the difficult part for pattern matching.
  • It runs fairly in O(n) time without considering time spend to build transition function.
  • Finite automata for few patterns are shown in Fig.8.3.2 :
FA accepting string a2n-1
FA accepting string a2n
FA accepting string (ab)n


Example: Construct and show simulation of finite automata for matching the pattern P = abb for text T = ababbaababba


Finite automata that can accept the string abb is shown in Fig. P. 8.3.1.

Moves made by the FA to accept the string abb are shown in Fig 8.3.3 :

Transition steps

Transition table for above finite automata is given as,

Input Stateab

Simulation of given data on this FA is shown in Table P.8.3.2.

  abb       Match found
        abb Match found


Leave a Reply

Your email address will not be published. Required fields are marked *