Naive String Matching Algorithm

Working Mechanism

  • This is simple and efficient brute force approach. It compares the first character of pattern with searchable text. If a match is found, pointers in both strings are advanced. If a match is not found, the pointer to text is incremented and pointer of the pattern is reset. This process is repeated till the end of the text.
  • The naïve approach does not require any pre-processing. Given text T and pattern P, it directly starts comparing both strings character by character.
  • After each comparison, it shifts pattern string one position to the right.
  • Following example illustrates the working of naïve string matching algorithm. Here,
    T = PLANINGANDANALYASIS and P = AND
  • Here, ti and pj are indices of text and pattern respectively.

Step 1: T[1] ≠ P[1], so advance text pointer, i.e. ti++.

Step 2 : T[2] ≠ P[1], so advance text pointers i.e. ti++

Step 3 : T[3] = P[1], so advance both pointers i.e. ti++, pj++

Step 4 : T[4] = P[2], so advance both pointers, i.e. ti++, pj++

Step 5 : T[5] ≠ P[3], so advance text pointer and reset pattern pointer, i.e. ti++, pj = 1

Step 6 : T[6] ≠ P[1], so advance text pointer, i.e. ti++

Step 7: T[7] ≠ P[1], so advance text pointer i.e. ti++

Step 8 : T[8] = P[1], so advance both pointers, i.e. ti++, pj++

Step 9 :   T[9] = P[2], so advance both pointers, i.e. ti++, pj++

Step 10 : T[10] = P[3], so advance both pointers, i.e. ti++, pj++

This process continues till the possible comparison in the text string.

Algorithm

Algorithm for naïve string matching approach is described below :

Algorithm  NAÏVE_STRING_MATCHING(T, P)
// T is the text string of length n
// P is the pattern of length m

for  i ← 0 to n – m  do
    if P[1… m] == T[i+1…i+m]  them
        print  “Match Found”
    end
end

Complexity Analysis

There are two cases of consideration :

(i)     Pattern found

  • The worst case occurs when the pattern is at last position and there are spurious hits all the way. Example, T = AAAAAAAAAAB, P = AAAB.
  • To move pattern one position right, m comparisons are made. Searchable text in T has a length (n – m). Hence, in worst case algorithm runs in O(m*(n – m)) time.

(ii)   Pattern not found 

  • In the best case, the searchable text does not contain any of the prefixes of the pattern. Only one comparison requires moving pattern one position right.

Example, T = ABABCDBCAC, P = XYXZ. The algorithm does O(n – m) comparisons.

  • In the worst case, first (m – 1) characters of pattern and text are matched and only last character does not match.

Example, T = AAAAAAAAAAAAC, P = AAAAB. Algorithm takes O(m*(n – m)) time.

Leave a Reply

Your email address will not be published. Required fields are marked *