Naive String Matching Algorithm
- This is simple and efficient brute force approach. It compares the first character of pattern with searchable text. If a match is found, pointers in both strings are advanced. If a match is not found, the pointer to text is incremented and pointer of the pattern is reset. This process is repeated till the end of the text.
- The naïve approach does not require any pre-processing. Given text T and pattern P, it directly starts comparing both strings character by character.
- After each comparison, it shifts pattern string one position to the right.
- Following example illustrates the working of naïve string matching algorithm. Here,
T = PLANINGANDANALYASIS and P = AND
- Here, ti and pj are indices of text and pattern respectively.
Step 1: T ≠ P, so advance text pointer, i.e. ti++.
Step 2 : T ≠ P, so advance text pointers i.e. ti++
Step 3 : T = P, so advance both pointers i.e. ti++, pj++
Step 4 : T = P, so advance both pointers, i.e. ti++, pj++
Step 5 : T ≠ P, so advance text pointer and reset pattern pointer, i.e. ti++, pj = 1
Step 6 : T ≠ P, so advance text pointer, i.e. ti++
Step 7: T ≠ P, so advance text pointer i.e. ti++
Step 8 : T = P, so advance both pointers, i.e. ti++, pj++
Step 9 : T = P, so advance both pointers, i.e. ti++, pj++
Step 10 : T = P, so advance both pointers, i.e. ti++, pj++
This process continues till the possible comparison in the text string.
Algorithm for naïve string matching approach is described below :
Algorithm NAÏVE_STRING_MATCHING(T, P) // T is the text string of length n // P is the pattern of length m for i ← 0 to n – m do if P[1… m] == T[i+1…i+m] them print “Match Found” end end
There are two cases of consideration :
(i) Pattern found
- The worst case occurs when the pattern is at last position and there are spurious hits all the way. Example, T = AAAAAAAAAAB, P = AAAB.
- To move pattern one position right, m comparisons are made. Searchable text in T has a length (n – m). Hence, in worst case algorithm runs in O(m*(n – m)) time.
(ii) Pattern not found
- In the best case, the searchable text does not contain any of the prefixes of the pattern. Only one comparison requires moving pattern one position right.
Example, T = ABABCDBCAC, P = XYXZ. The algorithm does O(n – m) comparisons.
- In the worst case, first (m – 1) characters of pattern and text are matched and only last character does not match.
Example, T = AAAAAAAAAAAAC, P = AAAAB. Algorithm takes O(m*(n – m)) time.