String Matching Algorithms
String matching operation is a core part in many text processing applications. The objective of this algorithm is to find pattern P from given text T. Typically |P|<< |T|. In the design of compilers and text editors, string matching operation is crucial. So locating P in T efficiently is very important.
The problem is defined as follows: “Given some text string T[1….n] of size n, find all occurrences of pattern P[1…m] of size m in T.”
We say that P occurs in text T with number of shifts s, if 0 ≤ s ≤ n – m and T[ (s + 1) … (s + m) ] = P[1…m].
Consider the following example
- In this example, pattern P = ARE is found in text T after four shifts.
- The classical application of such algorithms are to find particular protein pattern in DNA sequence.
- Strings may be encoded using set of character alphabets {a, b, …, z}, binary alphabets {0, 1}, decimal alphabets {0, 1, 2, …, 9}, DNA alphabets {A, C, G, T}. The encoding of the string directly affects the efficiency of searching.
- In the next sections, we will discuss and analyze a few string-matching algorithms.
Some Popular String Matching Algorithms:
- Naive String Matching Algorithm
- String Maching with Automata
- Rabin Karp String Matching Algorithm
- KMP String Matching Algorithm
Additional Reading: [Wiki]