String Matching Algorithms

String matching operation is a core part in many text processing applications. The objective of this algorithm is to find pattern P from given text T. Typically |P|<< |T|. In the design of compilers and text editors, string matching operation is crucial. So locating P in T efficiently is very important.

The problem is defined as follows: “Given some text string T[1….n] of size n, find all occurrences of pattern P[1…m] of size m in T.”

We say that P occurs in text T with number of shifts s, if 0 ≤ s ≤ n – m and T[ (s + 1) … (s + m) ] = P[1…m].

Consider the following example

String matching
  • In this example, pattern P = ARE is found in text T after four shifts.
  • The classical application of such algorithms are to find particular protein pattern in DNA sequence.
  • Strings may be encoded using set of character alphabets {a, b, …, z}, binary alphabets {0, 1}, decimal alphabets {0, 1, 2, …, 9}, DNA alphabets {A, C, G, T}. The encoding of the string directly affects the efficiency of searching.
  • In the next sections, we will discuss and analyze a few string-matching algorithms.

 Additional Reading: [Wiki]

Leave a Reply

Your email address will not be published. Required fields are marked *