# Longest Common Subsequence – Solving using Dynamic Programming

## Longest Common Subsequence

**Longest Common Subsequence Problem: **Let A = < a_{1}, a_{2}, a_{3} …a_{n}> and B = < b_{1}, b_{2}, b_{3} …b_{m}> be two strings over an alphabets. Then B is subsequence of A, if B can be generated by striking out some elements from A.

By subsequence, we mean that the values must occur in the order of the sequence, but they need not be consecutive.

If A = < X, Y, X, X, Z, Y, X > and B = < X, X, Y, X > then by deleting A[2], A[4] and A[5] from A, we can derive B. So B is the subsequence of A.

The common subsequence of A and B is the subsequence that can be generated by striking some characters from A and B both. If A = {a, b, a, c, b, c, b} and B = {a, b, c, b, b, a, c}, then the sequence {a, c}, {a, b, c}, {a, c, c}, {a, b, c, b} etc. are common subsequences of A and B, but {a, b, c, b, a} is not. {a, b, c, b, a} is a subsequence of B but it is not a subsequence of A. String of length m has 2^{m} subsequences. So finding the longest common subsequences using the brute force approach takes exponential time. Such algorithms are practically not useful for long sequences.

## Mathematical Formulation

Let us consider two strings A_{m} and B_{n} of length m and n respectively. If the last character of both strings is the same i.e. a_{m} = b_{n}, then the length of LCS is incremented by one, and the length of both strings is reduced by one. The new problem is now to find LCS of strings A_{m-1} and B_{n-1}.

If a_{m} ¹ b_{n}, we shall consider two sub-problems.

- Apply LCS on strings A
_{m-1}and B - Apply LCS on strings A and B
_{n-1}

Select the result which gives the longest subsequence.

Thus, the optimal substructure of LCS problem is defined as,

## Algorithm for Longest Common Subsequence

The algorithm to solve the LCS problem is described below :

**Algorithm **LONGEST_COMMON_SUBSEQUENCE(X, Y)
// X is string of length n
// Y is string of length m
**for **i ← 1 to m **do**
LCS [i, 0] ← 0
**end**
**for **j ← 0 to n
**do**
LCS [0, j] ← 0
**end**
**for **i ← 1 to m **do**
**for **j ← 1 to n **do**
**if **Xi = = Yj **then**
LCS[i, j] ← LCS[i – 1, j – 1] + 1
**else**
**if **LCS[i – 1, j] ≥ LCS[i, j – 1] **then**
LCS[i, j] ← LCS[i – 1, j]
**else**
LCS[i, j] ← LCS[i, j – 1]
**end**
**end**
**end**
**end**
**return **LCS

## Complexity Analysis of Longest Common Subsequence

In a brute force attack, we need to perform check every subsequence of P[1…m] to see if it is also a subsequence of Q[1…n]. Checking membership of one subsequence of P[1…m] into Q[1…n] takes O(n) time. 2^{m} subsequences are possible for string P of length m. So worst-case running time of brute force approach would be O(n.2^{m}) In dynamic programming, only table of size m ´ n is filled up using two nested for loops. So running time of the dynamic programming approach would take O(mn), the same is the space complexity.

## Example

**Example: Given two sequences of characters, P=<MLNOM> Q=<MNOM>. Obtain the longest common subsequence.**

**Solution:**

Given two strings are P = <MLNOM> and Q = <MNOM>

The optimal substructure of LCS problem using dynamic programming is given as :

Initially, the table looks like,

Let’s compute the remaining cell values row by row :

**Computation for row 1 :**

**LCS [1, 1]** ⇒ i = 1, j = 1, P_{i} = M, Q_{j} = M

P_{i} = Q_{j} ⇒ LCS [i, j] = 1 + LCS[i – 1, j – 1]

LCS [1, 1] = 1 + LCS[0, 0] = 1 + 0 = 1

**LCS [1, 2]** ⇒ i = 1, j = 2, P_{i} = M, Q_{j} = N

P_{i} ≠ Q_{j} ⇒ LCS [i, j] = max (LCS [i – 1 , j], LCS [i, j – 1])

LCS [1, 2] = max (LCS [0, 2], LCS [1, 1]) = max (0, 1) = 1

**LCS [1, 3]** ⇒ i = 1, j = 3, P_{i} = M, Q_{j} = O

P_{i} ≠ Q_{j} ⇒ LCS [i, j] = max (LCS [i – 1 , j], LCS [i, j – 1])

LCS [1, 3] = max(LCS[0, 3], LCS[1, 2]) = max(0, 1) = 1

**LCS [1, 4]** ⇒ i = 1, j = 4, P_{i} = M, Q_{j} = M

P_{i} = Q_{j} ⇒ LCS [i, j] = 1 + LCS[i – 1, j – 1]

LCS [1, 4] = 1 + LCS[0, 3] = 1 + 0 = 1

**Computation for row 2 :**

**LCS [2, 1]** ⇒ i = 2, j = 1, P_{i} = L, Q_{j} = M

P_{i} ≠ Q_{j} ⇒ LCS [i, j] = max (LCS [i – 1 , j], LCS [i, j – 1])

LCS [2, 1] = max (LCS [1, 1], LCS [2, 0]) = max (1, 0) = 1

**LCS [2, 2]** ⇒ i = 2, j = 2, P_{i} = L, Q_{j} = N

P_{i} ≠ Q_{j} ⇒ LCS [i, j] = max (LCS [i – 1 , j], LCS [i, j – 1])

LCS [2, 2] = max (LCS [1, 2], LCS [2, 1]) = max (1, 1) = 1

**LCS [2, 3]** ⇒ i = 2, j = 3, P_{i} = L, Q_{j} = O

P_{i} ≠ Q_{j} ⇒ LCS [i, j] = max (LCS [i – 1 , j], LCS [i, j – 1])

LCS [2, 3] = max(LCS[1, 3], LCS[2, 2]) = max(1, 1) = 1

**LCS [2, 4] **⇒ i = 2, j = 4, P_{i} = L, Q_{j} = M

P_{i} ≠ Q_{j} ⇒ LCS [i, j] = max (LCS [i – 1 , j], LCS [i, j – 1])

LCS [2, 4] = max (LCS [1, 4], LCS [2, 3]) = max (1, 1) = 1

**Computation for row 3 :**

**LCS [3, 1]** ⇒ i = 3, j = 1, P_{i} = N, Q_{j} = M

P_{i} ≠ Q_{j} ⇒ LCS [i, j] = max (LCS [i – 1 , j], LCS [i, j – 1])

LCS [3, 1] = max (LCS [2, 1], LCS [3, 0]) = max (1, 0) = 1

**LCS [3, 2]** ⇒ i = 3, j = 2, P_{i} = N, Q_{j} = N

P_{i} = Q_{j} ⇒ LCS [i, j] = 1 + LCS[i – 1, j – 1]

LCS [3, 2] = 1 + LCS[2, 1] = 1 + 1 = 2

**LCS [3, 3]** ⇒ i = 3, j = 3, P_{i} = N, Q_{j} = O

P_{i} ≠ Q_{j} ⇒ LCS [i, j] = max (LCS [i – 1 , j], LCS [i, j – 1])

LCS [3, 3] = max(LCS[2, 3], LCS[3, 2]) = max(1, 2) = 2

**LCS [3, 4]** ⇒ i = 3, j = 4, P_{i} = N, Q_{j} = M

P_{i} ≠ Q_{j} ⇒ LCS [i, j] = max (LCS [i – 1 , j], LCS [i, j – 1])

LCS [3, 4] = max (LCS [2, 4], LCS [3, 3]) = max (1, 2) = 2

**Computation for row 4 :**

**LCS [4, 1] **⇒ i = 4, j = 1, P_{i} = O, Q_{j} = M

P_{i} ≠ Q_{j} ⇒ LCS [i, j] = max (LCS [i – 1 , j], LCS [i, j – 1])

LCS [4, 1] = max (LCS [3, 1], LCS [4, 0]) = max (1, 0) = 1

**LCS [4, 2]** ⇒ i = 4, j = 2, P_{i} = O, Q_{j} = N

P_{i} ≠ Q_{j} ⇒ LCS [i, j] = max (LCS [i – 1 , j], LCS [i, j – 1])

LCS [4, 2] = max (LCS [3, 2], LCS [4, 1]) = max (2, 1) = 2

**LCS [4, 3]** ⇒ i = 4, j = 3, P_{i} = O, Q_{j} = O

P_{i} = Q_{j} ⇒ LCS [i, j] = 1 + LCS[i – 1, j – 1]

LCS [4, 3] = 1 + LCS[3, 2] = 1 + 2 = 3

**LCS [4, 4]** ⇒ i = 4, j = 4, P_{i} = O, Q_{j} = M

P_{i} ≠ Q_{j} ⇒ LCS [i, j] = max (LCS [i – 1 , j], LCS [i, j – 1])

LCS [4, 4] = max (LCS [3, 4], LCS [4, 3]) = max (2, 3) = 3

**Computation for row 5 :**

**LCS [5, 1]** ⇒ i = 5, j = 1, P_{i} = M, Q_{j} = M

P_{i} = Q_{j} ⇒ LCS [i, j] = 1 + LCS[i – 1, j – 1]

LCS [5, 1] = 1 + LCS[4, 0] = 1 + 0 = 1

**LCS [5, 2] ** ⇒ i = 5, j = 2, P_{i} = M, Q_{j} = N

P_{i} ≠ Q_{j} ⇒ LCS [i, j] = max (LCS [i – 1 , j], LCS [i, j – 1])

LCS [5, 2] = max (LCS [4, 2], LCS [5, 1]) = max (2, 1) = 2

**LCS [5, 3]** ⇒ i = 5, j = 3, P_{i} = M, Q_{j} = O

P_{i} ≠ Q_{j} ⇒ LCS [i, j] = max (LCS [i – 1 , j], LCS [i, j – 1])

LCS [5, 3] = max (LCS [4, 3], LCS [5, 2]) = max (3, 2) = 3

**LCS [5, 4]** ⇒ i = 5, j = 4, P_{i} = M, Q_{j} = M

P_{i} = Q_{j} ⇒ LCS [i, j] = 1 + LCS[i – 1, j – 1]

LCS [5, 4] = 1 + LCS[5, 3] = 1 + 3 = 4

So finally, table would be,

**Trace the solution:**

Longest common subsequence for given strings is of length 4. Let us find LCS of P and Q :

**Step 1:** i = 5, j = 4, P_{i} = M, Q_{j} = M

P_{i} = Q_{i}, so add Pi to solution set. And move diagonally back.

Solution set S = {M }

**Step 2:** i = 4, j = 3, P_{i} = O, Q_{j} = O

P_{i} = Q_{i}, so add Pi to solution set. And move diagonally back.

Solution set S = { M, O }**Step 3:** i = 3, j = 2, P

_{i}= N, Q

_{j}= N

P_{i} = Q_{i}, so add Pi to solution set. And move diagonally back.

Solution set S = { M, O, N }

**Step 4:** i = 2, j = 1, P_{i} = L, Q_{j} = M

P_{i} ≠ Q_{j}, and LCS[i, j] is derived from LCS[i – 1, j]. So move in vertical direction.

Solution set S = { M, O, N }

**Step 5:** i = 1, j = 1, P_{i} = M, Q_{j} = M

P_{i} = Q_{i}, so add Pi to solution set. And move diagonally back.

Solution set S = { M, O, N, M }

Moving diagonally back i and j become zero. So stop. We have collected characters from the last position of the string. So reverse the solution set, which is the LCS of P and Q.

So, LCS = MNOM

## Some Popular Problems Solved using Dynamic Programming

- Binomial Coefficient
- Making a Change
- Knapsack Problem
- Multistate Graph Problem
- Optimal Binary Search Tree
- Matrix Chain Multiplication
- Longest Common Subsequence
- Bellman-Ford (Single Source Shortest Path) Algorithm
- Floyd-Warshall (All Pair Shortest Path) Problem
- Assembly Line Scheduling
- Travelling Salseman Problem
- Flow Shop Scheduling

Additional Reading: Read More