## Optimal Binary Search Tree – What is it?

- Optimal Binary Search Tree (OBST) extends the concept of the binary search tree. A Binary Search Tree (BST) is a *nonlinear* data structure used in many scientific applications to reduce search time. In a BST, the left child is smaller than the root and the right child is greater than the root. This arrangement simplifies the search procedure.
- OBST is very useful in dictionary search, where the probability of being searched differs from word to word. It also has great application in translation: if we translate a book from English to German, equivalent words are looked up in an English-to-German dictionary and substituted in the translation, and words are searched exactly as in binary search tree order.
- A plain binary search tree simply arranges the words in lexicographical order. Words like *'the'*, *'is'* and *'there'* are very frequent, whereas words like *'xylophone'* or *'anthropology'* appear rarely.

- It is not a wise idea to keep less frequent words near the root of a binary search tree. Instead of storing words in lexicographical order alone, we arrange them according to their probabilities. This arrangement puts frequent words near the root, so searching for them takes fewer comparisons. Such a tree is called an **Optimal Binary Search Tree**.
- Consider a sequence of *n* keys K = < k_{1}, k_{2}, k_{3}, …, k_{n} > with distinct probabilities, in sorted order such that k_{1} < k_{2} < … < k_{n}. Searches for values falling between a pair of keys are unsuccessful, so for n keys the binary search tree contains n + 1 dummy keys d_{i}, representing unsuccessful searches.
- Two different BSTs built from the same five keys {k_{1}, k_{2}, k_{3}, k_{4}, k_{5}} with the same probabilities are shown in the following figure.
- With n nodes, there exist Ω(4^{n} / n^{3/2}) different binary search trees, so an exhaustive search for the optimal binary search tree takes a huge amount of time.
- The goal is to construct a tree that minimizes the total search cost. Such a tree is called an optimal binary search tree. An OBST does not claim minimum height, nor is it necessary that the parent of a subtree have higher probability than its children.
- Dynamic programming can help us find such an optimal tree.
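The Ω(4^{n} / n^{3/2}) count mentioned above comes from the Catalan numbers, which count the distinct BST shapes on n keys. A quick sketch to see how fast this grows:

```python
from math import comb

def num_bsts(n: int) -> int:
    """Number of structurally distinct BSTs on n keys: the nth Catalan number."""
    return comb(2 * n, n) // (n + 1)

# The count grows roughly like 4^n / n^(3/2), so trying every shape is infeasible.
print([num_bsts(n) for n in (3, 5, 10, 20)])
```

Already at n = 20 there are billions of tree shapes, which is why a dynamic programming formulation is needed.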

## Mathematical formulation

- We formulate the OBST with the following observations:
- Any subtree of an OBST contains keys in sorted order k_{i} … k_{j}, where 1 ≤ i ≤ j ≤ n.
- The subtree containing keys k_{i} … k_{j} has leaves with dummy keys d_{i–1} … d_{j}.
- Suppose k_{r} is the root of the subtree containing keys k_{i} … k_{j}. Then the left subtree of k_{r} contains keys k_{i} … k_{r–1} and the right subtree contains keys k_{r+1} … k_{j}. Recursively, optimal subtrees are constructed from the left and right subtrees of k_{r}.
- Let e[i, j] represent the expected cost of searching an OBST built on keys k_{i} … k_{j}. With n keys, our aim is to find and minimize e[1, n].
- The base case occurs when j = i – 1, because we then have only the dummy key d_{i–1}. The expected search cost for this case is e[i, j] = e[i, i – 1] = q_{i–1}.
- For the case j ≥ i, we must select some key k_{r} from k_{i} … k_{j} as the root of the tree.
- With k_{r} as the root of the subtree over keys k_{i} … k_{j}, the sum of probabilities is defined as

w(i, j) = Σ_{l = i}^{j} p_{l} + Σ_{l = i – 1}^{j} q_{l}

_{(Actual keys start at index 1 and dummy keys start at index 0.)}

Thus, a recursive formula for forming the OBST is stated below:

e[i, j] = q_{i–1}, if j = i – 1

e[i, j] = min_{i ≤ r ≤ j} { e[i, r – 1] + e[r + 1, j] + w(i, j) }, if i ≤ j

e[i, j] gives the expected cost of the optimal binary search tree over keys k_{i} … k_{j}.
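The recurrence above can be transcribed directly into a short memoized function. Here is a minimal sketch, assuming p is indexed 1…n (with p[0] as an unused placeholder so indices match the text) and q is indexed 0…n:

```python
from functools import lru_cache

def expected_cost(p, q):
    """Top-down evaluation of the e[i, j] recurrence.

    p[1..n] are key probabilities (p[0] is a placeholder),
    q[0..n] are dummy-key probabilities.
    """
    n = len(p) - 1

    def w(i, j):
        # w(i, j) = sum of p[i..j] plus q[i-1..j]
        return sum(p[i:j + 1]) + sum(q[i - 1:j + 1])

    @lru_cache(maxsize=None)
    def e(i, j):
        if j == i - 1:                 # base case: only the dummy key d_{i-1}
            return q[i - 1]
        # try every key k_r as the root of the subtree over k_i..k_j
        return min(e(i, r - 1) + e(r + 1, j) + w(i, j) for r in range(i, j + 1))

    return e(1, n)
```

This version only returns the minimum expected cost; the bottom-up algorithm in the next section also records the chosen roots so the tree can be reconstructed.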

## Algorithm for Optimal Binary Search Tree

The algorithm for optimal binary search tree is specified below :

**Algorithm** OBST(p, q, n)
// e[1…n+1, 0…n] : Cost of optimal subtrees
// w[1…n+1, 0…n] : Sum of probabilities
// root[1…n, 1…n] : Used to construct the OBST
**for** i ← 1 **to** n + 1 **do**
    e[i, i – 1] ← q_{i–1}
    w[i, i – 1] ← q_{i–1}
**end**
**for** m ← 1 **to** n **do**
    **for** i ← 1 **to** n – m + 1 **do**
        j ← i + m – 1
        e[i, j] ← ∞
        w[i, j] ← w[i, j – 1] + p_{j} + q_{j}
        **for** r ← i **to** j **do**
            t ← e[i, r – 1] + e[r + 1, j] + w[i, j]
            **if** t < e[i, j] **then**
                e[i, j] ← t
                root[i, j] ← r
            **end**
        **end**
    **end**
**end**
**return** (e, root)
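A bottom-up Python translation of this pseudocode might look as follows; it is a sketch, assuming the same 1-based indexing with p[0] unused:

```python
def obst(p, q):
    """Bottom-up DP for the OBST tables, following the pseudocode above.

    p[1..n]: key probabilities (p[0] unused), q[0..n]: dummy-key probabilities.
    Returns (e, w, root), where root[i][j] is the index of the root key of the
    optimal subtree over keys k_i..k_j.
    """
    n = len(p) - 1
    INF = float("inf")
    # Rows 0..n+1 so that e[n+1][n] (an empty subtree) is addressable.
    e = [[0.0] * (n + 1) for _ in range(n + 2)]
    w = [[0.0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 1)]

    for i in range(1, n + 2):          # base cases: empty subtrees
        e[i][i - 1] = q[i - 1]
        w[i][i - 1] = q[i - 1]

    for m in range(1, n + 1):          # m = number of keys in the subtree
        for i in range(1, n - m + 2):
            j = i + m - 1
            e[i][j] = INF
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            for r in range(i, j + 1):  # try each key as the root
                t = e[i][r - 1] + e[r + 1][j] + w[i][j]
                if t < e[i][j]:
                    e[i][j] = t
                    root[i][j] = r
    return e, w, root
```

Calling `obst(p, q)` and reading `e[1][n]` gives the minimum expected search cost, while the `root` table is walked recursively to build the actual tree.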

## Complexity Analysis of Optimal Binary Search Tree

It is simple to derive the complexity of this approach from the algorithm above. It uses three nested loops, and the statements in the innermost loop run in Θ(1) time. The running time of the algorithm is computed as

T(n) = Σ_{m = 1}^{n} Σ_{i = 1}^{n – m + 1} Σ_{r = i}^{j} Θ(1) = Θ(n^{3})

Thus, the OBST algorithm runs in cubic time.
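One quick way to sanity-check the cubic bound is to count the innermost-loop iterations directly. The helper below is illustrative only; it mirrors the loop structure of the algorithm without doing any work:

```python
def inner_iterations(n: int) -> int:
    """Count how many times the innermost statement of OBST(p, q, n) executes."""
    count = 0
    for m in range(1, n + 1):            # subtree sizes
        for i in range(1, n - m + 2):    # left endpoints
            j = i + m - 1
            for r in range(i, j + 1):    # candidate roots
                count += 1
    return count

# The closed form is n(n+1)(n+2)/6, i.e. Θ(n^3).
print(inner_iterations(4))   # 4·5·6/6 = 20
```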

## Example

**Problem: Let p(1 : 3) = (0.5, 0.1, 0.05) and q(0 : 3) = (0.15, 0.1, 0.05, 0.05). Compute and construct the OBST for these values using the dynamic programming approach.**

**Solution:**

Here, given that

| i | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| p_{i} | – | 0.5 | 0.1 | 0.05 |
| q_{i} | 0.15 | 0.1 | 0.05 | 0.05 |

The recursive formula to solve the OBST problem is

e[i, j] = q_{i–1}, if j = i – 1

e[i, j] = min_{i ≤ r ≤ j} { e[i, r – 1] + e[r + 1, j] + w(i, j) }, if i ≤ j

where w(i, j) = w(i, j – 1) + p_{j} + q_{j}, with w[i, i – 1] = q_{i–1}.

Initially,

w[1, 0] = q_{0} = 0.15, w[2, 1] = q_{1} = 0.1, w[3, 2] = q_{2} = 0.05, w[4, 3] = q_{3} = 0.05

w[1, 1] = w[1, 0] + p_{1} + q_{1} = 0.15 + 0.5 + 0.1 = 0.75

w[2, 2] = w[2, 1] + p_{2} + q_{2} = 0.1 + 0.1 + 0.05 = 0.25

w[3, 3] = w[3, 2] + p_{3} + q_{3} = 0.05 + 0.05 + 0.05 = 0.15

w[1, 2] = w[1, 1] + p_{2} + q_{2} = 0.75 + 0.1 + 0.05 = 0.9

w[2, 3] = w[2, 2] + p_{3} + q_{3} = 0.25 + 0.05 + 0.05 = 0.35

w[1, 3] = w[1, 2] + p_{3} + q_{3} = 0.9 + 0.05 + 0.05 = 1.0

**Now, we will compute e[i, j]**

Initially,

e[1, 0] = q_{0} = 0.15 (∵ j = i – 1)

e[2, 1] = q_{1} = 0.1 (∵ j = i – 1)

e[3, 2] = q_{2} = 0.05 (∵ j = i – 1)

e[4, 3] = q_{3} = 0.05 (∵ j = i – 1)

e[1, 1] = min { e[1, 0] + e[2, 1] + w(1, 1) }

= min { 0.15 + 0.1 + 0.75 } = 1.0

e[2, 2] = min { e[2, 1] + e[3, 2] + w(2, 2) }

= min { 0.1 + 0.05 + 0.25 } = 0.4

e[3, 3] = min { e[3, 2] + e[4, 3] + w(3, 3) }

= min { 0.05 + 0.05 + 0.15 } = 0.25

e[1, 2] = min { r = 1 : e[1, 0] + e[2, 2] + w(1, 2), r = 2 : e[1, 1] + e[3, 2] + w(1, 2) }

= min { 0.15 + 0.4 + 0.9, 1.0 + 0.05 + 0.9 } = min { 1.45, 1.95 } = 1.45

e[2, 3] = min { r = 2 : e[2, 1] + e[3, 3] + w(2, 3), r = 3 : e[2, 2] + e[4, 3] + w(2, 3) }

= min { 0.1 + 0.25 + 0.35, 0.4 + 0.05 + 0.35 } = min { 0.7, 0.8 } = 0.7

e[1, 3] = min { r = 1 : e[1, 0] + e[2, 3] + w(1, 3), r = 2 : e[1, 1] + e[3, 3] + w(1, 3), r = 3 : e[1, 2] + e[4, 3] + w(1, 3) }

= min { 0.15 + 0.7 + 1.0, 1.0 + 0.25 + 1.0, 1.45 + 0.05 + 1.0 } = min { 1.85, 2.25, 2.5 } = 1.85

Collecting the minimizing roots:

e[1, 1] is minimum for r = 1, so root[1, 1] = 1

e[2, 2] is minimum for r = 2, so root[2, 2] = 2

e[3, 3] is minimum for r = 3, so root[3, 3] = 3

e[1, 2] is minimum for r = 1, so root[1, 2] = 1

e[2, 3] is minimum for r = 2, so root[2, 3] = 2

e[1, 3] is minimum for r = 1, so root[1, 3] = 1

Let us now construct OBST for given data.

root[1, 3] = 1, so k_{1} will be at the root.

k_{2} … k_{3} lie in the right subtree of k_{1}.

root[2, 3] = 2, so k_{2} will be the root of this subtree.

k_{3} will be on the right of k_{2}.

Thus, finally, we get the tree with k_{1} at the root, k_{2} as its right child, k_{3} as the right child of k_{2}, and the dummy keys d_{0} … d_{3} as leaves.
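The root table can be unwound mechanically. The sketch below (`build_tree` is a hypothetical helper, not part of the algorithm above) rebuilds the tree shape from the root values derived in this example, with `None` marking a dummy-key leaf:

```python
def build_tree(root, i, j):
    """Recover the OBST shape from a root table given as a dict.

    Returns a nested tuple (key index, left subtree, right subtree);
    None marks a dummy-key leaf.
    """
    if j < i:
        return None
    r = root[(i, j)]
    return (r, build_tree(root, i, r - 1), build_tree(root, r + 1, j))

# root table from the worked example above
root = {(1, 1): 1, (2, 2): 2, (3, 3): 3, (1, 2): 1, (2, 3): 2, (1, 3): 1}
print(build_tree(root, 1, 3))   # (1, None, (2, None, (3, None, None)))
```

The printed tuple matches the tree constructed by hand: k_{1} at the root with k_{2} and then k_{3} chained to the right.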

Additional Reading: [Wiki]