Optimal Binary Search Tree – How to Solve Using Dynamic Programming
Optimal Binary Search Tree – What it is?
- Optimal Binary Search Tree extends the concept of Binary searc tree. Binary Search Tree (BST) is a nonlinear data structure which is used in many scientific applications for reducing the search time. In BST, left child is smaller than root and right child is greater than root. This arrangement simplifies the search procedure.
- Optimal Binary Search Tree (OBST) is very useful in dictionary search. The probability of searching is different for different words. OBST has great application in translation. If we translate the book from English to German, equivalent words are searched from English to German dictionary and replaced in translation. Words are searched same as in binary search tree order.
- Binary search tree simply arranges the words in lexicographical order. Words like ‘the’, ‘is’, ‘there’ are very frequent words, whereas words like ‘xylophone’, ‘anthropology’ etc. appears rarely.
- It is not a wise idea to keep less frequent words near root in binary search tree. Instead of storing words in binary search tree in lexicographical order, we shall arrange them according to their probabilities. This arrangement facilitates few searches for frequent words as they would be near the root. Such tree is called Optimal Binary Search Tree.
- Consider the sequence of nkeys K = < k1, k2, k3, …, kn> of distinct probability in sorted order such that
k1< k2< … <kn. Words between each pair of key lead to unsuccessful search, so for n keys, binary search tree contains n + 1 dummy keys di, representing unsuccessful searches. - Two different representation of BST with same five keys {k1, k2, k3, k4, k5} probability is shown in following figure
- With n nodes, there exist (2n)!/((n + 1)! * n!) different binary search trees. An exhaustive search for optimal binary search tree leads to huge amount of time.
- The goal is to construct a tree which minimizes the total search cost. Such tree is called optimal binary search tree. OBST does not claim minimum height. It is also not necessary that parent of sub tree has higher priority than its child.
- Dynamic programming can help us to find such optima tree.
Mathematical formulation
- We formulate the OBST with following observations
- Any sub tree in OBST contains keys in sorted order ki…kj, where 1 ≤ i ≤ j ≤ n.
- Sub tree containing keys ki…kj has leaves with dummy keys di-1….dj.
- Suppose kr is the root of sub tree containing keys ki…..kj. So, left sub tree of root kr contains keys
ki….kr-1 and right sub tree contain keys kr+1 to kj. Recursively, optimal sub trees are constructed from the left and right sub trees of kr. - Let e[i, j] represents the expected cost of searching OBST. With n keys, our aim is to find and minimize e[1, n].
- Base case occurs when j = i – 1, because we just have the dummy key di-1 for this case. Expected search cost for this case would be e[i, j] = e[i, i – 1] = qi-1.
- For the case j ≥ i, we have to select any key kr from ki…kj as a root of the tree.
- With kr as a root key and sub tree ki…kj, sum of probability is defined as
(Actual key starts at index 1 and dummy key starts at index 0)
Thus, a recursive formula for forming the OBST is stated below :
e[i, j] gives the expected cost in the optimal binary search tree.
Algorithm for Optimal Binary Search Tree
The algorithm for optimal binary search tree is specified below :
Algorithm OBST(p, q, n)
// e[1…n+1, 0…n ] : Optimal sub tree
// w[1…n+1, 0…n] : Sum of probability
// root[1…n, 1…n] : Used to construct OBST
for i ← 1 to n + 1 do
e[i, i – 1] ← qi – 1
w[i, i – 1] ← qi – 1
end
for m ← 1 to n do
for i ← 1 to n – m + 1 do
j ← i + m – 1
e[i, j] ← ∞
w[i, j] ← w[i, j – 1] + pj + qj
for r ← i to j do
t ← e[i, r – 1] + e[r + 1, j] + w[i, j]
if t < e[i, j] then
e[i, j] ← t
root[i, j] ← r
end
end
end
end
return (e, root)
Complexity Analysis of Optimal Binary Search Tree
It is very simple to derive the complexity of this approach from the above algorithm. It uses three nested loops. Statements in the innermost loop run in Q(1) time. The running time of the algorithm is computed as
Thus, the OBST algorithm runs in cubic time
Example
Problem: Let p (1 : 3) = (0.5, 0.1, 0.05) q(0 : 3) = (0.15, 0.1, 0.05, 0.05) Compute and construct OBST for above values using Dynamic approach.
Solution:
Here, given that
i | 0 | 1 | 2 | 3 |
pi | 0.5 | 0.1 | 0.05 | |
qi | 0.15 | 0.1 | 0.05 | 0.05 |
Recursive formula to solve OBST problem is
Where,
Initially,
Now, we will compute e[i, j]
Initially,
e[1, 0] = q0 = 0.15 (∵ j = i – 1)
e[2, 1] = q1 = 0.1 (∵ j = i – 1)
e[3, 2] = q2 = 0.05 (∵ j = i – 1)
e[4, 3] = q3 = 0.05 (∵ j = i – 1)
e[1, 1] = min { e[1, 0] + e[2, 1] + w(1, 1) }
= min { 0.15 + 0.1 + 0.75 } = 1.0
e[2, 2] = min { e[2, 1] + e[3, 2] + w(2, 2) }
= min { 0.1 + 0.05 + 0.25 } = 0.4
e[3, 3] = min { e[3, 2] + e[4, 3] + w(3, 3) }
= min { 0.05 + 0.05 + 0.15 } = 0.25
e[1, 3] is minimum for r = 1, so r[1, 3] = 1
e[2, 3] is minimum for r = 2, so r[2, 3] = 2
e[1, 2] is minimum for r = 1, so r[1, 2] = 1
e[3, 3] is minimum for r = 3, so r[3, 3] = 3
e[2, 2] is minimum for r = 2, so r[2, 2] = 2
e[1, 1] is minimum for r = 1, so r[1, 1] = 1
Let us now construct OBST for given data.
r[1, 3] = 1, so k1 will be at the root.
k2….3 are on right side of k1
r[2, 3] = 2, So k2 will be the root of this sub tree.
k3 will be on the right of k2.
Thus, finally, we get.
Additional Reading: [Wiki]
Thank you for the explanation, just to point out the min e[2,3] value comes to 0.70 and not 0.80, there is a calculation mistake.
Thank you very much for your careful inspection. I will update it
Thank you for the complete sum explanation. I really appreciate your content. One change I would suggest is in the formula of e[i,j]= min{ e[i, r-1]+ e[r,j] +w[i,j]} This is the correct formula.
Thanks for appreciation. I have verified but I think its correct