Optimal Merge Pattern Problem: “Merge n sorted sequences of different lengths into one sequence while minimizing reads”. Any two sequences can be merged at a time. At each step, the two shortest sequences are merged.

Consider three sorted lists L1, L2 and L3 of size 30, 20 and 10 respectively. Two way merge compares elements of two sorted lists and put them in new sorted list. If we first merge list L1 and L2, it does 30 + 20 = 50 comparisons and creates a new array L’ of size 50. L’ and L3 can be merged with 60 + 10 = 70 comparisons that forms a sorted list L of size 60. Thus total number of comparisons required to merge lists L1, L2 and L3 would be 50 + 70 = 120.

Alternatively, first merging L2 and L3 does 20 + 10 = 30 comparisons, which creates sorted list L’ of size 30. Now merge L1 and L’, which does 30 + 30 = 60 comparisons to form a final sorted list L of size 60. Total comparisons required to create L is thus 30 + 60 = 90.

In both the cases, final output is identical but first approach does 120 comparisons whereas second does only 90. Goal of optimal merge pattern is to find the merging sequence which results into minimum number of comparisons.

Let S = {s1, s2, …, sn} be the set of sequences to be merged. Greedy approach selects minimum length sequences si and sj from S. The new set S’ is defined as, S’ = (S – {si, sj})  ∪ {si + sj}. This procedure is repeated until only one sequence is left.

The working principle of optimal merge patterns is similar to Huffman coding tree construction. This can easily be done with min-heap data structure. In min heap, parent is always smaller than its child’s. Thus root of min-heap always contains minimum element. Using min heap we can retrieve minimum element in constant time. And insertion in min heap takes O(logn) time.

Algorithm for Optimal Merge Pattern

Algorithm for optimal merge patterns is shown below :

Algorithm OPTIMAL_MERGE_PATTERNS(S)
// S is set of sequences

Create min heap H from S

while H.length > 1 do
  min1 ← minDel(H)   
  // minDel function returns minimum element from H and delete it from H
  min2 ← minDel(H)
  NewNode.Data ← min1 + min2
  NewNoode.LeftChild ← min1
  NewNode.RightChild ← min2
  Insert(NewNode, H)	// Insert node NewNode in heap H
end

Complexity Analysis

In every iteration, two delete minimum and one insert operation is performed. Construction of heap takes O(logn) time. The total running time of this algorithm is O(nlogn).                

T(n) = O(n – 1) * max(O(findmin), O(insert))

Case 1 : If list is not sorted :

O(findmin)  = O(n)

O(insert) = O(1)

So, T(n) = (n – 1) * n = O(n2)

Case 2 : If list is sorted :

Case 2.1 : List is represented as an array

O(findmin) = O(1)

O(insert) = O(n)                                                                                                                                              

So, T(n) = (n – 1) * n = O(n2)

Case 2.2 : List is represented as min-heap

O(findmin)  =  O(1)

O(insert)  =   O(logn)

So, T(n) = (n – 1) * logn = O(nlogn)

Example

Example: Consider the sequence {3, 5, 9, 11, 16, 18, 20}. Find optimal merge patter for this data

Solution:

At each step, merge the two smallest sequences

Step 1:

Given sequence is,

Optimal Merge Pattern

Step 2: Merge the two smallest sequences and sort in ascending order

Optimal Merge Pattern

Step 3: Merge the two smallest sequences and sort in ascending order

Optimal Merge Pattern example

Step 4: Merge the two smallest sequences and sort in ascending order

Optimal Merge Pattern example

Step 5: Merge the two smallest sequences and sort in ascending order

Step 6: Merge the two smallest sequences and sort in ascending order

Step 7: Merge the two smallest sequences and sort in ascending order

Total time = 8 + 17 + 27 + 35 + 47 + 82 = 216

Greedy algorithms are used to find an optimal or near-optimal solution to many real-life problems. A few of them are listed below :


Additional Reading; Read More