
Sorting

How to sort data efficiently?

Sorting
The sorting problem is to arrange a sequence of records so that the values of their key fields form a nondecreasing sequence. That is, given records $r_1, r_2, \ldots, r_n$ with key values $k_1, k_2, \ldots, k_n$, respectively, we must produce the same records in an order $r_{i_1}, r_{i_2}, \ldots, r_{i_n}$ such that $k_{i_1} \le k_{i_2} \le \cdots \le k_{i_n}$. The records need not all have distinct values.

Insertion Sort
Insertion sort is a simple sorting algorithm that is appropriate for small inputs. It is very time consuming, so if the input is large, insertion sort is not a likely choice for sorting that data. Merge sort and quicksort are faster sorting algorithms in that case.

Insertion Sort (cont.)

The main idea of insertion sort is as follows. Start by considering the first two elements of the array. If they are out of order, swap them. Consider the third element and insert it into its proper position among the first three elements. Consider the fourth element and place it in its proper position among the first four elements, and so on. After the ith element has been inserted, the first i elements are sorted.

Insertion Sort (With an Example)


In a game of cards, a player gets 13 cards and keeps them in sorted order in his hand for ease. The player looks at the first two cards, sorts them, and keeps the smaller card first. Suppose the two cards are 9 and 8; the player swaps them and keeps 8 before 9. Now he takes the third card. If it is 10, it is already in its position. If it is 2, the player picks it up and puts it at the start of the cards. Then he looks at the fourth card and inserts it at its proper place among the first three cards, which he has already sorted.

Insertion Sort (With an Example) (cont.)

He repeats the same process with all the cards and finally gets the cards in sorted order. Thus, in this algorithm, we keep the left part of the array sorted, take an element from the right part, and insert it into the left part at its proper place. Because of this process of insertion, the algorithm is called insertion sort.

Insertion Sort (cont.)

The array consists of the elements 19, 12, 5 and 7. We take the first two numbers, 19 and 12. Since 12 is less than 19, we swap their positions: 12 comes to index 0 and 19 goes to index 1. Now we pick the third number, 5. We have to find the position of this number by comparing it with the two already sorted numbers, 12 and 19. We see that 5 is smaller than both, so it should come before them. Thus the proper position of 5 is index 0. To insert it at index 0, we shift 12 and 19 one position to the right and then place 5 at index 0. Now we pick the number 7 and find that its position is between 5 and 12. To insert 7 after 5 and before 12, we shift 12 and 19 one position to the right and then put 7 in the freed position. Now the whole array is sorted, so the process stops here.

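Insertion Sort (cont.)

void insertionSort(int *arr, int N)
{
    int pos, count, val;

    /* arr[0..count-1] is already sorted; insert arr[count] into it */
    for (count = 1; count < N; count++) {
        val = arr[count];
        /* shift every element greater than val one place to the right */
        for (pos = count - 1; pos >= 0; pos--) {
            if (arr[pos] > val)
                arr[pos + 1] = arr[pos];
            else
                break;
        }
        arr[pos + 1] = val;
    }
}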
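To connect the routine with the 19, 12, 5, 7 walk-through, the following is a minimal sketch of the same algorithm that prints the array after each insertion. The names `insertionSortTrace` and `printArray` are hypothetical helpers introduced only for illustration; they are not part of the original slides.

```c
#include <assert.h>
#include <stdio.h>

/* Print the first N elements of arr on one line. */
static void printArray(const int *arr, int N)
{
    for (int i = 0; i < N; i++)
        printf("%d ", arr[i]);
    printf("\n");
}

/* Insertion sort that prints the array after each element is inserted.
   Hypothetical tracing variant of insertionSort, for illustration only. */
void insertionSortTrace(int *arr, int N)
{
    assert(N >= 0);
    for (int count = 1; count < N; count++) {
        int val = arr[count];
        int pos = count - 1;
        /* shift elements greater than val one place to the right */
        while (pos >= 0 && arr[pos] > val) {
            arr[pos + 1] = arr[pos];
            pos--;
        }
        arr[pos + 1] = val;
        printArray(arr, N);
    }
}
```

Called on the array 19, 12, 5, 7 with N = 4, this prints `12 19 5 7`, then `5 12 19 7`, then `5 7 12 19`, which are the three states described in the example.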

Insertion Sort Analysis


To insert the last element we need at most N-1 comparisons and N-1 movements. To insert the (N-1)st element we need at most N-2 comparisons and N-2 movements. ... To insert the 2nd element we need 1 comparison and 1 movement.
To sum up: 2 * (1 + 2 + 3 + ... + (N - 1)) = 2 * (N - 1) * N / 2 = (N - 1) * N = Θ(N^2).
If the greater part of the array is already sorted, the complexity is almost O(N). The average complexity is also proved to be Θ(N^2).
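The counts above can be checked empirically. The following is a sketch, assuming we instrument the same insertion sort with two counters; the `SortCost` struct and `insertionSortCost` function are hypothetical names introduced for this illustration, and a "movement" here means one shift of an element to the right.

```c
#include <assert.h>

/* Hypothetical counters, not part of the original routine. */
typedef struct {
    long comparisons;  /* number of arr[pos] > val tests */
    long movements;    /* number of right shifts */
} SortCost;

/* Sorts arr in place and reports how many comparisons and shifts were made. */
SortCost insertionSortCost(int *arr, int N)
{
    SortCost cost = {0, 0};
    assert(N >= 0);
    for (int count = 1; count < N; count++) {
        int val = arr[count];
        int pos;
        for (pos = count - 1; pos >= 0; pos--) {
            cost.comparisons++;
            if (arr[pos] > val) {
                arr[pos + 1] = arr[pos];
                cost.movements++;
            } else {
                break;
            }
        }
        arr[pos + 1] = val;
    }
    return cost;
}
```

On a reversed array of N = 5 elements this reports 10 comparisons and 10 movements, a total of 20 = (N - 1) * N. On an already sorted array of 5 elements it reports 4 comparisons and no movements, which matches the near-linear behaviour on mostly sorted input.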

Insertion Sort: an O(N^2) Algorithm

Insertion sort is an O(N^2) algorithm. Similarly, there are other O(N^2) sorting algorithms, such as selection sort and bubble sort. O(N^2) is not a favorable amount of time to spend on sorting. Let's try to understand O(n log n) algorithms for sorting.

O(n log n) Sorting Algorithms

MergeSort
QuickSort
Heap Sort
Reading: 11.1 (MergeSort), 11.2 (QuickSort) from GTM

Merge Sort
Merge sort falls under the divide and conquer category of algorithms. The divide and conquer strategy is well known in wars. Its philosophy is to divide your enemy into parts and then conquer these parts. Conquering the parts is easy, as they cannot resist or react like one big united enemy. The same philosophy is applied in the above algorithms. To understand the divide and conquer strategy in sorting, let's consider an example.

Merge Sort (cont.): Divide and Conquer Example

Suppose we are given an unsorted array of numbers. First we split this array into two parts. Now we have two parts of the array, and we sort them separately, say with an elementary sorting algorithm. After this we merge the two sorted parts and get the sorted array.
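The split, sort, and merge steps of this example can be sketched as follows. This is only an illustration under stated assumptions: insertion sort stands in for the "elementary sort algorithm", and the names `sortByHalves` and `mergeHalves` are hypothetical, not taken from the slides.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Elementary sort used on each part (same algorithm as insertionSort). */
static void insertionSort(int *arr, int N)
{
    for (int count = 1; count < N; count++) {
        int val = arr[count];
        int pos = count - 1;
        while (pos >= 0 && arr[pos] > val) {
            arr[pos + 1] = arr[pos];
            pos--;
        }
        arr[pos + 1] = val;
    }
}

/* Merge sorted a[0..na) and sorted b[0..nb) into out[0..na+nb). */
static void mergeHalves(const int *a, int na, const int *b, int nb, int *out)
{
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
}

/* Split arr into two parts, sort each part, then merge them.
   Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int sortByHalves(int *arr, int N)
{
    assert(N >= 0);
    if (N < 2)
        return 0;
    int mid = N / 2;
    int *tmp = malloc((size_t)N * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    insertionSort(arr, mid);            /* sort the first part */
    insertionSort(arr + mid, N - mid);  /* sort the second part */
    mergeHalves(arr, mid, arr + mid, N - mid, tmp);
    memcpy(arr, tmp, (size_t)N * sizeof *tmp);
    free(tmp);
    return 0;
}
```

The array is split only once here, exactly as in the example; splitting the parts again and again is what turns this idea into merge sort.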

Simple Analysis of Divide and Conquer


Let us look at a simple analysis to confirm the usefulness of the divide and conquer technique. To sort the two halves, the approximate time is (n/2)^2 + (n/2)^2. To merge the two halves, the approximate time is n. So, for n = 100, divide and conquer takes approximately (100/2)^2 + (100/2)^2 + 100 = 2500 + 2500 + 100 = 5100.

Divide and Conquer Advantages

Suppose that n is 100. If we apply the insertion sort algorithm to the whole array, the time taken will be approximately (100)^2 = 10000, since insertion sort is an O(N^2) algorithm. Now suppose we apply the divide and conquer technique instead. For the first half the approximate time will be (100/2)^2. Similarly, for the second half it will be (100/2)^2. The approximate merging time will be 100.

Divide and Conquer Advantages (cont.)

So the whole operation of sorting with insertion sort under this divide and conquer technique will take around (100/2)^2 + (100/2)^2 + 100 = 5100. Clearly the time spent after applying divide and conquer (5100) is significantly less than the previous time (10000). It is reduced to approximately half of the previous time. This example shows the usefulness of the divide and conquer technique.
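The arithmetic in this comparison can be reproduced for other values of n. The sketch below assumes the same rough cost model as the slides (n^2 for an O(N^2) sort, n for a merge); the function names `quadraticCost` and `splitOnceCost` are hypothetical.

```c
#include <assert.h>

/* Approximate cost of an O(N^2) sort applied to all n elements: n^2. */
long quadraticCost(long n)
{
    assert(n >= 0);
    return n * n;
}

/* Approximate cost of sorting two halves separately and merging them:
   (n/2)^2 + (n/2)^2 + n. Integer division is used, so the figure is
   exact only for even n. */
long splitOnceCost(long n)
{
    assert(n >= 0);
    long half = n / 2;
    return half * half + half * half + n;
}
```

For n = 100 these return 10000 and 5100, the two figures compared above.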

Food for Thought


By looking at the benefit of dividing the list into two halves, some further questions arise:
Why not divide the halves in half?
The quarters in half? And so on...
When should we stop? At n = 1.

Mergesort
Merge-sort is based on an algorithmic design pattern called divide-and-conquer. The divide-and-conquer pattern consists of the following three steps:
1. Divide: If the input size is smaller than a certain threshold (say, one or two elements), solve the problem directly using a straightforward method and return the solution obtained. Otherwise, divide the input data into two or more disjoint subsets.
2. Recur: Recursively solve the subproblems associated with the subsets.
3. Conquer: Take the solutions to the subproblems and merge them into a solution to the original problem.

Mergesort (cont.)
To sort a sequence S with n elements using the three divide-and-conquer steps, the merge-sort algorithm proceeds as follows:
1. Divide: If S has zero or one element, return S immediately; it is already sorted. Otherwise (S has at least two elements), remove all the elements from S and put them into two sequences, S1 and S2, each containing about half of the elements of S; that is, S1 contains the first ⌈n/2⌉ elements of S, and S2 contains the remaining ⌊n/2⌋ elements.
2. Recur: Recursively sort sequences S1 and S2.
3. Conquer: Put back the elements into S by merging the sorted sequences S1 and S2 into a sorted sequence.
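The three steps can be sketched in C as follows. This is a minimal illustration on an `int` array, not GTM's implementation; the function names `mergeSort` and `merge`, the use of temporary arrays for S1 and S2, and the return-code convention are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Conquer: merge sorted s1[0..n1) and s2[0..n2) back into s[0..n1+n2). */
static void merge(const int *s1, int n1, const int *s2, int n2, int *s)
{
    int i = 0, j = 0, k = 0;
    while (i < n1 && j < n2)
        s[k++] = (s1[i] <= s2[j]) ? s1[i++] : s2[j++];
    while (i < n1)
        s[k++] = s1[i++];
    while (j < n2)
        s[k++] = s2[j++];
}

/* Sorts s[0..n) in place. Returns 0 on success, or -1 if a temporary
   array cannot be allocated (s is then left with its original contents). */
int mergeSort(int *s, int n)
{
    assert(n >= 0);
    if (n < 2)
        return 0;              /* zero or one element: already sorted */

    /* Divide: copy the elements of S into S1 and S2. */
    int n1 = (n + 1) / 2;      /* first half, rounded up */
    int n2 = n - n1;           /* remaining elements */
    int *s1 = malloc((size_t)n1 * sizeof *s1);
    int *s2 = malloc((size_t)n2 * sizeof *s2);
    if (s1 == NULL || s2 == NULL) {
        free(s1);
        free(s2);
        return -1;
    }
    memcpy(s1, s, (size_t)n1 * sizeof *s1);
    memcpy(s2, s + n1, (size_t)n2 * sizeof *s2);

    /* Recur: sort S1 and S2. */
    int status = (mergeSort(s1, n1) == 0 && mergeSort(s2, n2) == 0) ? 0 : -1;

    /* Conquer: put the elements back into S in sorted order. */
    if (status == 0)
        merge(s1, n1, s2, n2, s);

    free(s1);
    free(s2);
    return status;
}
```

Copying into fresh temporary arrays at every call keeps the code close to the description of S1 and S2; a practical implementation would more likely allocate one scratch buffer up front and reuse it.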

Mergesort (cont.): Merging Operation

We have two sorted arrays and another, empty array whose size is equal to the sum of the sizes of the two sorted arrays.

Mergesort Con Divide and Recur Step

Mergesort (cont.): Food for Thought

Proposition: The merge-sort tree associated with an execution of merge-sort on a sequence of size n has height ⌈log n⌉. Why?
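One way to build intuition for the proposition is to count how many times a sequence of size n can be halved before a single element remains. The sketch below assumes the larger half at each step has ⌈n/2⌉ elements, which gives a height of ⌈log n⌉ (exactly log n when n is a power of 2, with log taken base 2); the function name `mergeSortTreeHeight` is hypothetical.

```c
#include <assert.h>

/* Height of the merge-sort tree for a sequence of n >= 1 elements,
   assuming each node splits into halves of ceil(n/2) and floor(n/2). */
int mergeSortTreeHeight(int n)
{
    assert(n >= 1);
    int height = 0;
    while (n > 1) {
        n = (n + 1) / 2;  /* the larger half determines the height */
        height++;
    }
    return height;
}
```

For n = 8 the sizes along the deepest path are 8, 4, 2, 1, giving height 3 = log 8. For n = 5 they are 5, 3, 2, 1, again giving height 3.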

The Running Time for Merging


Let n1 and n2 be the number of elements of S1 and S2, respectively. Algorithm merge has three while loops. The key observation is that during each iteration of one of the loops, one element is copied or moved from either S1 or S2 into S (and that element is no longer considered). Since no insertions are performed into S1 or S2, this observation implies that the overall number of iterations of the three loops is n1 + n2. Thus, the running time of algorithm merge is O(n1 + n2).
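The iteration count can be observed directly. The following is a sketch of a three-loop merge on `int` arrays with an added counter; the function name `mergeCountingIterations` and the counter itself are hypothetical additions for illustration and are not part of GTM's algorithm.

```c
#include <assert.h>

/* Merges sorted s1[0..n1) and s2[0..n2) into s[0..n1+n2) and returns
   the total number of iterations performed by the three while loops. */
int mergeCountingIterations(const int *s1, int n1,
                            const int *s2, int n2, int *s)
{
    int i = 0, j = 0, k = 0, iterations = 0;
    assert(n1 >= 0 && n2 >= 0);

    /* Loop 1: both sequences still have elements; copy the smaller one. */
    while (i < n1 && j < n2) {
        if (s1[i] <= s2[j])
            s[k++] = s1[i++];
        else
            s[k++] = s2[j++];
        iterations++;
    }
    /* Loop 2: copy whatever remains of s1. */
    while (i < n1) {
        s[k++] = s1[i++];
        iterations++;
    }
    /* Loop 3: copy whatever remains of s2. */
    while (j < n2) {
        s[k++] = s2[j++];
        iterations++;
    }
    return iterations;
}
```

Each iteration copies exactly one element into s, so the returned count always equals n1 + n2 regardless of the values being merged.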

The Running Time of MergeSort

Let us analyze the running time of the entire merge-sort algorithm, assuming it is given an input sequence of n elements. For simplicity, we restrict our attention to the case where n is a power of 2. We analyze the merge-sort algorithm by referring to the merge-sort tree T. (Recall Figures 11.2 through 11.4.)

The Running Time of MergeSort (cont.)


We call the time spent at a node v of T the running time of the recursive call associated with v, excluding the time taken waiting for the recursive calls associated with the children of v to terminate. In other words, the time spent at node v includes the running times of the divide and conquer steps, but excludes the running time of the recur step.

The Running Time of MergeSort (cont.)

The divide step at node v runs in time proportional to the size of the sequence for v. The conquer step, which consists of merging two sorted subsequences, also takes linear time, independent of whether we are dealing with arrays or linked lists. That is, letting i denote the depth of node v, the time spent at node v is O(n/2^i), since the size of the sequence handled by the recursive call associated with v is equal to n/2^i.

The Running Time of MergeSort (cont.)

Given our definition of time spent at a node, the running time of merge-sort is equal to the sum of the times spent at the nodes of T. Observe that T has exactly 2^i nodes at depth i. This simple observation has an important consequence, for it implies that the overall time spent at all the nodes of T at depth i is O(2^i * n/2^i), which is O(n). By Proposition 11.1 (consult GTM Chapter 11, topic 11.1), the height of T is ⌈log n⌉. Thus, since the time spent at each of the ⌈log n⌉ + 1 levels of T is O(n), we have the following result.

The Running Time of MergeSort (cont.)

Algorithm merge-sort sorts a sequence S of size n in O(n log n) time, assuming two elements of S can be compared in O(1) time.
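The level-by-level sum behind this result can be written out explicitly. This is a sketch under the same simplifying assumption that n is a power of 2; the constant c is a hypothetical bound introduced here, chosen so that the time spent at a node of depth i is at most c * n/2^i.

```latex
\[
T(n) \;\le\; \sum_{i=0}^{\log n} 2^{i} \cdot c\,\frac{n}{2^{i}}
      \;=\; \sum_{i=0}^{\log n} c\,n
      \;=\; c\,n\,(\log n + 1)
      \;=\; O(n \log n)
\]
```

The factor 2^i counts the nodes at depth i, the factor n/2^i is the size of the sequence at each of those nodes, and the two cancel so that every one of the log n + 1 levels contributes the same O(n) amount.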