
Comparison of several sorting algorithms

Introduction
From time to time people ask the ageless question: which sorting algorithm is the fastest? This question doesn't have an easy or unambiguous answer, however. The speed of sorting can depend quite heavily on the environment where the sorting is done, the type of items that are sorted and the distribution of these items. For example, sorting a database which is so big that it cannot fit into memory all at once is quite different from sorting an array of 100 integers. Not only will the implementation of the algorithm be quite different, but it may even be that an algorithm which is fast in one case is slow in the other. Also, sorting an array may be different from sorting a linked list, for example. In this study I will concentrate only on sorting items in an array in memory using comparison sorting, because that is the only sorting method that can be implemented easily for any item type, as long as the items can be compared with the less-than operator.
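As an aside (not part of the original study), the following sketch illustrates what "comparable with the less-than operator" means in practice for a comparison sort in C++; the Person type and its ordering are invented purely for the example.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical record type: a comparison sort only needs operator< to be defined.
struct Person
{
    std::string name;
    int age;
};

// Order people by age; any strict weak ordering would do.
bool operator<(const Person &a, const Person &b)
{
    return a.age < b.age;
}

int main()
{
    std::vector<Person> people = { {"Ann", 34}, {"Bob", 12}, {"Cid", 56} };
    std::sort(people.begin(), people.end());   // uses the operator< defined above
    for (const Person &p : people)
        std::cout << p.name << " " << p.age << "\n";
    return 0;
}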

The testing environment


In order to test the speed of the different sorting algorithms I made a C++ program which runs each algorithm several times on randomly generated arrays. The tests were run on a Pentium 4 3.4 GHz machine running SuSE 9.3, and the program was compiled with gcc 3.3.5 using the compiler options "-O3 -march=pentium4". The drand48() function of glibc was used for random number generation; this should be a rather high-quality random number generator.

Four different array sizes were used: 100, 5000, 100000 and 1 million items (the last one used only in the integer array tests). Random numbers between 0 and 10 times the array size were generated to create the array contents, except for the high-repetition test, in which numbers between 0 and 1/100 times the array size were generated (which means that each item is repeated on average 100 times).

Four different random number distributions were used:

1. Completely random.
2. Almost sorted: 90% of the items are in increasing order, but 10% of randomly chosen items are random.
3. Almost reversed: like above, but the sorted items are in reverse order.
4. The array is already sorted, except for the last 256 items, which are random. (This case was used to test which sorting algorithm would be best for a data container where items are kept sorted, new items are added to the end, and the entire container is re-sorted once the amount of unsorted items at the end grows too large.)

Four different testcases were run:

1. Items are 32-bit integers. These are both very fast to compare and to copy.
2. Also 32-bit integers, but with a high number of repetitions. Each value in the array repeats approximately 100 times on average.
3. Items are C++ strings with identical beginnings. Strings of 50 characters (with only the last 8 characters differing) were used for the test. This tests the case where copying is fast but comparison is slow (copying is fast because the strings in gcc use copy-on-write).
4. Items are arrays of integers. Arrays of 50 integers (i.e. 200 bytes) were used. Only the first integer was used for the comparison. This tests the case where comparison is fast but copying is slow.

More detailed info for these testcases is given in their individual pages. I tried to implement the program so that it first counts how much time is spent generating the data to be sorted, and then this time is subtracted from the total time (before dividing it by the number of loops). While it's not possible to do this in a very exact way, I'm confident that the results are close enough to reality; a sketch of the timing approach is shown below. Each testcase with each sorting algorithm was run several times, every time with different random data (about 100-10000 times depending on the size of the array). This was done to average out individual worst cases.
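The following is only a rough sketch of that measurement idea (the original benchmark code is not reproduced here): time the full generate-and-sort loop, time generation alone, and subtract. The names, the use of std::rand() instead of drand48(), and std::sort as a stand-in for the algorithm under test are my own assumptions.

#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>

int main()
{
    const int size  = 100000;   // array size under test
    const int loops = 100;      // number of repetitions to average over
    using clock = std::chrono::steady_clock;

    // Time data generation plus sorting.
    auto t0 = clock::now();
    for (int l = 0; l < loops; ++l)
    {
        std::vector<int> data(size);
        for (int &x : data)
            x = std::rand() % (10 * size);      // values between 0 and 10*size
        std::sort(data.begin(), data.end());    // the algorithm being measured
    }
    auto total = clock::now() - t0;

    // Time data generation alone, so it can be subtracted.
    auto t1 = clock::now();
    for (int l = 0; l < loops; ++l)
    {
        std::vector<int> data(size);
        for (int &x : data)
            x = std::rand() % (10 * size);
    }
    auto generation = clock::now() - t1;

    auto sorting = std::chrono::duration_cast<std::chrono::milliseconds>(total - generation);
    std::cout << "Sorting took about " << sorting.count() / double(loops)
              << " ms per run (averaged over " << loops << " runs)\n";
    return 0;
}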

The sorting algorithms


Insertion sort

Insertion sort is good only for sorting small arrays (usually fewer than 100 items). In fact, the smaller the array, the faster insertion sort is compared to any other sorting algorithm. However, being an O(n²) algorithm, it becomes very slow very quickly when the size of the array increases. It was used in the tests with arrays of size 100.

Shell sort

Shell sort is a rather curious algorithm, quite different from the other fast sorting algorithms. It's actually so different that it isn't even an O(n log n) algorithm like the others; instead it is something between O(n log² n) and O(n^1.5), depending on implementation details. Given that it is an in-place, non-recursive algorithm and that it compares very well with the other algorithms, shell sort is a very good alternative to consider.
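The study does not reproduce its shell sort implementation here, so the following is only a generic sketch using the simple gap-halving sequence; the benchmarked version may well have used a different gap sequence.

#include <vector>

void shellSort(std::vector<int> &data)
{
    int n = static_cast<int>(data.size());
    // Start with a large gap and shrink it; the final pass with gap == 1
    // is a plain insertion sort over nearly sorted data.
    for (int gap = n / 2; gap > 0; gap /= 2)
    {
        // Gapped insertion sort: every slice data[i], data[i-gap], ... stays sorted.
        for (int i = gap; i < n; ++i)
        {
            int val = data[i];
            int j = i;
            while (j >= gap && data[j - gap] > val)
            {
                data[j] = data[j - gap];
                j -= gap;
            }
            data[j] = val;
        }
    }
}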

Linear Search in a Range


Linear search consists of looking for a particular value in a collection, examining the items one by one from the beginning until the value is found or the end of the collection is reached.

#include <iostream>
using namespace std;

int LinearSearch(const int *Array, const int Size, const int ValToSearch)
{
    bool NotFound = true;
    int i = 0;

    while (i < Size && NotFound)
    {
        if (ValToSearch != Array[i])
            i++;
        else
            NotFound = false;
    }

    if (NotFound == false)
        return i;
    else
        return -1;
}

int main()
{
    int Number[] = { 67, 278, 463, 2, 4683, 812, 236, 38 };
    int Quantity = sizeof(Number) / sizeof(int);
    int NumberToSearch = 0;

    cout << "Enter the number to search: ";
    cin >> NumberToSearch;

    int i = LinearSearch(Number, Quantity, NumberToSearch);
    if (i == -1)
        cout << NumberToSearch << " was not found in the collection\n\n";
    else
    {
        cout << NumberToSearch << " is at the " << i + 1;
        if (i == 0)
            cout << "st position of the collection\n\n";
        else if (i == 1)
            cout << "nd position of the collection\n\n";
        else if (i == 2)
            cout << "rd position of the collection\n\n";
        else
            cout << "th position of the collection\n\n";
    }
    return 0;
}

Here is an example of running the program:

Enter the number to search: 278
278 is at the 2nd position of the collection
Press any key to continue

Here is another example of running the program:

Enter the number to search: 288
288 was not found in the collection
Press any key to continue

Copyright 2004-2009 FunctionX, Inc.

Do you remember playing the game "Guess a Number", where the responses to the statement "I am thinking of a number between 1 and 100" are "Too High", "Too Low", or "You Got It!"? A strategy that is often used when playing this game is to divide the interval between the guess and the ends of the range in half. This strategy helps you quickly narrow in on the desired number. When searching an array, the binary search process uses this same concept of splitting intervals in half as a means of finding the "key" value as quickly as possible. If your array is in order (ascending or descending), you can search for the desired "key" item quickly by using a binary search algorithm (referred to as the "divide and conquer" approach). Consider the following array of integers:

Array of integers, named num, arranged in ascending order:


num[0]  num[1]  num[2]  num[3]  num[4]  num[5]  num[6]  num[7]  num[8]  num[9]
  10      15      24      36      45      55      64      73      90      98

We will be searching for the key number 64. Here is how the binary search will work:

First, the middle of the array is located by adding the array subscript of the first value to the subscript of the last value and dividing by two: (0 + 9) / 2 = 4. Integer division is being used to arrive at subscript 4 as the middle. (The actual mathematical middle would be between subscripts 4 and 5, but we must work with integer subscripts.) Subscript 4 holds the number 45, which comes before 64, so we now know that 64 would exist in the portion of the array to the right of 45.

We now find the middle of that right portion by using the same approach: (5 + 9) / 2 = 7. Subscript 7 holds the number 73, which comes after 64, so we now need to find the middle of the portion of the array to the right of 45 but to the left of 73: (5 + 6) / 2 = 5.

Subscript 5 holds the number 55, which comes before 64, so we subdivide again: (6 + 6) / 2 = 6, and element 6 holds the number 64.

// Function call to the binary search function (listed below),
// for the array shown above:
binarySearch(num, 0, 9, 64);

//Binary Search Function


// Function accepts an array, the lower bound and upper bound subscripts
// to be searched, and the key number for which we are searching.
// There is nothing returned.
void binarySearch(apvector<int> &array, int lowerbound, int upperbound, int key)
{
    int position;
    int comparisonCount = 1;    // count the number of comparisons (optional)

    // To start, find the subscript of the middle position.
    position = (lowerbound + upperbound) / 2;

    while ((array[position] != key) && (lowerbound <= upperbound))
    {
        comparisonCount++;
        if (array[position] > key)        // If the number at position is > key,
        {
            upperbound = position - 1;    // move the upper bound below position.
        }
        else                              // Else, move the lower bound above position.
        {
            lowerbound = position + 1;
        }
        position = (lowerbound + upperbound) / 2;
    }

    if (lowerbound <= upperbound)
    {
        cout << "The number was found in array subscript " << position << endl << endl;
        cout << "The binary search found the number after " << comparisonCount
             << " comparisons.\n";        // printing the number of comparisons is optional
    }
    else
        cout << "Sorry, the number is not in this array. The binary search made "
             << comparisonCount << " comparisons.";

    return;    // you may also consider returning the subscript
}
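For completeness, here is one possible driver for the function above. It assumes the apvector.h header from the AP classes is available (since the function takes an apvector<int>&) and that binarySearch is declared before main; this driver is my addition, not part of the original tutorial.

#include <iostream>
#include "apvector.h"    // AP classes vector type used by binarySearch above
using namespace std;

int main()
{
    // The sorted array from the example above.
    apvector<int> num(10);
    int values[10] = { 10, 15, 24, 36, 45, 55, 64, 73, 90, 98 };
    for (int i = 0; i < 10; i++)
        num[i] = values[i];

    // Search for the key 64 between subscripts 0 and 9, as in the walkthrough.
    binarySearch(num, 0, 9, 64);
    return 0;
}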

This tutorial is based on the sorting algorithm Insertion Sort, one of the best known sorting routines and usually one of the first that students going into the field of computer science learn. Although this algorithm is not the most efficient for certain data sets, it most certainly has its purpose for others. Having an assortment of different sorting algorithms at your fingertips is like a wizard having a book of many spells: the more spells you know, the more you will be able to counteract other spells. Having a good arsenal of these algorithms and knowing which situations to apply them to will help you make the best decision in a real-life application. Keep this in mind as you add this spell to your book.
What is Insertion Sort?:

Insertion Sort, as you know, is a fairly simple to implement comparison-based sorting algorithm. As I've stated before, Insertion Sort has its moment in the sun when used on small data sets, which Quick Sort (check out my Quick Sort tutorial for more information) handles very poorly. Conversely, of course, Insertion Sort does a poor job with large data sets. Another situation where Insertion Sort shines is when you know, before choosing an algorithm, that the majority of your data will already be sorted. Insertion Sort is also more efficient than its counterpart sorts, Bubble Sort and Selection Sort: both of these have about the same asymptotic running time, but in practice Insertion Sort tends to trump them as well. One final reason why Insertion Sort is useful is that it happens to be an on-line algorithm, which means it takes its input as it comes and does not have to digest the entire set at once.
The Algorithm:

Now that we know a bit about what Insertion Sort is and the appropriate situations to use it in, let's take a brief look at the algorithm it is based upon. In short, if we have a set of data and we want to perform the Insertion Sort routine, we simply insert the elements one by one into our array. As each element is inserted, it is compared to the element next to it; if the current element is less than the previous one, we move it down progressively. To make this a bit more concrete, an example is provided here.

Legend: the elements at the front of each list form the sorted portion of the data set (highlighted in green in the original). Here is our set of integers for this example. We start off with 7 inserted into the list, which is fine, and we progress from there:

[ 7 9 0 5 6 4 8 3 ] - Swaps made: (0)

We move on to 9, which is greater than 7, so no swaps are made and the list is happy:

[ 7 9 0 5 6 4 8 3 ] - Swaps made: (0)

Now we get to an element of lesser value, 0. We take the appropriate action: compare it to 9, find it is less, and move it down between 7 and 9. We then compare it to 7, which it is also less than, and move it down again, for a total of 2 swaps:

[ 0 7 9 5 6 4 8 3 ] - Swaps made: (2)

5 is the next victim on our list. 5 is compared with 9 and moved down, then compared with 7 and likewise moved down. Finally it is compared with 0, and since 5 is greater than 0 it is in the correct position and does not swap:

[ 0 5 7 9 6 4 8 3 ] - Swaps made: (2)

It's time for 6 to shine: it is compared with 9 and moved down, then compared with 7 and moved down. 5 is less than 6, so 6 is in its new happy home:

[ 0 5 6 7 9 4 8 3 ] - Swaps made: (2)

4 has a bit of a journey ahead of it. It will be swapped with 9, then 7, 6 and 5. It is greater than 0, so it has reached its destination:

[ 0 4 5 6 7 9 8 3 ] - Swaps made: (4)

8 has a short journey: it is compared with 9 and swapped, then compared with 7, decides it's content with its position, and stays:

[ 0 4 5 6 7 8 9 3 ] - Swaps made: (1)

3 traverses almost all of the way down the list until it meets up with its new neighbor, 0:

[ 0 3 4 5 6 7 8 9 ] - Swaps made: (6)

The idea of Insertion Sort is fairly easy to see, and it logically makes a lot of sense in the realm of comparisons. Now that you have a good handle on what's going on, let us take a look behind the scenes and see what is happening from the coding aspect of things.
The Code

The code, like the algorithm, is easy to implement and to follow throughout. Let us take a look at the code.

Insertion Sort Routine:

//insertionSort Function
void insertionSort(int size, int data[]){
    int j, val;

    //iterate through entire list
    for(int i = 1; i < size; i++){
        val = data[i];
        j = i - 1;

        while(j >= 0 && data[j] > val){
            data[j + 1] = data[j];
            j = j - 1;
        }//end while

        data[j + 1] = val;
    }//end for

    //Prints the sorted list
    for ( int i = 0; i < size; i++ )
        cout << data[i] << " ";
}//end insertionSort Function
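Here is one way the routine above might be called; this small driver is my addition, not part of the original tutorial, and it uses the example data set from the walkthrough.

#include <iostream>
using namespace std;

// insertionSort as defined above is assumed to be declared before main.
int main()
{
    int data[] = { 7, 9, 0, 5, 6, 4, 8, 3 };    // the set from the walkthrough
    int size = sizeof(data) / sizeof(data[0]);

    insertionSort(size, data);   // sorts the array in place and prints "0 3 4 5 6 7 8 9"
    return 0;
}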

As we can see quite clearly, the code follows the basic structure of the algorithm very nicely. Next we will give a brief asymptotic analysis of the Insertion Sort algorithm.
Analysis & Conclusion

As we have witnessed, Insertion Sort has its optimum niche in the world of sorting algorithms. Remember that sorting algorithms are situation specific, and if one algorithm does not cater to a specific case then you must strategically pick another; you wouldn't attempt to cut down a tree with a spoon, would you? Insertion Sort, like every other sort, has its pros and cons; use this knowledge to your advantage and apply it to the correct occurrences. Now for a look at asymptotic analysis:

Worst Case: Insertion Sort has a worst case of O(N²). This occurs because, for input such as a reverse-sorted list, it has to traverse and swap at every position.

Average Case: The average case follows suit with the worst case, mainly because on average you are still going to have to traverse and swap through a large part of the list, which again gives a running time of O(N²). The average case can also be analyzed from a different viewpoint: if we write the cost as O(N + i), with "i" being the number of inversions, we arrive at the same average-case result.

Best Case: The best case is a list that is already sorted, meaning the algorithm only has to traverse the list rather than traverse and swap. This yields a linear time complexity, O(N).
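As a small illustration (my addition, not part of the original tutorial), the following program counts how many element shifts Insertion Sort performs on an already sorted input versus a reversed input, showing the O(N) best case against the O(N²) worst case.

#include <iostream>
using namespace std;

// Insertion sort that returns the number of element shifts it performed.
long insertionSortShifts(int size, int data[])
{
    long shifts = 0;
    for (int i = 1; i < size; i++)
    {
        int val = data[i];
        int j = i - 1;
        while (j >= 0 && data[j] > val)
        {
            data[j + 1] = data[j];
            j--;
            shifts++;
        }
        data[j + 1] = val;
    }
    return shifts;
}

int main()
{
    const int n = 1000;
    int sorted[n], reversed[n];
    for (int i = 0; i < n; i++)
    {
        sorted[i]   = i;        // already sorted: best case, expect 0 shifts
        reversed[i] = n - i;    // reverse sorted: worst case, expect n*(n-1)/2 shifts
    }
    cout << "Sorted input:   " << insertionSortShifts(n, sorted)   << " shifts\n";
    cout << "Reversed input: " << insertionSortShifts(n, reversed) << " shifts\n";
    return 0;
}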
