Selection algorithm
In computer science, a selection algorithm is an algorithm for finding the th smallest value in a collection of orderable values, such as numbers. The value that it finds is called the order statistic. Selection includes as special cases the problems of finding the minimum, median, and maximum element in the collection. Selection algorithms include quickselect, and the median of medians algorithm. When applied to a collection of values, these algorithms take linear time, as expressed using big O notation. For data that is already structured, faster algorithms may be possible; as an extreme case, selection in an already-sorted array takes
Problem statement
An algorithm for the selection problem takes as input a collection of values, and a It outputs the smallest of these values, or, in some versions of the problem, a collection of the smallest values. For this to be well-defined, it should be possible to sort the values into an order from smallest to largest; for instance, they may be integers, floating-point numbers, or some other kind of object with a numeric key. However, they are not assumed to have been already sorted. Often, selection algorithms are restricted to a comparison-based model of computation, as in comparison sort algorithms, where the algorithm has access to a comparison operation that can determine the relative ordering of any two values, but may not perform any other kind of arithmetic operations on these values.To simplify the problem, some works on this problem assume that the values are all distinct from each or that some consistent tie-breaking method has been used to assign an ordering to pairs of items with the same value as each other. Another variation in the problem definition concerns the numbering of the ordered values: is the smallest value obtained by as in zero-based numbering of arrays, or is it obtained by following the usual English-language conventions for the smallest, second-smallest, etc.? This article follows the conventions used by Cormen et al., according to which all values are distinct and the minimum value is obtained from
With these conventions, the maximum value, among a collection of values, is obtained by When is an odd number, the median of the collection is obtained by When is even, there are two choices for the median, obtained by rounding this choice of down or up, respectively: the lower median with and the upper median with
Algorithms
Sorting and heapselect
As a baseline algorithm, selection of the smallest value in a collection of values can be performed by the following two steps:- Sort the collection
- If the output of the sorting algorithm is an array, retrieve its element; otherwise, scan the sorted sequence to find the element.
For a sorting algorithm that generates one item at a time, such as selection sort, the scan can be done in tandem with the sort, and the sort can be terminated once the element has been found. One possible design of a consolation bracket in a single-elimination tournament, in which the teams who lost to the eventual winner play another mini-tournament to determine second place, can be seen as an instance of this Applying this optimization to heapsort produces the heapselect algorithm, which can select the smallest value in This is fast when is small relative but degenerates to for larger values such as the choice used for median finding.
Pivoting
Many methods for selection are based on choosing a special "pivot" element from the input, and using comparisons with this element to divide the remaining input values into two subsets: the set of elements less than the pivot, and the set of elements greater than the pivot. The algorithm can then determine where the smallest value is to be found, based on a comparison of with the sizes of these sets. In particular, the smallest value is and can be found recursively by applying the same selection algorithm then the smallest value is the pivot, and it can be returned immediately. In the remaining case, the smallest value is and more specifically it is the element in position It can be found by applying a selection algorithm recursively, seeking the value in this positionAs with the related pivoting-based quicksort algorithm, the partition of the input into and may be done by making new collections for these sets, or by a method that partitions a given list or array data type in-place. Details vary depending on how the input collection is The time to compare the pivot against all the other values However, pivoting methods differ in how they choose the pivot, which affects how big the subproblems in each recursive call will be. The efficiency of these methods depends greatly on the choice of the pivot. If the pivot is chosen badly, the running time of this method can be as slow
- If the pivot were exactly at the median of the input, then each recursive call would have at most half as many values as the previous call, and the total times would add in a geometric series However, finding the median is itself a selection problem, on the entire original input. Trying to find it by a recursive call to a selection algorithm would lead to an infinite recursion, because the problem size would not decrease in each
- Quickselect chooses the pivot uniformly at random from the input values. It can be described as a prune and search algorithm, a variant of quicksort, with the same pivoting strategy, but where quicksort makes two recursive calls to sort the two subcollections quickselect only makes one of these two calls. Its expected time For any constant, the probability that its number of comparisons exceeds is superexponentially small
- The Floyd–Rivest algorithm, a variation of quickselect, chooses a pivot by randomly sampling a subset of data values, for some sample and then recursively selecting two elements somewhat above and below position of the sample to use as pivots. With this choice, it is likely that is sandwiched between the two pivots, so that after pivoting only a small number of data values between the pivots are left for a recursive call. This method can achieve an expected number of comparisons that is In their original work, Floyd and Rivest claimed that the term could be made as small as by a recursive sampling scheme, but the correctness of their analysis has been Instead, more rigorous analysis has shown that a version of their algorithm achieves for this Although the usual analysis of both quickselect and the Floyd–Rivest algorithm assumes the use of a true random number generator, a version of the Floyd–Rivest algorithm using a pseudorandom number generator seeded with only logarithmically many true random bits has been proven to run in linear time with high probability.
- The median of medians method partitions the input into sets of five elements, and uses some other non-recursive method to find the median of each of these sets in constant time per set. It then recursively calls itself to find the median of these medians. Using the resulting median of medians as the pivot produces a partition with Thus, a problem on elements is reduced to two recursive problems on elements and at most elements. The total size of these two recursive subproblems is at allowing the total time to be analyzed as a geometric series adding Unlike quickselect, this algorithm is deterministic, not It was the first linear-time deterministic selection algorithm and is commonly taught in undergraduate algorithms classes as an example of a divide and conquer that does not divide into two equal However, the high constant factors in its time bound make it slower than quickselect in and slower even than sorting for inputs of moderate
- Hybrid algorithms such as introselect can be used to achieve the practical performance of quickselect with a fallback to medians of medians guaranteeing worst-case
Factories