Randomized algorithm
A randomized algorithm is an algorithm that employs a degree of randomness as part of its logic or procedure. The algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performance in the "average case" over all possible choices of randomness determined by the random bits; thus either the running time, or the output, or both are random variables.
There is a distinction between algorithms that use the random input so that they always terminate with the correct answer, but where the expected running time is finite (Las Vegas algorithms), and algorithms which have a chance of producing an incorrect result (Monte Carlo algorithms) or of failing to produce a result, either by signaling a failure or by failing to terminate. In some cases, probabilistic algorithms are the only practical means of solving a problem.
In common practice, randomized algorithms are approximated using a pseudorandom number generator in place of a true source of random bits; such an implementation may deviate from the expected theoretical behavior and mathematical guarantees which may depend on the existence of an ideal true random number generator.
Motivation
As a motivating example, consider the problem of finding an ‘a’ in an array of n elements.
Input: An array of n ≥ 2 elements, in which half are ‘a’s and the other half are ‘b’s.
Output: Find an ‘a’ in the array.
We give two versions of the algorithm, one Las Vegas algorithm and one Monte Carlo algorithm.
Las Vegas algorithm:
findingA_LV
begin
repeat
Randomly select one element out of n elements.
until 'a' is found
end
This algorithm succeeds with probability 1. The number of iterations varies and can be arbitrarily large, but the expected number of iterations is 2, since each iteration independently finds an ‘a’ with probability 1/2.
Since it is constant, the expected run time over many calls is Θ(1).
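A minimal Python sketch of this Las Vegas version is given below; the function name finding_a_lv and the use of a Python list as the array are illustrative choices, not part of the original pseudocode.

import random

def finding_a_lv(array):
    # Las Vegas version: keep sampling positions uniformly at random
    # until an 'a' is found. The answer is always correct, but the
    # number of iterations is a random variable (expected value 2
    # when half of the entries are 'a').
    while True:
        i = random.randrange(len(array))
        if array[i] == 'a':
            return i

# Example: an array of n = 16 elements, half 'a's and half 'b's.
print(finding_a_lv(['a', 'b'] * 8))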
Monte Carlo algorithm:
findingA_MC
begin
i := 0
repeat
Randomly select one element out of n elements.
i := i + 1
until i = k or 'a' is found
end
If an ‘a’ is found, the algorithm succeeds, else the algorithm fails. After k iterations, the probability of finding an ‘a’ is 1 − (1/2)^k.
This algorithm does not guarantee success, but the run time is bounded. The number of iterations is always less than or equal to k. Taking k to be constant, the run time (expected and absolute) is Θ(1).
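The Monte Carlo version admits a similarly short sketch in Python; again the names are illustrative rather than taken from the pseudocode above.

import random

def finding_a_mc(array, k):
    # Monte Carlo version: at most k random probes. The running time
    # is bounded by O(k), but the algorithm may fail and return None,
    # which happens with probability (1/2)**k when half of the
    # entries are 'a'.
    for _ in range(k):
        i = random.randrange(len(array))
        if array[i] == 'a':
            return i
    return None  # failure after k probes

# Example: with k = 10 the failure probability is about 0.1%.
print(finding_a_mc(['a', 'b'] * 8, k=10))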
Randomized algorithms are particularly useful when faced with a malicious "adversary" or attacker who deliberately tries to feed a bad input to the algorithm such as in the Prisoner's dilemma. It is for this reason that randomness is ubiquitous in cryptography. In cryptographic applications, pseudo-random numbers cannot be used, since the adversary can predict them, making the algorithm effectively deterministic. Therefore, either a source of truly random numbers or a cryptographically secure pseudo-random number generator is required. Another area in which randomness is inherent is quantum computing.
In the example above, the Las Vegas algorithm always outputs the correct answer, but its running time is a random variable. The Monte Carlo algorithm is guaranteed to complete in an amount of time that can be bounded by a function of the input size and its parameter k, but allows a small probability of error. Observe that any Las Vegas algorithm can be converted into a Monte Carlo algorithm by having it output an arbitrary, possibly incorrect answer if it fails to complete within a specified time. Conversely, if an efficient verification procedure exists to check whether an answer is correct, then a Monte Carlo algorithm can be converted into a Las Vegas algorithm by running the Monte Carlo algorithm repeatedly until a correct answer is obtained.
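Both conversions can be sketched in Python as follows; monte_carlo, verify and las_vegas_steps are hypothetical placeholders for the algorithms and the verifier in question, not concrete library functions.

def monte_carlo_to_las_vegas(monte_carlo, verify, x):
    # Repeat a Monte Carlo algorithm until an efficient verifier accepts.
    # monte_carlo(x) returns a candidate answer that may be wrong;
    # verify(x, y) returns True exactly when y is correct for x.
    # The output is always correct; only the running time is random.
    while True:
        y = monte_carlo(x)
        if verify(x, y):
            return y

def las_vegas_to_monte_carlo(las_vegas_steps, x, budget):
    # Run a step-bounded Las Vegas algorithm within a fixed budget.
    # las_vegas_steps(x, budget) is assumed to return a correct answer,
    # or None if it did not finish within `budget` steps; substituting
    # an arbitrary answer in the latter case yields a Monte Carlo
    # algorithm with bounded running time and a small error probability.
    y = las_vegas_steps(x, budget)
    return y if y is not None else 'arbitrary answer'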
Computational complexity
Computational complexity theory models randomized algorithms as probabilistic Turing machines. Both Las Vegas and Monte Carlo algorithms are considered, and several complexity classes are studied. The most basic randomized complexity class is RP, which is the class of decision problems for which there is an efficient (polynomial-time) randomized algorithm which recognizes NO-instances with absolute certainty and recognizes YES-instances with a probability of at least 1/2. The complement class for RP is co-RP. Problem classes having algorithms with polynomial-time average case running time whose output is always correct are said to be in ZPP. The class of problems for which both YES- and NO-instances are allowed to be identified with some error is called BPP. This class acts as the randomized equivalent of P, i.e. BPP represents the class of efficient randomized algorithms.
Early history
Sorting
Quicksort was discovered by Tony Hoare in 1959, and subsequently published in 1961. In the same year, Hoare published the quickselect algorithm, which finds the median element of a list in linear expected time. It remained open until 1973 whether a deterministic linear-time algorithm existed.
Number theory
In 1917, Henry Cabourn Pocklington introduced a randomized algorithm known as Pocklington's algorithm for efficiently finding square roots modulo prime numbers. In 1970, Elwyn Berlekamp introduced a randomized algorithm for efficiently computing the roots of a polynomial over a finite field. In 1977, Robert M. Solovay and Volker Strassen discovered a polynomial-time randomized primality test. Soon afterwards Michael O. Rabin demonstrated that Miller's 1976 primality test could also be turned into a polynomial-time randomized algorithm. At that time, no provably polynomial-time deterministic algorithms for primality testing were known.
Data structures
One of the earliest randomized data structures is the hash table, which was introduced in 1953 by Hans Peter Luhn at IBM. Luhn's hash table used chaining to resolve collisions and was also one of the first applications of linked lists. Subsequently, in 1954, Gene Amdahl, Elaine M. McGraw, Nathaniel Rochester, and Arthur Samuel of IBM Research introduced linear probing, although Andrey Ershov independently had the same idea in 1957. In 1962, Donald Knuth performed the first correct analysis of linear probing, although the memorandum containing his analysis was not published until much later. The first published analysis was due to Konheim and Weiss in 1966.
Early works on hash tables either assumed access to a fully random hash function or assumed that the keys themselves were random. In 1979, Carter and Wegman introduced universal hash functions, which they showed could be used to implement chained hash tables with constant expected time per operation.
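For instance, one classical universal family for integer keys hashes x to ((a·x + b) mod p) mod m for a randomly chosen a and b and a fixed prime p; the Python sketch below is a simplification and differs in detail from Carter and Wegman's original constructions.

import random

def make_universal_hash(m, p=2**31 - 1):
    # Draw one function h(x) = ((a*x + b) % p) % m from a universal
    # family; p is a prime larger than any key, m is the table size.
    # For distinct keys x != y, the probability of a collision over the
    # random choice of (a, b) is roughly 1/m.
    a = random.randrange(1, p)
    b = random.randrange(0, p)
    return lambda x: ((a * x + b) % p) % m

# Each chained hash table instance draws its own random function, so the
# expected chain length stays constant for any fixed set of keys.
h = make_universal_hash(m=128)
print(h(42), h(2024))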
Early work on randomized data structures also extended beyond hash tables. In 1970, Burton Howard Bloom introduced an approximate-membership data structure known as the Bloom filter. In 1989, Raimund Seidel and Cecilia R. Aragon introduced a randomized balanced search tree known as the treap. In the same year, William Pugh introduced another randomized search tree known as the skip list.
Implicit uses in combinatorics
Prior to the popularization of randomized algorithms in computer science, Paul Erdős championed the use of randomized constructions as a mathematical technique for establishing the existence of mathematical objects. This technique has become known as the probabilistic method. Erdős gave his first application of the probabilistic method in 1947, when he used a simple randomized construction to establish the existence of Ramsey graphs. He famously used a more sophisticated randomized construction in 1959 to establish the existence of graphs with high girth and chromatic number.
Examples
Quicksort
Quicksort is a familiar, commonly used algorithm in which randomness can be useful. Many deterministic versions of this algorithm require O(n²) time to sort n numbers for some well-defined class of degenerate inputs, with the specific class of inputs that generate this behavior defined by the protocol for pivot selection. However, if the algorithm selects pivot elements uniformly at random, it has a provably high probability of finishing in O(n log n) time regardless of the characteristics of the input.
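A short Python sketch of quicksort with uniformly random pivot selection is shown below; unlike most practical implementations it is not in-place, and is intended only to illustrate the random pivot choice.

import random

def randomized_quicksort(arr):
    # Choosing the pivot uniformly at random gives O(n log n) expected
    # time on every input, so no fixed input can force the quadratic
    # worst case.
    if len(arr) <= 1:
        return arr
    pivot = random.choice(arr)
    less = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    greater = [x for x in arr if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)

print(randomized_quicksort([3, 1, 4, 1, 5, 9, 2, 6]))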
Randomized incremental constructions in geometry
In computational geometry, a standard technique to build a structure like a convex hull or Delaunay triangulation is to randomly permute the input points and then insert them one by one into the existing structure. The randomization ensures that the expected number of changes to the structure caused by an insertion is small, and so the expected running time of the algorithm can be bounded from above. This technique is known as randomized incremental construction.
Min cut
Input: A graph G
Output: A cut partitioning the vertices into L and R, with the minimum number of edges between L and R.
Recall that the contraction of two nodes, u and v, in a graph yields a new node u′ with edges that are the union of the edges incident on either u or v, except for any edge connecting u and v. Figure 1 gives an example of contraction of vertices A and B.
After contraction, the resulting graph may have parallel edges, but contains no self loops.
Karger's basic algorithm:
begin
i = 1
repeat
repeat
Take a random edge (u, v) ∈ E in G
replace u and v with the contraction u'
until only 2 nodes remain
obtain the corresponding cut result Ci
i = i + 1
until i = m
output the minimum cut among C1, C2,..., Cm.
end
In each execution of the outer loop, the algorithm repeats the inner loop until only 2 nodes remain, and the corresponding cut is obtained. One execution of the outer loop performs n − 2 edge contractions, so its running time is O(n) contraction steps, where n denotes the number of vertices.
After m executions of the outer loop, we output the minimum cut among all the results. Figure 2 gives an example of one execution of the algorithm. After execution, we get a cut of size 3.
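A compact Python sketch of Karger's contraction algorithm is given below; it represents the multigraph as an edge list and implements contraction by relabeling vertices, which is one of several possible implementations.

import random

def karger_min_cut(n, edges, m):
    # n: number of vertices, labeled 0..n-1
    # edges: list of (u, v) pairs of an undirected graph
    # m: number of independent contraction rounds (the outer loop)
    best = len(edges)  # trivial upper bound on any cut
    for _ in range(m):
        group = list(range(n))      # group[v] = super-node containing v
        remaining = list(edges)     # edges of the contracted multigraph
        nodes = n
        while nodes > 2 and remaining:
            u, v = random.choice(remaining)   # uniform over remaining edges
            gu, gv = group[u], group[v]
            group = [gu if g == gv else g for g in group]   # contract
            # Drop edges that became self-loops inside the merged node.
            remaining = [(a, b) for (a, b) in remaining if group[a] != group[b]]
            nodes -= 1
        best = min(best, len(remaining))
    return best

# Example: a 4-cycle with one diagonal; the minimum cut has size 2.
print(karger_min_cut(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)], m=20))

Each round finds the minimum cut with probability at least 2/(n(n−1)), so taking m on the order of n² ln n makes the probability of missing the minimum cut at most about 1/n.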