Random number generation
Random number generation is a process by which, often by means of a random number generator, a sequence of numbers or symbols is generated that cannot be reasonably predicted better than by random chance. This means that the particular outcome sequence will contain some patterns detectable in hindsight but impossible to foresee. True random number generators can be hardware random-number generators, wherein each generation is a function of the current value of a physical environment's attribute that is constantly changing in a manner that is practically impossible to model. This would be in contrast to so-called random number generations done by pseudorandom number generators, which generate pseudorandom numbers that are in fact predetermined—these numbers can be reproduced simply by knowing the initial state of the PRNG and the method it uses to generate numbers. There is also a class of non-physical true random number generators that produce true random numbers without an access to a dedicated hardware source, by scavenging entropy that is present in the computer system. See the details in True vs. pseudo-random numbers.
Various applications of randomness have led to the development of different methods for generating random data. Some of these have existed since ancient times, including well-known examples like the rolling of dice, coin flipping, the shuffling of playing cards, the use of yarrow stalks in the I Ching, as well as countless other techniques. Because of the mechanical nature of these techniques, generating large quantities of sufficiently random numbers required much work and time. Thus, results would sometimes be collected and distributed as random number tables.
Several computational methods for pseudorandom number generation exist. All fall short of the goal of true randomness, although they may meet, with varying success, some of the statistical tests for randomness intended to measure how unpredictable their results are. This generally makes them unusable for applications such as cryptography. However, carefully designed cryptographically secure pseudorandom number generators also exist, with special features specifically designed for use in cryptography.
Practical applications and uses
Random number generators have applications in gambling, statistical sampling, computer simulation, cryptography, completely randomized design, and other areas where producing an unpredictable result is desirable. Generally, in applications having unpredictability as the paramount feature, such as in security applications, hardware generators are generally preferred over pseudorandom algorithms, where feasible.Pseudorandom number generators are very useful in developing Monte Carlo-method simulations, as debugging is facilitated by the ability to run the same sequence of random numbers again by starting from the same random seed. They are also used in cryptography – so long as the seed is secret. The sender and receiver can generate the same set of numbers automatically to use as keys.
The generation of pseudorandom numbers is an important and common task in computer programming. While cryptography and certain numerical algorithms require a very high degree of apparent randomness, many other operations only need a modest amount of unpredictability. Some simple examples might be presenting a user with a "random quote of the day", or determining which way a computer-controlled adversary might move in a computer game. Weaker forms of randomness are used in hash algorithms and in creating amortized searching and sorting algorithms.
Some applications that appear at first sight to be suitable for randomization are in fact not quite so simple. For instance, a system that "randomly" selects music tracks for a background music system must only appear random, and may even have ways to control the selection of music: a truly random system would have no restriction on the same item appearing two or three times in succession.
True vs. pseudo-random numbers
There are two principal methods used to generate random numbers. The first method measures some physical phenomenon that is expected to be random and then compensates for possible biases in the measurement process. Example sources include measuring atmospheric noise, thermal noise, and other external electromagnetic and quantum phenomena. For example, cosmic background radiation or radioactive decay as measured over short timescales represent sources of natural entropy.The speed at which entropy can be obtained from natural sources is dependent on the underlying physical phenomena being measured. Thus, sources of naturally occurring true entropy are said to be blocking they are rate-limited until enough entropy is harvested to meet the demand. On some Unix-like systems, including most Linux distributions, the pseudo device file will block until sufficient entropy is harvested from the environment. Due to this blocking behavior, large bulk reads from, such as filling a hard disk drive with random bits, can often be slow on systems that use this type of entropy source.
The second method uses computational algorithms that can produce long sequences of apparently random results, which are in fact completely determined by a shorter initial value, known as a seed value or key. As a result, the entire seemingly random sequence can be reproduced if the seed value is known. This type of random number generator is often called a pseudorandom number generator. This type of generator typically does not rely on sources of naturally occurring entropy, though it may be periodically seeded by natural sources. This generator type is non-blocking, so they are not rate-limited by an external event, making large bulk reads a possibility.
Standard cryptographic designs take a hybrid approach, using randomness harvested from natural sources to seed a cryptographically secure pseudorandom number generators. Hardware random number generators generally produce only a limited number of random bits per second. In order to increase the available output data rate, they are often used to generate the "seed" for a faster PRNG. PRNG also helps with the noise source "anonymization" and entropy extraction. With a proper PRNG algorithm selected, the combination can satisfy the requirements of Federal Information Processing Standards and Common Criteria standards.
Generation methods
Physical methods
The earliest methods for generating random numbers, such as dice, coin flipping and roulette wheels, are still used today, mainly in games and gambling, as they tend to be too slow for most applications in statistics and cryptography.Computational methods
For performance and security reasons, most random number generators use PRNGs in their construction.By humans
Random number generation may also be performed by humans, in the form of collecting various inputs from end users and using them as a randomization source. However, most studies find that human subjects have some degree of non-randomness when attempting to produce a random sequence of e.g., digits or letters. They may alternate too much between choices when compared to a good random generator; thus, this approach is not widely used. However, for the very reason that humans perform poorly in this task, human random number generation can be used as a tool to gain insights into brain functions otherwise not accessible.Post-processing and statistical checks
Even given a source of plausible random numbers, obtaining numbers which are completely unbiased takes care. In addition, behavior of these generators often changes with temperature, power supply voltage, the age of the device, or other outside interference.Generated random numbers are sometimes subjected to statistical tests before use to ensure that the underlying source is still working, and then post-processed to improve their statistical properties. An example would be the TRNG9803 hardware random number generator, which uses an entropy measurement as a hardware test, and then post-processes the random sequence with a shift register stream cipher. It is generally hard to use statistical tests to validate the generated random numbers. Wang and Nicol proposed a distance-based statistical testing technique that is used to identify the weaknesses of several random generators. Li and Wang proposed a method of testing random numbers based on laser chaotic entropy sources using Brownian motion properties.
Statistical tests are also used to give confidence that the post-processed final output from a random number generator is truly unbiased, with numerous randomness test suites being developed.
Other considerations
Reshaping the distribution
Uniform distributions
Most random number generators natively work with integers or individual bits, so an extra step is required to arrive at the canonical uniform distribution between 0 and 1. The implementation is not as trivial as dividing the integer by its maximum possible value. Specifically:- The integer used in the transformation must provide enough bits for the intended precision.
- The nature of floating-point math itself means there exists more precision the closer the number is to zero. This extra precision is usually not used due to the sheer number of bits required.
- Rounding error in division may bias the result. At worst, a supposedly excluded bound may be drawn contrary to expectations based on real-number math.
Uniformly distributed integers are commonly used in algorithms such as the Fisher–Yates shuffle. Again, a naive implementation may induce a modulo bias into the result, so more involved algorithms must be used. A method that nearly never performs division was described in 2018 by Daniel Lemire, with the current state-of-the-art being the arithmetic encoding-inspired 2021 "optimal algorithm" by Stephen Canon of Apple Inc.
Most 0 to 1 RNGs include 0 but exclude 1, while others include or exclude both.