Information theory
Information theory is the mathematical study of the quantification, storage, and communication of information. The field was established and formalized by Claude Shannon in the 1940s, though early contributions were made in the 1920s through the works of Harry Nyquist and Ralph Hartley. It lies at the intersection of electrical and electronic engineering, mathematics, statistics, computer science, neurobiology, and physics.
As a simple example, if one flips a fair coin and does not know the outcome, then they lack a certain amount of information. If one looks at the coin, they will know the outcome and gain that same amount of information. For a fair coin, the probability of either heads or tails is 1/2, and that amount of information can be expressed as $\log_2 2 = 1$ bit of information.
A key measure in information theory is information entropy. Entropy quantifies the amount of uncertainty about the value of a random variable or the outcome of a random process. In the coin flip example above, the entropy when the outcome is unknown is 1 bit; once the outcome is known, the entropy is zero. As another example, identifying the outcome of a fair coin flip (two equally likely outcomes) provides less information (lower entropy) than identifying the outcome of a roll of a die (six equally likely outcomes). A simple way to think of information entropy in bits is as approximately the average minimum number of yes/no questions that must be asked to recover the complete information. For the coin flip example, only one question is needed, e.g. "Is it heads?": "Yes" means it's heads, "No" means it's tails, and you have your one bit of complete information.
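For these two examples, the entropies follow directly from the formula for a uniform distribution over N equally likely outcomes, $H = \log_2 N$:

\[ H_{\text{coin}} = \log_2 2 = 1 \text{ bit}, \qquad H_{\text{die}} = \log_2 6 \approx 2.585 \text{ bits}, \]

so identifying a die roll takes, on average, closer to three yes/no questions than one.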
Some other important measures in information theory are mutual information, channel capacity, error exponents, and relative entropy. Important sub-fields of information theory include source coding, algorithmic complexity theory, algorithmic information theory and information-theoretic security.
Applications of fundamental topics of information theory include source coding/data compression, and channel coding/error detection and correction. Its impact has been crucial to the success of the Voyager missions to deep space, the invention of the compact disc, the feasibility of mobile phones and the development of the Internet and artificial intelligence. The theory has also found applications in other areas, including statistical inference, cryptography, neurobiology, perception, signal processing, linguistics, the evolution and function of molecular codes, thermal physics, molecular dynamics, black holes, quantum computing, information retrieval, intelligence gathering, plagiarism detection, pattern recognition, anomaly detection, the analysis of music, art creation, imaging system design, study of outer space, the dimensionality of space, and epistemology.
Overview
Information theory studies the transmission, processing, extraction, and utilization of information. Abstractly, information can be thought of as the resolution of uncertainty. In the case of communication of information over a noisy channel, this abstract concept was formalized in 1948 by Claude Shannon in a paper entitled A Mathematical Theory of Communication, in which information is thought of as a set of possible messages, and the goal is to send these messages over a noisy channel and to have the receiver reconstruct the message with low probability of error, in spite of the channel noise. Shannon's main result, the noisy-channel coding theorem, showed that, in the limit of many channel uses, the rate of information that is asymptotically achievable is equal to the channel capacity, a quantity dependent merely on the statistics of the channel over which the messages are sent.

Coding theory is concerned with finding explicit methods, called codes, for increasing the efficiency and reducing the error rate of data communication over noisy channels to near the channel capacity. These codes can be roughly subdivided into data compression (source coding) and error-correction (channel coding) techniques. In the latter case, it took many years to find the methods Shannon's work proved were possible.
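To make the notion of channel capacity concrete, the sketch below computes the capacity of a binary symmetric channel, a standard textbook model in which each transmitted bit is flipped independently with crossover probability p and whose capacity is 1 − H_b(p), where H_b is the binary entropy function. This is an illustrative sketch, not notation from Shannon's paper.

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H_b(p) in bits; H_b(0) = H_b(1) = 0 by convention."""
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p: float) -> float:
    """Capacity (bits per channel use) of a binary symmetric channel
    with crossover probability p: C = 1 - H_b(p)."""
    return 1.0 - binary_entropy(p)

# A noiseless channel carries 1 bit per use; a channel that flips
# bits half the time carries nothing.
print(bsc_capacity(0.0))   # 1.0
print(bsc_capacity(0.11))  # ~0.5
print(bsc_capacity(0.5))   # 0.0
```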
A third class of information theory codes are cryptographic algorithms. Concepts, methods and results from coding theory and information theory are widely used in cryptography and cryptanalysis, such as the unit ban.
Historical background
The landmark event establishing the discipline of information theory and bringing it to immediate worldwide attention was the publication of Claude Shannon's classic paper "A Mathematical Theory of Communication" in the Bell System Technical Journal in July and October 1948. Historian James Gleick rated the paper as the most important development of 1948, noting that it was "even more profound and more fundamental" than the transistor. Shannon came to be known as the "father of information theory". He had outlined some of his initial ideas of information theory as early as 1939 in a letter to Vannevar Bush.

Prior to this paper, limited information-theoretic ideas had been developed at Bell Labs, all implicitly assuming events of equal probability. Harry Nyquist's 1924 paper, Certain Factors Affecting Telegraph Speed, contains a theoretical section quantifying "intelligence" and the "line speed" at which it can be transmitted by a communication system, giving the relation $W = K \log m$, where W is the speed of transmission of intelligence, m is the number of different voltage levels to choose from at each time step, and K is a constant. Ralph Hartley's 1928 paper, Transmission of Information, uses the word information as a measurable quantity, reflecting the receiver's ability to distinguish one sequence of symbols from any other, thus quantifying information as $H = \log S^n = n \log S$, where S is the number of possible symbols and n the number of symbols in a transmission. The unit of information was therefore the decimal digit, which has since been called the hartley in his honor as a unit of information. Alan Turing in 1940 used similar ideas as part of the statistical analysis of the breaking of the German Second World War Enigma ciphers.
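As a small worked example of Hartley's measure, a message of n = 3 symbols drawn from an alphabet of S = 10 symbols (the decimal digits), with the logarithm taken to base 10 so that the unit is the decimal digit:

\[ H = 3 \log_{10} 10 = 3 \text{ hartleys} \approx 9.97 \text{ bits}. \]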
Much of the mathematics behind information theory with events of different probabilities were developed for the field of thermodynamics by Ludwig Boltzmann and J. Willard Gibbs. Connections between information-theoretic entropy and thermodynamic entropy, including the important contributions by Rolf Landauer in the 1960s, are explored in Entropy in thermodynamics and information theory.
In his revolutionary and groundbreaking paper, the work for which had been substantially completed at Bell Labs by the end of 1944, Shannon for the first time introduced the qualitative and quantitative model of communication as a statistical process underlying information theory, opening with the assertion:

"The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point."
With it came the ideas of:
- The information entropy and redundancy of a source, and its relevance through the source coding theorem;
- The mutual information, and the channel capacity of a noisy channel, including the promise of perfect loss-free communication given by the noisy-channel coding theorem;
- The practical result of the Shannon–Hartley law for the channel capacity of a Gaussian channel (a numerical sketch of this formula appears after this list); as well as
- The bit—a new way of seeing the most fundamental unit of information.
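The Shannon–Hartley result mentioned above gives the capacity of a band-limited Gaussian (AWGN) channel as C = B log2(1 + S/N), with bandwidth B in hertz and the signal-to-noise ratio S/N as a linear power ratio. The sketch below is a minimal illustration; the example numbers (a 3 kHz telephone-style channel at 30 dB SNR) are assumed for demonstration, not taken from the article.

```python
import math

def shannon_hartley_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Capacity in bits per second of an AWGN channel:
    C = B * log2(1 + S/N), with S/N given as a linear power ratio."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

# Assumed example: a 3 kHz channel at 30 dB SNR (power ratio 1000).
snr_db = 30.0
snr_linear = 10 ** (snr_db / 10)
print(shannon_hartley_capacity(3000.0, snr_linear))  # ≈ 29,902 bits per second (≈ 30 kbit/s)
```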
Quantities of information
Two of the most important quantities of information are the entropy of a single random variable and the mutual information defined on two random variables, which measures the information shared between those variables and can be used to describe their correlation. The former quantity is a property of the probability distribution of a random variable and gives a limit on the rate at which data generated by independent samples with the given distribution can be reliably compressed. The latter is a property of the joint distribution of two random variables, and is the maximum rate of reliable communication across a noisy channel in the limit of long block lengths, when the channel statistics are determined by the joint distribution.
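As an illustration of the latter quantity, the sketch below computes the mutual information of two binary variables directly from a joint distribution, using the standard definition I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x)p(y))]; the particular joint probabilities are made-up demonstration values.

```python
import math

def mutual_information(joint: list[list[float]]) -> float:
    """Mutual information I(X;Y) in bits, computed from a joint pmf
    given as a matrix with rows indexed by x and columns by y."""
    px = [sum(row) for row in joint]        # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]  # marginal distribution of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:                     # 0 log 0 = 0 convention
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi

# Independent variables share no information; identical ones share 1 bit.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```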
The choice of logarithmic base in the following formulae determines the unit of information entropy that is used. A common unit of information is the bit or shannon, based on the binary logarithm. Other units include the nat, which is based on the natural logarithm, and the decimal digit, which is based on the common logarithm.
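The units are related by the corresponding change of logarithm base; for example:

\[ 1 \text{ nat} = \log_2 e \approx 1.443 \text{ bits}, \qquad 1 \text{ hartley} = \log_2 10 \approx 3.322 \text{ bits}. \]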
In what follows, an expression of the form $p \log p$ is considered by convention to be equal to zero whenever $p = 0$. This is justified because $\lim_{p \to 0^{+}} p \log p = 0$ for any logarithmic base.
Entropy of an information source
Based on the probability mass function of a source, the Shannon entropy H, in units of bits per symbol, is defined as the expected value of the information content of the symbols. The amount of information conveyed by an individual source symbol $x_i$ with probability $p_i$ is known as its self-information or surprisal, $I(x_i)$. This quantity is defined as:

\[ I(x_i) = -\log_2 p_i = \log_2 \frac{1}{p_i} \]

A less probable symbol has a larger surprisal, meaning its occurrence provides more information. The entropy is the weighted average of the surprisal of all possible symbols from the source's probability distribution:

\[ H = -\sum_i p_i \log_2 p_i \]
Intuitively, the entropy $H(X)$ of a discrete random variable $X$ is a measure of the amount of uncertainty associated with the value of $X$ when only its distribution is known. A high entropy indicates the outcomes are more evenly distributed, making the result harder to predict.
For example, if one transmits 1000 bits, and the value of each of these bits is known to the receiver ahead of transmission, no information is transmitted. If, however, each bit is independently and equally likely to be 0 or 1, 1000 shannons of information have been transmitted.
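The definitions above translate directly into a short, self-contained sketch; the function names and the biased-coin distribution (0.7/0.3) are illustrative choices, not from the article.

```python
import math

def surprisal(p: float) -> float:
    """Self-information -log2(p), in bits, of an outcome with probability p."""
    return -math.log2(p)

def entropy(pmf: list[float]) -> float:
    """Shannon entropy H = -sum(p * log2 p), in bits per symbol,
    using the 0 log 0 = 0 convention."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

print(surprisal(0.5))               # an individual fair-coin outcome: 1.0 bit
print(entropy([0.5, 0.5]))          # fair coin: 1.0 bit per symbol
print(entropy([0.7, 0.3]))          # biased coin: ~0.881 bits per symbol
print(1000 * entropy([0.5, 0.5]))   # 1000 independent fair bits: 1000.0 shannons
```

Because entropy is additive over independent symbols, the 1000 equally likely bits in the example above carry exactly 1000 shannons in total.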