Intrinsic dimension
In mathematics, the intrinsic dimension of a subset can be thought of as the minimal number of variables needed to represent the subset. The concept has widespread applications in geometry, dynamical systems, signal processing, statistics, and other fields. Due to its widespread applications and vague conceptualization, there are many different ways to define it rigorously. Consequently, the same set might have different intrinsic dimensions according to different definitions.
The intrinsic dimension can be used as a lower bound of what dimension it is possible to compress a data set into through dimension reduction, but it can also be used as a measure of the complexity of the data set or signal. For a data set or signal of N variables, its intrinsic dimension M satisfies 0 ≤ M ≤ N, although estimators may yield higher values.
Exact dimension
Differential
In differential geometry, given a differentiable manifold N and a submanifold M, the intrinsic dimension of M is its dimension. Suppose N has n dimensions and M has m dimensions, then that means around any point in M, there exists a local coordinate system of N, such that the manifold M is simply the subset of N defined by.Metric
Given a mere metric space, we can still define its intrinsic dimension. The most general case is the Hausdorff dimension, though for metric spaces occurring in practice, the box-counting dimension and the packing dimension often are identical to the Hausdorff dimension.Let be a metric space and be totally bounded. Define the covering numberThe metric entropy is . The upper and lower metric entropy dimensions areIf they are equal, then is that common value, called the metric entropy dimension. The entropy dimensions are usually used in information theory, and especially coding theory, since entropy is involved in its definition.
Topological
If is merely a topological space, then we can still define its intrinsic dimension, using the topological dimension or Lebesgue covering dimension.An open cover of a topological space is a family of open sets such that their union is the whole space, =. The order or ply of an open cover = is the smallest number for which each point of the space belongs to at most open sets in the cover: in other words 1 ∩ ⋅⋅⋅ ∩ +1 = for '1,..., +1 distinct. A refinement of an open cover = is another open cover =, such that each is contained in some . The covering dimension of a topological space is defined to be the minimum value of such that every finite open cover of X has an open refinement with order. The refinement can always be chosen to be finite. Thus, if is finite, 1 ∩ ⋅⋅⋅ ∩ +2 = for '1,..., +2 distinct. If no such minimal exists, the space is said to have infinite covering dimension.
Introductory example
Let be a two-variable function which is of the formfor some one-variable function g which is not constant. This means that f varies, in accordance to g, with the first variable or along the first coordinate. On the other hand, f is constant with respect to the second variable or along the second coordinate. It is only necessary to know the value of one, namely the first, variable in order to determine the value of f. Hence, it is a two-variable function but its intrinsic dimension is one.
A slightly more complicated example is.
f is still intrinsic one-dimensional, which can be seen by making a variable transformation
and
which gives
Since the variation in f can be described by the single variable y1 its intrinsic dimension is one.
For the case that f is constant, its intrinsic dimension is zero since no variable is needed to describe variation. For the general case, when the intrinsic dimension of the two-variable function f is neither zero or one, it is two.
In the literature, functions which are of intrinsic dimension zero, one, or two are sometimes referred to as i0D, i1D or i2D, respectively.
Signal processing
In signal processing of multidimensional signals, the intrinsic dimension of the signal describes how many variables are needed to generate a good approximation of the signal.For an N-variable function f, the set of variables can be represented as an N-dimensional vector x:
If for some M-variable function g and M × N matrix A it is the case that
- for all x;
- M is the smallest number for which the above relation between f and g can be found,
The intrinsic dimension is a characterization of f, it is not an unambiguous characterization of g nor of A. That is, if the above relation is satisfied for some f, g, and A, it must also be satisfied for the same f and g′ and A′ given by
and
where B is a non-singular M × M matrix, since
The Fourier transform of signals of low intrinsic dimension
An N variable function which has intrinsic dimension M < N has a characteristic Fourier transform. Intuitively, since this type of function is constant along one or several dimensions its Fourier transform must appear like an impulse along the same dimension in the frequency domain.A simple example
Let f be a two-variable function which is i1D. This means that there exists a normalized vector and a one-variable function g such thatfor all. If F is the Fourier transform of f it must be the case that
Here G is the Fourier transform of g, δ is the [Dirac delta function|Dirac impulse function] and m is a normalized vector in perpendicular to n. This means that F vanishes everywhere except on a line which passes through the origin of the frequency domain and is parallel to m. Along this line F varies according to G.
The general case
Let f be an N-variable function which has intrinsic dimension M, that is, there exists an M-variable function g and M × N matrix A such thatIts Fourier transform F can then be described as follows:
- F vanishes everywhere except for a subspace of dimension M
- The subspace M is spanned by the rows of the matrix A
- In the subspace, F varies according to G the Fourier transform of ''g''
Generalizations
The type of intrinsic dimension described above assumes that a linear transformation is applied to the coordinates of the N-variable function f to produce the M variables which are necessary to represent every value of f. This means that f is constant along lines, planes, or hyperplanes, depending on N and M.In a general case, f has intrinsic dimension M if there exist M functions a1, a2,..., aM and an M-variable function g such that
- for all x
- M is the smallest number of functions which allows the above transformation
- , f is i1D and is constant along any circle centered at the origin
- , f is i1D and is constant along all rays from the origin
Local Intrinsic Dimensionality
Different parts of a dataset might have different intrinsic dimensions. This is often referred to as local intrinsic dimensionality. Often data is distributed on a lower-dimensional manifold when only considering a nearby subset of the data. For example, the function can be considered one-dimensional when y is close to 0, two-dimensional when y is close to 1, and again one-dimensional when y is positive and much larger than 1.Local intrinsic dimensionality is often used with respect to data. It then usually is estimated based on the k nearest neighbors of a data point, often based on a concept related to the doubling dimension in mathematics. Since the volume of a d-sphere grows exponentially in d, the rate at which new neighbors are found as the search radius is increased can be used to estimate the local intrinsic dimensionality. However, alternate approaches of estimation have been proposed, for example angle-based estimation.
Intrinsic dimension estimation
Intrinsic dimension of data manifolds can be estimated by many methods, depending on assumptions of the data manifold. A 2016 review is.The two-nearest neighbors method is a method for estimating the intrinsic dimension of an immersed Riemannian manifold. The algorithm is as follows:
Scatter some points on the manifold.
Measure for many points, where are the distances to the point's two closest neighbors.
Fit the empirical CDF of to.
Return.
History
During the 1950s so called "scaling" methods were developed in the social sciences to explore and summarize multidimensional data sets. After Shepard introduced non-metric multidimensional scaling in 1962 one of the major research areas within multi-dimensional scaling was estimation of the intrinsic dimension. The topic was also studied in information theory, pioneered by Bennet in 1965 who coined the term "intrinsic dimension" and wrote a computer program to estimate it.During the 1970s intrinsic dimensionality estimation methods were constructed that did not depend on dimensionality reductions such as MDS: based on local eigenvalues., based on distance distributions, and based on other dimension-dependent geometric properties
Estimating intrinsic dimension of sets and probability measures has also been extensively studied since around 1980 in the field of dynamical systems, where dimensions of attractors have been the subject of interest. For strange attractors there is no manifold assumption, and the dimension measured is some version of fractal dimension — which also can be non-integer. However, definitions of fractal dimension yield the manifold dimension for manifolds.
In the 2000s the "curse of dimensionality" has been exploited to estimate intrinsic dimension.
Applications
The case of a two-variable signal which is i1D appears frequently in computer vision and image processing and captures the idea of local image regions which contain lines or edges. The analysis of such regions has a long history, but it was not until a more formal and theoretical treatment of such operations began that the concept of intrinsic dimension was established, even though the name has varied.For example, the concept which here is referred to as an image neighborhood of intrinsic dimension 1 or i1D neighborhood is called 1-dimensional by Knutsson, linear symmetric by Bigün & Granlund and simple neighborhood in Granlund & Knutsson.