One-class classification
In machine learning, one-class classification, also known as unary classification or class-modelling, is an approach to the training of binary classifiers in which only examples of one of the two classes are used.
Examples include the monitoring of helicopter gearboxes, motor failure prediction, or assessing the operational status of a nuclear plant as 'normal': In such scenarios, there are few, if any, examples of the catastrophic system states – rare outliers – that comprise the second class. Alternatively, the class that is being focussed on may cover a small, coherent subset of the data and the training may rely on an information bottleneck approach.
In practice, counter-examples from the second class may be used in later rounds of training to further refine the algorithm.
Overview
The term one-class classification (OCC) was coined by Moya & Hush, and many applications can be found in the scientific literature, for example outlier detection, anomaly detection, and novelty detection. A feature of OCC is that it uses only sample points from the assigned class, so that a representative sample of non-target classes is not strictly required.

Introduction
SVM-based one-class classification relies on identifying the smallest hypersphere (with radius \(r\) and center \(c\)) containing all the data points, where \(\Phi(\cdot)\) maps the data points into a feature space. This method is called Support Vector Data Description (SVDD). Formally, the problem can be defined in the following constrained optimization form:

\[ \min_{r,\, c} \; r^2 \]
\[ \text{subject to } \|\Phi(x_i) - c\|^2 \le r^2 \quad \text{for all } i = 1, 2, \ldots, n. \]

However, the above formulation is highly restrictive and sensitive to the presence of outliers. Therefore, a more flexible formulation that allows for the presence of outliers is given below:

\[ \min_{r,\, c,\, \zeta} \; r^2 + \frac{1}{\nu n} \sum_{i=1}^{n} \zeta_i \]
\[ \text{subject to } \|\Phi(x_i) - c\|^2 \le r^2 + \zeta_i \quad \text{and} \quad \zeta_i \ge 0 \quad \text{for all } i = 1, 2, \ldots, n. \]
From the Karush–Kuhn–Tucker (KKT) conditions for optimality, we get

\[ c = \sum_{i=1}^{n} \alpha_i \Phi(x_i), \]

where the \(\alpha_i\)'s are the solution to the following optimization problem:

\[ \max_{\alpha} \; \sum_{i=1}^{n} \alpha_i \kappa(x_i, x_i) - \sum_{i,j=1}^{n} \alpha_i \alpha_j \kappa(x_i, x_j) \]

subject to

\[ \sum_{i=1}^{n} \alpha_i = 1 \quad \text{and} \quad 0 \le \alpha_i \le \frac{1}{\nu n} \quad \text{for all } i = 1, 2, \ldots, n. \]
The introduction of a kernel function \(\kappa(x, y) = \Phi(x) \cdot \Phi(y)\) provides additional flexibility to the one-class SVM (OSVM) algorithm.
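As an illustration, the following minimal sketch uses scikit-learn's OneClassSVM, which implements Schölkopf's ν-SVM variant of one-class classification; with an RBF kernel this variant is equivalent to SVDD, and its nu parameter plays the role of \(\nu\) above. The synthetic data and all parameter values are illustrative assumptions, not part of the formulation itself.

```python
# Minimal sketch: one-class SVM with an RBF kernel via scikit-learn.
# The nu parameter upper-bounds the fraction of training points treated
# as outliers, mirroring the 1/(nu*n) slack penalty in the formulation above.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # target class only
X_test = np.array([[0.1, -0.2],   # near the training cloud
                   [4.0, 4.0]])   # far away: expected outlier

clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(X_train)
print(clf.predict(X_test))  # +1 = inside the learned boundary, -1 = outlier
```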
PU (Positive Unlabeled) learning
A similar problem is PU learning, in which a binary classifier is constructed by semi-supervised learning from only positive and unlabeled sample points. In PU learning, two sets of examples are assumed to be available for training: the positive set and a mixed set, which is assumed to contain both positive and negative samples, but without these being labeled as such. This contrasts with other forms of semi-supervised learning, where it is assumed that a labeled set containing examples of both classes is available in addition to unlabeled samples. A variety of techniques exist to adapt supervised classifiers to the PU learning setting, including variants of the EM algorithm. PU learning has been successfully applied to text, time series, bioinformatics tasks, and remote sensing data.
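To make this concrete, here is a minimal sketch of one classical technique, the Elkan–Noto (2008) estimator, on synthetic data (the data and parameter choices are illustrative assumptions). A probabilistic classifier is trained to separate the labeled positives from the unlabeled mix, and its scores are rescaled by the constant \(c = E[g(x) \mid y = 1]\); for brevity \(c\) is estimated here on the training positives, whereas Elkan and Noto use a held-out validation set.

```python
# Minimal sketch of the Elkan-Noto (2008) PU-learning estimator:
# train a probabilistic classifier g on positive-vs-unlabeled data,
# estimate c = E[g(x) | y=1] from labeled positives, and rescale
# P(y=1 | x) ~= g(x) / c.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
pos = rng.normal(loc=+2.0, size=(300, 2))          # true positives
neg = rng.normal(loc=-2.0, size=(300, 2))          # true negatives
labeled_pos = pos[:100]                            # the only labels we observe
unlabeled = np.vstack([pos[100:], neg])            # mixed, unlabeled set

X = np.vstack([labeled_pos, unlabeled])
s = np.r_[np.ones(len(labeled_pos)), np.zeros(len(unlabeled))]  # s=1 iff labeled

g = LogisticRegression().fit(X, s)                 # models P(s=1 | x)
c = g.predict_proba(labeled_pos)[:, 1].mean()      # c = E[g(x) | y=1]

X_new = np.array([[2.0, 2.0], [-2.0, -2.0]])
p_pos = g.predict_proba(X_new)[:, 1] / c           # rescaled P(y=1 | x)
print(np.clip(p_pos, 0.0, 1.0))
```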
Approaches
Several approaches have been proposed to solve one-class classification. These approaches can be divided into three main categories: density estimation, boundary methods, and reconstruction methods.

Density estimation methods
Density estimation methods rely on estimating the density of the data points and setting a threshold. These methods assume known distributions, such as a Gaussian or a Poisson distribution, after which discordancy tests can be used to test new objects. These methods are robust to scale variance.

The Gaussian model is one of the simplest methods for creating one-class classifiers. Due to the central limit theorem, these methods work best when a large number of samples is present and they are perturbed by small independent error values. The probability distribution for a d-dimensional object \(z\) is given by:

\[ p(z) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\!\left( -\tfrac{1}{2} (z - \mu)^{T} \Sigma^{-1} (z - \mu) \right) \]

where \(\mu\) is the mean and \(\Sigma\) is the covariance matrix. Computing the inverse of the covariance matrix (\(\Sigma^{-1}\)) is the costliest operation, and in cases where the data is not scaled properly or has singular directions, the pseudo-inverse \(\Sigma^{+} = \Sigma^{T} (\Sigma \Sigma^{T})^{-1}\) is used to approximate the inverse.
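A minimal sketch of this Gaussian approach follows (the threshold choice and data are illustrative assumptions): fit \(\mu\) and \(\Sigma\) on target-class samples, use the pseudo-inverse as described above, and accept test points whose log-density exceeds a threshold taken from the training scores.

```python
# Minimal sketch: Gaussian one-class model. Fit mu and Sigma on target-class
# data, score new points by the log-density, and flag points whose density
# falls below a threshold derived from the training scores.
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))                 # target class, d = 3

mu = X_train.mean(axis=0)
Sigma = np.cov(X_train, rowvar=False)
Sigma_inv = np.linalg.pinv(Sigma)                   # pseudo-inverse, as above
_, logdet = np.linalg.slogdet(Sigma)
d = X_train.shape[1]

def log_density(z):
    """Log of the d-dimensional Gaussian density at each row of z."""
    diff = z - mu
    maha = np.einsum("ij,jk,ik->i", diff, Sigma_inv, diff)  # (z-mu)^T Sigma^-1 (z-mu)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

# Accept a point if its density is above the 5th percentile of training scores
# (the percentile is an arbitrary illustrative choice).
threshold = np.percentile(log_density(X_train), 5)
X_test = np.array([[0.0, 0.0, 0.0], [6.0, 6.0, 6.0]])
print(log_density(X_test) >= threshold)             # [True, False] expected
```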
Boundary methods
Boundary methods focus on setting boundaries around a small set of points, called target points. These methods attempt to minimize the volume enclosed by the boundary. Boundary methods rely on distances and hence are not robust to scale variance. The k-centers method, NN-d, and SVDD are some of the key examples.

K-centers
In the K-center algorithm, small balls with equal radius are placed so as to minimize the maximum of all minimum distances between training objects and the centers. Formally, the following error is minimized:

\[ \varepsilon_{\text{k-centers}} = \max_{i} \left( \min_{k} \| x_i - \mu_k \|^2 \right) \]
The algorithm uses a forward search method with random initialization, where the radius is determined by the maximum distance to the objects that any given ball should capture. After the centers \(\mu_k\) are determined, the distance for any given test object \(z\) can be calculated as

\[ d_{\text{k-centers}}(z) = \min_{k} \| z - \mu_k \|^2 . \]
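A minimal sketch of these two quantities follows. The greedy farthest-first center selection used here is a standard stand-in for choosing the \(\mu_k\), not necessarily the forward-search procedure described above; the data and the value of k are illustrative assumptions.

```python
# Minimal sketch of the k-centers training error and test-time distance.
# Centers mu_k are picked greedily from the training data (farthest-first
# traversal, a classic 2-approximation for the k-center problem).
import numpy as np

def fit_k_centers(X, k, seed=0):
    """Greedy farthest-first choice of k centers from the rows of X."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]              # random initialization
    while len(centers) < k:
        dists = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dists)])          # farthest remaining point
    return np.array(centers)

def k_centers_error(X, centers):
    """eps = max_i min_k ||x_i - mu_k||^2 over the training objects."""
    return max(np.min(np.sum((centers - x) ** 2, axis=1)) for x in X)

def distance(z, centers):
    """d(z) = min_k ||z - mu_k||^2 for a single test object z."""
    return np.min(np.sum((centers - z) ** 2, axis=1))

rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 2))
centers = fit_k_centers(X_train, k=5)
print(k_centers_error(X_train, centers))             # training error eps
print(distance(np.array([0.0, 0.0]), centers),       # small: near the data
      distance(np.array([5.0, 5.0]), centers))       # large: likely outlier
```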