Structured kNN
Structured k-nearest neighbors (SkNN) is a machine learning algorithm that generalizes the k-nearest neighbors (k-NN) classifier.
k-NN supports binary classification, multiclass classification, and regression, whereas SkNN allows training of a classifier for general structured output.
For instance, a data sample might be a natural language sentence, and the output could be an annotated parse tree. Training a classifier consists of showing many instances of ground truth sample-output pairs. After training, the SkNN model is able to predict the corresponding output for new, unseen sample instances; that is, given a natural language sentence, the classifier can produce the most likely parse tree.
Training
As a training set, SkNN accepts sequences of elements with class labels. The type of element does not matter; the only requirement is a metric function that gives a distance between each pair of elements of a set.

SkNN is based on the idea of building a graph in which each node represents a class label. An edge connects a pair of nodes if the training set contains a sequence of two consecutive elements with the corresponding classes. The first step of SkNN training is the construction of such a graph from the training sequences. The graph contains two special nodes corresponding to sequence beginnings and ends: if a sequence starts with class C, an edge from the START node to node C is created; likewise, if a sequence ends with class C, an edge from node C to the END node is created.
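The graph construction step above can be sketched as follows. This is a minimal illustration, not a reference implementation: the sequence representation (lists of `(element, label)` pairs) and the `"START"`/`"END"` marker names are assumptions made for the example.

```python
from collections import defaultdict

def build_transition_graph(sequences):
    """Build the SkNN class-transition graph from labeled sequences.

    `sequences` is a list of sequences, each a list of (element, label)
    pairs.  Nodes are class labels plus the special START and END
    markers; an edge (a, b) exists if some sequence contains an element
    of class b immediately after an element of class a.
    """
    edges = defaultdict(set)  # node -> set of successor nodes
    for seq in sequences:
        prev = "START"
        for _, label in seq:
            edges[prev].add(label)
            prev = label
        edges[prev].add("END")  # last element's class links to END
    return dict(edges)
```

For example, a single training sentence tagged `[("the", "DET"), ("cat", "NOUN")]` yields edges START→DET, DET→NOUN, and NOUN→END.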
As in regular k-NN, the second part of SkNN training consists of storing the elements of the training sequences in a particular way: each element is stored in the node corresponding to the class of the previous element in its sequence, and the first element of every sequence is stored in the START node.
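The storage rule can be sketched in the same assumed representation as above (lists of `(element, label)` pairs; a `"START"` marker node): each element lands in the bucket of the class that precedes it.

```python
from collections import defaultdict

def store_elements(sequences):
    """Store each training element under the node labelled with the
    class of the *previous* element in its sequence; the first element
    of every sequence goes to the START node.
    """
    node_store = defaultdict(list)  # node -> list of (element, label)
    for seq in sequences:
        prev = "START"
        for element, label in seq:
            node_store[prev].append((element, label))
            prev = label
    return dict(node_store)
```

With the sentence `[("the", "DET"), ("cat", "NOUN")]`, `"the"` is stored in the START node and `"cat"` in the DET node, since DET is the class of the element preceding it.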