Hidden Markov model
A hidden Markov model (HMM) is a Markov model in which the observations are dependent on a latent (or hidden) Markov process, referred to as X. An HMM requires that there be an observable process Y whose outcomes depend on the outcomes of X in a known way. Since X cannot be observed directly, the goal is to learn about the state of X by observing Y. By definition of being a Markov model, an HMM has an additional requirement that the outcome of Y at time t = t0 must be "influenced" exclusively by the outcome of X at t = t0, and that the outcomes of X and Y at t < t0 must be conditionally independent of Y at t = t0 given X at time t = t0. Estimation of the parameters in an HMM can be performed using maximum likelihood estimation. For linear chain HMMs, the Baum–Welch algorithm can be used to estimate the parameters.
Hidden Markov models are known for their applications to thermodynamics, statistical mechanics, physics, chemistry, economics, finance, signal processing, information theory, pattern recognition (such as speech recognition, handwriting recognition and gesture recognition), part-of-speech tagging, musical score following, partial discharges and bioinformatics.
Definition
Let X = (X_n) and Y = (Y_n) be discrete-time stochastic processes with n ≥ 1. The pair (X, Y) is a hidden Markov model if
- X is a Markov process whose behavior is not directly observable ("hidden");
- P(Y_n ∈ A | X_1 = x_1, …, X_n = x_n) = P(Y_n ∈ A | X_n = x_n) for every n ≥ 1, every x_1, …, x_n, and every measurable set A.
Terminology
The states of the underlying Markov process are called hidden states, and the probability of observing a given output in a given hidden state is called the emission probability or output probability.
Examples
Drawing balls from hidden urns
In its discrete form, a hidden Markov process can be visualized as a generalization of the urn problem with replacement. Consider this example:
In a room that is not visible to an observer there is a genie. The room contains urns X1, X2, X3, ... each of which contains a known mix of balls, with each ball having a label y1, y2, y3, .... The genie chooses an urn in that room and randomly draws a ball from that urn. It then puts the ball onto a conveyor belt, where the observer can observe the sequence of the balls but not the sequence of urns from which they were drawn. The genie has some procedure to choose urns:
- The choice of the urn for the n-th ball depends only upon a random number and the choice of the urn for the (n − 1)-th ball.
- The choice of urn does not directly depend on the urns chosen before this single previous urn.
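This procedure can be sketched as a short simulation. The urn compositions and choice probabilities below are purely illustrative assumptions, not numbers from the article:

```python
import random

# Hypothetical transition probabilities between urns (hidden states).
transition = {
    "X1": {"X1": 0.5, "X2": 0.3, "X3": 0.2},
    "X2": {"X1": 0.2, "X2": 0.5, "X3": 0.3},
    "X3": {"X1": 0.3, "X2": 0.2, "X3": 0.5},
}
# Hypothetical known mix of labeled balls in each urn.
ball_mix = {
    "X1": {"y1": 0.6, "y2": 0.2, "y3": 0.1, "y4": 0.1},
    "X2": {"y1": 0.1, "y2": 0.6, "y3": 0.2, "y4": 0.1},
    "X3": {"y1": 0.1, "y2": 0.1, "y3": 0.2, "y4": 0.6},
}

def draw(dist):
    """Sample one key from a {outcome: probability} dict."""
    r = random.random()
    total = 0.0
    for outcome, p in dist.items():
        total += p
        if r < total:
            return outcome
    return outcome  # guard against floating-point rounding

def simulate(n, start="X1"):
    """Return the sequence of balls the observer sees on the belt."""
    urn = start
    balls = []
    for _ in range(n):
        balls.append(draw(ball_mix[urn]))  # observable: the ball's label
        urn = draw(transition[urn])        # hidden: the next urn choice
    return balls
```

The observer receives only the return value of `simulate`; the sequence of `urn` values stays hidden, which is exactly the situation the next paragraph describes.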
Markov process
The Markov process cannot be observed, only the sequence of labeled balls, thus this arrangement is called a hidden Markov process. This is illustrated by the lower part of the diagram shown in Figure 1, where one can see that balls y1, y2, y3, y4 can be drawn at each state. Even if the observer knows the composition of the urns and has just observed a sequence of three balls, e.g. y1, y2 and y3 on the conveyor belt, the observer still cannot be sure which urn the genie has drawn the third ball from. However, the observer can work out other information, such as the likelihood that the third ball came from each of the urns.
Weather guessing game
Consider two friends, Alice and Bob, who live far apart from each other and who talk together daily over the telephone about what they did that day. Bob is only interested in three activities: walking in the park, shopping, and cleaning his apartment. The choice of what to do is determined exclusively by the weather on a given day. Alice has no definite information about the weather, but she knows general trends. Based on what Bob tells her he did each day, Alice tries to guess what the weather must have been like.

Alice believes that the weather operates as a discrete Markov chain. There are two states, "Rainy" and "Sunny", but she cannot observe them directly, that is, they are hidden from her. On each day, there is a certain chance that Bob will perform one of the following activities, depending on the weather: "walk", "shop", or "clean". Since Bob tells Alice about his activities, those are the observations. The entire system is that of a hidden Markov model.
Alice knows the general weather trends in the area, and what Bob likes to do on average. In other words, the parameters of the HMM are known. They can be represented as follows in Python:
states = ("Rainy", "Sunny")

observations = ("walk", "shop", "clean")

start_probability = {"Rainy": 0.6, "Sunny": 0.4}

transition_probability = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}

emission_probability = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}
In this piece of code,
start_probability represents Alice's belief about which state the HMM is in when Bob first calls her. The particular probability distribution used here is not the equilibrium one of the transition matrix. The transition_probability represents the change of the weather in the underlying Markov chain. In this example, there is only a 30% chance that tomorrow will be sunny if today is rainy. The emission_probability represents how likely Bob is to perform a certain activity on each day. If it is rainy, there is a 50% chance that he is cleaning his apartment; if it is sunny, there is a 60% chance that he is outside for a walk. A similar example is further elaborated in the Viterbi algorithm page.
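For a two-state chain, the equilibrium (stationary) distribution can be computed directly from the two cross-transition probabilities. The 0.3 figure is stated in the text; the 0.4 reverse transition (Sunny to Rainy) is an assumption matching the example parameterization:

```python
# p_rs: P(Sunny tomorrow | Rainy today) -- 0.3, as stated in the text.
# p_sr: P(Rainy tomorrow | Sunny today) -- 0.4, assumed for illustration.
p_rs = 0.3
p_sr = 0.4

# At equilibrium the probability flow between the two states balances:
# pi_rainy * p_rs == pi_sunny * p_sr, with pi_rainy + pi_sunny == 1.
pi_rainy = p_sr / (p_rs + p_sr)
pi_sunny = p_rs / (p_rs + p_sr)
print(round(pi_rainy, 2), round(pi_sunny, 2))  # 0.57 0.43
```

Under these assumed numbers the chain spends roughly 57% of days rainy in the long run, which differs from the 60%/40% start_probability, illustrating the point above.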
Structural architecture
The diagram below shows the general architecture of an instantiated HMM. Each oval shape represents a random variable that can adopt any of a number of values. The random variable x(t) is the hidden state at time t; the random variable y(t) is the observation at time t. The arrows in the diagram denote conditional dependencies. From the diagram, it is clear that the conditional probability distribution of the hidden variable x(t) at time t, given the values of the hidden variable x at all times, depends only on the value of the hidden variable x(t − 1); the values at time t − 2 and before have no influence. This is called the Markov property. Similarly, the value of the observed variable y(t) depends only on the value of the hidden variable x(t), both at time t.
In the standard type of hidden Markov model considered here, the state space of the hidden variables is discrete, while the observations themselves can either be discrete or continuous. The parameters of a hidden Markov model are of two types, transition probabilities and emission probabilities. The transition probabilities control the way the hidden state at time t is chosen given the hidden state at time t − 1.
The hidden state space is assumed to consist of one of N possible values, modelled as a categorical distribution. This means that for each of the N possible states that a hidden variable at time t can be in, there is a transition probability from this state to each of the N possible states of the hidden variable at time t + 1, for a total of N² transition probabilities. The set of transition probabilities for transitions from any given state must sum to 1. Thus, the N × N matrix of transition probabilities is a Markov matrix. Because any one transition probability can be determined once the others are known, there are a total of N(N − 1) transition parameters.
In addition, for each of the N possible states, there is a set of emission probabilities governing the distribution of the observed variable at a particular time given the state of the hidden variable at that time. The size of this set depends on the nature of the observed variable. For example, if the observed variable is discrete with M possible values, governed by a categorical distribution, there will be M − 1 separate parameters, for a total of N(M − 1) emission parameters over all hidden states. On the other hand, if the observed variable is an M-dimensional vector distributed according to an arbitrary multivariate Gaussian distribution, there will be M parameters controlling the means and M(M + 1)/2 parameters controlling the covariance matrix, for a total of N(M + M(M + 1)/2) = NM(M + 3)/2 emission parameters.
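These parameter counts can be written out as a small sketch, with N hidden states and M observation symbols (or an M-dimensional Gaussian observation vector):

```python
def transition_params(n):
    # Each row of the N x N Markov matrix sums to 1, so one entry per
    # row is determined by the others: N * (N - 1) free parameters.
    return n * (n - 1)

def categorical_emission_params(n, m):
    # Each state's categorical distribution over M symbols has M - 1
    # free parameters, for N * (M - 1) in total.
    return n * (m - 1)

def gaussian_emission_params(n, m):
    # Per state: M means plus M * (M + 1) / 2 covariance entries
    # (the covariance matrix is symmetric), i.e. N * M * (M + 3) / 2.
    return n * (m + m * (m + 1) // 2)
```

For instance, with N = 2 weather states and M = 3 activities, a categorical-emission HMM has 2 transition parameters and 4 emission parameters.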
Inference
Several inference problems are associated with hidden Markov models, as outlined below.
Probability of an observed sequence
The task is to compute, given the parameters of the model, the probability of a particular output sequence. This requires summation over all possible state sequences: the probability of observing a sequence

Y = y(0), y(1), …, y(L − 1)

of length L is given by

P(Y) = Σ_X P(Y | X) P(X),

where the sum runs over all possible hidden-node sequences

X = x(0), x(1), …, x(L − 1).
Applying the principle of dynamic programming, this problem can be handled efficiently using the forward algorithm.
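As a sketch (not a reference implementation), the forward recursion can be written for dict-of-dicts parameter tables in the style of the weather example; the numeric values below are that example's assumed parameters:

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Return P(observation sequence), summing over all hidden-state paths."""
    # alpha[s] = P(y(1..k), x(k) = s) after processing k observations.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        # Propagate through the transition matrix, then weight by emission.
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][obs]
            for s in states
        }
    return sum(alpha.values())

# Assumed parameters, matching the weather illustration:
states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

prob = forward(["walk", "shop", "clean"], states, start_p, trans_p, emit_p)
```

Each step costs O(N²) work, so the whole sum over N^L paths is computed in O(N²L) time.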
Probability of the latent variables
A number of related tasks ask about the probability of one or more of the latent variables, given the model's parameters and a sequence of observations.
Filtering
The task is to compute, given the model's parameters and a sequence of observations, the distribution over hidden states of the last latent variable at the end of the sequence, i.e. to compute P(x(t) | y(1), …, y(t)). This task is used when the sequence of latent variables is thought of as the underlying states that a process moves through at a sequence of points in time, with corresponding observations at each point. Then, it is natural to ask about the state of the process at the end. This problem can be handled efficiently using the forward algorithm, whose intermediate quantities, once normalized, give exactly this distribution.
Smoothing
This is similar to filtering but asks about the distribution of a latent variable somewhere in the middle of a sequence, i.e. to compute P(x(k) | y(1), …, y(t)) for some k < t. From the perspective described above, this can be thought of as the probability distribution over hidden states for a point in time k in the past, relative to time t. The forward-backward algorithm is a good method for computing the smoothed values for all hidden state variables.
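The forward-backward recursions can be sketched under the same assumed dict-of-dicts parameterization as the weather example (the numeric tables are assumptions carried over from that illustration):

```python
def smooth(observations, states, start_p, trans_p, emit_p):
    """Return P(x(k) | y(1..t)) for every position k, as a list of dicts."""
    # Forward pass: alpha[k][s] = P(y(1..k), x(k) = s).
    alphas = []
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    alphas.append(alpha)
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][obs]
            for s in states
        }
        alphas.append(alpha)
    # Backward pass: beta[k][s] = P(y(k+1..t) | x(k) = s).
    betas = [{s: 1.0 for s in states}]
    for obs in reversed(observations[1:]):
        beta = betas[0]
        betas.insert(0, {
            s: sum(trans_p[s][r] * emit_p[r][obs] * beta[r] for r in states)
            for s in states
        })
    # Combine and normalize: P(x(k) | y) is proportional to alpha * beta.
    posteriors = []
    for alpha, beta in zip(alphas, betas):
        weights = {s: alpha[s] * beta[s] for s in states}
        z = sum(weights.values())
        posteriors.append({s: w / z for s, w in weights.items()})
    return posteriors

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

posteriors = smooth(["walk", "shop", "clean"], states,
                    start_p, trans_p, emit_p)
```

The posterior at the final position coincides with the filtering distribution, since the backward factor there is identically 1.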
Most likely explanation
The task, unlike the previous two, asks about the joint probability of the entire sequence of hidden states that generated a particular sequence of observations. This task is generally applicable when HMMs are applied to different sorts of problems from those for which the tasks of filtering and smoothing are applicable. An example is part-of-speech tagging, where the hidden states represent the underlying parts of speech corresponding to an observed sequence of words. In this case, what is of interest is the entire sequence of parts of speech, rather than simply the part of speech for a single word, as filtering or smoothing would compute. This task requires finding a maximum over all possible state sequences, and can be solved efficiently by the Viterbi algorithm.
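The Viterbi recursion can be sketched in the same style, replacing the forward algorithm's sum over predecessor states with a maximum; the parameter tables are again the weather example's assumed values:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence and its probability."""
    # best[s] = probability of the best path ending in state s;
    # paths[s] = that best path itself.
    best = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    paths = {s: [s] for s in states}
    for obs in observations[1:]:
        new_best, new_paths = {}, {}
        for s in states:
            # Pick the predecessor state that maximizes the path probability.
            prev = max(states, key=lambda r: best[r] * trans_p[r][s])
            new_best[s] = best[prev] * trans_p[prev][s] * emit_p[s][obs]
            new_paths[s] = paths[prev] + [s]
        best, paths = new_best, new_paths
    final = max(states, key=lambda s: best[s])
    return paths[final], best[final]

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

path, path_prob = viterbi(["walk", "shop", "clean"], states,
                          start_p, trans_p, emit_p)
```

Note that the most likely state sequence found here can differ, position by position, from the individually most likely states that smoothing would report, because it maximizes a joint rather than a marginal probability.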