Gated recurrent unit
In artificial neural networks, the gated recurrent unit (GRU) is a gating mechanism for recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. The GRU is similar to a long short-term memory (LSTM) unit, with gates that admit or forget certain features, but it lacks a context vector and an output gate, resulting in fewer parameters than an LSTM.
The GRU's performance on certain tasks of polyphonic music modeling, speech signal modeling, and natural language processing was found to be similar to that of the LSTM. GRUs showed that gating is indeed helpful in general, but Bengio's team came to no concrete conclusion on which of the two gating units was better.
Architecture
There are several variations on the full gated unit, with gating done using the previous hidden state and the bias in various combinations, and a simplified form called the minimal gated unit. In the following, the operator $\odot$ denotes the Hadamard (element-wise) product.
Fully gated unit
Initially, for $t = 0$, the output vector is $h_0 = 0$.

\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\hat{h}_t &= \phi(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t
\end{aligned}
Variables:
- $x_t$: input vector
- $h_t$: output vector
- $\hat{h}_t$: candidate activation vector
- $z_t$: update gate vector
- $r_t$: reset gate vector
- $W$, $U$ and $b$: parameter matrices and vector, which need to be learned during training
- $\sigma$: activation function; the original is a logistic function.
- $\phi$: activation function; the original is a hyperbolic tangent.
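As an illustration, here is a minimal NumPy sketch of one forward step of the fully gated unit, written directly from the equations above; the parameter names (`Wz`, `Uz`, `bz`, etc.) and the small usage example are chosen here for clarity and are not from the original.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One fully gated GRU step following the update equations above."""
    Wz, Uz, bz = params["Wz"], params["Uz"], params["bz"]
    Wr, Ur, br = params["Wr"], params["Ur"], params["br"]
    Wh, Uh, bh = params["Wh"], params["Uh"], params["bh"]

    z_t = sigmoid(Wz @ x_t + Uz @ h_prev + bz)            # update gate
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev + br)            # reset gate
    h_hat = np.tanh(Wh @ x_t + Uh @ (r_t * h_prev) + bh)  # candidate activation
    h_t = (1.0 - z_t) * h_prev + z_t * h_hat              # Hadamard-product blend
    return h_t

# Example: run a short random sequence starting from h_0 = 0.
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
params = {name: rng.standard_normal((d_h, d_in if name.startswith("W") else d_h)) * 0.1
          for name in ("Wz", "Wr", "Wh", "Uz", "Ur", "Uh")}
params.update({b: np.zeros(d_h) for b in ("bz", "br", "bh")})

h = np.zeros(d_h)
for x in rng.standard_normal((5, d_in)):
    h = gru_step(x, h, params)
print(h.shape)  # (3,)
```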
Alternate forms can be created by changing $z_t$ and $r_t$:
- Type 1: each gate depends only on the previous hidden state and the bias.
  - $z_t = \sigma(U_z h_{t-1} + b_z)$, $r_t = \sigma(U_r h_{t-1} + b_r)$
- Type 2: each gate depends only on the previous hidden state.
  - $z_t = \sigma(U_z h_{t-1})$, $r_t = \sigma(U_r h_{t-1})$
- Type 3: each gate is computed using only the bias.
  - $z_t = \sigma(b_z)$, $r_t = \sigma(b_r)$
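For concreteness, the sketch below spells out the update-gate computation for each of the three reduced forms (the reset gate is analogous); the helper and parameter names are illustrative, continuing the conventions of the previous sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Type 1: gate uses the previous hidden state and the bias (no input term).
def update_gate_type1(h_prev, Uz, bz):
    return sigmoid(Uz @ h_prev + bz)

# Type 2: gate uses the previous hidden state only.
def update_gate_type2(h_prev, Uz):
    return sigmoid(Uz @ h_prev)

# Type 3: gate is a learned constant computed from the bias only.
def update_gate_type3(bz):
    return sigmoid(bz)
```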
Minimal gated unit
The minimal gated unit (MGU) is similar to the fully gated unit, except that the update and reset gate vectors are merged into a single forget gate. This also implies that the equation for the output vector must be changed:

\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
\hat{h}_t &= \phi(W_h x_t + U_h (f_t \odot h_{t-1}) + b_h) \\
h_t &= (1 - f_t) \odot h_{t-1} + f_t \odot \hat{h}_t
\end{aligned}

Variables:
- $x_t$: input vector
- $h_t$: output vector
- $\hat{h}_t$: candidate activation vector
- $f_t$: forget vector
- $W$, $U$ and $b$: parameter matrices and vector
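The corresponding single-step update, again as a minimal NumPy sketch following the equations above (the function and parameter names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mgu_step(x_t, h_prev, Wf, Uf, bf, Wh, Uh, bh):
    """One minimal gated unit step: the update and reset gates are merged
    into a single forget gate f_t."""
    f_t = sigmoid(Wf @ x_t + Uf @ h_prev + bf)            # forget gate
    h_hat = np.tanh(Wh @ x_t + Uh @ (f_t * h_prev) + bh)  # candidate activation
    return (1.0 - f_t) * h_prev + f_t * h_hat             # output vector h_t
```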
Light gated recurrent unit
The light gated recurrent unit (LiGRU) removes the reset gate altogether, replaces the hyperbolic tangent with the ReLU activation, and applies batch normalization to the input projections. The LiGRU has been studied from a Bayesian perspective; this analysis yielded a variant called the light Bayesian recurrent unit (LiBRU), which showed slight improvements over the LiGRU on speech recognition tasks.
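A rough single-step sketch of a LiGRU-style update, under the assumptions above (no reset gate, ReLU candidate), might look as follows; batch normalization is replaced here by a simple per-feature standardization purely to keep the example self-contained, so this is an approximation rather than the published recipe.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ligru_step(x_t, h_prev, Wz, Uz, Wh, Uh, eps=1e-5):
    """One LiGRU-style step: no reset gate, ReLU candidate activation."""
    def norm(v):  # crude stand-in for batch normalization of the input projection
        return (v - v.mean()) / (v.std() + eps)

    z_t = sigmoid(norm(Wz @ x_t) + Uz @ h_prev)               # update gate
    h_tilde = np.maximum(0.0, norm(Wh @ x_t) + Uh @ h_prev)   # ReLU candidate
    return z_t * h_prev + (1.0 - z_t) * h_tilde
```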