There are several possible architectures for switching SSMs.
Figure 4.3 shows some of the most basic ones [43]. The first
subfigure corresponds to the case where the dynamical mapping, and
possibly the model for the process noise, in Equation (4.13) is
different for different states. In the second subfigure, the
observation mapping and the observation noise depend on the switching
variable. Some combination of these two approaches is of course also
possible. The third subfigure shows an interesting architecture
proposed by Ghahramani and Hinton [18] in which there are several
completely separate SSMs and the switching variable chooses between
them. Their model is especially interesting as it uses ensemble
learning to infer the model parameters.
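
To make the first of these architectures concrete, the following sketch samples from a switching SSM in which the dynamics and the process noise depend on the discrete HMM state while the observation mapping is shared. This is a minimal linear-Gaussian toy model, not taken from the source; all parameter values (trans, A, Q, C, R) are hypothetical choices for illustration.

```python
# Minimal sketch (assumed linear-Gaussian toy model): sampling from a
# switching SSM whose dynamics matrix and process noise depend on the
# discrete HMM state s(t), i.e. the first architecture in Figure 4.3.
import numpy as np

rng = np.random.default_rng(0)
M, T, dx, dy = 2, 50, 2, 2            # HMM states, length, state/obs dims

trans = np.array([[0.95, 0.05],       # hypothetical HMM transition matrix
                  [0.10, 0.90]])
A = [0.99 * np.eye(dx),               # per-state dynamics matrices
     np.array([[0.9, -0.3], [0.3, 0.9]])]
Q = [0.01 * np.eye(dx), 0.10 * np.eye(dx)]   # per-state process noise
C = np.eye(dy, dx)                    # shared observation mapping
R = 0.05 * np.eye(dy)                 # shared observation noise

s = np.zeros(T, dtype=int)            # discrete switching sequence
x = np.zeros((T, dx))                 # continuous hidden states
y = np.zeros((T, dy))                 # observations
for t in range(1, T):
    s[t] = rng.choice(M, p=trans[s[t - 1]])
    x[t] = A[s[t]] @ x[t - 1] + rng.multivariate_normal(np.zeros(dx), Q[s[t]])
    y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(dy), R)
```

The second architecture would instead index C and R by s(t), and the model of Ghahramani and Hinton would run M independent state chains and let s(t) select which one generates the observation.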
One of the problems with switching SSMs is that the exact E-step of
the EM algorithm is intractable, even if the individual continuous
hidden states are Gaussian. Assuming the HMM has $M$ states, the
posterior of a single state variable $\mathbf{x}(t)$ will be a mixture
of $M$ Gaussians, one for each HMM state. When this is propagated
forward according to the dynamical model, the mixture grows
exponentially as the number of possible HMM state sequences increases.
Finally, when the full observation sequence of length $T$ is taken
into account, the posterior of each $\mathbf{x}(t)$ will be a mixture
of $M^T$ Gaussians.
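
The exponential growth is easy to see by propagating the exact filtered posterior as an explicit Gaussian mixture. The sketch below is illustrative only: the branch weights are simplified to uniform, whereas in exact filtering they would come from the HMM transition probabilities and the data likelihoods, but the component count is the same either way.

```python
# Illustrative sketch: exact forward propagation in a switching
# linear-Gaussian model, keeping the full mixture. Every prediction
# step branches each component over the M discrete states, so after
# t steps there are M**t components.
import numpy as np

M = 2
A = [0.99 * np.eye(1), 0.50 * np.eye(1)]   # hypothetical per-state dynamics
Q = [0.01 * np.eye(1), 0.10 * np.eye(1)]   # hypothetical process noise

# Each component: (weight, mean, covariance) of one HMM-state branch.
components = [(1.0, np.zeros(1), np.eye(1))]
for t in range(1, 6):
    new = []
    for w, m, P in components:
        for j in range(M):                  # branch over the M HMM states
            new.append((w / M, A[j] @ m, A[j] @ P @ A[j].T + Q[j]))
    components = new
    print(f"t={t}: {len(components)} components (= {M}**{t})")
```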
Ensemble learning is a very useful method for developing a tractable algorithm for this problem, although there are other, heuristic methods for the same purpose. The other methods typically collapse the growing mixture with some greedy procedure, and this may cause inaccuracies. This is not a problem with ensemble learning: it considers the whole sequence at once and minimises the Kullback-Leibler divergence between the approximation and the true posterior, a cost function which in this case has no local minima.
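
As a contrast, the greedy procedure mentioned above typically collapses the growing mixture back to a single Gaussian by moment matching at every time step, in the style of GPB approximations. The helper below is a hypothetical illustration of that collapsing step, not a method from the source; the local, per-step collapse is precisely where the inaccuracies can enter.

```python
# Hypothetical sketch of the greedy alternative: moment-match a Gaussian
# mixture to a single Gaussian (same mean and covariance). Heuristic
# methods apply this locally at each step; ensemble learning instead
# fits one approximation to the whole sequence at once.
import numpy as np

def collapse(weights, means, covs):
    """Collapse a Gaussian mixture to one Gaussian by moment matching."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    mean = sum(wi * mi for wi, mi in zip(w, means))
    cov = sum(wi * (Pi + np.outer(mi - mean, mi - mean))
              for wi, mi, Pi in zip(w, means, covs))
    return mean, cov

m, P = collapse([0.7, 0.3],
                [np.array([0.0]), np.array([2.0])],
                [0.5 * np.eye(1), 0.5 * np.eye(1)])
print(m, P)   # mean 0.6; variance inflated by the between-component spread
```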