MacKay showed in [39] how ensemble learning can be applied to learning of HMMs with discrete observations. With suitable priors for all the variables, the problem can be solved analytically and the resulting algorithm turns out to be a rather simple modification of the Baum-Welch algorithm.
MacKay uses Dirichlet distributions as priors for the model parameters. (See Appendix A for the definition of the Dirichlet distribution and some of its properties.) The likelihood of the model is multinomial with respect to all the parameters, and the Dirichlet distribution is the conjugate prior of the multinomial distribution. Because of this, the posterior distributions of all the parameters are also Dirichlet. The update rule for the parameters of the Dirichlet distribution of the transition probabilities amounts to adding the expected transition counts, evaluated under the posterior distribution of the hidden states, to the parameters of the prior.
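In code, the conjugate update reduces to simple pseudo-count addition. The sketch below is a minimal illustration with hypothetical array names; the expected transition counts would come from the forward-backward pass of the Baum-Welch algorithm.

```python
import numpy as np

def update_transition_dirichlet(prior_counts, expected_transitions):
    """Posterior Dirichlet parameters for the transition probabilities:
    the prior pseudo-counts plus the expected number of i -> j
    transitions under the hidden-state posterior."""
    return prior_counts + expected_transitions

# Hypothetical values for a 3-state HMM:
prior = np.ones((3, 3))                     # uniform Dirichlet prior
expected = np.array([[2.0, 1.0, 0.5],       # expected transition counts
                     [0.3, 4.0, 1.2],       # from the forward-backward pass
                     [0.7, 0.1, 3.0]])
posterior = update_transition_dirichlet(prior, expected)
# Row i of `posterior` parameterizes the Dirichlet posterior over the
# next-state distribution from state i.
```

The emission and initial-state parameters are updated in the same way, with the corresponding expected counts.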
The posterior distribution of the hidden state probabilities turns out to be of exactly the same form as the likelihood in Equation (4.6), but with the transition probabilities replaced by the exponentials of the expectations of their logarithms, and similarly for the emission and initial state probabilities. The required expectations over the Dirichlet distribution can be evaluated as in Equation (A.13) in Appendix A.
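The expectation of the logarithm of a Dirichlet-distributed probability involves the digamma function psi: for theta ~ Dirichlet(u), the expectation of ln(theta_i) is psi(u_i) - psi(sum(u)). A minimal sketch of the resulting sub-normalized parameters used in the modified forward-backward recursions, with a hand-rolled digamma (function and variable names are illustrative, not from the source):

```python
import math

def digamma(x):
    """psi(x) for x > 0, via the recurrence psi(x) = psi(x+1) - 1/x
    followed by the asymptotic series at x >= 6."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (
        1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))

def subnormalized_probs(u):
    """exp(E[ln theta_i]) = exp(psi(u_i) - psi(sum(u))) for
    theta ~ Dirichlet(u). These replace the point estimates of the
    probabilities in the forward-backward pass."""
    total = digamma(sum(u))
    return [math.exp(digamma(ui) - total) for ui in u]

probs = subnormalized_probs([1.0, 1.0, 1.0])
```

Note that the resulting values sum to less than one (for Dirichlet(1, 1, 1), each equals exp(-1.5), about 0.223); this sub-normalization is a characteristic feature of the ensemble-learning variant of Baum-Welch.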