The basic model is the same as the one presented in
Section 4.1. The hidden state sequence is
denoted by $\mathbf{M} = (M_1, \ldots, M_T)$ and the other parameters by $\boldsymbol{\theta}$.
The exact form of $\boldsymbol{\theta}$ will be specified later. The observations
$\mathbf{X} = (\mathbf{x}(1), \ldots, \mathbf{x}(T))$, given the corresponding hidden states,
are assumed to be Gaussian with a diagonal covariance matrix.
Given the HMM state sequence
$\mathbf{M}$, the individual observations are
assumed to be independent. Therefore the likelihood of the data can
be written as

$$
p(\mathbf{X} \mid \mathbf{M}, \boldsymbol{\theta}) = \prod_{t=1}^{T} p(\mathbf{x}(t) \mid M_t, \boldsymbol{\theta}). \tag{5.1}
$$

Because of the Markov property, the prior distribution of the hidden state sequence can also be written in factorial form:

$$
p(\mathbf{M} \mid \boldsymbol{\theta}) = p(M_1 \mid \boldsymbol{\theta}) \prod_{t=2}^{T} p(M_t \mid M_{t-1}, \boldsymbol{\theta}). \tag{5.2}
$$

The factors of Equations (5.1) and (5.2) are defined to be
$$
p(M_1 = i \mid \boldsymbol{\theta}) = \pi_i \tag{5.3}
$$

$$
p(M_t = j \mid M_{t-1} = i, \boldsymbol{\theta}) = a_{ij} \tag{5.4}
$$

$$
p(\mathbf{x}(t) \mid M_t = i, \boldsymbol{\theta}) = \prod_{k} N\!\left( x_k(t);\ \mu_{ik},\ \exp(2 v_{ik}) \right) \tag{5.5}
$$

where $\pi_i$ are the initial state probabilities, $a_{ij}$ the transition probabilities, and $N(x;\, \mu,\, \sigma^2)$ denotes a Gaussian density with mean $\mu$ and variance $\sigma^2$. The means $\mu_{ik}$ and log-standard-deviations $v_{ik}$ parameterise the diagonal Gaussian of state $i$, component $k$.
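To make the factorisation of Equations (5.1)-(5.5) concrete, the following sketch evaluates the complete-data log-likelihood $\log p(\mathbf{X}, \mathbf{M} \mid \boldsymbol{\theta})$ for a known state sequence. It is only an illustration in NumPy; the function name and array shapes are assumptions, not part of the original model description.

```python
import numpy as np

def complete_data_log_likelihood(X, M, pi, A, mu, v):
    """Evaluate log p(X, M | theta) for the diagonal-Gaussian HMM of Eqs. (5.1)-(5.5).

    X  : (T, K) observations, one K-dimensional vector per time step
    M  : (T,)   hidden state indices
    pi : (S,)   initial state probabilities pi_i              (Eq. 5.3)
    A  : (S, S) transition probabilities a_ij                 (Eq. 5.4)
    mu : (S, K) means mu_ik of the state-conditional Gaussians
    v  : (S, K) log-standard-deviations, variance = exp(2 v)  (Eq. 5.5)
    """
    # Prior of the hidden state sequence, Eqs. (5.2)-(5.4)
    ll = np.log(pi[M[0]]) + np.sum(np.log(A[M[:-1], M[1:]]))
    # Diagonal-Gaussian emission terms, Eqs. (5.1) and (5.5)
    var = np.exp(2.0 * v[M])                       # (T, K) per-step variances
    ll += -0.5 * np.sum(np.log(2.0 * np.pi * var) + (X - mu[M]) ** 2 / var)
    return ll
```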
The priors of all the parameters defined above are
$$
p(\boldsymbol{\pi}) = \operatorname{Dirichlet}\!\left(\boldsymbol{\pi};\ \mathbf{u}^{(\pi)}\right) \tag{5.6}
$$

$$
p(\mathbf{a}_i) = \operatorname{Dirichlet}\!\left(\mathbf{a}_i;\ \mathbf{u}^{(A)}\right) \tag{5.7}
$$

$$
p(\mu_{ik}) = N\!\left(\mu_{ik};\ m_{\mu k},\ \exp(2 v_{\mu k})\right) \tag{5.8}
$$

$$
p(v_{ik}) = N\!\left(v_{ik};\ m_{v k},\ \exp(2 v_{v k})\right) \tag{5.9}
$$

where $\mathbf{a}_i$ denotes the $i$th row of the transition probability matrix.
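As an illustration of the prior structure in Equations (5.6)-(5.9), the sketch below draws one set of HMM parameters by ancestral sampling. The state count, dimensionality and hyperparameter values are arbitrary assumptions chosen for the example, not values used here.

```python
import numpy as np

rng = np.random.default_rng(0)
S, K = 5, 12                         # number of states and observation dimensions (assumed)

u_pi = np.ones(S)                    # fixed Dirichlet parameters u^(pi) (assumed values)
u_A  = np.ones(S)                    # fixed Dirichlet parameters u^(A)  (assumed values)

pi = rng.dirichlet(u_pi)                               # Eq. (5.6)
A  = np.stack([rng.dirichlet(u_A) for _ in range(S)])  # Eq. (5.7), one row per state

# Hyperparameters of the Gaussian priors, one per observation dimension (assumed values)
m_mu, v_mu = np.zeros(K), np.zeros(K)
m_v,  v_v  = np.zeros(K), np.zeros(K)

mu = m_mu + np.exp(v_mu) * rng.standard_normal((S, K))   # Eq. (5.8): mean m_mu_k, std exp(v_mu_k)
v  = m_v  + np.exp(v_v)  * rng.standard_normal((S, K))   # Eq. (5.9)
```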
The parameters $\mathbf{u}^{(\pi)}$ and $\mathbf{u}^{(A)}$ of the
Dirichlet priors are fixed. Their values should be chosen to reflect
true prior knowledge of the possible initial states and transition
probabilities of the chain. In our speech recognition example,
where the states of the HMM represent different phonemes, these values
could, for instance, be estimated from textual data.
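One simple way to realise this idea is to count phoneme unigrams and bigrams in a corpus and use scaled counts as Dirichlet pseudo-counts. The sketch below is only an illustration of that suggestion, assuming the text has already been converted to phoneme sequences; the helper name, inventory and scaling are hypothetical.

```python
import numpy as np

def dirichlet_params_from_phonemes(phoneme_seqs, phonemes, strength=1.0):
    """Turn phoneme unigram/bigram counts into Dirichlet parameters u^(pi), u^(A)."""
    index = {p: i for i, p in enumerate(phonemes)}
    S = len(phonemes)
    u_pi = np.ones(S)            # start from a flat prior of one pseudo-count per state
    u_A  = np.ones((S, S))
    for seq in phoneme_seqs:
        u_pi[index[seq[0]]] += strength
        for a, b in zip(seq[:-1], seq[1:]):
            u_A[index[a], index[b]] += strength
    return u_pi, u_A

# Toy usage with a made-up three-phoneme inventory
u_pi, u_A = dirichlet_params_from_phonemes([["a", "t", "a"], ["t", "a"]], ["a", "t", "s"])
```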
All the other parameters $\mu_{ik}$ and $v_{ik}$
have higher hierarchical priors. As the number of parameters in such
priors grows with every added level, only the full structure of the hierarchical prior of
$\mu_{ik}$ is given. It is:
$$
p(\mu_{ik}) = N\!\left(\mu_{ik};\ m_{\mu k},\ \exp(2 v_{\mu k})\right) \tag{5.10}
$$

$$
p(m_{\mu k}) = N\!\left(m_{\mu k};\ m_{m_\mu},\ \exp(2 v_{m_\mu})\right) \tag{5.11}
$$

$$
p(v_{\mu k}) = N\!\left(v_{\mu k};\ m_{v_\mu},\ \exp(2 v_{v_\mu})\right) \tag{5.12}
$$
The hierarchical prior of, for example, $\mu_{ik}$ can thus be
summarised as the chain

$$
\mu_{ik} \;\longleftarrow\; m_{\mu k},\ v_{\mu k} \;\longleftarrow\; m_{m_\mu},\ v_{m_\mu},\ m_{v_\mu},\ v_{v_\mu},
$$

where each parameter is governed by the parameters to its right.
The set of model parameters $\boldsymbol{\theta}$
consists of all these parameters
and all the parameters of the hierarchical priors.
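To illustrate how the hierarchy is traversed, the sketch below draws the means $\mu_{ik}$ by sampling down the chain of Equations (5.10)-(5.12). The top-level values are arbitrary assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
S, K = 5, 12                                   # states and observation dimensions (assumed)

# Top-level parameters of the hierarchy (assumed fixed values for this illustration)
m_m_mu, v_m_mu = 0.0, 0.0
m_v_mu, v_v_mu = 0.0, 0.0

# Eqs. (5.11)-(5.12): one pair of hyperparameters per observation dimension k
m_mu_k = m_m_mu + np.exp(v_m_mu) * rng.standard_normal(K)
v_mu_k = m_v_mu + np.exp(v_v_mu) * rng.standard_normal(K)

# Eq. (5.10): the means of all S states share the per-dimension hyperparameters
mu = m_mu_k + np.exp(v_mu_k) * rng.standard_normal((S, K))
```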
In the hierarchical structure formulated above, the Gaussian prior for
the mean of a Gaussian is a conjugate prior. Thus the posterior
will also be Gaussian.
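As a reminder of why the conjugacy holds (a standard result stated here in generic notation, not a derivation specific to this model): for $n$ observations $x_1, \ldots, x_n$ with known variance $\sigma^2$ and prior $\mu \sim N(m_0, s_0^2)$, completing the square gives the Gaussian posterior $\mu \mid x_{1:n} \sim N(m_n, s_n^2)$ with

$$
\frac{1}{s_n^2} = \frac{1}{s_0^2} + \frac{n}{\sigma^2}, \qquad
m_n = s_n^2 \left( \frac{m_0}{s_0^2} + \frac{1}{\sigma^2} \sum_{i=1}^{n} x_i \right).
$$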
The parameterisation of the variance with $\sigma^2 = \exp(2 v)$
is somewhat less conventional. The conjugate
prior for the variance of a Gaussian is the inverse gamma
distribution. Adding a new level of hierarchy for the parameters of
such a distribution would, however, be significantly more difficult.
The present parameterisation allows adding similar layers of hierarchy
for the parameters of the priors of $\mu_{ik}$ and $v_{ik}$. In this
parameterisation the posterior of $v_{ik}$ is not exactly Gaussian, but it
may be approximated with one. The exponential function ensures
that the variance is always positive while leaving $v_{ik}$ itself
unconstrained, so its posterior is closer to a Gaussian than that of
the variance would be.
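To see why the posterior of $v_{ik}$ is only approximately Gaussian, consider a generic illustration with simplified notation: for $n$ observations with known mean $\mu$ and variance $\exp(2v)$, together with a Gaussian prior $v \sim N(m_v, s_v^2)$, the log-posterior of $v$ is

$$
\log p(v \mid x_{1:n}) = -\, n v \;-\; \frac{e^{-2v}}{2} \sum_{i=1}^{n} (x_i - \mu)^2 \;-\; \frac{(v - m_v)^2}{2 s_v^2} \;+\; \text{const},
$$

which is a smooth, concave function of the unconstrained variable $v$ and can therefore be approximated reasonably well by a Gaussian, even though it is not exactly one.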