Let us denote the elements of the weight matrices of the MLP networks by $A_{ij}$, $B_{ij}$, $C_{ij}$ and $D_{ij}$. The bias vectors consist similarly of elements $a_i$, $b_i$, $c_i$ and $d_i$.
All the elements of the weight matrices and the bias vectors are assumed to be independent and Gaussian. Their priors are as follows:
\begin{align}
A_{ij} &\sim N(0,\, 1) \tag{5.27} \\
B_{ij} &\sim N\!\left(0,\, \exp(2 v_{B,j})\right) \tag{5.28} \\
C_{ij} &\sim N\!\left(0,\, \exp(2 v_{C,i})\right) \tag{5.29} \\
D_{ij} &\sim N\!\left(0,\, \exp(2 v_{D,j})\right) \tag{5.30} \\
a_i &\sim N\!\left(m_a,\, \exp(2 v_a)\right) \tag{5.31} \\
b_i &\sim N\!\left(m_b,\, \exp(2 v_b)\right) \tag{5.32} \\
c_i &\sim N\!\left(m_c,\, \exp(2 v_c)\right) \tag{5.33} \\
d_i &\sim N\!\left(m_d,\, \exp(2 v_d)\right) \tag{5.34}
\end{align}
Each of the bias vectors has a hierarchical prior that is shared among the different elements of that particular vector. The hyperparameters $m_a$, $v_a$, $m_b$, $v_b$, $m_c$, $v_c$, $m_d$ and $v_d$ all have zero mean Gaussian priors with standard deviation 100, which is a flat, essentially noninformative prior.
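To make the role of the hierarchy concrete, the following sketch evaluates the log prior density of a single bias vector together with its two hyperparameters. It assumes the log-standard-deviation parameterisation of Equations (5.31)-(5.34), in which the hyperprior standard deviation of 100 enters as a log-standard deviation of $\log 100$; the vector length and the values used are invented purely for illustration.

```python
import numpy as np

def gaussian_logpdf(x, mean, log_std):
    """Log-density of N(mean, exp(2*log_std)) evaluated elementwise at x."""
    return -0.5 * np.log(2.0 * np.pi) - log_std - 0.5 * (x - mean) ** 2 * np.exp(-2.0 * log_std)

def bias_log_prior(bias, m, v):
    """Log prior of one bias vector and its two hyperparameters m and v.

    m is the shared mean and v the shared log-standard deviation of the
    vector's elements; both carry the flat N(0, 100^2) hyperprior.
    """
    hyper = gaussian_logpdf(m, 0.0, np.log(100.0)) + gaussian_logpdf(v, 0.0, np.log(100.0))
    elements = gaussian_logpdf(bias, m, v).sum()  # elements are i.i.d. given (m, v)
    return hyper + elements

# Example: a hypothetical 30-element hidden-layer bias vector.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=30)
print(bias_log_prior(a, m=0.0, v=0.0))
```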
The structure of the priors of the weight matrices is much more interesting. The prior of $\mathbf{A}$ is chosen to be fixed in order to resolve a scaling indeterminacy between the hidden states and the weights of the MLP networks. This is evident from Equation (5.19), where any scaling of one of these quantities could be compensated by an inverse scaling of the other without affecting the results in any way. The other weight matrices $\mathbf{B}$, $\mathbf{C}$ and $\mathbf{D}$ have zero mean priors with a common variance for all the weights related to a single hidden neuron.
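The indeterminacy can be checked numerically. The sketch below assumes that the mapping in Equation (5.19) has the usual one-hidden-layer form $\mathbf{f}(\mathbf{x}) = \mathbf{B}\tanh(\mathbf{A}\mathbf{x} + \mathbf{a}) + \mathbf{b}$; the dimensions and parameter values are arbitrary. Rescaling the hidden states componentwise and applying the inverse scaling to $\mathbf{A}$ leaves the outputs unchanged, which is why the scale of $\mathbf{A}$ has to be pinned down by its prior.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x, A, B, a, b):
    """One-hidden-layer MLP mapping, f(x) = B tanh(A x + a) + b."""
    return B @ np.tanh(A @ x + a) + b

state_dim, hidden_dim, obs_dim = 4, 6, 3
x = rng.normal(size=state_dim)
A = rng.normal(size=(hidden_dim, state_dim))
B = rng.normal(size=(obs_dim, hidden_dim))
a = rng.normal(size=hidden_dim)
b = rng.normal(size=obs_dim)

# Rescale the hidden states by an arbitrary diagonal matrix S and
# compensate by the inverse scaling in A: the outputs are identical,
# so the data cannot distinguish the two parameterisations.
s = np.array([0.5, 2.0, 10.0, 0.1])   # arbitrary per-component scales
x_scaled = s * x
A_compensated = A / s                  # A S^{-1}, applied column by column

print(np.allclose(f(x, A, B, a, b), f(x_scaled, A_compensated, B, a, b)))  # True
```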
The remaining variance parameters, those of the weight matrices $\mathbf{B}$, $\mathbf{C}$ and $\mathbf{D}$ and those appearing in Equations (5.23), (5.25) and (5.26), again have hierarchical priors. For the weight matrices they are defined as
\begin{align}
v_{B,j} &\sim N\!\left(m_{v_B},\, \exp(2 v_{v_B})\right) \tag{5.35} \\
v_{C,i} &\sim N\!\left(m_{v_C},\, \exp(2 v_{v_C})\right) \tag{5.36} \\
v_{D,j} &\sim N\!\left(m_{v_D},\, \exp(2 v_{v_D})\right) \tag{5.37}
\end{align}
and the variance parameters of Equations (5.23), (5.25) and (5.26) have hierarchical priors of exactly the same form (Equations (5.38)-(5.40)).
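Written generatively, the hierarchy above first draws one log-standard deviation per hidden neuron and then all the weights attached to that neuron. The sketch below does this for one second-layer matrix, say $\mathbf{B}$, again assuming the log-standard-deviation parameterisation; the dimensions and hyperparameter values are invented for the example, whereas in the model they are estimated along with the rest of the parameters.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_weight_matrix(out_dim, hidden_dim, m_v, v_v, rng):
    """Sample a second-layer weight matrix column by column.

    Each hidden neuron j first receives a log-standard deviation
    v_j ~ N(m_v, exp(2 v_v)); all weights leaving that neuron are then
    drawn i.i.d. from N(0, exp(2 v_j)), so they share a common variance.
    """
    v = rng.normal(m_v, np.exp(v_v), size=hidden_dim)       # one log-std per hidden neuron
    return rng.normal(0.0, np.exp(v), size=(out_dim, hidden_dim))

# Example hyperparameter values, chosen only for illustration.
B = sample_weight_matrix(out_dim=10, hidden_dim=30, m_v=-1.0, v_v=-1.0, rng=rng)
print(B.std(axis=0))   # the column-wise spread varies from neuron to neuron
```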