At the start of learning for a new data set, the posterior means of the network weights are initialised to random values and their variances to small constant values. The original data is augmented using delay coordinate embedding, presented in Section 2.1.4, so that it consists of multiple time-shifted copies. The hidden states are initialised with a principal component analysis (PCA) [27] projection of the augmented data. The augmented data is also used for training during the early part of learning.
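The initialisation described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the flat weight vector, the standard normal draw for the weight means, the default variance `init_var`, and the use of an SVD to compute the PCA projection are all assumptions made for illustration.

```python
import numpy as np

def delay_embed(x, n_delays):
    """Stack n_delays time-shifted copies of x (shape T x D) side by side."""
    T, D = x.shape
    rows = T - n_delays + 1
    # Column block k holds the data shifted forward by k time steps.
    return np.hstack([x[k:k + rows] for k in range(n_delays)])

def pca_project(data, n_components):
    """Project centred data onto its leading principal directions."""
    centred = data - data.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

def init_nssm(x, n_delays, n_states, n_weights, init_var=1e-4, seed=0):
    """Hypothetical initialiser; init_var and the normal draw are assumed values."""
    rng = np.random.default_rng(seed)
    augmented = delay_embed(x, n_delays)
    return {
        "weight_mean": rng.standard_normal(n_weights),   # random posterior means
        "weight_var": np.full(n_weights, init_var),      # small constant variances
        "augmented": augmented,                          # also used in early training
        "state_mean": pca_project(augmented, n_states),  # PCA-initialised hidden states
    }
```

In this sketch the embedded sequence is shorter than the original by `n_delays - 1` samples, because only time steps with a full set of shifted copies are kept. How the boundary samples are handled in practice is not specified here.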
The learning procedure of the NSSM consists of sweeps. During one sweep, all the parameters of the model are updated as outlined above. There are, however, different phases in learning so that not all the parameters are updated at the very beginning. These phases are summarised in Table 6.1.
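The phased structure of the sweeps can be sketched as follows. This is only an illustration of the idea that parameter groups are switched on at different sweeps. The group names, their ordering, and the sweep numbers in `SCHEDULE` are hypothetical placeholders and do not reproduce the values of Table 6.1. The `update` callback stands in for the actual update rules.

```python
# Hypothetical schedule: (first sweep at which the groups become active, groups).
# The numbers and names are placeholders, not the phases of Table 6.1.
SCHEDULE = [
    (0, ["network_weights"]),
    (50, ["hidden_states"]),
    (100, ["remaining_parameters"]),
]

def active_groups(sweep, schedule=SCHEDULE):
    """Return the parameter groups that are updated during the given sweep."""
    active = []
    for start, groups in schedule:
        if sweep >= start:
            active.extend(groups)
    return active

def learn(n_sweeps, update, schedule=SCHEDULE):
    """Run n_sweeps sweeps, updating only the groups active in each one."""
    for sweep in range(n_sweeps):
        for group in active_groups(sweep, schedule):
            update(group, sweep)
```

Calling `learn(120, update)` with this placeholder schedule would update only the first group for sweeps 0 to 49, two groups for sweeps 50 to 99, and all three from sweep 100 onwards.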