The results

After the first phase of training with the small data set, the segmentations produced by the algorithm appear essentially random, as Figure 7.5 shows. The model has not even learnt to separate the speech signal from the silence at the beginning and end of the data segments.

**Figure 7.5:** An example of the segmentation given by the algorithm after the first phase of learning. The states `' and `' correspond to silence at the beginning and the end of the utterance. The first subfigure shows the marginal probabilities of the HMM states for each sample. The second subfigure shows the data, the third shows the continuous hidden states $\mathbf{s}(t)$ and the last shows the innovation processes $\mathbf{s}(t) - \mathbf{g}(\mathbf{s}(t-1))$. The HMM does its segmentation solely based on the values of the innovation process, i.e. the last subfigure. The word in the figure is "VASEN", meaning "left".
$\includegraphics[width=\textwidth]{pics/segment_vasen_ver1}$

After the full training, the segmentations seem rather good, as Figures 7.6 and 7.7 show. This is a very encouraging result, considering that the segmentations are performed using only the innovation process of the NSSM (the last subfigure), which consists mostly of the leftovers of the other parts of the model. The results should be significantly better with a model that gives the HMM a larger part in predicting the data.
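To make concrete what "segmenting from the innovation process" involves, the following is a minimal sketch of how per-sample marginal state probabilities (the quantity plotted in the first subfigure) could be computed from innovation values with the standard forward-backward recursion. It is an illustration only, not the ensemble learning procedure used in this work: the HMM parameters are fixed point values rather than learnt posterior distributions, the emissions are assumed to be diagonal Gaussians, and the function names `innovations` and `state_marginals`, the three-state left-to-right toy model and all numerical values are hypothetical.

```python
import numpy as np


def innovations(s, g):
    """Innovation process s(t) - g(s(t-1)) for t = 1, ..., T-1.

    s is a (T, D) array of continuous hidden states and g is a
    (hypothetical) callable mapping one state vector to its prediction.
    """
    return s[1:] - np.array([g(x) for x in s[:-1]])


def state_marginals(innov, means, variances, trans, init):
    """Marginal posteriors p(state at t | all innovations).

    Scaled forward-backward recursion, assuming diagonal Gaussian
    emissions with per-state means and variances of shape (K, D).
    """
    T = innov.shape[0]
    K = means.shape[0]
    # Log-likelihood of every sample under every state, shape (T, K).
    diff = innov[:, None, :] - means[None, :, :]
    loglik = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )
    # Rescale each row for numerical stability; a per-sample constant
    # factor cancels out when the marginals are normalised.
    lik = np.exp(loglik - loglik.max(axis=1, keepdims=True))

    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    a = init * lik[0]
    alpha[0] = a / a.sum()
    for t in range(1, T):
        a = (alpha[t - 1] @ trans) * lik[t]
        alpha[t] = a / a.sum()
    for t in range(T - 2, -1, -1):
        b = trans @ (lik[t + 1] * beta[t + 1])
        beta[t] = b / b.sum()

    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)


# Toy data (assumed, not from the experiments): quiet innovations for
# silence, larger ones for speech, then silence again.
rng = np.random.default_rng(0)
innov = np.concatenate([
    rng.normal(0.0, 0.1, size=(20, 1)),
    rng.normal(0.0, 1.0, size=(30, 1)),
    rng.normal(0.0, 0.1, size=(20, 1)),
])
means = np.zeros((3, 1))
variances = np.array([[0.01], [1.0], [0.01]])
# Left-to-right transitions: silence -> speech -> silence.
trans = np.array([
    [0.95, 0.05, 0.00],
    [0.00, 0.95, 0.05],
    [0.00, 0.00, 1.00],
])
init = np.array([1.0, 0.0, 0.0])

gamma = state_marginals(innov, means, variances, trans, init)
segmentation = gamma.argmax(axis=1)  # most probable state per sample
```

Each row of `gamma` sums to one, so plotting its columns against time gives a picture of the same kind as the first subfigure. Taking the per-sample maximum, as done for `segmentation`, is only one simple way of reading a segmentation off the marginals.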

**Figure 7.6:** An example of the segmentation given by the algorithm after complete learning. The data and the meanings of the different parts of the figure are the same as in Figure 7.5. The results are significantly better, though not yet perfect.
$\includegraphics[width=\textwidth]{pics/segment_vasen_ver2}$

**Figure 7.7:** Another example of segmentation given by the algorithm after complete learning. The meanings of the different parts of the figure are the same as in Figures 7.5 and 7.6. The figure illustrates the segmentation of a longer word. The result shows several relatively probable paths, not just one as in the previous figures. The word in the figure is "POHJANMAALLA". Double phonemes are treated as one in the segmentation.
$\includegraphics[width=\textwidth]{pics/segment_pohjanmaalla}$