One distinctive property of the data is that they are not very continuous. This is due to the poor frequency resolution of the relatively short-window Fourier transform used in the preprocessing.
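The trade-off behind this can be illustrated with a short sketch. For a window of N samples at sampling rate f_s, adjacent DFT bins are f_s/N apart and each frame spans N/f_s seconds, so a short window gives coarse frequency bins. The sampling rate, window lengths and hop size below are hypothetical, since the text does not state the actual preprocessing parameters, and the Hann-windowed magnitude STFT is a generic stand-in rather than the exact preprocessing used.

```python
import numpy as np

def stft_resolution(sample_rate_hz, window_len):
    """Return (bin spacing in Hz, frame duration in s) for a window of window_len samples."""
    return sample_rate_hz / window_len, window_len / sample_rate_hz

def magnitude_spectrogram(signal, window_len, hop):
    """Hann-windowed magnitude STFT; rows are frames, columns are frequency bins."""
    window = np.hanning(window_len)
    n_frames = 1 + (len(signal) - window_len) // hop
    frames = np.stack([signal[i * hop : i * hop + window_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 8000.0  # hypothetical sampling rate, not taken from the text
for n in (64, 512):  # hypothetical short and long windows
    df, dt = stft_resolution(fs, n)
    print(f"window {n}: {df:.2f} Hz per bin, {dt * 1000:.1f} ms per frame")
```

With the 64-sample window each bin covers 125 Hz, so spectral details finer than that are smeared across neighbouring bins; a longer window sharpens the frequency axis at the cost of time resolution.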
The same data set was used with the static NFA model in [25]. The part used in these experiments consisted of spectrograms of 24 individual words spoken by 20 different speakers. The preprocessed data consisted of 2547 spectrogram vectors with 30 components each.
To study the dimensionality of the data, both linear and nonlinear factor analysis were applied to it. The results are shown in Figure 7.2. All the NFA experiments used an MLP network with 30 hidden neurons. The data manifold is clearly nonlinear, because nonlinear factor analysis is able to explain the data equally well with fewer components than linear factor analysis. The difference is especially clear when the number of components is relatively small. Even though the analysis uses only static models, it can be used to estimate a lower bound for the number of continuous hidden states needed in the experiments with dynamical models.
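The difference between the two models lies in the mapping from factors to observations. The sketch below assumes the standard one-hidden-layer form f(s) = B tanh(As + a) + b for the NFA mapping, with 30 hidden neurons and 30 outputs as in the text, next to the linear factor analysis mapping x = As + b. The number of factors (8) and all weights are hypothetical random values; in the actual experiments the weights are learned, which is not shown here, and the observation noise term is omitted.

```python
import numpy as np

def mlp_mapping(S, A, a, B, b):
    """Nonlinear mapping f(s) = B tanh(A s + a) + b applied to each row of S (n, k)."""
    H = np.tanh(S @ A.T + a)   # (n, n_hidden) hidden-layer activations
    return H @ B.T + b         # (n, d) predicted observations

def linear_mapping(S, A, b):
    """Linear factor analysis mapping x = A s + b applied to each row of S (n, k)."""
    return S @ A.T + b

rng = np.random.default_rng(0)
k, n_hidden, d = 8, 30, 30  # k = 8 is hypothetical; 30 hidden units and 30 outputs as in the text
A = rng.standard_normal((n_hidden, k))          # hypothetical random first-layer weights
a = rng.standard_normal(n_hidden)               # hypothetical hidden biases
B = rng.standard_normal((d, n_hidden)) / np.sqrt(n_hidden)
b = np.zeros(d)

S = rng.standard_normal((5, k))                 # five hypothetical factor vectors
X_nonlin = mlp_mapping(S, A, a, B, b)
X_lin = linear_mapping(S, rng.standard_normal((d, k)), b)
print(X_nonlin.shape, X_lin.shape)
```

Because of the tanh units, the image of the 8-dimensional factor space under the MLP is a curved manifold in the 30-dimensional observation space, whereas the linear mapping can only produce a flat subspace. This is why a nonlinear model can need fewer components to explain curved data equally well.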
A small segment of the original data and its reconstructions using eight nonlinear and eight linear components are shown in Figure 7.3. The reconstructed spectrograms are somewhat smoother than the original ones. Still, all the discriminative features of the original spectrum are well preserved in the nonlinear reconstruction, which suggests that the dropped components mostly correspond to noise. The linear reconstruction is not as good, especially at the beginning.
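A linear reconstruction with k components can be sketched with principal component analysis, used here as a simple stand-in for the linear factor analysis model in the text (the two are not identical). The data below are synthetic: random vectors with the same shape as the described data (2547 vectors with 30 components), generated by passing three hypothetical latent sources through a tanh nonlinearity plus noise. They are not the speech spectrograms.

```python
import numpy as np

def pca_reconstruct(X, k):
    """Project rows of X onto the top-k principal directions and map them back."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T               # (d, k) retained directions
    scores = Xc @ V_k            # (n, k) component values
    return scores @ V_k.T + mean

def reconstruction_mse(X, k):
    """Mean squared error of the k-component linear reconstruction."""
    return float(np.mean((X - pca_reconstruct(X, k)) ** 2))

rng = np.random.default_rng(0)
# Synthetic stand-in data: 3 hypothetical latent sources mapped nonlinearly to 30 dimensions
S = rng.standard_normal((2547, 3))
W = rng.standard_normal((3, 30))
X = np.tanh(S @ W) + 0.05 * rng.standard_normal((2547, 30))

for k in (1, 2, 4, 8, 16, 30):
    print(k, reconstruction_mse(X, k))
```

Although only three sources generated these data, the linear reconstruction error keeps decreasing well beyond three components, because a flat subspace needs extra dimensions to follow a curved manifold. Plotting such errors against k for both models is one way to produce a comparison like the one described for Figure 7.2.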
The extracted nonlinear factors, rotated with linear ICA, are shown in Figure 7.4. They appear rather smooth, so it seems plausible that dynamical models would be able to model the data better. The representation given by the nonlinear factors is, however, somewhat more difficult to interpret: it is hard to see how the individual factors affect the predicted outputs.
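The text does not say which ICA algorithm was used for the rotation. As one possible illustration, the sketch below assumes symmetric FastICA with the tanh nonlinearity: the factors are first whitened, and an orthogonal matrix W is then found so that the rotated components are as non-Gaussian as possible. The two toy sources and the mixing matrix are hypothetical, chosen only so that the recovered components can be checked against known ground truth.

```python
import numpy as np

def whiten(X):
    """Center X (n_samples, n_features) and transform it to identity covariance."""
    Xc = X - X.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ (eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T)

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    eigval, eigvec = np.linalg.eigh(W @ W.T)
    return eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T @ W

def fastica_rotation(Z, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA (tanh nonlinearity) on whitened Z (n, d).
    Returns an orthogonal W; the rotated components are Z @ W.T."""
    n, d = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((d, d)))
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)                       # g(w_i^T z) for every sample and row
        W_new = (G.T @ Z) / n - np.diag((1.0 - G ** 2).mean(axis=0)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when every row keeps its direction (up to sign)
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W

# Hypothetical toy example: two independent sources, linearly mixed
rng = np.random.default_rng(1)
n = 5000
sources = np.column_stack([rng.uniform(-np.sqrt(3), np.sqrt(3), n),
                           np.sin(2 * np.pi * np.arange(n) / 50)])
mixed = sources @ np.array([[1.0, 0.6], [0.4, 1.0]]).T
Z = whiten(mixed)
W = fastica_rotation(Z)
Y = Z @ W.T  # estimated components, up to order, sign and scale
```

Because the rotation is orthogonal, it does not change how well the factors explain the data; it only picks one of many equivalent bases. ICA recovers components only up to permutation, sign and scale, so any interpretation of the rotated factors has to allow for that ambiguity.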