Decoding toolbox to analyze MEG/EEG evoked responses (beta
version)
Matlab codes described in the manuscript Decoding
proprioceptive and tactile stimuli from MEG signals: a
feasibility study and comparison of methods.Download toolbox from here. The main script to perform analysis is called decoding_main.m.
Note: the code is dependent on and includes two other packages: glmnet_matlab and mne-matlab-master. Please check the web pages to see how to reference them.
Spectrospatial decoding toolbox
Spectrospatial Decoding Toolbox (SpeDeBox) is a Matlab toolbox designed for the spectrospatial analysis of multichannel biomedical signals, acquired, e.g., using magnetoencephalography (MEG), electroencephalography (EEG) or electromyography (EMG). The version 2.0 of the toolbox has been published in September 2015. The methods implemented in toolbox are described in the following publications:Kauppi, J-P., Hahne, J., Müller, K-R., and Hyvärinen, A. Three-way analysis of spectrospatial electromyography data: classification and interpretation, PLOS ONE 10(6), e0127231, 2015. Available here
Kauppi, J-P., Parkkonen, L., Hari, R., and Hyvärinen, A. Decoding magnetoencephalographic rhythmic activity using spectrospatial information, NeuroImage 83:921-936, 2013. Available hereDownload SpeDeBox v.2.0 from here
Contents
Overview
SpeDeBox combines unsupervised and supervised learning methods to gain insights into interesting spectrospatial structures in data. SpeDeBox implements three classifiers:- Spectral linear discriminant analysis (Spectral LDA) classifier
- Sparse and bilinear classifier
- Three-way discriminant analysis (Three-way DA) classifier
The toolbox performs classification on the basis of epochs (time-windows) extracted from multichannel data. Thus, to carry out the analysis, category labels for each time-point must be provided together with an experimental data. For instance, different stimuli applied during experiments can serve as distinct categories. The analysis provides both the accuracy of the classification and the visualization of relevant spectrospatial patterns. Binary classification problem is a special case, where one set of discriminative spectrospatial patterns is provided (category 1 versus category 2). In a multicategory case, each category is characterized by its own set of discriminative spectrospatial patterns.
Input data format
To perform data analysis with the SpeDeBox, two variables must be available in the Matlab workspace: 1) the variable containing the actual multichannel data, and 2) the variable containing category labels of the data. Category labels must be provided as discrete numeric values values: {1,2,...,K}, where K is the total number of categories. Both data and class label variables can be stored e.g. as a mat-file with a specific variable name (such as Data and Labels), and then loaded into the Matlab's workspace using the load-command. Example analysis 2 demonstrates how to load the two variables, Data and Labels into the Matlab. It is assumed that the types of the variables are either double or single.There are three alternative ways to provide multichannel data for the SpeDeBox:
- Raw format, where an input data set is a C by M real-valued matrix, where C is the number of sensors and M is the length of the time-series (i.e., a total number of time-points). In this case, category labels must be provided as an M-dimensional vector, i.e., each time-point must have a category label. Example analysis 2 below uses this data representation.
- Epoch format, where an input data set is a C by N by T real-valued tensor, where C is the number of sensors, N is the is the number of epochs and T is the length of the epoch (i.e., time-window). In this case, category labels must be provided as an N-dimensional vector, i.e., each epoch must have a category label available. Example analysis 1 below uses this data representation.
- Spectral format, where an input data sets is a C by N by F complex-valued tensor, where C is the number of sensors, N is the is the number of epochs and F is the number of frequency bins in the epoch. In this case, category labels must be provided as an N-dimensional vector, i.e., each epoch must have a category label available. This representation is like Epoch format, but each epoch has been transformed into the Fourier-domain.
Note: For the final analysis, the input data set is further split into a training, validation, and test data set. See examples below how this can be performed.
Citations
Please cite the following papers if you use the toolbox in your research.Spectral LDA classifier:
Kauppi, J-P., Parkkonen, L., Hari, R., and Hyvärinen, A. Decoding magnetoencephalographic rhythmic activity using spectrospatial information, NeuroImage 83:921-936, 2013.
Friedman, J, Hastie, T, and Tibshirani, R. Regularization paths for generalized linear models via coordinate descent, Journal of Statistical Software 33(1),:1-22, 2010.
A. Hyvärinen, P. Ramkumar, L. Parkkonen and R. Hari, Independent component analysis of short-time Fourier transforms for spontaneous EEG/MEG analysis, NeuroImage 49(1):257-271, 2010
Sparse bilinear classifier:
Kauppi, J-P., Hahne, J., Müller, K-R., and Hyvärinen, A. Three-way analysis of spectrospatial electromyography data: classification and interpretation, PLOS ONE 10(6), e0127231, 2015
Schmidt, M, Fung, G, Rosales, R. Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches, European Conference on Machine Learning (ECML), 2007A. Hyvärinen, P. Ramkumar, L. Parkkonen and R. Hari, Independent component analysis of short-time Fourier transforms for spontaneous EEG/MEG analysis, NeuroImage 49(1):257-271, 2010
Three-way DA classifier:
Kauppi, J-P., Hahne, J., Müller, K-R., and Hyvärinen, A. Three-way analysis of spectrospatial electromyography data: classification and interpretation, PLOS ONE 10(6), e0127231, 2015
Friedman, J, Hastie, T, and Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software 33(1),:1-22, 2010
A. Hyvärinen, P. Ramkumar, L. Parkkonen and R. Hari, Independent component analysis of short-time Fourier transforms for spontaneous EEG/MEG analysis, NeuroImage 49(1):257-271, 2010
If you use the example EMG data set provided in the analysis example 2 below, please cite:
Examples
Here, two example Matlab scripts are presented to illustrate the analysis with the SpeDeBox. The first script is an example analysis with two-category synthetic data, without using Fourier ICA or visualization functions. The second script is an example analysis with the real six-category EMG hand-movement data, including the Fourier ICA and the visualization of the most discriminative spectrospatial patterns. You can directly copy and paste these scripts to the Matlab's editor or command line. Alternatively, these same scripts can be found from the m-files analysis_example1 and analysis_example2 of the SpeDeBox.
% ***********************************************************************
% Set SpeDeBox to Matlab's search path
% ***********************************************************************
% Before performing the analysis, make sure that the toolbox
% and all its subfolders are set to Matlab's path. It is also highly recommended
% to set the directory of the toolbox as the Matlab's current directory.
% If these steps have not been already carried out, below is a script
% to perform this. Verify that the path is correct in case you
% have several copies of the toolbox available in your file system.
z = which('analysis_example1.m'); % Find full filename to this m-file
f = fileparts(z); % separate filename and pathname
disp('Verify that the toolbox is in the following path:')
disp(f) % visualize pathname
addpath(genpath(f)) % add path and all its subfolders to Matlab's path
cd(f) % set toolbox directory as current Matlab's directory
% ***********************************************************************
% Generate synthetic data
% ***********************************************************************
T = 50; % total number of time-points in each epoch
C = 64; % total number of sensors
N = 500; % total number of epochs
K = 2; % total number of categories
% Generate category labels:
labels = 1 + floor(K*rand(1,N));
% Generate random data in Epoch format:
data = rand(C,N,T);
% add class-specific information to randomly selected channels:
for m = 1:K
rnCol = randperm(size(data,1));
data(rnCol(1:4),labels == m,:) = 0.1*m + data(rnCol(1:4),labels == m,:);
end
% ***********************************************************************
% Set preprocessing parameters
% ***********************************************************************
% Select preprocessing options:
% Set parameters of the STFT:
params.general.samplfreq = 1000; % sampling frequency in Hz
params.general.minfreq = 1; % minimum frequency of interest in Hz
params.general.maxfreq = 500; % maximum frequency of interest in Hz
params.general.do_ica = 0; % do not run Fourier-ICA
% ***********************************************************************
% Set classifier parameters
% ***********************************************************************
% set regularization sequences for the Sparse bilinear classifier:
params.sparse_bilinear.lambdas_comp = [10 20 30]; % regularization sequence of spatial coefficients
params.sparse_bilinear.lambdas_freq = [0]; % regularization sequence of frequency coefficients
% Note: Forcing higher values for lambdas_comp lead to more easily interpretable models
% with less number of relevant components.
% set rank (number of coefficient vectors) of the classifiers for the Sparse bilinear
% and the Three-way DA classifiers:
params.sparse_bilinear.modelRank = 1;
params.three_way_DA.modelRank = 3;
% ***********************************************************************
% Split data into training, validation and test data sets
% ***********************************************************************
% Split data set such that:
% the first 1/3 of the data is used for classifier training
% the second 1/3 of the data is used for validation of the classifier hyperparameters
% the last 1/3 of the data is used for evaluating the classifier performance for unseen data
% Note: This procedure for classifier testing is not comprehensive and could be improved
% e.g. by using a 10-fold cross-validation.
% number of samples used for training, validation and testing:
nr_samples_train = round(N/3);
nr_samples_validation = round(N/3);
nr_samples_test = round(N/3);
% train, validation and test data set indices:
samples_train = 1:nr_samples_train;
samples_validation = (nr_samples_train+1):(nr_samples_train+nr_samples_validation);
samples_test = (nr_samples_train+nr_samples_validation+1):N;
% obtain train, validation and test data sets:
data_train = data(:,samples_train,:); % training data set (first 1/3 of the data)
labels_train = labels(samples_train); % training labels
data_validation = data(:,samples_validation,:); % validation data set (second 1/3 of the data)
labels_validation = labels(samples_validation); % validation labels
data_test = data(:,samples_test,:); % test data set (last 1/3 of the data)
labels_test = labels(samples_test); % test labels
% ***********************************************************************
% Preprocess data using STFT
% ***********************************************************************
% The following function performs the selected preprocessing options (STFT, ICA).
% Outputs are Fourier-transformed complex-valued epochs, class labels of the epoch,
% and the estimated complex-valued mixing matrix (A).
[data_train,data_validation,data_test,labels_train,labels_validation,...
labels_test,A] = preprocess_data(data_train,data_validation,...
data_test,labels_train,labels_validation,labels_test,params);
% Take the absolute value of the Fourier coefficients as the classifier inputs:
data_train = abs(data_train);
data_validation = abs(data_validation);
data_test = abs(data_test);
% ***********************************************************************
% Classifier training and testing
% ***********************************************************************
% Train and validate Sparse bilinear classifier:
[Classifier_final1,resu1] = train_and_test_classifier(data_train,data_validation,data_test,labels_train,labels_validation,labels_test,'sparse_bilinear',params);
% Train and validate Spectral LDA classifier:
[Classifier_final2,resu2] = train_and_test_classifier(data_train,data_validation,data_test,labels_train,labels_validation,labels_test,'spectral_LDA',params);
% Train and validate Three-way DA classifier:
[Classifier_final3,resu3] = train_and_test_classifier(data_train,data_validation,data_test,labels_train,labels_validation,labels_test,'three_way_DA',params);
% ***********************************************************************
% Display classification accuracies
% ***********************************************************************
% Classification performance is saved in the structs:
% resu1 Sparse bilinear
% resu2 Spectral LDA
% resu3 Three-way DA
% Plot classification accuracy of the test data with the accuracy of two decimals:
disp(' ')
disp('Classification accuracy for the test data set:')
disp(['Sparse bilinear: ' num2str(round(resu1.acc*1e4)/1e2) '%'])
disp(['Spectral LDA : ' num2str(round(resu2.acc*1e4)/1e2) '%'])
disp(['Three-way DA : ' num2str(round(resu3.acc*1e4)/1e2) '%'])
disp(' ')
% ***********************************************************************
To begin with, load the real EMG data set from here and save the file under the data-subfolder of the SpeDeBox.
After this, follow the steps listed below to carry out the analysis using the Matlab:
- Set SpeDeBox to Matlab's search path
- Load EMG data
- Set preprocessing parameters
- Set classifier parameters
- Split data into training, validation and test data sets
- Preprocess data using Fourier-ICA
- Classifier training and testing
- Visualize spectrospatial patterns of the final classifiers
- Display classification accuracies
% ***********************************************************************
% Set SpeDeBox to Matlab's search path
% ***********************************************************************
% Before performing the analysis, make sure that the toolbox
% and all its subfolders are set to Matlab's path. It is also highly recommended
% to set the directory of the toolbox as the Matlab's current directory.
% If these steps have not been already carried out, below is a script
% to perform this. Verify that the path is correct in case you
% have several copies of the toolbox available in your file system.
z = which('analysis_example2.m'); % Find full filename to this m-file
f = fileparts(z); % separate filename and pathname
disp('Verify that the toolbox is in the following path:')
disp(f) % visualize pathname
addpath(genpath(f)) % add path and all its subfolders to Matlab's path
cd(f) % set toolbox directory as current Matlab's directory
% ***********************************************************************
% Load real EMG data set
% ***********************************************************************
% Give the mat-file name containg C by N data matrix and class label vector of length(N):
filename = 'EMG_preproc_hand_subj1.mat';
% Note! Before performing this part, copy the real EMG data through the link
% above to your computer. It is assumed that the mat-file is saved under the
% SpeDeBox -folder or one of its subfolders (e.g. subfolder "data") which are
% under the Matlab's search path. If the file is not under the Matlab's search
% path, you must provide the string containing a full pathname to the
% file: filename = '/.../.../EMG_preproc_hand_subj1.mat';
load(filename) % load the EMG data to Matlab's workspace.
% Data is a C by N matrix containing the actual data.
% Labels contains class labels {1,2,3,4,5,6} for each time-point.
% For simplicity of visualization in this demonstration, we concentrate on
% 2-category classification and remove rest of the data:
Data(:,Labels>=3)=[];
Labels(Labels>=3)=[];
% initialize random number generator to make re-generation of the results possible:
rng(3,'twister')
% ***********************************************************************
% Set preprocessing parameters
% ***********************************************************************
% Set parameters of the STFT:
params.general.samplfreq = 2500; % sampling frequency in Hz
params.general.minfreq = 20; % minimum frequency of interest in Hz
params.general.maxfreq = 60; % maximum frequency of interest in Hz
% Because the data sets are here provided in the "Raw format", epoch length and
% epoch overlap must be given:
params.general.windowlength_sec = 0.2; % time-window length in seconds
params.general.overlapfactor = 1; % step size of the time-window
% overlapfactor = 1 means step size of windowlength
% overlapfactor = 2 means step size of windowlength/2
% overlapfactor = 3 means step size of windowlength/3, and so on.
% We also apply Fourier-ICA:
params.general.do_ica = true; % run Fourier-ICA
% Set parameters of the ICA:
params.ica.pcadim = 30; % PCA dimension (corresponds here to the number of independent components)
params.ica.complexmixing = true; % complex-valued (true) or real-valued (false) mixing matrix
params.ica.maxiter = 2000; % maximum number of iterations
% ***********************************************************************
% Set classifier parameters
% ***********************************************************************
% set regularization sequences for the Sparse bilinear classifier:
params.sparse_bilinear.lambdas_comp = [10 20 30]; % regularization sequence of spatial coefficients
params.sparse_bilinear.lambdas_freq = [0]; % regularization sequence of frequency coefficients
% Note: Forcing higher values for lambdas_comp lead to more easily interpretable models
% with less number of relevant independent components.
% set rank (number of coefficient vectors) of the classifiers for the Sparse bilinear and the Three-way DA classifiers:
params.sparse_bilinear.modelRank = 1;
params.three_way_DA.modelRank = 3;
% ***********************************************************************
% Split data into training, validation and test data sets
% ***********************************************************************
% Split data set such that:
% the first 1/3 of the data is used for classifier training
% the second 1/3 of the data is used for validation of the classifier hyperparameters
% the last 1/3 of the data is used for evaluating the classifier performance for unseen data
% Note: This procedure for classifier testing is not comprehensive and could be improved
% e.g. by using a 10-fold cross-validation.
% number of samples used for training, validation and testing:
nr_samples_train = round(size(Data,2)/3);
nr_samples_validation = round(size(Data,2)/3);
nr_samples_test = round(size(Data,2)/3);
% train, validation and test data set indices:
samples_train = 1:nr_samples_train;
samples_validation = (nr_samples_train+1):(nr_samples_train+nr_samples_validation);
samples_test = (nr_samples_train+nr_samples_validation+1):size(Data,2);
% obtain train, validation and test data sets:
data_train = Data(:,samples_train); % training data set (first 1/3 of the data)
labels_train = Labels(samples_train);
data_validation = Data(:,samples_validation); % validation data set (second 1/3 of the data)
labels_validation = Labels(samples_validation);
data_test = Data(:,samples_test); % test data set (third 1/3 of the data)
labels_test = Labels(samples_test);
% ***********************************************************************
% Preprocess data using Fourier-ICA
% ***********************************************************************
% The following function performs the selected preprocessing options (STFT, ICA).
% Outputs are Fourier-transformed complex-valued epochs, class labels of the epoch,
% and the estimated complex-valued mixing matrix (A).
[data_train,data_validation,data_test,labels_train,labels_validation,...
labels_test,A] = preprocess_data(data_train,data_validation,...
data_test,labels_train,labels_validation,labels_test,params);
% Take the absolute value of the Fourier coefficients as the classifier inputs:
data_train = abs(data_train);
data_validation = abs(data_validation);
data_test = abs(data_test);
% ***********************************************************************
% Classifier training and testing
% ***********************************************************************
% Train and validate Sparse bilinear classifier:
[Classifier_final1,resu1] = train_and_test_classifier(data_train,data_validation,data_test,labels_train,labels_validation,labels_test,'sparse_bilinear',params);
% Train and validate Spectral LDA classifier:
[Classifier_final2,resu2] = train_and_test_classifier(data_train,data_validation,data_test,labels_train,labels_validation,labels_test,'spectral_LDA',params);
% Train and validate Three-way DA classifier:
[Classifier_final3,resu3] = train_and_test_classifier(data_train,data_validation,data_test,labels_train,labels_validation,labels_test,'three_way_DA',params);
% ***********************************************************************
% Visualize spectrospatial patterns of the final classifiers
% ***********************************************************************
% Here, we visualize spectrospatial characteristics captured by Spectral LDA and Sparse and bilinear classifiers.
% Generate a position map of the EMG 4*12 channel electrode array:
resolution = 0.008;
nrRows = 4;
nrColumns = 12;
p1 = repmat((0:resolution:(resolution*(nrColumns-1)))',nrRows,1);
p2 = repmat((0:resolution:(resolution*(nrRows-1)))',1,nrColumns)';
pos_map = [p1 p2(:)];
% Pick one of the two electrode arrays for visualization:
secondElectrode = 49:96; % visualize second electrode
spatialPatterns = A(secondElectrode,:);
% Or visualize first electrode:
% firstElectrode = 1:48;
% spatialPatterns = A(firstElectrode,:);
phasemap_th = 0.5; % magnitude threshold used in the visualization of the phase pattern
max_nr_components = inf; % The maximum number of spatial patterns visualized
% Set this value to inf to plot the entire information corresponding to all nonzero coefficients.
% Note: the maximum number of components can be set lower value to allow easier visualization for complex models.
% visualize spectral LDA classifier:
relevantCoefficients2 = visualize_patterns_spectral_LDA(Classifier_final2,spatialPatterns,pos_map,phasemap_th,max_nr_components,data_train,labels_train,params);
% Visualize sparse bilinear classifier:
relevantCoefficients1 = visualize_patterns_sparse_bilinear(Classifier_final1,spatialPatterns,params.sparse_bilinear,pos_map,phasemap_th,max_nr_components,params);
% Both classifiers provide information about class-discriminative independent components:
%
% Spectral LDA classifier visualization includes:
% 1) Spatial maps of the discriminative independent components, including both magnitude and phase maps
% 2) Spectral filter of these components, denoting class-discriminative frequencies estimated by LDA
% 3) Actual spectra of these components, averaged across epochs
%
% Sparse and bilinear visualization includes:
% 1) Spatial maps of the discriminative independent components, including both magnitude and phase maps
% 2) Category-specific spectral filter(s) denoting class-discriminative frequencies
%
% See our papers for more details.
% ***********************************************************************
% Display classification accuracies
% ***********************************************************************
% Classification performance is saved in the structs:
% resu1 Sparse bilinear
% resu2 Spectral LDA
% resu3 Three-way DA
% Plot classification accuracy of the test data with the accuracy of two decimals:
disp(' ')
disp('Classification accuracy for the test data set:')
disp(['Sparse bilinear: ' num2str(round(resu1.acc*1e4)/1e2) '%'])
disp(['Spectral LDA : ' num2str(round(resu2.acc*1e4)/1e2) '%'])
disp(['Three-way DA : ' num2str(round(resu3.acc*1e4)/1e2) '%'])
disp(' ')
% ***********************************************************************
Contact
If you find a bug or need more help with the use of the toolbox, please contact either Jukka-Pekka Kauppi jukka-pekka.kauppi@jyu.fi or Prof. Aapo Hyvärinen aapo.hyvarinen@helsinki.fi
License
Copyright 2015 Jukka-Pekka Kauppi
Spectrospatial decoding toolbox (SpeDeBox) is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
SpeDeBox is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
History
The first version of the toolbox was published in 2013, consisting of Spectral LDA classifier with a synthetic data analysis example. Version 2.0 was published in September 2015, including additionally Sparse bilinear and Three-way DA classifiers as well as a real data example. Most source code was re-written because the original code contained a high number of user parameters and it was difficult to set all these parameters correctly. The current version requires less user parameters but slightly more manual customization (such as splitting of training, validation and test data sets), having the advantage that the code is easier to follow and adapt for different needs.