Machine learning based framework for EEG/ERP analysis

Rober Boshra, Kyle Ruiter, James Reilly, John Connolly

January 2016

Abstract

Introduction/Background: Event Related Potential (ERP) analysis of Electroencephalography (EEG) data has been widely used in research on language, cognition, and pathology. The high dimensionality of a typical EEG/ERP dataset makes it time-consuming to properly analyze, explore, and validate findings without a specific, restricted hypothesis.

Methods: This study proposes an automated, empirical, greedy approach that mines an EEG dataset for the location, robustness, and latency of any ERPs present. We use Support Vector Machines (SVM), a well-established machine learning model, together with a feature selection algorithm named minimum redundancy maximum relevance (mRMR) (Peng et al., 2005), on top of a preprocessing pipeline that produces a large bag of features including auto/cross power spectral densities, skewness, kurtosis, and electrode amplitudes. A hybrid of Monte Carlo bootstrapping, cross-validation, and permutation tests is used to ensure the reproducibility of results.

Results: The method has been tested and validated on three different datasets with different ERPs (N100, Mismatch Negativity (MMN), Phonological Mapping Negativity (PMN), and P300). Results show statistically significant identification of all ERPs in their respective experimental conditions, latencies, and locations.

Limitations: Because the study relies on machine learning, a relatively large number of trials is needed to extract knowledge from a given dataset. With a small dataset, type II errors are prevalent due to the conservative nature of the algorithm. Furthermore, the framework has been built to compare only two conditions. Extending it to more classes was outside the scope of this study.

Conclusion: This study introduces an easy-to-use framework for EEG/ERP dataset analysis. The algorithms used reduce researcher bias and the time spent on analysis, and they provide statistically sound results that are agnostic to dataset specifications, including the ERPs in question.
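The validation idea described in the Methods, cross-validated classification of two conditions checked against a permutation null, can be sketched roughly as follows. This is a minimal illustration and not the study's implementation: a nearest-centroid classifier stands in for the SVM so that the sketch needs only the Python standard library, mRMR feature selection and bootstrapping are omitted, and the function names, the interleaved fold scheme, and the toy data are all assumptions made for the example.

```python
import random
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cv_accuracy(X, y, k=5):
    """k-fold cross-validated accuracy of a nearest-centroid classifier.

    Hypothetical stand-in for the SVM named in the abstract.
    Folds are interleaved: trial i is held out in fold i % k.
    """
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        # One centroid per condition, estimated from training trials only.
        cents = {
            label: centroid([X[i] for i in train if y[i] == label])
            for label in set(y[i] for i in train)
        }
        for i in test:
            pred = min(cents, key=lambda c: sq_dist(X[i], cents[c]))
            correct += pred == y[i]
    return correct / len(X)


def permutation_p_value(X, y, n_perm=200, k=5, seed=0):
    """Compare the observed CV accuracy with a label-shuffled null."""
    rng = random.Random(seed)
    observed = cv_accuracy(X, y, k)
    hits = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)  # break the link between trials and conditions
        if cv_accuracy(X, shuffled, k) >= observed:
            hits += 1
    # The +1 correction keeps the estimate away from an impossible p = 0.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (an assumption): 80 trials, 3 features, two conditions that
# differ only in the mean of the first feature.
data_rng = random.Random(1)
X, y = [], []
for i in range(80):
    label = i % 2
    X.append([data_rng.gauss(2.0 * label, 1.0),
              data_rng.gauss(0.0, 1.0),
              data_rng.gauss(0.0, 1.0)])
    y.append(label)

observed, p = permutation_p_value(X, y)
print(f"CV accuracy = {observed:.2f}, permutation p = {p:.3f}")
```

Shuffling the labels preserves the feature distribution while destroying any real condition effect, so the shuffled accuracies approximate what the classifier would achieve by chance on this particular dataset. With few trials that null distribution is wide, which is one way to see why small datasets lead to the type II errors noted under Limitations.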

Type

Journal article

Publication

International Journal of Psychophysiology