MILimbEEG: An EEG Signals Dataset based on Upper and Lower Limb Task During the Execution of Motor and Motorimagery Tasks

Published: 8 July 2023 | Version 2 | DOI: 10.17632/x8psbz3f6x.2


In this work, an experimental methodology for the acquisition of EEG signals from volunteer subjects was developed. The volunteers who participated as test subjects are colleagues and research fellows from ESPOL and patients of the Hospital Luis Vernaza. This dataset consists of over 8680 four-second EEG recordings obtained from 60 volunteers.

Equipment: We used the OpenBCI Cyton + Daisy Biosensing Board for EEG signal recording. The OpenBCI equipment has an active bandpass filter in the 5 to 50 Hz range and, additionally, a notch filter at 60 Hz. This non-invasive device operates at a sampling frequency of 125 Hz and has 16 dry electrodes with two ground references, distributed according to the international 10-10 system. All 16 EEG electrodes were recorded in monopolar configuration, in which the potential of each electrode is compared with a neutral electrode located on the lobes of both ears.

Data Description: Each recording was saved in CSV file format; the values of each electrode are in microvolts (uV). In total, each subject generates 124 CSV files in each experiment (run). Some subjects performed two experiments, one executing the motor tasks and the other imagining them. The tasks are described below:

- Recording a Baseline with Eyes Open (BEO) without any task command: only once at the beginning of each run.
- Closing Left Hand (CLH): five times per run.
- Closing Right Hand (CRH): five times per run.
- Dorsal flexion of Left Foot (DLF): five times per run.
- Plantar flexion of Left Foot (PLF): five times per run.
- Dorsal flexion of Right Foot (DRF): five times per run.
- Plantar flexion of Right Foot (PRF): five times per run.
- Resting in between tasks (Rest): after each task, in total 31 files.

CSV file encoding:

- Subject ID: ID assigned to each test subject in order to hide their identity, e.g. Sx, where x can be any number from 1 to 60.
- Repetition number: Participants may perform more than one repetition of the experiment. Only one subject volunteered to perform up to 4 repetitions, e.g. Rx, where x can be any repetition number between 1 and 4.
- Motor or Motor Imagery Activity: For each repetition, participants are asked to perform first the motor tasks (M) and then the motor imagery tasks (I), e.g. Mx and Ix, where x is the Label of the task performed.
- Label: Identifier of the performed task, where 1 is for BEO, 2 for CLH, 3 for CRH, 4 for DLF, 5 for PLF, 6 for DRF, 7 for PRF and 8 for Rest, e.g. M2 represents the CLH Motor task.
- Task repetition number: Ordinal number of the task repetition. Tasks are presented randomly up to 5 times per run.

For example, S24R1I6_5 is from subject 24, repetition 1, DRF Imagery task; the number five at the end indicates the fifth task repetition in the record.

Additionally, this dataset includes the file "Test_Subject_Annotations.csv", with the demographic information of each of the 60 volunteers, respecting the confidentiality of each individual.
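
The file-name encoding described above can be decoded mechanically. The following is a minimal Python sketch, not part of the dataset's own code: the function name parse_recording_name, the regular expression, and the optional ".csv" suffix are assumptions made for illustration, and only the label codes are taken from the description.

```python
import re

# Label codes as listed in the dataset description.
TASKS = {1: "BEO", 2: "CLH", 3: "CRH", 4: "DLF",
         5: "PLF", 6: "DRF", 7: "PRF", 8: "Rest"}

# Assumed pattern: S<subject>R<repetition><M|I><label>_<task repetition>,
# optionally followed by ".csv". The exact naming on disk is an assumption.
_NAME = re.compile(r"^S(\d+)R(\d+)([MI])([1-8])_(\d+)(?:\.csv)?$")


def parse_recording_name(name):
    """Split a recording name such as 'S24R1I6_5' into its encoded fields."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized recording name: {name!r}")
    subject, repetition, mode, label, trial = match.groups()
    return {
        "subject": int(subject),
        "repetition": int(repetition),
        "mode": "motor" if mode == "M" else "imagery",
        "label": int(label),
        "task": TASKS[int(label)],
        "task_repetition": int(trial),
    }


if __name__ == "__main__":
    print(parse_recording_name("S24R1I6_5"))
```

Returning a plain dictionary keeps the sketch free of dependencies; the fields can be joined with the rows of "Test_Subject_Annotations.csv" through the subject number if needed.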


Steps to reproduce

Raw dataset preparation: First, we set the folder where the functions are located using addpath(genpath('./src')). The data are located in the data folder, which is added using addpath(path=fullfile('./data/')). Finally, the function folders = FindFolders(path) is used to generate a vector with the names of all the folders inside data. Because the data folder contains the folders of all subjects, the generated vector holds the names from S1 to S24. Once loaded, the data can be visualized using the function plot(dataNew).

Raw dataset preprocessing: Before preprocessing, the CSV files have to be loaded using the function readtable, which returns the file as a table. Then, with dataNew = table2array(data), the table is converted into an array of double values. During preprocessing, the data can be normalized using DataNorm = fNormalization(dataNew); this function receives the raw data and returns the normalized data. The normalized data can be visualized using the function plot(DataNorm). In this preprocessing stage, a band-pass filter was also created in the frequency range of 7 to 31 Hz, to include the mu (7.5 to 12.5 Hz) and beta (16 to 31 Hz) bands, which are related to the execution or imagination of motor activities.

Feature extraction: Each of the 124 files belonging to each of the 24 subjects represents one of the 8 possible tasks. The function Label = fLabelEEG(filenames(j).name) returns the task to which each file belongs; this number is known as the label. Many sophisticated methods can be used for feature extraction; in this example code we use the RMS value of each electrode per file. Accordingly, DataRMS = [rms(DataNorm) Label] converts each file into a vector of 16 values, with the Label of the task represented by that file appended at the end. Finally, we obtain a data matrix called allData containing 2976 rows x 17 columns.

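
The feature-extraction step can be illustrated outside MATLAB as well. The following Python sketch is an assumed analogue of [rms(DataNorm) Label], not the authors' code: the function names rms_per_channel and feature_row are hypothetical, and a recording is assumed to be a list of time samples with one value per electrode.

```python
import math


def rms_per_channel(samples):
    """Return the root-mean-square value of each column (electrode).

    samples: list of rows, one row per time sample, one value per electrode.
    """
    if not samples:
        raise ValueError("recording has no samples")
    n_channels = len(samples[0])
    sums_of_squares = [0.0] * n_channels
    for row in samples:
        if len(row) != n_channels:
            raise ValueError("rows have inconsistent channel counts")
        for channel, value in enumerate(row):
            sums_of_squares[channel] += value * value
    return [math.sqrt(total / len(samples)) for total in sums_of_squares]


def feature_row(samples, label):
    """Build one feature vector: per-electrode RMS values followed by the label."""
    return rms_per_channel(samples) + [label]


if __name__ == "__main__":
    # Toy recording with 2 electrodes and 2 samples (real files have 16 electrodes).
    print(feature_row([[3.0, -4.0], [3.0, 4.0]], 6))
```

With 16 electrodes each file yields 16 + 1 = 17 values, and 24 subjects with 124 files each give 24 x 124 = 2976 rows, which matches the stated size of allData.
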
Statistical information of RMS in EEG dataset: Statistical information can be extracted from each electrode using the function datastats. The electrodes show mean values close to zero and low standard deviation values. These results show which electrodes have the lowest low-frequency noise (Offset Voltage).

Feature Selection: For feature selection, the correlation matrix can be used to identify electrodes that are highly correlated. The function corrcoef(allData(:,1:16)) calculates the correlation matrix of each electrode with the other electrodes. The results indicate that no electrodes are highly correlated, so the electrodes do not carry redundant information. Finally, the data matrix was stored in CSV format using the function csvwrite('AllDataRMS.csv',allData).
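
The correlation check can be sketched in Python as follows. This is an illustrative analogue of applying corrcoef to the 16 feature columns, not the authors' code: the function names and the 0.9 threshold are assumptions, since the description does not state what counts as "highly correlated".

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes both sequences have non-zero variance; a constant column
    would cause a division by zero.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


def correlation_matrix(rows, n_features):
    """Correlate the first n_features columns of rows with each other."""
    columns = [[row[c] for row in rows] for c in range(n_features)]
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


def highly_correlated_pairs(matrix, threshold=0.9):
    """List column index pairs (i, j), i < j, with |r| >= threshold.

    The default threshold is a hypothetical choice for illustration.
    """
    pairs = []
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    # Toy feature matrix: 3 feature columns plus a trailing label column.
    toy = [[1, 2, 5, 0], [2, 4, 1, 0], [3, 6, 4, 0], [4, 8, 2, 0]]
    print(highly_correlated_pairs(correlation_matrix(toy, 3)))
```

For the dataset described here, n_features would be 16 so that the label column is excluded, mirroring allData(:,1:16).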


Escuela Superior Politecnica del Litoral


Human-Machine Interface, Brain, Electroencephalography, Brain-Computer Interface