Imagined Speech Datasets Applying Traditional and Gamified Acquisition Paradigms

Published: 22 April 2024 | Version 1 | DOI: 10.17632/57g8z63tmy.1
Contributors:

Description

Recent computational advances have benefited brain-computer interfaces, but human factors have been continuously overlooked. Paradigm design directly affects users' emotional state and, in turn, brain signal quality. This dataset provides electroencephalographic (EEG) data from 15 participants performing imagined speech. Each participant completed two paradigms: a traditional paradigm, based on conventional design, and an experimental (gamified) paradigm, based on a video game. All files were pre-processed using EEGLAB; both raw and pre-processed data are provided.

Additionally, a set of questionnaires was administered to allow the study of each paradigm's effects on each participant. First, age and sex were recorded. Participants then answered the Internal Representations Questionnaire, which measures individual propensities toward different kinds of internal representations: visual imagery, internal verbalization, and orthographic imagery. This questionnaire also provides a self-rating on the ability to manipulate mental representations. Participants also completed a Self-Assessment Manikin test before and after each paradigm, to establish emotional state, and a User Experience Questionnaire after each paradigm, to evaluate it across attractiveness, perspicuity, efficiency, dependability, stimulation, and novelty.

The experimental paradigm includes vocalized speech after every imagined speech instance. Each vocalized word was registered in a spreadsheet; instances that contain errors are not recommended for training classification models.

Possible applications of this dataset include, but are not limited to: classifying among up to four different imagined speech words, identifying the differences between imagined and vocalized speech, and researching the impact of the acquisition paradigm on brain signal quality.

Files

Steps to reproduce

This dataset was recorded using mBrainTrain's Smarting amplifier at a sampling frequency of 500 Hz, with a 24-channel recording cap. The channels were: Fp1, Fp2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T7, T8, P7, P8, Fz, Cz, Pz, AFz, CPz, POz, M1, and M2. The FCz electrode served as the reference, and the Fpz electrode as the ground.

EEGLAB was used for pre-processing. First, the gyroscope channels were removed, channel locations were added, and each file was visually inspected to remove noisy data segments. Only two segments were eliminated, both from S6 in the experimental paradigm: 21.4s to 24.s, and 34.5s to 35.5s. Then, each file was re-referenced to M1 and M2, and a 1-100 Hz band-pass filter and a 60 Hz notch filter were applied. Finally, Independent Component Analysis (ICA) was applied. Components with more than 60% brain activity were kept, as well as those with more than 10% brain activity and more than 50% attributed to an unidentified source, classified as "other".

The traditional paradigm was implemented in OpenViBE, and the code can be found at https://github.com/EdgarAgRod/Traditional_Imagined_Speech_Paradigm. The experimental paradigm was implemented in Python, primarily relying on the Pygame library. This code can be found at https://github.com/AlmaCuevas/Gamified_Imagined_Speech_Paradigm
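For users who prefer to reproduce the filtering stage outside EEGLAB, the re-referencing, band-pass, and notch steps above can be sketched in Python with SciPy. This is a minimal illustration, not the authors' code: the filter order (4) and notch quality factor (Q=30) are assumptions, as the original EEGLAB settings are not specified here.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 500  # sampling frequency of this dataset (Hz)

def rereference(eeg, m1_idx, m2_idx):
    """Re-reference to linked mastoids M1/M2: subtract their mean from all channels."""
    return eeg - eeg[[m1_idx, m2_idx]].mean(axis=0)

def filter_eeg(eeg, fs=FS):
    """Apply the 1-100 Hz band-pass and 60 Hz notch described above.

    eeg: array of shape (n_channels, n_samples). Filter order and Q are
    illustrative choices, not taken from the dataset description.
    """
    # 4th-order Butterworth band-pass, 1-100 Hz (zero-phase via filtfilt)
    b, a = butter(4, [1, 100], btype="bandpass", fs=fs)
    eeg = filtfilt(b, a, eeg, axis=-1)
    # Narrow notch at 60 Hz to suppress power-line noise
    b, a = iirnotch(60, Q=30, fs=fs)
    return filtfilt(b, a, eeg, axis=-1)

# Synthetic check: a 10 Hz component survives, 60 Hz line noise is suppressed
t = np.arange(0, 10, 1 / FS)
clean = np.sin(2 * np.pi * 10 * t)
noisy = clean + np.sin(2 * np.pi * 60 * t)
filtered = filter_eeg(noisy[np.newaxis, :])[0]
```

Note that EEGLAB's default FIR filters differ from the IIR Butterworth used here; for an exact reproduction, run the steps in EEGLAB itself.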

Institutions

Instituto Tecnológico y de Estudios Superiores de Monterrey

Categories

Neuroscience, Speech Processing, Electroencephalography, Brain-Computer Interface

Funding

Consejo Nacional de Ciencia y Tecnología

Instituto Tecnológico y de Estudios Superiores de Monterrey

Licence