Thai Overt-Mimed Speech EEG Dataset

Name: Thai Overt-Mimed Speech EEG Dataset
Creator: Apit Hemakom
Published: 2026-05-29T20:29:11.436Z
Keywords: Electroencephalography, Speech Generation

doi:10.17632/yzts9t69r2.1

Published: 29 May 2026 | Version 1 | DOI: 10.17632/yzts9t69r2.1

Description

This dataset contains electroencephalography (EEG) recordings from 20 participants performing a Thai word-production task. The vocabulary consisted of 20 Thai words divided into two sets of 10 words.

Set 1 included: 1) turn on the light, 2) turn off the light, 3) defecate, 4) urinate, 5) hungry, 6) thirsty, 7) sad, 8) afraid, 9) angry, and 10) happy.

Set 2 included: 11) wipe body, 12) yes, 13) no, 14) doctor, 15) pain, 16) head, 17) leg, 18) arm, 19) back, and 20) neck.

Each word was recorded under two articulation conditions: overt speech and mimed speech.

The 20 participants were divided into two non-overlapping groups. Group 1 consisted of 10 participants with odd-numbered IDs and performed the 10 words in Set 1. Group 2 consisted of 10 participants with even-numbered IDs and performed the 10 words in Set 2. For each assigned word, each participant repeated the word 200 times under the overt and mimed speech conditions.

Data collection was conducted in a quiet room. Participants were seated approximately 30 cm from a computer screen used to display the target words.

EEG signals were recorded from 32 scalp channels positioned at AF3, AF4, AF7, AF8, F3, F4, F7, F8, FC3, FC4, FC5, FC6, C3, C4, T7, T8, CP1, CP2, CP3, CP4, CP5, CP6, P1, P2, TP7, TP8, P7, P8, AFz, Fz, Cz, and Pz. All electrodes were connected to a g.HIamp amplifier (g.tec, Austria), and EEG signals were recorded using g.RECORDER software at a sampling rate of 512 Hz. Each trial was recorded for 2 seconds, corresponding to 1,024 samples per trial.

EEG samples for each word and articulation condition were stored as compressed NumPy .npz files, with one file corresponding to one word under one articulation condition. Each file contains EEG trials for a given word and condition, with each trial represented as a 2-second segment containing 1,024 time samples from 32 EEG channels. To reduce file size, EEG signals are stored as int16 arrays with scaling information.
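
To make the int16-plus-scale idea concrete, the following is a minimal sketch of how such a storage scheme can work in general. It is not the dataset author's procedure: the function name quantize_int16, the choice of a single global scale equal to the peak amplitude divided by 32767, and the synthetic random signal are all assumptions made for illustration.

```python
import numpy as np

def quantize_int16(signal):
    """Hypothetical quantizer: map a float array onto int16 with one global scale."""
    peak = float(np.max(np.abs(signal)))
    # Assumed rule: the largest amplitude maps to the int16 maximum (32767).
    scale = peak / 32767.0 if peak > 0 else 1.0
    stored = np.round(signal / scale).astype(np.int16)
    return stored, np.float32(scale)

rng = np.random.default_rng(0)
# Synthetic stand-in for EEG: 4 trials, 1024 samples (2 s at 512 Hz), 32 channels.
original = rng.normal(0.0, 20.0, size=(4, 1024, 32)).astype(np.float32)

stored, scale = quantize_int16(original)

# Restoration is a single multiplication, as in the dataset description.
restored = stored.astype(np.float32) * scale

# The rounding error is bounded by roughly half a quantization step.
max_error = float(np.max(np.abs(restored - original)))
print(f"scale={scale:.6f}, max abs error={max_error:.6f}")
```

Under these assumptions the restored signal differs from the original by at most about half of one scale step, which is why the restoration is described as approximate.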

The original floating-point EEG amplitudes can be approximately restored by multiplying the stored int16 EEG data by the corresponding scale value. Each .npz file contains at least the following fields: eeg, scale, fs, channels, and unit. The eeg field contains the compressed EEG data with shape (trials, samples, channels), the scale field contains the scaling factor used to restore the signal amplitude, fs indicates the sampling rate of 512 Hz, channels contains the EEG channel names, and unit specifies the signal unit after restoration.

The EEG data can be loaded and restored as follows:

---------
import numpy as np

data = np.load("filename.npz")
eeg = data["eeg"].astype(np.float32) * data["scale"]
---------

After restoration, the EEG array is represented as an approximate float32 signal with shape (trials, 1024, 32), corresponding to trials, time samples, and EEG channels. This scaling-based storage format reduces file size while preserving signal amplitude information for subsequent EEG analysis and model training.
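
The loading step can be wrapped in a small helper that also checks the documented fields and builds a time axis from fs. This is a sketch under stated assumptions rather than an official loader: the helper name load_restored, the file name synthetic_word.npz, the placeholder channel names, and the "uV" unit string are hypothetical, and the demo runs on a synthetic file so it does not depend on the real data. The scale is cast to float32 explicitly so the result stays float32 regardless of how the scale was stored.

```python
import os
import tempfile
import numpy as np

EXPECTED_FIELDS = ("eeg", "scale", "fs", "channels", "unit")

def _first(value):
    """Return the first element of a stored field, whether 0-d or 1-d."""
    return np.asarray(value).reshape(-1)[0]

def load_restored(path):
    """Load one .npz file and return (eeg_float32, fs, channels, unit)."""
    with np.load(path) as data:
        missing = [name for name in EXPECTED_FIELDS if name not in data.files]
        if missing:
            raise KeyError(f"missing fields: {missing}")
        # Cast the scale to float32 so the product is float32 on any NumPy version.
        scale = np.asarray(data["scale"], dtype=np.float32)
        eeg = data["eeg"].astype(np.float32) * scale
        fs = float(_first(data["fs"]))
        channels = [str(name) for name in np.asarray(data["channels"]).reshape(-1)]
        unit = str(_first(data["unit"]))
    return eeg, fs, channels, unit

# Demo on a synthetic file that mimics the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "synthetic_word.npz")  # hypothetical file name
    np.savez_compressed(
        path,
        eeg=np.zeros((3, 1024, 32), dtype=np.int16),
        scale=np.float32(0.01),                            # placeholder value
        fs=512,
        channels=np.array([f"ch{i}" for i in range(32)]),  # placeholder names
        unit="uV",                                         # placeholder unit
    )
    eeg, fs, channels, unit = load_restored(path)

# Time in seconds for each sample, from 0 up to just under 2 s.
times = np.arange(eeg.shape[1]) / fs
print(eeg.shape, eeg.dtype, fs, unit, times[-1])
```

If the channel names in the real files were stored as an object array, np.load would need allow_pickle=True to read them; whether that applies to this dataset is not stated in the description.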

Institutions

National Science and Technology Development Agency
Pathum Thani, Pathum Thani
