Data for: The upper bound of the misclassification risk for the long-term myoelectric signal recognition based on the adaptive learning

Published: 21 May 2018 | Version 2 | DOI: 10.17632/j4pm6s3fzy.2
Qi Huang


To provide guidelines for designing and evaluating long-term myoelectric signal recognition methods based on adaptive learning, we propose theoretical models that describe how the boundary of the misclassification risk (BMR) changes with parameters including the adaptive learning times, the adaptive learning frequencies, the generalization ability of the predictive model, and the ratio of samples without supervised information during adaptive learning. The models are built on the formulated adaptive learning process of long-term myoelectric signal recognition and on the normalized definitions of the concept drift and the concept sequence. Experiments on both realistic long EMG data sequences (Realistic Concept-drift-rate Sequence, RCS) and simulated EMG data sequences with controllable permutations (Zero Concept-drift-rate Sequence, ZCS, and Constant Concept-drift-rate Sequence, CCS) were conducted to validate our theoretical analyses. All data sequences are reorganized from the raw EMG dataset; the reorganization methods are presented in the paper.

This data set contains the raw EMG dataset and the recognition results of the ZCSs, the CCSs, and the RCSs.

The folder “RawEMGData” contains the raw EMG data of 8 subjects, denoted S1 to S8. For each subject, we acquired 24 data sessions, corresponding to the .dat files in the folder. The data sessions were acquired every half hour. Each data session includes 4000 samples of 8 motion classes (500 samples per motion class). Each row of a file contains 8 values and represents one sample: the first 7 values are the data acquired by the 7 EMG channels, and the last value is the motion label of the sample.

The folder “CCSResult” contains the recognition results of the CCSs. The data are saved as 5-dimensional arrays in .mat files.
The entry at position (i,j,k,l,m) stores the classification result for the samples in the lth session of the kth session sequence, as classified by the mth adaptive learner. Its value is the number of samples whose ground-truth label is j and whose predicted label is i. The position ranges from (1,1,1,1,1) to (8,8,100,L,8). The index m=1 corresponds to the results of the IAL, and m=2 to 8 correspond to the results of RAL1 to RAL7. The maximum value of l, denoted L, equals 8, 16, 18, 20, 22, 27, 32, 64, and 128 for the files “Bt_Seq1.mat” to “Bt_Seq9.mat”, respectively.
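For fixed k, l, and m, the entries over (i,j) form a confusion matrix with the predicted label along the first index and the ground-truth label along the second. The sketch below shows one way an accuracy and per-class recall could be derived from such a slice. It is an illustration only: the function names are hypothetical, the matrix is passed as a plain nested list, and loading the .mat file (for example with scipy.io.loadmat) and the name of the variable stored inside it are not covered here. Note that the positions above are 1-based, while Python indices are 0-based.

```python
# Minimal sketch, not the authors' code. Function names are hypothetical.
# conf[i][j] = number of samples with ground-truth label j predicted as i
# (0-based here; the dataset description uses 1-based positions).

def accuracy_from_confusion(conf):
    """Overall accuracy: correctly classified samples / all samples."""
    total = sum(sum(row) for row in conf)
    if total == 0:
        # An all-zero slice carries no information (e.g. an unused entry).
        return float("nan")
    correct = sum(conf[c][c] for c in range(len(conf)))
    return correct / total


def recall_per_true_class(conf):
    """Recall for each ground-truth class j: conf[j][j] / column-j total."""
    n = len(conf)
    recalls = []
    for j in range(n):
        column_total = sum(conf[i][j] for i in range(n))
        recalls.append(conf[j][j] / column_total if column_total else float("nan"))
    return recalls


if __name__ == "__main__":
    # Toy 2-class matrix for illustration; the real slices are 8 x 8.
    toy = [[8, 1],
           [2, 9]]
    print(accuracy_from_confusion(toy))
    print(recall_per_true_class(toy))
```

Because the ground-truth label is the second index, the per-class totals are column sums rather than row sums; swapping the two would yield precision instead of recall.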


Steps to reproduce

The folder “RCSResult” contains the recognition results of the RCSs. The data are saved as 5-dimensional arrays in .mat files. The meaning of each position is the same as for the CCS results. The variable L equals 3, 6, 12, and 24 for the files “Bt_Seq1.mat” to “Bt_Seq4.mat”, respectively. The data are valid only for k from 1 to 16.

The folder “ZCSResult” contains the results of the ZCSs. The file “Bt_1.mat” stores the data of the IAL, while the files “Bt_2.mat” to “Bt_8.mat” store the data of RAL1 to RAL7. The data are saved in a 2-dimensional array. Each entry is the recognition accuracy for one session of samples. The row number corresponds to the session number within a session sequence, and the column number corresponds to the sequence number.

The .dat files are plain text and can be opened and read directly with Notepad. The .mat files can be opened and read with MATLAB.
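For programmatic access to the raw .dat files, a row could be parsed along the following lines. This is a sketch under stated assumptions: the description says only that each row holds 8 values (7 EMG channels followed by the motion label), so the field delimiter is assumed here to be whitespace or commas, and the function names are hypothetical.

```python
# Minimal sketch, not the authors' code. The delimiter (whitespace or
# commas) is an assumption; the function names are hypothetical.
import re
from collections import Counter


def parse_sample(line):
    """Split one row into (7 channel values, integer motion label)."""
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    if len(fields) != 8:
        raise ValueError(f"expected 8 values per row, got {len(fields)}")
    channels = [float(x) for x in fields[:7]]
    label = int(float(fields[7]))  # tolerate labels written as e.g. "3.0"
    return channels, label


def count_samples_per_class(lines):
    """Count rows per motion label, skipping blank lines."""
    return Counter(parse_sample(line)[1] for line in lines if line.strip())


if __name__ == "__main__":
    # Made-up rows for illustration only, not taken from the dataset.
    rows = ["0.1 0.2 0.3 0.4 0.5 0.6 0.7 3",
            "1,2,3,4,5,6,7,3"]
    print(parse_sample(rows[0]))
    print(count_samples_per_class(rows))
```

Applied to one full session file, the per-class counts offer a quick consistency check against the stated layout of 4000 samples with 500 per motion class.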


Harbin Institute of Technology