Multimodal Keyboard Acoustic (MKA) Datasets

Published: 14 June 2024 | Version 2 | DOI: 10.17632/bpt2hvf8n3.2
Karwan Mahdi Rawf,


Our research team in the Computer Science Department at the University of Halabja has developed a dataset collection named the Multimodal Keyboard Acoustic (MKA) Datasets. Designed to support keyboard sound recognition and analysis, the datasets address the critical need to defend against acoustic-based cyber threats. With the increasing sophistication of cyberattacks, a focus on keyboard acoustics is particularly timely.

The MKA Datasets contain detailed recordings from six commonly used platforms: HP, Lenovo, MSI, Mac, Messenger, and Zoom. Each platform's dataset includes raw recordings, segmented sound files, and matrices derived from these sounds, capturing the subtle variations in typing behavior across different devices and applications.

We organize the MKA datasets for ease of use and thorough analysis. Each platform has a dedicated folder with subfolders for raw data, segmented sound files, and matrices. An additional aggregated folder combines data from all platforms to support cross-platform analysis. In total, the MKA datasets contain around 2630 sound-segment files with the .wav extension, along with an equal number of matrix files and of .txt files. The number of files varies by platform: approximately 70 each for HP, Lenovo, MSI, Zoom, and Messenger, and 61 for Mac.

Within each platform's dataset, the "Sound segments" folder stores six one-second WAV excerpts per class, derived from the corresponding raw recordings. These files are named "class_name+1" to "class_name+6" in each individual platform folder and "class_name+platform_name1" to "class_name+platform_name6" in the aggregated dataset. The "Sound segment (.matrix)" folder contains feature representations, such as MFCCs, extracted from each sound segment.
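The naming convention for sound segments can be expressed as a small helper, which is useful for checking that a downloaded folder is complete. This is a minimal sketch that assumes "+" in the documented pattern means plain concatenation, so a class "a" becomes a1.wav to a6.wav in a platform folder and, for example, aHP1.wav to aHP6.wav in the aggregated folder. The class names, the platform spelling "HP", and the function names are hypothetical illustrations, not part of the published dataset.

```python
from pathlib import Path

SEGMENTS_PER_CLASS = 6  # six one-second excerpts per class, per the dataset description


def expected_segment_names(class_name, platform=None, ext=".wav"):
    """Return the six file names expected for one keystroke class.

    Assumption: "+" in the documented convention means concatenation.
    Per-platform folders use class_name + index (e.g. "a1.wav"); the
    aggregated folder inserts the platform name (e.g. "aHP1.wav").
    """
    prefix = class_name if platform is None else class_name + platform
    return [f"{prefix}{i}{ext}" for i in range(1, SEGMENTS_PER_CLASS + 1)]


def incomplete_classes(folder, class_names, platform=None):
    """Map each class to the expected segment files that are missing from folder."""
    present = {p.name for p in Path(folder).glob("*.wav")}
    missing = {}
    for name in class_names:
        absent = [f for f in expected_segment_names(name, platform) if f not in present]
        if absent:
            missing[name] = absent
    return missing
```

For instance, `expected_segment_names("a", "HP")` yields the six aggregated-folder names for class "a". Class names that themselves end in a digit would make the concatenated names ambiguous, so the real class list should be checked before relying on this scheme.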
In addition, the "Sound segment metadata (.txt)" folder holds detailed information for each sound segment, including recording conditions, platform information, and keystroke class labels.

Beyond cybersecurity, the MKA datasets have potential applications in domains such as speech recognition and natural language processing, where their diverse set of sound profiles can support the development of more robust and adaptable algorithms. The datasets can therefore serve not only cybersecurity research but also work on improving the efficiency and accuracy of human-computer interaction technologies. Through this comprehensive approach, we aim to contribute to both academic research and practical applications in these interconnected areas.
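Because every sound segment is described as one second long and is accompanied by a matrix file and a metadata .txt file, a quick consistency check can be run on a platform folder before analysis. The sketch below uses only the Python standard library. It assumes the segments are uncompressed PCM WAV (which the `wave` module requires), that the subfolder names match those given in the description, and, as our own hypothesis, that a segment, its matrix, and its metadata file share the same file stem. The function names and the 0.05 s tolerance are illustrative choices.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds, read with the standard library."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_platform(platform_dir, tolerance=0.05):
    """List segments that are not about one second long or lack companion files.

    Subfolder names follow the dataset description. Matching files by a
    shared stem (e.g. a1.wav, a1.<matrix ext>, a1.txt) is an assumption
    that should be verified against the actual download.
    """
    root = Path(platform_dir)
    segments = root / "Sound segments"
    matrices = root / "Sound segment (.matrix)"
    metadata = root / "Sound segment metadata (.txt)"
    # Accept any extension for matrix files, since the exact one is not documented here.
    matrix_stems = {p.stem for p in matrices.iterdir()} if matrices.is_dir() else set()
    meta_stems = {p.stem for p in metadata.glob("*.txt")}
    problems = []
    for wav_path in sorted(segments.glob("*.wav")):
        duration = wav_duration_seconds(wav_path)
        if abs(duration - 1.0) > tolerance:
            problems.append((wav_path.name, f"duration {duration:.3f} s"))
        if wav_path.stem not in matrix_stems:
            problems.append((wav_path.name, "no matrix file"))
        if wav_path.stem not in meta_stems:
            problems.append((wav_path.name, "no metadata file"))
    return problems
```

An empty result means every segment in the folder passed the duration check and has both companion files under the shared-stem assumption.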



University of Halabja


Acoustics, Signal Processing, Cybersecurity, Database Security, Keyboard, Recognition, Acoustic Behavior Associated with Sound Location, Encryption Key