Behaviour Biometrics Dataset
The dataset provides a collection of behaviour biometrics data (commonly known as Keyboard, Mouse and Touchscreen (KMT) dynamics). The data was collected for use in a FinTech research project undertaken by academics and researchers in the Computer Science Department, Edge Hill University, United Kingdom. The project, called CyberSIgnature, uses KMT dynamics data to distinguish between legitimate card owners and fraudsters.
An application was developed with a graphical user interface (GUI) similar to a standard online card payment form, including fields for card type, name, card number, card verification code (CVC) and expiry date. Users' KMT dynamics were then captured while they entered fictitious card information in the GUI application.
The dataset consists of 1,760 KMT dynamics instances collected over 88 user sessions on the GUI application. Each user session involves 20 iterations of data entry: the user is assigned one set of fictitious card details (drawn at random from a pool) to enter 10 times, and is then presented with 10 additional sets of card details, each to be entered once. The 10 additional sets are drawn from the card details that have been, or will be, assigned to other users. One KMT data instance is collected during each data entry iteration. Thus, a total of 20 KMT data instances (i.e., 10 legitimate and 10 illegitimate) were collected during each user session on the GUI application.
The raw dataset is stored as 88 separate .json files. The root folder, named `behaviour_biometrics_dataset', contains two sub-folders, `raw_kmt_dataset' and `feature_kmt_dataset', and a Jupyter notebook file (kmt_feature_classification.ipynb). Their contents are described below:
-- `raw_kmt_dataset': this folder contains 88 files, each named `raw_kmt_user_n.json', where n is a number from 0001 to 0088. Each file contains 20 instances of KMT dynamics data corresponding to a given fictitious card, and the data instances are equally split between legitimate (n = 10) and illegitimate (n = 10) classes. The legitimate class corresponds to KMT dynamics captured from the user who is assigned the card details, while the illegitimate class corresponds to KMT dynamics collected from other users entering the same card details.
-- `feature_kmt_dataset': this folder contains two sub-folders, namely `feature_kmt_json' and `feature_kmt_xlsx'. Each sub-folder contains 88 files (of the relevant format: .json or .xlsx), each named `feature_kmt_user_n', where n is a number from 0001 to 0088. Each file contains 20 instances of features extracted from the corresponding `raw_kmt_user_n' file, including the class labels (legitimate = 1 or illegitimate = 0).
-- `kmt_feature_classification.ipynb': this file contains the Python code needed to generate features from the raw KMT files and to apply a simple machine learning classification task to generate results. The code is designed to run with minimal effort from the user.
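As a quick sanity check on the folder layout described above, the following minimal sketch builds the 88 expected raw file names and confirms the overall instance count. The helper name `expected_raw_files' is hypothetical, and the relative path assumes the root folder sits in the current working directory. The internal structure of each .json file is not described here, so the sketch does not attempt to parse it.

```python
from pathlib import Path

N_FILES = 88             # number of raw files, as stated in the description
INSTANCES_PER_FILE = 20  # 10 legitimate + 10 illegitimate per file


def expected_raw_files(root="behaviour_biometrics_dataset"):
    """Return the expected raw file paths (hypothetical helper).

    Assumes the root folder is located in the current working directory.
    """
    raw_dir = Path(root) / "raw_kmt_dataset"
    # n runs from 0001 to 0088, zero-padded to four digits
    return [raw_dir / f"raw_kmt_user_{n:04d}.json" for n in range(1, N_FILES + 1)]


files = expected_raw_files()
total_instances = len(files) * INSTANCES_PER_FILE
# Files that are not present locally (all of them if the dataset is not downloaded)
missing = [p for p in files if not p.exists()]

print(f"{len(files)} expected files, {total_instances} instances in total")
print(f"{len(missing)} file(s) not found locally")
```

The same pattern should apply to the feature files by swapping the folder and file-name prefix, with the file extension (.json or .xlsx) appended as appropriate.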
Steps to reproduce
An application was developed with a graphical user interface (GUI) similar to a standard online card payment form, including fields for card type, name, card number, card verification code (CVC) and expiry date. User behaviour biometrics, commonly known as keystroke, mouse and touchscreen (KMT) dynamics, were then captured while users entered fictitious card information in the GUI application. The Kivy Python library was used to capture the data. The library provides event listeners capable of monitoring events such as key press, key release, mouse movement, mouse press and mouse release. The raw KMT data recorded for each event were stored for further processing. To illustrate how these data can be useful for user identification, we extracted features from the raw data and applied a simple machine learning classification task to generate results. Both datasets (i.e., `raw_kmt_dataset' and `feature_kmt_dataset') are included in this submission, together with a Jupyter notebook file (kmt_feature_classification.ipynb) that contains the code needed to perform the classification task.
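The features computed by the notebook are not listed in this description. Purely as an illustration of the kind of feature that can be derived from key press and key release events, the sketch below computes dwell time (press to release of one key) and flight time (release of one key to press of the next), two measures widely used in keystroke dynamics. The event tuples, their field order, the millisecond timestamps and the function name are all assumptions made for this example; they do not reflect the actual schema of the raw files or the feature set used by the authors.

```python
def dwell_and_flight_times(events):
    """Compute dwell and flight times from keystroke events.

    `events` is a list of (key, action, timestamp_ms) tuples, where action
    is "press" or "release". This format is hypothetical.
    """
    pressed = {}  # key -> timestamp of its pending press
    strokes = []  # completed keystrokes as (press_time, release_time)
    for key, action, t in sorted(events, key=lambda e: e[2]):
        if action == "press":
            pressed[key] = t
        elif action == "release" and key in pressed:
            strokes.append((pressed.pop(key), t))
    strokes.sort()  # order keystrokes by press time
    # Dwell: how long each key was held down
    dwell = [release - press for press, release in strokes]
    # Flight: gap between releasing one key and pressing the next
    # (negative when consecutive keystrokes overlap)
    flight = [nxt[0] - cur[1] for cur, nxt in zip(strokes, strokes[1:])]
    return dwell, flight


# Two made-up keystrokes, timestamps in milliseconds
sample = [
    ("4", "press", 0), ("4", "release", 100),
    ("2", "press", 250), ("2", "release", 500),
]
dwell, flight = dwell_and_flight_times(sample)
print(dwell, flight)  # [100, 250] [150]
```

Per-instance summaries of such values (for example, their mean or standard deviation) could then serve as inputs to a classifier, although whether the notebook does this is not stated here.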