synthetic_transactions_noisy

Published: 30 June 2025 | Version 1 | DOI: 10.17632/7gpd5chb2f.1
Contributors:
Mohammed Borhan Uddin, Jaynab Sultana

Description

Data Type: Like the synthetic_transactions_50k.csv dataset, this dataset contains 50,000 entries and 10 columns. However, every column is of the object data type, meaning the values are stored as strings. This suggests that the data may contain inconsistencies or "noise" that would need to be addressed before analysis.

Description:
Count: Each column has 50,000 non-null entries, so there are no missing values.
Unique Values: Every column has 4 unique values. This confirms the presence of "noise", since clean boolean data should have only two unique values (True/False). The additional values are likely variants of "True" and "False" (e.g., 'TRUE', 'True', 'FALSE', 'False'), typos, or other unexpected entries.
Top Value: The most frequent value in every column is 'FALSE' (uppercase), with frequencies ranging from 32,108 to 32,479 (about 64% to 65% of the 50,000 rows). As in the synthetic_transactions_50k.csv dataset, the majority of transactions do not contain these items.
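The per-column summary above (count, unique values, top value and its frequency) can be reproduced without third-party libraries. The following is a minimal sketch, assuming the data is available locally as a CSV file with a header row followed by one row per transaction; the file path and the sample column names `item_a` and `item_b` are hypothetical, not taken from the dataset.

```python
import csv
import io
from collections import Counter


def profile_columns(rows, header):
    """Return count/unique/top/freq for each column, like the summary above."""
    summary = {}
    for i, name in enumerate(header):
        # Treat empty strings as missing so "count" reflects non-null entries.
        values = [row[i] for row in rows if i < len(row) and row[i] != ""]
        counts = Counter(values)
        top, freq = counts.most_common(1)[0] if counts else (None, 0)
        summary[name] = {
            "count": len(values),
            "unique": len(counts),
            "top": top,
            "freq": freq,
        }
    return summary


def profile_csv(path):
    # Assumed layout: a header row, then one row per transaction.
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    return profile_columns(rows, header)


if __name__ == "__main__":
    # Tiny inline sample with hypothetical column names, for illustration only.
    sample = "item_a,item_b\nFALSE,True\nFALSE,FALSE\nfalse,TRUE\nTRUE,False\n"
    reader = csv.reader(io.StringIO(sample))
    header = next(reader)
    rows = list(reader)
    for name, stats in profile_columns(rows, header).items():
        print(name, stats)
```

If the local file matches the description, calling `profile_csv` on it should report 50,000 entries, 4 unique values and a top value of 'FALSE' for each of the 10 columns.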
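Before analysis, the string values would need to be mapped back to booleans. A minimal sketch, assuming the noise consists only of case and whitespace variants of "true" and "false": values that match neither are counted and returned as None rather than guessed, so typos or other unexpected entries can be reviewed by hand. The function names `normalize_bool` and `clean_rows` are illustrative.

```python
from collections import Counter


def normalize_bool(value):
    """Map a noisy boolean string to True/False, or None if unrecognized."""
    # Assumption: valid variants differ only in letter case and surrounding spaces.
    token = value.strip().casefold()
    if token == "true":
        return True
    if token == "false":
        return False
    return None


def clean_rows(rows):
    """Normalize every cell and count the raw values that could not be mapped."""
    cleaned = []
    unmapped = Counter()
    for row in rows:
        new_row = []
        for cell in row:
            result = normalize_bool(cell)
            if result is None:
                unmapped[cell] += 1
            new_row.append(result)
        cleaned.append(new_row)
    return cleaned, unmapped


if __name__ == "__main__":
    noisy = [["FALSE", "True"], ["false ", "TRUE"], ["Fasle", "False"]]
    cleaned, unmapped = clean_rows(noisy)
    print(cleaned)
    print(unmapped)
```

If the assumption holds, each column collapses from 4 unique strings to two boolean values; a non-empty `unmapped` counter signals entries that need a different rule.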

Categories

Data Mining
