Cryptocurrency_Scam_Dataset_for_DQN_Models

Published: 11 November 2024 | Version 1 | DOI: 10.17632/ffd5crf8mx.1
Contributor:
jacob neyole

Description

This dataset, titled "Cryptocurrency Scam Dataset for DQN Models," contains transaction records potentially used for detecting scams or fraudulent activities in cryptocurrency transactions. It consists of 1,245 rows and 13 columns with the following features:

- Transaction_Value: The value of the transaction.
- Transaction_Fees: Fees associated with the transaction.
- Number_of_Inputs: Number of inputs in the transaction.
- Number_of_Outputs: Number of outputs in the transaction.
- Gas_Price: Cost of executing the transaction.
- Wallet_Age_Days: Age of the wallet in days.
- Wallet_Balance: Balance of the wallet involved in the transaction.
- Transaction_Velocity: Frequency of transactions within the wallet.
- Exchange_Rate: Cryptocurrency exchange rate at the time of transaction.
- Is_Scam: Binary label indicating whether the transaction is part of a scam (1) or not (0).
- Action: Action taken based on the transaction (e.g., flagging for investigation).
- Reward: A reward metric, possibly indicating model-based feedback.
- Predicted_Action: Predicted actions (currently missing data for this column).
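The column list above can be checked against a downloaded file, and each row mapped to the kind of (state, action, reward) record a DQN pipeline typically consumes. The following is a minimal sketch, not the contributor's code. It assumes the file is a CSV whose header matches the names in the description, that the nine transaction and wallet features are numeric, and that Action and Reward hold the agent's action and its scalar feedback. The function names are hypothetical.

```python
import csv

# Column names exactly as listed in the dataset description.
EXPECTED_COLUMNS = [
    "Transaction_Value", "Transaction_Fees", "Number_of_Inputs",
    "Number_of_Outputs", "Gas_Price", "Wallet_Age_Days", "Wallet_Balance",
    "Transaction_Velocity", "Exchange_Rate", "Is_Scam", "Action", "Reward",
    "Predicted_Action",
]
# Assumption: the first nine columns form the state vector.
STATE_COLUMNS = EXPECTED_COLUMNS[:9]


def check_columns(header):
    """Return (missing, unexpected) column names relative to the description."""
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    unexpected = [c for c in header if c not in EXPECTED_COLUMNS]
    return missing, unexpected


def to_transition(row):
    """Map one CSV row to a (state, action, reward, label) tuple.

    Action is kept as the raw string because the description does not say
    how it is encoded. Predicted_Action is ignored because the description
    reports it as currently empty.
    """
    state = [float(row[c]) for c in STATE_COLUMNS]
    label = int(float(row["Is_Scam"]))
    return state, row["Action"], float(row["Reward"]), label


def load_transitions(lines):
    """Parse CSV text (a file object or any iterable of lines)."""
    reader = csv.DictReader(lines)
    missing, _unexpected = check_columns(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [to_transition(row) for row in reader]
```

To use it on a local copy, open the file with `open(path, newline="")` and pass the handle to `load_transitions`; the path is whatever name the downloaded file has, which this page does not state.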

Files

Steps to reproduce

1. Research Objective and Scope
• Objective: The primary goal was to gather and analyze data related to cryptocurrency scams, enabling the development of DQN (Deep Q-Network) models to detect and predict scam patterns.
• Scope: The dataset focuses on cryptocurrency transactions, addresses, and potential indicators of scam activities. The time range, data sources, and types of scams captured (e.g., Ponzi schemes, phishing, fake tokens) should also be specified for reproducibility.

2. Data Collection Methodology
• Data Sources: List the data sources used, such as blockchain explorers, cryptocurrency exchanges, public datasets, or APIs that provide scam-related data. Mention any partnerships or open-access data repositories accessed.
• Data Extraction Protocol: Describe the protocol for extracting data from the sources, including any APIs, web scraping techniques, or SQL queries used. Documenting this step allows others to replicate data extraction.
• Sampling and Filtering Criteria: Explain any filters or criteria applied to isolate scams from legitimate transactions, including specific scam indicators such as abnormal transaction frequency, large single transactions, or suspicious address links.

3. Instruments, Software, and Tools
• Software/Tools Used: Document the software used for data extraction, preprocessing, and analysis. This might include:
o Programming Languages: e.g., Python or R for data manipulation and analysis.
o Libraries: e.g., Pandas and NumPy for data handling; requests for API calls; web scraping libraries like BeautifulSoup or Selenium if applicable.
o Blockchain Explorers: Mention if specific platforms or tools like the Etherscan API or Blockchain.com API were used to gather data.
• Data Storage and Workflow Tools: If you used a data storage solution like MySQL, MongoDB, or any cloud services, document those as well, including details on data storage structures and formats (e.g., CSV, JSON).

4. Data Preprocessing and Cleaning
• Cleaning Protocols: Document any preprocessing steps applied to clean and standardize the data, such as removing duplicates, handling missing values, and normalizing fields (e.g., converting time zones or standardizing currency values).
• Feature Engineering: Describe any additional features created or computed, like transaction frequency, the number of unique addresses interacted with, or transaction volume trends.

5. Quality Assurance and Validation
• Verification Protocols: Mention methods used to validate the data, such as cross-referencing with known scam addresses or using statistical methods to identify anomalies.
• Error Handling: Document any procedures for dealing with discrepancies or errors in the data to ensure consistency and reliability.

6. Data Documentation and Metadata
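
The cleaning protocols named in step 4 (removing duplicates, handling missing values, normalizing fields) can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually applied to this dataset: rows are dictionaries as produced by `csv.DictReader`, incomplete rows are dropped rather than imputed, and min-max scaling stands in for whatever normalization was used. All function names are hypothetical.

```python
def _is_blank(value):
    """Treat None and whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def drop_duplicates(rows):
    """Remove exact duplicate rows, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def drop_incomplete(rows, required):
    """Drop rows in which any required column is missing or blank."""
    return [r for r in rows if not any(_is_blank(r.get(c)) for c in required)]


def min_max_normalize(rows, column):
    """Rescale one numeric column to the range [0, 1]."""
    if not rows:
        return []
    values = [float(r[column]) for r in rows]
    low, high = min(values), max(values)
    span = high - low
    scaled_rows = []
    for row, value in zip(rows, values):
        scaled = dict(row)  # copy so the input rows are left unchanged
        scaled[column] = 0.0 if span == 0 else (value - low) / span
        scaled_rows.append(scaled)
    return scaled_rows


def clean_rows(rows, required, numeric):
    """Apply the three steps in order: dedupe, drop incomplete, normalize."""
    cleaned = drop_incomplete(drop_duplicates(rows), required)
    for column in numeric:
        cleaned = min_max_normalize(cleaned, column)
    return cleaned
```

Dropping incomplete rows is the simplest choice and loses data; a column that is empty throughout, as Predicted_Action is reported to be, should be left out of `required` so that it does not remove every row.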

Institutions

Umma University

Categories

Machine Learning, Deep Q-Network

Licence