Datasets Comparison

Version 1

A Comprehensive Financial and Sentiment Dataset of Cryptocurrencies and Stocks (2020–2026)

Published: 3 June 2026 | Version 1 | DOI: 10.17632/552326krwn.1
Contributor: MD RAIHAN HOSSAIN

Description

This financial dataset contains 100,000 synthesized and curated daily records blending financial price metrics, structural technical analysis indicators, and market sentiment data. Covering a 6-year horizon from January 2020 to January 2026, the dataset tracks both traditional equities (e.g., Apple, Nvidia, Google, Tesla) and leading digital assets (e.g., Bitcoin, Ethereum, Binance Coin, Solana). It features 21 core attributes, including standard OHLC values, major quantitative trading metrics (RSI, MACD, Bollinger Bands, SMAs, EMAs), volatility indices, and quantified market sentiment scores. This unified dataset serves as an excellent benchmark for training predictive machine learning models, algorithmic trading simulations, multi-asset portfolio optimizations, and multimodal sentiment-financial fusion architectures.

Steps to reproduce

To replicate or reconstruct this time-series financial dataset, the following data engineering and pipeline steps are performed:

1. Asset and Horizon Definition:
   - Define a structural time frame spanning from January 1, 2020, to January 8, 2026.
   - Establish a balanced multi-asset target list consisting of 10 high-volume equities (Stocks) and 10 high-liquidity digital assets (Cryptocurrencies).

2. Financial Pricing Engineering:
   - Generate continuous asset price histories tracking standard Open, High, Low, Close (OHLC) patterns, Volume, and Market Capitalization.
   - Inject rolling market adjustments to accurately simulate high-volatility cryptocurrency patterns versus structured stock trading parameters.
   - Compute the Daily Return Percentage for every asset based on sequential closing metrics.

3. Technical Indicator Integration (Quantitative Analysis):
   - Programmatically compute standard technical mathematical indicators across the time series:
     * Simple Moving Averages (7-day and 30-day lookbacks).
     * Exponential Moving Averages (12-day and 26-day weights).
     * Momentum tracking via the Relative Strength Index (RSI).
     * Trend divergence using Moving Average Convergence Divergence (MACD).
     * Volatility scoping through Bollinger Upper and Lower bands.

4. Multimodal Sentiment Mapping:
   - Integrate simulated natural language processing (NLP) public opinion pipelines to generate a continuous numerical Sentiment Score between -1.0 and +1.0.
   - Map these continuous mathematical scores into discrete categorical data classes ('Positive', 'Negative', and 'Neutral') representing macroeconomic and social market stances.

5. Data Quality Control and Synthesis:
   - Perform auditing checks to eliminate missing data entries (NaN values) and ensure exact logical scaling (e.g., High prices remain higher than Low prices).
   - Format the continuous time-series block into 100,000 clean, structured data rows.
   - Export the structured tabular frame as a comma-separated artifact titled "Crypto_Stock_Dataset_100K.csv".
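
The daily return and technical indicator calculations from steps 2 and 3 can be sketched as follows. This is a minimal illustration in plain Python, not the contributor's actual pipeline. The function names are hypothetical, and the RSI period (14), the Bollinger window (20) and band width (2 standard deviations), the simple-average RSI variant, and the seeding of each EMA with the first value are assumed defaults that the description does not specify. Only the 7-day and 30-day SMA lookbacks and the 12-day and 26-day EMA spans come from the steps above.

```python
from statistics import mean, pstdev


def daily_return_pct(closes):
    """Percentage change between consecutive closes; the first day has no prior close."""
    return [None] + [
        (curr - prev) / prev * 100.0 for prev, curr in zip(closes, closes[1:])
    ]


def sma(values, window):
    """Simple moving average; None until a full window of values is available."""
    return [
        None if i + 1 < window else mean(values[i + 1 - window : i + 1])
        for i in range(len(values))
    ]


def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1).

    Assumption: the series is seeded with its first value.
    """
    if not values:
        return []
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd_line(closes, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA (12 and 26 days, as in step 3)."""
    return [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]


def rsi(closes, period=14):
    """RSI from simple averages of gains and losses over the last `period` changes.

    Assumption: period=14 and simple (not Wilder-smoothed) averaging.
    """
    changes = [b - a for a, b in zip(closes, closes[1:])]
    out = [None] * len(closes)
    for i in range(period, len(closes)):
        window = changes[i - period : i]
        avg_gain = sum(c for c in window if c > 0) / period
        avg_loss = sum(-c for c in window if c < 0) / period
        if avg_loss == 0:
            out[i] = 100.0  # no losses in the window
        else:
            out[i] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out


def bollinger(closes, window=20, k=2.0):
    """Upper and lower bands: SMA plus/minus k population standard deviations.

    Assumption: window=20 and k=2.0.
    """
    upper, lower = [], []
    for i, mid in enumerate(sma(closes, window)):
        if mid is None:
            upper.append(None)
            lower.append(None)
        else:
            sd = pstdev(closes[i + 1 - window : i + 1])
            upper.append(mid + k * sd)
            lower.append(mid - k * sd)
    return upper, lower
```

With a list of closing prices for one asset, `sma(closes, 7)` and `sma(closes, 30)` give the two moving averages listed in step 3, and `macd_line(closes)` combines the 12-day and 26-day EMAs. Each function would be applied per asset, since mixing assets in one series would make the rolling windows meaningless.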

Institutions

National University Bangladesh

Dhaka

Dhaka Division

Categories

Computer Science, Economics, Finance, Artificial Intelligence, Data Science, Corporate Finance, Financial Time Series Analysis, Corporate Entrepreneurship, Stock Market Valuation, Statistical Finance, Development of Economics, Financial Economics of Economic System, Algorithmic Efficiency, Sentiment Analysis, Cryptocurrency

Licence

Creative Commons Attribution 4.0 International

Version 2

A Comprehensive Financial and Sentiment Dataset of Cryptocurrencies and Stocks (2020–2026)

Published: 4 June 2026 | Version 2 | DOI: 10.17632/552326krwn.2
Contributor:

Description

This financial dataset contains 100,000 synthesized and curated daily records blending financial price metrics, structural technical analysis indicators, and market sentiment data. Covering a 6-year horizon from January 2020 to January 2026, the dataset tracks both traditional equities (e.g., Apple, Nvidia, Google, Tesla) and leading digital assets (e.g., Bitcoin, Ethereum, Binance Coin, Solana). It features 21 core attributes, including standard OHLC values, major quantitative trading metrics (RSI, MACD, Bollinger Bands, SMAs, EMAs), volatility indices, and quantified market sentiment scores. This unified dataset serves as an excellent benchmark for training predictive machine learning models, algorithmic trading simulations, multi-asset portfolio optimizations, and multimodal sentiment-financial fusion architectures.

Steps to reproduce

To replicate or reconstruct this time-series financial dataset, the following data engineering and pipeline steps are performed:

1. Asset and Horizon Definition:
   - Define a structural time frame spanning from January 1, 2020, to January 8, 2026.
   - Establish a balanced multi-asset target list consisting of 10 high-volume equities (Stocks) and 10 high-liquidity digital assets (Cryptocurrencies).

2. Financial Pricing Engineering:
   - Generate continuous asset price histories tracking standard Open, High, Low, Close (OHLC) patterns, Volume, and Market Capitalization.
   - Inject rolling market adjustments to accurately simulate high-volatility cryptocurrency patterns versus structured stock trading parameters.
   - Compute the Daily Return Percentage for every asset based on sequential closing metrics.

3. Technical Indicator Integration (Quantitative Analysis):
   - Programmatically compute standard technical mathematical indicators across the time series:
     * Simple Moving Averages (7-day and 30-day lookbacks).
     * Exponential Moving Averages (12-day and 26-day weights).
     * Momentum tracking via the Relative Strength Index (RSI).
     * Trend divergence using Moving Average Convergence Divergence (MACD).
     * Volatility scoping through Bollinger Upper and Lower bands.

4. Multimodal Sentiment Mapping:
   - Integrate simulated natural language processing (NLP) public opinion pipelines to generate a continuous numerical Sentiment Score between -1.0 and +1.0.
   - Map these continuous mathematical scores into discrete categorical data classes ('Positive', 'Negative', and 'Neutral') representing macroeconomic and social market stances.

5. Data Quality Control and Synthesis:
   - Perform auditing checks to eliminate missing data entries (NaN values) and ensure exact logical scaling (e.g., High prices remain higher than Low prices).
   - Format the continuous time-series block into 100,000 clean, structured data rows.
   - Export the structured tabular frame as a comma-separated artifact titled "Crypto_Stock_Dataset_100K.csv".
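
The sentiment mapping in step 4 and the auditing checks in step 5 can be illustrated with a short sketch. It is an assumption-laden example rather than the contributor's code: the description gives neither the cut-offs that separate 'Positive', 'Negative', and 'Neutral' nor the CSV column headers, so the `neutral_band` value of 0.05, the column names `Open`, `High`, `Low`, and `Close`, and the function names are all hypothetical. The check that Open and Close fall between Low and High goes slightly beyond the High-versus-Low check named in step 5.

```python
import math

# Hypothetical column names; the actual CSV headers are not given in the description.
PRICE_COLUMNS = ("Open", "High", "Low", "Close")


def sentiment_label(score, neutral_band=0.05):
    """Map a sentiment score in [-1.0, 1.0] to a class label.

    Assumption: scores within +/- neutral_band count as 'Neutral'.
    """
    if not -1.0 <= score <= 1.0:  # also rejects NaN
        raise ValueError(f"sentiment score out of range: {score}")
    if score > neutral_band:
        return "Positive"
    if score < -neutral_band:
        return "Negative"
    return "Neutral"


def is_missing(value):
    """True for None and for float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def row_problems(row):
    """Return a list of quality problems found in one record (a dict)."""
    keys = list(row) + [c for c in PRICE_COLUMNS if c not in row]
    problems = [f"missing:{k}" for k in keys if is_missing(row.get(k))]
    if any(is_missing(row.get(c)) for c in PRICE_COLUMNS):
        return problems  # the range checks need all four prices
    if row["High"] < row["Low"]:
        problems.append("high_below_low")
    if not all(row["Low"] <= row[c] <= row["High"] for c in ("Open", "Close")):
        problems.append("open_or_close_outside_range")
    return problems
```

A record passes the audit when `row_problems(row)` returns an empty list. Returning the list of problems rather than a boolean makes it easier to report which check failed for which row.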

Institutions

National University Bangladesh

Dhaka

Dhaka Division

Categories

Computer Science, Economics, Finance, Artificial Intelligence, Data Science, Corporate Finance, Financial Time Series Analysis, Corporate Entrepreneurship, Stock Market Valuation, Statistical Finance, Development of Economics, Financial Economics of Economic System, Algorithmic Efficiency, Sentiment Analysis, Cryptocurrency

Licence

Creative Commons Attribution 4.0 International