ChanEst Dataset for Deep Learning-Based 6G Channel Estimation
Description
The ChanEst dataset was developed to address a core research challenge: accurate channel estimation is increasingly difficult in the highly diverse, dynamic, and extreme propagation environments expected in 6G. Although deep learning (DL) techniques have shown strong potential as alternatives to conventional estimators, their progress is limited by the lack of reproducible, reconfigurable datasets that comply with 3GPP specifications and model realistic receiver preprocessing. ChanEst responds to this gap by providing a fully controlled, standards-aligned dataset generation framework designed for DL-based channel estimation.

ChanEst is generated using the 6G Exploration Library in MATLAB and strictly follows 3GPP physical-layer specifications. Each sample is constructed on an OFDM grid (612 subcarriers × 14 symbols) populated with DM-RS Type 2 pilots (3GPP TS 38.211). The transmitted signals propagate through standardized 3GPP TDL channel models (TDL-A to TDL-E). The channel parameters, namely SNR (−10 to 30 dB), delay spread (10–2000 ns), and Doppler shift (5–5000 Hz), are stratified to ensure wide and balanced scenario diversity representative of 6G FR3 operation at 7 GHz. This design prevents overrepresentation of mild or unrealistic cases and ensures coverage of high-mobility and high-dispersion conditions.

Each dataset entry consists of two tensors:
- X_input: LS channel estimates at pilot positions followed by 2-D linear interpolation, representing the noisy and imperfect receiver perspective.
- Y_label: perfect OFDM-grid channel responses, serving as the supervised learning target.

Both tensors are stored in real-valued format, with channels represented as [K × L × C × N], where C = 2 × NTx × NRx packs the real and imaginary components. The framework supports SISO and MIMO, configurable numerologies, and user-defined parameter ranges.
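The real-valued [K × L × C × N] layout can be illustrated with a short NumPy sketch. This is not code from the ChanEst framework. It assumes that K indexes subcarriers, L indexes OFDM symbols, and N indexes samples, and that the C axis holds all real parts first, followed by all imaginary parts; the description does not state the actual ordering, so check it against the dataset metadata before relying on it. The helper names `pack_complex` and `unpack_complex` are hypothetical.

```python
import numpy as np

def pack_complex(h):
    """Pack complex channels [K, L, NTx, NRx, N] into real [K, L, C, N].

    C = 2 * NTx * NRx. Assumed ordering (not confirmed by the dataset
    description): all real parts first, then all imaginary parts.
    """
    k, l, ntx, nrx, n = h.shape
    flat = h.reshape(k, l, ntx * nrx, n)  # merge the two antenna axes
    return np.concatenate([flat.real, flat.imag], axis=2)

def unpack_complex(x, ntx, nrx):
    """Invert pack_complex: real [K, L, C, N] -> complex [K, L, NTx, NRx, N]."""
    k, l, c, n = x.shape
    if c != 2 * ntx * nrx:
        raise ValueError("C must equal 2 * NTx * NRx")
    half = c // 2
    flat = x[:, :, :half, :] + 1j * x[:, :, half:, :]
    return flat.reshape(k, l, ntx, nrx, n)

# Toy 2x2 MIMO example with three samples on a 612 x 14 grid.
rng = np.random.default_rng(0)
shape = (612, 14, 2, 2, 3)
h = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
x = pack_complex(h)
print(x.shape)  # (612, 14, 8, 3)
```

When a MATLAB v7.3 file is read from Python through an HDF5 library, the axis order may appear reversed because MATLAB stores arrays column-major, so it is worth checking the shape before unpacking.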
Dataset size is scalable based on computational capacity, making ChanEst suitable for lightweight prototyping as well as large-scale model training.

Extensive validation confirms that ChanEst exhibits strong physical consistency. The correlation between X_input and Y_label is high under mild channel conditions and degrades appropriately under severe scenarios, reflecting real-world estimation difficulty. NMSE trends align with the expected impacts of noise, delay spread, and mobility, demonstrating that the dataset avoids degenerate or trivial cases. Scenario distributions remain balanced across propagation regimes, ensuring fairness and robustness in DL model benchmarking.

ChanEst is delivered with detailed metadata, including SNR, Doppler, delay spread, and TDL profile, enabling stratified evaluation, targeted stress testing, and reproducible comparisons against classical estimators. Its flexibility allows researchers to reconfigure antenna dimensions, numerology, and parameter ranges to suit diverse DL-based research tasks in 6G and beyond. The dataset is available in MATLAB (.mat) and HDF5 (.h5) formats.
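The NMSE and correlation checks mentioned above can be sketched as follows. This is an illustrative example on synthetic data, not the validation code shipped with ChanEst. The "label" is a unit-power random complex grid and the "input" is that grid plus white noise, which stands in for the real X_input and Y_label tensors. The function names `nmse_db` and `correlation` are hypothetical.

```python
import numpy as np

def nmse_db(x, y):
    """Normalized mean squared error of estimate x against reference y, in dB."""
    err = np.sum(np.abs(x - y) ** 2)
    ref = np.sum(np.abs(y) ** 2)
    return 10.0 * np.log10(err / ref)

def correlation(x, y):
    """Magnitude of the normalized inner product between x and y (0 to 1)."""
    num = np.abs(np.vdot(y, x))  # vdot flattens and conjugates its first argument
    return num / (np.linalg.norm(x) * np.linalg.norm(y))

rng = np.random.default_rng(1)
results = {}
for snr_db in (-10, 10, 30):
    # Synthetic stand-in for one Y_label grid (612 subcarriers x 14 symbols).
    y = (rng.standard_normal((612, 14)) + 1j * rng.standard_normal((612, 14))) / np.sqrt(2)
    # Per-component noise standard deviation for the requested SNR at unit signal power.
    noise_std = np.sqrt(10.0 ** (-snr_db / 10.0) / 2.0)
    noise = noise_std * (rng.standard_normal(y.shape) + 1j * rng.standard_normal(y.shape))
    x = y + noise  # synthetic stand-in for X_input
    results[snr_db] = (nmse_db(x, y), correlation(x, y))
    print(f"SNR {snr_db:>4} dB: NMSE {results[snr_db][0]:6.2f} dB, "
          f"correlation {results[snr_db][1]:.3f}")
```

With the real dataset, the same two functions could be applied per SNR, Doppler, or delay-spread bin taken from the metadata to reproduce a stratified evaluation. The values would differ from this toy case because X_input is an interpolated LS estimate rather than the label plus white noise.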
Steps to reproduce
The ChanEst dataset is fully reproducible and reconfigurable using the MATLAB generation framework released on GitHub (https://github.com/obydelion/ChanEst-Dataset-Generation). Reproduction requires MATLAB R2024a or later with the 5G Toolbox and the 6G Exploration Library installed.

Step 1 – Environment Setup: Install MATLAB and confirm that the 5G Toolbox and the 6G Exploration Library are available. Clone or download the ChanEst generation scripts from the repository. Set the random number generator seed to the documented value to ensure deterministic behaviour.

Step 2 – OFDM and Pilot Configuration: Initialize the carrier and transmission parameters using nrCarrierConfig and nrPDSCHConfig. Configure a 60 kHz SCS, 51 resource blocks (612 subcarriers), and 14 OFDM symbols per slot. Generate DM-RS Type 2 pilot patterns according to 3GPP TS 38.211, ensuring standardized pilot density and placement. This pilot structure defines the sparsity pattern used for LS estimation and interpolation.

Step 3 – Channel Modelling: Instantiate the 3GPP TDL channel model (nrTDLChannel) with profiles TDL-A through TDL-E. For each sample, draw the SNR (−10 to 30 dB), delay spread (10–2000 ns), and Doppler shift (5–5000 Hz) by stratified random sampling to ensure balanced coverage across their respective ranges. Configure the channel to output the perfect OFDM-grid frequency response. The framework supports both SISO and MIMO operation by modifying antenna and correlation parameters while preserving the same dataset format.

Step 4 – Signal Transmission and Noise Injection: Transmit a DM-RS-only waveform through the configured TDL channel. Apply AWGN at the selected SNR. Demodulate the received waveform using OFDM demodulation consistent with the transmit configuration.

Step 5 – Input–Label Construction: Compute LS channel estimates at pilot positions and apply 2-D linear interpolation across time and frequency to obtain the input tensor. Extract the perfect channel response from the channel model as the label tensor. The dataset stores raw inputs and labels; normalization can be applied by the user during training if desired.

Step 6 – Metadata Logging and Validation: Log all scenario parameters, seeds, and configuration structures. Perform automated validation checks, including dimension consistency and physical sanity checks (e.g., NMSE trends versus SNR and Doppler).

Step 7 – Storage: Save the dataset in MATLAB v7.3 (.mat) and HDF5 (.h5) formats with compression. Each file contains the tensors, metadata logs, and configuration structures required for reuse or regeneration.

Following these steps guarantees exact reproducibility and enables controlled reconfiguration of ChanEst for new antenna setups, parameter ranges, or learning paradigms.
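The logic of Steps 3 and 5 can be sketched in NumPy for readers who want to see it outside MATLAB. This is a simplified illustration under stated assumptions, not the released generation code. The strata are equal-width on a linear scale (the description does not say whether linear or logarithmic strata are used). The pilot lattice is a hypothetical regular pattern rather than the DM-RS Type 2 layout. The channel is a smooth toy function rather than a TDL model. The helper names `stratified_uniform` and `ls_interp` are hypothetical.

```python
import numpy as np

def stratified_uniform(low, high, n_strata, per_stratum, rng):
    """Draw per_stratum uniform values from each of n_strata equal-width bins."""
    edges = np.linspace(low, high, n_strata + 1)
    draws = [rng.uniform(edges[i], edges[i + 1], per_stratum) for i in range(n_strata)]
    return np.concatenate(draws)

def interp_complex(x_new, x_known, values):
    """Linear interpolation of complex values (real and imaginary parts separately)."""
    return np.interp(x_new, x_known, values.real) + 1j * np.interp(x_new, x_known, values.imag)

def ls_interp(rx_pilots, tx_pilots, pilot_sc, pilot_sym, n_sc, n_sym):
    """LS estimate at pilots, then separable linear interpolation over the grid.

    np.interp holds the edge value constant outside the pilot range, so
    positions beyond the outermost pilots are not linearly extrapolated.
    """
    h_ls = rx_pilots / tx_pilots  # least-squares estimate at pilot positions
    sc, sym = np.arange(n_sc), np.arange(n_sym)
    # Interpolate along frequency for each pilot-bearing symbol.
    h_freq = np.stack([interp_complex(sc, pilot_sc, h_ls[:, j])
                       for j in range(len(pilot_sym))], axis=1)
    # Interpolate along time for each subcarrier.
    return np.stack([interp_complex(sym, pilot_sym, h_freq[k]) for k in range(n_sc)], axis=0)

rng = np.random.default_rng(42)
snr_draws = stratified_uniform(-10.0, 30.0, n_strata=8, per_stratum=5, rng=rng)

# Toy grid and hypothetical pilot lattice (not the DM-RS Type 2 pattern).
n_sc, n_sym = 48, 14
pilot_sc = np.arange(0, n_sc, 6)
pilot_sym = np.array([2, 11])
k = np.arange(n_sc)[:, None]
l = np.arange(n_sym)[None, :]
h_true = (1.0 + 0.01 * k) + 1j * (0.02 * l)  # smooth toy channel, plays the role of Y_label

tx = np.full((len(pilot_sc), len(pilot_sym)), (1 + 1j) / np.sqrt(2))
rx_clean = h_true[np.ix_(pilot_sc, pilot_sym)] * tx
h_clean = ls_interp(rx_clean, tx, pilot_sc, pilot_sym, n_sc, n_sym)

# Noisy case at the first drawn SNR (defined here relative to unit pilot power).
noise_std = np.sqrt(10.0 ** (-snr_draws[0] / 10.0) / 2.0)
noise = noise_std * (rng.standard_normal(rx_clean.shape) + 1j * rng.standard_normal(rx_clean.shape))
x_input = ls_interp(rx_clean + noise, tx, pilot_sc, pilot_sym, n_sc, n_sym)
print(snr_draws.shape, x_input.shape)  # (40,) (48, 14)
```

Because the toy channel is linear along each axis, the noise-free interpolation reproduces it exactly between the outermost pilots. A TDL channel is not linear between pilots, which is one source of the estimation error that a DL model trained on ChanEst is meant to reduce.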
Institutions
- Edge Hill University, Ormskirk, England
- Sheffield Hallam University, Sheffield, England
- Asia Pacific University of Technology & Innovation, Kuala Lumpur