Synthetic Optical Network Dataset with Q-Factor, BER, and Receiver Sensitivity Metrics under EDFA-FBG Conditions
Description
This dataset presents a large-scale synthetic simulation of performance parameters in fiber-optic communication systems, designed to evaluate the impact of EDFA (Erbium-Doped Fiber Amplifier) and FBG (Fiber Bragg Grating) components under varying transmission distances and conditions. It comprises 1,000,000 simulated records, each capturing key optical network metrics: Q-Factor, Bit Error Rate (BER), and Receiver Sensitivity, for both the downstream and upstream transmission directions.

The core objective is to provide a controlled and reproducible framework for studying how signal quality degrades or improves with fiber distance and with the presence or absence of EDFA/FBG devices. The data were generated by stochastic modeling based on empirical trends observed in real-world optical experiments.

The following parameters are included in the dataset:
- Distance_km: The length of the optical fiber in kilometers. Values include typical transmission spans (0, 10, 20, 40, 45, 70, and 75 km).
- Q_Factor_Downstream and Q_Factor_Upstream: Quality factor metrics for the downstream and upstream channels respectively. These values are adjusted based on the transmission distance and the presence of EDFA/FBG to reflect signal integrity.
- BER_Downstream and BER_Upstream: Bit Error Rates computed from the Q-factor using an exponential decay model that mimics realistic signal degradation in optical fibers.
- Receiver_Sensitivity_Downstream and Receiver_Sensitivity_Upstream: Values representing how sensitive the receiver is to signal quality degradation at different distances, influenced by the transmission condition.
- Power_Level_dBm: Simulated variation in power levels (in dBm) to account for fluctuations due to hardware or environmental conditions.
- Noise_Factor: A synthetic noise parameter that introduces random variations to simulate physical imperfections and disturbances in signal transmission.
- Condition: Indicates whether the data point is simulated under the "No_EDFA_FBG" or the "With_EDFA_FBG" condition.

The dataset is intended as a resource for research in several domains, including:
- Optical Network Simulation and Modeling
- Machine Learning for Optical Systems
- Performance Prediction and Optimization in Fiber-Optic Networks
- Benchmarking of AI-based Diagnostic Tools in Telecommunications

Researchers can use this dataset to train, validate, and benchmark machine learning models for signal classification, fault detection, adaptive modulation, or link quality prediction. Because both amplified and unamplified scenarios are included, the dataset also supports comparative studies and ablation analysis. Being synthetic, it is fully reproducible, extendable, and free from real-world acquisition constraints, which makes it suitable for academic and industrial experimentation, prototyping, and algorithm development in next-generation optical communication systems.
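The description above does not give the exact formula that maps Q-factor to BER in this dataset. As a point of reference only, the short sketch below uses the textbook Gaussian-noise estimate BER = 0.5 * erfc(Q / sqrt(2)), which falls off roughly exponentially with Q². The function name `q_to_ber` is hypothetical, and the dataset's own model may differ from this relation.

```python
import math


def q_to_ber(q_linear: float) -> float:
    """Textbook estimate, assumed here for illustration only:
    BER = 0.5 * erfc(Q / sqrt(2)), with Q on a linear (not dB) scale."""
    return 0.5 * math.erfc(q_linear / math.sqrt(2))


# Print the estimate for a few illustrative Q values.
for q in (3.0, 6.0, 7.0):
    print(f"Q = {q:.1f} -> BER ~ {q_to_ber(q):.2e}")
```

Under this relation a Q-factor of 6 corresponds to a BER of about 1e-9, a useful sanity check when comparing the Q_Factor and BER columns.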
Files
Steps to reproduce
To reproduce the synthetic optical dataset, you will need a Python environment with the `pandas` and `numpy` libraries installed. These can be installed using pip with the following command: `pip install pandas numpy`.

Once your environment is ready, prepare the Python script as provided in the dataset documentation. This script contains all the functions needed to generate the synthetic data, including modules for simulating Q-Factor values, calculating Bit Error Rate (BER), and estimating Receiver Sensitivity for both downstream and upstream communication channels. Each function models behavior based on empirical trends and includes stochastic elements so that samples vary.

The script allows several parameters to be customized. The `num_samples` parameter controls the number of data points generated; for the published version of the dataset it was set to 1,000,000. The `distances` list defines the optical fiber lengths used in the simulation and contains the values [0, 10, 20, 40, 45, 70, 75] kilometers, which reflect common practical distances in optical communication studies. Users may modify this list to explore performance under different network scenarios.

Once the parameters are set, the script can be executed in any standard Python environment or Jupyter notebook. Running the script simulates each sample from a randomly selected distance and condition (either "No_EDFA_FBG" or "With_EDFA_FBG"). The simulation incorporates signal degradation behavior, amplification benefits, and receiver noise characteristics. Upon completion, the resulting dataset is stored in a pandas DataFrame, which is then exported as a CSV file using the command `df.to_csv("synthetic_optical_dataset.csv", index=False)`.

This file contains all relevant features: Distance (in km), Q-Factor, BER, Receiver Sensitivity, Power Level, Noise Factor, and the Condition label indicating whether signal enhancements were used. The CSV file can be used directly for machine learning model training and testing, signal quality prediction, performance benchmarking, and simulation-based analysis of fiber-optic networks. Because the script is structured and reproducible, the dataset can be regenerated easily, and the parameters can be tuned to produce customized datasets for specific research needs such as comparative studies, prototyping, and algorithm development in optical telecommunications.
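The overall shape of the steps above can be sketched as follows. This is not the script from the dataset documentation: the function name `generate`, the `seed` argument, every numeric coefficient, and the Q-to-BER relation are placeholder assumptions chosen only to make the example run. Only the column names, the distance list, and the two condition labels come from the description.

```python
import math

import numpy as np
import pandas as pd

DISTANCES_KM = [0, 10, 20, 40, 45, 70, 75]      # from the description
CONDITIONS = ["No_EDFA_FBG", "With_EDFA_FBG"]   # from the description


def generate(num_samples: int = 1000, seed: int = 0) -> pd.DataFrame:
    """Hypothetical generator; all trends and constants are assumptions."""
    rng = np.random.default_rng(seed)
    distance = rng.choice(DISTANCES_KM, size=num_samples)
    condition = rng.choice(CONDITIONS, size=num_samples)
    amplified = condition == "With_EDFA_FBG"

    data = {
        "Distance_km": distance,
        "Power_Level_dBm": rng.normal(0.0, 1.0, num_samples),  # placeholder spread
        "Noise_Factor": rng.uniform(0.0, 1.0, num_samples),    # placeholder range
        "Condition": condition,
    }
    erfc = np.vectorize(math.erfc)
    for direction in ("Downstream", "Upstream"):
        # Placeholder trend: Q drops with distance, more slowly when amplified.
        slope = np.where(amplified, 0.03, 0.08)
        q = 10.0 - slope * distance - data["Noise_Factor"]
        q = np.clip(q + rng.normal(0.0, 0.2, num_samples), 0.5, None)
        data[f"Q_Factor_{direction}"] = q
        # Assumed Gaussian-noise relation; the real script may use another model.
        data[f"BER_{direction}"] = 0.5 * erfc(q / math.sqrt(2))
        # Placeholder sensitivity trend with distance.
        data[f"Receiver_Sensitivity_{direction}"] = (
            -30.0 + 0.1 * distance + rng.normal(0.0, 0.5, num_samples)
        )

    columns = [
        "Distance_km",
        "Q_Factor_Downstream", "Q_Factor_Upstream",
        "BER_Downstream", "BER_Upstream",
        "Receiver_Sensitivity_Downstream", "Receiver_Sensitivity_Upstream",
        "Power_Level_dBm", "Noise_Factor", "Condition",
    ]
    return pd.DataFrame(data)[columns]


df = generate(num_samples=1000, seed=0)
print(df.groupby("Condition")["Q_Factor_Downstream"].mean())
```

Calling `generate(num_samples=1_000_000)` and then `df.to_csv("synthetic_optical_dataset.csv", index=False)` would produce a file with the same columns as the published dataset, though not the same values. Fixing the seed is one way to make repeated runs of such a sketch identical.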