Operational Data for Fault Prognosis in Particle Accelerators with Machine Learning
This dataset presents real operational data collected from the power systems of the Spallation Neutron Source facility, which provides the most intense neutron beam in the world. The authors established a radio-frequency test facility (RFTF) and simulated system failures in the lab without causing a catastrophic failure. Waveform signals were collected from the RFTF during normal operation as well as during fault induction efforts. The dataset provides a significant number of normal and faulty signals for training statistical or machine learning models. The authors then performed 21 test experiments in which faults were slowly induced into the RFTF system, so that models can be tested on fault prognosis, that is, on detecting and preventing impending faults. The test experiments combine magnetic flux compensation and start pulse width adjustments, which cause gradual deterioration in the waveforms (e.g., system output voltage, system output current, insulated-gate bipolar transistor currents, magnetic fluxes) and thereby mimic the fault scenarios. Accordingly, this dataset can be valuable for developing models that predict impending faults in power systems in general and in particle accelerators in particular. All experiments took place at the Spallation Neutron Source facility of Oak Ridge National Laboratory in Oak Ridge, Tennessee, United States, in July 2022.
Steps to reproduce
Dataset Specifications:
- General subject: Electrical and Electronic Engineering
- Specific subject areas: Fault prognosis, machine learning, control engineering
- Type of data: Table (time series)
- Data usage: A simple Python script called "load_dataset.py" is provided with the dataset to show the user how to load and plot the data.
- How data were acquired: The radio-frequency test facility (RFTF) data acquisition system records 12 waveforms with a time length of 1.5 ms and a sampling interval of 400 ns, then writes them to an external hard drive. The data are structured in 3D arrays and saved to binary NumPy files.
- Data format: Raw.
- Readable data: Excerpts of the binary data are provided in "data/train/sample_train_data.xlsx" and "data/test/sample_test_data.xlsx". These human-readable files give the user an impression of the nature and structure of the binary data.
- Parameters for data collection: Time series data were collected from real-time operation of the RFTF high voltage converter modulator system during July 2022. The dataset features 12 distinct waveforms, including IGBT currents, modulator voltage and current, capacitor bank voltage, and magnetic flux.
- Description of data collection: The raw data from the controller are reported without preprocessing. Each pulse in the dataset has a time length of 1.5 ms. The dataset includes a prognosis test set of 21 experiments in which fault precursors are induced into the system. Depending on the machine condition, a recorded pulse may be normal or faulty.
- Data source location: Spallation Neutron Source, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States.
- Related work: See "Related links" below.
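As a rough illustration of the loading step described above, the following Python sketch reads one binary NumPy file, checks that it holds a 3D array, and builds a time axis from the stated 400 ns sampling interval. It is not the provided "load_dataset.py" script. The function names are hypothetical, and the axis order (pulse, sample, waveform) is an assumption that should be checked against the actual files.

```python
import numpy as np

# 400 ns between consecutive samples, as stated in the specifications above.
SAMPLING_INTERVAL_S = 400e-9


def load_pulses(path):
    """Load one binary NumPy (.npy) file and check that it holds a 3D array."""
    data = np.load(path)
    if data.ndim != 3:
        raise ValueError(f"expected a 3D array, got shape {data.shape}")
    return data


def time_axis_ms(n_samples, dt=SAMPLING_INTERVAL_S):
    """Return the time stamp of each sample in milliseconds."""
    return np.arange(n_samples) * dt * 1e3


def plot_pulse(data, pulse_index=0):
    """Plot every waveform of one pulse against time.

    ASSUMPTION: the array axes are ordered (pulse, sample, waveform).
    Verify this against the real files before relying on the plot.
    """
    # Imported here so that loading still works without matplotlib installed.
    import matplotlib.pyplot as plt

    pulse = data[pulse_index]          # shape: (samples, waveforms)
    t = time_axis_ms(pulse.shape[0])
    for k in range(pulse.shape[1]):
        plt.plot(t, pulse[:, k], label=f"waveform {k}")
    plt.xlabel("time (ms)")
    plt.ylabel("raw value")
    plt.legend()
    plt.show()
```

With a real file path from the dataset substituted for `path`, calling `plot_pulse(load_pulses(path))` would display the first pulse. Printing the array's `shape` first is a quick way to confirm which axis indexes the 12 waveforms.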