Real Electronic Signal Data from Particle Accelerator Power Systems for Machine Learning Anomaly Detection
This work describes real time-series datasets collected from the high voltage converter modulators (HVCMs) of the Spallation Neutron Source facility. HVCMs power the linear accelerator klystrons, which in turn produce the high-power radio frequency used to accelerate the negative hydrogen ions (H−). Waveform signals were collected during 2020-2022 from the operation of more than 15 HVCM systems, which are categorized into four major subsystems. The data were collected at the Spallation Neutron Source facility in Oak Ridge, Tennessee, United States. There are two datasets for each of the four subsystems: the first contains the waveform signals, and the second contains the label of each waveform, indicating whether the signal is normal or faulty. The waveforms include insulated-gate bipolar transistor (IGBT) currents in three phases, magnetic flux in the three phases, modulator current and voltage, capacitor bank current and voltage, and the time derivative of the modulator voltage. The datasets are useful for developing and testing machine learning and statistical algorithms for anomaly detection, system fault detection and classification, and signal processing.
Steps to reproduce
Please read the open-access paper below, which describes the full dataset and how to use it.
- Dataset article: Radaideh, M. I., Pappas, C., Cousineau, S., Real electronic signal data from particle accelerator power systems for machine learning anomaly detection. Data in Brief, in press, 2022, pp. 108473.
- Article link: https://doi.org/10.1016/j.dib.2022.108473
Dataset Specifications:
- General subject: Electrical and Electronic Engineering
- Specific subject areas: Applied machine learning, signal processing, anomaly detection
- Type of data: Table -- time series (see Section 1 of the paper)
- Data usage: A simple Python script called "load_data.py" is provided with the dataset to show the user how to load and plot the data.
- How data were acquired: The system controller collects waveform signals from an accelerator power subsystem at a sampling interval of 400 ns (a sampling rate of 2.5 MHz) and writes them to a hard drive on the controller's computer.
- Data format: The data are minimally preprocessed and saved as binary NumPy files (see Section 2.2 of the paper).
- Readable data: Excerpts of the binary data are provided in "sample_data.xlsx" in human-readable form, to give the user an impression of the nature and structure of the binary data.
- Parameters for data collection: Time series data were collected from real-time operation of 15 different high voltage converter modulator systems during the period 2020-2022. Each system features 14 unique waveforms with both normal and anomalous signals. The label of each waveform is provided in a separate file (see Section 2.1 of the paper).
- Description of data collection: The raw data from the controller were preprocessed to remove erroneous signals that look like white noise. The relevant pulses, each 1.8 ms long, were extracted from the waveforms to discard the time steps during which the system is idle, which significantly reduces the data size (see Section 2.2 of the paper).
- Data source location: Spallation Neutron Source, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States.
- Related work: See "Related links" below.
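The timing figures and file layout in the specifications above can be sketched in a few lines of Python. This is a minimal illustration, not the "load_data.py" script shipped with the dataset. The function names are hypothetical, and the sketch assumes that each subsystem has one waveform file and one label file in NumPy ".npy" format whose first axis indexes pulses. The actual file names and array layout should be checked against the paper and "load_data.py".

```python
import numpy as np

SAMPLE_INTERVAL_S = 400e-9  # 400 ns between samples, as stated in the specifications
PULSE_LENGTH_S = 1.8e-3     # 1.8 ms extracted pulse, as stated in the specifications


def samples_per_pulse(pulse_length_s=PULSE_LENGTH_S,
                      sample_interval_s=SAMPLE_INTERVAL_S):
    """Number of samples implied by the pulse length and the sampling interval."""
    return int(round(pulse_length_s / sample_interval_s))


def time_axis_ms(n_samples, sample_interval_s=SAMPLE_INTERVAL_S):
    """Time stamps in milliseconds for a pulse of n_samples points."""
    return np.arange(n_samples) * sample_interval_s * 1e3


def load_subsystem(waveform_path, label_path):
    """Load one subsystem's waveform array and its label array.

    Assumption (not confirmed by the description): the first axis of both
    arrays indexes pulses, so their lengths must match. If the label file
    stores Python objects, np.load may additionally need allow_pickle=True.
    """
    waveforms = np.load(waveform_path)
    labels = np.load(label_path)
    if waveforms.shape[0] != labels.shape[0]:
        raise ValueError(
            f"{waveforms.shape[0]} waveforms but {labels.shape[0]} labels"
        )
    return waveforms, labels


# Example usage with placeholder paths (replace with the real file names):
# X, Y = load_subsystem("waveforms.npy", "labels.npy")
# t = time_axis_ms(X.shape[1])  # if the second axis is time
```

Under these figures, a 1.8 ms pulse sampled every 400 ns corresponds to 4500 points per waveform, which is a quick sanity check on the shape of any loaded array.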