# Diesel Engine Faults Features Dataset (3500-DEFault)

## Description

This dataset supports fault diagnosis in diesel engines for predictive maintenance, based on analysis of the variation of the pressure curves inside the cylinders and of the torsional vibration response of the crankshaft. To this end, a fault simulation model based on a zero-dimensional thermodynamic model was developed. The feature vectors were chosen from the thermodynamic model and obtained by processing signals such as the pressure and temperature inside the cylinder and the torsional vibration of the engine's flywheel. These vectors are used as input to a machine learning technique in order to discriminate among several machine conditions.

The database is expected to emulate all operating scenarios under study: in our case, the possible diesel engine faults and variations in system conditions, at severity levels that contain enough information to characterize and discriminate the faults. The database covers the following operating conditions:

- Normal (without faults)
- Pressure reduction in the intake manifold
- Compression ratio reduction in the cylinders
- Reduction of the amount of fuel injected into the cylinders

In all scenarios, the engine rotation speed was set at 2500 RPM. This speed was chosen because, during validation of the thermodynamic and dynamic models, it gave the lowest joint error in the estimates of the mean and maximum pressures of the combustion cycle between the experimental data (supplied by the manufacturer) and the simulated data.

The entire database comprises 3500 different scenarios across 4 distinct operating conditions: 250 from the normal class, 250 from the "pressure reduction in the intake manifold" class, 1500 from the "compression ratio reduction in the cylinders" class, and 1500 from the "reduction of amount of fuel injected into the cylinders" class. It is named the 3500-DEFault database.
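As a quick sanity check, the class composition can be tallied in a few lines of Python. This is only a sketch: the dictionary keys are hypothetical labels chosen for illustration, not names taken from the data files, and the per-cylinder breakdown (250 scenarios for each of the 6 cylinders) comes from the reproduction steps in this README.

```python
# Scenario counts per class, as stated in this README.
# The key names are hypothetical labels, not identifiers from the data files.
SCENARIOS_PER_GROUP = 250  # scenarios per class, or per cylinder for cylinder faults
N_CYLINDERS = 6

class_counts = {
    "normal": SCENARIOS_PER_GROUP,
    "intake_manifold_pressure_reduction": SCENARIOS_PER_GROUP,
    "compression_ratio_reduction": SCENARIOS_PER_GROUP * N_CYLINDERS,
    "fuel_injection_reduction": SCENARIOS_PER_GROUP * N_CYLINDERS,
}

total = sum(class_counts.values())
print(class_counts)
print("total scenarios:", total)
```

The two cylinder-level fault classes dominate the database, so a classifier trained on it sees six times more examples of each of them than of the normal class.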

## Files

## Steps to reproduce

- Normal (without faults): in this class, no fault is implemented. 250 different instances (realizations) are created by perturbing the 27 severity input variables adopted from the thermodynamic model with a severity of at most 0.1%, drawn from a Gaussian (normal) probability distribution covering the range from 0 to 0.1%. The objective of this step is to emulate the real engine in normal operation, where the machine variables drift within a small range around optimal functioning.
- Pressure reduction in the intake manifold: scenarios with severities of [1, 2, 3, ..., 50] % in the intake manifold pressure reduction variable are considered, making a total of 250 "pressure reduction in the intake manifold" scenarios.
- Compression ratio reduction in the cylinders: this condition involves all cylinders. Scenarios with severities of [1, 2, 3, ..., 50] % in the compression ratio reduction variables are considered for the cylinders i = [1, 2, 3, 4, 5, 6], giving 250 different "compression ratio reduction" scenarios for each cylinder and a total of 1500 "compression ratio reduction" scenarios.
- Reduction of amount of fuel injected into the cylinders: this condition involves all cylinders. Scenarios with severities of [1, 2, 3, ..., 50] % in the fuel injection reduction variables are considered for the cylinders i = [1, 2, 3, 4, 5, 6], giving 250 different "reduction of amount of fuel injected" scenarios for each cylinder and a total of 1500 "reduction of amount of fuel injected into the cylinders" scenarios.

The feature extraction technique consists of estimating the mean and maximum pressure values from the 6 cylinder pressure signals and obtaining spectral information from the torsional vibration signal.
The measures adopted to discriminate the faults form the feature vector, which is generated in three steps. The first step is to estimate the maximum value of the pressure curve for each cylinder. The second step is to obtain the mean value of each cylinder's pressure curve. The third step is to calculate the spectral values of the torsional vibration curve, namely its first 24 harmonics (magnitude, phase and frequency of each), i.e., the first 24 half orders of the engine; with the rotation fixed at 2500 RPM, the first half order is 2500/120 Hz (about 20.83 Hz). The resulting feature vector has 84 dimensions (6 maximum pressures, 6 mean pressures, and 24 × 3 spectral values), and there are 3500 samples, i.e., the features form a matrix of 3500 rows by 84 columns. In the dataset, the first 84 columns hold the feature vector and the last 13 columns hold the severity levels (up to 50%) of the operating variables of the engine.
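The three steps can be illustrated with a minimal Python sketch using only the standard library. It is not the code used to build the dataset: the function names, the sampling rate, the synthetic signals, and the ordering of the features inside the vector (maxima, then means, then magnitude/phase/frequency triples) are all assumptions made for illustration. Each harmonic is estimated with a single-bin discrete Fourier transform evaluated at a multiple of the first half order.

```python
import cmath
import math

RPM = 2500
HALF_ORDER_HZ = RPM / 120   # first half order, about 20.83 Hz
N_CYLINDERS = 6
N_HARMONICS = 24


def harmonic(signal, fs, freq):
    """Magnitude and phase of `signal` at `freq` (Hz) via a single-bin DFT."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / fs)
              for k, x in enumerate(signal))
    coeff = 2 * acc / n  # scaled so a pure cosine returns its amplitude
    return abs(coeff), cmath.phase(coeff)


def feature_vector(pressures, vibration, fs):
    """Assumed layout: 6 maxima, 6 means, then 24 x (magnitude, phase, frequency)."""
    features = [max(p) for p in pressures]              # step 1: maximum pressures
    features += [sum(p) / len(p) for p in pressures]    # step 2: mean pressures
    for k in range(1, N_HARMONICS + 1):                 # step 3: half-order harmonics
        freq = k * HALF_ORDER_HZ
        mag, phase = harmonic(vibration, fs, freq)
        features += [mag, phase, freq]
    return features


# Synthetic example: one full engine cycle (two revolutions) sampled at an
# assumed rate of 20 kHz, which gives exactly 960 samples per cycle.
fs = 20000.0
n = 960
pressures = [[10 + i + math.sin(2 * math.pi * j / n) for j in range(n)]
             for i in range(N_CYLINDERS)]
vibration = [1.5 * math.cos(2 * math.pi * 3 * HALF_ORDER_HZ * j / fs + 0.5)
             for j in range(n)]

features = feature_vector(pressures, vibration, fs)
print(len(features))
```

Sampling an integer number of engine cycles, as in the synthetic example, places every half order on an exact frequency bin and avoids spectral leakage; real signals that do not satisfy this would need windowing or resampling.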