Breaking the black box barrier: predicting remaining useful life under uncertainty from raw images with interpretable neural networks
In recent years, prognostics has emerged as a focal point across various industries, gaining substantial attention for its ability to optimize maintenance schedules, elevate operational efficiency, and avert costly unplanned downtime. At the heart of prognostics lies the paramount parameter of Remaining Useful Life (RUL), signifying the critical time remaining before system failure. Recent advancements in deep learning have enabled RUL forecasting by extracting features from a range of data formats: time series, images, or sequences of images, representing one-, two-, or three-dimensional data respectively. While one-dimensional data is frequently encountered in the literature, current approaches for predicting RUL from (sequences of) images still rely heavily on techniques such as digital image correlation. These methods bring with them substantial computational overheads and intricate data acquisition strategies. Furthermore, the challenge of predicting RUL from high-dimensional data is exacerbated by the unreliable characteristics of deep learning models. In this regard, this study introduces an innovative Transformer-based deep learning architecture designed to tackle the dual challenges posed by high-dimensional data and the black-box nature of existing models. Building upon the remarkable achievements of Transformers in natural language processing and computer vision, their distinctive attention mechanism proves instrumental in realizing RUL predictions under uncertainty with just a sparse set of raw image sequences as input. By decomposing the spatiotemporal domain and harnessing the power of attention, our model adeptly sidesteps the black-box limitations of such architectures, thus enabling transparent and interpretable predictions. The proposed architecture is evaluated on an experimental dataset acquired from a real composite structure under fatigue loading, with visible cracks that propagate over time.
Given unprocessed sequences of raw images as inputs, our model efficiently estimates the stochastic RUL. Concurrently, by leveraging the attention mechanism, we demonstrate a strong correlation between the model's spatiotemporal attention over the input sequences and the RUL, making it, to the best of our knowledge, the first model to provide interpretable stochastic RUL predictions directly from sequential images of this nature.
Steps to reproduce
There are 7 specimens, namely A003, A009, A011, A012, A013, A014, and A019. Each specimen contains a sequence of images spanning from pristine condition to end of life, and each sequence is captured by 2 cameras simultaneously. The name of each image therefore follows the pattern A***_*_***_**, where the first 3 digits (masked by asterisks) correspond to the specimen number, the next digit to the camera that captured the image (0 or 1), the next 3 digits to the timestep at which the image was captured, and the last 2 digits to the nth image of the sequence. The code provided on GitHub (see link below) suffices to reproduce the methodology and results presented in the paper.
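The naming convention above can be parsed mechanically. The following sketch shows one way to split a filename stem into its four fields; the function name, the returned dictionary keys, and the example filenames are illustrative assumptions, not part of the dataset, and the field widths are taken from the description (3-digit specimen, 1-digit camera, 3-digit timestep, 2-digit image index).

```python
import re

# Assumed pattern for the convention A***_*_***_** described above:
# 'A' + 3-digit specimen, 1-digit camera (0 or 1), 3-digit timestep,
# 2-digit index of the image within the sequence.
FILENAME_RE = re.compile(r"^A(\d{3})_([01])_(\d{3})_(\d{2})$")

def parse_image_name(name: str) -> dict:
    """Split an image filename stem into its four fields."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"Unexpected filename format: {name}")
    specimen, camera, timestep, index = m.groups()
    return {
        "specimen": f"A{specimen}",  # e.g. 'A003'
        "camera": int(camera),       # 0 or 1
        "timestep": int(timestep),   # timestep at which the image was captured
        "index": int(index),         # nth image of the sequence
    }

# Hypothetical example filename, following the stated convention:
print(parse_image_name("A003_0_012_05"))
```

A helper like this makes it straightforward to group images by specimen, camera, and timestep before feeding the sequences to the model.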