# Epit and Epass descriptors of 316L stainless steel estimated by Machine Learning

## Description

This database comprises 5 datasets of pitting/passivity descriptors estimated from Potentiodynamic Polarisation (PP) curves. Each dataset consists of 1 CSV file with the following features (columns): Epit_x, Epit_y, Epass_x, Epass_y. The “Maps” indexes (rows/data samples) correspond to the numbering of the PP tests. Epit_x and Epass_x correspond to Epit and Epass (V); Epit_y and Epass_y correspond to log(jpit) and log(jpass) (µA/cm²).

This descriptors database was derived from the 5 datasets of log(j) vs E curves obtained in high-throughput fashion with the SECCM on 316L stainless steel (5 different combinations of [NaCl] and scan rates). The descriptors datasets contain the same number of data samples as the source (log(j) vs E) datasets (287, 377, 119, 125 and 47), available at:

Bertolucci Coelho, Leonardo; Ustarroz, Jon (2023), “Micro-scale potentiodynamic polarisation (log(j)) curves of 316L stainless steel”, Mendeley Data, V1, doi: 10.17632/7j6b6y48jw.1

This descriptors database was deployed as described in the following scientific article, accepted for publication in npj Materials Degradation on 25 September 2023:

“Estimating pitting descriptors of 316L stainless steel by machine learning and statistical analysis”. Leonardo Bertolucci Coelho¹,²,*, Daniel Torres¹, Vincent Vangrunderbeek², Miguel Bernal¹, Gian Marco Paldino³, Gianluca Bontempi³, Jon Ustarroz¹,²

¹ ChemSIN – Chemistry of Surfaces, Interfaces and Nanomaterials, Université libre de Bruxelles (ULB), Brussels, Belgium

² Research Group Electrochemical and Surface Engineering (SURF), Vrije Universiteit Brussel, Brussels, Belgium

³ Machine Learning Group (MLG), Université libre de Bruxelles (ULB), Brussels, Belgium

\* leonardo.bertolucci.coelho@ulb.be

In “Estimating pitting descriptors of 316L stainless steel by machine learning and statistical analysis”, we provide a methodology for estimating Epass (passive potential) and Epit (pitting potential) from:

1. typical log(j) vs E curves with a straightforward passivity breakdown (using an algorithm based on linear regression (LR));
2. PP curves with more unique profiles, mainly due to metastable events (using Artificial Neural Networks (ANN) trained on the LR estimates).

For further details on the acquisition of the PP curves, please refer to:

Bertolucci Coelho, Leonardo (2023), “Micro-scale potentiodynamic polarisation curves of 316L stainless steel”, Mendeley Data, V3, doi: 10.17632/78rz8vw46x.3

Coelho, L. B. et al. Probing the randomness of the local current distributions of 316 L stainless steel corrosion in NaCl solution. Corros. Sci. 217, 111104 (2023).

## Steps to reproduce

The ML methods used are accessible in the Code files (also made available on GitHub) that can be downloaded with the article “Estimating pitting descriptors of 316L stainless steel by machine learning and statistical analysis”.

1. Rule-based linear regression

A deterministic rule-based algorithm, based on LR, estimated Epit/jpit (or Epass/jpass) descriptor pairs (continuous values) from polarisation curves. The (X,Y) coordinates obtained for Epit (or Epass) are those that maximise the sum of the R² values of the two LRs. This method was our initial labelling strategy, providing labels (data targets) for the unlabelled data. The labels were validated by visual examination of the Epit/log(jpit) (or Epass/log(jpass)) points on the individual curves.

2. ANN

Where the LR estimates were unsatisfactory, the strategy was to employ supervised ANN. The ANN was trained on the set of satisfactory estimates and then deployed on the unsatisfactory examples. Contrary to standard practice, where a fixed proportion of the data (e.g., 20%) is randomly selected for testing, our test sets consisted specifically of the challenging samples.

To focus on the regions of the PP curves that encompass the passivity/pitting descriptors, the smoothed log(j) vs E curves were partitioned through data slicing. Sparse sampling was conducted at every 40th (or 60th) point of the sliced log(j) array, leading to a final selection of 13 (or 12) log(j) values (for Epit or Epass). These numbers of input features were found to represent the target regions of the PP curves adequately: reducing the curves to a selection of log(j) values that are linearly spaced in terms of their “E (V) stamps” was sufficient to describe the relevant regions. To improve model convergence, we applied StandardScaler (sklearn.preprocessing) to the sliced log(j) data to standardise the input and output data.
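The slicing, sparse sampling and standardisation described above can be sketched roughly as follows. This is a minimal illustration and not the published code: the helper `sparse_features`, the window indices, the curve length of 1000 points and the synthetic data are all assumptions chosen only so that the strides of 40 and 60 yield 13 and 12 features.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def sparse_features(log_j, start, stop, step):
    """Slice a smoothed log(j) array and keep every `step`-th point.

    `start` and `stop` are hypothetical window bounds around the
    region of interest; they are not taken from the original work.
    """
    return np.asarray(log_j)[start:stop:step]


# Synthetic stand-in for 20 smoothed log(j) curves of 1000 points each.
rng = np.random.default_rng(0)
curves = rng.normal(size=(20, 1000))

# Epit case: assumed 500-point window, every 40th point -> 13 features.
X_epit = np.stack([sparse_features(c, 300, 800, 40) for c in curves])

# Epass case: assumed 720-point window, every 60th point -> 12 features.
X_epass = np.stack([sparse_features(c, 100, 820, 60) for c in curves])

# Standardise each feature column to zero mean and unit variance.
scaler = StandardScaler()
X_epit_scaled = scaler.fit_transform(X_epit)

print(X_epit.shape, X_epass.shape)  # (20, 13) (20, 12)
```

In practice the scaler would be fitted on the labelled (training) curves only and then reused, via `scaler.transform`, on the curves to be estimated.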
A sequential model (keras.models.Sequential) was defined, generating classic multi-layer perceptron networks. The number of nodes in the input layer was equal to the number of input descriptors (12 or 13 log(j) values). The output layer consisted of a single node, providing only one output (Epit or Epass). Given that log(j) is a function of E, the Epit and Epass estimates sufficed for finding the corresponding log(jpit) and log(jpass) values.

The network’s topology, including the number of hidden layers and the number of nodes within each layer, was determined through exploratory testing and visual validation. The loss function was the mean squared error (MSE), minimised with the Adam optimiser. ReLU activation was used in the input/hidden layers. The number of batches was equal to the number of training samples.

After preliminary training, the network was pruned (tensorflow_model_optimization) and further validated by random sampling (validation_split=0.1) of the labelled dataset. Once validation was satisfactory, the final stage consisted of retraining the model on the labelled dataset.
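A network of the kind described above might be set up along these lines. This is a hedged sketch, not the authors' model: the helper `build_mlp`, the hidden-layer sizes, the epoch count and the synthetic data are assumptions, `batch_size=1` is only one literal reading of "number of batches equal to the number of training samples", and the pruning step with tensorflow_model_optimization is omitted.

```python
import numpy as np
from tensorflow import keras


def build_mlp(n_inputs, hidden_units=(16, 8)):
    """Multi-layer perceptron with ReLU layers and a single linear output.

    `hidden_units` is a placeholder topology; the original one was
    chosen by exploratory testing and is not reproduced here.
    """
    model = keras.Sequential()
    model.add(keras.Input(shape=(n_inputs,)))
    for units in hidden_units:
        model.add(keras.layers.Dense(units, activation="relu"))
    model.add(keras.layers.Dense(1))  # one output: Epit or Epass
    model.compile(optimizer="adam", loss="mse")
    return model


# Synthetic stand-in for standardised inputs (13 log(j) values per curve)
# and standardised targets (one potential per curve).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 13)).astype("float32")
y_train = rng.normal(size=(50, 1)).astype("float32")

model = build_mlp(n_inputs=X_train.shape[1])
history = model.fit(
    X_train,
    y_train,
    epochs=5,              # arbitrary, for illustration only
    batch_size=1,          # assumption: one sample per batch
    validation_split=0.1,  # hold out 10% of the labelled data
    verbose=0,
)

# Predict the (standardised) potential for a few curves.
predictions = model.predict(X_train[:3], verbose=0)
```

If full-batch training was intended instead, `batch_size=len(X_train)` would be the corresponding setting. The predicted potential can then be mapped back to volts with the inverse transform of whichever scaler was fitted on the targets.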

## Funding

Fonds De La Recherche Scientifique - FNRS

Chargé de recherches - CR