Python code and data for intelligent data-driven ensemble approaches for bending strength prediction of ultra-high performance concrete beams
Description
# Supplementary Material: Python Code for ML-Based Flexural Capacity Prediction

This notebook (`ML_UHPC_Flexure_Python_Code.ipynb`) provides the full implementation of the machine learning framework developed in the manuscript **"Intelligent data driven ensemble approaches for bending strength prediction of ultra-high performance concrete beams"**, using the dataset `data.xlsx`.

---

## 1. Data Import and Preprocessing

- Loads the harmonized database of **264 UHPFRC beam tests** compiled from 54 studies.
- Defines **10 input features** (geometry, reinforcement, and material properties) and the target variable (*ultimate bending moment capacity, Mc*).
- Performs preprocessing steps:
  - Winsorization of extreme values.
  - Feature engineering (e.g., computing the concrete area *Ac* and moment of inertia *Ic*).
  - Dataset partitioning into training (70%), validation (15%), and testing (15%) subsets.

---

## 2. Model Development and Hyperparameter Tuning

- Implements six ensemble algorithms:
  - Random Forest (RF)
  - Gradient Boosting Machine (GBM)
  - LightGBM
  - AdaBoost
  - CatBoost
  - XGBoost
- Hyperparameter tuning is performed via **Bayesian optimization** with **10-fold cross-validation**.
- Repeatability is ensured using multiple random seeds and error-bar reporting.

---

## 3. Model Evaluation and Benchmarking

- Evaluates models using **R², RMSE, MAE, and CoV**.
- Benchmarks ML predictions against **international and national design codes**:
  - Chinese UHPC draft, JGJ/T 465-2019
  - Swiss SIA 2052
  - ACI 318, ACI 544, FHWA
- Produces comparative plots of predicted vs. experimental capacities and prediction-to-experiment ratios.

---

## 4. Explainability via SHAP Analysis

- Uses **SHapley Additive exPlanations (SHAP)** to quantify feature importance.
- Identifies **effective depth (d)** and **reinforcement ratio (ρs)** as the most influential parameters.
- Provides:
  - Global SHAP importance ranking.
  - SHAP summary (beeswarm) plots.
  - SHAP dependence plots for feature interactions.

---

## 5. Uncertainty and Repeatability

- Performs multiple training runs with different random seeds to test robustness.
- Includes error bars in performance metrics for reliability.

---

### Purpose

This notebook ensures the **transparency and reproducibility** of the proposed ML framework. It enables researchers and practitioners to:

- Apply the models to new UHPFRC beam datasets.
- Extend the methodology to other structural behaviors (e.g., shear, serviceability).
- Integrate **physics-informed constraints** into ensemble learning models.
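The preprocessing pipeline of Section 1 (winsorization, feature engineering, 70/15/15 split) can be sketched as follows. This is a minimal illustration, not the notebook's exact code: the column names, winsorization limits (1st/99th percentile), and rectangular-section formulas for *Ac* and *Ic* are assumptions.

```python
import numpy as np
import pandas as pd
from scipy.stats.mstats import winsorize
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data.xlsx -- real features come from the 264-beam database.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "b": rng.uniform(100, 300, 200),   # section width (mm), assumed name
    "h": rng.uniform(200, 500, 200),   # section height (mm), assumed name
    "d": rng.uniform(150, 450, 200),   # effective depth (mm), assumed name
    "Mc": rng.uniform(10, 400, 200),   # ultimate bending moment (kN*m)
})

# Winsorize extreme values (limits assumed at the 1st/99th percentiles).
for col in df.columns:
    df[col] = np.asarray(winsorize(df[col].to_numpy(), limits=(0.01, 0.01)))

# Feature engineering: concrete area Ac and moment of inertia Ic
# (rectangular-section formulas assumed).
df["Ac"] = df["b"] * df["h"]
df["Ic"] = df["b"] * df["h"] ** 3 / 12

# 70% training, then split the remaining 30% evenly into validation and test.
X, y = df.drop(columns="Mc"), df["Mc"]
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=42)
print(len(X_train), len(X_val), len(X_test))  # 140 30 30
```

Chaining two `train_test_split` calls is a common way to obtain a three-way split, since scikit-learn provides no single 70/15/15 helper.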
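The tuning loop of Section 2 can be sketched with the `bayesian-optimization` package listed among the dependencies: Bayesian optimization proposes hyperparameters, and 10-fold cross-validation scores each candidate. The search bounds, iteration counts, and synthetic data below are assumptions for illustration only; only Random Forest is shown, whereas the notebook tunes all six ensembles.

```python
from bayes_opt import BayesianOptimization
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the 10-feature beam dataset.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

def rf_cv(n_estimators, max_depth):
    # Mean 10-fold cross-validated R2 for one candidate hyperparameter pair.
    # bayes_opt works on continuous variables, so integer parameters are cast.
    model = RandomForestRegressor(
        n_estimators=int(n_estimators), max_depth=int(max_depth), random_state=0)
    return cross_val_score(model, X, y, cv=10, scoring="r2").mean()

opt = BayesianOptimization(
    f=rf_cv,
    pbounds={"n_estimators": (20, 100), "max_depth": (3, 15)},  # bounds assumed
    random_state=0,
)
opt.maximize(init_points=2, n_iter=3)  # small budget for illustration
print(opt.max["params"])
```

The best score and parameters are available afterwards in `opt.max`; in practice the iteration budget would be far larger than the five evaluations used here.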
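The SHAP analysis of Section 4 reduces, at its core, to the pattern below: a tree-based `TreeExplainer` produces per-sample, per-feature attributions, and ranking their mean absolute values gives the global importance ordering. The model, data, and sizes here are stand-ins, not the notebook's fitted models.

```python
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the fitted ensemble and the 10-feature beam data.
X, y = make_regression(n_samples=100, n_features=10, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # one attribution per sample and feature

# Global importance: mean |SHAP value| per feature, ranked descending.
global_importance = np.abs(shap_values).mean(axis=0)
ranking = np.argsort(global_importance)[::-1]
print("most influential features:", ranking[:3])
```

The same `shap_values` array feeds the summary (beeswarm) and dependence plots via `shap.summary_plot` and `shap.dependence_plot`.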
Files
Steps to reproduce
# Steps to Reproduce

1. **Download Data and Code**
   - Access the dataset and notebook from the Mendeley Data repository:
     - `data.xlsx` (harmonized database of 264 beams).
     - `ML_UHPC_Flexure_Python_Code.ipynb` (this notebook).
2. **Open the Notebook**
   - Use **Google Colab**, **Jupyter Notebook**, or any Python IDE that supports `.ipynb` files.
   - Upload both the dataset and the notebook into your environment.
3. **Install Dependencies**
   - Ensure the following Python libraries are available in your environment:
     - `numpy`
     - `pandas`
     - `scikit-learn`
     - `shap`
     - `xgboost`
     - `lightgbm`
     - `catboost`
     - `bayesian-optimization`
     - `matplotlib`
     - `seaborn`
   - In Colab, missing packages can be installed directly inside a cell.
4. **Run Notebook Cells in Sequence**
   - **Section 1 – Data Import & Preprocessing**: loads the dataset, applies winsorization and feature engineering, and performs the train/validation/test split.
   - **Section 2 – Model Development & Hyperparameter Tuning**: trains the six ensemble models with Bayesian optimization.
   - **Section 3 – Evaluation & Benchmarking**: computes R², RMSE, MAE, and CoV, and compares ML predictions with design codes.
   - **Section 4 – SHAP Explainability**: generates global importance, beeswarm, and dependence plots.
   - **Section 5 – Uncertainty & Repeatability**: runs multiple seeds and reports error bars for robustness.
5. **Verify Outputs**
   - Predicted vs. experimental bending capacity plots.
   - Ratios of ML predictions vs. code predictions.
   - SHAP plots highlighting effective depth (*d*) and reinforcement ratio (*ρs*).
   - Error bars showing repeatability across random seeds.
6. **Extend (Optional)**
   - Replace the dataset with new beam test data.
   - Adapt the pipeline for shear or serviceability prediction.
   - Incorporate physics-informed constraints into the ensemble models.
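When verifying outputs, the four headline metrics of Section 3 can be reproduced on any predicted/experimental pair as sketched below. The toy values are illustrative, and the CoV definition used here (standard deviation over mean of the predicted-to-experimental ratio) is an assumption about the notebook's convention.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy experimental vs. predicted capacities (kN*m) -- real values come from the models.
y_true = np.array([120.0, 85.0, 210.0, 150.0, 95.0])
y_pred = np.array([115.0, 90.0, 205.0, 158.0, 92.0])

r2 = r2_score(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # sqrt keeps units of kN*m
mae = mean_absolute_error(y_true, y_pred)
ratio = y_pred / y_true                              # Pred/Exp ratio per beam
cov = ratio.std(ddof=1) / ratio.mean()               # coefficient of variation of the ratio

print(f"R2={r2:.3f} RMSE={rmse:.2f} MAE={mae:.2f} CoV={cov:.3f}")
```

A CoV close to zero with a mean ratio near 1.0 indicates predictions that are both unbiased and consistent, which is the basis of the code-vs-ML comparison plots.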
Institutions
- Ceske Vysoke Uceni Technicke v Praze
- University of Zambia