Solid Recovery, Delignification, and Sugar Yield in LCB pretreatment using Machine Learning and TOPSIS

Published: 28 January 2026 | Version 1 | DOI: 10.17632/74tccjwxhn.1
Contributors:
BISWANATH MAHANTY

Description

Pretreatment of SCB at varying NaOH concentration, temperature, and time, following a central composite design (CCD), is mapped to solid recovery, delignification, and sugar yield of the treated biomass. The dataset was augmented with Gaussian noise, and three different ML models were developed for the three responses. Multi-objective optimization was carried out with a genetic algorithm (GA), and the Pareto-optimal solutions were screened using the TOPSIS method.
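To make the TOPSIS screening step concrete, the following is a minimal sketch in Python (used here for illustration only; the scripts in this dataset are MATLAB). It assumes vector normalisation and Euclidean distances, which is a common TOPSIS variant and may differ from what the scripts do. The function name `topsis`, the three Pareto points, and the weights are hypothetical placeholders, not values from the dataset.

```python
import math


def topsis(matrix, weights, benefit):
    """Return a TOPSIS closeness score (0..1) for each row of `matrix`.

    matrix  : list of alternatives, each a list of criterion values
    weights : one weight per criterion
    benefit : True where larger is better, False where smaller is better
    Assumes no criterion column is all zeros and the rows are not all identical.
    """
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply its weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    # Ideal and anti-ideal points, criterion by criterion.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)  # distance to the ideal point
        d_neg = math.dist(row, anti)   # distance to the anti-ideal point
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Hypothetical Pareto points: [solid recovery, delignification, sugar yield].
pareto = [
    [70.0, 40.0, 30.0],
    [60.0, 55.0, 38.0],
    [50.0, 65.0, 35.0],
]
weights = [0.2, 0.3, 0.5]      # made-up weights, not the AHP-derived ones
benefit = [True, True, True]   # assumption: all three responses are maximised

scores = topsis(pareto, weights, benefit)
best = max(range(len(scores)), key=scores.__getitem__)
print("closeness scores:", [round(s, 3) for s in scores])
print("preferred alternative index:", best)
```

The alternative with the highest closeness score is the one nearest the ideal point and farthest from the anti-ideal point under the chosen weights, so changing the weights can change the ranking.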

Files

Steps to reproduce

Data Preparation and Linear Modeling
--------------------------------------
a0_Exp_RS.mat contains the experimental Central Composite Design (CCD) dataset comprising three predictors and three response variables.
a1_MLR_mdl.m develops multiple linear regression (MLR) models using the experimental dataset and stores the model coefficients, diagnostics, and performance metrics in a1_MLR_ori.mat.
a2_MLR_gendata.m inflates the original CCD dataset to a user-defined size through Gaussian noise injection, preserving the statistical structure of the experimental data. The augmented dataset is saved as a2_MLR_inflated.mat.

Machine Learning Model Development
--------------------------------------
b1_Bayesian_tree.m develops a Bayesian-optimized Random Forest (tree-based) model using the inflated dataset. The optimized model and associated metadata are stored in b1_tree_out.mat.
b2_Bayesian_fitrnet.m develops a Bayesian-optimized Artificial Neural Network (ANN) model using MATLAB’s fitrnet framework. The trained model and optimization results are saved in b2_fitrnet_out.mat.

Multi-Objective Optimization
--------------------------------------
c1_GA_Ind_min_max.m computes the individual minimum and maximum bounds for each response variable, which are required for normalization during multi-objective optimization. The results are stored in c1_Ind_min_max.mat.
d1_ANN_MOO_ga.m performs multi-objective optimization (MOO) using a genetic algorithm coupled with the ANN surrogate model. The resulting Pareto-optimal solution set is saved in d2_Multi_objective_result.mat.
d2_ANN_MOO_ga.m further analyzes the Pareto front using Analytic Hierarchy Process (AHP)-derived weights, TOPSIS ranking, and sensitivity analysis to identify robust and preferred solutions.

Statistical Analysis and Visualization
--------------------------------------
f0_mlrplot.m generates diagnostic and performance plots related to the MLR models.
f1_stat_heatmap.m produces a correlation heatmap illustrating the relationships between predictors and response variables.
f2_modelscatter.m generates scatter plots comparing experimental responses with model predictions.
f4_boxplot_dataquality.m compares the distribution of the original experimental data and the inflated dataset using box plots to assess data quality and consistency.
f3_PPE_Shaply_finalplot.m performs Shapley value analysis to quantify feature importance and interpretability of the machine-learning models.

Uncertainty Quantification
--------------------------------------
r1_bootstrap.m estimates bootstrap-based prediction intervals for the experimental data points using the trained machine-learning models.
r2_boot_figure.m generates the corresponding graphical representation of the bootstrap prediction intervals.
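For readers unfamiliar with Gaussian noise injection, the data-inflation idea described for a2_MLR_gendata.m can be pictured with the small Python sketch below. It is an illustration under stated assumptions, not a port of the MATLAB script: the function name `inflate`, the noise level (a fraction of each column's standard deviation), the choice to perturb every column, and the example rows are all hypothetical.

```python
import random
import statistics


def inflate(rows, target_size, noise_fraction=0.02, seed=0):
    """Grow `rows` to `target_size` by adding noisy copies of random rows.

    Each new row is an original row plus zero-mean Gaussian noise whose
    standard deviation is `noise_fraction` times that column's sample
    standard deviation (an assumed noise model). Needs at least two rows.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    col_sd = [statistics.stdev(col) for col in zip(*rows)]
    out = [list(r) for r in rows]  # keep the original points unchanged
    while len(out) < target_size:
        base = rng.choice(rows)
        out.append([v + rng.gauss(0.0, noise_fraction * sd)
                    for v, sd in zip(base, col_sd)])
    return out


# Made-up rows: [NaOH concentration, temperature, time, one response].
ccd_rows = [
    [1.0, 60.0, 30.0, 72.5],
    [2.0, 80.0, 60.0, 64.1],
    [3.0, 100.0, 90.0, 55.8],
]

augmented = inflate(ccd_rows, target_size=10)
print(len(augmented), "rows after inflation")
```

Keeping the noise small relative to each column's spread is what lets the inflated set stay close to the distribution of the experimental points, which is the property the box-plot comparison in f4_boxplot_dataquality.m is described as checking.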

Institutions

Karunya University

Categories

Artificial Neural Network, Machine Learning, Analytical Hierarchy Process, MOOs, Technique for Order of Preference by Similarity to Ideal Solution, Biomass Pretreatment, Random Decision Forest, Non-Dominated Sorting Genetic Algorithm II

Licence