CFD-informed machine learning surrogate dataset for thermo-hydraulic prediction in partially porous wavy channels
Description
This dataset supports the manuscript “CFD-Informed Machine Learning Surrogate Modeling for Thermo-Hydraulic Prediction in Partially Porous Wavy Channels for Heat Sink Applications.” It contains the cleaned long-form CFD-derived dataset, fixed train-test split information, trained-model outputs, prediction files, validation tables, plot data, and one final reproducibility script used to train Random Forest surrogate models for predicting average Nusselt number and pressure drop. The dataset includes 4,608 CFD-derived samples generated from 18 geometric configurations and 256 operating-condition combinations. The input features are Reynolds number, Prandtl number, Darcy number, porosity, porous slab thickness, wave amplitude, and wavelength. The target outputs are average Nusselt number and pressure drop.
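The sample count follows from the design: 18 geometric configurations × 256 operating-condition combinations = 4,608 samples. A minimal sketch of a sanity check on the extracted file is given below. It assumes the long-form CSV stores one sample per row and that the target columns are literally named Nuavg and DelP_Pa, as the reproduction steps suggest; the function name check_longform is hypothetical and not part of the repository.

```python
import csv

# 18 geometric configurations x 256 operating-condition combinations.
EXPECTED_ROWS = 18 * 256  # 4608

# Assumed target column names, taken from the reproduction steps.
TARGETS = ("Nuavg", "DelP_Pa")


def check_longform(csv_path):
    """Return (row_count, missing_targets) for a long-form CSV file.

    Hypothetical helper: counts data rows (header excluded) and lists
    any assumed target columns that are absent from the header.
    """
    with open(csv_path, newline="") as fh:
        reader = csv.DictReader(fh)
        header = reader.fieldnames or []
        n_rows = sum(1 for _ in reader)
    missing = [t for t in TARGETS if t not in header]
    return n_rows, missing
```

Calling check_longform("02_processed_data/ML_dataset_longform.csv") from the top-level dataset folder should, under these assumptions, return a row count equal to EXPECTED_ROWS and an empty list of missing targets.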
Steps to reproduce
1. Download and extract the complete dataset ZIP file.
2. Keep the repository folder structure unchanged:
   02_processed_data/
   03_splits/
   04_models/
   05_predictions/
   06_scripts/
   07_tables/
   08_plot_data/
3. Install the required Python packages:
   pip install numpy pandas scikit-learn matplotlib joblib openpyxl
4. Open a terminal or command prompt in the top-level dataset folder.
5. Run the reproducibility script:
   python 06_scripts/01_train_RF_models_random80_20.py
6. The script reads the cleaned long-form CFD-derived dataset from:
   02_processed_data/ML_dataset_longform.csv
   and the fixed random 80/20 split file from:
   03_splits/random80_20_split.csv
7. The script trains Random Forest surrogate models for Nuavg and DelP_Pa and saves the outputs in:
   04_models/random80_20/
   05_predictions/random80_20/
   07_tables/random80_20/
8. The generated prediction and metric files can be compared with the corresponding files included in the repository and with the values reported in the manuscript.
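Because the script relies on relative paths, a quick pre-flight check of the extracted layout can save a failed run. The sketch below is not part of the repository; it only tests for the folders and input files named in the steps above, and the helper name missing_paths is hypothetical.

```python
from pathlib import Path

# Folders listed in step 2.
REQUIRED_DIRS = [
    "02_processed_data",
    "03_splits",
    "04_models",
    "05_predictions",
    "06_scripts",
    "07_tables",
    "08_plot_data",
]

# Script and input files named in steps 5 and 6.
REQUIRED_FILES = [
    "02_processed_data/ML_dataset_longform.csv",
    "03_splits/random80_20_split.csv",
    "06_scripts/01_train_RF_models_random80_20.py",
]


def missing_paths(root="."):
    """Return the required folders and files that are absent under root."""
    root = Path(root)
    missing = [d + "/" for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing
```

Running missing_paths(".") from the top-level dataset folder should return an empty list when the archive was extracted intact; any entries it returns point to folders or files that were moved or renamed.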
Institutions
- University of Wisconsin–La Crosse, La Crosse, Wisconsin