A Multi-Output Stacked Ensemble Regression Framework for Seismic Response Prediction of Regular RCC Framed Structures using ETABS-Simulated Data
Description
This dataset comprises 250 synthetically generated reinforced cement concrete (RCC) framed building models, designed and analyzed in ETABS under the Indian seismic code IS 1893:2016 and the concrete design code IS 456:2000. Each model represents a regular RCC configuration with variations in the number of storeys, bay width, column dimensions, beam sizes, slab thickness, material grades, and seismic coefficients.

The dataset includes 36 input parameters covering geometric, material, and seismic properties. The output parameters are six seismic performance indicators: base shear, maximum storey lateral load, displacements in the X and Y directions, and interstorey drift ratios in the X and Y directions. These were extracted directly from the ETABS analysis results.

The dataset supports the development and benchmarking of machine learning models, particularly for multi-output regression tasks in seismic performance prediction. It has been used in the paper titled “A Multi-Output Stacked Ensemble Regression Framework for Seismic Response Prediction of Regular RCC Framed Structures using ETABS-Simulated Data”.

File Format: CSV (Comma-Separated Values), preprocessed and ready for machine learning applications.

Potential Use Cases:
- Training ML models for seismic prediction
- Seismic design optimization
- Structural risk assessment
- Educational use in structural engineering courses
Steps to reproduce
1. **Data Acquisition**
   * Download the provided CSV file from Mendeley Data.
   * The file contains 250 samples with 36 input parameters and 6 output parameters.
2. **Software Requirements**
   * Python 3.8+
   * Required libraries: `pandas`, `numpy`, `scikit-learn`, `xgboost`, `lightgbm`, `catboost`, `matplotlib`, `seaborn`
   * Optionally, use Anaconda for environment setup.
3. **Data Preprocessing**
   * Load the CSV using `pandas.read_csv()`.
   * Scale the input features using `RobustScaler()` or `StandardScaler()` from `sklearn.preprocessing`.
   * Split the columns into inputs and outputs:
     * X → 36 input columns
     * Y → 6 output seismic parameters
4. **Model Training**
   * Use any multi-output regressor wrapper, e.g., `MultiOutputRegressor()` or `RegressorChain()`.
   * Train individual models (RF, XGBoost, LGBM, CatBoost) with optimized hyperparameters.
   * Train a stacked ensemble using `StackingRegressor()` with `ElasticNetCV` as the meta-learner.
5. **Evaluation**
   * Evaluate using R², RMSE, and MAE metrics on both training and test splits.
   * Use `cross_val_score()` with 10-fold CV to check generalization.
6. **Reproducibility**
   * Set `random_state=42` in all models and shuffles.
   * Use consistent train-test splits (e.g., 80:20) to compare results.
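The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the configuration used in the paper: the CSV file name and column names are not given here, so the sketch generates placeholder data with the dataset's shape (250 × 36 inputs, 250 × 6 outputs); the base learners are scikit-learn's `RandomForestRegressor` and `GradientBoostingRegressor`, standing in for the tuned RF, XGBoost, LGBM, and CatBoost models; and the tree counts and inner fold counts are arbitrary small values chosen to keep the run short.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import ElasticNetCV
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

SEED = 42
N_SAMPLES, N_INPUTS, N_OUTPUTS = 250, 36, 6

# ASSUMPTION: placeholder data with the same shape as the dataset.
# With the real file, load it with pd.read_csv(...) and select the
# 36 input and 6 output columns by their actual names instead.
rng = np.random.default_rng(SEED)
X = pd.DataFrame(
    rng.normal(size=(N_SAMPLES, N_INPUTS)),
    columns=[f"x{i}" for i in range(N_INPUTS)],  # hypothetical names
)
weights = rng.normal(size=(N_INPUTS, N_OUTPUTS))
noise = 0.1 * rng.normal(size=(N_SAMPLES, N_OUTPUTS))
Y = pd.DataFrame(
    X.to_numpy() @ weights + noise,
    columns=[f"y{j}" for j in range(N_OUTPUTS)],  # hypothetical names
)

# 80:20 split with a fixed seed, as in the reproducibility step.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=SEED
)

# Stacked ensemble: stand-in base learners, ElasticNetCV as meta-learner.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=SEED)),
        ("gb", GradientBoostingRegressor(n_estimators=50, random_state=SEED)),
    ],
    final_estimator=ElasticNetCV(cv=3),
    cv=3,  # inner folds used to build the meta-learner's training features
)

# MultiOutputRegressor fits one independent copy of the stack per output.
# Putting the scaler in the pipeline means it is fitted on training data only.
model = make_pipeline(RobustScaler(), MultiOutputRegressor(stack))
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

# Per-output metrics on the held-out split.
r2 = r2_score(Y_test, Y_pred, multioutput="raw_values")
rmse = np.sqrt(mean_squared_error(Y_test, Y_pred, multioutput="raw_values"))
mae = mean_absolute_error(Y_test, Y_pred, multioutput="raw_values")

report = pd.DataFrame({"R2": r2, "RMSE": rmse, "MAE": mae}, index=Y.columns)
print(report.round(3))
```

The 10-fold check from step 5 can be added by passing the same `model` to `cross_val_score()` with `cv=10`; it is left out here because it refits the whole stack ten times. Keeping the scaler inside the pipeline also ensures that each fold rescales using only its own training portion.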
Institutions
- Maulana Azad National Institute of Technology