Supplementary Data for "Machine Learning Reveals and Predicts Phenotypic Heterogeneity in the Prenatal Valproic Acid Rat Model of Autism Spectrum Disorder"
Description
This repository contains the longitudinal behavioral dataset and the custom Python computational pipeline used in the study "Machine Learning Reveals and Predicts Phenotypic Heterogeneity in the Prenatal Valproic Acid Rat Model of Autism Spectrum Disorder". The primary objective of this project is to address the inherent phenotypic heterogeneity of the prenatal valproic acid (VPA) rat model of Autism Spectrum Disorder (ASD). The code stratifies adult behavioral phenotypes in a data-driven way, without a priori clinical assumptions, and prospectively predicts susceptibility versus resilience from early neurodevelopmental milestones.
Steps to reproduce
This repository contains the datasets and analytical workflows needed to replicate our study of phenotypic heterogeneity (susceptibility vs. resilience) in the VPA-induced rat model of autism.

1. Experimental Design and Data Collection
Animal model: Wistar rat offspring were exposed prenatally to valproic acid (VPA; 500 mg/kg, i.p., administered to the dam on gestational day [GD] 12.5) or to physiological saline (NaCl) to induce ASD-like phenotypes.
Measured variables: Early neurodevelopmental milestones (e.g., righting reflex, negative geotaxis) were assessed during the neonatal period, and adult behavioral profiles (Sociability Index, Social Novelty Index, repetitive behaviors) were assessed after weaning.

2. Computational and Statistical Analysis
All computational modeling, preprocessing, and visualization were performed in Python 3.12. We used pandas and NumPy for data handling, scikit-learn for machine learning (HCA and Random Forest), and SciPy/statsmodels for statistical testing. Visualizations (ROC curves, precision-recall curves with PR-AUC, confusion matrices) were generated with matplotlib and seaborn. Data are expressed as mean ± SEM or median (IQR).

3. Reproducibility and Code Inventory
For full reproducibility, we provide the primary dataset and two independent Jupyter notebooks:
VPA_Longitudinal_Behavioral_Data.csv – the core longitudinal dataset, containing both neonatal developmental milestone delays and adult behavioral metrics.
01_Unsupervised_Phenotypic_Clustering.ipynb – unsupervised Hierarchical Cluster Analysis (HCA) of adult behaviors, used to validate the VPA-ASD and VPA-Resilient clinical labels in a data-driven way.
02_RandomForest_Susceptibility_Prediction.ipynb – a supervised Random Forest (RF) classifier with nested cross-validation that predicts adult phenotypic outcomes exclusively from early milestone delays.
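As a minimal sketch of how the group summaries described above (mean ± SEM and median with IQR) could be produced with pandas and SciPy, the snippet below summarizes one variable per group and runs a two-sided Mann-Whitney U test. The column names (group, sociability_index), the helper functions, and the choice of the Mann-Whitney test are illustrative assumptions, not necessarily what the study notebooks use; the toy values are placeholders, not study data.

```python
import pandas as pd
from scipy import stats


def summarize(df: pd.DataFrame, group_col: str, value_col: str) -> pd.DataFrame:
    """Per-group n, mean, SEM, median, and IQR for one behavioral variable."""
    g = df.groupby(group_col)[value_col]
    out = pd.DataFrame({
        "n": g.count(),
        "mean": g.mean(),
        "sem": g.sem(),            # sample SD (ddof=1) / sqrt(n)
        "median": g.median(),
        "q1": g.quantile(0.25),    # linear interpolation (pandas default)
        "q3": g.quantile(0.75),
    })
    out["iqr"] = out["q3"] - out["q1"]
    return out


def compare_two_groups(df, group_col, value_col, a, b):
    """Two-sided Mann-Whitney U test between groups a and b (assumed test choice)."""
    x = df.loc[df[group_col] == a, value_col].to_numpy()
    y = df.loc[df[group_col] == b, value_col].to_numpy()
    return stats.mannwhitneyu(x, y, alternative="two-sided")


# Placeholder values for illustration only (not data from the study).
# Hypothetical column names; replace with the headers in the CSV.
toy = pd.DataFrame({
    "group": ["NaCl"] * 4 + ["VPA"] * 2,
    "sociability_index": [7.0, 8.0, 9.0, 10.0, 1.0, 2.0],
})
table = summarize(toy, "group", "sociability_index")
result = compare_two_groups(toy, "group", "sociability_index", "NaCl", "VPA")
print(table)
print(result.statistic, result.pvalue)
```

A rank-based test is shown because it pairs naturally with median (IQR) reporting; for normally distributed variables summarized as mean ± SEM, a t-test from scipy.stats would be the analogous choice.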
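The hierarchical clustering step can be sketched with scikit-learn as follows: adult behavioral metrics are z-scored, grouped by agglomerative (hierarchical) clustering, and the resulting clusters are cross-tabulated against the VPA-ASD / VPA-Resilient labels and scored with the adjusted Rand index. The feature names, Ward linkage, the two-cluster cut, and the use of the adjusted Rand index are assumptions made for illustration; the synthetic data stand in for the real CSV and do not reproduce any result from the study.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Hypothetical column names; replace with the headers used in
# VPA_Longitudinal_Behavioral_Data.csv.
ADULT_FEATURES = ["sociability_index", "social_novelty_index", "repetitive_behavior"]


def cluster_adult_phenotypes(df, features=ADULT_FEATURES, n_clusters=2):
    """Z-score the adult metrics and cut a Ward-linkage tree into n_clusters."""
    X = StandardScaler().fit_transform(df[features].to_numpy(dtype=float))
    model = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    return model.fit_predict(X)


def agreement_with_labels(labels, clusters):
    """Cross-tabulate clusters against labels; ARI ignores cluster numbering."""
    labels = np.asarray(labels)
    clusters = np.asarray(clusters)
    table = pd.crosstab(labels, clusters, rownames=["label"], colnames=["cluster"])
    return adjusted_rand_score(labels, clusters), table


# Synthetic stand-in data: two well-separated groups of 10 animals each.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    np.vstack([rng.normal(0.0, 0.3, (10, 3)), rng.normal(5.0, 0.3, (10, 3))]),
    columns=ADULT_FEATURES,
)
demo["label"] = ["VPA-Resilient"] * 10 + ["VPA-ASD"] * 10

clusters = cluster_adult_phenotypes(demo)
ari, table = agreement_with_labels(demo["label"], clusters)
print(table)
print(f"adjusted Rand index = {ari:.2f}")
```

The labels are used only after clustering, so the partition itself is formed without reference to group assignment; standardizing first keeps variables measured on different scales from dominating the Euclidean distances that Ward linkage relies on.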
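A minimal sketch of nested cross-validation for the Random Forest step, assuming scikit-learn: an inner StratifiedKFold with GridSearchCV tunes hyperparameters on each outer training fold, and an outer StratifiedKFold supplies held-out probabilities from which ROC-AUC, PR-AUC (average precision), and a confusion matrix are computed. The milestone feature names, the 0/1 outcome encoding (1 = VPA-ASD, 0 = VPA-Resilient), the parameter grid, the fold counts, and the 0.5 decision threshold are hypothetical choices, and the synthetic data are for demonstration only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Hypothetical feature names for neonatal milestone delays.
MILESTONE_FEATURES = ["righting_reflex_delay", "negative_geotaxis_delay"]
PARAM_GRID = {"max_depth": [None, 3], "min_samples_leaf": [1, 3]}  # assumed grid


def nested_cv_random_forest(X, y, outer_splits=5, inner_splits=3, seed=0):
    """Outer folds estimate performance; inner folds (GridSearchCV) tune the RF."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)  # assumed encoding: 1 = VPA-ASD, 0 = VPA-Resilient
    outer = StratifiedKFold(n_splits=outer_splits, shuffle=True, random_state=seed)
    oof_proba = np.zeros(len(y))
    fold_auc = []
    for train_idx, test_idx in outer.split(X, y):
        inner = StratifiedKFold(n_splits=inner_splits, shuffle=True, random_state=seed)
        search = GridSearchCV(
            RandomForestClassifier(n_estimators=100, class_weight="balanced",
                                   random_state=seed),
            PARAM_GRID,
            scoring="roc_auc",
            cv=inner,
        )
        # Tuning sees only the outer training fold, so the test fold stays unseen.
        search.fit(X[train_idx], y[train_idx])
        oof_proba[test_idx] = search.predict_proba(X[test_idx])[:, 1]
        fold_auc.append(roc_auc_score(y[test_idx], oof_proba[test_idx]))
    y_pred = (oof_proba >= 0.5).astype(int)
    return {
        "fold_roc_auc": fold_auc,
        "pooled_roc_auc": roc_auc_score(y, oof_proba),
        "pooled_pr_auc": average_precision_score(y, oof_proba),
        "confusion_matrix": confusion_matrix(y, y_pred, labels=[0, 1]),
    }


# Synthetic stand-in data (not from the study): 20 animals per outcome class.
rng = np.random.default_rng(1)
demo = pd.DataFrame(
    np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(3.0, 1.0, (20, 2))]),
    columns=MILESTONE_FEATURES,
)
y_demo = np.array([0] * 20 + [1] * 20)

res = nested_cv_random_forest(demo[MILESTONE_FEATURES], y_demo)
print("mean fold ROC-AUC:", np.mean(res["fold_roc_auc"]))
print("pooled PR-AUC:", res["pooled_pr_auc"])
print(res["confusion_matrix"])
```

Reporting both per-fold and pooled out-of-fold metrics is one common convention; the per-fold values show variability across splits, while the pooled probabilities give a single ROC or precision-recall curve and confusion matrix to plot with matplotlib or seaborn.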