Integrative machine-learning on high-throughput human data identifies age-specific hallmarks of Alzheimer’s disease
Alzheimer’s disease (AD) is a complex and presently incurable age-related brain disorder. To better understand this debilitating disease, we collated publicly available RNA-Seq, microarray, proteomics, and microRNA samples derived from AD patients and non-AD controls. After quality filtering, 4089 samples originating from brain tissues and blood remained. Since disease progression in AD correlates with age, we stratified this large dataset into three age groups: < 75 years, 75-84 years, and ≥ 85 years. The RNA-Seq, microarray, and proteomics datasets were then combined into different integrated datasets. Ensemble machine learning was employed to identify genes and proteins that can accurately classify samples as either AD or control. These predictive inputs were then subjected to network-based enrichment analyses. We also tested how well genes/proteins associated with different pathways in the Molecular Signatures Database can diagnose AD. Separately, we identified microRNAs that can be used to make an AD diagnosis and subjected the predicted gene targets of the most predictive microRNAs to an enrichment analysis. The following key themes emerged from our machine learning and bioinformatics analyses: cell death, cellular senescence, energy metabolism, genomic integrity, glia, immune system, metal ion homeostasis, oxidative stress, proteostasis, and synaptic function. Many of the results were also age-specific. For example, results highlighting cellular senescence emerged only in the youngest and intermediate age groups, while the majority of results relevant to cell death appeared in the youngest patients. These data demonstrate that, like aging, AD is a multifaceted process characterized by diverse forms of dysfunction. Please see the paper for detailed methods.
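
The age stratification described above can be illustrated with a minimal Python sketch. Only the three age boundaries come from the abstract. The function names, the record fields (`id`, `age`, `diagnosis`), and the example records are hypothetical, and the handling of fractional ages (an age of 84.5 falls in the 75-84 group) is an assumption rather than something the abstract states. This is not the authors' code; the paper should be consulted for the actual procedure.

```python
# Age-group labels follow the abstract: < 75, 75-84, and >= 85 years.
AGE_GROUPS = ("<75", "75-84", ">=85")


def age_group(age):
    """Return the age-group label for an age given in years."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 75:
        return "<75"
    if age < 85:  # assumption: fractional ages below 85 stay in the middle group
        return "75-84"
    return ">=85"


def stratify(samples):
    """Group sample records (dicts with an 'age' key) by age-group label."""
    groups = {label: [] for label in AGE_GROUPS}
    for sample in samples:
        groups[age_group(sample["age"])].append(sample)
    return groups


# Made-up records for illustration only; they are not data from the study.
samples = [
    {"id": "s1", "age": 68, "diagnosis": "AD"},
    {"id": "s2", "age": 75, "diagnosis": "control"},
    {"id": "s3", "age": 84, "diagnosis": "AD"},
    {"id": "s4", "age": 85, "diagnosis": "control"},
    {"id": "s5", "age": 91, "diagnosis": "AD"},
]

groups = stratify(samples)
for label in AGE_GROUPS:
    print(label, [s["id"] for s in groups[label]])
```

Each stratum could then be passed separately to a classifier, which is the sense in which results are reported per age group.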