Single cell transcriptome analysis of human embryonic stem cell-derived neurons spanning the rostrocaudal and dorsoventral axes
Description
Our inability to derive the neuronal diversity that comprises the posterior central nervous system (pCNS) from human pluripotent stem cells (hPSCs) impedes understanding of human neurodevelopment and disease in the hindbrain and spinal cord. We established a modular, monolayer differentiation paradigm that recapitulates both rostrocaudal (R/C) and dorsoventral (D/V) patterning, enabling derivation of diverse pCNS neurons with discrete regional specificity. Expansive single-cell RNA-sequencing (scRNAseq) analysis coupled with a novel computational pipeline allowed us to detect hundreds of transcriptional markers within region-specific phenotypes, enabling discovery of gene expression patterns across the R/C and D/V developmental axes.
Processed data matrix: For the 6 samples from direct differentiation (GSE186696) and the 14 samples from modular differentiation (GSE186697), we merged the per-sample gene expression matrices into a single matrix per dataset, taking the union of the genes across the matrices. The combined matrix is [12,543 cells x 20,598 genes] for the direct differentiation dataset and [49,959 cells x 23,941 genes] for the modular differentiation dataset. For the subsequent clustering analysis, we transformed the values of these matrices by taking their square root and standardized each cell’s expression profile by dividing it by the mean gene expression in that cell.
Clusters - HOX profile clusters: We applied Louvain clustering (k=13) to visualize the simultaneous expression of HOX genes in our data set (GSE186697), which revealed inter- and intra-sample heterogeneity in HOX profiles.
Clusters - primary clusters: We applied clustering based on sparse non-negative matrix factorization (NMF) (Kim and Park, 2008) to define cardinal cell groups within our data set (GSE186697) in an unbiased manner, and identified 25 primary clusters.
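The merging and transformation steps described under "Processed data matrix" can be illustrated with a minimal sketch. It uses only the Python standard library and toy counts, and it is not the pipeline used to produce the deposited matrices. Several details are assumptions made for illustration: genes missing from a sample are filled with zeros, the square root is applied before the per-cell mean division, and the function names (`merge_samples`, `sqrt_mean_normalize`) and the sample-prefixed cell IDs are hypothetical.

```python
import math


def merge_samples(samples):
    """Merge per-sample matrices into one, taking the union of genes.

    `samples` maps sample name -> {cell_id: {gene: count}}.
    Assumption: a gene absent from a sample is filled with 0.
    """
    genes = sorted({gene
                    for cells in samples.values()
                    for profile in cells.values()
                    for gene in profile})
    merged = {}
    for sample, cells in samples.items():
        for cell_id, profile in cells.items():
            # Prefix with the sample name so cell IDs stay unique after merging.
            merged[f"{sample}:{cell_id}"] = [profile.get(g, 0) for g in genes]
    return genes, merged


def sqrt_mean_normalize(row):
    """Square-root transform a cell, then divide by its mean transformed value."""
    rooted = [math.sqrt(value) for value in row]
    mean = sum(rooted) / len(rooted)
    # An all-zero cell is returned unchanged to avoid dividing by zero.
    return [value / mean for value in rooted] if mean > 0 else rooted


# Toy input: two samples whose gene sets only partly overlap.
samples = {
    "s1": {"c1": {"HOXA1": 4, "SOX2": 9}},
    "s2": {"c1": {"SOX2": 16, "PAX6": 0}, "c2": {"PAX6": 1}},
}
genes, merged = merge_samples(samples)
normalized = {cell: sqrt_mean_normalize(row) for cell, row in merged.items()}
```

After this normalization every non-empty cell has a mean transformed expression of 1, which puts cells with different sequencing depths on a comparable scale. At the scale of the real matrices, a sparse matrix representation would be a more practical choice than nested dictionaries.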
Clusters - subpopulation subclusters: We organized related primary clusters into 17 groups, and then developed and applied a consensus clustering based approach to define robust subclusters representing subtypes of known cardinal populations.
Consensus_graph_matrix: We regrouped our 25 primary cell clusters into 17 subgroups based on the similarity of the cell types assigned to each cluster, and created a consensus graph of cell co-clustering relationships for each subgroup. For every pair of cells in a subgroup, we computed the proportion of times the two cells fell in the same cluster (across multiple clustering approaches), and generated a weighted graph of cells with edge weights equal to this proportion.
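The pairwise co-clustering proportion behind the consensus graph can be sketched as follows. This is a minimal standard-library illustration on toy labels, not the code used to build the deposited Consensus_graph_matrix. The function name `consensus_graph`, the input layout (one `{cell: label}` dictionary per clustering run), and the assumption that every run labels the same set of cells are hypothetical choices made for the example.

```python
from itertools import combinations


def consensus_graph(clusterings):
    """Return {(cell_a, cell_b): weight} for every pair of cells.

    The weight is the fraction of clusterings that place both cells in
    the same cluster. Assumes every clustering labels the same cells.
    """
    cells = sorted(clusterings[0])
    weights = {}
    for a, b in combinations(cells, 2):
        together = sum(1 for labels in clusterings if labels[a] == labels[b])
        weights[(a, b)] = together / len(clusterings)
    return weights


# Toy input: four clustering runs over three cells of one subgroup.
# Labels only need to be comparable within a run, not across runs.
runs = [
    {"c1": 0, "c2": 0, "c3": 1},
    {"c1": "x", "c2": "x", "c3": "x"},
    {"c1": 1, "c2": 2, "c3": 2},
    {"c1": 0, "c2": 0, "c3": 1},
]
graph = consensus_graph(runs)
```

Here cells c1 and c2 share a cluster in three of the four runs, so their edge weight is 0.75. The number of pairs grows quadratically with the number of cells, which is one reason to build the graph per subgroup rather than over the whole data set.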