Multiplexed single-cell gene expression and TCR sequencing data generated from human thymus and blood samples
Description
Single-cell RNA sequencing (scRNA-seq) has improved our ability to study rare cell subsets. To maximise reagent utility, researchers often process samples from multiple donors or conditions simultaneously (multiplexing) and overload microfluidic chip channels (superloading). While superloading reduces processing time and costs, it increases the incidence of doublets, potentially complicating downstream analyses. To investigate the effects of superloading on primary immune cells, we generated and analysed single-cell gene expression and TCR data from standard- and super-loaded gel beads-in-emulsion (GEM) chip channels. While the transcriptomic profiles of the two loading methods were largely similar, we observed that most T cells expressing multiple TCR chains were doublets, underscoring the need for TCR configuration-based doublet removal for accurate T cell analysis.
Steps to reproduce
We isolated thymocytes and peripheral blood mononuclear cells (PBMCs) from the thymus and blood of four healthy human donors (Thymus_A, Thymus_B, PBMC1, and PBMC2). Each sample was labelled with a hashtag oligonucleotide (HTO) antibody (TotalSeq™ anti-human Hashtag, Cat. 394661, 394663, 394665, 394667, BioLegend) for multiplexing. The labelled cells were loaded into two GEM chip channels, generating a 40,000-cell 4-plex GEM well (40K) and an 80,000-cell 4-plex GEM well (80K), comprising 10,000 and 20,000 cells per donor, respectively. Libraries for single-cell gene expression, V(D)J, and cell surface proteins were generated using Chromium X (Chromium Next GEM Single Cell 5' HT Reagent Kits v2 Dual Index, User guide CG000424, Rev C, 10x Genomics). The quality of the cDNA and cell multiplexing libraries was assessed using the 4150 TapeStation (Agilent). Sequencing was conducted on a NovaSeq 6000 (Illumina) at depths of 50,000 reads/cell for gene expression, 5,000 reads/cell for V(D)J, and 1,000 reads/cell for cell surface proteins.
The 40K and 80K raw sequence reads were aligned to the human genome (Cell Ranger v6.1.2, 10x Genomics), and gene × cell matrices (provided here in subfolder '1') were generated. Seurat objects were created for 40K and 80K, and ambient RNA was removed from each object separately using SoupX. HTO data were added as assays, and demultiplexing was performed using HTODemux. The demultiplexed data were then exported (provided here in subfolder '2') for further analysis in Scanpy. Separate AnnData (adata) objects were initialised for 40K and 80K, and doublets were predicted using Scrublet. The 40K and 80K adata objects and their TCR data (provided here in '40k80k_V(D)J' in subfolder '1') were concatenated and merged into a MuData object. Gene and cell filtering, normalisation, and log transformation were performed sequentially. PCA was conducted, followed by Harmony-based integration and Leiden clustering.
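The idea behind hashtag-based demultiplexing can be illustrated with a minimal Python sketch. This is not the HTODemux algorithm, which derives a separate cutoff for each hashtag from the data; here a single fixed cutoff is assumed purely for illustration. The function name `classify_hto`, the `threshold` value, the toy counts, and the use of donor names as hashtag keys are all hypothetical.

```python
def classify_hto(counts, threshold):
    """Classify one cell from its hashtag (HTO) counts.

    counts: dict mapping hashtag name -> raw count for this cell.
    threshold: hypothetical fixed cutoff; real tools estimate it per hashtag.
    """
    positive = sorted(tag for tag, n in counts.items() if n >= threshold)
    if not positive:
        return "Negative"      # no hashtag detected
    if len(positive) == 1:
        return positive[0]     # singlet assigned to one sample
    return "Doublet"           # two or more hashtags detected


# Toy counts (invented for illustration, not taken from the dataset).
cells = {
    "cell_1": {"Thymus_A": 350, "Thymus_B": 4, "PBMC1": 2, "PBMC2": 6},
    "cell_2": {"Thymus_A": 280, "Thymus_B": 5, "PBMC1": 310, "PBMC2": 3},
    "cell_3": {"Thymus_A": 3, "Thymus_B": 1, "PBMC1": 4, "PBMC2": 2},
}
labels = {name: classify_hto(c, threshold=50) for name, c in cells.items()}
print(labels)
```

A hashtag-based call of this kind can only detect doublets formed by cells from different samples. Doublets formed by two cells carrying the same hashtag look like singlets, which is one reason to combine it with other doublet-detection approaches.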
Marker genes for cluster annotation were generated using the Wilcoxon rank-sum test. Singlet differential gene expression analyses were conducted using diffxpy, while decoupleR and PyDESeq2 were used for pseudobulk analysis. Scirpy was used for TCR analysis.
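As a rough illustration of TCR configuration-based doublet flagging, the Python sketch below counts the productive TRA and TRB chains detected in each cell and flags cells that exceed a chosen limit. The exact rule applied to this dataset is not stated in this description, so the function name `flag_tcr_doublet` and the default limits (`max_alpha=2`, `max_beta=1`) are assumptions made for the example, not the authors' criteria.

```python
from collections import Counter


def flag_tcr_doublet(chains, max_alpha=2, max_beta=1):
    """Return True if a cell's TCR chain configuration suggests a doublet.

    chains: list of chain loci detected in one cell, e.g. ["TRA", "TRB"].
    max_alpha, max_beta: hypothetical limits on chains per cell.
    """
    counts = Counter(chains)  # loci that are absent count as 0
    return counts["TRA"] > max_alpha or counts["TRB"] > max_beta


# Toy chain lists (invented for illustration, not taken from the dataset).
examples = {
    "one_pair": ["TRA", "TRB"],
    "dual_alpha": ["TRA", "TRA", "TRB"],
    "dual_beta": ["TRA", "TRB", "TRB"],
    "no_tcr": [],
}
flags = {name: flag_tcr_doublet(c) for name, c in examples.items()}
print(flags)
```

The limits are exposed as parameters because the appropriate cutoff is a judgment call: a stricter setting removes more true doublets at the cost of discarding genuine cells that carry an extra chain.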
Funding
National Research Foundation of Korea
Bio & Medical Technology Development Program of the National Research Foundation of Korea (NRF) funded by the Korean government (MSIT) under Grant No. 2022M3A9D3016848
Seoul National University
Creative Pioneering Researchers Program, Project No. 800-20240446