Inherited CD4 deficiency: long-read TCR single-cell RNA sequencing (scRNASeq) of stimulated TCRab+ memory T cells

Published: 30 January 2024 | Version 1 | DOI: 10.17632/cscnx5rgps.1
Contributors:
Antoine Guérin

Description

To characterize the impact of inherited CD4 deficiency on the TCR alpha and beta chains, long-read single-cell RNA sequencing was performed on isolated memory (defined as CD45RA-CCR7+/-) CD3+CD8-TCRαβ+ T cells from healthy donors (n=4) and CD4-deficient patients (P1-P5), or on CD3+CD8+TCRαβ+ T cells from a healthy donor (n=1). Following isolation, stimulation, 10X Genomics Chromium capture and cDNA library preparation (see the short-read scRNASeq dataset), TCR contigs for single cells were obtained by the Repertoire and Gene Expression by Sequencing (RAGE-seq) method (Singh et al., 2019).
Basic repertoire features for the single cells were summarized in R (version 4.2.2, https://www.R-project.org/) via the RStudio IDE (version 2022.12.0.353, http://www.posit.co/). R packages used for data manipulation and plotting included tidyverse (version 2.0.0), rstatix (version 0.7.2, https://CRAN.R-project.org/package=rstatix) and ggpubr (https://CRAN.R-project.org/package=ggpubr).
To classify single cells as CD4 or CD8 based on paired TRA:TRB sequences, an XGBoost model as described by Carter et al. (2019a) was used. Modelling was undertaken in R using the xgboost package (version 1.7.3.1, https://CRAN.R-project.org/package=xgboost). Model features included the TRA and TRB CDR3 lengths, TRA and TRB CDR3 charge, CDR3 amino acid proportions and one-hot encoding of TRBV and TRAV gene usage. The modelling objective was binary:logistic and the following parameters were used: nrounds=1000, booster=gbtree, eta=0.01, max_depth=10, gamma=1, subsample=0.8, colsample_bytree=0.8 and min_child_weight=5. Training and test sets used the paired single-cell data from Carter et al. (2019a), obtained from https://github.com/JasonACarter/CD4_CD8-Manuscript, after excluding TRA:TRB pairs that were observed in both CD4 and CD8 cells. Training and test datasets were a random sub-sampling of 14,000 CD4 and 14,000 CD8, repeated across 10 iterations.
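
The feature construction and balanced sub-sampling described above can be illustrated with a minimal Python sketch. This is not the R code used for this dataset. The function names, the simplified charge rule (K/R = +1, D/E = -1, histidine ignored), the per-chain amino acid proportions and the feature key names are all assumptions made for the example.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: a simplified net-charge rule at neutral pH (histidine ignored).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def cdr3_features(cdr3, prefix):
    """Length, net charge and amino acid proportions for one CDR3 sequence."""
    counts = Counter(cdr3)
    n = len(cdr3)
    feats = {
        f"{prefix}_len": n,
        f"{prefix}_charge": sum(CHARGE.get(aa, 0) for aa in cdr3),
    }
    for aa in AMINO_ACIDS:
        feats[f"{prefix}_prop_{aa}"] = counts[aa] / n if n else 0.0
    return feats


def one_hot(gene, vocabulary, prefix):
    """One indicator column per gene in the vocabulary (1 for the gene used)."""
    return {f"{prefix}={g}": int(g == gene) for g in vocabulary}


def encode_pair(tra_cdr3, trb_cdr3, trav, trbv, trav_vocab, trbv_vocab):
    """Feature dictionary for one paired TRA:TRB clonotype."""
    feats = {}
    feats.update(cdr3_features(tra_cdr3, "TRA"))
    feats.update(cdr3_features(trb_cdr3, "TRB"))
    feats.update(one_hot(trav, trav_vocab, "TRAV"))
    feats.update(one_hot(trbv, trbv_vocab, "TRBV"))
    return feats


def drop_shared_pairs(cd4_pairs, cd8_pairs):
    """Exclude TRA:TRB pairs observed in both the CD4 and CD8 sets."""
    shared = set(cd4_pairs) & set(cd8_pairs)
    return (
        [p for p in cd4_pairs if p not in shared],
        [p for p in cd8_pairs if p not in shared],
    )


def balanced_subsample(cd4_pairs, cd8_pairs, n_per_class, seed):
    """Draw the same number of CD4 and CD8 pairs without replacement."""
    rng = random.Random(seed)
    return rng.sample(cd4_pairs, n_per_class), rng.sample(cd8_pairs, n_per_class)
```

Under these assumptions, calling `balanced_subsample` with `n_per_class=14000` and a different seed for each of 10 iterations would mirror the sub-sampling scheme in the description. How each sub-sample was divided into training and test sets is not stated there, so the sketch leaves that step out.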
To set thresholds on the binary:logistic model scores for assigning CD4 and CD8, the labelled test data were used to set the misidentification rate to 5% for each iteration. Cells that fell outside the thresholds were considered ‘non-attributed’. Cell types for the single cells from RAGE-seq were predicted from their TRA:TRB pairs for each of the 10 model iterations, and the mean percentage of cells assigned CD4 and CD8 was calculated across the iterations for each donor.
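
One possible reading of the thresholding and averaging steps is sketched below in Python. The exact procedure is not given in the description, so several things here are assumptions: that each score is the predicted probability of CD4, that a separate cut-off is chosen for each class as the lowest value keeping the misidentified fraction among called test cells at or below 5%, and the helper names themselves.

```python
def call_threshold(scores, labels, target, max_misid=0.05):
    """Lowest cut-off t such that, among labelled test cells with score >= t,
    the fraction whose true label is not `target` is at most max_misid.
    Returns None if no cut-off satisfies the constraint."""
    for t in sorted(set(scores)):
        called = [y for s, y in zip(scores, labels) if s >= t]
        wrong = sum(y != target for y in called)
        if wrong / len(called) <= max_misid:
            return t
    return None


def fit_thresholds(p_cd4, labels, max_misid=0.05):
    """Cut-offs for CD4 calls (on p) and CD8 calls (on 1 - p)."""
    t_cd4 = call_threshold(p_cd4, labels, "CD4", max_misid)
    t_cd8 = call_threshold([1 - p for p in p_cd4], labels, "CD8", max_misid)
    return t_cd4, t_cd8


def assign(p, t_cd4, t_cd8):
    """Label one cell; scores between the two cut-offs are not attributed."""
    if t_cd4 is not None and p >= t_cd4:
        return "CD4"
    if t_cd8 is not None and (1 - p) >= t_cd8:
        return "CD8"
    return "non-attributed"


def mean_percentages(calls_per_iteration):
    """Mean percentage of each call for one donor across model iterations.
    `calls_per_iteration` holds one list of calls per iteration."""
    totals = {"CD4": 0.0, "CD8": 0.0, "non-attributed": 0.0}
    for calls in calls_per_iteration:
        for key in totals:
            totals[key] += 100.0 * calls.count(key) / len(calls)
    return {key: value / len(calls_per_iteration) for key, value in totals.items()}
```

In this sketch the thresholds would be refitted on the labelled test data of every iteration, then applied to that iteration's predictions for the RAGE-seq cells before averaging per donor.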

Institutions

Garvan Institute of Medical Research

Categories

Health Sciences
