Study design. The aim of this study was to identify features of CD8
+ T cell responses to SARS-CoV-2 associated with disease state and HLA genetics, including immunodominant T cell epitopes, evidence of immune recall, and shared TCR sequence motifs. We used libraries of peptide-HLA tetramers with epitopes derived from across the SARS-CoV-2 proteome presented in four HLAs with high prevalence in North America. Samples from acute and convalescent patients, with HLAs matching the tetramer libraries, were acquired as they became available and screened in several batches alongside samples from unexposed subjects. A total of 27 acute, 28 convalescent, and 23 unexposed subjects were screened providing HLA-matched analysis for 43 A*02:01, 18 A*24:02, 17 B*07:02, and 9 A*01:01 samples.
Antigen library design. Antigenic peptide libraries were designed by scoring all possible 9mer peptides derived from the entire SARS-CoV-2 proteome (NC_045512.2) using netMHC-4.0 (
32 (
link)) in the HLA-A*02:01, HLA-A*01:01, HLA-A*24:02 or HLA-B*07:02 alleles. SARS-CoV-1 peptides that had evidence of T cell positive assays, obtained from the Immune Epitope Database (
www.iedb.org; (
51 (
link))), and that were highly homologous to their SARS-CoV2 counterparts within hamming-distance of 2 were converted to 9-mers. Additionally, SARS-CoV-2 peptides predicted to raise immunogenic responses by others were also included (
52 ,
53 (
link)). Finally, libraries included a set of well-defined viral epitopes from Cytomegalovirus, Epstein-Barr virus, and Influenza viruses (CEF peptide pool) that elicit T cell responses in the population at large. Antigenic peptides with 500 nM affinity or lower were then selected for inclusion (
Data file S8).
Production of tetramer library pools. HLA-A*01:01, -A*02:01, -A*24:02 and HLA-B*07:02 extracellular domains were expressed in
E. coli and refolded along with beta-2-microglobulin and ultraviolet (UV)-labile place-holder peptides STAPGJLEY, KILGFVFJV, VYGJVRACL and AARGJTLAM, respectively (
54 (
link)). A C-terminal sortase recognition sequence on the HLA was modified by sortase transpeptidation (
55 (
link),
56 ) with a synthetic alkynylated linker peptide, featuring an N-terminal triglycine connected to propargylglycine via a PEG linker (Genscript, Piscataway, NJ). The modified HLA monomer was then purified by size exclusion chromatography (SEC). Full-length streptavidin with an N-terminal Flag tag and a C-terminal sortase recognition sequence and 6xHisTag was prepared by expression and purification from
E. coli using immobilized metal affinity chromatography and SEC. Streptavidin was modified by sortase transpeptidation with a synthetic azidylated linker peptide, featuring an N-terminal triglycine connected to picolyl azide via a PEG linker (Click Chemistry Tools, Scottsdale, AZ). HLA tetramers were produced by mixing alkynylated HLA monomers and azidylated streptavidin in 0.5 mM copper sulfate, 2.5 mM BTTAA (2-(4-((Bis((1-(tert-butyl)-1H-1,2,3-triazol-4-yl)methyl)amino)methyl)-1H-1,2,3-triazol-1-yl)acetic acid) and 5 mM ascorbic acid for up to 4 hours on ice, followed by purification of highly multimeric fractions by SEC. Individual peptide exchange reactions containing 500 nM HLA tetramer and 60 uM peptide were exposed to long-wave UV (366 nm) at a distance of 2-5 cm for 30 min at 4°C, followed by 30 min incubation at 30°C. A biotinylated oligonucleotide barcode (Integrated DNA Technologies) was added to each individual reaction followed by 30 min incubation at 4°C. Individual tetramer reactions were then pooled and concentrated using 30 kDa molecular weight cut-off centrifugal filter units (Amicon). Tetramer production was quality controlled using SEC (
Fig. S1a), sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) (
Fig. S1b), and UV-mediated peptide exchange by assessing binding to peptide-expanded cell lines (
Fig. S2).
Patient Samples. Peripheral blood mononuclear cells (PBMCs) from COVID-19 positive donors or unexposed donors were obtained from Precision 4 Medicine (USA), the Massachusetts Consortium on Pathogen Readiness (MassCPR, Boston, USA), or CTL (USA), all under appropriate informed consent. Patients were defined COVID-19 positive based on positive SARS-CoV-2 real-time reverse transcriptase–polymerase-chain-reaction (RT-PCR) using nasopharyngeal swabs. Patient samples were characterized as “acute” if collected while the patient was hospitalized and as “convalescent” if collected after recovery or when presenting mild disease. Samples from unexposed subjects were collected prior to December, 2019. A summary of patient samples used in this study are presented in
Data file S2.
Cell Staining. PBMCs were thawed, and CD8+ T cells were enriched by magnetic-activated cell sorting (MACS) using a CD8+ T Cell Isolation Kit (Miltenyi) following the manufacturers protocol. The CD8+ T cells were then stained with tetramer libraries (
Data file S8), matched to subject HLAs (
Data file S2) and at 1nM final concentration for each member, in the presence of 2 mg/mL
salmon sperm DNA in PBS with 0.5% BSA solution for 20 min. Cells were then labeled with anti-TCR antibody-derived tag (ADT, clone IP26, Biolegend, CA, USA) for 15 min followed by washing. Tetramer bound cells were then labeled with phycoerythrin (PE) conjugated anti-DKDDDDK-Flag antibody (BioLegend, CA, USA) followed by dead cell discrimination using 7-amino-actinomycin D (7-AAD). The live, tetramer positive cells were sorted (
Fig. S3) using a Sony MA900 Sorter (Sony). When necessary, sorting gates were set liberally to enable sufficient cell recovery for single-cell sequencing.
Sample multiplexing. To ensure sufficient cell loading and subsequent cDNA production in single-cell sequencing, we used sample multiplexing for several experiments. When applied, samples were independently stained with tetramer libraries, labeled using custom anti-TCR ADTs with unique 15 base pair DNA barcodes (clone IP26, BioLegend, CA, USA), and sorted. ADT-labeled, sorted samples were combined prior to encapsulation and single-cell sequencing. In several cases, an expanded T cell line (Cellero Anti-MART-1, MA, USA) was labeled with a BV785 anti-CD8 antibody (BioLegend, CA, USA), stained using a tetramer for ELAGIGILTV in A*02:01, and subsequently mixed and co-sorted alongside samples interrogated for this study. This provided confirmation of tetramer staining, guidance for gating, and verification of the multiplexing strategy (
Fig. S3). The anti-MART-1 T cells (TCR sequences provided in
Data file S9) were excluded from any subsequent analyses.
Single-cell Sequencing. Tetramer positive cells were counted by Nexcelom Cellometer (Lawrence, MA, USA) using AOPI stain following manufacturer’s recommended conditions. When possible, 15,000 cells were targeted for encapsulation. Single-cell encapsulations were generated utilizing 5′ v1 Gem beads from 10x Genomics (Pleasanton, CA, USA) on a 10x Chromium controller and downstream TCR, Gene Expression, and Surface marker libraries were made following manufacturer recommended conditions. All libraries were quantified on a BioRad CFX 384 (Hercules, CA, USA) using Kapa Biosystems (Wilmington, MA, USA) library quantified kits and pooled at an equimolar ratio. TCRs, Gene Expression, surface markers, and tetramer generated libraries were sequenced on Illumina (San Diego, CA, USA) NextSeq550 instruments. Sequencing data were processed using the Cell Ranger Software Suite (Version 3). Samples were demultiplexed and unique molecular identifier (UMI) counts were quantified for TCRs, tetramers, and gene expression.
Single-cell Transcriptomic Analysis. Hydrogel-based RNA-seq data were analyzed using the Cell Ranger package from 10X Genomics (v3.1.0) with the GRCh38 human expression reference (v3.0.0). Except where noted, Scanpy (v1.6.0 (
57 (
link))) was used to perform the subsequent single cell analyses. Any exogenous control cells identified by TCR clonotype were removed before further gene expression processing. Hydrogels that contain UMIs for less than 300 genes were excluded. Genes that were detected in less than 3 cells were also excluded from further analysis. Several additional quality control thresholds were also enforced. To remove data generated from cells likely to be damaged, upper thresholds were set for percent UMIs arising from mitochondrial genes (13%). To exclude data likely arising from multiple cells captured in a single drop, upper thresholds were set for total UMI counts based on individual distributions from each encapsulation (from 1500 to 3000 UMIs). A lower threshold of 10% was set for UMIs arising from ribosomal protein genes. Finally, an upper threshold of 5% of UMIs was set for the MALAT1 gene. Any hydrogel outside of any of the thresholds was omitted from further analysis. A total of 15,683 hydrogels were carried forward. Gene expression data were normalized to counts per 10,000 UMIs per cell (CP10K) followed by log1p transformation: ln(CP10K + 1).
Highly variable genes were identified (1,567) and scaled to have a mean of zero and unit variance. They were then provided to scanorama (v1.7, (
58 (
link))) to perform batch integration and dimension reduction. The data were used to generate the nearest neighbor graph which was in turn used to generate a UMAP representation that was used for Leiden clustering. The hydrogel data (not scaled to mean zero, unit variance, and before extraction of highly variable genes) were labeled with cluster membership and provided to SingleR (v1.4.0, (
59 (
link))) using the following references from Celldex (v1.0.0, (
59 (
link))): Monaco Immune Data, Database Immune Cell Expression Data, and Blueprint Encode Data. SingleR was used to annotate the clusters with their best-fit match from the cell types in the references. Clusters that yielded cell types other than types of the T Cell lineage were removed from consideration and the process was repeated starting from the batch integration step. The best-fit annotations from SingleR after the second round of clustering and the annotation was assigned as putative labels for each Leiden cluster. Further clustering of transcriptomic data was performed across the genes shown in
Fig. 5 using KMeans in sklearn (v0.24) with n_clusters set to 8. As the method has a preference to assign like-sized clusters, further consolidation of two central memory clusters was performed.
In order to provide corroboration for the SingleR best-fit annotations and further evidence as to the phenotype of the clusters, gene panels representing functional categories (Naïve, Effector, Memory, Exhaustion, Proliferation) were used to score each hydrogel’s expression profiles using scanpy’s “score_genes” function (
57 (
link)) which compares the mean expression values of the target gene set against a larger set of randomly chosen genes that represent background expression levels. The gene panels for each class were: Naïve - TCF7, LEF1, CCR7; Effector - GZMB, PRF1, GNLY; Memory - AQP3, CD69, GZMK; Exhaustion - PDCD1, TIGIT, LAG3; Proliferation - MKI67, TYMS. The gene expression matrix for all hydrogels were first imputed using the MAGIC algorithm (v2.0.4, (
60 (
link)). These functional scores were the only data generated from imputed expression values.
Scoring peptide-HLA-TCR interactions. Tetramer data analysis was performed using built-in methods of pandas (v1.2.5) and numpy (v1.20.3) in Python (v3.7.3). For each single-cell encapsulation, tetramer UMI counts (columns) were matrixed by cell (rows) and log-transformed. Duplicates of this matrix were independently Z-score transformed by row or column, and subsequently median-centered by the opposite axis (column or row), respectively (
Fig. S7). For each peptide-HLA-cell interaction, this provided two scores - inter-tetramer (
) and inter-cell (
), which were used to calculate a classifier for unique CDR3 a/b clonotypes across
cells as
. Classifier thresholds for positive interactions were set at 40, 36, 50, and 65 for A*02:01, B*07:02, A*24:02, and A*01:01, respectively.
Frequency Calculation. The frequency of reactive T cells in parent CD8
+ T cell populations was estimated using a calculation of compounded frequency by taking the product of the fraction of reactive cells in the sorted population and the fraction of cells sorted (
Fig. S8). When sample multiplexing was applied, care was taken to include only de-multiplexed cells from the corresponding sample to determine reactive cell fraction.
TCR Network Analysis. TCR motif analysis was performed using scirpy (v0.6.1) with receptor_arms = “any,” metric = “alignment,” and default cutoff of 10. Once clusters were identified, sequence alignment was performed using the pairwise2 module in Biopython (v1.78) and visualized using logomaker (v0.8).
Recombinant TCR validation. Recombinant TCRs identified from patient samples were ordered from TWIST Biosciences in the pLVX-EF1a lentiviral backbone (Takara) as a bicistronic TCRb-T2A-TCRa vector. Viral supernatants from transfected HEK 293T cells were collected 48 and 72 hours after transfection and added to the parental TCRab
−/− Jurkat J76 cell like (
34 (
link)) expressing CD8 and a nuclear factor of activated T cells (NFAT)-green fluorescent protein (GFP) reporter, referred to as J76-CD8-NFAT-GFP. Recombinant TCR surface expression was confirmed through flow cytometry by staining transduced J76-CD8-NFAT-GFP cells with anti-CD3-PE (Clone UCHT1) and anti-TCRab-allophycocyanin (APC) antibodies (Clone IP26).
To assess functional activity of recombinant TCRs, J76-CD8-NFAT-GFP expressing recombinant TCRs were incubated at a 1:1 ratio with the HLA-A*02:01
+and HLA-B*07:02
+ HCC 1428 BL (ATCC CRL-2327) lymphoblastic cell line, with a final concentration of 0.5% dimethylsulfoxide (DMSO, vehicle) or 50 uM of cognate peptide (New England Peptide, >95% pure). Cell mixtures were incubated in the Sartorius IncuCyte at 37°C, 5% CO
2 overnight and analyzed for NFAT-GFP expression measured as total integrated intensity (GCU x mm
2/image) at 12 hours after assay setup. At 16 hours, cells were removed from the IncuCyte and subsequently washed and blocked with staining buffer (BD 554656), stained with anti-CD3-PE-Cy7 (Clone UCHT1) and anti-CD69-APC (Clone FN50) antibodies, and analyzed using the Intellicyt iQue Screener Plus and FlowJo v10. CD69 activity was measured as percent positive of CD3
+ cells.
Francis J.M., Leistritz-Edwards D., Dunn A., Tarr C., Lehman J., Dempsey C., Hamel A., Rayon V., Liu G., Wang Y., Wille M., Durkin M., Hadley K., Sheena A., Roscoe B., Ng M., Rockwell G., Manto M., Gienger E., Nickerson J., Moarefi A., Noble M., Malia T., Bardwell P.D., Gordon W., Swain J., Skoberne M., Sauer K., Harris T., Goldrath A.W., Shalek A.K., Coyle A.J., Benoist C., Pregibon D.C., Jilg N., Li J., Rosenthal A., Wong C., Daley G., Golan D., Heller H., Sharpe A., Abayneh B.A., Allen P., Antille D., Armstrong K., Boyce S., Braley J., Branch K., Broderick K., Carney J., Chan A., Davidson S., Dougan M., Drew D., Elliman A., Flaherty K., Flannery J., Forde P., Gettings E., Griffin A., Grimmel S., Grinke K., Hall K., Healy M., Henault D., Holland G., Kayitesi C., LaValle V., Lu Y., Luthern S., Schneider J.M., Martino B., McNamara R., Nambu C., Nelson S., Noone M., Ommerborn C., Pacheco L.C., Phan N., Porto F.A., Ryan E., Selleck K., Slaughenhaupt S., Sheppard K.S., Suschana E., Wilson V., Carrington M., Martin M., Yuki Y., Alter G., Balazs A., Bals J., Barbash M., Bartsch Y., Boucau J., Carrington M., Chevalier J., Chowdhury F., DeMers E., Einkauf K., Fallon J., Fedirko L., Finn K., Garcia-Broncano P., Ghebremichael M.S., Hartana C., Jiang C., Judge K., Kaplonek P., Karpell M., Lai P., Lam E.C., Lefteri K., Lian X., Lichterfeld M., Lingwood D., Liu H., Liu J., Ly N., Hill Z.M., Michell A., Millstrom I., Miranda N., O'Callaghan C., Osborn M., Pillai S., Rassadkina Y., Reissis A., Ruzicka F., Seiger K., Sessa L., Sharr C., Shin S., Singh N., Sun W., Sun X., Ticheli H., Trocha-Piechocka A., Walker B., Worrall D., Yu X.G, & Zhu A. (2021). Allelic variation in class I HLA determines CD8+ T cell repertoire shape and cross-reactive memory responses to SARS-CoV-2. Science Immunology.