Complete plasmid sequences are provided in SI Appendix, Table S1. Lentiviral transfer plasmids with 47× and 240×CAG repeats were previously described (6). Lentiviral packaging and envelope constructs were obtained from Addgene: pCMV-VSV-G was a gift from Bob Weinberg (Addgene plasmid # 8454; https://www.addgene.org/8454/; RRID: Addgene_8454) (58); psPAX2 was a gift from Didier Trono (Addgene plasmid # 12260; https://www.addgene.org/12260/; RRID: Addgene_12260). Plasmids containing CAG repeats with endogenous flanking sequences from ATXN3, ATXN8, and HTT were generously provided by Laura Ranum (3).
The library of 240×CAG repeat-containing plasmids with variable flanking sequences was generated as follows. Double-stranded DNA oligonucleotides (~300 bases) were purchased from Quintara Biosciences. These DNA fragments were inserted into plasmids with 47× or 240×CAG repeats between EcoRI and MluI sites using standard restriction digestion and ligation procedures. To construct variants of CAGRAN with 5× or 22×CAG repeats, we purchased CAG repeat-containing single-stranded DNA (Integrated DNA Technologies, IDT), annealed the strands, and incorporated them between EcoRI and SgrDI sites downstream of the flanking sequence in CAGRAN. To construct CAGRANBFP, EBFP2 was obtained as a double-stranded DNA fragment (IDT) and cloned between BamHI and NotI sites in CAGRAN. All cloning and plasmid preparations were performed in Stbl3 Escherichia coli cells (Invitrogen, C7373-03) grown at 30 °C. Since repeat number can change spontaneously during the cloning process, we verified the repeat tract of each construct in two ways. First, we optimized a Sanger sequencing protocol (in collaboration with Quintara Biosciences) that used betaine and 7-deaza-dGTP. This optimized protocol provided reads of ~800 bases from each end. Second, for constructs with 240×CAG repeats, where Sanger sequencing did not provide sufficient read length to unambiguously determine the number of repeats, we verified the repeat number by examining the size of the insert after restriction digestion and gel electrophoresis. Sanger sequencing revealed eight unintended interruptions in the CAG repeat tract in our constructs with 240×CAG repeats (sequences in Dataset S1). Each of these sites contained a deletion of a G nucleotide, and the first interruption occurred 42 bases from the start of the repeat tract. These interruptions were present in our parent plasmid and were common to all 240×CAG repeat-containing constructs examined in this study (CAGRAN, CAGFOCI, and related constructs).
We observed similar phenotypes (e.g., toxicity, cytoplasmic RNA aggregation, and RAN translation) in constructs containing these repeat interruptions and in corresponding constructs with uninterrupted 47×CAG repeats.
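The repeat-tract checks described above can be illustrated with a minimal Python sketch. It is not the analysis pipeline used in this study; the function names (`find_interruptions`, `expected_tract_length`) and the toy sequence are hypothetical. The sketch assumes that the read has already been trimmed to the repeat tract, that each interruption is a single-base deletion, and that the nominal repeat count includes the interrupted units.

```python
import re


def find_interruptions(tract, unit="CAG"):
    """Locate breaks between runs of perfect repeat units in a tract.

    Returns (total_units, gaps), where gaps is a list of
    (offset_in_bases, intervening_bases) for each break between
    consecutive perfect runs.
    """
    pattern = f"(?:{re.escape(unit)})+"
    runs = [(m.start(), m.end()) for m in re.finditer(pattern, tract)]
    total_units = sum((end - start) // len(unit) for start, end in runs)
    gaps = [
        (prev_end, tract[prev_end:next_start])
        for (_, prev_end), (next_start, _) in zip(runs, runs[1:])
    ]
    return total_units, gaps


def expected_tract_length(n_units, n_single_base_deletions=0, unit_len=3):
    """Tract length in bases, assuming each interruption removes one base."""
    return n_units * unit_len - n_single_base_deletions


# Toy tract (hypothetical, not a sequence from this study): 14 perfect
# units, one unit missing its G ("CA"), then 10 more perfect units.
toy = "CAG" * 14 + "CA" + "CAG" * 10
print(find_interruptions(toy))        # (24, [(42, 'CA')])
print(expected_tract_length(240, 8))  # 712
```

Under these assumptions, a 240-unit tract with eight single-base deletions is 712 bases long, which is the contribution of the tract itself to the insert size on a gel; flanking sequence between the restriction sites adds to this figure.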