Comprehensive Structural Variant Curation in Chlamydomonas

Structural mutations (SMs) were called using three different variant callers, each of which relied on a different underlying alignment tool. Sniffles v1.0.12b (Sedlazeck et al. 2018) was used to call SMs from the pbmm2 read alignments described above. BAM files were preprocessed with SAMtools calmd to generate the MD tag, which records mismatching positions (i.e., variable coordinates in the reads). Sniffles was first run on each mutation accumulation (MA) line individually, and the resulting VCF files were merged using SURVIVOR v1.0.7 (Jeffares et al. 2017). Following the pipeline recommended for population calling (https://github.com/fritzsedlazeck/Sniffles/wiki/), Sniffles was then run again with the merged VCF as input via the option “--Ivcf”. This population calling enables consistent presence or absence calls for SMs across all MA lines within a strain. SURVIVOR was used again to generate a multisample VCF.
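
The two-pass order of operations described above can be sketched as a list of command lines. This is a minimal sketch rather than the authors' script: the file names, the reference name "ancestor.fa", and the SURVIVOR merge values (1000, 1, 1, -1, -1, -1, taken from the example on the Sniffles wiki) are assumptions, since the text does not state the parameters actually used. The function only builds the commands and does not run them.

```python
from typing import List, Optional, Tuple

# One step = (argv, file that should receive stdout, or None).
Step = Tuple[List[str], Optional[str]]


def survivor_merge(vcf_list_file: str, out_vcf: str) -> List[str]:
    """SURVIVOR merge with positional arguments: list of VCFs, max breakpoint
    distance, min supporting callers, use type, use strand, estimate distance,
    min SV size, output. The numeric values here are assumptions."""
    return ["SURVIVOR", "merge", vcf_list_file,
            "1000", "1", "1", "-1", "-1", "-1", out_vcf]


def population_calling_plan(ma_lines: List[str],
                            ref_fasta: str = "ancestor.fa") -> List[Step]:
    plan: List[Step] = []
    # Pass 1: add MD tags, then call SMs in each MA line on its own.
    for name in ma_lines:
        plan.append((["samtools", "calmd", "-b", f"{name}.bam", ref_fasta],
                     f"{name}.md.bam"))
        plan.append((["sniffles", "-m", f"{name}.md.bam",
                      "-v", f"{name}.pass1.vcf"], None))
    # Merge per-line calls; pass1.txt is a text file listing the pass-1 VCFs.
    plan.append((survivor_merge("pass1.txt", "merged.vcf"), None))
    # Pass 2: rerun Sniffles on each line with the merged VCF as input.
    for name in ma_lines:
        plan.append((["sniffles", "-m", f"{name}.md.bam",
                      "-v", f"{name}.pass2.vcf", "--Ivcf", "merged.vcf"], None))
    # Final merge of the pass-2 calls into one multisample VCF.
    plan.append((survivor_merge("pass2.txt", "multisample.vcf"), None))
    return plan
```

Each step could be handed to `subprocess.run`, redirecting stdout to the named file where one is given. The list files "pass1.txt" and "pass2.txt" would have to be written separately, one VCF path per line.
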
MUM&Co v3 (O'Donnell and Fischer 2020) was used to call SMs from individual alignments of MA line assemblies to their ancestral reference, setting a genome size of 110 Mb (“-g 110000000”). MUM&Co calls variants from alignments produced by MUMmer v4 (Marçais et al. 2018); alignment and calling are performed within a single script. Variants were obtained as TSV and VCF files.
The variation graph tool (vg) (Garrison et al. 2018) was used to call variants directly from the pangenome alignments using the deconstruct command (“--path-traversals”). The resulting VCF file for each strain was filtered to retain only variants >50 bp.
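
The size filter applied to the vg output could look like the sketch below. How variant length was measured is not stated, so this assumes it is the largest difference in length between REF and any ALT allele, which means balanced events (e.g., inversions written as equal-length alleles) would be dropped. It also assumes alleles are written as explicit sequences rather than symbolic alleles such as `<DEL>`. The function name and the `min_exclusive` parameter are hypothetical.

```python
from typing import Iterable, Iterator


def filter_vcf_by_size(lines: Iterable[str],
                       min_exclusive: int = 50) -> Iterator[str]:
    """Yield header lines plus records whose REF/ALT length difference
    exceeds min_exclusive (assumed definition of variant size)."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]              # VCF column 4: REF
        alts = fields[4].split(",")  # VCF column 5: ALT, comma-separated
        if max(abs(len(alt) - len(ref)) for alt in alts) > min_exclusive:
            yield line
```

For a file on disk, the function can be applied to an open file handle, writing the yielded lines to a new VCF.
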
All called variants in callable regions were manually curated by visualizing read and assembly alignments in the Integrative Genomics Viewer (IGV) (Robinson et al. 2011). SMs were rejected if they were not unambiguously supported by the read alignments. Read support for very large SMs was visualized with Ribbon v1.1 (Nattestad et al. 2021), which displays reads mapping to discordant genomic regions. Supplemental Figures S12–S26 provide examples of SM visualization and curation. Most variants were entirely spanned by reads, allowing simple visual confirmation in IGV, but variants >30 kb in length (approximately the upper limit of read lengths), including large inversions and translocations, required additional curation. In addition to read support from Ribbon, these rearrangements were traced in the MA line assemblies by manually assessing the discordant mapping of MA line contigs in the PAF alignment files (see Supplemental Fig. S23). Complex SMs, including large rearrangements and duplications, were likewise visualized with Ribbon.
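
The assessment of PAF files was done manually, but a short script could shortlist contigs worth inspecting. The sketch below assumes the standard PAF column order (query name in column 1, strand in column 5, target name in column 6, alignment block length in column 11) and flags any contig with substantial alignments to more than one target sequence or to both strands. The function name and the `min_aln_len` cutoff are hypothetical and are not taken from the source.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def discordant_contigs(paf_lines: Iterable[str],
                       min_aln_len: int = 10000
                       ) -> Dict[str, List[Tuple[str, str]]]:
    """Map each contig to its (target, strand) pairs when it has more than one."""
    hits: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        query, strand, target = fields[0], fields[4], fields[5]
        if int(fields[10]) < min_aln_len:
            continue  # ignore short alignments (assumed cutoff)
        hits[query].add((target, strand))
    # A contig is a candidate if it maps to >1 target or to both strands.
    return {q: sorted(pairs) for q, pairs in hits.items() if len(pairs) > 1}
```

This only narrows down candidates for inversions and translocations. Rearrangements within one target and strand, such as large deletions or duplications, are not flagged and would still need to be checked by eye.
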
Duplications and deletions were curated as tandem repeat expansions or contractions if they involved the duplication or deletion of one or more monomers of a tandem repeat. Most fell within existing tandem repeat annotations (i.e., satellites and microsatellites), whereas a small number required manual inspection of indel flanks using self-vs-self dotplots generated with the MAFFT v7 online server (Katoh et al. 2019). Deletions that perfectly intersected with transposable elements (TEs) annotated by RepeatMasker in the ancestor genome were called as mobile element excisions. Mobile element insertions from described TE families were identified as cases in which the inserted sequence had a near-perfect BLASTN match (Camacho et al. 2009) to the Chlamydomonas repeat library (Craig 2021). These hits all had the expected length distributions: LINE and PLE insertions frequently contained only the 3′ end owing to 5′ truncation, whereas insertions of other TEs corresponded to the entire length of the TE. In cases in which an inserted sequence had no match to an existing TE model, we queried the insert sequence against the ancestor genome, extracted and aligned the hits, and manually curated new consensus sequences following established protocols for mobile element annotation (Goubert et al. 2022). All insertions unambiguously matched either the existing or the newly produced consensus sequences and could be assigned to specific mobile element families. The one exception was duplications mediated by Dualen LINEs, in which the sequence called as an insertion partly matched Dualen-4b_cRei and partly matched the sequence immediately flanking the insertion. These Dualen-mediated duplications were manually split into two called SMs: one mobile insertion and one duplication, each of the appropriate length.
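
The excision rule above can be illustrated with a small coordinate comparison. This sketch reads "perfectly intersected" as both ends of the deletion coinciding with both ends of an annotated TE, which is an assumption about the authors' criterion. The `Interval` type, the `tolerance` parameter, and the example names are hypothetical, and the deletion and annotation coordinates are assumed to use the same coordinate system.

```python
from typing import List, NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str = ""


def matching_te(deletion: Interval, tes: List[Interval],
                tolerance: int = 0) -> Optional[Interval]:
    """Return the annotated TE whose two ends both lie within `tolerance` bp
    of the deletion's ends, or None if no annotation matches."""
    for te in tes:
        if (te.chrom == deletion.chrom
                and abs(te.start - deletion.start) <= tolerance
                and abs(te.end - deletion.end) <= tolerance):
            return te
    return None
```

A deletion with a match would be a candidate excision of the named family, and one without would remain an ordinary deletion or go on to the tandem repeat check. A small nonzero tolerance may be useful where breakpoints are ambiguous by a few bases, but any such value would need to be justified against the data.
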
When curating inversions and translocations, we noticed that many events featured additional insertions at the rearrangement breakpoints that were not specifically detected by the variant callers. As above, these insertions were compared to the annotated TEs and defined as mobile element insertions of specific TE families. Five rearrangements could not be fully characterized because, although one breakpoint was clearly supported, the other lay in an uncallable region. These were arbitrarily classified as translocations.

López-Cortegano E., Craig R.J., Chebib J., Balogun E.J., & Keightley P.D. (2023). Rates and spectra of de novo structural mutations in Chlamydomonas reinhardtii. Genome Research, 33(1), 45-60.

Publication 2023

Chlamydomonas Consensus sequences Deletion Deletions Genome Indel Insertions Inversions Library Microsatellites Rearrangements Satellites Strain Survivor Tandem repeat Translocations

Other organizations : University of Edinburgh, QB3, University of California, Berkeley, University of Toronto

Variable analysis

independent variables

Three different variant callers (Sniffles, MUM&Co, and vg) that relied on different underlying alignment tools (pbmm2, MUMmer, and pangenome alignments)

dependent variables

Structural mutations (SMs) called by the three variant callers

control variables

Genome size of 110 Mb used for MUM&Co variant calling
Manual curation of all called variants in callable regions using read and assembly alignments in IGV
Visualization of read support for large SMs using Ribbon
Curation of duplications and deletions as tandem repeat expansions or contractions
Identification of mobile element insertions and excisions using BLAST and consensus sequences
