Basecalling was performed using ONT Guppy v3.0.4 aboard the MinIT data processing unit (ONT-MinIT-Release 19.06.8), with a minimum quality score of 7 to filter out low-quality reads. All FASTQ files within each sample were concatenated into a single file and filtered to retain only reads between 1000 and 2100 nucleotides in length. Reads were then corrected and trimmed using Canu v1.9 [28] with the following parameters: -correct, genomeSize=1.7k, minOverlapLength=1000, corOutCoverage=1000000; -trim trimReadsCoverage=20. Next, reads containing intact forward and reverse primer sequences were extracted using
bbduk.sh (k=18, restrictleft/right=500, rcomp=f, mm=f, edist=2) via BBTools v38.55 [29], and primer sequences were queried to establish plus and minus strand reads separately. Minus strand reads were then reverse complemented and combined with plus strand reads into a single FASTA file. To filter out off-target reads, a
Blastocystis reference database was downloaded from NCBI using the following criteria: “blastocystis [ORGN] AND 0:6000 [SLEN] AND biomol_genomic[PROP]”. The FASTA file containing the reference sequences was indexed using VSEARCH v2.14.1 [30] with the vsearch --makeudb_usearch command. Read filtering was then performed using the vsearch --usearch_global command with the following parameters: --id 0.9 --query_cov 0.9. Next, consensus sequences were generated by clustering reads using the vsearch --cluster_fast command with a 98% identity threshold. Consensus sequences were checked for chimeras using the
vsearch --uchime_denovo command and then filtered using a minimum abundance threshold of 5. Sequences were polished using Racon v1.4.11 [31]. The alignment file needed for polishing was generated using Minimap2 v2.17-r941 [32] (-ax asm5 --secondary=no) by mapping the VSEARCH filtered reads to the chimera-free sequences. Polishing was then performed using default Racon parameters. Polished sequences were clustered again at a 98% identity threshold and prepared for another round of improvement with Nanopolish v0.11.1 [33] to leverage signal-level FAST5 data. The reads used for this step were Canu-corrected, trimmed reads that were down-sampled using
bbnorm.sh to a target coverage of 500. Down-sampled reads were mapped to the Racon-polished, re-clustered consensus sequences using Minimap2 (-ax asm5 --secondary=no), and the alignment file was sorted and indexed using Samtools v1.9 [34]. Polishing was executed using the nanopolish variants --consensus command with the parameters --min-flanking-sequence=10, --fix-homopolymers, and --max-haplotypes=1000000. The nanopolish vcf2fasta command was then used to apply the improvements from the previous step to the Racon-polished, re-clustered consensus sequences. Nanopolished sequences were re-clustered once more at a 98% identity threshold to obtain final consensus sequences. Subtypes were assigned based on the best match to a reference in the GenBank database using BLAST. The nucleotide sequences obtained in this study have been deposited in GenBank under the accession numbers MT898451–MT898459.
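Two of the read-handling steps described above, the 1000 to 2100 nucleotide length filter and the reverse complementing of minus strand reads, can be sketched in plain Python. This is an illustrative sketch only and not the authors' implementation: the function names (filter_by_length, reverse_complement, orient_reads), the in-memory (name, sequence) record layout, and the treatment of the length bounds as inclusive are assumptions made for the example, and strand assignment is assumed to have been done already by the primer query.

```python
# Illustrative sketch; function names and record layout are hypothetical.
MIN_LEN = 1000  # lower length bound stated in the text
MAX_LEN = 2100  # upper length bound stated in the text (treated as inclusive here)

# Translation table for complementing DNA bases (N is left as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def filter_by_length(records, min_len=MIN_LEN, max_len=MAX_LEN):
    """Keep (name, sequence) records whose length lies in [min_len, max_len]."""
    return [(name, seq) for name, seq in records if min_len <= len(seq) <= max_len]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_reads(plus_reads, minus_reads):
    """Combine plus strand reads with reverse-complemented minus strand reads."""
    flipped = [(name, reverse_complement(seq)) for name, seq in minus_reads]
    return list(plus_reads) + flipped


if __name__ == "__main__":
    reads = [("r1", "A" * 999), ("r2", "ACGT" * 300), ("r3", "C" * 2101)]
    # Only r2 (1200 nt) falls within the length bounds.
    print([name for name, _ in filter_by_length(reads)])
    print(reverse_complement("AACGT"))
```

After orientation, all reads share the same strand, which is what allows them to be written to a single FASTA file and clustered together.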
For comparison, the full-length sequences obtained with MinION and the partial sequences obtained with MiSeq from the same sample were aligned using ClustalW in MegAlign 15 (DNASTAR Lasergene 15, Madison, WI, USA), and pairwise distances between consensus sequences were calculated.
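As an illustration of the pairwise distance step, the sketch below computes a simple uncorrected p-distance (the proportion of differing sites) between two sequences that have already been aligned. The distance measure applied in MegAlign is not stated in the text, so the choice of p-distance, the function name p_distance, and the gap handling (columns with a gap in either sequence are skipped) are all assumptions of this example.

```python
def p_distance(aligned_a, aligned_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns containing a gap in either sequence are ignored; this gap
    handling is an assumption of the sketch, not a documented setting.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) columns")
    return mismatches / compared


if __name__ == "__main__":
    # One mismatch over eight compared columns (the gapped column is skipped).
    print(p_distance("ACGT-ACGT", "ACGTTACCT"))
```

Because the MiSeq sequences are partial, skipping gapped columns restricts the comparison to the region covered by both sequences.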
Maloney J.G., Molokin A., & Santin M. (2020). Use of Oxford Nanopore MinION to generate full-length sequences of the Blastocystis small subunit (SSU) rRNA gene. Parasites & Vectors, 13, 595.