We demultiplexed raw sequence reads to fastq files based on index sequences using Illumina's BaseSpace Sequence Hub cloud service. We then further demultiplexed the individual libraries into one fastq file per sample using a custom Python script. We processed the samples with the nf-core [25] analysis pipeline viralrecon [26] (version 1.1.0), which is built on Nextflow [24] (version 20.10.0). In short, we trimmed the adapters from the fastq reads using fastp [27] (version 0.20.1) and aligned the reads to the SARS-CoV-2 reference genome (NC_045512.2) using Bowtie 2 [28] (version 3.5.1). We then sorted and indexed the aligned reads using SAMtools [29] (version 1.9), trimmed amplicon primer sequences using iVar [30] (version 1.2.2), and used iVar to call variants and generate the consensus sequence. To determine the percentage of reads that mapped to different organisms and common contaminants, we used FastQ Screen [31] (version 0.14.1). Briefly, 100,000 reads were sampled from the fastq files and aligned to 14 reference sequences (see Supplementary Table 2) using Bowtie 2 [28] (version 3.5.1). We performed all subsequent analyses using custom R scripts.
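The read-sampling step in the contamination screen can be illustrated with a minimal Python sketch. This is not the FastQ Screen implementation, which performs its own subsetting internally; it shows one generic way (reservoir sampling) to draw a fixed number of reads uniformly at random from a fastq file of unknown length in a single pass. The function names `read_fastq` and `reservoir_sample`, the `seed` parameter, and the toy reads are hypothetical and chosen only for illustration.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# A fastq record is four lines: header, sequence, separator, quality.
FastqRecord = Tuple[str, ...]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line fastq records from an iterable of text lines."""
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(it, 4))
        if not record:
            return  # end of input
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("malformed fastq record")
        yield record


def reservoir_sample(records: Iterable[FastqRecord], k: int,
                     seed: int = 0) -> List[FastqRecord]:
    """Return up to k records chosen uniformly at random (Algorithm R)."""
    rng = random.Random(seed)  # assumption: fixed seed for reproducibility
    reservoir: List[FastqRecord] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    # Toy input: ten hypothetical reads, sampled down to three.
    demo_lines = []
    for n in range(10):
        demo_lines += [f"@read{n}\n", "ACGT\n", "+\n", "IIII\n"]
    subset = reservoir_sample(read_fastq(demo_lines), k=3, seed=42)
    for header, sequence, _, quality in subset:
        print(header, sequence, quality)
```

For a real file, the same functions could be applied to an open handle, for example `reservoir_sample(read_fastq(handle), k=100_000)`; gzip-compressed input would need to be opened in text mode with `gzip.open(path, "rt")`.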