CLC Genomics Workbench version 7.0.3 (CLC Bio, Qiagen) was used to process and analyze the sequencing reads from both the Ion Torrent PGM and the Illumina MiSeq. First, adaptor contamination was removed from the reads. Next, the reads were trimmed from both ends using the modified Mott trimming algorithm to reach a Q20 score, which corresponds to a 1 in 100 probability that a given base is called incorrectly by the sequencer. Afterwards, all ambiguous (N) bases were trimmed from the reads, and reads shorter than 50 nucleotides were removed. For the Illumina MiSeq, the broken pairs resulting from trimming and filtering were also removed. The remaining reads were assembled de novo using default settings. In addition, the processed reads were aligned to the pHW197-M plasmid reference sequence or the influenza PR8 reference genome (based on the sequences encoding the eight segments in the pHW vectors, determined by Sanger sequencing, with addition of the extra 20 nucleotides present at the 5′ end of the RT-PCR primers) using local alignment. For this, the following default scoring parameters were used: match = +1, mismatch = −2, insertion/deletion = −3, with filtering thresholds of length fraction = 0.9 and similarity fraction = 0.8. Non-specific matches, defined as reads aligning to more than one position with an equally good score, were ignored. Sequence variants were called using all available sequencing data that covered each nucleotide at least 100 times and had a central base quality score of Q20 or greater. The A-to-G variant introduced by the primer at position 24 in the HA, NP, NA, M and NS segments was not taken into account during the influenza quasispecies variant analysis. All numerical data mentioned in the text are presented as averages with their standard deviations (± SD).
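The numerical thresholds above can be illustrated with a minimal Python sketch. It is not the CLC Genomics Workbench implementation: all function names are hypothetical, the alignment score is a simplified linear sum over counted matches, mismatches and indel positions, and the reading of the length fraction, similarity fraction and coverage criteria is an assumption made for illustration only. Only the Phred relation, P = 10^(−Q/10), is standard.

```python
def phred_to_error_prob(q: float) -> float:
    """Standard Phred relation: probability that a base call is wrong."""
    return 10 ** (-q / 10)


def keep_read(sequence: str, min_length: int = 50) -> bool:
    """Length filter: keep reads of at least min_length nucleotides."""
    return len(sequence) >= min_length


def alignment_score(matches: int, mismatches: int, indels: int) -> int:
    """Simplified linear score using the stated parameters:
    match = +1, mismatch = -2, insertion/deletion = -3 (per position)."""
    return matches * 1 + mismatches * -2 + indels * -3


def passes_alignment_filter(read_length: int, aligned_length: int,
                            identical_bases: int,
                            length_fraction: float = 0.9,
                            similarity_fraction: float = 0.8) -> bool:
    """Assumed interpretation: at least length_fraction of the read must be
    aligned, and at least similarity_fraction of the aligned part must be
    identical to the reference."""
    if read_length <= 0 or aligned_length <= 0:
        return False
    return (aligned_length / read_length >= length_fraction
            and identical_bases / aligned_length >= similarity_fraction)


def callable_position(base_qualities: list[int],
                      min_coverage: int = 100,
                      min_quality: int = 20) -> bool:
    """Assumed interpretation: a position is used for variant calling when
    at least min_coverage bases with quality >= min_quality cover it."""
    usable = sum(1 for q in base_qualities if q >= min_quality)
    return usable >= min_coverage


if __name__ == "__main__":
    print(phred_to_error_prob(20))               # error probability at Q20
    print(alignment_score(95, 3, 2))             # score of one example alignment
    print(passes_alignment_filter(100, 95, 90))  # example read against the thresholds
```

Under these assumptions, a 100-nucleotide read with 95 aligned positions, of which 90 are identical to the reference, satisfies both filtering thresholds, whereas a read with only 80 aligned positions fails the length fraction criterion.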