Data collected from NGS experiments were analyzed to identify single nucleotide variants (SNVs) and small insertions/deletions (indels). The initial steps, including base calling and demultiplexing, were performed using the software provided with the MiSeq (Real Time Analysis, RTA v1.18.54, and CASAVA v1.8.2; Illumina, Inc., San Diego, CA). The FastQ files for each sample, containing paired-end reads after demultiplexing and adapter removal, were then used as input for the MiSeq pipeline. Briefly, FastQ files were processed with MiSeq Reporter v2.0.26 using the Custom Amplicon workflow. This workflow required as input the FastQ files and a “Manifest file” specifying the primer-pair sequences, the expected amplicon sequences, and their coordinates on the reference genome (Homo sapiens, hg19, build 37.2). Each read pair was aligned with the MEM algorithm of the BWA software [27]. The resulting BAM files were used as input to the GATK variant caller (Genome Analysis Toolkit, v1.6) [28], which generated a VCFv4.3 file for each sample. NGS data have been uploaded to the Harvard Dataverse public research data repository and are available at https://doi.org/10.7910/DVN/DEAEVL. All other data are within the paper and its Supporting Information files.
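The stated aim of the analysis, separating SNVs from small indels in the per-sample VCF files, can be illustrated with a short Python sketch that reads VCF data lines and labels each REF/ALT allele pair. It assumes plain-text (uncompressed) VCF input and relies on the standard VCF convention that indel alleles share a leading padding base with REF; multi-base substitutions and symbolic alleles are simply labeled "other". The function names (`classify`, `iter_variants`, `summarize`) and the example records are hypothetical, and this is not the logic used internally by MiSeq Reporter or GATK.

```python
"""Minimal sketch: classify VCF records as SNVs or small indels.

Assumptions (not from the original pipeline): uncompressed VCF text,
multi-allelic ALT fields split on commas, indels left-padded per the
VCF convention. Function names and example records are hypothetical.
"""
from collections import Counter
from typing import Iterable, Iterator, Tuple


def classify(ref: str, alt: str) -> str:
    """Return 'SNV', 'insertion', 'deletion', or 'other' for one REF/ALT pair."""
    if alt in (".", "*"):
        return "other"  # no ALT allele, or a spanning-deletion placeholder
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNV"
    # VCF indels share a leading (padding) base with REF
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"
    return "other"  # MNVs, complex events, symbolic alleles such as <DEL>


def iter_variants(lines: Iterable[str]) -> Iterator[Tuple[str, int, str, str, str]]:
    """Yield (chrom, pos, ref, alt, class) for every ALT allele in VCF data lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alts = fields[:5]
        for alt in alts.split(","):
            yield chrom, int(pos), ref, alt, classify(ref, alt)


def summarize(lines: Iterable[str]) -> Counter:
    """Count variant classes, e.g. SNVs vs small indels for one sample."""
    return Counter(cls for *_, cls in iter_variants(lines))


if __name__ == "__main__":
    # Hypothetical records, for illustration only
    example = [
        "##fileformat=VCFv4.3",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr17\t100\t.\tC\tT\t50\tPASS\t.",
        "chr17\t200\t.\tA\tAT\t50\tPASS\t.",
        "chr17\t300\t.\tGTC\tG\t50\tPASS\t.",
    ]
    for record in iter_variants(example):
        print(record)
    print(summarize(example))
```

To apply the sketch to a real sample, the lines can come straight from an open file, e.g. `summarize(open("sample.vcf"))`, where `sample.vcf` is a placeholder name. Production work would normally use a dedicated VCF parser and normalize variants first, since unnormalized complex alleles end up in the "other" class here.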