Image analysis software was provided as part of the Genome Analyzer analysis pipeline and configured for fully automatic parameter selection. Single-end reads were 76 bases in total length. Quality control was performed using FastQC and showed low overall error rates. The reference genome was the latest FlyBase version available at the time (y1; cn1bw1sp1 strain, Dm5.30). The data were aligned using the BWA algorithm (Li and Durbin, 2009). A total of 5,234,506 reads (10.01%) were not mapped to the genome. This is usually due to low-quality reads or reads with missing base-calling information (i.e., “B” in the quality stream). The rest of the reads for X1 and X2 were mapped as indicated. Gap estimation: according to the mapping software, the gap between paired-end reads is 360 ± 20 bp. The distribution percentiles are 345 (25%), 360 (50%), and 375 (75%). The set of6 and to the NCBI’s map of RefSeq and candidate Drosophila genes7.
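As an illustration of how an unmapped-read count and percentage of this kind can be obtained, the following minimal Python sketch tallies records in SAM-formatted text by inspecting the FLAG field (column 2), where bit 0x4 marks a read as unmapped. The function name `unmapped_fraction` and the toy records are hypothetical and are not part of the pipeline described here; the sketch assumes uncompressed, tab-delimited SAM input.

```python
def unmapped_fraction(sam_lines):
    """Count unmapped reads in SAM-formatted lines.

    Returns (unmapped, total, fraction). Header lines (starting with '@')
    are skipped. A read is counted as unmapped when FLAG bit 0x4 is set.
    """
    total = 0
    unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is FLAG
        total += 1
        if flag & 0x4:  # 0x4 = segment unmapped
            unmapped += 1
    fraction = unmapped / total if total else 0.0
    return unmapped, total, fraction


# Toy records (hypothetical data, for illustration only).
toy_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr2L\t100\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tBBBB",
    "r3\t16\tchr2L\t250\t10\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tchrX\t900\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
n_unmapped, n_total, frac = unmapped_fraction(toy_sam)
print(f"{n_unmapped} of {n_total} reads unmapped ({frac:.2%})")
```

In practice the same tally would be read from the aligner output rather than from an in-memory list.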
Reads were filtered using a minimum mapping quality (MAPQ) of 20. Variant calling was performed using SamTools (Li et al., 2009) and BcfTools. When using individual calls without the base alignment quality (BAQ) model (Li, 2011), a total of 1,036,435 homozygous SNPs were detected. Using multi-sample calling methods and the BAQ model (Li, 2011), the number of homozygous SNPs was reduced to 204,250. Variant annotation and filtering were performed using the software SnpEff (Cingolani et al., Fly, in press) and SnpSift, described below.
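The mapping-quality filter can be sketched in a few lines of Python. This is a minimal illustration, not the commands used in this study: it assumes tab-delimited SAM text, where MAPQ is column 5 and FLAG is column 2, and keeps header lines plus mapped records with MAPQ of at least 20. The function name `filter_by_mapq` and the example records are hypothetical.

```python
def filter_by_mapq(sam_lines, min_mapq=20):
    """Yield SAM header lines and mapped records with MAPQ >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: bitwise FLAG
        mapq = int(fields[4])  # column 5: mapping quality
        if flag & 0x4:
            continue  # drop unmapped reads
        if mapq >= min_mapq:
            yield line


# Hypothetical example records, for illustration only.
records = [
    "@SQ\tSN:chr2L\tLN:23011544",
    "r1\t0\tchr2L\t100\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr2L\t300\t10\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tBBBB",
    "r4\t16\tchr2L\t500\t20\t4M\t*\t0\t0\tACGT\tIIII",
]
kept = list(filter_by_mapq(records, min_mapq=20))
kept_names = [rec.split("\t")[0] for rec in kept if not rec.startswith("@")]
print(kept_names)
```

The threshold is inclusive here, so a read with MAPQ exactly 20 is retained; whether the original filter was inclusive is an assumption.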