The libraries were sequenced on the HiSeq 1000 system (Illumina, San Diego, CA, USA). The resulting sequences were analyzed with eXSP, an in-house pipeline that automates the analysis workflow; it is composed of modules that perform each step using tools available to the scientific community or developed in-house [33]. Paired sequencing reads were aligned to the reference genome (UCSC, hg19 build) using BWA and sorted with SAMtools and Picard (http://picard.sourceforge.net). Post-alignment processing (local realignment around insertions-deletions and base quality recalibration) and calling of SNVs and small insertions-deletions (ins-del) were performed using the Genome Analysis Toolkit (GATK) [34], with parameters adapted to the HaloPlex-generated sequences. The SNV and ins-del variants called with both platforms were annotated using ANNOVAR [35] with the following information: relative position in genes according to the RefSeq [36] gene model, amino acid change, presence in dbSNP v137 [37], frequency in the NHLBI Exome Variant Server (http://evs.gs.washington.edu/EVS) and 1000 Genomes large-scale projects, multiple cross-species conservation, and prediction scores for damaging effects on protein activity [38]. The annotated variants were then imported into the internal variation database.
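The order of the processing steps described above can be summarized as a minimal Python sketch. It is not the eXSP implementation, which is not shown in the text: the `Stage` class, the `PIPELINE` list, and the `plan` function are hypothetical names introduced here for illustration, and the sketch only prints a dry-run plan without invoking any of the tools.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    """One module of the workflow (hypothetical structure, for illustration)."""
    name: str  # what the stage does
    tool: str  # tool named in the text for this stage


# Stage order follows the description in the text; labels are illustrative.
PIPELINE: List[Stage] = [
    Stage("align paired reads to hg19", "BWA"),
    Stage("sort alignments", "SAMtools/Picard"),
    Stage("local realignment around ins-del", "GATK"),
    Stage("base quality recalibration", "GATK"),
    Stage("call SNVs and small ins-del", "GATK"),
    Stage("annotate variants", "ANNOVAR"),
    Stage("import into variation database", "in-house"),
]


def plan(stages: List[Stage]) -> List[str]:
    """Return a numbered dry-run plan; no external tool is executed."""
    return [f"{i}. {s.name} [{s.tool}]" for i, s in enumerate(stages, start=1)]


if __name__ == "__main__":
    print("\n".join(plan(PIPELINE)))
```

Keeping each stage as a separate record mirrors the modular design attributed to the pipeline: a tool can be swapped for one stage without touching the others.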