CASAVA v1.8.2 performed demultiplexing and conversion, while the custom script Pulse_conversion.py [13] extracted raw sequence reads during the sequencing run at the predefined SBS cycles 35, 50, 75 and 100.
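The idea of a pulse can be illustrated with a minimal sketch. It is not the authors' Pulse_conversion.py, which works on data from the ongoing run; it assumes, for illustration only, that a pulse at cycle n corresponds to the first n bases (and quality values) of each read. The names `FastqRecord`, `truncate_record` and `make_pulses` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# SBS cycles named in the text.
PULSE_CYCLES = (35, 50, 75, 100)

# Hypothetical in-memory FASTQ record: (header, sequence, quality string).
FastqRecord = Tuple[str, str, str]


def truncate_record(record: FastqRecord, cycle: int) -> FastqRecord:
    """Keep only the bases and qualities produced up to the given cycle."""
    header, seq, qual = record
    return header, seq[:cycle], qual[:cycle]


def make_pulses(
    records: Iterable[FastqRecord],
    cycles: Tuple[int, ...] = PULSE_CYCLES,
) -> Dict[int, List[FastqRecord]]:
    """Return one truncated read set per pulse cycle (assumed model)."""
    materialized = list(records)  # allow several passes over the reads
    return {c: [truncate_record(r, c) for r in materialized] for c in cycles}
```

Under this assumption, each pulse is a prefix of the next, so the 35-cycle read set can be analysed while later cycles are still being sequenced.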
Sequence and alignment quality metrics for each patient and PE pulse are available in Additional file 1: Tables S2-S3. Variant quality metrics for two SE pulses are available in Additional file 1: Table S4. Briefly, the sequence reads were aligned to the whole human genome reference GRCh37 using BWA v0.7.5a [14], and only aligned reads mapping to the exome, as defined by SureSelect Human All Exon V4 (Agilent), were selected for downstream analysis using SAMtools [15], BEDTools [16] and Picard [17]. All coverage metrics were calculated using Chanjo v0.4 [11]. GATK v2.7-2 [18, 19] performed realignment, base recalibration, variant identification, variant recalibration and genotyping. Annovar v2012-10-23 [20] annotated the variants, and MIPs custom scripts ranked them according to pathogenic potential. A summary of annotations and score parameters is available in Additional file 1: Table S5. All variants overlapping dbCMMS v.1.0 were loaded into Scout v1.0, hosted at SciLifeLab, for clinical evaluation.
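The restriction of aligned reads to the exome targets can be sketched as an interval-overlap test. This is an illustration only, not the pipeline's implementation, which relied on SAMtools, BEDTools and Picard; it assumes BED-style, zero-based, half-open coordinates, and the names `build_index` and `overlaps_target` are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # (start, end), zero-based, half-open


def build_index(
    targets: Iterable[Tuple[str, int, int]]
) -> Dict[str, List[Interval]]:
    """Group target intervals by chromosome, then sort and merge them."""
    by_chrom: Dict[str, List[Interval]] = {}
    for chrom, start, end in targets:
        by_chrom.setdefault(chrom, []).append((start, end))
    index: Dict[str, List[Interval]] = {}
    for chrom, intervals in by_chrom.items():
        merged: List[Interval] = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                # Overlapping or touching: extend the previous interval.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = merged
    return index


def overlaps_target(
    index: Dict[str, List[Interval]], chrom: str, start: int, end: int
) -> bool:
    """True if the aligned span [start, end) overlaps any target interval."""
    intervals = index.get(chrom, [])
    # Last merged interval that starts before the read ends.
    i = bisect_right(intervals, (end - 1, float("inf"))) - 1
    return i >= 0 and intervals[i][1] > start
```

Merging the targets first keeps them disjoint and sorted, so a single binary search per read is enough to decide whether it is kept.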