For the virtual-tumor benchmarking data, we measured sensitivity and specificity by applying MuSE and MuTect [5] to each of the 24 spike-in BAMs (4 variant allele fractions × 6 depths) and to the non-spike-in WGS BAMs of the same depths. The matched-normal WGS BAM was fixed at 30× depth. Any variant in our in silico spike-in ground truth that a caller missed was counted as a false negative, and any call made on a non-spike-in WGS BAM was counted as a false positive. The denominator for the false positive rate (FPR) calculation is the total length of the hg19 reference genome from chromosome 1 through chromosome X.
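The counting rules above can be sketched in Python as follows. This is a minimal illustration, not the code used in the study: it assumes variants are compared as (chromosome, position, REF, ALT) tuples, that the FPR denominator is read from a FASTA `.fai` index of hg19 (summing chromosomes 1 to 22 plus X), and that specificity is reported as 1 − FPR. All function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome without "chr", 1-based position, REF, ALT).
Variant = Tuple[str, int, str, str]

# Chromosomes 1-22 plus X, i.e. "chromosome 1 through chromosome X".
PRIMARY_CONTIGS = {str(i) for i in range(1, 23)} | {"X"}


def strip_chr(name: str) -> str:
    """Accept both UCSC ('chr1') and GRCh37 ('1') contig naming."""
    return name[3:] if name.startswith("chr") else name


def genome_length_from_fai(fai_lines: Iterable[str]) -> int:
    """Sum the lengths of chr1..chr22 and chrX from a FASTA .fai index.

    Each .fai line is tab-separated: name, length, offset, linebases, linewidth.
    """
    total = 0
    for line in fai_lines:
        if not line.strip():
            continue
        name, length = line.split("\t")[:2]
        if strip_chr(name) in PRIMARY_CONTIGS:
            total += int(length)
    return total


def spike_in_metrics(spiked: Set[Variant],
                     calls_on_spiked: Set[Variant],
                     calls_on_unspiked: Set[Variant],
                     genome_len: int) -> dict:
    """Sensitivity from the spike-in BAM, FPR from the same-depth non-spike-in BAM."""
    tp = len(spiked & calls_on_spiked)   # spiked variants the caller recovered
    fn = len(spiked - calls_on_spiked)   # spiked variants the caller missed
    fp = len(calls_on_unspiked)          # every call on the unspiked BAM is false
    fpr = fp / genome_len
    return {
        "sensitivity": tp / (tp + fn) if spiked else float("nan"),
        "fpr": fpr,
        "specificity": 1.0 - fpr,        # assumption: specificity = 1 - FPR
    }
```

In practice this would be evaluated once per caller and per VAF/depth combination, with the genome length computed once from the reference index.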
For the DREAM challenge IS1, IS2, and IS3 data, we used the organizer-provided script and truth VCF files to compute sensitivity and specificity [23]. The sensitivity and specificity of SomaticSniper, Strelka, and VarScan2 were taken from the DREAM challenge leaderboards. As above, the denominator for the FPR calculation is the total length of the hg19 reference genome from chromosome 1 through chromosome X.
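The organizer-provided script is not reproduced here. As a rough illustration of what comparing a call set against a truth VCF involves, the Python sketch below matches records on (chromosome, position, REF, ALT), keeps only PASS (or unfiltered) records, and applies the genome-length FPR denominator. The matching rule and all function names are assumptions; the official DREAM evaluator may match variants differently.

```python
import gzip
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome without "chr", POS, REF, ALT).
Variant = Tuple[str, int, str, str]


def open_vcf(path: str):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def read_vcf_keys(lines: Iterable[str], pass_only: bool = True) -> Set[Variant]:
    """Collect one key per ALT allele from VCF data lines."""
    keys: Set[Variant] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts, filt = fields[0], fields[1], fields[3], fields[4], fields[6]
        if pass_only and filt not in ("PASS", "."):
            continue  # drop records the caller itself filtered out
        chrom = chrom[3:] if chrom.startswith("chr") else chrom
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def compare_to_truth(truth: Set[Variant], calls: Set[Variant], genome_len: int) -> dict:
    """Count TP/FP/FN against the truth set; FPR uses the genome-length denominator."""
    tp = len(truth & calls)
    fp = len(calls - truth)
    fn = len(truth - calls)
    fpr = fp / genome_len
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": tp / (tp + fn) if truth else float("nan"),
            "fpr": fpr, "specificity": 1.0 - fpr}
```

With real files, `read_vcf_keys(open_vcf(path))` would be called once for the truth VCF and once per caller's output.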
For the multi-region lung adenocarcinoma data, we calculated sensitivity and positive predictive value (PPV) against an artificial truth set, because the known validation set, which was extracted and compiled from the original publication's supplementary material, was biased toward Caller A. The artificial truth set included shared calls (Fig. 3c; black in ovals 1, 2, and 3), validated calls (Fig. 3c; orange in oval 1), and unique, non-validated calls that supported the identification of trunk mutations (Fig. 3c; red in oval 2 and blue in oval 3). Here, a trunk mutation is a somatic variant called at the same genomic locus in all tumor regions of a patient. All other calls were treated as false positives (Fig. 3c; red in five-pointed star 4 and blue in five-pointed star 5). We evaluated accuracy using the F1 and F2 scores, which were defined as

$$F_1 = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{Sensitivity}}{\mathrm{PPV} + \mathrm{Sensitivity}}, \qquad F_2 = \frac{5 \cdot \mathrm{PPV} \cdot \mathrm{Sensitivity}}{4 \cdot \mathrm{PPV} + \mathrm{Sensitivity}}.$$
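The truth-set rule above leaves some room for interpretation, so the Python sketch below encodes one possible reading of it for a two-caller comparison within one tumor region. Shared calls and validated calls are always kept, and a caller's unique, non-validated call is kept only if its locus is a trunk locus, i.e. called in every region of the patient once both callers' calls are pooled. The pooling rule, the (chromosome, position) locus key, and all function names are assumptions for illustration. F1 and F2 are computed as the standard F-beta score with β = 1 and β = 2.

```python
from typing import Dict, Set, Tuple

# Hypothetical locus key: (chromosome, position); REF/ALT omitted for brevity.
Locus = Tuple[str, int]


def trunk_loci(pooled_calls_by_region: Dict[str, Set[Locus]]) -> Set[Locus]:
    """Loci called in every tumor region of one patient (trunk mutations)."""
    regions = list(pooled_calls_by_region.values())
    return set.intersection(*regions) if regions else set()


def artificial_truth(calls_a: Set[Locus], calls_b: Set[Locus],
                     validated: Set[Locus], trunk: Set[Locus]) -> Set[Locus]:
    """Shared calls + validated calls + unique non-validated calls at trunk loci."""
    shared = calls_a & calls_b
    unique_unvalidated = (calls_a ^ calls_b) - validated
    return shared | validated | (unique_unvalidated & trunk)


def f_beta(sensitivity: float, ppv: float, beta: float) -> float:
    """Standard F-beta score; beta = 2 weights sensitivity more heavily than PPV."""
    b2 = beta * beta
    denom = b2 * ppv + sensitivity
    return (1 + b2) * ppv * sensitivity / denom if denom else 0.0


def evaluate(calls: Set[Locus], truth: Set[Locus]) -> dict:
    """Calls outside the artificial truth set count as false positives."""
    tp = len(calls & truth)
    sens = tp / len(truth) if truth else 0.0
    ppv = tp / len(calls) if calls else 0.0
    return {"sensitivity": sens, "ppv": ppv,
            "f1": f_beta(sens, ppv, 1), "f2": f_beta(sens, ppv, 2)}
```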
To compare the performance of multiple callers on the ACC WES data and the ICGC Pilot-63 WGS data, we likewise built artificial truth sets from the calls shared by at least three callers and computed sensitivity against them. All other calls were regarded as false positives for the PPV calculation, and the F1 and F2 scores were computed with the same equations as above.
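A consensus truth set of this kind can be sketched in Python as below: each variant is counted once per caller that reports it, variants reported by at least `min_callers` callers form the truth set, and every caller is then scored against it. The variant representation (any hashable key) and the function names are illustrative assumptions, not the study's code.

```python
from collections import Counter
from typing import Dict, Hashable, Set, Tuple


def consensus_truth(calls_by_caller: Dict[str, Set[Hashable]],
                    min_callers: int = 3) -> Set[Hashable]:
    """Variants reported by at least `min_callers` distinct callers."""
    # Each caller's calls are a set, so a caller contributes at most 1 per variant.
    counts = Counter(v for calls in calls_by_caller.values() for v in calls)
    return {v for v, n in counts.items() if n >= min_callers}


def f_beta(sensitivity: float, ppv: float, beta: float) -> float:
    """Standard F-beta score (beta = 1 gives F1, beta = 2 gives F2)."""
    b2 = beta * beta
    denom = b2 * ppv + sensitivity
    return (1 + b2) * ppv * sensitivity / denom if denom else 0.0


def score_callers(calls_by_caller: Dict[str, Set[Hashable]],
                  min_callers: int = 3) -> Tuple[Set[Hashable], Dict[str, dict]]:
    """Score each caller against the consensus set; non-consensus calls are FPs."""
    truth = consensus_truth(calls_by_caller, min_callers)
    results = {}
    for caller, calls in calls_by_caller.items():
        tp = len(calls & truth)
        sens = tp / len(truth) if truth else 0.0
        ppv = tp / len(calls) if calls else 0.0
        results[caller] = {"sensitivity": sens, "ppv": ppv,
                           "f1": f_beta(sens, ppv, 1), "f2": f_beta(sens, ppv, 2)}
    return truth, results
```

Because the truth set is built from the callers being compared, a caller that contributes many consensus calls is favored; the ≥3 threshold limits how much any single caller can shape the truth set.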