The NSCLC selector was validated in silico using an independent cohort of lung adenocarcinomas (ref. 20; Fig. 1c). To assess statistical significance, we analyzed the same cohort using 10,000 random selectors sampled from the exome, each with an identical size distribution to the CAPP-Seq NSCLC selector. The performance of random selectors had a normal distribution, and p-values were calculated accordingly. Of note, all identified somatic lesions were considered in this analysis.
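As an illustration of the random-selector comparison, the following minimal sketch fits a normal distribution to a set of null scores and reports a one-sided p-value for an observed score. The function name, the scalar "performance" score, and the simulated null values are assumptions made for this example; they are not taken from the original analysis.

```python
import math
import random
import statistics


def normal_upper_tail_p(observed, null_values):
    """Return (z, p) for `observed` under a normal fit to `null_values`.

    p is the one-sided (upper-tail) probability of a score at least as
    large as `observed`.
    """
    mu = statistics.mean(null_values)
    sigma = statistics.stdev(null_values)
    z = (observed - mu) / sigma
    # Upper tail of the standard normal: 0.5 * erfc(z / sqrt(2))
    return z, 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical null: performance scores of 10,000 random selectors.
rng = random.Random(0)
null_scores = [rng.gauss(0.5, 0.1) for _ in range(10_000)]

# Hypothetical observed score for the designed selector.
z, p = normal_upper_tail_p(4.0, null_scores)
print(f"z = {z:.1f}, p = {p:.3g}")
```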
Related to Fig. 1d, the probability P of recovering at least two reads of a single mutant allele in plasma for a given depth and detection limit was modeled by a binomial distribution. Given P, the probability of detecting all identified tumor mutations in plasma (e.g., median of 4 for CAPP-Seq) was modeled by a geometric distribution. Estimates are based on 250 million 100 bp reads per lane (e.g., using an Illumina HiSeq 2000 platform). Moreover, an on-target rate of 60% was assumed for CAPP-Seq and WES.
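The binomial step can be sketched as follows. This example covers only the probability P of observing at least two mutant reads; the subsequent step for detecting all tumor mutations is not reproduced here. The function names, the target size of 125,000 bp, and the allele fraction are hypothetical values chosen for illustration, and treating depth as total reads times read length times on-target rate divided by target size is a simplifying assumption.

```python
def sequencing_depth(reads_per_lane, read_length, on_target_rate,
                     target_size_bp, samples_per_lane=1):
    """Approximate mean per-sample depth over the target region."""
    on_target_bases = reads_per_lane * read_length * on_target_rate
    return on_target_bases / (target_size_bp * samples_per_lane)


def prob_at_least_two_reads(depth, allele_fraction):
    """P(X >= 2) for X ~ Binomial(n=depth, p=allele_fraction)."""
    n = int(depth)
    if n < 2:
        return 0.0
    p = allele_fraction
    p_zero = (1.0 - p) ** n                # no mutant reads
    p_one = n * p * (1.0 - p) ** (n - 1)   # exactly one mutant read
    return 1.0 - p_zero - p_one


# 250 million 100 bp reads per lane and a 60% on-target rate, as in the text.
# The target size (125,000 bp) and allele fraction (0.01%) are assumptions.
depth = sequencing_depth(250e6, 100, 0.60, target_size_bp=125_000)
P = prob_at_least_two_reads(depth, allele_fraction=1e-4)
print(f"depth = {depth:.0f}x, P = {P:.4f}")
```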
To evaluate the impact of reporter number on tumor burden estimates, we performed Monte Carlo sampling (1,000x), varying the number of reporters available {1, 2, …, max n} in two spiking experiments (Fig. 2g–i and Supplementary Fig. 4).
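The reporter-number analysis can be illustrated with a short Monte Carlo sketch. It assumes, for the purposes of the example only, that the tumor burden estimate is the mean allele fraction across the sampled reporters and that reporters are drawn without replacement; the function name and the allele fractions are hypothetical.

```python
import random
import statistics


def monte_carlo_burden(reporter_fractions, n_reporters, n_iter=1000, seed=0):
    """Sample `n_reporters` reporters `n_iter` times; return burden estimates.

    Each estimate is the mean allele fraction of one random subset
    (an assumed estimator, used here for illustration).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_iter):
        subset = rng.sample(reporter_fractions, n_reporters)
        estimates.append(statistics.mean(subset))
    return estimates


# Hypothetical per-reporter allele fractions from one spiking experiment.
fractions = [0.0008, 0.0011, 0.0009, 0.0013, 0.0010, 0.0012, 0.0007, 0.0010]

# Vary the number of available reporters from 1 to max n.
for n in range(1, len(fractions) + 1):
    est = monte_carlo_burden(fractions, n)
    print(n, f"mean={statistics.mean(est):.5f}", f"sd={statistics.pstdev(est):.5f}")
```

Under these assumptions, the spread of the estimates shrinks as more reporters are included and reaches zero when all reporters are used.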
To assess the significance of tumor burden estimates in plasma DNA using SNVs, we compared patient-specific SNV frequencies to the null distribution of selector-wide background alleles. Indels were analyzed separately using mutation-specific background rates and Z statistics. Fusion breakpoints were considered significant when present with >0 read support due to their ultra-low false detection rate.
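One simple way to compare an observed SNV frequency against a null distribution of background alleles is an empirical p-value, sketched below together with the fusion rule stated above. The add-one correction, the function names, and the example frequencies are assumptions for illustration and may differ from the procedure detailed in the Supplementary Methods.

```python
def empirical_p_value(observed, background):
    """Fraction of background values at least as large as `observed`.

    The add-one correction keeps the estimate from being exactly zero.
    """
    exceed = sum(1 for b in background if b >= observed)
    return (exceed + 1) / (len(background) + 1)


def fusion_is_significant(read_support):
    """Fusion breakpoints count as significant with any read support (>0)."""
    return read_support > 0


# Hypothetical selector-wide background allele frequencies and a
# hypothetical patient-specific SNV frequency.
background = [0.00002, 0.00005, 0.00001, 0.00004, 0.00003, 0.00006]
snv_p = empirical_p_value(0.0004, background)
print(f"SNV p-value = {snv_p:.3f}")
```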
For each patient, we calculated a ctDNA detection index (akin to a false positive rate) based on p-value integration from his or her array of reporters (Table 1 and Supplementary Table 4). Specifically, for cases where only a single reporter type was present in a patient’s tumor, the corresponding p-value was used. If SNV and indel reporters were detected, and if each independently had a p-value <0.1, we combined their respective p-values using Fisher’s method (ref. 43). Otherwise, given the prioritization of SNVs in the selector design, the SNV p-value was used. If a fusion breakpoint identified in a tumor sample (i.e., involving ROS1, ALK, or RET) was recovered in plasma DNA from the same patient, it trumped all other mutation types, and its p-value (~0) was used. If a fusion detected in the tumor was not found in corresponding plasma (potentially due to hybridization inefficiency; see Supplementary Methods), the p-value for any remaining mutation type(s) was used. The ctDNA detection index was considered significant if the metric was ≤0.05 (≈FPR ≤5%), the threshold that maximized CAPP-Seq sensitivity and specificity in ROC analyses (determined by Euclidean distance to a perfect classifier; i.e., TPR = 1 and FPR = 0; Fig. 3, Fig. 4, Table 1, and Supplementary Table 4).
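The decision rules for the ctDNA detection index can be sketched as below. This is an illustrative reading of the rules in the preceding paragraph, not the authors' implementation: the function names and argument names are hypothetical, and returning None when no usable reporter remains is an assumption, since that case is not described. Fisher's method for two independent p-values uses the closed form of the chi-square survival function with 4 degrees of freedom.

```python
import math


def fisher_combine_two(p1, p2):
    """Fisher's method for two independent p-values.

    X = -2 * (ln p1 + ln p2) follows chi-square with 4 d.f. under the null;
    its upper tail at X is exp(-X/2) * (1 + X/2) = p1*p2 * (1 - ln(p1*p2)).
    """
    prod = p1 * p2
    if prod == 0.0:
        return 0.0
    return prod * (1.0 - math.log(prod))


def ctdna_detection_index(snv_p=None, indel_p=None,
                          fusion_in_tumor=False, fusion_in_plasma=False):
    """Integrate reporter p-values into a single detection index."""
    # A tumor fusion recovered in plasma takes precedence (p-value ~0).
    if fusion_in_tumor and fusion_in_plasma:
        return 0.0
    # Both SNV and indel reporters: combine only if each has p < 0.1;
    # otherwise the SNV p-value is used.
    if snv_p is not None and indel_p is not None:
        if snv_p < 0.1 and indel_p < 0.1:
            return fisher_combine_two(snv_p, indel_p)
        return snv_p
    # A single remaining reporter type: use its p-value.
    if snv_p is not None:
        return snv_p
    if indel_p is not None:
        return indel_p
    return None  # assumption: no usable reporter, index undefined


def is_significant(index, threshold=0.05):
    """Apply the <=0.05 threshold stated in the text."""
    return index is not None and index <= threshold


index = ctdna_detection_index(snv_p=0.05, indel_p=0.05)
print(f"index = {index:.4f}, significant = {is_significant(index)}")
```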
Additional details are presented in the Supplementary Methods.
Newman A.M., Bratman S.V., To J., Wynne J.F., Eclov N.C., Modlin L.A., Liu C.L., Neal J.W., Wakelee H.A., Merritt R.E., Shrager J.B., Loo B.W. Jr., Alizadeh A.A., & Diehn M. (2014). An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nature Medicine, 20(5), 548-554.