To identify a sequence signature that predicts high-performing guides, we evaluated data from TKOv1 screens. From the base 90k TKOv1 library (Hart et al. 2015), we selected genes in the new CEG2 set that were targeted by six gRNAs each. For each gene, gRNAs were ranked by log fold-change; the three gRNAs with the most negative fold-change were labeled "best" and the remaining three "worst". The frequency of each nucleotide at each position of the 20-mer guide sequence was then calculated across all best guides for all selected genes, and likewise across all worst guides. Subtracting the worst frequencies from the best frequencies yielded a Δ-frequency table. This process was repeated independently for each endpoint replicate of six TKOv1 90k library screens (DLD1, GBM, HAP1, HCT116, RPE1, and RPE1dTP53), for a total of 16 samples.
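The Δ-frequency calculation for a single sample can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the input layout (a dictionary mapping each gene to a list of `(sequence, log_fold_change)` pairs), and the choice to express frequency as a fraction of guides rather than a raw count are all assumptions made for the example.

```python
NUCLEOTIDES = "ACGT"
GUIDE_LENGTH = 20


def position_frequencies(guides):
    """Fraction of guides carrying each nucleotide at each position.

    `guides` is a list of 20-mer strings (hypothetical input format).
    """
    if not guides:
        raise ValueError("no guides supplied")
    counts = {nt: [0] * GUIDE_LENGTH for nt in NUCLEOTIDES}
    for seq in guides:
        for pos, nt in enumerate(seq):
            counts[nt][pos] += 1
    n = len(guides)
    return {nt: [c / n for c in row] for nt, row in counts.items()}


def delta_frequency(guides_by_gene):
    """Best-minus-worst nucleotide frequency table for one sample.

    `guides_by_gene` maps gene -> list of (sequence, log_fold_change).
    Only genes with exactly six guides are used, as in the text.
    """
    best, worst = [], []
    for guides in guides_by_gene.values():
        if len(guides) != 6:
            continue
        # Most negative fold-change first: first three are "best".
        ranked = sorted(guides, key=lambda g: g[1])
        best.extend(seq for seq, _ in ranked[:3])
        worst.extend(seq for seq, _ in ranked[3:])
    best_f = position_frequencies(best)
    worst_f = position_frequencies(worst)
    return {
        nt: [b - w for b, w in zip(best_f[nt], worst_f[nt])]
        for nt in NUCLEOTIDES
    }
```

Because every selected gene contributes three best and three worst guides, the two groups are the same size, so using fractions or raw counts changes the table only by a constant factor.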
The Δ-frequency tables were summed across the 16 samples and scaled so that the most extreme value (C18) equals one. Because TKOv1 explicitly excludes gRNAs with T in the last four positions, no score for T could be derived from the data at these positions; we manually set the score to −1 at these four positions. The final score table is given in Table S4. The sequence score of any candidate gRNA is the sum of the nucleotide scores at each position of its sequence.
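The summing, scaling, and scoring steps can be sketched as below. This is an illustrative sketch under stated assumptions, not the published implementation: the function names are hypothetical, scaling is done by dividing by the largest absolute entry (which gives a value of one only if that entry is positive, as the text indicates for C18), and the manual −1 is applied to T at the last four positions, which is our reading of the text. The actual scores are those in Table S4.

```python
NUCLEOTIDES = "ACGT"
GUIDE_LENGTH = 20


def build_score_table(delta_tables):
    """Sum per-sample delta-frequency tables and scale the result.

    Each table maps nucleotide -> list of 20 per-position values.
    """
    summed = {
        nt: [sum(t[nt][pos] for t in delta_tables)
             for pos in range(GUIDE_LENGTH)]
        for nt in NUCLEOTIDES
    }
    # Assumption: the most extreme entry is positive, so dividing by
    # the largest absolute value scales it to exactly one.
    extreme = max(abs(v) for row in summed.values() for v in row)
    if extreme == 0:
        raise ValueError("all delta-frequencies are zero")
    scaled = {nt: [v / extreme for v in row] for nt, row in summed.items()}
    # Assumption: the manual score of -1 applies to T at the last
    # four positions, where the library contains no data for T.
    for pos in range(GUIDE_LENGTH - 4, GUIDE_LENGTH):
        scaled["T"][pos] = -1.0
    return scaled


def sequence_score(seq, table):
    """Sum of the nucleotide scores at each position of a 20-mer."""
    if len(seq) != GUIDE_LENGTH:
        raise ValueError("expected a 20-nt guide sequence")
    return sum(table[nt][pos] for pos, nt in enumerate(seq))
```

A candidate guide is scored with `sequence_score(candidate, table)`, where `table` is the output of `build_score_table`; higher scores correspond to sequences that resemble the best-performing guides.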
The score table was evaluated against the 85k supplementary TKOv1 library, which was screened only in HCT116 and HeLa cells. We calculated the sequence score for all gRNAs targeting essential genes, then compared the fold-change distribution of gRNAs in the top quartile of scores with that of gRNAs in the bottom quartile. We repeated this process for the Yusa, Sabatini, and GeCKO v2 libraries.
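The quartile comparison can be sketched as follows. This is a simplified illustration: the function name and input layout are hypothetical, quartiles are taken by rank (the top and bottom quarter of guides after sorting by score), and the median is shown only as one possible summary, since the text does not specify how the two distributions were compared.

```python
from statistics import median


def quartile_fold_changes(scored_guides):
    """Split guides into top and bottom score quartiles.

    `scored_guides` is a list of (sequence_score, log_fold_change)
    pairs for gRNAs targeting essential genes (hypothetical format).
    Returns the fold-changes of the top and bottom quartiles.
    """
    ranked = sorted(scored_guides, key=lambda g: g[0])
    k = len(ranked) // 4  # quartile size by rank (an assumption)
    if k == 0:
        raise ValueError("need at least four guides")
    bottom = [fc for _, fc in ranked[:k]]
    top = [fc for _, fc in ranked[-k:]]
    return top, bottom


def summarize(scored_guides):
    """Median fold-change in each quartile, as an example summary."""
    top, bottom = quartile_fold_changes(scored_guides)
    return {"top_median": median(top), "bottom_median": median(bottom)}
```

If the score is predictive, the top-quartile guides should show more negative fold-changes among essential genes than the bottom-quartile guides.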