Sodium bisulfite converts unmethylated cytosines in DNA to uracils, which are read out as thymines during sequencing. However, depending on the treatment time and other experimental conditions, the conversion may be incomplete, leaving some unmethylated cytosines as C’s. The bisulfite conversion rate, defined as the rate at which unmethylated cytosines in the sample appear as T’s in the sequenced reads, is an important measure of the quality of a WGBS experiment. Estimating the bisulfite conversion rate requires a priori knowledge of the methylation status of at least a portion of the cytosines in the sample. One typical technique is to spike in DNA that is known to be unmethylated, such as that of a Lambda virus, when preparing sequencing libraries. Alternatively, one may use other unmethylated cytosines, such as those in the chloroplast DNA of plants or the mitochondrial DNA of humans [32]. We count the number of converted reads (containing T’s) and the total number of reads covering those unmethylated cytosines. The ratio of converted reads to all reads gives an estimate of the bisulfite conversion rate. The method is implemented in the bsrate program.
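As an illustration of the ratio described above, the following minimal Python sketch pools read counts over cytosines assumed to be unmethylated (for example, positions in a spike-in genome). It is not the bsrate implementation: the function name `bisulfite_conversion_rate`, the `(t_reads, c_reads)` input layout, and the example counts are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Tuple


def bisulfite_conversion_rate(sites: Iterable[Tuple[int, int]]) -> float:
    """Estimate the conversion rate from per-site read counts.

    Each element of `sites` is a hypothetical (t_reads, c_reads) pair for one
    cytosine assumed to be unmethylated: t_reads is the number of covering
    reads showing T (converted), c_reads the number showing C (unconverted).
    """
    converted = 0
    total = 0
    for t_reads, c_reads in sites:
        if t_reads < 0 or c_reads < 0:
            raise ValueError("read counts must be non-negative")
        converted += t_reads
        total += t_reads + c_reads
    if total == 0:
        raise ValueError("no reads cover the unmethylated cytosines")
    # Ratio of converted reads to all reads covering the known sites.
    return converted / total


# Made-up counts for three spike-in cytosines (illustration only).
example_sites = [(98, 2), (197, 3), (50, 0)]
rate = bisulfite_conversion_rate(example_sites)
print(f"Estimated bisulfite conversion rate: {rate:.4f}")
```

Pooling counts across sites before dividing weights each cytosine by its coverage; averaging per-site ratios instead would be a different design choice and can give a different estimate when coverage is uneven.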