Uniformly Processed Cistrome DB Data
Cistrome DB data quality controls comprise six metrics covering DNA sequencing quality, ChIP quality, and genomic distribution characteristics. Read quality is the median FASTQ read quality, mapping quality is the percentage of reads that map to a unique genomic locus, and the PCR bottleneck coefficient (PBC) estimates the rate of read duplication introduced by PCR amplification (27,28). The fraction of non-mitochondrial reads in peak regions (FRiP) and the number of peaks with 10-fold enrichment reflect the quality of the ChIP experiment (27,28). A union of DNase hypersensitive sites (Union DHS) was compiled from a large collection of DNase-seq samples in the Cistrome DB (19,29). The percentage of peaks that overlap the Union DHS characterizes data quality in terms of the genomic distribution of the peaks. Although most TFs and chromatin-associated factors tend to bind at DHS sites, some histone marks and factors do not follow this trend. Cutoffs were determined from the distribution of these quality control metrics in the Cistrome DB (22); a red dot indicates lower quality on a metric and a green dot indicates higher quality (Figure
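To make three of the metrics above concrete, here is a minimal Python sketch of PBC, FRiP, and the Union DHS overlap. It is an illustration under stated assumptions, not the ChiLin implementation: PBC is computed with the common ENCODE-style definition (locations hit by exactly one read divided by all distinct locations), which the text does not spell out; reads and intervals are plain tuples; intervals are half-open; and all function names are hypothetical.

```python
from collections import Counter


def pbc(read_positions):
    """PBC as (locations with exactly one read) / (distinct locations).

    read_positions: iterable of (chrom, pos, strand) for uniquely mapped reads.
    Assumption: ENCODE-style definition; the source text does not give a formula.
    """
    counts = Counter(read_positions)
    if not counts:
        return 0.0
    singles = sum(1 for n in counts.values() if n == 1)
    return singles / len(counts)


def group_by_chrom(intervals):
    """Index (chrom, start, end) tuples by chromosome."""
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    return by_chrom


def frip(reads, peaks, mito="chrM"):
    """Fraction of non-mitochondrial reads that fall inside a peak.

    reads: iterable of (chrom, pos); peaks: iterable of (chrom, start, end).
    The mitochondrial chromosome name "chrM" is an assumption.
    """
    by_chrom = group_by_chrom(peaks)
    kept = [(c, p) for c, p in reads if c != mito]
    if not kept:
        return 0.0
    in_peak = sum(
        1 for c, p in kept
        if any(s <= p < e for s, e in by_chrom.get(c, []))
    )
    return in_peak / len(kept)


def dhs_overlap_fraction(peaks, dhs_sites):
    """Fraction of peaks overlapping at least one DHS interval."""
    peaks = list(peaks)
    if not peaks:
        return 0.0
    by_chrom = group_by_chrom(dhs_sites)
    hits = sum(
        1 for c, s, e in peaks
        if any(ds < e and s < de for ds, de in by_chrom.get(c, []))
    )
    return hits / len(peaks)
```

The linear scan over intervals keeps the sketch short; for genome-scale data a sorted index or an interval tree would be the more practical choice.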
Corresponding Organization : Dana-Farber Cancer Institute
Other organizations : Harvard University
Protocol cited in 121 other protocols
Variable analysis
- Raw DNA sequence data for each sample was downloaded and uniformly processed by the ChiLin pipeline, which uses BWA to map reads to the hg38 or mm10 genomes and MACS2 to identify statistically significant peaks.
- Motif scanning was also performed on transcription factor or chromatin regulator ChIP-seq samples based on enrichment of the motif sequence relative to the center of the peaks.
- Target genes were predicted from ChIP-seq peaks using the regulatory potential model which weighs the impact of each peak by exponential decay of distance to gene transcription start site (TSS).
- Statistically significant peaks identified by MACS2.
- Enrichment of motif sequence relative to the center of the peaks.
- Impact of each peak on target gene prediction based on distance to gene transcription start site (TSS).
- DNA sequencing quality metrics (median FASTQ read quality, percentage of reads that map to a unique genomic locus, PCR bottleneck coefficient (PBC)).
- ChIP quality metrics (fraction of non-mitochondrial reads in peak regions (FRiP), number of peaks with 10-fold enrichment).
- Genomic distribution characteristics (percentage of peaks that overlap with the union of DNase hypersensitive sites (Union DHS)).
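The regulatory potential model mentioned in the list above weighs each peak by an exponential decay of its distance to the gene's TSS. The sketch below shows one way such a score could be computed. The exact functional form and decay constant are not given in this text, so the base-2 decay, the `half_decay` value of 10,000 bp, the `max_distance` window, and the function name are all illustrative assumptions.

```python
def regulatory_potential(tss, peak_centers, half_decay=10_000.0, max_distance=100_000):
    """Sum of distance-weighted peak contributions for one gene.

    tss: TSS coordinate of the gene (bp).
    peak_centers: peak center coordinates on the same chromosome (bp).
    half_decay: distance at which a peak's weight halves (assumed value).
    max_distance: peaks farther than this from the TSS are ignored (assumed value).
    """
    score = 0.0
    for center in peak_centers:
        distance = abs(center - tss)
        if distance <= max_distance:
            # Weight is 1 at the TSS and halves every `half_decay` bp.
            score += 2.0 ** (-distance / half_decay)
    return score


# A peak at the TSS contributes 1.0, a peak 10 kb away contributes 0.5,
# and a peak 499 kb away falls outside the window and contributes nothing.
example_score = regulatory_potential(1_000, [1_000, 11_000, 500_000])
```

Genes can then be ranked by this score, so that a gene with several peaks close to its TSS ranks above one with a single distal peak.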