Custom ChIP-seq Peak Calling for Bacteria
Almost all ChIP-seq analysis programs have been designed and optimized for eukaryotic ChIP-seq data and, in our experience, do not perform well with bacterial ChIP-seq data. We generated custom Python scripts to identify peaks in bacterial ChIP-seq data.
First, all datasets were normalized to 100 million reads. Replicate datasets were analyzed in pairs, and an appropriate threshold was determined for each replicate in a pair, with the plus and minus strands considered separately. For a given strand, a value T1 was selected as the threshold for the first replicate and a value T2 as the threshold for the second replicate. Values between 1 and 1000 were considered for T1 and T2. For each combination of T1 and T2, the number of genome positions with values ≥T1 in the first replicate and ≥T2 in the second replicate was determined. The false discovery rate was estimated under the null hypothesis that no regions are enriched. The combination of thresholds that yielded the highest number of true positive positions with an estimated false discovery rate below 0.01 was selected.
Once T1 and T2 were chosen, peak calling was performed as previously described (Supplementary Material of [54]). Briefly, a region was identified as a peak if both replicates showed enrichment above the corresponding thresholds on each strand. For a peak to be called, a peak on the plus strand had to lie within a threshold distance of a peak on the minus strand.
To identify regions of artifactual enrichment, peaks identified in tagged strains were compared with those called in a control ChIP-seq experiment using an untagged strain (DMF35). For each factor, the calculated T values were adjusted to reflect the total number of reads in the control experiment replicates and then applied for peak calling in the controls. Any region for which a peak was called in both the true ChIP-seq experiment and the untagged control experiment, within 50 bp of each other, was considered a potential artifact and excluded from further analysis.
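The threshold search described above can be sketched in Python as follows. This is a minimal illustration and not the original scripts. The protocol does not state how the false discovery rate was computed, so the sketch assumes one plausible estimator: if no regions are enriched and the two replicates are independent, the expected number of positions passing both thresholds is n1 × n2 / N, where n1 and n2 are the per-replicate counts and N is the number of genome positions. The function name `select_thresholds` and its parameters are hypothetical. It would be run once per strand on normalized per-position coverage.

```python
import numpy as np

def select_thresholds(rep1, rep2, t_max=1000, max_fdr=0.01):
    """Grid-search T1 and T2 for one strand (illustrative sketch only).

    rep1, rep2: per-position normalized coverage for the two replicates.
    Returns (estimated_true_positives, T1, T2, estimated_fdr), or None if
    no threshold pair meets the FDR cutoff.
    ASSUMPTION: FDR = expected / observed, where expected = n1 * n2 / N
    under independence of the replicates; the source does not specify this.
    """
    rep1 = np.asarray(rep1, dtype=float)
    rep2 = np.asarray(rep2, dtype=float)
    n_pos = rep1.size
    t_grid = np.arange(1, t_max + 1)

    # n2[j] = number of positions with rep2 >= t_grid[j]
    n2 = n_pos - np.searchsorted(np.sort(rep2), t_grid, side="left")

    best = None
    for t1 in t_grid:
        mask = rep1 >= t1
        n1 = int(mask.sum())
        if n1 == 0:
            break  # larger T1 values cannot pass any position either
        # observed[j] = positions with rep1 >= t1 AND rep2 >= t_grid[j]
        joint = np.sort(rep2[mask])
        observed = n1 - np.searchsorted(joint, t_grid, side="left")
        expected = n1 * n2 / n_pos
        with np.errstate(divide="ignore", invalid="ignore"):
            fdr = np.where(observed > 0, expected / observed, np.inf)
        true_pos = observed - expected
        ok = fdr < max_fdr
        if ok.any():
            j = int(np.argmax(np.where(ok, true_pos, -np.inf)))
            candidate = (float(true_pos[j]), int(t1), int(t_grid[j]), float(fdr[j]))
            if best is None or candidate[0] > best[0]:
                best = candidate
    return best


# Toy example: a 50-position enriched region in a 10,000-position "genome".
cov1 = np.zeros(10_000)
cov2 = np.zeros(10_000)
cov1[100:150] = 50
cov2[100:150] = 30
result = select_thresholds(cov1, cov2, t_max=100)
```

Sorting once per T1 value and counting with `np.searchsorted` avoids rescanning the genome for each of the up to one million threshold combinations. Ties are resolved in favor of the lowest thresholds, which is a choice of this sketch and not something the protocol specifies.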
Corresponding Organization: New York State Department of Health
Protocol cited in 9 other protocols
Variable analysis
Independent variables
- Presence or absence of tagged protein (in the ChIP-seq experiment)
Dependent variables
- Enrichment of genomic regions above specified thresholds (T1 and T2) in the ChIP-seq experiment
Control variables
- Genome sequence and annotation (MG1655 genome, NC_000913.2)
- Mapping of sequencing reads to the reference genome using CLC Genomics Workbench
- Normalization of all ChIP-seq datasets to 100 million reads
- Considering plus and minus strands separately for peak calling
- Adjusting thresholds (T1 and T2) to maintain a false discovery rate (FDR) of less than 0.01
- Comparing peaks identified in tagged strains to those in an untagged control strain (DMF35) to exclude potential artifacts
Positive controls
- Not explicitly mentioned
Negative controls
- ChIP-seq experiment using an untagged strain (DMF35)
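The strand-pairing requirement and the comparison against the untagged control can be illustrated with a short Python sketch. It is not the original code. The function names, the representation of peaks as single integer coordinates, the use of a symmetric distance window, and the reporting of a paired peak as the midpoint of its plus-strand and minus-strand positions are all assumptions made for illustration. The strand-pairing distance is not given in the text, so `max_gap` is left as a required argument. Only the 50 bp control window comes from the protocol.

```python
from bisect import bisect_left

def pair_strand_peaks(plus_peaks, minus_peaks, max_gap):
    """Keep plus-strand peaks that have a minus-strand peak within max_gap bp.

    ASSUMPTIONS: peaks are integer coordinates, the window is symmetric,
    and each paired peak is reported as the midpoint between the plus-strand
    peak and the first minus-strand peak found inside the window.
    """
    minus_sorted = sorted(minus_peaks)
    paired = []
    for p in sorted(plus_peaks):
        i = bisect_left(minus_sorted, p - max_gap)
        if i < len(minus_sorted) and minus_sorted[i] <= p + max_gap:
            paired.append((p + minus_sorted[i]) // 2)
    return paired

def drop_control_artifacts(chip_peaks, control_peaks, window=50):
    """Remove ChIP peaks lying within `window` bp of any control peak."""
    control_sorted = sorted(control_peaks)
    kept = []
    for p in chip_peaks:
        # First control peak at or after p - window, if any
        i = bisect_left(control_sorted, p - window)
        near_control = i < len(control_sorted) and control_sorted[i] <= p + window
        if not near_control:
            kept.append(p)
    return kept


# Toy coordinates; max_gap=100 is an arbitrary illustrative value.
called = pair_strand_peaks([100, 500], [160, 900], max_gap=100)
filtered = drop_control_artifacts([100, 500, 1000], [140, 2000])
```

Keeping the control peaks sorted lets each lookup use a binary search, which is more than sufficient for the few hundred peaks typical of a bacterial genome.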