This V-plot was cross-correlated against matrices defining the fragment center and size information for a genomic region, such that the cross-correlation signal at position x along the genome is given by

$$\mathrm{Signal}(x) = \sum_{i,j} F_{ij}(x)\, V_{ij}$$

where F is the matrix of fragment center and size information for fragments of size 105 to 250 bp with centers between x − 60 and x + 60, and V is the V-plot matrix (both indexed by fragment size i and center position j). This raw signal is then normalized using a background signal that is intended to represent the expected signal from the cross-correlation, given (1) the number of fragments observed, and (2) the Tn5 sequence preference. The background signal at position x is defined as

$$\mathrm{Background}(x) = \Big(\sum_{i,j} F_{ij}(x)\Big) \Big(\sum_{i,j} B_{ij}(x)\, V_{ij}\Big)$$

where B represents a matrix of relative probabilities of generating fragments of different sizes and center positions, scaled such that ∑B = 1. The scaling factor ∑F, the sum of all fragments in the signal matrix, ensures that the background signal represents the expected signal given the observed number of fragments.

To determine B, the probability of observing individual insertion sites was first modeled as follows. Tn5 has a sequence preference across the ∼21 bp that it contacts (Buenrostro et al. 2013); therefore, we developed a Position Weight Matrix (PWM) for sequence content ±10 bp from Tn5 insertion points in ATAC-seq performed on genomic DNA. Relative probabilities are calculated for each genomic position using this PWM. This 1D sequence preference is then used to calculate the relative probability of observing a particular ATAC-seq fragment (which requires two Tn5 insertions) by multiplying the probabilities of the two insertions needed for that fragment by the probability of observing a fragment of that size (determined from the fragment-size distribution).

The normalized nucleosome signal is given by subtracting this background signal from the cross-correlation signal:

$$\mathrm{Normalized}(x) = \mathrm{Signal}(x) - \mathrm{Background}(x)$$
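
The normalization described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the cross-correlation at x is the sum of the element-wise product of two matrices, that the PWM holds per-base relative probabilities that are multiplied across the 21 positions, and that a fragment of size s centered at c spans c − ⌊s/2⌋ to c − ⌊s/2⌋ + s − 1. All function and variable names are hypothetical.

```python
import numpy as np

SIZES = np.arange(105, 251)  # fragment sizes in bp (matrix rows)
HALF = 60                    # centers from x - 60 to x + 60 (matrix columns)


def insertion_probabilities(seq, pwm):
    """Relative Tn5 insertion probability per position.

    pwm has shape (4, 21): rows are A, C, G, T; columns are offsets -10..+10.
    Assumption: the score is the product of the PWM entries in the window.
    Positions within 10 bp of either end of seq are left at 0.
    """
    idx = np.array(["ACGT".index(base) for base in seq])
    probs = np.zeros(len(seq))
    for g in range(10, len(seq) - 10):
        window = idx[g - 10 : g + 11]
        probs[g] = np.prod(pwm[window, np.arange(21)])
    return probs


def fragment_matrix(centers, sizes, x):
    """F at x: fragment counts by (size, center offset from x).

    centers and sizes are integer arrays, one entry per fragment.
    """
    F = np.zeros((SIZES.size, 2 * HALF + 1))
    keep = (np.abs(centers - x) <= HALF) & (sizes >= SIZES[0]) & (sizes <= SIZES[-1])
    np.add.at(F, (sizes[keep] - SIZES[0], centers[keep] - x + HALF), 1)
    return F


def background_matrix(insert_prob, size_prob, x):
    """B at x: relative fragment probabilities, scaled to sum to 1.

    insert_prob[g] is the relative insertion probability at position g and
    size_prob[k] is the probability of fragment size SIZES[k]. x must lie at
    least 185 bp from either end of insert_prob so that no index wraps around.
    """
    centers = x + np.arange(-HALF, HALF + 1)
    left = centers[None, :] - SIZES[:, None] // 2  # assumed end convention
    right = left + SIZES[:, None] - 1
    B = insert_prob[left] * insert_prob[right] * size_prob[:, None]
    return B / B.sum()


def normalized_signal(F, B, V):
    """Cross-correlation signal minus the expected (background) signal."""
    signal = np.sum(F * V)
    background = F.sum() * np.sum(B * V)
    return signal - background


# Usage with synthetic inputs (illustrative values only, not real data).
rng = np.random.default_rng(0)
seq = "".join(rng.choice(list("ACGT"), size=600))
pwm = rng.dirichlet(np.ones(4), size=21).T       # shape (4, 21)
insert_prob = insertion_probabilities(seq, pwm)
size_prob = np.full(SIZES.size, 1 / SIZES.size)  # flat size distribution
V = rng.random((SIZES.size, 2 * HALF + 1))       # stand-in for the V-plot
centers = rng.integers(200, 400, size=50)
sizes = rng.integers(105, 251, size=50)
x = 300
F = fragment_matrix(centers, sizes, x)
B = background_matrix(insert_prob, size_prob, x)
print(normalized_signal(F, B, V))
```

Because B sums to 1 and is multiplied by the total fragment count, the background has the same scale as the raw signal. A V-plot of all ones, for example, gives a normalized signal of zero regardless of where the fragments fall.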