RNA-Seq profiles are formed from $n$ RNA samples. Let $\pi_{gi}$ be the fraction of all cDNA fragments in the $i$-th sample that originate from gene $g$. Let $G$ denote the total number of genes, so $\sum_{g=1}^{G} \pi_{gi} = 1$ for each sample. Let $\sqrt{\phi_g}$ denote the coefficient of variation (CV) (standard deviation divided by mean) of $\pi_{gi}$ between the replicates $i$. We denote the total number of mapped reads in library $i$ by $N_i$ and the number that map to the $g$-th gene by $y_{gi}$. Then

$$E(y_{gi}) = \mu_{gi} = N_i \pi_{gi}.$$

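Assuming that the count $y_{gi}$ follows a Poisson distribution for repeated sequencing runs of the same RNA sample, a well-known formula for the variance of a mixture distribution implies:

$$\mathrm{var}(y_{gi}) = E_{\pi}\left[\mathrm{var}(y_{gi} \mid \pi_{gi})\right] + \mathrm{var}_{\pi}\left[E(y_{gi} \mid \pi_{gi})\right] = \mu_{gi} + \phi_g \mu_{gi}^2.$$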

Dividing both sides by $\mu_{gi}^2$ gives

$$\mathrm{CV}^2(y_{gi}) = \frac{1}{\mu_{gi}} + \phi_g.$$

The first term $1/\mu_{gi}$ is the squared CV for the Poisson distribution and the second is the squared CV of the unobserved expression values. The total CV$^2$ is therefore the technical CV$^2$ with which $\pi_{gi}$ is measured plus the biological CV$^2$ of the true $\pi_{gi}$. In this article, we call $\phi_g$ the dispersion and $\sqrt{\phi_g}$ the biological CV although, strictly speaking, it captures all sources of inter-library variation between replicates, possibly including contributions from technical causes such as library preparation as well as true biological variation between samples.
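
As an illustrative check of this decomposition, the following minimal Python sketch simulates counts from a Poisson distribution whose mean itself varies between replicates, and compares the empirical squared CV of the counts with the sum of the technical term (one over the mean count) and the dispersion. The text above specifies only the CV of the true expression values, so the gamma distribution used here for the between-replicate variation is an assumption made for the simulation. The function names, the dispersion value 0.1 and the mean counts are likewise hypothetical choices.

```python
import math
import random
import statistics


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's multiplication method.

    Adequate for the moderate means used in this sketch.
    """
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_squared_cv(mu, phi, n_draws=20000, seed=1):
    """Empirical squared CV of counts from a gamma-Poisson mixture.

    Assumption: the Poisson mean of each replicate is gamma distributed
    with mean mu and squared CV phi (shape 1/phi, scale mu*phi).
    """
    rng = random.Random(seed)
    shape = 1.0 / phi
    scale = mu * phi
    counts = []
    for _ in range(n_draws):
        replicate_mean = rng.gammavariate(shape, scale)  # biological variation
        counts.append(sample_poisson(rng, replicate_mean))  # technical variation
    mean = statistics.fmean(counts)
    return statistics.pvariance(counts) / mean ** 2


phi = 0.1  # hypothetical dispersion, i.e. a biological CV of about 0.32
for mu in (5.0, 20.0, 80.0):
    technical = 1.0 / mu
    predicted = technical + phi
    empirical = simulate_squared_cv(mu, phi)
    print(f"mu={mu:5.1f}  technical CV^2={technical:.4f}  "
          f"predicted total CV^2={predicted:.4f}  simulated={empirical:.4f}")
```

Under these assumptions the simulated values should track the predicted totals, with the technical term shrinking as the mean count grows so that the dispersion accounts for most of the total CV$^2$ of highly expressed genes.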