In Sects. 2.2 through 2.6 we define a method that, given lcNGS data for two or more individuals and two alternative pedigrees relating them, computes a likelihood ratio. Combining this ratio with prior odds yields posterior odds, which may be used to choose between the pedigrees. The best way to assess such a method is, of course, to apply it and competing methods to a large number of cases in which the true pedigree is known. Such comparisons are limited by the availability of data.
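The step from likelihood ratio to posterior odds can be made concrete with a minimal sketch. The function names and the numbers in the usage note are illustrative assumptions and do not come from the paper; the only relation used is the standard one, posterior odds = likelihood ratio × prior odds.

```python
import math


def likelihood_ratio(log_lik_h1: float, log_lik_h2: float) -> float:
    """Likelihood ratio of pedigree H1 versus pedigree H2.

    Inputs are log-likelihoods (hypothetical interface), since likelihoods
    over many loci are usually too small to handle on the raw scale.
    """
    return math.exp(log_lik_h1 - log_lik_h2)


def posterior_odds(lr: float, prior_odds: float = 1.0) -> float:
    """Bayes' rule on the odds scale: posterior odds = LR * prior odds."""
    return lr * prior_odds


def posterior_probability(odds: float) -> float:
    """Convert odds in favour of H1 into a probability for H1."""
    return odds / (1.0 + odds)
```

For example, a likelihood ratio of 100 combined with prior odds of 0.5 gives posterior odds of 50, which corresponds to a posterior probability of 50/51 for the first pedigree.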
An alternative, used in this paper, is to compare competing methods on simulated data. The assessment is then divided into two tasks: showing that the simulation model yields data that are realistic in relevant ways, and comparing methods on the simulated data. The simulation model uses the population and inheritance models presented in Sects. 2.3 and 2.4, where the inheritance model includes linkage (i.e., effects of crossovers inside the considered pedigree). When two loci are strongly linked, their alleles will often be inherited together, as haplotypes, through the pedigree, which strongly influences the information about the pedigree contained in the data. For this reason, the data simulation should include linkage.
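To illustrate what linkage means for simulated inheritance, the following is a minimal sketch of a single meiosis. It is not the inheritance model of Sect. 2.4: it assumes, for illustration only, that crossovers between adjacent loci occur independently with a given recombination fraction, and the function name and argument layout are hypothetical.

```python
import random


def transmit_gamete(hap_a, hap_b, recomb_fracs, rng):
    """Simulate the haplotype one parent passes on to a child.

    hap_a, hap_b  : the parent's two haplotypes (allele per locus, in map order)
    recomb_fracs  : recomb_fracs[i] is the assumed probability of a strand
                    switch between locus i and locus i + 1
    rng           : a random.Random instance
    """
    if not (len(hap_a) == len(hap_b) == len(recomb_fracs) + 1):
        raise ValueError("need one recombination fraction per adjacent locus pair")
    haplotypes = (hap_a, hap_b)
    strand = rng.randrange(2)  # start on either parental haplotype
    gamete = []
    for locus in range(len(hap_a)):
        # Between consecutive loci, switch strand with the given probability.
        if locus > 0 and rng.random() < recomb_fracs[locus - 1]:
            strand = 1 - strand
        gamete.append(haplotypes[strand][locus])
    return gamete
```

With recombination fractions close to 0 the child receives one parental haplotype almost intact, whereas fractions of 0.5 make the loci behave as if unlinked. A method that ignores linkage effectively treats every pair of loci as the second case.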
The population model includes important standard features such as kinship; however, it does not include linkage disequilibrium (LD, i.e., effects of crossovers outside of the considered pedigree). This means that the effect of LD on competing methods is not assessed. Current methods for handling LD include grouping markers together [9] or using a multiorder Markov chain [19]. Both ideas may be possible to combine with our approach, but we have chosen to defer treatment of LD to a later paper.
Section 2.2 presents the observational model we use to simulate lcNGS data from simulated genotypes. This is a simplified model that simulates only counts of reads at each locus. Section 3.1 contains a small study and an argument for why we believe this observational model captures the features of lcNGS data that are essential for relationship inference, in particular when one or more of the samples are based on small amounts of DNA.
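A read-count observational model of this general kind can be sketched as follows. This is not the model of Sect. 2.2, whose details are not given here. The sketch assumes, purely for illustration, a biallelic locus, a Poisson-distributed read depth, and a symmetric per-read error rate; all names and parameters are hypothetical.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw a Poisson variate with Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def alt_read_probability(genotype, error_rate):
    """Probability that one read shows the alternative allele.

    genotype is the number of alternative alleles carried (0, 1 or 2).
    """
    alt_fraction = genotype / 2.0
    return alt_fraction * (1.0 - error_rate) + (1.0 - alt_fraction) * error_rate


def simulate_read_counts(genotype, mean_depth, error_rate, rng):
    """Return (reference reads, alternative reads) at one locus."""
    depth = sample_poisson(mean_depth, rng)  # assumed depth distribution
    p_alt = alt_read_probability(genotype, error_rate)
    alt = sum(rng.random() < p_alt for _ in range(depth))
    return depth - alt, alt


def genotype_likelihoods(ref_count, alt_count, error_rate):
    """P(counts | genotype) for genotypes 0, 1, 2, up to a shared constant."""
    likelihoods = []
    for genotype in (0, 1, 2):
        p_alt = alt_read_probability(genotype, error_rate)
        likelihoods.append(p_alt ** alt_count * (1.0 - p_alt) ** ref_count)
    return likelihoods
```

At low mean depth many loci receive zero or one read, so the genotype likelihoods stay far from decisive. Under these assumptions a single reference read leaves the heterozygous genotype about half as likely as the homozygous reference genotype, which is the kind of genotype uncertainty a likelihood method for lcNGS data has to carry forward rather than resolve by calling genotypes.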
Our likelihood method for pedigree inference uses exactly the same likelihood as the one used in data simulation. In any simulation study, when simulation is done using a particular probability distribution, it will be optimal to use the same distribution for likelihood computations. What our study illustrates is the size of the performance reduction incurred by a likelihood method that ignores linkage or the uncertainty in genotypes that is inherent in lcNGS data. Finally, we compare our approach with NgsRelate [18], which uses a maximum likelihood procedure to find the most likely Jacquard coefficients. NgsRelate does not account for genetic linkage between the included genetic markers.