The effects of all bi-allelic SNPs (low, moderate and high effects) on the genome were determined based on the pre-built release 7.0 annotation from the Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/) using SnpEff (ref. 51) release 4.1l, with parameters -v -noLog -canon rice7. Using sequence ontology terms, a low-effect SNP was classified as ‘synonymous_variant’, ‘splice_region_variant’, ‘initiator_codon_variant’, ‘5_prime_UTR_premature_start_codon_gain_variant’ or ‘stop_retained_variant’. A moderate-effect SNP was classified as ‘missense_variant’, and a high-effect SNP as ‘start_lost’, ‘stop_gained’, ‘stop_lost’, ‘splice_donor_variant’ or ‘splice_acceptor_variant’. For indel effects, only indels whose lengths were not multiples of three were counted, and SNPs overlapping protein-coding regions (CDSs of RGAP 7 (ref. 15) genes) were considered to have the most disruptive effects on genes. Results of the SNP and indel effect analysis are given in Supplementary Data 2, Tables 3 and 4. For a ‘typical genome’ of a subpopulation, we computed the number (proportion) of rare SNPs and homozygous singletons as the median number (proportion) of SNPs in the given category across the genomes of that subpopulation (Supplementary Data 2, Table 5).
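The effect classes and the ‘typical genome’ summary described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names (`classify_snp`, `indel_is_counted`, `typical_genome_count`) are hypothetical, the sequence ontology terms are the ones listed in the text, and the rule of taking the most severe class when an annotation carries several terms joined by `&` is an assumption made here for illustration.

```python
from statistics import median

# Sequence ontology terms for each effect class, as listed in the text.
LOW = {
    "synonymous_variant",
    "splice_region_variant",
    "initiator_codon_variant",
    "5_prime_UTR_premature_start_codon_gain_variant",
    "stop_retained_variant",
}
MODERATE = {"missense_variant"}
HIGH = {
    "start_lost",
    "stop_gained",
    "stop_lost",
    "splice_donor_variant",
    "splice_acceptor_variant",
}

# Ordering used to pick the most severe class (assumption, see lead-in).
_RANK = {"low": 1, "moderate": 2, "high": 3}


def _classify_term(term):
    """Map a single sequence ontology term to an effect class, or None."""
    if term in HIGH:
        return "high"
    if term in MODERATE:
        return "moderate"
    if term in LOW:
        return "low"
    return None


def classify_snp(annotation):
    """Classify an annotation string holding one or more '&'-joined terms.

    Returns 'low', 'moderate', 'high', or None when no term is in the lists.
    """
    classes = [_classify_term(t) for t in annotation.split("&")]
    classes = [c for c in classes if c is not None]
    if not classes:
        return None
    return max(classes, key=_RANK.get)


def indel_is_counted(ref, alt):
    """True when the indel length is not a multiple of three."""
    return abs(len(ref) - len(alt)) % 3 != 0


def typical_genome_count(counts_per_genome):
    """Median SNP count (or proportion) across genomes of one subpopulation."""
    return median(counts_per_genome)
```

For example, `classify_snp("stop_gained")` falls in the high-effect class, and `typical_genome_count` applied to the per-genome counts of one category gives the median value for that subpopulation.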