An analysis of variance was conducted to obtain the variance components which were used to calculate the heritability of seed protein and oil content. The variances of location, replications within locations, accessions, and the accession × location interaction were determined using the PROC GLM procedure of the Statistical Analysis System (SAS institute, Inc., Cary, NC). Genetic and environmental variances were extracted from the variance component estimates based on the expected mean squares. For the estimation of the heritability of seed protein and seed oil concentration, replications and locations were considered to be random effects.
The heritability of seed protein and oil concentration was defined as
hB2=σg2σg2+σe2,
where σg2 is the genetic variance among accessions, and σe2 is the environmental variance which results from error and the accession × location interaction.
To obtain the matrix of population structure, a total of 42,368 SNPs were analyzed in the 298 germplasm accessions using the Admixture program v. 1.22 [58 (link)]. The 10-fold cross-validation procedure was performed with 25 random seeding replications for K values from 2 to 30. The minimum mean standard error was when K = 17. The kinship coefficient matrix that explained the most probable identity by state of each allele between individuals was estimated with the TASSEL program [59 (link)]. For a genome-wide association study, we compared the false positive rate using the general linear model (GLM), the mixed linear model (MLM), and the compressed MLM of the TASSEL program [59 (link)]. The MLM was as good as the compressed MLM and greatly reduced the false positive rate versus the GLM. For this study, the value of 0.001 was used as a Type I error significance threshold P value. As a verification of the genome regions identified in this study we compared the genomic locations of previously reported seed protein and oil QTL with the physical positions of the markers showing significant associations in this study.
Free full text: Click here