Microsatellite Genotyping Harmonization Across Datasets

Among the 750 remaining microsatellites that were genotyped in the new samples, 693 had previously been genotyped in the HGDP–CEPH diversity panel [7,11,13]. For some of these loci, the primer length or position changed between the two studies, the algorithm that determines allele size from raw genotyping products changed systematically, or both. Where the primers changed, allele sizes from the new dataset were adjusted by the appropriate length to align them with the earlier allele sizes in the HGDP–CEPH dataset.
To identify systematic changes between datasets, for each locus the allele sizes of one dataset were translated by a constant, and the G test statistic of independence between allele frequencies and dataset (older HGDP–CEPH dataset versus newly genotyped dataset) was then computed [23]. Considering all possible constants for translation of allele sizes, the one that minimized the G statistic was determined. In implementing the G test, two groups of comparisons were performed. In the first group, the constant of translation was determined by comparing 80 Jewish individuals, genotyped simultaneously with the Native Americans, to all 255 individuals from Europe and the Middle East in the HGDP–CEPH H1048 dataset [109], excluding Mozabites. The second group involved 346 Native American individuals from Central and South America in the newer dataset (all 336 sampled Central and South Americans excluding Ache, and ten additional individuals who were later excluded) and 63 Native American individuals from the Maya, Pima, and Piapoco populations in the older H1048 dataset (the Piapoco population is described as “Colombian” in previous analyses of these data). The constants obtained from the two G tests (labeled S₁ for the comparison of the Jewish populations to European and Middle Eastern populations and S₂ for the Native American comparison) were then compared with the constant of translation expected from three additional sources of information available for the two datasets: the genotypes of a Mammalian Genotyping Service size standard (S₃), a code letter provided by the Mammalian Genotyping Service indicating the nature of the change in primers (S₄), and the locations of the primers themselves in the human genome sequence (S₅).
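The search for the translation constant described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `g_statistic` and `best_shift`, the representation of each dataset as a flat list of allele sizes at one locus, and the range of candidate shifts are all assumptions made for the example.

```python
import math
from collections import Counter


def g_statistic(sizes_a, sizes_b):
    """G statistic of independence between allele size and dataset.

    sizes_a, sizes_b: lists of allele sizes observed at one locus in each
    dataset (hypothetical input layout, one entry per allele copy).
    """
    counts_a, counts_b = Counter(sizes_a), Counter(sizes_b)
    n_a, n_b = len(sizes_a), len(sizes_b)
    n = n_a + n_b
    g = 0.0
    for allele in set(counts_a) | set(counts_b):
        row_total = counts_a[allele] + counts_b[allele]
        for observed, col_total in ((counts_a[allele], n_a),
                                    (counts_b[allele], n_b)):
            if observed > 0:  # cells with zero count contribute nothing
                expected = row_total * col_total / n
                g += 2.0 * observed * math.log(observed / expected)
    return g


def best_shift(old_sizes, new_sizes, candidate_shifts):
    """Return (shift, G) for the shift of new_sizes that minimizes G."""
    return min(
        ((s, g_statistic(old_sizes, [x + s for x in new_sizes]))
         for s in candidate_shifts),
        key=lambda pair: pair[1],
    )


# Toy data: the new dataset is the old one offset by -2 bp.
old = [100, 100, 104, 104, 108]
new = [98, 98, 102, 102, 106]
shift, g = best_shift(old, new, range(-8, 9))
print(shift, g)
```

With identical allele frequency distributions after translation, every observed count equals its expected count and G is zero, so the toy data recover a shift of 2. Real data would give a nonzero minimum, and the source compares that minimum across two independent population comparisons.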
Among the 693 markers, 687 had the same optimal constant of translation (that is, the constant that minimizes the G statistic) in the two different sets of population comparisons (S₁ = S₂). The remaining six markers with different optimal constants of translation in the two G tests were compared with the value expected from the locations of the old and new primers in the human genome (S₅). In all six cases, the optimal constant for the comparison of the Jewish and European/Middle Eastern datasets agreed with the value based on the primer locations (S₁ = S₅). As real population differences between datasets are more likely in Native Americans due to the larger overall level of genetic differentiation in the Americas, we used the constant obtained based on the Jewish and European/Middle Eastern comparison (S₁) for allele size calibration.
Of the remaining 687 markers, 638 had an optimal constant of translation that agreed with the value expected based on the code letter provided by the Mammalian Genotyping Service (S₁ = S₂ = S₄). Thus, there were 49 markers for which the code letter was either uninformative or produced a constant of translation that disagreed with S₁ and S₂. For 35 of these markers, the constant of translation based on the size standard (S₃) agreed with S₁ and S₂. For eight of the remaining 14 markers, the constant of translation based on the primer locations (S₅) agreed with S₁ and S₂. The other six markers (AAT263P, ATT070, D15S128, D6S1021, D7S817, and TTTAT002Z), for which S₅ also disagreed (S₁ ≠ S₅), were discarded. Of the 687 markers retained after this step (693 minus the six discarded), 685 had G < 48 in both G tests, while the other two (D14S587 and D15S822) had G > 91 in the Jewish versus European/Middle Eastern comparison. These two extreme outliers, which also had the highest G values in the Native American comparison, were excluded (Figure S6).
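The order in which the five sources were consulted can be summarized as a small decision function. This is a sketch under stated assumptions: the function name `choose_constant`, the use of `None` for an uninformative source, and the handling of a marker with S₁ ≠ S₂ and S₁ ≠ S₅ (a case that did not arise in the source) are all hypothetical.

```python
def choose_constant(s1, s2, s3=None, s4=None, s5=None):
    """Return (constant, reason); constant is None if the marker is dropped.

    s1, s2: optimal shifts from the two G tests.
    s3, s4, s5: shifts implied by the size standard, the code letter, and
    the primer locations; None marks an uninformative source (assumption).
    """
    if s1 != s2:
        # The source resolves these markers with S5 and keeps S1.
        if s5 is not None and s1 == s5:
            return s1, "S1 = S5"
        # Not covered by the source: treated here as unresolved.
        return None, "unresolved"
    # S1 = S2: look for one independent source that agrees, in the
    # order used in the text (code letter, size standard, primers).
    for label, value in (("S4", s4), ("S3", s3), ("S5", s5)):
        if value is not None and value == s1:
            return s1, "S1 = S2 = " + label
    return None, "discarded"


print(choose_constant(2, 2, s4=2))
print(choose_constant(2, 2, s3=0, s4=0, s5=4))
```

A marker is kept as soon as one independent source confirms the G-test constant, which mirrors the counts in the text: 638 confirmed by S₄, 35 more by S₃, eight more by S₅, and six discarded.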
To further eliminate loci with extreme genotyping errors, we performed Hardy-Weinberg tests [110] within individual populations for the 685 remaining markers. This analysis, performed using PowerMarker [111], used only the 44 populations in which all 685 markers were polymorphic. We calculated the fraction of populations with a significant p-value (<0.05) for the Hardy-Weinberg test (Figure S7). Two markers (GAAA1C11 and GATA88F08P) were extreme outliers, with more than 43% of populations producing p < 0.05. For the remaining markers, the proportion of tests significant at p < 0.05 varied from 0 to 35% without any clear outliers, and with most markers having less than 10% of tests significant at p < 0.05. Excluding the two Hardy-Weinberg outliers, 683 markers remained. Five additional markers (AGAT120, AGAT142P, D14S592, GATA135G01, and TTTA033) were excluded due to missing data: for each of these markers there was at least one population in which all genotypes were missing. Thus, 678 loci remained for the combined analysis with the HGDP–CEPH panel.
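The two final filters can be illustrated as follows. The Hardy-Weinberg p-values are taken as given, since the source computed them with PowerMarker. The function names, the dictionary layout, and the use of `None` for a missing genotype are assumptions made for this sketch.

```python
def fraction_significant(p_values, alpha=0.05):
    """Fraction of per-population Hardy-Weinberg p-values below alpha."""
    tested = [p for p in p_values if p is not None]
    if not tested:
        return None  # no population could be tested for this marker
    return sum(p < alpha for p in tested) / len(tested)


def has_fully_missing_population(genotypes_by_population):
    """True if every genotype is missing (None) in at least one population.

    genotypes_by_population: dict mapping a population name to a list of
    genotypes for one marker (hypothetical layout). An empty list also
    counts as fully missing here.
    """
    return any(
        all(genotype is None for genotype in genotypes)
        for genotypes in genotypes_by_population.values()
    )


# Toy example: one marker with four population-level p-values.
print(fraction_significant([0.01, 0.2, 0.04, 0.5]))

# Toy example: population "B" has no genotype calls for this marker.
marker = {"A": [(100, 104), None], "B": [None, None]}
print(has_fully_missing_population(marker))
```

The source identifies Hardy-Weinberg outliers by inspecting the distribution of these fractions (Figure S7) rather than by a stated cutoff, so no threshold is built into the sketch.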

Wang S., Lewis C.M. Jr., Jakobsson M., Ramachandran S., Ray N., Bedoya G., Rojas W., Parra M.V., Molina J.A., Gallo C., Mazzotti G., Poletti G., Hill K., Hurtado A.M., Labuda D., Klitz W., Barrantes R., Bortolini M.C., Salzano F.M., Petzl-Erler M.L., Tsuneto L.T., Llop E., Rothhammer F., Excoffier L., Feldman M.W., Rosenberg N.A., & Ruiz-Linares A. (2007). Genetic Variation and Population Structure in Native Americans. PLoS Genetics, 3(11), e185.

Publication 2007

Ache Allele Based sequences European Genetic differentiation Genotypes Human genome Mammalian Microsatellites Native american Pima Polymorphic Primer South americans

Corresponding Organization : University of Tarapacá

Protocol cited in 16 other protocols

Variable analysis

independent variables

Loci (microsatellites) genotyped in the new samples

dependent variables

Allele sizes of the microsatellites
Allele frequencies of the microsatellites

control variables

Genotypes of a Mammalian Genotyping Service size standard
Code letter provided by the Mammalian Genotyping Service indicating the nature of the change in primers
Locations of the primers in the human genome sequence
