The methodological approach presented in the paper is implemented in the adegenet package [6] for the R software [27]. The function find.clusters runs K-means successively for a range of k values and computes the BIC of each corresponding model. The basic K-means procedure is implemented by the function kmeans in the stats package [27]. DAPC is implemented as the function dapc, which relies on procedures from ade4 [55,59,60] and MASS [61] to perform PCA (dudi.pca) and DA (lda), respectively. Both find.clusters and dapc can be used with any quantitative data, and both have specific implementations for genetic data. The analysis of the four simulated datasets presented in Figures 4 and 5 can be reproduced by executing the example of the dataset dapcIllus. Similarly, the analyses of the extended HGDP-CEPH data and of the seasonal influenza (H3N2) data can be reproduced by executing the examples of the datasets eHGDP and H3N2, respectively. Documentation and support can be found at the adegenet website (http://adegenet.r-forge.r-project.org/).
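The model-selection step performed by find.clusters (successive K-means runs, each scored by BIC) can be illustrated with a minimal sketch. This is not the adegenet code: it is written in plain Python rather than R, the function names (kmeans, bic, bic_by_k) and the toy data are hypothetical, the deterministic farthest-point initialisation is a simplification chosen for reproducibility, and the BIC expression n ln(WSS/n) + k ln(n) is one common form for K-means that is assumed here, since this section does not state the exact formula.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest
    from all centres chosen so far (an assumption, not adegenet's method)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns labels and within-cluster sum of squares."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centre if a cluster becomes empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    wss = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return labels, wss

def bic(wss, n, k):
    """Assumed BIC form for K-means: n * ln(WSS / n) + k * ln(n)."""
    return n * math.log(wss / n) + k * math.log(n)

def bic_by_k(points, max_k):
    """Run K-means for k = 1..max_k and return {k: BIC}."""
    n = len(points)
    return {k: bic(kmeans(points, k)[1], n, k) for k in range(1, max_k + 1)}

# Toy data (hypothetical): three well-separated groups of five points each.
offsets = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
points = [(ox + dx, oy + dy) for ox, oy in [(0, 0), (10, 10), (20, 0)] for dx, dy in offsets]

bics = bic_by_k(points, max_k=5)
best_k = min(bics, key=bics.get)  # lowest BIC = preferred number of clusters
for k, value in bics.items():
    print(k, round(value, 3))
print("selected k:", best_k)
```

On this toy dataset the lowest BIC is obtained for k = 3, matching the three groups built into the data. In practice the same logic is available directly in R through find.clusters, followed by dapc on the inferred groups.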