Data for testing and validation of Fast UniFrac came from four main sources: (1) a large meta-analysis of Sanger-sequencing data from a wide range of host-associated and free-living environments (Ley et al., 2008b); (2) an analysis of how gut bacterial populations change in obese humans on fat- and carbohydrate-restricted diets (Ley et al., 2006); (3) pyrosequencing studies of the human hand (Fierer et al., 2008) and of the fecal microbiota of lean and obese twin pairs and their mothers (Turnbaugh et al., 2009); and (4) a PhyloChip study of citrus pathogens (Sagaram et al., 2009). These studies were chosen because they represent some of the largest datasets for their respective types of analysis. A reference tree was assembled from the Greengenes core set (DeSantis et al., 2006); both this tree and the PhyloChip G2 reference tree are available from the Fast UniFrac web site.