To illustrate the application of these ideas to a real data set, we reanalysed a study of the gut microbiomes of twins and their mothers [27]. The data comprise faecal samples from 154 individuals, characterised by family and by body mass index category (‘Lean’, ‘Obese’ or ‘Overweight’). Each individual was sampled at two time points approximately two months apart. The V2 hypervariable region of the 16S rRNA gene was amplified by PCR and then sequenced by 454 pyrosequencing. We reanalysed this data set by filtering the reads, denoising them and removing chimeras with the AmpliconNoise pipeline [10], [11]. Denoised reads were then classified to the genus level using the RDP stand-alone classifier [5]. This gave a total of 570,851 reads across 278 samples; the remaining 30 of the 308 possible samples retained no reads after filtering. Sample sizes varied from just 53 to 10,585 reads, with a median of 1,599. A total of 129 different genera were observed, and the number of genera per sample varied from 12 to 50 with a median of 28. One extra category, ‘Unknown’, was used for reads that could not be classified with greater than 50% bootstrap certainty. We will refer to this as the ‘Twins’ data set.
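The tabulation step described above, in which reads below the 50% bootstrap threshold are pooled into ‘Unknown’, can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names and the `(sample, genus, confidence)` input tuples are assumptions made for the example and do not reflect the RDP classifier's actual output format.

```python
from collections import Counter, defaultdict

UNKNOWN = "Unknown"  # extra category for low-confidence reads


def genus_counts(assignments, threshold=0.5):
    """Tally reads per sample and genus.

    `assignments` is an iterable of (sample, genus, confidence) tuples,
    a hypothetical layout chosen for this sketch. Reads whose bootstrap
    confidence is not strictly greater than `threshold` are counted
    under the 'Unknown' category.
    """
    table = defaultdict(Counter)
    for sample, genus, confidence in assignments:
        label = genus if confidence > threshold else UNKNOWN
        table[sample][label] += 1
    return {sample: dict(counts) for sample, counts in table.items()}


def genera_per_sample(table):
    """Number of distinct classified genera in each sample (excluding 'Unknown')."""
    return {
        sample: sum(1 for genus in counts if genus != UNKNOWN)
        for sample, counts in table.items()
    }
```

Samples with no surviving reads never appear in the resulting table, which mirrors how the sample count here falls from 308 to 278.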