In multivariate analyses such as PCA, large differences in variance between columns are corrected by standardizing each column, i.e., dividing each column by its standard deviation, so that every column carries the same weight in the analysis. For OTU abundance tables such a procedure is inappropriate, as the disparities in column sums can be 100-fold. Methods based on chi-squared distances rather than variances deal with this by comparing weighted column profiles [62], computed as the relative abundances of each OTU within a column, with the overall column sum retained as a weighting factor. However, chi-squared distances are sums of squares and can be overly sensitive to outliers and to sequencing “jackpot” effects such as those occurring in pyrosequencing data [63]. Bray-Curtis distances can be a useful alternative, as they are based on the distance between profiles, as long as the differences in actual column sums are also accounted for in the final study.
The other approach to the problem of disparities between column sums has been to subsample the over-abundant columns down to the same total count as the smaller ones. However, this results in a loss of information, which is rarely optimal in statistical contexts. This subsampling procedure is inspired by the popular idea of rarefaction in coverage studies, introduced by Sanders [64], but it has yet to be shown beneficial for all microbial community structures.
The parallels between gene expression microarray analyses and microbial abundance analyses were noted in [65], which proposed several expression-inspired strategies for robustifying abundance measurements. The main points were that rankings and thresholding are important in the presence of noise and high variability in sequencing depth. As in gene expression analysis, filtering the OTUs is beneficial, especially for the later multiple testing adjustments.
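The profile-based distances and the subsampling procedure discussed above can be illustrated with a minimal NumPy sketch. It assumes an OTU-by-sample count matrix (rows are OTUs, columns are samples) in which every OTU has a nonzero total. The function names `column_profiles`, `chi2_distance`, `bray_curtis` and `rarefy` are hypothetical helpers written for this illustration; they are not taken from phyloseq or from the cited references.

```python
import numpy as np


def column_profiles(counts):
    """Split an OTU-by-sample count matrix into column profiles and masses."""
    counts = np.asarray(counts, dtype=float)
    col_sums = counts.sum(axis=0)
    profiles = counts / col_sums        # relative abundances; each column sums to 1
    masses = col_sums / col_sums.sum()  # column sums retained as weights
    return profiles, masses


def chi2_distance(counts, j, k):
    """Chi-squared distance between the profiles of columns j and k.

    Assumption: every OTU (row) has a nonzero total, so row masses are positive.
    """
    counts = np.asarray(counts, dtype=float)
    profiles, _ = column_profiles(counts)
    row_mass = counts.sum(axis=1) / counts.sum()
    diff = profiles[:, j] - profiles[:, k]
    return float(np.sqrt(np.sum(diff ** 2 / row_mass)))


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.abs(u - v).sum() / (u + v).sum())


def rarefy(column, depth, rng):
    """Subsample one column of counts to `depth` reads without replacement."""
    column = np.asarray(column, dtype=int)
    pool = np.repeat(np.arange(column.size), column)  # one entry per read
    drawn = rng.choice(pool, size=depth, replace=False)
    return np.bincount(drawn, minlength=column.size)


# Two samples with identical composition but a 100-fold difference in depth.
counts = np.array([[10, 1000], [30, 3000], [60, 6000]])
profiles, masses = column_profiles(counts)
d_chi2 = chi2_distance(counts, 0, 1)
d_bc_profiles = bray_curtis(profiles[:, 0], profiles[:, 1])
d_bc_raw = bray_curtis(counts[:, 0], counts[:, 1])
rarefied = rarefy(counts[:, 1], depth=100, rng=np.random.default_rng(0))
```

In this toy table both profile-based distances are zero, because the two samples have the same composition, whereas Bray-Curtis computed on the raw counts is about 0.98 and reflects only the difference in depth. The rarefied column keeps 100 of the original 10,000 reads, which makes the information loss concrete.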
The phyloseq package enables easy filtering and rank transformations in the same vein as robust multi-array averaging (rma) [66]. We provide further details in McMurdie and Holmes [67].
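As a rough illustration of what filtering followed by a thresholded rank transformation might look like, the following NumPy sketch operates on an OTU-by-sample count matrix. It is not phyloseq's API and not necessarily the exact transformation used in [65] or [67]: the helper names `filter_otus` and `threshold_ranks`, and the parameters `min_count`, `min_samples` and `top_k`, are assumptions made for this example.

```python
import numpy as np


def filter_otus(counts, min_count, min_samples):
    """Boolean mask of OTUs seen with >= min_count reads in >= min_samples samples."""
    counts = np.asarray(counts)
    return (counts >= min_count).sum(axis=1) >= min_samples


def threshold_ranks(counts, top_k):
    """Rank OTUs within each sample and keep only the top_k ranks.

    The top_k most abundant OTUs in a sample receive ranks 1..top_k
    (top_k for the most abundant); all other OTUs are set to 0.
    Ties are broken by row order, a simplification for this sketch.
    """
    counts = np.asarray(counts)
    n_otus = counts.shape[0]
    order = np.argsort(counts, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable") + 1  # 1 = least abundant
    return np.maximum(ranks - (n_otus - top_k), 0)


# Hypothetical counts: 5 OTUs (rows) in 3 samples (columns) of very different depth.
counts = np.array([
    [0, 0, 1],
    [5, 50, 0],
    [20, 300, 7],
    [1, 0, 0],
    [40, 900, 12],
])
mask = filter_otus(counts, min_count=2, min_samples=2)
ranked = threshold_ranks(counts[mask], top_k=2)
```

After filtering, the three retained OTUs receive the same thresholded ranks in every sample even though the column sums differ widely, which is the sense in which ranks are robust to variable sequencing depth.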