Unless stated otherwise, taxonomic abundances for 16S samples were generated from filtered sequence reads using the RDP classifier [101], with assignments of confidence below 80% rebinned to 'uncertain'. For all the datasets described below, the final input for LEfSe is a matrix of relative abundances obtained from the read counts, normalized per sample to sum to one. Witten-Bell smoothing [102] was used to accommodate rare types, but because LEfSe's approach is non-parametric, this has minimal effect on the discovered biomarkers and on the LDA score. For the same reason, our biomarker discovery method avoids most effects of sequence quality issues, provided that any sequencing biases are homogeneous across conditions: as is standard for non-parametric approaches, the algorithm makes no specific assumptions about the statistical distribution or noise model.
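The two preprocessing steps described above (rebinning low-confidence assignments and per-sample normalization) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the `(taxon, confidence)` input layout, and the example reads are assumptions made for illustration, and Witten-Bell smoothing is omitted.

```python
from collections import Counter

def rebin_low_confidence(assignments, threshold=0.8):
    """Relabel reads whose classifier confidence is below the threshold.

    `assignments` is a hypothetical list of (taxon, confidence) pairs,
    one per read, with confidence expressed as a fraction in [0, 1].
    """
    return [taxon if conf >= threshold else "uncertain"
            for taxon, conf in assignments]

def relative_abundances(labels):
    """Turn per-read labels for one sample into abundances that sum to one."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {taxon: n / total for taxon, n in counts.items()}

# Hypothetical reads for a single sample (illustrative values only).
reads = [
    ("Bacteroides", 0.95),
    ("Bacteroides", 0.85),
    ("Prevotella", 0.60),   # below 80%, so it is rebinned to 'uncertain'
    ("Prevotella", 0.90),
]
sample_profile = relative_abundances(rebin_low_confidence(reads))
```

Applying the same two functions to every sample and aligning the resulting dictionaries on a common set of taxa would give one column per sample of the relative abundance matrix.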