We conducted our tests against NBC and five other webservers in July and August of 2010. WebCARMA and MG-RAST require no parameters. Phylopythia requires the type of model to match against. MG-RAST does, however, require an E-value cutoff under the SEED viewer, for which we selected the highest value. For Galaxy, we used the default BLAST parameters against the NT database. For NBC, we used an Nmer size of 15 and the default 1032-organism genome list. For CAMERA, we used the ‘All Prokaryotes’ BLASTN database with default values for the remaining parameters, and retained only the top-hit organism for each read.
We implement the NBC approach of Rosen et al. (2008), which assigns each read a log-likelihood score. We introduce two modes of NBC: (i) the novice functionality and (ii) the expert functionality. We expect that most users will fit into the ‘novice’ category, which enables them to upload their FASTA file of reads and obtain a file of summarized results matching each read to its most likely organism, given the training database. The parameters that (expert and novice) users can choose from are as follows:
Upload File: the FASTA-formatted file of metagenomic reads. The webserver also accepts .zip, .gz and .tgz archives containing several FASTA files.
Genome list: the running time of the algorithm grows linearly with the number of genomes that are scored against. An expert user with prior knowledge of the microbes expected in the environment can therefore select only those microbes to score against. This both speeds up the computation and reduces the false positives of the algorithm.
Nmer length: the user can select different Nmer feature sizes, but we recommend that the novice user choose N = 15, since it works well for both long and short reads (Rosen et al., 2008).
Email: the user's email address is required so that they can be notified where to retrieve the results once the job is completed.
Output: for a beginner, we suggest (i) uploading a FASTA file of metagenomic reads and (ii) entering an email address. The output is a link to a directory that contains the original upload file (renamed userAnalysisFile.txt), the list of genomes that were scored against (masterGenomeList.txt) and a summary of the matches for each read (summarized_results.txt). The expert user may be particularly interested in the *.csv.gz files, which allow a more in-depth analysis of the ‘score distribution’ of each read.
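To make the roles of the Nmer length and the genome list concrete, the following Python sketch shows one way a naive Bayes classifier can assign a read a log-likelihood score against each genome and report the most likely organism. It is an illustration only, not the webserver's code: the function names (nmer_counts, train, log_likelihood, classify), the add-one smoothing and the toy genomes are all assumptions made for this example. Restricting genome_list shortens the scoring loop, which is why running time grows linearly with the number of genomes selected.

```python
import math
from collections import Counter


def nmer_counts(sequence, n):
    """Count every overlapping N-mer in a nucleotide sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def train(genomes, n):
    """Build one N-mer count profile per genome (genomes: name -> sequence)."""
    return {name: nmer_counts(seq, n) for name, seq in genomes.items()}


def log_likelihood(read, counts, n, pseudocount=1.0):
    """Sum the log frequencies of the read's N-mers under one genome profile.

    Add-one smoothing over the 4**n possible N-mers is an assumption of this
    sketch; it keeps unseen N-mers from producing log(0).
    """
    total = sum(counts.values()) + pseudocount * 4 ** n
    read = read.upper()
    score = 0.0
    for i in range(len(read) - n + 1):
        score += math.log((counts[read[i:i + n]] + pseudocount) / total)
    return score


def classify(read, profiles, n, genome_list=None):
    """Score a read against the selected genomes and return the best match.

    One pass per genome, so the cost is linear in len(genome_list).
    """
    names = list(profiles) if genome_list is None else genome_list
    scores = {name: log_likelihood(read, profiles[name], n) for name in names}
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (hypothetical); real use would take N = 15 and full genomes.
toy_genomes = {
    "A_rich": "AAAAAAAAAATAAAAAAA",
    "GC_rich": "GCGCGCGGCCGCGCGCGG",
}
toy_profiles = train(toy_genomes, n=3)
best_match, all_scores = classify("AAAAAT", toy_profiles, n=3)
print(best_match, all_scores)
```

The per-genome scores returned alongside the best match play the same role as the ‘score distribution’ mentioned above: an expert user can inspect how close the runner-up genomes are instead of relying only on the top assignment.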