For protein identification in the mock community samples, we created a database containing all protein sequences from the reference genomes of the organisms used in the mock communities (Supplementary Table 8). The cRAP database of common laboratory contaminant proteins (http://www.thegpm.org/crap/) was appended to it. The final database contained 123,100 protein sequences and is available from the PRIDE repository (PXD006118). For protein identification in the soda lake mats, we used the database described above. For protein identification in the human saliva metaproteomes, we started from the same public databases as Grassl et al. (ref. 9), namely the protein sequences from the Human Oral Microbiome Database (ref. 53) and the human reference proteome from UniProt (UP000005640). Redundant sequences were removed from this combined database with CD-HIT using an identity threshold of 95% (ref. 49). The resulting saliva metaproteome database contained 914,388 protein sequences and is available from the PRIDE repository (PXD006366). For peptide identification and protein inference, the MS/MS spectra were searched against these databases using either the Sequest HT node in Proteome Discoverer version 2.0.0.802 (Thermo Fisher Scientific) or MaxQuant version 1.5.5.1 (ref. 15).
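The two database-building steps described above (concatenating source FASTA files, then removing redundancy at 95% identity) can be illustrated with a minimal Python sketch. It is not the authors' pipeline and it does not reproduce CD-HIT. It only mimics CD-HIT's general strategy of greedy incremental clustering: sequences are processed longest first, and each one is kept as a new representative only if it is less than 95% identical to every representative kept so far. Identity here is assumed to be the number of identical residues in a simple Needleman-Wunsch global alignment divided by the length of the shorter sequence. CD-HIT additionally uses short-word filtering to skip most full alignments, so this quadratic toy version only suits tiny inputs. All function names, headers, and sequences below are hypothetical.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (FASTA header without ">", amino acid sequence)


def read_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA text lines into (header, sequence) records."""
    records: List[Record] = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def global_identity(a: str, b: str) -> float:
    """Identical aligned residues (Needleman-Wunsch) divided by the shorter length.

    Assumed toy scoring: match +1, mismatch -1, gap -1.
    """
    n, m = len(a), len(b)
    if min(n, m) == 0:
        return 0.0
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -i
    for j in range(1, m + 1):
        score[0][j] = -j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (1 if a[i - 1] == b[j - 1] else -1)
            score[i][j] = max(diag, score[i - 1][j] - 1, score[i][j - 1] - 1)
    # Trace back one optimal alignment and count identical aligned positions.
    i, j, same = n, m, 0
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + (1 if a[i - 1] == b[j - 1] else -1):
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] - 1:
            i -= 1
        else:
            j -= 1
    return same / min(n, m)


def dereplicate(records: List[Record], threshold: float = 0.95) -> List[Record]:
    """Greedy incremental clustering: the longest sequences become representatives."""
    reps: List[Record] = []
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        if all(global_identity(seq, rep_seq) < threshold for _, rep_seq in reps):
            reps.append((header, seq))
    return reps


# Hypothetical toy inputs standing in for genome-derived proteins and a contaminant list.
reference = read_fasta([">orgA_p1", "MKTAYIAKQRQISFVKSHFSRQ",
                        ">orgA_p2", "MKTAYIAKQRQISFVKSHFSRA"])
contaminants = read_fasta([">toy_contaminant", "MSCQISCKSRGRGGGGGGFRG"])

merged = reference + contaminants   # "appended to the database"
nonredundant = dereplicate(merged)  # 95% identity threshold
print(len(merged), len(nonredundant))
print([h for h, _ in nonredundant])
```

In this toy run, `orgA_p2` differs from `orgA_p1` at 1 of 22 positions (about 95.5% identity), so it is folded into the `orgA_p1` cluster. The merged set has 3 sequences and the dereplicated set has 2.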