Database searching of the 50 MS .RAW files was performed in Proteome Discoverer (PD) 1.4 using SEQUEST algorithm. All 589 C. sativa protein sequences publicly available on 13 December 2018 from UniprotKB (www.uniprot.org; key word used “Cannabis sativa”) were downloaded as a FASTA file. These also included 77 sequences from the European hop Humulus lupulus, the closest relative to C. sativa [27 (link)], as well as 72 sequences from the Chinese grass Boehmeria nivea, also closely related to cannabis [27 (link)]. Because GOT sequence was not included, we retrieved it from patent WO 2011/017798 Al [28 ] and included it to the FASTA file (590 entries). The FASTA file was imported and indexed in PD 1.4. The SEQUEST algorithm was used to search the indexed FASTA file. The database searching parameters specified trypsin as the digestion enzyme and allowed for up to two missed cleavages. The precursor mass tolerance was set at 10 ppm, and fragment mass tolerance set at 0.5 Da. The peptide absolute Xcorr threshold was set at 0.4 and protein relevance threshold was set at 1.5. Carbamidomethylation (C) was set as a static modification. Oxidation (M), phosphorylation (STY), conversion from Gln to pyro-Glu (N-term Q) and Glu to pyro-Glu (N-term E), and deamination (NQ) were set as dynamic modifications. The target decoy peptide-spectrum match (PSM) validator was used to estimate false discovery rates (FDR). At the peptide level, peptide confidence value set at high was used to filter the peptide identification, and the corresponding FDR on peptide level was less than 1%. At the protein level, protein grouping was enabled.
All nLC-MS/MS files are available from the stable public repository MassIVE at the following URL: http://massive.ucsd.edu/ProteoSAFe/datasets.jsp with the accession number MSV000083191.
Free full text: Click here