Only compounds with measured mass spectra were used. In silico predicted MS/MS spectra available in certain public databases [12] were not considered in our study. A merged list of InChIKeys was initially created from the public and commercial datasets published by Vinaixa et al. 2016 [13]. This list was then updated with new entries and resources [16,17], yielding: 9419 InChIKeys of compounds from the METLIN database [18], provided by Agilent Technologies; 399 InChIKeys from ReSpect [19]; 1171 InChIKeys from the Wiley MS for ID database, provided by Herbert Oberacher; 3401 InChIKeys from GNPS [20]; 11,009 InChIKeys from MassBank [21]; 3480 InChIKeys from mzCloud, provided by Robert Mistrik (21 June 2016); 1034 InChIKeys from the HMDB [12] (downloaded on 21 June 2016); and 242,463 InChIKeys from NIST 14, provided by Stephen Stein and Dmitrii Tchekhovskoi. These InChIKey lists, which often contained duplicated entries, were merged to give a total of 261,330 non-redundant InChIKeys, corresponding to 253,927 non-redundant InChIKey first blocks. InChIKey mapping was performed using only the first block of the string, and therefore did not take charge or stereochemistry into account.
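The merging and first-block mapping described above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure actually used in the study: the function names `first_block` and `merge_inchikeys`, the source labels, and the toy key lists are all hypothetical, and the example assumes each source has already been reduced to a plain list of InChIKey strings.

```python
from typing import Dict, Iterable, Set, Tuple


def first_block(inchikey: str) -> str:
    """Return the first block of an InChIKey (the text before the first hyphen)."""
    return inchikey.strip().upper().split("-")[0]


def merge_inchikeys(sources: Dict[str, Iterable[str]]) -> Tuple[Set[str], Set[str]]:
    """Merge per-source InChIKey lists into non-redundant sets.

    Returns (unique full InChIKeys, unique first blocks).
    """
    full_keys: Set[str] = set()
    for keys in sources.values():
        for key in keys:
            normalized = key.strip().upper()
            if normalized:  # skip empty lines
                full_keys.add(normalized)
    # Collapsing to the first block merges entries that differ only in the
    # later blocks of the key.
    first_blocks = {first_block(key) for key in full_keys}
    return full_keys, first_blocks


# Hypothetical toy input: two sources sharing one duplicated entry, plus two
# keys that share a first block but differ in the second block.
sources = {
    "source_a": ["BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "WQZGKKKJIJFFOK-GASJEMHNSA-N"],
    "source_b": ["BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "WQZGKKKJIJFFOK-VFUOTHLCSA-N"],
}

full_keys, blocks = merge_inchikeys(sources)
print(len(full_keys), "non-redundant InChIKeys")
print(len(blocks), "non-redundant first blocks")
```

Using sets makes de-duplication implicit, and deriving the first blocks from the already merged full keys mirrors the two counts reported in the text: one for complete InChIKeys and a smaller one for first blocks.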