Because microbiome datasets are overwhelmingly sparse, some filtering is required to infer microbe-metabolite interactions. We filtered out microbes that appear in fewer than 10 samples, since these microbes do not carry enough information to infer which metabolites co-occur with them. In other words, the mmvec model has too many degrees of freedom to perform inference on these microbes. For the cystic fibrosis study, there were 172 samples, and after filtering there were 138 unique microbial taxa and 462 metabolite features. For the biocrust soils study, there were 19 samples, and after filtering there were 466 unique microbial taxa and 85 metabolite features. For the murine high fat diet study, there were 434 samples, and after filtering there were 902 microbes and 11978 metabolites. For the IBD dataset, there were 13920 features in the c18 LCMS dataset, 26966 features in the c8 LCMS dataset, and 562 taxa. Cross-validation was performed in every study to evaluate overfitting. In the desert biocrust soils experiment, 1 of the 19 samples was randomly chosen to be held out for cross-validation. In all other studies, 10 samples were randomly chosen to be held out. All analyses can be found at https://github.com/knightlab-analyses/multiomic-cooccurences.
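The prevalence filter and the random hold-out described above can be illustrated with a minimal sketch. This is not the authors' code, which is available in the linked repository. The function names `filter_rare_microbes` and `split_holdout`, the toy count table, and the random seed are hypothetical. The sketch also assumes that a microbe "appears" in a sample when its count there is greater than zero.

```python
import random


def filter_rare_microbes(counts, min_samples=10):
    """Keep microbes observed in at least `min_samples` samples.

    `counts` maps a microbe ID to its list of per-sample counts.
    Assumption: a microbe "appears" in a sample when its count is > 0.
    """
    return {
        microbe: row
        for microbe, row in counts.items()
        if sum(1 for c in row if c > 0) >= min_samples
    }


def split_holdout(sample_ids, n_test=10, seed=0):
    """Randomly choose `n_test` samples to hold out for cross-validation."""
    rng = random.Random(seed)
    test = set(rng.sample(sorted(sample_ids), n_test))
    train = [s for s in sample_ids if s not in test]
    return train, sorted(test)


# Hypothetical toy table: 3 microbes measured across 12 samples.
samples = [f"s{i}" for i in range(12)]
counts = {
    "taxon_a": [5] * 12,            # present in 12 samples: kept
    "taxon_b": [1] * 10 + [0] * 2,  # present in exactly 10 samples: kept
    "taxon_c": [3] * 9 + [0] * 3,   # present in 9 samples: filtered out
}

kept = filter_rare_microbes(counts, min_samples=10)
train_ids, test_ids = split_holdout(samples, n_test=1, seed=42)
```

With `n_test=1` the split mirrors the biocrust soils setting, where a single sample is held out. Setting `n_test=10` corresponds to the other studies.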