FeGenie is implemented in Python 3, with three required dependencies: HMMER v. 3.2.1 (Johnson et al., 2010), BLASTp v. 2.7.1 (Madden, 2013), and Prodigal v. 2.6.3 (Hyatt et al., 2010). These dependencies do not need to be installed separately if FeGenie is configured using Conda. There are two optional dependencies, which must be installed externally: R (R Core Team, 2013) and Rscript (R Core Team, 2013). R packages used in FeGenie include argparse (Davis, 2018), ggplot2 (Wickham, 2009), ggdendro (de Vries and Ripley, 2016), reshape (Wickham, 2007), reshape2 (Wickham, 2007), grid (R Core Team, 2013), ggpubr (Kassambara, 2017), tidyverse (Wickham, 2017), and Pvclust (Suzuki and Shimodaira, 2006); users need to install these packages independently using Rscript (detailed instructions are available in the FeGenie Wiki).

The overall workflow of FeGenie is outlined in Figure 3. User-provided input includes a folder of genomes or metagenomes, all of which must be in FASTA format and consist of contigs or scaffolds. Users can also submit amino acid gene sequences in FASTA or GenBank format. First, Prodigal (Hyatt et al., 2010) is used to predict open-reading frames (ORFs). A custom library of profile HMMs (described in section “HMM Development: Building and Calibrating HMMs”) is then queried against these ORFs using hmmsearch (Johnson et al., 2010), with a custom bit score cutoff for each HMM. Additionally, genes shown to be involved in dissimilatory iron reduction but lacking sufficient homologs in public repositories (which precluded us from building reliable HMMs) are queried against the user-provided dataset using BLASTp (Madden, 2013) with a default e-value cutoff of 1E-10.
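The two filtering steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not FeGenie's own code: it assumes hmmsearch results were written in HMMER's tabular `--tblout` format and BLASTp results in tab-separated `-outfmt 6` format. The function names, the HMM names (`hmm_A`, `hmm_B`), the sequence identifiers, and the cutoff values are all hypothetical. Whether FeGenie treats its cutoffs as inclusive is also an assumption of this sketch.

```python
from typing import Dict, Iterable, List, Tuple


def filter_hmm_hits(tblout_lines: Iterable[str],
                    cutoffs: Dict[str, float]) -> List[Tuple[str, str, float]]:
    """Keep hits whose full-sequence bit score meets the cutoff of their HMM."""
    kept = []
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and tblout comment lines
        fields = line.split()
        # tblout columns: 0 = target (ORF), 2 = query (HMM), 5 = full-sequence score
        orf, hmm, score = fields[0], fields[2], float(fields[5])
        # HMMs with no listed cutoff are ignored (a choice made for this sketch)
        if hmm in cutoffs and score >= cutoffs[hmm]:
            kept.append((orf, hmm, score))
    return kept


def filter_blast_hits(outfmt6_lines: Iterable[str],
                      max_evalue: float = 1e-10) -> List[Tuple[str, str, float]]:
    """Keep BLASTp hits whose e-value does not exceed max_evalue."""
    kept = []
    for line in outfmt6_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # outfmt 6 columns: 0 = query, 1 = subject, 10 = e-value
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            kept.append((query, subject, evalue))
    return kept


# Hypothetical per-HMM bit score cutoffs and example result lines.
CUTOFFS = {"hmm_A": 100.0, "hmm_B": 90.0}

TBLOUT = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "orf_1 - hmm_A - 1e-50 170.2 0.1 2e-50 169.5 0.1 1.0 1 0 0 1 1 1 1 -",
    "orf_2 - hmm_A - 1e-05 25.0 0.0 2e-05 24.1 0.0 1.0 1 0 0 1 1 1 1 -",
    "orf_3 - hmm_B - 1e-30 95.3 0.2 2e-30 94.8 0.2 1.0 1 0 0 1 1 1 1 -",
]

BLAST = [
    "ref_1\torf_7\t45.0\t200\t110\t2\t1\t200\t5\t204\t1e-40\t150",
    "ref_1\torf_9\t30.0\t80\t56\t1\t1\t80\t3\t82\t1e-04\t40",
]

hmm_hits = filter_hmm_hits(TBLOUT, CUTOFFS)
blast_hits = filter_blast_hits(BLAST)

if __name__ == "__main__":
    print(hmm_hits)
    print(blast_hits)
```

In this example, `orf_2` is discarded because its score falls below the hypothetical cutoff for `hmm_A`, and the second BLASTp hit is discarded because its e-value exceeds 1E-10.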
The genes queried with BLASTp include the S-layer proteins implicated in iron reduction in Thermincola potens JR (Carlson et al., 2012), as well as porin-cytochrome encoding operons implicated in iron reduction in Geobacter spp. (Shi et al., 2014). The results of hmmsearch (Johnson et al., 2010) and BLASTp (Madden, 2013) are then analyzed, and candidate gene neighborhoods are identified. Potential for dissimilatory iron oxidation and reduction is determined based on a set of rules summarized in Supplementary Table S3. Although the sensitivity of each HMM has been calibrated against NCBI’s nr database (see section “HMM Development: Building and Calibrating HMMs” for details on the calibration process), we recommend that users take advantage of the program's optional cross-validation feature, which searches each FeGenie-identified putative iron gene against a user-chosen database of reference proteins (e.g., NCBI’s nr, RefSeq). Based on these analyses, FeGenie outputs the following files: