FeGenie is implemented in Python 3, with three required dependencies:
HMMER v. 3.2.1 (Johnson et al., 2010),
BLASTp v. 2.7.1 (Madden, 2013), and
Prodigal v. 2.6.3 (Hyatt et al., 2010). External installation of these dependencies is not required if FeGenie is configured using Conda. There are two optional dependencies, which must be installed externally:
R (R Core Team, 2013) and
Rscript (R Core Team, 2013). R packages used in FeGenie include
argparse (Davis, 2018),
ggplot2 (Wickham, 2009),
ggdendro (de Vries and Ripley, 2016),
reshape (Wickham, 2007),
reshape2 (Wickham, 2007),
grid (R Core Team, 2013),
ggpubr (Kassambara, 2017),
tidyverse (Wickham, 2017), and
Pvclust (Suzuki and Shimodaira, 2006); users must install these packages independently using
Rscript (detailed instructions are available in the FeGenie Wiki). The overall workflow of FeGenie is outlined in
Figure 3. User-provided input consists of a folder of genomes or metagenomes, each in FASTA format and composed of contigs or scaffolds. Users can also submit amino acid gene sequences in FASTA or GenBank format. First,
Prodigal (Hyatt et al., 2010) is used to predict open reading frames (ORFs). A custom library of profile HMMs (described in section “HMM Development: Building and Calibrating HMMs”) is then queried against these ORFs using
hmmsearch (Johnson et al., 2010), with a custom bit score cutoff for each HMM. Additionally, genes shown to be involved in dissimilatory iron reduction but lacking sufficient homologs in public repositories (precluding us from building reliable HMMs) are queried against the user-provided dataset using
BLASTp (Madden, 2013) with a default e-value cutoff of 1E-10. These genes include the S-layer proteins implicated in iron reduction in
Thermincola potens JR (Carlson et al., 2012), as well as porin-cytochrome encoding operons implicated in iron reduction in
Geobacter spp. (Shi et al., 2014). The results of
hmmsearch (Johnson et al., 2010) and
BLAST (Madden, 2013) are then analyzed, and candidate gene neighborhoods are identified. Potential for dissimilatory iron oxidation and reduction is determined based on a set of rules that are summarized in
Supplementary Table S3. Even though the sensitivity of each HMM has been calibrated against NCBI’s nr database (see section “HMM Development: Building and Calibrating HMMs” for details on the calibration process), we recommend that users take advantage of the program's optional cross-validation feature, which searches each FeGenie-identified putative iron gene against a user-chosen database of reference proteins (e.g., NCBI’s nr, RefSeq). Based on these analyses, FeGenie outputs the following files:
Garber A.I., Nealson K.H., Okamoto A., McAllister S.M., Chan C.S., Barco R.A., & Merino N. (2020). FeGenie: A Comprehensive Tool for the Identification of Iron Genes and Iron Gene Neighborhoods in Genome and Metagenome Assemblies. Frontiers in Microbiology, 11, 37.
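
To illustrate the hit-filtering logic in the workflow described above, the following minimal Python sketch applies a per-HMM bit score cutoff to hmmsearch tabular output (`--tblout`) and a single e-value cutoff of 1E-10 to BLASTp tabular output (`-outfmt 6`). It is not FeGenie's actual code. The function names, the HMM name `HypotheticalHMM_A`, the cutoff value of 100.0, the use of the full-sequence bit score, and the inclusive comparisons are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Default BLASTp e-value cutoff stated in the text.
BLAST_EVALUE_CUTOFF = 1e-10


def filter_hmm_hits(
    tblout_lines: Iterable[str], cutoffs: Dict[str, float]
) -> List[Tuple[str, str, float]]:
    """Keep hmmsearch --tblout hits whose bit score meets the HMM's cutoff.

    In hmmsearch --tblout output, column 1 is the target (ORF) name,
    column 3 is the query (HMM) name, and column 6 is the full-sequence
    bit score. Using the full-sequence score and ">=" is an assumption.
    """
    hits = []
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        if len(fields) < 6:
            continue  # malformed line
        orf, hmm, score = fields[0], fields[2], float(fields[5])
        cutoff = cutoffs.get(hmm)
        if cutoff is not None and score >= cutoff:
            hits.append((orf, hmm, score))
    return hits


def filter_blast_hits(
    outfmt6_lines: Iterable[str], max_evalue: float = BLAST_EVALUE_CUTOFF
) -> List[Tuple[str, str, float]]:
    """Keep BLAST tabular (-outfmt 6) hits with e-value <= max_evalue.

    With the default outfmt 6 columns, the e-value is the 11th field.
    """
    hits = []
    for line in outfmt6_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # malformed line
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            hits.append((query, subject, evalue))
    return hits


# Hypothetical example data (not real FeGenie output).
EXAMPLE_TBLOUT = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "orf_1 - HypotheticalHMM_A - 1e-50 180.2 0.1 1e-50 180.0 0.1 1.0 1 0 0 1 1 1 1 -",
    "orf_2 - HypotheticalHMM_A - 1e-05 25.0 0.0 1e-05 24.8 0.0 1.0 1 0 0 1 1 1 1 -",
]
EXAMPLE_CUTOFFS = {"HypotheticalHMM_A": 100.0}  # hypothetical cutoff

EXAMPLE_BLAST = [
    "query_1\torf_7\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200",
    "query_1\torf_9\t35.0\t60\t39\t0\t1\t60\t1\t60\t1e-05\t40",
]

if __name__ == "__main__":
    print(filter_hmm_hits(EXAMPLE_TBLOUT, EXAMPLE_CUTOFFS))
    print(filter_blast_hits(EXAMPLE_BLAST))
```

In this sketch, only `orf_1` passes the bit score filter and only the `orf_7` alignment passes the e-value filter. A per-HMM cutoff table lets each profile be as strict as its calibration requires, whereas a single e-value threshold is a simpler fallback for genes with too few homologs to build a reliable HMM.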