Comparative Proteomics for Investigating Host Adaptation in Salmonella

We have designed this approach with whole proteome analyses in mind, so as a proof-of-concept, we applied the approach to the study of host adaptation in Salmonella enterica, using S. Enteritidis and S. Gallinarum as a specific example. For a summary of the workflow of this portion of the analysis, see Supplementary Figure 1. Genomes for Salmonella Enteritidis str. P125109 (AM933172.1) and Salmonella Gallinarum str. 287/91 (AM933173.1) were retrieved. Custom gene models were constructed by searching each S. Enteritidis protein coding gene against the Uniref90 database. Profiles were built using sequences showing 40% or greater sequence identity to the original query protein.
In order to assess whether models built from a small number of sequences performed well, we tested the performance of models using proteins from the humsavar database (http://www.uniprot.org/docs/humsavar), which catalogs human polymorphisms and disease variants for a wide range of proteins which have different levels of representation in Uniprot90, and therefore result in profile models built from varying numbers of sequences. We built custom models for each protein in the database, then separated the models into classes based on both number of sequences and effective sequence number. We then computed AUC values for predictions made on variant data from each class. Results from this test can be found in Supplementary Table S2. We saw no classes where custom models performed worse than Pfam models, so we did not filter models built from few sequences. We did, however, use Pfam models for those proteins with no homologs in Uniref90 rather than build a model from the query sequence, as we felt this could bias results towards the reference species.
To assess the quality of our analysis, we wished to compare our results to those of Nuccio and Bäumler (2014) (link), so we used the ortholog calls provided in their supplementary material. Genes represented by a custom model were scored using the appropriate model (n = 3154), all others were scored using Pfam domain models. hmmsearch was used for scoring. If hits to multiple Pfam models occurred, any overlapping hits were competed based on E-value. Orthologs with incompatible Pfam domain architectures (n = 32) were excluded from scoring, but counted as hypothetically attenuated coding sequences (HACs) in Figure 2A if they involved a loss of domain in one serovar. We anticipate that the variance of delta-bitscores will increase with evolutionary distance, so rather than establish a fixed scoring cutoff for HACs, we identify loss-of-function mutations using an empirical distribution. We set a score threshold at which 2.5% of genes on the least dispersed side of the distribution would be called as HACs. If the two bacteria show an equal rate of protein function loss, this would result in 5% of orthologous proteins being called as HACs, however if one proteome shows a greater degree of functional degradation, this will result in a greater proportion of orthologs being classified as HACs.
Functional classification was performed using pathways from the KEGG database (Kanehisa et al., 2016 (link)). We grouped genes into four categories: those present in a pathway but with no ortholog in the other serovar; genes identified as hypothetically disrupted coding sequences (HDCs) by Nuccio and Bäumler (2014) (link); genes identified by our DBS method as HACs, but not as HDCs; and finally genes not identified as non-functional by either method. dN/dS values were calculated using PAML (Yang, 1997 (link)), and for the comparison of DBS and dN/dS, genes were filtered for those with dN > 0 and dS > 0.0001. Correlations between measures were computed using a Spearman’s rho statistic (R package cor).
In our investigation of multiple S. enterica isolates, for all orthologous groups with a gene present in S. Enteritidis, scores were collated, and if individual scores were significantly different to the median score for all isolates, we identified these proteins as HACs. Significant difference was calculated in a similar way to the pairwise comparisons, with the score corresponding to the most extreme 2.5% of delta-bitscores on the least dispersed side of an empirical distribution being used as the cutoff.

Free full text: Click here

Wheeler N.E., Barquist L., Kingsley R.A, & Gardner P.P. (2016). A profile-based method for identifying functional divergence of orthologous genes in bacterial genomes. Bioinformatics, 32(23), 3566-3574.

Publication 2016

A genes Bacteria Coding sequences Evolutionary Felt Gene Genomes Host adaptation Human Mutations Polymorphisms Protein sequence Proteins Proteins variants Proteome Salmonella Salmonella enterica Salmonella enteritidis Than

Corresponding Organization : University of Canterbury

Other organizations : Norwich Research Park, Quadram Institute, Wellcome Sanger Institute

Top 5 similar protocols

Protocol cited in 7 other protocols

Variable analysis

independent variables

Salmonella enterica serovar Enteritidis str. P125109 (AM933172.1)
Salmonella enterica serovar Gallinarum str. 287/91 (AM933173.1)

dependent variables

Host adaptation in Salmonella enterica
Protein function loss or disruption between the two serovars
Percentage of orthologous proteins classified as hypothetically attenuated coding sequences (HACs)
Functional classification of genes based on KEGG pathways
DN/dS values for comparison with delta-bitscores

control variables

Custom gene models constructed by searching each S. Enteritidis protein coding gene against the Uniref90 database
Pfam domain models used for proteins with no homologs in Uniref90
Ortholog calls provided in the supplementary material of Nuccio and Bäumler (2014) study
KEGG pathways for functional classification of genes
PAML used for calculating dN/dS values

Annotations

Based on most similar protocols

Etiam vel ipsum. Morbi facilisis vestibulum nisl. Praesent cursus laoreet felis. Integer adipiscing pretium orci. Nulla facilisi. Quisque posuere bibendum purus. Nulla quam mauris, cursus eget, convallis ac, molestie non, enim. Aliquam congue. Quisque sagittis nonummy sapien. Proin molestie sem vitae urna. Maecenas lorem.

As authors may omit details in methods from publication, our AI will look for missing critical information across the 5 most similar protocols.

About PubCompare

Our mission is to provide scientists with the largest repository of trustworthy protocols and intelligent analytical tools, thereby offering them extensive information to design robust protocols aimed at minimizing the risk of failures.

We believe that the most crucial aspect is to grant scientists access to a wide range of reliable sources and new useful tools that surpass human capabilities.

However, we trust in allowing scientists to determine how to construct their own protocols based on this information, as they are the experts in their field.

Ready to get started?

Revolutionizing how scientists
search and build protocols!