Identifying Core Genome Orthologs in E. coli/Shigella
A preliminary set of orthologs was defined by identifying unique pairwise reciprocal best hits with at least 80% similarity (∼85% identity) in amino acid sequence and less than 20% difference in protein length. This orthology analysis was performed for every pair of E. coli/Shigella genomes. The core genome, consisting of the genes found in all strains of the species, was defined as the intersection of the pairwise lists.
For every pair of genomes, this list of persistent orthologs was then supplemented with attention to conservation of gene order. Because (i) few rearrangements are observed at these short evolutionary distances and (ii) horizontal gene transfer is frequent, genes outside conserved blocks of synteny are likely to be xenologs or paralogs. Hence, to determine positional orthology, we combined the homology analysis (protein sequence similarity ≥80%, ≤20% difference in protein length) with the classification of these genes as either syntenic or nonsyntenic, again for every pair of E. coli/Shigella genomes. The definitive list of orthologs of the pan-genome was then defined as the union of the pairwise lists. A syntenic block was defined as a set of consecutive pairs of genes in the core genome. Conserved gene-order blocks were obtained by comparing the locations of bidirectional best-hit pairs in the core genome, adopting a window size of one gap.
These lists were also used to compute gene accumulation curves in R, which describe the number of new genes and of genes in common as genomes are added to the comparison (Figure 1). The procedure was repeated 1000 times with randomly permuted genome insertion order to obtain medians and quartiles.
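The filtering and accumulation steps described above can be sketched roughly as follows. This is an illustration in Python, not the authors' pipeline (the text states that the accumulation curves were produced in R). The `Hit` record, the function names, and the input layout (one best hit per gene, one set of ortholog family identifiers per genome) are all hypothetical. The length difference is assumed to be measured relative to the longer protein, which the text does not specify, and the synteny step is left out.

```python
import random
from statistics import quantiles
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for the best hit of one gene in another genome."""
    partner: str       # identifier of the best-hit gene in the other genome
    similarity: float  # percent amino acid similarity
    query_len: int     # protein length of the query gene
    partner_len: int   # protein length of the best-hit gene


def passes_filters(hit, min_similarity=80.0, max_len_diff=0.20):
    # Assumption: length difference is taken relative to the longer protein.
    longer = max(hit.query_len, hit.partner_len)
    len_diff = abs(hit.query_len - hit.partner_len) / longer
    return hit.similarity >= min_similarity and len_diff <= max_len_diff


def reciprocal_best_hits(best_ab, best_ba):
    """Keep pairs (a, b) where a's best hit is b, b's best hit is a,
    and the hit passes the similarity and length filters."""
    pairs = set()
    for gene_a, hit in best_ab.items():
        back = best_ba.get(hit.partner)
        if back is not None and back.partner == gene_a and passes_filters(hit):
            pairs.add((gene_a, hit.partner))
    return pairs


def core_and_pan(genomes):
    """genomes maps a genome name to its set of ortholog family ids.
    Core = families present in every genome; pan = families in any genome."""
    family_sets = list(genomes.values())
    return set.intersection(*family_sets), set.union(*family_sets)


def accumulation_curves(genomes, n_permutations=1000, seed=0):
    """For each position in a random genome order, record the number of new
    families and of families still common to all genomes added so far, then
    summarise each position as [Q1, median, Q3] over all permutations."""
    rng = random.Random(seed)
    names = list(genomes)
    new_counts = [[] for _ in names]
    common_counts = [[] for _ in names]
    for _ in range(n_permutations):
        rng.shuffle(names)
        pan, core = set(), None
        for i, name in enumerate(names):
            genes = genomes[name]
            new_counts[i].append(len(genes - pan))
            pan |= genes
            core = set(genes) if core is None else core & genes
            common_counts[i].append(len(core))

    def summarise(counts):
        return [quantiles(c, n=4) for c in counts]

    return summarise(new_counts), summarise(common_counts)
```

In this layout a gene is kept only if the best-hit relation holds in both directions, so a gene whose partner prefers a different gene is dropped even when its similarity is high. The random seed is fixed only to make the sketch reproducible.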
Other organizations:
Sorbonne Université, Institut Pasteur, Centre National de la Recherche Scientifique, Délégation Paris 7, Inserm, Université Paris Cité, Genoscope, Commissariat à l'Énergie Atomique et aux Énergies Alternatives, Délégation Paris 5, Hôpital Robert-Debré, Assistance Publique – Hôpitaux de Paris, Mathématiques et Informatique Appliquées du Génome à l'Environnement, University of Minnesota, Veterans Health Administration, Université Joseph Fourier, Université Grenoble Alpes
independent variables
Percentage of similarity in amino acid sequence (≥80%)
Difference in protein length (≤20%)
Syntenic or non-syntenic classification of genes
dependent variables
Identification of unique pairwise reciprocal best hits (orthologs)
Composition of the core genome (genes ubiquitously found among all strains)
Composition of the pan-genome (definitive list of orthologs)
control variables
Evolutionary distance between E. coli and Shigella genomes (short)
Frequency of horizontal gene transfer
controls
Positive control: Pairwise reciprocal best hits with ≥80% similarity and ≤20% difference in protein length
Negative control: Genes outside conserved blocks of synteny (likely xenologs or paralogs)