The largest database of trusted experimental protocols

Genome, Archaeal

Genomes of the domain Archaea, a group of single-celled microorganisms.
Archaeal genomes display a unique combination of eukaryotic and prokaryotic features, making them a fascinating subject of study.
Explore the diversity and evolution of these ancient life forms using PubCompare.ai's AI-driven optimization platform.
Quickly identify the most effective research protocols from literature, pre-prints, and patents to advance your archaeal genomics work.
Experience the power of PubCompare.ai today.

Most cited protocols related to «Genome, Archaeal»

The HiSeq and MiSeq metagenomes were built using 20 sets of bacterial whole-genome shotgun reads. These reads were found either as part of the GAGE-B project [21] or in the NCBI Sequence Read Archive. Each metagenome contains sequences from ten genomes (Additional file 1: Table S1). For both the 10,000 and 10 million read samples of each of these metagenomes, 10% of their sequences were selected from each of the ten component genome data sets (i.e., each genome had equal sequence abundance). All sequences were trimmed to remove low-quality bases and adapter sequences.
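The equal-abundance sampling described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name `build_equal_abundance_sample`, the in-memory read lists, and the fixed seed are all assumptions made for the example.

```python
import random


def build_equal_abundance_sample(reads_by_genome, total_reads, seed=0):
    """Draw the same number of reads from every genome's read set.

    reads_by_genome: dict mapping a genome name to a list of reads
        (hypothetical in-memory stand-in for per-genome FASTQ files).
    total_reads: desired size of the mixed sample.
    """
    n_genomes = len(reads_by_genome)
    if n_genomes == 0 or total_reads % n_genomes != 0:
        raise ValueError("total_reads must divide evenly among the genomes")
    per_genome = total_reads // n_genomes  # 10% each when there are ten genomes

    rng = random.Random(seed)
    sample = []
    for genome in sorted(reads_by_genome):
        reads = reads_by_genome[genome]
        if len(reads) < per_genome:
            raise ValueError(f"not enough reads for {genome}")
        # Sample without replacement so no read is used twice.
        sample.extend((genome, read) for read in rng.sample(reads, per_genome))
    rng.shuffle(sample)  # interleave reads from the different genomes
    return sample


if __name__ == "__main__":
    # Toy input: ten genomes with 2,000 placeholder reads each.
    toy = {f"genome_{i}": [f"read_{i}_{j}" for j in range(2000)] for i in range(10)}
    mixed = build_equal_abundance_sample(toy, total_reads=10_000)
    print(len(mixed))
```

Sampling without replacement within each genome keeps the per-genome share exact, which is what "equal sequence abundance" requires.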
The composition of these two metagenomes poses certain challenges to our classifiers. For example, Pelosinus fermentans, found in our HiSeq metagenome, cannot be correctly identified at the genus level by Kraken (or any of the other previously described classifiers), because there are no Pelosinus genomes in the RefSeq complete genomes database; however, there are seven such genomes in Kraken-GB’s database, including six strains of P. fermentans. Similarly, in our MiSeq metagenome, Proteus vulgaris is often classified incorrectly at the genus level because the only Proteus genome in Kraken’s database is a single Proteus mirabilis genome. Five more Proteus genomes are present in Kraken-GB’s database, allowing Kraken-GB to classify reads better from that genus. In addition, the MiSeq metagenome contains five genomes from the Enterobacteriaceae family (Citrobacter, Enterobacter, Klebsiella, Proteus and Salmonella). The high sequence similarity between the genera in this family can make distinguishing between genera difficult for any classifier.
The simBA-5 metagenome was created by simulating reads from the set of complete bacterial and archaeal genomes in RefSeq. Replicons from those genomes were used if they were associated with a taxon that had an entry associated with the genus rank, resulting in a set of replicons from 607 genera. We then used the Mason read simulator [22] with its Illumina model to produce 10 million 100-bp reads from these genomes. First, we created simulated genomes for each species, using a SNP rate of 0.1% and an indel rate of 0.1% (both default parameters), from which we generated the reads. For the simulated reads, we multiplied the default mismatch and indel rates by five, resulting in an average mismatch rate of 2% (ranging from 1% at the beginning of reads to 6% at the ends) and an indel rate of 1% (0.5% insertion probability and 0.5% deletion probability). For the simBA-5 metagenome, the 10,000 read set was generated from a random sample of the 10 million read set.
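The text does not say how the 10,000-read random sample was drawn from the 10 million reads. One common way to take a uniform sample from a large stream without loading it into memory is reservoir sampling (Algorithm R), sketched here. The function name and the placeholder read identifiers are assumptions for illustration.

```python
import random


def reservoir_sample(items, k, seed=0):
    """Return up to k items chosen uniformly at random from an iterable."""
    rng = random.Random(seed)
    reservoir = []
    for index, item in enumerate(items):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = item
    return reservoir


if __name__ == "__main__":
    # Stand-in for a stream of read identifiers (scaled down from 10 million).
    read_ids = (f"read_{i}" for i in range(100_000))
    subset = reservoir_sample(read_ids, k=10_000)
    print(len(subset))
```

Because the input is consumed as a stream, the same function works on a generator that yields reads from a FASTQ file.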
Publication 2014
Bacteria Citrobacter Deletion Mutation Enterobacter Enterobacteriaceae Genome Genome, Archaeal Genome, Bacterial Genome Components INDEL Mutation Klebsiella Metagenome Pelosinus fermentans Proteus Proteus mirabilis Proteus vulgaris Replicon Salmonella Strains
In order to evaluate merging performance, we used synthetically generated data from a eukaryotic genome to allow precise evaluation of merging accuracy as well as real-world shotgun metagenome data from a prokaryotic community. These two datasets include eukaryotic, bacterial and archaeal organisms with complete reference genomes spanning a large spectrum of %GC.
We synthetically generated 20 million reads based on the Chlamydomonas reinhardtii genome (v3.0), which was retrieved from the JGI Plant Genomics Resource Phytozome (ftp://ftp.jgi-psf.org/pub/JGI_data/Chlamy/v3.0/Chlre3.fasta.gz). Synthetic reads were generated using BBMap (https://sourceforge.net/projects/bbmap/) as follows: first, reference sequences were indexed (Table 1A). Second, synthetic reads were generated (Table 1B). Third, read headers were renamed according to their known insert size, to allow subsequent grading (Table 1C). Fourth, reads were decompressed and moved to ramdisk (Table 1D).
The real-world data are shotgun metagenomic sequences from MBARC-26, a mock microbial community consisting of 23 bacterial and 3 archaeal strains [3–10]. DNA extraction from MBARC-26, Illumina metagenome library creation, and shotgun sequencing were performed as described in [4], yielding 2x150 bp reads.
Reference genomes for MBARC-26 were retrieved from JGI’s IMG [11] and used for mapping as follows. First, reference genomes were indexed (Table 1E). Second, shotgun metagenome reads were mapped to the reference sequences using BBMap’s default settings (Table 1F) to a) determine insert sizes, and b) remove reads that mapped with indels or that did not map in a properly paired orientation. This filtering step ensured the correct determination of the insert size for each read pair for subsequent grading; insert sizes of unpaired reads cannot be determined, and reads mapped with indels yield a different insert size when calculated by mapping versus merging. Mapping was not necessary for the synthetic data as the true insert size was known a priori. The remaining shotgun metagenome reads were subsampled to 20 million read pairs (Table 1G).
Grading was performed using GradeMerge (Table 1H) to obtain the number of correctly and incorrectly merged reads. A merged read was considered correct if its length exactly matched the insert size indicated by its header. The reported percentage values and signal-to-noise ratio (SNR) are defined as:
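$$C\% = 100 \cdot \frac{C}{P}, \qquad I\% = 100 \cdot \frac{I}{P}, \qquad \mathrm{SNR} = 10 \log_{10} \frac{C}{I},$$
where $C$ is the number of correctly merged reads, $I$ is the number of incorrectly merged reads, and $P$ is the total number of read pairs.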
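The grading metrics can be computed as in the sketch below. It assumes that C and I are the counts of correctly and incorrectly merged reads and that P is the total number of read pairs. The function name and the handling of zero counts are illustrative choices, not part of GradeMerge.

```python
import math


def merge_metrics(correct, incorrect, total_pairs):
    """Return (C%, I%, SNR in dB) for one merging run."""
    if total_pairs <= 0:
        raise ValueError("total_pairs must be positive")
    c_pct = 100.0 * correct / total_pairs
    i_pct = 100.0 * incorrect / total_pairs
    # The ratio C/I is undefined or unbounded when a count is zero;
    # how to report those cases is an assumption made for this sketch.
    if correct == 0 and incorrect == 0:
        snr = float("nan")
    elif incorrect == 0:
        snr = float("inf")
    elif correct == 0:
        snr = float("-inf")
    else:
        snr = 10.0 * math.log10(correct / incorrect)
    return c_pct, i_pct, snr


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    c_pct, i_pct, snr = merge_metrics(correct=9000, incorrect=9, total_pairs=20000)
    print(f"C%={c_pct:.3f} I%={i_pct:.3f} SNR={snr:.1f} dB")
```

Expressing the ratio in decibels compresses large differences: a tool with 1,000 correct merges per incorrect merge scores 30 dB, and one with 10,000 per incorrect merge scores 40 dB.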
Assembly quality was evaluated using raw shotgun metagenomic reads from MBARC-26 subsampled to 20 million read pairs (Table 1I). To eliminate potential impact originating from pre-processing, reads were not filtered or trimmed. Reads were merged with each tool, then both the merged and unmerged output was passed to SPAdes v. 3.8.2 [12] for assembly in metagenome mode (Table 1J). Assembled contigs were compared to the metagenome reference using QUAST v. 4.2 [13] for evaluation (Table 1K). Global and local misassemblies as defined in [13] were combined and are reported as “total misassemblies”.
Publication 2017
Archaea Bacteria Chlamydomonas reinhardtii Chlamys DNA Library Eukaryota Genome Genome, Archaeal Genome, Plant INDEL Mutation Metagenome Microbial Community Prokaryotic Cells
The genome sequencing revolution has radically altered the field of microbiology. Whole-genome sequencing of prokaryotes has been a standard method of study ever since the first complete genome of a free-living organism, Haemophilus influenzae, was sequenced in 1995 (14). Due to the widespread use of next-generation sequencing (NGS) techniques, thousands of genomes of prokaryotic species are now available, including genomes of multiple isolates of the same species, typically human pathogens. Thus, the sheer density of comparative genomic information for high-interest organisms provides an opportunity to introduce a pan-genome based approach to prediction of the protein complement of a species.
The collection of prokaryotic genomes available at NCBI is growing exponentially and shows no signs of abating: as of January 2016, NCBI's assembly resource contains 57 890 genome assemblies representing 8047 species (see the genome browser at https://www.ncbi.nlm.nih.gov/genome/browse/ for up-to-date information). Notably, genomes of different strains of the same species can vary considerably in size, gene content and nucleotide composition. In 2005, Tettelin et al. (15) introduced the concept of the pan-genome, aiming to provide a compact description of the full complement of genes of all the strains of a species. Genes common to all pan-genome members (or to the vast majority of them) are called core genes; those present in just a few clade members are termed accessory or dispensable genes; genes specific to a particular genome (strain) are termed unique genes (16).
In PGAP we define the pan-genome of a clade at a species or higher level (17). To be included as a core gene for a species-level pan-genome, we require the gene to be present in the vast majority (at least 80%) of all genomes in the clade. A set of core genes gives rise to a set of core proteins. We show in Figure 1 how the number of protein clusters, for each of four well-studied large clades, depends on the fraction of the clade members that contribute proteins to the cluster. There are three critical regions in this analysis: (i) unique genes, present in less than 1% of all clade members; (ii) dispensable genes, present in 1–20% of genomes; and (iii) core genes, found in at least 80% of the represented genomes. Based on our analysis, there are very few clusters appearing in at least 20% of the members of a clade but no more than 80% of the members. The cutoff of 80% was chosen to capture a wide set of genes conserved within the whole clade while eliminating genes having less abundant representation. We further subject the core proteins to clustering using USearch to reduce the total number of proteins required to represent the full protein complement of the pan-genome (18). We use the representative core proteins to infer genes for homologous core proteins in a newly sequenced genome (19).
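The three regions described above translate into a simple threshold rule. The sketch below is illustrative only and is not PGAP code. The function name is hypothetical, the "intermediate" label for the sparsely populated band between 20% and 80% is not a term from the source, and placing exactly 20% in the dispensable class is an assumption because the text leaves that boundary ambiguous.

```python
def classify_cluster(n_genomes_with_cluster, n_genomes_in_clade):
    """Label a protein cluster by the fraction of clade genomes that contain it."""
    if n_genomes_in_clade <= 0:
        raise ValueError("clade must contain at least one genome")
    fraction = n_genomes_with_cluster / n_genomes_in_clade
    if fraction >= 0.80:
        return "core"          # found in at least 80% of genomes
    if fraction < 0.01:
        return "unique"        # present in less than 1% of genomes
    if fraction <= 0.20:
        return "dispensable"   # present in 1-20% of genomes
    return "intermediate"      # assumed label for the remaining band


if __name__ == "__main__":
    # Made-up cluster counts in a clade of 1,000 genomes.
    for count in (5, 150, 500, 800):
        print(count, classify_cluster(count, 1000))
```

For a clade of 1,000 genomes, this labels a cluster found in 5 genomes as unique, one found in 150 as dispensable, and one found in 800 as core.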
The notion of the pan-genome can be generalized beyond a species level and applies, in fact, to any taxonomy level (from genus to phylum to kingdom). Notably, in the pan-genomes of Archaea and Bacteria, the universally conserved ribosomal genes make a group of core genes. The main practical value of the pan-genome approach is in formulating an efficient framework for comparative analysis of large groups of closely related organisms separated by small evolutionary distances as defined by ribosomal protein markers (20, 21).
Publication 2016
Bacteria Biological Evolution Complement System Proteins Gene Products, Protein Genes Genes, vif Genome Genome, Archaeal Haemophilus influenzae Homo sapiens Nucleotides Pathogenicity Prokaryotic Cells Proteins Ribosomal Proteins Ribosomes SET protein, human Strains
We used three simulated metagenomic data sets consisting of 40, 132 and 596 genomes of the CAMI (Critical Assessment of Metagenome Interpretation) challenge [19]. We downloaded the gold standard assemblies and the assignment of assembled contigs to reference genomes from data.cami-challenge.org and used this information to calculate the accuracy of reconstructed bins.
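For each bin $B_b$ of the set of predicted bins $B$, we determined the highest fraction in terms of nucleotides that belong to a certain genome $G_g$ from the set of reference genomes $G$. Based on the sequence lengths of $B_b$ and $G_g$ we calculated the F1 score (equation (2)), which is the harmonic mean of precision (equation (3)) and recall (equation (4)).
$$\mathrm{F1\,Score}_b = \frac{2 P_b R_b}{P_b + R_b} \tag{2}$$
$$P_b = \frac{\mathrm{length}(B_b \cap G_g)}{\mathrm{length}(B_b)}, \quad \text{where } g = \operatorname*{arg\,max}_{i \in G} \frac{\mathrm{length}(G_i \cap B_b)}{\mathrm{length}(B_b)} \tag{3}$$
$$R_b = \frac{\mathrm{length}(B_b \cap G_g)}{\mathrm{length}(G_g)}, \quad \text{where } g = \operatorname*{arg\,max}_{i \in G} \frac{\mathrm{length}(G_i \cap B_b)}{\mathrm{length}(B_b)} \tag{4}$$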
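The per-bin precision, recall, and F1 described above can be computed as in this sketch. It assumes recall is measured against the full length of the best-matching reference genome. The function name and the dictionary inputs are hypothetical, and the numbers in the example are made up.

```python
def bin_f1(bin_length, overlap_by_genome, genome_lengths):
    """Return (best_genome, precision, recall, f1) for one predicted bin.

    bin_length: total nucleotides in the bin.
    overlap_by_genome: nucleotides of the bin that belong to each reference genome.
    genome_lengths: total length of each reference genome.
    """
    # The best-matching genome contributes the largest share of the bin.
    best = max(overlap_by_genome, key=overlap_by_genome.get)
    shared = overlap_by_genome[best]
    precision = shared / bin_length
    recall = shared / genome_lengths[best]
    if precision + recall == 0:
        return best, precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return best, precision, recall, f1


if __name__ == "__main__":
    best, p, r, f1 = bin_f1(
        bin_length=1_000_000,
        overlap_by_genome={"genome_A": 900_000, "genome_B": 100_000},
        genome_lengths={"genome_A": 1_200_000, "genome_B": 3_000_000},
    )
    print(best, round(p, 3), round(r, 3), round(f1, 3))
```

In the example, 90% of the bin comes from genome_A (precision 0.9) and the bin covers 75% of that genome (recall 0.75), so the harmonic mean falls between the two values.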
Because DAS Tool only selects bacterial and archaeal genomes, all bins that map to circular elements were removed from the evaluation. To determine how well the binning tools resolve strain variation we not only calculated F1 scores on the entire set of reference genomes but also on subsets of genomes with and without common strains in the data set. The classification of reference genomes belonging to the set of unique strains (<95% average nucleotide identity (ANI) to other genomes) or common strains (≥95% ANI) was obtained from data.cami-challenge.org.
For real metagenomics data sets where the ground truth in terms of genome composition is unknown, we estimated genome completeness based on marker genes using the lineage workflow of CheckM [15] and the Bacteria odb9 data set of BUSCO [16]. Completeness and contamination of BUSCO results were calculated based on the percentage of present and duplicate marker genes per bin.
Publication 2018
Bacteria Genetic Markers Genome Genome, Archaeal Gold Mental Recall Metagenome Nucleotides Strains
The protein sets for all newly included bacterial and archaeal genomes, the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe, the microsporidian Encephalitozoon cuniculi, the thale cress Arabidopsis thaliana, and the fruit fly Drosophila melanogaster were extracted from the Genome division of the NCBI (NIH, Bethesda). The protein sequences for the nematode Caenorhabditis elegans were from the WormPep67 database; the sequences for Homo sapiens were from NCBI build 30.
Publication 2003
Amino Acid Sequence Arabidopsis thalianas Bacteria Caenorhabditis elegans Cuniculus Drosophila Drosophila melanogaster Genome Genome, Archaeal Homo sapiens Microspora Nematoda Proteins Saccharomyces cerevisiae Schizosaccharomyces pombe Yeasts

Most recent protocols related to «Genome, Archaeal»

We analysed 21 114 complete bacterial and archaeal genomes from NCBI RefSeq (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/, last accessed in March 2021) [46], representing 5840 species of Bacteria and 288 species of Archaea. HMM profiles were extracted from the Panther database (version 15) [47]. TnSeq data were accessed via the Fitness browser (https://fit.genomics.lbl.gov/cgi-bin/myFrontPage.cgi) [48].
Publication 2023
Archaea Bacteria Genome Genome, Archaeal
Developing a fully inclusive database is essential for training and testing any taxonomic classification method. Only including the highest quality genomes can give uncharacteristic advantages during benchmarks that may not be reflected in real-world applications. While Kraken2 maintains a robust standard database and a prokaryotic database, many of the genomes in the mock shotgun dataset [28] and identified in the real metagenomes were not present in either.
POSMM’s speed depends on the number of models (i.e. genomes) being queried. To keep analysis within a reasonable time window and give all species with sequenced genomes equal representation without redundancy, we developed a priority system for collecting representative genomes for all species currently available in NCBI GenBank. First, the archaeal and bacterial assembly summaries were downloaded from the RefSeq release FTP site (https://ftp.ncbi.nlm.nih.gov/refseq/release/). Taxid numbers were used to isolate unique species and then a representative genome for each species was obtained. Using the NCBI RefSeq terminology included with the assembly summary, we selected ‘Reference’ genomes where available, otherwise ‘Representative’ genomes. When neither a Reference nor a Representative genome was available, the decision was based on the assembly level in the order of ‘complete’, ‘chromosome’, ‘scaffold’, and finally ‘contig’. Species with only a partial representation of their genomes were not included in our custom database. In the event of a tie, one genome was randomly chosen using a random number generator from the Python standard library.
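The priority system described above can be sketched as follows. This is not POSMM's implementation. The record fields (`species_taxid`, `refseq_category`, `assembly_level`, `genome_rep`, `accession`) are modeled on columns of the NCBI assembly summary, and their exact names and values here are assumptions, as are the function names and the fixed seed.

```python
import random

# Assumed assembly-level labels, ordered from most to least preferred.
LEVEL_RANK = {"Complete Genome": 0, "Chromosome": 1, "Scaffold": 2, "Contig": 3}


def priority(record):
    """Lower tuples are preferred: Reference, then Representative, then by level."""
    if record["refseq_category"] == "reference genome":
        return (0, 0)
    if record["refseq_category"] == "representative genome":
        return (1, 0)
    return (2, LEVEL_RANK.get(record["assembly_level"], len(LEVEL_RANK)))


def pick_representative(assemblies, rng):
    """Choose one assembly for a species, or None if only partial genomes exist."""
    candidates = [a for a in assemblies if a["genome_rep"] == "Full"]
    if not candidates:
        return None
    best = min(priority(a) for a in candidates)
    tied = [a for a in candidates if priority(a) == best]
    return rng.choice(tied)  # ties are broken at random


def select_per_species(records, seed=0):
    """Return a dict mapping each species taxid to one chosen accession."""
    rng = random.Random(seed)
    by_species = {}
    for record in records:
        by_species.setdefault(record["species_taxid"], []).append(record)
    chosen = {}
    for taxid in sorted(by_species):
        pick = pick_representative(by_species[taxid], rng)
        if pick is not None:
            chosen[taxid] = pick["accession"]
    return chosen


if __name__ == "__main__":
    # Made-up records for two species.
    records = [
        {"species_taxid": 1, "refseq_category": "na", "assembly_level": "Contig",
         "genome_rep": "Full", "accession": "ASM_A"},
        {"species_taxid": 1, "refseq_category": "representative genome",
         "assembly_level": "Scaffold", "genome_rep": "Full", "accession": "ASM_B"},
        {"species_taxid": 2, "refseq_category": "na", "assembly_level": "Scaffold",
         "genome_rep": "Partial", "accession": "ASM_C"},
    ]
    print(select_per_species(records))
```

Encoding the rules as a sortable tuple keeps the tie-breaking step separate from the ranking, so the random choice is only made among assemblies that are equal under every stated criterion.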
The custom database is comprised of genomes of 29,870 unique species (Additional file 1: Table 1). These genomes represent various quality levels; partial genome assemblies were not included. Because of variable quality of the genome assemblies, each genome was subjected to filtering for potentially extraneous sequences. The same type of genome set can be downloaded using the POSMM “–runmode setup” and “–gtype bacteria/archaea” parameters.
Furthermore, two additional recently developed programs, KrakenUniq [29] and Kaiju [11], were assessed against Kraken2, POSMM, and the hybrid of Kraken2 and POSMM on the simulated metagenomes. For KrakenUniq, a custom database of bacterial and archaeal genomes was built. For Kaiju, the NCBI RefSeq database was used.
Publication 2023
Archaea Bacteria Chromosomes DNA Library Genome Genome, Archaeal Hybrids Inclusion Bodies Metagenome Prokaryotic Cells Python
We combined prokaryotic genome sequences from two resources to create a panel of reference genomes across which we determined the phylogenetic and taxonomic distribution of ARGs. The first data set consisted of 154,723 metagenome-assembled genomes reconstructed from the same human microbiome samples for which we obtained metagenome assemblies [26]. Of these reconstructed genomes, 70,178 were labeled as ‘high-quality’ in the original study, based on >90% completeness and <0.5% strain heterogeneity. The second data set consisted of 152,497 bacterial and archaeal genomes from NCBI RefSeq accessed on 19 April 2019. These genomes included representatives from the principal phyla found in the human gut microbiome, although they were dominated by Proteobacteria (Proteobacteria: 83,445; Firmicutes: 44,484; Actinobacteria: 16,529; Bacteroidetes: 3563; Others: 3634).
Genome sequences from the two sources were clustered into species-level bins (SGBs) based on 5% average nucleotide identity (ANI) radius according to the method described in Pasolli et al. (2019). The list of reconstructed genomes used in this study and their mapping to SGBs and full-rank taxonomy are provided in Supplementary Data 3. The list of RefSeq genome accession numbers used in this study and their mapping to SGBs and full-rank taxonomy are provided in Supplementary Data 4.
Publication 2023
Actinomycetes Bacteria Bacteroidetes Firmicutes Gastrointestinal Microbiome Genetic Heterogeneity Genome Genome, Archaeal Homo sapiens Human Microbiome Metagenome Nucleotides Prokaryotic Cells Proteobacteria Radius Simpson-Golabi-Behmel Syndrome, Type 1 Strains
The 30,238 bacterial and 1,672 archaeal genomes from the Genome Taxonomy Database (GTDB), release 05-RS95 (17 July 2020), were downloaded with the taxonomy and predicted protein sequences (33, 34).
Publication 2023
Amino Acid Sequence Bacteria Genome, Archaeal
Protein sequences were functionally annotated based on the accession number of their best match from Hmmsearch, version 3.3 (E value cutoff of 0.001), against the KOfam database (downloaded on 18 February 2020) (53, 54). Domains were predicted using the same Hmmsearch procedure against the Pfam database, version 33.0 (55). SIGNALP, version 5.0, was run to predict the putative cellular localization of the proteins using the parameters -org arch for archaeal genomes and -org gram+ for bacterial genomes (56). Prediction of transmembrane helices in proteins was performed using TMHMM, version 2.0 (default parameters) (57).
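The best-match selection described above can be sketched as a small parser over tabular search output. The sketch assumes whitespace-separated lines laid out like HMMER 3's per-target table (protein name in the first column, profile name in the third, full-sequence E value in the fifth, bit score in the sixth). Check that layout against your HMMER version before relying on it. Choosing the lowest E value as "best", with bit score as a tie-breaker, is also an assumption. The function name and sample lines are made up.

```python
def best_hits(table_lines, evalue_cutoff=1e-3):
    """Map each protein to the profile of its best hit at or below the cutoff."""
    best = {}
    for line in table_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        protein, profile = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        if evalue > evalue_cutoff:
            continue
        current = best.get(protein)
        # Prefer the lower E value; break ties by the higher bit score.
        if current is None or (evalue, -score) < (current[1], -current[2]):
            best[protein] = (profile, evalue, score)
    return {protein: hit[0] for protein, hit in best.items()}


if __name__ == "__main__":
    # Made-up lines in the assumed column layout.
    sample = [
        "# target accession query accession E-value score bias",
        "prot_1 - K00001 - 1e-50 170.2 0.1",
        "prot_1 - K00002 - 1e-10 40.5 0.0",
        "prot_2 - K00003 - 0.5 5.1 0.0",
    ]
    print(best_hits(sample))
```

In the sample, prot_1 keeps its stronger hit and prot_2 receives no annotation because its only hit exceeds the cutoff.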
Publication 2023
Amino Acid Sequence Cells Genome, Archaeal Genome, Bacterial Helix (Snails) Proteins

Top products related to «Genome, Archaeal»

Sourced in United States
The JetQuick DNA purification kit is a laboratory equipment product designed for the rapid and efficient extraction and purification of DNA from various biological samples. The kit utilizes a proprietary column-based method to isolate and concentrate DNA, while removing contaminants and inhibitors. The core function of this product is to provide a reliable and streamlined process for obtaining high-quality DNA samples suitable for downstream applications, such as PCR, sequencing, and molecular analysis.
Goldstar Red Taq polymerase is a thermostable DNA polymerase used for PCR amplification. It has 5'-3' DNA polymerase and 5'-3' exonuclease activities.
The M35-A X-OMAT processor is a piece of lab equipment designed for automated film processing. It is used to develop and fix photographic film. The machine handles the various chemical baths and washing required to produce processed film.
Sourced in United States, Gabon, China
The E.Z.N.A.® Stool DNA Kit is a laboratory product designed for the extraction and purification of DNA from stool samples. It utilizes a silica-based membrane technology to efficiently recover DNA from a variety of stool sources.
Sourced in United States, China, Germany, United Kingdom, Hong Kong, Canada, Switzerland, Australia, France, Japan, Italy, Sweden, Denmark, Cameroon, Spain, India, Netherlands, Belgium, Norway, Singapore, Brazil
The HiSeq 2000 is a high-throughput DNA sequencing system designed by Illumina. It utilizes sequencing-by-synthesis technology to generate large volumes of sequence data. The HiSeq 2000 is capable of producing up to 600 gigabases of sequence data per run.
Sourced in United States, Germany
The 454 GS-FLX sequencer is a next-generation DNA sequencing system developed by Roche. It utilizes pyrosequencing technology to enable rapid and high-throughput sequencing of DNA samples. The core function of the 454 GS-FLX sequencer is to perform DNA sequencing, providing researchers with genomic data for various applications.
Sourced in United States
Rapid-hyb buffer is a solution used in molecular biology applications for rapid hybridization of nucleic acid probes to target sequences. It is designed to facilitate efficient and fast hybridization of labeled DNA or RNA probes to complementary target sequences.
Sourced in United States, China
The Genome Analyzer is a high-throughput DNA sequencing instrument developed by Illumina. It is designed to analyze DNA sequences rapidly and accurately, providing researchers with valuable genetic information.
Sourced in United States, Germany, Canada, Lithuania, United Kingdom, China, Japan, Argentina, Brazil
RNase-free DNase I is an enzyme used to remove DNA contamination from RNA samples. It functions by selectively degrading DNA without affecting the integrity of RNA.
Sourced in United Kingdom, United States, Sweden, Japan, Germany, Canada
Hybond-N+ membrane is a nylon-based membrane used for nucleic acid transfer and immobilization in molecular biology applications. It provides a stable surface for the binding and detection of DNA, RNA, and other nucleic acid samples. The membrane is designed to offer high binding capacity and efficient capillary transfer of nucleic acids.

More about "Genome, Archaeal"

Archaeal genomes are a fascinating subject of study, displaying a unique combination of eukaryotic and prokaryotic features.
These single-celled microorganisms, belonging to the domain Archaea, offer insights into the evolution of life on our planet.
Explore the diversity of these ancient life forms using the powerful AI-driven optimization platform of PubCompare.ai.
Leverage the latest tools and technologies, such as the JetQuick DNA purification kit, Goldstar Red Taq polymerase, M35-A X-OMAT processor, E.Z.N.A.® Stool DNA Kit, HiSeq 2000, 454 GS-FLX sequencer, Rapid-hyb buffer, Genome Analyzer, and RNase-free DNase I, to enhance your research accuracy and efficiency.
Uncover the diversity and evolution of these ancient life forms, and experience the power of PubCompare.ai's AI-driven optimization platform to streamline your archaeal genomics research.
Explore the latest advancements, identify the most effective methods, and propel your work forward with confidence.
Experience the power of PubCompare.ai today and take your archaeal genomics research to new heights.