The largest database of trusted experimental protocols

HGAP

HGAP (Hierarchical Genome Assembly Process) is a computational approach that assembles and polishes genomic sequences from large amounts of long-read sequencing data, such as PacBio SMRT reads.
This process enables researchers to efficiently reconstruct complete or near-complete genomes, even from complex or fragmented samples.
HGAP is particularly useful for studying the genetic makeup of diverse organisms, including microbes, plants, and animals.
By optimizing protocols and identifying the best tools and products, HGAP can streamline workflows, enhance reproducibility, and drive breakthroughs in a wide range of biological and biomedical research fields.

Most cited protocols related to «HGAP»

Genomic DNA for E. coli EC958 was prepared using the Qiagen DNeasy Blood and Tissue kit, as per manufacturer's instructions. The genome of E. coli EC958 was sequenced by generating a total of 601,224 pre-filtered reads with an average length of 1,600 bp, from six SMRT cells on a PacBio RS I sequencing instrument, using an 8–12 kilobase (kb) insert library, generating approximately 200-fold coverage (GATC Biotech AG, Germany).
De novo genome assemblies were produced using PacBio's SMRT Portal (v2.0.0) and the hierarchical genome assembly process (HGAP) [23] (link), with default settings and a seed read cut-off length of 5,000 bp to ensure accurate assembly across E. coli rRNA operons. Assemblies were performed multiple times using different combinations of between one and six SMRT cells of read data. The best assembly results were obtained with six SMRT cells which yielded approximately 547 Mb of sequence from 190,145 post-filtered reads (Table 1). The average read length was found to be 2,875 bp with an average single pass accuracy of 86.5%. During the preassembly stage 190,145 long reads were converted into 23,772 high quality, preassembled reads with an average length of 4,573 bp. Assembly of these reads returned seven contigs, three were greater than 500 kb. Furthermore, the largest contig (∼3.8 Mb) was estimated to contain 74.5% of the chromosome of EC958. For all other assemblies total contig numbers exceeded 10 (Table 1). However, for assemblies using two or three SMRT cells, assembly metrics could be improved >2-fold by reducing the seed read length (Table 1).
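The read statistics reported above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original protocol: the function names are hypothetical, the chromosome size is only an estimate derived from the statement that the ∼3.8 Mb contig covers 74.5% of the chromosome, and the seed filter is a simplification of how HGAP chooses seed reads by a length cut-off.

```python
def mean_read_length(total_bases: float, n_reads: int) -> float:
    """Average read length in bp."""
    return total_bases / n_reads


def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Sequenced bases divided by genome size."""
    return total_bases / genome_size


def select_seed_reads(read_lengths, cutoff):
    """Keep reads at least as long as the seed cut-off (simplified assumption)."""
    return [n for n in read_lengths if n >= cutoff]


# Values quoted in the text for the six SMRT cell assembly.
total_bases = 547_000_000      # ~547 Mb of post-filtered sequence
n_reads = 190_145              # post-filtered reads

# Assumption: chromosome size estimated from "3.8 Mb is 74.5% of the chromosome".
estimated_genome_size = 3_800_000 / 0.745

avg_len = mean_read_length(total_bases, n_reads)
coverage = fold_coverage(total_bases, estimated_genome_size)
seeds = select_seed_reads([1200, 5000, 7300, 4999], cutoff=5000)

print(f"mean read length: {avg_len:.0f} bp")
print(f"post-filter coverage: {coverage:.0f}x")
print(f"seed reads kept: {seeds}")
```

The computed mean read length should land close to the reported 2,875 bp; the small difference comes from the rounded 547 Mb total.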
To determine their correct order and orientation, contigs from our six SMRT cell assembly were aligned to the complete genome of E. coli SE15 using Mauve v. 2.3.1 [24] (link). Contig ordering was confirmed by PCR. Overlapping but un-joined contigs, a characterised artefact of the HGAP assembly process [23] (link), were manually trimmed based on sequence similarity and joined. All joins were manually inspected using ACT [25] (link) and Contiguity (http://mjsull.github.io/Contiguity/).
A single contig representing the EC958 large plasmid pEC958 was identified and isolated by BLASTn comparison against the previous draft assembly of EC958 (NZ_CAFL00000000.1) [7] (link). Overlapping sequences on the 5′ and 3′ ends of the plasmid contig were then manually trimmed based on sequence similarity. Although the EC958 small plasmid (pEC958B) was too small to be assembled as part of the main assembly, 25 unassembled PacBio reads, with an average length of 2,031 bp, were found to align to the small 4,080 bp plasmid contig that had previously been assembled from 454 GS-FLX reads (emb|CAFL01000138).
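Trimming overlapping contig ends, as described for the plasmid contig, can be illustrated with a toy example. This sketch is an assumption-laden simplification: it looks only for an exact match between the start and the end of a sequence, whereas the authors trimmed manually based on sequence similarity, and the function names and the `min_len` parameter are hypothetical.

```python
def end_overlap(seq: str, min_len: int = 4) -> int:
    """Length of the longest prefix of seq that is repeated exactly at its end.

    Only overlaps of at least min_len and at most half the sequence are considered.
    Returns 0 when no such overlap exists.
    """
    for k in range(len(seq) // 2, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def trim_circular(seq: str, min_len: int = 4) -> str:
    """Remove the duplicated end so the circular sequence is represented once."""
    k = end_overlap(seq, min_len)
    return seq[:-k] if k else seq


# Toy contig: the first 8 bases are repeated at the 3' end.
contig = "ACGTTGCA" + "GGATCCTA" + "ACGTTGCA"
overlap = end_overlap(contig)
trimmed = trim_circular(contig)
print(overlap, trimmed)
```

Real long-read contigs carry residual errors, so an alignment-based comparison of the two ends would be needed in practice.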
To determine if reads containing unremoved adapter sequence have had an impact on the assembly of EC958 we first screened the filtered subreads for adapter sequence using BBMap version 31.40 (http://sourceforge.net/projects/bbmap/). A high level of adapter contamination would likely pose some risk of misassembly. Additionally, to eliminate the possibility that aberrant reads have resulted in the inclusion of assembly artefacts in the EC958 genome assembly, contig-ends were screened for hairpin artefacts using MUMmer version 3.23 [26] (link).
Publication 2014
A total of 390 sera were collected from two veterinary clinic laboratories located in the Apulia region, Southern Italy, and tested at the request of veterinary practitioners after anamnesis, medical history and clinical examination. One hundred seventy-four sera (collection A) had been collected for diagnosis of infectious diseases (FIV, FeLV, feline coronavirus (FCoV), toxoplasmosis, hemoplasmosis, bacterial and fungal infections). A total of 147/174 (84.5%) sera were submitted with a suspected diagnosis of, or a request for testing for, FIV/FeLV, with 42 sera being positive for retrovirus. Fisher’s exact test was applied to collection A to evaluate the association between cats positive for domestic cat hepadnavirus (DCH) and retrovirus-positive cats. The significance level of the test was set at 0.05. Of the other 27 sera (15.5%), 14 were sent with a suspected diagnosis of, or a request for testing for, coronavirus (with 7/14 being positive), 5 for toxoplasmosis (with 2/5 being positive), and 2 for giardia (with 1/2 being positive). Five sera were collected from animals with suspected bacterial/fungal infections and 1 serum (negative) was from a cat with suspected hemoplasmosis. A total of 216 sera (collection B) submitted to the laboratory for pre-surgical evaluation (n = 85) or for suspected metabolic (n = 127) or neoplastic (n = 4) disease were used for comparison to generate a baseline. Information on the sera analyzed in the study is included in Fig. 1. The study was approved by the Ethics Committee of the Department of Veterinary Medicine, University of Bari (authorization 23/2018). All experiments were performed in accordance with relevant guidelines and regulations.
Total DNA was extracted from collected sera using the QIAamp cador Pathogen Mini Kit (QIAGEN, Hilden, Germany), according to the manufacturer’s instructions. We performed sample screening using a PCR with consensus pan-hepadnavirus primers [2] (link) and a PCR with primers specific for DCH [3] (link). We also screened sera using a quantitative PCR (qPCR) designed based on the sequence of the Australian reference strain AUS/2016/Sydney (GenBank accession nr. MH307930) (Table 2). For qPCR, we calculated DCH DNA copy numbers on the basis of standard curves generated by 10-fold dilutions of a plasmid standard TOPO XL PCR containing a 1.4 kb long fragment of the polymerase region of the Australian reference strain AUS/2016/Sydney (IQ Supermix; Bio-Rad Laboratories SRL, Segrate, Italy). We added 10 μL of sample DNA or plasmid standard to the 15-μL reaction master mix (IQ Supermix; Bio-Rad Laboratories SRL, Segrate, Italy) containing 0.6 μmol/L of each primer and 0.1 μmol/L of probe. Thermal cycling consisted of activation of iTaq DNA polymerase at 95 °C for 3 min and 42 cycles of denaturation at 95 °C for 10 s and annealing-extension at 60 °C for 30 s. We evaluated the specificity of the assay using a panel of feline DNA viruses (parvovirus, herpesvirus and poxvirus). The qPCR assay was able to detect as few as 10^1 DNA copies per mL of standard DNA and 3.3 × 10^0 DNA copies per mL of DNA template extracted from clinical samples. DCH quantification displayed acceptable levels of repeatability over a range of target DNA concentrations, when calculating the intra- and inter-assay coefficients of variation within and between runs, respectively [15] (link), [16] (link).

Primer/probes used in this study.

Assay | Primer | Sequence 5′ – 3′ | Amplicon size (bp) | Tm (°C) | Reference
PCR with consensus pan-hepadnavirus primers | HBV-pol-F1 | TAGACTSGTGGTGGACTTCTC | 593 | 45 | Wang et al. [2] (link)
 | HBV-pol-R1 | CATATAASTRAAAGCCAYACAG | | |
 | HBV-pol-F2 | TGCCATCTTCTTGTTGGTTC | 258 | 45 |
 | HBV-pol-R2 | AGTRAAYTGAGCCAGGAGAAAC | | |
PCR with specific primers for DCH | Hgap-F | GTGCTCTGATAACCGTATGCTC | 230 | 55 | Aghazadeh et al. [3] (link)
 | Hgap-R | CTAGAATGGCTACATGGGGTTAG | | |
Quantitative PCR (qPCR) | FHBV-for | CGTCATCATGGGTTTAGGAA | 105 | 50 | This study
 | FHBV-rev | TCCATATAAGCAAACACCATACAAT | | |
 | FHBV-prob | [FAM]TCCTCCTAACCATTGAAGCCAGACTACT[BHQ] | | |
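
The copy-number calculation from a standard curve of 10-fold plasmid dilutions, as described for the qPCR assay, can be sketched as follows. This is a minimal illustration under stated assumptions: the Ct values are invented for the example (they are not data from the study), and the function names are hypothetical. The fit is an ordinary least-squares line of Ct against log10 of the copy number.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copy number for a sample Ct."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 corresponds to perfect doubling)."""
    return 10 ** (-1 / slope) - 1


# Hypothetical 10-fold dilution series and made-up Ct values.
standard_copies = [1e7, 1e6, 1e5, 1e4, 1e3]
standard_ct = [16.9, 20.2, 23.5, 26.8, 30.1]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
sample_copies = copies_from_ct(23.5, slope, intercept)
efficiency = amplification_efficiency(slope)
print(slope, intercept, sample_copies, efficiency)
```

A slope near -3.3 corresponds to an efficiency close to 100%, which is the usual sanity check before trusting the inferred copy numbers.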
We carried out inferential statistical analyses using the Chi-squared test with Yates’ correction and calculated the odds ratio (OR) with its 95% confidence interval (95% CI) using the online MedCalc statistical software (https://www.medcalc.org/calc/odds_ratio.php). The significance level of the test was set at 0.05.
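For readers who want to see what these statistics involve, here is a minimal sketch of an odds ratio with a 95% confidence interval and a Yates-corrected Chi-squared test on a 2 × 2 table. The counts are hypothetical, not the study's data, and the confidence interval uses the standard log-odds approximation, which is assumed here rather than taken from the source.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR and approximate 95% CI for a 2x2 table.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def chi2_yates(a, b, c, d):
    """Chi-squared statistic with Yates' continuity correction (1 degree of freedom)."""
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail p-value of a Chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts for illustration only.
a, b, c, d = 10, 20, 5, 40
odds_ratio, ci_low, ci_high = odds_ratio_ci(a, b, c, d)
chi2 = chi2_yates(a, b, c, d)
p = p_value_1df(chi2)
print(odds_ratio, ci_low, ci_high, chi2, p)
```

With a significance level of 0.05, an association would be called significant when `p` falls below 0.05, which for an odds ratio corresponds to a confidence interval that excludes 1.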
Full genome sequences of hepadnaviruses were retrieved from the GenBank database and aligned using Geneious version 9.1.8 (Biomatters LTD, Auckland, New Zealand) and the MAFFT algorithm [17] (link). A set of genome sequences used in a previous study [2] (link) was integrated with additional genome sequences of recently identified hepadnaviruses from mammalian, avian and amphibian species and from fish. The final dataset included 53 hepadnavirus genomes. jModelTest (http://evomics.org/resources/software/molecular-evolution-software/modeltest/) was used to identify the best-fit model of evolution for the entire dataset. Bayesian analysis [18] (link), [19] (link) was then applied using four MCMC chains, well sampled and converging over one million generations (with the first 2000 trees discarded as “burn-in”), and supplying statistical support with subsampling over 1000 replicates. The identified program settings for all partitions, under the Akaike information criterion, included six character states (general time-reversible model), a proportion of invariable sites and a gamma distribution of rate variation across sites (GTR + I + G). We also performed phylogenetic analyses using other inference methods (maximum likelihood, neighbor joining) to compare the topology of the phylogenetic trees. We observed similar topologies with slight differences in bootstrap values at the nodes of the tree. Accordingly, we retained the Bayesian tree. We deposited the nucleotide genome sequence of strain ITA/2018/165-83 (MK117078) in GenBank.
Publication 2019
De novo assembly of BAC inserts was performed using the standard SMRT Analysis (v. 2.0.1) pipeline. Reads were masked for vector sequence (pBACGK1.1) and assembled with HGAP followed by consensus sequence calling with Quiver (Supplemental Fig. S10; Chin et al. 2013 (link)). HGAP creates a scaffold assembly using the longest reads (e.g., >7 kbp) as seeds to recruit additional subreads as a scaffold, while Quiver is a multi-read consensus algorithm that takes advantage of the full information from the raw pulse and base call information generated during SMRT sequencing. Final assembly was performed using a minimum read length of 500 bp and minimum read quality of 0.80 on a PC cluster (eight cores/10 GB of RAM) running RedHat 6 SE. We screened unsplit PacBio reads in FASTA format with cross_match using the recommended settings for contamination screening (-minmatch 10 -minscore 20 -screen). PacBio assemblies were reviewed for misassembly by visualizing read depth of PacBio reads in Parasight (http://eichlerlab.gs.washington.edu/jeff/parasight/index.html) using coverage summaries generated during the resequencing protocol. Sanger assemblies were obtained from NCBI by accession ID (Supplemental Table S8). De novo assembly of short-read data was performed with iCAS (ftp://ftp.sanger.ac.uk/pub/badger/aw7/icas_README).
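The read filter quoted above (minimum read length of 500 bp, minimum read quality of 0.80) amounts to a simple predicate over per-read metadata. The sketch below is illustrative only: the `Read` record and `passes_filter` function are hypothetical and do not reflect how SMRT Analysis stores or filters reads internally.

```python
from typing import NamedTuple


class Read(NamedTuple):
    name: str
    length: int      # read length in bp
    quality: float   # predicted read quality between 0 and 1


def passes_filter(read: Read, min_length: int = 500, min_quality: float = 0.80) -> bool:
    """True when the read meets both thresholds quoted in the text."""
    return read.length >= min_length and read.quality >= min_quality


# Made-up reads for illustration.
reads = [
    Read("r1", 7200, 0.86),
    Read("r2", 450, 0.90),    # too short
    Read("r3", 3100, 0.74),   # quality too low
    Read("r4", 500, 0.80),    # exactly at both thresholds, kept
]
kept = [r.name for r in reads if passes_filter(r)]
print(kept)
```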
Publication 2014
Donor and recipient blood culture isolates were obtained and genotyped by a combination of standard methods including spa (S. aureus protein A) and SCCmec (staphylococcal cassette chromosome mec) typing (7 (link),8 (link)). spa polymerase chain reaction (PCR) products were compared and clonal complexes assigned using http://spaserver2.ridom.de (7 (link),8 (link)). Screening for presence of Panton-Valentine leukocidin (PVL) toxin was also performed (9 (link)). DNA extraction was performed after bacterial lysis using lysozyme/lysostaphin treatment or mechanical disruption, followed by column purification and ethanol precipitation. Completed DNA preparations were sequenced on the PacBio RSII platform and sequence was assembled using a custom pipeline based on HGAP version 1.4 (Pacific Biosciences, Menlo Park, CA) (10 (link)). Full details are provided in the Supporting Information.
Publication 2014
The PacBio sequences were assembled using hierarchical genome-assembly process (HGAP) (Chin et al. 2013 (link)). Protein coding gene models were predicted using Augustus (Stanke and Morgenstern 2005 (link)) and the Yeast Genome Annotation Pipeline (Byrne and Wolfe 2005 (link)). In addition, protein sequences from other Saccharomyces species were aligned to the genome assembly using tblastn (Gertz et al. 2006 (link)). These predictions and alignments were used to produce a final set of annotated genes with the Apollo annotation tool (Lewis et al. 2002 ). The protein sequences were functionally annotated using InterproScan (Jones et al. 2014 (link)). Orthologous relationships with S. cerevisiae S288C sequences were calculated using InParanoid (Berglund et al. 2008 (link)). Non-coding RNAs were annotated by searching the RFAM database (Nawrocki et al. 2015 (link)) using Infernal (Nawrocki and Eddy 2013 (link)). Further tRNA predictions were produced using tRNAscan (Lowe and Eddy 1997 (link)). Repeat sequences were identified in Repbase (Bao et al. 2015 (link)) using Repeat Masker (Smit et al. 2013–2015). The dotplots were constructed by aligning S. jurei genome to the S. cerevisiae S288C genome using NUCmer and plotted using MUMmerplot (Kurtz et al. 2004 (link)). These features are available to browse via a UCSC genome browser (Kent et al. 2002 (link)) track hub (Raney et al. 2014 (link)). Single nucleotide polymorphisms (SNPs) were identified using Atlas-SNP2(Challis et al. 2012 (link)).
Publication 2018

Most recent protocols related to «HGAP»

The original sequencing data from PacBio were assembled using the HGAP and Canu algorithms [27], resulting in a complete genome with continuous chromosomes and plasmids. Bacterial genes were predicted with Glimmer 3.02 software ahead of functional annotation [28]. The coding DNA sequences (CDSs) were subsequently searched with BLAST against several databases, including the NCBI nonredundant (NR), Gene Ontology (GO), Clusters of Orthologous Groups (COG), and Carbohydrate-Active enzymes (CAZy) databases, to obtain the corresponding functional annotations.
Publication 2024
Before initiating the genome sequencing process, the quantity and purity of the genomic DNA were determined using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). This step was performed to ensure the quality of the genomic DNA for obtaining a complete genome sequence. Genome sequencing was conducted using the PacBio RS II single-molecule real-time (SMRT) sequencing technology from Pacific Biosciences (Menlo Park, CA, USA). SMRTbell library inserts of 20 kb were prepared and sequenced using SMRT cells. Raw sequencing data were generated and subjected to de novo assembly utilizing the hierarchical genome assembly process (HGAP) protocol [19 (link)] and RS HGAP4 Assembly in SMRT analysis software (ver. 2.3; Pacific Biosciences, SMRT Link 4.0.0).
Publication 2024
The S. collinus Inha504 genome was sequenced at Macrogen (South Korea) using the Illumina HiSeq (Illumina, USA) and PacBio RSII (Pacific Biosciences, USA) platforms. Library preparation for Illumina HiSeq sequencing was performed using the TruSeq DNA sample preparation kit for Illumina (NE, USA), with a library insert size of 350 bp. Library preparation for PacBio RS SMRT sequencing was performed using the PacBio DNA Template Prep Kit 1.0 (Pacific Biosciences, USA), with a library insert size of 20 kb. A high-quality sequence was obtained by correcting errors in the assembled contigs using Pilon (v1.21) software. The de novo assembly of the sequenced fragments was performed using HGAP (v3.0). The assembly was validated using BLAST (v2.7.1+) and BUSCO (v3.0) software. The annotation was performed using Prokka (v1.12b) software.
Publication 2024
The complete genome sequence of Pseudomonas sp. 1502IPR-01 was assembled using both the PacBio and Illumina reads. The original image data were converted into sequence data via base calling, defined as raw data, and saved as FASTQ files. These FASTQ files contain the read sequences and their quality information. Quality statistics were used for quality trimming, which removes low-quality data to produce clean data. The reads were then assembled into a contig using the hierarchical genome assembly process (HGAP) and Canu, a software that performs scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. The final circularization step was checked and completed manually, generating a complete genome with seamless chromosomes and plasmids. Finally, errors in the PacBio assembly were corrected with the Illumina reads using Pilon. All of the above analyses were performed using the I-Sanger Cloud Platform (www.i-sanger.com) from Shanghai Majorbio.
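The quality-trimming step described above can be illustrated with a small sketch that reads FASTQ records and trims low-quality bases from the 3′ end. The source does not state which trimming tool or thresholds were used, so the Phred+33 encoding, the Q20 cut-off, and the function names below are all assumptions made for the example.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from FASTQ lines, four lines per record."""
    stripped = (line.strip() for line in lines if line.strip())
    for header, seq, _plus, qual in zip(stripped, stripped, stripped, stripped):
        yield header[1:], seq, qual


def phred_scores(quality: str, offset: int = 33):
    """Decode a quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, quality: str, min_q: int = 20):
    """Remove trailing bases whose Phred score is below min_q (hypothetical threshold)."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


# One made-up record: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
fastq_lines = ["@read1", "ACGTACGT", "+", "IIIII###"]
records = list(read_fastq(fastq_lines))
name, seq, qual = records[0]
clean_seq, clean_qual = trim_3prime(seq, qual)
print(name, clean_seq, clean_qual)
```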
Publication 2024
Genomic DNA was extracted and sequenced via Illumina MiSeq or NextSeq benchtop sequencer (Illumina, Inc., San Diego, CA) as previously described [34]. Long read sequencing was performed using PacBio RS II (Pacific Biosciences of California, Inc., Menlo Park, CA). Kraken 2 was used to identify isolate species and check for contamination [35]. Short-read sequencing data were trimmed for adapter sequence content and quality using bbduk [36]. De novo assembly was performed using Newbler v2.9 [37]. Minimum thresholds for contig size and coverage were set at 200 bp and 49.5×, respectively. Long-read sequencing data were assembled using HGAP 3.0 in the SMRT Analysis portal. In silico multilocus sequence typing (MLST) was performed using the scheme developed by Curran and Dawson [38]. Antimicrobial resistance genes were annotated using a combination of AMRFinderPlus and ARIBA [39, 40]. The genomes of all 253 ST-621 isolates have been deposited in the National Center for Biotechnology Information under BioProject PRJNA852179.
Publication 2024

Top products related to «HGAP»

Sourced in United States, China, Canada, Switzerland, Japan, United Kingdom
The PacBio RS II is a DNA sequencing platform developed by Pacific Biosciences. It utilizes Single Molecule, Real-Time (SMRT) sequencing technology to generate long DNA reads. The PacBio RS II is capable of producing high-quality, long-read sequence data for various genomics applications.
Sourced in United States, China, Japan
The PacBio RS II platform is a single-molecule, real-time (SMRT) DNA sequencing system designed for long-read sequencing. It utilizes SMRT cells to capture DNA sequence information.
Sourced in United States, United Kingdom
The g-TUBE is a single-use sample preparation device from Covaris for shearing genomic DNA. Unlike Covaris' acoustic instruments, it fragments DNA by centrifugation, with the resulting fragment size (roughly 6 to 20 kb) set by the centrifugation speed. The sheared DNA is commonly used for long-insert library preparation.
Sourced in United States, France, Germany, China, United Kingdom, Japan, Switzerland, Australia, Spain, Italy, Ireland, Canada, Brazil
The Wizard Genomic DNA Purification Kit is a product designed to isolate and purify genomic DNA from a variety of sample types. It utilizes a simple, rapid, and efficient protocol to extract high-quality DNA that can be used in various downstream applications.
Sourced in Germany, United States, France, United Kingdom, Netherlands, Spain, Japan, China, Italy, Canada, Switzerland, Australia, Sweden, India, Belgium, Brazil, Denmark
The QIAamp DNA Mini Kit is a laboratory equipment product designed for the purification of genomic DNA from a variety of sample types. It utilizes a silica-membrane-based technology to efficiently capture and purify DNA, which can then be used for various downstream applications.
PacBioDevNet is a software development kit that provides tools and resources for creating applications that interact with Pacific Biosciences' sequencing platforms. It enables developers to access and integrate Pacific Biosciences' sequencing data and analysis capabilities into their own applications.
Sourced in United States
The PacBio RS II system is a next-generation sequencing platform developed by Pacific Biosciences. It utilizes single-molecule real-time (SMRT) sequencing technology to generate long-read, high-quality sequencing data. The core function of the PacBio RS II system is to perform high-throughput DNA sequencing.
Sourced in United States, China, United Kingdom, Hong Kong, France, Canada, Germany, Switzerland, India, Norway, Japan, Sweden, Cameroon, Italy
The HiSeq 4000 is a high-throughput sequencing system designed for generating large volumes of DNA sequence data. It utilizes Illumina's sequencing-by-synthesis technology. The HiSeq 4000 can generate up to 1.5 terabases of sequence data per run, making it suitable for a wide range of applications, including whole-genome sequencing, targeted sequencing, and transcriptome analysis.
Sourced in United States, China, Germany, United Kingdom, Hong Kong, Canada, Switzerland, Australia, France, Japan, Italy, Sweden, Denmark, Cameroon, Spain, India, Netherlands, Belgium, Norway, Singapore, Brazil
The HiSeq 2000 is a high-throughput DNA sequencing system designed by Illumina. It utilizes sequencing-by-synthesis technology to generate large volumes of sequence data. The HiSeq 2000 is capable of producing up to 600 gigabases of sequence data per run.
Sourced in United States, Germany
The PacBio RSII sequencer is a laboratory instrument designed for DNA sequencing. It utilizes single-molecule real-time (SMRT) technology to sequence long DNA fragments with high accuracy. The core function of the PacBio RSII is to generate high-quality sequence data for various genomic applications.

More about "HGAP"

The Hierarchical Genome Assembly Process (HGAP) is an assembly approach that efficiently reconstructs complete or near-complete genomes from large amounts of long-read sequencing data.
As a long-read genome assembly method, it is particularly useful for studying the genetic makeup of diverse organisms, including microbes, plants, and animals.
HGAP workflows often involve the use of specialized sequencing platforms, such as the PacBio RS II system, which can generate long, high-quality reads that facilitate the assembly of complex or fragmented genomes.
Additional tools and products, like the G-TUBE, Wizard Genomic DNA Purification Kit, and QIAamp DNA Mini Kit, may be used to prepare samples and extract high-quality DNA for sequencing.
By optimizing protocols and identifying the best tools and products, HGAP can streamline workflows, enhance reproducibility, and drive breakthroughs in a wide range of biological and biomedical research fields.
This includes applications in metagenomics, transcriptomics, and the study of structural variations and epigenetic modifications.
PubCompare.ai, a cutting-edge platform, can help researchers elevate their HGAP-related research by providing AI-powered protocol optimization and intelligent comparisons of protocols from literature, pre-prints, and patents.
This can assist in locating the most suitable protocols and products, ultimately improving efficiency and reproducibility.
Additionally, resources like the PacBioDevNet community provide a wealth of information and support for researchers working with PacBio sequencing technologies, which are often used in HGAP workflows.
By leveraging these tools and resources, researchers can streamline their HGAP processes and drive groundbreaking discoveries.