Simulated Diploid Genome Assembly

The polymorphic diploid assembly of the Chinese amphioxus Branchiostoma belcheri was processed with HaploMerger. A total of 274 Mbp of N-gap-free alignments for trusted allele pairs (>1000-bp alignment length and >90% alignment identity) was extracted from the HaploMerger output. From this alignment pool, we randomly selected 10 Mbp of alignments and concatenated the target sequences and the query sequences separately. This produced a small simulated diploid genome with a pair of 10-Mbp chromosomes, one built from the target sequences and the other from the query sequences. In total, 25 small 10-Mbp genomes were created; alignments were sampled without replacement so that no alignment from the pool was used more than once. In addition, all alignments were concatenated to create a large simulated genome with a pair of 274-Mbp chromosomes.
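The sampling step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' script: the `Alignment` record, the function names, and the choice to measure genome size by accumulated target-sequence length are all hypothetical.

```python
import random
from typing import NamedTuple

class Alignment(NamedTuple):
    # Hypothetical record for one allele-pair alignment.
    target: str      # target-allele sequence, alignment gaps removed
    query: str       # query-allele sequence, alignment gaps removed
    length: int      # alignment length in bp
    identity: float  # fraction of identical columns, 0..1

def trusted(pool, min_length=1000, min_identity=0.90):
    """Keep alignments above both thresholds quoted in the text."""
    return [a for a in pool
            if a.length > min_length and a.identity > min_identity]

def build_genomes(pool, n_genomes=25, genome_size=10_000_000, seed=1):
    """Build simulated diploid genomes by sampling without replacement.

    Shuffling once and walking the shuffled order guarantees that no
    alignment is reused. Genome size is measured on the target
    sequence, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    order = list(range(len(pool)))
    rng.shuffle(order)
    cursor = 0
    genomes = []
    for _ in range(n_genomes):
        picked, total = [], 0
        while total < genome_size:
            if cursor == len(order):
                raise ValueError("alignment pool exhausted")
            idx = order[cursor]
            cursor += 1
            picked.append(idx)
            total += len(pool[idx].target)
        # Concatenate target and query sequences separately to form
        # the two chromosomes of one diploid genome.
        target_chr = "".join(pool[i].target for i in picked)
        query_chr = "".join(pool[i].query for i in picked)
        genomes.append((target_chr, query_chr, picked))
    return genomes
```

With a 274-Mbp pool, 25 genomes of 10 Mbp consume roughly 250 Mbp, so the pool is large enough for sampling without replacement.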
For each chromosome from the 10-Mbp genomes, we simulated 650,000 454 shotgun reads (350 ± 70 bp), 140,000 3-kb paired-end 454 reads (3000 ± 600 bp), 40,000 8-kb paired-end 454 reads (8000 ± 1600 bp), 40,000 20-kb paired-end 454 reads (20,000 ± 4000 bp), 1,000,000 300-bp mate-pair Illumina reads (115 bp per end), and 1,000,000 500-bp mate-pair Illumina reads (115 bp per end). To induce more assembly errors, the length of each end of the 454 paired-end reads was set to only 104 bp. Reads were randomly sampled from the chromosomes. Sequencing errors were simulated at a rate of 1.3%–1.7% for each read. For 454 reads, ∼50% of the errors were indels due to homopolymers. Before use, the Illumina reads were error-corrected using Quake (Kelley et al. 2010). In summary, we simulated ∼11× 454 and 23× Illumina reads for each small genome. For the large genome, only 11× 454 reads were simulated (with the same library-size proportions as for the small genomes). The Celera assembler version 6.1 was used to assemble the simulated data with the following specific parameters: utgErrorRate = 0.015, overlapper = mer, and unitigger = bog (Miller et al. 2008). Finally, HaploMerger was used to analyze each resulting soft-masked assembly with the default parameters and a scoring matrix specific to that assembly.
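As an illustration of the shotgun-read simulation, the Python sketch below draws reads from one chromosome and adds errors. It rests on several assumptions that the text does not confirm: "350 ± 70 bp" is read as a normal distribution (mean ± SD), the per-read error rate is drawn uniformly from 1.3%–1.7%, and homopolymer indels are crudely approximated by dropping or duplicating a single base. The function names are hypothetical.

```python
import random

def add_errors(seq, rate, indel_fraction, rng):
    """Apply per-base errors; a share `indel_fraction` become indels."""
    out = []
    for base in seq:
        if rng.random() >= rate:
            out.append(base)                 # no error at this base
        elif rng.random() < indel_fraction:
            # Crude stand-in for a homopolymer length error:
            # duplicate the base (insertion) or drop it (deletion).
            if rng.random() < 0.5:
                out.append(base + base)
        else:
            # Substitution to a different nucleotide.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
    return "".join(out)

def simulate_shotgun_reads(chrom, n_reads, mean_len=350, sd_len=70,
                           err_range=(0.013, 0.017),
                           indel_fraction=0.5, seed=0):
    """Sample reads uniformly at random from `chrom` and add errors."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        # Assumed normal length distribution, clamped to a valid range.
        length = round(rng.gauss(mean_len, sd_len))
        length = max(1, min(len(chrom), length))
        start = rng.randrange(len(chrom) - length + 1)
        rate = rng.uniform(*err_range)       # assumed per-read error rate
        fragment = chrom[start:start + length]
        reads.append(add_errors(fragment, rate, indel_fraction, rng))
    return reads
```

Paired-end libraries could be sketched the same way by drawing an insert of the stated size and keeping 104 bp (454) or 115 bp (Illumina) from each end; setting `indel_fraction=0` would give substitution-only errors.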

Huang S., Chen Z., Huang G., Yu T., Yang P., Li J., Fu Y., Yuan S., Chen S., & Xu A. (2012). HaploMerger: Reconstructing allelic relationships for polymorphic diploid genome assemblies. Genome Research, 22(8), 1581-1588.

Publication 2012

Other organizations : Sun Yat-sen University

Protocol cited in 5 other protocols

Variable analysis

independent variables

HaploMerger processing of the polymorphic diploid assembly of the Chinese amphioxus Branchiostoma belcheri
Random selection of 10-Mbp alignments from the alignment pool
Concatenation of target and query sequences to create small simulated diploid genomes
Random sampling without replacement to create 25 small genomes of 10 Mbp
Concatenation of all alignments to create a large simulated genome with a pair of 274-Mbp chromosomes
Simulation of different sequencing read types (454 shotgun, 454 paired-end, Illumina mate-pair) with varying read lengths and error rates

dependent variables

Alignment length and identity for trusted allele pairs
Resulting assembly quality from the Celera assembler
Output of the HaploMerger analysis on the resulting assemblies

control variables

Alignment length (>1000-bp) and identity (>90%) thresholds for trusted allele pairs
Specific parameters used for the Celera assembler (utgErrorRate = 0.015, overlapper = mer, and unitigger = bog)
Use of the default parameters and a scoring matrix specific to the assembly in the HaploMerger analysis
