Illumina Chloroplast Genome Assembly

After the run, image analysis, base calling and error estimation were performed using the Illumina/Solexa Pipeline (version 0.2.2.6). Perl scripts were used to sort and bin all sequences by their 5′ three-nucleotide tags; these tags were removed prior to evaluation with the Reference Guided Assembler (RGA; R. Shen and T. Mockler, in preparation) or de novo assembly. Examination of Illumina Q-values revealed a decrease after cycle 33 (data not shown); the three 3′ bases were therefore trimmed, and 30-mers were used in all subsequent analyses (31). Binned 30-mers were evaluated against the appropriate Pinus reference (P. thunbergii, NC_001631; P. koraiensis, NC_004677) using RGA to estimate genome coverage.
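The tag-binning and trimming step can be sketched in Python. This is a minimal illustration, not the authors' Perl scripts: it assumes 36-nt raw reads (a 3-nt 5′ tag, 30 retained bases and 3 trimmed 3′ bases), exact tag matches, and a hypothetical tag set containing only CCT and GGT (the two tags shown in Figure 1). The function name `bin_and_trim` is illustrative.

```python
from collections import defaultdict

# Hypothetical tag set: CCT and GGT are the two tags shown in Figure 1.
TAGS = ("CCT", "GGT")
TAG_LEN = 3        # 5' nucleotide tag
TRIM_3PRIME = 3    # 3' bases dropped after the Q-value decline past cycle 33
READ_LEN = 36      # assumed raw read length: 3 (tag) + 30 (kept) + 3 (trimmed)


def bin_and_trim(reads, tags=TAGS):
    """Bin reads by exact 5' tag match, strip the tag and trim the 3' end.

    Returns (bins, unassigned), where bins maps each tag to its 30-mers and
    unassigned holds reads with an unknown tag or an unexpected length.
    """
    tag_set = set(tags)
    bins = defaultdict(list)
    unassigned = []
    for read in reads:
        tag = read[:TAG_LEN]
        if len(read) == READ_LEN and tag in tag_set:
            bins[tag].append(read[TAG_LEN:READ_LEN - TRIM_3PRIME])
        else:
            unassigned.append(read)
    return dict(bins), unassigned


reads = [
    "CCT" + "A" * 30 + "GGG",
    "GGT" + "C" * 30 + "TTT",
    "AAA" + "G" * 33,  # unknown tag, left unassigned
]
bins, unassigned = bin_and_trim(reads)
print({tag: len(mers) for tag, mers in bins.items()}, len(unassigned))
```

Reads whose first three bases match no known tag are returned separately; a production pipeline might tolerate a tag mismatch, which this sketch does not attempt.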
To assemble chloroplast genomes from Illumina/Solexa microreads, we used a three-step process. First, de novo assemblies were attempted with Velvet Assembler 0.4 (32) using a hash length of 19, a minimum average coverage of 5× and a minimum contig length of 100 bp. Second, contigs were aligned to a reference genome sequence using CodonCode version 2.0.4 (CodonCode Corporation, Dedham, MA, USA; http://www.codoncode.com/) with standard settings for global alignments. Picea sitchensis was aligned to the previously published chloroplast genome of P. thunbergii (NC_001631), and the species of Pinus subgenus Strobus were aligned to P. koraiensis (NC_004677). The assembly of P. contorta used a draft plastome of P. ponderosa as its reference (A. Liston and R. Cronn, unpublished results). Prior to alignment, an ‘N’ was added to each end of every contig to distinguish assembly gaps (dashes flanked by the added ‘N's) from deletions (dashes) relative to the reference. Contigs that failed to align to the reference genome were screened for chloroplast sequence homology using BLASTN (http://www.ncbi.nlm.nih.gov/). Successful matches typically contained insertions of >100 bp relative to the reference genome; these contigs were manually inserted into the alignment. Between 67% and 98% of the contigs aligned to the reference genome; the unaligned contigs appear to represent nontarget PCR amplicons (data not shown). The final de novo assemblies covered 78.1–94.6% of the reference genome (excluding deletions and including insertions relative to the reference). Third, gaps between the de novo contigs were filled with the reference sequence, and this chimeric assembly was used as a ‘pseudo-reference’ for reference-guided assembly with RGA. RGA aligns microreads to their best match in a reference sequence and then builds a guided consensus sequence from the aligned, overlapping reads.
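The ‘N’-flanking convention and the gap filling of the third step can be illustrated with a short Python sketch. It assumes a pairwise view of the alignment: one gapped reference row and one gapped assembly row of equal length, where every contig carries the added ‘N’ at both ends. Runs of dashes bounded by an added ‘N’ (or the alignment edge) are treated as assembly gaps and filled from the reference; other dashes are kept as deletions, and bases opposite reference dashes are kept as insertions. The function names are hypothetical, and the sketch cannot tell an added ‘N’ from a genuine ambiguous base at a contig end.

```python
import re


def classify_gaps(asm_row):
    """Label each run of '-' in a gapped assembly row.

    A run bounded on both sides by an added 'N' (or by the alignment edge) is
    an assembly gap; any other run is a deletion relative to the reference.
    Returns a list of (start, end, kind) with end exclusive.
    """
    runs = []
    for m in re.finditer(r"-+", asm_row):
        start, end = m.start(), m.end()
        left_ok = start == 0 or asm_row[start - 1] == "N"
        right_ok = end == len(asm_row) or asm_row[end] == "N"
        kind = "assembly_gap" if left_ok and right_ok else "deletion"
        runs.append((start, end, kind))
    return runs


def build_pseudo_reference(ref_row, asm_row):
    """Fill assembly gaps (and their marker N's) with reference bases."""
    if len(ref_row) != len(asm_row):
        raise ValueError("rows must have the same alignment length")
    n = len(asm_row)
    from_ref = [False] * n
    for start, end, kind in classify_gaps(asm_row):
        if kind == "assembly_gap":
            # Widen by one column on each side to cover the added marker N's.
            for i in range(max(start - 1, 0), min(end + 1, n)):
                from_ref[i] = True
    # Marker N's at the alignment edges are also artificial.
    for i in (0, n - 1):
        if n and asm_row[i] == "N":
            from_ref[i] = True
    out = []
    for use_ref, r, a in zip(from_ref, ref_row, asm_row):
        base = r if use_ref else a
        # '-' is either a deletion in the assembly or a reference gap under a marker N.
        if base != "-":
            out.append(base)
    return "".join(out)


ref_row = "ACGTACGTACGTACGT"
asm_row = "NCGTN---NC-TACN-"  # two contigs; one internal 1-bp deletion
print(classify_gaps(asm_row))
print(build_pseudo_reference(ref_row, asm_row))  # -> ACGTACGTACTACGT
```

In the example, the dashes between the two contigs and after the last contig are filled from the reference, while the single dash inside the second contig survives as a deletion.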
RGA outputs the resulting contigs and singletons, reports the actual coverage of each base in the assembly, and identifies SNPs on the basis of microread density in the assembled sequence relative to the reference and the Q-values at specific positions on each microread. RGA settings were ≤2 mismatches per microread, Q-values ≥20, read depth ≥3 and SNP acceptance requiring ≥70% of reads in agreement. The pseudo-reference, created from the de novo assemblies and the reference sequences, was then corrected using RGA.
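The four RGA thresholds listed above can be expressed as a per-position filter. The following Python sketch is a hypothetical re-expression of those settings, not RGA's implementation: each observation is assumed to be a (base, Q-value, mismatches-in-read) tuple at one reference position, the 70% agreement rule is assumed to apply to every call, and a position with too few qualifying reads or no sufficiently dominant base is returned as `None`.

```python
from collections import Counter

# Thresholds from the text; the filtering logic itself is an assumption.
MAX_MISMATCHES = 2    # per microread
MIN_Q = 20            # per-base Q-value
MIN_DEPTH = 3         # qualifying reads at a position
MIN_AGREEMENT = 0.70  # fraction of qualifying reads that must agree


def call_position(ref_base, observations):
    """Call one reference position from (base, q, read_mismatches) tuples.

    Returns (called_base, is_snp), or None when fewer than MIN_DEPTH reads
    survive the filters or no base reaches MIN_AGREEMENT.
    """
    kept = [base for base, q, mm in observations
            if q >= MIN_Q and mm <= MAX_MISMATCHES]
    if len(kept) < MIN_DEPTH:
        return None
    base, count = Counter(kept).most_common(1)[0]
    if count / len(kept) < MIN_AGREEMENT:
        return None
    return base, base != ref_base


obs = [("T", 30, 0), ("T", 25, 1), ("T", 40, 0), ("A", 35, 0), ("T", 12, 0)]
print(call_position("A", obs))  # 3 of 4 qualifying reads (75%) support T
```

The low-quality fifth read is discarded before depth and agreement are evaluated, so the call rests on four reads.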
Final sequences were annotated using standard settings in DOGMA (33; http://dogma.ccbb.utexas.edu/). Multiple alignments were made using MAFFT v. 5 (34), and full alignments with annotations were visualized using the VISTA viewer (34,35). See Supplementary Figure 1 for full annotation summaries. In addition, nucleotide positions corresponding to primer locations were changed to ‘N’, because the use of complementary forward and reverse primers at a single site prevented us from obtaining genomic sequence for these positions.
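Masking primer positions amounts to overwriting known coordinate ranges with ‘N’. A minimal Python sketch, assuming primer sites are supplied as hypothetical zero-based, half-open (start, end) intervals on the final sequence:

```python
def mask_primer_sites(seq, primer_intervals):
    """Return seq with each zero-based, half-open [start, end) interval set to 'N'."""
    chars = list(seq)
    for start, end in primer_intervals:
        if not 0 <= start <= end <= len(chars):
            raise ValueError(f"interval ({start}, {end}) lies outside the sequence")
        chars[start:end] = "N" * (end - start)
    return "".join(chars)


print(mask_primer_sites("ACGTACGTAC", [(2, 5), (8, 10)]))  # ACNNNCGTNN
```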
Figure 1.

Relative frequencies of barcode error by barcode tag (CCT, GGT), experiment (S1, S6) and nucleotide position (1, 2, 3). Observed frequencies of erroneous, nontag nucleotides are shown for position 1 (salmon), 2 (blue) and 3 (green); first- and second-position errors were far more common than third-position errors. Slices within a position are scaled in proportion to the number of base calls for that nucleotide; if errors occurred at equal frequencies within a base position, each slice would be of equal size and would not extend beyond the perimeter of the circle. In all experiments, substitutions to ‘A’ were more frequent than expected at positions 1 and 3, whereas substitutions to ‘T’ were more frequent than expected at position 2.
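The per-position error tallies summarized in Figure 1 can be sketched in Python. This is an illustration under stated assumptions: it presumes the true tag of every read is known (passed as `expected_tag`), counts which wrong nucleotide was read at each tag position, and reports raw within-position proportions without the caption's scaling by base-call counts. Both function names are hypothetical.

```python
from collections import Counter


def tag_error_profile(observed_tags, expected_tag):
    """Count, per tag position (1-based), which wrong nucleotide was read."""
    profile = {pos: Counter() for pos in range(1, len(expected_tag) + 1)}
    for observed in observed_tags:
        for pos, (seen, want) in enumerate(zip(observed, expected_tag), start=1):
            if seen != want:
                profile[pos][seen] += 1
    return profile


def error_proportions(profile):
    """Convert counts to within-position proportions; skip error-free positions."""
    return {
        pos: {base: n / sum(counts.values()) for base, n in counts.items()}
        for pos, counts in profile.items()
        if counts
    }


observed = ["ACT", "GCT", "CTT", "CCA", "CCT"]
profile = tag_error_profile(observed, "CCT")
print(error_proportions(profile))
```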

Cronn R., Liston A., Parks M., Gernandt D.S., Shen R., & Mockler T. (2008). Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology. Nucleic Acids Research, 36(19), e122.

Other organizations: Pacific Northwest Research Station, Universidad Nacional Autónoma de México

Variable analysis

independent variables

Barcode tag (CCT, GGT)
Experiment (S1, S6)
Nucleotide position (1, 2, 3)

dependent variables

Relative frequencies of barcode error

control variables

Not explicitly mentioned

controls

No positive or negative controls were explicitly mentioned.
