After the run, image analysis, base calling and error estimation were performed using the Illumina/Solexa Pipeline (version 0.2.2.6). Perl scripts were used to sort and bin all sequences by their three-nucleotide 5′ tags; these tags were removed prior to evaluation with the Reference Guided Assembler (RGA; R. Shen and T. Mockler, in preparation) or de novo assembly. Examination of Illumina Q-values revealed a decrease after cycle 33 (data not shown); the three 3′ bases were therefore trimmed, and 30-mers were used in all subsequent analyses (31). Binned 30-mers were evaluated against the appropriate Pinus reference (P. thunbergii, NC_001631; P. koraiensis, NC_004677) using RGA to estimate genome coverage.
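The binning and trimming step can be illustrated with a minimal Python sketch. This is not the Perl code used in the study: the function name `bin_and_trim`, the exact-match tag rule and the handling of unrecognized reads are assumptions made for illustration only.

```python
from collections import defaultdict

TAG_LEN = 3      # length of the 5' tag described in the text
INSERT_LEN = 30  # bases kept after the tag; the three 3' bases are dropped


def bin_and_trim(reads, tags):
    """Group reads by exact 5' tag and keep the 30-mer that follows the tag.

    Assumption: tags must match exactly. Reads with an unrecognized tag,
    or too short to yield a full 30-mer, are stored untrimmed under None.
    """
    bins = defaultdict(list)
    for read in reads:
        tag = read[:TAG_LEN]
        insert = read[TAG_LEN:TAG_LEN + INSERT_LEN]
        if tag in tags and len(insert) == INSERT_LEN:
            bins[tag].append(insert)
        else:
            bins[None].append(read)
    return dict(bins)
```

For a 36-nt read, this removes the 3-nt tag, retains the next 30 bases and discards the final three cycles, matching the read layout described above.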
To assemble chloroplast genomes using Illumina/Solexa microreads, we used a three-step process. First, de novo assemblies were attempted using Velvet Assembler 0.4 (32) with a hash length of 19, minimum average coverage of 5×, and minimum contig length of 100 bp. Second, contigs were aligned to a reference genome sequence using CodonCode version 2.0.4 (CodonCode Corporation, Dedham, MA, USA; http://www.codoncode.com/) and standard settings for global alignments. Picea sitchensis was aligned to the previously published chloroplast genome of P. thunbergii (NC_001631) and the species of Pinus subgenus Strobus were aligned to P. koraiensis (NC_004677). The assembly of P. contorta used a draft plastome of P. ponderosa as its reference (A. Liston and R. Cronn, unpublished results). Prior to alignment, an ‘N’ was added to the ends of each contig, in order to differentiate assembly gaps (dashes flanked by the added ‘N’s) from deletions (dashes) relative to the reference. Contigs that failed to align to the reference genome were scanned for chloroplast sequence homology using BLASTN (http://www.ncbi.nlm.nih.gov/). Successful matches typically contained >100 bp insertions relative to the reference genome; these contigs were manually inserted into the alignment. Between 67% and 98% of the contigs aligned to the reference genome. Unaligned contigs apparently represent nontarget PCR amplicons (data not shown). The final de novo assemblies covered 78.1–94.6% of the reference genome (excluding deletions and including insertions relative to the reference). Third, gaps between the de novo contigs were replaced with the reference sequence, and this chimeric assembly was used as a ‘pseudo-reference’ for reference-guided assembly with the program RGA. RGA aligns microreads to their best match in a reference sequence, and then creates a guided consensus sequence from the aligned overlapping reads.
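The construction of the pseudo-reference in the third step can be sketched as follows. This is an illustrative reading of the procedure, not the authors' implementation: it assumes a single gapped row holding all aligned contigs, a gapped reference row of equal length, and that ‘N’ occurs only as the added contig-end marker. The function name `build_pseudo_reference` is hypothetical.

```python
def build_pseudo_reference(contig_row, ref_row):
    """Fill assembly gaps with reference bases; keep contig bases elsewhere.

    Assumptions: both rows come from the same alignment, and every 'N' in
    contig_row is an added end marker, so each 'N' toggles whether we are
    inside a contig. Dashes inside a contig are deletions relative to the
    reference and are dropped; dashes outside are assembly gaps.
    """
    if len(contig_row) != len(ref_row):
        raise ValueError("aligned rows must have equal length")
    out = []
    inside = False
    for c, r in zip(contig_row.upper(), ref_row.upper()):
        if c == "N":
            inside = not inside      # contig boundary marker
            if r != "-":
                out.append(r)        # marker column takes the reference base
        elif inside:
            if c != "-":
                out.append(c)        # contig base, including insertions
        elif r != "-":
            out.append(r)            # assembly gap: fall back to reference
    return "".join(out)
```

The ‘N’ markers are what make the two kinds of dash distinguishable: without them, a dash at a contig edge could not be told apart from a true deletion.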
RGA outputs the resulting contigs, singletons and the real coverage of each base in the assembly, and identifies SNPs based on microread density in the assembled sequence compared to the reference and on Q-values at specific positions on each microread. RGA settings used were ≤2 mismatches per microread, Q-values ≥20, read depth ≥3 and SNP acceptance requiring ≥70% of reads in agreement. The pseudo-reference created from de novo assemblies and the reference sequences were corrected using RGA.
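How the depth, quality and agreement thresholds might combine at a single position is sketched below. RGA's internal logic is not described here, so the order of the filters, the fallback to the reference base and the function name `call_base` are assumptions; the ≤2 mismatch filter is taken to have been applied upstream, during read alignment.

```python
from collections import Counter

MIN_Q = 20           # minimum Q-value for a base call to be counted
MIN_DEPTH = 3        # minimum number of counted reads at the position
MIN_FRACTION = 0.70  # minimum fraction of counted reads supporting a SNP


def call_base(pileup, ref_base):
    """Return (consensus_base, is_snp) for one reference position.

    pileup is a list of (base, q_value) pairs from aligned microreads.
    Assumption: depth and agreement are computed after the Q-value filter,
    and the reference base is retained when evidence is insufficient.
    """
    good = [base for base, q in pileup if q >= MIN_Q]
    if len(good) < MIN_DEPTH:
        return ref_base, False
    base, count = Counter(good).most_common(1)[0]
    if base != ref_base and count / len(good) >= MIN_FRACTION:
        return base, True
    return ref_base, False
```

For example, three high-quality ‘T’ calls and one ‘A’ call at a position whose reference base is ‘A’ give 75% agreement and would be accepted as a SNP under these assumptions.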
Final sequences were annotated using standard settings in the program DOGMA (33; http://dogma.ccbb.utexas.edu/). Multiple alignments were made using MAFFT v. 5 (34), and full alignments with annotations were visualized using the VISTA viewer (34,35). See Supplementary Figure 1 for full annotation summaries. In addition, nucleotide positions corresponding to primer locations were changed to ‘N’, as the use of complementary forward and reverse primers at a single site precluded us from obtaining genomic sequence for these positions.
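Masking of primer positions can be expressed in a few lines. The sketch below assumes primer locations are available as 0-based, half-open (start, end) intervals on the final sequence; that coordinate convention and the function name `mask_primer_sites` are illustrative choices, not details taken from the study.

```python
def mask_primer_sites(sequence, primer_sites):
    """Replace every base inside a primer interval with 'N'.

    primer_sites is an iterable of (start, end) pairs, 0-based and
    half-open (an assumed convention). Sequence length is unchanged.
    """
    seq = list(sequence)
    for start, end in primer_sites:
        if not 0 <= start <= end <= len(seq):
            raise ValueError(f"interval ({start}, {end}) is out of range")
        seq[start:end] = "N" * (end - start)
    return "".join(seq)
```

Because the replacement preserves length, downstream annotation coordinates remain valid after masking.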

Relative frequencies of barcode error by barcode tag (CCT, GGT), experiment (S1, S6) and nucleotide position (1, 2, 3). Observed frequencies of erroneous, nontag nucleotides are indicated by position 1 (salmon), 2 (blue) and 3 (green); first and second position errors were far more common than third position errors. Slices within a position are scaled proportionately to the number of base calls for that nucleotide; if errors were present at equal frequencies within a base position, each slice would be of equal size and would not extend beyond the perimeter of the circle. In all experiments, errors involving substitutions to ‘A’ were more frequent than expected for positions 1 and 3, whereas errors involving substitutions to ‘T’ were more frequent than expected for position 2.