Existing data can only yield limited new insights into the effectiveness of a DNA-based identification system for birds. Two mitochondrial genes, cyt b and COI, are rivals for the largest number of animal sequence records greater than 600 bp in GenBank (4,791 and 3,009 species, respectively). However, COI coverage for birds is modest; 173 species share COI sequences with 600-bp overlap. As these records derive from a global avifauna of 10,000 species, they provide a limited basis to evaluate the utility of a COI-based identification system for any continental fauna, impelling us to gather new sequences.
We employed a stratified sampling design to gain an overview of the patterns of COI sequence divergence among North American birds. The initial level of sampling examined a single individual from each of 260 species to ascertain COI divergences among species. These species were selected on the basis of accessibility without regard to known taxonomic issues. The second level of sampling examined one to three additional individuals from 130 of these species to provide a general sense of intraspecific sequence divergences, as well as a preliminary indication of variation in each species. When possible, these individuals were obtained from widely separated localities in North America. The third level of our analysis involved sequencing four to eight more individuals for the few species where the second level detected more than 2% sequence divergence among individuals. Our studies examined specimens collected over the last 20 years; 98% were obtained from the tissue bank at the Royal Ontario Museum, Toronto, Canada. Collection localities and other specimen information are available in the “Birds of North America” file in the Completed Projects section of the Barcode of Life website (http://www.barcodinglife.com). Taxonomic assignments follow the latest North American checklist (AOU 1998) and its recent supplements (Banks et al. 2000, 2002, 2003).
Mitochondrial pseudogenes can complicate PCR-based studies of mitochondrial gene diversity (Bensasson et al. 2001; Thalmann et al. 2004). We used protocols to reduce pseudogene impacts that included extracting DNA from tissues rich in mitochondria (Sorenson and Quinn 1998), employing primers with high universality (Sorenson and Quinn 1998), and amplifying a relatively long PCR product because most pseudogenes are short (Pereira and Baker 2004). DNA extracts were prepared from small samples of muscle using the GenElute DNA miniprep Kit (Sigma, St. Louis, Missouri, United States), following the manufacturer's protocols. DNA extracts were resuspended in 10 μl of H2O, and a 749-bp region near the 5′ terminus of the COI gene was amplified using primers (BirdF1-TTCTCCAACCACAAAGACATTGGCAC and BirdR1-ACGTGGGAGATAATTCCAAATCCTG). In cases where this primer pair failed, an alternate reverse primer (BirdR2-ACTACATGTGAGATGATTCCGAATCCAG) was generally combined with BirdF1 to generate a 751-bp product, but a third reverse primer (BirdR3-AGGAGTTTGCTAGTACGATGCC) was used for two species of Falco. The 50-μl PCR reaction mixes included 40 μl of ultrapure water, 1.0 U of Taq polymerase, 2.5 μl of MgCl2, 4.5 μl of 10× PCR buffer, 0.5 μl of each primer (0.1 mM), 0.25 μl of each dNTP (0.05 mM), and 0.5–3.0 μl of DNA. The amplification regime consisted of 1 min at 94 °C followed by 5 cycles of 1 min at 94 °C, 1.5 min at 45 °C, and 1.5 min at 72 °C, followed in turn by 30 cycles of 1 min at 94 °C, 1.5 min at 51 °C, and 1.5 min at 72 °C, and a final 5 min at 72 °C. PCR products were visualized in a 1.2% agarose gel. All PCR reactions that generated a single product of approximately 750 bp were then cycle sequenced, while gel purification was used to recover the target gene product in cases where more than one band was present. Sequencing reactions, carried out using Big Dye v3.1 and the BirdF1 primer, were analyzed on an ABI 377 sequencer.
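As an illustration only, the amplification regime can be written down as a small data structure, which makes the stage boundaries explicit and gives the total programmed hold time. This is a minimal sketch and not the authors' thermocycler program: the names `Step`, `Stage`, `PROFILE`, and `total_hold_minutes` are hypothetical, the denaturation step of the 30-cycle stage is taken as 94 °C to match the initial cycles, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One temperature hold (hypothetical helper type)."""
    temp_c: float
    minutes: float


@dataclass(frozen=True)
class Stage:
    """A group of steps repeated `cycles` times."""
    steps: tuple
    cycles: int = 1

    def hold_minutes(self) -> float:
        # Time spent holding at temperature; ramping is not modeled.
        return self.cycles * sum(step.minutes for step in self.steps)


# Regime as described in the text (assumed 94 °C denaturation throughout).
PROFILE = (
    Stage((Step(94, 1.0),)),                                          # initial denaturation
    Stage((Step(94, 1.0), Step(45, 1.5), Step(72, 1.5)), cycles=5),   # low-stringency cycles
    Stage((Step(94, 1.0), Step(51, 1.5), Step(72, 1.5)), cycles=30),  # main cycles
    Stage((Step(72, 5.0),)),                                          # final extension
)


def total_hold_minutes(profile) -> float:
    """Sum of all programmed holds across every stage."""
    return sum(stage.hold_minutes() for stage in profile)


if __name__ == "__main__":
    print(total_hold_minutes(PROFILE))
```

Under these assumptions the holds add up to 1 + 5 × 4 + 30 × 4 + 5 = 146 min; a real run would take longer because of ramping.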
The electropherogram and sequence for each specimen are in the “Birds of North America” file, and all sequences have also been deposited in GenBank (see Supporting Information). COI sequences were recovered from all 260 bird species and did not contain insertions, deletions, nonsense, or stop codons, supporting the absence of nuclear pseudogene amplification (Pereira and Baker 2004). In addition to 429 newly collected sequences, nine GenBank sequences from five species were included (these were the only full-length COI sequences corresponding to species in this study).
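One part of this check, the search for stop codons, could be sketched as follows. This is an assumption-laden illustration rather than the procedure used in the study: the function name `internal_stop_positions` is hypothetical, the correct reading frame of the amplicon must be supplied by the user, and the stop codons are those of the vertebrate mitochondrial genetic code (TAA, TAG, AGA, AGG), in which AGA and AGG terminate translation rather than encoding arginine.

```python
# Stop codons in the vertebrate mitochondrial genetic code.
MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def internal_stop_positions(seq: str, frame: int = 0) -> list:
    """Return 0-based nucleotide offsets of stop codons in the given frame.

    Only complete codons are examined; a trailing partial codon is ignored.
    """
    seq = seq.upper()
    positions = []
    for start in range(frame, len(seq) - 2, 3):
        if seq[start:start + 3] in MITO_STOPS:
            positions.append(start)
    return positions


if __name__ == "__main__":
    # Toy sequence, not real COI data: the third codon is TAA.
    print(internal_stop_positions("ATGGCATAAGGC"))
```

An empty list in the correct frame is consistent with a functional mitochondrial copy, although it does not by itself rule out a pseudogene.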
Sequence divergences were calculated using the Kimura two-parameter (K2P) distance model (Kimura 1980). A neighbor-joining (NJ) tree of K2P distances was created to provide a graphic representation of the patterning of divergences among species (Saitou and Nei 1987).
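For readers unfamiliar with the K2P model, the distance between two aligned sequences is d = −½ ln[(1 − 2P − Q) √(1 − 2Q)], where P and Q are the proportions of sites differing by a transition and by a transversion, respectively. The sketch below computes it directly; the function name `k2p_distance` is hypothetical, the sequences are assumed to be already aligned and of equal length, and sites with gaps or ambiguous bases are simply skipped, which is one of several possible conventions and not necessarily the one used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Toy example: one transition in ten sites.
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

A matrix of such pairwise distances is the input that a neighbor-joining implementation would use to build the tree.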