Each new sequence is first filtered for quality: any record with fewer than 500 bp of coverage for the barcode region of COI, or with more than 1% ambiguous bases, is excluded. A sequence that meets these requirements is then checked for reading frame shifts, as indicated by stop codons or by peptides that are improbable given the COI profile [44]. Because sequences showing these attributes are likely to derive from pseudogenes, they are excluded. Sequences are then screened to ensure that they do not derive from bacterial (e.g., Wolbachia) or certain external (e.g., human, mouse) contaminants by matching the sequence recovered from each specimen against a reference library of bacterial and selected vertebrate sequences. Finally, when a sequence record originates from the assembly of two or more shorter sequences, the Bellerophon package [47] is used to check for possible chimeras, which would arise if the component sequences inadvertently (e.g., through contamination or laboratory error) derived from two different taxa.
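The first two filters can be illustrated with a minimal Python sketch. Only the 500 bp and 1% thresholds come from the text. The function names are hypothetical, ungapped sequence length is used as a stand-in for barcode-region coverage, and the stop codon set assumes the invertebrate mitochondrial code (NCBI translation table 5), which the text does not specify. The rule that a sequence passes when at least one forward reading frame is free of stop codons is likewise an assumption. The profile-based peptide check, the contaminant screen, and the chimera check are not covered.

```python
# Thresholds taken from the text; everything else below is illustrative.
MIN_BARCODE_BP = 500
MAX_AMBIGUOUS_FRACTION = 0.01

UNAMBIGUOUS = set("ACGT")
# Assumption: invertebrate mitochondrial code (NCBI translation table 5),
# in which TAA and TAG are the stop codons.
STOP_CODONS = {"TAA", "TAG"}


def passes_quality(seq: str) -> bool:
    """Return True if seq has >= 500 bp and <= 1% ambiguous bases.

    Assumption: ungapped length approximates barcode-region coverage.
    """
    bases = seq.upper().replace("-", "")
    if len(bases) < MIN_BARCODE_BP:
        return False
    ambiguous = sum(1 for b in bases if b not in UNAMBIGUOUS)
    return ambiguous / len(bases) <= MAX_AMBIGUOUS_FRACTION


def has_open_frame(seq: str) -> bool:
    """Return True if at least one forward frame contains no stop codon.

    Assumption: a sequence with stop codons in all three forward frames
    is treated as a likely pseudogene or frame-shifted read.
    """
    bases = seq.upper().replace("-", "")
    for offset in range(3):
        codons = (bases[i:i + 3] for i in range(offset, len(bases) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            return True
    return False


def passes_initial_filters(seq: str) -> bool:
    """Apply the quality filter, then the stop codon check, in that order."""
    return passes_quality(seq) and has_open_frame(seq)
```

A record would be kept only if `passes_initial_filters` returns `True`. In practice the reading frame would more likely be fixed by alignment to the COI profile than by scanning all three frames.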