The rich feature set of FLEXBAR addresses many potential applications in single-end, paired-end and mate-pair sequencing. A typical workflow consists of a quality-clipping step, followed by demultiplexing (optionally including barcode trimming) and a separate adapter trimming step. All of these steps may be executed within a single FLEXBAR program call (see Figure S1). The default parameters of FLEXBAR are optimized to deliver good results for a large number of scenarios, especially for Illumina and SOLiD data (see benchmarks). However, customizing the settings might improve results for specific applications.
FLEXBAR is implemented in C++ using the SeqAn library [1]. Multi-threading is implemented with the Intel Threading Building Blocks library [2]. FLEXBAR detects target sequences by overlap sequence alignment, based on the Needleman-Wunsch algorithm [3]. An overlap (or semi-global) alignment uses the same recurrence relations as a global alignment but does not penalize gaps at the ends of the alignment (Figure 1A and Figure S2). To this end, the first row and column of the dynamic programming matrix are initialized with zeros, and the maximum alignment score is searched for in the last row and column of the matrix.
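The overlap alignment described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not FLEXBAR's SeqAn-based C++ implementation; the function name `overlap_alignment` and the scoring values (match 1, mismatch -1, gap -2) are assumptions chosen only for this example.

```python
def overlap_alignment(read, tag, match=1, mismatch=-1, gap=-2):
    """Return (best_score, end_in_read, end_in_tag) of an overlap alignment.

    Illustrative sketch only; scoring values are hypothetical defaults.
    """
    n, m = len(read), len(tag)
    # First row and first column stay at zero: leading end gaps are free.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Same recurrence as global (Needleman-Wunsch) alignment.
            sub = match if read[i - 1] == tag[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # match or substitution
                score[i - 1][j] + gap,      # gap in the tag
                score[i][j - 1] + gap,      # gap in the read
            )
    # Trailing end gaps are free: take the maximum over the last row
    # and the last column instead of the bottom-right cell only.
    best = (score[n][m], n, m)
    for j in range(m + 1):
        if score[n][j] > best[0]:
            best = (score[n][j], n, j)
    for i in range(n + 1):
        if score[i][m] > best[0]:
            best = (score[i][m], i, m)
    return best
```

For a read whose 3' end overlaps the start of a tag, for example `overlap_alignment("ACGTACGTAGATCGG", "AGATCGGAAGAGC")`, the best cell lies in the last row, at the end of the read and after the first seven bases of the tag.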
FLEXBAR offers maximal flexibility in target sequence recognition by considering base substitutions, insertions and deletions. Moreover, the user is free to choose all alignment scoring parameters, the minimal overlap and a threshold on sequence similarity. Default values are preset for these parameters and were chosen to work best for Illumina HiSeq and ABI SOLiD sequencing data. Simple direct matching to expected sequence tags might not be adequate for sequencing platforms with elevated indel error rates. Furthermore, letter- as well as color-space encoded reads can be processed (Figure 1A). FLEXBAR supports five sequence trimming modes, which cover most sequencing applications: (1) LEFT, (2) LEFT-TAIL, (3) RIGHT, (4) RIGHT-TAIL or (5) ANY(where) trimming (Figure 1B). These modes can be independently combined for adapter and barcode sequence recognition in single or paired-end data. Barcode reads may even be kept separate from the actual single or paired-end read set (as in Illumina TruSeq™ sequencing).
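To make the roles of the minimal overlap and the similarity threshold concrete, the following Python sketch shows one plausible way such a filter could be applied to a candidate alignment. It is an assumption for illustration, not FLEXBAR's actual acceptance rule: the function `accept_overlap`, its parameter names and its default values are hypothetical, and gaps are assumed to be written as `-` in the aligned strings.

```python
def accept_overlap(aligned_read, aligned_tag, min_overlap=3, max_error_rate=0.1):
    """Decide whether an aligned overlap counts as a tag match.

    Hypothetical filter: both strings cover only the overlapping region,
    have equal length, and use '-' for gap positions.
    """
    if len(aligned_read) != len(aligned_tag):
        raise ValueError("aligned strings must have the same length")
    overlap = len(aligned_read)
    # Substitutions and indels both show up as differing columns.
    errors = sum(1 for a, b in zip(aligned_read, aligned_tag) if a != b)
    return overlap >= min_overlap and errors <= max_error_rate * overlap
```

Under these assumptions, a perfect seven-base overlap is accepted, whereas the same overlap with one substitution or one deleted base is accepted only if `max_error_rate` is raised to at least 1/7.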