fastp is designed for multithreaded parallel processing. Reads loaded from FASTQ files are packed into batches of N reads (N = 1000). Each pack is consumed by one thread in the pool, which processes every read in that pack. Each thread keeps an individual context that stores statistics for the reads it processes, such as per-cycle quality profiles, per-cycle base contents, adapter trimming results and k-mer counts. These per-thread values are merged after all reads have been processed, and a reporter then generates reports in HTML and JSON formats. fastp reports statistics for both the pre-filtering and the post-filtering data, so that changes in data quality caused by filtering can be compared directly.
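The batching and per-thread statistics described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not fastp's C++ implementation: reads are `(sequence, quality)` tuples with Phred+33 qualities, only three statistics are collected, all packs are queued up front instead of being streamed from disk, and the names `Context`, `packs` and `process` are hypothetical. Because of CPython's global interpreter lock, the threads here show the data flow (packs in, independent contexts, one merge at the end) rather than a real speedup.

```python
import queue
import threading
from collections import Counter
from itertools import islice

PACK_SIZE = 1000  # N in the text


class Context:
    """Hypothetical per-thread statistics: read count, base content, per-cycle quality sums."""

    def __init__(self):
        self.reads = 0
        self.bases = Counter()
        self.cycle_qual = []  # sum of Phred scores observed at each cycle

    def add(self, seq, qual):
        self.reads += 1
        self.bases.update(seq)
        for cycle, q in enumerate(qual):
            if cycle == len(self.cycle_qual):
                self.cycle_qual.append(0)
            self.cycle_qual[cycle] += ord(q) - 33  # Phred+33 encoding

    def merge(self, other):
        self.reads += other.reads
        self.bases.update(other.bases)
        for cycle, total in enumerate(other.cycle_qual):
            if cycle == len(self.cycle_qual):
                self.cycle_qual.append(0)
            self.cycle_qual[cycle] += total


def packs(reads, n=PACK_SIZE):
    """Yield consecutive lists of up to n reads."""
    it = iter(reads)
    while pack := list(islice(it, n)):
        yield pack


def process(reads, threads=4):
    work = queue.Queue()
    for pack in packs(reads):
        work.put(pack)
    contexts = [Context() for _ in range(threads)]  # one private context per thread

    def worker(ctx):
        while True:
            try:
                pack = work.get_nowait()
            except queue.Empty:
                return  # no packs left
            for seq, qual in pack:
                ctx.add(seq, qual)

    workers = [threading.Thread(target=worker, args=(c,)) for c in contexts]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    merged = Context()  # merge once, after all reads are processed
    for c in contexts:
        merged.merge(c)
    return merged
```

Because each thread writes only to its own context, no locking is needed while reads are processed; the final merge is the only point where the threads' results meet.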
fastp supports single-end (SE) and paired-end (PE) data. While most steps of SE and PE data processing are similar, PE data processing requires some additional steps, such as overlap analysis. For simplicity, we demonstrate only the main workflow of paired-end data preprocessing, shown in Figure 1.
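Overlap analysis relies on the fact that, when a DNA fragment is shorter than the two reads combined, the end of read 1 matches the reverse complement of the start of read 2. The sketch below is a hypothetical, simplified version of such a check: it slides read 1 along the reverse complement of read 2 and accepts the first offset whose overlap is at least `min_len` bases with at most `max_diff` mismatches. It ignores negative offsets (fragments so short that read 2 extends past the start of read 1), and the function names and default thresholds are illustrative assumptions, not fastp's code.

```python
# Hypothetical helpers for illustration; not fastp's API.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_overlap(r1, r2, min_len=30, max_diff=5):
    """Return (offset, overlap_len) for the first offset at which read 1 overlaps
    the reverse complement of read 2, or None if no offset qualifies."""
    rc2 = revcomp(r2)
    for offset in range(len(r1) - min_len + 1):
        overlap_len = min(len(r1) - offset, len(rc2))
        if overlap_len < min_len:
            break  # read 2 itself is shorter than min_len
        diff = 0
        for a, b in zip(r1[offset:offset + overlap_len], rc2[:overlap_len]):
            if a != b:
                diff += 1
                if diff > max_diff:
                    break  # too many mismatches; try the next offset
        if diff <= max_diff:
            return offset, overlap_len
    return None
```

For a 50-base fragment read as `r1 = frag[:40]` and `r2 = revcomp(frag[10:])`, the overlap is found at offset 10 with 30 overlapping bases, which is the region of the fragment covered by both reads.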
Algorithm 1: adapter sequence detection

adapter = None
for seed in sorted_adapter_seeds:
    # Extend the seed forward (toward the 3' end) along dominant branches.
    seqs_after_seed = get_seqs_after(seed)
    forward_tree = build_nucleotide_tree(seqs_after_seed)
    found = True
    node = forward_tree.root
    after_seed = ""
    while node.is_not_leaf():
        if node.has_dominant_child():
            node = node.dominant_child()
            after_seed = after_seed + node.base
        else:
            # No consensus base: discard this seed and try the next one.
            found = False
            break
    if not found:
        continue
    # Extend the seed backward (toward the 5' end); stop where the bases diverge.
    seqs_before_seed = get_seqs_before(seed)
    backward_tree = build_nucleotide_tree(seqs_before_seed)
    node = backward_tree.root
    before_seed = ""
    while node.is_not_leaf():
        if node.has_dominant_child():
            node = node.dominant_child()
            before_seed = node.base + before_seed
        else:
            break
    adapter = before_seed + seed + after_seed
    break
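To make Algorithm 1 concrete, the following is a runnable Python sketch under explicit assumptions. Reads are plain strings, the candidate seeds are already sorted, a nucleotide tree is modelled as a counted prefix tree (trie) of the sequences after the seed (or, reversed, before it), and a child counts as "dominant" when it carries at least 80% of the sequences that continue past its parent. That threshold and the names `Node`, `extend` and `detect_adapter` are hypothetical, not taken from fastp's source.

```python
from dataclasses import dataclass, field

DOMINANT_RATIO = 0.8  # hypothetical threshold for a "dominant" child


@dataclass
class Node:
    base: str = ""
    count: int = 0  # number of sequences passing through this node
    children: dict = field(default_factory=dict)  # base -> Node

    def is_not_leaf(self):
        return bool(self.children)

    def dominant_child(self):
        """Return the child holding >= DOMINANT_RATIO of the children's total count, else None."""
        if not self.children:
            return None
        total = sum(c.count for c in self.children.values())
        best = max(self.children.values(), key=lambda c: c.count)
        return best if best.count >= DOMINANT_RATIO * total else None


def build_nucleotide_tree(seqs):
    """Build a counted prefix tree (trie) of the given sequences."""
    root = Node()
    for seq in seqs:
        root.count += 1
        node = root
        for base in seq:
            node = node.children.setdefault(base, Node(base))
            node.count += 1
    return root


def extend(root):
    """Walk dominant children from the root; return (bases, reached_leaf)."""
    path, node = [], root
    while node.is_not_leaf():
        child = node.dominant_child()
        if child is None:
            return "".join(path), False
        path.append(child.base)
        node = child
    return "".join(path), True


def detect_adapter(reads, sorted_adapter_seeds):
    for seed in sorted_adapter_seeds:
        hits = [(r, r.find(seed)) for r in reads]
        hits = [(r, i) for r, i in hits if i >= 0]  # reads containing the seed
        if not hits:
            continue
        after_seed, found = extend(build_nucleotide_tree(r[i + len(seed):] for r, i in hits))
        if not found:
            continue  # forward walk must reach a leaf
        # Reverse the prefixes so each tree path starts at the base next to the seed.
        before_rev, _ = extend(build_nucleotide_tree(r[:i][::-1] for r, i in hits))
        return before_rev[::-1] + seed + after_seed
    return None
```

The forward walk must reach a leaf for a seed to be accepted, whereas the backward walk may stop early: upstream of the adapter the bases come from variable DNA inserts, so divergence there marks the adapter's start rather than a failure. For example, with reads built as several different inserts followed by `AGATCGGAAGAGC` and the seed `GATCGG`, the backward walk keeps only the shared `A` and the function returns `AGATCGGAAGAGC`.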