We defined a set of three rules based on known properties of the n-, h-, and c-regions. The n-region at the start of the SP must be at least two residues long, and the c-region at its end must be at least three residues long. The most hydrophobic position, identified by sliding a seven-amino-acid window across the SP and computing the hydrophobicity on the Kyte–Doolittle scale (ref. 29), belongs to the h-region. These rules fix six labeled positions (two n, one h, and three c). All positions between them are labeled as either both n and h or both h and c, yielding multitag labels.
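The three rules can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that the "most hydrophobic position" is the centre of the highest-scoring window, that ties go to the first such window, and that multitag labels are represented as sets. The function names `most_hydrophobic_position` and `label_sec_spi` are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_position(sp, window=7):
    """Return the index of the centre of the most hydrophobic window.

    Assumption: the window centre stands in for the "most hydrophobic
    position"; the source does not say how the window maps to a position.
    """
    if len(sp) < window:
        raise ValueError("signal peptide is shorter than the window")
    starts = range(len(sp) - window + 1)
    # max() returns the first maximal window when scores are tied.
    best_start = max(starts, key=lambda i: sum(KD[aa] for aa in sp[i:i + window]))
    return best_start + window // 2


def label_sec_spi(sp, n_min=2, c_min=3, window=7):
    """Assign a set of region tags to every position of a Sec/SPI SP."""
    h_pos = most_hydrophobic_position(sp, window)
    labels = []
    for i in range(len(sp)):
        if i < n_min:                    # rule 1: first residues are n
            labels.append({"n"})
        elif i >= len(sp) - c_min:       # rule 2: last residues are c
            labels.append({"c"})
        elif i == h_pos:                 # rule 3: hydrophobic peak is h
            labels.append({"h"})
        elif i < h_pos:                  # between n and h: ambiguous
            labels.append({"n", "h"})
        else:                            # between h and c: ambiguous
            labels.append({"h", "c"})
    return labels
```

With the default seven-residue window, the window centre always lies at least three positions from either end, so under these assumptions the h position cannot collide with the fixed n or c positions.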
This procedure was adapted for the different SP classes; only Sec/SPI SPs follow it completely. For Tat SPs, the n–h border was identified using the twin-arginine motif. All positions before the motif were labeled n, followed by two dedicated labels for the motif, again followed by a single position labeled n. For SPII SPs, we did not label a c-region, as the C-terminal positions cannot be considered as such (ref. 30). Instead, the last three positions were labeled as the lipobox and all positions before them as h only. For SPIII SPs, no region labels were generated within the SP.
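The class-specific adaptations can be sketched in the same way. This is again an illustration under stated assumptions: the label names `"RR1"`, `"RR2"`, and `"lipobox"` and all function names are hypothetical. The position of the twin-arginine motif is taken as an input because the text does not describe how it is located. The Tat helper covers only the positions up to and including the single n after the motif, since the text specifies nothing further about the rest of a Tat SP.

```python
def label_spii(sp):
    """SPII: last three positions are the lipobox, everything before is h only."""
    if len(sp) < 3:
        raise ValueError("signal peptide is shorter than the lipobox")
    return ["h"] * (len(sp) - 3) + ["lipobox"] * 3


def label_tat_prefix(motif_start):
    """Tat: labels from the first position through the residue after the motif.

    motif_start is the 0-based index of the first motif residue (assumed known).
    """
    # n before the motif, two dedicated motif labels, then a single n.
    return ["n"] * motif_start + ["RR1", "RR2"] + ["n"]


def label_spiii(sp):
    """SPIII: no region labels are generated within the SP."""
    return [None] * len(sp)
```

For example, `label_tat_prefix(4)` labels four n positions, the two motif positions, and one trailing n position.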