To perform downstream analyses, we first aligned the methylase sequences with MAFFT (v7.471) [30], producing two different alignments. The first, which we refer to as the compact alignment, used the globalpair and reorder settings with a maximum iteration count of 1000. The second, which we refer to as the gappy alignment, used the same settings plus an unalignlevel of 0.8. We inspected the alignments in SeaView (v5.0.4) [31] and then defined four separate site sets: one for the methylase excluding the insertion elements, which we refer to as the methylase extein, and one for each of the three insertion elements. The methylase extein site set was then copied and split into three subsets, each containing only the methylase sequences invaded by a given insertion element: one for intein-containing methylases, one for ShiLan domain-containing methylases, and one for endonuclease-containing methylases. The alignment of these three extein subsets was identical to that of the compact alignment.
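The two alignment runs and the site-set bookkeeping can be sketched as below. This is an illustrative sketch, not the authors' pipeline. It assumes that the settings named above correspond to MAFFT's `--globalpair`, `--reorder`, `--maxiterate 1000`, and `--unalignlevel 0.8` options. It also assumes an in-memory alignment stored as a dict of equal-length strings and insertion elements given as 1-based inclusive column ranges. The file name and the helpers `mafft_command`, `extein_sites`, and `split_by_element` are hypothetical. In the study itself, the site sets were defined interactively in SeaView.

```python
"""Sketch: MAFFT command lines for the compact and gappy alignments, plus
extraction of the methylase extein and its per-insertion-element subsets.

Hypothetical: file names, the dict-of-strings alignment representation,
the column ranges, and all helper names below.
"""
import shlex
from typing import Dict, List, Sequence, Set, Tuple


def mafft_command(infile: str, gappy: bool) -> List[str]:
    """Build a MAFFT argument list (MAFFT writes the alignment to stdout)."""
    # Compact alignment: globalpair + reorder, up to 1000 iterations.
    cmd = ["mafft", "--globalpair", "--reorder", "--maxiterate", "1000"]
    if gappy:
        # Gappy alignment: same settings plus an unalignlevel of 0.8.
        cmd += ["--unalignlevel", "0.8"]
    return cmd + [infile]


def extein_sites(aln: Dict[str, str],
                 insertion_ranges: Sequence[Tuple[int, int]]) -> Dict[str, str]:
    """Drop every column inside an insertion element (1-based, inclusive)."""
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    (length,) = lengths
    masked = {i - 1 for start, end in insertion_ranges
              for i in range(start, end + 1)}
    keep = [c for c in range(length) if c not in masked]
    return {name: "".join(seq[c] for c in keep) for name, seq in aln.items()}


def split_by_element(extein: Dict[str, str],
                     carriers: Dict[str, Set[str]]) -> Dict[str, Dict[str, str]]:
    """One extein subset per insertion element, keeping only the sequences
    that carry it. Columns are not realigned, so every subset keeps the
    column layout of the alignment it was cut from."""
    return {element: {n: s for n, s in extein.items() if n in members}
            for element, members in carriers.items()}


if __name__ == "__main__":
    print(shlex.join(mafft_command("methylases.fasta", gappy=False)))
    # mafft --globalpair --reorder --maxiterate 1000 methylases.fasta
    print(shlex.join(mafft_command("methylases.fasta", gappy=True)))
    # mafft --globalpair --reorder --maxiterate 1000 --unalignlevel 0.8 methylases.fasta
```

To run an alignment, the argument list could be passed to `subprocess.run(cmd, stdout=out_handle, check=True)`, with `out_handle` an open output file. FASTA parsing and writing are omitted to keep the sketch short. Building the subsets by filtering rows of one extein alignment, rather than realigning each group, mirrors the statement that the subsets share the compact alignment.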