Before starting any analyses, we planned and described the experiments in a Phylotocol [55]. Subsequent modifications to the analyses were noted and justified in that document. Because many genes besides Sox genes contain an HMG box, searching for Sox genes with the generic HMG hidden Markov model (HMM) returned many non-target sequences. To identify Sox genes specifically, we removed the outgroup sequences (Tcf/Lef and Capicua/CIC) from a published Sox gene alignment [56] and built a custom HMG HMM from the remaining sequences using hmmbuild (hmmer.org). We then used this custom HMM to search for Sox genes in translated transcriptomes from 15 cnidarians and six bilaterians. The abbreviations for the cnidarian taxa are as follows: Aala (Alatina alata), Adig (Acropora digitifera), Amil (Acropora millepora), Epal (Exaiptasia pallida), Avan (Atolla vanhoeffeni), Ccrux (Calvadosia cruxmelitensis), Came (Ceriantheopsis americana), Chem (Clytia hemisphaerica), Cxam (Cassiopea xamachana), Elin (Edwardsiella lineata), Hech (Hydractinia echinata), Hmag (Hydra magnipapillata), Hsan (Haliclystus sanjuanensis), Nvec (Nematostella vectensis), Rren (Renilla reniformis); and for the bilaterian taxa: Bflo (Branchiostoma floridae), Cint (Ciona intestinalis), Cele (Caenorhabditis elegans), Dmel (Drosophila melanogaster), Hsap (Homo sapiens), Lgig (Lottia gigantea), Spur (Strongylocentrotus purpuratus). We used the custom HMM together with our hmm2aln script (https://github.com/josephryan) [57] to generate an alignment that included the original sequences used to build the HMM. We then removed all ctenophore, sponge, and placozoan sequences from this alignment and generated trees.
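The outgroup-removal and HMM steps described above can be sketched roughly as follows. This is a minimal illustration, not the script used in the study: the header format, the prefix list used to recognize outgroup sequences, the placeholder sequences, and all file names are assumptions. Only the basic `hmmbuild <hmm_out> <msa>` and `hmmsearch --tblout <table> <hmm> <seqdb>` invocations are taken from standard HMMER usage, and the commands are only assembled here, not executed.

```python
import re
from typing import Dict, List, Sequence

# Assumed name prefixes that mark outgroup sequences (Tcf/Lef, Capicua/CIC).
# The published alignment may label these sequences differently.
OUTGROUP_PREFIXES = ("tcf", "lef", "capicua", "cic")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into an ordered {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def is_outgroup(header: str, prefixes: Sequence[str] = OUTGROUP_PREFIXES) -> bool:
    """True if any alphanumeric token of the header starts with an outgroup prefix."""
    tokens = re.split(r"[^A-Za-z0-9]+", header.lower())
    return any(tok.startswith(tuple(prefixes)) for tok in tokens if tok)


def drop_outgroups(records: Dict[str, str]) -> Dict[str, str]:
    """Keep only the records whose header is not flagged as an outgroup."""
    return {h: s for h, s in records.items() if not is_outgroup(h)}


def hmmbuild_cmd(hmm_out: str, msa_in: str) -> List[str]:
    """Command line to build a profile HMM from an alignment."""
    return ["hmmbuild", hmm_out, msa_in]


def hmmsearch_cmd(hmm_file: str, seq_db: str, tbl_out: str) -> List[str]:
    """Command line to search a protein database with the HMM, saving a hit table."""
    return ["hmmsearch", "--tblout", tbl_out, hmm_file, seq_db]


# Made-up headers and placeholder sequences, for illustration only.
example = """>Nvec_SoxB1
MNAFMVW
>Hsap_TCF7
MPQLNGG
>Dmel_Capicua
MKRPAAA
>Hsap_SOX2
MNAFMVW
>Hsap_LEF1
MPQLSGG
>Hsap_CIC
MKRPSSS
"""

kept = drop_outgroups(parse_fasta(example))
print(list(kept))
# Hypothetical file names; pass the lists to subprocess.run to execute them.
print(" ".join(hmmbuild_cmd("sox_hmg.hmm", "sox_no_outgroups.fa")))
print(" ".join(hmmsearch_cmd("sox_hmg.hmm", "Nvec_translated.fa", "Nvec_hits.tbl")))
```

Matching on header tokens rather than raw substrings is a design choice meant to reduce accidental matches inside species abbreviations; with real data the filter should be checked against the actual sequence names.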
Phylogenetic analysis followed a published protocol [58]. Briefly, we used the ModelFinder feature of IQ-TREE to identify the best-fitting substitution model for the alignment (provided as Supplementary Data 1). We then ran three maximum likelihood analyses in parallel: RAxML with 25 maximum parsimony starting trees, RAxML with 25 random starting trees, and IQ-TREE with default settings. We compared the likelihood values of the trees from all three analyses, selected the tree with the highest likelihood, and assessed branch support with 1000 rapid bootstraps in RAxML. The final tree file was edited in FigTree v1.4 (http://tree.bio.ed.ac.uk/software/figtree/) and Adobe Illustrator v24.1.1 for presentation.
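The tree-selection step can be illustrated with a short sketch. It assumes the log-likelihood of the best tree from each run has already been read from that program's output and that all three values were computed under the same substitution model, so they are directly comparable. The run labels and the numbers below are placeholders, not results from this study.

```python
from typing import Dict, Tuple


def best_tree(log_likelihoods: Dict[str, float]) -> Tuple[str, float]:
    """Return the run with the highest (least negative) log-likelihood."""
    if not log_likelihoods:
        raise ValueError("no runs to compare")
    name = max(log_likelihoods, key=log_likelihoods.get)
    return name, log_likelihoods[name]


# Placeholder values for the three analyses; real values come from the run logs.
runs = {
    "raxml_parsimony_start": -41250.73,
    "raxml_random_start": -41248.91,
    "iqtree_default": -41249.40,
}

winner, lnl = best_tree(runs)
print(f"best run: {winner} (lnL = {lnl})")
```

Because log-likelihoods are negative, the best tree is the one whose value is closest to zero, which is why the sketch takes the maximum.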