Protein sequences of Nematostella vectensis (GenBank: XP_001642062.2, XP_001629615.2), Drosophila melanogaster (GenBank: NP_569940.2), Caenorhabditis elegans (GenBank: NP_492153.2, NP_498594.1), Crassostrea gigas (GenBank: EKC20855.1, EKC32699.1, XP_011441313.2), Strongylocentrotus purpuratus (GenBank: XP_011680614.1, XP_781832.1, XP_030847369.1), Ciona intestinalis (GenBank: XP_002128212.1), Danio rerio (GenBank: NP_571671.2, NP_571685.2, XP_021334693.1, XP_686426.5, NP_001277142.1, XP_687183.1) and Homo sapiens (GenBank: XP_024305442.1, NP_056648.1, NP_061172.1, NP_640336.1, NP_631913.3) were collected from NCBI and used as queries to search for ADAR/ADAD genes in the public reference genomes and the de novo transcriptome assemblies (assembled by Trinity [92]) of the 22 species using TBLASTN [93] with parameters -F F -e 1e-5. Protein sequences in the target species were then determined with GeneWise [94]. The predicted proteins were aligned to the NCBI nr database to confirm whether they were ADARs/ADADs. Domain organizations of the manually confirmed ADAR/ADAD proteins were predicted using the CD-Search tool in NCBI (CDD) [95] and Pfam [96] with default settings.
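The homology search step can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The flags -F F and -e 1e-5 come from the text, and their style matches the legacy blastall interface, so the sketch assumes blastall was used. The tabular output option (-m 8), the file names, and the helper names build_tblastn_command and parse_tabular_hits are assumptions introduced for illustration.

```python
import shlex


def build_tblastn_command(query_fasta, database, out_path):
    """Assemble a legacy-BLAST (blastall) TBLASTN call as an argument list."""
    return [
        "blastall",
        "-p", "tblastn",  # protein queries vs. translated nucleotide database
        "-i", query_fasta,
        "-d", database,
        "-F", "F",        # reported parameter: query filtering switched off
        "-e", "1e-5",     # reported parameter: E-value cutoff
        "-m", "8",        # assumption: 12-column tabular output for parsing
        "-o", out_path,
    ]


def parse_tabular_hits(lines, max_evalue=1e-5):
    """Parse 12-column tabular BLAST lines and keep hits passing the cutoff.

    Columns: query, subject, %identity, alignment length, mismatches,
    gap opens, q.start, q.end, s.start, s.end, E-value, bit score.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed lines
        evalue = float(fields[10])
        if evalue > max_evalue:
            continue
        s_start, s_end = int(fields[8]), int(fields[9])
        hits.append({
            "query": fields[0],
            "subject": fields[1],
            # Hits on the reverse strand are reported with s.start > s.end.
            "start": min(s_start, s_end),
            "end": max(s_start, s_end),
            "strand": "+" if s_start <= s_end else "-",
            "evalue": evalue,
        })
    return hits


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_tblastn_command("adar_queries.fa", "genome.fa", "hits.tsv")
    print(shlex.join(cmd))
```

The genomic intervals returned by parse_tabular_hits are the kind of candidate loci that would then be passed to a gene-structure predictor such as GeneWise. How hits were merged or extended before that step is not stated in the text.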
Phylogenetic analyses of the ADARs and ADADs identified above were performed on the adenosine-deaminase (AD) domains (around 324 amino acids in length; see Table S2 for the sequences) using RAxML [97] with the maximum likelihood (ML) method (parameter: -m PROTGAMMAIJTT) and MrBayes [98] with the Bayesian inference (BI) method (parameters: prset aamodelpr = fixed(Wag); lset rates = invgamma; mcmcp ngen = 1000000 nchains = 4 samplefreq = 100 burnin = 200). The AD peptide sequences used for phylogenetic analysis were aligned using PRANK [99]. The reliability of the ML tree was estimated from 1,000 bootstrap replicates. The topologies of the trees generated by the two methods were generally consistent with each other (Figure S2). Information on the ADAR genes annotated in each species, including the coding nucleotide sequences, protein sequences, and domain annotations, is presented in Table S2.
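The tree-building settings above can be illustrated with a short sketch, offered as one plausible reading rather than the authors' exact invocation. The model string, the number of bootstrap replicates, and the MrBayes settings come from the text. The executable name raxmlHPC, the rapid-bootstrap mode (-f a), the random seeds, the file names, and the helper names are assumptions. The burn-in arithmetic assumes that burnin counts sampled states rather than generations, and it ignores the initial state.

```python
def build_raxml_command(alignment, run_name, seed=12345, bootstraps=1000):
    """Assemble a RAxML call for an ML search with bootstrap support."""
    return [
        "raxmlHPC",               # assumption: executable name varies by build
        "-f", "a",                # assumption: rapid bootstrap plus ML search
        "-m", "PROTGAMMAIJTT",    # reported model string
        "-s", alignment,
        "-n", run_name,
        "-p", str(seed),          # assumption: parsimony random seed
        "-x", str(seed),          # assumption: bootstrap random seed
        "-#", str(bootstraps),    # reported: 1,000 bootstrap replicates
    ]


def mrbayes_commands(ngen=1_000_000, nchains=4, samplefreq=100, burnin=200):
    """Return the reported MrBayes settings as command lines."""
    return [
        "prset aamodelpr = fixed(Wag);",
        "lset rates = invgamma;",
        f"mcmcp ngen = {ngen} nchains = {nchains} "
        f"samplefreq = {samplefreq} burnin = {burnin};",
    ]


def mcmc_sample_counts(ngen, samplefreq, burnin):
    """Return (samples drawn, samples kept, generations discarded).

    Assumes burnin is expressed in sampled states, not generations.
    """
    total = ngen // samplefreq
    kept = total - burnin
    discarded_generations = burnin * samplefreq
    return total, kept, discarded_generations


if __name__ == "__main__":
    # Hypothetical file and run names, for illustration only.
    print(" ".join(build_raxml_command("ad_domains.phy", "adar_ml")))
    print("\n".join(mrbayes_commands()))
    print(mcmc_sample_counts(1_000_000, 100, 200))  # (10000, 9800, 20000)
```

Under these assumptions, each chain of 1,000,000 generations sampled every 100 generations yields 10,000 samples, of which the first 200 (20,000 generations) are discarded as burn-in.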