A mixture of annotated and unannotated MHCI sequences was identified using Ensembl’s BioMart and the GO/IPR terms for class I (GO:0042613/IPR001039), supplemented with various blastN and TblastN searches of the Ensembl and NCBI databases using evolutionarily diverged as well as species-specific sequences. It should be noted that the analysed genomic databases from cavefish (Astyanax mexicanus, AstMex102), zebrafish (Danio rerio, Zv9), medaka (Oryzias latipes, Medaka1), platyfish (Xiphophorus maculatus, Xipmac4.4.2), tilapia (Oreochromis niloticus, Orenil1.0), stickleback (Gasterosteus aculeatus, BROAD S1), fugu (Takifugu rubripes, Fugu4.0), tetraodon (Tetraodon nigroviridis, Tetraodon8.0), Atlantic salmon (Salmo salar, AGKD00000000.3), Atlantic cod (Gadus morhua, NCBI GadMor_May2010) and spotted gar (Lepisosteus oculatus, Ensembl LepOcu1
) each represent one or a limited number of animals, so additional genes or alleles may exist in other haplotypes or animals. Potential genome assembly errors would also influence our analyses. For Atlantic salmon, we supplemented the 12 known Atlantic salmon MHCI genes [29] with blastN and TblastN searches using preliminary salmon genome sequences available at either cGRASP [85] or NCBI [86]. Open reading frames were predicted using GenScan [87], Fgenesh [88] and Augustus [89] and/or by aligning with expressed sequences using Spidey [90]. Some smaller pseudogene remnants that did not contribute to evolutionary understanding were disregarded. Expressed matches were identified through TblastN searches against EST resources using MHCI alpha 3 domains or, when this approach gave no match, by searching with the entire coding sequence in GenBank nucleotide (cDNA) and subsequently available TSA/SRA resources. The transcriptome (TSA/SRA) accession numbers used are as follows: tetraodon (brain: SRX191169), fugu (testis: SRX363280, gills: SRX363279, liver: SRX362038, various organs: SRX189142, SRX188889 and SRX188888), Atlantic cod (eggs: SRX148753, brain: SRX148752, head kidney: SRX148751, liver: SRX148750, hind gut: SRX148749, gonad: SRX148748, spleen: SRX148740), stickleback (brain: SRX146601), cavefish (surface fish: SRX212200, Pachon cavefish: SRX212201) and African lungfish (SRX152529). The Z lineage sequence identified in spotted gar (
Lepisosteus oculatus) derives from individual brain transcriptome reads (SRX543528) assembled using the CAP3 program [91]. The sturgeon Z lineage alpha 1 domain sequence was assembled from near-identical genomic reads, primarily from the sturgeon species Acipenser persicus (SRA dataset ERX145719; ERR169830.1125422.1), with a 14 bp gap filled using an Acipenser baerii sequence (SRA dataset ERX145721; ERR169832.3958173.2). The sturgeon alpha 2 domain sequence was assembled from near-identical sequences, primarily from Acipenser persicus (SRA dataset ERX145719; ERR169830.5438448.1, ERR169830.5438448.2 and ERR169830.5083693.2), with a 10 bp gap filled using an Acipenser gueldenstaedtii sequence (SRA dataset ERX145720; ERR169831.3185933.1). Three-dimensional structures were aligned against the HLA-A2 structure using the Swiss-PdbViewer [92,93].
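The gap-filling step described for the sturgeon Z lineage domains can be illustrated with a minimal Python sketch. The text does not state which tool or procedure was used for this step, so the approach below is an assumption: it presumes the primary consensus and the donor sequence from the related species have already been aligned to the same length, with unresolved positions in the primary marked as `N`. The function name `fill_gap` is hypothetical, and the sequences are made-up toy data, not reads from the study.

```python
def fill_gap(primary: str, donor: str, gap_char: str = "N") -> str:
    """Fill gap positions in `primary` with the aligned bases from `donor`.

    Assumes both sequences are pre-aligned and of equal length.
    Donor-derived bases are written in lower case so that the patched
    region stays visible in the final sequence.
    """
    if len(primary) != len(donor):
        raise ValueError("sequences must be pre-aligned to the same length")
    filled = []
    for p, d in zip(primary, donor):
        # Keep the primary base unless it is an unresolved gap position.
        filled.append(d.lower() if p == gap_char else p)
    return "".join(filled)


# Toy, made-up sequences (hypothetical; not data from the study).
primary_consensus = "ATGGCNNNNTTCA"  # consensus with a 4 bp gap
donor_read = "ATGGCTGACTTCA"         # aligned read from a related species

gap_length = primary_consensus.count("N")
patched = fill_gap(primary_consensus, donor_read)
print(gap_length, patched)
```

Marking donor-derived bases in lower case is one way to keep track of which positions come from a different species, which matters when the patched sequence is later used in alignments or phylogenies.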
Grimholt U., Tsukamoto K., Azuma T., Leong J., Koop B.F., & Dijkstra J.M. (2015). A comprehensive analysis of teleost MHC class I sequences. BMC Evolutionary Biology, 15, 32.