To analyze the loss of vision in the blind mole rat and cape golden mole, we first built a genome alignment with mouse as the reference species that included both blind species. Specifically, we used the UCSC lastz/chain/net pipeline (Kent et al. 2003) to build pairwise genome alignments between mouse (mm10 assembly) and the following species: rat (rn5), guinea pig (cavPor3), pika (ochPri3), rabbit (oryCun2), prairie vole (micOch1), blind mole rat (nanGal1), squirrel (speTri2), human (hg19), crab-eating macaque (macFas5), bushbaby (otoGar3), cow (bosTau7), dog (canFam3), horse (equCab2), cat (felCat5), elephant (loxAfr3), manatee (triMan1), cape golden mole (chrAsi1), opossum (monDom5), Anolis lizard (anoCar2), chicken (galGal4), and frog (xenTro7). For all species, we used lastz (Schwartz et al. 2003) version 1.03.54 with the parameters H = 2,000, Y = 3,000, L = 3,000, and K = 2,400 and the HoxD55 scoring matrix, and kept all local alignments that have at least one region of ≥30 bp with ≥60% sequence identity and ≥1.8 bits of entropy, as described in Hiller et al. (2013). For all nonmammalian species, we additionally computed highly sensitive local alignments (Hiller et al. 2013) with the lastz parameters W = 5, L = 2,700, and K = 2,000.

For mammals, we kept only alignment chains with a score of ≥70,000 that span ≥9,000 bp in both genomes. To also retain chains with very strong alignments that span a shorter region, we additionally kept chains with a score of ≥150,000 that span ≥6,000 bp in both genomes. For nonmammals, we kept only alignment chains with a score of ≥15,000. All other chains were discarded, as they typically do not represent strong syntenic alignments. The remaining chains were ‘netted’ using chainNet (Kent et al. 2003), and the resulting pairwise syntenic alignment nets were used as input to MULTIZ (Blanchette et al. 2004) to build a multiple alignment. The neutral distances between all species were estimated with phyloFit (Siepel et al. 2005) from 4-fold degenerate sites. The tree, with branch lengths measured in substitutions per neutral site, is given in supplementary figure 39, Supplementary Material online.

As above, we used PhastCons and GERP to obtain 184,412 conserved coding regions covering 27.4 Mb (1.04% of the mm10 assembly). After applying the GLS and branch methods to all conserved coding regions, we selected those multi-exon genes with at least two exons among the 1,000 most significant hits, and those single-exon genes that are among the same top 1,000 hits. This resulted in 141 genes (124 multi-exon and 17 single-exon) for the GLS method and 164 genes (132 multi-exon and 32 single-exon) for the branch method. The union of both lists comprises 208 genes. We used Enrichr (Chen et al. 2013) to detect functional enrichments among these 208 genes (table 1). Similar enrichments related to eye and vision were also found for the individual sets of 141 and 164 genes; however, the 164 genes detected by the branch method show additional functional enrichments (supplementary table 5, Supplementary Material online).
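The local-alignment filter keeps an alignment only if it contains a stretch that is long, similar, and not repetitive. The Python sketch below shows one way to apply the three stated thresholds (≥30 bp, ≥60% identity, ≥1.8 bits of entropy). It assumes that a "region" is a window of 30 consecutive alignment columns, that identity is the fraction of identical non-gap columns in that window, and that entropy is the Shannon entropy of the nucleotide composition of the reference (mouse) bases in the window. The exact definitions in Hiller et al. (2013) may differ, and the function names are hypothetical.

```python
import math
from collections import Counter

# Thresholds quoted in the text; HOW they are applied below is an assumption.
WINDOW = 30          # region length, in alignment columns
MIN_IDENTITY = 0.60  # fraction of identical columns within the window
MIN_ENTROPY = 1.8    # Shannon entropy (bits) of the window's reference bases


def shannon_entropy(bases: str) -> float:
    """Shannon entropy in bits of the A/C/G/T composition of `bases`."""
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def passes_local_filter(ref: str, qry: str) -> bool:
    """True if some WINDOW-column stretch of a pairwise alignment (two
    equal-length gapped strings) meets both the identity and the entropy
    thresholds. Entropy uses the reference row only (assumption)."""
    if len(ref) != len(qry):
        raise ValueError("aligned rows must have equal length")
    ref, qry = ref.upper(), qry.upper()
    for start in range(len(ref) - WINDOW + 1):
        r = ref[start:start + WINDOW]
        q = qry[start:start + WINDOW]
        identical = sum(1 for a, b in zip(r, q) if a == b and a != "-")
        if identical / WINDOW < MIN_IDENTITY:
            continue
        if shannon_entropy(r.replace("-", "")) >= MIN_ENTROPY:
            return True
    return False


seq = "ACGTTGCAAGCTTCGATCGGATCCATGCAGTCAGTACGTA"  # 40 bp, mixed composition
print(passes_local_filter(seq, seq))              # True
print(passes_local_filter("AT" * 20, "AT" * 20))  # False: only 1 bit of entropy
```

Because the maximum entropy over four nucleotides is 2 bits, the 1.8-bit cutoff mainly removes low-complexity matches: two identical copies of an AT dinucleotide repeat have 100% identity but only 1 bit of entropy, so they are rejected.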
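The chain-filtering thresholds can be written as a single predicate. This is a minimal sketch, not the pipeline's own code: the `Chain` record and `keep_chain` function are hypothetical, and "span in both genomes" is interpreted as the smaller of the reference and query spans. In UCSC chain format these spans would correspond to tEnd − tStart and qEnd − qStart from the chain header line.

```python
from dataclasses import dataclass


@dataclass
class Chain:
    """Hypothetical minimal record for one alignment chain."""
    score: int
    ref_span: int    # bp covered in the reference (mouse) genome
    query_span: int  # bp covered in the query genome


def keep_chain(chain: Chain, query_is_mammal: bool) -> bool:
    """Apply the score/span thresholds described in the text."""
    if not query_is_mammal:
        # Nonmammals: only a score threshold is applied.
        return chain.score >= 15_000
    # "Span in both genomes" -> the shorter of the two spans must qualify.
    min_span = min(chain.ref_span, chain.query_span)
    if chain.score >= 70_000 and min_span >= 9_000:
        return True
    # Very strong but shorter chains are rescued by a stricter score cutoff.
    return chain.score >= 150_000 and min_span >= 6_000


print(keep_chain(Chain(160_000, 7_000, 6_500), query_is_mammal=True))  # True
```

In this example, the mammalian chain fails the first rule (its spans are below 9,000 bp) but is kept by the second, because its score exceeds 150,000 and both spans are at least 6,000 bp.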
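phyloFit was run on 4-fold degenerate sites, that is, third codon positions at which all four nucleotides encode the same amino acid, so that substitutions there are assumed to be approximately neutral. The sketch below identifies such positions in a single in-frame coding sequence using the standard genetic code. It is an illustration only: it assumes an ungapped, in-frame reference CDS, and mapping these positions onto columns of the multiple alignment (and any additional requirements on the other species) is not shown.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; amino acids in TCAG x TCAG x TCAG codon order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def fourfold_sites(cds: str) -> list[int]:
    """0-based positions of 4-fold degenerate third codon positions."""
    cds = cds.upper()
    sites = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon not in CODON_TABLE:
            continue  # skip codons with N or other ambiguity codes
        # A site is 4-fold degenerate if every base at position 3
        # yields the same amino acid for this first-two-base prefix.
        aas = {CODON_TABLE[codon[:2] + b] for b in BASES}
        if len(aas) == 1:
            sites.append(i + 2)
    return sites


print(fourfold_sites("GCTAAAGGG"))  # [2, 8]
```

In "GCTAAAGGG", the third positions of the alanine (GCT) and glycine (GGG) codons are 4-fold degenerate, whereas that of the lysine codon AAA is not (AAT and AAC encode asparagine).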
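The gene-selection rule applied to the GLS and branch results can be sketched as follows. It assumes that each method yields a list of exons (conserved coding regions) ranked from most to least significant, each labeled with its gene, and that the number of exons per gene is known from the annotation; the input structures, gene names, and the `select_genes` function are hypothetical.

```python
from collections import Counter

TOP_N = 1_000  # size of the top-hit list used in the text


def select_genes(ranked_exons, exons_per_gene, top_n=TOP_N):
    """Select genes from a ranked list of (gene, exon_id) hits.

    ranked_exons   -- list of (gene, exon_id) tuples, most significant first
    exons_per_gene -- dict mapping gene -> number of exons in its annotation
    A multi-exon gene is kept if >= 2 of its exons are in the top `top_n`;
    a single-exon gene is kept if its exon is in the top `top_n`.
    """
    top_hits = set(ranked_exons[:top_n])  # unique (gene, exon) pairs
    exons_in_top = Counter(gene for gene, _ in top_hits)
    return {
        gene
        for gene, n in exons_in_top.items()
        if n >= 2 or exons_per_gene[gene] == 1
    }


# Toy usage with hypothetical genes and a top list of 3 instead of 1,000.
ranked = [("g1", "e1"), ("g1", "e2"), ("g2", "e1"), ("g3", "e1"), ("g3", "e2")]
n_exons = {"g1": 5, "g2": 1, "g3": 4}
print(sorted(select_genes(ranked, n_exons, top_n=3)))  # ['g1', 'g2']
```

The two method-specific gene sets would then be combined with a set union (for example `gls_genes | branch_genes`). By inclusion-exclusion, the counts reported above imply that 141 + 164 − 208 = 97 genes were selected by both methods.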
Prudent X., Parra G., Schwede P., Roscito J.G., & Hiller M. (2016). Controlling for Phylogenetic Relatedness and Evolutionary Rates Improves the Discovery of Associations Between Species’ Phenotypic and Genomic Differences. Molecular Biology and Evolution, 33(8), 2135-2150.