The whole-genome protein sequence datasets for G. arboreum, G. barbadense, and G. hirsutum were downloaded from the CottonGen database (https://www.cottongen.org/), and the dataset for G. raimondii was obtained from Phytozome, version 12 (https://phytozome.jgi.doe.gov/pz/portal.html) (Nordberg et al., 2014). Total protein sequences for other plant species from different taxonomic groups were downloaded from the National Center for Biotechnology Information (NCBI) website (http://www.ncbi.nlm.nih.gov/). A total of 58 HD2 protein sequences from 42 plant species were obtained from the NCBI and used to construct a hidden Markov model (HMM) profile. This profile of the HD2 domains (the HD2 label and the catalytic, regulatory, and zinc-finger domains) was employed as a query to identify HD2 gene family members using HMMER (v3.0) (Finn et al., 2015). Protein sequences and CDSs for G. arboreum, G. raimondii, G. hirsutum, and G. barbadense were also downloaded from the CottonGen database (https://www.cottongen.org/) (Yu et al., 2014). All hits were queried against the Pfam (http://pfam.xfam.org/) and InterProScan (http://www.ebi.ac.uk/interpro/search/sequence-search/) databases to verify the presence of conserved domains. The ProtParam tool (http://web.expasy.org/protparam/) provided by Expasy was used to estimate the physicochemical properties of the Gossypium HD2 proteins, including the number of amino acids, molecular weight, grand average of hydropathicity (GRAVY), theoretical isoelectric point (pI), aliphatic index, and instability index. The cotton HD2 gene subfamilies were named according to the orthologous HD2 members in the A. thaliana genome.
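To illustrate two of the physicochemical parameters listed above, the following minimal sketch computes GRAVY and the aliphatic index from a protein sequence in plain Python. It is not the ProtParam implementation: the function names `gravy` and `aliphatic_index` and the example peptide are hypothetical, and the sketch assumes the standard Kyte-Doolittle hydropathy scale and the Ikai (1980) aliphatic index formula with coefficients a = 2.9 and b = 3.9, which are the definitions ProtParam documents for these quantities.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # A KeyError here signals a non-standard residue (e.g. X or B).
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index: X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    where X is the mole percent of each residue in the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


if __name__ == "__main__":
    # Hypothetical toy peptide, not a real HD2 sequence.
    peptide = "MAVILKDE"
    print(f"length          = {len(peptide)}")
    print(f"GRAVY           = {gravy(peptide):.3f}")
    print(f"aliphatic index = {aliphatic_index(peptide):.2f}")
```

A positive GRAVY value indicates an overall hydrophobic protein and a negative value a hydrophilic one; a higher aliphatic index reflects a larger relative volume of aliphatic side chains.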