Comparative Genomics of Legume Crop Genomes

SNP and indel sequence variation was assessed across a panel of cultivars relative to the cv. Tanjil reference genome. NGS reads were aligned to the cv Tanjil reference genome via bowtie v 2.0.5 (–very‐sensitive) (Langmead and Salzberg, 2012), and variants were called via the Genome Analysis Tookit 3.4‐46 (McKenna et al., 2010). GATK was used to perform read deduplication via Markduplicates, then variant calling with HaplotypeCaller (–stand_call_conf 20 –stand_emit_conf 20 –min_pruning 5), producing variant data in VCF format (Danecek et al., 2011). Genome comparisons were visualized using Circos v0.67‐1 (Krzywinski et al., 2009).
Orthologous gene clusters were predicted via OrthoMCL (Li et al., 2003) comparing translated annotations of NLL to protein datasets from C. cajan (Varshney et al., 2012), C. arietinum (Varshney et al., 2013), G. max (Schmutz et al., 2010), M. truncatula (Young et al., 2011), P. vulgaris (Schmutz et al., 2014) and A. thaliana (Initiative, 2000).
Analysis of rates of silent‐site substitutions was carried out by searching all peptides against all others for the species Lupinus angustifolius, Glycine max (v 2.0), Lotus japonicus (v 3.0), Medicago truncatula (v 4.0) and Phaseolus vulgaris (v 1.0). Top respective matches were retained between each species per chromosome pairing (allowing for multiple total hits between two species for a given query gene), and within each species (for analysis of whole‐genome duplications). Then in‐frame alignments of coding sequences were made for each retained peptide alignment. From alignments of coding sequences, values for Ks, Ka and Ka/Ks were calculated using the ‘codeml’ method from the PAML package (Yang, 2007). Also from protein alignments, synteny blocks were inferred using DAGchainer (Haas et al., 2004). From the per‐gene‐pair alignments and the synteny blocks, median Ks values for blocks were calculated and used for Ks histogram peaks (Figure 2).
Ages of species divergences and whole‐genome duplications (Figure S7) were calculated from modal Ks peaks (Data S7), by treating initially unknown branch lengths in the known species/duplication tree as variables in a set of equations. The species/duplication tree was rooted at the papilionoid whole‐genome duplication, which predated the main papilionoid radiation (Cannon et al., 2015). A time of 58 Mya for the initial papilionoid radiation was assumed (Lavin et al., 2005). There were 11 unknown branch lengths in the tree in Figure S7, and sufficient data from the modal distances between and within species comparisons to solve for these unknowns algebraically.
To evaluate evidence for a whole‐genome triplication (WGT), synteny blocks were identified using DAGchainer, and synteny coverage depth was calculated using the BEDTools v2.25.0 (Quinlan and Hall, 2010) ‘coverage’ function to make comparisons between other genomes and NLL as the reference, or between NLL and each other genome as the reference. Coverage of synteny blocks was calculated at each nucleotide position using the ‐d option and summarized per coverage depth level.
For visual dot plot assessments of NLL compared with itself and with other legume genomes, we used promer and mummerplot from the MUMmer package (Kurtz et al., 2004), (v3.23) to make comparisons of translated nucleotide sequence, on genomic sequence that was masked for all except exonic sequence. The promer results were filtered to require at least 80% identity.

Free full text: Click here

Hane J.K., Ming Y., Kamphuis L.G., Nelson M.N., Garg G., Atkins C.A., Bayer P.E., Bravo A., Bringans S., Cannon S., Edwards D., Foley R., Gao L., Harrison M.J., Huang W., Hurgobin B., Li S., Liu C., McGrath A., Morahan G., Murray J., Weller J., Jian J, & Singh K.B. (2016). A comprehensive draft genome sequence for lupin (Lupinus angustifolius), an emerging health food: insights into plant–microbe interactions and legume evolution. Plant Biotechnology Journal, 15(3), 318-330.

Publication 2016

Emit Exonic Frame Gene Gene clusters Genomic Glycine max Indel Legume Lotus japonicus Lupinus Medicago truncatula Nucleotide Nucleotide sequence Peptide Phaseolus vulgaris Protein Protein c Radiation Sequence variation Sequences alignments Synteny Tree

Corresponding Organization : University of Tasmania

Top 5 similar protocols

Protocol cited in 6 other protocols

Variable analysis

independent variables

NGS reads were aligned to the cv Tanjil reference genome via bowtie v 2.0.5 (–very‐sensitive)
Variants were called via the Genome Analysis Tookit 3.4‐46

dependent variables

SNP and indel sequence variation was assessed across a panel of cultivars relative to the cv. Tanjil reference genome
Orthologous gene clusters were predicted via OrthoMCL comparing translated annotations of NLL to protein datasets from C. cajan, C. arietinum, G. max, M. truncatula, P. vulgaris and A. thaliana
Analysis of rates of silent‐site substitutions was carried out by searching all peptides against all others for the species Lupinus angustifolius, Glycine max, Lotus japonicus, Medicago truncatula and Phaseolus vulgaris
Evidence for a whole‐genome triplication (WGT) was evaluated by identifying synteny blocks using DAGchainer and calculating synteny coverage depth using BEDTools
Visual dot plot assessments of NLL compared with itself and with other legume genomes were made using promer and mummerplot from the MUMmer package

control variables

Cv. Tanjil reference genome
Genome Analysis Tookit 3.4‐46 was used to perform read deduplication via Markduplicates, then variant calling with HaplotypeCaller (–stand_call_conf 20 –stand_emit_conf 20 –min_pruning 5)

Annotations

Based on most similar protocols

Etiam vel ipsum. Morbi facilisis vestibulum nisl. Praesent cursus laoreet felis. Integer adipiscing pretium orci. Nulla facilisi. Quisque posuere bibendum purus. Nulla quam mauris, cursus eget, convallis ac, molestie non, enim. Aliquam congue. Quisque sagittis nonummy sapien. Proin molestie sem vitae urna. Maecenas lorem.

As authors may omit details in methods from publication, our AI will look for missing critical information across the 5 most similar protocols.

About PubCompare

Our mission is to provide scientists with the largest repository of trustworthy protocols and intelligent analytical tools, thereby offering them extensive information to design robust protocols aimed at minimizing the risk of failures.

We believe that the most crucial aspect is to grant scientists access to a wide range of reliable sources and new useful tools that surpass human capabilities.

However, we trust in allowing scientists to determine how to construct their own protocols based on this information, as they are the experts in their field.

Ready to get started?

Revolutionizing how scientists
search and build protocols!