Genetic maps were constructed in R (version 3.6.1) [58] with the package OneMap (version 2.1.1) [59]. Before the datasets were imported into R/OneMap, variants were tested for Mendelian segregation (χ2 goodness of fit), and those with p < 0.00001 were removed with a custom Perl script (removeDistorted.pl). The resulting files, femalePT.vcf.gz and malePT.vcf.gz, are available at https://github.com/nicotralab/chen-et-al-sex-determination [49]. Each dataset was then thinned with vcftools [60] so that adjacent variants were at least 5000 bp apart, and converted from VCF format into OneMap’s “.raw” format with a bash shell script (thin-for-onemap.v2.sh). After this step, 23,462 variants (20,058 SNPs and 3404 indels) remained for the maternal genome and 22,359 variants (19,771 SNPs and 2863 indels) for the paternal genome (Additional Files 10 and 11).
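The filtering and thinning logic described above can be illustrated with a minimal Python sketch. It is not the authors' removeDistorted.pl or the vcftools implementation. It assumes that each marker is expected to segregate 1:1 (as for a pseudo-testcross marker, giving one degree of freedom) and that thinning is a simple greedy pass along one contig; the function names are hypothetical.

```python
import math


def chi2_pvalue_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def segregation_pvalue(n_class_a, n_class_b):
    """Chi-square goodness-of-fit p-value against an ASSUMED 1:1 ratio."""
    expected = (n_class_a + n_class_b) / 2.0
    stat = ((n_class_a - expected) ** 2 + (n_class_b - expected) ** 2) / expected
    return chi2_pvalue_df1(stat)


def is_distorted(n_class_a, n_class_b, alpha=0.00001):
    """Flag a marker for removal when p falls below the threshold from the text."""
    return segregation_pvalue(n_class_a, n_class_b) < alpha


def thin_positions(positions, min_gap=5000):
    """Greedy thinning on one contig: keep a variant only if it lies at least
    min_gap bp from the previously kept variant (a sketch, not vcftools itself)."""
    kept = []
    for pos in sorted(positions):
        if not kept or pos - kept[-1] >= min_gap:
            kept.append(pos)
    return kept


if __name__ == "__main__":
    print(is_distorted(60, 40))   # mild deviation from 1:1
    print(is_distorted(90, 10))   # strong deviation from 1:1
    print(thin_positions([100, 3000, 5100, 9000, 10100]))
```

With the very strict threshold used here (p < 0.00001), only strongly distorted markers are removed at this stage; milder distortion is handled later, after binning.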
We constructed linkage maps in R (version 3.6.1) [58] with the package OneMap (version 2.1.1) [59]. Variants with identical genotypes in the F1 animals were binned into single markers for linkage mapping with the function find_bins(). After binning, we recalculated segregation distortion using a Bonferroni-corrected p-value of 0.05 and removed any remaining distorted markers. This resulted in a set of 977 markers for the maternal genome and 487 for the paternal genome. Two-point tests were used to calculate recombination fractions and LOD scores for each pair of markers. Linkage groups among the non-distorted markers were then identified using a maximum recombination fraction (rf) of 0.4 and a minimum LOD score determined by the OneMap function suggest_lod() (6.14 for the female dataset and 5.56 for the male dataset). In both cases, 15 linkage groups were obtained.
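As an illustration of what a two-point test computes, the following Python sketch estimates the recombination fraction and LOD score for one pair of markers and applies the thresholds quoted above. It is a simplified stand-in, not OneMap's implementation: it assumes testcross-type markers coded as strings of "0" and "1" with no missing data, and it resolves the unknown linkage phase by choosing the phase with fewer recombinants. All function names are hypothetical.

```python
import math


def _count_log(count, prob):
    """count * log10(prob), with the convention that 0 * log10(0) = 0."""
    return count * math.log10(prob) if count > 0 else 0.0


def two_point(geno_a, geno_b):
    """Return (rf, lod) for two markers scored in the same individuals.

    geno_a, geno_b: equal-length strings of '0'/'1' (assumed coding).
    """
    if len(geno_a) != len(geno_b):
        raise ValueError("markers must be scored in the same individuals")
    n = len(geno_a)
    mismatches = sum(a != b for a, b in zip(geno_a, geno_b))
    # Phase is unknown, so take the phase that implies fewer recombinants.
    recombinants = min(mismatches, n - mismatches)
    rf = recombinants / n
    # LOD = log10( L(rf) / L(0.5) )
    lod = (n * math.log10(2)
           + _count_log(recombinants, rf)
           + _count_log(n - recombinants, 1.0 - rf))
    return rf, lod


def is_linked(rf, lod, max_rf=0.4, min_lod=6.14):
    """Apply the rf and LOD thresholds (defaults: female dataset values)."""
    return rf <= max_rf and lod >= min_lod


if __name__ == "__main__":
    a = "0101" * 6
    b = a[:2] + "10" + a[4:]          # two individuals differ from marker a
    rf, lod = two_point(a, b)
    print(round(rf, 4), round(lod, 3), is_linked(rf, lod))
```

In this toy example with 24 individuals, two recombinants are enough to drop the LOD score below the threshold, which shows how strongly the evidence for linkage depends on sample size.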
To order markers within each linkage group, the function order_seq() was used. This function selects an initial set of five markers and applies an exhaustive search to determine the order with the highest likelihood. The remaining markers are then added to this framework map one by one, each at the position that optimizes the total LOD score of the growing map. Recombination fractions were converted to centimorgan (cM) units using the Kosambi map function.
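For reference, the Kosambi map function converts a recombination fraction r into a map distance d = 25 ln((1 + 2r) / (1 - 2r)) cM. The short Python sketch below implements this standard formula and its inverse; the function names are hypothetical and the code is independent of OneMap.

```python
import math


def kosambi_cm(rf):
    """Convert a recombination fraction (0 <= rf < 0.5) to Kosambi centimorgans."""
    if not 0.0 <= rf < 0.5:
        raise ValueError("rf must satisfy 0 <= rf < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * rf) / (1.0 - 2.0 * rf))


def kosambi_rf(distance_cm):
    """Inverse of the Kosambi function: centimorgans back to recombination fraction."""
    return 0.5 * math.tanh(distance_cm / 50.0)


if __name__ == "__main__":
    for rf in (0.01, 0.1, 0.3):
        print(rf, round(kosambi_cm(rf), 3))
```

For small recombination fractions the distance in cM is close to 100r, and it grows faster than 100r as r approaches 0.5, reflecting the correction for undetected double crossovers.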
Most of the initial maps contained pairs of markers that OneMap placed within 0.0001 cM of each other. On closer inspection, these proved to be markers located on the same contig in the reference genome whose alternative alleles were in opposite phase to one another. Because such markers were essentially redundant, we removed them from the maps. To do this, we identified them in the initial maps with a custom Perl script (identify_redundant_markers.pl) and then removed them using the drop_marker() function in OneMap.
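The kind of check described above can be sketched in Python as follows. This is not the authors' identify_redundant_markers.pl, whose logic is not given here. The sketch assumes a map represented as a list of (marker ID, contig, position in cM) tuples sorted by position, and binary-coded genotype strings for the phase check; both representations and all names are hypothetical.

```python
def find_redundant_pairs(ordered_map, max_gap_cm=0.0001):
    """Return adjacent marker pairs on the same contig within max_gap_cm.

    ordered_map: list of (marker_id, contig, position_cm), sorted by position.
    """
    pairs = []
    for left, right in zip(ordered_map, ordered_map[1:]):
        id1, contig1, pos1 = left
        id2, contig2, pos2 = right
        if contig1 == contig2 and (pos2 - pos1) <= max_gap_cm:
            pairs.append((id1, id2))
    return pairs


def is_opposite_phase(geno_a, geno_b):
    """True when two '0'/'1' genotype strings are exact complements."""
    return len(geno_a) == len(geno_b) and all(
        a != b for a, b in zip(geno_a, geno_b)
    )


if __name__ == "__main__":
    example_map = [
        ("m1", "contig_1", 0.0),
        ("m2", "contig_1", 0.0),
        ("m3", "contig_2", 0.0),
        ("m4", "contig_2", 3.2),
        ("m5", "contig_2", 3.2),
    ]
    print(find_redundant_pairs(example_map))
    print(is_opposite_phase("0011", "1100"))
```

The sketch only reports candidate pairs; which member of each pair to drop is left to the user.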
A recombination fraction plot was then generated with the function rf_graph_table() and visually inspected to identify misplaced markers. These were removed from the map and re-inserted with the try_seq() function. Markers that could not be confidently placed were removed entirely from the final maps. Summary statistics for each map were calculated using the Genetic Map Comparator [61]. The “unbinned” maps were created with the custom Perl script unbin_markers_in_map.pl.