To detect small sequencing errors (single nucleotides and indels of 1–4 bases) in the newly assembled reference genome, reads generated with the Illumina Genome Analyzer II/IIx were used. The genomes of two different individuals of Nipponbare were independently re-sequenced at NIAS and CSHL. Low quality bases (http://hannonlab.cshl.edu/fastx_toolkit/). Reads shorter than 32 bp were excluded from further analyses. If only one read of a paired-end read set was discarded in these preprocessing steps, the other read was treated as a single-end read and named "unpaired". All qualified reads were aligned to the reference genome using BWA v0.5.8a with default options (Li and Durbin 2009 (link)). The NIAS single-end reads and CSHL unpaired reads were aligned in single-end mode using the BWA command "samse". The CSHL paired-end reads were aligned in paired-end mode using the BWA command "sampe". Reads that mapped to multiple genomic positions were discarded. A pileup alignment file of all uniquely mapped reads with a mapping quality value of ≥20 was generated using SAMtools v0.1.8 (Li et al. 2009 (link)). To avoid erroneous detection of variants, only sites with a read depth of 10 or more were selected.
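The handling of read pairs after the length filter can be illustrated with a minimal Python sketch. It models only the 32 bp length cutoff, not the quality trimming, and the function name `split_pairs` and the in-memory representation of reads as strings are assumptions made for illustration; they are not taken from the original pipeline.

```python
MIN_LENGTH = 32  # reads shorter than this are discarded (threshold from the text)


def split_pairs(pairs, min_length=MIN_LENGTH):
    """Separate trimmed read pairs into intact pairs and "unpaired" survivors.

    `pairs` is an iterable of (read1, read2) sequence strings.
    This helper is hypothetical and only mirrors the rule described above.
    """
    paired, unpaired = [], []
    for read1, read2 in pairs:
        keep1 = len(read1) >= min_length
        keep2 = len(read2) >= min_length
        if keep1 and keep2:
            paired.append((read1, read2))  # aligned later in paired-end mode
        elif keep1:
            unpaired.append(read1)  # mate was discarded: treat as single-end
        elif keep2:
            unpaired.append(read2)
        # if both reads are too short, the whole pair is dropped
    return paired, unpaired
```

Under this sketch, the `paired` list corresponds to reads aligned in paired-end mode and the `unpaired` list to reads aligned in single-end mode.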
By comparing the Illumina reads with the reference genome, each aligned site was first classified into one of four categories, "reference type (R)", "non-reference type (N)", "allelic (A)", and "low depth (L)", for each of three sets (NIAS, CSHL and NIAS + CSHL) (Additional file 7). If a site was covered by fewer than 10 reads, it was classified as "low depth (L)", meaning that the site could not be assessed because of low sampling. If ≥80% of the reads were identical to the reference base, the site was classified as "reference type (R)". If ≥80% of the reads were discordant with the reference base, the site was classified as "non-reference type (N)". If there were two alleles, each with ≥40% read support, the site was classified as "allelic (A)". Because there were two data sets, from NIAS and CSHL, the classifications of the three sets (NIAS, CSHL and NIAS + CSHL) were combined and re-examined to decide the genotype for each site (Additional file 7): "reference type", "sequencing error" (Additional file 9), "alleles between individuals" (Additional file 10), "alleles within individuals" (Additional file 11), and "low depth". SNPs classified as allelic variations were annotated based on the RAP-DB gene models using SnpEff v. 3.1 (Cingolani et al. 2012 (link)) (Additional file 12).
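The per-site classification rules can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name `classify_site` is hypothetical, the rules are checked in the order they are listed in the text, "discordant" is read as the combined fraction of all non-reference reads, and sites that satisfy none of the rules return `None` because the text does not say how they were handled.

```python
from collections import Counter

MIN_DEPTH = 10    # minimum read depth required to assess a site
MAJOR_PCT = 80    # percent of reads needed for R or N
ALLELE_PCT = 40   # percent of reads needed for each of two alleles


def classify_site(ref_base, read_bases):
    """Return "L", "R", "N", "A", or None for one aligned site.

    `read_bases` holds the base observed in each read covering the site.
    Integer arithmetic is used so that exact thresholds (e.g. 8 of 10 reads)
    are not affected by floating-point rounding.
    """
    depth = len(read_bases)
    if depth < MIN_DEPTH:
        return "L"  # low depth: too few reads to assess the site
    counts = Counter(read_bases)
    ref_count = counts[ref_base]
    if ref_count * 100 >= MAJOR_PCT * depth:
        return "R"  # reference type
    if (depth - ref_count) * 100 >= MAJOR_PCT * depth:
        return "N"  # non-reference type (assumption: all non-reference reads pooled)
    supported = [b for b, n in counts.items() if n * 100 >= ALLELE_PCT * depth]
    if len(supported) == 2:
        return "A"  # allelic: two alleles, each with at least 40% support
    return None  # assumption: outcome for remaining sites is not given in the text
```

For example, a site with reference base `A` covered by five `A` reads and five `G` reads would be classified as allelic under these assumptions, whereas the same site covered by only nine reads would be classified as low depth.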
The genome of the same NIAS individual used in the Illumina re-sequencing was sequenced using the Roche GS FLX platform. Low quality bases (http://www.repeatmasker.org/) with the MIPS Repeat Element Database (mips-REdat) version 4.3 (http://mips.helmholtz-muenchen.de/plant/genomes.jsp; Spannagl et al. 2007 (link)) and the Triticeae Repeat Sequence Database release 10 (http://wheat.pw.usda.gov/ITMI/Repeats/). All preprocessed reads were aligned to the reference genome using Megablast (version 2.2.24) with the following options: -F 'm D' -U T -e 1e-10 (Zhang et al. 2000 (link)).