Tag libraries for seed samples from the seven developmental stages were prepared in parallel using an Illumina gene expression sample preparation kit and sequenced, without replicates, on the Illumina GAII platform at BGI-Shenzhen (http://en.genomics.cn/navigation/index.action) (Methods S1). A preprocessed database of all possible CATG+17-nucleotide tag sequences was created from our genome reference database. Further information on the genomic sequences and the predicted protein-encoding genes is available at ftp://Jatropha:9uebluesrjd7@ftp.genomics.org.cn and in the NCBI nucleotide database (Project ID: 63485). For annotation, all tags were mapped to the reference sequences, allowing no more than one nucleotide mismatch per tag. Tags that mapped to reference sequences from multiple genes were filtered out, and the remaining tags were defined as unambiguous tags. We generated between 3.26 and 6.18 million raw tags for each of the seven samples (Table S5). After removal of low-quality reads, the total number of tags per library ranged from 3.03 to 6.07 million, and the number of distinct tag sequences ranged from 83,820 to 167,765 (Table S5). For gene expression analysis, the number of expressed tags was calculated and then normalized to TPM (number of transcripts per million tags) [27].
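The unambiguous-tag filter and the TPM normalization described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the input layout (a mapping from tag sequence to the set of genes it hits) and the choice of the library's total clean-tag count as the TPM denominator are all assumptions made for the example.

```python
def unambiguous_tags(tag_to_genes):
    """Keep only tags that map to exactly one gene (hypothetical input layout)."""
    return {
        tag: next(iter(genes))
        for tag, genes in tag_to_genes.items()
        if len(genes) == 1
    }


def tags_to_tpm(gene_tag_counts, total_clean_tags):
    """Scale per-gene tag counts to transcripts per million (TPM) tags.

    Assumption: the denominator is the total number of clean tags
    in the library, which the text does not state explicitly.
    """
    if total_clean_tags <= 0:
        raise ValueError("total_clean_tags must be positive")
    return {
        gene: count * 1_000_000 / total_clean_tags
        for gene, count in gene_tag_counts.items()
    }


# Hypothetical mapping result: one tag hits two genes and is discarded.
mapping = {
    "CATGAAACCCGGGTTTAAACC": {"geneA"},
    "CATGTTTGGGCCCAAATTTGG": {"geneA", "geneB"},
}
kept = unambiguous_tags(mapping)

# Hypothetical counts for a library with 3,000,000 clean tags.
tpm = tags_to_tpm({"geneA": 150, "geneB": 30}, total_clean_tags=3_000_000)
```

Because TPM rescales every library to the same nominal depth, expression values from libraries of different sizes (here 3.03 to 6.07 million clean tags) become directly comparable.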
A 3×3 self-organizing map (SOM) of the gene expression data was constructed using GeneCluster 2.0 [84] (http://www.broadinstitute.org/cancer/software/genecluster2/gc2.html) with a variation filter (Max/Min ≥ 5) to eliminate genes whose expression did not change significantly across samples, and with normalization of the data to mean = 0 and variance = 1. The SOM algorithm was run with a desired cluster range of 3–9, and all other parameters were left unchanged: 50,000 iterations, a seed range of 42, initialization of centroids to random vectors, a bubble neighborhood, initial and final learning weights of 0.1 and 0.005, and initial and final sigmas (which determine the size of the update neighborhood of a centroid) of 5 and 0.5, respectively.
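The preprocessing applied before SOM clustering (variation filter, then normalization) can be illustrated with the sketch below. It does not reproduce GeneCluster 2.0's code. The function names, the `floor` parameter that guards against division by zero, the per-gene (row-wise) standardization and the use of population variance are assumptions made for the example.

```python
from statistics import mean, pstdev


def passes_variation_filter(profile, min_fold=5.0, floor=1.0):
    """Return True if Max/Min across samples is at least min_fold.

    Assumption: values below `floor` are raised to `floor` so that a
    zero TPM value does not cause a division by zero.
    """
    clipped = [max(value, floor) for value in profile]
    return max(clipped) / min(clipped) >= min_fold


def standardize(profile):
    """Rescale one gene's profile to mean 0 and (population) variance 1."""
    mu = mean(profile)
    sd = pstdev(profile)
    if sd == 0:
        return [0.0 for _ in profile]
    return [(value - mu) / sd for value in profile]


# Hypothetical TPM profiles across the seven developmental stages.
profiles = {
    "geneA": [10.0, 12.0, 11.0, 13.0, 12.0, 10.0, 11.0],  # nearly flat
    "geneB": [2.0, 5.0, 20.0, 80.0, 40.0, 10.0, 3.0],     # strongly variable
}

filtered = {
    gene: standardize(profile)
    for gene, profile in profiles.items()
    if passes_variation_filter(profile)
}
```

In this example geneA is dropped because its Max/Min ratio is 1.3, while geneB (ratio 40) is retained and standardized, so that clustering would group genes by the shape of their expression profile rather than by absolute abundance.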