We constructed paired-end DNA libraries with insert sizes larger than 2 kb by circularizing DNA fragments through self-ligation, which joined the two ends of each fragment. We then randomly fragmented the circularized DNA and enriched the fragments spanning the ligation junctions by biotin-streptavidin capture on magnetic beads. Sequencing followed the manufacturer's instructions (Illumina), and the fluorescence images were converted into sequence reads using the Illumina data-processing pipeline (v1.1).
The genome sequence was assembled from the short reads using SOAPdenovo [6] (http://soap.genomics.org.cn), which builds contigs from a de Bruijn graph [7]. The reads were then realigned to the contig sequences, and the paired-end relationships between reads were converted into linkages between contigs. We constructed scaffolds starting with the short-insert paired-end libraries and then iterated the scaffolding process, step by step, using libraries with progressively longer insert sizes. To fill intra-scaffold gaps, we used the paired-end information to retrieve read pairs in which one read aligned well to a contig and the other fell within the gap region, and then performed a local assembly of the collected reads.
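To make the contig-construction step concrete, the sketch below builds a de Bruijn graph from reads (nodes are (k-1)-mers, edges are k-mers) and reports contigs as maximal non-branching paths. It is a minimal illustration under simplifying assumptions, not SOAPdenovo's implementation: the function names `build_de_bruijn` and `assemble_contigs` are hypothetical, reads are assumed error-free and single-stranded (no reverse complements), and the error handling that real assemblers perform, such as removing tips and merging bubbles, is omitted.

```python
from collections import defaultdict

# Hypothetical sketch of de Bruijn contig construction; not SOAPdenovo's code.


def build_de_bruijn(reads, k):
    """Build a graph whose nodes are (k-1)-mers and whose edges are k-mers.

    Returns (successors, indegree). Each distinct k-mer adds one edge
    prefix -> suffix; repeated k-mers are collapsed (coverage is ignored).
    """
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    successors = defaultdict(set)
    indegree = defaultdict(int)
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        successors[prefix].add(suffix)
        indegree[suffix] += 1
        indegree[prefix] += 0  # make sure every node has an entry
    return successors, indegree


def assemble_contigs(reads, k):
    """Return contigs as the maximal non-branching paths of the graph.

    Isolated cycles (every node one-in/one-out) are not reported.
    """
    successors, indegree = build_de_bruijn(reads, k)

    def one_in_one_out(node):
        return indegree[node] == 1 and len(successors.get(node, ())) == 1

    contigs = []
    for start in sorted(indegree):
        if one_in_one_out(start):
            continue  # interior node; it is reached from its path's start
        for nxt in sorted(successors.get(start, ())):
            contig = start + nxt[-1]
            # Extend while the path cannot branch or merge.
            while one_in_one_out(nxt):
                nxt = next(iter(successors[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return sorted(contigs)
```

For example, `assemble_contigs(["ATGGCGT", "GGCGTGC"], 4)` merges the two overlapping reads into the single contig `ATGGCGTGC`, whereas reads that diverge after a shared prefix break the path at the branch point, yielding several shorter contigs.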
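The scaffolding strategy (turn read-pair links into contig linkages, use short inserts first, then add longer ones) can be illustrated with a greedy sketch. Everything here is an assumption for illustration: `build_scaffolds`, its input format (each library given as an insert size plus a list of contig pairs hit by the two reads of a pair) and the `min_links` support threshold are hypothetical, and contig orientation, gap-size estimation and conflict resolution, which a real scaffolder must handle, are ignored. A union-find structure prevents cycles, and capping each contig at two neighbours keeps every scaffold linear.

```python
from collections import Counter, defaultdict

# Hypothetical greedy scaffolder; orientation and gap sizes are ignored.


def build_scaffolds(contigs, libraries, min_links=3):
    """Chain contigs into linear scaffolds from read-pair links.

    contigs: iterable of contig names.
    libraries: list of (insert_size, pairs); pairs are (contig_a, contig_b)
        tuples for read pairs whose two reads mapped to different contigs.
    """
    parent = {c: c for c in contigs}

    def find(c):  # union-find root lookup with path halving
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    neighbours = defaultdict(list)
    # Short inserts first, then progressively longer ones.
    for _, pairs in sorted(libraries, key=lambda lib: lib[0]):
        support = Counter(tuple(sorted(p)) for p in pairs if p[0] != p[1])
        # Within a library, try the best-supported links first.
        for (a, b), n in sorted(support.items(), key=lambda kv: (-kv[1], kv[0])):
            if n < min_links:
                continue  # too few read pairs to trust this link
            if len(neighbours[a]) >= 2 or len(neighbours[b]) >= 2:
                continue  # keep scaffolds linear
            ra, rb = find(a), find(b)
            if ra == rb:
                continue  # already in one scaffold; joining would form a cycle
            parent[ra] = rb
            neighbours[a].append(b)
            neighbours[b].append(a)

    # Walk each chain from one of its ends (degree 0 or 1).
    scaffolds, seen = [], set()
    for c in sorted(parent):
        if c in seen or len(neighbours[c]) == 2:
            continue
        chain, prev, cur = [], None, c
        while cur is not None:
            chain.append(cur)
            seen.add(cur)
            nxt = [n for n in neighbours[cur] if n != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
        scaffolds.append(chain)
    return scaffolds
```

Processing libraries in order of insert size means a long-insert link is only accepted where short-insert evidence has left a contig end free, which mirrors the step-by-step order described in the text.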
Known transposable elements were identified using RepeatMasker (version 3.2.6) [14] against the Repbase [31] transposable element library (version 2008-08-01), and highly diverged transposable elements were identified with RepeatProteinMask [14] by aligning the genome sequence to a curated set of transposable-element-related proteins. A de novo panda repeat library was constructed using RepeatModeler [14]. For evidence-based gene prediction, human and dog genes (Ensembl release 52) were projected onto the panda genome, and gene loci were defined using both sequence similarity and whole-genome synteny information.
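The repeat annotation above draws on three sources (RepeatMasker with Repbase, RepeatProteinMask, and the de novo library built with RepeatModeler). A common way to summarize such results is to merge overlapping hits so that bases flagged by more than one method are counted once. The text does not describe how the outputs were combined, so the following is a hypothetical sketch: `merge_intervals`, `repeat_fraction` and their input format (half-open `(seq_id, start, end)` intervals keyed by an arbitrary source label) are assumptions, not the authors' pipeline.

```python
from collections import defaultdict

# Hypothetical sketch: non-redundant repeat coverage from several annotators.


def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend current block
        else:
            merged.append([start, end])  # start a new block
    return [tuple(iv) for iv in merged]


def repeat_fraction(annotations, seq_lengths):
    """Fraction of all bases covered by at least one repeat annotation.

    annotations: dict source_label -> list of (seq_id, start, end).
    seq_lengths: dict seq_id -> sequence length.
    """
    by_seq = defaultdict(list)
    for hits in annotations.values():
        for seq_id, start, end in hits:
            by_seq[seq_id].append((start, end))
    covered = sum(end - start
                  for ivs in by_seq.values()
                  for start, end in merge_intervals(ivs))
    return covered / sum(seq_lengths.values())
```

Merging per sequence before summing is what prevents a base masked by both RepeatMasker and RepeatProteinMask from being counted twice.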
De novo gene prediction was performed using Genscan [16] and Augustus [17]. A reference gene set was created by merging the gene sets from all of these approaches. The sequencing reads were mapped onto the panda genome sequence using SOAPaligner [8], and heterozygous SNPs were identified with SOAPsnp [9].
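SOAPsnp calls genotypes within a Bayesian framework. The sketch below shows the basic idea for a single diploid site: compute the likelihood of the observed bases under each candidate genotype, weight it by a prior, and report the genotype with the highest posterior; a heterozygous SNP is a call whose two alleles differ. It is a deliberately simplified model, not SOAPsnp's: it assumes one uniform sequencing error rate instead of per-base quality information, ignores genotypes with two non-reference alleles, and the function name `call_genotype` and the default prior values are hypothetical placeholders.

```python
import math
from collections import Counter

BASES = "ACGT"

# Hypothetical simplified Bayesian genotype caller; not SOAPsnp's model.


def call_genotype(ref, pileup, error_rate=0.01, het_prior=1e-3, hom_alt_prior=5e-4):
    """Return (genotype, posterior) for one diploid site.

    ref: reference base; pileup: string of read bases observed at the site.
    Priors (placeholders) cover hom-ref, ref/alt het and hom-alt genotypes.
    """
    counts = Counter(pileup)
    priors = {(ref, ref): 1.0 - het_prior - hom_alt_prior}
    for alt in BASES:
        if alt != ref:
            priors[tuple(sorted((ref, alt)))] = het_prior / 3
            priors[(alt, alt)] = hom_alt_prior / 3

    def base_prob(observed, allele):
        # Uniform error model: wrong bases are equally likely.
        return 1.0 - error_rate if observed == allele else error_rate / 3

    log_post = {}
    for (a1, a2), prior in priors.items():
        # Each read samples either chromosome with probability 1/2.
        loglik = sum(n * math.log(0.5 * base_prob(b, a1) + 0.5 * base_prob(b, a2))
                     for b, n in counts.items())
        log_post[(a1, a2)] = math.log(prior) + loglik

    # Normalise in log space to avoid underflow at high coverage.
    top = max(log_post.values())
    total = sum(math.exp(v - top) for v in log_post.values())
    best = max(log_post, key=log_post.get)
    return best, math.exp(log_post[best] - top) / total
```

Here `call_genotype("A", "A" * 10 + "G" * 10)` returns the heterozygous genotype `("A", "G")` with a posterior close to 1, while a pileup of 20 `C`s at a `C` reference site stays homozygous reference.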
Li R., Fan W., Tian G., Zhu H., He L., Cai J., Huang Q., Cai Q., Li B., Bai Y., Zhang Z., Zhang Y., Wang W., Li J., Wei F., Li H., Jian M., Li J., Zhang Z., Nielsen R., Li D., Gu W., Yang Z., Xuan Z., Ryder O.A., Leung F.C., Zhou Y., Cao J., Sun X., Fu Y., Fang X., Guo X., Wang B., Hou R., Shen F., Mu B., Ni P., Lin R., Qian W., Wang G., Yu C., Nie W., Wang J., Wu Z., Liang H., Min J., Wu Q., Cheng S., Ruan J., Wang M., Shi Z., Wen M., Liu B., Ren X., Zheng H., Dong D., Cook K., Shan G., Zhang H., Kosiol C., Xie X., Lu Z., Zheng H., Li Y., Steiner C.C., Lam T.T., Lin S., Zhang Q., Li G., Tian J., Gong T., Liu H., Zhang D., Fang L., Ye C., Zhang J., Hu W., Xu A., Ren Y., Zhang G., Bruford M.W., Li Q., Ma L., Guo Y., An N., Hu Y., Zheng Y., Shi Y., Li Z., Liu Q., Chen Y., Zhao J., Qu N., Zhao S., Tian F., Wang X., Wang H., Xu L., Liu X., Vinar T., Wang Y., Lam T.W., Yiu S.M., Liu S., Zhang H., Li D., Huang Y., Wang X., Yang G., Jiang Z., Wang J., Qin N., Li L., Li J., Bolund L., Kristiansen K., Wong G.K., Olson M., Zhang X., Li S., Yang H., Wang J., & Wang J. (2009). The sequence and de novo assembly of the giant panda genome. Nature, 463(7279), 311-317.