Protein disorder was annotated using DISOPRED [12], with default parameters trained to give a 5% false positive rate. The total fraction of predicted protein disorder in a CB region is given by the D value. Coiled coils were identified with the program MULTICOIL [17], using default parameters. Known protein domains were assigned by BLAST searches (e-value ≤ 0.01) against the ASTRAL 40% identity protein domain sequence set [13,18]. Types of biased region that map to repetitive zinc-finger-containing proteins (> 0.5 of the length of the protein) were numerous and were additionally filtered out.
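As an illustration of the D value described above, the following is a minimal sketch that computes the fraction of residues predicted as disordered within a region. It assumes the per-residue disorder calls have already been parsed into a list of booleans; the function name `disorder_fraction` and the half-open `start`/`end` indexing are hypothetical choices made for this example and are not taken from DISOPRED or from the original analysis.

```python
def disorder_fraction(disordered, start, end):
    """Return the fraction of residues in [start, end) flagged as disordered.

    `disordered` is an assumed input: one boolean per residue, True where
    the residue is predicted to be disordered. Indices are 0-based and the
    interval is half-open, which is a convention chosen for this sketch.
    """
    region = disordered[start:end]
    if not region:
        raise ValueError("region is empty")
    # True counts as 1 and False as 0, so sum() gives the disordered count.
    return sum(region) / len(region)


# Hypothetical six-residue protein with a region covering the first four residues.
calls = [True, True, False, False, True, False]
d_value = disorder_fraction(calls, 0, 4)
```

Here two of the four residues in the region are flagged, so `d_value` is 0.5.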
GO (Gene Ontology; [19]) functional categories were taken from the annotation files provided on the Ensembl [16] and Gene Ontology [20] websites. Further GO term annotations were derived by mapping functional GO annotations for the PDB (downloaded from [20]) onto Ensembl protein annotations, with thresholds of 50% sequence identity and 0.8 fractional sequence coverage (for the protein domain), using alignments made by the program BLASTP (e-value ≤ 0.0001) [13]. These thresholds were benchmarked on the complete SCOP protein domain sequence database [18], to give a 2% false positive rate for GO term transfer. Significant associations between GO terms and lists of protein sequences were calculated using binomial statistics and a P'-value threshold of 0.05, where P' has been adjusted to account for multiple hypothesis testing using the Bonferroni correction. In addition, we used two functional supercategories, in which all transcription-associated GO terms and all non-transcription-associated GO terms, respectively, were pooled together. The transcription-associated GO terms are: GO:0006355; GO:0006357; GO:0006366; GO:0006367; GO:0016563; GO:0003676; GO:0003677; GO:0003700; GO:0003702; GO:0003704; GO:0003713; GO:0030374; GO:0030528.
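The binomial test with Bonferroni adjustment described above can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: it assumes a one-sided (over-representation) test, a background frequency for each GO term estimated elsewhere, and a correction factor equal to the number of terms tested. The names `binomial_upper_tail` and `enriched_terms` are hypothetical, as are the counts in the usage example.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enriched_terms(counts, n, background, alpha=0.05):
    """Return {term: adjusted P'} for terms with P' below alpha.

    counts     -- assumed input: term -> number of proteins in the list with that term
    n          -- number of proteins in the list
    background -- assumed input: term -> background frequency of the term
    """
    m = len(counts)  # number of hypotheses tested (assumed Bonferroni factor)
    significant = {}
    for term, k in counts.items():
        p_raw = binomial_upper_tail(k, n, background[term])
        p_adj = min(1.0, p_raw * m)  # Bonferroni-adjusted P'
        if p_adj < alpha:
            significant[term] = p_adj
    return significant


# Hypothetical example: 10 proteins, two GO terms, each with background frequency 0.1.
hits = enriched_terms(
    {"GO:0003677": 8, "GO:0006355": 1},
    n=10,
    background={"GO:0003677": 0.1, "GO:0006355": 0.1},
)
```

In this made-up example only GO:0003677 passes the adjusted threshold, because observing a term with background frequency 0.1 in a single protein out of ten is well within expectation.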