An automated variant filtering pipeline was used to narrow down the number of putative diagnostic variants (figure 2). First, common (>1% minor allele frequency) and non-functional (not protein-altering) variants were filtered out. Second, potentially pathogenic variants in known disease genes were selected by comparison against an in-house database, the Developmental Disorders Genotype-to-Phenotype database (DDG2P). This database includes more than 1000 genes that have been consistently implicated in specific developmental disorders and is updated regularly with newly implicated genes (table 1; appendix 1; appendix 2). Each gene in DDG2P is associated with a specific developmental phenotype or syndrome via a particular genetic mechanism (autosomal dominant, autosomal recessive, or X-linked) and mutation consequence on the gene product (loss of function, activating mutation, increased gene dosage, etc). The use of DDG2P enabled any rare variant in a known DD gene with a predictable effect on the gene product to be flagged on the basis of inheritance, genotype, and likely mutational consequence. Large, rare CNVs overlapping non-DDG2P genes were also flagged based on a series of size thresholds (>100 kb for losses and >250 kb for gains where the inheritance was either de novo or segregated with disease, and >500 kb for any genic CNV for which the inheritance was unclear).
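The CNV size thresholds described above can be illustrated with a minimal Python sketch. This is not the study's pipeline code: the `CNV` record, its field names, the inheritance labels, and the `flag_cnv` function are hypothetical, and the sketch assumes 1 kb = 1000 base pairs and that CNVs with any other inheritance status are not flagged by size alone.

```python
from dataclasses import dataclass

KB = 1_000  # assumption: 1 kb = 1000 base pairs


@dataclass
class CNV:
    """Hypothetical record for one rare CNV overlapping non-DDG2P genes."""
    size_bp: int
    kind: str         # "loss" or "gain"
    inheritance: str  # "de_novo", "segregating", or "unclear"
    genic: bool = True


def flag_cnv(cnv: CNV) -> bool:
    """Return True if the CNV exceeds the size threshold quoted in the text."""
    if cnv.kind not in ("loss", "gain"):
        raise ValueError(f"unknown CNV kind: {cnv.kind!r}")
    if cnv.inheritance in ("de_novo", "segregating"):
        # >100 kb for losses, >250 kb for gains
        threshold = 100 * KB if cnv.kind == "loss" else 250 * KB
        return cnv.size_bp > threshold
    if cnv.inheritance == "unclear":
        # >500 kb for any genic CNV with unclear inheritance
        return cnv.genic and cnv.size_bp > 500 * KB
    # Assumption: any other inheritance label is not flagged by size alone.
    return False
```

For example, a 150 kb de novo loss would be flagged under these rules, whereas a 150 kb de novo gain would not.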

Variant filtering logic for clinical reporting within the study

Genomic variants were filtered on the basis of six factors, of which the first five were automated and the final one was done manually: (1) frequency, prevalence of the variant in the general population (MAF ≤1%); (2) function, most severe predicted functional consequence, such as LOF, defined by specific sequence ontology terms (transcript ablation, splice donor variant, splice acceptor variant, stop-gained, frameshift variant, stop-lost, initiator codon variant, in-frame insertion, in-frame deletion, missense variant, transcript amplification, and coding sequence variant); (3) location, genomic location compared with DDG2P of published genes; (4) variant type, genotype (eg, heterozygous or homozygous) and loss or gain for small CNVs (losses were only considered when they contained entire genes in which LOF or dominant negative mutations had been previously reported, and gains were only considered when they overlapped genes in which increased gene dosage mutations had been previously reported); (5) inheritance, aspects of the pipeline that are dependent on inheritance information derived from parental data are shaded; and (6) phenotype, patient phenotype was manually compared against published phenotypes for a particular gene. MAF=minor allele frequency. CNV=copy number variant. LOF=loss of function. DDG2P=Developmental Disorders Genotype-to-Phenotype database.
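The first three automated factors (frequency, function, and location) can be sketched in Python as follows. This is an illustrative sketch only, not the study's implementation: the variant dictionary keys (`maf`, `consequence`, `gene`), the function name, the underscore spelling of the sequence ontology terms, and the treatment of a missing MAF as rare are all assumptions. Factors 4 to 6 (variant type, inheritance, and phenotype) are not modelled.

```python
# Sequence ontology terms listed in the caption, written in underscore
# style (the exact spelling used by the pipeline is an assumption).
PROTEIN_ALTERING = {
    "transcript_ablation", "splice_donor_variant", "splice_acceptor_variant",
    "stop_gained", "frameshift_variant", "stop_lost",
    "initiator_codon_variant", "inframe_insertion", "inframe_deletion",
    "missense_variant", "transcript_amplification", "coding_sequence_variant",
}


def passes_automated_filters(variant, ddg2p_genes, max_maf=0.01):
    """Apply factors 1-3: frequency, function, and location.

    `variant` is a hypothetical dict with keys "maf", "consequence",
    and "gene"; `ddg2p_genes` is a set of gene symbols.
    """
    maf = variant.get("maf")
    # Assumption: a variant never observed in the population counts as rare.
    rare = maf is None or maf <= max_maf
    functional = variant.get("consequence") in PROTEIN_ALTERING
    in_known_gene = variant.get("gene") in ddg2p_genes
    return rare and functional and in_known_gene


# Usage with placeholder gene symbols (not real DDG2P entries).
ddg2p = {"GENE_A", "GENE_B"}
candidates = [
    {"gene": "GENE_A", "maf": 0.0002, "consequence": "stop_gained"},
    {"gene": "GENE_A", "maf": 0.05, "consequence": "missense_variant"},
    {"gene": "GENE_C", "maf": None, "consequence": "frameshift_variant"},
    {"gene": "GENE_B", "maf": 0.001, "consequence": "synonymous_variant"},
]
kept = [v for v in candidates if passes_automated_filters(v, ddg2p)]
```

Only the first candidate survives here: the second is too common, the third lies outside the gene list, and the fourth is not protein-altering.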

Changes to DDG2P over time

| | July, 2012 | November, 2012 | July, 2013 | November, 2013 |
|---|---|---|---|---|
| Total reportable genes* | 819 | 875 | 1075 | 1128 |
| Genes added | .. | 60 | 201 | 60 |
| Genes removed | .. | 4 | 1 | 7 |

In addition to genes being added or removed, annotations for existing genes can also change (eg, to include multiple modes or mechanisms). The November, 2013 version was used for the analysis presented here and includes 1128 reportable genes. DDG2P=Developmental Disorders Genotype-to-Phenotype database.

*DDG2P also contains non-reportable categories for genes with insufficient evidence of an association with a developmental disorder (appendix 1).

The selection of variants for reporting is based on the strongest available evidence of gene function and no variants yet reported have been retracted because of changes in the DDG2P list.
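As a quick consistency check on the table of DDG2P versions, each total should equal the previous total plus genes added minus genes removed. The sketch below reads the totals as 819, 875, 1075, and 1128 and the added and removed counts as 60, 201, 60 and 4, 1, 7; the variable and function names are illustrative.

```python
versions = ["July 2012", "November 2012", "July 2013", "November 2013"]
totals = [819, 875, 1075, 1128]
added = [None, 60, 201, 60]   # no value reported for the first version
removed = [None, 4, 1, 7]


def totals_consistent(totals, added, removed):
    """Check total[i] == total[i-1] + added[i] - removed[i] for each update."""
    return all(
        totals[i - 1] + added[i] - removed[i] == totals[i]
        for i in range(1, len(totals))
    )


# Net growth between the first and last versions.
net_change = totals[-1] - totals[0]
```

Under this reading the three updates are internally consistent, and the net growth from July, 2012 to November, 2013 is 309 reportable genes (321 added, 12 removed).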
