The ICD9 coding system provides codes for diseases, signs and symptoms, injuries, poisonings, procedures and screenings. Disease or symptom codes consist of a three-digit number (termed a ‘category’) followed, in most cases, by one or two additional specifying digits. For example, the three-digit code ‘427’ specifies cardiac arrhythmias, and further digits are added to specify the type of arrhythmia, such as ‘AF’ (427.31). In most cases, physicians are required to specify codes to the fourth or fifth digit to bill the patient's insurance, although some diseases lack further specification (e.g. 042, human immunodeficiency virus). Some diseases with a common etiology span multiple ICD9 categories, depending on acute versus chronic effects, the anatomical areas affected, or disease severity and other associated events. ICD9 categories are further grouped hierarchically into sections and chapters.
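The code structure described above can be illustrated with a minimal Python sketch. The function name `split_icd9` is hypothetical and not part of the original work, and the sketch assumes purely numeric diagnosis codes written with a decimal point (it does not handle V or E codes).

```python
from typing import Tuple


def split_icd9(code: str) -> Tuple[str, str]:
    """Split a numeric ICD9 diagnosis code into (category, extra digits).

    Hypothetical helper: the three-digit category comes before the
    decimal point, and zero, one or two specifying digits come after it.
    """
    category, _, detail = code.strip().partition(".")
    if len(category) != 3 or not category.isdigit():
        raise ValueError(f"not a numeric three-digit category: {code!r}")
    if detail and (len(detail) > 2 or not detail.isdigit()):
        raise ValueError(f"expected at most two specifying digits: {code!r}")
    return category, detail


# 427.31 belongs to category 427 (cardiac arrhythmias).
print(split_icd9("427.31"))
# 042 has no further specification.
print(split_icd9("042"))
```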
Since the ICD9 terminology was designed primarily for billing and administrative functions, we developed custom ‘case groups’ of ICD9 codes to better support large-scale genomic research using ICD9 codes. In general, we used the existing three-digit categories as a guide in designing our case groups. We applied one of three operations to the original ICD9 terminology: (i) we combined three-digit codes that represented common etiologies [e.g. creating a single ‘tuberculosis’ code group from 010 to 018 (primary tuberculosis), 137 (late effects of tuberculosis) and 647.3 (tuberculosis complicating the peripartum period)]; (ii) for clinically distinct phenotypes that are combined in a single three-digit code, we divided the existing ICD9 classification (by adding a fourth digit), such as Type 1 and Type 2 diabetes (both part of ICD9 category 250); and (iii) we marked as ‘ignorable’ other ICD9 codes that were unlikely to be useful in a genetic context, such as contamination with foreign objects, non-specific signs and symptoms [e.g. 790.6 (other abnormal blood chemistry)], non-specific laboratory results, elective abortions and iatrogenic complications of medical care. In total, 395 fully specified diagnosis-related ICD9 codes were marked as ignorable and excluded from the analysis. When combining ICD9 codes from disparate parts of the code groupings (e.g. the tuberculosis group above contains codes in the ICD9 chapters ‘infectious and parasitic diseases’ and ‘complications of pregnancy, childbirth and the puerperium’), we chose the case group number most closely related to the etiology of the disease (e.g. we grouped all tuberculosis ICD9 codes under ‘010’ in the ‘infectious and parasitic diseases’ chapter of ICD9 codes).
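The three operations above can be sketched as a small mapping function. This is an illustrative sketch only, not the authors' actual mapping: the function name `case_group` and the diabetes labels ‘250.1’ and ‘250.2’ are assumptions, and only the examples named in the text are encoded. The diabetes split relies on the standard ICD9 fifth-digit convention for category 250 (1 or 3 for Type 1; 0 or 2 for Type 2 or unspecified type).

```python
from typing import Optional

# (iii) Codes marked 'ignorable'; only the example from the text is listed.
IGNORABLE_PREFIXES = ("790.6",)


def case_group(code: str) -> Optional[str]:
    """Map an ICD9 code to an illustrative case group, or None if ignorable."""
    code = code.strip()
    category, _, detail = code.partition(".")

    # (iii) Drop codes unlikely to be useful in a genetic context.
    if code.startswith(IGNORABLE_PREFIXES):
        return None

    # (i) Combine codes with a common etiology: all tuberculosis codes
    # (010-018, 137, 647.3) are grouped under '010'.
    if category.isdigit() and (10 <= int(category) <= 18 or category == "137"):
        return "010"
    if code.startswith("647.3"):
        return "010"

    # (ii) Divide a category holding distinct phenotypes. The labels
    # '250.1' and '250.2' are hypothetical case group numbers.
    if category == "250" and len(detail) == 2:
        return "250.1" if detail[1] in "13" else "250.2"

    # Default: use the existing three-digit category as the case group.
    return category


for example in ("011.9", "137.0", "647.31", "250.01", "250.00", "790.6", "427.31"):
    print(example, "->", case_group(example))
```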
In addition, we used the ICD9 coding system to generate comparison groups (‘controls’) for all case groups. The controls for a case group included all patients who did not have a prevalent ICD9 code belonging to a specified list of disease exclusions defined for that case group. The exclusions for most diseases closely followed the existing section groupings in the ICD9 hierarchy, which group related conditions. Control groups for CD, for instance, excluded CD, ulcerative colitis and several other related gastrointestinal complaints. Similarly, control groups for myocardial infarction excluded patients with myocardial infarctions, as well as angina and other evidence of ischemic heart disease. There are 105 unique control exclusion groups. The custom ICD9 case and exclusion groupings are available from http://knowledgemap.mc.vanderbilt.edu/research.
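The control definition above amounts to a set-exclusion rule, sketched below in Python. The function name, the patient identifiers and their codes are hypothetical. The exclusion list uses the ICD9 ischemic heart disease section (categories 410 to 414) purely as an illustration of the myocardial infarction example; it is not the authors' published exclusion list.

```python
from typing import Dict, Set


def select_controls(patient_groups: Dict[str, Set[str]],
                    exclusions: Set[str]) -> Set[str]:
    """Return patients with no case group in the exclusion list."""
    return {
        patient
        for patient, groups in patient_groups.items()
        if not (groups & exclusions)  # empty intersection: no excluded code
    }


# Hypothetical patients, each mapped to the case groups found in their record.
patients = {
    "p1": {"410"},         # myocardial infarction: a case
    "p2": {"413"},         # angina: neither case nor control
    "p3": {"427"},         # arrhythmia only: eligible control
    "p4": set(),           # no relevant codes: eligible control
}

# Illustrative exclusion list: the ICD9 ischemic heart disease section.
mi_exclusions = {"410", "411", "412", "413", "414"}

cases = {p for p, groups in patients.items() if "410" in groups}
controls = select_controls(patients, mi_exclusions)
print(sorted(cases), sorted(controls))
```

Under this rule a patient such as p2, who has a related condition but not the phenotype itself, is left out of both groups, which keeps likely undiagnosed or related cases out of the comparison group.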