We designed algorithms to identify incident diabetes cases in primary and secondary care data. Our focus was on type 2 diabetes, given the age of UKB participants at recruitment. To aid generalisability to the UKB population, we restricted CPRD data to people with linked secondary care data who were aged 40–69 years on 1st January 2006 (to reflect the age entry criteria for UKB). Primary care algorithms were based on four types of evidence: 1) diabetes diagnostic codes (considered separately as any diagnostic code and as the more specific C10E [type 1 diabetes] or C10F [type 2 diabetes] codes, which are required by the Quality and Outcomes Framework [QOF] system [14]); 2) diabetes medication (excluding people on metformin only, because metformin has other prescribing indications, e.g. pre-diabetes and polycystic ovarian syndrome, and is therefore not wholly diabetes specific); 3) hyperglycaemia on blood results (defined as HbA1c ≥ 6.5% or 48 mmol/mol, or fasting/random/unspecified glucose ≥ 11.1 mmol/l); and 4) presence of diabetes process of care codes (restricted to those routinely recorded for QOF monitoring purposes, e.g. retinopathy screening and foot checks). The glucose threshold was chosen because primary care records frequently do not specify whether a glucose measurement is fasting, and we wished to avoid false positives from non-fasting glucose values in the 7.0–11.1 mmol/l range. Using CPRD and the linked Welsh UKB sub-cohort, we took an iterative approach, cross-tabulating evidence at each step, to determine which logical steps to include in the algorithm and in what order. We then applied the final incidence algorithm to both databases. For CPRD, we excluded prevalent diabetes according to pre-existing C10 diabetes-specific Read codes; for the Welsh dataset, we removed all those with prevalent diabetes according to our UKB algorithm.
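As an illustration only, the hyperglycaemia and medication criteria described above could be expressed as simple evidence flags. The following is a minimal Python sketch, not the study's implementation: the function names, argument names, and the assumption that `drug_names` has already been filtered to diabetes medications are all hypothetical, and the sketch does not reproduce the ordering of steps in the final algorithm.

```python
from typing import Iterable, Optional

# Thresholds as stated in the text.
HBA1C_PERCENT_THRESHOLD = 6.5
HBA1C_MMOL_MOL_THRESHOLD = 48.0
GLUCOSE_MMOL_L_THRESHOLD = 11.1  # applied to fasting, random and unspecified glucose alike


def is_hyperglycaemic(
    hba1c_percent: Optional[float] = None,
    hba1c_mmol_mol: Optional[float] = None,
    glucose_mmol_l: Optional[float] = None,
) -> bool:
    """Return True if any supplied blood result meets its threshold."""
    if hba1c_percent is not None and hba1c_percent >= HBA1C_PERCENT_THRESHOLD:
        return True
    if hba1c_mmol_mol is not None and hba1c_mmol_mol >= HBA1C_MMOL_MOL_THRESHOLD:
        return True
    if glucose_mmol_l is not None and glucose_mmol_l >= GLUCOSE_MMOL_L_THRESHOLD:
        return True
    return False


def has_medication_evidence(drug_names: Iterable[str]) -> bool:
    """Return True if any diabetes drug other than metformin is present.

    Assumes (hypothetically) that drug_names contains only diabetes
    medications; a person on metformin alone yields False.
    """
    return any(name.strip().lower() != "metformin" for name in drug_names)
```

Using a single glucose cut-off regardless of fasting status mirrors the rationale given above: a value between 7.0 and 11.1 mmol/l is not counted because its fasting status is often unknown.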
When developing the incidence algorithms intended for secondary care data, we defined incident diabetes type based on ICD-10 codes (E10 = type 1 diabetes, E11 = type 2 diabetes, E13/E14 = unspecified diabetes). Prevalent diabetes was excluded as above.
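The ICD-10 mapping above can be sketched as a small lookup. This is a hedged Python illustration: the function name `classify_icd10` and the handling of code formats (with or without a dot, any letter case) are assumptions, and codes outside the four listed prefixes are simply returned as `None`.

```python
from typing import Optional

# Three-character ICD-10 category -> diabetes type, as listed in the text.
ICD10_DIABETES_TYPE = {
    "E10": "type 1 diabetes",
    "E11": "type 2 diabetes",
    "E13": "unspecified diabetes",
    "E14": "unspecified diabetes",
}


def classify_icd10(code: str) -> Optional[str]:
    """Map an ICD-10 code such as 'E11.9' or 'E119' to a diabetes type.

    Returns None for codes that are not in the mapping above.
    """
    category = code.strip().upper().replace(".", "")[:3]
    return ICD10_DIABETES_TYPE.get(category)
```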
For both primary and secondary care incidence algorithms, we derived event dates by taking the mid-point between the last primary care consultation or hospital admission without diabetes and the date of the first evidence of diabetes (the first diabetes Read code, ICD code, diabetes medication, or hyperglycaemic blood test, or the fifth process of care code). If there were no previous consultations or admissions, we used the UK Biobank inception date. The date of this first evidence will be made available to researchers separately, should they wish to calculate the event date in an alternative manner.
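The mid-point rule can be illustrated with a short calculation. The sketch below is a minimal Python example under stated assumptions: the function and parameter names are hypothetical, the fallback start date is passed in by the caller rather than fixed, and odd intervals are rounded down to whole days, a choice the text does not specify.

```python
from datetime import date, timedelta
from typing import Optional


def midpoint_event_date(
    first_evidence: date,
    last_contact_without_diabetes: Optional[date],
    fallback_start: date,
) -> date:
    """Estimate the event date as the mid-point of the interval.

    The interval runs from the last consultation or admission without
    diabetes (or fallback_start if there is none) to the date of the
    first evidence of diabetes.
    """
    start = (
        last_contact_without_diabetes
        if last_contact_without_diabetes is not None
        else fallback_start
    )
    if first_evidence < start:
        raise ValueError("first_evidence precedes the start of the interval")
    # Integer division rounds an odd number of days down (an assumption).
    return start + timedelta(days=(first_evidence - start).days // 2)
```

For example, a last diabetes-free consultation on 1 January 2010 and a first diabetes code on 11 January 2010 give an estimated event date of 6 January 2010.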