We entered each standardized patient vignette into each website or app and recorded the resulting diagnoses and triage advice. An author (HS) with no clinical training entered all the vignettes. To check reliability, another person without clinical training entered a random sample of 25 vignettes into the symptom checkers; the inter-rater reliability between the two in capturing the symptom checkers’ recommendations for diagnosis and triage was high (Cohen’s κ 0.90). In some cases we could not evaluate a vignette, either because the symptom checker focused only on children or only on adults, or because it did not list or ask for the key symptom in the vignette. To avoid penalizing these symptom checkers, we referred to standardized patient vignettes that successfully yielded an output as “standardized patient evaluations.”
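For readers unfamiliar with Cohen’s κ, the following minimal sketch shows how the statistic can be computed from two raters’ recordings of the same outputs. It is illustrative only and is not the analysis code used in the study; the function name `cohens_kappa` and the example labels are assumptions introduced here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category throughout.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical recordings of triage advice by two people (not study data).
first = ["emergent", "emergent", "non-emergent", "self care"]
second = ["emergent", "non-emergent", "non-emergent", "self care"]
kappa = cohens_kappa(first, second)
```

In this made-up example the raters agree on three of four items (observed agreement 0.75) and chance agreement is 5/16, so κ = 7/11, or about 0.64.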
To assess diagnostic accuracy, we noted whether the correct diagnosis was listed first or listed at all. For several vignettes, two symptom checkers presented a large number of diagnoses (as many as 99). Because such a long list of potential diagnoses is unlikely to be useful for patients, we considered a diagnosis to be listed at all only if it was within the first 20 diagnoses provided by a symptom checker. Because many patients may focus only on the top diagnoses listed, we also assessed whether the correct diagnosis was among the first three diagnoses given. We judged the diagnosis incorrect if the symptom checker indicated that the condition could not be identified.
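The scoring rules in this paragraph can be sketched as follows. This is a hedged illustration, not the study’s procedure: the function `score_diagnosis`, its field names, and the use of exact string matching to decide whether a listed diagnosis is the correct one are assumptions made for the example.

```python
MAX_LISTED = 20  # diagnoses beyond the first 20 are not counted as listed
TOP_N = 3        # cut-off for the "first three" measure


def score_diagnosis(correct, ranked):
    """Score one symptom checker output against the correct diagnosis.

    `ranked` is the list of diagnoses in the order presented; an empty list
    represents a checker that could not identify the condition.
    """
    considered = ranked[:MAX_LISTED]
    # Assumption: matching is by exact string equality.
    rank = considered.index(correct) + 1 if correct in considered else None
    return {
        "listed_first": rank == 1,
        "listed_in_top_3": rank is not None and rank <= TOP_N,
        "listed_at_all": rank is not None,
    }


# Hypothetical output: the correct diagnosis appears second.
result = score_diagnosis("appendicitis", ["gastroenteritis", "appendicitis"])
```

Here `result` records the diagnosis as not listed first, but as listed in the first three and listed at all. An empty list, or a correct diagnosis at position 21 or later, is scored as incorrect on all three measures.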
We categorized the triage advice into three groups: emergent, which included advice to call an ambulance, go to the emergency department, or see a general practitioner immediately; non-emergent, which included advice to call a general practitioner or primary care provider, see a general practitioner or primary care provider, go to an urgent care facility, go to a specialist, go to a retail clinic, or have an e-visit; and self care, which included advice to stay at home or go to a pharmacy. If multiple triage locations were suggested (for example, emergency department or specialist), we used the most urgent suggestion, because in almost all cases the most urgent triage suggestion was listed first. If a symptom checker was unable to reach a decision on diagnosis for a given standardized patient vignette but provided triage advice, we still assessed the appropriateness of this triage advice. Symptom checkers that required users to select the correct diagnosis before giving triage advice were not included in assessing the accuracy of triage, with the exception of iTriage, which always suggested emergent triage advice.
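The grouping and the "most urgent suggestion" rule can be illustrated with a short sketch. The advice strings used as dictionary keys are paraphrases of the categories described above, and the names `ADVICE_TO_CATEGORY`, `URGENCY`, and `categorize_triage` are hypothetical; the actual wording shown by each symptom checker varied.

```python
# Higher number = more urgent.
URGENCY = {"self care": 0, "non-emergent": 1, "emergent": 2}

# Illustrative advice labels (paraphrased from the text) mapped to a group.
ADVICE_TO_CATEGORY = {
    "call an ambulance": "emergent",
    "go to the emergency department": "emergent",
    "see a general practitioner immediately": "emergent",
    "call a general practitioner or primary care provider": "non-emergent",
    "see a general practitioner or primary care provider": "non-emergent",
    "go to an urgent care facility": "non-emergent",
    "go to a specialist": "non-emergent",
    "go to a retail clinic": "non-emergent",
    "have an e-visit": "non-emergent",
    "stay at home": "self care",
    "go to a pharmacy": "self care",
}


def categorize_triage(suggestions):
    """Return the most urgent triage group among the suggestions.

    Returns None when no triage advice was given. Raises KeyError for an
    advice label that is not in ADVICE_TO_CATEGORY.
    """
    if not suggestions:
        return None
    categories = [ADVICE_TO_CATEGORY[s] for s in suggestions]
    return max(categories, key=lambda c: URGENCY[c])


# The example from the text: emergency department or specialist.
group = categorize_triage(["go to a specialist", "go to the emergency department"])
```

With the example from the text, `group` is `"emergent"`, because the emergency department suggestion is more urgent than the specialist suggestion regardless of the order in which the two were listed.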