To establish summarized findings, we considered both the age and the risk of bias (methodologic quality) of each SR. Recognizing the short shelf-life of SRs, Whitlock [15] suggests that greater weight should be given to more recent SRs, with older SRs providing supporting evidence only. Since effect size was rarely reported, the outcome of interest was limited to confidence in the existence and direction of an association between a predictor and a subsequent outcome (risk of poor outcome, no association with outcome, or inconclusive). In our case, confidence in the direction of each predictor was established by first evaluating the findings from the most recent SR(s) of at least medium quality. Where multiple SRs were published on the same topic within a relatively short time span, confidence in the conclusions regarding the direction and significance of effect for each predictor was an amalgam of 1) SR quality and 2) consistency in findings across different authorship groups. For example, during the years 2007-2009, 5 SRs on prognosis following whiplash were published [3, 6-9]. In light of the different methodologies for searching and synthesizing results across the included SRs, our consistency approach can be considered analogous to triangulation for establishing trustworthy results in qualitative research [18].
Given the phrasing of each prognostic factor, in only one case was a factor described as protective (i.e., facilitating recovery): regular physical activity in the case of non-traumatic neck pain. The confidence in each association was categorized using an approach adapted from the GRADE working group [19]: high, moderate, low, or very low confidence that the direction of association is robust to findings in future research. To be conservative, high confidence was reserved for those predictors for which consistent high-quality evidence was presented in each SR, with at least 1 high-quality SR and no conflicting SRs. Moderate confidence required consistent high-level findings from at least 1 recent medium-quality SR, with the majority of findings from other concurrent SRs (where applicable) in the same direction of effect. Low confidence was assigned to a predictor when summary findings were of low-to-moderate level from the majority of SRs with some conflicting results, or when only a single SR reported significant but moderate findings for that predictor. Very low confidence was assigned when none of the above conditions were met. As a result of these decision rules, each predictor received both an estimate of its association with outcome (risk of poor outcome, no effect on outcome, inconclusive effect) and a level of confidence in that association (high, moderate, low, very low). Readers will note that it was therefore possible to be highly confident in an inconclusive result, which holds meaning for establishing research priorities but less so for clinical practice.
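For readers who wish to apply a similar grading scheme, the confidence rules described above can be sketched in code. The following Python fragment is an illustrative reading of those rules, not the procedure as actually implemented in this review. The `SRFinding` fields, the label vocabularies, and the function name `grade_confidence` are hypothetical. "Medium-quality" is assumed to mean medium quality or better, and the majority rule for low confidence is assumed to apply only when more than one SR is available.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical ordinal ranking of SR methodologic quality.
QUALITY_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class SRFinding:
    """One SR's summary statement for a single predictor (illustrative)."""
    sr_quality: str       # "low", "medium" or "high"
    evidence_level: str   # "low", "moderate" or "high"
    direction: str        # "risk", "none" or "inconclusive"
    recent: bool = True   # whether the SR counts as recent


def grade_confidence(findings):
    """Return "high", "moderate", "low" or "very low" for one predictor."""
    if not findings:
        return "very low"

    directions = Counter(f.direction for f in findings)
    top_direction, top_count = directions.most_common(1)[0]
    no_conflict = len(directions) == 1
    majority_agree = top_count > len(findings) / 2

    # High: consistent high-level evidence in every SR, at least one
    # high-quality SR, and no conflicting SRs.
    if (no_conflict
            and all(f.evidence_level == "high" for f in findings)
            and any(f.sr_quality == "high" for f in findings)):
        return "high"

    # Moderate: high-level findings from at least one recent SR of medium
    # quality (assumed: or better), with most SRs in the same direction.
    anchored = any(
        f.recent
        and QUALITY_RANK[f.sr_quality] >= QUALITY_RANK["medium"]
        and f.evidence_level == "high"
        and f.direction == top_direction
        for f in findings
    )
    if anchored and majority_agree:
        return "moderate"

    # Low: a single SR with moderate findings, or low-to-moderate findings
    # from the majority of several SRs (conflicting results tolerated).
    if len(findings) == 1 and findings[0].evidence_level == "moderate":
        return "low"
    low_mod = sum(f.evidence_level in ("low", "moderate") for f in findings)
    if len(findings) > 1 and low_mod > len(findings) / 2:
        return "low"

    # Very low: none of the above conditions were met.
    return "very low"
```

Because direction and confidence are graded separately in this sketch, a predictor whose SRs all report an inconclusive direction with high-level evidence would still be graded as high confidence, which mirrors the situation noted in the text.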
Most SRs did not attempt to stratify the prognostic ability of a variable by outcome. This is understandable considering that there is little to no consensus on the most appropriate outcome to measure in prognostic research on neck pain [20]. Further, Walton and colleagues [6] used meta-analysis to compare the magnitude of prognostic effect between symptom-related outcomes and disability-related outcomes, and showed that the magnitude of the effect was similar in almost all cases, with older age being the only notable exception. However, two SRs did present their summarized results stratified by type of outcome [5, 16]. In most cases the magnitude of association was consistent across outcomes; where it differed, the magnitude entered into the database was the best representation of the overall reported magnitude. For example, if a predictor showed a strong association with one outcome and a limited association with another, the overall strength of association for that predictor was described in the database as moderate. This occurred in only 7 of the 239 summary statements extracted; these are denoted in the supplementary tables.
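The reconciliation of differing magnitudes across outcomes can be illustrated with a small sketch. The text gives only one worked case (strong plus limited recorded as moderate), so the rule below is an assumption rather than the documented procedure: labels are placed on a hypothetical three-point ordinal scale and the mean rank is taken, with fractional means resolved toward the weaker label as an arbitrary conservative choice. The names `SCALE` and `overall_magnitude` are illustrative.

```python
# Hypothetical ordinal scale; the labels come from the example in the text,
# but treating them as equally spaced ranks is an assumption.
SCALE = ["limited", "moderate", "strong"]


def overall_magnitude(per_outcome):
    """Collapse per-outcome magnitude labels into one overall label."""
    if not per_outcome:
        raise ValueError("at least one magnitude label is required")
    # list.index raises ValueError for a label outside the scale.
    ranks = [SCALE.index(label) for label in per_outcome]
    mean_rank = sum(ranks) / len(ranks)
    # int() truncates, so fractional means fall to the weaker label (assumed).
    return SCALE[int(mean_rank)]
```

Under these assumptions, `overall_magnitude(["strong", "limited"])` returns `"moderate"`, matching the example given above, while identical labels are returned unchanged.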