Sample size estimates indicated that 220 patients per treatment group would provide >95% power for the comparison between baricitinib 4 mg and placebo in ACR20 response rate at week 12 (assumed 60% vs 35%, respectively). Randomised patients treated with ≥1 dose of study drug were included in the efficacy analyses under a modified intent-to-treat principle (the efficacy analysis set).
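The power statement above can be checked approximately with a short sketch. It assumes a two-sided normal-approximation test of two independent proportions at a significance level of 0.05; the source does not state which formula or software was used for the sample size estimate, so the function name `power_two_proportions` and the choice of approximation are illustrative assumptions only.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1: float, p2: float, n_per_group: int,
                          alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test for two proportions.

    Assumption: equal group sizes, pooled variance under the null and
    unpooled variance under the alternative (normal approximation).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    # Probability that the observed difference exceeds the critical bound
    # (the negligible opposite tail is ignored).
    return z.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)


# Assumed response rates from the text: 60% (baricitinib 4 mg) vs 35% (placebo).
print(power_two_proportions(0.60, 0.35, 220))
```

Under these assumptions the approximation gives a power above 0.95, consistent with the stated target.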
A stepwise family-based hypothesis testing strategy controlled type I error for the primary and key secondary endpoints at week 12: ACR20 response, change from baseline in HAQ-DI and DAS28-CRP, SDAI score ≤3.3, MJS duration, MJS severity, worst tiredness and worst joint pain, with corresponding hypotheses tested for baricitinib 4 or 2 mg versus placebo (see online supplementary figure S1). Only if all tests in a family were significant did the sequence proceed to the next family of tests in the hierarchy; otherwise, subsequent evaluations were considered supportive analyses, so that the procedure maintained strong control of the familywise error rate. Treatment comparisons for categorical and continuous efficacy measures were performed using logistic regression and analysis of covariance (ANCOVA), respectively, with baseline value (for continuous measures), treatment, region and centrally confirmed presence of baseline joint erosions in the model. Fisher's exact test was used for categorical safety data or when sample size requirements for the logistic regression model were not met. Continuous safety data were analysed using ANCOVA with baseline value and treatment in the model. Duration of MJS was analysed using the Wilcoxon rank-sum test. Tests were two-sided at a significance level of 0.05 unless otherwise defined by the gatekeeping procedure (see online supplementary figure S1).
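The gatekeeping logic described above can be sketched as follows. This is a minimal illustration, not the trial's actual procedure: the composition of each family and the allocation of alpha within a family are given only in supplementary figure S1, so the sketch assumes a Bonferroni split within each family, and the function name `serial_gatekeeping`, the family groupings and the p-values are all hypothetical.

```python
from typing import Dict, List


def serial_gatekeeping(families: List[Dict[str, float]],
                       alpha: float = 0.05) -> Dict[str, str]:
    """Classify each hypothesis under a serial (family-by-family) gatekeeper.

    Assumption: alpha is split equally (Bonferroni) within a family.
    The next family is tested only if every test in the current family
    is significant; later families are then labelled supportive.
    """
    results: Dict[str, str] = {}
    gate_open = True
    for family in families:
        per_test_alpha = alpha / len(family)
        all_significant = True
        for name, p_value in family.items():
            if not gate_open:
                results[name] = "supportive"
            elif p_value <= per_test_alpha:
                results[name] = "significant"
            else:
                results[name] = "not significant"
                all_significant = False
        if gate_open and not all_significant:
            gate_open = False  # close the gate for all later families
    return results


# Hypothetical families and p-values, for illustration only.
example = [
    {"ACR20, 4 mg": 0.001},
    {"HAQ-DI, 4 mg": 0.010, "DAS28-CRP, 4 mg": 0.040},
    {"SDAI <=3.3, 4 mg": 0.001},
]
for hypothesis, status in serial_gatekeeping(example).items():
    print(hypothesis, "->", status)
```

In this made-up example the second family fails as a whole because one of its tests misses the per-test threshold of 0.025, so the third family is reported as supportive even though its nominal p-value is small.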
Patients who were rescued or discontinued were defined thereafter as non-responders (non-responder imputation) for all categorical efficacy outcomes. For continuous efficacy outcomes, the last observations before rescue treatment or discontinuation were carried forward (modified last observation carried forward method). For continuous secondary efficacy measures that were included in the hierarchical testing (see online supplementary figure S1) and where discontinuation was due to an AE, the baseline observation was carried forward to the week 12 timepoint (modified baseline observation carried forward method). Linear extrapolation was used to impute missing data for analysis of the structural progression endpoint at week 24. For patients who were rescued or discontinued, baseline data and the most recent postbaseline radiographic data prior to or at initiation of rescue therapy or discontinuation were used to extrapolate week 24 scores. Analysis methods dependent upon other missing data mechanisms (eg, mixed models for repeated measures, tipping point analyses) were conducted to ensure conclusions were robust. Safety observations were analysed by assigned treatment until the time of rescue or completion of the treatment period.
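The three main imputation rules above can be illustrated with a small sketch. The function names, argument layout and example values are hypothetical and chosen only for illustration; the sketch assumes that observations after rescue or discontinuation have already been removed and that visit times are measured in weeks from baseline.

```python
from typing import Optional, Sequence


def non_responder_imputation(responded: bool,
                             rescued_or_discontinued: bool) -> bool:
    """Categorical outcomes: rescued or discontinued patients count as non-responders."""
    return responded and not rescued_or_discontinued


def last_observation_carried_forward(
        values: Sequence[Optional[float]]) -> Optional[float]:
    """Continuous outcomes: return the last non-missing value.

    Assumption: `values` is ordered by visit and contains only
    observations made before rescue or discontinuation.
    """
    for value in reversed(values):
        if value is not None:
            return value
    return None


def linear_extrapolation(baseline: float, last_score: float,
                         last_week: float,
                         target_week: float = 24.0) -> float:
    """Structural endpoint: extend the baseline-to-last-score line to the target week."""
    if last_week <= 0:
        raise ValueError("last_week must be after baseline (week 0)")
    slope = (last_score - baseline) / last_week
    return baseline + slope * target_week


# Hypothetical patient: rescued at week 16 with a radiographic score of 12.0
# after a baseline score of 10.0.
print(non_responder_imputation(True, True))
print(last_observation_carried_forward([3.0, 2.5, None, None]))
print(linear_extrapolation(10.0, 12.0, 16.0))
```

The modified baseline observation carried forward rule for discontinuations due to an AE would simply return the baseline value instead of the last observation.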