In preparation for the cluster analyses, all sleep variables were mean-centred and scaled such that higher scores indicated better sleep. We then used K-means cluster analysis, conducted with the ‘factoextra’ package in R,64 to characterize subgroups of WRAP participants (n = 619) based on their sleep variables. Cluster assignment was based on the minimum distance (the sum of the deviations across variables) of a participant from the cluster centroid. The optimal number of clusters was identified with the elbow method, based on the total within-cluster sum of squares (WSS). To characterize the sleep profile of each clustering-based subgroup of participants, the effect size (ε²) of each sleep problem used in the cluster analysis was reported in the right column of Supplementary Table 3. The relative contribution of each problem to the grouping of participants was considered large, medium or small when ε² ≥ 0.26, ε² ≥ 0.08 or ε² ≥ 0.01, respectively.65 Given the high correlation among sleep variables, we conducted preliminary cluster analyses, sequentially excluding subsets of the scales and examining fit statistics and consistency across solutions. Based on the best WSS and Calinski–Harabasz Index values, the following subset of scales was selected for the primary analyses: SPI1, SDS, ADQ, SOM, self-reported sleep duration, ESS and ISI.
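The clustering and elbow steps described above can be illustrated with a minimal sketch. It is not the R code used in the study: it is a plain-Python illustration that assumes unit-variance standardization, squared Euclidean distance and a deterministic farthest-point initialization (chosen here only for reproducibility), and the function names and toy data are hypothetical.

```python
def standardize(rows):
    """Mean-centre each column and scale it to unit variance (assumed scaling)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / (len(c) - 1)) ** 0.5
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, n_iter=100):
    """Lloyd's algorithm; returns (labels, centroids, total within-cluster SS)."""
    # Farthest-point initialization: an illustrative, deterministic choice.
    centroids = [rows[0]]
    while len(centroids) < k:
        centroids.append(
            max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids)))
    labels = None
    for _ in range(n_iter):
        # Assign each participant to the nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centroids[j]))
                      for r in rows]
        if new_labels == labels:
            break  # assignments are stable, so the solution has converged
        labels = new_labels
        # Move each centroid to the mean of its current members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    wss = sum(sq_dist(r, centroids[lab]) for r, lab in zip(rows, labels))
    return labels, centroids, wss


# Hypothetical toy data: two sleep scores for eight participants.
toy = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [0.9, 1.1],
       [5.0, 5.1], [5.2, 4.9], [4.8, 5.0], [5.1, 5.2]]
z = standardize(toy)

# Elbow method: inspect how WSS falls as the number of clusters grows.
wss_by_k = {k: kmeans(z, k)[2] for k in (1, 2, 3)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 3))
```

In this toy example the WSS drops sharply from one to two clusters and changes little afterwards, which is the "elbow" pattern used to choose the number of clusters.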
To characterize how sleep groups differed across sleep characteristics, we used chi-square tests for categorical variables and Kruskal–Wallis tests for Likert-scale variables [median (Q1–Q3) reported]. Post hoc pairwise group differences significant at an unadjusted P < 0.05 were reported.
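As an illustration of the rank-based comparison, the sketch below computes the Kruskal–Wallis H statistic with the standard tie correction. The effect-size step rests on an assumption: the text does not state how ε² was computed, so the sketch uses the common rank-based form ε² = H / (n − 1) and applies the thresholds quoted above. Function names and data are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of groups of scores."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


def epsilon_squared(h, n):
    """Assumed effect-size formula: epsilon^2 = H / (n - 1)."""
    return h / (n - 1)


def effect_label(eps2):
    """Thresholds as quoted in the text (0.26 / 0.08 / 0.01)."""
    if eps2 >= 0.26:
        return "large"
    if eps2 >= 0.08:
        return "medium"
    if eps2 >= 0.01:
        return "small"
    return "negligible"


# Hypothetical scores for three sleep groups.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)
eps2 = epsilon_squared(h, n=9)
print(round(h, 3), round(eps2, 3), effect_label(eps2))
```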
Three sensitivity analyses were conducted to investigate the consistency of sleep group assignments and to examine whether between-group patterns in our outcomes were stable across different sample selection criteria. Alternative 1: we used latent profile analysis (LPA) to characterize sleep subgroups (‘Mclust’ package in R). Briefly, LPA is a data-driven approach that uses continuous indicator variables to identify subgroups of individuals. In this statistical approach, subgroup membership is determined by examining the pattern of interrelationships among indicator variables (maximizing homogeneity within each subgroup and heterogeneity between subgroups).66 Alternative 2 (cognitively unimpaired subset only): we restricted the original set to those who were cognitively unimpaired (n = 21 with mild cognitive impairment were removed, leaving n = 598), and K-means cluster analysis was applied to this subset. Alternative 3 (expanded set with imputed ISI): as previously noted, the primary cluster analysis was based on the first visit with MOS, ESS and ISI. Since the MOS and ESS questionnaires were added to the battery several years before the ISI, we opted to enlarge ‘baseline sleep’ in sensitivity analyses to include those who had not yet completed an ISI but had completed MOS and ESS at least once. The imputation method used a person’s sleep data from both before and after the missing value. Next observation carried backward assigned the person’s next known sleep score after the missing one to the missing value. If the person had no later value, last observation carried forward, which assigns the person’s most recent previous known sleep score to the missing value, was used.67 The resulting enlarged set included n = 1237 participants.
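The imputation rule for Alternative 3 can be sketched as follows. This is a minimal illustration rather than the study's code; it assumes each person's scores are stored in chronological order with `None` marking a missing visit, and the function name is hypothetical.

```python
def impute_nocb_then_locf(scores):
    """Fill missing scores with the next known value, else the last known one.

    `scores` is one person's chronologically ordered list; None = missing.
    """
    filled = list(scores)
    for i, value in enumerate(scores):
        if value is not None:
            continue
        # Next observation carried backward: first known score after visit i.
        nxt = next((s for s in scores[i + 1:] if s is not None), None)
        # Last observation carried forward: most recent known score before i.
        prev = next((s for s in reversed(scores[:i]) if s is not None), None)
        filled[i] = nxt if nxt is not None else prev
    return filled


# Hypothetical ISI scores across five visits.
print(impute_nocb_then_locf([None, 5, None, 8, None]))
```

Because the lookups read from the original list rather than the partially filled one, each missing visit is filled only from observed scores and never from another imputed value.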