To account for differences in survey method, fish data were standardised using calibration factors to the NOAA Biogeography Program belt transect since that method had the greatest consistency with the majority of other programs34 (link) (Supplementary Table
The biomass of individual fishes was estimated using the allometric length-weight conversion: W = aTLb, where parameters a and b are species-specific constants, TL is total length (cm), and W is weight (g). Length-weight fitting parameters were obtained from a comprehensive assessment of Hawai‘i specific parameters (Donovan et al., unpublished data) and FishBase37 . Several fish species were removed from fish biomass calculations if aspects of their life history led to inaccurate counts with visual surveys, such as cryptic benthic species, nocturnal species, and pelagic schooling species. Likewise, manta rays were excluded, as their size is difficult to visually estimate and they have high biomass but are encountered infrequently. Additional methodology was developed for dealing with outliers in the fish data, accounting for extreme observations of schooling species. Extreme observations in the database were defined by calculating the upper 99.9% of all individual observations (e.g. one species, size and count on an individual transect), resulting in 26 observations out of over 0.5 million, comprised of 11 species. The distribution of individual counts in the entire database for those 11 species was then used to identify observations that fell above the 99.0% quantile of counts for each species individually. These observations were adjusted to the 99.0% quantile for analysis.
Fish and benthic assemblages were analyzed primarily at the level of functional groups. Benthic assemblages were broken into major functional groups including coral, macroalgae, turf algae, crustose-coralline algae, and other benthic cover (e.g. sponges, sand, basalt rock, recently dead coral). Other benthic cover was not broken down further due to limitations from incorporating data from different methods with different definitions for other benthic taxa. The fish assemblage was characterised into three trophic groups; herbivores, secondary consumers, and predators. Herbivores were further subdivided based on their feeding mode into browsers, grazers and scrapers following Edwards et al.38 (link), which have been suggested as important indicators of resilience on coral reefs (Supplementary Table
Before analysis, all data were fourth root transformed and centered to meet the assumptions of linear models and all variables were standardised to the same scale. The fourth root transformation was chosen because it was strong enough to meet assumptions for all variables, such that a common transformation could be used for all 10 variables.
Regimes were identified using model-based clustering with the mclust package in R44 with the 10 fish and benthic functional groups as inputs. The cluster analysis is based on a probability model where each cluster is a mixture of multivariate normal distributions composed of the densities of each component, and each observation is assigned to a cluster based on the probability of membership given the observation. The mclust function uses three strategies for defining clusters: 1) initialization of the model with model-based hierarchical clustering, 2) maximum likelihood estimation with the expectation-maximization algorithm, and 3) model selection and the number of clusters that are approximated with Bayes factors and Bayesian Information Criterion45 (link) (Supplementary Figure
The 10 multivariate benthic and fish functional groups were visualised with a non-metric multidimensional scaling plot using the metaMDS function in the vegan package in R46 . A Bray-Curtis distance matrix was created with 2-dimensions and a maximum of 50 random starts to search for a stable solution and avoid getting trapped in a local optima47 . Multivariate dispersion was also calculated for each regime and tested with an analysis of multivariate homogeneity of group dispersions with the betadispr function in the vegan package.
Coral species richness was examined across regimes with an Analysis of Variance, and contrasts and confidence intervals were calculated with Tukey’s honest significant differences where coral richness was the response variable and regimes were the explanatory variable. The community composition of corals was also examined by calculating the proportional cover of the four most abundant species within each regime, including Porites lobata, Pocillopora meandrina, Porites compressa, Montipora capitata. Coral species were also classified with a trait-based approach into competitive, stress-tolerant, generalist or weedy species following Darling et al.48 (link), with additional species specific information on bleaching tolerance, and were compared across regimes.
In the Hawaiian Islands, seascape variables such as depth, habitat complexity, and wave exposure have been shown to be important predictors of fish assemblages49 (link)–52 (link). As a result, spatial patterns across regimes were examined by comparing the proportion of each regime at each island, and by comparing the proportion of sites within a regime that were located on north, south, east, and west facing shores across islands. Additionally, depth and habitat complexity, measured as the maximum rate of change in seafloor slope (i.e. slope of slope), were calculated for each point from LiDAR derived bathymetry within a 60 m radius of each survey location53 (link), and were compared across regimes with an Analysis of Variance and post-hoc Tukey multiple comparisons with either depth or complexity as response variables and regimes as explanatory variables.
Temporal transitions between regimes were assessed by predicting the regime for each year at each site individually with the function predict.Mclust in R. Predictions were retained if there was at least a 95% probability of the regime prediction belonging to that regime, and only sites where predictions were available for at least three years between 2000 and 2016 were retained. We used an extended dataset for the transition analysis encompassing a greater number of years compared to the dataset used to characterise the regimes, which ended in 2013. The extended dataset was not used previously in characterizing the regimes in order to avoid confounding effects of coral bleaching events that occurred in 2014 and 2015, as widespread bleaching was unobserved in all previous study years. A total of 80 sites were included in the temporal data set, and patterns of regime transitions were compared by calculating the frequency of a given transition as a proportion of the total number of possible transitions (n = 261). We also tested the sensitivity of analysing data from all 80 sites compared with only analysing those with longer time series (>4 or 6 years) by calculating binomial confidence intervals for each transition in each case. These sites also tended to represent permanent monitoring stations, which allowed for testing for the sensitivity to using observations from locations that may shift spatially from year to year. Binomial confidence intervals for each transition in each case were produced with the binconf function in the Hmisc package in R54 using the Wilson interval.
Finally, we tested the hypothesis that local and global human impacts will result in some transitions among regimes being more likely than others. Each observed transition was treated as a replicate, and the geographical position was used to obtain a value for a local impact represented as human population density within a 15 km radius, and a global impact represented as the degree heating weeks at the height of the 2015 bleaching event (Supplementary Material). The probability of a given transition was estimated with a Bayesian binomial model for each variable where there were at least four occurrences for a given transition as a function of either human population density or degree heating weeks. More details of the Bayesian binomial model are in the Supplementary material.
All analyses were conducted in the R environment for statistical computing version 3.3.055 .