The data were entered into a Microsoft Excel 2007 (Microsoft Corporation, Redmond, WA) spreadsheet and exported into STATA 11.0 (Stata Corporation, College Station, TX, USA) for analysis. Point prevalence maps were developed in ArcGIS 10 (ESRI, Redlands, CA) and covariate data were extracted for each data point. Multicollinearity between the covariates was initially explored using cross-correlations, and where correlation coefficients were >0.7 only non-linearly related covariates were included in the analysis (S1 Text).
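The cross-correlation screen described above can be illustrated with a minimal sketch. It assumes Pearson correlation and flags pairs by absolute value; the function names and the covariate values (including the temperature covariate) are hypothetical and are not taken from the study data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def flag_collinear_pairs(covariates, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(sorted(covariates), 2):
        r = pearson(covariates[a], covariates[b])
        if abs(r) > threshold:  # assumption: the 0.7 cut-off applies to |r|
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical covariate values extracted at five survey points
covariates = {
    "elevation": [1200, 1500, 1800, 2100, 2400],
    "temperature": [24.0, 22.1, 20.3, 18.2, 16.0],
    "slope": [3.0, 9.0, 2.0, 8.0, 4.0],
}
flagged = flag_collinear_pairs(covariates)
```

In this toy example only the elevation and temperature pair is flagged; deciding which member of a flagged pair to retain remains a separate modelling judgement.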
Boosted Regression Tree (BRT) modelling [32,33] was used to identify the environmental factors associated with the occurrence of podoconiosis in Ethiopia. This approach has been effectively used in global mapping of dengue, LF, leishmaniasis and malaria vector mosquitos [34–37] and has superior predictive accuracy compared to other distribution models [38]. In brief, BRT modelling combines regression or decision trees and boosting in a number of sequential steps [32,33]. First, the threshold of each input variable that results in either the presence or the absence of podoconiosis is identified, allowing for both continuous and categorical variables and different scales of measurement amongst predictors [32]. Second, boosting is a machine-learning method that increases a model’s accuracy iteratively, based on the idea that it is easier to find and average many rough ‘rules of thumb’ than to find a single, highly accurate prediction rule.
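The two steps can be sketched in a few lines. This is a toy illustration only, not the implementation used in the analysis: it assumes a single covariate, one-split trees (stumps), and a squared-error loss for brevity, and the elevation values and presence/absence labels are hypothetical.

```python
def fit_stump(x, residuals):
    """Step 1: find the threshold on x that best splits the residuals."""
    best = None
    for t in sorted(set(x))[:-1]:  # dropping the maximum keeps both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1], best[2], best[3]


def boost(x, y, n_trees=50, learning_rate=0.1):
    """Step 2: add many weak stumps, each fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, residuals)
        stumps.append((t, lm, rm))
        pred = [pi + learning_rate * (lm if xi <= t else rm)
                for xi, pi in zip(x, pred)]
    return base, learning_rate, stumps


def predict(model, xi):
    """Sum the shrunken contributions of all stumps for one covariate value."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if xi <= t else rm for t, lm, rm in stumps)


# Hypothetical elevation (m) and presence (1) / absence (0) at eight sites
elevation = [500, 800, 1200, 1500, 1800, 2200, 2500, 3000]
presence = [0, 0, 0, 1, 1, 1, 1, 1]
model = boost(elevation, presence)
```

Each stump is a rough rule of thumb; the small learning rate means no single stump dominates, and the summed prediction approaches the observed pattern only after many iterations.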
BRT modelling uses data on both the presence and absence of podoconiosis. Presence was defined as an area with at least one case in the two surveys and absence as an area with no cases in either survey. A selection of 16 environmental and climate covariates was included in a single BRT model in order to explore the relative importance of each covariate in explaining the occurrence of podoconiosis in Ethiopia. Four covariates (land cover, soil type, soil texture, urban-rural classification) showed little explanatory power for the occurrence of podoconiosis (<1% of regression trees used the covariate) and were excluded. The retained covariates used to build the final model included annual precipitation, elevation, population density, enhanced vegetation index, terrain slope, distance to water bodies, silt fraction and clay fraction. In order to obtain a measure of uncertainty for the generated model, we fitted an ensemble of 120 BRT submodels to predict a set of different risk maps (each at 1 km × 1 km resolution). These were subsequently combined to produce a single mean ensemble map, and the relative importance of the predictor variables was quantified. These contributions are scaled to sum to 100, with a higher number indicating a greater effect on the response. Marginal effect curves were plotted to visualize dependencies between the probability of podoconiosis occurrence and each of the covariates. To assess the association between covariates and high-prevalence podoconiosis, the prevalence estimates were plotted against each environmental variable. This helps to identify the areas with very high prevalence and to prioritize interventions. BRT modelling and model visualization were carried out in R version 3.1.1 [39] using the packages raster [40] and dismo [41].
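How submodel outputs might be combined can be sketched as follows. The sketch assumes that each submodel yields one predicted probability per grid cell and that uncertainty is summarized by per-cell 2.5% and 97.5% quantiles across submodels; the function names and all numbers are hypothetical.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list (q in 0..100)."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def ensemble_summary(predictions):
    """Per-cell (mean, 2.5% bound, 97.5% bound) across submodel predictions."""
    summary = []
    for cell in range(len(predictions[0])):
        vals = sorted(p[cell] for p in predictions)
        mean = sum(vals) / len(vals)
        summary.append((mean, percentile(vals, 2.5), percentile(vals, 97.5)))
    return summary


def scale_importance(raw):
    """Rescale raw covariate contributions so that they sum to 100."""
    total = sum(raw.values())
    return {name: 100 * value / total for name, value in raw.items()}


# Three hypothetical submodels predicting two grid cells
predictions = [[0.2, 0.8], [0.4, 0.9], [0.3, 0.7]]
summary = ensemble_summary(predictions)

# Hypothetical raw contributions for two covariates
importance = scale_importance({"annual precipitation": 6.0, "elevation": 2.0})
```

With 120 submodels the same functions apply unchanged; only the length of `predictions` grows.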
The resulting predictive map depicts environmental suitability for the occurrence of podoconiosis. In order to convert this continuous map into a binary map outlining the limits of podoconiosis occurrence, a threshold value of suitability was determined, above which occurrence was assumed to be possible. Using the receiver operating characteristic (ROC) curve, a threshold value of environmental suitability was chosen such that sensitivity, specificity and proportion correctly classified (PCC) values were maximized. Finally, we estimated the number of individuals at risk by overlaying the binary raster dataset displaying the potential suitability for podoconiosis occurrence on a gridded population density map [26,27] and calculating the population in cells considered to be within the limits of podoconiosis occurrence. The 95% CI of the population at risk was calculated based on binary maps of the lower (2.5%) and upper (97.5%) bounds of the predicted probability of occurrence.
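The thresholding and overlay steps can be illustrated with a short sketch. The selection criterion here, the sum of sensitivity and specificity, is an assumption standing in for the joint maximization described above; cells at or above the threshold are treated as suitable, and all scores, labels and population counts are hypothetical.

```python
def choose_threshold(scores, labels):
    """Pick the suitability cut-off maximizing sensitivity + specificity."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < t and l == 0)
        j = tp / n_pos + tn / n_neg  # sensitivity + specificity
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def population_at_risk(suitability, population, threshold):
    """Sum the population of grid cells classified as suitable."""
    return sum(p for s, p in zip(suitability, population) if s >= threshold)


# Hypothetical predicted suitability and observed presence at four survey sites
threshold = choose_threshold([0.1, 0.2, 0.6, 0.8], [0, 0, 1, 1])

# Hypothetical suitability and population counts for three grid cells
at_risk = population_at_risk([0.1, 0.7, 0.9], [100, 200, 300], threshold)
```

Applying `population_at_risk` to the lower-bound and upper-bound suitability surfaces in place of the mean surface would give the corresponding interval for the population at risk.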
The performance of each sub-model was evaluated using several statistics: proportion correctly classified (PCC), sensitivity, specificity, Kappa (κ) and area under the receiver operating characteristic curve (AUC). The mean and confidence intervals for each statistic were used to evaluate the predictive performance of the ensemble BRT model. In addition to the ensemble approach to validation, an external validation was performed using data from 96 independent surveys conducted between 1969 and 2012 [6,7,9–12,42–44], which we previously identified through structured searches of the published and unpublished literature [14]. The AUC was used to assess the discriminatory performance of the predictive model, comparing the observed and predicted occurrence of podoconiosis at each historical survey. AUC values of <0.7 indicate poor discriminatory performance, 0.7–0.8 acceptable, 0.8–0.9 excellent and >0.9 outstanding discriminatory performance [45].
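For reference, the evaluation statistics can be computed as in the sketch below. It assumes binary predicted and observed classes for the confusion-matrix statistics and uses the rank-based (Mann-Whitney) formulation of the AUC; the handling of values that fall exactly on a band boundary is an assumption, and the example data are hypothetical.

```python
def confusion_metrics(pred, obs):
    """PCC, sensitivity, specificity and Cohen's kappa for binary classes."""
    tp = sum(1 for p, o in zip(pred, obs) if p == 1 and o == 1)
    tn = sum(1 for p, o in zip(pred, obs) if p == 0 and o == 0)
    fp = sum(1 for p, o in zip(pred, obs) if p == 1 and o == 0)
    fn = sum(1 for p, o in zip(pred, obs) if p == 0 and o == 1)
    n = tp + tn + fp + fn
    pcc = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "pcc": pcc,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (pcc - expected) / (1 - expected),
    }


def auc(scores, obs):
    """Probability that a random presence scores higher than a random absence."""
    pos = [s for s, o in zip(scores, obs) if o == 1]
    neg = [s for s, o in zip(scores, obs) if o == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def interpret_auc(value):
    """Map an AUC value to the bands quoted in the text."""
    if value < 0.7:
        return "poor"
    if value < 0.8:
        return "acceptable"
    if value <= 0.9:
        return "excellent"
    return "outstanding"


# Hypothetical predictions at four validation surveys
metrics = confusion_metrics([1, 1, 0, 0], [1, 0, 0, 0])
score = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
band = interpret_auc(score)
```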
Deribe K., Cano J., Newport M.J., Golding N., Pullan R.L., Sime H., Gebretsadik A., Assefa A., Kebede A., Hailu A., Rebollo M.P., Shafi O., Bockarie M.J., Aseffa A., Hay S.I., Reithinger R., Enquselassie F., Davey G., & Brooker S.J. (2015). Mapping and Modelling the Geographical Distribution and Environmental Limits of Podoconiosis in Ethiopia. PLoS Neglected Tropical Diseases, 9(7), e0003946.