The procedure builds on the method developed by Wint and Robinson [16] for disaggregating livestock statistics using environmental and other spatial data. Spatially stratified statistical regression models are developed using data from a series of sample points within each training data polygon, and these models are then applied to the entire one-kilometre resolution set of predictor variables to estimate livestock densities, disaggregated over a defined study area. For GLW 2, this basic methodology has been revised and improved in a number of ways. It was first assessed by Van Boeckel et al. [19] and Prosser et al. [18] in detailed analyses of its performance for modelling domestic ducks in Asia and poultry in China, respectively. Van Boeckel et al. (2011) examined how downscaling performance was influenced by the aggregation level of the input domestic duck data in Thailand (no data for the country, only one value for the country, administrative level 1 data, and administrative level 2 data) by comparing the predictions to actual administrative level 3 data. They found that downscaling based on the method outlined below gave relatively good results provided that training data were available at administrative level 1, and that results degraded with coarser (national-level) input data. In a separate study, Prosser et al. (2011) compared land-use based downscaling with the GLW 2 methodology for predicting chickens, ducks and geese in China, and found that land-use based methods performed less well. In an earlier study, though based on a non-stratified implementation of regression methods, Newmann et al. (2009) found comparable results between land-use and regression based downscaling [17].
In human population mapping, the AfriPop and AsiaPop databases are still largely based on land-use weighting of human population density per land use class [35]–[36], whereas new developments of the WorldPop consortium (www.worldpop.org.uk) are moving toward the use of machine learning methods such as random forest or boosted regression trees (Andy Tatem, pers. comm.).
The stratified regressions are repeated a specified number of times using random selections of pixels from which to extract dependent and independent variables (bootstraps). This produces multiple models from which the variability as well as the mean values of the model predictions can be calculated for each pixel.
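The bootstrapping step described above can be illustrated with a minimal sketch. It is not the GLW 2 implementation: the function name, the use of ordinary least squares, sampling with replacement, the sampling fraction and the number of bootstraps are all assumptions chosen for illustration. The sketch fits one regression per stratum on each random selection of sample points, applies it to the matching prediction pixels, and summarises the stack of predictions as a per-pixel mean and standard deviation.

```python
import numpy as np

def bootstrap_stratified_predict(X, y, strata, X_grid, strata_grid,
                                 n_boot=25, sample_frac=0.5, seed=0):
    """Hypothetical helper: bootstrapped, stratified least-squares prediction.

    X, y        : predictors and densities at the training sample points
    strata      : stratum label of each training sample point
    X_grid      : predictors at the pixels to be predicted
    strata_grid : stratum label of each prediction pixel
    Returns the per-pixel mean and standard deviation over bootstraps.
    """
    rng = np.random.default_rng(seed)
    # Add an intercept column to both design matrices.
    A = np.column_stack([np.ones(len(X)), X])
    A_grid = np.column_stack([np.ones(len(X_grid)), X_grid])
    preds = np.full((n_boot, A_grid.shape[0]), np.nan)

    for b in range(n_boot):
        for s in np.unique(strata):
            idx = np.flatnonzero(strata == s)
            # Assumption: draw a fixed fraction of points, with replacement,
            # but never fewer than the number of coefficients plus one.
            n = max(A.shape[1] + 1, int(sample_frac * idx.size))
            pick = rng.choice(idx, size=n, replace=True)
            coef, *_ = np.linalg.lstsq(A[pick], y[pick], rcond=None)
            # Apply this stratum's model only to pixels in the same stratum.
            mask = strata_grid == s
            preds[b, mask] = A_grid[mask] @ coef

    # Mean and variability of the model predictions for each pixel.
    return np.nanmean(preds, axis=0), np.nanstd(preds, axis=0)
```

In this sketch, pixels whose stratum has no training points are left as NaN, and the per-pixel standard deviation stands in for the "variability" mentioned in the text; other summaries, such as the coefficient of variation, could be computed from the same stack of predictions.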
Because of the sheer volume of input and covariate data, the modelling must be broken down into continental tiles. Whilst bespoke geographic tiles can be created for specific tasks or projects, there are six continental tiles that are processed independently, and the global dataset is updated every time a new continental tile is processed (see file S1).