Data. We used data from the WHO household energy database (WHO 2012b), which is a systematic compilation of nationally representative surveys or censuses and builds on earlier versions developed by the University of California, Berkeley (Smith et al. 2004). The WHO database provides estimates of the percentage of households whose primary cooking fuel is a solid fuel (coal, wood, charcoal, dung, or crop residues), a liquid fuel (kerosene), a gaseous fuel (liquefied petroleum gas, natural gas, or biogas), or electricity. About three-fourths of the data were disaggregated by individual fuel type and approximately two-thirds by urban and rural residency. These estimates do not directly include fuels used for space heating.
These survey data were obtained from a variety of sources. International multicountry surveys, specifically Macro International’s Demographic and Health Surveys (U.S. Agency of International Development 2012), UNICEF’s Multiple Indicator Cluster Surveys (UNICEF 2012), the WHO’s World Health Surveys (WHO 2012c), and the World Bank’s Living Standard Measurement Studies (World Bank 2012b), together account for 39% of data points in the database. National censuses constitute a further 18%, and other national surveys such as household, employment, living conditions, or expenditure surveys account for another 20%. The remaining 23% of data points are from other sources, including environmental and poverty assessments, MDG reports, and statistical figures provided on the websites of national statistics bureaus.
A total of 586 national country-year data points were available for modeling. These data points covered 155 countries, including 97% of all low- and middle-income countries (LMIC; defined as having < US$12,276 per capita in 2011–2012) and territories between 1974 and 2010, with at least one survey per country. Further details are available in Supplemental Material, Table S1 (http://dx.doi.org/10.1289/ehp.1205987).
Methods for modeling household SFU at the national level. The aim of the modeling was to obtain a complete set of annual trends of primary SFU by country using a transparent, reproducible model. The model should be suitable for estimating SFU for years without survey information in a particular country, and for countries without any survey data. The model should also closely follow empirical data without being unduly influenced by large fluctuations in survey estimates of SFU across adjacent countries or years. This is important because large fluctuations are unlikely in practice and generally reflect (in addition to random error) differences in survey design and conduct. In the absence of data for certain periods, we borrowed information from regional trends, assuming that fuel use patterns are likely to be similar within a region. Finally, the model should not be unduly sensitive to its specification; for example, it should not follow the trends of covariates (e.g., gross national income per capita) without compelling evidence of similar trends in SFU.
As in other work estimating household SFU (Mehta et al. 2006), SFU was assumed to be < 5% for countries that have no solid-fuel data but are classified as high-income countries according to the World Bank country classification (World Bank 2012a).
We reviewed a range of alternative modeling approaches, including a variety of linear regression models and Bayesian hierarchical/Gaussian process regression models [for details see Supplemental Material, pp. 2–3 (http://dx.doi.org/10.1289/ehp.1205987)]. We also investigated potential developmental and energy-related covariates thought to be related to household solid fuel use (e.g., gross national income per capita, the percentage of the total population living in rural areas, population density, the percentage of the total population with access to improved sanitation, and the percentage of total energy consumption from fossil fuels).
Multilevel/mixed-effects model. A multilevel nonparametric model without covariates was selected because it best fulfilled the above criteria and provided the best fit to the data based on Akaike’s information criterion (AIC), the Bayesian information criterion (BIC), and visual inspection. Modeling assumptions (linearity, normality, and homoscedasticity) were also checked by visual inspection of the residuals and were reasonably met (Goldstein 2010; Hox 2010). All surveys were included in the model [see Supplemental Material, Table S1 (http://dx.doi.org/10.1289/ehp.1205987)]. Covariates (income, percentage of rural population, population density) were evaluated but not retained because trends in some countries were rather sensitive to the particular set of covariates used. Multilevel modeling takes into account the hierarchical structure of the data; for example, survey points are correlated within countries, which are in turn clustered within regions (Goldstein 2010). When information is scarce for a particular country, regional information is used to derive estimates for that country.
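As an illustration of how AIC and BIC rank candidate models, the following minimal Python sketch computes both criteria from a log-likelihood and a parameter count. The candidate names, log-likelihoods, and parameter counts are made-up values for illustration only, not results from this analysis; only the number of observations (586) is taken from the text. How parameters are counted in a mixed-effects model is itself a modeling choice that this sketch does not address.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Number of country-year data points reported in the text.
N_OBS = 586

# HYPOTHETICAL candidates: log-likelihoods and parameter counts are invented.
candidates = {
    "spline_no_covariates": {"loglik": -1510.0, "k": 9},
    "spline_with_income": {"loglik": -1508.5, "k": 11},
}

scores = {
    name: {
        "aic": aic(c["loglik"], c["k"]),
        "bic": bic(c["loglik"], c["k"], N_OBS),
    }
    for name, c in candidates.items()
}

# Pick the candidate with the smallest value of each criterion.
best_by_aic = min(scores, key=lambda name: scores[name]["aic"])
best_by_bic = min(scores, key=lambda name: scores[name]["bic"])
```

With these invented inputs, the small gain in log-likelihood from the extra covariate does not offset the penalty for two additional parameters, so both criteria prefer the simpler candidate.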
The 155 countries were grouped into the 21 GBD regions, which are based on geographical proximity and epidemiological similarity (IHME et al. 2009). The model included hierarchical random effects for regions and countries. Time was the only explanatory variable included in the model, as both a fixed and a random effect (at country level). The time variable was centered at the year 2003 (the median date of the surveys) and transformed into a natural cubic spline to allow for nonlinearity while providing a desired degree of stability (Orsini and Greenland 2011; Peng et al. 2006). The number of knots for the spline was chosen to allow the model to adequately follow the survey point trend and avoid any unlikely fluctuation. The locations of the knots were determined by the percentiles of the independent variable (Harrell 2001). An unstructured covariance model was chosen.
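The time transformation described above can be sketched as follows. This is a minimal Python illustration of a natural (restricted) cubic spline basis in the truncated-power form given by Harrell (2001), with knots placed at percentiles of the centered time variable. The survey years, the choice of three knots, and the 10th/50th/90th percentile positions are assumptions made for the example; the text does not report the number or position of knots actually used.

```python
import numpy as np


def natural_cubic_spline_basis(x, knots):
    """Restricted cubic spline basis (truncated-power form).

    Returns an array with len(knots) - 1 columns: the linear term followed
    by len(knots) - 2 nonlinear terms. The fitted curve is constrained to
    be linear beyond the first and last knots.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    k = len(t)
    if k < 3:
        raise ValueError("at least three knots are required")

    def pos_cubed(u):
        # (u)_+^3: zero for negative u, u**3 otherwise
        return np.clip(u, 0.0, None) ** 3

    scale = (t[-1] - t[0]) ** 2  # keeps all columns on a similar scale
    last_gap = t[k - 1] - t[k - 2]
    columns = [x]
    for j in range(k - 2):
        term = (
            pos_cubed(x - t[j])
            - pos_cubed(x - t[k - 2]) * (t[k - 1] - t[j]) / last_gap
            + pos_cubed(x - t[k - 1]) * (t[k - 2] - t[j]) / last_gap
        )
        columns.append(term / scale)
    return np.column_stack(columns)


CENTER_YEAR = 2003  # centering year stated in the text

# HYPOTHETICAL survey years, for illustration only.
survey_years = np.array(
    [1985, 1990, 1995, 1998, 2000, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2010]
)
time_centered = survey_years - CENTER_YEAR

# ASSUMPTION: three knots at the 10th, 50th, and 90th percentiles.
knots = np.percentile(time_centered, [10, 50, 90])

basis = natural_cubic_spline_basis(time_centered, knots)
```

The columns of `basis` would enter the model in place of raw time. Because the basis is linear outside the boundary knots, extrapolation to years before the first or after the last survey cannot swing as sharply as it could with an unrestricted cubic polynomial.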
Using a technique of statistical simulation described by King et al. (2000), we computed the national SFU prevalence estimates and accounted for uncertainty. To capture the estimation uncertainty, we drew 1,000 times from the model parameters for the fixed effects and generated the outcome variable from each draw. We used the method described by De Onis et al. (2004) to derive regional and global prevalence confidence intervals (CIs).
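The simulation step can be sketched in Python as follows. The sketch draws 1,000 parameter vectors from a multivariate normal distribution centered on the fixed-effect estimates, computes the predicted outcome for each draw, and summarizes the draws by their mean and 2.5th/97.5th percentiles. The coefficient values, their covariance matrix, and the simple intercept-plus-linear-time design are hypothetical simplifications; the model described in the text uses spline terms and random effects, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# HYPOTHETICAL fixed-effect estimates: intercept (% SFU in 2003) and
# slope (percentage points per year), with an invented covariance matrix.
beta_hat = np.array([40.0, -0.8])
vcov = np.array(
    [
        [4.00, 0.05],
        [0.05, 0.01],
    ]
)

N_DRAWS = 1000  # number of draws stated in the text

# One row per simulated parameter vector: shape (1000, 2).
draws = rng.multivariate_normal(beta_hat, vcov, size=N_DRAWS)

# Design matrix for the years to predict, with time centered at 2003.
years = np.arange(2000, 2011)
design = np.column_stack([np.ones(len(years)), years - 2003])

# Predicted SFU (%) for every draw and year: shape (1000, 11).
simulated = draws @ design.T

# Point estimate and 95% interval per year from the simulated distribution.
point_estimate = simulated.mean(axis=0)
lower, upper = np.percentile(simulated, [2.5, 97.5], axis=0)
```

Summarizing the spread of the simulated outcomes, rather than reporting coefficient standard errors, expresses the uncertainty directly on the scale of the quantity of interest.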
We used the multilevel model for 150 countries with at least one survey data point. Regional estimates were used instead of model estimates for seven LMICs without survey data. We tested this assumption by performing out-of-sample evaluations on a truncated data set from which countries had been removed (repeated 30 times). The mean median percentage point difference between the withheld data and the regional mean was 15.8%. We performed additional out-of-sample evaluations on three truncated data sets:
a) with 20% of the country-years withheld in countries with more than one survey (repeated 30 times);
b) with the last survey withheld in countries with more than one survey; and
c) with the last 3 years (2008–2010) withheld.
The median percentage point differences between the withheld data and the model outputs were 3.7%, 3.6%, and 3.7%, respectively.
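A minimal Python sketch of this kind of out-of-sample check is given below. It shows one way to build the hold-out set for scenario (b) and to compute a median percentage point difference. All records and predictions are hypothetical, the function names are illustrative, and the use of the absolute difference is an assumption, since the text does not state how the sign of the difference was handled.

```python
import numpy as np


def last_survey_holdout(records):
    """Split (country, year, sfu_percent) records into training and held-out sets.

    For each country with more than one survey, the most recent survey is
    withheld; countries with a single survey stay entirely in training.
    """
    by_country = {}
    for record in records:
        by_country.setdefault(record[0], []).append(record)

    train, held_out = [], []
    for country_records in by_country.values():
        ordered = sorted(country_records, key=lambda r: r[1])
        if len(ordered) > 1:
            held_out.append(ordered[-1])
            train.extend(ordered[:-1])
        else:
            train.extend(ordered)
    return train, held_out


def median_abs_pp_difference(observed, predicted):
    """Median absolute difference, in percentage points (ASSUMED absolute)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.median(np.abs(observed - predicted)))


# HYPOTHETICAL survey records: (country, year, % households using solid fuels).
records = [("A", 2000, 80.0), ("A", 2005, 75.0), ("B", 2003, 20.0)]
train, held_out = last_survey_holdout(records)

# HYPOTHETICAL withheld survey values and corresponding model predictions.
withheld_values = np.array([82.0, 45.5, 12.0, 96.3, 60.1])
model_predictions = np.array([79.0, 49.0, 10.5, 95.0, 66.0])
median_difference = median_abs_pp_difference(withheld_values, model_predictions)
```

In a full evaluation, the model would be refitted on `train` and its predictions compared with the surveys in `held_out`; the random 20% scenario would repeat that cycle with a different random subset each time.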
Calculation of the population exposed. The model derives estimates of the percentage of households using solid fuels for a particular country and year. The fraction of the population exposed was assumed to be the same as the fraction of households using solid fuels. Accordingly, the SFU fraction was multiplied by the national population (United Nations 2012b) to obtain an estimate of the absolute population exposed per country. In other words, no attempt was made to adjust population estimates for variations in household size across various settings (e.g., urban vs. rural households) because such data were not consistently available.
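The calculation amounts to a single multiplication, sketched here in Python. The function name and the example figures are hypothetical and serve only to show the arithmetic.

```python
def population_exposed(sfu_percent: float, national_population: float) -> float:
    """Absolute population exposed, assuming the share of people exposed
    equals the share of households using solid fuels (no household-size
    adjustment)."""
    if not 0.0 <= sfu_percent <= 100.0:
        raise ValueError("sfu_percent must lie between 0 and 100")
    return sfu_percent * national_population / 100.0


# HYPOTHETICAL country: 40% of households use solid fuels, population 25 million.
exposed = population_exposed(40.0, 25_000_000)
```

Under these invented inputs, about 10 million people would be counted as exposed. If households using solid fuels were systematically larger than other households, this simple product would understate the exposed population.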
All analyses were conducted using Stata software (version 12; StataCorp LP, College Station, TX, USA).
Bonjour S., Adair-Rohani H., Wolf J., Bruce N.G., Mehta S., Prüss-Ustün A., Lahiff M., Rehfuess E.A., Mishra V., & Smith K.R. (2013). Solid Fuel Use for Household Cooking: Country and Regional Estimates for 1980–2010. Environmental Health Perspectives, 121(7), 784–790.