In the second stage, we conducted comprehensive literature reviews to identify deterministic factors representing local-scale gradients in pollutant concentrations associated with specific sources (i.e., highways, major roads, and gas stations). For each pollutant, we identified concentrations near these selected sources relative to local background levels and developed deterministic multipliers with distance decay rates (together referred to as gradients in this paper), which we applied to the background and regional concentrations predicted by our LUR models. All statistical analyses were conducted using SAS (version 9.1; SAS Institute Inc., Cary, NC, USA).
Air quality data. Annual average concentrations of PM2.5 (177 monitoring stations), NO2 (134 monitors), and benzene, ethylbenzene, and 1,3-butadiene (53 monitors) were calculated using data from unique NAPS monitoring sites that were operating during 2006 (see
NAPS uses several types of PM2.5 monitors, including tapered element oscillating microbalances (TEOMs), dichotomous partisol samplers (Thermo Fisher Scientific Inc., Waltham, MA, USA), and beta-attenuation mass monitors (Met One Instruments Inc., Grants Pass, OR, USA). Multiple monitors are often collocated at one location, and our comparative analysis found that TEOMs measured lower levels than the other monitor types; TEOMs are known to underestimate PM2.5 because of nitrate evaporation (Dann T, personal communication). We therefore selected other monitor types when they were available at the same location. Stations with only TEOMs were adjusted using an annual calibration between collocated dichotomous and TEOM monitors during 2006 [n = 14, dichotomous = 1.640 + 1.089 × (TEOM), R2 = 0.89, p < 0.001]. NO2, benzene, ethylbenzene, and 1,3-butadiene were measured using standard methods (NAPS 2004).
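Applying the reported calibration to a TEOM-only station amounts to a single linear transformation. The sketch below assumes the equation is applied to annual mean concentrations in µg/m³; the function name `adjust_teom` is hypothetical, and only the intercept and slope come from the text.

```python
def adjust_teom(teom_ugm3: float) -> float:
    """Convert a TEOM PM2.5 value to a dichotomous-equivalent value using the
    2006 collocated-monitor regression reported in the text:
        dichotomous = 1.640 + 1.089 * TEOM   (n = 14, R2 = 0.89)
    Assumes the input is an annual mean in ug/m3 (an assumption, not stated)."""
    intercept = 1.640
    slope = 1.089
    return intercept + slope * teom_ugm3


# Example: a TEOM-only station with an annual mean of 8.0 ug/m3
adjusted = adjust_teom(8.0)  # 1.640 + 1.089 * 8.0
print(round(adjusted, 3))  # 10.352
```

Because the slope exceeds 1 and the intercept is positive, the adjustment always raises TEOM values, consistent with TEOMs reading low.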
Predictor variables. PM2.5 and NO2 satellite data. Canada-wide concentrations of PM2.5 and NO2 were estimated using satellite atmospheric composition data combined with local, coincident scaling factors from a chemical transport model [Goddard Earth Observing System (GEOS)-Chem 2011]. Ground-level PM2.5 estimates were derived from aerosol optical depth data from the Terra satellite [National Aeronautics and Space Administration (NASA) 2011b], in combination with output from GEOS-Chem simulations to estimate the relationship between aerosol optical depth over the atmospheric column and ground-level PM2.5 (van Donkelaar et al. 2010). Ground-level NO2 concentrations were estimated from tropospheric NO2 columns retrieved from the Ozone Monitoring Instrument on the Aura satellite (NASA 2011a); GEOS-Chem was also used to calculate the relationship between the NO2 column and ground-level concentration (Lamsal et al. 2008). Both PM2.5 and NO2 were estimated at 0.1° × 0.1° resolution (approximately 10 × 10 km). PM2.5 estimates were calculated from 2001–2006 data to ensure sufficient observations. For NO2 estimates, we used data from 2005 and 2006, because Ozone Monitoring Instrument measurements began in late 2004.
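The core idea of the satellite-derived estimates can be illustrated with a per-grid-cell ratio: the chemical transport model supplies both a simulated ground-level concentration and a simulated column quantity (aerosol optical depth or tropospheric NO2 column), and their ratio scales the satellite-observed column. This is a simplified sketch under that assumption only; the published methods (van Donkelaar et al. 2010; Lamsal et al. 2008) involve additional steps not reproduced here, and the function and argument names are hypothetical.

```python
import numpy as np


def satellite_to_ground(satellite_column: np.ndarray,
                        model_ground: np.ndarray,
                        model_column: np.ndarray) -> np.ndarray:
    """Scale a satellite column quantity to a ground-level estimate per grid cell:

        ground_est = (model_ground / model_column) * satellite_column

    Cells where the modeled column is not positive are returned as NaN."""
    # Model-derived scaling factor; NaN where the ratio is undefined
    ratio = np.divide(model_ground, model_column,
                      out=np.full_like(model_ground, np.nan, dtype=float),
                      where=model_column > 0)
    return ratio * satellite_column


# Example with three hypothetical 0.1 degree cells (AOD and PM2.5 in ug/m3)
sat_aod = np.array([0.1, 0.2, 0.3])
gc_pm25 = np.array([10.0, 12.0, 5.0])
gc_aod = np.array([0.12, 0.15, 0.0])
print(satellite_to_ground(sat_aod, gc_pm25, gc_aod))
```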
Geographic data. We modeled regional pollutant variation using geographic predictor variables potentially relevant to pollutant sources, emissions, and dispersion. To capture the varying spatial influences of predictors, all variables were calculated for circular buffers with radii ranging from 50 m to 50 km. Classes of variables included population density derived from census block-face points (Statistics Canada 2006); 1-km land use classifications (Global Land Cover Characterization 2008); high-resolution (30 m) land use classifications (DMTI Spatial Inc., Markham, Ontario, Canada); sources of large industrial emissions from the Canadian National Pollutant Release Inventory (NPRI; Environment Canada 2010); small point source locations extracted from the Dun and Bradstreet (D&B) Selectory database of businesses in Canada (Hoovers, Austin, TX, USA); length of and distance to specific road classes in the DMTI Spatial road network, such as freeway, highway, major road, and minor road (DMTI Spatial Inc.); length and density of railroads; elevation; and meteorological variables (precipitation and temperature). Any geographic variable with > 30% zero values (i.e., with no predictive features in proximity to a monitor) was recoded as binary (present/absent). In total, 10 variable classes and 270 buffer-specific variables were explored in the LUR models.
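The > 30% zero-value rule can be expressed as a simple screening step over the monitor-by-variable data. This sketch assumes each buffer-specific variable is stored as an array of values across monitors and that "present" means any nonzero value; the function name and the example variable names are hypothetical.

```python
import numpy as np


def recode_sparse_predictors(predictors: dict[str, np.ndarray],
                             zero_threshold: float = 0.30) -> dict[str, np.ndarray]:
    """Recode variables with more than `zero_threshold` zero values across
    monitors as binary present (1) / absent (0); keep other variables as-is."""
    recoded = {}
    for name, values in predictors.items():
        values = np.asarray(values, dtype=float)
        zero_share = np.mean(values == 0)  # fraction of monitors with no feature
        if zero_share > zero_threshold:
            recoded[name] = (values != 0).astype(int)
        else:
            recoded[name] = values
    return recoded


# Hypothetical buffer-specific variables for five monitors
example = {
    "freeway_len_500m": [0, 0, 120.5, 0, 30.0],   # 60% zeros -> binary
    "pop_density_1km": [100, 250, 0, 80, 40],     # 20% zeros -> unchanged
}
print(recode_sparse_predictors(example))
```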
Deterministic gradients. Gradients were developed with a focus on mobile sources and gas stations. We conducted a comprehensive literature review of published studies to identify, for each source and pollutant, the distance from the source at which pollutant concentrations typically return to background levels and the expected ratio of near-source to background pollutant levels. We searched PubMed (2010), Web of Science (Thomson Reuters 2010), and Google Scholar (2010) using a range of keywords to identify studies with measurements of pollutant gradients. Studies varied widely in location, date, methods, measurement duration, number of samples, and definition of near source and background. We developed linear gradients from the steepest portion of the exponential decay curves typically reported in the literature, because the tails of the decay functions were very sensitive to local parameters. Gradients were also selected to represent Canadian conditions.
To identify the distance of each NAPS monitor from the nearest highway, major road, local urban road, and gas station, we used DMTI road network data and D&B commercial data for point sources. If a monitor was close enough to one of these features for the source to influence pollutant levels, we modified the corresponding LUR model results (not including point source industrial variables) to account for the deterministic gradients. For example, based on our review of the literature, we assumed that NO2 concentrations at the side of a highway would be 1.65 times the LUR-based background concentrations but would return to background levels 300 m from the highway. This assumption resulted in a distance decay rate of 0.33% per meter (i.e., the 65% increment above background declines linearly by 1/300 of its roadside value per meter), which was applied to the model to estimate NO2 levels within the 300-m gradient buffer.
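The NO2 highway example can be written as a linear multiplier on the LUR background estimate. The 1.65 ratio and 300-m extent come from the text; the linear form below is one reading that reproduces the stated 0.33% per meter decay of the roadside increment. The function names are hypothetical, and other source and pollutant combinations would use their own ratio and extent.

```python
def gradient_multiplier(distance_m: float,
                        near_source_ratio: float = 1.65,
                        gradient_extent_m: float = 300.0) -> float:
    """Linear near-source multiplier: near_source_ratio at the source, declining
    linearly to 1.0 (background) at gradient_extent_m and staying 1.0 beyond.
    Defaults are the NO2/highway values from the text."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if distance_m >= gradient_extent_m:
        return 1.0
    decay_rate = 1.0 / gradient_extent_m      # ~0.0033 per m, i.e., 0.33% per meter
    excess = near_source_ratio - 1.0          # increment above background (0.65)
    return 1.0 + excess * (1.0 - decay_rate * distance_m)


def adjust_lur(lur_background: float, distance_m: float) -> float:
    """Apply the gradient to an LUR background prediction (hypothetical helper)."""
    return lur_background * gradient_multiplier(distance_m)


# A monitor 150 m from a highway with an LUR background of 20 ppb NO2
print(adjust_lur(20.0, 150.0))  # about 26.5 (20 * 1.325)
```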
Model evaluation. We used three approaches for model evaluation. Given the small number of NAPS monitoring stations for PM2.5, NO2, benzene, ethylbenzene, and 1,3-butadiene, we did not withhold a percentage of monitors for independent post-model evaluation, because we wanted to capture the greatest possible range of model predictors. Therefore, we first evaluated all LUR models using a bootstrap approach to determine the sensitivity of model predictions and parameter estimates to monitor sampling. Monitors were randomly selected with replacement to form a new sample of the same size as the full data set, and variable coefficients and model R2 values were recorded for each new sample. This was repeated for 10,000 iterations to estimate 95% confidence intervals (CIs) for overall model prediction and for individual variable coefficients. Next, we conducted a leave-one-out analysis in which each LUR model was repeatedly parameterized on n − 1 data points and then used to predict the excluded monitor measurement. The mean differences between the predicted and measured values were used to estimate model error.
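The bootstrap and leave-one-out procedures can be sketched for an ordinary least-squares LUR model with a fixed set of predictors. This is an illustration in Python rather than the SAS code used in the study; it assumes percentile-based 95% intervals (the text does not specify the interval method), and all function names are hypothetical. `X` is a 2-D array of shape (monitors, variables) and `y` holds the measured annual means.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R2)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


def bootstrap_ci(X, y, n_iter=10_000, seed=0):
    """Resample monitors with replacement (same size as the full set), refit,
    and return 95% percentile intervals for the coefficients and for R2."""
    rng = np.random.default_rng(seed)
    n = len(y)
    coefs, r2s = [], []
    for _ in range(n_iter):
        idx = rng.integers(0, n, size=n)  # monitor indices drawn with replacement
        coef, r2 = fit_ols(X[idx], y[idx])
        coefs.append(coef)
        r2s.append(r2)
    coef_ci = np.percentile(np.array(coefs), [2.5, 97.5], axis=0)
    r2_ci = np.percentile(np.array(r2s), [2.5, 97.5])
    return coef_ci, r2_ci


def leave_one_out(X, y):
    """Refit on n - 1 monitors, predict the excluded monitor, and return
    (mean predicted-minus-measured difference, array of differences)."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef, _ = fit_ols(X[keep], y[keep])
        pred = coef[0] + X[i] @ coef[1:]
        errors[i] = pred - y[i]
    return errors.mean(), errors
```

Row rows of `coef_ci` give the lower and upper limits, one column per coefficient (intercept first); a smaller `n_iter` is convenient for quick checks.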
Finally, we evaluated the NO2 and benzene LUR models, with and without gradients, against independent data (35–196 monitoring sites per city) previously collected for LUR models in seven Canadian cities (for a full description of data collection and modeling, see Allen et al. 2010; Atari and Luginaah 2009; Crouse et al. 2009; Henderson et al. 2007; Jerrett et al. 2007; Su et al. 2010). Briefly, in each city, monitoring took place over a 2-week period; data from fixed-site monitors, monitoring during periods representative of yearly average concentrations, or multiple measurement periods were used to estimate yearly averages [see Supplemental Material,
Population exposure assessment. The national pollutant models were applied to each of the 478,831 Statistics Canada street block-face centroid locations in 2006 to estimate population exposures. First, we applied the LUR models to each block point to derive a unique predicted pollutant concentration for that point; each point represented the exposure of a mean of 89 individuals (SD ± 158). We used a GIS to identify the distance from each block centroid to the nearest highway, major road, local urban road, and gas station, and we adjusted the corresponding LUR model estimate when the street block point was located within an associated gradient. We then estimated population-weighted exposures to PM2.5, NO2, benzene, ethylbenzene, and 1,3-butadiene for the Canadian population as a whole, and we estimated uncertainty using the 95% confidence limits for LUR model predictions. Because there was insufficient information in the literature to examine uncertainty for specific gradients, we selected ± 50% for all gradients (values shown in
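Population-weighted exposure is the sum of each block point's predicted concentration times its population, divided by total population. The sketch below applies the same weighting to the central prediction and to the lower and upper 95% confidence limits at each point. It assumes the per-point concentrations have already been adjusted for any gradients; the function names and example values are hypothetical.

```python
import numpy as np


def population_weighted_mean(concentrations, populations):
    """Population-weighted mean: sum(c_i * p_i) / sum(p_i) over block-face points."""
    c = np.asarray(concentrations, dtype=float)
    p = np.asarray(populations, dtype=float)
    return float(np.sum(c * p) / np.sum(p))


def weighted_exposure_with_bounds(central, lower, upper, populations):
    """Weight the central predictions and the lower/upper 95% confidence
    limits identically; returns (central, lower, upper)."""
    return tuple(population_weighted_mean(v, populations)
                 for v in (central, lower, upper))


# Three hypothetical block-face points
pops = [100, 50, 50]
print(weighted_exposure_with_bounds([10, 20, 30], [8, 16, 25], [12, 24, 35], pops))
# (17.5, 14.25, 20.75)
```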