We trained a neural network on the above variables to fit PM2.5 monitoring data from the AQS network. The relationships between the input variables and PM2.5 may be highly nonlinear and involve complex interactions, and neural networks can approximate a very broad class of nonlinear functions.71,72 Details of the neural network, such as its structure and training method, are given in the Supplementary Material.
All input variables covered the entire study area, but some were unavailable in early years or had high proportions of missing values; missing values were especially common in the Terra and Aqua AOD data. To handle missing values and differing temporal coverage, we took the following steps. We used a calibration method to fill in missing values in Aqua AOD data from 2003 to 2012 and in Terra AOD data from 2001 to 2012, based on the association of GEOS-Chem outputs and land-use terms with non-missing AOD.56 For the other variables, which had low fractions of missing values, we interpolated values at grid cells with missing data. Regarding temporal coverage, GEOS-Chem outputs, land-use terms, MODIS outputs, and meteorological variables were available throughout the study period, whereas OMI data, Aqua AOD, and Terra AOD were unavailable in earlier years. For years in which one or more variables were unavailable, we fitted the model with the remaining available variables.
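The AOD gap-filling step can be illustrated with a simple calibration. This is a minimal sketch, assuming a plain linear least-squares model of AOD on GEOS-Chem outputs and land-use terms; the calibration method of ref. 56 may use a different model form. The function name `fill_missing_aod` and the array layout are hypothetical.

```python
import numpy as np

def fill_missing_aod(aod, covariates):
    """Fill NaN entries of `aod` using a linear calibration on `covariates`.

    aod:        shape (n_cells,), NaN where AOD is missing (hypothetical layout)
    covariates: shape (n_cells, n_vars), e.g. GEOS-Chem outputs and land-use terms
    Assumption: plain linear least squares; ref. 56 may use a different form.
    """
    aod = np.asarray(aod, dtype=float)
    X = np.column_stack([np.ones(len(aod)), covariates])  # add an intercept column
    observed = ~np.isnan(aod)
    # Fit the calibration only on cells where AOD was actually retrieved
    beta, *_ = np.linalg.lstsq(X[observed], aod[observed], rcond=None)
    filled = aod.copy()
    filled[~observed] = X[~observed] @ beta  # predict AOD at the missing cells
    return filled
```

Cells with retrieved AOD keep their observed values; only the gaps are replaced by calibrated estimates.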
Most previous studies used only in situ variables, i.e., values from the grid cell being modeled. However, information from neighboring cells can be informative as well. For example, nearby road density, forest coverage, and other land-use variables, as well as nearby PM2.5 measurements, either influence or correlate with local PM2.5 concentrations, so they can improve model performance. We accounted for spatial correlation by using convolutional layers in the neural network.73 A convolutional layer is computed by applying a convolution kernel to an input layer, which combines values from neighboring cells. The kernel is a function (e.g., a weighted average with Gaussian weights based on distance) that produces a scalar estimate from multidimensional inputs. A convolutional layer therefore aggregates nearby information and can represent some forms of spatial autocorrelation. We included convolutional layers of land-use terms and nearby PM2.5 measurements as additional predictor variables to account for spatial autocorrelation. Multiple convolutional layers were incorporated to allow the neural network to model more complex autocorrelation and possible interactions with other variables (Supplementary Material). In addition to nearby grid cells, observations from nearby days for the same grid cell can also be informative. To incorporate this, we first fitted a neural network and obtained an initial PM2.5 prediction. We then computed temporal convolutional layers and fitted the neural network again with them (Figure S5).
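The spatial and temporal convolutions described above can be illustrated with fixed Gaussian weights, following the text's example of a distance-based Gaussian weighted average. This sketch rests on assumptions not stated in the source: the kernel weights are fixed rather than learned, cells without a value (e.g., no monitor) are skipped, the center cell or day is excluded so that only neighbors contribute, and the temporal version is applied to one cell's series of initial predictions. The function names and the `radius` and `sigma` parameters are hypothetical.

```python
import numpy as np

def gaussian_kernel(radius, sigma):
    """(2r+1) x (2r+1) weights that decay with distance from the center cell."""
    offsets = np.arange(-radius, radius + 1)
    dy, dx = np.meshgrid(offsets, offsets, indexing="ij")
    return np.exp(-(dx**2 + dy**2) / (2.0 * sigma**2))

def spatial_convolution(grid, radius=2, sigma=1.0, exclude_center=True):
    """Gaussian-weighted average of neighboring cells; NaN cells are ignored."""
    grid = np.asarray(grid, dtype=float)
    kernel = gaussian_kernel(radius, sigma)
    if exclude_center:
        kernel[radius, radius] = 0.0  # only neighbors contribute
    valid = ~np.isnan(grid)
    padded_vals = np.pad(np.where(valid, grid, 0.0), radius)  # zero padding at the edges
    padded_mask = np.pad(valid.astype(float), radius)
    size = 2 * radius + 1
    out = np.full(grid.shape, np.nan)
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            w = kernel * padded_mask[i:i + size, j:j + size]
            if w.sum() > 0:  # at least one valid neighbor in the window
                out[i, j] = np.sum(w * padded_vals[i:i + size, j:j + size]) / w.sum()
    return out

def temporal_convolution(series, radius=3, sigma=1.5):
    """Gaussian-weighted average over neighboring days of one cell's series."""
    series = np.asarray(series, dtype=float)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets**2 / (2.0 * sigma**2))
    kernel[radius] = 0.0  # exclude the day itself
    num = np.convolve(np.pad(series, radius), kernel, mode="valid")
    den = np.convolve(np.pad(np.ones(len(series)), radius), kernel, mode="valid")
    return num / den  # den > 0 whenever the series has at least two days
```

In a trained convolutional layer the kernel entries are learned parameters; fixing them to Gaussian weights, as here, reduces the layer to a distance-weighted neighborhood average, which is easier to inspect.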
To validate the model and check for overfitting, we used 10-fold cross-validation: all monitoring sites were randomly divided into ten groups, each containing about 10% of the sites. The model was trained on data from 90% of the sites and used to predict PM2.5 at the remaining 10%, and this process was repeated for each of the other groups. Pooling the predictions from the ten held-out sets yielded out-of-sample PM2.5 predictions for all monitors. We computed the correlation between predicted and monitored PM2.5, as well as spatial and temporal R2 values. Details of the R2 calculations are given in the Supplementary Material.
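The site-level 10-fold cross-validation can be sketched as follows. It is a minimal illustration, assuming generic `fit` and `predict` callables in place of the neural network. The linear stand-ins `fit_linear` and `predict_linear`, the fixed random seed, and the squared Pearson correlation used in `r_squared` are hypothetical choices; the spatial and temporal R2 definitions from the Supplementary Material are not reproduced.

```python
import numpy as np

def site_cv_predictions(site_ids, X, y, fit, predict, n_folds=10, seed=0):
    """Out-of-sample predictions from k-fold CV that holds out whole monitoring sites."""
    rng = np.random.default_rng(seed)
    sites = np.unique(site_ids)
    rng.shuffle(sites)                       # random assignment of sites to folds
    folds = np.array_split(sites, n_folds)   # about 10% of sites per fold
    y_hat = np.full(len(y), np.nan)
    for test_sites in folds:
        test = np.isin(site_ids, test_sites)
        model = fit(X[~test], y[~test])      # train on the remaining ~90% of sites
        y_hat[test] = predict(model, X[test])
    return y_hat                             # pooled predictions for every monitor

def r_squared(y_obs, y_pred):
    """Squared Pearson correlation between monitored and predicted values."""
    return np.corrcoef(y_obs, y_pred)[0, 1] ** 2

# Hypothetical stand-ins for the neural network: ordinary least squares.
def fit_linear(X, y):
    A = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def predict_linear(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta
```

Holding out whole sites, rather than individual site-days, keeps each monitor's observations entirely in either the training or the test set of a fold, so the pooled predictions reflect performance at locations the model has not seen.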
The trained neural network was then used to predict daily PM2.5 for each 1 km × 1 km grid cell across the study area.
All programming was done in MATLAB (R2014a; The MathWorks, Inc.).