In each simulated data set, we estimated the propensity score using a logistic regression model regressing treatment status on the 10 baseline covariates. Propensity-score matching was used to construct a matched sample consisting of pairs of treated and untreated subjects. We used greedy nearest-neighbor matching on the logit of the propensity score with a caliper of width equal to $0.2\sqrt{(\sigma_1^2+\sigma_2^2)/2}$, where $\sigma_i^2$ is the variance of the logit of the propensity score in the $i$th treatment group. This caliper width was used because it has been shown to result in optimal estimation of risk differences in a variety of settings [10].
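The matching step above can be sketched as follows. This is an illustrative implementation, not the authors' code; the function name `greedy_caliper_match` and the use of sample variances to form the caliper are assumptions for the sketch.

```python
import numpy as np

def greedy_caliper_match(logit_ps_treated, logit_ps_control):
    """Greedy 1:1 nearest-neighbor matching on the logit of the propensity
    score, with caliper width 0.2 * sqrt((sigma_1^2 + sigma_2^2) / 2)."""
    caliper = 0.2 * np.sqrt((np.var(logit_ps_treated, ddof=1) +
                             np.var(logit_ps_control, ddof=1)) / 2.0)
    available = list(range(len(logit_ps_control)))
    pairs = []
    for i, lt in enumerate(logit_ps_treated):
        if not available:
            break
        # Nearest remaining control on the logit scale
        j = min(available, key=lambda k: abs(logit_ps_control[k] - lt))
        if abs(logit_ps_control[j] - lt) <= caliper:
            pairs.append((i, j))   # match formed within the caliper
            available.remove(j)    # match without replacement
        # Controls outside the caliper leave the treated subject unmatched
    return pairs
```

Because matching is greedy and without replacement, the result can depend on the order in which treated subjects are processed.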
In the propensity-score matched sample, the absolute risk reduction was estimated as the difference between the proportion of treated subjects in whom the outcome occurred and the proportion of untreated subjects in whom the outcome occurred. When the true absolute risk reduction was 0 (the null hypothesis), the statistical significance of the estimated risk difference was assessed using two different methods. First, using methods for independent samples, the Pearson chi-squared test was used to assess the statistical significance of the difference between treatment groups in the probability of the outcome occurring [13]. Second, using methods for paired samples, McNemar's test was used for this comparison.
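The two significance tests can be sketched from the matched-pair counts. This is an illustrative sketch, not the authors' code: it assumes the pair counts are labelled `a` (both members had the event), `b` (treated only), `c` (untreated only), and `d` (neither), and uses SciPy for the chi-squared computations.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

def matched_pair_tests(a, b, c, d):
    """P-values for the independent-samples Pearson chi-squared test and
    the paired McNemar test, given matched-pair counts a, b, c, d."""
    # Collapse the pairs into an independent-samples 2x2 table:
    # rows are treatment groups, columns are event / no event
    table = np.array([[a + b, c + d],    # treated
                      [a + c, b + d]])   # untreated
    chi2_stat, p_indep, _, _ = chi2_contingency(table, correction=False)
    # McNemar's test uses only the discordant pairs b and c
    mcnemar_stat = (b - c) ** 2 / (b + c)
    p_paired = chi2.sf(mcnemar_stat, df=1)
    return p_indep, p_paired
```

The independent-samples test ignores the pairing, whereas McNemar's statistic depends only on the discordant pairs; the two tests can therefore give different p-values on the same matched sample.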
The variance of the difference in proportions was estimated using two different methods. First, using methods for independent samples, let $p_T$ and $p_C$ denote the observed probability of the outcome occurring in treated and untreated subjects, respectively, in the propensity-score matched sample. Furthermore, assume that there are $N$ propensity-score matched pairs. Then the standard error of the estimated risk difference is given by $\sqrt{p_T(1-p_T)/N + p_C(1-p_C)/N}$ [13]. Second, using methods for paired samples, we assume that in the matched sample there were $a$ pairs in which both the treated and untreated subjects experienced the event; $b$ pairs in which the treated subject experienced the event while the untreated subject did not; and $c$ pairs in which the untreated subject experienced the event while the treated subject did not. Then, the variance of the difference in proportions was estimated by $((b+c)-(c-b)^2/n)/n^2$ [14]. In both cases, 95 per cent confidence intervals were estimated as $p_T - p_C \pm 1.96 \times \mathrm{se}(p_T - p_C)$, where $\mathrm{se}(p_T - p_C)$ denotes the estimated standard error of the risk difference.
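The two variance estimators and the resulting confidence intervals can be computed directly from the pair counts. This is a minimal sketch, not the authors' code; it assumes pair counts `a`, `b`, `c`, `d` as defined in the text (with `d` the pairs in which neither subject had the event) and takes the number of pairs as `n = a + b + c + d`.

```python
import math

def risk_difference_cis(a, b, c, d):
    """Risk difference and 95% CIs under the independent-samples and
    paired variance estimators, given matched-pair counts a, b, c, d."""
    n = a + b + c + d            # number of matched pairs
    p_t = (a + b) / n            # event probability among treated
    p_c = (a + c) / n            # event probability among untreated
    rd = p_t - p_c
    # Independent-samples standard error
    se_indep = math.sqrt(p_t * (1 - p_t) / n + p_c * (1 - p_c) / n)
    # Paired variance based on the discordant pairs b and c
    se_paired = math.sqrt(((b + c) - (c - b) ** 2 / n) / n ** 2)
    ci_indep = (rd - 1.96 * se_indep, rd + 1.96 * se_indep)
    ci_paired = (rd - 1.96 * se_paired, rd + 1.96 * se_paired)
    return rd, ci_indep, ci_paired
```

When outcomes within a pair are positively correlated, the paired estimator typically yields a smaller standard error, and hence a narrower interval, than the independent-samples estimator.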
For each of the 100 scenarios (2 treatment-selection models × 2 probabilities of outcome × 5 covariate scenarios × 5 absolute risk reductions), we simulated 1825 data sets. The above analyses were conducted in each of the 1825 simulated data sets. In the 20 scenarios in which the true risk difference was 0, we estimated the empirical type I error rate as the proportion of simulated data sets in which the null hypothesis of no treatment effect was rejected at a significance level of 0.05. Owing to our use of 1825 simulated data sets, an empirical type I error rate less than 0.04 or greater than 0.06 would be classified as statistically significantly different from 0.05. For each of the 100 scenarios, we determined the proportion of estimated 95 per cent confidence intervals that contained the true risk difference. As above, owing to the use of 1825 simulated data sets, empirical coverage rates less than 0.94 or greater than 0.96 are statistically significantly different from the advertised coverage rate of 0.95. We also determined the mean width of the estimated 95 per cent confidence intervals across the 1825 simulated data sets. Finally, we compared the standard deviation of the empirical sampling distribution of the estimated treatment effects (i.e. the standard deviation of the 1825 estimated risk differences across the simulated data sets) with the mean of the estimated standard errors of the estimated treatment effect.