The Egger regression method was introduced above as a test for directional pleiotropy; this test does not make any assumption about the genetic variants. However, under an assumption that is weaker than the standard instrumental variable assumptions, the slope coefficient from Egger regression provides an asymptotically consistent estimate of the causal effect even if all the genetic variants have pleiotropic effects on the outcome [46]. This assumption is that the pleiotropic effects of genetic variants (that is, direct effects of the genetic variants on the outcome that do not operate via the risk factor) are independent of instrument strength. It is known as the InSIDE assumption (Instrument Strength Independent of Direct Effect). The same assumption was considered by Kolesár et al. with individual-level data [54]. The motivation for the Egger regression method is that, under the InSIDE assumption, stronger genetic variants should have more reliable estimates of the causal effect than weaker variants. Once the average pleiotropic effect of variants is accounted for through the intercept term in Egger regression, any residual dose–response relationship in the genetic associations provides evidence of a causal effect. If the correlation between the direct effects and instrument strength is exactly zero, the Egger regression estimate is consistent under the InSIDE assumption as the sample size tends to infinity; otherwise, it is consistent as the sample size and the number of genetic variants both tend to infinity. As previously stated, Egger regression assumes linearity and homogeneity in the associations between the genetic variants, risk factor and outcome.
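As a minimal sketch of the estimator described above, the following Python code fits a weighted linear regression of the variant-outcome associations on the variant-risk factor associations with an intercept term, and contrasts it with an inverse-variance weighted estimate that forces the line through the origin. The function names, the NumPy implementation, the toy summary statistics, and the convention of orienting each variant so that its association with the risk factor is positive are assumptions made for illustration, not details taken from the text. In the toy data every variant has the same direct effect on the outcome, so the InSIDE assumption holds trivially.

```python
import numpy as np


def egger_regression(beta_x, beta_y, se_y):
    """Return (intercept, slope) from a weighted regression of beta_y on beta_x.

    beta_x: variant associations with the risk factor
    beta_y: variant associations with the outcome
    se_y:   standard errors of beta_y (weights are 1 / se_y**2)
    """
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    se_y = np.asarray(se_y, dtype=float)

    # Assumed convention: orient each variant so that beta_x is positive.
    sign = np.where(beta_x < 0, -1.0, 1.0)
    bx, by = beta_x * sign, beta_y * sign

    # Weighted least squares: scale rows by the square root of the weights.
    root_w = 1.0 / se_y
    design = np.column_stack([np.ones_like(bx), bx])
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], by * root_w, rcond=None)
    return float(coef[0]), float(coef[1])


def ivw_estimate(beta_x, beta_y, se_y):
    """Inverse-variance weighted estimate: same regression, no intercept."""
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    w = 1.0 / np.asarray(se_y, dtype=float) ** 2
    return float(np.sum(w * beta_x * beta_y) / np.sum(w * beta_x ** 2))


# Hypothetical, noise-free summary statistics.
true_effect = 0.5      # causal effect of the risk factor on the outcome
direct_effect = 0.05   # identical pleiotropic effect for every variant
beta_x = np.array([0.1, 0.2, 0.3, 0.4])
beta_y = direct_effect + true_effect * beta_x
se_y = np.full(beta_x.size, 0.01)

intercept, slope = egger_regression(beta_x, beta_y, se_y)
ivw = ivw_estimate(beta_x, beta_y, se_y)
print(f"Egger intercept={intercept:.3f}, Egger slope={slope:.3f}, IVW={ivw:.3f}")
```

In this toy setting the intercept absorbs the common direct effect and the slope returns the assumed causal effect of 0.5, whereas the estimate through the origin is pulled upwards to about 0.667. With real data the associations are estimated with error, so neither quantity would be recovered exactly.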
The InSIDE assumption may not be satisfied in practice, particularly if the pleiotropic effects of genetic variants on the outcome act via a single confounding variable. There is some evidence for the general plausibility of the InSIDE assumption, as associations of genetic variants with different phenotypic variables have been shown to be largely uncorrelated in an empirical study [55]. In practice, the Egger regression estimate may have much wider confidence intervals than those from other methods, as it relies on variants having different strengths of association with the risk factor. A situation with many independent genetic variants having identical magnitudes of association with the risk factor and with the outcome would intuitively provide strong evidence of a causal effect; however, the Egger estimate in this case would not be identified, because without variation in the associations with the risk factor the intercept and slope cannot be separated.
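The identification problem in the last sentence can be illustrated with a small numerical sketch. The values below are hypothetical: ten variants share the same association with the risk factor and the same association with the outcome. The regression design matrix (a column of ones and a constant column) then has rank one, so many intercept and slope pairs fit the data equally well.

```python
import numpy as np

n_variants = 10
beta_x = np.full(n_variants, 0.2)   # identical associations with the risk factor
beta_y = np.full(n_variants, 0.1)   # identical associations with the outcome

# Every variant gives the same ratio estimate of the causal effect.
ratio_estimates = beta_y / beta_x

# Design matrix for a regression with an intercept: both columns are constant,
# so they are collinear and the matrix has rank 1 rather than 2.
design = np.column_stack([np.ones(n_variants), beta_x])
rank = int(np.linalg.matrix_rank(design))

# Several (intercept, slope) pairs reproduce the data exactly.
candidates = [(0.0, 0.5), (0.1, 0.0), (0.05, 0.25)]
fits_exactly = [
    bool(np.allclose(a + b * beta_x, beta_y)) for a, b in candidates
]

print("ratio estimates:", ratio_estimates)
print("design rank:", rank)
print("exact fits:", fits_exactly)
```

A slope of 0.5 with no pleiotropy and a slope of zero with a direct effect of 0.1 are indistinguishable here, which is why the ratio estimates look persuasive but the regression with an intercept cannot choose between them.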
The Egger regression method gives consistent estimates even if all the genetic variants are invalid instruments, provided that the InSIDE assumption is satisfied, whereas the penalization and median-based methods rely on over half of the genetic variants being valid instrumental variables for consistent estimation. However, the penalization and median-based methods allow more general departures from the instrumental variable assumptions for the invalid instruments. In practice, it would seem prudent to compare estimates from a range of methods. If all methods provide similar estimates, then a causal effect is more plausible. For example, using genetic variants chosen solely on the basis of their association with the risk factor, a broad range of methods affirmed that LDL-c was a causal risk factor for CAD. However, the causal effect of HDL-c on CAD risk suggested by a liberal Mendelian randomization analysis using the inverse-variance weighted method (see also [31]) was not supported by robust analysis methods [53]. The median-based and Egger regression methods have also been shown to have lower Type 1 (false positive) error rates than the inverse-variance weighted method in simulation studies with some invalid instrumental variables for finite sample sizes [46, 53], although their error rates were above the nominal level in the case of directional pleiotropy (for the median method) and when the InSIDE assumption was violated (for the Egger regression method).
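To make the suggestion of comparing methods concrete, the sketch below computes three estimates on the same hypothetical summary statistics: an inverse-variance weighted estimate, a simple (unweighted) median of the per-variant ratio estimates, and an Egger slope. The data, the variable names, and the choice of the simple median (rather than a weighted or penalized variant) are assumptions made for illustration. Five of the seven variants are valid instruments; the two strongest variants have direct effects on the outcome, so the direct effects are correlated with instrument strength and the InSIDE assumption is violated.

```python
import numpy as np

true_effect = 0.5
beta_x = np.array([0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40])
# Hypothetical direct (pleiotropic) effects: only the two strongest variants.
direct = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.08, 0.10])
beta_y = true_effect * beta_x + direct
se_y = np.full(beta_x.size, 0.02)

# Inverse-variance weighted estimate (weighted regression through the origin).
w = 1.0 / se_y ** 2
ivw = float(np.sum(w * beta_x * beta_y) / np.sum(w * beta_x ** 2))

# Simple median of the ratio estimates; relies on over half being valid.
simple_median = float(np.median(beta_y / beta_x))

# Egger regression: weighted straight-line fit with an intercept.
# np.polyfit expects weights of 1/sigma and returns (slope, intercept).
egger_slope, egger_intercept = (
    float(c) for c in np.polyfit(beta_x, beta_y, 1, w=1.0 / se_y)
)

estimates = {"ivw": ivw, "simple_median": simple_median, "egger": egger_slope}
spread = max(estimates.values()) - min(estimates.values())

for name, value in estimates.items():
    print(f"{name:>14}: {value:.3f}")
print(f"spread across methods: {spread:.3f}")
```

Here the median stays at the assumed causal effect of 0.5 because most variants are valid, while the inverse-variance weighted estimate (about 0.634) and the Egger slope (about 0.829) are both biased. A wide spread of this kind is the signal that the text recommends looking for; agreement across methods would make a causal interpretation more plausible.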