Sobel-Goodman Tests of Mediation in Stata
sgmediation2 is a user-written Stata command which conducts Sobel-Goodman tests of statistical mediation for linear regression models. sgmediation2 is my update (with permission) to the original command sgmediation, written by Phil Ender of the UCLA Statistical Consulting Group. Questions or requests for additions to the command should be sent to me at firstname.lastname@example.org
The sections below detail:
(2) Background on the statistical tests
(3) Illustration of the sgmediation2 command
(4) Bootstrapped standard errors and confidence intervals
(5) Noteworthy new features of sgmediation2
(6) Limitations of the Sobel-Goodman/product of coefficients approach to mediation
(2) Background on the statistical tests
sgmediation2 calculates three different tests of mediation using the "product of coefficients" approach (MacKinnon et al. 2002). Neither this site nor the command sgmediation2 is meant as a blanket endorsement of this approach to mediation. Many approaches to mediation exist, each with its own pros and cons (e.g. see MacKinnon et al. 2002; Zhao et al. 2010; Keele 2015). Some prominent limitations of this approach, along with a suggested alternative for many cases, are detailed in Section 6 at the bottom of this page.
The commonly used approach to mediation based on Baron and Kenny (1986) suggests that a variable may be considered a mediator to the extent to which it carries the influence of a focal independent variable (IV) to a given dependent variable (DV). In this framework, mediation can be said to occur when (1) the IV significantly affects the mediator, (2) the IV significantly affects the DV in the absence of the mediator, (3) the mediator has a significant unique effect on the DV, and (4) the effect of the IV on the DV shrinks upon the addition of the mediator to the model.
Others (e.g. Preacher and Hayes 2004) suggest that only two requirements need be met: (1) the IV has a significant effect before the mediator is added to the model, and (2) the effect of the IV shrinks upon the addition of the mediator to the model (i.e. same requirement as #4 above). Simplifying even further, many now suggest (e.g. Zhao, Lynch, and Chen 2010) that the only needed requirement is that the effect of the IV shrinks upon the addition of the mediator to the model (AKA there is a significant indirect effect; see below for details) because mediation can occur even in the absence of a direct effect of the IV.
sgmediation2 provides tests of all of the requirements discussed above, so almost any desired test can be carried out. I personally agree that the test of whether the effect of the IV shrinks upon the addition of the mediator to the model (i.e. the indirect effect) is of most central interest. However, as Zhao et al. (2010) detail, the individual tests outlined by Baron and Kenny (1986) are still quite useful for determining the specific nature of the mediation found.
The bottom half of the figure below illustrates the basic logic of a mediating relationship where the mediating variable (MV) is theoretically at least partially the reason/mechanism by which the focal independent variable (IV) affects the outcome (DV). Because of this mediating relationship, the estimate of the effect of the IV will be smaller after accounting for the mediator (c') than in a model without the mediator (c).
The mechanics of the tests are as follows:
First, regress the DV on the IV along with any control variables. The coefficient on the IV is c, which represents the "total effect" of the IV (i.e. the effect before removing the portion explained by the mediator).
Second, regress the mediating variable (MV) on the IV and any control variables. The coefficient on the IV is path a.
Third, regress the DV on the IV, the MV, and any control variables. The coefficient on the MV is path b. The coefficient on the IV is c'.
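The three regressions above can be sketched with Stata's built-in regress command. This is only a minimal illustration, not the internals of sgmediation2. It assumes the auto dataset that ships with Stata, with price as the DV, weight as the IV, mpg as the MV, and foreign as a control; these roles were chosen purely for demonstration and carry no substantive claim.

```stata
* Illustrative only: the auto dataset and these variable roles are assumptions
sysuse auto, clear

* Step 1: DV on IV plus controls; the coefficient on the IV is c (total effect)
regress price weight foreign
scalar c_total = _b[weight]

* Step 2: MV on IV plus controls; the coefficient on the IV is path a
regress mpg weight foreign
scalar a_path = _b[weight]

* Step 3: DV on IV, MV, and controls; the coefficient on the MV is path b
* and the coefficient on the IV is c'
regress price weight mpg foreign
scalar b_path  = _b[mpg]
scalar c_prime = _b[weight]

display "c = " scalar(c_total) "  a = " scalar(a_path) ///
    "  b = " scalar(b_path) "  c' = " scalar(c_prime)
```

None of the four variables has missing values in this dataset, so all three models are fit on the same sample.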
Terminology for effects
Commonly, the effect of the IV before accounting for the mediator (c) is referred to as the total effect. The effect of the IV after accounting for the mediator (c') is referred to as the direct effect. The difference between the total and direct effects is called the indirect effect: the amount of the IV's effect that is explained by the mediator.
Testing the indirect effect
To determine how much of the focal IV's effect is explained by the MV (i.e. the indirect effect), we can calculate either a*b ("product of coefficients") or c - c' ("difference in coefficients"), which will be identical in size as long as the same sample is used for all three models described above. All three tests calculated by sgmediation2 use the product of coefficients approach. The tests differ only in how they calculate the standard error of a*b.
The Aroian and Goodman versions of the test differ from the Sobel version in that they include the product of the variance estimates of the coefficients on paths a and b (but in different ways). Results from all three tend to be similar because the product of the variances tends to be small.
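Written out, the three test statistics take the following standard forms. The symbols s_a and s_b are notation introduced here for the standard errors of the coefficients on paths a and b.

```latex
\begin{align*}
z_{\text{Sobel}}   &= \frac{ab}{\sqrt{b^{2}s_a^{2} + a^{2}s_b^{2}}} \\[4pt]
z_{\text{Aroian}}  &= \frac{ab}{\sqrt{b^{2}s_a^{2} + a^{2}s_b^{2} + s_a^{2}s_b^{2}}} \\[4pt]
z_{\text{Goodman}} &= \frac{ab}{\sqrt{b^{2}s_a^{2} + a^{2}s_b^{2} - s_a^{2}s_b^{2}}}
\end{align*}
```

The only difference is the term s_a^2 s_b^2, which the Aroian version adds and the Goodman version subtracts.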
There is some evidence favoring the Aroian test over the other two (MacKinnon et al. 2002). However, all three have been found to be underpowered, and alternative methods to calculate the standard error have been proposed. In particular, bootstrapping is a popular approach that has been shown to work well even in small samples (Preacher and Hayes 2004). Section 4 below illustrates how to obtain bootstrapped estimates of the standard error and confidence interval on the indirect effect (a*b). See Section 6 for an alternative approach using seemingly unrelated estimation.
See MacKinnon et al. (2002) for a thorough discussion and comparison of each test.
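For readers who want to see the arithmetic, the three test statistics can be computed by hand from two regressions. This is a sketch under stated assumptions rather than the code sgmediation2 runs: it uses the auto dataset shipped with Stata, with price as the DV, weight as the IV, mpg as the MV, and foreign as a control, all chosen only for illustration.

```stata
* Illustrative only: dataset and variable roles are assumptions
sysuse auto, clear

* Path a and its standard error (MV on IV and controls)
quietly regress mpg weight foreign
scalar a_path = _b[weight]
scalar se_a   = _se[weight]

* Path b and its standard error (DV on IV, MV, and controls)
quietly regress price weight mpg foreign
scalar b_path = _b[mpg]
scalar se_b   = _se[mpg]

* Indirect effect and the variance pieces used by the three tests
scalar ab      = a_path*b_path
scalar v_first = b_path^2*se_a^2 + a_path^2*se_b^2   // terms shared by all three
scalar v_prod  = se_a^2*se_b^2                       // product of the variances

scalar z_sobel   = ab/sqrt(v_first)
scalar z_aroian  = ab/sqrt(v_first + v_prod)
scalar z_goodman = ab/sqrt(v_first - v_prod)

* Two-sided p-values from the standard normal distribution
foreach t in sobel aroian goodman {
    display "`t': z = " %6.3f z_`t' "  p = " %6.4f 2*normal(-abs(z_`t'))
}
```

Because v_prod is usually tiny relative to v_first, the three z statistics should come out close to one another.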
a*b (equivalent to c - c') can sometimes be interpreted directly as the amount of the IV's effect the MV explains or as the "indirect effect" of the IV -> MV -> DV. Alternatively, a*b (or c - c') can be expressed as the proportion reduction in the effect of the IV after accounting for the MV:
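In symbols, using the path labels defined earlier, the proportion of the total effect that is mediated can be written in any of these equivalent ways (the equalities hold when all models use the same sample):

```latex
\[
\frac{ab}{c} \;=\; \frac{c - c'}{c} \;=\; 1 - \frac{c'}{c}
\]
```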
(3) Illustration of the sgmediation2 command
Those with more education tend to report better health. A possible mediation explanation is that more education leads to higher incomes, which are in turn associated with better health (for lots of reasons). The theoretical causal process is education -> income -> health.
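For the education -> income -> health process, a call might look like the sketch below. The variable names (health, educ, income, age, race) are hypothetical placeholders, and the iv( ), mv( ), and cv( ) options are assumed to work as they do in the original sgmediation; check the help file for the exact syntax.

```stata
* Hypothetical variable names; option names assumed from the original sgmediation
* DV = health, IV = educ, MV = income, controls = age and race
sgmediation2 health, iv(educ) mv(income) cv(age i.race)
```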
(4) Bootstrapped standard error and confidence interval estimates
The default Sobel-Goodman tests shown above are known to have low statistical power. A commonly recommended solution is to use bootstrapping to obtain the standard errors (and p-values) and/or confidence intervals (e.g. Preacher and Hayes 2004; 2008; Zhao et al. 2010). Using 1,000 or more bootstrapped samples is a common recommendation (e.g. Preacher and Hayes 2008).
By default, Stata's bootstrap command reports normal-based confidence intervals, and the postestimation command estat bootstrap reports bias-corrected intervals unless told otherwise. Preacher and Hayes (2004; 2008) recommend using percentile CIs because the sampling distribution of a*b tends to be non-normal. Percentile CIs can be obtained with estat bootstrap and its percentile option.
The example below provides bootstrapped estimates of the indirect, direct, and total effect. Other statistics can be bootstrapped with sgmediation2 (see "stored results" section of the help file).
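A sketch of such a bootstrap run follows. The variable names are hypothetical, the seed is arbitrary, and the r( ) names are assumed to match those returned by the original sgmediation; confirm them in the "stored results" section of the help file before relying on this.

```stata
* Hypothetical variable names; r() names assumed from the original sgmediation
bootstrap r(ind_eff) r(dir_eff) r(tot_eff), reps(1000) seed(12345): ///
    sgmediation2 health, iv(educ) mv(income) cv(age i.race)

* Request percentile confidence intervals from the bootstrap replicates
estat bootstrap, percentile
```

Setting seed( ) makes the resampling reproducible across runs.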
(5) Noteworthy new features of sgmediation2
Survey weights and multiply-imputed data
First, sgmediation2 allows the use of survey weights and/or multiply-imputed data. To do so, specify the prefix you would have used on the regress command in the prefix( ) option of sgmediation2. For example, to include survey weights that have already been declared with the svyset command, specify prefix(svy:).
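With the survey design already declared through svyset, the call might look like the sketch below. The variable names are hypothetical, and the iv( ), mv( ), and cv( ) options are assumed to follow the original sgmediation.

```stata
* Hypothetical variable names; assumes svyset has already been run
sgmediation2 health, iv(educ) mv(income) cv(age i.race) prefix(svy:)
```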
Additionally, the prefix( ) option can be set to mi est, post: to use multiple imputation estimates as defined by mi set. Note that the post option is required with multiply imputed data. To combine survey weights and multiple imputation estimates as defined by mi svyset, specify mi est, post: svy: instead.
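The two multiple-imputation cases might be written as in the sketch below. The variable names are hypothetical, and the iv( ), mv( ), and cv( ) options are assumed to follow the original sgmediation.

```stata
* Hypothetical variable names; assumes the data have already been mi set
sgmediation2 health, iv(educ) mv(income) cv(age i.race) prefix(mi est, post:)

* Survey weights plus multiple imputation; assumes mi svyset has been run
sgmediation2 health, iv(educ) mv(income) cv(age i.race) prefix(mi est, post: svy:)
```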
Alternative variance estimators
The vce( ) option can be used to obtain a variance estimator other than the default ols. For example, users can specify vce(robust) for robust variance estimates or vce(cluster clustvar) for cluster-robust variance estimates. To adjust the variance estimates for clustering within occupational categories, supply the variable identifying the occupational category as clustvar.
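A call with clustered variance estimates might look like the sketch below. The variable names are hypothetical, with occ standing in for whatever variable identifies the occupational category; the iv( ), mv( ), and cv( ) options are assumed to follow the original sgmediation.

```stata
* Hypothetical variable names; occ is a placeholder cluster identifier
sgmediation2 health, iv(educ) mv(income) cv(age i.race) vce(cluster occ)
```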
Factor syntax for control variables
As demonstrated in the example above, factor syntax is allowed in the list of control variables. This lets you specify whether each control variable is treated as continuous or nominal.
Factor syntax is not allowed for the focal independent variable (IV) or the mediating variable (MV). This reflects a limitation of the method, not the command (see Section 6 below). IVs and MVs are limited to continuous or binary variables with this method.
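The sketch below marks each control explicitly with Stata's standard factor-variable operators. The variable names are hypothetical, and the iv( ), mv( ), and cv( ) options are assumed to follow the original sgmediation.

```stata
* Hypothetical variable names
* i.region enters as a set of indicator variables (nominal);
* c.age enters as a single continuous term
sgmediation2 health, iv(educ) mv(income) cv(i.region c.age)
```

The IV and MV are listed as plain variable names, in line with the restriction described above.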
(6) Limitations of the Sobel-Goodman/product of coefficients approach to mediation
There are many limitations to this approach to mediation (more than I discuss here). A few of note:
Only continuous or binary focal independent variables (IV) can be examined.
Only continuous or binary mediating variables (MV) can be examined.
Multiple mediating variables (MVs) cannot be easily incorporated.
Limited to tests of a single coefficient. For example, there is no clear way to test whether the effect of age is mediated if both age and age^2 coefficients are included in the models.
Limited to linear regression models.
A specialized approach appropriate only for mediation and not other cross-model comparisons.
These limitations (and some others) were the motivation for my article "A General Framework for Comparing Predictions and Marginal Effects Across Models" (Mize, Doan, and Long 2019). See that article and the associated Stata files if you are interested.
References
Aroian, L. A. (1947). The probability function of the product of two normally distributed variables. Annals of Mathematical Statistics, 18, 265-271.
Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173-1182.
Goodman, L. A. (1960). On the exact variance of products. Journal of the American Statistical Association, 55, 708–713.
Keele, L. (2015). Causal mediation analysis: warning! Assumptions ahead. American Journal of Evaluation, 36(4), 500-513.
MacKinnon, D. P., Lockwood, C. M., Hoffman, J. M., West, S. G., & Sheets, V. (2002). A comparison of methods to test mediation and other intervening variable effects. Psychological Methods, 7(1), 83-104.
Mize, T. D., Doan, L., & Long, J. S. (2019). A general framework for comparing predictions and marginal effects across models. Sociological Methodology, 49(1), 152-189.
Preacher, K. J., & Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4), 717-731.
Preacher, K. J., & Hayes, A. F. (2008). Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behavior Research Methods, 40(3), 879-891.
Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. Sociological Methodology, 13, 290-312.
Zhao, X., Lynch Jr, J. G., & Chen, Q. (2010). Reconsidering Baron and Kenny: Myths and truths about mediation analysis. Journal of Consumer Research, 37(2), 197-206.