balanceplot
Stata Command for Plots of Imbalance Across Groups
Introduction
balanceplot produces dot plots of standardized imbalance statistics across groups of a categorical independent variable. Standardized imbalance statistics provide a useful way to present differences in covariates (specified in varlist) between groups of the categorical independent variable of interest (specified in the option group()). A standardized measure is necessary to compare covariates that are measured on different metrics.
Installation Instructions
To install in Stata:
net install balanceplot, from("https://tdmize.github.io/data/balanceplot")
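Because balanceplot draws its plots with coefplot, coefplot also needs to be available. Whether the net install above pulls it in automatically is not stated here, so the following is a minimal sketch that installs it from SSC only if Stata cannot already find it:

```stata
* Install coefplot from SSC only if it is not already on the ado-path
capture which coefplot
if _rc {
    ssc install coefplot
}
```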
Help file
After installing, to read the balanceplot help file (also available here):
help balanceplot
Citation
Please cite use of the balanceplot command as:
Mize, Trenton D. 2018. "balanceplot: Stata command for plots of imbalance across groups." https://www.trentonmize.com/software/balanceplot
Background
Balance plots come from the experimental and causal inference (matching) literature. In an experiment, the treatment and control groups should be “balanced” due to random assignment (i.e., covariate distributions should be the same across conditions). When using a causal inference matching method, the pre-matched sample is imbalanced but the post-matched sample should be balanced. Despite these origins, balance plots are applicable in any analysis of groups.
Rosenbaum and Rubin (1985) proposed the commonly used measure of “standardized difference” or “bias” to quantify differences in balance across groups:
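In its commonly cited form, the standardized percent difference for a covariate x is the difference in group means divided by the square root of the average of the two group variances. The notation below is an assumption made for illustration (base and ref follow the labels in the balanceplot table output); the help file is the authority on the exact computation balanceplot performs.

```latex
d_x = 100 \times \frac{\bar{x}_{\text{ref}} - \bar{x}_{\text{base}}}
                      {\sqrt{\left( s^2_{\text{base}} + s^2_{\text{ref}} \right) / 2}}
```

As a rough check against the experimental example later in this document, 2.educ has means of 0.253 (base) and 0.331 (ref). For a binary variable the variance is approximately p(1 - p), giving 0.189 and 0.221, so d is about 100 × 0.078 / 0.453 ≈ 17.2, close to the reported 17.166 given that the means are rounded.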
Using the balanceplot command
balanceplot calculates standardized imbalance statistics and plots them as a dot plot using coefplot. Any of the features available with coefplot can be applied to a balanceplot. The base coefplot command that balanceplot uses to make the plot can be shown with the option plotcommand.
The table of means, t-tests for balance, and standardized imbalance statistics can be shown with the option table.
The basic syntax of balanceplot is:
balanceplot depvar varlist, group(groupvar)
Here, depvar is an outcome of interest, groupvar is the focal categorical grouping variable, and varlist is a list of covariates. Use factor-variable syntax in the varlist, including the prefix i. for nominal and binary variables.
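A minimal sketch of the basic syntax combined with the table and plotcommand options described above. It uses the nlsw88 data that ships with Stata; using both options in one call is an assumption, not something the documentation here confirms.

```stata
* Example data shipped with Stata
sysuse nlsw88, clear

* Outcome first, then covariates; i. marks binary/nominal covariates
* table       : also print means, t-tests, and standardized differences
* plotcommand : also show the underlying coefplot command
balanceplot wage age i.married i.collgrad tenure, ///
    group(union) table plotcommand
```

The printed coefplot command can serve as a starting point when the plot needs changes beyond what graphop() allows.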
Examples
Observational data
The first set of examples uses observational survey data. A balanceplot can be useful here before fitting a multiple regression model that includes all of the covariates as control variables.
sysuse nlsw88, clear
balanceplot wage age i.married i.collgrad i.south tenure ttl_exp, group(union)
balanceplot wage age i.married i.collgrad i.south tenure ttl_exp, ///
group(race) base(1) ref(2) ref2(3) graphop(xlab(-75(25)75))
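To see what a single plotted point represents, a standardized difference can be approximated by hand with summarize. This is a sketch under two assumptions: that the statistic is 100 times the difference in means (reference minus base) divided by the square root of the average group variance, and that nonunion (union == 0) is the base group. balanceplot may use a different estimation sample (for example, dropping cases with missing values on any variable), so small discrepancies are possible.

```stata
sysuse nlsw88, clear

* Mean and variance of age in the assumed base group (nonunion)
quietly summarize age if union == 0
scalar m_base = r(mean)
scalar v_base = r(Var)

* Mean and variance of age in the assumed reference group (union)
quietly summarize age if union == 1
scalar m_ref = r(mean)
scalar v_ref = r(Var)

* Difference in means as a percentage of the averaged-variance SD
scalar std_diff_age = 100 * (m_ref - m_base) / sqrt((v_base + v_ref) / 2)
display "Standardized % difference in age: " %6.3f std_diff_age
```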
Experimental data
Random assignment to condition should balance all covariates in a properly conducted experiment (i.e., across conditions, the means and distributions of each covariate should be the same).
You can check whether random assignment did its job using a balance plot and a statistical test of balance. Note that these tests are somewhat controversial, and there are prominent arguments against their utility. For example, Mutz, Pemantle, and Pham (2018) provide an argument I mostly agree with; see the discussion in Mize and Manago (2022). Still, there are situations where tests of balance are needed, such as when a high rate of noncompliance or attrition could imbalance the conditions. In any case, if you need to conduct a balance test with experimental data, it is easy with balanceplot.
use "https://tdmize.github.io/data/data/pmh_tess", clear
balanceplot polviews age i.educ i.race hhincome, ///
group(cond) base(1) ref(2) ref2(3) table
Base category = 1_Male Het Past
Reference category = 2_Male Gay Past
2nd Reference category = 3_Fem Het Past
N Used In Balance Calculations
- N for cond = 1_Male Het Past : 529
- N for cond = 2_Male Gay Past : 462
- N for cond = 3_Fem Het Past : 470
Results stored in matrices: bias_1_2 bias_1_3
Difference in Means Across Groups of cond: base(1_Male Het Past) vs ref(2_Male
Gay Past)
             |  mean_base   mean_ref  ttest_pval   std_diff
-------------+-----------------------------------------------
    polviews |      4.066      4.165       0.300      6.602
         age |     49.711     50.110       0.727      2.314
      2.educ |      0.253      0.331       0.007     17.166
      3.educ |      0.282      0.264       0.522     -3.947
      4.educ |      0.238      0.184       0.036    -13.298
      5.educ |      0.153      0.139       0.508     -4.131
      2.race |      0.076      0.110       0.060     11.983
      3.race |      0.064      0.065       0.973      0.269
      4.race |      0.104      0.115       0.525      3.441
    hhincome |     60.052     58.945       0.669     -2.447
Difference in Means Across Groups of cond: base(1_Male Het Past) vs ref2(3_Fem
Het Past)
             |  mean_base   mean_ref  ttest_pval   std_diff
-------------+-----------------------------------------------
    polviews |      4.066      4.170       0.274      6.933
         age |     49.711     47.883       0.096    -10.376
      2.educ |      0.253      0.326       0.009     15.961
      3.educ |      0.282      0.274       0.753     -1.604
      4.educ |      0.238      0.206       0.209     -7.647
      5.educ |      0.153      0.104       0.020    -14.617
      2.race |      0.076      0.083       0.689      2.723
      3.race |      0.064      0.053       0.536     -4.709
      4.race |      0.104      0.104       0.984      0.093
    hhincome |     60.052     58.943       0.659     -2.483
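The table option prints one table for each comparison between the base category and a reference category. Each table reports the group means, a t-test of balance (the 3rd column is the p-value of this test), and the standardized difference.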
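The output above reports that results are stored in the matrices bias_1_2 and bias_1_3. Assuming these are left in memory as ordinary Stata matrices (an assumption; the help file is the authority on how results are returned), they can be inspected after the command runs:

```stata
use "https://tdmize.github.io/data/data/pmh_tess", clear
balanceplot polviews age i.educ i.race hhincome, ///
    group(cond) base(1) ref(2) ref2(3) table

* Assumption: bias_1_2 and bias_1_3 are regular Stata matrices,
* named as in the "Results stored in matrices" line of the output
matrix list bias_1_2
matrix list bias_1_3
```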
A customized balanceplot
By utilizing the options of coefplot, you can create a customized balanceplot. The headings( ) option is particularly useful to organize the graph.
use "https://tdmize.github.io/data/data/mls_gss", clear
balanceplot socdistSS ///
i.cntct_tot i.female i.metro age coninc i.race i.region i.degree i.year, ///
group(L_mentlillB) nosort ///
graphop( xtitle("% Standardized Difference") ///
xlab(-60(20)60) ///
headings(1.cntct_tot = "{bf:Binary IVs}" age = "{bf:Continuous IVs}" ///
2.race = "{bf:Race}" 2.region = "{bf:Region}" ///
1.degree = "{bf:Education}" 2006.year = "{bf:Survey Year}") ///
title("[6.4.a] Standardized differences in rates of labeling behavior as a mental illness", span) ///
subtitle("Positive differences indicate higher rates of labeling as a mental illness") ///
note("NOTES: (1) Omitted reference categories are: no contact, male, not a metro, white, New England, < high school, and 1996.", span))
Checking balance after matching for causal inference
If you use Stata's teffects commands for matching, you can use the tebalance summarize command in lieu of balanceplot to calculate balance in the pre-matched and matched samples. You can save these statistics as a matrix and then use coefplot directly to recreate a balanceplot.
teffects ...
tebalance summarize
mat balance = r(table)
coefplot (matrix(balance[,1])) (matrix(balance[,2])), ///
xline(0) xtitle("Standardized Difference") ///
xlab(-.60(.20).60) legend(order(2 "Raw (unweighted)" 4 "AIP Weighted")) ///
sort graphregion(margin(l+5)) ///
title("[6.3.a] Covariate standardized differences in pre- vs mid-COVID samples", span) ///
subtitle("Raw (unweighted) vs AIP weighted data")
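For a version that runs end to end, here is a minimal sketch using Stata's cattaneo2 example dataset with an inverse-probability-weighting model. The dataset, the model specification, and the legend labels are assumptions chosen for illustration; they are not the analysis behind the plot above.

```stata
* Illustrative data and model (assumptions, not from the original analysis)
webuse cattaneo2, clear

* IPW estimate of the effect of maternal smoking on birthweight
teffects ipw (bweight) (mbsmoke mmarried mage fbaby medu)

* Standardized differences and variance ratios, raw vs. weighted
tebalance summarize
matrix balance = r(table)

* Column 1 holds the raw standardized differences, column 2 the weighted ones
coefplot (matrix(balance[,1]), label("Raw")) ///
         (matrix(balance[,2]), label("Weighted")), ///
    xline(0) xtitle("Standardized Difference")
```

Points that move toward the zero line in the weighted series indicate covariates that the weighting has brought closer to balance.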