balanceplot

Stata Command for Plots of Imbalance Across Groups

Introduction

balanceplot produces dot plots of standardized imbalance statistics across groups of a categorical independent variable. Standardized imbalance statistics provide a useful way to present differences in covariates (specified in varlist) across groups of the categorical independent variable of interest (specified in the group( ) option). A standardized measure is necessary to compare covariates that are measured on different metrics.

Installation Instructions

To install in Stata:

     net install balanceplot, from("https://tdmize.github.io/data/balanceplot")

Help file

After installing, to read the balanceplot help file (also available here):

     help balanceplot

Citation

Please cite use of the balanceplot command as:

Mize, Trenton D. 2018. "balanceplot: Stata command for plots of imbalance across groups." https://www.trentonmize.com/software/balanceplot

Background

Balance plots come from the experimental and causal inference (matching) literature. In an experiment, the treatment and control groups should be “balanced” due to random assignment (i.e., covariate distributions should be the same across conditions). When using a causal inference matching method, the pre-matched sample is imbalanced but the post-matched sample should be balanced. Despite these origins, balance plots are applicable in any analysis of groups.

Rosenbaum and Rubin (1985) proposed the commonly used measure of “standardized difference” or “bias” to quantify differences in balance across groups:
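In its usual form, for a covariate x with group means and variances computed separately in the base and reference groups, the percent standardized difference can be written as follows. The notation is illustrative, and the sign convention (reference minus base) is inferred from the example tables later in this document rather than taken from the balanceplot source.

```latex
d = 100 \times \frac{\bar{x}_{\text{ref}} - \bar{x}_{\text{base}}}
                    {\sqrt{\left( s^{2}_{\text{ref}} + s^{2}_{\text{base}} \right) / 2}}
```

As a rough check, a binary covariate with proportions 0.253 (base) and 0.331 (reference) has variances of about 0.189 and 0.221, so d is roughly 100 × 0.078 / 0.453 ≈ 17, in line with the 17.166 reported for 2.educ in the experimental example below.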

Using the balanceplot command

balanceplot calculates standardized imbalance statistics and plots them as a dot plot using coefplot. Any of the features available with coefplot can be applied to a balanceplot. The base of the command that balanceplot uses to make the plot can be shown with the option plotcommand.

The table of means, t-tests for balance, and standardized imbalance statistics can be shown with the option table.

The basic syntax of balanceplot is:

     balanceplot depvar varlist, group(groupvar)

Where depvar is an outcome of interest, groupvar is the focal categorical grouping variable, and varlist is a list of covariates. You should use factor-variable syntax for the varlist, including the prefix i. for nominal and binary variables.
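A minimal sketch of the syntax and the two display options, assuming the nlsw88 example data that ships with Stata and that balanceplot is already installed. The `capture which` check for coefplot is an addition for convenience, not something balanceplot is known to require you to run.

```stata
* Install coefplot from SSC only if Stata cannot already find it
capture which coefplot
if _rc ssc install coefplot

sysuse nlsw88, clear

* Outcome first, then covariates (i. marks binary/nominal ones), then group()
* table prints the means, t-tests, and standardized imbalance statistics
balanceplot wage age i.collgrad tenure, group(union) table

* plotcommand shows the base of the coefplot command used to draw the graph
balanceplot wage age i.collgrad tenure, group(union) plotcommand
```

The displayed coefplot command is a convenient starting point when you want to customize the graph beyond what graphop( ) allows.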

Examples

Observational data

The first set of examples uses observational survey data. Using a balanceplot in this case could be useful before fitting a multiple regression model that includes all covariates as control variables.


     sysuse nlsw88

     balanceplot wage age i.married i.collgrad i.south tenure ttl_exp, group(union)

     balanceplot wage age i.married i.collgrad i.south tenure ttl_exp, ///
          group(race) base(1) ref(2) ref2(3) graphop(xlab(-75(25)75))
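To see what a single plotted point represents, one standardized difference can be computed by hand and compared with the output of the table option. This is only a sketch: it assumes the Rosenbaum and Rubin definition (difference in means divided by the square root of the average of the two group variances, times 100) and a reference-minus-base sign convention. If balanceplot restricts the sample, for example to complete cases, the hand calculation may not match exactly.

```stata
sysuse nlsw88, clear

* Mean and variance of age among nonunion workers (union == 0, the base)
quietly summarize age if union == 0
scalar m0 = r(mean)
scalar v0 = r(Var)

* Mean and variance of age among union workers (union == 1, the reference)
quietly summarize age if union == 1
scalar m1 = r(mean)
scalar v1 = r(Var)

* Percent standardized difference: 100 * (ref - base) / sqrt(average variance)
scalar std_diff = 100 * (m1 - m0) / sqrt((v0 + v1) / 2)
display "Standardized difference in age: " %6.3f std_diff
```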

Experimental data

Random assignment to condition should balance all covariates in a properly conducted experiment (i.e., across conditions, means and distributions of each covariate should be the same).

You can check whether random assignment did its job using a balance plot and a statistical test of balance. Note that these tests are somewhat controversial, and there are prominent arguments against their utility. For example, Mutz, Pemantle, and Pham (2018) provide an argument I mostly agree with (see the discussion in Mize and Manago 2022). Still, there are some situations where tests of balance are needed, such as when a high rate of noncompliance or attrition could imbalance the conditions. In any case, if you need to conduct a balance test with experimental data, it is easy with balanceplot.

     use "https://tdmize.github.io/data/data/pmh_tess", clear

     balanceplot polviews age i.educ i.race hhincome, ///
          group(cond) base(1) ref(2) ref2(3) table

Base category = 1_Male Het Past
Reference category = 2_Male Gay Past
2nd Reference category = 3_Fem Het Past

N Used In Balance Calculations
- N for cond = 1_Male Het Past : 529
- N for cond = 2_Male Gay Past : 462
- N for cond = 3_Fem Het Past : 470

Results stored in matrices: bias_1_2 bias_1_3

Difference in Means Across Groups of cond: base(1_Male Het Past) vs ref(2_Male Gay Past)
             |  mean_base    mean_ref  ttest_pval    std_diff
-------------+-----------------------------------------------
    polviews |      4.066       4.165       0.300       6.602
         age |     49.711      50.110       0.727       2.314
      2.educ |      0.253       0.331       0.007      17.166
      3.educ |      0.282       0.264       0.522      -3.947
      4.educ |      0.238       0.184       0.036     -13.298
      5.educ |      0.153       0.139       0.508      -4.131
      2.race |      0.076       0.110       0.060      11.983
      3.race |      0.064       0.065       0.973       0.269
      4.race |      0.104       0.115       0.525       3.441
    hhincome |     60.052      58.945       0.669      -2.447

Difference in Means Across Groups of cond: base(1_Male Het Past) vs ref2(3_Fem Het Past)
             |  mean_base    mean_ref  ttest_pval    std_diff
-------------+-----------------------------------------------
    polviews |      4.066       4.170       0.274       6.933
         age |     49.711      47.883       0.096     -10.376
      2.educ |      0.253       0.326       0.009      15.961
      3.educ |      0.282       0.274       0.753      -1.604
      4.educ |      0.238       0.206       0.209      -7.647
      5.educ |      0.153       0.104       0.020     -14.617
      2.race |      0.076       0.083       0.689       2.723
      3.race |      0.064       0.053       0.536      -4.709
      4.race |      0.104       0.104       0.984       0.093
    hhincome |     60.052      58.943       0.659      -2.483

The table option prints one table per comparison with the base category. Each table reports the group means, a t-test of balance (the third column reports the p-value of this test), and the standardized difference.

A customized balanceplot

By utilizing the options of coefplot, you can create a customized balanceplot. The headings( ) option is particularly useful to organize the graph.

     use "https://tdmize.github.io/data/data/mls_gss", clear

     balanceplot socdistSS ///
          i.cntct_tot i.female i.metro age coninc i.race i.region i.degree i.year, ///
          group(L_mentlillB) nosort ///
          graphop( xtitle("% Standardized Difference") ///
          xlab(-60(20)60)  ///
          headings(1.cntct_tot = "{bf:Binary IVs}" age = "{bf:Continuous IVs}" ///
          2.race = "{bf:Race}" 2.region = "{bf:Region}" ///
          1.degree = "{bf:Education}" 2006.year = "{bf:Survey Year}") ///
          title("[6.4.a] Standardized differences in rates of labeling behavior as a mental illness", span) ///
          subtitle("Positive differences indicate higher rates of labeling as a mental illness") ///
          note("NOTES: (1) Omitted reference categories are: no contact, male, not a metro, white, New England, < high school, and 1996.", span))

Checking balance after matching for causal inference

If you use Stata's teffects commands for matching, you can use the tebalance summarize command in lieu of balanceplot to calculate balance in the pre-matched and matched samples. You can save these statistics as a matrix and then use coefplot directly to recreate a balanceplot.

     teffects ...

     tebalance summarize
     mat balance = r(table)

     coefplot (matrix(balance[,1])) (matrix(balance[,2])), ///
          xline(0)  xtitle("Standardized Difference") ///
          xlab(-.60(.20).60) legend(order(2 "Raw (unweighted)" 4 "AIP Weighted")) ///
          sort graphregion(margin(l+5)) ///
          title("[6.3.a] Covariate standardized differences in pre- vs mid-COVID samples", span) ///
          subtitle("Raw (unweighted) vs AIP weighted data")
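For a version that can be run end to end, the sketch below fills in the teffects step with an illustrative model. The dataset (cattaneo2, downloaded with webuse), the outcome, the treatment, and the covariates are assumptions chosen for demonstration and are not the analysis behind the figure above. It also assumes coefplot is installed and that the first two columns of r(table) hold the raw and matched standardized differences, as in the code above.

```stata
* Example data from the Stata manuals (requires an internet connection)
webuse cattaneo2, clear

* Propensity-score matching: outcome bweight, treatment mbsmoke
teffects psmatch (bweight) (mbsmoke mmarried mage fbaby medu)

* Balance statistics for the raw and matched samples, saved as a matrix
tebalance summarize
matrix balance = r(table)

* Plot column 1 (raw) and column 2 (matched) standardized differences
coefplot (matrix(balance[,1]), label("Raw")) ///
         (matrix(balance[,2]), label("Matched")), ///
         xline(0) xtitle("Standardized Difference")
```

Labeling each series with label( ) avoids having to work out the legend key numbers by hand, which is what the legend(order( )) option in the example above does.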