Missing Data Workshop

Lecture slides

Example multiple imputation do-file

Replication files

Workshop Description

Very little data of interest to social, behavioral, and health scientists is complete. Missing data tends to be the rule rather than the exception in applied data analysis. Survey respondents may choose to skip sensitive questions; economic data may be harder to find for developing countries; certain types of respondents may be especially likely to drop out of panel studies. Data is rarely “missing completely at random.” Instead, systematic factors tend to account for missing observations, and these factors can bias results if not properly handled.
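To make the bias concrete, here is a minimal simulation sketch in Python (not taken from the workshop materials, which use Stata). The variable names, sample size, and missingness probabilities are all illustrative assumptions. It compares the mean of an outcome y when values are dropped purely at random with the mean when the chance of a missing value depends on a related variable x.

```python
import random
import statistics


def simulate(n=20000, seed=0):
    """Return (full mean, mean under random missingness, mean under systematic missingness)."""
    rng = random.Random(seed)
    full, observed_random, observed_systematic = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = x + rng.gauss(0, 1)  # y depends on x; its true mean is 0
        full.append(y)

        # Missing completely at random: every y has the same 30% chance of being lost.
        if rng.random() > 0.3:
            observed_random.append(y)

        # Systematic missingness (assumed rates): y is lost far more often when x is high.
        p_missing = 0.6 if x > 0 else 0.1
        if rng.random() > p_missing:
            observed_systematic.append(y)

    return (
        statistics.mean(full),
        statistics.mean(observed_random),
        statistics.mean(observed_systematic),
    )


if __name__ == "__main__":
    full_mean, random_mean, systematic_mean = simulate()
    print(f"complete data:          {full_mean:.3f}")
    print(f"random missingness:     {random_mean:.3f}")
    print(f"systematic missingness: {systematic_mean:.3f}")
```

Under these assumptions, analyzing only the observed cases should recover roughly the right mean when values are lost at random, but should understate it when high-x cases are lost more often, because those cases also tend to have high y.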

This workshop will focus on the most effective techniques for conducting quantitative analyses with missing data. In addition to covering the basics of missing data theory and showing the problems that can occur when missing data is ignored, we will cover three methods in detail: (1) multiple imputation, (2) hot deck imputation, and (3) full information maximum likelihood. Which method to use depends on many factors specific to each analysis. Examples in the workshop will primarily use Stata, although some code and resources will be provided for handling missing data in R, SAS, and SPSS.