# Data Visualization Using Stata – Code Horizons Seminar

*See the logistics, course description, class outline, and example visualizations below.*

## Logistics

Next offering: August 16–19

Remote seminar offered synchronously (with asynchronous participation allowed).

Read more details or register for the seminar on the Code Horizons site.

## Course Description

Understanding data and effectively presenting model results are challenges that data analysts face almost every day. There is seldom a more effective solution than a well-thought-out visualization: problems in the data are easily identified, complex effects are quickly summarized, and effect sizes and variability become immediately clear. In this seminar, we will cover best practices for accurately representing data, along with many specific approaches to data exploration, model diagnostics, and model presentation.

The primary focus is on the applied analyst’s “bread and butter” visualizations: those I suspect will be useful in almost every research project. However, we also cover more advanced visualization methods.

Topics range from exploratory data analysis techniques to methods for presenting complex model results. Applied exercises will help participants implement these techniques in Stata. Participants will also receive template Stata code that reproduces every workshop example.

The seminar will use Stata, which is widely used to clean, examine, model, and visualize data. Stata’s data and model visualization capabilities are impressive, yet most users vastly underutilize them. This seminar will teach attendees best data visualization practices in general, as well as specific ways to implement them in Stata.

## Outline

**Day 1**

- Why visualize data?
  - The science of effective data visualization
- Introduction to data visualization in Stata
  - Common options universal to most graphs
- Plots of univariate distributions
  - Histograms; kernel density plots
    - Overlays for group comparisons
  - Box (and whisker) plots
- Pies, bars, and dots
  - Perceptual accuracy and choosing plots
    - Why you shouldn’t use pie charts
  - Bar charts
    - Stacked bar charts; group comparisons
  - Dot plots
- Confidence intervals and standard errors
  - Visual tools for conveying uncertainty

**Day 2**

- General data visualization rules and guidelines
  - Axis range rules
  - 3D graphics
  - Using color well
    - Nominal vs. ordinal palettes
    - Proofing your graphs against color blindness
    - Figures that work in color or black and white
  - Fonts
  - Graphics file formats
  - Graph schemes
  - Confidence intervals and inferring statistical significance
- Plots of bivariate relationships
  - Scatterplots
    - Options for continuous and nominal variables
    - Scatterplot smoothing
      - Lowess
        - Incorporating covariates
      - Local polynomial smoothing
  - Plotting change over time (or another continuous variable)
    - Slopegraphs
    - Ridgeline plots
    - Alluvial/Sankey diagrams

**Day 3**

- Multilevel and longitudinal data
  - Grouped data
  - Multiple levels
  - Combining multiple graphs into a single figure; small multiples
  - Spaghetti plots
- Visualizing model results
  - Coefficient plots
    - Comparing across models and/or groups
  - Plots of model predictions
    - Adding distributional information to plots
      - Univariate and group-specific
  - Interaction effects
    - Nominal x nominal interactions
    - Nominal x continuous interactions
    - Continuous x continuous interactions
  - Group comparisons
  - Marginal effects plots
    - Models with a single predicted outcome
    - Models with multiple outcomes
  - Ideal types

**Day 4**

- Maps
  - Map projection options; pros and cons
  - Choropleth maps
  - Area vs. population issues in visualization
  - World, countries, states, and counties
- Visualizing covariate balance
  - Balance plots
    - Experimental data
    - Causal inference matching methods
- Model diagnostics
  - Residuals
  - Influence
  - Added-variable plots