Categorical Data Analysis (Soc 681) at Purdue
Note: Taught most fall semesters.
Fall 2024 Course Materials
Note: Course is in progress with materials being added each week. For a full set of materials, see the Fall 2023 section below.
Fall 2023 Course Materials
Includes: syllabus, lecture slides, example code, etc.
Course Description
Many, perhaps even most, social and behavioral science questions involve outcome variables that are categorical in nature. For example: Which political candidate will win the next election? What social class does a person belong to? How many publications does it take to receive tenure? How many drinks does the average person consume per week? These and countless other questions cannot be adequately answered with the linear regression model; they instead require the more advanced techniques covered extensively in this course.
Categorical Data Analysis is a course in applied statistics that primarily deals with regression models in which the dependent variable is binary, nominal, ordinal, or count. In addition, some flexible methods for handling nonlinearities within the linear regression framework will be briefly covered. Many common statistical tasks encountered by social scientists require different methods when the dependent variable is not continuous. For example, interpreting coefficients, calculating predictions, testing interaction effects, testing for mediation, and assessing model fit all require a different approach for categorical dependent variables than for continuous outcomes. The focus of the course is on interpretation and on learning to deal with the complications introduced by the nonlinearity of the models. Less attention will be given to the mathematical details of the models, except where pertinent.
Specific models considered include: probit and logit for binary outcomes; ordered logit, ordered probit, and alternating least squares optimal scaling for ordinal outcomes; multinomial logit for nominal outcomes; Poisson, negative binomial, and zero-inflated models for counts; and fractional response, LOWESS, and local polynomial smoothing methods for continuous and quasi-continuous outcomes.
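As a rough illustration of why nonlinearity complicates interpretation, the sketch below uses a binary logit model to show that the same coefficient implies a different change in predicted probability depending on where it is evaluated. It is not taken from the course materials: the intercept and slope values are hypothetical, and the function names are chosen only for this example.

```python
import math

# Hypothetical coefficients for illustration only; they are not estimates from any data.
INTERCEPT = -2.0
SLOPE = 0.8


def logit_prob(x, intercept=INTERCEPT, slope=SLOPE):
    """Predicted Pr(y = 1 | x) from a binary logit model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def marginal_effect(x, intercept=INTERCEPT, slope=SLOPE):
    """Derivative of Pr(y = 1 | x) with respect to x, which equals slope * p * (1 - p)."""
    p = logit_prob(x, intercept, slope)
    return slope * p * (1.0 - p)


# The coefficient is fixed, but its effect on the probability scale varies with x.
for x in (0.0, 2.5, 5.0):
    print(f"x={x:.1f}  Pr(y=1)={logit_prob(x):.3f}  dPr/dx={marginal_effect(x):.3f}")
```

In a linear regression the slope alone would describe the effect of x everywhere. Here the effect on the probability scale is largest where the predicted probability is 0.5 and shrinks toward the tails, which is one reason predictions and marginal effects are usually reported alongside the raw coefficients.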