Ben H. Williams Professor of Economics
Baylor University
Spring 2026 — Harvard University
Instructor: Scott Cunningham
Email: anthony_cunningham@fas.harvard.edu
Office: CGIS Knafel Building, Room H402
Office Hours: Tue/Thu 3:00–5:00 PM (Calendly)
Lectures: Tue/Thu 12:00–1:15 PM, Sever 103
Sections:
Teaching Fellow: George Yean (gyean@fas.harvard.edu)
TF Office Hours: Thu 2:00–3:00 PM, CGIS K455 (Calendly)
Course Assistant: Harrison Huang (Office Hours)
The goal of this course is to give you the ability to understand, explain, and perform social science research, with a special focus on data analysis and causal reasoning. By the end of the semester, students will be able to:
You will be able to read and understand the methodology of most academic articles in the social sciences, and have a foot in the door of the data science world.
The course consists of two 75-minute lectures per week and one required weekly discussion section led by Teaching Fellows. Lectures introduce key ideas in statistical inference and causal reasoning, grounding them in practical social science research. Students will learn to program in R, work with real-world datasets, and build both intuitive and technical understanding of modern empirical methods.
Discussion sections provide hands-on practice with statistical software and space to work through problem sets under Teaching Fellow guidance. Students should expect an interactive learning environment that moves between conceptual foundations, implementation, and interpretation.
Reminder: You can attend any section in a given week regardless of which one you're officially registered for.
Either edition is fine:
If you're seeking extra help:
| Component | Weight | Description |
|---|---|---|
| Problem Sets (4) | 40% | Four applied data analysis assignments using real-world datasets. Due Thursdays at 11:59 PM via Gradescope. |
| Midterm Exams (2) | 40% | Two in-class exams (no notes, no computers). |
| Final Project | 20% | Independent data analysis on a topic of your choice. Individual or groups up to 3. |
Late Policy: Late submissions lose 10% per day (e.g., 1 day late = 90% max score). After 7 days, late work receives a zero. This applies to both problem sets and final project milestones.
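The late-policy arithmetic can be sketched in a few lines of Python. This is an illustration only, not the official grading code: the function name `late_score` is hypothetical, and the sketch assumes that lateness is counted in whole days, that the penalty scales the earned score (so one day late keeps 90% of it), and that work exactly 7 days late still earns 30% while anything later earns zero. Check with the teaching staff if your case depends on any of these assumptions.

```python
def late_score(raw_score: float, days_late: int) -> float:
    """Return the score after the late penalty.

    Assumptions (not stated in the syllabus): lateness is measured in
    whole days, and the 10%-per-day penalty scales the earned score.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 7:
        # More than 7 days late: the work receives a zero.
        return 0.0
    # Each late day removes 10 percentage points from the multiplier.
    return raw_score * (10 - days_late) / 10


if __name__ == "__main__":
    for days in (0, 1, 3, 7, 8):
        print(days, "day(s) late:", late_score(80, days))
```

Under these assumptions, a problem set that would have earned 80 points is worth 72 points if submitted one day late.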
| Part | Topic | Approximate Timing | QSS Chapters |
|---|---|---|---|
| I | R and Data Skills | Weeks 1–2 | Chapter 1 |
| II | Statistical Foundations | Weeks 3–4 | Chapters 3, 5–6 |
| III | Inference and Regression | Weeks 5–7 | Chapters 4, 7 |
| Spring Break (Mar 14–22) | |||
| IV | Prediction and Machine Learning | Weeks 8–10 | Chapter 4 |
| V | Causal Inference | Weeks 11–13 | Chapter 2 |
Topics for future weeks will be posted as we progress through the course.
| Dates | Topic | Reading | Slides | R Script | Assignment |
|---|---|---|---|---|---|
| Part I: R and Data Skills | |||||
| Jan 27, 29 | Introduction to R | QSS 1.1–1.4 | Tue, Thu | R Script | — |
| Feb 3, 5 | Data Visualization; Descriptive Statistics | QSS 1.3, 3.1–3.3 | Thu | R Script | — |
| Part II: Statistical Foundations | |||||
| Feb 10, 12 | Text as Data; Covariance and Correlation | Card et al. (PNAS 2022); QSS 3.5–3.6 | Tue, Thu | — | PS 1 (due Thu Feb 12) |
| Feb 17, 19 | Sampling and Uncertainty; When Data Lies | QSS 3.1–3.6; LaCour & Green (2014, retracted); Broockman, Kalla & Aronow (2015); Broockman & Kalla (2016) | Tue, Thu | — | — |
| Part III: Inference and Regression | |||||
| Feb 24, 26 | Hypothesis Testing: p-values, t-statistics, and Standard Errors | QSS 6.1–6.3, 7.1–7.2 | Slides | — | — |
| Mar 3, 5 | Bivariate Regression | QSS 4.2–4.3 | — | — | PS 2 (Thu Mar 5); Data: gay.csv, gayreshaped.csv, ccap2012.csv |
| Mar 10, 12 | Multivariate Regression and Review | — | — | — | Exam 1 (Thu Mar 12) |
| Mar 14–22 | Spring Recess — No Classes | ||||
| Part IV: Prediction and Machine Learning | |||||
| Mar 24, 26 | Prediction in Social Science: Overfitting and Underfitting | QSS 4.1–4.2 | — | — | Proposal (Thu Mar 26) |
| Mar 31, Apr 2 | Regularization: LASSO, Ridge, and the Bias-Variance Tradeoff | Supplemental | — | — | PS 3 (Thu Apr 2) |
| Apr 7, 9 | Forecasting and the Bridge to Causation | Supplemental | — | — | Draft Analyses (Thu Apr 9) |
| Part V: Causal Inference | |||||
| Apr 14, 16 | Experiments, Omitted Variable Bias, and Instrumental Variables | QSS 2.1–2.6 | — | — | PS 4 (Thu Apr 16) |
| Apr 21, 23 | Difference-in-Differences and Review | — | — | — | Exam 2 (Thu Apr 23) |
| Apr 28 | Project Presentations | — | — | — | Presentations |
| May 11 | Final Report Due (Exam Period) | ||||
Weekly sections provide hands-on practice with R and reinforce concepts from lecture. Attendance is expected.
| Week | Topic | Slides | R Script |
|---|---|---|---|
| Feb 4 | Git Setup, Installing R, IPUMS Data Download | — | — |
| Feb 11 | Basic Statistics, Making Figures, Quarto | Slides | R Script |
| Feb 18 | Correlation and Sampling | Slides | R Script |
| Feb 25 | Detecting Fraud and Testing Hypotheses | Slides | — |
You will select a dataset and research question, then conduct an independent data analysis applying methods from the course. Projects may be completed individually or in groups of up to three.
| Milestone | Description | Due |
|---|---|---|
| Proposal | Short project proposal with evidence of dataset or data collection plan | Mar 26 |
| Draft Analyses | Preliminary analysis and at least one visualization | Apr 9 |
| Presentations | Brief in-class presentation of findings | Apr 28 |
| Final Report | Polished report with all components | May 11 |
See the AI Policy page on Canvas. The goal of this course is for you to learn to think with data. Using AI to generate answers defeats that purpose and will leave you unprepared for exams, which are completed in-class without AI assistance.
All assignments and exams will be submitted through Gradescope, accessed directly through Canvas. Regrade requests must be submitted through Gradescope, with a clear explanation, within one week of grades being posted.
If you're struggling, please reach out early. Come to office hours, attend TF sections, and use the course discussion board. Learning statistics is hard—confusion is normal and expected. What matters most to me is what you actually learn.