Economics 326 — Introduction to Econometrics II
Previous lecture: treatment effects from cross-sectional data, assuming selection on observables (treatment is as good as random after controlling for covariates).
This assumption is often hard to justify. Alternative: exploit data over time.
Difference-in-differences (DID): compare changes over time between a treatment group and a control group.
Two time periods: t \in \{0, 1\} (before and after treatment).
Two groups: D_i \in \{0, 1\} (control and treatment).
Treatment occurs between periods 0 and 1; only the treatment group (D_i = 1) is affected.
Y_{it}: outcome for individual i at time t.
DID regression:
Y_{it} = \alpha + {\color{blue}\delta} \cdot t + {\color{purple}\gamma} D_i + {\color{teal}\beta}(t \cdot D_i) + U_{it},
where \mathrm{E}\left[U_{it} \mid D_i\right] = 0.
Regressors: the period indicator t, the group indicator D_i, and their interaction t \cdot D_i.
Evaluate \mathrm{E}\left[Y_{it} \mid D_i\right] for each combination of t and D_i:
\begin{align*} t = 0,\ D_i = 0: \quad & \mathrm{E}\left[Y_{i0} \mid D_i = 0\right] = \alpha, \\ t = 0,\ D_i = 1: \quad & \mathrm{E}\left[Y_{i0} \mid D_i = 1\right] = \alpha + {\color{purple}\gamma}, \\ t = 1,\ D_i = 0: \quad & \mathrm{E}\left[Y_{i1} \mid D_i = 0\right] = \alpha + {\color{blue}\delta}, \\ t = 1,\ D_i = 1: \quad & \mathrm{E}\left[Y_{i1} \mid D_i = 1\right] = \alpha + {\color{blue}\delta} + {\color{purple}\gamma} + {\color{teal}\beta}. \end{align*}
Summarized as a 2×2 table:
| | D_i = 0 (Control) | D_i = 1 (Treatment) |
|---|---|---|
| t = 0 | \alpha | \alpha + {\color{purple}\gamma} |
| t = 1 | \alpha + {\color{blue}\delta} | \alpha + {\color{blue}\delta} + {\color{purple}\gamma} + {\color{teal}\beta} |
\alpha: baseline (control group, t = 0).
{\color{purple}\gamma}: pre-existing group difference at t = 0.
{\color{blue}\delta}: time effect — change in the control group from t = 0 to t = 1 (common trend).
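Change over time for each group: control (\alpha + {\color{blue}\delta}) - \alpha = {\color{blue}\delta}; treatment (\alpha + {\color{blue}\delta} + {\color{purple}\gamma} + {\color{teal}\beta}) - (\alpha + {\color{purple}\gamma}) = {\color{blue}\delta} + {\color{teal}\beta}.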
Subtract control’s change from treatment’s change:
({\color{blue}\delta} + {\color{teal}\beta}) - {\color{blue}\delta} = {\color{teal}\beta}.
DID estimand as a double difference:
\begin{align*} {\color{teal}\beta} &= \mathrm{E}\left[Y_{i1} - Y_{i0} \mid D_i = 1\right] - \mathrm{E}\left[Y_{i1} - Y_{i0} \mid D_i = 0\right]. \end{align*}
The common trend {\color{blue}\delta} cancels, isolating {\color{teal}\beta}.
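As a quick numerical check of the double difference, here is a worked example with made-up values. The numbers \alpha = 10, \gamma = 2, \delta = 3, \beta = -1 are purely illustrative assumptions, not estimates from any data.

```latex
\begin{align*}
\mathrm{E}[Y_{i0} \mid D_i = 0] &= 10, & \mathrm{E}[Y_{i0} \mid D_i = 1] &= 10 + 2 = 12, \\
\mathrm{E}[Y_{i1} \mid D_i = 0] &= 10 + 3 = 13, & \mathrm{E}[Y_{i1} \mid D_i = 1] &= 10 + 3 + 2 - 1 = 14, \\
\text{DID} &= (14 - 12) - (13 - 10) = 2 - 3 = -1 = \beta.
\end{align*}
```

In this example neither single difference recovers \beta: the before/after change for the treatment group is \delta + \beta = 2, and the cross-sectional gap at t = 1 is \gamma + \beta = 1.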
DID diagram — control, treatment, and counterfactual:
Counterfactual (dashed gray): treatment group’s baseline \alpha + {\color{purple}\gamma} plus the control group’s change {\color{blue}\delta}.
Panel potential outcomes: Y_{it}(d) — outcome for individual i at time t if assigned to group d \in \{0, 1\}.
Four potential outcomes: Y_{i0}(0), Y_{i1}(0), Y_{i0}(1), Y_{i1}(1).
The observed outcome is:
Y_{it} = D_i \, Y_{it}(1) + (1 - D_i) \, Y_{it}(0).
What we observe for each group:
| | Control (D_i = 0) | Treatment (D_i = 1) |
|---|---|---|
| t = 0 | Y_{i0}(0) | Y_{i0}(1) |
| t = 1 | Y_{i1}(0) | Y_{i1}(1) |
Treatment effect on the treated at t = 1:
{\color{teal}\text{ATT}} = {\color{teal}\mathrm{E}\left[Y_{i1}(1) - Y_{i1}(0) \mid D_i = 1\right]}.
The counterfactual Y_{i1}(0) is unobserved for the treated group.
{\color{teal}\beta} = \mathrm{E}\left[Y_{i1} - Y_{i0} \mid D_i = 1\right] - \mathrm{E}\left[Y_{i1} - Y_{i0} \mid D_i = 0\right].
Substituting observed outcomes with potential outcomes:
\begin{align*} \beta &= \mathrm{E}\left[{\color{teal}Y_{i1}(1)} {\color{red}- Y_{i0}(1)} \mid D_i = 1\right] \\ &\quad {\color{blue}- \mathrm{E}\left[Y_{i1}(0) - Y_{i0}(0) \mid D_i = 0\right]}. \end{align*}
To relate \beta to the ATT, add and subtract Y_{i1}(0) and Y_{i0}(0) inside the first expectation:
\begin{align*} \beta &= \mathrm{E}\left[{\color{teal}Y_{i1}(1)} {\color{red}- Y_{i0}(1)} \mid D_i = 1\right] {\color{blue}- \mathrm{E}\left[Y_{i1}(0) - Y_{i0}(0) \mid D_i = 0\right]} \\ &= \mathrm{E}\left[{\color{teal}Y_{i1}(1)} \underbrace{{\color{teal}- Y_{i1}(0)} {\color{blue}+ Y_{i1}(0)}}_{= \, 0} \underbrace{{\color{blue}- Y_{i0}(0)} {\color{red}+ Y_{i0}(0)}}_{= \, 0} {\color{red}- Y_{i0}(1)} \mid D_i = 1\right] \\ &\quad {\color{blue}- \mathrm{E}\left[Y_{i1}(0) - Y_{i0}(0) \mid D_i = 0\right]}. \end{align*}
Rearranging and splitting the expectation:
\begin{align*} \beta &= \underbrace{{\color{teal}\mathrm{E}\left[Y_{i1}(1) - Y_{i1}(0) \mid D_i = 1\right]}}_{\color{teal}\text{ATT}} \\ &\quad + \underbrace{{\color{blue}\mathrm{E}\left[Y_{i1}(0) - Y_{i0}(0) \mid D_i = 1\right] - \mathrm{E}\left[Y_{i1}(0) - Y_{i0}(0) \mid D_i = 0\right]}}_{\color{blue}\text{difference in trends}} \\ &\quad + \underbrace{{\color{red}\mathrm{E}\left[Y_{i0}(0) - Y_{i0}(1) \mid D_i = 1\right]}}_{\color{red}\text{anticipation effect}}. \end{align*}
For {\color{teal}\beta} to equal the {\color{teal}\text{ATT}}, the {\color{blue}\text{difference in trends}} and {\color{red}\text{anticipation effect}} must be zero. This requires two assumptions.
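To see how the decomposition behaves when one term is nonzero, consider a hypothetical case. The values are assumptions chosen for illustration: an ATT of 5, an untreated trend of 3 for the treatment group and 2 for the control group, and no anticipation effect.

```latex
\begin{align*}
\beta &= \underbrace{5}_{\text{ATT}} + \underbrace{(3 - 2)}_{\text{difference in trends}} + \underbrace{0}_{\text{anticipation effect}} = 6.
\end{align*}
```

Here the DID estimand overstates the ATT by 1, which is exactly the gap between the two groups' untreated trends.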
Parallel trends assumption: the difference in trends is zero if both groups would have experienced the same change over time absent treatment:
{\color{blue}\mathrm{E}\left[Y_{i{\color{red}1}}(0) - Y_{i{\color{red}0}}(0) \mid D_i = 1\right] = \mathrm{E}\left[Y_{i{\color{red}1}}(0) - Y_{i{\color{red}0}}(0) \mid D_i = 0\right]}.
Under parallel trends, the decomposition reduces to:
{\color{teal}\beta} = {\color{teal}\text{ATT}} + \underbrace{{\color{red}\mathrm{E}\left[Y_{i0}(0) - Y_{i0}(1) \mid D_i = 1\right]}}_{\color{red}\text{anticipation effect}}.
Parallel trends cannot be directly tested: Y_{i1}(0) is unobserved for the treated group.
With multiple pre-treatment periods, can check whether trends were parallel before treatment.
No-anticipation assumption: the anticipation effect is zero if pre-treatment outcomes are not affected by future treatment assignment:
\mathrm{E}\left[Y_{i{\color{red}0}}(1) \mid D_i = 1\right] = \mathrm{E}\left[Y_{i{\color{red}0}}(0) \mid D_i = 1\right].
Treatment assignment does not change pre-treatment outcomes in expectation.
Under both parallel trends and no anticipation:
{\color{teal}\beta} = {\color{teal}\text{ATT}}.
Kiel and McClain (1995): did a garbage incinerator in North Andover, MA reduce nearby house prices?
This is a repeated cross-section: the houses sold in 1981 (after) are different from the houses sold in 1978 (before).
Data: kielmc from the wooldridge package:
library(wooldridge)
data(kielmc)
# Show 2 observations from each of the 4 groups
rows <- c(
  head(which(kielmc$y81 == 0 & kielmc$nearinc == 0), 2),
  head(which(kielmc$y81 == 0 & kielmc$nearinc == 1), 2),
  head(which(kielmc$y81 == 1 & kielmc$nearinc == 0), 2),
  head(which(kielmc$y81 == 1 & kielmc$nearinc == 1), 2)
)
kielmc[rows, c("rprice", "y81", "nearinc", "age")]
      rprice y81 nearinc age
14  52000.00   0       0  32
15  49000.00   0       0  18
1   60000.00   0       1  48
2   40000.00   0       1  83
187 90245.77   1       0   1
188 46082.95   1       0  41
180 37634.41   1       1  81
181 39938.55   1       1  71
rprice: house price in 1978 dollars (Y_{it}).
y81: 1 if the house sold in 1981 (after the incinerator was announced), 0 if it sold in 1978; this is the period indicator t.
nearinc: 1 if house is near the incinerator site (D_i).
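Compute the four group means of rprice (values implied by the DID regression coefficients reported below, up to rounding):
| | Far (nearinc = 0) | Near (nearinc = 1) |
|---|---|---|
| 1978 (y81 = 0) | 82{,}517.23 | 63{,}692.86 |
| 1981 (y81 = 1) | 101{,}307.52 | 70{,}619.25 |
| Change | 18{,}790.29 | 6{,}926.39 |
Computing the DID by hand:
6{,}926.39 - 18{,}790.29 = -11{,}863.90.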
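The by-hand computation can be sketched in base R. This is a minimal sketch, not the code from the original slides: `did_by_hand` is a hypothetical helper name, and the `toy` data frame is made up so that the block runs without the wooldridge package.

```r
# Hypothetical helper (not part of any package):
# DID from the four group means of outcome y, period indicator t, group indicator d.
did_by_hand <- function(y, t, d) {
  # 2x2 matrix of means: rows are t = 0, 1; columns are d = 0, 1
  m <- tapply(y, list(t = t, d = d), mean)
  change_control <- m["1", "0"] - m["0", "0"]
  change_treated <- m["1", "1"] - m["0", "1"]
  list(means = m, did = unname(change_treated - change_control))
}

# Made-up toy data, only to show the mechanics
toy <- data.frame(
  y = c(10, 12, 14, 16, 13, 15, 15, 17),
  t = c(0, 0, 0, 0, 1, 1, 1, 1),
  d = c(0, 0, 1, 1, 0, 0, 1, 1)
)

res <- did_by_hand(toy$y, toy$t, toy$d)
res$means  # the 2x2 table of group means
res$did    # (16 - 15) - (14 - 11) = -2

# The interaction coefficient of the saturated regression gives the same number
fit <- lm(y ~ t + d + I(t * d), data = toy)
coef(fit)[["I(t * d)"]]
```

With kielmc loaded as above, `did_by_hand(kielmc$rprice, kielmc$y81, kielmc$nearinc)` should reproduce the interaction coefficient of the DID regression, since that regression is saturated in the two indicators.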
The DID regression, where y81nrinc = \text{y81} \times \text{nearinc} is the interaction term:
options(scipen = 999)
reg_did <- lm(rprice ~ y81 + nearinc + y81nrinc, data = kielmc)
round(summary(reg_did)$coefficients, 4)
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  82517.23   2726.910 30.2603   0.0000
y81          18790.29   4050.065  4.6395   0.0000
nearinc     -18824.37   4875.322 -3.8612   0.0001
y81nrinc    -11863.90   7456.646 -1.5911   0.1126
Coefficient on y81nrinc matches the DID from the 2×2 table.
\hat{\beta} = -\$11{,}864 (SE = 7{,}457, p = 0.113): negative but not significant at 5%.
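To make the significance statement concrete, the t-statistic and an approximate 95% confidence interval can be worked out from the reported estimate and standard error. The critical value 1.96 is an assumption here (the large-sample normal approximation); it is not taken from the R output.

```latex
\begin{align*}
t &= \frac{\hat{\beta}}{\mathrm{SE}(\hat{\beta})} = \frac{-11{,}863.90}{7{,}456.646} \approx -1.59, \qquad |t| < 1.96, \\
\text{95\% CI} &\approx -11{,}863.90 \pm 1.96 \times 7{,}456.646 \approx [-26{,}479,\ 2{,}751].
\end{align*}
```

The interval contains zero, which is consistent with the reported p-value of 0.113.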
Parallel trends: without the incinerator, house prices near and far from the site would have followed the same trend over time.
No anticipation: before the incinerator was announced, living near the future site did not affect house prices.
Because this is a repeated cross-section, the houses sold in 1981 are different from those sold in 1978.
What if the mix of houses sold changed over time? Look at the average age of houses sold:
Compositional change: In the “Far” group, houses sold in 1981 were 4.25 years newer than in 1978. But in the “Near” group, they were 11.84 years newer.
The “Near” group got disproportionately newer. Since newer houses generally sell for more, this drastic change in composition artificially pushes the “Near” average prices up.
This artificial price bump from the change in age composition partially masked the negative effect of the incinerator in our simple DID regression.
To remove this bias, we must control for age. This adjusts for the fact that the two groups experienced different compositional changes over time.
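Written out, the age-adjusted specification adds a quadratic in age to the DID regression. The coefficient names \theta_1 and \theta_2 are notation introduced here for illustration; the quadratic form mirrors the age and I(age^2) terms in the R call that follows. The second line restates the compositional change quoted above as a difference-in-differences of average age.

```latex
\begin{align*}
Y_{it} &= \alpha + \delta \cdot t + \gamma D_i + \beta (t \cdot D_i) + \theta_1 \, \mathrm{age}_{it} + \theta_2 \, \mathrm{age}_{it}^2 + U_{it}, \\
\text{differential change in average age} &= (-11.84) - (-4.25) = -7.59 \text{ years}.
\end{align*}
```

Relative to the "Far" group, houses sold in the "Near" group became about 7.6 years newer between 1978 and 1981, which is the compositional shift the age controls are meant to absorb.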
reg_did_cov <- lm(rprice ~ y81 + nearinc + y81nrinc + age + I(age^2),
                  data = kielmc)
round(summary(reg_did_cov)$coefficients, 4)
               Estimate Std. Error  t value Pr(>|t|)
(Intercept)  89116.5354  2406.0511  37.0385   0.0000
y81          21321.0418  3443.6311   6.1914   0.0000
nearinc       9397.9359  4812.2218   1.9529   0.0517
y81nrinc    -21920.2700  6359.7454  -3.4467   0.0006
age          -1494.4240   131.8603 -11.3334   0.0000
I(age^2)         8.6913     0.8481  10.2476   0.0000
With age controls: \hat{\beta} = -\$21{,}920 (SE = 6{,}360, p = 0.0006) — nearly twice as large, highly significant.
Estimated bias from compositional change: -\$11{,}864 - (-\$21{,}920) \approx +\$10{,}056. The simple DID was severely biased upward because the “Near” houses sold in 1981 were unusually new.
With only two time periods, anticipation effects cannot be separately identified.
With multiple pre-treatment periods, use an event study design. Data span t = -T, \ldots, -1, 0, 1, \ldots, T', where {\color{red}t = 0} is now the treatment date (not “before” as in the two-period model).
Replace the single interaction \beta(t \cdot D_i) with a separate coefficient per period. The baseline is t = -1 (last pre-treatment period):
\begin{align*} Y_{it} &= \alpha + \sum_{s \neq -1} {\color{blue}\delta_s} \cdot \mathbb{1}[t = s] + {\color{purple}\gamma} D_i \\ &\quad + \sum_{s \neq -1} {\color{teal}\beta_s} \cdot D_i \cdot \mathbb{1}[t = s] + U_{it}. \end{align*}
Expected outcomes for t = -2 and t = -1:
| | D_i = 0 (Control) | D_i = 1 (Treatment) |
|---|---|---|
| t = -1 | \alpha | \alpha + {\color{purple}\gamma} |
| t = -2 | \alpha + {\color{blue}\delta_{-2}} | \alpha + {\color{blue}\delta_{-2}} + {\color{purple}\gamma} + {\color{teal}\beta_{-2}} |
| Backward trend (t\!:\! -1 \to -2) | {\color{blue}\delta_{-2}} | {\color{blue}\delta_{-2}} + {\color{teal}\beta_{-2}} |
{\color{teal}\beta_{-2}} is the difference in backward trends from t = -1 to t = -2 between treatment and control: ({\color{blue}\delta_{-2}} + {\color{teal}\beta_{-2}}) - {\color{blue}\delta_{-2}} = {\color{teal}\beta_{-2}}.
More generally, {\color{teal}\beta_s} for s \leq -2 is the difference in backward trends from t = -1 to t = s between treatment and control.
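The general statement can be written as a double difference relative to the baseline period. This is a sketch that carries over \mathrm{E}[U_{it} \mid D_i] = 0 from the two-period model and uses the normalization \delta_{-1} = \beta_{-1} = 0 implied by omitting t = -1.

```latex
\begin{align*}
\mathrm{E}[Y_{is} \mid D_i = 0] &= \alpha + \delta_s, \qquad
\mathrm{E}[Y_{is} \mid D_i = 1] = \alpha + \delta_s + \gamma + \beta_s, \\
\beta_s &= \mathrm{E}\left[Y_{is} - Y_{i,-1} \mid D_i = 1\right] - \mathrm{E}\left[Y_{is} - Y_{i,-1} \mid D_i = 0\right], \qquad s \neq -1.
\end{align*}
```

Each \beta_s is therefore a two-period DID that compares period s with the baseline t = -1.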
No anticipation + parallel pre-trends \implies {\color{teal}\beta_s} = 0 for all s \leq -2. A nonzero {\color{teal}\beta_s} rejects only the joint hypothesis: because Y_{it}(0) is never observed for the treated group, we cannot determine which assumption failed.
The critical point: with multiple pre-treatment periods, we can compare the two groups before treatment. Since {\color{teal}\beta_s} for s \leq -2 are estimable from pre-treatment data, {\color{teal}\beta_s} = 0 is a testable implication of the joint hypothesis.
{\color{teal}\beta_s} all equal and nonzero suggests anticipation at t = -1: backward trends are the same for all s \leq -2, but something shifts at the baseline period.
{\color{teal}\beta_s} nonzero and unequal suggests a parallel trends violation: groups were already diverging before treatment.
For s \geq 0: under parallel trends and no anticipation, {\color{teal}\beta_s} measures the treatment effect at period s relative to t = -1.
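The event study regression can be sketched in base R on simulated data. Everything in this block is an assumption made for illustration: the panel dimensions, the common trend of 0.5 per period, the group gap of 1.5, and a treatment effect of 2 from t = 0 onward. The variable names (`panel`, `period`, `beta_hat`, `by_hand`) are hypothetical.

```r
set.seed(326)  # arbitrary seed

# Simulated balanced panel; all numbers are made up for illustration
periods <- -3:2
n <- 100
panel <- expand.grid(id = 1:n, t = periods)
panel$D <- as.integer(panel$id <= n / 2)       # first half is the treatment group
true_effect <- ifelse(panel$t >= 0, 2, 0)      # effect of 2 from t = 0 onward
panel$y <- 1 + 0.5 * panel$t + 1.5 * panel$D +
  true_effect * panel$D + rnorm(nrow(panel))

# Period factor with t = -1 as the omitted baseline
panel$period <- relevel(factor(panel$t), ref = "-1")

# Period dummies, group dummy, and one interaction per non-baseline period
fit <- lm(y ~ period + D + period:D, data = panel)
is_beta <- grepl(":D", names(coef(fit)), fixed = TRUE)
beta_hat <- coef(fit)[is_beta]

# Same numbers from group-by-period means: double difference relative to t = -1
m <- tapply(panel$y, list(panel$period, panel$D), mean)
lev <- setdiff(levels(panel$period), "-1")
by_hand <- (m[lev, "1"] - m["-1", "1"]) - (m[lev, "0"] - m["-1", "0"])

round(cbind(regression = unname(beta_hat), by_hand = unname(by_hand)), 4)
```

Because the regression is saturated in group and period, each estimated interaction coefficient coincides with the corresponding double difference of sample means. In this simulated design the pre-period coefficients should be close to zero and the post-period ones close to 2, up to sampling noise.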
In the two-period model, the term {\color{blue}\delta} \cdot t creates two intercepts: \alpha at t = 0 and \alpha + {\color{blue}\delta} at t = 1.
With multiple periods t = -T, \ldots, -1, 0, 1, \ldots, T', we cannot use {\color{blue}\delta} \cdot t because that forces a linear trend: the time effect at period s would be \delta \cdot s, with no flexibility.
Instead, include a separate dummy for each period. The regression actually estimated is:
\begin{align*} Y_{it} &= \alpha + \cdots + {\color{blue}\delta_{-2}} \cdot \mathbb{1}[t = -2] + {\color{blue}\delta_0} \cdot \mathbb{1}[t = 0] + {\color{blue}\delta_1} \cdot \mathbb{1}[t = 1] + \cdots \\ &\quad + {\color{purple}\gamma} D_i + \cdots + U_{it}. \end{align*}
The key property: \mathbb{1}[t = s] equals 1 when t = s and 0 otherwise. At t = -1 all dummies are zero (baseline). At any other t, exactly one dummy equals 1:
| Period | Active dummy | Intercept |
|---|---|---|
| t = -1 | none (baseline) | \alpha |
| t = 0 | \mathbb{1}[t = 0] = 1 | \alpha + {\color{blue}\delta_0} |
| t = 1 | \mathbb{1}[t = 1] = 1 | \alpha + {\color{blue}\delta_1} |
Each period gets its own intercept. The compact notation \sum_{s \neq -1} {\color{blue}\delta_s} \cdot \mathbb{1}[t = s] writes the time dummies as a sum. The event study model with all the dummies written out:
\begin{align*} Y_{it} &= \alpha + \cdots + {\color{blue}\delta_{-2}} \mathbb{1}[t\!=\!-2] + {\color{blue}\delta_0} \mathbb{1}[t\!=\!0] + {\color{blue}\delta_1} \mathbb{1}[t\!=\!1] + \cdots \\ &\quad + {\color{purple}\gamma} D_i + \cdots + {\color{teal}\beta_{-2}} D_i \mathbb{1}[t\!=\!-2] + {\color{teal}\beta_0} D_i \mathbb{1}[t\!=\!0] + {\color{teal}\beta_1} D_i \mathbb{1}[t\!=\!1] + \cdots + U_{it}. \end{align*}
At each t \neq -1, exactly one time dummy survives; at t = -1 none do. So \alpha + {\color{blue}\delta_t} is the intercept at time t. Define {\color{blue}\lambda_t} = \alpha + {\color{blue}\delta_t} (with \delta_{-1} = 0):
\ldots, \quad {\color{blue}\lambda_{-1}} = \alpha, \quad {\color{blue}\lambda_0} = \alpha + {\color{blue}\delta_0}, \quad {\color{blue}\lambda_1} = \alpha + {\color{blue}\delta_1}, \quad \ldots
The {\color{blue}\lambda_t}’s are called time fixed effects. Equivalently, we can drop the common intercept \alpha and run OLS with a dummy for every period, including t = -1:
\begin{align*} Y_{it} &= \cdots + {\color{blue}\lambda_{-1}} \mathbb{1}[t\!=\!-1] + {\color{blue}\lambda_0} \mathbb{1}[t\!=\!0] + {\color{blue}\lambda_1} \mathbb{1}[t\!=\!1] + \cdots \\ &\quad + {\color{purple}\gamma} D_i + \cdots + {\color{teal}\beta_{-2}} D_i \mathbb{1}[t\!=\!-2] + {\color{teal}\beta_0} D_i \mathbb{1}[t\!=\!0] + {\color{teal}\beta_1} D_i \mathbb{1}[t\!=\!1] + \cdots + U_{it}. \end{align*}
At each t, exactly one \lambda-dummy equals 1 and all others are zero:
\underbrace{\cdots + {\color{blue}\lambda_t} \cdot 1 + \cdots}_{\text{only } {\color{blue}\lambda_t} \text{ survives}} = {\color{blue}\lambda_t}.
The notation {\color{blue}\lambda_t} is shorthand for this — it represents whichever \lambda is active at time t:
Y_{it} = {\color{blue}\lambda_t} + {\color{purple}\gamma} D_i + \sum_{s \neq -1} {\color{teal}\beta_s} \cdot D_i \cdot \mathbb{1}[t = s] + U_{it}.
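To connect the compact notation back to group means, the conditional expectations under this specification can be written out. This is a sketch that assumes \mathrm{E}[U_{it} \mid D_i] = 0 as in the two-period model and sets \beta_{-1} = 0 for the omitted baseline.

```latex
\begin{align*}
\mathrm{E}[Y_{it} \mid D_i = 0] &= \lambda_t, \\
\mathrm{E}[Y_{it} \mid D_i = 1] &= \lambda_t + \gamma + \beta_t, \\
\mathrm{E}[Y_{it} \mid D_i = 1] - \mathrm{E}[Y_{it} \mid D_i = 0] &= \gamma + \beta_t.
\end{align*}
```

So \lambda_t is the control group's mean outcome in period t, and the gap between the groups is \gamma at the baseline and \gamma + \beta_t in every other period.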
Note: The following requires panel data (observing the exact same individuals over time). It cannot be used with repeated cross-sections like the incinerator example.
The same idea can be applied to individuals. In the event study model, the intercept for individual i is \alpha + {\color{purple}\gamma} D_i. This allows only two values:
| Group | Intercept |
|---|---|
| Control (D_i = 0) | \alpha |
| Treatment (D_i = 1) | \alpha + {\color{purple}\gamma} |
All individuals within the same group share the same baseline. But individuals may differ even within a group (e.g., houses near the incinerator differ in size, age, neighborhood quality).
Individual fixed effects give each individual its own dummy, just like time fixed effects give each period its own dummy. The regression is:
\begin{align*} Y_{it} &= {\color{purple}\alpha_1} \mathbb{1}[i\!=\!1] + \cdots + {\color{purple}\alpha_n} \mathbb{1}[i\!=\!n] \\ &\quad + \text{(time and treatment terms)} + U_{it}. \end{align*}
Only the dummy for individual i is active, so the intercept is {\color{purple}\alpha_i}. Using time fixed effects {\color{blue}\lambda_t} from the previous slide:
\begin{align*} Y_{it} &= {\color{purple}\alpha_i} + {\color{blue}\lambda_t} + {\color{red}\cancel{{\color{purple}\gamma} D_i}} \\ &\quad + \sum_{s \neq -1} {\color{teal}\beta_s} \cdot D_i \cdot \mathbb{1}[t = s] + U_{it}. \end{align*}
Since D_i does not change over time for any individual, it is already captured by {\color{purple}\alpha_i}. Including both {\color{purple}\alpha_i} and {\color{purple}\gamma} D_i would create perfect multicollinearity, so we drop {\color{purple}\gamma} D_i:
\begin{align*} Y_{it} &= {\color{purple}\alpha_i} + {\color{blue}\lambda_t} \\ &\quad + \sum_{s \neq -1} {\color{teal}\beta_s} \cdot D_i \cdot \mathbb{1}[t = s] + U_{it}. \end{align*}
The interaction terms {\color{teal}\beta_s} \cdot D_i \cdot \mathbb{1}[t = s] survive because they vary across both individuals and time.
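The collinearity can be made explicit by writing D_i in terms of the individual dummies. The symbol \tilde{\alpha}_j is notation introduced here for the redefined individual effect.

```latex
\begin{align*}
D_i &= \sum_{j \,:\, D_j = 1} \mathbb{1}[i = j], \\
\sum_{j=1}^{n} \alpha_j \mathbb{1}[i = j] + \gamma D_i &= \sum_{j=1}^{n} \underbrace{(\alpha_j + \gamma D_j)}_{\tilde{\alpha}_j} \mathbb{1}[i = j].
\end{align*}
```

The group dummy is an exact linear combination of the individual dummies, so \gamma cannot be separated from the \alpha_j's; only the combined effect \tilde{\alpha}_j is identified.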
Combining individual and time fixed effects gives the two-way fixed effects (TWFE) model:
\begin{align*} Y_{it} &= {\color{purple}\alpha_i} + {\color{blue}\lambda_t} \\ &\quad + \sum_{s \neq -1} {\color{teal}\beta_s} \cdot D_i \cdot \mathbb{1}[t = s] + U_{it}, \end{align*}
where {\color{purple}\alpha_i} is the individual fixed effect and {\color{blue}\lambda_t} is the time fixed effect.
This is the standard event study specification used in the literature for DID with multiple periods of panel data.
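A minimal sketch of the TWFE event study in base R, again on simulated data. The panel size, the distribution of the individual effects, and the treatment effect of 2 are assumptions for illustration, and the object names (`panel`, `twfe`, `grouped`) are hypothetical. The individual and time fixed effects are entered as explicit dummies via `factor(id)` and `period`.

```r
set.seed(326)  # arbitrary seed

# Simulated balanced panel with individual-specific intercepts (made-up numbers)
periods <- -2:1
n <- 30
panel <- expand.grid(id = 1:n, t = periods)
panel$D <- as.integer(panel$id <= n / 2)       # first half is the treatment group
alpha_i <- rnorm(n, sd = 2)                    # individual fixed effects
true_effect <- ifelse(panel$t >= 0, 2, 0)      # effect of 2 from t = 0 onward
panel$y <- alpha_i[panel$id] + 0.5 * panel$t +
  true_effect * panel$D + rnorm(nrow(panel))
panel$period <- relevel(factor(panel$t), ref = "-1")

# TWFE via dummies: factor(id) plays the role of alpha_i, period of lambda_t.
# D is kept in the formula on purpose: it is perfectly collinear with the
# individual dummies, so lm() cannot estimate it and reports NA.
twfe <- lm(y ~ factor(id) + period + D + period:D, data = panel)
coef(twfe)[["D"]]  # NA: gamma * D_i is absorbed by the individual effects
beta_twfe <- coef(twfe)[grepl(":D", names(coef(twfe)), fixed = TRUE)]

# Group-dummy event study on the same data, for comparison
grouped <- lm(y ~ period + D + period:D, data = panel)
beta_grouped <- coef(grouped)[grepl(":D", names(coef(grouped)), fixed = TRUE)]

round(cbind(twfe = beta_twfe, grouped = beta_grouped), 4)
```

In a balanced panel with a time-invariant D_i, the two specifications return the same interaction estimates, because the within-group individual deviations are orthogonal to the group-by-period cells; the individual dummies mainly change the residual variance and hence the standard errors. With many individuals, dedicated fixed-effects packages are usually preferred over explicit dummies.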