POLI 572B

Michael Weaver

April 12, 2024

Plan for Today

Interaction Effects

Robustness

  1. Discussion of Papers
  2. Review
    • McGrath
    • Truex
    • Fowler and Hall

Robustness

McGrath

Truex

Fowler and Hall

Interaction Effects

Interaction Effects

Is the effect of some cause different for different sets of cases?

  • heterogeneous treatment effects
  • “moderators”: factors that intensify/weaken some causal effect
  • differences-in-differences

Interaction: Example

Did the local success of an explicitly secular political party, Indian National Congress (INC), reduce the likelihood of religious riots between Hindus and Muslims in India?

Example

Nellis et al. compare the incidence of riots in districts in which the INC barely won and barely lost MLA elections (within 1 percentage point).

They estimate the model:

\[\mathrm{Any \ Riot}_{it} = \alpha + \beta_1 \mathrm{INC \ Victory}_{it} + \epsilon_{it}\]

            Any Riot
Intercept    0.236  (0.022)
INC Win     −0.076  (0.031)
Num.Obs.      644
R2           0.009

Example

Wilkinson (2004) argues that political parties work to stop religious riots in India when Muslims are electorally pivotal.

  • Muslims are typically victims in riots; will vote against parties to punish them
  • Muslims more pivotal when the effective number of parties is higher

Example

If the INC was responsive to this electoral logic, we should expect the causal effect of INC victory on riots to be stronger when the ENP was above the median.

This is a Binary by Binary interaction:

  • INC win/lose is binary
  • High/Low ENP is binary

Interaction Effects

How do we estimate an interaction effect?

Include both causal variable and interacting variable, AND their product.

\(y_i = \beta_0 + \beta_1 D_i + \beta_2 X_i+ \beta_3 D_iX_i + \epsilon_i\)

Interaction Effects | Example

Easy to do in R:

m2 = lm(any_riot ~ INC_win*high_enp, data = riots)
m3 = lm(any_riot ~ INC_win + high_enp + INC_win:high_enp, 
        data = riots)

* expands to both ‘main’ effects and the interaction effect

: includes only the product of the two variables, with no ‘main’ effects
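
To see what each formula includes, one can inspect the columns of the design matrix. A quick check, assuming the riots data from above with 0/1 indicators:

colnames(model.matrix(~ INC_win*high_enp, data = riots))
# intercept, INC_win, high_enp, and INC_win:high_enp

colnames(model.matrix(~ INC_win:high_enp, data = riots))
# intercept and INC_win:high_enp only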

Interaction Effects | Interpretation?

                     Any Riot
Intercept             0.242  (0.032)
INC Win              −0.084  (0.045)
High ENP             −0.009  (0.045)
INC Win x High ENP    0.012  (0.063)
Num.Obs.               636
R2                    0.010

Interaction Effects | Example

Different intercepts (means) for each group:

           INC Lose                INC Win
ENP Low    Intercept               Intercept + INC_win
ENP High   Intercept + high_enp    Intercept + high_enp + INC_win + INC_win:high_enp
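
These cell means can be recovered directly from the fitted model. A quick check, assuming m2 from above and 0/1 indicators:

predict(m2, newdata = expand.grid(INC_win = c(0, 1), high_enp = c(0, 1)))
# returns the four predicted means in the table above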

Interaction Effects | Example

Interpretation:

Interaction Effects: Effect of INC victory differs across ENP?

  • INC_win:high_enp tells us how much more likely riots are (on average) when INC wins vs loses in High ENP districts, compared to when INC wins vs loses in Low ENP districts.

“Main Effects”:

  • INC_win: change in the probability of riot when INC wins (vs loses) in low ENP areas.
  • high_enp: difference in the probability of riot in high vs low ENP areas where the INC loses.

Interaction Effects | DiD

Different intercepts (means) for each group:

        Untreated            Treated
Pre     Intercept            Intercept + Treated
Post    Intercept + Post     Intercept + Treated + Post + Treated:Post

lm(Y ~ Treated*Post, data = data)

Interaction Effects | DiD

If we have many treated, many untreated units \(i\), across multiple time periods \(t\), it is common to do this:

lm(Y ~ Treated*Post + dummy_i + dummy_t, data = data)
feols(Y ~ Treated*Post | dummy_i + dummy_t, data = data)

Which of these four coefficients will we be able to actually estimate? (think about linear independence)

Intercept + Treated + Post + Treated:Post
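
A small simulated example (not from the slides, using hypothetical data) illustrates the answer: Treated and Post are linear combinations of the unit and time dummies, so only the interaction (plus the dummies) can be estimated. Putting the dummies first in the formula makes the dropped terms explicit:

set.seed(1)
# hypothetical balanced panel: 10 units (half treated), 6 periods (half post)
panel = expand.grid(unit = factor(1:10), time = factor(1:6))
panel$Treated = as.numeric(panel$unit %in% as.character(1:5))
panel$Post    = as.numeric(panel$time %in% as.character(4:6))
panel$Y       = 1 + 2*panel$Treated*panel$Post + rnorm(nrow(panel))

coef(lm(Y ~ unit + time + Treated*Post, data = panel))
# Treated and Post come back NA (collinear with the dummies);
# only the intercept, dummies, and Treated:Post are estimated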

Interaction Effects

Continuous by Binary variable

Rather than fitting different intercepts for groups:

  • Slope of \(x_1\) depends on binary indicator (\(x_2\))
  • effect of binary treatment (\(x_2\)) depends on level of continuous variable
  • \(\hat{\beta_1}\) for \(x_1\) will be slope of \(x_1\) when \(x_2 = 0\)
  • \(\hat{\beta_2}\) for \(x_2\) will be the difference in means (intercepts) between \(x_2 = 1\) and \(x_2 = 0\) when \(x_1 = 0\)

Assumptions

  • We assume the change is linear
  • Both levels of binary variable present across all levels of continuous variable (common support)

Interaction Effects

Continuous by Continuous variable

  • Slope of \(x_1\) depends on value of \(x_2\)
  • Slope of \(x_2\) depends on value of \(x_1\)
  • \(\hat{\beta_1}\) for \(x_1\) will be slope of \(x_1\) when \(x_2 = 0\)
  • \(\hat{\beta_2}\) for \(x_2\) will be slope of \(x_2\) when \(x_1 = 0\)

Assumptions

  • We assume the change is linear
  • all levels of one variable present across all levels of other variable (common support)
  • This is much harder to achieve!

Interaction Effects

We can look at voting for Black Suffrage in Iowa.

Recall: investigation of the effect of Civil War enlistment rates on the change in voting for suffrage between 1857 and 1868 (pre- and post-war).

Does the effect of having veterans in a county depend on what happened during their military service?

  • Does combat experience matter?

Example:

                  Suffrage Change
Intercept          0.638  (0.170)
Enlist Rate       −0.577  (0.519)
Combat Days       −0.010  (0.005)
Enlist X Combat    0.031  (0.016)
Num.Obs.            79
R2                 0.193

Example:

We can center both variables at \(0\) to make it easier to interpret:

iowa$enlist_pct_c = iowa$enlist_pct %>% scale(scale = F)
iowa$mean_combat_days_c = iowa$mean_combat_days %>% scale(scale = F)
lm_iowa_c = lm(Suffrage_Diff ~ enlist_pct_c*mean_combat_days_c, iowa)

                  Suffrage Change
Intercept          0.449  (0.010)
Enlist Rate (0)    0.524  (0.125)
Combat Days (0)    0.000  (0.001)
Enlist X Combat    0.031  (0.016)
Num.Obs.            79
R2                 0.193

Now main effects are interpretable.

Marginal Effects

For continuous interactions, we need to calculate the marginal effect

marginal effect: unit effect of \(x\) on \(y\) at a given value of \(z\).

\(y_i = \beta_0 + \beta_1 x_i+ \beta_2 z_i+ \beta_3 x_iz_i + \epsilon_i\)

Marginal Effects

Marginal effects have their own standard errors.

  • Must be computed using variance-covariance matrix
  • Why?: marginal effect is \(\beta_1 + \beta_3 z_i\)

\(Var(aX + bZ) = a^2 Var(X) + b^2 Var(Z) + 2ab Cov(X,Z)\)

\(Var(\beta_1 + z_i \beta_3) = Var(\beta_1) + z_i^2 Var(\beta_3) + 2z_i Cov(\beta_1,\beta_3)\)

Marginal Effects example:

combat_seq = seq(min(iowa$mean_combat_days), 
                 max(iowa$mean_combat_days),
                 length.out = 100)
mfx = lm_iowa$coefficients['enlist_pct'] + 
      lm_iowa$coefficients['enlist_pct:mean_combat_days']*combat_seq

Marginal Effects example:

Marginal Effects example:

Standard Errors:

cov_mat = vcov(lm_iowa)

# Var(b1 + z*b3) = Var(b1) + z^2 * Var(b3) + 2*z*Cov(b1, b3)
m_var = diag(cov_mat)['enlist_pct'] +
  combat_seq^2*diag(cov_mat)['enlist_pct:mean_combat_days'] +
  combat_seq*2*cov_mat['enlist_pct','enlist_pct:mean_combat_days']
mse = sqrt(m_var)

# 95% confidence bounds for the marginal effect
l = mfx - 1.96*mse
u = mfx + 1.96*mse

Marginal Effects example:
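
A minimal base-R sketch of this plot (the marginal effect of enlistment with its 95% confidence bounds), continuing the code above:

plot(combat_seq, mfx, type = "l",
     xlab = "Mean combat days", ylab = "Marginal effect of enlistment rate")
lines(combat_seq, l, lty = 2)  # lower 95% bound
lines(combat_seq, u, lty = 2)  # upper 95% bound
abline(h = 0, col = "grey")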

Continuous Interactions

Hainmueller et al. show how this can go wrong:

The interflex package (see the sketch below) is helpful for:

  • checking linearity assumptions
  • plotting linear and non-linear marginal effects
  • no need to calculate marginal effects w/ SEs
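
For example, a minimal sketch with the Iowa data; the argument names follow the interflex documentation, but the exact call may differ across package versions:

library(interflex)

# binning estimator: compares bin-specific marginal effects
# to the linear-interaction estimate
iowa_fx = interflex(estimator = "binning", data = iowa,
                    Y = "Suffrage_Diff", D = "enlist_pct",
                    X = "mean_combat_days")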

Interactions:

  • For binary-binary interactions, interpretation and estimation is straightforward.
  • For any continuous interactions, estimation is tricky and we must calculate marginal effects.
    • Use interflex package to handle this
    • Check for common support
    • Get tests of linearity, non-linear interactions, SEs, plots

Sensitivity

Conditional Independence

Main identifying assumption (for unbiased estimates of causal effects) is:

  • Conditional Independence Assumption: we have conditioned on all relevant confounding variables
  • Because we do not know nature of confounding, this is untestable
  • Must argue that:
    • relevant confounders included in conditioning
    • remaining possible confounders are unlikely to alter conclusions
  • These arguments based on:
    • case/theoretical knowledge
    • tests of sensitivity

Sensitivity

Cinelli and Hazlett (2020) develop tools to analyze the sensitivity of regression results to violations of the conditional independence assumption:

  • assuming there is an unmeasured confounding variable, how “large” would it have to be to alter the results?
    • how strongly is the confounder related to treatment? to outcome?
    • “strength” measured in terms of partial \(R^2\): variation “explained” by the confounder

Sensitivity

Link to partial identification:

  • Can be used to set bounds on the causal effect in a worst-case scenario, where all remaining variation in treatment and outcome is explained by a confounder
  • Can establish “how big” that confounding is relative to known confounders to make an argument for plausible values
    • comparison to a “known” confounder creates a standard candle

Sensitivity

Effect of income on Anti-Muslim Prejudice in Myanmar

m_linear = lm(svy_sh_anti_muslim_prejudice_all ~ svy_sh_income_rc + 
                svy_sh_female_rc + svy_sh_age_rc + 
                svy_sh_education_rc + svy_sh_ethnicity_rc +  
                svy_sh_religion_rc + svy_sh_profession_type_rc + 
                svy_sh_income_source_rc, data = analysis_df)
linear model (SEs in parentheses)
(Intercept) 0.794*** (0.013)
svy_sh_income_rc −0.053*** (0.004)
svy_sh_female_rc 0.054*** (0.005)
svy_sh_age_rc.L 0.028** (0.010)
svy_sh_age_rc.Q −0.010 (0.009)
svy_sh_age_rc.C 0.010 (0.008)
svy_sh_age_rc^4 −0.006 (0.007)
svy_sh_age_rc^5 −0.008 (0.006)
svy_sh_education_rc.L −0.097*** (0.012)
svy_sh_education_rc.Q 0.054*** (0.011)
svy_sh_education_rc.C 0.132*** (0.012)
svy_sh_education_rc^4 0.070*** (0.011)
svy_sh_education_rc^5 0.028*** (0.007)
svy_sh_ethnicity_rcChin 0.035** (0.013)
svy_sh_ethnicity_rcMon 0.040** (0.015)
svy_sh_ethnicity_rcKachin 0.046+ (0.024)
svy_sh_ethnicity_rcKayah 0.063*** (0.016)
svy_sh_ethnicity_rcKayin 0.023* (0.010)
svy_sh_ethnicity_rcRakhine 0.105*** (0.023)
svy_sh_ethnicity_rcShan 0.012 (0.008)
svy_sh_ethnicity_rcMixed ancestry 0.010 (0.030)
svy_sh_ethnicity_rcNon-TYT −0.116*** (0.013)
svy_sh_religion_rcChristian −0.116*** (0.010)
svy_sh_religion_rcMuslim −0.598*** (0.014)
svy_sh_religion_rcHindu −0.151*** (0.026)
svy_sh_religion_rcAnimist −0.013 (0.049)
svy_sh_religion_rcAthiest −0.670*** (0.100)
svy_sh_religion_rcOther −0.499* (0.194)
svy_sh_profession_type_rc2 −0.005 (0.010)
svy_sh_profession_type_rc3 −0.025* (0.011)
svy_sh_profession_type_rc4 −0.027** (0.010)
svy_sh_profession_type_rc5 0.000 (0.010)
svy_sh_profession_type_rc6 0.002 (0.018)
svy_sh_income_source_rcDay Labour −0.011 (0.011)
svy_sh_income_source_rcRetired −0.006 (0.027)
svy_sh_income_source_rcService Provider −0.008 (0.012)
svy_sh_income_source_rcShop Owner −0.013 (0.012)
svy_sh_income_source_rcStaff −0.003 (0.011)
svy_sh_income_source_rcTrader −0.037* (0.016)
Num.Obs. 28179
R2 0.161
RMSE 0.43
require(sensemakr)
require(stringr)
covars = model.matrix(m_linear) %>% colnames
covars_edu = covars %>% str_subset("education")  # names of all education terms
myanmar.sensitivity <- sensemakr(model = m_linear, 
                                treatment = "svy_sh_income_rc",
                                benchmark_covariates = list('education' = covars_edu),
                                kd = 1:3,
                                ky = 1:3, 
                                q = 1,
                                alpha = 0.05, 
                                reduce = TRUE)
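
The results can then be inspected with the standard sensemakr methods:

summary(myanmar.sensitivity)  # robustness values and bounds on adjusted estimates
plot(myanmar.sensitivity)     # contour plot of adjusted point estimates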

Adjusted effect for a confounder 1-3 times as strong as education.

Adjusted effect if a confounder explained all or most of the variation in the outcome.

Practice:

Using the veterans data:

  1. Regress suff65_yes on enlist_rate, suff57_yes, and state.
  2. Using sensemakr, examine the sensitivity of enlist_rate, using suff57_yes as the ‘benchmark’
  3. Interpret the results of plot(enlist_sensitivity)
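
A minimal sketch of these steps, assuming the veterans data is loaded as a data frame named veterans with the columns listed above:

m_vets = lm(suff65_yes ~ enlist_rate + suff57_yes + state, data = veterans)

enlist_sensitivity = sensemakr(model = m_vets,
                               treatment = "enlist_rate",
                               benchmark_covariates = "suff57_yes",
                               kd = 1:3)
summary(enlist_sensitivity)
plot(enlist_sensitivity)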

Measurement Error

Measurement Error

We have previously assumed we measured variables without error. That is not a reasonable assumption. How does this affect estimated effects of \(D\)?

  • if errors in measurements (of \(D\) or \(Y\)) are correlated with the causal variable \(D\), estimates will be biased (measurement error can even act as confounding)

  • if measurements have random error, it depends

Measurement Error

First, the good news:

If we have random measurement error in \(Y\) or \(D\), problems are manageable.

Measurement Error

This is our model:

\[Y^* = \alpha + \beta D^* + \epsilon\]

\(Y^*\) is the true value of the variable that we observe as \(Y\):

\[Y = Y^* + \nu\]

  • \(E(\nu)=0\); \(\nu\) independent of \(Y^*, D^*\); i.i.d. with variance \(\sigma^2_\nu\)

\(D^*\) is the true value of the variable that we observe as \(D\):

\[D = D^* + \eta\]

  • \(E(\eta)=0\); \(\eta\) independent of \(D^*, Y^*\); i.i.d. with variance \(\sigma^2_\eta\)

Measurement Error

What happens when we have measurement error in \(Y\)?

\[Y = \alpha + \beta D^* + \epsilon + \nu\]

Recall: \(E(\nu) = 0\) and \(Cov(D^*,\nu) = 0\). So the estimate \(\widehat{\beta}\) of \(\beta\) is unbiased.

Standard errors are affected:

  • The variance-covariance matrix is now \(Var(\epsilon + \nu)(D^{*\prime} D^{*})^{-1}\), not \(Var(\epsilon)(D^{*\prime} D^{*})^{-1}\)
  • Thus, variances and SEs will be larger due to random measurement error.
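
A quick simulation (not from the slides) illustrating both claims: random error in \(Y\) leaves \(\widehat{\beta}\) centered on the truth but inflates its standard error.

set.seed(2)
n = 1000
D_star = rnorm(n)
Y_star = 1 + 2*D_star + rnorm(n)    # true model, beta = 2
Y_obs  = Y_star + rnorm(n, sd = 2)  # add random measurement error to Y

summary(lm(Y_star ~ D_star))$coefficients["D_star", ]
summary(lm(Y_obs ~ D_star))$coefficients["D_star", ]  # similar estimate, larger SE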

Measurement Error

What happens when we have measurement error in \(D\)?

\[Y^* = \alpha + \beta (D - \eta) + \epsilon\]

If we estimate the model using just \(D\), not \(D^*\)

\[Y^* = \alpha + \beta D + \rho\]

  • \(\rho = (\epsilon - \beta\eta)\).
  • This produces bias in \(\widehat{\beta}\), because:

\[cov(D, \rho) = cov(D^* + \eta, \epsilon - \beta\eta) = -\beta \sigma^2_\eta \neq 0\]

Measurement Error

What happens when we have measurement error in \(D\)?

We get bias… but what does this bias look like?

\[\small\begin{eqnarray} \widehat{\beta} &=& \frac{Cov(D, Y^*)}{Var(D)} \\ &=& \frac{Cov(D^* + \eta, \alpha + \beta D^* + \epsilon)}{Var(D^* + \eta)} \\ &=& \frac{\beta Cov(D^*, D^*)}{Var(D^*) + Var(\eta)} \\ &=& \beta \frac{Var(D^*)}{Var(D^*) + Var(\eta)} \\ \end{eqnarray}\]

Measurement Error

Attenuation bias!

\[\widehat{\beta} = \beta \frac{Var(D^*)}{Var(D^*) + Var(\eta)}\]

This is \(\beta\) times the ratio of the variance of the true value to the variance of the observed value (signal to noise). As the variance of the errors \(\eta\) increases, this ratio goes toward \(0\), so \(\widehat{\beta}\) suffers from attenuation bias.

Our estimates of \(\beta\) will be biased toward zero, which may be “conservative”.
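
A small simulation (not from the slides) showing the attenuation factor at work: with \(Var(D^*) = Var(\eta) = 1\), the estimated slope should be roughly half of the true \(\beta\).

set.seed(3)
n = 10000
D_star = rnorm(n)                   # Var(D*) = 1
Y      = 1 + 2*D_star + rnorm(n)    # true beta = 2
D_obs  = D_star + rnorm(n)          # measurement error with Var(eta) = 1

coef(lm(Y ~ D_star))["D_star"]  # approximately 2
coef(lm(Y ~ D_obs))["D_obs"]    # approximately 2 * 1/(1 + 1) = 1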

Measurement Error

The good news:

  • Measurement error in \(Y\) produces wider standard errors; no bias
  • Measurement error in \(D\) produces bias toward zero (for \(D\)); (known direction and bounds)

But…

What if we want to estimate the causal effect of \(D\) on \(Y\), conditioning on \(X\)?

Measurement Error:

Imagine we evaluate the effect of COVID vaccination for people who self-select into the vaccine.

What is the effect of the vaccine? We can naively compare people who do and do not select into vaccination…

Vaccine  Conspiracy    N   COVID Rate
Yes      Yes         100   0.010
Yes      No          400   0.005
No       Yes         400   0.100
No       No          100   0.050

What is the naive (without conditioning) estimated effect of vaccination?

Measurement Error:

Or we can condition on, e.g., belief in conspiracies about COVID/vaccines that might relate to vaccine uptake and other COVID propensities…

Vaccine  Conspiracy    N   COVID Rate
Yes      Yes         100   0.010
Yes      No          400   0.005
No       Yes         400   0.100
No       No          100   0.050

Measurement Error:

In this example:

Failing to condition on conspiracy beliefs leads us to overstate the effect of vaccines, because…

  • Conspiracists are less likely to take the vaccine and have a higher propensity to be exposed to/infected by COVID in the absence of treatment.

Measurement Error:

What if we measure conspiracy beliefs with error (a 40 percent chance of incorrect classification)? We would then observe this data:

Vaccine  Conspiracy    N   COVID Rate
Yes      Yes         220   0.0064
Yes      No          280   0.0057
No       Yes         280   0.0930
No       No          220   0.0860

What is the naive (no conditioning) estimate of the effect of the vaccine?

What is the estimated effect of the vaccine conditioning on conspiracy-belief?
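
A quick sketch computing both quantities from the cell counts and rates shown above (using a simple stratified comparison for the conditioned estimate):

tab = data.frame(vaccine    = c(1, 1, 0, 0),
                 conspiracy = c(1, 0, 1, 0),
                 n          = c(220, 280, 280, 220),
                 covid_rate = c(0.0064, 0.0057, 0.0930, 0.0860))

# naive: difference in overall COVID rates by vaccination status
naive = with(tab, weighted.mean(covid_rate[vaccine == 1], n[vaccine == 1]) -
                  weighted.mean(covid_rate[vaccine == 0], n[vaccine == 0]))

# conditioned: average of the within-conspiracy-group differences
cond = with(tab, mean(c(
  covid_rate[vaccine == 1 & conspiracy == 1] - covid_rate[vaccine == 0 & conspiracy == 1],
  covid_rate[vaccine == 1 & conspiracy == 0] - covid_rate[vaccine == 0 & conspiracy == 0])))

naive
cond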

Measurement Error:

  • Measurement error in \(X\) can lead to bias in estimated slope on \(D\) (\(\widehat{\beta_{D}} \neq \beta_{D}\))
  • A problem if \(X\) is really needed for conditioning.

Measurement Error

The good news:

  • Measurement error in \(Y\) produces wider standard errors; no bias
  • Measurement error in \(D\) (or any right-hand-side variable) produces bias toward zero for that variable's own coefficient (known direction and bounds)
    • true in both bivariate and multivariate cases

The bad news:

  • Conditioning will fail to remove bias if conditioning variables measured with error.