Omitted Variables
Sensitivity (Partial Identification)
Sensitivity (Choice of Covariates)
In order for conditioning to estimate the \(ACE\) without bias, we must assume:
\(1\). Ignorability/Conditional Independence: within strata of \(X\), potential outcomes of \(Y\) must be independent of cause \(D\) (i.e. within values of \(X\), \(D\) must be as-if random)
\(2\). Positivity/Common Support: for all values of treatment \(d\) in \(D\) and all values of \(x\) in \(X\): \(Pr(D = d | X = x) > 0\) and \(Pr(D = d | X = x) < 1\)
Even if we include all variables on backdoor paths between \(D\) and \(Y\), regression may still produce biased estimates.
Typically, we approximate the relationship between the variables in \(X\) and \(D\) as additive and linear. If this approximation is wrong, our estimates can be biased.
Regression also seems to get around the positivity assumption…
If \(D = 1\) never occurs for certain values of \(X\) (e.g. \(X > 0\)), the regression model will use linearity (or ANY functional form) to extrapolate a predicted value of what \(D = 1\) would look like when \(X > 0\).
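A brief simulation (our own, with made-up variables) illustrates this extrapolation: below, \(D = 1\) never occurs when \(X > 0\), yet the linear model still produces a prediction for \(D = 1\) in that region.

```r
# Positivity fails by construction: D = 1 is never observed when X > 0
set.seed(3)
n <- 5000
X <- rnorm(n)
D <- rbinom(n, 1, ifelse(X > 0, 0, 0.5))  # Pr(D = 1 | X > 0) = 0
Y <- 1 + D + X + rnorm(n)
m <- lm(Y ~ D + X)

sum(D[X > 0])  # 0: no treated units anywhere in X > 0
# The model nonetheless extrapolates a predicted outcome under D = 1 at X = 2:
predict(m, newdata = data.frame(D = 1, X = 2))
```

The prediction relies entirely on the assumed functional form, since no treated units exist with \(X > 0\) to inform it.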
All of this has assumed that we have blocked all backdoor paths… that the Conditional Independence Assumption is correct.
What if there are unblocked confounders?
You present/submit a paper, and discussant/reviewer says: “I don’t believe this causal estimate, because there is some confounding variable you’ve not accounted for.”
What do we do?
\(1.\) They identify a specific confounding variable.
\(2.\) They propose an unknown confounding variable.
Someone argues that you’ve forgotten a specific variable. What is the bias that results?
If the true process generating the data is:
\[Y_i = \beta_0 + \beta_1 D_i + \beta_2 X_i + \nu_i\]
with \((D_i,X_i) \perp \!\!\! \perp \nu_i\), \(E(\nu_i) = 0\)
\(\nu_i\) is unit \(i\)'s deviation from the conditional mean: \(\nu_i = Y_i(D = d, X = x) - E[Y_i | D = d, X = x]\)
What happens when we estimate this model with a constant and \(D_i\) but exclude \(X_i\)?
\[Y_i = \beta_0 + \beta_1 D_i + \epsilon_i\]
\[\small\begin{eqnarray} \widehat{\beta_1} &=& \frac{Cov(D_i, Y_i)}{Var(D_i)} \\ &=& \frac{Cov(D_i, \beta_0 + \beta_1 D_i + \beta_2 X_i + \nu_i)}{Var(D_i)} \\ &=& \frac{Cov(D_i, \beta_1 D_i)}{Var(D_i)} + \frac{Cov(D_i,\beta_2 X_i)}{Var(D_i)} + \frac{Cov(D_i,\nu_i)}{Var(D_i)} \\ &=& \beta_1\frac{Var(D_i)}{Var(D_i)} + \beta_2\frac{Cov(D_i, X_i)}{Var(D_i)} \\ &=& \beta_1 + \beta_2\frac{Cov(D_i, X_i)}{Var(D_i)} \end{eqnarray}\]
So \(E(\widehat{\beta_1}) \neq \beta_1\): the estimate is (potentially) biased
When we exclude \(X_i\) from the regression, we get:
\[\widehat{\beta_1} = \beta_1 + \beta_2\frac{Cov(D_i, X_i)}{Var(D_i)}\]
This is omitted variable bias
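A quick simulation (our own; all variable names and parameter values are hypothetical) confirms the formula: the short-regression slope equals \(\beta_1 + \beta_2 \frac{Cov(D_i, X_i)}{Var(D_i)}\), up to the (small) sample covariance between \(D\) and \(\nu\).

```r
# True model: Y = 1 + 2*D + 3*X + nu, with D and X correlated
set.seed(42)
n  <- 100000
X  <- rnorm(n)
D  <- 0.5 * X + rnorm(n)       # Cov(D, X) > 0: X sits on a backdoor path
nu <- rnorm(n)
Y  <- 1 + 2 * D + 3 * X + nu

short   <- coef(lm(Y ~ D))["D"]          # omits X
implied <- 2 + 3 * cov(D, X) / var(D)    # beta_1 + beta_2 * Cov(D,X)/Var(D)

c(short = unname(short), implied = implied)  # both near 3.2, far from beta_1 = 2
```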
Excluding \(X\) from the model: \(\widehat{\beta_1} = \beta_1 + \beta_2\frac{Cov(D_i, X_i)}{Var(D_i)}\)
What is the direction of the bias when:
\(\beta_2 > 0\); \(\frac{Cov(D_i, X_i)}{Var(D_i)} < 0\)
\(\beta_2 < 0\); \(\frac{Cov(D_i, X_i)}{Var(D_i)} < 0\)
\(\beta_2 > 0\); \(\frac{Cov(D_i, X_i)}{Var(D_i)} > 0\)
\(\beta_2 = 0\); \(\frac{Cov(D_i, X_i)}{Var(D_i)} > 0\)
\(\beta_2 > 0\); \(\frac{Cov(D_i, X_i)}{Var(D_i)} = 0\)
This only yields bias if two conditions are true:
\(\beta_2 \neq 0\): omitted variable \(X\) has an effect on \(Y\)
\(\frac{Cov(D_i, X_i)}{Var(D_i)} \neq 0\): omitted variable \(X\) is correlated with \(D\) (i.e. it lies on the same backdoor path)
This is why we don’t need to include EVERYTHING that might affect \(Y\) in our regression equation; only those variables that affect both the treatment and the outcome.
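Conversely, a small simulation (ours, with made-up names) shows that omitting a variable that affects \(Y\) but is uncorrelated with \(D\) costs precision, not unbiasedness.

```r
# W affects Y but is independent of D, so omitting it adds noise, not bias
set.seed(9)
n <- 100000
D <- rnorm(n)
W <- rnorm(n)                  # affects Y; uncorrelated with D
Y <- 2 * D + 3 * W + rnorm(n)

coef(lm(Y ~ D))["D"]  # close to the true effect of 2 despite omitting W
```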
Link to DAGs:
Link to Conditional Independence:
Link to linearity:
If someone proposes a specific omitted variable, check that in their story:
If the missing variable…
Then you need to benchmark how your estimate would change, depending on the possible magnitude of the bias.
Courtin et al. examine the effect of income on anti-Muslim prejudice in Myanmar.
We have income categories (1 to 4, 4 is highest); average scores on anti-Muslim prejudice questions.
The raw relationship looks like this:
Income is not randomly assigned, and so there may be confounding. To find the effect of income on prejudice, we can condition on the following:
Estimate Effect of income on Anti-Muslim Prejudice in Myanmar
|  | linear |
|---|---|
| (Intercept) | 0.801*** (0.016) |
| svy_sh_income_rc | -0.060*** (0.004) |
| svy_sh_female_rc | 0.060*** (0.006) |
| svy_sh_age_rc.L | 0.020+ (0.012) |
| svy_sh_age_rc.Q | -0.016 (0.011) |
| svy_sh_age_rc.C | -0.001 (0.009) |
| svy_sh_age_rc^4 | -0.004 (0.008) |
| svy_sh_age_rc^5 | -0.009 (0.007) |
| svy_sh_education_rc.L | -0.105*** (0.015) |
| svy_sh_education_rc.Q | 0.043*** (0.013) |
| svy_sh_education_rc.C | 0.135*** (0.014) |
| svy_sh_education_rc^4 | 0.074*** (0.013) |
| svy_sh_education_rc^5 | 0.029*** (0.008) |
| svy_sh_ethnicity_rcChin | 0.034 (0.025) |
| svy_sh_ethnicity_rcMon | 0.047** (0.016) |
| svy_sh_ethnicity_rcKachin | -0.071 (0.091) |
| svy_sh_ethnicity_rcKayah | 0.029 (0.026) |
| svy_sh_ethnicity_rcKayin | 0.000 (0.013) |
| svy_sh_ethnicity_rcRakhine | 0.109*** (0.023) |
| svy_sh_ethnicity_rcShan | 0.015+ (0.008) |
| svy_sh_ethnicity_rcMixed ancestry | 0.010 (0.035) |
| svy_sh_ethnicity_rcNon-TYT | -0.145*** (0.016) |
| svy_sh_profession_type_rc2 | 0.003 (0.012) |
| svy_sh_profession_type_rc3 | -0.031* (0.014) |
| svy_sh_profession_type_rc4 | -0.034** (0.012) |
| svy_sh_profession_type_rc5 | 0.009 (0.011) |
| svy_sh_profession_type_rc6 | -0.018 (0.021) |
| svy_sh_income_source_rcDay Labour | -0.014 (0.014) |
| svy_sh_income_source_rcRetired | 0.003 (0.032) |
| svy_sh_income_source_rcService Provider | -0.016 (0.015) |
| svy_sh_income_source_rcShop Owner | -0.021 (0.015) |
| svy_sh_income_source_rcStaff | 0.013 (0.014) |
| svy_sh_income_source_rcTrader | -0.035+ (0.019) |
| Num.Obs. | 20620 |
| R2 | 0.051 |
| RMSE | 0.43 |
Income is not assigned at random; so condition on these variables to “block” confounding paths:
Anything that might be MISSING from this DAG?
If we know of and can measure specific possible confounders, we can simply add them to the model.
Cinelli and Hazlett (2020) provide a more general take on omitted variable bias…
But the intuition is similar. If there is an unknown confounder \(U\), the bias induced is a function of the relationship between \(Y \sim U\) and \(D \sim U\), just like in omitted variable bias.
If we estimate this model, where \(\widehat{\beta_D}\) is estimate of the causal effect of \(D\)
\(Y_i = \beta_0 + \beta_D D_i + \pmb{\beta_X}\mathbf{X_i} + \epsilon_i\)
\[|\widehat{bias}| = \widehat{se}(\widehat{\beta_D})\sqrt{\frac{R^2_{Y \sim U | D,X} \cdot R^2_{D\sim U | X}}{1- R^2_{D\sim U | X}}}\sqrt{df}\]
What matters here are:
\(R^2_{Y \sim U | D,X}\): how much residual variance in outcome \(Y\) (after projecting onto \(D,X\)) is explained by \(U\)
\(R^2_{D\sim U | X}\): how much residual variance in treatment \(D\) (after projecting onto \(X\)) is explained by \(U\)
These \(R^2_{Y \sim U | D,X}\) and \(R^2_{D\sim U | X}\) (or “partial” \(R^2\)) can correspond to linear or non-linear relationships.
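These partial \(R^2\) terms can be computed by residualizing. Below is a sketch on simulated data (our own; \(U\) plays the omitted confounder), checking the bias formula, with the \(\sqrt{df}\) scaling used in Cinelli and Hazlett (2020), against the bias actually incurred by omitting \(U\):

```r
# Simulate Y = 2*D + X + U + e, where U confounds both D and Y
set.seed(1)
n <- 50000
X <- rnorm(n); U <- rnorm(n)
D <- X + U + rnorm(n)
Y <- 2 * D + X + U + rnorm(n)

m_short <- lm(Y ~ D + X)      # omits the confounder U
m_full  <- lm(Y ~ D + X + U)

# Partial R2 of Y with U (given D, X): squared correlation of residuals
r2_yu <- cor(resid(m_short), resid(lm(U ~ D + X)))^2
# Partial R2 of D with U (given X)
r2_du <- cor(resid(lm(D ~ X)), resid(lm(U ~ X)))^2

se_d <- summary(m_short)$coefficients["D", "Std. Error"]
bias_formula <- se_d * sqrt(df.residual(m_short)) *
  sqrt(r2_yu * r2_du / (1 - r2_du))
bias_actual  <- abs(coef(m_short)["D"] - coef(m_full)["D"])

c(formula = bias_formula, actual = unname(bias_actual))  # both approx. 0.5
```

The two quantities agree: the partial-\(R^2\) formula recovers the gap between the confounded and unconfounded estimates of \(\beta_D\).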
We can reason about partial \(R^2\) to think about extent of bias under different scenarios:
One major limitation with this approach is… what counts as “a lot” of residual variance explained? What counts as “a little”?
To make these easier to interpret, we want to get an interpretable benchmark.
What is a “known” important confounder in our model that we can compare against?
Because education is categorical, we need to account for all of the education dummies.
require(sensemakr)
require(stringr)
covars = colnames(model.matrix(m_linear))
covars_edu = covars[str_detect(covars, "education")]
myanmar.sensitivity <- sensemakr(model = m_linear,
treatment = "svy_sh_income_rc",
benchmark_covariates = list('education' = covars_edu),
kd = 1:3,
ky = 1:3,
q = 1,
alpha = 0.05,
reduce = TRUE)
Outcome: svy_sh_anti_muslim_prejudice_all

| Treatment | Est. | S.E. | t-value | \(R^2_{Y \sim D |{\bf X}}\) | \(RV_{q = 1}\) | \(RV_{q = 1, \alpha = 0.05}\) |
|---|---|---|---|---|---|---|
| svy_sh_income_rc | -0.06 | 0.004 | -14.404 | 1% | 9.5% | 8.3% |

Note: df = 20587; Bound (1x education): \(R^2_{Y\sim Z| {\bf X}, D}\) = 1%, \(R^2_{D\sim Z| {\bf X} }\) = 4.5%
\(RV_{q=1}\): how much residual variation in \(D\) and \(Y\) must \(U\) explain to drive the effect \(\to 0\).
\(RV_{q=1, \alpha=0.05}\): how much residual variation in \(D\) and \(Y\) must \(U\) explain to push the effect's \(p\)-value above 0.05.
\(R^2_{Y\sim D | X}\): if \(U\) explained all of the residual variance in \(Y\), how much of \(D\) would it need to explain to drive the effect \(\to 0\)?
We can then benchmark these against education's known \(R^2_{Y \sim U | D,X}\) (share of residual \(Y\) explained) and \(R^2_{D\sim U | X}\) (share of residual \(D\) explained).
Is it plausible that there is a confounder as “big” as education?
Plots can be more informative:
#Partial R2 of U with D, Y, compared against 0 causal effect
plot(myanmar.sensitivity)
#Partial R2 of U with D, Y, compared against
#failing to reject null of no causal effect
plot(myanmar.sensitivity, sensitivity.of = "t-value")
#Partial R2 of U with D, assuming U explains all Y
plot(myanmar.sensitivity, type = 'extreme')
Adjusted effect for a confounder 1-3 times larger than education.
Adjusted \(t\)-value of the effect for a confounder 1-3 times larger than education.
Adjusted effect if the confounder explained all or most of the outcome: here the sign of the effect could easily be reversed.
The main identifying assumption (for unbiased estimates of causal effects) is the Conditional Independence Assumption.
These sensitivity tests give us information to make arguments using theoretical/case knowledge and model estimates to see whether confounders of “reasonable” magnitude would change the effect.
Link to partial identification:
Download the suffrage data:

suffrage = read.csv("https://www.dropbox.com/scl/fi/y5wvmmhhknzbpdufaql9p/referenda_vote.csv?rlkey=1z2hromullprdzumkrawu5n21&dl=1")

Create enlist_rate as veterans/mil_age
Regress suff65_yes on enlist_rate, suff57_yes, and state.
Using sensemakr, calculate the sensitivity of enlist_rate, using suff57_yes as the ‘benchmark’; save to object enlist_sensitivity

summary(enlist_sensitivity)
plot(enlist_sensitivity)
(see previous slides)
But we don’t know the true DAG…
If we want to avoid \(p\) hacking and avoid evidence that fails to meet “weak severity”, we cannot just try combinations of covariates until we get a “significant” effect.
Instead:
Best practice here is to identify the many reasonable choices you could make:
Then you can conduct analyses using all combinations.
How do you report “a million f-ing regressions”?
Two parts:
Plot of coefficients w/ confidence intervals (color coded for rejection of null), sorted by coefficient size.
“Rug plots” for each attribute of a model, with lines
The specr package makes specification curves easy (but you may not like their default plots).

library(specr)
# Setup Specifications ----
specs <- setup(data = example_data, #data
y = c("y1", "y2"), #outcome measures
x = c("x1", "x2"), #treatment measures
model = c("lm"), #model to estimate
controls = c("c1", "c2"), #controls to include
subsets = list(group1 = unique(example_data$group1), #values to subset the data
group2 = unique(example_data$group2)))
# Run Specification Curve Analysis ----
results <- specr(specs)
# Plot Specification Curve ----
plot(results)
Let’s download the data on Islamist legislators in Indonesia from Problem Set 3:
dprd_data = read.csv("https://www.dropbox.com/scl/fi/7w8zfmmpg64aqye7dxqan/dprd_ps3.csv?rlkey=m8tss3ope9hhl5clizdmnufse&dl=1")
Now…
NVM_Violence_1_deaths
Another way to get combinations of variables
vars = c('election_cycle', 'province', 'econ_hh_ag_pct', 'econ_hh_in_slums_pct', 'econ_hh_elec_pct', 'econ_kab_support_per_cap', 'demo_dcapital_dist', 'demo_scndschl_per_10k', 'att_bars_pct', 'relg_majority_muslim_pct', 'relg_mosques_per_10k', 'relg_madrasah_per_10k', 'all_IT_vs', 'dapil_seats', 'enp')
models = do.call(c, lapply(seq_along(vars), combn, x = vars, simplify = FALSE)) %>%
lapply(paste, collapse = " + ") %>%
unlist()
models %>% unique %>% length
## [1] 32767
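With the formula strings in hand, each specification can be estimated in a loop and the treatment coefficients collected. A minimal sketch on simulated data (the names y, d, and c1-c3 are illustrative, not from the dataset):

```r
# Fit every control specification and collect the treatment coefficient
set.seed(7)
n  <- 200
df <- data.frame(d = rnorm(n), c1 = rnorm(n), c2 = rnorm(n), c3 = rnorm(n))
df$y <- 1 + 0.5 * df$d + df$c1 + rnorm(n)

vars   <- c("c1", "c2", "c3")
models <- do.call(c, lapply(seq_along(vars), combn, x = vars, simplify = FALSE)) |>
  sapply(paste, collapse = " + ")

coefs <- sapply(models, function(rhs) {
  coef(lm(as.formula(paste("y ~ d +", rhs)), data = df))["d"]
})
length(coefs)   # 2^3 - 1 = 7 specifications
summary(coefs)  # distribution of the treatment coefficient across specifications
```

The vector of coefficients (with confidence intervals, computed analogously via confint) is what feeds the specification-curve plot.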