Conditioning with Weighting
Interrupted Time Series
Difference-in-Differences
Natural Experiments (Extra slides)
We want to “block” backdoor paths / confounding variables (\(X_i\)) by…
In order for conditioning to estimate the \(ACE\) without bias, we must assume
\(1\). Ignorability/Conditional Independence: within strata of \(X\), potential outcomes of \(Y\) must be independent of \(D\) (i.e. for cases with same values of \(X\), \(D\) must be as-if random)
\(2\). Positivity/Common Support: for all values of treatment \(d\) in \(D\) and all values of \(x\) in \(X\): \(0 < Pr(D = d | X = x) < 1\)
Imputation “blocks” the part of the backdoor path toward the outcome \(Y\) (condition on variables to adjust missing potential outcomes of \(Y\))
Reweighting focuses on the selection into treatment \(D\) (condition on variables to adjust for probability of receiving \(D\)).
Consider this situation:
During the US Civil War, sailors earned “prize money” if they served on ships that captured vessels that were running the blockade of the Confederacy. What was the effect of receiving transfers of prize money from the federal government on the economic status of African Americans?
We can compare property held by former black sailors who received (\(D_i = 1\)) prize money vs. those who did not (\(D_i = 0\))
\(i\) | \(Property_i(0)\) | \(Property_i(1)\) | \(Prize_i\) | \(Free_i\) | \(Property_i\) |
---|---|---|---|---|---|
1 | 1 | 3 | 1 | 1 | 3 |
2 | 1 | 3 | 1 | 1 | 3 |
3 | 1 | 3 | 1 | 1 | 3 |
4 | 0 | 1 | 1 | 0 | 1 |
5 | 1 | 3 | 0 | 1 | 1 |
6 | 0 | 1 | 0 | 0 | 0 |
7 | 0 | 1 | 0 | 0 | 0 |
8 | 0 | 1 | 0 | 0 | 0 |
If confounding \(\to\) observed \(E[Y_i(1)|D_i = 1] \neq E[Y_i(1)]\). Observed property for prize winners \(\neq\) mean property if all won prizes. Why? (board)
People who were born free more likely to end up “treated” with prizes: “over-represented” in our estimates of average \(Y_i(1)\)
People who were born slaves less likely to end up “treated” with prizes: “under-represented” in our estimates of average \(Y_i(1)\)
(converse is true for \(Y_i(0)\))
But if we know the probability that each case was treated (\(Pr(D_i)\)), we can re-weight observed values of \(Y(1)\) so that they are representative of the “population” mean \(E[Y(1)]\). The same can be done for \(Y(0)\)
Under the conditional independence assumption, \(Pr(D | X = x)\) is the same for all cases with \(X = x\):
\(Pr(D | X = x)\) is called the propensity score, which we can estimate.
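A minimal sketch in R: with the prize-money data from the table, the propensity score can be estimated by a logistic regression of treatment on covariates (with a single binary covariate, the fitted values simply reproduce the within-stratum treatment shares):

```r
# Prize-money example: D = received prize, X = born free
dat <- data.frame(
  D = c(1, 1, 1, 1, 0, 0, 0, 0),
  X = c(1, 1, 1, 0, 1, 0, 0, 0)
)

# Estimate Pr(D = 1 | X) with a logistic regression
ps_model <- glm(D ~ X, data = dat, family = binomial)

# Fitted values are the estimated propensity scores
dat$pscore <- fitted(ps_model)
round(unique(dat$pscore), 2)  # 0.75 for X = 1, 0.25 for X = 0
```

In real applications \(X\) would include many covariates, and the functional form of this model is itself an assumption.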
TO THE BOARD
Example (1): Let’s “condition” on \(X\) with reweighting
\(i\) | \(Property_i(0)\) | \(Property_i(1)\) | \(Prize_i\) | \(Free_i\) | \(Property_i\) |
---|---|---|---|---|---|
6 | 0 | 1 | 0 | 0 | 0 |
7 | 0 | 1 | 0 | 0 | 0 |
8 | 0 | 1 | 0 | 0 | 0 |
4 | 0 | 1 | 1 | 0 | 1 |
5 | 1 | 3 | 0 | 1 | 1 |
1 | 1 | 3 | 1 | 1 | 3 |
2 | 1 | 3 | 1 | 1 | 3 |
3 | 1 | 3 | 1 | 1 | 3 |
What are the “propensity scores” when \(Free_i = 1\)? \(Free_i = 0\)?
This approach leads us to the inverse probability weighting estimator of the \(ACE\):
\[\widehat{ACE} = \frac{1}{N}\sum\limits_{i=1}^N \left[\frac{D_iY_i}{\widehat{Pr}(D_i|\mathbf{X_i})}-\frac{(1-D_i)Y_i}{1-\widehat{Pr}(D_i|\mathbf{X_i})}\right]\]
Example (1): Let’s “condition” on \(X\) with reweighting
\(i\) | \(Property_i(0)\) | \(Property_i(1)\) | \(Prize_i\) | \(Free_i\) | \(Property_i\) |
---|---|---|---|---|---|
6 | 0 | 1 | 0 | 0 | 0 |
7 | 0 | 1 | 0 | 0 | 0 |
8 | 0 | 1 | 0 | 0 | 0 |
4 | 0 | 1 | 1 | 0 | 1 |
5 | 1 | 3 | 0 | 1 | 1 |
1 | 1 | 3 | 1 | 1 | 3 |
2 | 1 | 3 | 1 | 1 | 3 |
3 | 1 | 3 | 1 | 1 | 3 |
Let’s calculate the \(ACE\)…
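A sketch of this calculation in base R, using the prize-money table above:

```r
# The prize-money table: Prize (treatment), Free (covariate), Property (outcome)
prize    <- c(1, 1, 1, 1, 0, 0, 0, 0)
free     <- c(1, 1, 1, 0, 1, 0, 0, 0)
property <- c(3, 3, 3, 1, 1, 0, 0, 0)

# Propensity scores: share treated within each stratum of Free
pscore <- ave(prize, free)   # 3/4 when Free = 1, 1/4 when Free = 0

# Inverse probability weighting estimator of the ACE
n <- length(prize)
ace_ipw <- sum(prize * property / pscore -
               (1 - prize) * property / (1 - pscore)) / n
ace_ipw  # 1.5, matching the true ACE from the potential outcomes columns
```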
If “imputation” and “reweighting” are different, could we do both?
Yes.
This is called “doubly robust” estimation, as it can give us an unbiased estimate of the \(ACE\) if either the imputation model or the propensity score model is correct.
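One common version (notation here is my own, chosen to match the IPW estimator above) is the augmented IPW (AIPW) estimator, which combines an outcome/imputation model \(\hat{\mu}_d(\mathbf{X})\) with an estimated propensity score \(\hat{e}(\mathbf{X})\):

\[\widehat{ACE}_{AIPW} = \frac{1}{N}\sum\limits_{i=1}^N \left[\hat{\mu}_1(\mathbf{X_i}) - \hat{\mu}_0(\mathbf{X_i}) + \frac{D_i(Y_i - \hat{\mu}_1(\mathbf{X_i}))}{\hat{e}(\mathbf{X_i})} - \frac{(1-D_i)(Y_i - \hat{\mu}_0(\mathbf{X_i}))}{1-\hat{e}(\mathbf{X_i})}\right]\]

If either \(\hat{\mu}_d\) or \(\hat{e}\) is correctly specified, the correction terms remove the bias of the other.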
When we condition, we block specific backdoor paths that generate confounding by comparing cases that are similar on observable traits. We assume:
Conditioning and model dependence:
Many models bring additional modelling assumptions:
On top of which: we always need to argue that we have blocked all backdoor paths (conditional independence)
Instead of relying on conditioning, we might choose more careful research designs:
Here, the structure of the comparison motivates an argument for independence of \(D\) and potential outcomes. Rather than block specific confounding variables, we eliminate confounding due to a class of confounding variables.
Social and political theorists have frequently argued that media—by shaping perceptions of events in the world, exposing people to narrative frames—affects beliefs and behaviors.
[board]
Foos and Bischoff (2022) examine the effect of changing exposure to The Sun on anti-EU attitudes and voting in the UK.
We could simply compare attitudes about the EU in areas with greater and lesser readership of The Sun (or among people who read The Sun versus those who do not):
In 1989, ~100 fans of Liverpool FC died in a crush at a match (the Hillsborough disaster):
Is there any way to make use of this event to learn about the effect of reading The Sun?
We could compare attitudes toward the EU in Liverpool before and after the boycott: this is sometimes called an interrupted time series
Plug in the observed outcome before treatment for the counterfactual outcome after treatment: \(t=1\) is post-treatment, \(t=0\) is pre-treatment.
\[\tau_i = \underbrace{[Y_{i,t=1}(1) | D_i = 1]}_{\text{Liverpool post-1989, boycott}} - \color{red}{\overbrace{[Y_{i,t=1}(0)|D_i = 1]}^{\text{Liverpool post-1989, no boycott}}}\]
Plugging in:
\[\widehat{\tau_i} = \underbrace{[Y_{i,t=1}(1) | D_i = 1]}_{\text{Liverpool post-1989, boycott}} - \overbrace{[Y_{i,t=0}(0)|D_i = 1]}^{\text{Liverpool pre-1989, no boycott}}\]
In order for interrupted time series to work, we must assume:
\[\overbrace{[Y_{i,t=0}(0)|D_i = 1]}^{\text{Liverpool pre-1989, no boycott}} = \color{red}{\overbrace{[Y_{i,t=1}(0)|D_i = 1]}^{\text{Liverpool post-1989, no boycott}}}\]
That in the absence of the treatment, outcomes of \(Y\) would not have changed from before to after treatment.
Assumptions imply none of the following occurred (terms from Campbell and Ross)
SUPER IMPORTANT: If there is some other factor that changes over time and affects \(Y\), it can induce bias …
…EVEN IF IT DOES NOT CAUSE \(D\).
The comparison holds the unit constant before and after the event \(\to\) collider bias - generating dependencies between variables that move together over time.
[TO THE BOARD]
A good example of how to do this persuasively:
What kind of causal estimand are we estimating when we do before and after comparisons?
\[\begin{split}E[\tau_i | D_i = 1] = {} \frac{1}{n}\sum\limits^{n}_{i=1} & [Y_{i,t=1}(1) | D_i = 1] - \\ & \color{red}{[Y_{i,t=1}(0)|D_i = 1]}\end{split}\]
Is this the average causal effect?
Before-after comparisons assume no other changes in outcomes over time, but it is almost always true that
\(\color{red}{[Y_{i,t=1}(0) | D_i = 1]} - [Y_{i,t=0}(0)|D_i = 1] \neq 0\)
i.e., counterfactually, in the absence of treatment \(D\), potential outcomes \(Y_i(0)\) are changing over time.
Observed pre-treatment outcomes are not a good substitute for post-treatment counterfactual outcomes.
In our example: we don’t know how EU skepticism might have trended in Liverpool absent the boycott. We do know how EU skepticism in the rest of the UK trended absent the boycott.
We don’t know: \(\color{red}{\overbrace{[Y_{i,t=1}(0)|D_i = 1]}^{\text{Liverpool post-1989, no boycott}}} - \overbrace{[Y_{i,t=0}(0)|D_i = 1]}^{\text{Liverpool pre-1989, no boycott}}\)
We do know: \(\underbrace{[Y_{i,t=1}(0) | D_i = 0]}_{\text{UK post-1989, no boycott}} - \underbrace{[Y_{i,t=0}(0)|D_i = 0]}_{\text{UK pre-1989, no boycott}}\)
Difference-in-differences compares changes in the treated cases against changes in untreated cases.
We use the trends in the untreated cases to plug in for the \(\color{red}{\text{counterfactual}}\) trends (absent treatment) in the treated cases
If we assume:
\[\{\overbrace{\color{red}{[Y_{i,t=1}(0) | D_i = 1]} - [Y_{i,t=0}(0)|D_i = 1]}^{\text{Treated counterfactual trend}}\} = \\ \{\underbrace{[Y_{i,t=1}(0) | D_i = 0] - [Y_{i,t=0}(0)|D_i = 0]}_{\text{Untreated observed trend}}\}\]
Then we can plug in the observed untreated group trend for the \(\color{red}{\text{counterfactual}}\) treated group trend.
This is the parallel trends assumption. It is equivalent to saying there are no time-varying confounding variables that differ between treated and untreated (recall that over-time comparisons open up collider paths).
If it is true, we can do some simple algebra and find that
\([\tau_i | D_i = 1] = [Y_{i,t=1}(1) | D_i = 1] - \color{red}{[Y_{i,t=1}(0)|D_i = 1]}\)
\(\begin{equation}\begin{split}[\tau_i | D_i = 1] = {} & \{\overbrace{[Y_{i,t=1}(1) | D_i = 1] - [Y_{i,t=0}(0) | D_i = 1]}^{\text{Treated observed trend}}\} - \\ & \{\underbrace{\color{red}{[Y_{i,t=1}(0) | D_i = 1]} - [Y_{i,t=0}(0)|D_i = 1]}_{\text{Treated counterfactual trend}}\}\end{split}\end{equation}\)
Plugging in:
\(\begin{equation}\begin{split}[\widehat{\tau_i} | D_i = 1] = {} & \{\overbrace{[Y_{i,t=1}(1) | D_i = 1] - [Y_{i,t=0}(0) | D_i = 1]}^{\text{Treated observed trend}}\} - \\ & \{\underbrace{[Y_{i,t=1}(0) | D_i = 0] - [Y_{i,t=0}(0)|D_i = 0]}_{\text{Untreated observed trend}}\}\end{split}\end{equation}\)
And this gives us the name:
This shows that the boycott of The Sun reduced Euroscepticism in Liverpool
If the parallel trends assumption (untreated cases have the same trends as treated cases in the absence of treatment) is true…
If parallel trends assumption holds, what kinds of confounding does this design eliminate?
What are examples of confounders held constant in Sun Boycott difference-in-differences?
In the newspaper example: what would be an example of some variable that would violate parallel trends assumption?
Estimation:
\(Y_{it} = \beta_0 + \beta_1 \text{Treated}_i + \beta_2 \text{Post}_t + \beta_3 \text{Treated}_i \times \text{Post}_t + \epsilon_{it}\)
\(Y_{it} = \overbrace{\alpha_i}^{\text{dummies for each } i} + \underbrace{\alpha_t}_{\text{dummies for each } t} + \beta_3 \text{Treated}_i \times \text{Post}_t + \epsilon_{it}\)
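A minimal simulation (hypothetical data, not the Foos and Bischoff data) showing that the coefficient on the interaction recovers the difference-in-differences estimate:

```r
set.seed(42)

# Hypothetical two-group, two-period data: the true effect of treatment is -2
# (e.g., the boycott lowers Euroscepticism in the treated area)
n       <- 1000
treated <- rep(c(0, 1), each = n / 2)    # group indicator
post    <- rep(c(0, 1), times = n / 2)   # period indicator
y <- 5 + 1 * treated + 0.5 * post - 2 * treated * post + rnorm(n)

# beta_3 on the interaction is the difference-in-differences estimate
fit <- lm(y ~ treated + post + treated:post)
coef(fit)["treated:post"]  # close to -2
```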
How do we validate the parallel trends assumption?
We want to see if… there are divergences in trends between treated and untreated … in absence of treatment.
Tests of Parallel Trends: Event Study
Difference between treated and untreated at \(t=-1\), \(t=-2\), etc., compared to the difference at \(t=0\).
We shouldn’t just “eyeball” the difference:
We can test the null hypothesis that the difference in trends between treated and untreated \(= 0\). If \(p > 0.05\) (or some other \(\alpha\)), we “fail to reject” the null of no difference in trends.
Here we are looking for negatives: we want a \(p\) value that tells us about the false negative rate.
We don’t want a test that stacks the deck in favor of our hypothesis of no difference.
Analogous to this situation:
We want a COVID test that we plan to use as evidence that we don’t have COVID and so can safely spend time with immunocompromised people.
But the COVID test we use has been designed to minimize false positives.
What could go wrong?
To solve this problem and get useful \(p\) values, we can conduct an equivalence test. We transform the null hypothesis.
Let us assume that there is some level of imbalance that we consider negligible; let’s call it \(\delta\).
Our new null hypothesis is:
\(H_{01}: \tau \leq -\delta\) OR \(H_{02}: \tau \geq \delta\)
Where \(\tau\) is the difference in, e.g. pre-treatment trends for treated/untreated.
That is, two one-sided tests (TOST).
TOST:
If the probability of observing \(\hat{\tau}\) under both null hypotheses is less than \(\alpha\), we can reject the null:
\(H_1: -\delta < \tau < \delta\): the true difference is within some acceptable \(\delta\) distance to \(0\).
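A minimal sketch of TOST in base R (the numbers are hypothetical, and a large-sample normal approximation is assumed):

```r
# Manual TOST (two one-sided tests):
# tau_hat = estimated pre-treatment trend difference, se = its standard error
tost <- function(tau_hat, se, delta, alpha = 0.05) {
  # H01: tau <= -delta, rejected when tau_hat is significantly above -delta
  p_lower <- pnorm((tau_hat + delta) / se, lower.tail = FALSE)
  # H02: tau >= delta, rejected when tau_hat is significantly below delta
  p_upper <- pnorm((tau_hat - delta) / se)
  # Reject the composite null only if BOTH one-sided tests reject
  p <- max(p_lower, p_upper)
  list(p = p, equivalent = p < alpha)
}

tost(tau_hat = 0.02, se = 0.05, delta = 0.2)  # well inside ±0.2: equivalent
```

In practice you would take `tau_hat` and `se` from the event-study regression and defend the choice of \(\delta\) substantively.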
TOST visualization
These tests can be conducted in R using:
- the TOSTER package
- equivalence_test in the parameters package
- the fect package (for difference-in-differences and extensions)
These tests can be inverted to get confidence intervals (the range of values of \(\delta\) that cannot be rejected at \(\alpha\)).
These tests require, in addition to everything else:
Staggered Treatment: Goodman-Bacon 2021
Multiple Treatments: https://arxiv.org/pdf/1803.08807.pdf
Continuous Treatment: https://psantanna.com/files/Callaway_Goodman-Bacon_SantAnna_2021.pdf
Covariates: conditioning on time-varying confounders has risks
As we get away from simple DiD, assumptions and potential problems multiply, and solutions get more complicated…
Assumptions:
Caveats:
Address confounding in a different way:
Distinguishing “natural experiment” from experiments:
An observational study where causal inference comes from a design that draws on randomization.
Two approaches:
Decisions:
Assumptions:
Follows from Wald estimator for non-compliance:
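As a sketch in standard IV notation (the symbols \(Z_i\), \(D_i\), \(Y_i\) are my own labels, with \(Z\) the as-if random assignment/encouragement), the Wald estimator divides the effect of assignment on the outcome by its effect on treatment take-up:

\[\widehat{\tau}_{Wald} = \frac{E[Y_i | Z_i = 1] - E[Y_i | Z_i = 0]}{E[D_i | Z_i = 1] - E[D_i | Z_i = 0]}\]

The numerator is the intent-to-treat effect on \(Y\); the denominator rescales it by the share of compliers.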
For more information on using them in practice: Lal, Lockhart, Xu, and Zu 2023
Problems
Assumptions: