POLI 572B

Michael Weaver

February 7, 2025

Design Based Inference

Conditioning and “Design”

  • Conditioning with Weighting

  • Interrupted Time Series

  • Difference-in-Differences

  • Natural Experiments (Extra slides)

Conditioning

Conditioning

We want to “block” backdoor paths / confounding variables (\(X_i\)) by…

  • “holding constant” those variables
    • Either we “impute” (plug-in missing potential outcomes of \(Y\)), using information about \(X_i\)
    • Or we “reweight” observed \(Y\) using information about \(X_i\)

In order for conditioning to estimate the \(ACE\) without bias, we must assume

\(1\). Ignorability/Conditional Independence: within strata of \(X\), potential outcomes of \(Y\) must be independent of \(D\) (i.e. for cases with same values of \(X\), \(D\) must be as-if random)

  • all ‘backdoor’ paths are blocked
  • no conditioning on colliders to ‘unblock’ backdoor path
  • Connects to how imputation and reweighting work: ‘as-if random assignment’ of \(D\) for same values of \(X_i = x\)

In order for conditioning to estimate the \(ACE\) without bias, we must assume

\(2\). Positivity/Common Support: For all values of treatment \(d\) in \(D\) and all values of \(x\) in \(X\): \(Pr(D = d | X = x) > 0\) and \(Pr(D = d | X = x) < 1\)

  • There must be variation in the levels of treatment within every stratum of \(X\)
  • If not, we can get other causal estimands but not \(ACE\)

In order for conditioning to estimate the \(ACE\) without bias, we must assume

  3. No Measurement Error: If conditioning variables \(X\) are mismeasured, bias will persist.
  • If there is substantial measurement error, backdoor paths are not blocked
  • Even random measurement error in \(X\) is a problem

Conditioning by Reweighting

Imputation “blocks” the part of the backdoor path toward the outcome \(Y\) (condition on variables to adjust missing potential outcomes of \(Y\))

Reweighting focuses on the selection into treatment \(D\) (condition on variables to adjust for probability of receiving \(D\)).

Consider this situation:

During the US Civil War, sailors earned “prize money” if they served on ships that captured vessels that were running the blockade of the Confederacy. What was the effect of receiving transfers of prize money from the federal government on the economic status of African Americans?

We can compare property held by former black sailors who received prize money (\(D_i = 1\)) vs. those who did not (\(D_i = 0\))

\(i\) \(Property_i(0)\) \(Property_i(1)\) \(Prize_i\) \(Free_i\) \(Property_i\)
1 1 3 1 1 3
2 1 3 1 1 3
3 1 3 1 1 3
4 0 1 1 0 1
5 1 3 0 1 1
6 0 1 0 0 0
7 0 1 0 0 0
8 0 1 0 0 0

If confounding \(\to\) observed \(E[Y_i(1)|D_i = 1] \neq E[Y_i(1)]\). Observed property for prize winners \(\neq\) mean property if all won prizes. Why? (board)

Conditioning by Reweighting

People who were born free more likely to end up “treated” with prizes: “over-represented” in our estimates of average \(Y_i(1)\)

People who were born slaves less likely to end up “treated” with prizes: “under-represented” in our estimates of average \(Y_i(1)\)

(converse is true for \(Y_i(0)\))

But if we know the probability that each case was treated (\(Pr(D_i)\)), we can re-weight the observed values of \(Y(1)\) so that they are representative of the whole “population”, recovering \(E[Y(1)]\). The same can be done for \(Y(0)\)

Conditioning by Reweighting

Under the conditional independence assumption, \(Pr(D | X = x)\) is the same for all cases with \(X = x\):

  • i.e. for cases with the same \(X_i = x\), \(D_i\) is chosen at random with some probability (but it can be any probability: e.g. Game Master says you have to roll above 15 on a D20)

\(Pr(D | X = x)\) is called the propensity score, which we can estimate.

TO THE BOARD

Example (1): Let’s “condition” on \(X\) with reweighting

\(i\) \(Property_i(0)\) \(Property_i(1)\) \(Prize_i\) \(Free_i\) \(Property_i\)
6 0 1 0 0 0
7 0 1 0 0 0
8 0 1 0 0 0
4 0 1 1 0 1
5 1 3 0 1 1
1 1 3 1 1 3
2 1 3 1 1 3
3 1 3 1 1 3

What are the “propensity scores” when \(Free_i = 1\)? \(Free_i = 0\)?

Conditioning by Reweighting

This approach leads us to the inverse probability weighting estimator of the \(ACE\):

\[\widehat{ACE} = \frac{1}{N}\sum\limits_{i=1}^N \frac{D_iY_i}{\widehat{Pr}(D_i|\mathbf{X_i})}-\frac{(1-D_i)Y_i}{1-\widehat{Pr}(D_i|\mathbf{X_i})}\]

Example (1): Let’s “condition” on \(X\) with reweighting

\(i\) \(Property_i(0)\) \(Property_i(1)\) \(Prize_i\) \(Free_i\) \(Property_i\)
6 0 1 0 0 0
7 0 1 0 0 0
8 0 1 0 0 0
4 0 1 1 0 1
5 1 3 0 1 1
1 1 3 1 1 3
2 1 3 1 1 3
3 1 3 1 1 3

Let’s calculate the \(ACE\)
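A minimal sketch of this calculation in R, using the toy data from the table above (dplyr assumed; the object and column names are mine):

dat <- data.frame(
  i        = c(6, 7, 8, 4, 5, 1, 2, 3),
  prize    = c(0, 0, 0, 1, 0, 1, 1, 1),  # D_i
  free     = c(0, 0, 0, 0, 1, 1, 1, 1),  # X_i
  property = c(0, 0, 0, 1, 1, 3, 3, 3)   # observed Y_i
)

library(dplyr)
dat <- dat |>
  group_by(free) |>
  mutate(pscore = mean(prize)) |>  # Pr(D = 1 | X): 0.25 when free = 0, 0.75 when free = 1
  ungroup()

# Inverse probability weighting estimator of the ACE
with(dat, mean(prize * property / pscore - (1 - prize) * property / (1 - pscore)))
# = 1.5, matching the ACE implied by the two potential-outcome columns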

Conditioning by Reweighting: Approaches

  1. Estimating Propensity Scores: known probabilities (experiments), logit regression, machine learning
  2. “Balancing Weights”: choose propensity scores to ensure “balance” on observed \(\mathbf{X_i}\) between treated and untreated.
  3. Apply weights using Inverse Weighting
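A minimal sketch of approaches 1 and 3 together: estimate propensity scores with a logit regression, then apply the inverse probability weighting formula above (the data frame my_data and the variables D, Y, X1, X2 are hypothetical):

# Estimate propensity scores with a logit model
ps_model <- glm(D ~ X1 + X2, family = binomial(link = "logit"), data = my_data)
my_data$pscore <- predict(ps_model, type = "response")  # estimated Pr(D = 1 | X)

# Inverse probability weighting estimate of the ACE
with(my_data, mean(D * Y / pscore - (1 - D) * Y / (1 - pscore)))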

Conditioning: Why not use both?

If “imputation” and “reweighting” are different, could we do both?

Yes.

This is called “doubly robust” estimation, as it can give us an unbiased estimate of the \(ACE\) if either the imputation model or the propensity score model is correct.

  • Many techniques have this property (one sketched below)
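For example, the augmented IPW (AIPW) estimator combines an outcome (imputation) model with a propensity score model; a minimal sketch, reusing the hypothetical my_data, D, Y, X1, X2 from above:

ps_model  <- glm(D ~ X1 + X2, family = binomial, data = my_data)  # reweighting model
out_model <- lm(Y ~ D + X1 + X2, data = my_data)                  # imputation model

e  <- predict(ps_model, type = "response")                     # Pr(D = 1 | X)
m1 <- predict(out_model, newdata = transform(my_data, D = 1))  # imputed Y(1)
m0 <- predict(out_model, newdata = transform(my_data, D = 0))  # imputed Y(0)

# AIPW estimate of the ACE: unbiased if either model is correct
with(my_data, mean((m1 + D * (Y - m1) / e) - (m0 + (1 - D) * (Y - m0) / (1 - e))))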

Design vs Conditioning

Design vs Conditioning

When we condition, we block specific backdoor paths that generate confounding by comparing cases that are similar on observable traits. We assume:

  1. Ignorability/Conditional Independence
  2. Positivity/Common Support
  3. No measurement error


  • Can we test or prove these assumptions?

Design vs Conditioning

Conditioning and model dependence:

  • we did exact matching last week: no model dependence, but curse of dimensionality
  • with finite cases and lots of variables, we use a mathematical model to plug in/impute missing counterfactuals

Many models bring additional modelling assumptions:

  • linearity, additivity, etc.

On top of which: we always need to argue that we have blocked all backdoor paths (conditional independence)

Conditioning vs. Design

Instead of relying on conditioning, we might choose more careful research designs:

  • comparing the same unit over time
  • differences-in-differences

Here, the structure of the comparison motivates an argument for independence of \(D\) and potential outcomes. Rather than block specific confounding variables, we eliminate confounding due to a class of confounding variables.

Design

Media Effects on Political Attitudes

Social and political theorists have frequently argued that media—by shaping perceptions of events in the world, exposing people to narrative frames—affects beliefs and behaviors.

  • What might be some obstacles to evaluating the causal effect of media exposure?
  • Why might experiments be a poor strategy?
  • Why might conditioning not work very well?

[board]

Media Effects

Foos and Bischoff (2022) examine the effect of changing exposure to The Sun on anti-EU attitudes and voting in the UK.

  • The Sun is a tabloid with long-standing anti-EU coverage and editorial slant
  • They ask: did exposure to the Sun’s coverage change attitudes about the EU?

Conditioning?

We could simply compare attitudes about the EU in areas with greater and less readership of The Sun (or among people who read The Sun versus those that do not):

  • What variables would you want to condition on? (Let’s draw a DAG)
  • What might be some reasons to doubt the effectiveness of this approach?

Alternatives:

In 1989, ~100 fans of Liverpool FC died in a crush at a match:

  • The Sun blamed the victims for the tragedy
  • Fans and teams in the Liverpool area boycotted The Sun.

Is there any way to make use of this event to learn about the effect of reading The Sun?

Interrupted Time Series

We could compare attitudes toward the EU in Liverpool before and after the boycott: this is sometimes called an interrupted time series

  • Compare this to conditioning: how is it different? what possible advantages/disadvantages are there?
  • Block all confounding variables that remain constant over time within Liverpool
  • No need to know what these variables are, measure them, or specify a model, etc.

Interrupted Time Series

Plug in the observed outcome before treatment for the counterfactual outcome after treatment: \(t=1\) is post-treatment, \(t=0\) is pre-treatment.

\[\tau_i = \underbrace{[Y_{i,t=1}(1) | D_i = 1]}_{\text{Liverpool post-1989, boycott}} - \color{red}{\overbrace{[Y_{i,t=1}(0)|D_i = 1]}^{\text{Liverpool post-1989, no boycott}}}\]

Plugging in:

\[\widehat{\tau_i} = \underbrace{[Y_{i,t=1}(1) | D_i = 1]}_{\text{Liverpool post-1989, boycott}} - \overbrace{[Y_{i,t=0}(0)|D_i = 1]}^{\text{Liverpool pre-1989, no boycott}}\]

Interrupted Time Series

In order for interrupted time series to work, we must assume:

\[\overbrace{[Y_{i,t=0}(0)|D_i = 1]}^{\text{Liverpool pre-1989, no boycott}} = \color{red}{\overbrace{[Y_{i,t=1}(0)|D_i = 1]}^{\text{Liverpool post-1989, no boycott}}}\]

That in the absence of the treatment, outcomes of \(Y\) would not have changed from before to after treatment.

  • Why might this be incorrect in general? In the context of the Liverpool Sun boycott?

Interrupted Time Series

Assumptions imply none of the following occurred (terms from Campbell and Ross)

  • “history”: bias due to other short-term changes
  • “maturation”: bias due to long-term trends
  • “testing”: bias related to selection into “treatment” (trends lead to policy)
  • “regression”: extreme events lead to treatment, followed by return to the norm
  • “instrumentation”: treatment induces change in measurement
  • How might we avoid these pitfalls?

Interrupted Time Series

SUPER IMPORTANT: If there is some other factor that changes over time and affects \(Y\), it can induce bias …

EVEN IF IT DOES NOT CAUSE \(D\).

The comparison holds the unit constant before and after the event \(\to\) collider bias - generating dependencies between variables that move together over time.

[TO THE BOARD]

Interrupted Time Series

A good example of how to do this persuasively:

Mummolo 2018

  • Treatment window “small” (one day to the next)
  • Treatment a “surprise” (police officers working the streets did not anticipate)
  • Treatment adopted for reasons unrelated to trends in outcome (policy change in response to court cases, not detection of guns on the street)

Interrupted Time Series

What kind of causal estimand are we estimating when we do before and after comparisons?

\[\begin{split}E[\tau_i | D_i = 1] = {} \frac{1}{n}\sum\limits^{n}_{i=1} & [Y_{i,t=1}(1) | D_i = 1] - \\ & \color{red}{[Y_{i,t=1}(0)|D_i = 1]}\end{split}\]

Is this the average causal effect?

  • It is the average treatment effect on the treated (\(ATT\))

Interrupted Time Series

Before-after comparisons assume no other changes in outcomes over time, but it is almost always true that

\(\color{red}{[Y_{i,t=1}(0) | D_i = 1]} - [Y_{i,t=0}(0)|D_i = 1] \neq 0\)

i.e., counterfactually, in the absence of treatment \(D\), potential outcomes \(Y_i(0)\) are changing over time.

Observed pre-treatment outcomes are not a good substitute for post-treatment counterfactual outcomes.

  • Do we have a way to estimate what these counterfactual trends absent treatment might be?

Differences in Differences

In our example: we don’t know how EU skepticism might have trended in Liverpool absent the boycott. We do know how EU skepticism in the rest of the UK trended absent the boycott.

We don’t know: \(\color{red}{\overbrace{[Y_{i,t=1}(0)|D_i = 1]}^{\text{Liverpool post-1989, no boycott}}} - \overbrace{[Y_{i,t=0}(0)|D_i = 1]}^{\text{Liverpool pre-1989, no boycott}}\)

We do know: \(\underbrace{[Y_{i,t=1}(0) | D_i = 0]}_{\text{UK post-1989, no boycott}} - \underbrace{[Y_{i,t=0}(0)|D_i = 0]}_{\text{UK pre-1989, no boycott}}\)

Difference-in-Differences:

Difference-in-differences compares changes in the treated cases against changes in untreated cases.

We use the trends in the untreated cases to plug-in for the \(\color{red}{counterfactual}\) trends (absent treatment) in the treated cases

Difference-in-Differences:

If we assume:

\[\{\overbrace{\color{red}{[Y_{i,t=1}(0) | D_i = 1]} - [Y_{i,t=0}(0)|D_i = 1]}^{\text{Treated counterfactual trend}}\} = \\ \{\underbrace{[Y_{i,t=1}(0) | D_i = 0] - [Y_{i,t=0}(0)|D_i = 0]}_{\text{Untreated observed trend}}\}\]

Then we can plug-in the \(observed\) untreated group trend for the \(\color{red}{counterfactual}\) treated group trend.

Difference-in-Differences:

This is the parallel trends assumption. It is equivalent to saying there are no time-varying confounding variables that differ between treated and untreated (recall that over-time comparisons open up collider paths).

If it is true, we can do some simple algebra and find that

\([\tau_i | D_i = 1] = [Y_{i,t=1}(1) | D_i = 1] - \color{red}{[Y_{i,t=1}(0)|D_i = 1]}\)


\(\begin{equation}\begin{split}[\tau_i | D_i = 1] = {} & \{\overbrace{[Y_{i,t=1}(1) | D_i = 1] - [Y_{i,t=0}(0) | D_i = 1]}^{\text{Treated observed trend}}\} - \\ & \{\underbrace{\color{red}{[Y_{i,t=1}(0) | D_i = 1]} - [Y_{i,t=0}(0)|D_i = 1]}_{\text{Treated counterfactual trend}}\}\end{split}\end{equation}\)

Plugging in:

\(\begin{equation}\begin{split}[\widehat{\tau_i} | D_i = 1] = {} & \{\overbrace{[Y_{i,t=1}(1) | D_i = 1] - [Y_{i,t=0}(0) | D_i = 1]}^{\text{Treated observed trend}}\} - \\ & \{\underbrace{[Y_{i,t=1}(0) | D_i = 0] - [Y_{i,t=0}(0)|D_i = 0]}_{\text{Untreated observed trend}}\}\end{split}\end{equation}\)

And this gives us the name:

  • \(Treated_{post} - Treated_{pre}\) (first difference)
  • \(Untreated_{post} - Untreated_{pre}\) (first difference)
  • \((Treated_{post} - Treated_{pre}) - \\ (Untreated_{post} - Untreated_{pre})\)
  • Difference in differences
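A minimal sketch of this 2x2 calculation in R, assuming a hypothetical data frame my_data with an outcome y and 0/1 indicators EverTreated and PostTreatment (the same names used in the regression code below):

# Means for the four cells: treated/untreated, pre/post
cell_means <- with(my_data, tapply(y, list(EverTreated, PostTreatment), mean))

diff_treated   <- cell_means["1", "1"] - cell_means["1", "0"]  # Treated_post - Treated_pre
diff_untreated <- cell_means["0", "1"] - cell_means["0", "0"]  # Untreated_post - Untreated_pre
diff_treated - diff_untreated                                  # difference in differences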

Difference-in-differences:

This shows that the Boycott of the Sun reduced Euro Skepticism in Liverpool

If the parallel trends assumption (untreated cases have the same trends as treated cases in the absence of treatment) is true…

  • This is an unbiased estimate of the Average Treatment on Treated (\(ATT\)):
    • the effect of treatment on the treated cases
    • it is TOTALLY fine if the effect of treatment on untreated cases would have been different

Difference-in-differences:

If parallel trends assumption holds, what kinds of confounding does this design eliminate?

  • Any time-invariant confounders within units (unchanging causes of \(Y\))
  • Any time-varying confounders that are shared by both treated cases and un-treated cases (causes of \(Y\) that change similarly in both groups)
  • Crucial to choose the right “control” cases: ones that plausibly have parallel trends, e.g. share as many time-varying confounders as possible.

What are examples of confounders held constant in Sun Boycott difference-in-differences?

Difference-in-differences:

In the newspaper example: what would be an example of some variable that would violate parallel trends assumption?

Difference-in-differences:

Estimation:

  • Classical scenario: two “treatment” conditions and two time periods: calculate the difference-in-differences in means
  • We can use regression: a dummy for ever-treated, a dummy for post-treatment, and the interaction of the two

\(Y_{it} = \beta_0 + \beta_1 \text{Treated}_i + \beta_2 \text{Post}_t + \beta_3 \text{Treated}_i \times \text{Post}_t + \epsilon_{it}\)

\(Y_{it} = \overbrace{\alpha_i}^{\text{dummies for each } i} + \underbrace{\alpha_t}_{\text{dummies for each } t} + \beta_3 \text{Treated}_i \times \text{Post}_t + \epsilon_{it}\)

# Different ways to estimate the same DiD (these do not use correct standard errors)
# (1) 2x2 interaction: the coefficient on EverTreated:PostTreatment is the DiD estimate
lm(y ~ EverTreated * PostTreatment, data = my_data)
# (2) Unit and time dummies ("two-way fixed effects") plus the interaction
lm(y ~ as.factor(unit_id) + as.factor(time_id) + EverTreated:PostTreatment, data = my_data)
# (3) Same model, using the fixest package
library(fixest)
feols(y ~ EverTreated:PostTreatment | unit_id + time_id, data = my_data)
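One common response to the standard-error caveat above, sketched here under the assumption that treatment is assigned at the unit level, is to cluster by unit (fixest clusters on the first fixed effect by default; making it explicit):

feols(y ~ EverTreated:PostTreatment | unit_id + time_id,
      cluster = ~ unit_id, data = my_data)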

Difference-in-differences:

How do we validate the parallel trends assumption?

  • We cannot directly test it, but we can try to see if it is violated.

We want to see if… there are divergences in trends between treated and untreated… in the absence of treatment.

  • Look at trends prior to treatment

Difference-in-differences:

Tests of Parallel Trends: Event Study

  • If “treatment” happens between time \(t\) and \(t+1\), we should not see an effect between \(t-1\) and \(t\)!
  • If design is right, there should be no effects of the treatment prior to the treatment taking place. (No differential trends between treated and untreated before treatment)
  • “Passing” this placebo test does not rule out the possibility that something other than the “treatment” caused a change. We can only “reject” divergent trends before treatment \(\xrightarrow{\text{infer}}\) parallel counterfactual trends post-treatment.
  • You need multiple pre-treatment data points

Difference between treated and untreated at \(t=-1\), etc., compared to the difference at \(t=0\).
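A minimal sketch of such an event study with fixest, assuming a hypothetical variable rel_time measuring time relative to treatment, with \(t = 0\) (the last pre-treatment period) as the reference:

library(fixest)
# Dummies for each period relative to treatment, interacted with ever-treated,
# omitting rel_time = 0 as the reference period
es <- feols(y ~ i(rel_time, EverTreated, ref = 0) | unit_id + time_id,
            cluster = ~ unit_id, data = my_data)
iplot(es)  # pre-treatment coefficients should be near zero if trends were parallel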

Hypothesis Tests

We shouldn’t just “eyeball” the difference:

We can test the null hypothesis that the difference in trends between treated and untreated \(= 0\). If the \(p > 0.05\) (or some other \(\alpha\)) we can “fail to reject” the null of no difference in trends.

  • Is this a severe test?
  • Should we use this test?
  • recall, \(p\) values advertise false positive rate of finding a difference.

Equivalence Tests

Here we are looking for negatives. We want a \(p\) value that tells us the false negative rate

  • In a difference-in-differences, we want to show that there are no meaningful differences in trends between treated and untreated cases prior to treatment (evidence in favor of the parallel trends assumption)
  • In a natural experiment, we want to show that there are no meaningful differences between treated and untreated cases on variables that could be confounders (evidence in favor of an as-if random assignment process).

We don’t want a test that stacks the deck in favor of our hypothesis of no difference.

Equivalence Tests

Analogous to this situation:

We want a COVID test that we plan to use as evidence that we don’t have COVID and so can safely spend time with immunocompromised people.

But the COVID test we use has been designed to minimize false positives.

What could go wrong?

  • Worst case scenario, the “test” is just a piece of paper. 0% false positive rate but 100% false negative rate.

Equivalence Tests

To solve this problem and get useful \(p\) values, we can conduct an equivalence test. We transform the null hypothesis.

Let us assume that there is some level of imbalance that we consider negligible; let’s call that \(\delta\).

Our new null hypothesis is:

\(H_{01}: \tau \leq -\delta\) OR \(H_{02}: \tau \geq \delta\)

Where \(\tau\) is, e.g., the difference in pre-treatment trends between treated and untreated.

That is, two one-sided tests (TOST).

Equivalence Tests

TOST:

If the probability of observing \(\hat{\tau}\) under both null hypotheses is less than \(\alpha\), we can reject the null:

\(H_1: -\delta < \tau < \delta\): the true difference is within some acceptable \(\delta\) distance to \(0\).
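A minimal sketch of a TOST by hand, using a normal approximation (the estimate tau_hat, its standard error se, and the bound delta are hypothetical numbers):

tau_hat <- 0.02  # estimated difference in pre-treatment trends (hypothetical)
se      <- 0.05  # its standard error (hypothetical)
delta   <- 0.10  # largest difference we would treat as negligible (hypothetical)

# Two one-sided tests
p_lower <- pnorm((tau_hat + delta) / se, lower.tail = FALSE)  # H01: tau <= -delta
p_upper <- pnorm((tau_hat - delta) / se)                      # H02: tau >= delta

# Conclude |tau| < delta only if both nulls are rejected, i.e. the larger p-value < alpha
max(p_lower, p_upper)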

TOST visualization


Equivalence Tests

These tests can be conducted in R using:

  • the TOSTER package
  • equivalence_test() in the parameters package
  • the fect package (for difference-in-differences and extensions)

These tests can be inverted to get confidence intervals (the range of values of \(\delta\) that cannot be rejected at \(\alpha\))

These tests require, in addition to everything else:

  • specifying what range of values counts as “practical equivalence” (\(-\delta,\delta\))
  • this requires justification, though there are standard recommendations for particular applications (natural experiments, difference-in-differences)

Difference-in-differences: Extensions

Staggered Treatment: Goodman-Bacon 2021

  • some provinces pass laws, but some pass them later than others

Multiple Treatments: https://arxiv.org/pdf/1803.08807.pdf

  • units can enter and exit treatment, or there are different treatments they receive

Continuous Treatment: https://psantanna.com/files/Callaway_Goodman-Bacon_SantAnna_2021.pdf

Covariates: conditioning on time-varying confounders has risks

As we get away from simple DiD, assumptions and potential problems multiply, solutions get more complicated…

Difference-in-differences:

Assumptions:

  • parallel trends: treated and untreated have same changes over time except for shift in treatment.
  • equivalently: no time-varying confounding; no factors affecting \(Y\) that change with \(D\)

Caveats:

  • ‘check’ parallel trends; compare against “placebos” with triple differences
  • careful: what happens when units are treated at different times? different amounts?
  • careful: STANDARD ERRORS!

Natural Experiments (Extra Slides)

Natural Experiments:

Address confounding in a different way:

  • exploit random or as-if random allocation of \(X\) to find causal effect
  • look for “naturally” occurring randomization

Natural Experiments:

Distinguishing “natural experiment” from experiments:

  • Treatment and control groups - or different levels of treatment
  • Random assignment (e.g., government program with lottery)
  • “as-if” random assignment (we claim the process is random)
  • we don’t do the manipulation

Natural Experiments:

An observational study where causal inference comes from a design that draws on randomization.

  • contrast to difference-in-differences
  • depends on argument to validate “as-if” random assumption \(\to\) fewer assumptions in the statistical model

Regression Discontinuity

  • “Treatment” occurs for cases on one side of threshold, not to cases on other side
  • “Near” the threshold, cases are effectively randomly assigned.
    • all units treated when above the threshold (sharp RD)
    • treatment can increase at the threshold (“fuzzy” RD) (e.g. Mo 2018)
  • Close elections are a common application.

Regression Discontinuity

Two approaches:

  • Potential outcomes of \(Y(0)\) are approximately linear near the threshold; treatment shifts outcomes
    • so we (flexibly) model slope of \(Y\) on either side of the cut-off
  • Potential outcomes of \(Y\) are random near the cut-off
    • difference in means near the cutoff

Regression Discontinuity

Decisions:

  • model outcome around cutoff
    • bias/variance trade-off
    • how do you model it? linear? quadratic? cubic? quartic? local linear?
    • how do you choose the “bandwidth” around the cutoff to use?
    • typically recommend local linear, automatic choice of ‘optimal bandwidth’
  • difference in means at cutoff
    • how do you choose the “bandwidth” around the cutoff to use?
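A minimal sketch of the usual default (local linear on each side of the cutoff with an automatically chosen bandwidth) using the rdrobust package, with a hypothetical outcome y, running variable running_var, and cutoff at 0:

library(rdrobust)
# Local linear fit on each side; MSE-optimal bandwidth chosen automatically
summary(rdrobust(y = my_data$y, x = my_data$running_var, c = 0))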

Regression Discontinuity

Assumptions:

  • assignment to treatment at the threshold is random
  • AND
    • either we have correctly modeled potential outcomes of \(Y\) on either side of cutoff
    • or we have used a “bandwidth” around the cutoff within which treatment is as-if randomly assigned (difference in means)

Instrumental Variables

Follows from Wald estimator for non-compliance:

  • Instrumental Variables Least Squares uses variance “explained” in \(D\) by “random” \(Z\)
    • \(D\) is “treatment”, \(Z\) is “random assignment to treatment”
    • easy to understand when \(Z\) is binary
    • when \(Z\) is continuous, it is more complicated (and depends on assumptions of linearity)

For more information on using them in practice: Lal, Lockhart, Xu, and Zu 2023
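A minimal sketch of two-stage least squares with fixest, assuming hypothetical variables y (outcome), d (treatment), and z (instrument) in my_data:

library(fixest)
# 2SLS: instrument d with z (formula: outcome ~ exogenous | endogenous ~ instrument)
iv_fit <- feols(y ~ 1 | d ~ z, data = my_data)
summary(iv_fit)             # second stage: estimated effect of d on y
summary(iv_fit, stage = 1)  # first stage: does z actually move d?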

Instrumental Variables

Problems

  • is variation induced in \(D\) by \(Z\) linear?
  • does \(Z\) actually predict change in \(D\)?
    • “weak instruments problem”: identical to when there are very few compliers in an experiment: inference becomes very hard

Instrumental Variables

Assumptions:

  1. \(Z\) is randomly assigned
  2. Exclusion Restriction
    • no other causal path from \(Z\) to \(Y\), other than through \(D\)
  3. Monotonicity
    • increasing \(Z\) uniformly increases or decreases values of \(D\)

Natural Experiments: Choices

  • Sometimes natural experiments are only random conditional on some other variables (Nellis, et al.)
    • What is the correct functional form for the conditioning variables?
  • Standard Errors:
    • What is the unit at which treatment is assigned?
    • Cluster/Randomization inference

Natural Experiments: Limitations

  • Effects are “local” or for “compliers”
  • Effects for sub-group with randomization
    • like in non-compliance, we find subset of cases that are randomly induced into treatment/responsive
    • cases may be different from population of interest
  • Causal exposures may be different
    • cause applied at cutoff might be different than elsewhere
    • cause applied at random might be different (e.g. rainfall as instrument for economic growth)

Natural Experiments: What to do

  • Evidence that randomization took place
    • statistical tests of implication
    • qualitative evidence: “causal process observation”
  • Show robustness to choices of analysis