Medical researchers have developed a new diagnostic test that can detect pre-cancerous growths.
To find out whether this test actually results in medical interventions that reduce mortality, they conduct a randomized experiment in which half the study group is, at random, invited to come take the screening.
Non-compliance in experiments occurs when units that are randomly assigned to treatment or control do not take their assigned treatment condition.
Non-compliance arises in experiments OR natural experiments in which there is a random (as-if random) process of assigning exposure, but not all units follow this random process.
How do we find the effect of treatment…
In 1963, researchers for the Health Insurance Plan of Greater New York (HIP) conducted the first randomized trial testing the effectiveness of mammography on reductions in breast cancer mortality.
60,000 women were randomly assigned to (T) be invited to be screened or (C) not invited.
Many who were invited chose not to be screened.
None who were not invited were screened.
Assignment | Screened | \(N\) | BC Deaths | BCD per 1k | BC Deaths (total) | BCD per 1k (total) |
---|---|---|---|---|---|---|
Assigned to Control Group | ||||||
Control | No | 30000 | 52 | 1.73 | 52 | 1.73 |
Assigned to Treatment Group | ||||||
Treatment | No | 20000 | 25 | 1.25 | 45 | 1.50 |
Treatment | Yes | 10000 | 20 | 2.00 | | |
What is the effect of mammography?
Compare all treated (screened) to all untreated (not screened in both T and C)
Assignment | Screened | \(N\) | BC Deaths | BCD per 1k | BC Deaths (total) | BCD per 1k (total) |
---|---|---|---|---|---|---|
Assigned to Control Group | ||||||
Control | No | 30000 | 52 | 1.73 | 52 | 1.73 |
Assigned to Treatment Group | ||||||
Treatment | No | 20000 | 25 | 1.25 | 45 | 1.50 |
Treatment | Yes | 10000 | 20 | 2.00 | | |
What is the effect of mammography?
Compare all treated (screened) to the assigned-to-control group.
Assignment | Screened | \(N\) | BC Deaths | BCD per 1k | BC Deaths (total) | BCD per 1k (total) |
---|---|---|---|---|---|---|
Assigned to Control Group | ||||||
Control | No | 30000 | 52 | 1.73 | 52 | 1.73 |
Assigned to Treatment Group | ||||||
Treatment | No | 20000 | 25 | 1.25 | 45 | 1.50 |
Treatment | Yes | 10000 | 20 | 2.00 | | |
Do mammograms increase the risk of death from breast cancer?
DISCUSS
Compare assigned-to-treatment group to the assigned-to-control group.
Assignment | Screened | \(N\) | BC Deaths | BCD per 1k | BC Deaths (total) | BCD per 1k (total) |
---|---|---|---|---|---|---|
Assigned to Control Group | ||||||
Control | No | 30000 | 52 | 1.73 | 52 | 1.73 |
Assigned to Treatment Group | ||||||
Treatment | No | 20000 | 25 | 1.25 | 45 | 1.50 |
Treatment | Yes | 10000 | 20 | 2.00 | | |
This comparison says: the effect of mammography is to reduce the breast cancer mortality rate by 0.23 per 1000 (\(1.50 - 1.73 = -0.23\))!
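The three comparisons can be checked directly against the table. A minimal sketch (variable names are mine; counts come from the HIP table above):

```python
# Reproduce the three comparisons from the HIP table.
# Rates are breast cancer deaths per 1,000 women.
control   = {"n": 30_000, "deaths": 52}   # assigned to control (none screened)
treat_no  = {"n": 20_000, "deaths": 25}   # assigned to treatment, not screened
treat_yes = {"n": 10_000, "deaths": 20}   # assigned to treatment, screened

def rate(deaths, n):
    """Deaths per 1,000 women."""
    return 1000 * deaths / n

# Option 1: screened vs. all unscreened (pools across assignment)
opt1 = (rate(treat_yes["deaths"], treat_yes["n"])
        - rate(control["deaths"] + treat_no["deaths"],
               control["n"] + treat_no["n"]))

# Option 2: screened vs. assigned-to-control
opt2 = (rate(treat_yes["deaths"], treat_yes["n"])
        - rate(control["deaths"], control["n"]))

# Option 3: assigned-to-treatment vs. assigned-to-control (the ITT comparison)
opt3 = (rate(treat_no["deaths"] + treat_yes["deaths"],
             treat_no["n"] + treat_yes["n"])
        - rate(control["deaths"], control["n"]))

print(round(opt1, 2), round(opt2, 2), round(opt3, 2))  # 0.46 0.27 -0.23
```

Options 1 and 2 make screening look harmful; only Option 3 respects the random assignment.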
Which option is the best?
Why?
Why do we compare assigned-to-treat to assigned-to-control:
Type | Assigned to T | Assigned to C |
---|---|---|
compliers | Take | Not Take |
always takers | Take | Take |
never takers | Not Take | Not Take |
defiers | Not Take | Take |
Potential outcomes of treatment status by treatment assignment
\(Z\) is assignment to treatment; \(D\) is receipt of treatment
Type | \(D_i(Z_i = 1)\) | \(D_i(Z_i = 0)\) |
---|---|---|
compliers | 1 | 0 |
always takers | 1 | 1 |
never takers | 0 | 0 |
defiers | 0 | 1 |
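The mapping from potential treatment receipt to type can be written as a small lookup (an illustrative sketch; the function name is mine):

```python
# Map potential treatment receipt (D_i(Z=1), D_i(Z=0)) to compliance type.
def compliance_type(d_if_assigned: int, d_if_unassigned: int) -> str:
    return {
        (1, 0): "complier",
        (1, 1): "always taker",
        (0, 0): "never taker",
        (0, 1): "defier",
    }[(d_if_assigned, d_if_unassigned)]

print(compliance_type(1, 0))  # complier
```

Note that for any single unit we only ever observe one of \(D_i(Z=1)\) or \(D_i(Z=0)\), so a unit's type is not directly observable — which is exactly the problem the later slides address.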
Option 1 (treated vs untreated): Compares {compliers, always takers} to {never takers, compliers assigned to control} — treatment status is self-selected, so the mix of types differs across the comparison.
Option 2 (treated vs assigned to control): Compares {compliers, always takers} in the treatment group to {compliers, always takers, never takers} in the control group.
If compliers and never-takers have different potential outcomes \(\to\) bias. (any intuitions?)
Option 3: Compares {compliers, always takers, never takers} assigned to treatment to {compliers, always takers, never takers} assigned to control — the same mix of types, in expectation.
Comparison is based on random assignment \(\to\) gives us unbiased estimate.
Even if there is a randomized experiment…
When we compare treated and untreated units in a way that ignores the process of random assignment, we run the risk of bias.
(example)
One solution to non-compliance is to estimate the intent to treat (\(ITT\)) effect, as opposed to the \(ACE\).
Intent to treat (\(ITT\)) compares assigned-to-treatment against assigned-to-control. Because this is randomly assigned, we know this comparison is unbiased (see Lectures 1 and 2).
Where \(Z\) is the treatment assigned and \(D\) is the treatment received \[ ITT = \frac{1}{N} \sum\limits_{i=1}^{N} Y_{i}(Z_i=1) - Y_i(Z_i=0)\]
\[ ITT \neq ACE = \frac{1}{N} \sum\limits_{i=1}^{N} Y_i(D_i=1) - Y_i(D_i=0)\]
unless there is perfect compliance.
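A small simulation (hypothetical numbers, not the HIP data) illustrates the gap: treatment itself raises \(Y\) by 2 for every unit, but with 60% compliers the ITT recovers only part of that effect.

```python
import random

# Sketch: with imperfect compliance, the ITT (comparison by assignment Z)
# differs from the ACE (effect of treatment D itself).
random.seed(0)
n = 100_000
ace = 2.0                                # true effect of treatment, same for all
y_assigned_t, y_assigned_c = [], []
for _ in range(n):
    complier = random.random() < 0.6     # 60% compliers; no always takers/defiers
    z = random.random() < 0.5            # random assignment
    d = z and complier                   # treatment received
    y = random.gauss(0, 1) + (ace if d else 0.0)
    (y_assigned_t if z else y_assigned_c).append(y)

itt = (sum(y_assigned_t) / len(y_assigned_t)
       - sum(y_assigned_c) / len(y_assigned_c))
print(round(itt, 2))                     # close to ACE * 0.6 = 1.2, not 2.0
```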
Why does it work?
Treatment assignment is random; but which units actually take treatment may not be.
\(ITT\) ensures that there is balance in the different types of units across the comparison (e.g. in expectation same proportion compliers, never-takers) as well as their potential outcomes.
\(ITT\) is just the \(ACE\), where the “cause” is assignment to treatment. \(\widehat{ITT}\) unbiased like \(\widehat{ACE}\) under same assumptions.
Policy makers might care about the overall effect of policy when costs are fixed but some people don’t receive it.
In policy-oriented contexts, \(ITT\) is usually desirable.
If…
When examining causal effects implied by theories, it may not be enough.
If we want to find the effect of treatment, not just of assignment-to-treatment, there is a way to do better than the \(ITT\).
If we add to the \(ACE\) assumptions, we can find the \(ACE\) for compliers: the complier average causal effect (\(CACE\)).
\[CACE = \frac{1}{N_c} \sum_{i = 1}^{N_c} Y_i(D_i = 1) - Y_i(D_i = 0)\]
where \(N_c\) is the number of compliers in the experiment.
Why look at effects for compliers?
\[CACE = \frac{1}{N_c} \sum_{i = 1}^{N_c} Y_i(D_i = 1) - Y_i(D_i = 0)\]
Where \(N_c\) is the number of compliers.
Suggests: maybe we just find average \(Y\) for compliers in treatment; average \(Y\) for compliers in control.
How do we find compliers… in the treatment group?
Assignment | Screened | \(N\) | BC Deaths | BCD per 1k | BC Deaths (total) | BCD per 1k (total) |
---|---|---|---|---|---|---|
Assigned to Control Group | ||||||
Control | No | 30000 | 52 | 1.73 | 52 | 1.73 |
Assigned to Treatment Group | ||||||
Treatment | No | 20000 | 25 | 1.25 | 45 | 1.50 |
Treatment | Yes | 10000 | 20 | 2.00 |
How do we find compliers… in the control group?
Assignment | Screened | \(N\) | BC Deaths | BCD per 1k | BC Deaths (total) | BCD per 1k (total) |
---|---|---|---|---|---|---|
Assigned to Control Group | ||||||
Control | No | 30000 | 52 | 1.73 | 52 | 1.73 |
Assigned to Treatment Group | ||||||
Treatment | No | 20000 | 25 | 1.25 | 45 | 1.50 |
Treatment | Yes | 10000 | 20 | 2.00 |
In general, we cannot directly observe which units are compliers.
With three key assumptions, we can find the effect for compliers without knowing which people are compliers!
To understand how, we first need a potential outcomes model.
Potential Outcomes of Non-Compliance:
\(D_i(Z = 1)\) | \(D_i(Z = 0)\) | \(Y_i(Z1,D0)\) | \(Y_i(Z1,D1)\) | \(Y_i(Z0,D0)\) | \(Y_i(Z0,D1)\) | Type |
---|---|---|---|---|---|---|
1 | 1 | NA | 1 | NA | 1 | Always Taker |
1 | 0 | NA | 1 | 0 | NA | Complier |
0 | 0 | 0 | NA | 0 | NA | Never Taker |
0 | 1 | 1 | NA | NA | 0 | Defier |
More generally:
Why do we need the “no defiers” (monotonicity) assumption?
\(D_i(Z = 1)\) | \(D_i(Z = 0)\) | \(Y_i(Z1,D0)\) | \(Y_i(Z1,D1)\) | \(Y_i(Z0,D0)\) | \(Y_i(Z0,D1)\) | Type |
---|---|---|---|---|---|---|
1 | 1 | NA | 1 | NA | 1 | Always Taker |
1 | 0 | NA | 1 | 0 | NA | Complier |
0 | 0 | 0 | NA | 0 | NA | Never Taker |
Random assignment to treatment \(Z\) only affects outcome (\(Y\)) THROUGH treatment (\(D\)).
If the process of assignment to treatment has its own, independent effect, we are in trouble.
Why do we need this?
Always/Never Takers
\(D_i(Z = 1)\) | \(D_i(Z = 0)\) | \(Y_i(Z1,D0)\) | \(Y_i(Z1,D1)\) | \(Y_i(Z0,D0)\) | \(Y_i(Z0,D1)\) | Type |
---|---|---|---|---|---|---|
1 | 1 | NA | 1 | NA | 1 | Always Taker |
1 | 0 | NA | 1 | 0 | NA | Complier |
0 | 0 | 0 | NA | 0 | NA | Never Taker |
Compliers
\(D_i(Z = 1)\) | \(D_i(Z = 0)\) | \(Y_i(Z1,D0)\) | \(Y_i(Z1,D1)\) | \(Y_i(Z0,D0)\) | \(Y_i(Z0,D1)\) | Type |
---|---|---|---|---|---|---|
1 | 1 | NA | 1 | NA | 1 | Always Taker |
1 | 0 | 0 | 1 | 0 | 1 | Complier |
0 | 0 | 0 | NA | 0 | NA | Never Taker |
Assignment to values of \(Z\) is random
Assignment status of one unit does not affect treatment receipt of another. (SUTVA)
WHY
Imagine there is an indicator variable \(Type_?\) for the type of unit (where \(?\) could be, e.g., (c) complier, (n) never-taker, (a) always-taker) that takes values \(0\) or \(1\).
First, the parameter:
\[CACE = \frac{1}{N_c} \sum_{i = 1}^{N_c} Y_i(D_i = 1) - Y_i(D_i = 0)\]
\[CACE = Y_c(D_i = 1) - Y_c(D_i = 0)\]
Where \(Y_c(D_i = 1)\) is the mean \(Y_i(D_i = 1)\) for compliers
Where \(Y_c(D_i = 0)\) is the mean \(Y_i(D_i = 0)\) for compliers
Could we estimate the \(CACE\) like this?
\[\begin{aligned} \widehat{CACE} & = \frac{1}{m} \sum_{i = 1}^{m} Y_i(D_i = 1|Z_i=1) \\ & - \frac{1}{n} \sum_{i = 1}^{n}Y_i(D_i = 0|Z_i = 0) \end{aligned}\]
A path:
Mean \(\bar{x}\) can always be decomposed as a weighted average of sub-group means of \(x\), \(x_g\):
If there are \(G\) sub-groups, then each group is weighted by its proportion of the total \(N\): \(w_g = \frac{N_g}{N}\). All \(w\) must sum to \(1\).
\[\bar{x} = \sum\limits_{g=1}^{G} x_g \cdot w_g\]
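A quick numeric check of the decomposition (made-up numbers):

```python
# A mean equals the weighted average of sub-group means,
# with weights w_g = N_g / N.
groups = {"g1": [1, 2, 3], "g2": [10, 20]}
values = [v for vals in groups.values() for v in vals]
n_total = len(values)

overall = sum(values) / n_total
weighted = sum(
    (sum(vals) / len(vals)) * (len(vals) / n_total)   # x_g * w_g
    for vals in groups.values()
)
print(overall, weighted)  # both 7.2 (up to float rounding)
```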
Within the experiment:
proportion of (a)lways Takers is \(\pi_{a}\)
proportion of (n)ever Takers is \(\pi_{n}\)
proportion of (c)ompliers is \(\pi_{c}\)
proportion of (d)efiers is \(\pi_{d}\)
\(\pi_{c} + \pi_{a} + \pi_{n} + \pi_{d} = 1\): These are the weights for each “group”
\(Y_?(1)\) indicates the mean of \(Y_?(Z = 1)\) for each group. \(Y_?(0)\) indicates the mean of \(Y_?(Z = 0)\) for each group.
By this logic:
\[ITT = E[Y_i(Z_i = 1)] - E[Y_i(Z_i = 0)]\]
Can be rewritten as
\[ITT = [Y_c(1)\pi_c + Y_a(1)\pi_a + Y_n(1)\pi_n + Y_d(1)\pi_d] - \\ [Y_c(0)\pi_c + Y_a(0)\pi_a + Y_n(0)\pi_n + Y_d(0)\pi_d]\]
Assuming no defiers, we get:
\[ITT = [Y_c(1)\pi_c + Y_a(1)\pi_a + Y_n(1)\pi_n] - \\ [Y_c(0)\pi_c + Y_a(0)\pi_a + Y_n(0)\pi_n]\]
Assuming the exclusion restriction, \(Y(1) = Y(0) = Y\) for always takers and never takers. Why?
\[ITT = [Y_c(1)\pi_c + Y_a\pi_a + Y_n\pi_n] - \\ [Y_c(0)\pi_c + Y_a\pi_a + Y_n\pi_n ]\]
Doing some subtraction, we find that only differences due to effects on compliers are left:
\[ITT = [Y_c(1)\pi_c - Y_c(0)\pi_c] + \\ [Y_a\pi_a - Y_a\pi_a] + [Y_n\pi_n - Y_n\pi_n] \]
\[ITT = [Y_c(1)\pi_c - Y_c(0)\pi_c]\]
\[ITT = [Y_c(1) - Y_c(0)]\pi_c = CACE \cdot \pi_c\]
Doing some rearranging, we just need to estimate two parameters to get the \(CACE\):
\[\frac{ITT}{\pi_c} = CACE \]
We need to estimate parameters \(ITT\) and \(\pi_c\)
We can estimate \(ITT\) without bias by using random assignment:
We know this from before.
We can estimate \(\pi_c\), also by random assignment. Instead of looking at differences in \(Y\), we look at differences in \(D\) (by the “no defiers” assumption)
\[ITT_D = \frac{1}{N} \sum\limits_{i=1}^{N} D_i(Z_i = 1) - D_i(Z_i = 0)\]
\[ITT_D = (1\cdot\pi_c + 1\cdot\pi_a + 0\cdot\pi_n) - \\ (0\cdot\pi_c + 1\cdot\pi_a + 0\cdot\pi_n)\]
\[ITT_D = (1\cdot\pi_c - 0\cdot\pi_c) = \pi_c\]
And because \(ITT_D\) is also an \(ITT\), due to random assignment…
\[E(\widehat{ITT_D}) = ITT_D = \pi_c\]
We can estimate \(\pi_c\) without bias.
\[\widehat{CACE} = \frac{\widehat{ITT}}{\widehat{ITT_D}}\]
By dividing the estimate \(\widehat{ITT}\) by the estimate of \(\pi_c\), \(\widehat{ITT_D}\), we can estimate the effect of treatment on compliers
This estimator of the \(CACE\) is called the Wald estimator
\[\widehat{CACE} = \frac{Y^T - Y^C}{D^T - D^C}\] Recall: \(Y^T\) and \(Y^C\) are the means of observed \(Y_i\) among those assigned to treatment (\(Z_i = 1\)) and control (\(Z_i = 0\)); \(D^T\) and \(D^C\) are the corresponding means of observed \(D_i\).
And it is a kind of instrumental variables analysis. (Instrumental variables generalizes this from a binary treatment)
It turns out, that our estimator \(\widehat{CACE}\) is biased:
\(E \left( \frac{a}{b} \right) \neq \frac{E(a)}{E(b)}\)
\(E(\widehat{CACE}) = E \left( \frac{\widehat{ITT}}{\widehat{ITT_D}} \right) \neq \frac{E(\widehat{ITT})}{E(\widehat{ITT_D})} = \frac{ITT}{ITT_D} = CACE\)
but consistent:
as \(n \to \infty\), the bias goes to \(0\). Thus, this approach is biased in small samples (typically, we need hundreds of observations or more).
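A Monte Carlo sketch (simulated data; all numbers are my own choices) of why: the ratio of two unbiased estimators is not itself unbiased, but its error shrinks as \(n\) grows.

```python
import random

random.seed(1)
TRUE_CACE = 2.0

def wald_estimate(n):
    """One simulated experiment: 50% compliers, effect of 2.0 for the treated."""
    y_z1, y_z0, d_z1, d_z0 = [], [], [], []
    for i in range(n):
        complier = random.random() < 0.5
        z = i % 2 == 0                        # half assigned to treatment
        d = z and complier                    # only compliers take treatment
        y = random.gauss(0, 1) + (TRUE_CACE if d else 0.0)
        (y_z1 if z else y_z0).append(y)
        (d_z1 if z else d_z0).append(1.0 if d else 0.0)
    itt = sum(y_z1) / len(y_z1) - sum(y_z0) / len(y_z0)
    itt_d = sum(d_z1) / len(d_z1) - sum(d_z0) / len(d_z0)
    return itt / itt_d

# Average error of the Wald estimator across repeated experiments:
for n in (50, 500, 5000):
    errors = [wald_estimate(n) - TRUE_CACE for _ in range(300)]
    print(n, round(sum(errors) / len(errors), 3))
```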
Let’s estimate \(CACE\)
Assignment | Screened | \(N\) | BC Deaths | BCD per 1k | BC Deaths (total) | BCD per 1k (total) |
---|---|---|---|---|---|---|
Assigned to Control Group | ||||||
Control | No | 30000 | 52 | 1.73 | 52 | 1.73 |
Assigned to Treatment Group | ||||||
Treatment | No | 20000 | 25 | 1.25 | 45 | 1.50 |
Treatment | Yes | 10000 | 20 | 2.00 | | |
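Plugging the table into the Wald estimator (a sketch; rates are per 1,000):

```python
# Wald estimator on the HIP table: CACE = ITT / ITT_D.
y_t = 1000 * 45 / 30_000     # deaths per 1k, assigned to treatment: 1.50
y_c = 1000 * 52 / 30_000     # deaths per 1k, assigned to control: ~1.73
d_t = 10_000 / 30_000        # share screened among assigned to treatment
d_c = 0 / 30_000             # share screened among assigned to control

itt   = y_t - y_c            # ~ -0.23 per 1k
itt_d = d_t - d_c            # ~ 0.33: estimated share of compliers
cace  = itt / itt_d
print(round(cace, 2))        # -0.7
```

Among compliers, screening reduces breast cancer mortality by about 0.7 deaths per 1,000.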
How do you interpret this result?
What does this \(CACE\) show us? It is the effect of treatment for the units that change treatment status due to random assignment.
external validity?: are the cases that comply with treatment assignment different from population of interest? Often, yes. CACE is limited only to compliers and may be uninformative about others.
“weak instruments”?: does random assignment actually induce a change in “taking treatment”? If the intervention doesn’t affect behavior (\(ITT_D\) is small), these approaches are prone to bias and incorrect estimates of standard errors.
We have discussed canonical \(CACE\):
if treatment assignment/treatment takes multiple values, assumptions/interpretation may change.
If someone has an “instrumental variable” within an experiment or natural experiment, can you map it onto this framework? If not, be suspicious.
Assumptions here are “bigger”
In groups of 2-3, refer to Pereira et al (2024) “Innoculation Reduces Misinformation”.