POLI 572B

Michael Weaver

January 24, 2025

Potential Outcomes Model and Non-Compliance

Plan for Today

Non-compliance in experiments

  • Example and the Problem
  • Solutions
  • CACE
  • Practice

Example

Example:

Medical researchers have developed a new diagnostic test that can detect pre-cancerous growths.

To find out whether this test actually results in medical interventions that reduce mortality, they conduct a randomized experiment in which half the study group is, at random, invited to come take the screening.

Example:

  • Imagine you are invited: would you come take the screening?
  • Imagine you were not invited (but found out about the test): would you try to take the screening?

Non-compliance

Non-compliance in experiments occurs when units that are randomly assigned to treatment or control do not take their assigned treatment condition.

  • e.g. in a medical trial for a new drug, patients assigned to a control group seek out the treatment anyway
  • in an experiment on contacting voters, people do not answer the phone or answer the door.
  • wartime drafts assigned people to treatment by random numbers, but some people dodged the draft or were found exempt; some joined anyway

“Non-compliance”

more broadly:

experiments OR natural experiments in which there is random (as-if random) process of assigning exposure, but not all units follow this random process.

Non-compliance

How do we find the effect of treatment

  • when some assigned to treatment don’t take it
  • when some assigned to no-treatment do take it

Our example:

In 1963, researchers for the Health Insurance Plan of Greater New York (HIP) conducted the first randomized trial testing the effectiveness of mammography on reductions in breast cancer mortality.

60,000 women were randomly assigned to (T) be invited to be screened or (C) not invited.

Mammography

Many who were invited chose not to be screened.

None who were not invited were screened.

  • How can they find the effect of mammography on breast cancer death rates when large numbers of treatment group never were treated?
Assignment Screened \(N\) BC Deaths BCD per 1k BC Deaths (total) BCD per 1k (total)
Assigned to Control Group
Control No 30000 52 1.73 52 1.73
Assigned to Treatment Group
Treatment No 20000 25 1.25 45 1.50
Treatment Yes 10000 20 2.00

Mammography

What is the effect of mammography?

Option 1:

Compare all treated (screened) to all untreated (not screened in both T and C)

Compare all treated (screened) to all untreated (not screened in both T and C)

Assignment Screened \(N\) BC Deaths BCD per 1k BC Deaths (total) BCD per 1k (total)
Assigned to Control Group
Control No 30000 52 1.73 52 1.73
Assigned to Treatment Group
Treatment No 20000 25 1.25 45 1.50
Treatment Yes 10000 20 2.00
  • This comparison says: effect of mammography is to increase breast cancer mortality rate by 0.46 per 1000!

Mammography

What is the effect of mammography?

Option 2:

Compare all treated (screened) to the assigned-to-control group.

Compare all treated (screened) to the assigned-to-control group.

Assignment Screened \(N\) BC Deaths BCD per 1k BC Deaths (total) BCD per 1k (total)
Assigned to Control Group
Control No 30000 52 1.73 52 1.73
Assigned to Treatment Group
Treatment No 20000 25 1.25 45 1.50
Treatment Yes 10000 20 2.00
  • This comparison says: effect of mammography is to increase breast cancer mortality rate by 0.27 per 1000!

What has gone wrong here?

Do mammograms increase the risk of death from breast cancer?

DISCUSS

Mammography

Option 3:

Compare assigned-to-treatment group to the assigned-to-control group.

Compare assigned-to-treatment group to the assigned-to-control group.

Assignment Screened \(N\) BC Deaths BCD per 1k BC Deaths (total) BCD per 1k (total)
Assigned to Control Group
Control No 30000 52 1.73 52 1.73
Assigned to Treatment Group
Treatment No 20000 25 1.25 45 1.50
Treatment Yes 10000 20 2.00

This comparison says: effect of mammography is to reduce breast cancer mortality rate by -0.23 per 1000!

Mammography

Which option is the best?

Why?

Why do we compare assigned-to-treat to assigned-to-control:

Some intuition

There are four possible types of people/units

Type Assigned to T Assigned to C
compliers Take Not Take
always takers Take Take
never takers Not Take Not Take
defiers Not Take Take

Potential Outcomes

Potential outcomes of treatment status by treatment assignment

\(Z\) is assignment to treatment; \(D\) is receipt of treatment

Type \(D_i(Z_i = 1)\) \(D_i(Z_i = 0)\)
compliers 1 0
always takers 1 1
never takers 0 0
defiers 0 1

Mammography:

Option 1 (treated vs untreated): Compares

  • compliers assigned to treatment
  • never-takers assigned to treatment; compliers, never-takers assigned to control

Option 2 (treated vs assigned to control): Compares

  • compliers assigned to treatment
  • compliers, never-takers assigned to control

If compliers and never-takers have different potential outcomes \(\to\) bias. (any intuitions?)

Mammography:

Option 3: Compares

  • compliers, never-takers assigned to treatment
  • compliers, never-takers assigned to control

Comparison is based on random assignment \(\to\) gives us unbiased estimate.

Even if there is a randomized experiment…

When we compare and treated and untreated in a way that ignores the process of random assignment we run the risk of bias.

(example)

What to do about Non-compliance?

Solution 1: Intent to Treat

One solution to non-compliance is to estimate the intent to treat (\(ITT\)) effect, as opposed to the \(ACE\).

Intent to treat (\(ITT\)) compares assigned-to-treatment against assigned-to-control. Because this is randomly assigned, we know this comparison is unbiased (see Lectures 1 and 2).

Solution 1: Intent to Treat

Where \(Z\) is the treatment assigned and \(D\) is the treatment received \[ ITT = \frac{1}{N} \sum\limits_{i=1}^{N} Y_{i}(Z_i=1) - Y_i(Z_i=0)\]

\[ ITT \neq ACE = \frac{1}{N} \sum\limits_{i=1}^{N} Y_i(D_i=1) - Y_i(D_i=0)\]

unless there is perfect compliance.

Solution 1: Intent to Treat

Why does it work?

  • Treatment assignment is random; but which units actually take treatment may not be.

  • \(ITT\) ensures that there is balance in the different types of units across the comparison (e.g. in expectation same proportion compliers, never-takers) as well as their potential outcomes.

  • \(ITT\) is just the \(ACE\), where the “cause” is assignment to treatment. \(\widehat{ITT}\) unbiased like \(\widehat{ACE}\) under same assumptions.

Solution 1: Intent to Treat

Why might it be desirable?

Policy makers might care about the overall effect of policy when costs are fixed but some people don’t receive it.

  • e.g. felon re-integration; reminder to apply for healthcare in US

In policy-oriented contexts, \(ITT\) is usually desirable.

Solution 1: Intent to Treat

Might not be interesting

If…

  • We care about the effect of actual exposure
  • \(ITT\) usually diluted because some never get treated or because some people always get treated
  • e.g., military service

In examining causal effects related to theories: it may not be enough.

Can we do better?

CACE

Effects of treatment

If we want to find the effect of treatment, not just of assignment-to-treatment, there is a way to do better than the \(ITT\).

If we add to the \(ACE\) assumptions, we can find the \(ACE\) for compliers: the complier average causal effect (\(CACE\)).

\[CACE = \frac{1}{N_c} \sum_{i = 1}^{N_c} Y_i(D_i = 1) - Y_i(D_i = 0)\]

where \(N_c\) is the number of compliers in the experiment.

\(CACE\)

Why look at effects for compliers?

\[CACE = \frac{1}{N_c} \sum_{i = 1}^{N_c} Y_i(D_i = 1) - Y_i(D_i = 0)\]

Where \(N_c\) is the number of compliers.

Suggests: maybe we just find average \(Y\) for compliers in treatment; average \(Y\) for compliers in control.

\(CACE\)

How do we find compliers… in the treatment group?

Assignment Screened \(N\) BC Deaths BCD per 1k BC Deaths (total) BCD per 1k (total)
Assigned to Control Group
Control No 30000 52 1.73 52 1.73
Assigned to Treatment Group
Treatment No 20000 25 1.25 45 1.50
Treatment Yes 10000 20 2.00

\(CACE\)

How do we find compliers… in the treatment group?

Assignment Screened \(N\) BC Deaths BCD per 1k BC Deaths (total) BCD per 1k (total)
Assigned to Control Group
Control No 30000 52 1.73 52 1.73
Assigned to Treatment Group
Treatment No 20000 25 1.25 45 1.50
Treatment Yes 10000 20 2.00

\(CACE\)

How do we find compliers… in the control group?

Assignment Screened \(N\) BC Deaths BCD per 1k BC Deaths (total) BCD per 1k (total)
Assigned to Control Group
Control No 30000 52 1.73 52 1.73
Assigned to Treatment Group
Treatment No 20000 25 1.25 45 1.50
Treatment Yes 10000 20 2.00

\(CACE\)

In general, we cannot directly observe…

  • \(Y_i(D_i = 1)\) for compliers in treatment (unless no always-takers)
  • \(Y_i(D_i = 0)\) for compliers in control (unless no never-takers)


  • But… We don’t need to directly observe compliers!

\(CACE\)

With three key assumptions, can find the effect for compliers without knowing which people are compliers!

To understand how, we first need potential outcomes model.

  • need an indicator for random assignment: \(Z\)
  • indicator for receipt of treatment: \(D\)
  • what people would do, given assignment and treatment \(Y\)

Potential Outcomes of Non-Compliance:

\(D_i(Z = 1)\) \(D_i(Z = 0)\) \(Y_i(Z1,D0)\) \(Y_i(Z1,D1)\) \(Y_i(Z0,D0)\) \(Y_i(Z0,D1)\) Type
1 1 NA 1 NA 1 Always Taker
1 0 NA 1 0 NA Complier
0 0 0 NA 0 NA Never Taker
0 1 1 NA NA 0 Defier

Assumption 1: No Defiers / Monotonicity

  • Assume that “defiers” do not exist.
    • Seems plausible, but sometimes defiers may exist.


  • Why do we need this assumption!
    • If no defiers, we can identify never treats and always treats! (why?)

More generally:

  • assignment to treatment \(Z\) can have only positive (only negative) effects on receipt of treatment \(D\). (monotonicity)

Assumption 1: No Defiers / Monotonicity

Why do we need this assumption?

  • In absence of defiers, the only units changing their treatment status between assigned to control and assigned to treat are compliers.
  • With defiers, would not know the proportions of always taker/never taker/complier.
\(D_i(Z = 1)\) \(D_i(Z = 0)\) \(Y_i(Z1,D0)\) \(Y_i(Z1,D1)\) \(Y_i(Z0,D0)\) \(Y_i(Z0,D1)\) Type
1 1 NA 1 NA 1 Always Taker
1 0 NA 1 0 NA Complier
0 0 0 NA 0 NA Never Taker

Assumption 2: Exclusion restriction

  • Random assignment to treatment \(Z\) only affects outcome (\(Y\)) THROUGH treatment (\(D\)).

  • If the process of assignment to treatment has its own, independent effect, we are in trouble.

Assumption 2: Exclusion restriction

Why do we need this?

  • Allows us to assume that \(Y_i(Z_i = 1) = Y_i(Z_i = 0)\) for always takers and never takers
    • b/c only \(D\) affects \(Y\), does not matter that \(Z\) is different
  • Allows us to assume that unit causal effects for compliers are the same in assign-to-treat and assign-to-control:
    • \(Y_c(Z_i=0,D_i=0) = Y_c(Z_i=1, D_i=0)\) and \(Y_c(Z_i=0,D_i=1) = Y_c(Z_i=1, D_i=1)\)
    • To find effect of \(D\) for compliers whose values of \(D\) and \(Z\) are different, \(Z\) cannot change \(Y\).
    • Otherwise, we get effects of \(D\) and \(Z\) for compliers

Always/ Never Takers

\(D_i(Z = 1)\) \(D_i(Z = 0)\) \(Y_i(Z1,D0)\) \(Y_i(Z1,D1)\) \(Y_i(Z0,D0)\) \(Y_i(Z0,D1)\) Type
1 1 NA 1 NA 1 Always Taker
1 0 NA 1 0 NA Complier
0 0 0 NA 0 NA Never Taker

Compliers

\(D_i(Z = 1)\) \(D_i(Z = 0)\) \(Y_i(Z1,D0)\) \(Y_i(Z1,D1)\) \(Y_i(Z0,D0)\) \(Y_i(Z0,D1)\) Type
1 1 NA 1 NA 1 Always Taker
1 0 0 1 0 1 Complier
0 0 0 NA 0 NA Never Taker

Assumption 3: Random Assignment

Assignment to values of \(Z\) is random

Assignment status of one unit does not affect treatment receipt of another. (SUTVA)

Assumption 3: Random Assignment

WHY

  • Potential outcomes of \(Y\) are in expectation the same in assign-to-treat, assign-to-control \(\to\) \(\widehat{ITT}\) is unbiased estimator of \(ITT\).

Imagine there is a variable \(Type_?\) indicating the type of unit (where \(?\) could be e.g. (c) complier, (n) never-taker, (a) always-taker) that is either \(0,1\).

  • Randomization \(\to\) expected value of \(Type_?\) is same in assign-to-treat/assign-to-control
    • Proportion of compliers, never-takers, etc. same in both groups

Estimating the CACE:

First, the parameter:

\[CACE = \frac{1}{N_c} \sum_{i = 1}^{N_c} Y_i(D_i = 1) - Y_i(D_i = 0)\]

\[CACE = Y_c(D_i = 1) - Y_c(D_i = 0)\]

Where \(Y_c(D_i = 1)\) is the mean \(Y_i(D_i = 1)\) for compliers

Where \(Y_c(D_i = 0)\) is the mean \(Y_i(D_i = 0)\) for compliers

Estimating the CACE:

Could we estimate the \(CACE\) like this?

\[\begin{aligned} \widehat{CACE} & = \frac{1}{m} \sum_{i = 1}^{m} Y_i(D_i = 1|Z_i=1) \\ & - \frac{1}{n} \sum_{i = 1}^{n}Y_i(D_i = 0|Z_i = 0) \end{aligned}\]

  • If there are always takers and/or never takers?

Estimating the CACE:

A path:

Mean \(\bar{x}\) can always be decomposed as a weighted average of sub-group means of \(x\), \(x_g\):

If there are \(G\) sub-groups, then each group is weighted by its proportion of the total \(N\): \(w_g = \frac{N_g}{N}\). All \(w\) must sum to \(1\).

\[\bar{x} = \sum\limits_{g=1}^{G} x_g \cdot w_g\]

Estimating the CACE:

Within the experiment:

proportion of (a)lways Takers is \(\pi_{a}\) proportion of (n)ever Takers is \(\pi_{n}\) proportion of (c)ompliers is \(\pi_{c}\) proportion of (d)efiers is \(\pi_{d}\)

\(\pi_{c} + \pi_{a} + \pi_{n} + \pi_{d} = 1\): These are the weights for each “group”

\(Y_?(1)\) indicates the mean of \(Y_?(Z = 1)\) for each group. \(Y_?(0)\) indicates the mean of \(Y_?(Z = 0)\) for each group.

Estimating the CACE:

By this logic:

\[ITT = E[Y_i(Z_i = 1)] - E[Y_i(Z_i = 0)]\]

Can be rewritten as

\[ITT = [Y_c(1)\pi_c + Y_a(1)\pi_a + Y_n(1)\pi_n + Y_d(1)\pi_d] - \\ [Y_c(0)\pi_c + Y_a(0)\pi_a + Y_n(0)\pi_n + Y_d(0)\pi_d]\]

Estimating the CACE:

Assuming no defiers, we get:

\[ITT = [Y_c(1)\pi_c + Y_a(1)\pi_a + Y_n(1)\pi_n] - \\ [Y_c(0)\pi_c + Y_a(0)\pi_a + Y_n(0)\pi_n]\]

Estimating the CACE:

Assuming the exclusion restriction, \(Y(1) = Y(0) = Y\) for always takers and never takers. Why?

\[ITT = [Y_c(1)\pi_c + Y_a\pi_a + Y_n\pi_n] - \\ [Y_c(0)\pi_c + Y_a\pi_a + Y_n\pi_n ]\]

Estimating the CACE:

Doing some subtraction, we find that only differences due to effects on compliers are left:

\[ITT = [Y_c(1)\pi_c - Y_c(0)\pi_c] + \\ [Y_a\pi_a - Y_a\pi_a] + [Y_n\pi_n - Y_n\pi_n] \]

\[ITT = [Y_c(1)\pi_c - Y_c(0)\pi_c]\]

\[ITT = [Y_c(1) - Y_c(0)]\pi_c = CACE \cdot \pi_c\]

Estimating the CACE:

Doing some rearranging, we just need to estimate two parameters to get the \(CACE\):

\[\frac{ITT}{\pi_c} = CACE \]

We need to estimate parameters \(ITT\) and \(\pi_c\)

Estimating the CACE:

We can estimate \(ITT\) without bias by using random assignment:

  • \(E(\widehat{ITT}) = ITT\)

We know this from before.

Estimating the CACE:

We can estimate \(\pi_c\), also by random assignment. Instead of looking differences in \(Y\) we look at differences in \(D\) (by the “no defiers” assumption)

\[ITT_D = \frac{1}{N} \sum\limits_{i=1}^{N} D_i(Z_i = 1) - D_i(Z_i = 0)\]

\[ITT_D = (1\cdot\pi_c + 1\cdot\pi_a + 0\cdot\pi_n) - \\ (0\cdot\pi_c + 1\cdot\pi_a + 0\cdot\pi_n)\]

\[ITT_D = (1\cdot\pi_c - 0\cdot\pi_c) = \pi_c\]

Estimating the CACE:

And because \(ITT_D\) is also an \(ITT\), due to random assignment…

\[E(\widehat{ITT_D}) = ITT_D = \pi_c\]

We can estimate \(\pi_c\) without bias.

Estimating the CACE:

\[\widehat{CACE} = \frac{\widehat{ITT}}{\widehat{ITT_D}}\]

By dividing the estimate \(\widehat{ITT}\) by the estimate of \(\pi_c\), \(\widehat{ITT_D}\), we can estimate the effect of treatment on compliers

Estimating the CACE

This estimator of the \(CACE\) is called the Wald estimator

\[\widehat{CACE} = \frac{Y^T - Y^C}{D^T - D^C}\] Recall: \(Y^T\) is mean of observed \(Y_i(Z_i = 1)\)

And it is a kind of instrumental variables analysis. (Instrumental variables generalizes this from a binary treatment)

Estimating the CACE

It turns out, that our estimator \(\widehat{CACE}\) is biased:

\(E \left( \frac{a}{b} \right) \neq \frac{E(a)}{E(b)}\)

\(CACE = E \left( \frac{ITT}{ITT_D} \right) \neq \frac{E(\widehat{ITT})}{E(\widehat{ITT_D})} = \widehat{CACE}\)

but consistent:

as \(n \to \infty\), the bias goes to \(0\). Thus, this approach is biased in small samples (typically, need hundreds of observations or)

Let’s estimate \(CACE\)

Assignment Screened \(N\) BC Deaths BCD per 1k BC Deaths (total) BCD per 1k (total)
Assigned to Control Group
Control No 30000 52 1.73 52 1.73
Assigned to Treatment Group
Treatment No 20000 25 1.25 45 1.50
Treatment Yes 10000 20 2.00

How do you interpret this result?

Interpreting the CACE

What does this \(CACE\) show us? It is the effect of treatment for the units change treatment status due to random assignment

  • external validity?: are the cases that comply with treatment assignment different from population of interest? Often, yes. CACE is limited only to compliers and may be uninformative about others.

  • “weak instruments”?: does random assignment actually induce a change in “taking treatment”? If intervention doesn’t affect behavior (\(ITT_D\) is small), these approaches prone to bias, incorrect estimates of standard errors.

Interpreting the CACE

We have discussed canonical \(CACE\):

  • if treatment assignment/treatment takes multiple values, assumptions/interpretation may change.

  • If someone has “instrumental variable” with an experiment or natural experiment, can you map it onto this framework? If not, be suspicious.

  • Assumptions here are “bigger”

    • “no defiers” not always plausible (partial identification available)
    • exclusion restriction is hard to defend in many cases. (partial identification available)

Exercise

In groups of 2-3, refer to Pereira et al (2024) “Innoculation Reduces Misinformation”.

  1. What is the “assignment to treatment” (\(Z\)) and how is it assigned?
  2. What is/are the “treatments” (\(D\)) that can be received?
  3. In terms of this study: Who is a Complier? Never taker, Always taker, Defier?
  4. For each of the three assumptions needed to estimate \(CACE\): What does it mean? Is it plausible?
  5. Interpret the results of the estimates of \(CACE\) (What do they mean, substantively? Are they biased?)