December 2, 2024

Correlation to Causation

Solutions to Confounding

  1. Recap
  2. Before/After Comparisons

Recap

Solutions to Confounding

Every way of using correlation as evidence for causality makes assumptions

  • The fundamental problem of causal inference (FPCI) cannot be solved without assumptions
  • With assumptions, we can argue that confounding/bias is not a problem

| Solution | How Bias Solved | Which Bias Removed | Assumes | Internal Validity | External Validity |
|---|---|---|---|---|---|
| Experiment | Randomization breaks \(W \rightarrow X\) link | All confounding variables | (1) \(X\) is random; (2) Change only \(X\) | High | Low |
| Conditioning | Hold confounders constant | Only confounders conditioned on | (1) Condition on all confounders; (2) Low measurement error; (3) Cases similar in \(W\) | Low | High |

Conditioning

Did Trump rallies increase hate crimes?

Feinberg, Branton, and Martinez-Ebers compare hate crimes in counties with and without Trump rallies, conditioning on (holding constant):

  • percent Jewish, number of hate groups, crime rate, 2012 Republican vote share, percent university educated, region
  • But they left out population, which confounded Trump rallies and Hate Crimes.
  • Difficult to find counties without rallies similar in many traits to counties with rallies

After conditioning on population (a confounder): no correlation.
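
To see how an omitted confounder such as population can create a correlation that disappears after conditioning, here is a minimal simulation sketch. All numbers are invented for illustration and are not taken from the Feinberg, Branton, and Martinez-Ebers data: large counties are assumed to be more likely to host a rally and to have more hate crimes, and rallies have no effect at all.

```python
import random
import statistics

random.seed(0)

# Hypothetical counties: population drives BOTH rallies and hate crimes.
# By construction, rallies have NO effect on hate crimes.
counties = []
for _ in range(20000):
    big = random.random() < 0.3                       # large-population county?
    rally = random.random() < (0.6 if big else 0.1)   # big counties get more rallies
    hate_crimes = random.gauss(10.0 if big else 2.0, 1.0)
    counties.append((big, rally, hate_crimes))

def mean_diff(rows):
    """Mean hate crimes in rally counties minus mean in non-rally counties."""
    with_rally = [y for _, r, y in rows if r]
    without = [y for _, r, y in rows if not r]
    return statistics.mean(with_rally) - statistics.mean(without)

# Naive comparison ignores population.
naive = mean_diff(counties)

# Conditioning: compare only counties with the same value of the confounder.
within_big = mean_diff([c for c in counties if c[0]])
within_small = mean_diff([c for c in counties if not c[0]])

print(f"naive difference:        {naive:.2f}")
print(f"among large counties:    {within_big:.2f}")
print(f"among small counties:    {within_small:.2f}")
```

The naive difference is large even though the simulated effect is zero; within each population group the difference is close to zero.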

Limits of Conditioning:

Did Trump rallies increase hate crimes?

  1. If we know confounding variables, can we find cases with and without rallies that are the same on many confounding variables?

  2. If we don’t know or can’t measure confounding variables, there may still be differences between places with and without rallies that produce confounding.

  • What if we compare counties before and after rallies?

Before and After

Example: Before and After

Taking the same data from Feinberg, Branton, and Martinez-Ebers

  • we focus only on counties that ever had a Trump (Clinton) rally
  • compare the month after the rally to the month before the rally

Before and After

DISCUSS

If we compare counties to themselves before and after rallies…

  1. Which confounding variables are held constant?

  2. What are confounding variables that might NOT be addressed in this comparison?

What kinds of confounding variables are held constant in this before/after comparison?

Break

Design Based Solutions

Conditioning removes confounding by:

  • identifying possible confounding variables
  • measuring confounding variables
  • examining the relationship between \(X\) and \(Y\) for cases with similar values of the confounding variables \(W\).

Design Based Solutions

Design-based solutions remove confounding by:

  • selecting cases for comparison in order to eliminate many known/unknown as well as measurable/unmeasurable confounding variables.
  • the nature of the comparison holds constant classes of confounding variables, not specific confounding variables.
  • by a “class” we mean all confounding variables that share specific properties (e.g., unchanging over time)

Example: Before and After

Which of these possible confounders are held constant in a before-and-after comparison (month after vs month before rally)?

Confounding Solved?

All confounding variables (affect whether a rally occurs; affect hate crimes) that are unchanging over time are held constant.

  • because held constant, cannot produce confounding
  • e.g., demographic features, political leaning, location/geography, long-term economic trends, 8chan white nationalists
  • any variable that does not change in the time period of the comparison (in this case, two months) is held constant
  • does not matter if we can think of or even measure the confounders
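
The point that unchanging confounders drop out even when unmeasured can be sketched with a small simulation. Everything here is hypothetical: `trait` stands in for any time-invariant county characteristic, the coefficients are made up, and rallies are given no effect.

```python
import random
import statistics

random.seed(1)

# Each row: (had_rally, hate crimes in month before, hate crimes in month after)
rows = []
for _ in range(20000):
    trait = random.gauss(0.0, 1.0)   # unmeasured county trait that never changes
    had_rally = random.random() < (0.7 if trait > 0 else 0.2)
    # The trait raises hate crimes equally in both months; no rally effect is added.
    before = 5.0 + 2.0 * trait + random.gauss(0.0, 1.0)
    after = 5.0 + 2.0 * trait + random.gauss(0.0, 1.0)
    rows.append((had_rally, before, after))

# Cross-sectional comparison: rally counties vs. non-rally counties (confounded).
cross_section = (
    statistics.mean([a for r, _, a in rows if r])
    - statistics.mean([a for r, _, a in rows if not r])
)

# Before/after comparison: each rally county compared to itself.
# The fixed trait appears in both months, so it cancels in the difference.
before_after = statistics.mean([a - b for r, b, a in rows if r])

print(f"cross-sectional difference: {cross_section:.2f}")
print(f"before/after difference:    {before_after:.2f}")
```

The code never uses `trait` when computing `before_after`, which is the sense in which it does not matter whether we can measure the confounder.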

Design Based Solutions

Before and after comparisons are design based, because…

  • they hold constant ALL unchanging confounding variables (a class of confounding variables).
  • in contrast, conditioning blocks only specific variables/paths

And like all solutions to confounding: they make an assumption

Design Based Solutions

Just like experiments and conditioning, before-and-after comparisons plug in for MISSING potential outcomes.

| County | Time | \(Y:\) HC(Yes) | \(Y:\) HC(No) | \(X:\) Rally |
|---|---|---|---|---|
| \(c\) | Before | \(\color{red}{\text{Hate Crimes}_{c,Before}[\text{Rally}]}\) | \(\color{black}{\text{Hate Crimes}_{c,Before}[\text{No Rally}]}\) | No |
| | \(\Downarrow\) | | | |
| \(c\) | After | \(\color{black}{\text{Hate Crimes}_{c,After}[\text{Rally}]}\) | \(\color{red}{\text{Hate Crimes}_{c,After}[\text{No Rally}]}\) | Yes |

Design Based Solutions

| County | Time | \(Y:\) HC(Yes) | \(Y:\) HC(No) | \(X:\) Rally |
|---|---|---|---|---|
| \(c\) | Before | \(\color{red}{\text{Hate Crimes}_{c,Before}[\text{Rally}]}\) | \(\color{black}{\text{Hate Crimes}_{c,Before}[\text{No Rally}]}\) | No |
| | \(\Downarrow\) | | | |
| \(c\) | After | \(\color{black}{\text{Hate Crimes}_{c,After}[\text{Rally}]}\) | \(\boxed{\color{black}{\text{Hate Crimes}_{c,Before}[\text{No Rally}]}}\) | Yes |

We assume \(\color{red}{\text{Hate Crimes}_{c,After}[\text{No Rally}]} = \color{black}{\text{Hate Crimes}_{c,Before}[\text{No Rally}]}\)

That is: if \(X\) had not changed, \(Y\) would not have changed.
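
One way to see why this assumption matters is to split the observed before/after difference into two parts. This is a sketch using the shorthand \(\text{HC}\) for \(\text{Hate Crimes}\); the decomposition only adds and subtracts the missing potential outcome.

\[
\begin{aligned}
\underbrace{\text{HC}_{c,After}[\text{Rally}] - \text{HC}_{c,Before}[\text{No Rally}]}_{\text{observed before/after difference}}
&= \underbrace{\text{HC}_{c,After}[\text{Rally}] - \text{HC}_{c,After}[\text{No Rally}]}_{\text{causal effect of the rally}} \\
&\quad + \underbrace{\text{HC}_{c,After}[\text{No Rally}] - \text{HC}_{c,Before}[\text{No Rally}]}_{\text{change that would have happened anyway}}
\end{aligned}
\]

The assumption sets the last term to zero, so the observed difference equals the causal effect. If hate crimes would have changed even without a rally, that change is counted as part of the "effect".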

Before and After

Assumptions:

  • assume that the counterfactual potential outcome of \(Y\) without \(X\), after \(X\) happens, is the same as the factual \(Y\) without \(X\) before \(X\) happens
  • equivalently: assume there are no variables \(W\) that affect \(Y\) and change over time with \(X\).

Any \(W\) that affects \(Y\) and changes with \(X\) will produce confounding even if it does not cause \(X\).

  • this is a new problem

Example: Before and After

When does this assumption fail?

  • Did Trump rallies take place in places that are already trending toward having more hate crimes?
  • Is it possible that Trump wanted to avoid controversy and waited to hold rallies in places until they had a month with a lower-than-usual number of hate crimes? (board)
  • We can address these concerns by looking at longer-term trends…

Mostly constant upward trend; no change when rallies occur
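
The logic of checking longer-term trends can be sketched as follows. The setup is hypothetical: hate crimes rise by an assumed `TREND` every month regardless of rallies, the rally falls between months 1 and 2, and it has no effect. The change across the rally is then compared to the change across the previous pair of months.

```python
import random
import statistics

random.seed(2)

TREND = 0.5  # assumed monthly rise in hate crimes, unrelated to rallies

def monthly_count(month):
    """Hypothetical hate-crime count: steady upward trend plus noise, no rally effect."""
    return 10.0 + TREND * month + random.gauss(0.0, 1.0)

changes_at_rally = []   # month after rally minus month before rally
changes_prior = []      # month before rally minus the month before that
for _ in range(5000):
    m0, m1, m2 = monthly_count(0), monthly_count(1), monthly_count(2)
    changes_prior.append(m1 - m0)       # no rally in this window
    changes_at_rally.append(m2 - m1)    # rally happens between month 1 and month 2

before_after = statistics.mean(changes_at_rally)
pre_trend = statistics.mean(changes_prior)

print(f"before/after change at rally: {before_after:.2f}")
print(f"change one month earlier:     {pre_trend:.2f}")
print(f"difference between the two:   {before_after - pre_trend:.2f}")
```

A simple before/after comparison reports an increase of roughly `TREND`, but the same increase shows up in the earlier window with no rally, which is what "no change when rallies occur" means.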

Example: Before and After

When does this assumption fail?

In an over-time comparison, confounding can come from variables that do not cause \(X\) to change, if they change with \(X\) over time…

  • Does rally change measurement, but not actual number of hate crimes? (Measurement bias)

    • we can examine this concern by measuring hate crimes in other ways.
  • Are there other changes over the same time-frame (changes at the same time as \(X\), rallies)?

    • This is harder to solve. We are in the same situation as with conditioning.
    • Less of a problem when comparing across very short time periods (fewer variables change)

Example: Before and After

It may be that the effects are due to changes in measurement: Anti-Defamation League and FBI hate crime data give different results.

Another Example:

Why have real wages stayed stagnant?

Another Example:

Starting in the 1980s, automation via robotics/software started to grow.

Another Example:

From “before” to “after” growth of automation, we see slowing or even reversal of growth in real wages:

  • automation \(\to\) inequality, worse job / health outcomes

Can we reasonably conclude the machines are to blame?

Design: Before and After

What is it?

Compare the same case to itself before and after change in \(X\)

How does it work?

Holds constant all unchanging attributes of the case.

  • any confounding variables that do not change over time cannot produce change in \(Y\) with change in \(X\)

Before and After: Assumptions

In order to infer that \(X\) causes \(Y\) when \(X\) and \(Y\) are correlated in a before/after comparison

Must Assume

  1. There are no other variables that affect \(Y\) and change with \(X\) over time (may be a causal or non-causal link of \(W\) and \(X\))

Before and After: Limitations

This assumption can be violated if…

  • other variables that affect \(Y\) change with \(X\) over time.
  • the value of \(Y\) in the cases has a long-term trend in one direction
  • \(X\) changes in response to extreme changes in \(Y\) (e.g. gun laws respond to uptick in gun crimes)
  • \(X\) changes measurement of \(Y\)
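
The third violation, \(X\) changing in response to extreme values of \(Y\), can be illustrated with a short simulation. The numbers are invented: monthly gun crime fluctuates around a fixed average, a law is assumed to pass only after an unusually bad month (above a made-up threshold of 13), and the law does nothing.

```python
import random
import statistics

random.seed(3)

changes = []
for _ in range(50000):
    before = random.gauss(10.0, 2.0)   # hypothetical gun crimes in the month before
    after = random.gauss(10.0, 2.0)    # same distribution: the law has no effect
    if before > 13.0:                  # law is passed only after an extreme month
        changes.append(after - before)

# Average before/after change among places that passed the law.
spurious_effect = statistics.mean(changes)

print(f"places that passed a law: {len(changes)}")
print(f"average before/after change: {spurious_effect:.2f}")
```

Because the law is triggered by an unusually high month, the following month tends to be closer to average, so the before/after comparison shows a drop even though the law has no effect in this sketch.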

| Solution | How Bias Solved | Which Bias Removed | Assumes | Internal Validity | External Validity |
|---|---|---|---|---|---|
| Experiment | Randomization breaks \(W \rightarrow X\) link | All confounding variables | (1) \(X\) is random; (2) Change only \(X\) | Highest | Lowest |
| Conditioning | Hold confounders constant | Only variables conditioned on | (1) Condition on all confounders; (2) Low measurement error; (3) Cases same in \(W\) | Lowest | Highest |
| Before and After | Hold confounders constant | Variables unchanging over time | (1) Causes of \(Y\) do not change w/ \(X\) | Lower | Higher |

Alternatives?