November 25, 2024

Correlation to Causation

Solutions to Confounding

  1. Recap
    • Causation requires assumptions
    • Solutions involve trade-offs (internal vs external validity)
    • Experiments: assumptions?
  2. Interlude: Cholera
  3. Conditioning:
    • what is it?
    • what are the assumptions?

Recap

Solutions to Confounding

We want to use correlation to provide evidence of causation:

  • but we know that confounding (a form of bias) can lead us astray.
  • if we can reasonably make some assumptions, correlation can be strong, unbiased evidence of causation.
  • What assumptions do we need to infer causation from correlation?
  • How can we check/interrogate those assumptions?
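To see how confounding "leads us astray," here is a minimal simulation sketch. The data are entirely hypothetical: we assume a binary confounder W that makes both X and Y more likely, while X itself has no effect on Y at all. The probabilities and the function names (`simulate_confounded`, `naive_difference`) are illustrative assumptions, not from any study.

```python
import random


def simulate_confounded(n: int, seed: int = 0):
    """Hypothetical data: W causes both X and Y; X has NO effect on Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = rng.random() < 0.5                    # confounder (e.g., some community trait)
        x = rng.random() < (0.8 if w else 0.2)    # W makes X more likely
        y = rng.random() < (0.6 if w else 0.1)    # W makes Y more likely; X plays no role
        rows.append((w, x, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_difference(rows):
    """Share of cases with Y among cases with X, minus the same share among cases without X."""
    return mean(y for _, x, y in rows if x) - mean(y for _, x, y in rows if not x)


rows = simulate_confounded(100_000)
print(round(naive_difference(rows), 2))
```

Even though X does nothing, the naive comparison comes out close to 0.3 (analytically 0.50 - 0.20 under these assumed probabilities): a correlation produced entirely by W.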

Always a Trade-off

But the choice of “solution” to confounding (our research design) always involves a trade-off:

Increasing confidence that our correlation yields an unbiased estimate of the causal effect of \(X\) on \(Y\) (internal validity)…

…comes at the price of limiting the kinds of cases we can examine and the kinds of causal variables we can examine (external validity)

  • easier to carefully manipulate less important causal factors for small groups of people/cases
  • difficult, costly, and unethical to experimentally manipulate important causal factors for society more broadly

Internal Validity


A research design (the choice of which cases to compare using correlation) has internal validity when its estimate of the causal effect of X on Y is not biased (systematically incorrect), for example by confounding.

  • in a study with strong internal validity, we have very good reason to believe that the correlation of X and Y we observe reflects the causal effect of X on Y.
  • we believe the estimate is unbiased because we can believe the assumptions it rests on (e.g., randomization)

External Validity


External validity is the degree to which the causal relationship we find in a study captures (or is relevant to) the causal relationship in our causal question/claim.

  • A study has external validity if the relationship it finds holds for the cases we are interested in

    • if the study has sampling bias (e.g., undergraduates are not the same as the population of all humans), it may lack external validity
  • A study has external validity if the causal variable in the study maps onto the concept/definition of the cause in the causal claim.

    • E.g., Fox News media effects vs. lab experiments
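A small sketch of why a sample of undergraduates can lack external validity even when the study itself is unbiased. We assume, purely hypothetically, that the true effect of X on Y depends on age (0.4 for people under 25, 0.1 otherwise) and that ages are spread evenly from 18 to 80. The helper names are illustrative.

```python
import random


def effect_for(age: int) -> float:
    """Hypothetical assumption: the causal effect of X on Y is larger for younger people."""
    return 0.4 if age < 25 else 0.1


def average_effect(ages) -> float:
    """Average true causal effect across a group of people."""
    return sum(effect_for(a) for a in ages) / len(ages)


rng = random.Random(1)
population = [rng.randint(18, 80) for _ in range(50_000)]   # hypothetical ages
undergrads = [age for age in population if age < 23]         # a convenience sample

print(f"effect in undergrad sample: {average_effect(undergrads):.2f}")
print(f"effect in full population:  {average_effect(population):.2f}")
```

A perfectly run study of the undergraduates would correctly find 0.40 for them, but the population-wide effect under these assumptions is only about 0.13: high internal validity, low external validity.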

Speech and Hate Crimes

Did Trump rallies cause an increase in hate crime?

“A USA TODAY analysis of the 64 rallies Trump … held [between] 2017 [and 2019] found that, when discussing immigration, the president has said ‘invasion’ at least 19 times. He has used the word ‘animal’ 34 times and the word ‘killer’ nearly three dozen times.”

Speech and Hate Crimes

But as we discussed, this correlation might suffer from confounding:

DISCUSS WITH NEIGHBORS: could there be other factors about communities that…

  • caused rallies to occur there
  • caused hate crimes to be higher

BOARD

Speech and Hate Crimes

One way to solve confounding is to do an experiment:

Kalmoe (2014) examines the effect of “aggressive” and “violent” language on support for political violence.

  • 512 survey respondents from a random sample of US adults were randomly assigned to see one of two versions of a campaign ad
  • one used more aggressive words like “fight”; the other used less aggressive words.
  • respondents then reported their support for political violence

Speech and Hate Crimes

Kalmoe (2014) finds that “aggressive” and “violent” language increased support for political violence.

  • do you believe the ‘violent’ ads caused people to support violence?
  • what can this tell us about possible effects of Trump’s speeches on hate crimes?

(GROUPS)

| Solution | How Bias Solved | Which Bias Removed | Assumes | Internal Validity | External Validity |
|---|---|---|---|---|---|
| Experiment | Randomization breaks \(W \rightarrow X\) link | All confounding variables | \(X\) is random; change only \(X\) | High | Low |
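A minimal simulation sketch of how randomization "breaks the \(W \rightarrow X\) link." The setup is hypothetical: a confounder W still strongly affects Y, but X is assigned by a coin flip, so W cannot influence who gets X. The baseline rates and the `true_effect` parameter are assumptions chosen for illustration.

```python
import random


def run_experiment(n: int, true_effect: float, seed: int = 0) -> float:
    """Difference in outcome rates between randomly treated and control cases."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        w = rng.random() < 0.5   # confounder still exists and still affects Y
        x = rng.random() < 0.5   # X assigned by coin flip, so W cannot affect X
        p_y = 0.1 + (0.5 if w else 0.0) + (true_effect if x else 0.0)
        y = rng.random() < p_y
        (treated if x else control).append(y)
    return sum(treated) / len(treated) - sum(control) / len(control)


print(round(run_experiment(100_000, true_effect=0.2), 2))
```

Because W is balanced across the two randomized groups, the simple difference in means lands near the assumed true effect of 0.2, and near 0 when `true_effect=0.0`.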

Interlude

Before we return to speech and hate crimes

Imagine…

You live in mid-19th century London.

  • Every few years, hundreds to thousands of people are killed in cholera outbreaks
  • To stop these deaths, you need to answer:

What causes the spread of cholera?

Cholera

Dominant view was that “miasmas” or “bad air” caused diseases like cholera

Broad Street Pump Outbreak (1854)

John Snow, MD, suggested that cholera was transmitted as a “germ” in water.

To provide evidence for his claim, Snow used correlation: he mapped cholera deaths from the 1854 outbreak in SoHo.

  • Broad Street Pump (source of drinking water) had “fouled” water (X)
  • Examined mortality from cholera (Y)
  • Proximity to the Broad Street Pump (C) correlated with mortality (Y)
  • Proximity to other pumps not related to mortality

Broad Street Pump Outbreak (1854)

  • Positive correlation

Broad Street Pump Outbreak (1854)

Leading doctors rejected Snow’s evidence:

  • Houses and sewers near the Broad Street Pump were built on a 1665 plague burial site.
  • Sewers produce foul odors from rotting material/human waste

Both might produce miasmas.

  • maybe Plague cemetery/Sewer \(\to\) Miasmas \(\to\) Foul Water
  • and Miasmas \(\to\) Cholera

So… Confounding.
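The doctors' objection can be written as a causal graph (DAG) and checked for "backdoor" paths, meaning paths from X (Foul Water) to Y (Cholera) that begin with an arrow pointing into X. This is a minimal sketch that hand-rolls the standard d-separation blocking rule for a single path; the graph is our hypothetical encoding of the miasma story above, and the helper names are illustrative.

```python
# Hypothetical encoding of the rival "miasma" story; each pair is (cause, effect).
EDGES = {
    ("Plague cemetery/Sewer", "Miasmas"),
    ("Miasmas", "Foul Water"),
    ("Miasmas", "Cholera"),
    ("Foul Water", "Cholera"),  # Snow's claim: X -> Y
}


def neighbors(node):
    """Nodes joined to `node` by an edge, ignoring arrow direction."""
    return {b for a, b in EDGES if a == node} | {a for a, b in EDGES if b == node}


def all_paths(start, end, path=None):
    """Every path from start to end that never revisits a node (arrow direction ignored)."""
    path = [start] if path is None else path
    if start == end:
        return [path]
    found = []
    for nxt in neighbors(start):
        if nxt not in path:
            found.extend(all_paths(nxt, end, path + [nxt]))
    return found


def backdoor_paths(x, y):
    """Paths from X to Y whose first arrow points INTO X (X <- ...)."""
    return [p for p in all_paths(x, y) if (p[1], p[0]) in EDGES]


def descendants(node):
    """Everything `node` causes, directly or indirectly."""
    found, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for a, b in EDGES:
            if a == current and b not in found:
                found.add(b)
                frontier.append(b)
    return found


def is_blocked(path, conditioned):
    """d-separation rule for one path, given the set of variables we condition on."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, mid) in EDGES and (nxt, mid) in EDGES
        if is_collider and mid not in conditioned and not (descendants(mid) & conditioned):
            return True   # an unconditioned collider blocks the path
        if not is_collider and mid in conditioned:
            return True   # conditioning on a chain/fork node blocks the path
    return False


for p in backdoor_paths("Foul Water", "Cholera"):
    print(" - ".join(p),
          "| blocked, no conditioning:", is_blocked(p, set()),
          "| blocked, conditioning on Miasmas:", is_blocked(p, {"Miasmas"}))
```

Under this graph there is exactly one backdoor path, Foul Water ← Miasmas → Cholera. It is open when we condition on nothing, which is the confounding problem, and blocked once we hold Miasmas constant.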

No, this John Snow

Confounding

Broad Street Pump Outbreak (1854)

Snow’s solution to confounding: compare people “near pump” w/ different water sources

|  | Brewers | Broad St. Residents |
|---|---|---|
| Water Source (X) | Brewery well/beer (clean) | Pump (contaminated) |
| Location | Near pump | Near pump |
| Timing | Aug. 1854 | Aug. 1854 |
| Miasmas (W) | Yes | Yes |
| Cholera (Y) | No | Yes |

Broad Street Pump Outbreak (1854)

Snow’s solution to confounding: compare people “far from pump” w/ different water sources

|  | Lady and Niece | West End Residents |
|---|---|---|
| Water Source (X) | Broad Street Pump (contaminated) | Another pump (clean) |
| Location | Mile from Broad St. | Mile from Broad St. |
| Timing | Aug. 1854 | Aug. 1854 |
| Miasmas (W) | No | No |
| Cholera (Y) | Yes | No |
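Snow's two comparisons amount to looking at water source and cholera separately within each level of the suspected confounder (miasmas). A minimal sketch, assuming we encode each group in the two comparison tables as a yes/no record; these are the qualitative table entries, not death counts, and `compare_within` is a hypothetical helper name.

```python
# Rows transcribed from the two comparison tables (qualitative entries, not counts).
GROUPS = [
    {"group": "Brewers",             "clean_water": True,  "miasmas": True,  "cholera": False},
    {"group": "Broad St. Residents", "clean_water": False, "miasmas": True,  "cholera": True},
    {"group": "Lady and Niece",      "clean_water": False, "miasmas": False, "cholera": True},
    {"group": "West End Residents",  "clean_water": True,  "miasmas": False, "cholera": False},
]


def compare_within(miasma_level: bool) -> dict:
    """Within one level of the confounder W, report cholera (Y) by water source (X)."""
    stratum = [g for g in GROUPS if g["miasmas"] == miasma_level]
    return {("clean" if g["clean_water"] else "contaminated"): g["cholera"] for g in stratum}


for level in (True, False):
    print(f"miasmas={level}:", compare_within(level))
```

In both strata, cholera follows the water source while miasma exposure is held fixed, which is exactly why the comparison speaks against the miasma explanation.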

Broad Street Pump Outbreak (1854)

Discuss:

Do you find these comparisons convincing as a way to prevent confounding?

Why or why not?

Holding geography constant

Conditioning

This solution to confounding is called…

conditioning

When we observe \(X\) and \(Y\) for multiple cases, we examine the correlation of \(X\) and \(Y\) within groups of cases that have the same values on the confounding variables (\(W\), etc.).

How does conditioning solve the problem?

  • Cases compared have the same values on confounding variable \(W\)
  • In these groups, \(W\) cannot affect \(X\) or \(Y\) (because \(W\) is not changing, it can’t move \(X\) or \(Y\))
  • “Backdoor” path from \(X\) to \(Y\) through \(W\) (\(X \leftarrow W \rightarrow Y\)) is “blocked”
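A minimal sketch of conditioning on simulated data. We assume, hypothetically, a single binary confounder W that we can observe, and an assumed true effect of X on Y of 0.1. The code compares X and Y within each W group and then averages the within-group differences, weighting by group size.

```python
import random


def simulate(n: int, true_effect: float = 0.1, seed: int = 0):
    """Hypothetical data: W raises the chance of X and of Y; X raises Y by true_effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = rng.random() < 0.5
        x = rng.random() < (0.8 if w else 0.2)
        y = rng.random() < 0.1 + (0.4 if w else 0.0) + (true_effect if x else 0.0)
        rows.append((w, x, y))
    return rows


def diff_in_means(rows):
    """Share with Y among cases with X minus share with Y among cases without X."""
    treated = [y for _, x, y in rows if x]
    control = [y for _, x, y in rows if not x]
    return sum(treated) / len(treated) - sum(control) / len(control)


def conditioned_estimate(rows):
    """Average of within-W differences, weighted by how many cases are in each W group."""
    total = 0.0
    for level in (True, False):
        stratum = [r for r in rows if r[0] == level]   # cases with the same value of W
        total += diff_in_means(stratum) * len(stratum) / len(rows)
    return total


rows = simulate(200_000)
print("naive:", round(diff_in_means(rows), 2))
print("conditioning on W:", round(conditioned_estimate(rows), 2))
```

The naive comparison is inflated (about 0.34 under these assumptions), while the within-W comparison recovers roughly 0.1. This only works because we assumed W is the sole confounder and that we measured it.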

Conditioning

In contrast to experiments, conditioning is possible for any cases and for any possible cause \(X\):

Conditioning has greater external validity.

  • Let’s revisit the effects of speech on hate crimes

Conditioning, an Example

Example: Conditioning

Earlier we asked:

Did Trump rallies increase hate crimes?

  • inflammatory rhetoric \(\xrightarrow{?}\) violence
  • many argue there is a link
  • but is there empirical evidence of causality?

Conditioning: Example


Correlation between Trump rallies and hate crimes likely suffers from confounding

  • Return to possible confounders on the board

Example: Conditioning:

Feinberg, Branton, and Martinez-Ebers compare hate crimes in counties with and without Trump rallies, but condition on (hold constant):

  • percent Jewish
  • number of hate groups
  • crime rate
  • 2012 Republican vote share
  • percent university educated
  • region

Example: Conditioning

Feinberg, Branton, and Martinez-Ebers find that, even after conditioning, Trump rallies increase the risk of hate crimes by 200%!

  • Discuss: are you convinced that this correlation, after conditioning, shows rallies caused hate crimes?

Example: Conditioning


Economics PhD candidates show that, conditioning on the same variables…

  • Clinton rallies increased hate crimes by nearly 250%!!
  • What could be going on here? Do all political rallies cause hate crimes? Or is something else happening?

Are any confounding variables missing from this diagram?
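One way a "Clinton rallies cause hate crimes" result could arise is sketched below with entirely hypothetical counties: an unmeasured confounder U (for illustration only, imagine large cities) makes any rally more likely and also raises reported hate crimes, while neither kind of rally has any effect. This is not the data or method of either study; the variable names and probabilities are assumptions.

```python
import random


def simulate_counties(n: int, seed: int = 0) -> list[dict]:
    """Hypothetical counties in which NEITHER kind of rally affects hate crimes."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = rng.random() < 0.5   # an observed control we do condition on
        u = rng.random() < 0.3   # hypothetical confounder left out of the analysis
        trump = rng.random() < (0.5 if u else 0.05)     # U makes any rally more likely
        clinton = rng.random() < (0.5 if u else 0.05)
        crime = rng.random() < 0.05 + (0.1 if w else 0.0) + (0.3 if u else 0.0)
        rows.append({"w": w, "u": u, "trump": trump, "clinton": clinton, "crime": crime})
    return rows


def conditioned_difference(rows: list[dict], cause: str, controls: list[str]) -> float:
    """Within-group difference in hate-crime rates (cause vs. no cause), averaged over
    groups of counties that share the same values on every control variable."""
    groups = {}
    for r in rows:
        groups.setdefault(tuple(r[c] for c in controls), []).append(r)
    total, used = 0.0, 0
    for members in groups.values():
        yes = [r["crime"] for r in members if r[cause]]
        no = [r["crime"] for r in members if not r[cause]]
        if yes and no:  # a group can only be compared if it has both kinds of county
            total += (sum(yes) / len(yes) - sum(no) / len(no)) * len(members)
            used += len(members)
    return total / used


rows = simulate_counties(100_000)
for cause in ("trump", "clinton"):
    print(cause,
          "| conditioning on W:", round(conditioned_difference(rows, cause, ["w"]), 2),
          "| on W and U:", round(conditioned_difference(rows, cause, ["w", "u"]), 2))
```

Conditioning only on W leaves both kinds of rally looking harmful (about 0.19 under these assumptions); adding U to the conditioning set pulls both estimates back to roughly zero. A "placebo" cause that should not matter but shows an effect is a warning sign that a confounder is missing.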

Conclusion

Conditioning

  • What is it?: Another way to “solve” confounding.
  • How does it work?: Look at correlation between \(X\) and \(Y\), for cases with same value of \(W\)
  • What are the assumptions?: That is a very good question for Wednesday