1. Correlation: Recap
- definition
- attributes
- problems
2. Problems with Correlation
- Random correlation
- Bias in correlation (confounding)
- Today is a very good day to ask questions
November 6, 2024
The fundamental problem of causal inference is that:
for any one case, we cannot know whether some “cause” actually led to some “effect”.
We “solve” the FPCI by comparing factual outcomes in different cases that have different exposures to the “cause”
This is correlation: the degree of association/relationship between the observed values of \(X\) (the independent variable) and \(Y\) (the dependent variable)
Correlations have two problems:
random association: correlations between \(X\) and \(Y\) occur by chance and do not reflect systematic relationship.
bias (spurious correlation, confounding): \(X\) and \(Y\) are correlated but the correlation does not result from a causal relationship between those variables
What is the problem with looking at many correlations and reporting only those that are “significant” (\(p < 0.05\))?
Statistical Significance | \(p\)-value | By Chance? | Why? | “Real”? |
---|---|---|---|---|
Low | High \((p > 0.05)\) | Likely | small \(N\), weak correlation | Probably not |
High | Low \((p < 0.05)\) | Unlikely | large \(N\), strong correlation | Probably |
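A small simulation can make the danger of reporting only “significant” correlations concrete. The sketch below is illustrative only: it draws many pairs of variables that are unrelated by construction and counts how often their correlation has \(p < 0.05\). The function names, the sample sizes, and the use of the Fisher \(z\) approximation for the \(p\)-value are assumptions of this sketch, not part of the lecture.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value using the Fisher z approximation (an assumption here)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def share_significant(n_tests=1000, n_cases=50, alpha=0.05, seed=1):
    """Share of unrelated (X, Y) pairs whose correlation has p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        xs = [rng.gauss(0, 1) for _ in range(n_cases)]
        ys = [rng.gauss(0, 1) for _ in range(n_cases)]  # drawn independently of xs
        if approx_p_value(pearson_r(xs, ys), n_cases) < alpha:
            hits += 1
    return hits / n_tests


share = share_significant()
print(f"Share of unrelated pairs with p < 0.05: {share:.3f}")
```

Because every pair here is unrelated by construction, any “significant” correlation is a chance result. Checking many correlations and reporting only the significant ones will therefore surface some of these even when no systematic relationship exists.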
Mueller and Schwarz (2020) investigate:
During the period from 2015 to the end of 2017, Trump posted more than 300 messages that can be classified as “Anti-Muslim”.
Did Trump’s tweeting of anti-Muslim messages increase anti-Muslim hate crimes?
We can’t observe the US in the absence of Trump tweeting against Muslims, so authors use correlation…
Trump’s Twitter gained attention after he announced his run for President (2015-2017)
When Trump gained prominence, anti-Muslim hate crimes increased
As Trump’s Tweeting against Muslims reached more people (change in observed \(X\)), anti-Muslim hate crimes increased (change in \(Y\))
In groups, discuss:
Can this correlation be convincing that Trump’s tweets caused anti-Muslim hate crimes?
Why or why not?
Why doesn’t correlation imply causation?
Confounding is when there is a systematic observed correlation between \(X\) and \(Y\) that does NOT reflect the causal effect of \(X\) on \(Y\).
Mueller and Schwarz look at the correlation of Trump’s Twitter activity and Hate Crimes over time:
When Trump tweeted more (and had more followers) (2015-2017), hate crimes were higher than when Trump tweeted less (and had fewer followers) (2010-2014).
Confounding happens when cases that experience different levels of \(X\) have different (factual and counterfactual) potential outcomes of \(Y\).
In other words, cases with different levels of \(X\) were already different in their factual/counter-factual values of \(Y\).
In our example:
In order for Mueller and Schwarz’s correlation to imply causation, we need to assume that:
\(\color{red}{\mathrm{AntiMuslim \ Hate \ Crime_{USA \ 2015-17}(No \ Trump \ Tweets)}}\) \(=\) \(\mathrm{AntiMuslim \ Hate \ Crime_{USA \ 2010-14}(No \ Trump \ Tweets)}\)
If assumption is wrong…
Anti-Muslim hate crimes in 2015-2017 would have been different from 2010-2014 even without Trump’s Tweets…
…this comparison leads to confounding.
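One way to see why is to split the observed difference between the two periods into two parts. This is a sketch using the potential-outcomes notation of these slides. The labels “observed difference”, “causal effect” and “bias” are added here for illustration, and the identity holds simply because the counterfactual term is added and subtracted:

\[
\begin{aligned}
&\underbrace{\mathrm{Hate \ Crime_{USA \ 2015-17}(Trump \ Tweets)} - \mathrm{Hate \ Crime_{USA \ 2010-14}(No \ Tweets)}}_{\text{observed difference}} \\
&\quad = \underbrace{\mathrm{Hate \ Crime_{USA \ 2015-17}(Trump \ Tweets)} - \color{red}{\mathrm{Hate \ Crime_{USA \ 2015-17}(No \ Tweets)}}}_{\text{causal effect in 2015-17}} \\
&\quad\quad + \underbrace{\color{red}{\mathrm{Hate \ Crime_{USA \ 2015-17}(No \ Tweets)}} - \mathrm{Hate \ Crime_{USA \ 2010-14}(No \ Tweets)}}_{\text{bias}}
\end{aligned}
\]

The bias term is zero exactly when the assumption holds. If it is not zero, the observed difference mixes the causal effect with pre-existing differences between the two periods.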
In correlation, Mueller and Schwarz assume that US (2015-17) without Trump tweets (counterfactual) is the same as US (2010-14) without Trump tweets (factual)
Case | Tweets | No Tweets |
---|---|---|
USA 2015-17 | \(\mathrm{Hate \ Crime_{USA \ 2015-17}(Trump \ Tweets)}\) | \(\color{red}{\mathrm{Hate \ Crime_{USA \ 2015-17}(No \ Tweets)}}\) |
| | \(\Downarrow{=}\) | \(\Uparrow{=}\) |
USA 2010-14 | \(\color{red}{\mathrm{Hate \ Crime_{USA \ 2010-14}(Trump \ Tweets)}}\) | \(\mathrm{Hate \ Crime_{USA \ 2010-14}(No \ Tweets)}\) |
If this substitution is wrong, that is, if the USA in 2010-14 and the USA in 2015-17 have different potential outcomes of hate crime, then the correlation is biased.
Case | Tweets | No Tweets |
---|---|---|
USA 2015-17 | \(\mathrm{Hate \ Crime_{USA \ 2015-17}(Trump \ Tweets)}\) | \(\color{red}{\mathrm{Hate \ Crime_{USA \ 2015-17}(No \ Tweets)}}\) |
| | \(\Downarrow{\neq}\) | \(\Uparrow{\neq}\) |
USA 2010-14 | \(\color{red}{\mathrm{Hate \ Crime_{USA \ 2010-14}(Trump \ Tweets)}}\) | \(\mathrm{Hate \ Crime_{USA \ 2010-14}(No \ Tweets)}\) |
Why might the potential outcomes of hate crime be different in these two time periods?
Confounding occurs when there are other differences between cases (call them variables, e.g. \(W\), etc.) that causally affect \(X\) and \(Y\).
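As a rough illustration, the sketch below simulates a world in which \(W\) affects both \(X\) and \(Y\) while \(X\) has no effect on \(Y\) at all. The data are simulated, not real, and the effect sizes, sample size, and helper names are assumptions made for this example.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(ys, ws):
    """What is left of ys after removing a least-squares line fitted on ws."""
    n = len(ws)
    mean_w = sum(ws) / n
    mean_y = sum(ys) / n
    slope = (sum((w - mean_w) * (y - mean_y) for w, y in zip(ws, ys))
             / sum((w - mean_w) ** 2 for w in ws))
    return [(y - mean_y) - slope * (w - mean_w) for w, y in zip(ws, ys)]


rng = random.Random(42)
n = 20000
w = [rng.gauss(0, 1) for _ in range(n)]   # the other variable W
x = [wi + rng.gauss(0, 1) for wi in w]    # W affects X
y = [wi + rng.gauss(0, 1) for wi in w]    # W affects Y; X plays no role in Y

raw_r = pearson_r(x, y)
adjusted_r = pearson_r(residuals(x, w), residuals(y, w))

print(f"Correlation of X and Y:                {raw_r:.2f}")
print(f"Correlation after removing W's effect: {adjusted_r:.2f}")
```

In this simulated world the raw correlation between \(X\) and \(Y\) is clearly positive even though the true causal effect of \(X\) on \(Y\) is zero. It largely disappears once the part of each variable explained by \(W\) is removed.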
The easiest way to understand this is visually.
Causal graphs represent a model of the true causal relationships between variables.
the nodes or dots correspond to variables
the arrows convey the direction of the flow of causality
Arrows alone do not indicate whether, e.g., \(X\) increases or decreases \(Y\).
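A causal graph can also be written down as data. The sketch below stores each graph as a dictionary mapping a variable to the variables its arrows point at. It then lists the variables that affect \(X\) and also affect \(Y\) by a route that does not pass through \(X\). The variable names (`W`, `Z`), the example graphs, and the function names are hypothetical, and the check is a simplification for illustration rather than a full test for confounding.

```python
def descendants(graph, start, blocked=frozenset()):
    """Every variable reachable from `start` by following arrows,
    never stepping onto a variable in `blocked`."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for child in graph.get(node, []):
            if child not in seen and child not in blocked:
                seen.add(child)
                stack.append(child)
    return seen


def common_causes(graph, x, y):
    """Variables that affect x and also affect y by a path avoiding x."""
    nodes = set(graph) | {c for children in graph.values() for c in children}
    found = set()
    for w in nodes - {x, y}:
        affects_x = x in descendants(graph, w, blocked={y})
        affects_y_not_via_x = y in descendants(graph, w, blocked={x})
        if affects_x and affects_y_not_via_x:
            found.add(w)
    return found


# Hypothetical graphs: keys are causes, values are the variables they point at.
only_x_causes_y = {"X": ["Y"]}
w_causes_both = {"W": ["X", "Y"], "X": ["Y"]}
z_causes_only_x = {"Z": ["X"], "X": ["Y"]}

print(common_causes(only_x_causes_y, "X", "Y"))   # set()
print(common_causes(w_causes_both, "X", "Y"))     # {'W'}
print(common_causes(z_causes_only_x, "X", "Y"))   # set()
```

In the third graph, `Z` affects \(Y\) only through \(X\), so the check does not flag it.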
Did Trump anti-Muslim tweets cause hate crimes?
Maybe…
In a causal graph, there is confounding of correlation of \(X\) and \(Y\) if…
We don’t know the true causal graph (if we did, we wouldn’t need to work so hard to evaluate causal claims)
Instead, these causal graphs help us think about possible scenarios that might produce bias/confounding of the correlation between \(X\) and \(Y\).
These examples illustrate the possibility that if causal graphs include variables in addition to the independent and dependent variables, there is a risk of confounding or bias.
Do all additional variables produce confounding?
No… We will discuss three different patterns of variables: some produce confounding, and some do not.
The most serious threat to empirical evidence of causality is confounding: