1. Correlation: Recap
- definition
- attributes
- problems
2. Problems with Correlation
- Random correlation
- Bias in correlation (confounding)
- Today is a very good day to ask questions
November 6, 2025
We “solve” the fundamental problem of causal inference (FPCI) by comparing factual outcomes in different cases that have different exposures to the “cause”
This is correlation: the degree of association/relationship between the observed values of \(X\) (the independent variable) and \(Y\) (the dependent variable)
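As a reminder of how this degree of association can be quantified, here is a minimal sketch that assumes the measure in question is the Pearson correlation coefficient. The function name `pearson_r` and the data values are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator pieces: sums of squared deviations.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / sqrt(ss_x * ss_y)

# Hypothetical observed values of X and Y, invented purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
```

The sign of `r` gives the direction of the association and its absolute value gives the strength. Neither says anything about whether \(X\) causes \(Y\).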
Correlations have two problems:
random association: correlations between \(X\) and \(Y\) occur by chance and do not reflect \(X\) causing \(Y\).
bias (spurious correlation, confounding): \(X\) and \(Y\) are correlated but the correlation does not result from a causal relationship between those variables
What is the problem?
If correlation can occur by chance:
What is the solution?
What is the problem with looking at many correlations and reporting only those that are “significant”? (\(p < 0.05\))
| Statistical Significance | \(p\)-value | By Chance? | Why? | “Real”? |
|---|---|---|---|---|
| Low | High \((p > 0.05)\) | Likely | small \(N\), weak correlation | Probably not |
| High | Low \((p < 0.05)\) | Unlikely | large \(N\), strong correlation | Probably |
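To see why reporting only the “significant” correlations out of many is a problem, the following sketch correlates pairs of variables that are unrelated by construction and counts how many still clear \(p < 0.05\). It is an illustration under stated assumptions: the \(p\)-value uses the Fisher \(z\) approximation rather than an exact test, and the function names, number of tests, sample size, and seed are all arbitrary choices made here.

```python
import random
from math import atanh, sqrt
from statistics import NormalDist

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

def approx_p_value(r, n):
    """Two-sided p-value for 'no correlation', via the Fisher z approximation."""
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def count_chance_findings(n_tests=1000, n_cases=50, alpha=0.05, seed=1):
    """Correlate independent random pairs and count the 'significant' ones."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        x = [rng.gauss(0, 1) for _ in range(n_cases)]
        y = [rng.gauss(0, 1) for _ in range(n_cases)]  # drawn independently of x
        if approx_p_value(pearson_r(x, y), n_cases) < alpha:
            hits += 1
    return hits

n_tests = 1000
hits = count_chance_findings(n_tests=n_tests)
share = hits / n_tests
print(f"{hits} of {n_tests} unrelated pairs were 'significant' ({share:.1%})")
```

Because no pair has any true relationship, every “significant” result here is a chance correlation. The share should land near the 5% threshold, so a researcher who runs many tests and reports only the hits will always have something to report.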
Practice
Mueller and Schwarz (2023) investigate:
During the period from 2015 to the end of 2017, Trump posted more than 300 messages that can be classified as “Anti-Muslim”.
Did Trump’s tweeting of anti-Muslim messages on social media increase anti-Muslim hate crimes?
We can’t observe the US in the absence of Trump tweeting against Muslims, so the authors use correlation…
\(X\) (Independent Variable): Trump’s Twitter gained attention after he announced his run for President (2015-2017)
\(Y\) (Dependent variable): When Trump gained prominence, anti-Muslim hate crimes increased
As Trump’s Tweeting against Muslims reached more people (change in observed \(X\)), anti-Muslim hate crimes increased (change in \(Y\))
In groups, discuss:
Is this correlation convincing evidence that Trump’s tweets caused anti-Muslim hate crimes?
Why or why not?
In this case, correlation “solves” FPCI by plugging in these factual potential outcomes
| Case | Tweets | No Tweets |
|---|---|---|
| USA 2015-17 | \(\mathrm{Hate \ Crime_{USA \ 2015-17}(Trump \ Tweets)}\) | \(\color{red}{\mathrm{Hate \ Crime_{USA \ 2015-17}(No \ Tweets)}}\) |
| \(\Downarrow\) | \(\Uparrow{}\) | |
| USA 2010-14 | \(\color{red}{\mathrm{Hate \ Crime_{USA \ 2010-14}(Trump \ Tweets)}}\) | \(\mathrm{Hate \ Crime_{USA \ 2010-14}(No \ Tweets)}\) |

Why doesn’t correlation imply causation?
Confounding is when there is a systematic observed correlation between \(X\) and \(Y\) that does NOT reflect the causal effect of \(X\) on \(Y\).
Mueller and Schwarz look at the correlation of Trump’s Twitter activity and Hate Crimes over time:
When Trump tweeted more (and had more followers) (2015-2017), hate crimes were higher than when Trump tweeted less (and had fewer followers) (2010-2014).
In order for Mueller and Schwarz’s correlation to imply causation, we need to assume that:
\(\color{red}{\mathrm{AntiMuslim \ Hate \ Crime_{USA \ 2015-17}(No \ Trump \ Tweets)}}\) \(=\) \(\mathrm{AntiMuslim \ Hate \ Crime_{USA \ 2010-14}(No \ Trump \ Tweets)}\)
If this assumption is wrong…
and Anti-Muslim hate crimes in 2015-2017 would have been different from 2010-2014 even without Trump’s Tweets…
…this comparison leads to confounding.
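One way to see why is to split the observed over-time difference into two pieces. The decomposition below is a sketch that adds and subtracts the counterfactual term; \(Y\) is shorthand introduced here for anti-Muslim hate crime, and the labels under the braces are not notation from Mueller and Schwarz.

\[
\underbrace{Y_{\mathrm{USA \ 2015-17}}(\mathrm{Tweets}) - Y_{\mathrm{USA \ 2010-14}}(\mathrm{No \ Tweets})}_{\text{observed difference}}
=
\underbrace{Y_{\mathrm{USA \ 2015-17}}(\mathrm{Tweets}) - \color{red}{Y_{\mathrm{USA \ 2015-17}}(\mathrm{No \ Tweets})}}_{\text{causal effect}}
+
\underbrace{\color{red}{Y_{\mathrm{USA \ 2015-17}}(\mathrm{No \ Tweets})} - Y_{\mathrm{USA \ 2010-14}}(\mathrm{No \ Tweets})}_{\text{bias}}
\]

The bias term is zero exactly when the assumption above holds. If it is not zero, the observed difference mixes the causal effect with whatever else changed between the two periods.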
In correlation, Mueller and Schwarz assume that US (2015-17) without Trump tweets (counterfactual) is the same as US (2010-14) without Trump tweets (factual)
| Case | Tweets | No Tweets |
|---|---|---|
| USA 2015-17 | \(\mathrm{Hate \ Crime_{USA \ 2015-17}(Trump \ Tweets)}\) | \(\color{red}{\mathrm{Hate \ Crime_{USA \ 2015-17}(No \ Tweets)}}\) |
| \(\Downarrow{=}\) | \(\Uparrow{=}\) | |
| USA 2010-14 | \(\color{red}{\mathrm{Hate \ Crime_{USA \ 2010-14}(Trump \ Tweets)}}\) | \(\mathrm{Hate \ Crime_{USA \ 2010-14}(No \ Tweets)}\) |
If this substitution is wrong, because the USA in 2010-14 and the USA in 2015-17 have different potential outcomes of hate crime, then the correlation is biased.
| Case | Tweets | No Tweets |
|---|---|---|
| USA 2015-17 | \(\mathrm{Hate \ Crime_{USA \ 2015-17}(Trump \ Tweets)}\) | \(\color{red}{\mathrm{Hate \ Crime_{USA \ 2015-17}(No \ Tweets)}}\) |
| \(\Downarrow{\neq}\) | \(\Uparrow{\neq}\) | |
| USA 2010-14 | \(\color{red}{\mathrm{Hate \ Crime_{USA \ 2010-14}(Trump \ Tweets)}}\) | \(\mathrm{Hate \ Crime_{USA \ 2010-14}(No \ Tweets)}\) |
Maybe \(\mathrm{Hate \ Crime_{USA \ 2015-17}(Trump \ Tweets)} = \\ \color{red}{\mathrm{Hate \ Crime_{USA \ 2015-17}(No \ Tweets)}}\): hate crimes would have gone up ANYWAY
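The “would have gone up anyway” scenario can be made concrete with made-up numbers. In this sketch every value is hypothetical and chosen only so that the tweets have no effect at all. In reality the counterfactual entries can never be observed, so a table like this cannot be filled in from data.

```python
# Hypothetical potential outcomes (hate crimes per year); every number is invented.
potential_outcomes = {
    "USA 2015-17": {"tweets": 300, "no_tweets": 300},  # "no_tweets" is counterfactual
    "USA 2010-14": {"tweets": 150, "no_tweets": 150},  # "tweets" is counterfactual
}

late = potential_outcomes["USA 2015-17"]
early = potential_outcomes["USA 2010-14"]

# What the over-time correlation compares: the two factual outcomes.
observed_difference = late["tweets"] - early["no_tweets"]

# The causal effect in 2015-17: same case, different exposure (never observable).
causal_effect = late["tweets"] - late["no_tweets"]

# Bias: how far the 2010-14 stand-in is from the true 2015-17 counterfactual.
bias = late["no_tweets"] - early["no_tweets"]

print(observed_difference, causal_effect, bias)  # 150 0 150
```

Here the comparison over time shows a large increase even though the causal effect is zero: the entire observed difference is bias.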
Why does confounding happen?
Confounding happens when cases that experience different levels of \(X\) have systematically different (factual and counterfactual) potential outcomes of \(Y\).
Cases with different levels of cause \(X\)
Does social media use cause increased polarization?
Individual data on social media use and affective polarization from ANES 2024
What does confounding look like in the context of social media consumption and polarization?
Why does confounding happen?
Confounding happens when cases that experience different levels of \(X\) have systematically different (factual and counterfactual) potential outcomes of \(Y\).
Cases that actually receive cause \(X\)…
… than cases not actually receiving \(X\)
Confounding occurs when there are other differences between cases (call them variables, e.g. \(W\), etc.) that causally affect \(X\) and \(Y\).
The easiest way to understand this is visually.
Causal graphs represent a model of the true causal relationships between variables.
the nodes or dots correspond to variables
the arrows convey the direction of the flow of causality
Arrows alone do not indicate whether \(X\), e.g., increases or decreases \(Y\).
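The confounding pattern described above, where another variable \(W\) causally affects both \(X\) and \(Y\), can be simulated. This is a minimal sketch of a graph with arrows \(W \to X\) and \(W \to Y\) and no arrow from \(X\) to \(Y\). The effect sizes, noise distribution, sample size, and seed are arbitrary assumptions, and the function names are invented for illustration.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

def simulate_common_cause(n=5000, seed=7):
    """Draw cases from a graph where W affects X and Y, and X does not affect Y."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n)]
    x = [wi + rng.gauss(0, 1) for wi in w]  # arrow W -> X
    y = [wi + rng.gauss(0, 1) for wi in w]  # arrow W -> Y; x is never used here
    return w, x, y

w, x, y = simulate_common_cause()
r_xy = pearson_r(x, y)
print(f"correlation of X and Y with zero causal effect: {r_xy:.2f}")
```

\(X\) never enters the equation for \(Y\), so its causal effect is zero by construction, yet the two are clearly correlated because cases with high \(W\) tend to have both high \(X\) and high \(Y\).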
Did Trump’s anti-Muslim tweets cause hate crimes?
Maybe…
In a causal graph, there is confounding of correlation of \(X\) and \(Y\) if…
Does social media use cause increased polarization?
In groups: what other variables might affect social media use? polarization?
We don’t know the true causal graph (if we did, we wouldn’t need to work so hard to evaluate causal claims)
Instead, these causal graphs help us think through possible scenarios that might produce bias/confounding of the correlation between \(X\) and \(Y\).
These examples illustrate that when a causal graph includes variables beyond the independent and dependent variables, there is a risk of confounding or bias.
Do all additional variables produce confounding?
No… We will discuss three different patterns of variables, some of which produce confounding and some of which do not.
The most serious threat to empirical evidence of causality is confounding: