November 4, 2024

Testing Causal Claims

1. Fundamental Problem of Causal Inference

  • independent/dependent variables
  • solutions

2. Correlation

  • what is it?
  • problems with correlation
  • Random association

Election

Election Violence?

Criminal cases and campaign claims hinge on whether Trump’s statements affect real-world actions.

Do Trump’s statements this year increase the probability of violence after the election?

Did Trump’s election fraud claims in 2020 increase the probability of violence (like January 6th)?

Fundamental Problem

Causal claims are about counterfactuals and can be expressed in terms of potential outcomes:

“Trump’s election fraud claims caused January 6th” \(\xrightarrow{implies}\)

\[\text{Capitol Attacked}_{2021} (\text{Trump Claims Fraud}) = Yes \\ \color{red}{\text{Capitol Attacked}_{2021} (\text{Trump Doesn't Claim Fraud}) = No} \]

  • But the Fundamental Problem of Causal Inference: we can never observe the effect of the fraud claims in 2020, because we never see the counterfactual.

Solving the FPCI?

In order to provide scientific evidence (i.e., evidence that meets weak severity) regarding causal claims, we need to find ways around the FPCI:

  • We need to find something we can see that corresponds to the missing counterfactual

Solving the FPCI

What behaviors cause a person to become wealthy?

Can we learn anything from this evidence?

Solving the FPCI

We cannot say anything about causality if:

  • we try to explain the cause of some effect by only looking at cases that experience that effect (selection on dependent variable)
  • we try to explain the effect of some cause by only looking at cases that experience that cause

We focus on effects of causes, so we need to see variation in exposure to the cause

  • without this, evidence fails weak severity

Testing Causal Claims

We make causal claims testable by translating them into statements about potential outcomes: the relationship between independent (cause) and dependent (outcome) variables.

  • these are statements about what we should observe if the causal claim is true.

For instance:

  • If \(X\) were present (absent), then \(Y\) would be more (less) likely
  • If \(X\) were to increase (decrease), then \(Y\) would increase (decrease)

We then have to compare cases with different values of \(X\).

Independent/Dependent Variables

Independent variable:

The variable capturing the alleged cause in a causal claim.

  • often denoted as “X”

Dependent variable:

The variable capturing the alleged outcome (what is affected) in a causal claim.

  • often denoted as “Y”

Potential Outcomes are the values of the dependent variable a case would take if exposed to different values of the independent variable.

Testing Causal Claims

We can’t easily find a counterfactual US where Trump didn’t claim the election was stolen…

But we can examine whether other statements he has made increased violence.

Trump Rallies and Hate Crimes

In February 2019, Donald Trump held a rally in El Paso, TX, where he argued that migrants were dangerous.

  • “A wall is a very good thing, not a bad thing. It’s a moral thing”
  • “[reducing numbers of immigrants in detention would be] cutting loose dangerous criminals into our country.”
  • “The Democrat Party [is] becoming the party of socialism, late-term abortion, open borders and crime.”

Trump Rallies and Hate Crimes

In August 2019, an armed man killed 22 people at a Walmart in El Paso, TX. In advance of his attack, he issued a manifesto stating that he was motivated by an alleged “Hispanic invasion of Texas.”

Trump Rallies and Hate Crimes

A causal claim:

“Trump’s rally in El Paso increased the likelihood of hate crimes against immigrants.”

  • Does the fact that a hate crime took place in El Paso after the rally prove this is true? Why or why not?

Trump Rallies and Hate Crimes

“Trump rallies in a community increase the likelihood of hate crimes against immigrants.”

What could be an independent variable used to test this causal claim?

What could be a dependent variable used to test this causal claim?

Trump Rallies and Hate Crimes

Causal claim implies…

Counterfactual claim:

“If Trump had not held a rally in El Paso (in 2019), then there would have been fewer hate crimes against immigrants.”

Potential Outcomes:

\(\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{Rally}) >\) \(\color{red}{\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{No \ Rally})}\)

\(\mathrm{Black}\) is factual; \(\color{red}{\mathrm{Red}}\) is counterfactual

Trump Rallies and Hate Crimes

If Trump’s rally caused hate crimes to increase in El Paso, we would expect to see this:

\(\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{Rally}) >\) \(\color{red}{\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{No \ Rally})}\)

The value in \(\mathrm{Black}\) is factual; the value in \(\color{red}{\mathrm{Red}}\) is counterfactual and can never be known.

Trump Rallies and Hate Crimes

| \(\mathrm{City}_i\) | \(\mathrm{Rally}_i\) | \(\mathrm{Hate \ Crimes}_i(\mathrm{Rally})\) | \(\mathrm{Hate \ Crimes}_i(\mathrm{No \ Rally})\) |
|---|---|---|---|
| El Paso | Yes | \(> 1\) | ? |

How would we find the “\(?\)”?

  • We need to find another case where there was no rally

Trump Rallies and Hate Crimes

We cannot observe: \(\color{red}{\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{No \ Rally})}\)

But we can observe, e.g.: \(\mathrm{Hate \ Crimes}_{Austin}(\mathrm{No \ Rally})\)

If we assume: \(\mathrm{Hate \ Crimes}_{Austin}(\mathrm{No \ Rally})\) \(=\) \(\color{red}{\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{No \ Rally})}\)

Then, we can empirically test our causal claim, to see if:

\(\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{Rally}) >\) \(\mathrm{Hate \ Crimes}_{Austin}(\mathrm{No \ Rally})\)

| \(\mathrm{City}_i\) | \(\mathrm{Rally}_i\) | \(\mathrm{Hate \ Crimes}_i(\mathrm{Rally})\) | \(\mathrm{Hate \ Crimes}_i(\mathrm{No \ Rally})\) |
|---|---|---|---|
| El Paso | Yes | \(\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{Rally})\) | \(\color{red}{\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{No \ Rally})}\) |
|  |  |  | \(\mathbf{\Uparrow}\) |
| Austin | No | \(\color{red}{\mathrm{Hate \ Crimes}_{Austin}(\mathrm{Rally})}\) | \(\boxed{\mathrm{Hate \ Crimes}_{Austin}(\mathrm{No \ Rally})}\) |

Substituting the observed Austin value for the missing El Paso counterfactual:

| \(\mathrm{City}_i\) | \(\mathrm{Rally}_i\) | \(\mathrm{Hate \ Crimes}_i(\mathrm{Rally})\) | \(\mathrm{Hate \ Crimes}_i(\mathrm{No \ Rally})\) |
|---|---|---|---|
| El Paso | Yes | \(\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{Rally})\) | \(\boxed{\mathrm{Hate \ Crimes}_{Austin}(\mathrm{No \ Rally})}\) |
|  |  |  | \(\mathbf{\Uparrow}\) |
| Austin | No | \(\color{red}{\mathrm{Hate \ Crimes}_{Austin}(\mathrm{Rally})}\) | \(\mathrm{Hate \ Crimes}_{Austin}(\mathrm{No \ Rally})\) |

Solving the FPCI

Every solution to the FPCI involves:

  1. Comparing the observed values of outcome \(Y\) in cases that actually have different values of cause \(X\)

  2. Making the assumption that the factual (observed) potential outcomes from one case are equivalent to the counterfactual (unobserved) potential outcomes of another case (illustrated below).

  • We call the observed patterns of association between the independent variable \(X\) and the dependent variable \(Y\) correlations
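
To restate what assumption 2 buys us in the rally example: if we assume \(\mathrm{Hate \ Crimes}_{Austin}(\mathrm{No \ Rally})\) equals the missing El Paso counterfactual, then the observable comparison on the left equals the unobservable causal effect on the right:

\[\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{Rally}) - \mathrm{Hate \ Crimes}_{Austin}(\mathrm{No \ Rally}) = \mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{Rally}) - \color{red}{\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{No \ Rally})}\]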

Correlation

Correlation is the association/relationship between the observed values of \(X\) (the independent variable) and \(Y\) (the dependent variable)

  • There are formal mathematical definitions for specific kinds of associations.
  • We use the term loosely to describe the observed relationship between \(X\) and \(Y\).

Correlation

All empirical evidence for causal claims relies on correlation between the independent and dependent variables.

But, you’ve all heard this: “correlation is not causation.”

Correlation

How do we turn correlation into evidence of causation?

  • First, we need to know a bit more about correlation

Correlation

Many different ways of assessing correlation.

  • correlations have a direction:
    • positive: \(X\uparrow\) , \(Y\uparrow\)
    • negative: \(X\uparrow\), \(Y\downarrow\)
  • correlations have strength:
    • strong: \(X\) and \(Y\) almost always move together
    • weak: \(X\) and \(Y\) do not move together very much
  • correlations have a magnitude:
    • how much \(Y\) changes with \(X\) on average.

Correlation

Many ways of examining correlations (a small plotting sketch follows the list):

  • Numerical summary (correlation coefficient, linear regression, etc.)
  • scatterplots (x-y coordinates with one point per case)
  • bar plots (cases grouped into bars by values of X, height corresponds to values of Y)
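
A minimal sketch of the last two options, using made-up data; the variables and numbers are purely illustrative, not from any real rally or hate-crime dataset:

```python
# Hypothetical data: X is the independent variable, Y the dependent variable.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = rng.normal(size=50)           # made-up X, one value per case
y = 2 * x + rng.normal(size=50)   # made-up Y, loosely related to X

fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))

# Scatterplot: x-y coordinates with one point per case
left.scatter(x, y)
left.set_xlabel("X")
left.set_ylabel("Y")

# Bar plot: group cases by values of X, bar height = mean of Y in each group
means = [y[x < 0].mean(), y[x >= 0].mean()]
right.bar(["low X", "high X"], means)
right.set_ylabel("mean of Y")

plt.tight_layout()
plt.show()
```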

Correlation

Formally…

common mathematical definition: correlation is the degree of linear association between \(X\) and \(Y\) (see the sketch below)

  • Takes values between \(-1\) and \(1\)
  • Values close to \(1\) or \(-1\) suggest strong degree of linear association
  • Values close to \(0\) suggest weak degree of linear association
  • Value of correlation does not tell us how much \(Y\) changes with \(X\)
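
A small sketch of this definition in code, again on made-up data; `np.corrcoef` computes the standard (Pearson) linear correlation described above:

```python
import numpy as np

# Made-up data: Y depends partly on X, plus unrelated noise.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)

r = np.corrcoef(x, y)[0, 1]   # linear correlation, always between -1 and 1
print(round(r, 2))            # closer to 1 or -1 = stronger linear association
```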

Correlation

What is it?

negative correlation: (\(< 0\)) values of \(X\) and \(Y\) move in opposite directions:

  • higher values of \(X\) appear with lower values of \(Y\)
  • lower values of \(X\) appear with higher values of \(Y\)

positive correlation: (\(> 0\)) values of \(X\) and \(Y\) move in the same direction:

  • higher values of \(X\) appear with higher values of \(Y\)
  • lower values of \(X\) appear with lower values of \(Y\)

Correlation

  • It is possible to see perfect correlation but small change in \(Y\) across \(X\)

  • It is possible to see weak correlation but large change in \(Y\) across \(X\)

  • It is possible to see a perfect nonlinear relationship between \(X\) and \(Y\) with \(0\) correlation (all three cases are demonstrated in the sketch below)
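
A sketch backing up all three bullets with constructed examples; the specific slopes and noise levels are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 101)

# 1. Perfect correlation, tiny magnitude: Y moves exactly with X, but barely changes.
y1 = 0.001 * x
print(np.corrcoef(x, y1)[0, 1])   # = 1.0, even though the slope is only 0.001

# 2. Weak correlation, large magnitude: steep slope, but swamped by noise.
y2 = 10 * x + rng.normal(0, 100, size=x.size)
print(np.corrcoef(x, y2)[0, 1])   # small, even though the slope is 10

# 3. Perfect nonlinear relationship, near-zero correlation: Y is exactly X squared.
y3 = x ** 2
print(np.corrcoef(x, y3)[0, 1])   # approximately 0
```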

Correlation:

weak correlation: values for \(X\) and \(Y\) do not cluster along line

strong correlation: values for \(X\) and \(Y\) cluster strongly along a line

strength of correlation does not determine the slope of line describing \(X,Y\) relationship

magnitude: this is the slope of the line describing the \(X,Y\) relationship. The larger the effect, the steeper the slope

Correlation: \(0.08\), Magnitude: \(0.22\). Does this correlation prove that Trump rallies caused hate crimes? Why or why not?

Correlation: 0.67, Magnitude: 5.82. Does this correlation prove that Nick Cage caused drownings? Why or why not?

Correlation

Two types of problems

  • random association: correlations between \(X\) and \(Y\) occur by chance and do not reflect any systematic relationship between \(X\) and \(Y\). (In the extreme, absolutely no relationship between \(X\) and \(Y\))

  • bias (spurious correlation, confounding): \(X\) and \(Y\) are correlated, but the correlation does not result from a causal relationship between those variables

Solving these problems involves making assumptions: what are those assumptions? how plausible are they?

Random Association

Correlation: Random association

Arbitrary processes can make seemingly-strong patterns.

If you look long enough at pure chaos, you might find a strong correlation

  • But it could have happened by chance!
  • So an observed correlation might not mean any relationship (e.g. Nick Cage)

Arbitrary Correlations

Random association: Statistics

To see that random patterns can emerge, I use random number generators to

  • randomly pick \(5\) values of \(X\) out of a “hat”
  • randomly pick \(5\) values of \(Y\) out of a different “hat”

We can imagine these are the observed \(X\) and \(Y\) for \(5\) cases.

How easy is it to find a strong correlation (even though \(X\) and \(Y\) are totally unrelated)? A minimal version of this simulation is sketched below.
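
A sketch of that simulation, assuming a uniform “hat” for both variables; the counts of tries reported below come from repeated runs of a procedure along these lines:

```python
import numpy as np

rng = np.random.default_rng()
tries = 0
while True:
    tries += 1
    x = rng.uniform(size=5)   # 5 values of X out of one "hat"
    y = rng.uniform(size=5)   # 5 values of Y out of a different "hat"
    if abs(np.corrcoef(x, y)[0, 1]) > 0.9:
        break

print(f"Tries to get |correlation| > 0.9: {tries}")
```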

Random association: Statistics

\(\#\) Tries to get correlation \(> 0.9\): 58

Random association

What do we do about this problem?

  • We can never rule out a chance correlation, but we can figure out how likely it is that the correlation occurred by chance.
  • If chance correlation is very unlikely, then we set aside this concern (or indicate the “false positive rate” for this analysis)
  • This is the same situation as random sampling error: we don’t know whether our random sample has erred by including too many of one group versus another, but we can know how likely errors of different sizes are.

Random association: Statistics

Field of statistics investigates properties of chance events:

  • Probability theory tells us how likely events are to happen, given chance
  • Tells us how likely correlation of a given strength is to happen by chance

Random association: Statistics

How?

  1. Compute correlation of \(X\) and \(Y\)
  2. How strong is the correlation?
    • Patterns that are stronger are less likely to happen by chance
  3. How many cases do we have?
    • Patterns with many cases are less likely to happen by chance
  4. Assign a probability that the correlation we see would have happened by chance, as a function of the number of cases and the correlation strength (see the sketch below)
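
A sketch of these steps using an off-the-shelf test (`scipy.stats.pearsonr`), on made-up, unrelated data; the function returns both the correlation and the probability of seeing one at least that strong by chance:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=30)        # 30 made-up cases
y = rng.normal(size=30)        # Y generated with no relationship to X

r, p = stats.pearsonr(x, y)    # correlation strength and its p-value
print(round(r, 2), round(p, 2))
# p combines the correlation's strength and the number of cases into the
# probability of seeing a correlation at least this strong by chance alone.
```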

Random association: Statistics

This procedure works…

Assuming…

  1. we correctly describe the chance processes that might generate this correlation
  2. we don’t misuse our statistical tests

Random association: Statistics

Tries to get correlation \(> 0.9\): 1377

Random association: Statistics

Tries to get correlation \(> 0.9\): 905248

Random association: Statistics

Tries to get correlation \(> 0.45\): 30

Random association: Statistics

statistical significance:

An indication of how likely it is that the correlation we observe could have happened purely by chance.

A higher degree of statistical significance indicates the correlation is unlikely to have happened by chance.

Random association: Statistics

\(p\) value:

  • A numerical measure of statistical significance. It puts a number on how likely the observed correlation would have been to occur by chance, assuming we know the chance process and assuming the truth is a \(0\) correlation.

  • It is a probability, so it is between \(0\) and \(1\).

  • Lower \(p\)-values indicate greater statistical significance

\(p < 0.05\) often used as threshold for “significant” result.

  • but it is not a magic number
  • we observe \(p < 0.05\) by chance about \(\frac{1}{20}\)th of the time, even when the true correlation is \(0\)

Random association: Statistics

\(p\) value:

An advertised promise about how likely it is that the “true correlation” is actually \(0\); an indicator of how likely the correlation is to lead us astray due to random association.

[Example plots: Significant? What else do you want to know? Same interpretation?]

Random association: Statistics

\(p\) value:

Be wary of “\(p\)-hacking”/“snooping”, “data dredging”

  • \(p\) values become meaningless if we look at many correlations, then only report the ones that are “significant”.

Why?

  • low \(p\)-values occur by chance; we expect to observe some if we look at many correlations
  • using this procedure, you can always find some “significant” result \(\to\) fails weak severity (see the simulation sketch below)
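
A quick simulation sketch of why: test many pairs of completely unrelated variables and count how many clear the \(p < 0.05\) bar by chance alone.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_tests, n_cases = 1000, 50
significant = 0
for _ in range(n_tests):
    x = rng.normal(size=n_cases)
    y = rng.normal(size=n_cases)          # no relationship between X and Y
    if stats.pearsonr(x, y)[1] < 0.05:    # "significant" purely by chance
        significant += 1

print(significant, "of", n_tests, "unrelated pairs came out 'significant'")
# Roughly 5% clear the bar by chance, so reporting only the "significant"
# correlations from a long list guarantees some spurious findings.
```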

Correlation: Random Association

  1. Correlations can appear by chance
  2. We can assess probability of chance correlation if we know:
    • strength of correlation (close to \(1,-1\))
    • size of the sample (\(N\))
    • we assume we know the chance process generating our observations
  3. \(p\)-values:
    • Obtained using mathematical formulae
    • Given same \(N\), stronger correlation has lower \(p\)
    • Given same strength, correlation with larger \(N\) has lower \(p\) (see the sketch below)
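
A sketch of the last two points, using the standard conversion from a correlation \(r\) and sample size \(N\) to a \(p\)-value via the \(t\)-statistic \(t = r\sqrt{(N-2)/(1-r^2)}\); the particular \(r\) and \(N\) values are arbitrary:

```python
import numpy as np
from scipy import stats

def p_value(r, n):
    """Two-sided p-value for a correlation r computed from n cases (test of r = 0)."""
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    return 2 * stats.t.sf(abs(t), df=n - 2)

print(p_value(0.3, 20), p_value(0.6, 20))    # same N: stronger correlation, lower p
print(p_value(0.3, 20), p_value(0.3, 200))   # same strength: larger N, lower p
```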

Random association

Recap:

| Statistical Significance | \(p\)-value | By Chance? | Why? | “Real”? |
|---|---|---|---|---|
| Low | High \((p > 0.05)\) | Likely | small \(N\), weak correlation | Probably not |
| High | Low \((p < 0.05)\) | Unlikely | large \(N\), strong correlation | Possibly |

Conclusion

Correlation: \(0.08\), Magnitude: \(0.22\), \(p = 0.00001\)

Did Trump rallies cause hate crimes?

Conclusion

\(1.\) Correlation as “solution” to Fundamental Problem of Causal Inference

  • but requires assumptions

\(2.\) Correlation suffers from two problems:

  • Random Association: assess \(p\) values and statistical significance, based on assumptions about chance process
  • Bias: we call this “confounding”, more on Wednesday