One set of tools to help us scientifically interrogate claims about causal relationships.
We can easily concoct causal explanations for phenomena/experiences that are wrong
One does not have evidence for a claim if nothing has been done to rule out the ways the claim may be false. If data x agree with a claim C, but the method is … guaranteed to find such agreement, and had little or no capability of finding flaws with C even if they exist, then we have bad evidence, no test
(Mayo 2018, p. 5)
This principle can guide us at all stages of research:
(strong) severity requirement:
We have evidence for a claim C just to the extent it survives stringent scrutiny. If C passes a test that was highly capable of finding flaws or discrepancies from C, and yet none or few are found, then the passing result, \(x\), is evidence for C.
(Mayo 2018, p. 14)
We want to deliberately engage with ways in which our claim and evidence for that claim may be wrong.
Severity principle does not REQUIRE statistics (or even quantitative data).
What about this procedure makes it “highly capable” of finding flaws in the claim that “Mayo gained weight”?
Statistical methods not needed, but they are a (potentially) potent tool to help us achieve tests with strong severity
Severity as a criterion of evidence relates to falsification: we are not interested in inferring that a claim is true from tests; we are interested in testing whether a claim is false
\(H\): All swans are white.
\(H \to O\): we see no swans of other colors
\(\neg O \to \neg H\): a black swan \(\to\) the claim is false
Falsification not actually so simple (Duhem, in Mayo 2018, pp. 84-6)
Any statistical test of a claim involves, in addition to the hypothesis \(H\), auxiliary assumptions \(A_1 \dots A_k\) and claims about the experimental conditions \(E_1 \dots E_k\).
If the claim “fails” a test, it could be that \(H\) is wrong. Or it could be that any of \(A_1 \dots A_k\) or \(E_1 \dots E_k\) is wrong.
“The only thing the experiment teaches us is … there is at least one error; but where this error lies is just what it does not tell us.”
Strong severity involves doing the work to rule out the possibility that it is an incorrect assumption made in the test that leads us to say evidence supports/rejects our claim.
Understand canonical statistical designs and models used for testing causal hypotheses, such that:
Meta-level:
Tests of causal theories/claims/hypotheses involve making additional assumptions…
When using/evaluating statistical tools… ask:
What do the authors do to subject contact hypothesis to severe testing? (What specific role do statistical tests play in assessing whether contact causes a reduction in prejudice?)
Broockman and Kalla use simple \(t\) tests to compare differences in means
“The power of multiple regression analysis is that it allows us to do in non-experimental environments what natural scientists are able to do in a controlled laboratory setting: keep other factors fixed” (Wooldridge 2009: 77)
A matter of degree
Statistical evidence for causality combines observed data and a mathematical model of the world.
Causal evidence varies in terms of complexity of math/restrictiveness of assumptions: a matter of degree
Model-based inferences about causality involve many choices in complex statistical models with many difficult-to-assess assumptions
Design-based inferences about causality use carefully controlled comparisons with simple models and transparent assumptions
What does it mean to say that interpersonal contact with an out-group member causes a reduction in prejudice (toward that group)?
Causality is…
“If legal cannabis policy had not been adopted, then Y…”
Causality is…
Causality requires something acting on another (a mechanism)
A powerful way to formalize a mathematical model of causality
Imagine you were a participant in an experiment similar to the Broockman and Kalla paper…
Imagine that, today, you were asked to contribute money to a trans-rights advocacy organization.
Write down: would you contribute (1) - and how much - or not contribute (0)…
… if, yesterday, you were in the no contact group
… if, yesterday, you were in the contact group
In an experiment with two conditions, each subject, prior to the intervention, has two possible states that it could take.
If \(Y_i\) is the donation status of person \(i\); \(Z_i\) is the experimental condition: \(1\) indicates contact, \(0\) no contact, then we have two potential outcomes
More generally: \(Y_i(z)\) for \(z \in \{0, 1\}\)
On the board, let’s make a table of potential outcomes corresponding to this thought experiment
Tables of potential outcomes for different units are response schedules
Potential outcomes help us mathematically describe causal effects.
The unit causal effect of contact for person \(i\) is the difference between the potential outcomes:
\[\tau_i = Y_i(1) - Y_i(0)\]
n.b. \(\tau\) and other Greek letters used to stand in for quantities of interest. Here, presumably for “treatment effect”.
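A minimal sketch of this definition, using a hypothetical response schedule (illustrative numbers, not from Broockman and Kalla):

```python
# Hypothetical response schedule for five subjects: each row lists
# BOTH potential outcomes, as if we had a god's-eye view.
Y0 = [1, 1, 0, 0, 1]  # Y_i(0): donate if assigned to no contact
Y1 = [0, 1, 1, 0, 1]  # Y_i(1): donate if assigned to contact

# Unit causal effect: tau_i = Y_i(1) - Y_i(0)
tau = [y1 - y0 for y0, y1 in zip(Y0, Y1)]
print(tau)  # [-1, 0, 1, 0, 0]
```

Note that the unit effect can be negative (contact deters donation for subject 1), zero, or positive.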
What kinds of assumptions are built into this response schedule? Discuss in pairs
What happens when we’ve fallen from Mount Olympus, and we actually have to examine the data?
fundamental problem of causal inference:
i | \(Y_i(0)\) | \(Y_i(1)\) | \(Y_i(1) - Y_i(0)\) |
---|---|---|---|
1 | ? | 0 | ? |
2 | 1 | ? | ? |
3 | ? | 1 | ? |
4 | 0 | ? | ? |
5 | ? | 1 | ? |
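The table above can be sketched as observed data: each unit reveals only the potential outcome for the condition it was assigned to (values taken from the table; `None` marks the counterfactual cell).

```python
Z = [1, 0, 1, 0, 1]  # treatment assignment (1 = contact, 0 = no contact)
Y = [0, 1, 1, 0, 1]  # observed outcome (donation)

# Each unit reveals only one potential outcome; the other is missing.
Y0 = [y if z == 0 else None for y, z in zip(Y, Z)]  # Y_i(0)
Y1 = [y if z == 1 else None for y, z in zip(Y, Z)]  # Y_i(1)
print(Y0)  # [None, 1, None, 0, None]
print(Y1)  # [0, None, 1, None, 1]
```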
The potential outcomes model allows us to imagine different causal estimands for different purposes
An estimand is just a formal definition of a quantity we want to know. Causal estimands are expressed as some function of unit causal effects.
The potential outcomes model also points towards possible solutions:
point identification: when an attribute of an unobserved/population distribution can be inferred without bias to have a specific value, given the observed data.
partial identification: when an attribute of an unobserved/population distribution can be inferred without bias to be within a range of values, given the observed data.
Using information about the treatment \(Z\) (contact or no contact) and the outcome \(Y\) (donation or no donation), we can infer a range of values for each unit treatment effect:
Return to the board: what possible values could each unobserved potential outcome take?
If we want to know the average unit causal effect:
\[E[\tau_i] = \frac{1}{n}\sum\limits_{i=1}^{n}\tau_i\]
How might we get the upper and lower bounds for what that effect could be, given the observed data?
\(E[\tau_i] = E[Y_i(1) - Y_i(0)] = E[Y_i(1)] - E[Y_i(0)]\)
We decompose the means of \(Y_i(1)\) and \(Y_i(0)\) as weighted means across “principal strata”: those treated (\(Z_i=1\)) and untreated (\(Z_i=0\))
\[\begin{equation}\begin{split}E[\tau_i] &= \overbrace{\{E[Y_i(1)|Z_i = 1]\pi_1 + E[Y_i(1)|Z_i = 0]\pi_0\}}^{\text{Mean of Y when treated}} \\ & \phantom{=}\ - \underbrace{\{E[Y_i(0)|Z_i = 1]\pi_1 + E[Y_i(0)|Z_i = 0]\pi_0\}}_{\text{Mean of Y when untreated}} \end{split} \end{equation}\]
Some of the means within strata are \(\color{black}{observed}\) and others are \(\color{red}{counterfactual}\): Why?
\[\begin{equation}\begin{split}E[\tau_i] &= \{E[Y_i(1)|Z_i = 1]\pi_1 + \color{red}{E[Y_i(1)|Z_i = 0]}\pi_0\} \\ & \phantom{=}\ - \{\color{red}{E[Y_i(0)|Z_i = 1]}\pi_1 + E[Y_i(0)|Z_i = 0]\pi_0\} \end{split} \end{equation}\]
\[\begin{equation}\begin{split}E[\tau_i] &= \{E[Y_i(1)|Z_i = 1]\pi_1 + \color{red}{\overbrace{E[Y_i(1)|Z_i = 0]}^{\text{Mean Y(1) for untreated}}}\pi_0\} \\ & \phantom{=}\ - \{\color{red}{\underbrace{E[Y_i(0)|Z_i = 1]}_{\text{Mean Y(0) for treated}}}\pi_1 + E[Y_i(0)|Z_i = 0]\pi_0\} \end{split} \end{equation}\]
We don’t know what these missing values are, but we can plug in the maximum and minimum possible values for these unobserved counterfactuals, to calculate the highest and lowest possible values \(E[\tau_i]\)
In groups, what is the maximum possible value for \(E[\tau_i]\) using data on the board? What is the minimum?
If the minimum and maximum possible values of \(Y\) are \(Y^L, Y^U\) respectively:
Highest possible average causal effect when:
\[\begin{equation}\begin{split}E[\tau_i]^U &= \{E[Y_i(1)|Z_i = 1]\pi_1 + \color{red}{Y^U}\pi_0\} \\ & \phantom{=}\ - \{\color{red}{Y^L}\pi_1 + E[Y_i(0)|Z_i = 0]\pi_0\} \end{split} \end{equation}\]
If the minimum and maximum possible values of \(Y\) are \(Y^L, Y^U\) respectively:
Lowest possible average causal effect when:
\[\begin{equation}\begin{split}E[\tau_i]^L &= \{E[Y_i(1)|Z_i = 1]\pi_1 + \color{red}{Y^L}\pi_0\} \\ & \phantom{=}\ - \{\color{red}{Y^U}\pi_1 + E[Y_i(0)|Z_i = 0]\pi_0\} \end{split} \end{equation}\]
We can thus partially identify a range of values in which \(E[\tau_i]\) must lie, with almost no assumptions (other than that an individual’s potential outcomes are not affected by the treatment assignment of other units).
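A minimal sketch of these bounds, using the small hypothetical dataset from the board exercise (assumed values, not real data). For a binary outcome, \(Y^L = 0\) and \(Y^U = 1\).

```python
Z = [1, 0, 1, 0, 1]  # treatment assignment
Y = [0, 1, 1, 0, 1]  # observed (binary) outcome
n = len(Z)

pi1 = sum(Z) / n                 # share of units treated
pi0 = 1 - pi1                    # share untreated
YL, YU = 0, 1                    # min and max possible outcome values

# Observed strata means: E[Y(1)|Z=1] and E[Y(0)|Z=0]
mean_Y1_treated = sum(y for y, z in zip(Y, Z) if z == 1) / sum(Z)
mean_Y0_control = sum(y for y, z in zip(Y, Z) if z == 0) / (n - sum(Z))

# Plug in YU/YL for the unobserved counterfactual means
upper = (mean_Y1_treated * pi1 + YU * pi0) - (YL * pi1 + mean_Y0_control * pi0)
lower = (mean_Y1_treated * pi1 + YL * pi0) - (YU * pi1 + mean_Y0_control * pi0)
print(round(lower, 3), round(upper, 3))  # -0.4 0.6
```

The interval straddles zero here, illustrating the point below: without further assumptions, the bounds are often uninformative about the sign of the effect.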
These bounds may not be very informative (effect may be positive OR negative).
But we can impose additional assumptions:
In a sense, there are no “statistics” here, as we don’t appeal to any random variables/stochastic processes. There is no error probability. If the assumptions hold, then the true value is within the bounds.
unit causal effects are fundamentally unobservable, so the focus has been on average causal effects (or average treatment effects)
\[ACE = \bar{\tau} = \frac{1}{N}\sum\limits_{i=1}^N [Y_i(1) - Y_i(0)]\]
We have a parameter/estimand (the average causal effect) that we would like to know. But some of the data we need are always missing.
Analogous problem to estimating, e.g., a population mean (a parameter) without observing the entire population.
Assuming Random Assignment does two things:
estimand: \(ACE = \bar{\tau} = \frac{1}{N}\sum\limits_{i=1}^N [Y_i(1) - Y_i(0)]\)
We use the estimator (\(\widehat{ACE}\) or \(\widehat{\bar{\tau}}\)):
\[\normalsize\underbrace{\frac{1}{m}\sum\limits_{i=1}^m}_{\text{Avg. over T}} \overbrace{[Y_i(1) | Z_i = 1]}^{\text{Y(treated) for T}} - \underbrace{\frac{1}{N-m}\sum\limits_{i=m + 1}^N}_{\text{Avg. over C}} \overbrace{[Y_i(0) | Z_i = 0]}^{\text{Y(control) for C}}\]
Where units \(1 \to m\) (group \(T\)) are assigned to treatment \(Z_i = 1\) and units \((m + 1) \to N\) (group \(C\)) are assigned to control \(Z_i = 0\).
This estimator (\(\widehat{ACE}\)) uses the observed group means in place of the unobservable means of the potential outcomes:
\[ACE =\frac{1}{N}\sum\limits_{i=1}^N [Y_i(1) - Y_i(0)]\]
\[ACE =E[Y_i(1) - Y_i(0)]\]
\[ACE = E[Y_i(1)] - E[Y_i(0)]\]
And if \(Z_i\) is randomly assigned:
\[ACE = E[Y_i(1)|Z_i = 1] - E[Y_i(0)|Z_i = 0]\] \[ACE = E(\widehat{ACE})\]
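The unbiasedness claim can be sketched by simulation (hypothetical potential outcomes; only the assignment is random). Averaging the difference-in-means estimate over many random assignments recovers the true ACE:

```python
import random

random.seed(1)

# Fixed (hypothetical) response schedule for N = 8 units
Y0 = [1, 1, 0, 0, 1, 0, 1, 0]
Y1 = [0, 1, 1, 0, 1, 1, 1, 1]
N, m = len(Y0), 4

# True estimand: ACE = mean of Y_i(1) - Y_i(0)
ACE = sum(y1 - y0 for y0, y1 in zip(Y0, Y1)) / N  # = 0.25 here

estimates = []
for _ in range(20_000):
    treated = set(random.sample(range(N), m))  # complete random assignment
    yt = sum(Y1[i] for i in treated) / m                              # mean in T
    yc = sum(Y0[i] for i in range(N) if i not in treated) / (N - m)   # mean in C
    estimates.append(yt - yc)

print(ACE, sum(estimates) / len(estimates))  # average estimate is close to ACE
```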
For us to get an unbiased estimator of the average causal effect, we use a statistical model that assumes:
Limitations:
A randomized experiment could be used to construct estimators of different estimands
These would require a different set of assumptions and a different estimator.
More complex experiments (e.g. testing “spillover effects” from contact) imply different estimands, different estimators.
We focus on canonical estimators for each design.
random variable: a chance procedure for generating a number
observed value (realization): value of a particular draw of a random variable.
Equal probability of landing on \(1,2,3,4,5,6\)
Imagine the random variable as a box containing tickets with all possible values of a die roll
A roll of the die would be a realization
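The box model can be sketched directly: the random variable is the box of tickets; a realization is one draw.

```python
import random

random.seed(42)

# The random variable: a box of tickets, one per possible die value
box = [1, 2, 3, 4, 5, 6]

# A realization: one draw from the box (equal probability for each ticket)
roll = random.choice(box)
print(roll)
```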
\(X\) and \(Y\) are random variables.
X | Y |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
3 | 1 |
3 | 2 |
3 | 3 |
X | Y |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
3 | 2 |
3 | 2 |
3 | 3 |
We often have data variables: lists of numbers
random variables are chance processes for generating numbers
to treat data variables as random variables, we need to assume a model of the random data-generating process
\[\sum\limits_{i=1}^{n}S_i\]
It turns out that:
\[E\left(\sum\limits_{i=1}^{n}S_i\right) = \sum\limits_{i=1}^{n}E\left(S_i\right) = n \cdot 3.5\]
\[E\left(\frac{1}{n}\sum\limits_{i=1}^{n}S_i\right) = \frac{1}{n}\sum\limits_{i=1}^{n}E\left(S_i\right) = \frac{1}{n} \cdot n \cdot 3.5 = 3.5\]
If we roll the die \(n\) times and take the mean of the spots, that mean is a random variable. The mean of the \(n\) draws is, in expectation, the mean of the random variable \(S\). AND as \(n\) gets large, the sample mean will converge on the mean of \(S\).
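A quick simulation sketch of this convergence, drawing i.i.d. rolls from the box \(\{1,\dots,6\}\):

```python
import random

random.seed(0)
box = [1, 2, 3, 4, 5, 6]

def mean_of_rolls(n):
    """Sample mean of n i.i.d. draws from the box (die rolls)."""
    return sum(random.choice(box) for _ in range(n)) / n

# As n grows, the sample mean settles near E(S) = 3.5
for n in (10, 1_000, 100_000):
    print(n, mean_of_rolls(n))
```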
We can use repeated observations to estimate parameters of random variable (e.g. expected value).
We did this assuming independent and identically distributed random draws.
Good for rolling dice…