January 7, 2019

degree/year?

will use stats?

goals for the course?

- Contact could create empathy, contradict negative stereotypes. But people could also build empathy without contact.
- Contact is often chosen… hard to test whether it is contact that actually shifts prejudice, rather than levels of prejudice driving contact

- A trans-rights organization canvassed to reduce transphobia in Miami, FL
- Researchers worked with them to randomize which people received different "treatments"

- 1825 registered voters
- Drawn from responders to an invitation to join an online survey

Subjects either saw:

- a canvasser on a law against transgender discrimination who was **cisgender** or **transgender**, or
- a canvasser **for** a recycling program who was **cisgender** or **transgender**
- Canvasser in treatment: asked people to consider a time when they were judged negatively for being different

- Answered survey questions about discrimination law, transgender tolerance
- Surveys ran from 3 days to 3 months after the canvass

- causality through research design, not statistical tool
- analysis still depends on a statistical model (though simple, usually credible)

"The power of multiple regression analysis is that it allows us to do in non-experimental environments what natural scientists are able to do in a controlled laboratory setting: keep other factors fixed" (Wooldridge 2009: 77)

- Causal inference possible
- Statistical model simple, transparent, plausible

- analysis may not follow from the research design
- lack of external validity

- A counterfactual statement with a false (untrue about the world) premise, and an assertion of what would have happened had the premise been true:

"If the legal marijuana policy had not been adopted, then Y…"

- For a cause C, the **causal effect** of C is the difference between the two states of the world where C is present versus absent.

- E.g., mercury in a barometer and storms: the mercury level predicts storms but does not cause them

- Causation as forced action

- counterfactuals without manipulation do not give us the direction of causality
- manipulation without counterfactuals may permit spurious results

An experiment with two conditions (treatment and control):

- **potential outcome** under treatment is \(Y_i^1\): what unit \(i\) would experience if assigned to treatment
- **potential outcome** under control is \(Y_i^0\): what unit \(i\) would experience if assigned to control
- **unit causal effect** is the difference between these:

\[\tau_i = Y_i^1 - Y_i^0\]

- Explicitly states how units behave under different counterfactual conditions
- These sets of potential outcomes, or **response schedules**, are crucial for using statistics to achieve causal inference.

i | \(Y_i^0\) | \(Y_i^1\) | \(Y_i^1 - Y_i^0\) |
---|---|---|---|
1 | 3 | 6 | 3 |
2 | 2 | 5 | 3 |
3 | 5 | 8 | 3 |
4 | 4 | 7 | 3 |
5 | 3 | 6 | 3 |
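The table above can be reproduced in a few lines of Python; with these hypothetical potential outcomes, the unit causal effect is 3 for every unit:

```python
# Potential outcomes for the five units in the table above.
Y0 = [3, 2, 5, 4, 3]   # outcome if assigned to control
Y1 = [6, 5, 8, 7, 6]   # outcome if assigned to treatment

# Unit causal effect: Y_i^1 - Y_i^0 for each unit i
unit_effects = [y1 - y0 for y0, y1 in zip(Y0, Y1)]
print(unit_effects)    # [3, 3, 3, 3, 3]

# Average causal effect over all N units
ace = sum(unit_effects) / len(unit_effects)
print(ace)             # 3.0
```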

- Can never directly observe the *unit causal effect*: only ever see \(Y_i^1\) or \(Y_i^0\), not both.

This is the **fundamental problem of causal inference**:

i | \(Y_i^0\) | \(Y_i^1\) | \(Y_i^1 - Y_i^0\) |
---|---|---|---|
1 | ? | 6 | ? |
2 | 2 | ? | ? |
3 | ? | 8 | ? |
4 | 4 | ? | ? |
5 | ? | 6 | ? |

Because **unit causal effects** are fundamentally unobservable, focus has been on **average causal effects** (or *average treatment effects*)

- For units in \(i = 1 \ldots N\)

\[ACE = \bar{\tau} = \frac{1}{N}\sum\limits_{i=1}^N [Y_i^1 - Y_i^0]\]

- This parameter takes the difference between two counterfactuals:
- average outcome of all units if assigned to treatment
- average outcome of all units if assigned to control

We have a **parameter** (average causal effect) that we would like to know. But half the data we need is always missing.

An analogous problem to estimating, e.g., a population mean (a parameter) without observing the entire population.

- Use **random sampling** and **statistics** to solve our problem

**random variable**: a *chance* procedure for generating a number

**observed value** (**realization**): value of a particular draw of a random variable.

E.g., a fair die: equal probability of landing on \(1, 2, 3, 4, 5, 6\)

Imagine the **random variable** as a box containing all possible values of a die roll on tickets. A roll of the die would be a **realization**.
- Functions of random variables are also random variables, e.g. \(Z = Y \cdot X\), where \(Y\) is \(-1\) or \(1\) based on a coin flip and \(X\) is \(1, \ldots, 6\) based on a die roll.

- If \(X\) is a random draw from a box of numbered tickets, then \(E(X)\) is the mean of the tickets in the box
- Any **realization** of the random variable may be above or below the **expected value**
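For the die box, the expected value is just the mean of the tickets:

```python
# Expected value of a draw = mean of the tickets in the box
box = [1, 2, 3, 4, 5, 6]
expected_value = sum(box) / len(box)
print(expected_value)   # 3.5 -- yet no single roll can ever equal it
```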

\(X\) and \(Y\) are random variables.

- If you know the value of \(X\) **and** the chances of \(Y\) taking on a specific value depend on that value of \(X\), then \(X\) and \(Y\) are **dependent**
- If you know the value of \(X\) **and** the chances of \(Y\) taking on a specific value do not depend on that value of \(X\), then \(X\) and \(Y\) are **independent**

Independent: the distribution of \(Y\) is the same for each value of \(X\):

X | Y |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
3 | 1 |
3 | 2 |
3 | 3 |

Dependent: the distribution of \(Y\) changes with \(X\):

X | Y |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
3 | 2 |
3 | 2 |
3 | 3 |
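To make the contrast concrete, a small Python sketch compares the conditional distribution of \(Y\) given each value of \(X\) in the two tables:

```python
from collections import Counter

# Conditional distribution of Y given X = x, from a list of (X, Y) pairs.
# If this distribution changes with x, the variables are dependent.
def cond_dist(pairs, x):
    ys = [y for xi, y in pairs if xi == x]
    return {y: count / len(ys) for y, count in Counter(ys).items()}

indep = [(1, 1), (1, 2), (1, 3), (3, 1), (3, 2), (3, 3)]
dep   = [(1, 1), (1, 2), (1, 3), (3, 2), (3, 2), (3, 3)]

print(cond_dist(indep, 1) == cond_dist(indep, 3))  # True: independent
print(cond_dist(dep, 1) == cond_dist(dep, 3))      # False: dependent
```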

We often have:

- **data variables**: lists of numbers
- **random variables**: chance processes for generating numbers
- these permit us to use a chance process to draw inferences

To treat **data variables** as **random variables**, we need a **model** for the data generating process. Our models may be good descriptions of the chance procedure, or not.

- Expected value of a random variable is a **parameter**
- in our examples (e.g. fair die), we have known it
- usually, we **want** to estimate it

**statistics** (e.g., sample mean) let us draw inferences about **parameters** from the data.

For \(n\) rolls of a fair die, with \(S_i\) the spots on roll \(i\), the total is:

\[\sum\limits_{i=1}^{n}S_i\]

It turns out that:

\[E\left(\sum\limits_{i=1}^{n}S_i\right) = \sum\limits_{i=1}^{n}E\left(S_i\right) = n \cdot 3.5\]

\[E\left(\frac{1}{n}\sum\limits_{i=1}^{n}S_i\right) = \frac{1}{n}\sum\limits_{i=1}^{n}E\left(S_i\right) = n \cdot 3.5 \cdot \frac{1}{n} = 3.5\]

If we roll a die \(n\) times and take the mean of the spots, that mean is itself a random variable. The mean of the draws is, in expectation, the mean of the random variable. AND as \(n\) gets large, the sample mean will approximate the box/population mean.
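A quick simulation illustrates this convergence (the seed is arbitrary):

```python
import random

# Sample means of fair-die rolls for increasing n: they settle
# near the box mean of 3.5 as n grows.
random.seed(0)  # arbitrary seed for reproducibility
for n in (10, 1_000, 100_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(n, sum(rolls) / n)
```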

We can use repeated observations to estimate parameters of a random variable (e.g., its expected value).

We did this using independent, identically distributed random draws: good for rolling dice

\[ACE = \bar{\tau} = \frac{1}{N}\sum\limits_{i=1}^N [Y_i^1 - Y_i^0]\]

How do we solve our "missing data" problem?

Our parameter:

\[\frac{1}{N}\sum\limits_{i=1}^N [Y_i^1 - Y_i^0]\]

Our estimator:

\[\frac{1}{m}\sum\limits_{i=1}^m [Y_i^1 | Z_i = 1] - \frac{1}{N-m}\sum\limits_{i=m + 1}^N [Y_i^0 | Z_i = 0]\]

Where \(m\) units are assigned to treatment \(Z = 1\) and \(N - m\) assigned to control \(Z = 0\).
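A sketch of this estimator on the five-unit example from above (the assignment is one random draw, so the estimate varies from draw to draw):

```python
import random

# Potential outcomes for the five-unit example
Y0 = [3, 2, 5, 4, 3]
Y1 = [6, 5, 8, 7, 6]
N, m = 5, 2  # m units to treatment, N - m to control

random.seed(4)  # arbitrary seed for reproducibility
treated = set(random.sample(range(N), m))  # random assignment Z

# We observe Y1 only for treated units, Y0 only for control units
y_t = [Y1[i] for i in range(N) if i in treated]
y_c = [Y0[i] for i in range(N) if i not in treated]

estimate = sum(y_t) / m - sum(y_c) / (N - m)
print(estimate)  # one realization of the difference-in-means estimator
```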

- The expected value of a sample mean equals the population mean
- Expected value of mean \(Y_i^1\) for sample in treatment equals population mean of \(Y_i^1\)
- Expected value of mean \(Y_i^0\) for sample in control equals population mean of \(Y_i^0\)
- Because sample means are random variables, ACE in a sample may differ from the population ACE.
- But across all possible assignments of subjects to treatment and control, the average sample ACE is the same as the population ACE
- This estimator is **unbiased**
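Unbiasedness can be checked directly for the five-unit example by enumerating every possible assignment of \(m = 2\) units to treatment and averaging the resulting estimates:

```python
from itertools import combinations

# Five-unit example: tau_i = 3 for every unit, so the population ACE is 3
Y0 = [3, 2, 5, 4, 3]
Y1 = [6, 5, 8, 7, 6]
N, m = 5, 2

# Difference-in-means estimate under every possible assignment
estimates = []
for treated in combinations(range(N), m):
    y_t = sum(Y1[i] for i in treated) / m
    y_c = sum(Y0[i] for i in range(N) if i not in treated) / (N - m)
    estimates.append(y_t - y_c)

avg = sum(estimates) / len(estimates)
print(round(avg, 10))  # 3.0: averaging over all assignments recovers the ACE
```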

\[ACE =\frac{1}{N}\sum\limits_{i=1}^N [Y_i^1 - Y_i^0]\]

\[ACE =E[Y_i^1 - Y_i^0]\]

\[ACE =E[Y_i^1] - E[Y_i^0]\]

And because \(Z_i\) is randomly assigned:

\[E[Y_i^1|Z_i = 1] = E[Y_i^1|Z_i = 0] = E[Y_i^1]\]

\[E[Y_i^0|Z_i = 0] = E[Y_i^0|Z_i = 1] = E[Y_i^0]\]

For us to get an **unbiased** estimate of the average causal effect, we **assume**:

- **Random assignment**: units are sampled at random (or as if at random) from the study group and assigned to treatment or control
- **Non-interference/SUTVA**: each unit's outcome depends only on its own treatment assignment (not on the assignment of others)
- **Exclusion restriction**: treatment assignment affects outcomes only through receipt of the treatment