We never observe unit causal effects, but we can estimate the average causal effect by filling in the missing (\(\color{red}{\text{counterfactual}}\)) potential outcomes.
\[\begin{equation}\begin{split}ACE &= \{E[Y_i(1)|Z_i = 1]\pi_1 + \overbrace{\color{red}{E[Y_i(1)|Z_i = 0]}}^{\text{Mean Y(1) for untreated}}\pi_0\} \\ & \phantom{=}\ - \{\underbrace{\color{red}{E[Y_i(0)|Z_i = 1]}}_{\text{Mean Y(0) for treated}}\pi_1 + E[Y_i(0)|Z_i = 0]\pi_0\} \end{split} \end{equation}\]
Given assumptions, we fill in the maximum/minimum logically possible values for missing potential outcomes to get bounds. Bounds must contain the true causal effect if the assumptions are true.
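For example, if the outcome is known to lie in \([0,1]\) (a hypothetical bound, not from the study below), the largest logically possible ACE fills in \(1\) for the missing \(Y_i(1)\) values and \(0\) for the missing \(Y_i(0)\) values:

\[ACE \leq \{E[Y_i(1)|Z_i = 1]\pi_1 + 1 \cdot \pi_0\} - \{0 \cdot \pi_1 + E[Y_i(0)|Z_i = 0]\pi_0\}\]

The lower bound swaps the extremes: \(0\) for missing \(Y_i(1)\), \(1\) for missing \(Y_i(0)\).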
random variable: a chance procedure for generating a number
observed value (realization): value of a particular draw of a random variable.
Arithmetic operations on random variables are new random variables (e.g., sum and mean)
Expected value of a random variable \(X\) is the probability-weighted mean of all possible realizations of \(X\)
Independence and Dependence: random variables \(X,Y\) are independent if knowing value of \(X\) does not yield information about value of \(Y\).
Mean of \(n\) realizations (sample) from random variable \(X\) is also a random variable, with mean same as \(E[X]\). (intuition for proof?)
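A one-line proof sketch, using linearity of expectation (the \(X_i\) are the \(n\) draws from \(X\)):

\[E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n} \cdot n \cdot E[X] = E[X]\]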
\[\begin{equation}\begin{split}ACE &= \{E[Y_i(1)|Z_i = 1]\pi_1 + \overbrace{\color{red}{E[Y_i(1)|Z_i = 0]}}^{\text{plug in Y(1) for untreated}}\pi_0\} \\ & \phantom{=}\ - \{\underbrace{\color{red}{E[Y_i(0)|Z_i = 1]}}_{\text{plug in Y(0) for treated}}\pi_1 + E[Y_i(0)|Z_i = 0]\pi_0\} \end{split} \end{equation}\]
… random assignment does not guarantee that any particular randomization gives us an exact estimate of the unobserved potential outcomes
different realizations of random process \(\to\) sampling variability
The extent of sampling variability depends on the nature of the random process that generates observations.
In any given randomization, treatment mean and control mean are likely \(\neq\) the true means of \(Y(1)\) and \(Y(0)\)…
We want to know:
parameter/estimand: unknown attribute of random variable (e.g., the mean) that we want to know
estimator: rule/procedure for estimating the parameter/estimand given observed data
bias: estimator is biased if, on average, the estimator yields a value different from the parameter
So \(\widehat{ACE}\) is unbiased if:
\[E(\widehat{ACE}) - ACE = 0\]
Following evidence of the effects of a soap opera in Rwanda (Paluck 2009):
A variation on the Rwandan study, in the Eastern DRC.
Intolerance: “I would not like that group to belong to my community association”; (1 = totally disagree; 4 = totally agree)
\(Region_i\) | \(Y_i(1)\) | \(Y_i(0)\) |
---|---|---|
1 | 3 | 2 |
2 | 4 | 4 |
3 | 4 | 2 |
4 | 2 | 3 |
5 | 2 | 4 |
6 | 4 | 1 |
We set 3 regions in treatment (soap opera + talk show)
We set 3 regions in control (soap opera only)
How many possible random assignments are there?
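A quick check in R (choosing which 3 of the 6 regions are treated is a binomial-coefficient count):

```r
choose(6, 3)          # 6-choose-3 ways to pick the 3 treated regions: 20
nrow(t(combn(6, 3)))  # enumerating the assignments gives the same count
```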
What are all possible random assignments (to treatment and control)?
1 | 2 | 3 |
1 | 2 | 4 |
1 | 2 | 5 |
1 | 2 | 6 |
1 | 3 | 4 |
1 | 3 | 5 |
1 | 3 | 6 |
1 | 4 | 5 |
1 | 4 | 6 |
1 | 5 | 6 |
2 | 3 | 4 |
2 | 3 | 5 |
2 | 3 | 6 |
2 | 4 | 5 |
2 | 4 | 6 |
2 | 5 | 6 |
3 | 4 | 5 |
3 | 4 | 6 |
3 | 5 | 6 |
4 | 5 | 6 |
For each randomization, calculate the \(\widehat{ACE}\) (hint: express the result in fractions, \(\frac{x}{3}\))
\(Region_i\) | \(Y_i(1)\) | \(Y_i(0)\) |
---|---|---|
1 | 3 | 2 |
2 | 4 | 4 |
3 | 4 | 2 |
4 | 2 | 3 |
5 | 2 | 4 |
6 | 4 | 1 |
What is the mean \(\widehat{ACE}\)?
How does it compare to the \(ACE\)?
Are there any randomizations for which \(\widehat{ACE} = ACE\)?
Let’s check our work:
require(data.table) # provides data.table() and the := syntax
p_o_table = data.table(region_i = 1:6,
y_i_1 = c(3,4,4,2,2,4),
y_i_0 = c(2,4,2,3,4,1)
)
# two equivalent ways to compute the unit effects tau_i
p_o_table$tau_i = p_o_table$y_i_1 - p_o_table$y_i_0 # base R syntax
p_o_table[, tau_i := y_i_1 - y_i_0] # data.table syntax
#ACE
ace = mean(p_o_table$tau_i)
ace
## [1] 0.5
Let’s check our work:
require(ri) # randomization inference package
require(magrittr) # provides the %>% pipe
# all 20 possible treated sets, one per row (combn() is in base R, not ri)
randomizations = combn(6, 3, simplify = TRUE) %>% t
t_means = apply(randomizations, 1,
function(x)
mean(p_o_table[region_i %in% x, y_i_1])
)
c_means = apply(randomizations, 1,
function(x)
mean(p_o_table[!(region_i %in% x), y_i_0])
)
Let’s check our work:
## [1] 3.666667 3.000000 3.000000 3.666667 3.000000 3.000000 3.666667 2.333333
## [9] 3.000000 3.000000 3.333333 3.333333 4.000000 2.666667 3.333333 3.333333
## [17] 2.666667 3.333333 3.333333 2.666667
## [1] 2.666667 2.333333 2.000000 3.000000 3.000000 2.666667 3.666667 2.333333
## [9] 3.333333 3.000000 2.333333 2.000000 3.000000 1.666667 2.666667 2.333333
## [17] 2.333333 3.333333 3.000000 2.666667
Let’s check our work:
#Average Causal Effects (hat)
ace_hats = t_means - c_means
#Expected value of the ACE (hat)
e_ace_hat = mean(ace_hats)
e_ace_hat
## [1] 0.5
## [1] 0.5
Sample Difference in Means is unbiased
Histogram is the exact sampling distribution of the \(\widehat{ACE}\) in this experiment
This sampling distribution could tell us, e.g., how much \(\widehat{ACE}\) varies across randomizations
But we never observe this histogram
Analytic/Asymptotic approach
Randomization inference
Bootstrap
All approaches involve:
First: we want to get variance of \(\widehat{ACE}\)
\[Var[X - Y] = Var[X] + Var[Y] - 2 \cdot Cov[X,Y]\]
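This identity holds exactly for sample variances and covariances too, which makes it easy to check numerically (the simulated draws below are illustrative, not from the experiment):

```r
set.seed(1)
x = rnorm(100)
y = 0.5 * x + rnorm(100)  # correlated with x by construction
lhs = var(x - y)
rhs = var(x) + var(y) - 2 * cov(x, y)
all.equal(lhs, rhs)  # TRUE
```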
What is \(Var[Y^T - Y^C] = Var[\widehat{ACE}]\)?
Variances of Treatment/Control Group Means
if we assume independent and identically distributed draws from the study group
\[Var[Y^T] = \frac{Var[Y_i(1)]}{m}\]
Variance of sampling distribution of the treatment-group mean is variance of potential outcomes under treatment for all cases divided by the treatment group size
Variance of potential outcomes under treatment:
\[Var[Y_i(1)] = \frac{1}{N}\sum\limits_{i=1}^{N} \left( Y_i(1) - \overbrace{\frac{\sum\limits_{i=1}^{N} Y_i(1)}{N}}^{mean \ Y_i(1)} \right) ^2\]
This is a parameter, often denoted \(\sigma^2\)
\[Var[Y^T] = \frac{\sigma^2}{m}\]
We don’t know \(\sigma^2\), we need to estimate it from our sample.
Like sample mean, sample variance is an unbiased estimator of population variance:
\[\widehat{Var[Y_i(1)]} = \widehat{\sigma^2} = \frac{1}{\color{red}{m-1}}\sum\limits_{i=1}^{m}\left([Y_i(1) | Z_i = 1] - Y^T\right)^2\]
Why is sample variance biased if we divide by \(m\) (instead of \(m-1\))?
the mean is the value that minimizes the sum of squared errors
If the sample mean \(\hat\mu\) \(\neq\) population mean \(\mu\), then \(\left[ \sum\limits_{i = 1}^{m} [x_i - \hat\mu]^2 \right] < \left[ \sum\limits_{i = 1}^{m} [x_i - \mu]^2 \right]\)
Uncorrected sample variance \(\widehat{\sigma^2}\) is \(\frac{1}{m} \sum\limits_{i = 1}^{m} [x_i - \hat\mu]^2\).
Then, \(\widehat{\sigma^2} < \sigma^2\) unless sample mean equals population mean
Using this approach:
\[\widehat{Var[Y_i(1)]} = \widehat{\sigma^2} = \frac{1}{m-1}\sum\limits_{i=1}^{m}\left([Y_i(1) | Z_i = 1] - Y^T\right)^2\]
\[\widehat{Var[Y^T]} = \frac{\widehat{\sigma^2}}{m}\]
we can estimate \(Var(Y^T)\) and \(Var(Y^C)\).
What else do we need to estimate \(Var[\widehat{ACE}]\)?
We still need \(Cov(Y^T,Y^C)\) to get variance of \(\widehat{ACE}\), because
\(Var[\widehat{ACE}] = Var[Y^T] + Var[Y^C] - 2 Cov[Y^T, Y^C]\)
\[Cov(Y^T,Y^C) = -\frac{1}{N(N-1)}\sum\limits_{i=1}^{N} \left( Y_i(1) - \overbrace{\frac{\sum\limits_{i=1}^{N} Y_i(1)}{N}}^{mean \ Y_i(1)} \right) \cdot \left(Y_i(0) - \overbrace{\frac{\sum\limits_{i=1}^{N} Y_i(0)}{N}}^{mean \ Y_i(0)} \right)\]
Can’t estimate the covariance because we don’t see both potential outcomes for each case!
We can safely ignore the covariance: combined with the variance formulas above, doing so inflates the estimated variance, so our inferences are conservative.
Variances we obtain with \(\widehat{Var}[\widehat{ACE}]\) are going to be conservative (at least as large as the true variance, on average).
We’ve been trying to estimate the variance of the \(\widehat{ACE}\).
Variance is not usually what we want
Per the Central Limit Theorem: the sampling distributions of sums of random variables (and by extension, their means) approach the normal distribution as the \(N \rightarrow\infty\).
Using this fact, together with the estimated sample mean and the estimated variance of the sample mean, we can approximate the sampling distribution with a normal distribution.
This approximation performs well, but depends on sample size and population distribution.
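A sketch of this dependence (hypothetical right-skewed population: exponential with mean 1):

```r
set.seed(42)
draw_mean = function(n) mean(rexp(n))       # mean of n draws from Exp(1)
means_5   = replicate(10000, draw_mean(5))
means_100 = replicate(10000, draw_mean(100))
# both center on the population mean of 1, but the n = 100 distribution
# is tighter and much closer to normal (less right-skewed)
c(sd(means_5), sd(means_100))
```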
If the population looks like this:
Predict the shape of the sampling distribution of the sample mean for \(n = 5\)
The shape of the sampling distribution of the sample mean for \(n = 5\)
If the population looks like this:
Predict the shape of the sampling distribution of the sample mean for \(n = 5\)
The shape of the sampling distribution of the sample mean for \(n = 5\)
The sampling distribution of the sample mean for \(n = 25\) is:
The sampling distribution of the sample mean for \(n = 100\) is:
Does normality hold in our experiment?
We run an experiment on our 6 regions, and observe \(\widehat{ACE} = 0.667\)
The hypothesis test asks: what is the probability of observing a value this large or larger if the true \(ACE = 0\)?
If the distributional assumptions are wrong, the hypothesis test will not be correct
Alternatives:
bootstrap
randomization inference
Unlike analytical approach:
Tests a different null hypothesis
Usually null hypothesis is that the average effect is \(0\) (some units could have positive or negative effects).
\[\frac{1}{N}\sum\limits_{i=1}^{N} \tau_i = ACE = 0\]
Randomization inference tests the sharp null hypothesis
\[\tau_i = 0 \ \ \ \forall \ \ \ i \in \{1, \dots, N\}\]
that every unit treatment effect is \(0\).
Advantages:
Disadvantages
In practice:
We run Paluck’s experiment and see this:
\(Region_i\) | \(Y_i(1)\) | \(Y_i(0)\) |
---|---|---|
1 | 3 | ? |
2 | 4 | ? |
3 | ? | 2 |
4 | 2 | ? |
5 | ? | 4 |
6 | ? | 1 |
Under the sharp null, what are the values that are “?”?
Under the sharp null, this would be true:
\(Region_i\) | \(Y_i(1)\) | \(Y_i(0)\) |
---|---|---|
1 | 3 | \(\color{red}{3}\) |
2 | 4 | \(\color{red}{4}\) |
3 | \(\color{red}{2}\) | 2 |
4 | 2 | \(\color{red}{2}\) |
5 | \(\color{red}{4}\) | 4 |
6 | \(\color{red}{1}\) | 1 |
Once we have this response schedule under the sharp null, we:
If \(\widehat{ACE} = 0.667\), then the probability under the sharp null of an estimate at least this large in magnitude (the \(p\)-value) is 0.8.
In R:
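A sketch of the full randomization-inference calculation in base R, using the sharp-null response schedule above (the small tolerance in the comparison guards against floating-point error):

```r
y = c(3, 4, 2, 2, 4, 1)   # outcomes under the sharp null (Y_i(1) = Y_i(0))
treated = t(combn(6, 3))  # all 20 possible treatment assignments
ace_hats = apply(treated, 1, function(x) mean(y[x]) - mean(y[-x]))
obs = mean(c(3, 4, 2)) - mean(c(2, 4, 1))  # observed ACE-hat = 2/3
p_value = mean(abs(ace_hats) >= abs(obs) - 1e-9)
p_value  # 0.8
```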