March 11, 2019

- confounding

- internal validity
- external validity

**FPCI** (Fundamental Problem of Causal Inference): We cannot know the causal effect of \(X\) on \(Y\) for a specific case.

Let us **estimate** (draw **inferences** about) the **average** causal effect of \(X\) on \(Y\) for a group of cases.
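
One way to write the quantity being estimated is sketched below. The symbols are assumed shorthand rather than notation from these notes: \(Y_i^{1}\) and \(Y_i^{0}\) stand for case \(i\)'s potential outcomes with and without the treatment, and \(N\) is the number of cases in the group.

```latex
\[
\text{ACE} \;=\; \frac{1}{N} \sum_{i=1}^{N} \left( Y_i^{1} - Y_i^{0} \right)
\;=\; \bar{Y}^{1} - \bar{Y}^{0}
\]
```

Each individual difference \(Y_i^{1} - Y_i^{0}\) is unobservable because of the FPCI, but the two group averages on the right can be estimated.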

Three assumptions (you only need to know (**1**))

1. **Random Assignment** to "Treatment" and "Control"
2. **Exclusion Restriction** (only one thing is changing: \(X\))
3. **SUTVA** (if I receive the treatment, it doesn't affect you)

Why do experiments work?

- In the example with immigrant friendship and attitudes about immigrants, people with immigrant friends are **already different** in their potential outcomes (people who hate immigrants don't befriend them)
- Experiments, **through random assignment**, ensure that potential outcomes of treated/untreated cases are the same (except for **random sampling error**)

Why do different cases have **different potential outcomes**?

- We approach causality as **deterministic**
- For \(Y\) (the **dependent variable**) to take different values across cases, cases must be exposed to other causal factors.
- Cases with exactly the same potential outcomes must be pretty much the same in terms of other factors.

Since we can't observe people in both factual and counterfactual state, we want to compare people who are identical except for the "treatment": they serve as "counterfactuals" for each other.

If we were **omniscient deities**, maybe we could know all **potential outcomes**

\(Person_i\) | \(Friend_i\) | \(Attitude_i^{Yes}\) | \(Attitude_i^{No}\) | Value of Diversity |
---|---|---|---|---|
1 | Yes | Positive (1) | Positive (1) | Medium |
2 | Yes | Very Positive (2) | Very Positive (2) | High |
3 | No | Neutral (0) | Neutral (0) | Low |
4 | Yes | Positive (1) | Positive (1) | Medium |
5 | Yes | Very Positive (2) | Positive (1) | High |
6 | No | Neutral (0) | Neutral (0) | Low |
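
To see why this matters, here is a minimal sketch that takes the six rows of the table above exactly as printed and compares what an omniscient deity could compute (the average of individual effects) with what we can actually compute (the difference in mean observed attitudes between people with and without immigrant friends). The variable names are illustrative, not from the notes.

```python
# Rows from the table above:
# (person, has_friend, attitude_if_friend, attitude_if_no_friend)
cases = [
    (1, True, 1, 1),
    (2, True, 2, 2),
    (3, False, 0, 0),
    (4, True, 1, 1),
    (5, True, 2, 1),
    (6, False, 0, 0),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Omniscient view: average the individual effects across all cases.
true_ace = mean(y_yes - y_no for _, _, y_yes, y_no in cases)

# Our view: each case reveals only one potential outcome.
observed_friend = [y_yes for _, friend, y_yes, _ in cases if friend]
observed_no_friend = [y_no for _, friend, _, y_no in cases if not friend]
naive_difference = mean(observed_friend) - mean(observed_no_friend)

print(f"true average causal effect: {true_ace:.2f}")          # 0.17
print(f"naive difference in means:  {naive_difference:.2f}")  # 1.50
```

The naive comparison overstates the effect here because people with friends already had more positive potential outcomes than people without.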

If we were **omniscient deities**, maybe we could know all **potential outcomes**

\(Person_i\) | \(Friend_i\) | \(Attitude_i^{Yes}\) | \(Attitude_i^{No}\) | Value of Diversity |
---|---|---|---|---|
1 | Yes | Positive (1) | Positive (1) | Medium |
2 | Yes | Very Positive (2) | Very Positive (2) | High |
3 | Yes | Neutral (0) | Neutral (0) | Low |
4 | No | Positive (1) | Positive (1) | Medium |
5 | No | Very Positive (2) | Very Positive (2) | High |
6 | No | Neutral (0) | Neutral (0) | Low |

When the relationship between \(X\) and \(Y\) we discover empirically is systematically different from the **true causal relationship** between \(X\) and \(Y\), our analysis suffers from **bias**.

This bias arises from **confounding**:

**Confounding** occurs when some other variable \(W\) is causally linked to \(X\) (independent variable) *and* \(Y\) (dependent variable).

- No bias / no confounding of the true causal link between \(X\) and \(Y\) if \(W\) is either unrelated to \(X\) or unrelated to \(Y\).
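
The definition can be illustrated with a small simulation. This is a sketch under made-up assumptions: \(W\) is binary, \(W\) always raises \(Y\) by 1, the true effect of \(X\) on \(Y\) is zero, and the probabilities 0.8 / 0.2 / 0.5 are arbitrary choices. The function `naive_difference` is hypothetical and exists only for this example.

```python
import random


def naive_difference(w_affects_x, n=20_000, seed=0):
    """Difference in mean Y between X=1 and X=0 cases.

    By construction the true effect of X on Y is zero; W always raises Y.
    """
    rng = random.Random(seed)
    y_treated, y_untreated = [], []
    for _ in range(n):
        w = rng.random() < 0.5  # hypothetical binary W
        # W shifts the chance of receiving X only when the two are linked.
        p_x = (0.8 if w else 0.2) if w_affects_x else 0.5
        x = rng.random() < p_x
        y = (1.0 if w else 0.0) + rng.gauss(0, 1)  # Y depends on W, not on X
        (y_treated if x else y_untreated).append(y)
    return sum(y_treated) / len(y_treated) - sum(y_untreated) / len(y_untreated)


confounded = naive_difference(w_affects_x=True)
unconfounded = naive_difference(w_affects_x=False)
print(f"W linked to X and Y: naive difference = {confounded:.2f}")
print(f"W linked to Y only:  naive difference = {unconfounded:.2f}")
```

When \(W\) is linked to both \(X\) and \(Y\), the naive difference lands well away from the true effect of zero (about 0.6 in expectation under these assumptions). When \(W\) is unrelated to \(X\), it stays close to zero.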

If we were **omniscient deities**, maybe we could know all **potential outcomes**

\(Person_i\) | \(Friend_i\) | \(Attitude_i^{Yes}\) | \(Attitude_i^{No}\) | Value of Diversity |
---|---|---|---|---|
1 | Yes | Positive (1) | Positive (1) | Medium |
2 | Yes | Positive (1) | Positive (1) | High |
3 | No | Positive (1) | Positive (1) | Low |
4 | Yes | Positive (1) | Positive (1) | Medium |
5 | Yes | Positive (1) | Positive (1) | High |
6 | No | Positive (1) | Positive (1) | Low |

Because of Fundamental Problem of Causal Inference:

- We cannot know which cases have identical potential outcomes
- Might not know which other factors \(W\) are linked to \(X\) and \(Y\)

- Randomization approximately balances cases with same potential outcomes in treatment and control
- Randomization approximately balances cases with same values of \(W\) in treatment and control

Cases are **approximately the same** on average:

- cases in control are **observable** "counterfactuals" for cases in treatment
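
The balancing claim can be checked with a short simulation. This is a sketch with an invented population: \(W\) takes values 0, 1, 2 (think low / medium / high value of diversity), the untreated outcome equals \(W\), and the true effect is assumed to be a constant 0.5. None of these numbers come from the notes.

```python
import random

rng = random.Random(42)
n = 10_000

# Hypothetical population: a confounder W and two potential outcomes per case.
cases = []
for _ in range(n):
    w = rng.choice([0, 1, 2])
    y_no = w          # outcome without treatment depends on W
    y_yes = w + 0.5   # assumed constant true effect of 0.5
    cases.append((w, y_yes, y_no))

# Random assignment: a coin flip unrelated to W or to potential outcomes.
assigned = [(rng.random() < 0.5, case) for case in cases]
treatment = [case for treated, case in assigned if treated]
control = [case for treated, case in assigned if not treated]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Balance on W and on the untreated potential outcome across groups.
w_gap = mean(w for w, _, _ in treatment) - mean(w for w, _, _ in control)
y_no_gap = mean(y0 for _, _, y0 in treatment) - mean(y0 for _, _, y0 in control)

# What the experiment observes: Y^yes in treatment, Y^no in control.
estimate = mean(y1 for _, y1, _ in treatment) - mean(y0 for _, _, y0 in control)

print(f"gap in W between groups:     {w_gap:.3f}")
print(f"gap in untreated outcomes:   {y_no_gap:.3f}")
print(f"estimated effect (true 0.5): {estimate:.3f}")
```

Both gaps should be near zero and the estimate near 0.5. The remaining differences are the random sampling error that shrinks as the number of cases grows.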

Experiments might appear to be the **only** valid solution:

- if we don't know cases' potential outcomes
- if we don't know other causal factors affecting \(X\) and \(Y\)

Only experiments let us find an **unbiased** causal relationship between \(X\) and \(Y\)

- Experiments have "**internal validity**"

A research design has **internal validity** when the causal effect of \(X\) on \(Y\) it finds is not biased (systematically incorrect), that is, it does not suffer from **confounding**.

- **What can we manipulate?** (economic growth? democracy? violence?)
- **Who/what can we study?** (who participates in psych labs?)

**External validity** is the degree to which the causal relationship we find in a study matches the **cause** and the context (set of cases) identified in a causal theory.

- Study has **external validity** if the relationship found can generalize to all the cases to which our causal theory applies
- If our study suffers from **sampling bias**, then our study may lack **external validity**
- Study has **external validity** if the cause in the study is the same as the cause in our causal theory
- If our independent variable/cause does not match the theory, then our study may lack **external validity**

More **internal validity** (unbiased calculation of causal effect) comes at a cost in **external validity** (relevance of study sample or cause to the theory).

Many relevant contexts/important causes cannot or should not be manipulated at random:

What causes democratization? What causes war or ethnic violence? Why were civil rights extended to oppressed minority groups?

Experiments may not meet all our needs

- What types of other approaches?
- What are their assumptions?