March 13, 2019

- confounding (again)
- correlation

- solving "fat tail" would distort upper end of distribution

- Commitment NOW to re-weight midterm to final (no "take-backs")
- In principle, open to ANYONE

A research design has **internal validity** when the **observed relationship** between \(X\) and \(Y\) it finds is not a biased (systematically incorrect) estimate of the causal effect of \(X\) on \(Y\) (does not suffer from **confounding**).

A research design has **external validity** when the \(X\) and \(Y\) we examine match the causal theory and the cases we study match the set of cases/population the causal theory is supposed to describe.

- If our study suffers from **sampling bias** \(\to\) lack of **external validity**
- If our independent or dependent variables do not match the theory \(\to\) lack of **external validity**

Recall what hypotheses say: if \(X\) causes \(Y\), we should observe that as \(X\) changes, the potential outcomes of \(Y\) change.

But the fundamental problem of causal inference (FPCI) says we can only ever see one potential outcome per case.

We always examine the relationship between the **observed** \(X\) and the **observed** \(Y\)

- true in experiments
- true in any kind of causal investigation

To **infer** a causal relationship between \(X\) and \(Y\) based on the relationship between the **observed** \(X\) and the **observed** \(Y\), one of the following must be true:

- potential outcomes of \(Y\) (the way cases behave when exposed to different levels of \(X\)) are the same for cases with different values of \(X\)
- (equivalently) other factors that affect \(Y\) (call them \(W\), \(Z\), etc.) are the same across/unrelated to different levels of \(X\) in the cases we compare (this is saying we have an absence of **confounding**)

All "solutions" to the FPCI make assumptions about the **cases we compare** that allow us to accept one of these two points (listed above), so we can infer causality from how observed values of \(X\) and \(Y\) are related.

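- Experiments only look at observed values of \(X\) and \(Y\), but we can claim causality because we assume random assignment: on average, it makes the cases we compare alike in their potential outcomes and in other factors that affect \(Y\)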
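A small simulation can make this concrete. The sketch below is illustrative only: the effect size, the sample size, and the self-selection rule are assumptions invented for the example. Each case has two potential outcomes but reveals only one of them, and the observed difference in means is computed once under random assignment and once under self-selection.

```python
import random
from statistics import mean


def observed_differences(n=20000, effect=2.0, seed=0):
    """Return (difference under random assignment, difference under self-selection)."""
    rng = random.Random(seed)
    # Potential outcomes for every case: y0 if X = 0, y1 if X = 1.
    y0 = [rng.gauss(0, 1) for _ in range(n)]
    y1 = [v + effect for v in y0]

    # Random assignment: X is unrelated to the potential outcomes.
    x_random = [rng.random() < 0.5 for _ in range(n)]
    # Self-selection (assumed rule): cases with a high y0 tend to choose X = 1.
    x_selected = [v + rng.gauss(0, 1) > 0 for v in y0]

    def difference_in_means(x):
        # Only y1 is observed for X = 1 cases and only y0 for X = 0 cases.
        treated = [y1[i] for i in range(n) if x[i]]
        control = [y0[i] for i in range(n) if not x[i]]
        return mean(treated) - mean(control)

    return difference_in_means(x_random), difference_in_means(x_selected)


random_diff, selected_diff = observed_differences()
print(f"true effect 2.0 | random assignment {random_diff:.2f} | self-selection {selected_diff:.2f}")
```

Under these assumptions, the randomly assigned comparison should land close to the true effect, while the self-selected comparison overstates it because the groups differ in their potential outcomes.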

We want to know whether having fewer guns reduces suicide, accidents, and aggressive uses of guns.

Imagine the following experiment:

We collect data on legal gun owners. We randomly assign half of them to receive a letter from the government that offers to pay them money to hand over their guns (a gun buyback). The other half receives no letter. We then compare suicide rates, accident rates, and gun victimization rates among the friends and families of people who handed over their guns against people who did not hand over their guns. We observe that suicide, accident, and gun victimization rates were lower in the "give-up-guns" group. Can we infer that having fewer guns caused a reduction?
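One way to think through the question is to simulate a world in which guns have no effect at all and see what each comparison reports. Everything in this sketch is hypothetical: the unobserved trait `caution`, the rule for who hands over guns, and the `harm` index are assumptions made up for illustration.

```python
import random
from statistics import mean


def buyback_comparisons(n=20000, seed=1):
    """Return (handed-over vs. did-not gap, letter vs. no-letter gap) in harm."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        caution = rng.gauss(0, 1)      # hypothetical unobserved trait of the owner
        letter = rng.random() < 0.5    # randomly assigned by the researcher
        # Assumed rule: only letter recipients can hand over guns,
        # and more cautious owners are more likely to do so.
        gave_up = letter and (caution + rng.gauss(0, 1) > 0)
        # Assumed: guns have NO effect here; harm depends only on caution.
        harm = -0.5 * caution + rng.gauss(0, 1)
        rows.append((letter, gave_up, harm))

    def gap(flag_index):
        # Mean harm where the flag is true minus mean harm where it is false.
        yes = [r[2] for r in rows if r[flag_index]]
        no = [r[2] for r in rows if not r[flag_index]]
        return mean(yes) - mean(no)

    return gap(1), gap(0)


by_handover, by_letter = buyback_comparisons()
print(f"handed over vs. not: {by_handover:.2f} | letter vs. no letter: {by_letter:.2f}")
```

In this made-up world, the "give-up-guns" group shows lower harm even though guns do nothing, because handing over guns was chosen by the owners rather than randomly assigned. Only the letter was randomized, so only the letter versus no-letter comparison is protected by random assignment.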

**correlation**: the degree of association or relationship between the **observed** values taken by two variables (\(X\) and \(Y\))

- Many different ways of doing this (compare group means, regression) are all fundamentally about correlation.
- correlations have a **direction**:
  - positive: implies that as \(X\) increases, \(Y\) increases
  - negative: implies that as \(X\) increases, \(Y\) decreases

- correlations have **strength** (has nothing to do with **size of effect**):
  - **strong**: \(X\) and \(Y\) almost **always** move together
  - **weak**: \(X\) and \(Y\) do not move together very much

- There is also a **technical** definition of correlation (later)
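The difference between strength and size of effect can be sketched with made-up data. The example below assumes the usual Pearson correlation coefficient as the measure of strength and a least-squares slope as the measure of effect size; the coefficients and noise levels are invented for illustration.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def slope(xs, ys):
    """Least-squares slope of y on x: change in y per one-unit change in x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(2)
xs = [rng.gauss(0, 1) for _ in range(5000)]
# Small effect, almost no noise: X and Y move together almost always.
y_small_effect = [0.1 * x + rng.gauss(0, 0.01) for x in xs]
# Large effect, lots of noise: X and Y often fail to move together.
y_large_effect = [5.0 * x + rng.gauss(0, 20) for x in xs]

strong = (pearson_r(xs, y_small_effect), slope(xs, y_small_effect))
weak = (pearson_r(xs, y_large_effect), slope(xs, y_large_effect))
print(f"small effect: r = {strong[0]:.2f}, slope = {strong[1]:.2f}")
print(f"large effect: r = {weak[0]:.2f}, slope = {weak[1]:.2f}")
```

The first pair should show a strong correlation with a small slope, and the second a weak correlation with a large slope.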

But with some assumptions: correlation \(\to\) causation

To understand assumptions, need to know what problems arise

**bias** (spurious correlation, **confounding**): \(X\) and \(Y\) are correlated but the correlation does not result from a **causal relationship** between those variables

**random association**: correlations between \(X\) and \(Y\) occur **by chance** and do not reflect

**confounding** occurs when some other variable \(W\) is causally linked to \(X\) (independent variable) *and* \(Y\) (dependent variable).

If we diagram causal links between variables using this notation: \(X \to Y\) implies \(X\) causes \(Y\), then…

**confounding** occurs when there is a path between \(X\) and \(Y\) that is **non-causal** (goes the "wrong way" on at least one arrow)
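A minimal sketch of the path \(X \leftarrow W \rightarrow Y\), with invented numbers: \(W\) causes both \(X\) and \(Y\), and \(X\) has no effect on \(Y\) at all. The binary \(W\) and the coefficients are assumptions chosen to keep the example short.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(3)
n = 10000
w = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
x = [2.0 * wi + rng.gauss(0, 1) for wi in w]  # W -> X
y = [2.0 * wi + rng.gauss(0, 1) for wi in w]  # W -> Y, and no X -> Y arrow

# Correlation across all cases, ignoring W.
overall = pearson_r(x, y)

# Correlation among cases that share the same value of W.
within = {}
for level in (0, 1):
    xs = [xi for xi, wi in zip(x, w) if wi == level]
    ys = [yi for yi, wi in zip(y, w) if wi == level]
    within[level] = pearson_r(xs, ys)

print(f"overall r = {overall:.2f}, within W=0: {within[0]:.2f}, within W=1: {within[1]:.2f}")
```

Here \(X\) and \(Y\) should be clearly correlated overall, yet nearly uncorrelated among cases that share the same \(W\): the whole association travels along the non-causal path through \(W\).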

**Do all third variables create confounding/bias?**

\(W\) produces no confounding under the following conditions:

- \(W\) is unrelated to \(X\) or \(Y\)
- \(W\) is an **antecedent** variable
- \(W\) is an **intervening** variable

**intervening variable**: a variable **through which** \(X\) causes \(Y\)

- Intervening variables **do not** produce spurious correlations

\[X \rightarrow W \rightarrow Y\]
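As an illustration of the chain \(X \rightarrow W \rightarrow Y\), the sketch below uses invented path strengths (2.0 and 0.5) and checks that the observed \(X\)-\(Y\) relationship reflects the effect of \(X\) that runs through \(W\). The linear form and the numbers are assumptions for the example.

```python
import random
from statistics import mean


def slope(xs, ys):
    """Least-squares slope of y on x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


rng = random.Random(4)
n = 10000
x = [rng.gauss(0, 1) for _ in range(n)]
w = [2.0 * xi + rng.gauss(0, 1) for xi in x]  # X -> W
y = [0.5 * wi + rng.gauss(0, 1) for wi in w]  # W -> Y (X affects Y only via W)

# Under these assumptions the total effect of X on Y is 2.0 * 0.5 = 1.0.
slope_xy = slope(x, y)
print(f"observed slope of Y on X: {slope_xy:.2f}")
```

The observed slope should sit near 1.0, the product of the two path strengths, so the correlation is a genuine consequence of \(X\) causing \(Y\) rather than a spurious one.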

**antecedent variable**: a variable that **affects** \(Y\) **only through** \(X\)

- antecedent variables **do not** produce spurious correlations if they only affect \(Y\) through \(X\)

\[Z \rightarrow X \rightarrow Y\]

\(Z\) is an **antecedent variable**
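The "only through \(X\)" condition can be sketched by contrasting two invented worlds: one where \(Z\) affects \(Y\) only through \(X\) (\(Z \rightarrow X \rightarrow Y\)), and one where \(Z\) also affects \(Y\) directly. The true effect of 1.5 and the direct effect of 2.0 are assumptions for illustration.

```python
import random
from statistics import mean


def slope(xs, ys):
    """Least-squares slope of y on x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


rng = random.Random(5)
n = 10000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]  # Z -> X

# World 1: Z affects Y only through X (Z is purely antecedent).
y_only_through_x = [1.5 * xi + rng.gauss(0, 1) for xi in x]
# World 2: Z also has its own direct arrow into Y.
y_with_direct_path = [1.5 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

slope_antecedent = slope(x, y_only_through_x)
slope_direct = slope(x, y_with_direct_path)
print(f"true effect 1.5 | only through X: {slope_antecedent:.2f} | with direct path: {slope_direct:.2f}")
```

In the first world the observed slope should be close to the true effect; in the second it should be noticeably too large, because \(Z\) now opens a second, non-causal path between \(X\) and \(Y\).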