March 13, 2019

## Plan for Today:

### (2) Correlation

• confounding (again)
• correlation

## Grade Scaling?

### No scaling of grades

• solving "fat tail" would distort upper end of distribution

### Option to Re-Weight

• Commitment NOW to re-weight midterm to final (no "take-backs")
• In principle, open to ANYONE

## Evidence for Causality

#### Internal Validity

A research design has internal validity when the observed relationship between $$X$$ and $$Y$$ it finds is not a biased (systematically incorrect) estimate of the causal effect of $$X$$ on $$Y$$ (does not suffer from confounding).

#### External Validity:

A research design has external validity when the $$X$$ and $$Y$$ we examine match the causal theory and the cases we study match the set of cases/population the causal theory is supposed to describe.

• If our study suffers from sampling bias $$\to$$ lack of external validity
• If our independent variable/dependent variables do not match the theory $$\to$$ lack of external validity

## Beyond Experiments:

Recall what hypotheses say: If $$X$$ causes $$Y$$

We should observe that as $$X$$ changes, the potential outcomes of $$Y$$ change.

But the Fundamental Problem of Causal Inference (FPCI) says we can only ever see one potential outcome per case.

We always examine the relationship between the observed $$X$$ and the observed $$Y$$

• true in experiments
• true in any kind of causal investigation

## Beyond Experiments:

To infer a causal relationship between $$X$$ and $$Y$$ based on relationship between the observed $$X$$ and the observed $$Y$$, one of the following must be true

1. potential outcomes of $$Y$$ (the way cases behave when exposed to different levels of $$X$$) are the same for cases with different values of $$X$$
2. (equivalently) other factors that affect $$Y$$ (call them $$W$$, $$Z$$, etc.) are the same across, or unrelated to, different levels of $$X$$ in the cases we compare (that is, there is no confounding).

## Beyond Experiments:

All "solutions" to the FPCI make assumptions about the cases we compare that allow us to accept one of these two points (previous slide), so we can infer causality from how observed values of $$X$$ and $$Y$$ are related.

• Experiments only look at observed values of $$X$$ and $$Y$$, but we can claim causality because random assignment lets us assume the groups we compare have, on average, the same potential outcomes
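A small simulation can make this concrete. The sketch below is illustrative only: the true effect of 2, the other factor `w`, and the function name `observed_difference` are all assumptions made up for the example. It compares the observed difference in $$Y$$ when $$X$$ is set by a coin flip against the case where $$X$$ depends on another factor that also affects $$Y$$.

```python
import random
from statistics import mean

TRUE_EFFECT = 2.0  # assumed causal effect of X on Y, chosen for illustration


def observed_difference(random_assignment, n=20000, seed=0):
    """Difference in mean observed Y between cases with X=1 and X=0."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        w = rng.gauss(0, 1)            # hypothetical other factor that affects Y
        y0 = w + rng.gauss(0, 1)       # potential outcome if X = 0
        y1 = y0 + TRUE_EFFECT          # potential outcome if X = 1
        if random_assignment:
            x = rng.random() < 0.5     # coin flip, unrelated to W
        else:
            x = w > 0                  # cases with high W end up with X = 1
        # FPCI: only one of the two potential outcomes is ever recorded.
        if x:
            treated.append(y1)
        else:
            control.append(y0)
    return mean(treated) - mean(control)


experiment = observed_difference(random_assignment=True)
self_selected = observed_difference(random_assignment=False)
print(f"random assignment: {experiment:.2f}")
print(f"X depends on W:    {self_selected:.2f}")
```

Under these assumptions the coin-flip comparison should land close to the true effect, while the second comparison should overstate it, because the cases with $$X = 1$$ already had higher potential outcomes.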

## Beyond Experiments: Digression

We want to know whether having fewer guns reduces suicide, accidents, and aggressive uses of guns.

Imagine the following experiment:

We collect data on legal gun owners. We randomly assign half of them to receive a letter from the government that offers to pay them money to hand over their guns (a gun buyback). The other half receives no letter. We then compare suicide rates, accident rates, and gun victimization rates among the friends and families of people who handed over their guns against people who did not hand over their guns. We observe that suicide, accident, and gun victimization rates were lower in the "give-up-guns" group. Can we infer that having fewer guns caused a reduction?

## Correlation

#### correlation:

degree of association or relationship between the observed values taken by two variables ($$X$$ and $$Y$$)

• Many different ways of doing this (compare group means, regression) are all fundamentally about correlation.
• correlations have a direction:
  • positive: as $$X$$ increases, $$Y$$ increases
  • negative: as $$X$$ increases, $$Y$$ decreases
• correlations have strength (which has nothing to do with the size of the effect):
  • strong: $$X$$ and $$Y$$ almost always move together
  • weak: $$X$$ and $$Y$$ do not move together very much
• There is also a technical definition of correlation (later)
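To see that strength is separate from the size of the effect, here is a minimal sketch on simulated data. The slopes and noise levels are arbitrary assumptions, and `pearson` is a hand-written helper for the usual correlation coefficient, not a library call.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(5000)]

# Small effect (slope 0.1) with almost no noise: X and Y move together tightly.
y_small_effect = [0.1 * x + rng.gauss(0, 0.01) for x in xs]
# Large effect (slope 5) buried in noise: X and Y move together only loosely.
y_large_effect = [5.0 * x + rng.gauss(0, 20) for x in xs]
# Negative direction: Y tends to fall as X rises.
y_negative = [-1.0 * x + rng.gauss(0, 1) for x in xs]

r_small = pearson(xs, y_small_effect)
r_large = pearson(xs, y_large_effect)
r_negative = pearson(xs, y_negative)
print(f"small effect, little noise: r = {r_small:.2f}")
print(f"large effect, much noise:   r = {r_large:.2f}")
print(f"negative relationship:      r = {r_negative:.2f}")
```

In this setup the small effect should produce the stronger correlation, which is the point of the bullet on strength above.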

But with some assumptions: correlation $$\to$$ causation

## Correlation

To understand assumptions, need to know what problems arise

### Two types of problems

• bias (spurious correlation, confounding): $$X$$ and $$Y$$ are correlated but the correlation does not result from a causal relationship between those variables

• random association: correlations between $$X$$ and $$Y$$ occur by chance and do not reflect
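Random association is easy to reproduce. The sketch below draws two variables that are unrelated by construction and counts how often they still look correlated; the sample sizes, the cutoff of 0.5, and the helper names are assumptions picked for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def share_of_large_correlations(sample_size, trials, rng, cutoff=0.5):
    """Fraction of trials in which two unrelated variables have |r| > cutoff."""
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(sample_size)]
        ys = [rng.gauss(0, 1) for _ in range(sample_size)]  # drawn independently of xs
        if abs(pearson(xs, ys)) > cutoff:
            hits += 1
    return hits / trials


rng = random.Random(3)
small_samples = share_of_large_correlations(sample_size=10, trials=2000, rng=rng)
large_samples = share_of_large_correlations(sample_size=500, trials=200, rng=rng)
print(f"10 cases per study:  {small_samples:.1%} of studies show |r| > 0.5")
print(f"500 cases per study: {large_samples:.1%} of studies show |r| > 0.5")
```

With only a handful of cases, sizable correlations should turn up by chance fairly often; with many cases they should become very rare.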

## Confounding (again)

confounding occurs when some other variable $$W$$ is causally linked to $$X$$ (independent variable) and $$Y$$ (dependent variable).

#### or if we diagram

If we diagram causal links between variables using this notation: $$X \to Y$$ implies $$X$$ causes $$Y$$, then…

confounding occurs when there is a path between $$X$$ and $$Y$$ that is non-causal (goes the "wrong way" on at least one arrow)
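As an illustration, the sketch below simulates the path $$X \leftarrow W \rightarrow Y$$ with no arrow from $$X$$ to $$Y$$ at all. All numbers are assumptions made up for the example, and `pearson` is a hand-written helper.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(2)
ws = [rng.gauss(0, 1) for _ in range(20000)]
xs = [w + rng.gauss(0, 1) for w in ws]   # W causes X
ys = [w + rng.gauss(0, 1) for w in ws]   # W causes Y; X plays no role in Y

r_all = pearson(xs, ys)

# Compare only cases that have (almost) the same value of W.
similar_w = [(x, y) for w, x, y in zip(ws, xs, ys) if abs(w) < 0.1]
r_similar_w = pearson([x for x, _ in similar_w], [y for _, y in similar_w])

print(f"all cases:            r = {r_all:.2f}")
print(f"cases with similar W: r = {r_similar_w:.2f}")
```

Across all cases $$X$$ and $$Y$$ should be clearly correlated even though $$X$$ has no effect on $$Y$$; among cases with nearly the same $$W$$, the correlation should mostly disappear.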

## Confounding (again)

Do all third variables create confounding/bias?

### No.

$$W$$ produces no confounding under the following conditions:

1. $$W$$ is unrelated to $$X$$ or unrelated to $$Y$$
2. $$W$$ is an antecedent variable
3. $$W$$ is an intervening variable

## Confounding (again)

intervening variable: a variable through which $$X$$ causes $$Y$$

• Intervening variables do not produce spurious correlations

$$X \rightarrow W \rightarrow Y$$
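A quick simulated check of this chain, under assumed coefficients (1.5 for $$X \to W$$ and 2 for $$W \to Y$$, so the effect of $$X$$ on $$Y$$ through $$W$$ is 3). Here $$X$$ is set by a coin flip so that nothing else confounds the comparison.

```python
import random
from statistics import mean

rng = random.Random(4)
y_when_x1, y_when_x0 = [], []
for _ in range(20000):
    x = 1 if rng.random() < 0.5 else 0   # X set by coin flip
    w = 1.5 * x + rng.gauss(0, 1)        # X causes W
    y = 2.0 * w + rng.gauss(0, 1)        # W causes Y; X affects Y only through W
    if x:
        y_when_x1.append(y)
    else:
        y_when_x0.append(y)

# Compare group means of Y without looking at W at all.
difference = mean(y_when_x1) - mean(y_when_x0)
print(f"difference in mean Y: {difference:.2f} (effect built into the simulation: 3.0)")
```

Ignoring $$W$$ should still recover the effect of $$X$$ on $$Y$$ here, since $$W$$ is part of how $$X$$ causes $$Y$$ rather than a separate source of correlation.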

## Antecedent Variable

antecedent variable: a variable that affects $$Y$$ only through $$X$$

• antecedent variables do not produce spurious correlations if they only affect $$Y$$ through $$X$$

$$Z \rightarrow X \rightarrow Y$$

$$Z$$ is the antecedent variable
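The same kind of check works for the chain $$Z \rightarrow X \rightarrow Y$$. In this sketch the effect of $$X$$ on $$Y$$ is assumed to be 3, and `slope` is a hand-written least-squares helper; both are illustrative choices.

```python
import random


def slope(xs, ys):
    """Least-squares slope of Y on X: cov(X, Y) / var(X)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


rng = random.Random(5)
zs = [rng.gauss(0, 1) for _ in range(20000)]
xs = [z + rng.gauss(0, 1) for z in zs]         # Z causes X
ys = [3.0 * x + rng.gauss(0, 1) for x in xs]   # only X causes Y directly

estimate = slope(xs, ys)                       # Z is ignored entirely
print(f"estimated effect of X on Y: {estimate:.2f} (built into the simulation: 3.0)")
```

Because $$Z$$ reaches $$Y$$ only through $$X$$, leaving it out should not bias the relationship between the observed $$X$$ and $$Y$$ in this setup.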