March 28, 2018

## What is the problem?

### Correlation as clue to Causality

If $$X \rightarrow Y$$:

• Values of $$X$$ and $$Y$$ should move together
• Values of $$X$$ and $$Y$$ should be correlated
• This correlation should not be by chance

### A hiccup:

It is possible that $$X,Y$$ are correlated, without causation

## Spurious Correlation:

spurious correlation: when the observed correlation between $$X$$ (independent variable) and $$Y$$ (dependent variable) inaccurately reflects the true causal relationship $$X \rightarrow Y$$

• Can think of this as a kind of bias:

$\mathrm{Correlation}_{\mathrm{True}}(X,Y) - \mathrm{Correlation}_{\mathrm{Observed}}(X,Y) \neq 0$

## Spurious Correlation:

confounding variables are the source of spurious correlation:

• Variables other than $$X,Y$$ (e.g. $$W$$), that are related to both $$X$$ and $$Y$$
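A minimal simulation sketch of this idea, under assumptions that are not in the slides: a single confounder $$W$$, standard normal noise, and unit effects of $$W$$ on both $$X$$ and $$Y$$. The function names `pearson` and `simulate_confounded` are hypothetical. $$X$$ never enters the equation for $$Y$$, yet the two end up correlated.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_confounded(n=5000, seed=0):
    """Hypothetical setup: W drives both X and Y; X has no effect on Y."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n)]
    x = [wi + rng.gauss(0, 1) for wi in w]  # W -> X
    y = [wi + rng.gauss(0, 1) for wi in w]  # W -> Y (no X term at all)
    return w, x, y


w, x, y = simulate_confounded()
r_xy = pearson(x, y)
print(f"corr(X, Y) with no causal effect of X on Y: {r_xy:.2f}")
```

With these assumed variances the correlation is about 0.5 in expectation, entirely because of $$W$$.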

## Spurious Correlation:

### Types:

1. confounding variables cause $$X,Y$$ to appear correlated when $$X$$ does not cause $$Y$$
2. confounding variables cause $$X,Y$$ to appear uncorrelated when $$X$$ does cause $$Y$$
3. confounding variables cause the $$X,Y$$ relationship to appear too positive or too negative
   • effect of $$X \rightarrow Y$$ appears too strong compared to truth
   • effect of $$X \rightarrow Y$$ appears too weak compared to truth
   • effect of $$X \rightarrow Y$$ appears to go in the wrong direction compared to truth
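The forms listed above can be reproduced in a small simulation. This is a sketch under assumptions not in the slides: a linear model with one confounder $$W$$, where $$X = W + \text{noise}$$ and $$Y = \text{true effect} \cdot X + \text{(effect of } W) \cdot W + \text{noise}$$. The function `naive_slope` and all numeric values are hypothetical; the slope is what a comparison that ignores $$W$$ would report.

```python
import random


def naive_slope(true_effect, w_on_y, n=20000, seed=1):
    """Slope of Y on X when W is ignored (hypothetical linear model).

    X = W + noise
    Y = true_effect * X + w_on_y * W + noise
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        w = rng.gauss(0, 1)
        x = w + rng.gauss(0, 1)
        y = true_effect * x + w_on_y * w + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    var = sum((a - mx) ** 2 for a in xs) / n
    return cov / var


# (true effect of X on Y, effect of W on Y); values chosen for illustration
scenarios = {
    "type 1: correlated, no causation": (0.0, 2.0),
    "type 2: uncorrelated, causation": (1.0, -2.0),
    "type 3: too strong": (1.0, 2.0),
    "type 3: too weak": (1.0, -1.0),
    "type 3: wrong direction": (1.0, -4.0),
}

results = {name: naive_slope(b, c) for name, (b, c) in scenarios.items()}
for name, slope in results.items():
    true_effect = scenarios[name][0]
    print(f"{name}: true effect = {true_effect:+.1f}, naive slope = {slope:+.2f}")
```

In this setup the naive slope equals the true effect plus half the effect of $$W$$ on $$Y$$ in expectation, so changing only the confounder's strength and sign moves the result through every case in the list.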

## Spurious Correlation:

### Bias

All of these are examples of bias: true causal relationship of $$X,Y$$ not observed due to confounding

• bias implies the absence of internal validity (internal validity = recovering the true causal relationship)

## Spurious Correlation:

### What you need to know:

1. What is spurious correlation/bias?
2. Why does it happen?
3. What is confounding?
4. What are the different forms it can take?
5. If we know the direction of the effects of $$W$$ on both $$X,Y$$, what is the direction of the bias/spurious relationship?
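One way to reason about question 5 is a worked equation. It assumes, hypothetically, a linear model with a single confounder $$W$$ and noise terms $$e_X, e_Y$$ that are uncorrelated with $$W$$ and with each other; the symbols $$a, \beta, \gamma$$ are introduced here for illustration and do not appear in the slides.

$X = aW + e_X, \qquad Y = \beta X + \gamma W + e_Y$

The slope observed when $$W$$ is ignored is then

$\frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)} = \beta + \gamma \, \frac{\mathrm{Cov}(X,W)}{\mathrm{Var}(X)} = \beta + \gamma \, a \, \frac{\mathrm{Var}(W)}{\mathrm{Var}(X)}$

Under these assumptions the bias term has the sign of $$a \cdot \gamma$$: if $$W$$ affects $$X$$ and $$Y$$ in the same direction the observed relationship is too positive, and if it affects them in opposite directions the observed relationship is too negative.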

## What we need to know:

1. What are the broad types of solutions?
   • Adjustment vs Design (Similar Cases, Same Case, Diff-in-Diff, Natural Experiments, Experiments)
   • How does each solution remove confounding?
   • What confounding does it eliminate/not eliminate?
   • What are its key assumptions (to conclude no remaining bias/spurious correlation)?
   • What is the tradeoff between internal and external validity?

## Conditioning:

• How it fixes confounding:
  • Holds confounding variables constant, so they can no longer induce spurious correlation
  • conditioning looks at correlation of $$X,Y$$ holding confounding variables constant
• What confounding is removed:
  • Removes confounding from all measured variables used in conditioning
  • Does not remove confounding from unmeasured/mis-measured variables

• Key Assumptions (to conclude no remaining bias):
  • Condition on all confounding variables
  • No measurement error on confounding variables
• Methods:
  • Matching (compare cases with same values on confounding variables)
  • Regression (linear approximation of above)
• Can be done for all relevant cases, high external validity
• Requires big assumptions, low internal validity
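The matching idea can be sketched with exact stratification on a single measured confounder. Everything below is an illustrative assumption rather than the slides' own example: a binary $$W$$, a binary $$X$$ that is more likely when $$W$$ is present, a true effect of 1.0, and the hypothetical helpers `simulate` and `diff_in_means`.

```python
import random


def simulate(n=20000, effect=1.0, seed=2):
    """Hypothetical data: binary confounder W raises both P(X=1) and Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = rng.random() < 0.5
        x = rng.random() < (0.8 if w else 0.2)      # W -> X
        y = effect * x + 2.0 * w + rng.gauss(0, 1)  # X -> Y and W -> Y
        rows.append((w, x, y))
    return rows


def mean(vals):
    return sum(vals) / len(vals)


def diff_in_means(rows):
    """Mean of Y where X is true minus mean of Y where X is false."""
    treated = [y for _, x, y in rows if x]
    control = [y for _, x, y in rows if not x]
    return mean(treated) - mean(control)


rows = simulate()

# Naive comparison: ignores W, so it mixes the effect of X with that of W.
naive = diff_in_means(rows)

# Conditioning: compare X=1 vs X=0 only among cases with the same W,
# then average the within-stratum differences weighted by stratum size.
strata = [[r for r in rows if r[0] == w_val] for w_val in (False, True)]
adjusted = sum(len(s) * diff_in_means(s) for s in strata) / len(rows)

print(f"naive difference:    {naive:.2f}")
print(f"adjusted difference: {adjusted:.2f}  (true effect is 1.0)")
```

The adjusted estimate recovers the true effect here only because $$W$$ is the sole confounder and is measured without error, which is exactly the pair of key assumptions listed above.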

## Design:

design-based solutions:

Choose comparison so that we eliminate possible confounding variables

• Hold confounding variable constant (like conditioning)

#### OR

• Break link between confounding variable(s) $$W$$ and independent variable $$X$$

## Design: Similar Cases

Compare cases in same place and time with different exposure to the cause

• How it fixes confounding:
  • Holds confounding variables constant, can no longer induce spurious correlation
• What confounding is removed:
  • Removes confounding from all measured/unmeasured variables that are the same for compared cases
  • Does not account for unchanging variables that differ between cases (e.g. different housing quality)
  • Does not account for changing variables that differ between cases (e.g. location of new schools)

## Design: Similar Cases

• Key Assumptions (to conclude no remaining bias):
  • No constant/time-invariant differences between cases that affect $$X,Y$$
  • No changing/time-variant differences between cases that affect $$X,Y$$
• Methods:
  • Finding cases close in time and space
• Very similar cases with different $$X$$ not possible for all relevant cases, lower external validity
• Requires fewer assumptions about confounding variables, higher internal validity

## Design: Same Case

Compare same case to itself over time

• How it fixes confounding:
  • Holds confounding variables constant, can no longer induce spurious correlation
• What confounding is removed:
  • Removes confounding from all measured/unmeasured variables that are unchanging for the case
  • Does not remove confounding variables that change over time

## Design: Same Case

• Key Assumptions (to conclude no remaining bias):
  • No confounding variables that change over time for this case (e.g. Terrorist bombing in NYC)
  • No confounding variables that change over time for all cases (e.g. #BlackLivesMatter)
• Methods:
  • "interrupted time series"
• Result for an individual case may not generalize to all relevant cases, lower external validity
• Requires fewer assumptions about confounding variables, higher internal validity

## Design: Diff-in-Diff

Compare the change in one case over time against the change in another case over the same period

• How it fixes confounding:
  • Holds confounding variables constant, can no longer induce spurious correlation
• What confounding is removed:
  • Removes confounding from all measured/unmeasured variables that are
    • unchanging for each case (e.g. history of NJ, PA)
    • changing over time and shared by all cases (shared trends; e.g. changing national/regional economy)
  • Does not remove confounding variables that change over time but differ across cases (case-specific trends; e.g. PA/NJ have different voting trends)
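A worked equation shows why the unchanging and shared-trend confounders cancel. It assumes, hypothetically, an additive model with two cases (treated $$T$$, comparison $$C$$) and two periods; the symbols $$\alpha_i, \lambda_t, \delta, D_{it}$$ are introduced here for illustration and are not in the slides.

$Y_{it} = \alpha_i + \lambda_t + \delta D_{it} + \varepsilon_{it}$

Here $$\alpha_i$$ collects everything unchanging about case $$i$$, $$\lambda_t$$ is the trend shared by all cases, and $$D_{it} = 1$$ only for the treated case in the post period. Taking expectations (the noise $$\varepsilon_{it}$$ has mean zero):

$(\bar{Y}_{T,post} - \bar{Y}_{T,pre}) - (\bar{Y}_{C,post} - \bar{Y}_{C,pre}) = [(\lambda_{post} - \lambda_{pre}) + \delta] - [\lambda_{post} - \lambda_{pre}] = \delta$

Each within-case difference removes $$\alpha_i$$, and subtracting the comparison case's change removes the shared trend. If the trend were case-specific ($$\lambda_{it}$$ instead of $$\lambda_t$$), the second step would leave a remainder, which is the bias that diff-in-diff cannot remove.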

## Design: Diff-in-Diff

• Key Assumptions (to conclude no remaining bias):
  • No confounding variables that change over time differently across cases (no differences in trends)
  • Conversely: assume trends are parallel (the same) across cases in absence of the cause (counterfactually)
• Methods:
  • check pre-cause parallel trends
  • "synthetic control" artificially produces (rather than assumes) parallel trends
• Cases with similar trends, but different "treatments" may be rare, lower external validity
• Requires even fewer assumptions about confounding variables, higher internal validity
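A numerical sketch comparing the three hold-constant designs on the same simulated data, assuming parallel trends hold by construction. The model, the function `mean_outcome`, and all numbers (case levels 5.0 and 3.0, shared trend 1.5, true effect 2.0) are hypothetical choices for illustration.

```python
import random


def mean_outcome(level, period, treated, rng, n=2000, effect=2.0, trend=1.5):
    """Average outcome for one case in one period (hypothetical model):
    fixed case level + shared time trend + effect if treated + noise."""
    total = 0.0
    for _ in range(n):
        total += level + trend * period + effect * treated + rng.gauss(0, 1)
    return total / n


rng = random.Random(3)
t_pre = mean_outcome(5.0, period=0, treated=0, rng=rng)   # treated case, before
t_post = mean_outcome(5.0, period=1, treated=1, rng=rng)  # treated case, after
c_pre = mean_outcome(3.0, period=0, treated=0, rng=rng)   # comparison, before
c_post = mean_outcome(3.0, period=1, treated=0, rng=rng)  # comparison, after

# Same case over time: removes the fixed level, keeps the shared trend.
same_case = t_post - t_pre
# Similar cases at one time: removes the shared trend, keeps the level gap.
similar_cases = t_post - c_post
# Diff-in-diff: removes both, leaving the effect if trends are parallel.
diff_in_diff = (t_post - t_pre) - (c_post - c_pre)

print(f"same case:     {same_case:.2f}")
print(f"similar cases: {similar_cases:.2f}")
print(f"diff-in-diff:  {diff_in_diff:.2f}  (true effect is 2.0)")
```

In expectation the same-case comparison is off by the shared trend and the similar-cases comparison is off by the level gap; giving the two cases different `trend` values would break the parallel-trends assumption and bias the diff-in-diff estimate as well.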

## Design: Natural Experiments

Compare cases where cause/independent variable/$$X$$ is (as-if) randomly assigned by "nature"

• How it fixes confounding:
  • Breaks link between confounding variables $$W,Z,\ldots$$ and the independent variable $$X$$
• What confounding is removed:
  • Removes confounding from all measured/unmeasured variables

## Design: Natural Experiments

• Key Assumptions (to conclude no remaining bias):
  • Cause is as-if random (cases are unmotivated to select, unable to select, or unaware of selecting the "treatment")
• Methods:
  • standard natural experiment
  • regression discontinuity
  • instrumental variables
• Cases with randomized cause unusual/rare, low external validity
• Requires minimal assumptions about confounding variables, high internal validity
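A sketch of what breaking the link between $$W$$ and $$X$$ does, assuming a hypothetical setup in which "nature" assigns $$X$$ by a fair coin flip. The contrast case, where cases select $$X$$ based on $$W$$, the functions `pearson` and `simulate`, and all numeric values are illustrative assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(randomized, n=20000, effect=1.0, seed=4):
    """Return (corr(W, X), difference in mean Y between X=1 and X=0)."""
    rng = random.Random(seed)
    ws, xs, ys = [], [], []
    for _ in range(n):
        w = rng.gauss(0, 1)
        if randomized:
            x = 1 if rng.random() < 0.5 else 0       # as-if random assignment
        else:
            x = 1 if w + rng.gauss(0, 1) > 0 else 0  # cases select on W
        y = effect * x + 2.0 * w + rng.gauss(0, 1)
        ws.append(w)
        xs.append(x)
        ys.append(y)
    treated = [y for x, y in zip(xs, ys) if x == 1]
    control = [y for x, y in zip(xs, ys) if x == 0]
    diff = sum(treated) / len(treated) - sum(control) / len(control)
    return pearson(ws, xs), diff


corr_self, diff_self = simulate(randomized=False)
corr_rand, diff_rand = simulate(randomized=True)
print(f"self-selected: corr(W, X) = {corr_self:.2f}, difference = {diff_self:.2f}")
print(f"randomized:    corr(W, X) = {corr_rand:.2f}, difference = {diff_rand:.2f}")
```

$$W$$ still affects $$Y$$ in both runs. Under randomization it is simply unrelated to $$X$$, so the raw difference in means approximates the true effect (1.0 here) without measuring or conditioning on $$W$$.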

## Summary:

| Solution | How Bias Solved | Which Bias Removed | Assumes | Internal Validity | External Validity |
|---|---|---|---|---|---|
| Conditioning | Hold constant | All measured confounding variables | Condition on all confounders | Lowest | Highest |
| Similar Cases | Hold constant | Cases' shared confounding variables | No differences between cases | Middle | Middle |
| Same Case | Hold constant | Case's unchanging confounding variables | No confounding trends | Middle | Middle |
| Diff in Diff | Hold constant | Case's unchanging variables; cases' shared trends | Cases have parallel trends | Higher | Lower |
| Natural Experiment | Break $$W \rightarrow X$$ link | All confounding variables | $$X$$ as-if random | Highest | Lowest |