POLI 110: Confounding

November 25, 2025

Correlation to Causation

Solutions to Confounding

(1) Conditioning:

What is it?
How does it work?
Assumptions?

(2) “Design Based” approach:

What is it?
How does it work?

Recap

Solutions to Confounding

Must ask…

What comparisons between cases does it involve?
- can you recognize this solution when it is described to you?
- can you describe how to use this solution to test a particular causal claim?
What assumptions are required for it to work?
How does it “solve” confounding?
What trade-offs do we make?

Experiment:

Does exposure to mass-shootings increase support for laws restricting access to guns?

What would an experiment look like?:

What confounding is removed?:

What is assumed?:

Internal vs External validity?

Example:

Does exposure to mass-shootings increase support for laws restricting access to guns? (McGinty et al. 2013)

What would an experiment look like?: Recruit people into a survey. Randomly assign respondents to read or not read a news story about a mass shooting.

What confounding is removed?: All confounding variables

What is assumed?: Assignment is truly random; the treatment changes nothing other than \(X\)

Internal vs External validity?

Solution	How Confounding Solved	Which Confounding Removed	Assumes	Internal Validity	External Validity
Experiment	Randomization Breaks \(W \rightarrow X\) link	All confounding variables	\(X\) is random; Change only \(X\)	High	Low
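To see how randomization breaks the \(W \rightarrow X\) link, here is a minimal Python simulation sketch. The data-generating process (a binary confounder \(W\), a true effect of \(X\) on \(Y\) of 1.0, the specific probabilities) is invented for illustration and is not the McGinty et al. design. The same difference in means is biased when \(W\) drives \(X\), and close to the true effect when \(X\) is a coin flip.

```python
import random


def difference_in_means(n, randomize, seed=0):
    """Mean Y among X=1 cases minus mean Y among X=0 cases.

    Hypothetical data-generating process (an assumption, not from the study):
      W ~ Bernoulli(0.5) is a confounder.
      Without randomization, W raises the chance that X = 1 (the W -> X link).
      With randomization, X is a coin flip that ignores W.
      Y = 1.0 * X + 2.0 * W + noise, so the true effect of X on Y is 1.0.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        if randomize:
            x = 1 if rng.random() < 0.5 else 0
        else:
            x = 1 if rng.random() < (0.8 if w else 0.2) else 0
        y = 1.0 * x + 2.0 * w + rng.gauss(0, 1)
        (treated if x else control).append(y)
    return sum(treated) / len(treated) - sum(control) / len(control)


observational = difference_in_means(100_000, randomize=False)
experiment = difference_in_means(100_000, randomize=True)
print(f"observational comparison: {observational:.2f}")
print(f"randomized comparison:    {experiment:.2f}")
```

With these assumed numbers the observational difference should land near 2.2 (1.0 from \(X\) plus 1.2 from the backdoor path through \(W\)), while the randomized difference should land near 1.0.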

Conditioning

Did Trump rallies cause hate crimes?

Data from Feinberg, Branton, and Martinez-Ebers

Possible confounders imagined by Feinberg, Branton, and Martinez-Ebers

Conditioning

conditioning: when we observe \(X\) and \(Y\) for multiple cases, we examine the correlation of \(X\) and \(Y\) within groups of cases that are the same on confounding variables (\(W\), etc.)

How does conditioning solve the problem?

Cases compared have same values on confounding variable(s) \(W\)
In these groups, \(W\) cannot affect \(X\) or \(Y\) (because \(W\) is not moving, it can’t move \(X\) or \(Y\))
“Backdoor path” from \(X\) to \(Y\) is “blocked”
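A minimal Python sketch of conditioning, assuming a made-up data-generating process in which a three-level confounder \(W\) raises both the chance that \(X = 1\) and the value of \(Y\). The helper names (`make_data`, `mean_difference`, `conditioned_difference`) are hypothetical. Comparing \(X = 1\) with \(X = 0\) only within groups that share a value of \(W\) is what blocks the backdoor path.

```python
import random


def make_data(n, seed=1):
    """Hypothetical confounded data: W -> X, W -> Y, and X -> Y (true effect 1.0)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = rng.choice([0, 1, 2])                      # confounder with 3 levels
        x = 1 if rng.random() < 0.2 + 0.3 * w else 0   # higher W -> X more likely
        y = 1.0 * x + 2.0 * w + rng.gauss(0, 1)        # higher W -> higher Y
        rows.append((w, x, y))
    return rows


def mean_difference(rows):
    """Mean Y for X=1 cases minus mean Y for X=0 cases."""
    y1 = [y for _, x, y in rows if x == 1]
    y0 = [y for _, x, y in rows if x == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def conditioned_difference(rows):
    """Compare X=1 vs X=0 only among cases with the SAME value of W,
    then average the within-group differences, weighted by group size."""
    estimate = 0.0
    for w in sorted({row[0] for row in rows}):
        group = [row for row in rows if row[0] == w]
        estimate += mean_difference(group) * len(group) / len(rows)
    return estimate


rows = make_data(60_000)
naive = mean_difference(rows)               # backdoor path through W is open
conditioned = conditioned_difference(rows)  # W held constant within each group
print(f"naive: {naive:.2f}, conditioned on W: {conditioned:.2f}")
```

Under these assumptions the naive difference should come out near 2.6, while the within-group comparison should be close to the true effect of 1.0.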

How does conditioning work?

Just kidding!

Example: Conditioning

Feinberg, Branton, and Martinez-Ebers compare hate crimes in counties with and without Trump rallies, but condition on (hold constant):

percent Jewish
number of hate groups
crime rate
2012 Republican vote share
percent university educated
region

Example: Conditioning

County	Hate Crimes if Rally (\(Y\))	Hate Crimes if No Rally (\(Y\))	Rally (\(X\))	Jewish %	Hate Groups	Crime Rate	Rep. Vote %	Univ. Educated %	Region
a	\(More\)	\(\color{red}{Fewer}\)	Yes	2	3	15	53	38	South
	\(\Downarrow\)	\(\Uparrow\)
b	\(\color{red}{More}\)	\(Fewer\)	No	2	3	15	53	38	South

Example: Conditioning

Feinberg, Branton, and Martinez-Ebers find that, even after conditioning, Trump rallies increase the risk of hate crimes by 200%!

Lots of news headlines like this

Example: Conditioning

Economics PhD candidates show that conditioning on the same variables…

Clinton rallies increased hate crimes by nearly 250%!!

What is going on here??

County	Hate Crimes if Rally (\(Y\))	Hate Crimes if No Rally (\(Y\))	Rally (\(X\))	Jewish %	Hate Groups	Crime Rate	Rep. Vote %	Univ. Educated %	Region	Pop.
a	\(More\)	\(\color{red}{More}\)	Yes	2	3	15	53	38	South	High
	\(\not\Downarrow\)	\(\not\Uparrow\)
b	\(\color{red}{Fewer}\)	\(Fewer\)	No	2	3	15	53	38	South	Low

We would be wrong to use the observed hate crimes in county \(b\) (without a rally) as a substitute for the counterfactual hate crimes in county \(a\) (had it not held a rally).

Population differences \(\to\) differences in hate crimes, regardless of rally.

After also conditioning on population (a confounder): no correlation.
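The population story can be mimicked with a toy simulation. Everything here is hypothetical (invented probabilities, not the Feinberg, Branton, and Martinez-Ebers data): population drives both rallies and hate crimes, and by construction the rally has no effect at all. The naive risk ratio suggests a large "effect"; conditioning on population makes it disappear.

```python
import random


def simulate_counties(n, seed=2):
    """Hypothetical counties: population drives both rallies and hate crimes;
    the rally itself has NO effect on hate crimes by construction."""
    rng = random.Random(seed)
    counties = []
    for _ in range(n):
        high_pop = rng.random() < 0.3
        rally = rng.random() < (0.5 if high_pop else 0.05)
        hate_crime = rng.random() < (0.6 if high_pop else 0.1)  # ignores rally
        counties.append((high_pop, rally, hate_crime))
    return counties


def risk_ratio(counties):
    """P(hate crime | rally) / P(hate crime | no rally)."""
    with_rally = [hc for _, r, hc in counties if r]
    without = [hc for _, r, hc in counties if not r]
    return (sum(with_rally) / len(with_rally)) / (sum(without) / len(without))


counties = simulate_counties(300_000)
naive_rr = risk_ratio(counties)
by_population = {
    level: risk_ratio([c for c in counties if c[0] == level])
    for level in (True, False)
}
print(f"naive risk ratio: {naive_rr:.2f}")
for level, rr in by_population.items():
    print(f"high population = {level}: risk ratio {rr:.2f}")
```

With these assumed numbers the naive risk ratio should be roughly 2.6, and each within-population ratio should be close to 1.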

Solution	How Confounding Solved	Which Confounding Removed	Assumes	Internal Validity	External Validity
Experiment	Randomization Breaks \(W \rightarrow X\) link	All confounding variables	\(X\) is random; Change only \(X\)	High	Low
Conditioning	Hold confounders constant	?	?	?	High

Assumptions

Conditioning Assumptions

In order to use conditioning to infer that \(X\) causes \(Y\) when \(X\) and \(Y\) are correlated…

Must Assume

There are no other confounding variables (that you have not conditioned on)
- i.e. you have conditioned on ALL confounding variables (backdoor paths)
- sometimes called “ignorability assumption”: you assume you can “ignore” other variables without confounding

How can we tell whether this assumption is correct?

DISCUSS WITH YOUR NEIGHBORS

Conditioning Assumptions

Assume there are no other confounding variables (that you have not conditioned on)

this can never be proven to be true: a “strong” assumption
we can’t know whether all backdoor paths are blocked because we don’t know the true causal graph
compare this against the assumption of randomization in experiments
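To make the "no other confounders" assumption concrete, here is a hypothetical Python simulation with two confounders, \(W_1\) and \(W_2\). Conditioning on \(W_1\) alone (as if \(W_2\) were never measured) leaves bias; conditioning on both recovers the true effect of 1.0. The data-generating process is an assumption chosen for illustration. In real data we could not compute `both` when \(W_2\) is unmeasured, which is why the assumption cannot be proven.

```python
import random


def simulate(n, seed=3):
    """Hypothetical data with two confounders: W1 (measured) and W2 (unmeasured).
    True effect of X on Y is 1.0. Rows are (w1, w2, x, y)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w1 = 1 if rng.random() < 0.5 else 0
        w2 = 1 if rng.random() < 0.5 else 0
        x = 1 if rng.random() < 0.2 + 0.3 * w1 + 0.3 * w2 else 0
        y = 1.0 * x + 2.0 * w1 + 2.0 * w2 + rng.gauss(0, 1)
        rows.append((w1, w2, x, y))
    return rows


def stratified_diff(rows, key):
    """Within-stratum X=1 vs X=0 difference in mean Y, averaged by stratum size."""
    strata = {}
    for row in rows:
        strata.setdefault(key(row), []).append(row)
    estimate = 0.0
    for group in strata.values():
        y1 = [r[3] for r in group if r[2] == 1]
        y0 = [r[3] for r in group if r[2] == 0]
        diff = sum(y1) / len(y1) - sum(y0) / len(y0)
        estimate += diff * len(group) / len(rows)
    return estimate


rows = simulate(80_000)
only_w1 = stratified_diff(rows, key=lambda r: r[0])       # W2 ignored
both = stratified_diff(rows, key=lambda r: (r[0], r[1]))  # all confounders
print(f"conditioning on W1 only: {only_w1:.2f}")
print(f"conditioning on W1 and W2: {both:.2f}")
```

Here the estimate that ignores \(W_2\) should land near 1.66 rather than 1.0.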

Example

In wake of mass shootings, we might ask:

Do mass shooting events cause people to become more supportive of stricter gun control policies?

Example

Newman and Hartman (2017) examine whether exposure to mass shootings causes an increase in support for stricter gun control

\(Y\): a large survey (CCES) asks: “In general, do you feel that laws covering the sale of firearms should be made more strict, less strict, or kept as they are?”
\(X\): “distance to nearest mass shooting in recent years”

What might be possible confounding variables?

Example

Which variables do we NEED to condition on? Which variables do we NOT NEED to condition on?


Assumptions

Must Assume

There are no other confounding variables (that you have not conditioned on)

i.e. you have conditioned on ALL confounding variables
this can never be proven to be true: a “strong” assumption

But…

don’t need to condition on ALL variables that might affect \(X\) or affect \(Y\): only confounders.
don’t need to condition on ALL variables on a backdoor path. Just one variable per backdoor path is needed.
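The "one variable per backdoor path" point can be checked with a small simulation. The causal graph is hypothetical: \(W \rightarrow Z \rightarrow X\), \(W \rightarrow Y\), \(X \rightarrow Y\) (true effect 1.0), plus a variable \(V\) that affects only \(Y\). The code never uses \(W\) or \(V\): conditioning on \(Z\) alone blocks the single backdoor path \(X \leftarrow Z \leftarrow W \rightarrow Y\), and \(V\), which affects only \(Y\), is not a confounder and needs no conditioning.

```python
import random


def simulate(n, seed=4):
    """Hypothetical causal graph (an assumption for illustration):
    W -> Z -> X,  W -> Y,  X -> Y (true effect 1.0),  V -> Y only.
    The only backdoor path is X <- Z <- W -> Y. W is treated as unmeasured."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        z = 1 if rng.random() < (0.8 if w else 0.2) else 0  # W -> Z
        x = 1 if rng.random() < (0.7 if z else 0.3) else 0  # Z -> X, no direct W -> X
        v = rng.gauss(0, 1)                                 # affects Y only
        y = 1.0 * x + 2.0 * w + 1.5 * v + rng.gauss(0, 1)
        rows.append((z, x, y))                              # W and V are dropped
    return rows


def mean_difference(rows):
    """Mean Y for X=1 cases minus mean Y for X=0 cases."""
    y1 = [y for _, x, y in rows if x == 1]
    y0 = [y for _, x, y in rows if x == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def condition_on_z(rows):
    """Within-Z comparisons, averaged by group size: blocks X <- Z <- W -> Y."""
    estimate = 0.0
    for z in (0, 1):
        group = [row for row in rows if row[0] == z]
        estimate += mean_difference(group) * len(group) / len(rows)
    return estimate


rows = simulate(100_000)
naive = mean_difference(rows)
blocked = condition_on_z(rows)
print(f"naive: {naive:.2f}, conditioning on Z only: {blocked:.2f}")
```

Under these assumptions the naive difference should be near 1.48, and conditioning on \(Z\) alone should bring it close to 1.0.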

Example

Newman and Hartman (2017):

Compare proximity to mass shootings and gun control attitudes, conditioning on (holding constant):

community level factors: income, education, partisanship, racial composition, murder, firearms stores, population density, population
individual level factors: education, income, age, gender, race, property ownership, having children, in military, military family, partisanship, ideology, religiosity, region of birth

Are there any potentially really important confounders that might not be included?

Proximity to mass shootings increases support for stricter gun laws… assuming no other confounders

Assumptions

Must Assume

There are no other confounding variables (that you have not conditioned on)

But…

don’t need to condition on ALL variables that might affect \(X\) or affect \(Y\): only confounders.
don’t need to condition on ALL variables on a backdoor path. Just one variable per backdoor path is needed.
we can give reasons/arguments for why any remaining confounding is likely small OR biased against the correlation we observe (a small bias may not be enough to change our conclusions; a bias that works against the correlation means the true relationship may be even stronger)

Solution	How Confounding Solved	Which Confounding Removed	Assumes	Internal Validity	External Validity
Experiment	Randomization Breaks \(W \rightarrow X\) link	All confounding variables	\(X\) is random; Change only \(X\)	High	Low
Conditioning	Hold confounders constant	Only variables conditioned on	Condition on all confounders	Low	High

Conditioning Assumptions

In order to infer that \(X\) causes \(Y\) when \(X\) and \(Y\) are correlated after conditioning

Must Assume

Variables used to condition the relationship between \(X\) and \(Y\) are measured without error.

even random measurement error in confounding variables means conditioning does not fully remove the bias.
Why? You are no longer comparing like-with-like.

Conditioning Assumptions

Imagine we want to condition on gun ownership when examining the correlation of mass shootings and gun attitudes.

What if we measure gun ownership with random measurement error?

Let’s see what happens… BOARD
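A Python sketch of what random measurement error in a confounder does, assuming hypothetical numbers: gun ownership \(W\) is binary and confounds \(X\) and \(Y\) (true effect 1.0), and the survey records the wrong value for 20% of respondents, at random. Grouping on the noisy measure puts some owners and non-owners in the same group, so the groups are no longer like-with-like.

```python
import random


def simulate(n, flip_prob, seed=7):
    """Hypothetical data: gun ownership W (binary) confounds X and Y; the true
    effect of X on Y is 1.0. The recorded W is flipped with probability
    `flip_prob` (random measurement error). Rows are (W, measured W, X, Y)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        w_measured = 1 - w if rng.random() < flip_prob else w
        x = 1 if rng.random() < (0.8 if w else 0.2) else 0
        y = 1.0 * x + 2.0 * w + rng.gauss(0, 1)
        rows.append((w, w_measured, x, y))
    return rows


def conditioned_difference(rows, w_index):
    """Within-group X=1 vs X=0 difference in mean Y, grouping on column
    `w_index` (0 = true W, 1 = measured W), averaged by group size."""
    estimate = 0.0
    for value in (0, 1):
        group = [r for r in rows if r[w_index] == value]
        y1 = [r[3] for r in group if r[2] == 1]
        y0 = [r[3] for r in group if r[2] == 0]
        estimate += (sum(y1) / len(y1) - sum(y0) / len(y0)) * len(group) / len(rows)
    return estimate


rows = simulate(100_000, flip_prob=0.2)
with_true_w = conditioned_difference(rows, w_index=0)
with_noisy_w = conditioned_difference(rows, w_index=1)
print(f"conditioning on true W:  {with_true_w:.2f}")
print(f"conditioning on noisy W: {with_noisy_w:.2f}")
```

With these assumed values, conditioning on the true \(W\) should give about 1.0, but conditioning on the noisy measure should give about 1.88, barely better than the roughly 2.2 this setup would give with no conditioning at all.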

Conditioning Assumptions

In order to infer that \(X\) causes \(Y\) when \(X\) and \(Y\) are correlated after conditioning

Must Assume

We can find cases that are the same on confounding variables \(W\) and different in \(X\).

Example:

Let’s say we want to examine the effect of laws restricting gun ownership on gun violence across countries:

What are some factors that might affect strictness of gun regulation and gun violence?

Is there, e.g., a country that is similar to the US on all confounders?

Conditioning: Limitations

In order to infer that \(X\) causes \(Y\) when \(X\) and \(Y\) are correlated after conditioning

Must Assume

There are cases that are the same on confounding variables \(W\) and different in \(X\).

as we condition on more and more variables…
it becomes less and less likely that we find cases that exactly match
so, in practice, we often cannot condition on all confounding variables
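How quickly exact matches run out can be sketched in Python. The assumptions are hypothetical: 200 cases, each confounder binary and independent, and \(X\) assigned by a coin flip. With \(k\) binary confounders there are \(2^k\) possible profiles, so the share of \(X = 1\) cases that have an \(X = 0\) case with the identical profile collapses as \(k\) grows.

```python
import random


def share_matched(n_cases, k, seed=5):
    """Fraction of X=1 cases that have at least one X=0 case with identical
    values on all k (hypothetical, binary) confounders."""
    rng = random.Random(seed)
    control_profiles = set()
    treated_profiles = []
    for _ in range(n_cases):
        profile = tuple(rng.randint(0, 1) for _ in range(k))
        if rng.random() < 0.5:
            treated_profiles.append(profile)
        else:
            control_profiles.add(profile)
    matched = sum(1 for p in treated_profiles if p in control_profiles)
    return matched / len(treated_profiles)


for k in (1, 5, 10, 20, 30):
    print(f"{k:2d} confounders: {2 ** k:>13,} possible profiles, "
          f"{share_matched(200, k):.0%} of X=1 cases have an exact match")
```

With these settings, essentially every case finds a match at \(k = 1\), and essentially none do at \(k = 30\).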

Solution	How Confounding Solved	Which Confounding Removed	Assumes	Internal Validity	External Validity
Experiment	Randomization Breaks \(W \rightarrow X\) link	All confounding variables	\(X\) is random; Change only \(X\)	High	Low
Conditioning	Hold confounders constant	Only variables conditioned on	Condition on all confounders; Low measurement error; Have similar cases	Low	High

Beyond Conditioning

Limits of Conditioning:

**Did Trump rallies increase hate crimes?**

If we know confounding variables, can we find cases with and without rallies that are the same on many confounding variables?
If we don’t know or can’t measure confounding variables, there may still be differences between places with and without rallies that produce confounding.

Is there a better comparison we can make?

Before and After

We can make Before and After comparisons: What is it?

before and after: examine the change in \(Y\) in a single case (or group of cases) where \(X\) changes over time (from before to after)

Example: Before and After

Taking the same data from Feinberg, Branton, and Martinez-Ebers…

we focus only on counties that ever had a Trump (Clinton) rally
compare the month after the rally \((X = \text{Rally})\) to the month before the rally \((X = \text{No Rally})\)
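The before-and-after comparison itself is simple arithmetic: each county serves as its own comparison, and we average the changes. The counts below are invented solely to show the computation and are not from Feinberg, Branton, and Martinez-Ebers; `before_after_change` is a hypothetical helper.

```python
def before_after_change(counts):
    """Average within-county change in hate crimes from the month before the
    rally to the month after. `counts` maps county -> (before, after)."""
    changes = [after - before for before, after in counts.values()]
    return sum(changes) / len(changes)


# Hypothetical monthly counts, invented only to show the arithmetic;
# every county here held a rally between its "before" and "after" month.
toy_counts = {
    "county_a": (2, 3),
    "county_b": (0, 0),
    "county_c": (5, 4),
    "county_d": (1, 3),
}
print(before_after_change(toy_counts))  # (1 + 0 - 1 + 2) / 4 = 0.5
```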

Before and After

DISCUSS

If we compare counties to themselves before and after rallies…

Which confounding variables are held constant?
What are confounding variables that might NOT be addressed in this comparison?

Design Based Solutions

Conditioning removes confounding by:

identifying possible confounding variables
measuring confounding variables
examining the relationship between \(X\) and \(Y\) for cases with similar values of the confounding variables \(W\).

Design Based Solutions

Design-based solutions (like Before and After) remove confounding by:

selecting cases for comparison in order to eliminate many known/unknown as well as measurable/unmeasurable confounding variables.
the nature of the comparison holds constant classes of confounding variables, not specific confounding variables.
by a “class” we mean all confounding variables that share certain properties

Example: Before and After

Which of these possible confounders are held constant in a before-and-after comparison?

Example: Before and After

How does it work?…

All confounding variables (those that affect whether a rally occurs and affect hate crimes) that do not change over time (from before to after) are held constant.

because they are held constant, they cannot produce confounding

Confounding Solved?…

any variable that does not change during the time period of the comparison is held constant
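A hypothetical simulation of why before-and-after comparisons remove unchanging confounders but not changing ones. A fixed county trait \(U\) makes rallies more likely and raises hate crimes in every month; `shock` stands in for anything that changes between the two months (for example, the election approaching). The rally has no effect by construction, and all numbers are assumptions for illustration.

```python
import random
from statistics import fmean


def simulate(n_counties, shock, seed=6):
    """Hypothetical counties, with numbers chosen only for illustration.
    U: a time-invariant county trait that makes rallies more likely AND
       raises hate crimes in every month (a confounder that does not change).
    shock: a time-varying change hitting every county in the "after" month.
    The rally itself has NO effect on hate crimes (true effect = 0).
    Returns (cross-sectional difference, before-and-after change)."""
    rng = random.Random(seed)
    rally_changes, after_rally, after_no_rally = [], [], []
    for _ in range(n_counties):
        u = rng.gauss(0, 1)
        rally = u + rng.gauss(0, 1) > 1.0            # U -> rally
        before = 5 + 2 * u + rng.gauss(0, 1)         # U -> hate crimes
        after = 5 + 2 * u + shock + rng.gauss(0, 1)  # no rally term at all
        if rally:
            rally_changes.append(after - before)     # U cancels out here
            after_rally.append(after)
        else:
            after_no_rally.append(after)
    cross_section = fmean(after_rally) - fmean(after_no_rally)
    return cross_section, fmean(rally_changes)


cross, change_no_shock = simulate(40_000, shock=0.0)
_, change_with_shock = simulate(40_000, shock=1.0)
print(f"rally vs. no-rally counties, same month: {cross:.2f}")
print(f"before and after, nothing else changes: {change_no_shock:.2f}")
print(f"before and after, time-varying shock:   {change_with_shock:.2f}")
```

Under these assumptions, the same-month comparison of rally and no-rally counties should show a spurious gap of roughly 2.4, the before-and-after change should be near 0 when nothing else changes, and it should be near 1 (wrongly attributed to the rally) when the shock is present.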

Conclusion

Conditioning:

What is it?:
- examine correlation of \(X\) and \(Y\) for cases that are the same on confounders \(W\)
How does it work?:
- Holds measured confounding variables constant
Assumptions:
- no other confounding variables: can’t check
- no measurement error: can check
- similar cases with different \(X\): can check
While imperfect, we should not dismiss this approach.

Conclusion

Before and After:

What is it?:
- examine correlation of \(X\) and \(Y\) within cases where \(X\) changes over time
How does it work?:
- Holds all unchanging confounding variables constant
Assumptions?