Sampling Error
- Review
- Random Sampling Error and Severity
- Sampling Error vs. Measurement Error
- Non-citizen Voting
October 16, 2025
Sampling Error
population: full set of cases (countries, individuals, etc.) we’re interested in describing
sample: a subset of the population that we observe and measure
inference: description of the (unmeasured) population we make based on the (measured) sample
There is uncertainty about what is true of the population, because we measure only a sample.
The population:
The sample:
The inference:
Americans say politically-motivated violence is a major problem:
random sampling: sampling cases from the population in a manner that gives all cases an equal probability of being chosen.
This procedure creates samples that:
sampling error: the difference between the value of the measure for the sample and the true value of the measure for the population
\[\mathrm{Value}_{\text{sample}} - \mathrm{Value}_{\text{population}} \neq 0 \implies \text{sampling error}\]
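The definition can be made concrete with a small simulation. This is a minimal sketch under stated assumptions: the population of 100,000 people with exactly 73% coded as "concerned" is hypothetical, and `sampling_error` is an illustrative helper name, not something from the lecture.

```python
import random

def sampling_error(population, sample_size, rng):
    """Sample value minus population value for one random sample."""
    # rng.sample gives every case the same chance of being chosen.
    sample = rng.sample(population, sample_size)
    value_sample = sum(sample) / len(sample)
    value_population = sum(population) / len(population)
    return value_sample - value_population

# Hypothetical population: 1 = concerned, 0 = not concerned.
population = [1] * 73_000 + [0] * 27_000
rng = random.Random(2025)  # fixed seed so the run is repeatable
error = sampling_error(population, 1477, rng)
print(f"sampling error: {error:+.4f}")
```

Changing the seed changes the error, which is the sense in which random sampling error varies from sample to sample.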
When thinking about random sampling error and sampling bias, we should think about…
the sampling distribution:
(e.g., the percent of survey respondents who think political violence is a major problem across all possible samples of \(1477\))
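We cannot list all possible samples of \(1477\), but we can approximate the sampling distribution by drawing many of them. The sketch below assumes a true value of 73% purely for illustration; the 2,000 repetitions and the helper name `sample_percent` are arbitrary choices, not part of the lecture.

```python
import random
import statistics

def sample_percent(true_share, n, rng):
    """Percent 'concerned' in one random sample of n respondents."""
    hits = sum(rng.random() < true_share for _ in range(n))
    return 100 * hits / n

TRUE_SHARE = 0.73  # assumed population value, for illustration only
N = 1477           # sample size from the survey example
rng = random.Random(1)

# Each entry is the result one survey of size N could have produced.
results = [sample_percent(TRUE_SHARE, N, rng) for _ in range(2000)]
center = statistics.mean(results)
spread = statistics.stdev(results)
print(f"center: {center:.2f}%  spread (std. dev.): {spread:.2f} points")
```

With random sampling, the center of this distribution sits close to the assumed true value, and the spread describes how large random sampling errors typically are.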
The inference:
Americans say politically-motivated violence is a major problem:
Can we say concern about politically-motivated violence has gone up?
If we assume random sampling, no sampling bias (or at least, the same sampling bias):
Skeptic might say:
“It could be that concern about political violence remained at 73%, and the increase is due to random sampling error!”
null hypothesis: there has been NO increase in concern (claim of an increase is wrong)
hypothesis test asks: “how likely are we to see this evidence for the claim, assuming that the claim is false?”
We can imagine that the truth is that 73% are concerned about political violence, generate thousands of samples of size \(1477\), and record the results.
If true concern about political violence is 73%, by chance we see 77% concern or more with probability of 0.000220.
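The same question can be answered without simulation by adding up binomial probabilities. This is a sketch, not the procedure used to produce the figure above: it assumes simple random sampling, treats "77% or more" as the smallest whole count that is at least 77% of 1477, and uses an illustrative helper name, `binom_upper_tail`. Its answer should be of the same order as the simulated value but need not match it exactly.

```python
import math

def binom_upper_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p), computed in log space."""
    log_p, log_q = math.log(p), math.log(1 - p)
    total = 0.0
    for k in range(k_min, n + 1):
        # log of C(n, k) * p^k * (1-p)^(n-k); lgamma avoids huge integers
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1) + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return total

N = 1477
K_MIN = math.ceil(0.77 * N)  # assumption: smallest count that is >= 77%
p_value = binom_upper_tail(N, 0.73, K_MIN)
print(f"P(77% or more | truth is 73%) = {p_value:.6f}")
```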
hypothesis test asks: “how likely are we to see this evidence, assuming that the claim is false?”
\(p\) values:
If we assume random sampling, no sampling bias (or at least, the same sampling bias):
Skeptic might say:
“It could be that concern about political violence remained at 73%, and the increase is due to random sampling error!”
null hypothesis: there has been NO increase in concern (claim is wrong)
But we can respond:
“We saw 77% of people concerned about political violence. If you were right, we’d see this only 0.022% of the time.”
What is a margin of error/confidence interval then?
If true concern about political violence is 75%, by chance we see 77% concern or more with probability of 0.036039. This probability (\(3.6039\%\)) is greater than \(1\%\), so we cannot reject 75% as the true value.
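One way to read this example is that an interval collects the candidate true values the data cannot reject. The sketch below illustrates that idea only; it is not necessarily the exact procedure behind a published margin of error. It is one-sided (it only asks whether 77% is surprisingly high), checks whole-percent candidates from 70% to 77%, and reuses an illustrative helper, `binom_upper_tail`.

```python
import math

def binom_upper_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p), computed in log space."""
    log_p, log_q = math.log(p), math.log(1 - p)
    total = 0.0
    for k in range(k_min, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1) + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return total

N = 1477
K_OBS = math.ceil(0.77 * N)  # assumption: smallest count that is >= 77%
ALPHA = 0.01                 # the 1% threshold from the example

# Keep every candidate true value (in whole percents) that is NOT rejected.
not_rejected = [pct for pct in range(70, 78)
                if binom_upper_tail(N, pct / 100, K_OBS) > ALPHA]
lowest = min(not_rejected)
print(f"candidate values not rejected at 1%: {not_rejected}")
```

Under these assumptions, candidate values well below the observed 77% are rejected, while values close to it are not.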
If we have random sampling errors:
If we have sampling bias…
Each dot is the result of a survey of voters during the 2020 US Presidential Election. These surveys suggested that by election day voters preferred Biden to Trump by \(8.4\) percentage points. Biden actually won by only \(4.5\) points.
Is this sampling error? Is this a random error or a bias?
It depends: if this is going on, then sampling bias
It depends: if there are “shy” Trump voters, then measurement bias.
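Whatever its source, a systematic error behaves differently from random sampling error: averaging many polls shrinks random error but leaves a bias in place. The sketch below illustrates this with the two margins from the 2020 example. Everything else is a hypothetical assumption: 200 polls of 1,000 voters each, a two-candidate race, and a biased scenario in which polls effectively sample from a group where the margin is 8.4 points. It does not identify the actual cause of the 2020 polling miss.

```python
import random
import statistics

def poll_margin(biden_share, n, rng):
    """Biden-minus-Trump margin (percentage points) in one poll of n voters."""
    biden = sum(rng.random() < biden_share for _ in range(n))
    return 100 * (biden - (n - biden)) / n

def average_margin(margin_sampled, n_polls, n, rng):
    """Average margin across polls drawn from a group with the given margin."""
    share = 0.5 + margin_sampled / 200  # two-candidate share implied by margin
    return statistics.mean([poll_margin(share, n, rng) for _ in range(n_polls)])

rng = random.Random(2020)
# Random error only: polls sample from the electorate that actually voted.
random_only = average_margin(4.5, 200, 1000, rng)
# Bias (hypothetical): polls systematically reach a more pro-Biden group.
biased = average_margin(8.4, 200, 1000, rng)
print(f"random error only: {random_only:.1f}  with bias: {biased:.1f}")
```

Individual polls in both scenarios bounce around by a few points, but only the unbiased average settles near the true margin.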
In addition to winning the Electoral College in a landslide, I won the popular vote if you deduct the millions of people who voted illegally
— Donald J. Trump (@realDonaldTrump) November 27, 2016
White House senior advisor doubles down on voter fraud claims: “Voter fraud is a serious problem in this country” pic.twitter.com/DC6lVPQznz
— ABC News (@ABC) February 12, 2017
This “problem” used to justify policy changes:
Claim: Widespread voter fraud: “14% of non-citizens voted”
Richman et al:
Discuss: Do you find this persuasive? Why or why not?
The political scientists who run the CCES survey point out:
measurement error in classifying individuals as citizens or non-citizens leads Richman et al. to a sample of “non-citizens” that includes both citizens and non-citizens:
Nobody who consistently reports being a non-citizen votes.
Measurement Error (of individuals’ citizenship)
\(\Downarrow\) produces
Sampling Error (sample that should be of non-citizens includes citizens)
\(\Downarrow\) produces
Measurement Error (about the population of non-citizens)
\(\Downarrow\)
The authors make the incorrect inference that hundreds of thousands of non-citizens vote illegally.
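The chain above can be traced with simple arithmetic. All numbers in this sketch are hypothetical and chosen only to show the mechanism; they are not the CCES figures. It assumes 20,000 respondents, 1% true non-citizens who never vote, a 0.1% chance that a citizen mistakenly reports being a non-citizen, and 70% turnout among citizens. The function name is illustrative.

```python
def apparent_noncitizen_turnout(n_respondents, share_noncitizen,
                                misreport_rate, citizen_turnout,
                                noncitizen_turnout):
    """Turnout rate observed among respondents recorded as non-citizens."""
    noncitizens = n_respondents * share_noncitizen
    citizens = n_respondents - noncitizens
    # Citizens recorded as non-citizens by mistake (measurement error).
    misclassified = citizens * misreport_rate
    # Voters inside the group labelled "non-citizen".
    voters = (misclassified * citizen_turnout
              + noncitizens * noncitizen_turnout)
    return voters / (noncitizens + misclassified)

# Hypothetical inputs; true non-citizen turnout is set to zero.
rate = apparent_noncitizen_turnout(20_000, 0.01, 0.001, 0.70, 0.0)
print(f"apparent 'non-citizen' turnout: {rate:.1%}")
```

Even with no non-citizen voting at all, a tiny misclassification rate among the much larger group of citizens produces a nonzero apparent rate, because the misclassified citizens vote at citizen rates.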
People try to persuade you there is a problem in need of fixing:
Is this problem real?