(1) Sampling Error
- Sampling key ideas
- Sampling Bias vs Random Sampling Error
- Random Sampling
- Random Sampling and Severity
October 14, 2025
Take this survey on your current work status
Unemployment rates:
“What proportion of Canadians who are in the labour force are unable to find a job?”
We can’t interview all Canadians. We can’t even interview all Canadians ages 20-24.
How do we know about unemployment rates?
Survey ~ 65,000 households across Canada
Surveyed household member reports employment information on all household members
Conducted via: in person interview, telephone interview, or online survey
How might this procedure go wrong?
We would have to observe too many cases.
Can we say something about all cases based on a few cases?
population: full set of cases (countries, individuals, etc.) we’re interested in describing
sample: a subset of the population that we actually observe and measure
inference: description of the (unmeasured) population we make based on the (measured) sample
and there is uncertainty about what is true about the population, because we only measure a sample
Since the Federal elections earlier this year:
“What proportion of Canadians approve of Mark Carney’s performance as PM?”
We can’t interview all Canadians…
Survey of \(1562\) Canadian adults (October 3-5 2025)
The population:
The sample:
The inference:
The difference between the value of the measure for the sample and the true value of the measure for the population
\[\mathrm{Value}_{sample} - \mathrm{Value}_{population} \neq 0 \xrightarrow{then} \mathrm{sampling \ error}\]
1. sampling bias: cases in the sample are not representative of the population because not every member of the population has an equal chance of being in the sample. The error is consistently in the same direction.
2. random sampling error: in choosing cases for a sample, we get, by chance, samples whose average is too high or too low compared to the population average.
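The contrast between the two kinds of sampling error can be sketched with a small simulation. This is only an illustration under stated assumptions: the population of weekly hours worked is made up, and the "unreachable" group (people working 40 hours) is a hypothetical source of bias, not something from the survey examples.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: weekly hours worked by 10,000 people (made-up numbers).
population = [random.choice([0, 10, 20, 30, 40]) for _ in range(10_000)]
true_mean = statistics.mean(population)

def random_sample_mean(n):
    """Simple random sample: every case has an equal chance of selection."""
    return statistics.mean(random.sample(population, n))

def biased_sample_mean(n):
    """Biased sample: people working 40 hours are never reached (assumed mechanism)."""
    reachable = [hours for hours in population if hours < 40]
    return statistics.mean(random.sample(reachable, n))

# Sampling error = sample value minus population value, repeated over many samples.
random_errors = [random_sample_mean(100) - true_mean for _ in range(500)]
biased_errors = [biased_sample_mean(100) - true_mean for _ in range(500)]

print(f"mean error, random samples: {statistics.mean(random_errors):+.2f}")
print(f"mean error, biased samples: {statistics.mean(biased_errors):+.2f}")
```

In this sketch the errors from random samples scatter above and below zero and roughly cancel out, while the errors from the biased samples sit consistently on one side of zero.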
The population:
The sample:
The inference:
To understand random sampling error and sampling bias, we need to understand…
the sampling distribution:
(e.g., the percent of survey respondents who approve of PM Carney in every possible sample of \(1562\) drawn from online survey pool)
We can visualize a sampling distribution using a histogram to illustrate:
random sampling: sampling cases from the population in a manner that gives all cases an equal probability of being chosen.
This procedure creates samples that:
(board: intuitions as to why)
Let’s say we want to understand employment among students in this course.
The population is students registered in this course
Students in the lecture hall who respond to the poll are the sample
When we take the average hours worked for the sample (people taking poll in class today)…
and use it as our estimate of the average hours worked of the population (all students registered in this course)…
we are making an inference.
Was this sample a random sample of the students in the course?
Can you think of any reasons this sample (students in lecture) would suffer from sampling bias?
When samples are not random, they may suffer from sampling bias, and the random sampling errors are of unknown size.
Let’s now imagine that the population is students in class today who completed the survey…
To illustrate random sampling error, we can simulate taking random samples of students in class and plot the sampling distribution.
histogram = Sampling distribution (the averages of different random samples)
Blue line = Population Mean (true in-class average)
Red line = Sampling Distribution Mean (average of SAMPLE averages)
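A simulation along these lines might look like the sketch below. The survey responses are made-up placeholder data (`class_hours` is hypothetical, not the actual in-class results), and the sample sizes of 10 and 50 are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical in-class survey responses: weekly hours worked (made-up data).
class_hours = [random.choice([0, 0, 5, 8, 10, 12, 15, 20, 25, 40]) for _ in range(200)]
population_mean = statistics.mean(class_hours)  # the "blue line"

def sampling_distribution(sample_size, n_samples=2000):
    """Averages of many simple random samples drawn without replacement."""
    return [
        statistics.mean(random.sample(class_hours, sample_size))
        for _ in range(n_samples)
    ]

# One simulated sampling distribution per sample size.
results = {size: sampling_distribution(size) for size in (10, 50)}

for size, means in results.items():
    center = statistics.mean(means)   # the "red line": average of sample averages
    spread = statistics.stdev(means)  # typical size of the random sampling error
    print(f"n={size}: population mean={population_mean:.2f}, "
          f"mean of sample means={center:.2f}, spread={spread:.2f}")
```

Each list in `results` holds the values that the histogram would display. In this sketch the average of the sample averages lands close to the population mean, and the spread is smaller for the larger sample size.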
Takeaways:
Random sampling always involves random sampling error \(\to\) uncertainty
How might interpretation of evidence for claims be changed due to this random sampling error?
Random sampling always involves random sampling error.
How might our conclusions be changed due to this random sampling error?
inference: 47% (\(\pm 2.5\%\), 19 times out of 20) approve of Carney as PM.
This implies that true population approval of Carney is in:
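The reported margin of error can be checked with the usual normal-approximation formula for a proportion, \(z \sqrt{\hat{p}(1-\hat{p})/n}\). This is a sketch that assumes a simple random sample of \(1562\) respondents and a 95% confidence level ("19 times out of 20"); whether that assumption holds for an online survey pool is a separate question.

```python
import math

p_hat = 0.47  # sample proportion approving (from the reported result)
n = 1562      # sample size (from the survey description)
z = 1.96      # normal critical value for 95% confidence, i.e. 19 times out of 20

# Standard error of a sample proportion under simple random sampling (assumed).
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error
lower, upper = p_hat - margin, p_hat + margin

print(f"margin of error: +/- {margin:.1%}")
print(f"95% interval: {lower:.1%} to {upper:.1%}")
```

Under these assumptions the margin comes out at about 2.5 percentage points, so the interval runs from roughly 44.5% to 49.5%.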
hypothesis tests: compare an “alternative” hypothesis (the claim is correct) against a “null” hypothesis (the claim is not correct). They tell us the probability of observing data this supportive of the claim by chance if the claim were false (the error probability).
alternative hypothesis: young adult unemployment in 2025 \(>\) young adult unemployment in 2005
null hypothesis: young adult unemployment in 2025 \(\leq\) young adult unemployment in 2005
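One common way to test a pair of hypotheses like these is a one-sided two-proportion z-test. The sketch below uses made-up counts (they are hypothetical, not real unemployment figures) and assumes both years are independent simple random samples. It is one possible test, not necessarily the one used in this course.

```python
import math

# Hypothetical survey counts (made up for illustration, not real data).
unemployed_2025, n_2025 = 140, 1000
unemployed_2005, n_2005 = 110, 1000

p_2025 = unemployed_2025 / n_2025
p_2005 = unemployed_2005 / n_2005

# At the boundary of the null hypothesis the two rates are equal,
# so the two samples are pooled to estimate that common rate.
p_pooled = (unemployed_2025 + unemployed_2005) / (n_2025 + n_2005)
standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_2025 + 1 / n_2005))

# How many standard errors the observed difference is above zero.
z = (p_2025 - p_2005) / standard_error

# One-sided p-value: P(Z >= z) for a standard normal Z.
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(f"difference = {p_2025 - p_2005:.3f}, z = {z:.2f}, one-sided p = {p_value:.3f}")
```

A small `p_value` means a difference this large would rarely arise from random sampling error alone if the null hypothesis were true.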
Confidence intervals, margins of error, \(p\) values, and hypothesis tests only work as advertised if their assumptions are correct.
Each dot is the result of a survey of voters during the 2020 US Presidential Election. These surveys suggested that by election day voters preferred Biden to Trump by \(8.4\) percentage points. Biden actually won by only \(4.5\) points.
Is this sampling error? Is this a random error or a bias?