(1) Measurement Error
- Measurement error
(2) Sampling Error
- Sampling Bias vs Random Sampling Error
- Random Sampling
October 16, 2024
Concepts not transparent/systematic \(\xrightarrow{\xcancel{weak \ severity}}\)
Variable does not map onto concept (lack of validity) \(\xrightarrow{\xcancel{weak \ severity}}\) or \(\xrightarrow{\xcancel{pass \ severe \ test}}\)
Procedure does not return the true value (measurement error) \(\xrightarrow{\xcancel{pass \ severe \ test}}\)
…if there is measurement error?
Measurement error is everywhere. This does not mean we can say nothing about descriptive claims.
We need to distinguish between random and systematic errors. Does the source of the error suggest a systematic direction to the error?
Does the evidence using this measure appear to support or reject the claim?
If the error is a bias, what is the direction of the bias? (upward?, downward?) Is this bias toward supporting or rejecting the claim?
Is the magnitude of the error likely to be large or small? Is it large enough that it could incorrectly lead us to accept/reject the claim?
If:
Then, bias is a problem. Claim could be true or false.
If:
Then, bias is not a problem. Procedure stacks the deck against the claim, yet still supports it.
If:
Then, bias is not a problem. Procedure stacks the deck in favor of the claim, yet still rejects it.
If:
Then, bias is a problem. Claim could be true or false.
If:
Then, random error is a problem.
If:
Then, random error is a problem (biased toward finding no pattern).
If:
Then, random error is unlikely to be a problem.
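A minimal simulation sketch of the "biased toward finding no pattern" point, assuming a made-up relationship between two variables and mean-zero random measurement error added to one of them. All numbers, variable names, and the `pearson` helper are illustrative assumptions, not course data.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand (stdlib only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical true values: y is positively related to x.
true_x = [rng.gauss(0, 1) for _ in range(n)]
y = [x + rng.gauss(0, 1) for x in true_x]

# The measurement procedure adds random error (mean zero, no direction).
measured_x = [x + rng.gauss(0, 2) for x in true_x]

r_true = pearson(true_x, y)          # pattern using the true values
r_measured = pearson(measured_x, y)  # pattern using the noisy measure

print(f"correlation with true x:     {r_true:.2f}")
print(f"correlation with measured x: {r_measured:.2f}")
```

The noisy measure still shows a positive pattern, but a weaker one: random error in a variable pulls the observed correlation toward zero rather than pushing it in a particular direction.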
What is the type of this case?
What is the amount/frequency of some phenomenon?
What is the relative amount/frequency of something across different cases/times?
What patterns are there between two different phenomena?
We would have to observe too many cases.
Ahead of the upcoming Federal elections:
“What proportion of Canadians prefer Justin Trudeau to be the next Prime Minister?”
We can’t interview all Canadians…
population: full set of cases (countries, individuals, etc.) we’re interested in describing
sample: a subset of the population that we observe and measure
inference: description of the (unmeasured) population we make based on the (measured) sample
and there is uncertainty about what is true about the population, because we only measure a sample
The population:
The sample:
The inference:
The difference between the value of the measure for the sample and the true value of the measure for the population
\[\mathrm{Value}_{sample} - \mathrm{Value}_{population} \neq 0 \xrightarrow{then} \mathrm{sampling \ error}\]
\(1\). sampling bias: cases in the sample are not representative of the population because not every member of the population has an equal chance of being in the sample. The error is consistently in the same direction.
\(2\). random sampling error: in choosing cases for a sample, by chance, we get samples where the average is too high or too low compared to the population average
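A small sketch contrasting the two kinds of sampling error, assuming a hypothetical population of 10,000 people in which 40% prefer a candidate. In the biased procedure, supporters are (by assumption) twice as likely to end up in the sample. Every number here is invented for illustration.

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility

# Hypothetical population: 1 = prefers the candidate, 0 = does not.
population = [1] * 4000 + [0] * 6000
true_share = sum(population) / len(population)  # 0.40

# Assumed biased procedure: supporters are twice as likely to be sampled.
weights = [2 if person == 1 else 1 for person in population]

def sampling_error(sample):
    """Value_sample - Value_population for one sample."""
    return sum(sample) / len(sample) - true_share

# 200 random samples: every member has an equal chance of selection.
random_errors = [
    sampling_error(rng.sample(population, 500)) for _ in range(200)
]

# 200 biased samples (drawn with replacement, using unequal weights).
biased_errors = [
    sampling_error(rng.choices(population, weights=weights, k=500))
    for _ in range(200)
]

mean_random_error = sum(random_errors) / len(random_errors)
mean_biased_error = sum(biased_errors) / len(biased_errors)

print(f"random samples: mean error = {mean_random_error:+.3f}")
print(f"biased samples: mean error = {mean_biased_error:+.3f}")
print(f"biased samples with error > 0: {sum(e > 0 for e in biased_errors)} of 200")
```

Under these assumptions, the random samples miss high about as often as they miss low, so their errors average out near zero. The biased samples miss in the same direction every time.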
To understand random sampling error and sampling bias, it can be useful to understand…
the sampling distribution:
(e.g., the percent of survey respondents who prefer Trudeau in every possible sample of \(1009\) drawn using random digit dialing)
We can visualize a sampling distribution using a histogram and then assess:
random sampling: sampling cases from the population in a manner that gives all cases an equal probability of being chosen.
This procedure creates samples that:
(board: intuitions as to why)
A quick survey: (Answer BOTH questions - you have to click through)
Go to menti.com and enter: \(1735 \ 7291\)
Let’s say we want to understand housing insecurity among students in this course.
The population is students registered in this course
Students in the lecture hall who respond to the poll are the sample
When we take the average rent budget for the sample (people taking poll in class today)…
and use it as our estimate of the average rent budget of the population (all students registered in this course)…
we are making an inference.
Was this sample a random sample of the students in the course?
Can you think of any reasons this sample (students in lecture) would suffer from sampling bias?
When samples are not random, they may suffer from sampling bias, and the random errors are of unknown size
Let’s now imagine that the population is students in class today who completed the survey…
To illustrate random sampling error: We can simulate taking random samples of students in class and plot the sampling distribution
Histogram = Sampling distribution (the averages of different random samples)
Blue line = Population Mean (true in-class average)
Red line = Sampling Distribution Mean (average of SAMPLE averages)
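The simulation described above can be sketched roughly as follows. The rent budgets are randomly generated stand-ins (the real poll responses are not reproduced here), and the population size of 150 and the sample sizes are assumptions chosen only for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed for reproducibility

# Stand-in population: made-up rent budgets for 150 students in class.
population = [rng.randint(600, 2000) for _ in range(150)]
population_mean = statistics.mean(population)  # the "blue line"

def sample_means(n, draws=2000):
    """Average rent budget in each of `draws` random samples of size n."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(draws)]

means_10 = sample_means(10)
means_50 = sample_means(50)

# Mean of the sampling distribution (the "red line") for each sample size.
center_10 = statistics.mean(means_10)
center_50 = statistics.mean(means_50)

print(f"population mean:                 {population_mean:.0f}")
print(f"mean of sample means (n = 10):   {center_10:.0f}")
print(f"mean of sample means (n = 50):   {center_50:.0f}")
print(f"spread of sample means (n = 10): {statistics.stdev(means_10):.0f}")
print(f"spread of sample means (n = 50): {statistics.stdev(means_50):.0f}")
```

Individual sample averages land above or below the population mean by chance, but the average of the sample averages sits close to it, and the spread narrows as the sample size grows. Passing `means_10` to a histogram function would give the picture described above.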
We want to know: what fraction of Canadian adults prefer Trudeau as PM?
There are ~31 million Canadians over the age of 18. Assuming our sample is random, about how many people (\(n\)) do you think we would have to survey so that the sample mean, with a margin of error of \(\pm 1\) percentage point, includes the population mean with a probability of 99%?
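One way to check a guess is the textbook normal-approximation formula for a proportion, \(n = z^2 \, p(1-p) / E^2\), sketched below. The function name `required_sample_size` is hypothetical, and \(p = 0.5\) is an assumed worst case (it gives the largest \(n\)). The optional finite population correction shows how little the population size matters here.

```python
import math
from statistics import NormalDist

def required_sample_size(margin, confidence, p=0.5, population=None):
    """Sample size for estimating a proportion within +/- margin.

    Uses the normal approximation n = z^2 * p * (1 - p) / margin^2.
    p = 0.5 is the conservative (largest n) assumption.
    """
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

n_simple = required_sample_size(margin=0.01, confidence=0.99)
n_canada = required_sample_size(
    margin=0.01, confidence=0.99, population=31_000_000
)

print(f"ignoring population size:   n = {n_simple}")
print(f"with 31 million population: n = {n_canada}")
```

Under these assumptions both answers come out at roughly 16,600 people. The required \(n\) is driven by the margin of error and the confidence level, not by the 31 million.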
If people who prefer Trudeau feel social pressure to give a different answer or to say that they “did not know”…
Is this sampling error? Is this a random error or a bias?
Each dot is the result of a survey of voters during the 2020 US Presidential Election. These surveys suggested that by election day voters preferred Biden to Trump by \(8.4\) percentage points. Biden actually won by only \(4.5\) points.
Is this sampling error? Is this a random error or a bias?
It depends: if this is going on, then sampling bias
It depends: if there are “shy” Trump voters, then measurement bias.
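A rough sketch of why "shy" respondents produce measurement bias rather than sampling error: the sample below is perfectly random, but an assumed 15% of supporters do not report their true preference. The true share of 40% and the shy rate are invented numbers, not estimates from any real survey.

```python
import random

rng = random.Random(7)  # fixed seed for reproducibility

TRUE_SHARE = 0.40  # hypothetical share who truly prefer the candidate
SHY_RATE = 0.15    # hypothetical share of supporters who hide it

def survey(n):
    """Share REPORTING support in a random sample of n respondents."""
    reported = 0
    for _ in range(n):
        supports = rng.random() < TRUE_SHARE
        # Supporters report truthfully unless they are "shy".
        if supports and rng.random() >= SHY_RATE:
            reported += 1
    return reported / n

small = survey(1_000)
large = survey(100_000)

print(f"true share:           {TRUE_SHARE:.3f}")
print(f"reported, n = 1,000:   {small:.3f}")
print(f"reported, n = 100,000: {large:.3f}")
```

The larger survey has less random sampling error, but it settles near 0.34 rather than 0.40. The gap comes from the measurement procedure, so surveying more people does not remove it.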
Next class: