Measurement Error
- Bias/Systematic
- Random
- Sources and Solutions
October 9, 2024
Where descriptive claims fail severe testing:
- Concepts not transparent/systematic \(\rightarrow\) weak severity
- Variable does not map onto the concept (lack of validity) \(\rightarrow\) weak severity, or cannot pass a severe test
- Procedure does not return the true value (measurement error) \(\rightarrow\) cannot pass a severe test
Measurement error is the difference between the true value of a variable for a case and the observed value of the variable for that case produced by the measurement procedure:
\[\mathrm{Value}_{observed} - \mathrm{Value}_{true} \neq 0 \implies \mathrm{measurement \ error}\]
measurement bias or systematic measurement error: error produced when our measurement procedure obtains values that are, on average, too high or too low (or otherwise systematically incorrect) compared to the truth.
random measurement error: error produced by random features of the measurement process or the phenomenon; the values we measure are, on average, correct.
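To make the bias/random distinction concrete, here is a minimal simulation sketch (illustrative only, not from the lecture; the true values, the +5 shift, and the noise scale are all made up):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical true values for 1,000 cases.
true_values = rng.normal(50, 10, size=1_000)

# Systematic error (bias): every observed value is shifted up by 5 units.
biased = true_values + 5

# Random error: zero-mean noise, so observations are correct on average.
noisy = true_values + rng.normal(0, 5, size=true_values.size)

print(f"mean error, biased measure: {(biased - true_values).mean():+.2f}")  # exactly +5.00
print(f"mean error, noisy measure:  {(noisy - true_values).mean():+.2f}")   # close to 0
```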
Go to menti.com and enter the code 3484 7512
…if there is measurement error?
We only observe what we observe, so how do we know the procedure does not return the true value?
Measurement error is everywhere, but that does not mean we can say nothing about descriptive claims.
We need two concepts/variables/measures:
For each one, what are possible kinds of measurement error?
concept: Anti-refugee Violence
variable: Number of attacks against refugee persons and property
measure: (for each week)
concept: Anti-refugee speech on social media
variable: Number of anti-refugee posts on Facebook per week
measure:
Example Facebook posts: [screenshots shown in lecture]
concept: “Exposure to Facebook”: persons who have an active Facebook account
variable: Active Facebook users in a municipality per 10k people.
measure: Followers of Nutella Germany on Facebook (who share their location information) divided by population
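To see what this proxy computes, a toy sketch with invented numbers (the follower counts, populations, and municipality labels below are hypothetical):

```python
# Hypothetical data for two municipalities: Nutella Germany followers
# who share location information, and municipal population.
municipalities = {
    "Municipality A": {"nutella_followers": 42,  "population": 17_000},
    "Municipality B": {"nutella_followers": 980, "population": 550_000},
}

for name, m in municipalities.items():
    # The exposure proxy: followers per 10,000 residents.
    per_10k = m["nutella_followers"] / m["population"] * 10_000
    print(f"{name}: {per_10k:.1f} followers per 10k residents")
```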
In groups, discuss possible sources of measurement error in the following:
What pattern do we see here between anti-refugee Facebook posts and anti-refugee violence over time?
Could the sources of measurement error we discussed alter our conclusions?
Unlike in the hard sciences, measurement error here is not usually a calibration error.
Errors arise from the fact that observations are made by and of people:
(1) Subjectivity/Perspective: the researcher or data collector systematically perceives and evaluates cases incorrectly
Examples:
(2) Motives/Incentives to misrepresent: beyond the researcher, the people generating the data may have reasons to misreport
If we surveyed Canadians and asked them:
“And would you oppose stopping all immigration into Canada?”
Respondents can choose “oppose”, “support”, or “neither support nor oppose”
Do you think this survey response would suffer from measurement bias?
List experiments (worked on the board; see the sketch below)
List experiments in the US vs Canada
How many people are opposed to stopping immigration?
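Since the list-experiment logic is worked on the board, here is a minimal numerical sketch of the idea (made-up data and a made-up 25% prevalence): the control group counts agreement with innocuous items only, the treatment group gets the same items plus the sensitive one, and the difference in mean counts estimates the share holding the sensitive attitude:

```python
import numpy as np

rng = np.random.default_rng(0)

# Control group: count of 3 innocuous items respondents agree with.
control = rng.binomial(3, 0.5, size=500)

# Treatment group: same 3 items plus the sensitive item
# ("stop all immigration"); assume 25% truly hold the attitude.
holds_attitude = rng.binomial(1, 0.25, size=500)
treatment = rng.binomial(3, 0.5, size=500) + holds_attitude

# Difference in mean item counts estimates prevalence of the sensitive
# attitude without any respondent revealing it individually.
estimate = treatment.mean() - control.mean()
print(f"estimated prevalence: {estimate:.3f}")  # roughly 0.25
```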
When discussing crime rates for natural-born citizens, legal immigrants, and undocumented immigrants, we need to count the number of undocumented immigrants.
Why might it be difficult to count them correctly?
(3) Use of data beyond its intended purposes: without knowing how the data are produced, unanticipated errors can arise.
Kennedy et al. argue:
It takes time for undocumented immigrants in custody to be identified.
Only people in custody for longer periods for serious crimes are likely to be thoroughly checked:
Alex Nowrasteh shows that these conclusions came from a misunderstanding of the Texas data:
Kennedy et al. take the sum of undocumented counts from DHS (PEP) and TDCJ:
“We can supply the number uniquely identified by TDCJ (Prison category) and the total number of Illegals identified through PEP (this can include illegals also identified by TDCJ). Please note, if someone was uniquely identified through TDCJ, but at a later time is identified through PEP, the individual would no longer be in the Prison category and would reflect the PEP identification” [emphasis added].
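A stylized illustration of the double-counting problem (hypothetical individuals and counts, not the Texas figures): if the PEP total can include people also identified by TDCJ, summing the two totals counts those people twice:

```python
# Hypothetical individuals, by which system flagged them as undocumented.
prison_only = {"A", "B"}       # uniquely identified by TDCJ (Prison category)
pep = {"B", "C", "D"}          # identified through PEP (can overlap TDCJ)

# Naive sum of the two category totals counts person B twice.
naive_total = len(prison_only) + len(pep)   # 2 + 3 = 5

# The correct count is the size of the union of the two sets.
true_total = len(prison_only | pep)         # {"A","B","C","D"} -> 4

print(naive_total, true_total)  # 5 4
```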
A source of measurement error is anything that affects the values we observe but is unrelated to the actual values for the cases we want to observe.
We need to distinguish between random and systematic errors. Does the source of the error suggest a systematic direction to the error?
Is the magnitude of the error likely to be large or small? Is it possible to assess how wrong it could be?
If the error is a bias, what is the systematic pattern that is produced? (upward? downward?)
| | Systematic/Bias | Random Error |
|---|---|---|
| Pattern | Errors are systematic (deviate from truth, on average) | Errors are random (correspond to truth, on average) |
| When it's OK | If it is UNIFORM across cases and we want relative values | If a false negative is better than a false positive |
| When it's not OK | If it differs across cases or we want absolute values | If we need precision or observe few cases |
| Solved by more data? | No, bias persists. | Yes, random errors “wash out” |
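A quick sketch of the last row (illustrative numbers, not from the lecture): increasing the sample size shrinks the effect of zero-mean random error on an estimated mean, but a systematic +5 bias survives any sample size:

```python
import numpy as np

rng = np.random.default_rng(7)
true_mean = 50.0

for n in (100, 10_000, 1_000_000):
    true_vals = rng.normal(true_mean, 10, size=n)
    biased_obs = true_vals + 5                        # systematic +5 error
    noisy_obs = true_vals + rng.normal(0, 5, size=n)  # zero-mean random error
    print(f"n={n:>9,}: biased mean error={biased_obs.mean() - true_mean:+.3f}, "
          f"noisy mean error={noisy_obs.mean() - true_mean:+.3f}")
# Bias stays near +5 however large n gets; random error shrinks toward 0.
```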