October 16, 2024

Objectives

(1) Measurement Error

  • Measurement error

(2) Sampling Error

  • Sampling Bias vs Random Sampling Error
  • Random Sampling

Measurement Error

Descriptive Claim

Concepts not transparent/systematic \(\xrightarrow{\xcancel{weak \ severity}}\)

Variable does not map onto concept (lack of validity) \(\xrightarrow{\xcancel{weak \ severity}}\) or \(\xrightarrow{\xcancel{pass \ severe \ test}}\)

Procedure does not return the true value (measurement error) \(\xrightarrow{\xcancel{pass \ severe \ test}}\)

\(\xcancel{\to}\) “Answer”

What do we do…

…if there is measurement error?

Measurement error is everywhere. Does not mean we can say nothing about descriptive claims.

  1. Depending on the descriptive claim and what we observe, some types of measurement error are tolerable.
  2. Depending on the direction of the measurement error, we may still be able to evaluate claims
  3. Depending on the magnitude of the measurement error, errors may be irrelevant

If errors, then…

  1. We need to distinguish between random and systematic errors. Does the source of the error suggest a systematic direction to the error?

  2. Does the evidence using this measure appear to support or reject the claim?

  3. If the error is a bias, what is the direction of the bias? (upward?, downward?) Is this bias toward supporting or rejecting the claim?

  4. Is the magnitude of the error likely to be large or small? Is it large enough that it could incorrectly lead us to accept/reject the claim?

Bias: When is it a problem?

If:

  • the evidence appears to support the claim:
  • measurement biased in the direction of the claim (fails weak severity)
  • bias has large magnitude

Then, bias is a problem. Claim could be true or false.

Bias: When is it a problem?

If:

  • the evidence appears to support the claim:
  • measurement biased in the opposite direction of the claim
  • bias has large or small magnitude

Then, bias is not a problem. Procedure stacks the deck against the claim, yet still supports it.

Bias: When is it a problem?

If:

  • the evidence appears to reject the claim:
  • measurement biased in the direction of the claim
  • bias has large or small magnitude

Then, bias is not a problem. Procedure stacks the deck in favor of the claim, yet still rejects it.

Bias: When is it a problem?

If:

  • the evidence appears to reject the claim:
  • measurement biased in the opposite direction of the claim (fails weak severity)
  • bias has large magnitude

Then, bias is a problem. Claim could be true or false.

Random Error: When is it a problem?

If:

  • evidence appears to support/reject a claim
  • random error has large magnitude
  • claim is about/we use data on few cases

Then, random error is a problem.

Random Error: When is it a problem?

If:

  • claim is about patterns/associations
  • evidence appears to reject the claim
  • random error has relatively large magnitude

Then, random error is a problem (biased toward finding no pattern).

Random Error: When is it a problem?

If:

  • evidence appears to support/reject a claim
  • random error has small magnitude

Then, random error is unlikely to be a problem.

Specific Descriptive Claims

What is the type of this case ?

  • e.g., “this person is an undocumented migrant”
  • measurement procedure indicates the person is “undocumented”
  • Bias is tolerable if procedure is biased against labelling them as undocumented.
  • Random error is tolerable if magnitude is small.
  • What if the measurement procedure indicated the person is NOT undocumented?

Specific Descriptive Claims

What is the amount/frequency of some phenomena?

  • e.g., “The homicide rate in Canada is at least 1.94 per 100,000”
  • Measurement procedure says it is 1.95 per 100,000
  • Bias is acceptable if procedure is biased downward
  • Random error is acceptable if magnitude is small (relative to the claim).
  • What if the measurement returned a homicide rate of 1.5 per 100,000?

Specific Descriptive Claims

What is the relative amount/frequency of something across different cases/times?

  • e.g., “The murder rate in the United States has gone down.”
  • Measurement procedure shows a decline between 2022 and 2023
  • Bias is acceptable if the bias is constant over time or across cases.
  • Random error acceptable if small relative to the claim
  • What if the measurement returned no change in murders?
  • Bias is a problem if the pattern of errors is toward the claim
  • Random error a problem because it creates attenuation bias \(\to\) bias to find no change/no pattern

Specific Descriptive Claims

What patterns are there between two different phenomena?

  • e.g., “Higher gun ownership rates associated with higher rates of gun deaths.”
  • Measurement procedure shows a no pattern between gun ownership and gun deaths
  • Bias is acceptable if constant, unrelated across measures, or biased toward finding a positive relationship.
  • Random Error a problem because it creates attenuation bias
  • What if measurement procedure shows a positive association between gun ownership and gun deaths?
  • Bias acceptable if constant, unrelated, or toward negative relatinship
  • Random error acceptable if relatively small.

Specific Descriptive Claims

  • what is the type of a specific phenomenon: bias and random error a problem.
  • amount/frequency of phenomena: bias/random error is a problem.
  • relative amount/frequency of phenomena across different places/times: bias is ok if it is constant. random error is a problem
  • what patterns are there between two different phenomena: bias is ok if it is constant. random error a problem

Sampling Error

Sampling

Sometimes we cannot answer descriptive claims directly

We would have to observe too many cases.

Example:

Ahead of the upcoming Federal elections:

“What proportion of Canadians prefer Justin Trudeau to be the next Prime Minister?”


We can’t interview all Canadians…

Sampling

Survey of \(1009\) Canadian adults (09/11-10/11 2024)

Sampling

Key terms:

population: full set of cases (countries, individuals, etc.) we’re interested in describing

sample: a subset of the population that we observe and measure

inference: description of the (unmeasured) population we make based on the (measured) sample

and there is uncertainty about what is true about the population, because we only measure a sample

Example:

Measuring voting preferences

The population:

  • All Canadian adults (18 years of age or older)

The sample:

  • \(1009\) Canadians interviewed from random digit dialing of land lines and cell lines. Each line tried 5 times, between 5-9 pm on weekdays (12-6 pm on weekends). Weighted according to age, gender, and geographic region, as per 2021 Census.

The inference:

  • 21% (\(\pm 3.1\)%) of Canadians say prefer Trudeau as Prime Minister

sampling error:

The difference between the value of the measure for the sample and the true value of the measure for the population

\[\mathrm{Value}_{sample} - \mathrm{Value}_{population} \neq 0 \xrightarrow{then} \mathrm{sampling \ error}\]

  • Just like measurement error, there are two types: one that is bias and one that is random
  • Sampling error is a kind of measurement error

sampling error:

\(1\). sampling bias: cases in the sample are not representative of the population: not every member of population has equal chance of being in sample. Error is consistently in the same direction.

sampling error:

\(2\). random sampling error: in choosing cases for a sample, by chance, we get samples where the average is too high or too low compared to the population average

  • but these errors would cancel out (if we repeated the sampling procedure).
  • produces the uncertainty of sampling (e.g., margin of error).

sampling error:

To understand random sampling error and sampling bias, it can be useful to understand…

the sampling distribution:

  • the results from all possible samples we could get, using a given sampling procedure.
  • we only ever get the result from the one sample we draw, but can imagine the results could have been different
  • it is not actually knowable, except in simulations

(e.g., the percent of survey respondents who prefer Trudeau in every possible sample of \(1009\) drawn using random digit dialing)

We can visualize a sampling distribution using a histogram and then assess:

  • is the sampling procedure biased?
  • how much random error is there?

Random Sampling

random sampling: sampling cases from the population in a manner that gives all cases an equal probability of being chosen.

This procedure creates samples that:

  • on average, give unbiased inferences about the population (regardless of sample size) (no sampling bias)
    • unbiased in that, across all samples, on average the sample average are the same as the population average
  • has random sampling errors with a known size: produces known uncertainty (described by the field of statistics)

(board: intuitions as to why)

A quick survey: (Answer BOTH questions - you have to click through)

Go here

Or go to menti.com and enter: \(1735 \ 7291\)

Example:

Let’s say we want to understand housing insecurity among students in this course.

The population is students registered in this course

Students in lecture hall, responding to poll are the sample

Example: Contacts

When we take the average rent budget for the sample (people taking poll in class today)…


and use it as our estimate of the average rent budget of the population (all students registered in this course)…


we are making an inference.

Example: Contacts

Was this sample a random sample of the students in the course?


Can you think of any reasons this sample (students in lecture) would suffer from sampling bias?

Example: Contacts

When samples are not random they may suffer from sampling bias and the random errors are of unknown size

  • students attending lecture in person may differ in their rent expenditures from those who do not (e.g., the latter may have extra jobs, have to save money on transit, higher levels of stress)

Random Sampling

Let’s now imagine that the population is students in class today who completed the survey…

To illustrate random sampling error: We can simulate taking random samples of students in class and plot the sampling distribution

See here

histogram = Sampling distribution (the averages of different random samples)

Blue line = Population Mean (true in-class average)

Red line = Sampling Distribution Mean (average of SAMPLE averages)

Random Sampling Error

We want to know: what fraction of Canadian adults prefer Trudeau as PM?

There are ~31 million Canadians over the age of 18: assuming our sample is random, about how many people (\(n\)) do you think we’d have to survey to come up with sample mean and margin of error of \(\pm 1\) points that includes the population mean with a probability of 99%?

  • \(n \approx 16000\). It doesn’t matter how many Canadians there are. The size of the random error in the sampling distribution gets smaller by a factor of \(\sqrt{n}\)

Sampling Error?

If people who prefer Trudeau, but feel social pressure to give a different answer or that they “did not know”…


Is this sampling error? Is this a random error or a bias?

Sampling Error?

Each dot is the result of a survey of voters during the 2020 US Presidential Election. These surveys suggested that by election day voters preferred Biden to Trump by \(8.4\) percent. Biden actually won by only \(4.5\) points.

Is this sampling error? Is this a random error or a bias?

Sampling Error?

It depends: if this is going on, then sampling bias

Sampling Error?

It depends: if this is going on, then sampling bias

Sampling Error?

It depends: if there are “shy” Trump voters, then measurement bias.

Conclusion

Conclusion:

  • Recognize Sampling Bias vs Random Sampling Error
  • Random Sampling a solution for both
  • Distinguish between sampling bias and measurement bias

Next class:

  • Wrapping up descriptive claims
  • More on