February 6, 2019

- What is sampling
- Key terms related to sampling: population, sample, inference, sampling error, random sampling

**"Most Americans prefer a ban on semi-automatic firearms."**

We can't interview **all Americans**…

**population**: full set of cases (countries, individuals, etc.) we're interested in describing

**sample**: subset of the population that we observe and measure

**inference**: description of the population we make *based on a sample*, which must be, by its nature, uncertain

Measuring attitudes on gun control in the US:

The **population**:

- All adults in the US

The **sample**:

- 1500 people chosen **at random**

The **inference**:

- \(57\%\) of Americans want a ban on semi-automatic weapons, with some **uncertainty** due to sampling: in this case, the 95% confidence interval is \((54.5\%, 59.5\%)\). Intervals constructed this way cover the true % in 95% of samples.
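The interval above matches the standard normal-approximation (Wald) 95% interval for a proportion, \(\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}\). The slide does not say how the interval was computed, so this minimal Python sketch assumes that formula:

```python
import math


def proportion_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a sample proportion.

    Assumes a simple random sample; z = 1.96 gives a 95% interval.
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the sample proportion
    margin = z * se                           # margin of error
    return p_hat - margin, p_hat + margin


# 57% support in a random sample of 1500 people
low, high = proportion_ci(0.57, 1500)
print(f"({low:.1%}, {high:.1%})")  # prints (54.5%, 59.5%)
```

The margin of error is about ±2.5 percentage points. Because the standard error shrinks with \(\sqrt{n}\), quadrupling the sample size would roughly halve it.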

Samples can lead us to make **incorrect** inferences

**sampling error**: the difference between the value of the measure for the sample and the true value of the measure for the population

\[Value_{Population} - Value_{Sample} \neq 0 \xrightarrow{then} sampling \ error\]

**sampling bias**: samples generated by a sampling procedure deviate **systematically** from the population. This results from respondents/cases joining the sample with **unequal probability**.

- e.g., certain respondents/cases choosing to be in/out of sample (who responds to surveys?)
- e.g., researcher selects cases out of convenience/observed traits
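To see why unequal probabilities of joining the sample produce **systematic** deviation, here is a small Python simulation. Everything in it is hypothetical: a population split exactly 50/50 on a policy, and assumed response rates of 30% for supporters and 20% for opponents. Because supporters are more likely to end up in the sample, the estimates cluster around 0.6 rather than 0.5, and repeating the procedure does not pull them back toward the truth.

```python
import random

random.seed(1)

# Hypothetical population: 100,000 people, exactly half support a policy (1 = support).
population = [1] * 50_000 + [0] * 50_000
true_share = sum(population) / len(population)  # 0.5


def self_selected_sample(pop, p_respond_support=0.30, p_respond_oppose=0.20):
    """Each person joins the sample with a probability that depends on their view.

    The response rates are hypothetical; the point is that cases join
    the sample with UNEQUAL probability.
    """
    return [x for x in pop
            if random.random() < (p_respond_support if x == 1 else p_respond_oppose)]


estimates = []
for _ in range(20):  # repeat the biased procedure several times
    sample = self_selected_sample(population)
    estimates.append(sum(sample) / len(sample))

avg_estimate = sum(estimates) / len(estimates)
print(f"true share: {true_share:.3f}, average biased estimate: {avg_estimate:.3f}")
```

Averaging over repetitions does not help here: the error comes from the procedure itself, not from chance.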

**random sampling error**: samples generated by sampling procedure deviate from population **by chance**.

- e.g., by chance, a sample ends up with too many cases with high values or with low values (e.g., too many people with long, or short, commutes).
- because errors occur **by chance**, they cancel out and are **unbiased** in aggregate.
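By contrast, random sampling errors have no systematic direction. This Python sketch uses a hypothetical population of 10,000 commute times and draws many simple random samples of 50 with `random.sample`, so every case has the same chance of selection. It computes the sampling error \(Value_{Population} - Value_{Sample}\) for each sample: individual errors can be several minutes, but they average out close to zero.

```python
import random

random.seed(2)

# Hypothetical population of 10,000 commute times, in minutes.
population = [random.uniform(5, 90) for _ in range(10_000)]
pop_mean = sum(population) / len(population)

errors = []
for _ in range(2_000):
    sample = random.sample(population, 50)  # simple random sample, without replacement
    sample_mean = sum(sample) / len(sample)
    errors.append(pop_mean - sample_mean)   # sampling error for this one sample

largest_error = max(abs(e) for e in errors)
average_error = sum(errors) / len(errors)
print(f"largest single error: {largest_error:.2f} minutes")
print(f"average error across samples: {average_error:.2f} minutes")
```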

For **sampling** to let us draw valid **inferences** about the **population** of interest, we need to use a sampling procedure that avoids **sampling bias**:

**random sampling**: sampling cases from the population in a manner that gives **all cases** an **equal probability** of being chosen

- the average from a random sample permits **unbiased** inferences about the population average (regardless of sample size)
- **unbiased** means that, on average across repeated samples, the sample mean equals the population mean, not that it is exactly correct every time
- random sampling still permits **random sampling errors**, but because sampling is **random**, we can **describe** the chance of getting errors of different sizes (**known uncertainty**)
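"Known uncertainty" can be made concrete. For a simple random sample of size \(n\), the standard error of the sample mean is roughly \(\sigma/\sqrt{n}\), and about 95% of samples land within \(1.96\) standard errors of the population mean. The Python simulation below checks this on a hypothetical, normally distributed population; the population, its parameters, and \(n = 100\) are all assumptions for illustration.

```python
import math
import random

random.seed(3)

# Hypothetical population: 20,000 values (e.g. commute times in minutes).
population = [random.gauss(40, 15) for _ in range(20_000)]
N = len(population)
mu = sum(population) / N
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / N)  # population SD

n = 100
se = sigma / math.sqrt(n)  # standard error of the sample mean (ignores the tiny finite-population correction)

trials = 5_000
within = 0
for _ in range(trials):
    sample = random.sample(population, n)
    if abs(mu - sum(sample) / n) <= 1.96 * se:  # error no larger than 1.96 SE?
        within += 1

share_within = within / trials
print(f"share of samples with error within 1.96 SE: {share_within:.3f}")  # close to 0.95
```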

You don't need to know this, but:

If we wanted to know: what is the average commuting time to UBC for students in this course?

**population**: all students in this class

**sample**: students in this class who are present during the last 2 minutes of Friday's lecture

If we wanted to know: what is the average GPA for students in this course?

**population**: all students in this class

**sample**: students in this class who are present in class next Friday (after the midterm, right before midterm break)

**Measurement Error**:

- Incorrectly describe the world because you **incorrectly** observe values for the **case(s)** you study

\[Value_{Case \ Truth} - Value_{Case \ Obs.} \neq 0 \xrightarrow{then} measurement \ error\]

**Sampling Error**:

- Incorrectly describe the world because you sample **cases that are different** from the population you want to learn about

\[Value_{Population} - Value_{Sample} \neq 0 \xrightarrow{then} sampling \ error\]

**Sampling error** is **measurement error** when you are evaluating descriptive claims **about the population** that you sample.

- "Based on our sample, we estimate that 3.5% of non-citizens in the United States voted in 2010, implying 679,000 illegal votes."

**Sampling error** is not **measurement error** when you are evaluating claims **about the cases** that you sample.

- "In our sample, Democrats prefer stricter gun control far more than Republicans"
- "Canvassing for transgender rights increased tolerance of transgender persons for survey respondents in our experiment"

In addition to winning the Electoral College in a landslide, I won the popular vote if you deduct the millions of people who voted illegally

Donald J. Trump (@realDonaldTrump), November 27, 2016

**Some researchers (Richman et al.) have argued there is evidence for Trump's claims:**

- They use a sample from a large survey.
- They select as their sample of "non-citizens" the respondents who indicate they are non-citizens (\(489\)).
- They then count who among those "non-citizens" voted (\(13\)).
- They conclude (with additional work) that 3.5% of non-citizens voted in 2010, and up to 14.7% in 2008 (~2.8 million people).

The political scientists who oversee the survey point out:

- The citizenship question suffers from (low) measurement error.
- Out of \(18878\) people surveyed in both 2010 and 2012, \(99.7\%\) gave the same answer on citizenship, \(0.19\%\) went from "non-citizen" to "citizen" (maybe true), and \(0.11\%\) went from **"citizen" to "non-citizen" (definitely false)**.
- This question generates **measurement error** (misclassification of who is a citizen) for about \(0.1\%\) of people.
- Because there are so many more citizens than non-citizens, even this small error rate produces many **citizens** who are **misclassified** as **"non-citizens"**, relative to the number of true non-citizens.
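Why does a 0.1% error rate matter so much? It is a base-rate problem. The Python sketch below uses hypothetical round numbers (not the survey's actual figures) and assumes, for simplicity, that only citizens are ever misclassified:

```python
def share_misclassified(n_respondents: int, true_noncitizen_share: float,
                        citizen_error_rate: float) -> float:
    """Share of self-reported "non-citizens" who are actually citizens.

    Simplifying assumption: non-citizens always answer correctly;
    only citizens are misclassified.
    """
    citizens = n_respondents * (1 - true_noncitizen_share)
    noncitizens = n_respondents * true_noncitizen_share
    false_noncitizens = citizens * citizen_error_rate  # citizens who report "non-citizen"
    return false_noncitizens / (false_noncitizens + noncitizens)


# Hypothetical inputs: 10,000 respondents, 1% truly non-citizens, 0.1% of citizens misreport.
share = share_misclassified(10_000, 0.01, 0.001)
print(f"{share:.1%} of reported non-citizens are citizens")  # prints 9.0% ...
```

With these inputs, about 9% of self-reported non-citizens are actually citizens, even though only 0.1% of citizens answered incorrectly. Halving the true non-citizen share to 0.5% raises the contamination to about 17%: the rarer the group, the more a small error rate distorts it.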

**Measurement error** in classifying individuals as citizens/non-citizens leads to a sample of "non-citizens" that includes citizens (measurement error suggests about \(18\%\) of "non-citizens" are citizens).

- We have **sampling error**: the sample does not reflect the population Richman et al./Trump want to make inferences about.
- It could be that the "non-citizen" voting is driven by misclassified citizens.
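A second hypothetical sketch shows how misclassified citizens alone can create apparent "non-citizen" voting. It assumes, purely for illustration, a "non-citizen" sample made up of 100 true non-citizens who never vote plus 20 misclassified citizens who vote at a 50% rate; none of these numbers come from the survey.

```python
def apparent_turnout(true_noncitizens: float, misclassified_citizens: float,
                     noncitizen_turnout: float, citizen_turnout: float) -> float:
    """Turnout measured among everyone in the (contaminated) "non-citizen" sample."""
    voters = (true_noncitizens * noncitizen_turnout
              + misclassified_citizens * citizen_turnout)
    return voters / (true_noncitizens + misclassified_citizens)


# Hypothetical: true non-citizens never vote; misclassified citizens vote at 50%.
rate = apparent_turnout(100, 20, 0.0, 0.5)
print(f"apparent 'non-citizen' turnout: {rate:.1%}")  # prints 8.3%
```

True non-citizen turnout is zero by construction, yet the measured rate is about 8.3%: every "non-citizen" voter in this sample is a misclassified citizen.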

- Richman et al./Trump want to make claims about the **population** of all non-citizens. They use a **sample**.
- The **sample** is generated from individual survey responses to a question that is **wrong** for \(0.1\%\) of individuals (**measurement error**).
- This generates **sampling bias**, as the sample systematically includes **citizens** who are treated as representative of "non-citizens".
- Because Richman et al./Trump are using the sample to support claims about the **population**, the **sampling bias** counts as **measurement error**, in particular **bias**.