2025-09-08
Last lecture we discussed:
Volunteer bias: volunteers for a study are likely to differ, on average, from the population. Also known as “self-selection” bias.
For example:
A sample of convenience is a collection of individuals that happen to be available at the time.
Other definitions¹:
A convenience sample is the one that is drawn from a source that is conveniently accessible to the researcher.
A purposive sample is the one whose characteristics are defined for a purpose that is relevant to the study.
Problems:
This does not invalidate the studies in question. They can have high “internal validity”. The issue is with generalizing, i.e., their external validity.
Sampling error: Chance deviations between estimates and the truth.
Even when you did NOTHING wrong.
Sampling error is the difference between an estimate and the true value of the parameter, and it can be quantified!
Because an estimate is a random variable, the value of an estimate is influenced by chance.
Therefore estimates will differ among random samples from the same population.
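A minimal sketch of this idea, written in Python with only the standard library. The population (10,000 simulated body lengths), the sample size of 25, and the helper name `sample_mean` are all made up for illustration. It draws several random samples from the same population and shows that each one gives a different estimate of the mean, even though nothing was done wrong.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 body lengths (mm), invented for illustration.
population = [random.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)  # the parameter we want to estimate

def sample_mean(n):
    """Mean of one simple random sample of size n (drawn without replacement)."""
    return statistics.mean(random.sample(population, n))

# Five random samples of the same size give five different estimates.
estimates = [sample_mean(25) for _ in range(5)]
sampling_errors = [est - true_mean for est in estimates]

for est, err in zip(estimates, sampling_errors):
    print(f"estimate = {est:6.2f}   sampling error = {err:+.2f}")
```

Each printed sampling error is the chance deviation of one estimate from the true mean. In a real study the true mean is unknown, so the sampling error of a single estimate cannot be computed directly like this.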
Sampling bias affects accuracy (on average gets the correct answer)
Sampling error affects precision (gives a similar answer repeatedly)
Image: Wikimedia Commons (public domain).
In this figure, the “X” is the population parameter. The circles are different estimates calculated from different samples taken from that population.
Figure made in R with code borrowed from Y. Brandvain
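The difference between accuracy and precision can also be sketched numerically. The Python example below is not the R code used for the figure. It uses an invented population and an assumed biased sampling scheme in which only the larger half of the population can be sampled. For each scheme it repeats the sampling many times and reports the bias (how far the estimates are from the parameter on average) and the spread (how much the estimates scatter).

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical population, invented for illustration.
population = [random.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)  # plays the role of the "X" in the figure

# Assumed biased scheme: only the larger half of the population can be sampled.
upper_half = sorted(population)[5_000:]

def repeated_estimates(source, n=20, reps=1_000):
    """Sample means from `reps` random samples of size n drawn from `source`."""
    return [statistics.mean(random.sample(source, n)) for _ in range(reps)]

results = {}
for label, source in [("random sample", population), ("biased sample", upper_half)]:
    estimates = repeated_estimates(source)
    bias = statistics.mean(estimates) - true_mean  # accuracy: average miss
    spread = statistics.stdev(estimates)           # precision: scatter of estimates
    results[label] = (bias, spread)
    print(f"{label}: bias = {bias:+.2f}, spread = {spread:.2f}")
```

Under these assumptions the random scheme has a bias close to zero, while the biased scheme misses the true mean by a wide margin on average. The biased estimates can still be tightly clustered, which is the "precise but inaccurate" case.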
Independent selection of individuals
Random selection of individuals
Sufficiently large
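As a rough illustration of random selection and of why sample size matters, here is a Python sketch under stated assumptions: the sampling frame of 5,000 individual IDs, the measurements, and the helper names `draw_random_sample` and `spread_of_estimates` are hypothetical. `random.sample` draws without replacement and gives every individual in the frame the same chance of being chosen.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: one ID per individual, with an invented measurement.
frame = {f"ID{i:04d}": random.gauss(50, 10) for i in range(5_000)}
all_ids = list(frame)

def draw_random_sample(n):
    """Pick n distinct individuals at random and return (ids, measurements)."""
    ids = random.sample(all_ids, n)  # each individual is equally likely to be chosen
    return ids, [frame[i] for i in ids]

def spread_of_estimates(n, reps=500):
    """Standard deviation of the sample mean across repeated random samples."""
    means = [statistics.mean(draw_random_sample(n)[1]) for _ in range(reps)]
    return statistics.stdev(means)

ids, values = draw_random_sample(10)
print("sampled IDs:", ids)

spread_small = spread_of_estimates(10)
spread_large = spread_of_estimates(100)
print(f"spread of estimates, n = 10:  {spread_small:.2f}")
print(f"spread of estimates, n = 100: {spread_large:.2f}")
```

The estimates from the larger samples scatter less around the true mean, which is one sense in which a sample needs to be sufficiently large.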
Taking random samples is hard and requires effort
B21: Biostatistics with R