Selection Bias: Definition, Types, and Examples in Research
Selection bias occurs when the individuals included in a study are not representative of the population the researcher intends to study, which limits or invalidates generalization of the findings beyond the sample. It is one of the most common threats to both internal and external validity in empirical research, affecting fields from clinical trials to social science surveys.
Definition and overview
Selection bias is a systematic error arising from the method by which study participants are chosen. When the selection process is not random — or when non-random dropout occurs after selection — the resulting sample differs from the target population in ways that distort the study's findings. The key feature is systematicity: the bias is not random noise but a consistent distortion in one direction.
Selection bias threatens both internal validity (whether the study correctly identifies a causal relationship) and external validity (whether findings generalize to people outside the study). A treatment that appears effective in a highly selected clinical trial sample may have no effect on, or may even harm, a broader population.
Types of selection bias
Self-selection bias
Participants choose to enroll in or withdraw from a study based on characteristics that are related to the outcome. People who volunteer for a health intervention trial may already be more motivated and health-conscious than non-volunteers, inflating the apparent effectiveness of the intervention.
Attrition bias (differential dropout)
Participants who drop out of a longitudinal study differ systematically from those who remain. If sicker patients withdraw from a clinical trial because of side effects, the remaining sample will appear healthier than the full enrolled group, making the treatment look more effective than it is.
Berkson's bias
A hospital-based or clinic-based sample over-represents people who are ill enough to seek care. Comparing two diseases using hospital records can produce a spurious association between them, even when they are unrelated in the general population, because both conditions independently increase the probability of hospitalization.
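The mechanism can be illustrated with a small simulation. This is a minimal sketch, not an epidemiological model: the function names, prevalences, and admission probabilities are all made-up assumptions. Two diseases are generated independently, and each one separately raises the chance of hospital admission.

```python
import random

def simulate_berkson(n=200_000, p_a=0.10, p_b=0.10,
                     admit_base=0.02, admit_a=0.50, admit_b=0.50, seed=0):
    """Return (population, hospital) lists of (has_a, has_b) pairs.

    All probabilities are illustrative assumptions, not real estimates.
    """
    rng = random.Random(seed)
    population, hospital = [], []
    for _ in range(n):
        has_a = rng.random() < p_a   # disease A, generated independently of B
        has_b = rng.random() < p_b   # disease B, generated independently of A
        # Each cause (background, A, B) can independently trigger admission.
        p_not_admitted = 1.0 - admit_base
        if has_a:
            p_not_admitted *= 1.0 - admit_a
        if has_b:
            p_not_admitted *= 1.0 - admit_b
        population.append((has_a, has_b))
        if rng.random() > p_not_admitted:
            hospital.append((has_a, has_b))
    return population, hospital

def odds_ratio(pairs):
    """Odds ratio between A and B in a list of (has_a, has_b) pairs."""
    both = sum(1 for a, b in pairs if a and b)
    only_a = sum(1 for a, b in pairs if a and not b)
    only_b = sum(1 for a, b in pairs if b and not a)
    neither = sum(1 for a, b in pairs if not a and not b)
    return (both * neither) / (only_a * only_b)

population, hospital = simulate_berkson()
print(f"population odds ratio: {odds_ratio(population):.2f}")  # near 1: independent
print(f"hospital odds ratio:   {odds_ratio(hospital):.2f}")    # departs from 1
```

In the full population the odds ratio stays near 1, while in the hospital-only sample it moves well away from 1. The direction and size of the distortion depend on the assumed admission probabilities, so the numbers here should not be read as typical.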
Healthy worker effect
Occupational studies comparing workers to the general population are biased because employed people are healthier on average than the broader population (which includes the chronically ill and disabled). This makes workplace exposures appear less harmful than they are.
Survivorship bias
Only participants who "survived" to the point of measurement are included, omitting those who dropped out, died, or otherwise did not persist — often for reasons related to the outcome of interest.
How it occurs in research
Selection bias can enter a study at multiple stages:
- Recruitment: Using convenience samples (e.g., university students, online volunteers) that differ from the intended population in age, education, health status, or motivation.
- Eligibility criteria: Overly narrow inclusion/exclusion criteria that produce a sample too homogeneous to represent real-world patient populations.
- Consent and enrollment: Systematically lower participation among certain groups (e.g., people with lower health literacy or greater mistrust of research institutions).
- Retention: Higher dropout rates in one study arm because of side effects, inconvenience, or lack of perceived benefit.
Real examples from research
The Literary Digest presidential poll (1936)
The Literary Digest mailed 10 million survey ballots to predict the U.S. presidential election, drawing its mailing list from telephone directories and automobile registrations. In 1936, these lists over-represented wealthy, Republican-leaning Americans. The poll predicted a landslide for Alf Landon; instead, Franklin Roosevelt won in a landslide, carrying 46 of the 48 states. The massive sample size did not compensate for the systematic selection error.
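The point that sample size cannot repair a biased sampling frame can be checked with a toy simulation. The sketch below uses a hypothetical electorate; the frame coverage (30%) and the support rates are illustrative assumptions, not historical estimates.

```python
import random

rng = random.Random(1936)

# Hypothetical electorate: 30% of voters appear on the sampling frame
# (think phone and car owners) and support the incumbent at a lower rate.
POP_SIZE = 500_000
population = []
for _ in range(POP_SIZE):
    on_frame = rng.random() < 0.30
    p_support = 0.40 if on_frame else 0.70   # assumed support rates
    population.append((on_frame, rng.random() < p_support))

true_share = sum(supports for _, supports in population) / POP_SIZE

# Very large sample, but drawn only from people on the frame.
frame = [supports for on_frame, supports in population if on_frame]
biased_sample = rng.sample(frame, 100_000)
biased_estimate = sum(biased_sample) / len(biased_sample)

# Small simple random sample drawn from the whole electorate.
random_sample = rng.sample(population, 1_000)
random_estimate = sum(s for _, s in random_sample) / len(random_sample)

print(f"true share:            {true_share:.3f}")
print(f"biased, n = 100,000:   {biased_estimate:.3f}")
print(f"random, n = 1,000:     {random_estimate:.3f}")
```

Under these assumptions the 1,000-person random sample lands much closer to the true share than the 100,000-person frame-only sample, because extra observations shrink random error but leave the systematic error untouched.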
Behavioral research and the "WEIRD" problem
Henrich, Heine, and Norenzayan (2010) documented that psychological and behavioral research disproportionately samples from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies. Because most participants in published studies are university students from high-income countries, many findings do not generalize globally — a profound selection bias baked into the discipline.
Hormone replacement therapy observational studies
Observational studies in the 1990s appeared to show that hormone replacement therapy (HRT) reduced cardiovascular disease risk in postmenopausal women. However, women who chose HRT tended to be more health-conscious and to have better baseline cardiovascular health than non-users, a classic self-selection bias. When the Women's Health Initiative randomized controlled trial reported its results in 2002, HRT was instead found to increase cardiovascular risk in older women.
How to detect selection bias
- Compare enrolled vs. eligible non-participants: If recruitment records allow, compare baseline characteristics of those who enrolled with those who declined or were excluded.
- Analyze dropout patterns: In longitudinal studies, compare the baseline characteristics of completers and non-completers. If the two groups differ systematically on variables related to the outcome, attrition bias is likely.
- Sensitivity analyses: Test how results change under different assumptions about the missing or excluded participants (e.g., worst-case scenarios).
- Check representativeness: Compare the sample's key demographics (age, sex, comorbidities, socioeconomic status) to population-level data.
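One common way to make these group comparisons concrete is the standardized mean difference (SMD), which expresses the gap between two groups in units of their pooled standard deviation. The sketch below is a minimal illustration; the function name and the baseline ages are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def standardized_mean_difference(group_a, group_b):
    """(mean_a - mean_b) divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd

# Hypothetical baseline ages, for illustration only.
completers_age = [34, 41, 38, 45, 39, 42]
dropouts_age = [52, 47, 58, 49, 55, 50]

smd = standardized_mean_difference(completers_age, dropouts_age)
print(f"SMD for baseline age (completers vs. dropouts): {smd:.2f}")
```

An absolute SMD above roughly 0.1 is often treated as a sign of meaningful imbalance, though that threshold is a rule of thumb rather than a formal test. The same function could be applied to enrolled versus non-enrolled groups or to sample versus population benchmarks when individual-level data are available.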
How to minimize selection bias
Random sampling
Probability-based sampling — where every member of the target population has a known, non-zero chance of selection — is the gold standard for representativeness. Simple random sampling, stratified sampling, and cluster sampling all provide this property when properly implemented.
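As a rough illustration of one of these designs, the sketch below draws a proportional stratified sample: the same fraction is selected at random within every stratum, so each unit's selection probability is known in advance. The function name, the `stratum_of` parameter, and the urban/rural frame are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # At least one unit per stratum; this raises the effective
        # sampling fraction for very small strata.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical sampling frame: 80 urban and 20 rural residents.
frame = [("urban", i) for i in range(80)] + [("rural", i) for i in range(20)]
sample = stratified_sample(frame, stratum_of=lambda unit: unit[0], fraction=0.10)

urban_count = sum(1 for stratum, _ in sample if stratum == "urban")
rural_count = sum(1 for stratum, _ in sample if stratum == "rural")
print(urban_count, rural_count)  # 8 2
```

Because the stratum sizes are fixed by design, the sample cannot drift toward one group by chance, which is the main advantage over simple random sampling when strata differ on the outcome.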
Randomized controlled trials
In experimental research, random assignment to conditions (as distinct from random sampling from the population) makes the comparison groups alike in expectation on both measured and unmeasured characteristics. This removes selection bias from the between-group comparison, even if the sample itself is not fully representative of the broader population.
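The contrast between self-selected and randomly assigned treatment can be sketched with a simulation in which the treatment has no effect at all. The model below is a deliberately simple assumption: a single hypothetical "health" variable drives both the outcome and the decision to opt in.

```python
import random
from statistics import mean

def difference_in_means(outcomes, treated):
    """Mean outcome of the treated minus mean outcome of the untreated."""
    treated_outcomes = [y for y, z in zip(outcomes, treated) if z]
    control_outcomes = [y for y, z in zip(outcomes, treated) if not z]
    return mean(treated_outcomes) - mean(control_outcomes)

rng = random.Random(42)
N = 50_000

# Assumed model: baseline health drives the outcome; the treatment does nothing.
health = [rng.gauss(0, 1) for _ in range(N)]
outcome = [h + rng.gauss(0, 1) for h in health]

# Self-selection: healthier people are more likely to opt in.
self_selected = [h + rng.gauss(0, 1) > 0 for h in health]
# Random assignment: a coin flip unrelated to health.
randomized = [rng.random() < 0.5 for _ in range(N)]

self_selected_diff = difference_in_means(outcome, self_selected)
randomized_diff = difference_in_means(outcome, randomized)
print(f"self-selected comparison: {self_selected_diff:.2f}")
print(f"randomized comparison:    {randomized_diff:.2f}")
```

Under these assumptions the self-selected comparison shows a large apparent benefit, while the randomized comparison stays near the true effect of zero, since the coin flip cannot be correlated with baseline health except by chance.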
Intention-to-treat analysis
Analyzing all participants as assigned — including dropouts — regardless of whether they completed the intervention, prevents attrition from selectively removing the sickest or most vulnerable participants from the analysis.
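A small worked example shows how the two analyses can diverge. The records below are made up for illustration, and the sketch assumes outcome scores were still collected for participants who stopped the intervention.

```python
from statistics import mean

# Made-up trial records: (arm, completed_intervention, outcome_score).
# Outcomes are assumed to be available for everyone, dropouts included.
records = [
    ("treatment", True, 5), ("treatment", True, 6), ("treatment", True, 7),
    ("treatment", False, 2), ("treatment", False, 3),
    ("control", True, 4), ("control", True, 5), ("control", True, 6),
    ("control", True, 3), ("control", True, 4),
]

def arm_mean(rows, arm):
    """Mean outcome score among rows assigned to the given arm."""
    return mean(score for assigned, _, score in rows if assigned == arm)

def effect(rows):
    """Treatment-arm mean minus control-arm mean."""
    return arm_mean(rows, "treatment") - arm_mean(rows, "control")

itt_effect = effect(records)                              # everyone, as assigned
completers_effect = effect([r for r in records if r[1]])  # completers only

print(f"intention-to-treat effect: {itt_effect:.1f}")     # 0.2
print(f"completers-only effect:    {completers_effect:.1f}")  # 1.6
```

In this toy data the two lowest-scoring treatment participants dropped out, so excluding them inflates the apparent effect from 0.2 to 1.6. Keeping them in their assigned arm preserves the comparability that randomization created.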
Multiple imputation for missing data
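Statistical techniques that impute plausible values for missing data using observed characteristics reduce the impact of dropout-related selection bias, provided the missingness can plausibly be explained by the observed variables (the "missing at random" assumption). If dropout depends on unobserved values, such as the outcome itself, imputation alone will not remove the bias.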
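After the same analysis has been run on each imputed dataset, the results are usually combined with Rubin's rules. The sketch below covers only that pooling step and assumes the imputation and per-dataset analyses were done elsewhere; the function name and input numbers are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and their squared standard errors.

    estimates: one point estimate per imputed dataset
    variances: the matching sampling variances (squared standard errors)
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # spread of estimates across imputations
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance

# Hypothetical treatment-effect estimates from m = 3 imputed datasets.
estimate, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(f"pooled estimate: {estimate:.2f}, total variance: {total_var:.2f}")
```

The between-imputation term is what separates multiple imputation from filling in a single value: it widens the uncertainty to reflect that the missing values are not actually known.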
Mitigation strategies by bias type
| Type | Mechanism | Mitigation |
|---|---|---|
| Self-selection | Volunteers differ from non-volunteers | Random sampling; incentivize broad participation |
| Attrition | Dropouts differ from completers | ITT analysis; multiple imputation |
| Berkson's | Hospital sample over-represents disease severity | Population-based cohort designs |
| Healthy worker | Workers healthier than general population | Internal comparison groups within same workplace |
Quick summary
| Feature | Detail |
|---|---|
| Definition | Systematic distortion from non-representative participant selection |
| Affects | Internal validity and external validity |
| Key types | Self-selection, attrition, Berkson's, healthy worker, survivorship |
| Detection | Compare enrolled vs. non-enrolled; dropout analysis |
| Primary mitigations | Random sampling, RCTs, ITT analysis, multiple imputation |