Research Methods · May 9, 2026

By CiteGenie Editorial

Selection Bias: Definition, Types, and Examples in Research

Selection bias occurs when the individuals included in a study are not representative of the population the researcher intends to study, which limits how far findings can be generalized beyond the sample and can distort the estimated effects themselves. It is one of the most common threats to internal and external validity in empirical research, affecting fields from clinical trials to social science surveys.

Definition and overview

Selection bias is a systematic error arising from the method by which study participants are chosen. When the selection process is not random — or when non-random dropout occurs after selection — the resulting sample differs from the target population in ways that distort the study's findings. The key feature is systematicity: the bias is not random noise but a consistent distortion in one direction.

Selection bias threatens both internal validity (whether the study correctly identifies a causal relationship) and external validity (whether findings generalize to people outside the study). A treatment that appears effective in a highly selected clinical trial sample may have no effect on, or may even harm, a broader population.

Types of selection bias

Self-selection bias

Participants choose to enroll in or withdraw from a study based on characteristics that are related to the outcome. People who volunteer for a health intervention trial may already be more motivated and health-conscious than non-volunteers, inflating the apparent effectiveness of the intervention.

Attrition bias (differential dropout)

Participants who drop out of a longitudinal study differ systematically from those who remain. If sicker patients withdraw from a clinical trial because of side effects, the remaining sample will appear healthier than the full enrolled group, making the treatment look more effective than it is.

Berkson's bias

A hospital-based or clinic-based sample over-represents people who are ill enough to seek care. Comparing two diseases using hospital records can create a spurious association between them, because each condition independently increases the probability of hospitalization and so patients with both are over-represented relative to the general population.
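The mechanism can be illustrated with a small simulation. This is a minimal sketch, not a model of any real hospital: the prevalence and admission probabilities are invented, and the function names are hypothetical. Two conditions are generated independently, and each one separately raises the chance of admission.

```python
import random


def simulate_admissions(n=200_000, p_a=0.10, p_b=0.10,
                        p_admit_a=0.30, p_admit_b=0.30,
                        p_admit_other=0.01, seed=0):
    """Return (population, hospital) lists of (has_a, has_b) pairs.

    Conditions A and B are generated independently, so any association
    seen among admitted patients is produced by selection alone.
    All probabilities are illustrative assumptions.
    """
    rng = random.Random(seed)
    population, hospital = [], []
    for _ in range(n):
        has_a = rng.random() < p_a
        has_b = rng.random() < p_b
        # Each cause of admission acts independently of the others.
        p_not_admitted = 1 - p_admit_other
        if has_a:
            p_not_admitted *= 1 - p_admit_a
        if has_b:
            p_not_admitted *= 1 - p_admit_b
        population.append((has_a, has_b))
        if rng.random() >= p_not_admitted:
            hospital.append((has_a, has_b))
    return population, hospital


def odds_ratio(pairs):
    """Odds ratio for A versus B from a list of (has_a, has_b) pairs."""
    both = sum(1 for a, b in pairs if a and b)
    only_a = sum(1 for a, b in pairs if a and not b)
    only_b = sum(1 for a, b in pairs if b and not a)
    neither = sum(1 for a, b in pairs if not a and not b)
    return (both * neither) / (only_a * only_b)


def share_with_both(pairs):
    """Fraction of people in the list who have both conditions."""
    return sum(1 for a, b in pairs if a and b) / len(pairs)


population, hospital = simulate_admissions()
print(f"Share with both, population: {share_with_both(population):.3f}")
print(f"Share with both, hospital:   {share_with_both(hospital):.3f}")
print(f"Odds ratio, population:      {odds_ratio(population):.2f}")
print(f"Odds ratio, hospital:        {odds_ratio(hospital):.2f}")
```

Under these assumptions, patients with both conditions make up a larger share of the hospital sample than of the population, while the odds ratio among admitted patients moves away from 1 even though the conditions are unrelated. Different admission patterns would change the size and direction of the distortion.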

Healthy worker effect

Occupational studies comparing workers to the general population are biased because employed people are healthier on average than the broader population (which includes the chronically ill and disabled). This makes workplace exposures appear less harmful than they are.

Survivorship bias

Only participants who "survived" to the point of measurement are included, omitting those who dropped out, died, or otherwise did not persist — often for reasons related to the outcome of interest.

How it occurs in research

Selection bias can enter a study at multiple stages:

Recruitment: Using convenience samples (e.g., university students, online volunteers) that differ from the intended population in age, education, health status, or motivation.
Eligibility criteria: Overly narrow inclusion/exclusion criteria that produce a sample too homogeneous to represent real-world patient populations.
Consent and enrollment: Systematic refusal to participate by certain demographic groups (e.g., lower health literacy, mistrust of research institutions).
Retention: Higher dropout rates in one study arm because of side effects, inconvenience, or lack of perceived benefit.

Real examples from research

The Literary Digest presidential poll (1936)

The Literary Digest mailed about 10 million survey ballots to predict the U.S. presidential election, drawing its mailing list from telephone directories and automobile registrations. In 1936, these lists over-represented wealthy, Republican-leaning Americans. The poll predicted a landslide for Alf Landon; Franklin Roosevelt instead won in a landslide, carrying 46 of the 48 states. The massive sample size did not compensate for the systematic selection error.
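The point that sample size cannot repair a biased sampling frame can be sketched in a few lines. The electorate below is entirely invented (the share of "listed" voters and the support rates are assumptions, not historical data); it only illustrates a very large sample drawn from a skewed list next to a small sample drawn from everyone.

```python
import random


def build_electorate(n=500_000, p_listed=0.25,
                     support_listed=0.40, support_unlisted=0.667, seed=1):
    """Return a list of (is_listed, supports_candidate) voters.

    'Listed' stands in for appearing in a phone or car registry.
    All rates are illustrative assumptions, not 1936 figures.
    """
    rng = random.Random(seed)
    voters = []
    for _ in range(n):
        listed = rng.random() < p_listed
        p_support = support_listed if listed else support_unlisted
        voters.append((listed, rng.random() < p_support))
    return voters


def support_share(voters):
    """Fraction of the given voters who support the candidate."""
    return sum(1 for _, supports in voters if supports) / len(voters)


voters = build_electorate()
rng = random.Random(2)

# Very large sample, but drawn only from the listed (skewed) frame.
listed_voters = [v for v in voters if v[0]]
big_biased = rng.sample(listed_voters, 100_000)

# Much smaller sample, drawn at random from the whole electorate.
small_random = rng.sample(voters, 2_000)

true_share = support_share(voters)
print(f"True support:            {true_share:.3f}")
print(f"Biased sample (100,000): {support_share(big_biased):.3f}")
print(f"Random sample (2,000):   {support_share(small_random):.3f}")
```

The random sample is fifty times smaller, yet its error comes only from sampling noise, which shrinks as the sample grows. The error in the biased sample comes from the frame itself and does not shrink with more ballots.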

Behavioral research and the "WEIRD" problem

Henrich, Heine, and Norenzayan (2010) documented that psychological and behavioral research disproportionately samples from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies. Because most participants in published studies are university students from high-income countries, many findings do not generalize globally — a profound selection bias baked into the discipline.

Hormone replacement therapy observational studies

Observational studies in the 1990s appeared to show that hormone replacement therapy (HRT) reduced cardiovascular disease risk in postmenopausal women. However, women who chose HRT were more health-conscious and had better baseline cardiovascular health than non-users — a classic self-selection bias. When the Women's Health Initiative randomized controlled trial was completed, HRT was actually found to increase cardiovascular risk in older women.

How to detect selection bias

Compare enrolled vs. eligible non-participants: If recruitment records allow, compare baseline characteristics of those who enrolled with those who declined or were excluded.
Analyze dropout patterns: In longitudinal studies, compare the baseline characteristics of completers and non-completers. Substantial differences on variables related to the outcome suggest that attrition bias is possible.
Sensitivity analyses: Test how results change under different assumptions about the missing or excluded participants (e.g., worst-case scenarios).
Check representativeness: Compare the sample's key demographics (age, sex, comorbidities, socioeconomic status) to population-level data.
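One way to make the baseline comparisons above concrete is the standardized mean difference (SMD), which expresses the gap between two groups in pooled standard deviation units and, unlike a p-value, does not grow simply because the sample is large. The sketch below uses invented ages for completers and dropouts; the function name is hypothetical, and the threshold of roughly 0.1 in absolute value is a commonly used rule of thumb rather than a fixed standard.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical baseline ages, for illustration only.
completers_age = [34, 41, 38, 45, 39, 36, 42, 40]
dropouts_age = [52, 58, 49, 61, 55, 50]

smd = standardized_mean_difference(completers_age, dropouts_age)
print(f"SMD for baseline age: {smd:.2f}")
if abs(smd) > 0.1:
    print("Completers and dropouts differ at baseline; check for attrition bias.")
```

The same function can be applied to enrolled versus declined participants, or to the sample versus population reference data, for each baseline variable of interest.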

How to minimize selection bias

Random sampling

Probability-based sampling — where every member of the target population has a known, non-zero chance of selection — is the gold standard for representativeness. Simple random sampling, stratified sampling, and cluster sampling all provide this property when properly implemented.
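As a minimal sketch of one such design, the code below draws a proportionally allocated stratified sample using only the standard library. The sampling frame, the stratum labels, and the function name are hypothetical; a real survey would also need a complete frame and a plan for non-response.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from every stratum.

    Each unit has a known, non-zero chance of selection (about
    `fraction`), which is the defining property of probability sampling.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for members in strata.values():
        # At least one unit per stratum, never more than the stratum holds.
        k = min(len(members), max(1, round(len(members) * fraction)))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical sampling frame: (person_id, age_group).
frame = ([(i, "under_40") for i in range(600)]
         + [(i, "40_plus") for i in range(600, 1000)])

sample = stratified_sample(frame, stratum_of=lambda unit: unit[1], fraction=0.10)
print(len(sample), "people sampled")
```

Because the sample is drawn within each stratum, the age groups appear in the sample in the same proportions as in the frame, which a convenience sample does not guarantee.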

Randomized controlled trials

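In experimental research, random assignment to conditions (as distinct from random sampling from the population) makes the groups comparable on average in both measured and unmeasured characteristics. This removes selection bias from the comparison between groups, protecting internal validity, even if the sample itself is not fully representative of the broader population.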
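The contrast between self-selection and random assignment can be sketched with a simulation. Everything here is assumed for illustration: an unobserved "motivation" score that improves the outcome, a true treatment effect of 2 points, and hypothetical function names.

```python
import random
from statistics import mean

TRUE_EFFECT = 2.0  # assumed true benefit of the treatment


def estimated_effect(assign, n=20_000, seed=0):
    """Difference in mean outcome, treated minus control.

    `assign(motivation, rng)` decides who ends up in the treatment group.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)  # unobserved trait that helps the outcome
        in_treatment = assign(motivation, rng)
        outcome = (50 + 5 * motivation
                   + (TRUE_EFFECT if in_treatment else 0)
                   + rng.gauss(0, 5))
        (treated if in_treatment else control).append(outcome)
    return mean(treated) - mean(control)


def self_selected(motivation, rng):
    # More motivated people opt in to the treatment.
    return motivation > 0


def randomized(motivation, rng):
    # A coin flip ignores motivation entirely.
    return rng.random() < 0.5


self_selected_estimate = estimated_effect(self_selected)
randomized_estimate = estimated_effect(randomized)
print(f"True effect:             {TRUE_EFFECT:.1f}")
print(f"Self-selected estimate:  {self_selected_estimate:.1f}")
print(f"Randomized estimate:     {randomized_estimate:.1f}")
```

With self-selection, the motivation gap between groups is added to the apparent treatment effect. With the coin flip, motivation is balanced on average, so the estimate lands close to the assumed true effect.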

Intention-to-treat analysis

Analyzing all participants as assigned — including dropouts — regardless of whether they completed the intervention, prevents attrition from selectively removing the sickest or most vulnerable participants from the analysis.
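The difference between an intention-to-treat comparison and a completers-only comparison can be shown with a simulated trial. This sketch assumes a treatment with no true effect, an unobserved "frailty" score that worsens outcomes, early stopping by frailer participants in the treatment arm, and, importantly, that outcomes are still recorded for people who stop. All names and numbers are hypothetical.

```python
import random
from statistics import mean


def run_trial(n=20_000, seed=0):
    """Return a list of (arm, completed, outcome) records."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        frailty = rng.gauss(0, 1)
        arm = "treatment" if rng.random() < 0.5 else "control"
        # Assumed: the treatment has no true effect on the outcome.
        outcome = 50 - 5 * frailty + rng.gauss(0, 5)
        # Assumed: frailer participants in the treatment arm stop early.
        completed = not (arm == "treatment" and frailty > 0.5)
        records.append((arm, completed, outcome))
    return records


def arm_difference(records, completers_only):
    """Mean outcome in the treatment arm minus the control arm."""
    kept = [r for r in records if r[1] or not completers_only]
    treated = [outcome for arm, _, outcome in kept if arm == "treatment"]
    control = [outcome for arm, _, outcome in kept if arm == "control"]
    return mean(treated) - mean(control)


records = run_trial()
itt_estimate = arm_difference(records, completers_only=False)
completers_estimate = arm_difference(records, completers_only=True)
print(f"Intention-to-treat estimate: {itt_estimate:.2f}")
print(f"Completers-only estimate:    {completers_estimate:.2f}")
```

Dropping the participants who stopped removes the frailest people from one arm only, so the completers-only comparison shows a benefit that does not exist. Keeping everyone in their assigned arm preserves the balance created by randomization.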

Multiple imputation for missing data

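Statistical techniques that impute plausible values for missing data using observed characteristics reduce the impact of dropout-related selection bias, provided missingness can plausibly be explained by the observed variables (the "missing at random" assumption). If dropout depends on unobserved factors, imputation alone will not remove the bias.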
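After each imputed data set has been analyzed separately, the results are typically combined with Rubin's rules: the pooled estimate is the average of the per-data-set estimates, and the total variance adds the average within-imputation variance to an inflated between-imputation variance. The sketch below shows only this pooling step; the function name and the input numbers are hypothetical, and generating the imputations themselves is left to a dedicated statistics package.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine results from m imputed data sets using Rubin's rules.

    estimates: the effect estimate from each imputed data set
    variances: the squared standard error from each imputed data set
    Requires at least two imputations.
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)          # average sampling variance
    between = variance(estimates)     # variance across imputations (m - 1 denominator)
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance


# Hypothetical results from five imputed data sets.
estimates = [1.8, 2.1, 2.4, 1.9, 2.3]
variances = [0.25, 0.27, 0.24, 0.26, 0.25]

pooled, total_var = pool_rubin(estimates, variances)
print(f"Pooled estimate: {pooled:.2f}")
print(f"Pooled standard error: {total_var ** 0.5:.2f}")
```

The between-imputation term is what carries the extra uncertainty caused by the missing data; analyzing a single filled-in data set would omit it and understate the standard error.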

Type	Mechanism	Mitigation
Self-selection	Volunteers differ from non-volunteers	Random sampling; incentivize broad participation
Attrition	Dropouts differ from completers	ITT analysis; multiple imputation
Berkson's	Hospital sample over-represents disease severity	Population-based cohort designs
Healthy worker	Workers healthier than general population	Internal comparison groups within same workplace

Quick summary

Feature	Detail
Definition	Systematic distortion from non-representative participant selection
Affects	Internal validity and external validity
Key types	Self-selection, attrition, Berkson's, healthy worker, survivorship
Detection	Compare enrolled vs. non-enrolled; dropout analysis
Primary mitigations	Random sampling, RCTs, ITT analysis, multiple imputation
