Experimental Design: Types, Components, and Examples

Experimental design is the gold standard for establishing causation in research. By systematically manipulating at least one variable while holding others constant and randomly assigning participants to conditions, a well-designed experiment can demonstrate that a treatment causes a change in outcomes, not merely that the two are associated. This guide covers the essential components of experimental design, from control groups and random assignment to factorial layouts and the threats that undermine validity.

What is a true experiment?

A true experiment (called a randomized controlled trial, or RCT, in clinical contexts) has three defining features: (1) the researcher actively manipulates at least one variable, (2) participants are randomly assigned to conditions, and (3) there is a control condition against which the treatment is compared. Together these features allow strong causal inference. Randomization does not make groups identical, but it balances both measured and unmeasured confounding variables across groups on average, so a difference in outcomes too large to be explained by chance can be attributed to the treatment.
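
The claim that randomization balances confounders "on average" can be checked with a small simulation. The sketch below assumes a hypothetical unmeasured trait for 40 participants, re-randomizes them into two groups 5,000 times, and records the difference in group means each time. The average difference sits near zero, but an individual split can still be lopsided, which is the small-sample problem that stratified assignment addresses. The trait, its distribution, and the seed are illustrative assumptions.

```python
import random
import statistics

def covariate_gap(values, rng):
    """Randomly split participants into two equal groups and return the
    difference in the mean of an (unmeasured) covariate between them."""
    shuffled = values[:]          # copy so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return statistics.mean(shuffled[:half]) - statistics.mean(shuffled[half:])

rng = random.Random(42)
# Hypothetical unmeasured trait (e.g., baseline motivation) for 40 participants.
trait = [rng.gauss(50, 10) for _ in range(40)]

gaps = [covariate_gap(trait, rng) for _ in range(5000)]
print(f"mean gap across 5000 randomizations: {statistics.mean(gaps):.2f}")
print(f"largest gap in a single randomization: {max(abs(g) for g in gaps):.2f}")
```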

Experiments stand in contrast to observational studies, in which the researcher observes participants without manipulating anything. Observational studies can reveal correlations, and statistical adjustment can account for confounders that were measured, but they cannot rule out unmeasured confounding variables. This is why they cannot establish causation with the same confidence as a true experiment.
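
To see why self-selection is a problem, the following sketch simulates a hypothetical confounder z that raises the outcome and also makes people more likely to choose the exposure. By construction, the exposure itself has no effect. Comparing exposed and unexposed people shows a large spurious difference when exposure is self-selected, and roughly none when exposure is assigned by a coin flip. The probabilities, effect sizes, and sample size are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(3)
N = 20_000
# Hypothetical confounder z (e.g., baseline health) that raises the outcome.
z = [rng.gauss(0, 1) for _ in range(N)]
# The exposure has NO causal effect: the outcome depends only on z plus noise.
outcome = [2.0 * zi + rng.gauss(0, 1) for zi in z]

def mean_difference(exposed, y):
    """Mean outcome of exposed people minus mean outcome of unexposed people."""
    yes = [yi for e, yi in zip(exposed, y) if e]
    no = [yi for e, yi in zip(exposed, y) if not e]
    return statistics.mean(yes) - statistics.mean(no)

# Observational: people with high z are more likely to choose the exposure.
self_selected = [rng.random() < (0.8 if zi > 0 else 0.2) for zi in z]
# Experimental: the exposure is decided by a coin flip, independent of z.
randomized = [rng.random() < 0.5 for _ in z]

print(f"observational difference: {mean_difference(self_selected, outcome):.2f}")
print(f"randomized difference:    {mean_difference(randomized, outcome):.2f}")
```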

Control and experimental groups

Every experiment needs a baseline for comparison. The control group receives no treatment, a placebo, or the standard existing treatment. The experimental group (also called the treatment group) receives the manipulation being tested. After the treatment period, the groups are compared on the outcome measure.

Example — Educational intervention

A study tests whether spaced-repetition flashcard software improves vocabulary retention. The experimental group uses the software for four weeks; the control group studies with conventional paper flashcards. Both groups take the same vocabulary test at the end.

In some experiments there are multiple experimental groups receiving different doses or variants of the treatment. Adding a placebo group, whose participants receive an inert treatment that looks identical to the real one, controls for expectation effects (the placebo effect).

Double-blind design: In a double-blind experiment, neither participants nor the researchers who interact with them know which condition each participant is in. This keeps expectation effects equivalent across conditions and prevents experimenter bias from contaminating results. Double-blind RCTs are the standard in pharmaceutical research.
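
One common way to keep a trial blind is to hide condition labels behind neutral codes. The sketch below is a simplified, hypothetical illustration: each participant receives an identical-looking kit identified only by a random number, and the key linking kit numbers to conditions is kept separately (for example, by a pharmacist or data monitor not involved in testing). The function name, the kit-number range, and the condition labels are all assumptions, not a standard procedure.

```python
import random

def blinded_allocation(participant_ids, conditions=("active", "placebo"), seed=None):
    """Return (schedule, key).

    schedule: participant ID -> kit number (visible to participants and staff).
    key: kit number -> condition (sealed; held by someone outside the testing team).
    """
    rng = random.Random(seed)
    n = len(participant_ids)
    # Balanced list of condition labels in random order.
    labels = [conditions[i % len(conditions)] for i in range(n)]
    rng.shuffle(labels)
    # Unique, meaningless kit numbers so the condition cannot be guessed from the code.
    kit_numbers = rng.sample(range(1000, 10000), n)
    schedule = dict(zip(participant_ids, kit_numbers))
    key = dict(zip(kit_numbers, labels))
    return schedule, key

schedule, key = blinded_allocation([f"P{i:02d}" for i in range(1, 9)], seed=7)
print(schedule)  # only kit numbers appear here; no condition labels
```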

Random assignment

Random assignment is the process of assigning each participant to a condition by chance — for example, by flipping a coin, using a random number table, or employing software to generate a random sequence. It is the single most powerful tool for eliminating systematic differences between groups before the experiment begins.
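
Software randomization can take two simple forms, sketched below. Flipping a virtual coin for each participant is fully random but can leave the groups unequal in size; shuffling a pre-balanced list of labels keeps the groups equal while still leaving each person's condition to chance. The participant IDs, group labels, and seed are placeholders.

```python
import random

def coin_flip_assignment(ids, rng):
    """Assign each participant independently with probability 0.5 (like a coin toss)."""
    return {pid: rng.choice(["treatment", "control"]) for pid in ids}

def shuffled_assignment(ids, rng):
    """Shuffle a balanced list of labels so group sizes differ by at most one."""
    labels = ["treatment", "control"] * (len(ids) // 2) + ["treatment"] * (len(ids) % 2)
    rng.shuffle(labels)
    return dict(zip(ids, labels))

rng = random.Random(2024)
ids = [f"P{i:02d}" for i in range(1, 21)]
coin = coin_flip_assignment(ids, rng)
balanced = shuffled_assignment(ids, rng)
print("coin flip, treatment n =", sum(v == "treatment" for v in coin.values()))
print("shuffled,  treatment n =", sum(v == "treatment" for v in balanced.values()))
```

The shuffled version always yields 10 per group here; the coin-flip count varies from run to run.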

Random assignment should not be confused with random sampling. Random sampling refers to how participants are recruited from the population (relevant to external validity). Random assignment refers to how those participants are divided into conditions (relevant to internal validity). A study can have one without the other.
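
The two steps can be kept distinct in code. In this sketch, the first step draws a random sample of 60 from a hypothetical sampling frame of 10,000 people (random sampling), and the second shuffles that sample into two conditions (random assignment). Replacing step 1 with, say, the first 60 volunteers would leave step 2 intact: a study with random assignment but no random sampling.

```python
import random

rng = random.Random(11)
# Hypothetical sampling frame (a list of everyone in the target population).
population = [f"person_{i}" for i in range(10_000)]

# Step 1, random sampling: who enters the study (bears on external validity).
sample = rng.sample(population, 60)

# Step 2, random assignment: which condition each participant receives
# (bears on internal validity).
shuffled = sample[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:30], shuffled[30:]
print(len(treatment), len(control))
```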

Stratified random assignment

When sample sizes are small, purely random assignment can accidentally produce groups that differ on an important variable (e.g., 80% of participants with severe symptoms end up in the control group). Stratified random assignment (also called blocking) first divides participants into strata based on a key characteristic (e.g., severity, gender, age), then randomly assigns within each stratum to ensure balance.
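
A minimal sketch of stratified random assignment, assuming each participant has already been classified into a stratum (here, a hypothetical severe/mild severity rating). Within each stratum the participants are shuffled and then dealt into conditions in turn, so every stratum is split as evenly as possible; a random starting condition per stratum avoids always sending the leftover participant in an odd-sized stratum to the same group. The function name and data are illustrative.

```python
import random
from collections import defaultdict

def stratified_assignment(stratum_of, conditions=("treatment", "control"), seed=None):
    """stratum_of maps participant ID -> stratum label; returns ID -> condition."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for pid, stratum in stratum_of.items():
        strata[stratum].append(pid)
    assignment = {}
    for members in strata.values():
        rng.shuffle(members)                      # random order within the stratum
        offset = rng.randrange(len(conditions))   # random starting condition
        for i, pid in enumerate(members):
            assignment[pid] = conditions[(i + offset) % len(conditions)]
    return assignment

# Hypothetical sample: 8 participants with severe symptoms, 12 with mild.
severity = {f"P{i:02d}": ("severe" if i <= 8 else "mild") for i in range(1, 21)}
groups = stratified_assignment(severity, seed=3)
for level in ("severe", "mild"):
    n_treat = sum(groups[p] == "treatment" for p in severity if severity[p] == level)
    print(level, "in treatment:", n_treat)
```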

Independent and dependent variables

The independent variable (IV) is the variable the researcher deliberately manipulates. It is the presumed cause. The dependent variable (DV) is what is measured to see whether the manipulation had an effect. It is the presumed effect. The research hypothesis states the expected relationship between them.

Example — Sleep and memory

IV: Amount of sleep (4 hours vs. 8 hours)
DV: Score on a word recall test the following morning

Extraneous variables are any other variables that could influence the DV. When an extraneous variable is not controlled and differs systematically between groups, it becomes a confounding variable that threatens the validity of the causal inference. Good experimental design identifies likely confounds and either holds them constant, counterbalances them, or controls for them statistically.
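
Counterbalancing an extraneous variable can be sketched for the sleep study above. Suppose, hypothetically, that the recall test exists in two alternate versions (List A and List B) that might differ slightly in difficulty. Crossing every sleep condition with every list version, and filling each combination equally, means list difficulty cannot differ systematically between the 4-hour and 8-hour groups. The list names, cell size, and seed are assumptions.

```python
import itertools
import random
from collections import Counter

conditions = ["4 hours", "8 hours"]
word_lists = ["List A", "List B"]          # hypothetical alternate test forms

# Every sleep condition is paired with every word list equally often.
cells = list(itertools.product(conditions, word_lists))
slots = cells * 5                          # 5 participants per cell, 20 in total
rng = random.Random(5)
rng.shuffle(slots)                         # random order, so assignment to cells is random

participants = [f"P{i:02d}" for i in range(1, 21)]
schedule = dict(zip(participants, slots))
print(Counter(schedule.values()))
```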

Operationalization

Before running the experiment, the researcher must operationalize each variable — define exactly how it will be measured or manipulated. Vague concepts like "stress" or "learning" must be translated into specific, measurable operations (e.g., salivary cortisol level; number of correct answers on a 20-item test). Clear operationalization enables replication and prevents disputes about what was actually studied.

Pre-test and post-test designs

A pre-test/post-test design measures participants on the dependent variable both before and after the treatment. This provides a baseline score for each participant, allowing the researcher to calculate change scores. It also increases statistical sensitivity, because each participant's pre-test serves as their own baseline and stable individual differences are removed from the comparison.
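
Change scores are simple to compute. The sketch below uses made-up vocabulary scores (number correct out of 20) for five participants per group, borrowing the flashcard example. In practice, researchers often analyze pre/post data with ANCOVA (post-test adjusted for pre-test) rather than raw change scores, so treat this as an illustration of the arithmetic only.

```python
import statistics

# Hypothetical scores (number correct out of 20), five participants per group.
treatment_pre  = [8, 10, 7, 12, 9]
treatment_post = [14, 15, 11, 17, 13]
control_pre    = [9, 11, 8, 10, 7]
control_post   = [11, 12, 9, 12, 9]

def change_scores(pre, post):
    """Post-test minus pre-test for each participant (same order in both lists)."""
    return [after - before for before, after in zip(pre, post)]

treatment_change = change_scores(treatment_pre, treatment_post)
control_change = change_scores(control_pre, control_post)
difference = statistics.mean(treatment_change) - statistics.mean(control_change)
print(f"mean change, treatment: {statistics.mean(treatment_change):.1f}")
print(f"mean change, control:   {statistics.mean(control_change):.1f}")
print(f"difference in mean change: {difference:.1f}")
```

With these invented numbers, the treatment group gains 4.8 points on average and the control group 1.6, a difference of 3.2 points.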

Design	Structure	Strength
Post-test only	Random assign → Treatment → Measure	Simple; avoids pre-test sensitization
Pre-test/post-test	Measure → Random assign → Treatment → Measure	Controls for baseline differences; detects change
Solomon four-group	Four groups: treatment and control, each run with and without a pre-test	Tests whether the pre-test itself affects results

Factorial designs

A factorial design includes two or more independent variables (factors), each with two or more levels, and tests every combination of those levels. The primary advantage is that factorial designs can detect interaction effects — cases where the effect of one IV depends on the level of another IV — something that separate one-factor experiments cannot reveal.

Example — 2×2 factorial design

A study examines the effects of caffeine (caffeine vs. no caffeine) and task type (simple vs. complex) on reaction time. This creates four conditions: caffeine/simple, caffeine/complex, no-caffeine/simple, no-caffeine/complex. A significant interaction would mean that the effect of caffeine depends on task type: for example, caffeine might speed reaction times on simple tasks but slow them on complex ones, or help on both but much more on one.
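
Main effects and the interaction in a 2×2 design can be read off the four cell means. The sketch below uses invented reaction-time means chosen to show a crossover pattern; the interaction contrast is the "difference of differences", and it is zero when caffeine has the same effect at both task levels. These are point estimates only: whether an effect is significant requires a factorial ANOVA (or equivalent model) on the individual participants' data.

```python
# Hypothetical cell means for reaction time in milliseconds (lower = faster).
rt = {
    ("caffeine", "simple"): 250.0,
    ("caffeine", "complex"): 520.0,
    ("no caffeine", "simple"): 290.0,
    ("no caffeine", "complex"): 500.0,
}

def effects_2x2(m):
    """Main effects and interaction contrast from the four cell means of a 2x2 design."""
    # Simple effects: caffeine minus no caffeine at each task level.
    caffeine_on_simple = m[("caffeine", "simple")] - m[("no caffeine", "simple")]
    caffeine_on_complex = m[("caffeine", "complex")] - m[("no caffeine", "complex")]
    return {
        "main effect of caffeine": (caffeine_on_simple + caffeine_on_complex) / 2,
        "main effect of task (complex - simple)": (
            (m[("caffeine", "complex")] + m[("no caffeine", "complex")]) / 2
            - (m[("caffeine", "simple")] + m[("no caffeine", "simple")]) / 2
        ),
        # Difference of differences: zero means no interaction.
        "interaction": caffeine_on_complex - caffeine_on_simple,
    }

for name, value in effects_2x2(rt).items():
    print(f"{name}: {value:+.0f} ms")
```

Here caffeine speeds simple responses by 40 ms and slows complex ones by 20 ms; the small main effect of caffeine (-10 ms) hides the crossover that the 60 ms interaction contrast reveals.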

Factorial designs are described by the number of levels of each factor: a 2×3 design has two factors, one with 2 levels and one with 3 levels, yielding 6 total conditions. Adding factors multiplies the number of conditions quickly, so researchers must balance comprehensiveness against practical feasibility.
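
The number of conditions is the product of each factor's number of levels, and itertools.product lists every combination. The factor names below, including a hypothetical "moderate" task level that turns the caffeine example into a 2×3 design and a hypothetical third factor, are assumptions for illustration; math.prod requires Python 3.8 or later.

```python
import itertools
import math

# Hypothetical 2x3 design: caffeine (2 levels) crossed with task difficulty (3 levels).
factors = {
    "caffeine": ["caffeine", "no caffeine"],
    "task": ["simple", "moderate", "complex"],
}

conditions = list(itertools.product(*factors.values()))
n_conditions = math.prod(len(levels) for levels in factors.values())
print(n_conditions)
for condition in conditions:
    print(condition)

# Adding a third two-level factor doubles the count (2 x 3 x 2 = 12).
factors["time of day"] = ["morning", "evening"]
print(math.prod(len(levels) for levels in factors.values()))
```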

Quasi-experimental designs

When random assignment is not feasible — for ethical, practical, or logistical reasons — researchers use quasi-experimental designs. These look like true experiments (there is still a comparison group and a treatment) but lack random assignment, which weakens causal inference.

Nonequivalent control group design: Pre-existing groups (e.g., two classrooms) are compared; the researcher cannot guarantee equivalence at baseline.
Interrupted time series: Repeated measurements are taken before and after a naturally occurring intervention (e.g., a policy change) to assess whether the trend shifted.
Regression discontinuity: Treatment is assigned based on a cutoff score (e.g., students above a threshold get a scholarship); those just above and just below the cutoff are compared as near-equivalent groups.
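
A regression discontinuity estimate can be sketched with simulated data. Assuming a hypothetical scholarship awarded at a cutoff score of 70, and an outcome that rises smoothly with the score plus a built-in jump of 10 points for recipients, the code fits a separate straight line on each side of the cutoff (within a bandwidth of 10 points) and reports the gap between the two lines at the cutoff. Published analyses usually add kernel weighting and data-driven bandwidth selection; this shows only the core idea, and all numbers are invented.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Fit a line on each side of the cutoff (scores centred on the cutoff) and
    return the jump between the two intercepts, i.e. the gap at the cutoff."""
    below = [(s - cutoff, y) for s, y in zip(scores, outcomes) if cutoff - bandwidth <= s < cutoff]
    above = [(s - cutoff, y) for s, y in zip(scores, outcomes) if cutoff <= s <= cutoff + bandwidth]
    a_below, _ = fit_line(*zip(*below))
    a_above, _ = fit_line(*zip(*above))
    return a_above - a_below

rng = random.Random(1)
CUTOFF = 70.0
scores = [rng.uniform(0, 100) for _ in range(4000)]
# Simulated outcome: smooth trend in the score plus a true jump of 10 for recipients.
outcomes = [0.5 * s + (10.0 if s >= CUTOFF else 0.0) + rng.gauss(0, 5) for s in scores]
estimate = rd_estimate(scores, outcomes, CUTOFF, bandwidth=10.0)
print(f"estimated jump at the cutoff: {estimate:.2f} (true value 10)")
```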

Threats to validity

Internal validity threats

Internal validity is the degree to which an observed effect can be attributed to the independent variable rather than to other factors. Key threats include:

History: An external event occurs during the experiment that could cause the observed change (e.g., a public health campaign during a health intervention study).
Maturation: Participants change naturally over time regardless of the treatment (especially relevant in developmental studies).
Testing: Taking the pre-test affects performance on the post-test, independent of the intervention.
Instrumentation: Measuring instruments change over time (e.g., observers become more lenient in coding behavior).
Regression to the mean: Participants selected because of extreme scores on a pre-test will tend to score closer to the mean on subsequent measures regardless of treatment.
Selection bias: Groups differ systematically at baseline due to non-random assignment.
Attrition (mortality): Participants drop out of the study in ways that differ between groups.
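
Regression to the mean can be reproduced with no treatment at all. This sketch assumes each simulated person has a stable true score plus independent measurement error on each test; selecting the lowest-scoring 10% on the pre-test and simply re-testing them shows their average rising toward the population mean. The score scale and error size are arbitrary. This is one reason a sample chosen for extreme scores needs a control group selected the same way.

```python
import random
import statistics

rng = random.Random(0)
N = 10_000
# Each person's stable "true" score plus independent measurement noise per test.
true_score = [rng.gauss(100, 10) for _ in range(N)]
pre = [t + rng.gauss(0, 10) for t in true_score]
post = [t + rng.gauss(0, 10) for t in true_score]   # no treatment given

# Select the "extreme" group: the lowest 10% of pre-test scorers.
threshold = sorted(pre)[N // 10]
selected = [i for i in range(N) if pre[i] < threshold]
pre_mean = statistics.mean(pre[i] for i in selected)
post_mean = statistics.mean(post[i] for i in selected)
print(f"selected group: pre-test mean {pre_mean:.1f}, post-test mean {post_mean:.1f}")
```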

External validity threats

External validity is the degree to which findings can be generalized beyond the specific sample, setting, and time of the study. Threats include:

Sample bias: The sample is not representative of the population (e.g., WEIRD — Western, Educated, Industrialized, Rich, Democratic — samples in psychology).
Reactive arrangements: Participants behave differently because they know they are being observed (the Hawthorne effect).
Setting effects: Laboratory findings may not replicate in naturalistic settings.
