
Descriptive Statistics: A Complete Guide

Before you can test anything, you need to know what your data actually looks like. That's the job of descriptive statistics. No hypotheses yet, no inferences — just summaries that turn a column of numbers into something you can read. This guide walks through the basics: central tendency, spread, shape, and frequency distributions, each with worked examples.

What are descriptive statistics?

Descriptive statistics are numerical and graphical summaries that organize, simplify, and present data in a meaningful way. They describe the characteristics of a data set without drawing conclusions beyond that data — no predictions, no generalizations to a larger population. The goal is clarity: to replace a long list of raw numbers with a handful of informative values.

Nearly every quantitative study begins with descriptive statistics. A clinical trial reports average baseline scores. A survey paper reports the percentage of respondents in each category. An educational study reports the mean and standard deviation of exam scores. These summaries allow readers to understand the sample and evaluate the plausibility of the findings.

Descriptive statistics fall into four families:

  • Central tendency — where the middle of the data lies (mean, median, mode)
  • Variability (spread) — how spread out the data are (range, variance, standard deviation, IQR)
  • Shape — the symmetry and tail behavior of the distribution (skewness, kurtosis)
  • Frequency — how often each value or category appears (frequency tables, histograms)

Measures of central tendency

A measure of central tendency is a single value that represents the "center" or "typical" value of a data set. Three measures dominate: the mean, the median, and the mode.

Mean (arithmetic average)

The mean is the sum of all values divided by the number of values. It is the most widely used measure of center and is the basis for many statistical tests.

Formula — Population mean

μ = (Σxᵢ) / N

Formula — Sample mean

x̄ = (Σxᵢ) / n

The mean uses every data point, which makes it sensitive to outliers. A single extreme value can pull the mean substantially away from where most data cluster.
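The effect of a single outlier on the mean can be seen in a short sketch. The numbers here are hypothetical and chosen only for illustration; the code uses `mean` from Python's standard `statistics` module.

```python
from statistics import mean

# Hypothetical data: five typical values, then the same values plus one outlier.
typical = [10, 11, 12, 13, 14]
with_outlier = typical + [60]

mean_typical = mean(typical)            # (10+11+12+13+14) / 5
mean_with_outlier = mean(with_outlier)  # the single value 60 shifts the result

print("Mean without outlier:", mean_typical)
print("Mean with outlier:   ", mean_with_outlier)
```

Adding one extreme value moves the mean from 12 to 20, even though five of the six values are 14 or below.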

Median

The median is the middle value when data are arranged in ascending order. If there is an even number of values, the median is the average of the two middle values. The median is resistant to outliers: making an extreme value even more extreme does not change it, because only the order of the values matters.

Example — Odd number of values

Data: 3, 7, 9, 12, 15 → Median = 9 (the third of five values)

Example — Even number of values

Data: 3, 7, 9, 12 → Median = (7 + 9) / 2 = 8
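Both examples can be checked with `median` from Python's standard `statistics` module. The third data set is a hypothetical variation, added here only to illustrate resistance to outliers.

```python
from statistics import median

odd_data = [3, 7, 9, 12, 15]
even_data = [3, 7, 9, 12]

median_odd = median(odd_data)    # middle value of five
median_even = median(even_data)  # average of the two middle values, 7 and 9

# Hypothetical variation: replace the largest value with an extreme one.
odd_with_outlier = [3, 7, 9, 12, 1500]
median_outlier = median(odd_with_outlier)  # still the third sorted value

print(median_odd, median_even, median_outlier)
```

The median stays at 9 when 15 is replaced by 1500, because the middle position in the sorted list is unchanged.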

Mode

The mode is the value that appears most frequently. A data set can have no mode (all values unique), one mode (unimodal), two modes (bimodal), or more (multimodal). The mode is the only appropriate measure of center for nominal (categorical) data.

Example — Mode

Data: 2, 4, 4, 5, 7, 7, 7, 9 → Mode = 7 (appears three times)
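A minimal sketch of finding modes in Python, assuming Python 3.8 or later for `statistics.multimode`. The bimodal and categorical data sets are hypothetical.

```python
from statistics import multimode

data = [2, 4, 4, 5, 7, 7, 7, 9]
modes = multimode(data)  # list of the most frequent value(s)

# Hypothetical bimodal data: 1 and 2 each appear twice.
bimodal_modes = multimode([1, 1, 2, 2, 3])

# Hypothetical nominal data: the mode works where mean and median do not.
color_modes = multimode(["red", "blue", "red", "green"])

# Note: when every value is unique, multimode returns all of them,
# which corresponds to the "no mode" case described above.
unique_modes = multimode([1, 2, 3])

print(modes, bimodal_modes, color_modes, unique_modes)
```

Using `multimode` rather than `mode` avoids silently reporting only one value when the data are bimodal or multimodal.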

Measures of spread (variability)

Knowing where the center of the data lies is not enough. Two data sets can have identical means but very different distributions. Measures of spread quantify how much the values differ from one another.

Range

The range is the simplest measure of spread: maximum value minus minimum value. It is easy to compute but highly sensitive to outliers because it only uses two data points.

Example — Range

Data: 14, 18, 21, 23, 67 → Range = 67 − 14 = 53

Interquartile range (IQR)

The IQR is the range of the middle 50% of the data: IQR = Q3 − Q1, where Q1 is the 25th percentile and Q3 is the 75th percentile. It is resistant to outliers and is the preferred spread measure when the median is used as the center.
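The range and IQR for the example data above can be sketched as follows. One caveat: there are several conventions for computing quartiles, and different software may give slightly different values for Q1 and Q3 on small data sets. This sketch uses the default ("exclusive") method of `statistics.quantiles`, which is only one of those conventions.

```python
from statistics import quantiles

data = [14, 18, 21, 23, 67]

data_range = max(data) - min(data)  # uses only the two extreme values

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1                       # spread of the middle 50%

print("Range:", data_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

The range is driven entirely by the value 67, while the IQR depends on where the quartiles fall and is therefore less affected by how extreme that value is.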

Variance

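The variance is, roughly, the average of the squared deviations from the mean: exactly the average for a population, and the sum of squared deviations divided by n − 1 for a sample. Squaring removes negative signs and gives extra weight to large deviations.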

Formula — Sample variance

s² = Σ(xᵢ − x̄)² / (n − 1)

The denominator is n − 1 (not n) for sample variance. This is called Bessel's correction; it produces an unbiased estimate of the population variance.
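The difference between dividing by n and dividing by n − 1 can be seen in a short sketch. The data are hypothetical; `variance` and `pvariance` are the sample and population versions in Python's standard `statistics` module.

```python
from statistics import mean, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

x_bar = mean(data)
sum_sq = sum((x - x_bar) ** 2 for x in data)  # sum of squared deviations

population_var = sum_sq / len(data)    # divide by N
sample_var = sum_sq / (len(data) - 1)  # divide by n - 1 (Bessel's correction)

print("Divide by N:    ", population_var, "| pvariance:", pvariance(data))
print("Divide by n - 1:", sample_var, "| variance: ", variance(data))
```

The n − 1 version is always slightly larger, and the gap shrinks as the sample size grows.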

Standard deviation

The standard deviation is the square root of the variance. Because it is expressed in the same units as the original data, it is much more interpretable than variance.

Formula — Sample standard deviation

s = √[Σ(xᵢ − x̄)² / (n − 1)]

A small standard deviation means values cluster tightly around the mean; a large one means they are spread out. In a normal distribution, roughly 68% of values fall within one standard deviation of the mean.
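The 68% figure can be checked with a small simulation. This is a sketch on simulated data, not real measurements: the mean of 100, standard deviation of 15, sample size, and random seed are all arbitrary assumptions.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Simulated data: 10,000 draws from a normal distribution
# with an assumed mean of 100 and standard deviation of 15.
values = [random.gauss(100, 15) for _ in range(10_000)]

m = mean(values)
s = stdev(values)  # sample standard deviation (n - 1 denominator)

# Count the values that fall within one standard deviation of the mean.
within_one_sd = sum(1 for v in values if m - s <= v <= m + s)
share = within_one_sd / len(values)

print(f"mean = {m:.1f}, sd = {s:.1f}, share within 1 SD = {share:.3f}")
```

The share should land close to 0.68 for normally distributed data; for skewed or heavy-tailed data the same calculation can give a noticeably different figure.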

Shape of a distribution

The shape of a distribution describes how data are distributed across their range. The two most important shape characteristics are skewness and kurtosis.

Skewness

Skewness measures the asymmetry of a distribution. A perfectly symmetrical distribution has skewness = 0.

  • Positive skew (right skew): The tail extends to the right; mean > median. Common in income data — most people earn moderate amounts, but a few earn extremely high salaries.
  • Negative skew (left skew): The tail extends to the left; mean < median. Example: age at retirement (most people retire at similar ages; a few retire very early).

Kurtosis

Kurtosis describes the "tailedness" of a distribution: whether extreme values are more or less common than in a normal distribution. A normal distribution has kurtosis = 3 (or excess kurtosis = 0). High kurtosis (leptokurtic) means heavier tails, often accompanied by a sharper peak; low kurtosis (platykurtic) means lighter tails, often accompanied by a flatter peak.

Practical rule of thumb: When skewness and excess kurtosis values are both between −2 and +2, the distribution is often considered approximately normal for most statistical purposes. Formal tests (Shapiro-Wilk, Kolmogorov-Smirnov) provide a more rigorous check, although a non-significant result only means that no departure from normality was detected.
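Skewness and excess kurtosis can be computed directly from central moments, as in the sketch below. It uses the simple moment-based formulas (third moment divided by the 1.5 power of the second, and fourth moment divided by the square of the second, minus 3). Statistical packages often apply small-sample adjustments, so their output may differ slightly. The function names and data sets are hypothetical.

```python
from statistics import mean, median

def central_moment(values, k):
    """k-th central moment: the average of (x - mean) ** k."""
    m = mean(values)
    return sum((x - m) ** k for x in values) / len(values)

def skewness(values):
    """Moment-based skewness; 0 for a perfectly symmetric data set."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution gives 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]                # hypothetical symmetric data
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # hypothetical long right tail

print("Symmetric:    skew =", skewness(symmetric),
      "excess kurtosis =", excess_kurtosis(symmetric))
print("Right-skewed: skew =", skewness(right_skewed),
      "mean =", mean(right_skewed), "median =", median(right_skewed))
```

In the right-skewed data set the skewness is positive and the mean exceeds the median, matching the pattern described for positive skew.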

Frequency distributions

A frequency distribution shows how often each value (or range of values) appears in a data set. It is typically displayed as a table, histogram, or bar chart and is often the first step in exploring a new data set.

Absolute frequency

The count of observations in each category or interval.

Relative frequency

The proportion of observations in each category: relative frequency = count / total n. Multiplying by 100 gives the percentage.

Cumulative frequency

The running total of frequencies from the lowest to a given value. Useful for finding percentiles and understanding how data accumulate across a range.

Example — Frequency table for exam scores (n = 30)
| Score range | Frequency | Relative frequency | Cumulative frequency |
|---|---|---|---|
| 50–59 | 2 | 6.7% | 2 |
| 60–69 | 5 | 16.7% | 7 |
| 70–79 | 10 | 33.3% | 17 |
| 80–89 | 8 | 26.7% | 25 |
| 90–99 | 5 | 16.7% | 30 |
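The relative and cumulative columns follow mechanically from the counts. A minimal sketch, using the counts from the exam-score table and hypothetical variable names:

```python
from itertools import accumulate

score_ranges = ["50-59", "60-69", "70-79", "80-89", "90-99"]
counts = [2, 5, 10, 8, 5]  # absolute frequencies from the table

total = sum(counts)
relative = [c / total for c in counts]  # proportion in each interval
cumulative = list(accumulate(counts))   # running total of the counts

for label, c, r, cum in zip(score_ranges, counts, relative, cumulative):
    print(f"{label}  {c:>3}  {r * 100:5.1f}%  {cum:>3}")
```

The last cumulative value always equals the total n, which is a quick check that no observations were dropped.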

Worked example with a data set

Suppose a researcher measures the number of hours per week that 10 university students spend studying:

Raw data

8, 12, 15, 10, 9, 14, 11, 13, 10, 42

Step 1: Sort the data

Sorted

8, 9, 10, 10, 11, 12, 13, 14, 15, 42

Step 2: Compute the mean

Mean

x̄ = (8+9+10+10+11+12+13+14+15+42) / 10 = 144 / 10 = 14.4 hours

Step 3: Find the median

Median (n = 10, even)

Middle two values: 11 and 12 → Median = (11 + 12) / 2 = 11.5 hours

Step 4: Find the mode

Mode

10 appears twice; all others appear once → Mode = 10 hours

Step 5: Compute spread

Range, IQR, and standard deviation

Range = 42 − 8 = 34
Q1 = 10, Q3 = 14 (medians of the lower and upper halves) → IQR = 14 − 10 = 4
s² = Σ(xᵢ − 14.4)² / 9 = 890.4 / 9 ≈ 98.9
s = √98.9 ≈ 9.9 hours

Interpretation: The mean (14.4) is pulled upward by the outlier of 42 hours. The median (11.5) better represents the typical student. The large standard deviation (9.9) reflects the influence of that single extreme value: the 42-hour observation alone contributes 761.76 of the 890.4 total squared deviation. This illustrates why reporting both the mean and median is important when outliers may be present.
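The steps of the worked example can be recomputed in a few lines so that the arithmetic is easy to check. The quartile helper below is a hypothetical function that takes the medians of the lower and upper halves of the sorted data; other quartile conventions can give slightly different values for Q1 and Q3.

```python
from statistics import mean, median, multimode, stdev, variance

hours = [8, 12, 15, 10, 9, 14, 11, 13, 10, 42]

def quartiles_median_of_halves(values):
    """Q1 and Q3 as the medians of the lower and upper halves."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    # For an odd count, the overall median is left out of both halves.
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return median(lower), median(upper)

hours_mean = mean(hours)
hours_median = median(hours)
hours_modes = multimode(hours)
hours_range = max(hours) - min(hours)
q1, q3 = quartiles_median_of_halves(hours)
hours_var = variance(hours)  # sample variance, n - 1 denominator
hours_sd = stdev(hours)      # square root of the sample variance

print("Mean:", hours_mean, "Median:", hours_median, "Mode(s):", hours_modes)
print("Range:", hours_range, "Q1:", q1, "Q3:", q3, "IQR:", q3 - q1)
print(f"Variance: {hours_var:.1f}  SD: {hours_sd:.1f}")
```

Rerunning the script with the 42 removed is a quick way to see how much of the mean and standard deviation is due to that one observation.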

When to use which measure

| Situation | Best measure of center | Best measure of spread |
|---|---|---|
| Symmetric distribution, no outliers | Mean | Standard deviation |
| Skewed distribution or outliers present | Median | IQR |
| Nominal (categorical) data | Mode | N/A (use frequencies) |
| Ordinal data | Median | IQR or range |
| Bimodal distribution | Report both modes | Standard deviation or IQR |

Descriptive vs. inferential statistics

Descriptive and inferential statistics serve complementary purposes. Understanding the distinction is fundamental to research design and reporting.

| Feature | Descriptive statistics | Inferential statistics |
|---|---|---|
| Purpose | Summarize and describe the data you have | Draw conclusions about a population from a sample |
| Scope | Only the data collected | Generalizes beyond the sample |
| Typical outputs | Mean, SD, frequency table, histogram | p-values, confidence intervals, effect sizes |
| Examples | "The sample mean score was 72.4 (SD = 8.1)" | "The treatment group scored significantly higher, t(48) = 3.21, p = .002" |
| Probability required? | No | Yes; relies on sampling distributions |

Descriptive statistics are always reported first. They let readers assess the data quality, check for plausible values, and judge whether the sample is representative before any inferential claims are considered.

Quick summary

| Concept | What it measures | Key formula / note |
|---|---|---|
| Mean | Arithmetic average | Σxᵢ / n; sensitive to outliers |
| Median | Middle value | Resistant to outliers; prefer for skewed data |
| Mode | Most frequent value | Only center measure for nominal data |
| Range | Max − Min | Simple; affected by outliers |
| IQR | Spread of middle 50% | Q3 − Q1; resistant to outliers |
| Variance | Average squared deviation | Σ(xᵢ − x̄)² / (n − 1) |
| Standard deviation | Typical distance from mean | √variance; same units as data |
| Skewness | Symmetry of distribution | Positive = right tail; negative = left tail |
| Kurtosis | Tail heaviness | Normal distribution has excess kurtosis = 0 |
