Descriptive Statistics: A Complete Guide
Before you can test anything, you need to know what your data actually looks like. That's the job of descriptive statistics. No hypotheses yet, no inferences — just summaries that turn a column of numbers into something you can read. This guide walks through the basics: central tendency, spread, shape, and frequency distributions, each with worked examples.
What are descriptive statistics?
Descriptive statistics are numerical and graphical summaries that organize, simplify, and present data in a meaningful way. They describe the characteristics of a data set without drawing conclusions beyond that data — no predictions, no generalizations to a larger population. The goal is clarity: to replace a long list of raw numbers with a handful of informative values.
Nearly every quantitative study begins with descriptive statistics. A clinical trial reports average baseline scores. A survey paper reports the percentage of respondents in each category. An educational study reports the mean and standard deviation of exam scores. These summaries allow readers to understand the sample and evaluate the plausibility of the findings.
Descriptive statistics fall into four families:
- Central tendency — where the middle of the data lies (mean, median, mode)
- Variability (spread) — how spread out the data are (range, variance, standard deviation, IQR)
- Shape — the symmetry and tail behavior of the distribution (skewness, kurtosis)
- Frequency — how often each value or category appears (frequency tables, histograms)
Measures of central tendency
A measure of central tendency is a single value that represents the "center" or "typical" value of a data set. Three measures dominate: the mean, the median, and the mode.
Mean (arithmetic average)
The mean is the sum of all values divided by the number of values. It is the most widely used measure of center and is the basis for many statistical tests.
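Population mean: μ = (Σxᵢ) / N, where N is the number of values in the population
Sample mean: x̄ = (Σxᵢ) / n, where n is the number of values in the sample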
The mean uses every data point, which makes it sensitive to outliers. A single extreme value can pull the mean substantially away from where most data cluster.
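As a rough illustration of that sensitivity, the sketch below uses Python's standard `statistics` module on the ten study-hours values from the worked example later in this guide. Leaving out the single extreme value (42) is done here only to show its effect, not as a recommendation to discard outliers.

```python
from statistics import mean

# Study hours from the worked example later in this guide.
hours = [8, 12, 15, 10, 9, 14, 11, 13, 10, 42]

mean_all = mean(hours)  # uses every value, including the 42

# Same data with the one extreme value left out, for comparison only.
hours_without_extreme = [h for h in hours if h != 42]
mean_trimmed = mean(hours_without_extreme)

print(f"mean with 42:    {mean_all:.1f}")      # 14.4
print(f"mean without 42: {mean_trimmed:.1f}")  # 11.3
```

One value out of ten shifts the mean by about three hours.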
Median
The median is the middle value when data are arranged in ascending order. If there is an even number of values, the median is the average of the two middle values. The median is resistant to outliers: making an extreme value even more extreme leaves it unchanged.
Data: 3, 7, 9, 12, 15 → Median = 9 (the third of five values)
Data: 3, 7, 9, 12 → Median = (7 + 9) / 2 = 8
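The two cases above can be reproduced with `statistics.median`, which sorts the data itself. The third call is an added illustration (the value 1500 is made up) showing that stretching the largest value does not move the median.

```python
from statistics import median

odd_count = [3, 7, 9, 12, 15]
even_count = [3, 7, 9, 12]

median_odd = median(odd_count)    # middle value of five
median_even = median(even_count)  # average of the two middle values

print(median_odd)   # 9
print(median_even)  # 8.0

# Hypothetical variation: replace 15 with a far larger value.
median_stretched = median([3, 7, 9, 12, 1500])
print(median_stretched)  # 9
```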
Mode
The mode is the value that appears most frequently. A data set can have no mode (all values unique), one mode (unimodal), two modes (bimodal), or more (multimodal). The mode is the only appropriate measure of center for nominal (categorical) data.
Data: 2, 4, 4, 5, 7, 7, 7, 9 → Mode = 7 (appears three times)
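A minimal sketch using `statistics.multimode`, which returns every value tied for the highest count. The second and third lists are made-up illustrations. Note one caveat: when all values are unique, `multimode` returns all of them, so the "no mode" case has to be recognised by the caller.

```python
from statistics import multimode

data = [2, 4, 4, 5, 7, 7, 7, 9]
modes = multimode(data)
print(modes)  # [7]

# Illustrative bimodal data: two values tie for most frequent.
bimodal_modes = multimode([1, 1, 2, 2, 3])
print(bimodal_modes)  # [1, 2]

# Works for nominal (categorical) data as well.
colour_modes = multimode(["red", "blue", "red"])
print(colour_modes)  # ['red']

# All values unique: every value is returned, i.e. there is no real mode.
unique_modes = multimode([1, 2, 3])
print(unique_modes)  # [1, 2, 3]
```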
Measures of spread (variability)
Knowing where the center of the data lies is not enough. Two data sets can have identical means but very different distributions. Measures of spread quantify how much the values differ from one another.
Range
The range is the simplest measure of spread: maximum value minus minimum value. It is easy to compute but highly sensitive to outliers because it only uses two data points.
Data: 14, 18, 21, 23, 67 → Range = 67 − 14 = 53
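The same calculation in Python, with a second line (an added comparison, not part of the example above) showing how much of the range comes from the single largest value:

```python
data = [14, 18, 21, 23, 67]

data_range = max(data) - min(data)
print(data_range)  # 53

# Drop only the largest value to see how much it contributes.
without_largest = sorted(data)[:-1]
range_without_largest = max(without_largest) - min(without_largest)
print(range_without_largest)  # 9
```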
Interquartile range (IQR)
The IQR is the range of the middle 50% of the data: IQR = Q3 − Q1, where Q1 is the 25th percentile and Q3 is the 75th percentile. It is resistant to outliers and is the preferred spread measure when the median is used as the center.
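Quartile conventions differ between textbooks and software, so the same data can give slightly different IQRs. The sketch below compares a hand-written "median of each half" function (the name `quartiles_by_halves` is hypothetical) with the interpolating `statistics.quantiles`, using the study-hours values from the worked example later in this guide.

```python
from statistics import median, quantiles

def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    For an odd count the middle value is left out of both halves,
    which is one common textbook convention among several.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])

hours = [8, 12, 15, 10, 9, 14, 11, 13, 10, 42]

q1, q3 = quartiles_by_halves(hours)
iqr_halves = q3 - q1
print(q1, q3, iqr_halves)  # 10 14 4

# statistics.quantiles interpolates between neighbouring values.
q1_excl, _, q3_excl = quantiles(hours, n=4)  # default method="exclusive"
q1_incl, _, q3_incl = quantiles(hours, n=4, method="inclusive")
iqr_exclusive = q3_excl - q1_excl
iqr_inclusive = q3_incl - q1_incl
print(iqr_exclusive)  # 4.5
print(iqr_inclusive)  # 3.75
```

All three answers are defensible; what matters is reporting which convention was used.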
Variance
The variance is the average of the squared deviations from the mean. Squaring removes negative signs and gives extra weight to large deviations.
s² = Σ(xᵢ − x̄)² / (n − 1)
The denominator is n − 1 (not n) for sample variance. This is called Bessel's correction; it produces an unbiased estimate of the population variance.
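To make the difference between the two denominators concrete, the sketch below computes both versions by hand and checks them against `statistics.variance` (divides by n − 1) and `statistics.pvariance` (divides by n). The five values are borrowed from the median example earlier in this guide.

```python
from statistics import pvariance, variance

sample = [3, 7, 9, 12, 15]
n = len(sample)
x_bar = sum(sample) / n

# Sum of squared deviations from the mean.
sum_sq = sum((x - x_bar) ** 2 for x in sample)

by_hand_sample = sum_sq / (n - 1)  # Bessel's correction
by_hand_population = sum_sq / n    # treats the data as the whole population

print(round(by_hand_sample, 2), round(variance(sample), 2))       # 21.2 21.2
print(round(by_hand_population, 2), round(pvariance(sample), 2))  # 16.96 16.96
```

The sample version is always the larger of the two, and the gap shrinks as n grows.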
Standard deviation
The standard deviation is the square root of the variance. Because it is expressed in the same units as the original data, it is much more interpretable than variance.
s = √[Σ(xᵢ − x̄)² / (n − 1)]
A small standard deviation means values cluster tightly around the mean; a large one means they are spread out. In a normal distribution, roughly 68% of values fall within one standard deviation of the mean.
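The 68% figure can be checked against the normal distribution itself using `statistics.NormalDist` (Python 3.8+). The helper name `share_within` is hypothetical, and the 2 and 3 SD rows are included only for context.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a normal value lies within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

These shares hold for normally distributed data only; skewed or heavy-tailed data can deviate considerably.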
Shape of a distribution
The shape of a distribution describes how data are distributed across their range. The two most important shape characteristics are skewness and kurtosis.
Skewness
Skewness measures the asymmetry of a distribution. A perfectly symmetrical distribution has skewness = 0.
- Positive skew (right skew): The tail extends to the right, and the mean is typically greater than the median. Common in income data: most people earn moderate amounts, but a few earn extremely high salaries.
- Negative skew (left skew): The tail extends to the left, and the mean is typically less than the median. Example: age at retirement (most people retire at similar ages; a few retire very early).
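One common way to put a number on skewness is the moment-based coefficient g₁ = m₃ / m₂^1.5, where m₂ and m₃ are the second and third central moments. The sketch below is one such definition without small-sample correction; statistical packages may apply adjustments and report slightly different values. The function name `sample_skewness` is hypothetical, and the study-hours data come from the worked example later in this guide.

```python
from statistics import mean, median

def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction).

    Raises ZeroDivisionError if all values are identical.
    """
    n = len(values)
    center = mean(values)
    m2 = sum((x - center) ** 2 for x in values) / n
    m3 = sum((x - center) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

hours = [8, 12, 15, 10, 9, 14, 11, 13, 10, 42]  # one large value on the right
symmetric = [1, 2, 3, 4, 5]                      # illustrative symmetric data

skew_hours = sample_skewness(hours)
skew_symmetric = sample_skewness(symmetric)

print(f"{skew_hours:.1f}")      # 2.4 (right skew)
print(f"{skew_symmetric:.1f}")  # 0.0
print(mean(hours) > median(hours))  # True: the mean is pulled toward the tail
```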
Kurtosis
Kurtosis describes the "tailedness" of a distribution: whether extreme values are more or less common than in a normal distribution. A normal distribution has kurtosis = 3 (or excess kurtosis = 0). High kurtosis (leptokurtic) means heavier tails; low kurtosis (platykurtic) means lighter tails. Heavy-tailed distributions often look sharply peaked and light-tailed ones look flat, but kurtosis is driven by the tails rather than by the shape of the peak.
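A minimal sketch of the moment-based definition, kurtosis = m₄ / m₂², with excess kurtosis obtained by subtracting 3. No small-sample correction is applied, so packaged routines may give somewhat different numbers. The function name is hypothetical; `hours` is the study-hours data from the worked example later in this guide, and `flat` is made-up evenly spread data.

```python
def kurtosis(values):
    """Moment-based kurtosis m4 / m2**2; subtract 3 for excess kurtosis.

    Raises ZeroDivisionError if all values are identical.
    """
    n = len(values)
    center = sum(values) / n
    m2 = sum((x - center) ** 2 for x in values) / n
    m4 = sum((x - center) ** 4 for x in values) / n
    return m4 / m2 ** 2

hours = [8, 12, 15, 10, 9, 14, 11, 13, 10, 42]  # one extreme value
flat = list(range(1, 11))                        # 1..10, evenly spread

excess_hours = kurtosis(hours) - 3
excess_flat = kurtosis(flat) - 3

print(f"{excess_hours:.1f}")  # 4.4  (heavier tails than normal)
print(f"{excess_flat:.1f}")   # -1.2 (lighter tails than normal)
```

With only ten observations these estimates are very noisy, so they are best read as a direction rather than a precise measurement.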
Frequency distributions
A frequency distribution shows how often each value (or range of values) appears in a data set. It is typically displayed as a table, histogram, or bar chart and is often the first step in exploring a new data set.
Absolute frequency
The count of observations in each category or interval.
Relative frequency
The proportion of observations in each category: relative frequency = count / total n. Multiplying by 100 gives the percentage.
Cumulative frequency
The running total of frequencies from the lowest to a given value. Useful for finding percentiles and understanding how data accumulate across a range.
| Score range | Frequency | Relative frequency | Cumulative frequency |
|---|---|---|---|
| 50–59 | 2 | 6.7% | 2 |
| 60–69 | 5 | 16.7% | 7 |
| 70–79 | 10 | 33.3% | 17 |
| 80–89 | 8 | 26.7% | 25 |
| 90–99 | 5 | 16.7% | 30 |
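The relative and cumulative columns follow mechanically from the counts. A short sketch, taking the frequencies from the table above as input (the variable names are illustrative):

```python
from itertools import accumulate

score_ranges = ["50-59", "60-69", "70-79", "80-89", "90-99"]
counts = [2, 5, 10, 8, 5]  # absolute frequencies from the table

total = sum(counts)
relative = [count / total for count in counts]  # proportions of n
cumulative = list(accumulate(counts))           # running totals

for label, count, rel, cum in zip(score_ranges, counts, relative, cumulative):
    print(f"{label}  {count:>2}  {rel:6.1%}  {cum:>2}")
```

The last cumulative value always equals the total n (here 30), which is a quick check that no interval was missed.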
Worked example with a data set
Suppose a researcher measures the number of hours per week that 10 university students spend studying:
8, 12, 15, 10, 9, 14, 11, 13, 10, 42
Step 1: Sort the data
8, 9, 10, 10, 11, 12, 13, 14, 15, 42
Step 2: Compute the mean
x̄ = (8+9+10+10+11+12+13+14+15+42) / 10 = 144 / 10 = 14.4 hours
Step 3: Find the median
Middle two values: 11 and 12 → Median = (11 + 12) / 2 = 11.5 hours
Step 4: Find the mode
10 appears twice; all others appear once → Mode = 10 hours
Step 5: Compute spread
Range = 42 − 8 = 34
Q1 = 10 (median of the lower half: 8, 9, 10, 10, 11), Q3 = 14 (median of the upper half: 12, 13, 14, 15, 42) → IQR = 14 − 10 = 4
s² = Σ(xᵢ − 14.4)² / 9 = 890.4 / 9 ≈ 98.9
s = √98.9 ≈ 9.9 hours
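Hand arithmetic like this is easy to get wrong, so it is worth recomputing the summary from the raw values. The sketch below uses Python's standard `statistics` module; the dictionary name `summary` is illustrative, and the quartiles follow the median-of-each-half convention used in Step 5.

```python
from statistics import mean, median, multimode, stdev, variance

hours = [8, 12, 15, 10, 9, 14, 11, 13, 10, 42]
ordered = sorted(hours)
half = len(ordered) // 2

summary = {
    "mean": mean(ordered),
    "median": median(ordered),
    "mode": multimode(ordered),
    "range": ordered[-1] - ordered[0],
    # Q3 - Q1, with quartiles taken as medians of the upper and lower halves.
    "iqr": median(ordered[-half:]) - median(ordered[:half]),
    "variance": variance(ordered),  # sample variance, divides by n - 1
    "sd": stdev(ordered),           # square root of the sample variance
}

for name, value in summary.items():
    shown = round(value, 1) if isinstance(value, float) else value
    print(f"{name}: {shown}")
```

Comparing the mean with the median, and the standard deviation with the IQR, shows how strongly the single value of 42 affects the measures that are not resistant to outliers.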
When to use which measure
| Situation | Best measure of center | Best measure of spread |
|---|---|---|
| Symmetric distribution, no outliers | Mean | Standard deviation |
| Skewed distribution or outliers present | Median | IQR |
| Nominal (categorical) data | Mode | N/A (use frequencies) |
| Ordinal data | Median | IQR or range |
| Bimodal distribution | Report both modes | Standard deviation or IQR |
Descriptive vs. inferential statistics
Descriptive and inferential statistics serve complementary purposes. Understanding the distinction is fundamental to research design and reporting.
| Feature | Descriptive statistics | Inferential statistics |
|---|---|---|
| Purpose | Summarize and describe the data you have | Draw conclusions about a population from a sample |
| Scope | Only the data collected | Generalizes beyond the sample |
| Typical outputs | Mean, SD, frequency table, histogram | p-values, confidence intervals, effect sizes |
| Examples | "The sample mean score was 72.4 (SD = 8.1)" | "The treatment group scored significantly higher, t(48) = 3.21, p = .002" |
| Probability required? | No | Yes — relies on sampling distributions |
Descriptive statistics are typically reported first. They let readers assess data quality, check for plausible values, and judge whether the sample is representative before any inferential claims are considered.
Quick summary
| Concept | What it measures | Key formula / note |
|---|---|---|
| Mean | Arithmetic average | Σxᵢ / n; sensitive to outliers |
| Median | Middle value | Resistant to outliers; prefer for skewed data |
| Mode | Most frequent value | Only center measure for nominal data |
| Range | Max − Min | Simple; affected by outliers |
| IQR | Spread of middle 50% | Q3 − Q1; resistant to outliers |
| Variance | Average squared deviation | Σ(xᵢ − x̄)² / (n−1) |
| Standard deviation | Typical distance from mean | √variance; same units as data |
| Skewness | Symmetry of distribution | Positive = right tail; negative = left tail |
| Kurtosis | Tail heaviness | Normal distribution has excess kurtosis = 0 |