Mathwords: Paired Data

Key Formula

r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left(n\sum x_i^2 - \left(\sum x_i\right)^2\right)\left(n\sum y_i^2 - \left(\sum y_i\right)^2\right)}}

Where:

$(x_i, y_i)$ = The individual paired data points
$n$ = The number of paired data points
$r$ = The Pearson correlation coefficient, measuring the strength and direction of the linear relationship between the paired variables (ranges from −1 to 1)
$\sum$ = Summation over all data pairs from i = 1 to n
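
The formula translates almost term for term into code. The following is a minimal sketch, not a reference implementation; the function name pearson_r and the error handling are illustrative choices, not part of the original definition.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient via the sums formula above."""
    if len(xs) != len(ys):
        raise ValueError("paired data needs equally long x and y lists")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of x_i * y_i
    sum_x2 = sum(x * x for x in xs)               # sum of x_i squared
    sum_y2 = sum(y * y for y in ys)               # sum of y_i squared
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # All x-values (or all y-values) are identical, so r is undefined.
        raise ValueError("r is undefined when x or y has zero variance")
    return numerator / denominator
```

Zipping the two lists keeps each x with its own y, which is what makes the data paired rather than two unrelated lists.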

Worked Example

Problem: A teacher records the number of hours five students studied and their test scores. The paired data is: (1, 50), (2, 60), (3, 65), (4, 80), (5, 90). Find the correlation coefficient r to determine how strongly study hours and test scores are related.

Step 1: List the paired data and compute the needed sums. Here n = 5.

\sum x_i = 1+2+3+4+5 = 15

Step 2: Compute the sum of the y-values and the sum of the products.

\sum y_i = 50+60+65+80+90 = 345 \qquad \sum x_i y_i = 50+120+195+320+450 = 1135

Step 3: Compute the sums of squares for x and y.

\sum x_i^2 = 1+4+9+16+25 = 55 \qquad \sum y_i^2 = 2500+3600+4225+6400+8100 = 24825

Step 4: Substitute into the correlation formula. Compute the numerator first.

\text{Numerator} = 5(1135) - (15)(345) = 5675 - 5175 = 500

Step 5: Compute the denominator, then divide to find r.

\text{Denom} = \sqrt{(5 \cdot 55 - 15^2)(5 \cdot 24825 - 345^2)} = \sqrt{(275-225)(124125-119025)} = \sqrt{50 \cdot 5100} = \sqrt{255000} \approx 505.0

Answer: r ≈ 500 / 505.0 ≈ 0.99. The correlation coefficient is very close to 1, indicating a strong positive linear relationship between hours studied and test scores.
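
The arithmetic in Steps 1 through 5 can be checked with a short script. This is only a sketch that mirrors the hand calculation; the variable names are chosen for illustration.

```python
from math import sqrt

hours = [1, 2, 3, 4, 5]
scores = [50, 60, 65, 80, 90]
n = len(hours)

sum_x = sum(hours)                                  # Step 1: 15
sum_y = sum(scores)                                 # Step 2: 345
sum_xy = sum(x * y for x, y in zip(hours, scores))  # Step 2: 1135
sum_x2 = sum(x * x for x in hours)                  # Step 3: 55
sum_y2 = sum(y * y for y in scores)                 # Step 3: 24825

numerator = n * sum_xy - sum_x * sum_y              # Step 4: 500
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))  # Step 5: sqrt(255000)
r = numerator / denominator

print(round(r, 2))  # 0.99
```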

Another Example

This example focuses on organizing paired data and computing basic descriptive statistics rather than calculating the full correlation coefficient. It also uses only four data points to illustrate that paired data sets can vary in size.

Problem: A researcher collects paired data on daily temperature (°F) and the number of ice cream cones sold at a shop: (60, 30), (70, 40), (80, 55), (90, 70). Organize the paired data, compute the mean of each variable, and describe the general trend.

Step 1: Identify the paired structure. Each pair links one temperature reading to one sales figure. Temperature is the independent variable x, and sales is the dependent variable y.

(x_1, y_1) = (60, 30),\; (x_2, y_2) = (70, 40),\; (x_3, y_3) = (80, 55),\; (x_4, y_4) = (90, 70)

Step 2: Find the mean of the x-values (temperature).

\bar{x} = \frac{60+70+80+90}{4} = \frac{300}{4} = 75

Step 3: Find the mean of the y-values (ice cream sales).

\bar{y} = \frac{30+40+55+70}{4} = \frac{195}{4} = 48.75

Step 4: Describe the trend. As temperature increases from 60 to 90, sales increase from 30 to 70. On a scatterplot, these four points would rise from left to right, suggesting a positive association.

Answer: The mean temperature is 75°F and the mean sales count is 48.75 cones. The paired data shows a clear positive trend: higher temperatures correspond to higher ice cream sales.
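
The same steps can be sketched in code: split the pairs into x- and y-lists, average each, and check the direction of the trend. The names below are illustrative, and the trend check is a simple assumption that works here because the temperatures are already listed in increasing order.

```python
from statistics import mean

pairs = [(60, 30), (70, 40), (80, 55), (90, 70)]  # (temperature, cones sold)
temps = [x for x, _ in pairs]
sales = [y for _, y in pairs]

mean_temp = mean(temps)    # (60 + 70 + 80 + 90) / 4
mean_sales = mean(sales)   # (30 + 40 + 55 + 70) / 4

# With x in increasing order, a rising trend means every y exceeds the one before it.
rising = all(later > earlier for earlier, later in zip(sales, sales[1:]))

print(mean_temp, mean_sales, rising)
```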

Frequently Asked Questions

What is the difference between paired data and unpaired data?

Paired data links each observation in one set to a specific observation in another set—for example, the same student's score before and after tutoring. Unpaired (or independent) data compares two groups with no natural one-to-one connection, such as test scores from two different classes. The distinction matters because paired data requires statistical methods (like the paired t-test) that account for the built-in connection between observations.

How do you display paired data on a graph?

The most common way to display paired data is with a scatterplot. You plot each ordered pair $(x, y)$ as a point on a coordinate plane, with the independent variable on the horizontal axis and the dependent variable on the vertical axis. The resulting pattern of dots reveals the type and strength of the relationship between the two variables.

When do you use paired data in statistics?

You use paired data whenever two measurements are naturally linked. Common situations include before-and-after studies on the same subjects, matching measurements from twins, or recording two different attributes of the same item (like a car's weight and fuel efficiency). Recognizing that data is paired lets you use more powerful statistical tests that reduce variability from individual differences.

Paired Data vs. Unpaired (Independent) Data

	Paired Data	Unpaired (Independent) Data
Definition	Two data sets where each value in one set corresponds to exactly one value in the other	Two data sets with no natural one-to-one correspondence between values
Example	Same patient's blood pressure before and after medication	Blood pressure of patients in Group A vs. a separate Group B
Graph type	Scatterplot (one point per pair)	Side-by-side boxplots or histograms
Statistical test	Paired t-test or Wilcoxon signed-rank test	Two-sample t-test or Mann-Whitney U test
Key advantage	Controls for individual variability by comparing each subject to itself	Does not require matching; groups can be different sizes

Why It Matters

Paired data appears throughout algebra, statistics, and science courses whenever you study the relationship between two variables. You encounter it when plotting points on a coordinate plane, calculating lines of best fit, and analyzing experimental results with before-and-after measurements. Recognizing that your data is paired—rather than independent—determines which statistical methods give valid conclusions and can lead to more precise results by eliminating subject-to-subject variability.

Common Mistakes

Mistake: Treating paired data as unpaired and using a two-sample t-test instead of a paired t-test.

Correction: When each observation in one set has a natural partner in the other set, use paired methods. Ignoring the pairing leaves subject-to-subject variability in the comparison, which typically makes the test less sensitive to a real difference.
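
To make the effect concrete, the sketch below computes a paired t statistic and an unpaired (Welch) t statistic on the same numbers. The before-and-after scores are hypothetical values invented for illustration, and only the test statistics are computed, not p-values.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical scores for the same five students before and after tutoring.
before = [72, 65, 80, 58, 90]
after = [75, 70, 83, 62, 94]
n = len(before)

# Paired approach: analyze the within-student differences.
diffs = [a - b for a, b in zip(after, before)]
t_paired = mean(diffs) / (stdev(diffs) / sqrt(n))

# Unpaired (Welch) approach: treat the two lists as independent groups.
se_unpaired = sqrt(variance(before) / n + variance(after) / n)
t_unpaired = (mean(after) - mean(before)) / se_unpaired

print(f"paired t = {t_paired:.2f}, unpaired t = {t_unpaired:.2f}")
```

With these made-up scores the paired statistic is roughly 10 while the unpaired one is roughly 0.5. Every student improved by a similar amount, but the large spread between students hides that improvement when the pairing is ignored.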

Mistake: Swapping the x and y values in ordered pairs or misidentifying which variable is independent.

Correction: Order matters in paired data. By convention, the first value in each pair is the independent variable (x) and the second is the dependent variable (y). Reversing them reflects the scatterplot across the line y = x and changes the regression equation and its interpretation, although the correlation coefficient r stays the same because its formula is symmetric in x and y.
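
A short sketch shows how much the regression slope depends on which variable is treated as x. It reuses the study-hours data from the worked example and the standard least-squares slope formula; the helper name slope is an illustrative choice.

```python
def slope(xs, ys):
    """Least-squares slope for predicting ys from xs."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

hours = [1, 2, 3, 4, 5]
scores = [50, 60, 65, 80, 90]

score_per_hour = slope(hours, scores)   # 500 / 50 = 10.0 points per extra hour
hour_per_score = slope(scores, hours)   # 500 / 5100, about 0.098 hours per extra point

print(score_per_hour, round(hour_per_score, 3))
```

The swapped slope is not simply the reciprocal of the original (1/10 = 0.1), so the two orderings describe two different fitted lines, not one line read in two directions.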

Related Terms

Coordinates — Paired data points are plotted as coordinates
Scatterplot — Primary graph type for displaying paired data
Linear Fit — Models the trend in paired data with a line
Least-Squares Regression Line — Best-fit line computed from paired data
Ordered Pair — The fundamental format of each paired data entry
Correlation — Measures strength of relationship in paired data
Bivariate Data — Another name for data involving two variables