Paired Data
Key Formula
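$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left(n\sum x_i^2 - \left(\sum x_i\right)^2\right)\left(n\sum y_i^2 - \left(\sum y_i\right)^2\right)}}$$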
Where:
- $(x_i, y_i)$ = The individual paired data points
- n = The number of paired data points
- r = The Pearson correlation coefficient, measuring the strength and direction of the linear relationship between the paired variables (ranges from −1 to 1)
- ∑ = Summation over all data pairs from i = 1 to n
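The formula translates almost term by term into code. The following is a minimal Python sketch, not a reference implementation; the function name `pearson_r` and its argument names are chosen here for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient from the raw-sums formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)

    # The five sums that appear in the formula.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # Happens when every x (or every y) is identical.
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Perfectly linear data should give r = 1 (rising) or r = -1 (falling).
r_up = pearson_r([1, 2, 3], [2, 4, 6])
r_down = pearson_r([1, 2, 3], [6, 4, 2])
```

The explicit check for a zero denominator is a design choice in this sketch: when one variable never changes, the formula divides by zero and r has no meaning.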
Worked Example
Problem: A teacher records the number of hours five students studied and their test scores. The paired data is: (1, 50), (2, 60), (3, 65), (4, 80), (5, 90). Find the correlation coefficient r to determine how strongly study hours and test scores are related.
Step 1: List the paired data and compute the needed sums. Here n = 5.
$$\sum x_i = 1 + 2 + 3 + 4 + 5 = 15$$
Step 2: Compute the sum of the y-values and the sum of the products.
$$\sum y_i = 50 + 60 + 65 + 80 + 90 = 345$$
$$\sum x_i y_i = 50 + 120 + 195 + 320 + 450 = 1135$$
Step 3: Compute the sums of squares for x and y.
$$\sum x_i^2 = 1 + 4 + 9 + 16 + 25 = 55$$
$$\sum y_i^2 = 2500 + 3600 + 4225 + 6400 + 8100 = 24825$$
Step 4: Substitute into the correlation formula. Compute the numerator first.
$$\text{Numerator} = 5(1135) - (15)(345) = 5675 - 5175 = 500$$
Step 5: Compute the denominator, then divide to find r.
$$\text{Denominator} = \sqrt{(5 \cdot 55 - 15^2)(5 \cdot 24825 - 345^2)} = \sqrt{(275 - 225)(124125 - 119025)} = \sqrt{50 \cdot 5100} = \sqrt{255000} \approx 505.0$$
Answer: r ≈ 500 / 505.0 ≈ 0.99. The correlation coefficient is very close to 1, indicating a strong positive linear relationship between hours studied and test scores.
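The arithmetic in the worked example above can be checked step by step. This is a short Python sketch that mirrors Steps 1 through 5; the variable names are illustrative choices, not part of any standard.

```python
from math import sqrt

# (hours studied, test score) pairs from the worked example.
pairs = [(1, 50), (2, 60), (3, 65), (4, 80), (5, 90)]
n = len(pairs)

# Steps 1-3: the five sums.
sum_x = sum(x for x, _ in pairs)        # 15
sum_y = sum(y for _, y in pairs)        # 345
sum_xy = sum(x * y for x, y in pairs)   # 1135
sum_x2 = sum(x * x for x, _ in pairs)   # 55
sum_y2 = sum(y * y for _, y in pairs)   # 24825

# Step 4: numerator.
numerator = n * sum_xy - sum_x * sum_y  # 500

# Step 5: denominator (square root of 255000), then r.
radicand = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
denominator = sqrt(radicand)
r = numerator / denominator

print(round(r, 4))  # 0.9901
```

Carrying the unrounded denominator through gives r ≈ 0.9901, which rounds to the 0.99 reported in the answer.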
Another Example
This example focuses on organizing paired data and computing basic descriptive statistics rather than calculating the full correlation coefficient. It also uses only four data points to illustrate that paired data sets can vary in size.
Problem: A researcher collects paired data on daily temperature (°F) and the number of ice cream cones sold at a shop: (60, 30), (70, 40), (80, 55), (90, 70). Organize the paired data, compute the mean of each variable, and describe the general trend.
Step 1: Identify the paired structure. Each pair links one temperature reading to one sales figure. Temperature is the independent variable x, and sales is the dependent variable y.
$$(x_1, y_1) = (60, 30),\quad (x_2, y_2) = (70, 40),\quad (x_3, y_3) = (80, 55),\quad (x_4, y_4) = (90, 70)$$
Step 2: Find the mean of the x-values (temperature).
$$\bar{x} = \frac{60 + 70 + 80 + 90}{4} = \frac{300}{4} = 75$$
Step 3: Find the mean of the y-values (ice cream sales).
$$\bar{y} = \frac{30 + 40 + 55 + 70}{4} = \frac{195}{4} = 48.75$$
Step 4: Describe the trend. As temperature increases from 60 to 90, sales increase from 30 to 70. On a scatterplot, these four points would rise from left to right, suggesting a positive association.
Answer: The mean temperature is 75°F and the mean sales count is 48.75 cones. The paired data shows a clear positive trend: higher temperatures correspond to higher ice cream sales.
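The same organizing steps can be sketched in Python using the standard library's `statistics.mean`. The variable names and the simple "does y rise every time x rises" check are illustrative assumptions, not a formal test of association.

```python
from statistics import mean

# (temperature in °F, cones sold) pairs from the example.
pairs = [(60, 30), (70, 40), (80, 55), (90, 70)]

# Step 1: split the pairs into the x-values and the y-values.
temps = [x for x, _ in pairs]
sales = [y for _, y in pairs]

# Steps 2-3: mean of each variable.
mean_temp = mean(temps)    # 300 / 4 = 75
mean_sales = mean(sales)   # 195 / 4 = 48.75

# Step 4: order the pairs by temperature and check that sales rise each time.
ordered = sorted(pairs)
rising = all(y2 > y1 for (_, y1), (_, y2) in zip(ordered, ordered[1:]))

print(mean_temp, mean_sales, rising)
```

Keeping the data as a list of pairs, rather than two separate lists, preserves the link between each temperature and its sales figure when the data is sorted.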
Frequently Asked Questions
What is the difference between paired data and unpaired data?
Paired data links each observation in one set to a specific observation in another set—for example, the same student's score before and after tutoring. Unpaired (or independent) data compares two groups with no natural one-to-one connection, such as test scores from two different classes. The distinction matters because paired data requires statistical methods (like the paired t-test) that account for the built-in connection between observations.
How do you display paired data on a graph?
The most common way to display paired data is with a scatterplot. You plot each ordered pair (x,y) as a point on a coordinate plane, with the independent variable on the horizontal axis and the dependent variable on the vertical axis. The resulting pattern of dots reveals the type and strength of the relationship between the two variables.
When do you use paired data in statistics?
You use paired data whenever two measurements are naturally linked. Common situations include before-and-after studies on the same subjects, matching measurements from twins, or recording two different attributes of the same item (like a car's weight and fuel efficiency). Recognizing that data is paired lets you use more powerful statistical tests that reduce variability from individual differences.
Paired Data vs. Unpaired (Independent) Data
| | Paired Data | Unpaired (Independent) Data |
|---|---|---|
| Definition | Two data sets where each value in one set corresponds to exactly one value in the other | Two data sets with no natural one-to-one correspondence between values |
| Example | Same patient's blood pressure before and after medication | Blood pressure of patients in Group A vs. a separate Group B |
| Graph type | Scatterplot (one point per pair) | Side-by-side boxplots or histograms |
| Statistical test | Paired t-test or Wilcoxon signed-rank test | Two-sample t-test or Mann-Whitney U test |
| Key advantage | Controls for individual variability by comparing each subject to itself | Does not require matching; groups can be different sizes |
Why It Matters
Paired data appears throughout algebra, statistics, and science courses whenever you study the relationship between two variables. You encounter it when plotting points on a coordinate plane, calculating lines of best fit, and analyzing experimental results with before-and-after measurements. Recognizing that your data is paired—rather than independent—determines which statistical methods give valid conclusions and can lead to more precise results by eliminating subject-to-subject variability.
Common Mistakes
Mistake: Treating paired data as unpaired and using a two-sample t-test instead of a paired t-test.
Correction: When each observation in one set has a natural partner in the other set, use paired methods. Ignoring the pairing discards the link between partners and typically gives less precise results, because subject-to-subject variability is no longer removed from the comparison.
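A small numerical sketch shows how much the choice can matter. The before-and-after scores below are hypothetical, invented only for illustration. The paired statistic is the mean difference divided by its standard error; the unpaired statistic uses the Welch form of the two-sample t-statistic, which is one common variant.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical scores for the same five students (illustration only).
before = [70, 75, 80, 85, 90]
after = [73, 77, 83, 88, 92]
n = len(before)

# Paired analysis: work with each student's own difference.
diffs = [a - b for a, b in zip(after, before)]
t_paired = mean(diffs) / (stdev(diffs) / sqrt(n))

# Unpaired analysis (Welch form): treats the two lists as separate groups,
# so the large spread between students inflates the standard error.
se_unpaired = sqrt(variance(before) / n + variance(after) / n)
t_unpaired = (mean(after) - mean(before)) / se_unpaired

print(round(t_paired, 2), round(t_unpaired, 2))
```

With these made-up numbers, the paired statistic is roughly twenty times larger than the unpaired one, even though both start from the same mean difference of 2.6 points. The differences vary very little from student to student, while the raw scores vary a lot.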
Mistake: Swapping the x and y values in ordered pairs or misidentifying which variable is independent.
Correction: Order matters in paired data. By convention, the first value in each pair is the independent variable (x) and the second is the dependent variable (y). Reversing them changes the scatterplot orientation and the interpretation of the regression equation.
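The effect of reversing the pairs can be seen with the study-hours data from the worked example. The sketch below fits a least-squares line both ways using the standard raw-sums formulas; the helper name `fit_line` is a hypothetical choice for this illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


hours = [1, 2, 3, 4, 5]
scores = [50, 60, 65, 80, 90]

# Correct order: hours (x) predict scores (y).
slope_yx, intercept_yx = fit_line(hours, scores)

# Swapped order: scores are treated as x and hours as y.
slope_xy, intercept_xy = fit_line(scores, hours)

print(slope_yx, intercept_yx)
print(round(slope_xy, 4), round(intercept_xy, 4))
```

In the correct order the line is score = 10 · hours + 39. In the swapped order the slope is about 0.098, which is not simply the reciprocal of 10, so the two fits are genuinely different lines rather than the same line with relabeled axes.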
Related Terms
- Coordinates — Paired data points are plotted as coordinates
- Scatterplot — Primary graph type for displaying paired data
- Linear Fit — Models the trend in paired data with a line
- Least-Squares Regression Line — Best-fit line computed from paired data
- Ordered Pair — The fundamental format of each paired data entry
- Correlation — Measures strength of relationship in paired data
- Bivariate Data — Another name for data involving two variables

