Positively Associated Data

A relationship in paired data in which the two sets of data tend to increase together or decrease together. In a scatterplot, positively associated data tend to follow a pattern from the lower left to the upper right. Positively associated data have a positive correlation coefficient.

 

[Figure: scatterplot with x and y axes showing dots trending from lower left to upper right, labeled "Positively Associated Data."]

See also

Negatively associated data

Key Formula

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
Where:
  • $r$ = Correlation coefficient; for positively associated data, $0 < r \le 1$
  • $n$ = Number of data pairs
  • $x$ = Values of the first variable (independent variable)
  • $y$ = Values of the second variable (dependent variable)
  • $\sum xy$ = Sum of the products of each paired x and y value
  • $\sum x$ = Sum of all x values
  • $\sum y$ = Sum of all y values
  • $\sum x^2$, $\sum y^2$ = Sums of the squared x values and of the squared y values
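The Key Formula translates almost term by term into code. The following is a minimal sketch, not a library routine; the function name `correlation_coefficient` and its error handling are assumptions made for illustration.

```python
import math

def correlation_coefficient(xs, ys):
    """Compute r from paired data using the sum-based Key Formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # Happens when all x values (or all y values) are identical.
        raise ValueError("r is undefined when a variable has no spread")
    return numerator / denominator

print(correlation_coefficient([1, 2, 3], [2, 4, 6]))  # 1.0
```

The points (1, 2), (2, 4), (3, 6) lie exactly on a rising line, so the function returns 1.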

Worked Example

Problem: Five students recorded the number of hours they studied and their test scores: (1, 50), (2, 60), (3, 65), (4, 80), (5, 90). Determine whether the data are positively associated by computing the correlation coefficient r.
Step 1: List the values and compute the needed sums. Here n = 5.
$$\sum x = 1+2+3+4+5 = 15$$
Step 2: Find the sum of y values and the sum of the products xy.
$$\sum y = 50+60+65+80+90 = 345 \qquad \sum xy = 50+120+195+320+450 = 1135$$
Step 3: Find the sum of squared x values and squared y values.
$$\sum x^2 = 1+4+9+16+25 = 55 \qquad \sum y^2 = 2500+3600+4225+6400+8100 = 24825$$
Step 4: Substitute into the correlation coefficient formula.
$$r = \frac{5(1135) - (15)(345)}{\sqrt{[5(55) - 15^2][5(24825) - 345^2]}} = \frac{5675 - 5175}{\sqrt{(275-225)(124125-119025)}} = \frac{500}{\sqrt{50 \cdot 5100}}$$
Step 5: Simplify to find r.
$$r = \frac{500}{\sqrt{255000}} \approx \frac{500}{505.0} \approx 0.990$$
Answer: The correlation coefficient r ≈ 0.99, which is positive and very close to 1. The data are strongly positively associated — as study hours increase, test scores increase.
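The arithmetic in this worked example can be checked with a few lines of Python. This is only a verification sketch; the variable names are chosen here for illustration.

```python
import math

hours = [1, 2, 3, 4, 5]          # x values from the problem
scores = [50, 60, 65, 80, 90]    # y values from the problem

n = len(hours)
sum_x = sum(hours)
sum_y = sum(scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
sum_x2 = sum(x * x for x in hours)
sum_y2 = sum(y * y for y in scores)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 15 345 1135 55 24825
print(numerator)                             # 500
print(round(r, 3))                           # 0.99
```

Each printed sum matches the value found in Steps 1 through 3, and the rounded result agrees with r ≈ 0.99.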

Another Example

This example uses a real-world context (temperature vs. sales) and begins with a visual check of the scatterplot before confirming the result numerically. It shows that positive association can often be identified from the pattern of the data before computing r.

Problem: A store tracks the temperature outside (°F) and the number of cold drinks sold over four days: (60, 20), (70, 25), (80, 40), (90, 50). First decide from the scatterplot pattern alone, without computing r, whether the data show positive association. Then verify the conclusion with the correlation formula.
Step 1: Plot the points mentally or on paper. As temperature (x) increases from 60 to 90, drinks sold (y) increases from 20 to 50. The pattern rises from lower left to upper right, suggesting positive association.
Step 2: Compute the required sums with n = 4.
$$\sum x = 300,\quad \sum y = 135,\quad \sum xy = 60(20)+70(25)+80(40)+90(50) = 1200+1750+3200+4500 = 10650$$
Step 3: Find the sums of squares.
$$\sum x^2 = 3600+4900+6400+8100 = 23000 \qquad \sum y^2 = 400+625+1600+2500 = 5125$$
Step 4: Apply the formula.
$$r = \frac{4(10650)-(300)(135)}{\sqrt{[4(23000)-300^2][4(5125)-135^2]}} = \frac{42600-40500}{\sqrt{(92000-90000)(20500-18225)}} = \frac{2100}{\sqrt{2000 \cdot 2275}}$$
Step 5: Simplify.
$$r = \frac{2100}{\sqrt{4550000}} \approx \frac{2100}{2133.1} \approx 0.984$$
Answer: r ≈ 0.984, confirming strong positive association. Higher temperatures correspond to more cold drinks sold.
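Both parts of this example, the visual check and the numerical confirmation, can be mirrored in code. In the sketch below, the "rises from lower left to upper right" test is approximated by sorting the points by x and checking that y never drops. That is a stricter condition than positive association requires, so treat it as an illustrative assumption rather than a general test.

```python
import math

temps = [60, 70, 80, 90]     # x: temperature in degrees F
drinks = [20, 25, 40, 50]    # y: cold drinks sold

# Step 1 analogue: read the points left to right and see whether y keeps rising.
points = sorted(zip(temps, drinks))
ys_in_x_order = [y for _, y in points]
rises_left_to_right = all(a <= b for a, b in zip(ys_in_x_order, ys_in_x_order[1:]))

# Steps 2 to 5: confirm with the sum-based correlation formula.
n = len(points)
sum_x = sum(temps)
sum_y = sum(drinks)
sum_xy = sum(x * y for x, y in points)
sum_x2 = sum(x * x for x in temps)
sum_y2 = sum(y * y for y in drinks)
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(rises_left_to_right)  # True
print(round(r, 3))          # 0.984
```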

Frequently Asked Questions

What is the difference between positively associated data and negatively associated data?
Positively associated data have variables that increase together (r > 0), so the scatterplot rises from lower left to upper right. Negatively associated data have one variable increasing while the other decreases (r < 0), so the scatterplot falls from upper left to lower right. The key distinction is the direction of the trend.
Does positive association mean one variable causes the other?
No. Positive association shows that two variables move in the same direction, but it does not prove causation. There could be a hidden third variable (a confounding variable) driving both. For example, ice cream sales and drowning incidents are positively associated, but hot weather is the underlying cause of both — ice cream does not cause drowning.
Can data be positively associated but not perfectly linear?
Yes. Data can be positively associated with any r value between 0 and 1 (exclusive). An r of 0.4, for instance, indicates a weak positive association — the general trend is upward, but the points are scattered widely around the trend line. Only r = 1 represents a perfect positive linear relationship.
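A curved pattern illustrates the last answer. In the sketch below, the y values are the squares of the x values, so the points rise steadily but do not lie on a line. The data set is made up for illustration, and the code uses the deviation form of r, which is algebraically equivalent to the sum-based Key Formula.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]    # 1, 4, 9, 16, 25: rising, but along a curve

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Deviation form: sum of products of deviations over the root of the
# product of the summed squared deviations.
product_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
spread_x = sum((x - mean_x) ** 2 for x in xs)
spread_y = sum((y - mean_y) ** 2 for y in ys)
r = product_sum / math.sqrt(spread_x * spread_y)

print(round(r, 3))  # 0.981
```

The result is positive but less than 1: the data are positively associated without being perfectly linear.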

Positively Associated Data vs. Negatively Associated Data

| | Positively Associated Data | Negatively Associated Data |
|---|---|---|
| Direction of trend | Both variables increase or decrease together | One variable increases while the other decreases |
| Correlation coefficient (r) | 0 < r ≤ 1 | −1 ≤ r < 0 |
| Scatterplot pattern | Rises from lower left to upper right | Falls from upper left to lower right |
| Real-world example | Height vs. weight in people | Price of a product vs. quantity demanded |
| Slope of best-fit line | Positive slope | Negative slope |
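The correlation-coefficient ranges in this comparison can be expressed as a small classifier. The function name `association_direction` and the label returned for r = 0 are assumptions added here, since the comparison covers only the positive and negative cases.

```python
def association_direction(r):
    """Label the direction of association from the sign of r."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if r > 0:
        return "positive"    # 0 < r <= 1
    if r < 0:
        return "negative"    # -1 <= r < 0
    return "no linear association"   # r == 0 (label assumed for this sketch)

print(association_direction(0.99))   # positive
print(association_direction(-0.6))   # negative
```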

Why It Matters

Identifying positive association is one of the first skills you learn in statistics and data analysis. You encounter it when studying scatterplots in algebra, in science classes when analyzing experimental data, and in standardized tests like the SAT. Understanding whether data are positively associated helps you make predictions — if you know the trend, you can estimate one variable's value from the other using a line of best fit.
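As a rough illustration of prediction, the sketch below fits a least-squares line to the study-hours data from the worked example and estimates a score for 6 hours of study. Using least squares as the "line of best fit" is an assumption of this sketch, as is the function name `best_fit_line`. Because 6 hours lies outside the observed range, the estimate should be read with caution.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept

hours = [1, 2, 3, 4, 5]
scores = [50, 60, 65, 80, 90]

slope, intercept = best_fit_line(hours, scores)
predicted = slope * 6 + intercept

print(slope, intercept, predicted)  # 10.0 39.0 99.0
```

The positive slope matches the positive association: each additional hour of study corresponds to about 10 more points on the fitted line.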

Common Mistakes

Mistake: Assuming positive association means causation.
Correction: Correlation does not imply causation. Two variables can move together because of a third, hidden factor. Always look for confounding variables before concluding that one variable causes another.
Mistake: Thinking any upward-looking cluster of points means strong positive association.
Correction: Strength depends on how tightly the points cluster around a line. A loose upward cloud might have r = 0.3 (weak positive association), while tightly packed points along a line could give r = 0.95 (strong). Always check the correlation coefficient to judge strength, not just the general direction.
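The difference between direction and strength shows up clearly when two upward-trending data sets are compared. Both data sets below are invented for illustration; the helper `pearson_r` is a compact version of the Key Formula.

```python
import math

def pearson_r(xs, ys):
    """Sum-based correlation coefficient for paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return (n * sum_xy - sum_x * sum_y) / math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )

xs = [1, 2, 3, 4, 5]
tight_ys = [20, 41, 59, 82, 98]   # hugs a rising line (made-up values)
loose_ys = [5, 1, 6, 3, 7]        # drifts upward with wide scatter (made-up values)

r_tight = pearson_r(xs, tight_ys)
r_loose = pearson_r(xs, loose_ys)

print(round(r_tight, 3))  # 0.999
print(round(r_loose, 3))  # 0.394
```

Both values are positive, so both data sets are positively associated, but only the first association is strong.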

Related Terms