# Correlation Coefficient: Simple Definition, Formula, Easy Steps


Correlation coefficients are used to measure how strong a relationship is between two variables. There are several types of correlation coefficient, but the most popular is Pearson's. Pearson's correlation (also called Pearson's R) is a correlation coefficient commonly used in linear regression. If you're starting out in statistics, you'll probably learn about Pearson's R first. In fact, when anyone refers to the correlation coefficient, they are usually talking about Pearson's.

Correlation coefficient formulas are used to find how strong a relationship is between data. The formulas return a value between -1 and 1, where:

• 1 indicates a strong positive relationship.
• -1 indicates a strong negative relationship.
• A result of zero indicates no relationship at all.

## Meaning

• A correlation coefficient of 1 means that for every positive increase in one variable, there is a positive increase of a fixed proportion in the other. For example, shoe sizes go up in (almost) perfect correlation with foot length.
• A correlation coefficient of -1 means that for every positive increase in one variable, there is a decrease of a fixed proportion in the other. For example, the amount of gas in a tank decreases in (almost) perfect correlation with speed.
• Zero means that for every increase, there isn’t a positive or negative increase. The two just aren’t related.

The absolute value of the correlation coefficient gives us the relationship strength. The larger the number, the stronger the relationship. For example, |-.75| = .75, which indicates a stronger relationship than .65.
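As a small illustration of comparing strength by absolute value, here is a minimal Python sketch. The list of coefficients is hypothetical and chosen only for the example.

```python
# Hypothetical correlation values, chosen only for illustration.
coefficients = [-0.75, 0.65, 0.10, -0.30]

# Strength depends on the magnitude, not the sign, so rank by absolute value.
ranked = sorted(coefficients, key=abs, reverse=True)
print(ranked)  # [-0.75, 0.65, -0.3, 0.1]
```

The negative value -0.75 comes first because its magnitude is the largest, even though it is the smallest number in the list.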


## Types of correlation coefficient formulas.

There are several types of correlation coefficient formulas.
One of the most commonly used formulas is Pearson's correlation coefficient formula. If you're taking a basic stats class, this is the one you'll probably use:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

Two other formulas are commonly used: the sample correlation coefficient and the population correlation coefficient.

## Sample correlation coefficient

$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$

$s_x$ and $s_y$ are the sample standard deviations, and $s_{xy}$ is the sample covariance.
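A minimal Python sketch of the sample version, assuming the usual definition r = s_xy / (s_x · s_y) with n − 1 in each denominator. The function name `sample_correlation` is made up for this example, and the data are the age and glucose values used in the worked example later in this article.

```python
import math

def sample_correlation(x, y):
    """Return r = s_xy / (s_x * s_y), using n - 1 in every denominator."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Sample standard deviations
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)

ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]
r_sample = sample_correlation(ages, glucose)
print(round(r_sample, 4))
```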

## Population correlation coefficient

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

The population correlation coefficient uses σx and σy as the population standard deviations, and σxy as the population covariance.
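The population version can be sketched the same way, assuming the definition ρ = σxy / (σx · σy) with N in each denominator. The function name and the five data points are hypothetical; the data are treated as a complete population purely for illustration.

```python
import statistics

def population_correlation(x, y):
    """Return rho = sigma_xy / (sigma_x * sigma_y), dividing by N throughout."""
    n = len(x)
    mu_x = statistics.fmean(x)
    mu_y = statistics.fmean(y)
    # Population covariance
    sigma_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Population standard deviations
    sigma_x = statistics.pstdev(x)
    sigma_y = statistics.pstdev(y)
    return sigma_xy / (sigma_x * sigma_y)

# Hypothetical data, treated here as an entire population.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
rho = population_correlation(x, y)
print(round(rho, 4))
```

Because the divisor cancels in the ratio, the same data would give the same number under the sample formula; the two versions differ in what the result is taken to describe, not in the arithmetic.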
Correlation between sets of data is a measure of how well they are related. The most common measure of correlation in stats is the Pearson Correlation. The full name is the Pearson Product Moment Correlation (PPMC). It shows the linear relationship between two sets of data. In simple terms, it answers the question, Can I draw a line graph to represent the data? Two letters are used to represent the Pearson correlation: Greek letter rho (ρ) for a population and the letter "r" for a sample.

## Potential problems with Pearson correlation.

The PPMC is not able to tell the difference between dependent variables and independent variables. For example, if you are trying to find the correlation between a high calorie diet and diabetes, you might find a high correlation of .8. However, you could also get the same result with the variables switched around. In other words, you could say that diabetes causes a high calorie diet. That obviously makes no sense. Therefore, as a researcher you have to be aware of the data you are plugging in. In addition, the PPMC will not give you any information about the slope of the line; it only tells you whether there is a relationship.
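Both limitations can be checked numerically. The sketch below uses a hypothetical helper `pearson_r` and made-up data: swapping the variables leaves r unchanged, and so does stretching y by a factor of 10, even though the slope of a fitted line would change.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_xy = pearson_r(x, y)  # x treated as the "independent" variable
r_yx = pearson_r(y, x)  # roles switched: same value

# Multiply y by 10: a fitted line becomes ten times steeper, r does not move.
y_steep = [10 * v for v in y]
r_steep = pearson_r(x, y_steep)

print(round(r_xy, 4), round(r_yx, 4), round(r_steep, 4))
```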
## Real Life Example

Pearson correlation is used in thousands of real life situations. For example, scientists in China wanted to know if there was a relationship between how weedy rice populations are different genetically. The goal was to find out the evolutionary potential of the rice. Pearson's correlation between the two groups was analyzed. It showed a positive Pearson Product Moment correlation of between 0.783 and 0.895 for weedy rice populations. This figure is quite high, which suggested a fairly strong relationship.

If you're interested in seeing more examples of PPMC, you can find several studies on the National Institute of Health's Openi website, which shows results on studies as varied as breast cyst imaging to the role that carbohydrates play in weight loss.

## How to Find Pearson's Correlation Coefficient (by Hand)

Example question: Find the value of the correlation coefficient from the following table:

| Subject | Age x | Glucose Level y |
|---|---|---|
| 1 | 43 | 99 |
| 2 | 21 | 65 |
| 3 | 25 | 79 |
| 4 | 42 | 75 |
| 5 | 57 | 87 |
| 6 | 59 | 81 |

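Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².

| Subject | Age x | Glucose Level y | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 43 | 99 | | | |
| 2 | 21 | 65 | | | |
| 3 | 25 | 79 | | | |
| 4 | 42 | 75 | | | |
| 5 | 57 | 87 | | | |
| 6 | 59 | 81 | | | |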

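Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be 43 × 99 = 4,257.

| Subject | Age x | Glucose Level y | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 43 | 99 | 4257 | | |
| 2 | 21 | 65 | 1365 | | |
| 3 | 25 | 79 | 1975 | | |
| 4 | 42 | 75 | 3150 | | |
| 5 | 57 | 87 | 4959 | | |
| 6 | 59 | 81 | 4779 | | |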

Step 3: Take the square of the numbers in the x column, and put the result in the x² column.

| Subject | Age x | Glucose Level y | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 43 | 99 | 4257 | 1849 | |
| 2 | 21 | 65 | 1365 | 441 | |
| 3 | 25 | 79 | 1975 | 625 | |
| 4 | 42 | 75 | 3150 | 1764 | |
| 5 | 57 | 87 | 4959 | 3249 | |
| 6 | 59 | 81 | 4779 | 3481 | |

Step 4: Take the square of the numbers in the y column, and put the result in the y² column.

| Subject | Age x | Glucose Level y | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 43 | 99 | 4257 | 1849 | 9801 |
| 2 | 21 | 65 | 1365 | 441 | 4225 |
| 3 | 25 | 79 | 1975 | 625 | 6241 |
| 4 | 42 | 75 | 3150 | 1764 | 5625 |
| 5 | 57 | 87 | 4959 | 3249 | 7569 |
| 6 | 59 | 81 | 4779 | 3481 | 6561 |

Step 5: Add up all of the numbers in each column and put the result at the bottom of the column. The Greek letter sigma (Σ) is a short way of saying "sum of" or summation.

| Subject | Age x | Glucose Level y | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 43 | 99 | 4257 | 1849 | 9801 |
| 2 | 21 | 65 | 1365 | 441 | 4225 |
| 3 | 25 | 79 | 1975 | 625 | 6241 |
| 4 | 42 | 75 | 3150 | 1764 | 5625 |
| 5 | 57 | 87 | 4959 | 3249 | 7569 |
| 6 | 59 | 81 | 4779 | 3481 | 6561 |
| Σ | 247 | 486 | 20485 | 11409 | 40022 |
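Steps 1 through 5 can be sketched in a few lines of Python. This is only an illustration of the table-building arithmetic; the variable names are chosen for the example.

```python
ages = [43, 21, 25, 42, 57, 59]       # x column
glucose = [99, 65, 79, 75, 87, 81]    # y column

# Steps 2-4: fill the xy, x squared, and y squared columns row by row.
xy = [x * y for x, y in zip(ages, glucose)]
x_sq = [x ** 2 for x in ages]
y_sq = [y ** 2 for y in glucose]

# Step 5: total each column (the sigma row).
totals = {
    "x": sum(ages),
    "y": sum(glucose),
    "xy": sum(xy),
    "x2": sum(x_sq),
    "y2": sum(y_sq),
}
print(totals)
# {'x': 247, 'y': 486, 'xy': 20485, 'x2': 11409, 'y2': 40022}
```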

Step 6: Use the following correlation coefficient formula.

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

The answer is: 2868 / 5413.27 = 0.529809

From our table:

• Σx = 247
• Σy = 486
• Σxy = 20,485
• Σx² = 11,409
• Σy² = 40,022
• n is the sample size, in our case = 6

The correlation coefficient =

$$r = \frac{6(20{,}485) - (247 \times 486)}{\sqrt{\left[6(11{,}409) - 247^2\right] \times \left[6(40{,}022) - 486^2\right]}} = 0.5298$$

The range of the correlation coefficient is from -1 to 1. Our result is 0.5298 or 52.98%, which means the variables have a moderate positive correlation.
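The Step 6 arithmetic can be double-checked with a short Python sketch that plugs the column totals into the formula. Variable names are chosen for this example only.

```python
import math

n = 6
sum_x, sum_y = 247, 486
sum_xy, sum_x2, sum_y2 = 20485, 11409, 40022

# Numerator: n * sum(xy) - sum(x) * sum(y)
numerator = n * sum_xy - sum_x * sum_y

# Denominator: square root of the product of the two bracketed terms
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

r = numerator / denominator
print(numerator, round(denominator, 2), round(r, 4))  # 2868 5413.27 0.5298
```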


If you're taking AP Statistics, you won't actually have to work the correlation formula by hand. You'll use your graphing calculator. Here's how to find r on a TI83.

Step 1: Type your data into a list and make a scatter plot to ensure your variables are roughly correlated. In other words, look for a straight line. Not sure how to do this? See: TI 83 Scatter plot.

Step 2: Press the STAT button.

Step 3: Scroll right to the CALC menu.

Step 4: Scroll down to 4:LinReg(ax+b), then press ENTER. The output will show "r" at the very bottom of the list.

Tip: If you don't see r, turn Diagnostic ON, then perform the steps again.
## Find the Correlation Coefficient in Excel

Step 1: Type your data into two columns in Excel. For example, type your "x" data into column A and your "y" data into column B.

Step 2: Select any empty cell.

Step 3: Click the function button on the ribbon.

Step 4: Type "correlation" into the 'Search for a function' box.

Step 5: Click "Go." CORREL will be highlighted.

Step 6: Click "OK."

Step 7: Type the location of your data into the "Array 1" and "Array 2" boxes. For this example, type "A2:A10" into the Array 1 box and then type "B2:B10" into the Array 2 box.

Step 8: Click "OK." The result will appear in the cell you selected in Step 2. For this particular data set, the correlation coefficient (r) is -0.1316.

Caution: The results for this test can be misleading unless you have made a scatter plot first to ensure your data roughly fits a straight line. The correlation coefficient in Excel 2007 will always return a value, even if your data is something other than linear (i.e. the data fits an exponential model).

That's it!
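To see why the caution about non-linear data matters, here is a minimal Python sketch. It uses a hand-written `pearson_r` function as a stand-in for a spreadsheet correlation function, and the data are hypothetical: y doubles at every step, so the relationship is perfectly predictable but not linear.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical exponential data: y = 2 ** x for x = 0..9.
x = list(range(10))
y_exponential = [2 ** v for v in x]

# A value still comes back, but it understates how tightly y depends on x.
r_raw = pearson_r(x, y_exponential)

# After a log transform the relationship is a straight line.
r_log = pearson_r(x, [math.log2(v) for v in y_exponential])

print(round(r_raw, 2), round(r_log, 2))
```

The raw coefficient comes out noticeably below 1 even though y is completely determined by x, which is the kind of misleading result a scatter plot would flag.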
## Pearson's Correlation Coefficient in SPSS

Step 1: Click "Analyze," then click "Correlate," then click "Bivariate." The Bivariate Correlations window will appear.

Step 2: Click one of the variables in the left window of the Bivariate Correlations pop-up window. Then click the center arrow to move the variable to the "Variables:" window. Repeat this for a second variable.

Step 3: Click the "Pearson" check box if it isn't already checked. Then click either a "one-tailed" or "two-tailed" test radio button. If you aren't sure if your test is one-tailed or two-tailed, see: Is it a one-tailed test or two-tailed test?

Step 4: Click "OK" and read the results. Each box in the output gives you a correlation between two variables. For example, the PPMC for Number of older siblings and GPA is -.098, which means practically no correlation. You can find this information in two places in the output. Why? This cross-referencing of columns and rows is very useful when you are comparing PPMCs for dozens of variables.

Tip #1: It's always a good idea to make an SPSS scatter plot of your data set before you perform this test. That's because SPSS will always give you some kind of answer and will assume that the data is linearly related. If you have data that might be better suited to another correlation coefficient (for example, exponentially related data) then SPSS will still run Pearson's for you and you might get misleading results.

Tip #2: Click on the "Options" button in the Bivariate Correlations window if you want to include descriptive statistics like the mean and standard deviation.
## How to Find Pearson's Correlation Coefficient in Minitab

The Minitab correlation coefficient will return a value for r from -1 to 1.

Example question: Find the Minitab correlation coefficient based on age vs. glucose level from the following table from a pre-diabetic study of 6 participants:

| Subject | Age x | Glucose Level y |
|---|---|---|
| 1 | 43 | 99 |
| 2 | 21 | 65 |
| 3 | 25 | 79 |
| 4 | 42 | 75 |
| 5 | 57 | 87 |
| 6 | 59 | 81 |

Step 1: Type your data into a Minitab worksheet. I entered this sample data into three columns.

Step 2: Click "Stat", then click "Basic Statistics" and then click "Correlation."

Step 3: Click a variable name in the left window and then click the "Select" button to move the variable name to the Variable box. For this sample question, click "Age," then click "Select," then click "Glucose Level" then click "Select" to transfer both variables to the Variable window.

Step 4: (Optional) Check the "P-Value" box if you want to display a P-Value for r.

Step 5: Click "OK". The Minitab correlation coefficient will be displayed in the Session Window. If you don't see the results, click "Window" and then click "Tile." The Session window should appear.

For this dataset:

• Value of r: 0.530
• P-Value: 0.280
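The two numbers above can be cross-checked outside Minitab. The sketch below is an assumption about how such a P-value is typically obtained, not a description of Minitab's internals: it converts r to a t statistic with n − 2 degrees of freedom and integrates the Student's t density numerically. The helper names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=10_000):
    """Two-tailed p-value using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 < T < |t|)
    return 1 - 2 * area

ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

r = pearson_r(ages, glucose)
df = len(ages) - 2
t_stat = r * math.sqrt(df / (1 - r ** 2))   # test statistic for H0: no correlation
p_value = two_tailed_p(t_stat, df)
print(f"r = {r:.3f}, p = {p_value:.3f}")
```

Under these assumptions the result should agree with the r and P-Value listed above up to rounding.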

That's it!
Tip: Give your columns meaningful names (in the first row of the column, right under C1, C2 and so forth). That way, when it comes to choosing variable names in Step 3, you'll easily see what it is you are trying to choose. This becomes particularly important when you have dozens of columns of variables in a data sheet!

Pearson's Correlation Coefficient is a linear correlation coefficient that returns a value of between -1 and +1. A -1 means there is a strong negative correlation and +1 means that there is a strong positive correlation. A 0 means that there is no correlation (this is also called zero correlation).

This can initially be a little hard to wrap your head around (who likes to deal with negative numbers?). The Political Science Department at Quinnipiac University posted this useful list of the meaning of Pearson's Correlation coefficients. They note that these are "crude estimates" for interpreting strengths of correlations using Pearson's correlation:

| r value | Interpretation |
|---|---|
| +.70 or higher | Very strong positive relationship |
| +.40 to +.69 | Strong positive relationship |
| +.30 to +.39 | Moderate positive relationship |
| +.20 to +.29 | Weak positive relationship |
| +.01 to +.19 | No or negligible relationship |
| 0 | No relationship [zero correlation] |
| -.01 to -.19 | No or negligible relationship |
| -.20 to -.29 | Weak negative relationship |
| -.30 to -.39 | Moderate negative relationship |
| -.40 to -.69 | Strong negative relationship |
| -.70 or lower | Very strong negative relationship |
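The guide above can be turned into a small lookup. This is a sketch only: the function name `describe_strength` is hypothetical, and values that fall between the listed bands (for example .195) are assigned to the lower band, which is an assumption the source does not spell out.

```python
def describe_strength(r):
    """Label r using the rough Quinnipiac guide quoted above."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "No relationship [zero correlation]"
    size = abs(r)
    direction = "positive" if r > 0 else "negative"
    if size >= 0.70:
        return f"Very strong {direction} relationship"
    if size >= 0.40:
        return f"Strong {direction} relationship"
    if size >= 0.30:
        return f"Moderate {direction} relationship"
    if size >= 0.20:
        return f"Weak {direction} relationship"
    return "No or negligible relationship"

for value in (0.5298, -0.098, -0.75):
    print(value, "->", describe_strength(value))
```

Cut-offs like these are rough and vary by source, so the label for a given r is a guide rather than a verdict.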

It may be helpful to see graphically what these correlations look like:
The images show that a strong negative correlation means that the graph has a downward slope from left to right: as the x-values increase, the y-values get smaller. A strong positive correlation means that the graph has an upward slope from left to right: as the x-values increase, the y-values get larger.

Cramer's V Correlation is similar to the Pearson Correlation coefficient. While the Pearson correlation is used to test the strength of linear relationships, Cramer's V is used to calculate correlation in tables with more than 2 x 2 columns and rows. Cramer's V correlation varies between 0 and 1. A value close to 0 means that there is very little association between the variables. A Cramer's V of close to 1 indicates a very strong association.

| Cramer's V | Interpretation |
|---|---|
| .25 or higher | Very strong relationship |
| .15 to .25 | Strong relationship |
| .11 to .15 | Moderate relationship |
| .06 to .10 | Weak relationship |
| .01 to .05 | No or negligible relationship |
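The article does not give the formula for Cramer's V. A commonly used definition, assumed here, is V = √(χ² / (n · min(rows − 1, columns − 1))), where χ² is the chi-square statistic of the contingency table. The sketch below implements that definition; the function name and both tables are hypothetical, and it assumes no row or column total is zero.

```python
import math

def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Chi-square statistic: sum of (observed - expected)^2 / expected
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical 3 x 3 table where each row maps to exactly one column.
perfect = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
# Hypothetical 2 x 3 table where the rows have identical proportions.
independent = [[10, 20, 30], [20, 40, 60]]

v_perfect = cramers_v(perfect)
v_independent = cramers_v(independent)
print(round(v_perfect, 3), round(v_independent, 3))
```

The first table sits at the top of the 0 to 1 range and the second at the bottom, matching the interpretation described above.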

A correlation coefficient gives you an idea of how well data fits a line or curve. Pearson wasn't the original inventor of the term correlation but his use of it became one of the most popular ways to measure correlation.
Francis Galton (who was also involved with the development of the interquartile range) was the first person to measure correlation, originally termed "co-relation," which actually makes sense considering you're studying the relationship between a couple of different variables. In Co-Relations and Their Measurement, he said

"The statures of kinsmen are co-related variables; thus, the stature of the father is correlated to that of the adult son, ..and so on; but the index of co-relation … is different in the different cases."

It's worth noting though that Galton mentioned in his paper that he had borrowed the term from biology, where "Co-relation and correlation of structure" was being used but until the time of his paper it hadn't been properly defined.
In 1892, British statistician Francis Ysidro Edgeworth published a paper called "Correlated Averages," Philosophical Magazine, 5th Series, 34, 190-204 where he used the term "Coefficient of Correlation." It wasn't until 1896 that British mathematician Karl Pearson used "Coefficient of Correlation" in two papers: Contributions to the Mathematical Theory of Evolution and Mathematical Contributions to the Theory of Evolution. III. Regression, Heredity and Panmixia. It was the second paper that introduced the Pearson product-moment correlation formula for estimating correlation.

If you can read a table, you can test for correlation coefficient. Note that correlations should only be calculated for an entire range of data. If you restrict the range, r will be weakened.

Sample problem: test the significance of the correlation coefficient r = 0.565 using the critical values for PPMC table. Test at α = 0.01 for a sample size of 9.

Step 1: Subtract two from the sample size to get df, degrees of freedom.
9 – 2 = 7

Step 2: Look the values up in the PPMC Table. With df = 7 and α = 0.01, the table value is = 0.798

Step 3: Draw a graph, so you can more easily see the relationship.

r = 0.565 does not fall into the rejection region (above 0.798), so there isn't enough evidence to state that a strong linear relationship exists in the data.
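The decision rule in this sample problem can be written out as a short Python sketch. The critical value 0.798 is taken directly from the table lookup above rather than computed; the variable names are chosen for the example.

```python
n = 9            # sample size
r = 0.565        # observed correlation coefficient
alpha = 0.01     # significance level used for the table lookup

# Step 1: degrees of freedom
df = n - 2

# Step 2: critical value from the PPMC table for df = 7, alpha = 0.01
critical_value = 0.798

# Step 3: reject "no correlation" only if |r| exceeds the critical value
significant = abs(r) > critical_value
print(df, significant)  # 7 False
```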
It's rare to use trigonometry in statistics (you'll never need to find the derivative of tan(x) for example!), but the relationship between correlation and cosine is an exception. Correlation can be expressed in terms of angles:

• Positive correlation = acute angle (less than 90°),
• Negative correlation = obtuse angle (greater than 90°),
• Uncorrelated = orthogonal (right angle).

More specifically, correlation is the cosine of an angle between two vectors defined as follows (Knill, 2011):

If X, Y are two random variables with zero mean, then the covariance Cov[XY] = E[X · Y] is the dot product of X and Y. The standard deviation of X is the length of X.
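The cosine view can be checked numerically. In this sketch the age and glucose data from the worked example are centered (so each has zero mean) and treated as vectors; the cosine of the angle between them should then match Pearson's r. The helper names are hypothetical.

```python
import math

def center(values):
    """Subtract the mean so the vector has zero mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def cosine_of_angle(u, v):
    """Cosine of the angle between two vectors: dot product over lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    length_u = math.sqrt(sum(a * a for a in u))
    length_v = math.sqrt(sum(b * b for b in v))
    return dot / (length_u * length_v)

ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

cos_theta = cosine_of_angle(center(ages), center(glucose))
angle_degrees = math.degrees(math.acos(cos_theta))
print(round(cos_theta, 4), round(angle_degrees, 1))
```

The cosine agrees with the r found by hand, and the angle is acute, which is what a positive correlation corresponds to.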

## References

Acton, F. S. Analysis of Straight-Line Data. New York: Dover, 1966.
Edwards, A. L. "The Correlation Coefficient." Ch. 4 in An Introduction to Linear Regression and Correlation. San Francisco, CA: W. H. Freeman, pp. 33-46, 1976.
Gonick, L. and Smith, W. "Regression." Ch. 11 in The Cartoon Guide to Statistics. New York: Harper Perennial, pp. 187-210, 1993.
Knill, O. (2011). Lecture 12: Correlation. Retrieved April 16, 2021 from: http://people.math.harvard.edu/~knill/teaching/math19b_2011/handouts/lecture12.pdf