Difference Between Correlation and Regression in Statistics

Correlation is a measure of linear association between two variables X and Y, while linear regression is a technique to make predictions, using the following model:

Y = a0 + a1 X1 + ... + ak Xk + Error

Here Y is the response (what we want to predict, for instance revenue), while the Xi's are the predictors (say gender, with 0 = male, 1 = female, education level, age, etc.). The predictors are sometimes called independent variables, or features in machine learning.
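As a sketch of the model above, here is how the coefficients a0, ..., ak can be estimated by least squares on some hypothetical data (the variable names and simulated values below are illustrative, not from the article):

```python
import numpy as np

# Hypothetical data: predict revenue from gender (0/1) and age.
rng = np.random.default_rng(0)
n = 200
gender = rng.integers(0, 2, n)          # X1: 0 = male, 1 = female
age = rng.uniform(20, 60, n)            # X2
revenue = 10 + 5 * gender + 2 * age + rng.normal(0, 3, n)  # Y = a0 + a1*X1 + a2*X2 + Error

# Design matrix with a leading column of ones for the intercept a0.
X = np.column_stack([np.ones(n), gender, age])
coeffs, *_ = np.linalg.lstsq(X, revenue, rcond=None)
print(coeffs)  # estimates of (a0, a1, a2), close to (10, 5, 2)
```

With enough data, the estimated coefficients recover the values used to generate the response, up to noise.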

Typically, the predictors are somewhat correlated to the response. In regression, we want to maximize the absolute value of the correlation between the observed response and the linear combination of the predictors. We choose the parameters a0, ..., ak that accomplish this goal. The square of the correlation coefficient in question is called the R-squared coefficient. The coefficients a0, ..., ak are called the model parameters, and a0 (sometimes set to zero) is called the intercept.
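The link between the two concepts can be checked numerically: the R-squared of a least-squares fit equals the squared correlation between the observed response and the fitted values. A minimal sketch on simulated data (variable names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(0, 1, n)

# Least-squares fit with an intercept column.
X = np.column_stack([np.ones(n), x1, x2])
a, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ a

# R-squared computed as the fraction of variance explained...
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# ...matches the squared correlation between observed and fitted values.
r = np.corrcoef(y, y_hat)[0, 1]
print(r_squared, r ** 2)  # the two numbers agree
```

This is why maximizing that correlation and minimizing the sum of squared errors lead to the same fitted coefficients.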

There are various types of correlation coefficient as well as regression. For more details, see




Comment by Vincent Granville on September 18, 2020 at 9:41am

Thank you Omer, I fixed the mistake.

Comment by Omer Sayli on September 18, 2020 at 3:07am


In the first paragraph, there is a mistake. In regression analysis, the dependent variable is denoted "Y" and the independent variables are denoted by "X". But it is stated here that "The predictors are sometimes called dependent variables, or features in machine learning."

Comment by Sabastian Mukonza on July 12, 2020 at 6:33am

Is it wrong to extend the scope of correlation to non-linear associations, for example an exponential association between two variables x and y?

© 2021 TechTarget, Inc.