Correlation is a measure of linear association between two variables *X* and *Y*, while linear regression is a technique to make predictions, using the following model:

$$Y = a_0 + a_1 X_1 + \dots + a_k X_k + \text{Error}$$

Here $Y$ is the response (what we want to predict, for instance revenue), while the $X_i$'s are the predictors (say gender, coded as 0 = male and 1 = female; education level; age; and so on). The predictors are sometimes called independent variables, or features in machine learning.

Typically, the predictors are somewhat correlated with the response. In regression, we choose the coefficients $a_0, \dots, a_k$, called the model parameters, to maximize the absolute value of the correlation between the observed response and the linear combination of the predictors. Correlation alone determines that linear combination only up to scale and shift; least squares, which minimizes the sum of squared errors, fixes both and, when the intercept is included, attains the maximum. The square of the resulting correlation coefficient is called the R-squared coefficient. The parameter $a_0$ (sometimes set to zero) is called the intercept.
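The link between regression and correlation can be checked numerically. The sketch below is only an illustration under stated assumptions: the data are synthetic, the coefficient values and variable names (`gender`, `education`, `age`, `revenue`) are made up for the example, and the fit uses ordinary least squares via NumPy. It computes R-squared twice, once as the squared correlation between observed and fitted responses, and once from the residuals.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
n = 500

# Hypothetical predictors: gender (0 = male, 1 = female), education level, age.
gender = rng.integers(0, 2, size=n)
education = rng.integers(1, 6, size=n)
age = rng.uniform(20, 65, size=n)

# Synthetic response drawn from the linear model plus random error.
# The coefficients below are invented for illustration only.
revenue = 20.0 + 3.0 * gender + 5.0 * education + 0.4 * age + rng.normal(0, 4, size=n)

# Design matrix with a leading column of ones for the intercept a_0.
X = np.column_stack([np.ones(n), gender, education, age])

# Least-squares estimates of a_0, ..., a_k.
coeffs, *_ = np.linalg.lstsq(X, revenue, rcond=None)
fitted = X @ coeffs

# R-squared as the squared correlation between observed and fitted responses.
r = np.corrcoef(revenue, fitted)[0, 1]
r2_from_corr = r ** 2

# R-squared from the residuals: 1 - SS_res / SS_tot.
ss_res = np.sum((revenue - fitted) ** 2)
ss_tot = np.sum((revenue - revenue.mean()) ** 2)
r2_from_residuals = 1.0 - ss_res / ss_tot

print("coefficients:", np.round(coeffs, 3))
print("R-squared (squared correlation):", round(r2_from_corr, 6))
print("R-squared (1 - SS_res / SS_tot):", round(r2_from_residuals, 6))
```

With an intercept in the model, the two R-squared values should agree up to floating-point error. Replacing `coeffs` with any other coefficient vector should not produce a larger absolute correlation with `revenue`.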

There are various types of correlation coefficients, as well as various types of regression. For more details, see