
# Machine Learning with Python - Linear Regression Model

I am taking a course in Data Science with Python. When I tried to implement a linear regression model to predict a new outcome, the prediction changed every time I re-ran the "cross_validation.train_test_split()" function. I also noticed that whenever I run this cell, not only does the outcome change, but the intercept, the coefficients and the mean squared error also keep changing as my training and testing data sets change.

My questions:

1) What does the mean squared error (MSE) signify in linear regression? If it measures the error in the predicted outcome, how can I optimise my model so that the MSE is as low as possible?

2) Does the "cross_validation.train_test_split()" function split the data at random into training and testing data sets?


### Replies to This Discussion

1) Wikipedia gives a fairly good answer to question 1:

Also in regression analysis, "mean squared error", often referred to as mean squared prediction error or "out-of-sample mean squared error", can refer to the mean value of the squared deviations of the predictions from the true values, over an out-of-sample test space, generated by a model estimated over a particular sample space. This also is a known, computed quantity, and it varies by sample and by out-of-sample test space.
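Written out as a formula, the definition quoted above reads as follows. The symbols are my own notation, not part of the quote: $Y_i$ are the true values, $Y'_i$ the model's predictions, and $n$ the number of observations in the test set.

$$
\mathrm{MSE}(Y, Y') = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - Y'_i \right)^2
$$

A smaller value means the predictions are, on average, closer to the true values. Because the errors are squared, the MSE is expressed in squared units of the outcome and large errors are penalised more heavily than small ones.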

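An obvious approach is to minimize your prediction error, for instance the MSE between the real-world data Y and the predicted data Y'. The MSE is the squared L2 distance between Y and Y' divided by the number of observations, so minimizing it means choosing the parameters that determine Y' (for a linear model, the intercept and the coefficients) so that this distance is as small as possible.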
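As a rough illustration of what "minimizing the MSE" means for a linear model, here is a minimal sketch in plain Python for the one-feature case. The data is made up, and `fit_line` and `mse` are hypothetical helper names, not library functions. scikit-learn's `LinearRegression` fits by ordinary least squares, so it already picks the intercept and coefficients that minimize the MSE on the training data.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data, roughly y = 2x + 1 with some noise (an assumption for the demo)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1]

intercept, slope = fit_line(x, y)
fitted_mse = mse(y, [intercept + slope * xi for xi in x])

# Any other line gives a larger MSE on the same data, e.g. a steeper slope
shifted_mse = mse(y, [intercept + (slope + 0.5) * xi for xi in x])

print("intercept:", intercept, "slope:", slope)
print("MSE of fitted line:", fitted_mse)
print("MSE of perturbed line:", shifted_mse)
```

Since the fit already minimizes the training MSE, lowering the MSE on unseen data further usually comes down to better features or more data rather than tuning the fit itself.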

2) I do not know this specific Python function. In general, however, cross-validation follows this process:

1. Split your sample randomly into train and test data
2. Fit the model to the train set
3. Test the model on the test set
4. Calculate the prediction error (e.g. using the MSE)
5. Repeat the process n times

The idea behind this is to average out the "noise" that comes from any single random split, and thus to get a robust estimate of how well the model predicts.
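The five steps above can be sketched in plain Python as repeated random train/test splits. This is only an illustration under my own assumptions: the data is synthetic, and `fit_line`, `mse` and `repeated_random_splits` are hypothetical helpers rather than scikit-learn functions. On the original question: scikit-learn's `train_test_split` does shuffle the data at random by default, which is why the intercept, coefficients and MSE change on every run. Passing a fixed `random_state` (for example `random_state=0`) makes the split reproducible. In recent scikit-learn versions the function lives in `sklearn.model_selection`, since the `cross_validation` module has been removed.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(y_true, y_pred):
    """Mean of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def repeated_random_splits(x, y, n_repeats=5, test_fraction=0.3, seed=0):
    """Return the test-set MSE of each of n_repeats random train/test splits."""
    rng = random.Random(seed)  # fixed seed: the same splits on every run
    n_test = max(1, round(len(x) * test_fraction))
    scores = []
    for _ in range(n_repeats):
        # 1. Split the sample randomly into train and test data
        idx = list(range(len(x)))
        rng.shuffle(idx)
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        # 2. Fit the model to the train set
        intercept, slope = fit_line([x[i] for i in train_idx],
                                    [y[i] for i in train_idx])
        # 3. Test the model on the test set
        predictions = [intercept + slope * x[i] for i in test_idx]
        # 4. Calculate the prediction error
        scores.append(mse([y[i] for i in test_idx], predictions))
    return scores  # 5. one score per repetition


# Synthetic data (an assumption for the demo): y = 2x + 1 plus Gaussian noise
data_rng = random.Random(42)
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 + data_rng.gauss(0.0, 1.0) for xi in x]

scores = repeated_random_splits(x, y, n_repeats=5, seed=0)
print("MSE per split:", scores)
print("average MSE:", sum(scores) / len(scores))
```

The individual scores differ from split to split, which mirrors what the question describes. The average over all repetitions is the more stable number to report.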
