Machine Learning with Python - Linear Regression Model

I am pursuing a course in Data Science with Python. When I tried to implement a Linear Regression model to predict a new outcome, the prediction changed every time I re-ran the "cross_validation.train_test_split()" function. I also noticed that whenever I run this cell, not only does the outcome change, but the intercept, the coefficient values and the Mean Squared Error also keep changing as the training and testing data sets change.

My questions:

1) What does Mean Squared Error signify in Linear Regression? If it measures the error in the predicted outcome, how can I optimise my model so that the MSE is as low as possible?

2) Does the "cross_validation.train_test_split()" function split the data at random into training and testing data sets?

Replies to This Discussion

Wikipedia gives quite a good answer to question 1):

Also in regression analysis, "mean squared error", often referred to as mean squared prediction error or "out-of-sample mean squared error", can refer to the mean value of the squared deviations of the predictions from the true values, over an out-of-sample test space, generated by a model estimated over a particular sample space. This also is a known, computed quantity, and it varies by sample and by out-of-sample test space.

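An obvious approach is to minimize your prediction error, for instance the MSE between the real-world data Y and the predicted data Y'. The MSE is the squared L2 distance between Y and Y' divided by the number of observations, so one way to do this is to minimize that distance by changing the parameters that determine Y' (in linear regression, the intercept and the coefficients).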
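
To make the definition concrete, here is a minimal sketch of how the MSE is computed from true and predicted values. The function name and the numbers are made up for illustration; they are not taken from the original question.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values, for illustration only.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

# Squared errors are 1, 0 and 4, so the MSE is their average.
mse = mean_squared_error(y_true, y_pred)
print(mse)
```

A lower value means the predictions are closer to the true values on that particular data set, which is why the number changes whenever the test set changes.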

2) I do not know this specific (Python) function. However, cross-validation is about the following process:

  1. Split your sample randomly into train and test data
  2. Fit the model to the train set
  3. Test the model on the test set
  4. Calculate the prediction error (e.g. using the MSE)
  5. Repeat the process n times

The heuristic behind this is to average out the "noise" (of the model) and thus to get a more robust estimate of how well the model predicts.
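
The five steps above can be sketched roughly as follows. This is only an illustration using NumPy on synthetic data, not the code from the original question: the function name `repeated_split_mse`, its parameters and the data are all assumptions. For reference, scikit-learn's `train_test_split` (now in `sklearn.model_selection`; `sklearn.cross_validation` was the older module name) shuffles the data by default, which is why the results change between runs, and it accepts a `random_state` argument to make the split reproducible.

```python
import numpy as np


def repeated_split_mse(X, y, n_repeats=10, test_fraction=0.25, seed=0):
    """Return the test MSE of a least-squares line fit for each random split."""
    rng = np.random.default_rng(seed)  # fixed seed gives reproducible splits
    n_samples = len(y)
    n_test = max(1, int(round(test_fraction * n_samples)))
    # Add a column of ones so the fit includes an intercept.
    X_design = np.column_stack([np.ones(n_samples), X])
    scores = []
    for _ in range(n_repeats):  # step 5: repeat the process
        # Step 1: split the sample randomly into train and test indices.
        order = rng.permutation(n_samples)
        test_idx, train_idx = order[:n_test], order[n_test:]
        # Step 2: fit intercept and slope on the train set (least squares).
        coef, *_ = np.linalg.lstsq(X_design[train_idx], y[train_idx], rcond=None)
        # Step 3: predict on the held-out test set.
        y_pred = X_design[test_idx] @ coef
        # Step 4: prediction error on the test set.
        scores.append(np.mean((y[test_idx] - y_pred) ** 2))
    return np.array(scores)


# Synthetic data, for illustration only: y = 2x + 1 plus noise.
data_rng = np.random.default_rng(42)
X = data_rng.uniform(0, 10, size=100)
y = 2.0 * X + 1.0 + data_rng.normal(0, 1.0, size=100)

scores = repeated_split_mse(X, y)
print("MSE per split:", scores.round(3))
print("mean MSE:", scores.mean())
```

Each split gives a somewhat different MSE, just as observed in the question; the mean over the repeats is a steadier figure to report than the result of any single split.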
