Hi Everyone,
Recently I interviewed for a data science opening, and the interviewer asked which optimisation algorithm is used for logistic regression. I answered gradient descent and explained how it works.
He said that nobody uses gradient descent these days. Is that true, and if so, what are the alternatives?
What are the disadvantages of gradient descent that would explain why it is no longer used?
Thanks for your help in advance.
Here is a great alternative: swarm optimization, see https://www.datasciencecentral.com/profiles/blogs/swarm-optimizatio...
The animated picture below shows how it works:
Genetic Algorithms may also be of interest. In general, the problem domain is that of Global Optimization.
I think the interviewer was looking for some variation of gradient descent because, as far as I know, gradient descent is still used heavily, just with some tweaks. Check this: momentum-rmsprop-and-adam. It will give you the idea.
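To make the "tweaks" concrete, here is a minimal sketch of Adam (which combines a momentum term with RMSprop-style scaling) applied to logistic regression. The synthetic data, the function names, and the hyperparameter values are my own assumptions for illustration, not anything taken from the linked article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, X, y):
    # Mean negative log-likelihood of the logistic model
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def logistic_grad(w, X, y):
    # Gradient of the mean negative log-likelihood
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def adam(X, y, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    w = np.zeros(X.shape[1])
    m = np.zeros_like(w)  # running mean of gradients (momentum part)
    v = np.zeros_like(w)  # running mean of squared gradients (RMSprop part)
    for t in range(1, steps + 1):
        g = logistic_grad(w, X, y)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)  # bias correction for the zero start
        v_hat = v / (1 - beta2**t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Synthetic data (assumed for illustration): intercept plus two features
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
true_w = np.array([-0.5, 2.0, -1.0])
y = (rng.random(500) < sigmoid(X @ true_w)).astype(float)

w_adam = adam(X, y)
print("Adam estimate:", w_adam)
```

Setting beta1 to 0 leaves only the RMSprop-style scaling, so this one loop covers most of the family.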
I think the interviewer is confused ... Stochastic gradient descent (SGD) is heavily used, and it's the "de facto" method for learning.
Of course there are very good alternatives like swarm optimization, but I would not say that nobody uses SGD nowadays ...
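For reference, this is roughly what mini-batch SGD looks like for logistic regression. It is only a sketch: the data is synthetic, and the learning rate, batch size, and number of epochs are assumed values, not tuned recommendations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, X, y):
    # Mean negative log-likelihood of the logistic model
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def sgd_logistic(X, y, lr=0.1, epochs=50, batch_size=32, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the rows every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # Gradient estimated from the current mini-batch only
            grad = X[idx].T @ (sigmoid(X[idx] @ w) - y[idx]) / len(idx)
            w -= lr * grad
    return w

# Synthetic data (assumed for illustration)
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
true_w = np.array([-0.5, 2.0, -1.0])
y = (rng.random(500) < sigmoid(X @ true_w)).astype(float)

w_sgd = sgd_logistic(X, y)
print("SGD estimate:", w_sgd)
```

Each update touches only batch_size rows, which is why the method scales to data sets that do not fit in memory.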
Thanks, Vincent. I will try to implement it and use it in place of gradient descent.
I think so too. I will try to compare stochastic gradient descent and swarm optimization. Thanks.
I don't fully agree with the interviewer. Adam, AdaGrad, and SGD are some of the popular flavors of gradient descent used to solve real-world problems with regression techniques of any kind.
Maybe there is a specific problem he had in mind....
That is a very nice article. Thanks for sharing.
There are many optimization methods out there. Gradient Descent (GD) is just one technique (among SO many others) for finding minima of functions, and it is used when you don't have other [better] options. For Generalized Linear Models (GLMs), which include logistic regression, go with traditional methods; e.g., Newton-Raphson is a classic workhorse for a great many situations. I would also invest time in learning about SVD (!!!), QR factorization, etc.
Please review: http://www.stat.cmu.edu/~cshalizi/350/lectures/26/lecture-26.pdf which goes into Newton in some reasonable detail.
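As a rough illustration of Newton-Raphson on logistic regression (equivalent to iteratively reweighted least squares), here is a minimal sketch. It is not taken from the lecture notes; the synthetic data, the function name, and the stopping rule are my own assumptions, and it has no safeguards such as step halving or handling of perfectly separable data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, max_iter=25, tol=1e-10):
    w = np.zeros(X.shape[1])
    for i in range(max_iter):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y)                    # gradient of the negative log-likelihood
        H = X.T @ (X * (p * (1 - p))[:, None])  # Hessian: X^T diag(p(1-p)) X
        step = np.linalg.solve(H, grad)         # solve H * step = grad, no explicit inverse
        w -= step
        if np.max(np.abs(step)) < tol:
            break
    return w, i + 1

# Synthetic data (assumed for illustration)
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
true_w = np.array([-0.5, 2.0, -1.0])
y = (rng.random(500) < sigmoid(X @ true_w)).astype(float)

w_newton, n_iter = newton_logistic(X, y)
print("Newton estimate:", w_newton, "after", n_iter, "iterations")
```

On small, well-behaved problems like this one, Newton typically needs only a handful of iterations, whereas plain gradient descent needs hundreds or thousands of much cheaper steps. The cost is forming and solving a d-by-d system on every iteration.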
Pro-Tip: No one should be using gradient descent for simple linear least squares problems, except (maybe) to illustrate how gradient descent itself works so that it can be applied to 'highly' nonlinear scenarios later on. For most people, I wouldn't recommend jumping to swarm, ... or gradient descent, ... or stochastic anything ... until they have learned the basics.
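To show what "the basics" look like for linear least squares, here is a small sketch that solves the same problem directly, once with a QR factorization and once with NumPy's SVD-based np.linalg.lstsq. The data is synthetic and the variable names are my own.

```python
import numpy as np

# Synthetic data (assumed for illustration): intercept plus three features
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 3))])
beta_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ beta_true + 0.1 * rng.normal(size=200)

# QR route: X = QR, so the least squares solution satisfies R beta = Q^T y.
# np.linalg.solve is a general solver; a dedicated triangular solver
# would exploit the structure of R, but this keeps the sketch NumPy-only.
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

# SVD route: np.linalg.lstsq uses an SVD-based LAPACK driver
beta_svd, *_ = np.linalg.lstsq(X, y, rcond=None)

print("QR: ", beta_qr)
print("SVD:", beta_svd)
```

Both routes give the answer in one shot, with no learning rate to tune and no iteration count to choose.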
For data scientists in training, I would recommend some courses in statistical/scientific computing, starting with the Golub and Ortega classic: https://www.amazon.com/Scientific-Computing-Differential-Equations-...
I would clarify here:
1. SVD is NOT SGD; i.e., I really did mean to write SVD when I wrote SVD.
2. SGD scales well and is therefore also very useful. But it is not what you should learn first, IMO.
© 2019 Data Science Central