UPDATE: Mar 20, 2016 – Added my new follow-up course on Deep Learning, which covers ways to speed up and improve vanilla backpropagation: momentum and Nesterov momentum, adaptive learning rate algorithms like AdaGrad and RMSProp, utilizing the GPU on AWS EC2, and stochastic and batch gradient descent. We look at TensorFlow and Theano starting from the basics – variables, functions, expressions, and simple optimizations – from there, building a neural network seems simple!

https://www.udemy.com/data-science-deep-learning-in-theano-tensorflow

Deep learning is all the rage these days. What exactly is deep learning? Well, it all boils down to neural networks. Neural networks have been around for decades; they just weren't called "deep" networks back then.

Now we have all sorts of different flavors of neural networks – deep belief networks (DBNs), convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and more.

There are also a ton of different learning algorithms for them nowadays. It used to be just backpropagation, but now you’ve got contrastive divergence, dropout, DropConnect, and all sorts of other modifications to the vanilla gradient descent algorithm.

Where do you start?

I like to start from the basics since everything else builds on top of that.

Our first stop is **linear regression**.

Linear regression is a model that maps inputs (X) to outputs (Y) by some linear function, i.e. Y = WX + b.

Linear regression is a great learning tool before diving into deep networks because it teaches you some very important concepts, namely:

- How to create an objective function
- How to solve for the optimal W and b
- The same functional form (WX + b), in its simplest setting, that we’ll use all throughout deep learning
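
To make this concrete, here is a minimal sketch of solving for the optimal W and b with Numpy – my own illustrative example on synthetic 1-D data, not code from the course. Minimizing the squared error here has a closed-form least-squares solution:

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(100)
Y = 2.0 * X + 1.0 + 0.1 * np.random.randn(100)  # true W = 2, b = 1, plus noise

# Stack a column of ones so the bias b is learned as just another weight,
# then solve the least-squares problem min ||A[W, b]^T - Y||^2 directly.
A = np.column_stack([X, np.ones_like(X)])
W, b = np.linalg.lstsq(A, Y, rcond=None)[0]

print(W, b)  # close to the true values 2.0 and 1.0
```

The column of ones is a standard trick: it folds the bias into the weight vector so one solver handles both.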

You can learn more about linear regression by taking my in-depth online video course, Data Science: Linear Regression in Python:

https://www.udemy.com/data-science-linear-regression-in-python

It uses Python and Numpy to show you how you can implement linear regression on your own.

Our next stop is **logistic regression**.

The first difference you’ll see when studying logistic regression is that it does classification rather than regression. So instead of the output Y being any real number, now Y ∈ {1, …, K}, representing K different classes.

If Y ∈ {0, 1}, then we get a binary classifier.

Our model gets a little more complex, but not by much. Now, P(Y=1|X) = s(WX + b), where s() represents the sigmoid function.
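
In code, that prediction is one line on top of the linear model. A quick sketch – the weights here are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.array([1.5, -0.5])  # illustrative weights, not learned
b = 0.2
x = np.array([1.0, 2.0])

p = sigmoid(W.dot(x) + b)  # P(Y=1 | x), always strictly between 0 and 1
print(p)
```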

Logistic regression is a great next step, because now you can learn about:

- A different objective function – the cross-entropy error. You will see how minimizing it corresponds to maximizing the likelihood under a different distribution (the Bernoulli) than the one implied by the squared error we use for linear regression (the Gaussian).
- What to do when you can’t solve for W and b directly. (Hint: we use gradient descent).
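
Here is a sketch of what that looks like – an illustrative toy using a full-batch gradient descent loop, not any course-specific code. We descend the gradient of the cross-entropy error until the classifier fits the data:

```python
import numpy as np

np.random.seed(0)
N, D = 200, 2
X = np.random.randn(N, D)
Y = (X[:, 0] + X[:, 1] > 0).astype(float)  # labels from a known linear rule

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.zeros(D)
b = 0.0
learning_rate = 0.1
for _ in range(1000):
    p = sigmoid(X.dot(W) + b)                 # current predictions P(Y=1|X)
    W -= learning_rate * X.T.dot(p - Y) / N   # gradient of cross-entropy w.r.t. W
    b -= learning_rate * (p - Y).mean()       # ...and w.r.t. b

accuracy = ((sigmoid(X.dot(W) + b) > 0.5) == Y).mean()
print(accuracy)
```

Notice how simple the gradient of the cross-entropy error is for the sigmoid model: it involves only the prediction error (p - Y) and the inputs.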

To get an in-depth tutorial on logistic regression, check out my Udemy course, Data Science: Logistic Regression in Python:

https://www.udemy.com/data-science-logistic-regression-in-python

I also add in some stuff on regularization since it’s a topic that you need to learn about when working with real-world data.

Now that you know about logistic regression, you are ready to learn about neural networks.

Neural networks are basically a bunch of logistic regression units connected together in multiple layers.
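
A minimal sketch of that idea, with random weights just to show the shape of the computation: each hidden unit below is exactly a logistic regression over the inputs, and the output unit is a logistic regression over the hidden units.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
X = np.random.randn(5, 3)    # 5 samples, 3 input features

W1 = np.random.randn(3, 4)   # input layer -> 4 hidden logistic units
b1 = np.zeros(4)
W2 = np.random.randn(4, 1)   # hidden layer -> 1 output logistic unit
b2 = np.zeros(1)

Z = sigmoid(X.dot(W1) + b1)  # hidden activations
p = sigmoid(Z.dot(W2) + b2)  # P(Y=1 | X) for each sample
print(p.shape)               # (5, 1)
```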

Surprisingly, there is so much to learn about neural networks that a typical machine learning class will only scratch the surface of this huge topic.

Some new things you need to learn about include:

- How to apply non-linearities other than the sigmoid, such as tanh() and the rectifier (ReLU)
- Backpropagation, an application of the chain rule from calculus that essentially means we’re doing gradient descent
- What happens when you can only find local extrema
- How a neural network automatically learns features so that they don’t have to be hand-coded
- Neural network libraries such as Theano and TensorFlow that allow you to take advantage of the GPU for faster learning
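
To see that backpropagation really is just the chain rule, here is a tiny sketch of my own (toy numbers, not course code): compute the gradient of a squared-error loss through a single logistic unit analytically, then confirm it against a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y = 2.0, 1.0    # one input and its target
w, b = 0.5, -0.3   # arbitrary starting weights

def loss(w, b):
    return 0.5 * (sigmoid(w * x + b) - y) ** 2

# Chain rule ("backpropagation" for this one-unit network):
# dL/dw = (p - y) * p * (1 - p) * x
p = sigmoid(w * x + b)
grad_w = (p - y) * p * (1 - p) * x

# Numerical check by central finite differences
eps = 1e-6
numeric = (loss(w + eps, b) - loss(w - eps, b)) / (2 * eps)
print(abs(grad_w - numeric))  # essentially zero
```

In a real network the same chain rule is applied layer by layer, reusing intermediate results as you move backward – that reuse is what makes backpropagation efficient.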

In my class I also extend the number of classes of our classifier from 2 to K, so that you can learn about the softmax function and how to take its derivative. You can take my class on neural networks in Python / Numpy / TensorFlow on Udemy:

https://www.udemy.com/data-science-deep-learning-in-python
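
As a taste, here is one common numerically stable way to write the softmax in Numpy – a sketch, not the course’s exact implementation:

```python
import numpy as np

def softmax(a):
    # Subtracting the max doesn't change the result mathematically,
    # but it prevents overflow in np.exp for large activations.
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

a = np.array([2.0, 1.0, 0.1])
p = softmax(a)
print(p.sum())  # probabilities over the K classes sum to 1
```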

My follow-up course shows how you can improve on vanilla gradient descent / backpropagation – and I prove it with demonstrations! (I also figured out how to turn the knobs on my audio interface, so the recordings are better quality.) In this course we cover: stochastic and batch gradient descent; momentum; adaptive learning rate methods like exponential decay, AdaGrad, and RMSProp; hyperparameter optimization using grid search and random search; and utilizing GPU instances on AWS EC2. I teach you Theano and TensorFlow from scratch – variables, functions, expressions, and simple optimizations – so that the next step, building neural networks, is simply a matter of connecting those pieces together. Check out this follow-up deep learning course here:

https://www.udemy.com/data-science-deep-learning-in-theano-tensorflow
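
To give a flavor of two of those update rules, here is a toy sketch on a one-dimensional quadratic loss – the hyperparameter values are ones I picked for illustration, not course settings:

```python
import numpy as np

# Minimize L(w) = 0.5 * w^2 (whose gradient is just w) with two of the
# update rules mentioned above. All constants here are illustrative.
lr, mu, decay, eps = 0.1, 0.9, 0.99, 1e-8

w_momentum, v = 5.0, 0.0
w_rmsprop, cache = 5.0, 0.0

for _ in range(300):
    # Momentum: accumulate a velocity instead of stepping on the raw gradient
    g = w_momentum
    v = mu * v - lr * g
    w_momentum += v

    # RMSProp: scale the step by a running average of squared gradients
    g = w_rmsprop
    cache = decay * cache + (1 - decay) * g * g
    w_rmsprop -= lr * g / (np.sqrt(cache) + eps)

print(w_momentum, w_rmsprop)  # both head toward the minimum at 0
```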

These are the courses I’ve released as of the time of this writing.

There are more advanced topics I want to cover but have not had the chance yet. These include:

- Pre-training using Restricted Boltzmann Machines (RBMs)
- Convolutional Neural Networks (CNNs)
- Long Short-Term Memory (LSTM) Networks
- word2vec
- and more!

Stay tuned – the topics above will very likely be covered in the near future.