The backpropagation algorithm is one of the foundations of neural networks and deep learning. It is also hard for beginners to understand. I found a resource, a free chapter of a book by Andrew Glassner (link below), that explains backpropagation using a unique and easy-to-understand approach.

Essentially, we start with a contrived situation in which we train a neural network using an artificially slow approach (the chapter calls it ‘glacial’).

While this strategy is not practical in real life, starting this way makes backpropagation easier to explain.

The objective is to train a categorizer that assigns a label to a given input. If the prediction matches the label that we previously determined for that input, we move on to the next sample. If the prediction is wrong, we change the network to help it do better next time. The key is in how we change the network.

A neural network can be seen as a giant collection of neurons. Each neuron does its own little calculation and then passes its result on to other neurons. Organizing the neurons into a set of layers does not, by itself, give the network a way to learn. The backpropagation algorithm provides that learning mechanism.
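To make the "little calculation" concrete, here is a minimal sketch of one neuron. The weighted sum plus bias followed by a sigmoid is an assumption chosen for illustration, and the function name `neuron` and all the numbers are hypothetical.

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of the inputs plus a bias, squashed by a sigmoid.

    The sigmoid is an assumed activation function, used here only for illustration.
    """
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

# Two neurons read the same inputs and pass their results on to a third neuron.
h1 = neuron([0.5, -1.0], [0.8, 0.2], 0.0)
h2 = neuron([0.5, -1.0], [-0.4, 0.9], 0.1)
out = neuron([h1, h2], [1.0, -1.0], 0.0)
```

Nothing in this code changes the weights. It only computes outputs, which is why a separate learning mechanism is needed.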

So, we can start with this slow, ‘glacial’ way of training. While this approach is hypothetical, it gives us a starting point from which we can improve.

The network, comprising thousands of interconnected neurons, is designed to classify each input into one of five categories. Thus, the network has five outputs, one probability per category. The weights of the neural network are initially assigned at random. When we feed the first piece of data into the network as input, the outputs of successive layers of connected neurons cascade and produce an overall output for the network. The category with the highest probability is the predicted label.
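The cascade can be sketched with a much smaller network than the one described. Everything specific here is an assumption for illustration: the layer sizes (4 inputs, 8 hidden neurons, 5 outputs), the `tanh` activation, the softmax that turns the final scores into probabilities, and the helper names `make_layer` and `forward`.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_layer(n_in, n_out, rng):
    """Random starting weights: one row of n_in weights per output neuron."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

def forward(layers, x):
    """Feed x through each layer in turn; each layer's output is the next one's input."""
    for i, layer in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) for row in layer]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]  # assumed hidden activation
    return softmax(x)

rng = random.Random(0)  # fixed seed so the random weights are repeatable
layers = [make_layer(4, 8, rng), make_layer(8, 5, rng)]
probs = forward(layers, [0.2, -0.7, 1.5, 0.3])  # five probabilities
predicted = probs.index(max(probs))              # index of the most likely category
```

With random weights, `predicted` is essentially a guess, which matches the untrained starting point described here.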

At this point, the prediction is likely to be wrong because the network is untrained. The number that represents the magnitude of the mismatch is called the loss, or error score.
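One common way to turn the mismatch into a single number is cross-entropy. The text does not say which error measure the chapter uses, so treat this as an assumed choice; the function name and the example probabilities are hypothetical.

```python
import math

def cross_entropy(probs, true_label):
    """Loss for one sample: small when the true category gets high probability.

    Cross-entropy is an assumed choice of error score, not necessarily the chapter's.
    """
    # Clamp to avoid log(0) when the true category was given zero probability.
    return -math.log(max(probs[true_label], 1e-12))

outputs = [0.9, 0.025, 0.025, 0.025, 0.025]  # network strongly favors category 0
confident_right = cross_entropy(outputs, 0)   # true label is category 0
confident_wrong = cross_entropy(outputs, 3)   # true label is category 3
```

The same five outputs give a low loss when category 0 is correct and a much higher loss when category 3 is correct.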

Now, let us discuss the ‘slow’ way to learn.

- We first pick a small random number
- Next, we randomly choose a weight
- We then change this weight by the random number
- We then re-evaluate the sample, which leads to the same cascade of neuron outputs but now with the small random change to one weight
- We can continue this process of randomly picking values and nudging weights until the results improve
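The steps above can be sketched as a short loop. To stay small, this sketch makes several assumptions that are not from the chapter: a single layer of weights with a softmax output, one training sample, a cross-entropy loss, and the rule that a nudge is undone when it does not lower the loss. All names are hypothetical.

```python
import math
import random

def predict(weights, x, n_classes=5):
    """Softmax over n_classes scores; weights is a flat list, one row per class."""
    n = len(x)
    scores = [sum(weights[c * n + i] * x[i] for i in range(n)) for c in range(n_classes)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def loss(weights, x, label):
    """Cross-entropy error for one sample (an assumed error score)."""
    return -math.log(max(predict(weights, x)[label], 1e-12))

def glacial_train(weights, x, label, steps, rng, scale=0.1):
    """Nudge one random weight at a time, keeping only nudges that lower the loss."""
    best = loss(weights, x, label)
    for _ in range(steps):
        delta = rng.uniform(-scale, scale)  # pick a small random number
        k = rng.randrange(len(weights))     # randomly choose a weight
        old = weights[k]
        weights[k] = old + delta            # change the weight by that number
        new = loss(weights, x, label)       # re-evaluate the sample
        if new < best:
            best = new                      # the nudge helped, so keep it
        else:
            weights[k] = old                # assumption: undo a nudge that did not help
    return best

rng = random.Random(0)
x, label = [0.2, -0.7, 1.5, 0.3], 2
weights = [rng.uniform(-1.0, 1.0) for _ in range(5 * len(x))]
before = loss(weights, x, label)
after = glacial_train(weights, x, label, steps=2000, rng=rng)
```

Each step costs a full re-evaluation of the sample and changes only one weight, so the cost grows quickly with the number of weights and samples. That is the sense in which this way of training is glacial.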

Although we are improving with each iteration, the process is very slow.

So, how could we make this process of training faster?

That’s where backpropagation comes in, along with related choices such as activation functions, the learning rate, and mini-batches.

The chapter is very detailed but also readable.

Deep Learning: From Basics to Practice, backpropagation chapter, by Andrew Glassner

Image source: bernie, Pixabay