Following the original observations of neural networks in action, I decided a follow-up was needed. In the original blog, the smallest neural net (NN) that learnt the data set was 2-6-3-1, but the details were not saved; a second NN with the same configuration came close, but its approach was different. The first and second layer nodes (and outputs) are modelled, but it is the second layer nodes that really show how the clusters of blue circles were isolated by the neural networks.
Using a new 2-6-3-1 NN (2 inputs, 6 nodes in the first hidden layer, 3 nodes in the second hidden layer and a single output), the cluster of blue circles was isolated by three curved, overlapping lines. The images below are from the second layer video. They show that the network does not fully learn the data set, because the boundaries that close the cluster of blue squares were not learnt before convergence. The output of the trained network is always shown in red.
2-6-3-1 (2nd hidden layer and output nodes)
The data set plotted
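To make the 2-6-3-1 notation concrete, here is a minimal sketch of a forward pass through such a network. It assumes logistic activations on every layer and random initial weights; the helper names (`init_network`, `forward`) are hypothetical and are not taken from the visualisation tool used for the animations.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + np.exp(-z))

def init_network(sizes, seed=0):
    """Create one (weights, biases) pair per layer for the given layer sizes."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 1.0, size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return the activations of every layer, starting with the input itself."""
    activations = [np.asarray(x, dtype=float)]
    for W, b in params:
        activations.append(logistic(W @ activations[-1] + b))
    return activations

# 2 inputs, 6 first-layer nodes, 3 second-layer nodes, 1 output
params = init_network([2, 6, 3, 1])
acts = forward(params, [0.5, -0.2])  # arbitrary example point
print([a.shape for a in acts])
```

Plotting `acts[2]` over a grid of input points would give one surface per second-layer node, which is roughly what the second layer images show.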
Pushing the boundaries (2-5-3-1)
After trying several different networks, I decided to see whether a smaller network could learn the data set: a network with one less neuron in the first hidden layer. The results were surprising. This time, I have combined the first and second layer animations into a single plot, added the data set as a point of reference and included the weights of the entire network so that the changes can be monitored in real time.
The neural network weight values are not displayed, because they would be pointless unless an observer could sum them and calculate the logistic function on the fly. Instead, the direction of each weight change is displayed: red for increasing, yellow for unchanged and blue for decreasing weights. We can now observe how the weights behave during the learning process as well as how the nodes are shaped in the pattern space.
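The colour coding can be sketched as a comparison of the weights before and after an update. This is only an illustration: it assumes red marks an increasing weight, and the function name `weight_directions` and the tolerance `tol` are hypothetical choices, not part of the original tool.

```python
import numpy as np

def weight_directions(prev_w, new_w, tol=1e-6):
    """Map each weight change to a colour: red = up, yellow = unchanged, blue = down."""
    delta = np.asarray(new_w, dtype=float) - np.asarray(prev_w, dtype=float)
    colours = np.full(delta.shape, "yellow", dtype=object)
    colours[delta > tol] = "red"    # weight increased (assumed meaning of red)
    colours[delta < -tol] = "blue"  # weight decreased
    return colours

# Made-up weights for two consecutive training steps
prev_w = np.array([[0.10, -0.30], [0.50, 0.20]])
new_w = np.array([[0.15, -0.30], [0.40, 0.20]])
colours = weight_directions(prev_w, new_w)
print(colours)
```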
At about five minutes into the animation, a node regressed to zero. The same behaviour was observed in the first blog, where several nodes in the 2-15-1 neural network regressed to zero and contributed nothing to the final solution.
On closer inspection, the 2-8-6-1 network also had nodes that reduced their contribution. It also had nodes that duplicated features: the orange and blue neurons have clearly learnt the same thing, and the yellow neuron is also fairly flat but suspended. Clearly the 2-8-6-1 network has too much power for the data set (see above right).
The 2-5-3-1 neural network (below left) was chosen in the hope of better performance, but it failed to deliver it. I still wanted to see whether the weights of the dormant neuron contributed to the final solution, so I removed the weights of the dormant node from the converged solution and re-ran the visualisation tool. The solution remained the same (below right); this data set can thus be learnt by a 2-5-2-1 neural net.
2-5-3-1 (left) and 2-5-2-1 with the dormant node removed (right)
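Removing a dormant node amounts to deleting its row of incoming weights (and its bias) and the matching column of outgoing weights in the next layer. The sketch below assumes the dormant node's output is effectively zero; if it were stuck at a non-zero constant, its contribution would have to be folded into the next layer's bias instead. The weights are random stand-ins, not the converged solution from the blog, and `remove_hidden_node` is a hypothetical helper.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    """Network output for input x, with logistic activations on every layer."""
    a = np.asarray(x, dtype=float)
    for W, b in params:
        a = logistic(W @ a + b)
    return a

def remove_hidden_node(params, layer, node):
    """Drop one node from params[layer] and its outgoing weights in params[layer + 1]."""
    W, b = params[layer]
    W_next, b_next = params[layer + 1]
    pruned = list(params)
    pruned[layer] = (np.delete(W, node, axis=0), np.delete(b, node))
    pruned[layer + 1] = (np.delete(W_next, node, axis=1), b_next)
    return pruned

# Stand-in 2-5-3-1 network with random weights
rng = np.random.default_rng(1)
sizes = [2, 5, 3, 1]
params = [(rng.normal(size=(n_out, n_in)), rng.normal(size=n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

# Force node 1 of the second hidden layer to be dormant (output ~ 0)
W2, b2 = params[1]
W2[1, :] = 0.0
b2[1] = -50.0

pruned = remove_hidden_node(params, layer=1, node=1)  # now 2-5-2-1
x = [0.3, 0.7]
print(forward(params, x), forward(pruned, x))
```

Under that assumption the two printed outputs agree to within floating-point error, which mirrors the observation that the solution stayed the same after the node was removed.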
Solution 1 (2-5-2-1)
One of the best solutions using a 2-5-2-1 neural network is shown below. The network has a similar structure to the one at the start of this blog, except that some regions were inverted. The network clearly struggled to hold its shape before it converged on a suboptimal solution, and it clearly has distortions. To be honest, it looks like a mess.
2-5-2-1 (2nd hidden layer + Output)
This could be due to the lime green node lying flat on the underside of the first hidden layer, shown below.
2-5-2-1 (1st hidden layer + Output)
2-5-2-1 (Training in Progress)
A new network was then trained with the learning rate reduced and kept low for the duration of the training. With no momentum added, the network converged perfectly.
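For reference, a single weight update with and without momentum can be sketched as follows. The learning rate of 0.01 and the example gradient are assumptions for illustration only; the blog does not state the values that were actually used, and `gradient_step` is a hypothetical name.

```python
import numpy as np

def gradient_step(weights, grads, velocity, learning_rate=0.01, momentum=0.0):
    """One gradient-descent update; momentum=0.0 gives plain gradient descent."""
    new_velocity = momentum * velocity - learning_rate * grads
    return weights + new_velocity, new_velocity

w = np.array([0.5, -1.0])   # made-up weights
g = np.array([2.0, -4.0])   # made-up gradient of the error w.r.t. the weights
v = np.zeros_like(w)

# Low, constant learning rate and no momentum, as in the final run described above
w_new, v = gradient_step(w, g, v, learning_rate=0.01, momentum=0.0)
print(w_new)
```

With momentum set to zero, each step depends only on the current gradient, so a small learning rate gives small, steady changes to the weights.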