Introduction
This post is designed to show the internal changes of an Artificial Neural Network (ANN/NN): it shows the outputs of the neurons from the start of the backpropagation algorithm through to convergence.
The hope is to gain a better understanding of why we use a second hidden layer, of local minima, of how many hidden nodes are required, and of their impact on the final solution.
The Dataset
#  | X   | Y    | Class | #  | X   | Y   | Class | #  | X   | Y    | Class
1  | 0.1 | 0.1  | 1     | 13 | 0.9 | 0.5 | 1     | 25 | 0.7 | 0.4  | 0
2  | 0.3 | 0.1  | 1     | 14 | 0.8 | 0.7 | 1     | 26 | 0.3 | 0.5  | 0
3  | 0.5 | 0.1  | 1     | 15 | 0.9 | 0.7 | 1     | 27 | 0.2 | 0.55 | 0
4  | 0.7 | 0.1  | 1     | 16 | 0.1 | 0.9 | 1     | 28 | 0.4 | 0.55 | 0
5  | 0.9 | 0.1  | 1     | 17 | 0.3 | 0.9 | 1     | 29 | 0.1 | 0.65 | 0
6  | 0.1 | 0.3  | 1     | 18 | 0.5 | 0.9 | 1     | 30 | 0.2 | 0.65 | 0
7  | 0.3 | 0.3  | 1     | 19 | 0.7 | 0.9 | 1     | 31 | 0.3 | 0.65 | 0
8  | 0.5 | 0.3  | 1     | 20 | 0.9 | 0.9 | 1     | 32 | 0.4 | 0.65 | 0
9  | 0.9 | 0.3  | 1     | 21 | 0.7 | 0.2 | 0     | 33 | 0.5 | 0.65 | 0
10 | 0.1 | 0.5  | 1     | 22 | 0.6 | 0.3 | 0     | 34 | 0.2 | 0.7  | 0
11 | 0.5 | 0.5  | 1     | 23 | 0.7 | 0.3 | 0     | 35 | 0.3 | 0.7  | 0
12 | 0.7 | 0.5  | 1     | 24 | 0.8 | 0.3 | 0     | 36 | 0.4 | 0.7  | 0
   |     |      |       |    |     |     |       | 37 | 0.3 | 0.8  | 0
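For anyone who wants to reproduce the experiments, the table can be written out directly as arrays. This is a minimal sketch in Python/NumPy; the names X and y are mine, not from the original experiment.

```python
import numpy as np

# The 37 points from the table above: two inputs per point and a class label.
# Points 1-20 are class 1; points 21-37, which form the two compact clusters
# (the "holes"), are class 0.
X = np.array([
    [0.1, 0.1], [0.3, 0.1], [0.5, 0.1], [0.7, 0.1], [0.9, 0.1],       # 1-5
    [0.1, 0.3], [0.3, 0.3], [0.5, 0.3], [0.9, 0.3],                   # 6-9
    [0.1, 0.5], [0.5, 0.5], [0.7, 0.5], [0.9, 0.5],                   # 10-13
    [0.8, 0.7], [0.9, 0.7],                                           # 14-15
    [0.1, 0.9], [0.3, 0.9], [0.5, 0.9], [0.7, 0.9], [0.9, 0.9],       # 16-20
    [0.7, 0.2], [0.6, 0.3], [0.7, 0.3], [0.8, 0.3], [0.7, 0.4],       # 21-25
    [0.3, 0.5], [0.2, 0.55], [0.4, 0.55],                             # 26-28
    [0.1, 0.65], [0.2, 0.65], [0.3, 0.65], [0.4, 0.65], [0.5, 0.65],  # 29-33
    [0.2, 0.7], [0.3, 0.7], [0.4, 0.7], [0.3, 0.8],                   # 34-37
])
y = np.array([1] * 20 + [0] * 17).reshape(-1, 1)
```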
The dataset is designed to look like holes in the ground, and was chosen particularly to challenge the network to surround the blue circles and separate them.
The ideal solution is for the network to perfectly enclose the blue circles in a class of their own, while avoiding a connected solution like the purple enclosure.
After surrounding the holes, the main challenges are:
Animations
The following animations of neural networks show the output node as a red mesh and the hidden-layer nodes in various colours; some colours are used more than once. Each animation shows the nodes of one hidden layer together with the output node, but not the hidden layers together. The animations run from the start of the algorithm, i.e. from the random initial weights, up to convergence; the bias nodes are not shown.
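The animations themselves are not reproduced here, but a single frame of the kind of mesh they show can be sketched as follows. This is my own illustration, not the author's animation code: each node's output is plotted as a surface over the unit input square, and the weight and bias values passed in are placeholders for whatever the network holds at a given training step.

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def plot_node_mesh(ax, w, b, color, resolution=50):
    """Plot one neuron's output sigmoid(w . x + b) as a wireframe over the unit square."""
    xs = np.linspace(0.0, 1.0, resolution)
    gx, gy = np.meshgrid(xs, xs)
    z = sigmoid(w[0] * gx + w[1] * gy + b)
    ax.plot_wireframe(gx, gy, z, color=color, linewidth=0.5)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")           # the Z axis is the node's output value
plot_node_mesh(ax, [6.0, -4.0], -1.0, "red")    # placeholder weights and bias
plot_node_mesh(ax, [-3.0, 5.0], -2.0, "green")  # a second node, different colour
ax.view_init(elev=30, azim=-60)
plt.show()
```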
One hidden layer (2-15-1)
A NN of 2-10-1 and two NNs of 2-15-1 were tried before a solution was found. Despite the network learning the dataset to the desired convergence threshold of 10% total error, three hidden nodes were found to be practically unused.
Unused in the sense that, after all the backpropagation of errors had taken place, their Z output value did not rise above 11%; in fact, if the convergence threshold had been set at 1% total error, I am convinced that their contribution would have been even less. These are the yellow, dark orange and blue meshes.
They can be seen when the animation has an elevation of 0°. Although these colours were used twice, the other instances had much higher Z output values. This suggests a network with a single hidden layer of 12 nodes may be sufficient to learn the dataset.
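The post does not include the training code, but a plain batch-backpropagation sketch along the lines described (sigmoid units, random initial weights, a total-error convergence threshold) could look like this. It reuses the X and y arrays defined above; the exact error measure behind the "10% total error" threshold is not stated, so mean squared error is assumed here, and the 11% cut-off for an "unused" node is taken from the paragraph above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layers(sizes, rng):
    """Random weights and biases for a fully connected net, e.g. sizes = [2, 15, 1]."""
    return [(rng.uniform(-1, 1, (n_in, n_out)), rng.uniform(-1, 1, n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(layers, X):
    """Return the activations of every layer: input first, output last."""
    acts = [X]
    for W, b in layers:
        acts.append(sigmoid(acts[-1] @ W + b))
    return acts

def train(layers, X, y, lr=0.5, max_epochs=200_000, target_error=0.10):
    """Plain batch backpropagation with sigmoid units and squared error."""
    for _ in range(max_epochs):
        acts = forward(layers, X)
        err = acts[-1] - y
        if np.mean(err ** 2) < target_error:
            break                                   # assumed convergence criterion
        delta = err * acts[-1] * (1 - acts[-1])     # sigmoid derivative: a * (1 - a)
        for i in range(len(layers) - 1, -1, -1):
            W, b = layers[i]
            grad_W = acts[i].T @ delta / len(X)
            grad_b = delta.mean(axis=0)
            if i > 0:                               # propagate the error one layer back
                delta = (delta @ W.T) * acts[i] * (1 - acts[i])
            layers[i] = (W - lr * grad_W, b - lr * grad_b)
    return layers

rng = np.random.default_rng(0)
layers = train(init_layers([2, 15, 1], rng), X, y)

# Flag hidden nodes whose output never rises above the 11% mentioned above.
hidden = forward(layers, X)[1]
for j, peak in enumerate(hidden.max(axis=0)):
    print(f"hidden node {j}: max output {peak:.2f}"
          + (" (practically unused)" if peak < 0.11 else ""))
```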
Two hidden layers (2-6-3-1)
A network using two hidden layers, with 6 nodes in the first and 3 in the second.
2nd hidden layer + output node
1st Hidden layer + Output node
Two hidden layers (2-8-6-1)
A network using two hidden layers, with 8 nodes in the first and 6 in the second. The output node is animated separately this time, but inverted.
2nd hidden layer + output node
1st Hidden layer + Output node
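Assuming the helper functions from the 2-15-1 sketch earlier, the two-hidden-layer configurations in these sections differ only in the layer sizes passed in:

```python
# Same helpers as in the 2-15-1 sketch; only the layer sizes change.
layers_2_6_3_1 = train(init_layers([2, 6, 3, 1], rng), X, y)
layers_2_8_6_1 = train(init_layers([2, 8, 6, 1], rng), X, y)
```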
Observations (so far)
The initial thinking was that a single hidden layer would not be able to learn the dataset because the zero-output regions were isolated/not continuous. The smallest network observed to have met the challenges and learned the dataset contained two hidden layers of six and three nodes (2-6-3-1); unfortunately that animation was not captured.
The sigmoid curve, the "S" shape, is clearly visible in the layer closer to the input nodes, but when a second hidden layer is used it is no longer noticeable there. This shows that a function of functions develops in the second layer, allowing more complex models to be learned.
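In symbols (my notation, not the original post's), the output of a two-hidden-layer network such as 2-6-3-1 is a composition of sigmoid layers:

$$\hat{y} = \sigma\!\left(W^{(3)}\,\sigma\!\left(W^{(2)}\,\sigma\!\left(W^{(1)}\mathbf{x} + \mathbf{b}^{(1)}\right) + \mathbf{b}^{(2)}\right) + b^{(3)}\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

The first hidden layer's outputs are single sigmoids of a weighted sum of the inputs, so their S-shape is visible directly; the second layer squashes a weighted mixture of those already-curved surfaces, which is why the plain S-shape is no longer recognisable there.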
Convergence seems to happen in three observable ways:
Other animations will be uploaded with further observations, such as the variation in the number and type of local minima that occurred. The networks with a single hidden layer, even with sufficient hidden nodes, tended to get stuck in local minima more often and took much longer to train than the NNs with two hidden layers.