Let’s understand the GAN (Generative Adversarial Network).
Generative Adversarial Networks were invented in 2014 by Ian Goodfellow (author of one of the best deep learning books on the market) and his fellow researchers. The main idea behind GANs is to have two networks compete against each other in order to generate new, unseen data (don’t worry, you will understand this further on). GANs are often described as a counterfeiter versus a detective, so let’s build an intuition for how exactly they work.
So we can think of the counterfeiter as the generator, and the detective as the discriminator.
The generator is going to take random noise as input and try to produce images that look like they came from the real dataset.
Keep in mind that regardless of your source of images, whether it’s MNIST with its 10 classes or any other dataset, the discriminator itself performs binary classification. It just tries to tell whether an image is real or fake.
We first start with some noise, typically sampled from a Gaussian distribution, and feed it directly into the generator. The goal of the generator is to create images that fool the discriminator.
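For intuition, here is what that starting noise might look like in NumPy (the latent dimension of 100 and the batch size of 32 are just common choices for illustration, not anything fixed by GANs themselves):

```python
import numpy as np

latent_dim = 100   # size of the noise vector fed to the generator (a common choice)
batch_size = 32

# Sample a batch of Gaussian noise — this is the generator's only input.
noise = np.random.normal(loc=0.0, scale=1.0, size=(batch_size, latent_dim))

print(noise.shape)  # one noise vector per image we want to generate
```

Each of those 32 noise vectors will be mapped by the generator to one candidate image.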
In the very first stage of training, the generator is just going to produce noise.
And then we also grab images from our real dataset.
And then, in PHASE 1, we train the discriminator, labeling the fake generated images as zero and the real images as one. So basically: zero if you are fake and one if you are real.
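In code, assembling that labeled phase-one batch could look like the following NumPy sketch (the 28×28 shapes are a stand-in for MNIST-sized images, and the random arrays are just placeholders for real data and generator output):

```python
import numpy as np

batch_size = 32

# Placeholders: in practice these come from the dataset and the generator.
real_images = np.random.rand(batch_size, 28, 28)   # stand-in for real MNIST digits
fake_images = np.random.rand(batch_size, 28, 28)   # stand-in for generator output

# Label real as 1 and fake as 0, then combine into one discriminator batch.
x = np.concatenate([real_images, fake_images])
y = np.concatenate([np.ones(batch_size), np.zeros(batch_size)])

print(x.shape, y.shape)  # combined images and their real/fake labels
```

The discriminator is then trained on `x` and `y` like any ordinary binary classifier.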
We feed both into the discriminator, and the discriminator gets trained to detect the real images versus the fake ones. Then, as time goes on, the generator (during the second PHASE of training) keeps improving its images and trying to fool the discriminator, until it is hopefully able to generate images that mimic the real dataset so well that the discriminator is no longer able to tell the difference between a fake image and a real one.
So from the above example, we see that there are really two training phases:
In phase one, what we do is take the real images and label them as one, and they are combined with fake images from the generator labeled as zero. The discriminator then trains to distinguish the real images from the fake images. Keep in mind that in phase one of training, backpropagation only occurs on the discriminator, so we are only optimizing the discriminator’s weights during this phase.
Then in phase two, we have the generator produce more fake images, and we feed only these fake images to the discriminator, with all the labels set as real (one). This causes the generator to attempt to produce images that the discriminator believes to be real. What’s important to note here is that in phase two, because we are feeding in all fake images labeled as 1, we only perform backpropagation on the generator’s weights in this step. So we are not able to do a typical fit call on all the training data as we did before: since we are dealing with two different models (a discriminator model and a generator model), we also have two different phases of training.
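To make the two alternating phases concrete, here is a deliberately tiny NumPy sketch: a linear “generator” and a logistic “discriminator” trained on 1-D data drawn from N(4, 1). Everything here (the linear models, the learning rate, the batch size, the manual gradients) is an assumption chosen for illustration; a real GAN uses deep networks and a framework’s autodiff, but the phase-1/phase-2 alternation is exactly the one described above:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Generator G(z) = a*z + b and discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0          # generator parameters (starts producing ~N(0, 1))
w, c = 0.1, 0.0          # discriminator parameters
lr, batch = 0.05, 64

for step in range(2000):
    # ---- Phase 1: train the discriminator (real -> 1, fake -> 0) ----
    real = rng.normal(4.0, 1.0, batch)   # samples from the real data
    z = rng.normal(0.0, 1.0, batch)      # Gaussian noise input
    fake = a * z + b                     # generator output (G is frozen here)

    s_real, s_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    # Gradients of the binary cross-entropy loss w.r.t. w and c only.
    grad_w = np.mean(-(1 - s_real) * real) + np.mean(s_fake * fake)
    grad_c = np.mean(-(1 - s_real)) + np.mean(s_fake)
    w -= lr * grad_w
    c -= lr * grad_c

    # ---- Phase 2: train the generator (all fakes labeled as real) ----
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b
    s = sigmoid(w * fake + c)
    # Gradient of -log D(G(z)) flows back THROUGH the frozen discriminator.
    grad_fake = -(1 - s) * w
    a -= lr * np.mean(grad_fake * z)
    b -= lr * np.mean(grad_fake)

print("generated mean:", b)  # should drift from 0 toward the real mean (4.0)
```

Because the noise has zero mean, the generator’s output mean is just `b`, and over training it drifts from 0 toward the real mean of 4. Note that the generator never touches the real samples; it only sees the gradient coming back through the discriminator, which is exactly the point made below.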
What is really interesting here, and something you should always keep in mind, is that the generator itself never actually sees the real images. It learns to generate convincing images only from the gradients flowing back through the discriminator during its phase of training. Also, keep in mind that the discriminator improves as the training phases continue, meaning the generated images will also need to get better and better in order to fool the discriminator.
This can lead to pretty impressive results. Researchers have published models such as StyleGAN that can produce fake human faces that are extremely detailed. See below the example of face generation from NVIDIA’s StyleGAN. IMPRESSIVE, RIGHT?
Now let’s talk about the difficulties with GANs:
Since GANs are most often used with image-based data, and since we have two networks competing against each other, they require GPUs for reasonable training time. But fortunately, we have Google Colab, which lets us use GPUs for free.
Often what happens is that the generator figures out just a few images, or even sometimes a single image, that can fool the discriminator, and it eventually “collapses” to only produce that image. So you can imagine, back when it was producing faces, maybe it figured out how to produce one single face that fools the discriminator. The generator then ends up just learning to produce the same face over and over again.
So in theory it would be preferable to have a variety of images, such as multiple numbers or multiple faces, but GANs can quickly collapse to producing a single number or face (whatever the dataset happens to be) regardless of the input noise. This failure is known as mode collapse.
This means you can feed in any type of random noise you want, but the generator will have figured out the one image that it can use to fool the discriminator.
One of the ways to overcome this problem is to use a DCGAN (Deep Convolutional GAN, which I will explain in another blog); because these networks are more complex and have deeper layers, they tend to be more resistant to mode collapse.
Researchers have also experimented with what’s known as “mini-batch discrimination”, which essentially punishes generated batches that are all too similar. So if the generator starts to suffer mode collapse and produces batches of very, very similar-looking images, the discriminator will penalize that particular batch for having images that are all too similar.
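As a rough sketch of the intuition (this is a simplified diversity statistic, not the full learned-feature version of mini-batch discrimination from the literature), one can measure how similar a generated batch is to itself; a collapsed batch scores near zero, which is the kind of signal the discriminator can be given to penalize:

```python
import numpy as np

def batch_diversity(batch):
    """Mean pairwise L1 distance between samples in a batch.
    Near zero means the batch has collapsed to (almost) one image."""
    n = len(batch)
    flat = batch.reshape(n, -1)
    # Pairwise L1 distances between every pair of flattened samples.
    dists = np.abs(flat[:, None, :] - flat[None, :, :]).sum(axis=-1)
    return dists.sum() / (n * (n - 1))   # average over pairs, excluding self-pairs

rng = np.random.default_rng(0)
healthy = rng.random((16, 28, 28))                        # varied "images"
collapsed = np.tile(rng.random((1, 28, 28)), (16, 1, 1))  # one image repeated

print(batch_diversity(healthy))    # clearly positive
print(batch_diversity(collapsed))  # zero: a collapsed batch the discriminator could punish
```

In the real technique this kind of within-batch statistic is appended to the discriminator’s features, so the discriminator can learn to reject suspiciously uniform batches.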
It can be difficult to ascertain performance and the appropriate number of training epochs, since all the generated images are, at the end of the day, truly fake. It’s difficult to tell how well our model is performing at generating images, because just because the discriminator thinks something is real doesn’t mean that a human like us will think a face or a number looks real enough.
And again, due to the design of a GAN, the generator and discriminator are constantly at odds with each other, which leads to performance oscillating between the two.
So while dealing with GANs, you have to experiment with hyperparameters such as the number of layers, the number of neurons, activation functions, learning rates, etc., especially when it comes to complex images.