
Image identification using a convolutional neural network

This blog explores a typical image identification task using a convolutional (“Deep Learning”) neural network. For this purpose we will use the simple JavaCNN package by D. Persson, and make our example small and concise using the Python scripting language. This example can also be rewritten in Java, Groovy, JRuby or any other scripting language supported by the Java virtual machine.

This example uses images in the grayscale PGM format. The name “PGM” is an acronym for “Portable Gray Map”; pixel values range from 0 to 255. The files are typically binary (they carry the magic value “P5”, which you can see by opening one of them), but you can convert them to “plain” (uncompressed) PGM, where each pixel in the raster is represented as an ASCII decimal number.
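
For instance, a short sketch such as the one below (plain Python, no extra libraries; the file name is only a placeholder for any PGM file) reads the PGM header and prints its magic value, image dimensions and maximum gray value:

f = open("face_00001.pgm", "rb")
magic = f.readline().strip()        # "P5" for binary PGM, "P2" for plain PGM
line = f.readline()
while line.startswith("#"):         # skip optional comment lines
    line = f.readline()
width, height = [int(v) for v in line.split()]
maxval = int(f.readline())          # maximum gray value, typically 255
f.close()
print "Magic:", magic, " Size:", width, "x", height, " Max gray value:", maxval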

Our input images come from a publicly available database (the CBCL Face Database of the MIT Center for Biological and Computational Learning). Let us download a zip file with this (slightly modified) database and unzip it. To do this, install the most recent DataMelt program, create a file “example.py” and run these commands using DataMelt:

from jhplot import *
print Web.get("http://jwork.org/dmelt/examples/data/mitcbcl_pgm_set2.zip")
print IO.unzip("mitcbcl_pgm_set2.zip")
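
To verify that the download and unpacking succeeded, a quick check such as the sketch below (it uses Python's standard glob module; the directory name matches the unzipped archive) counts the images in each of the newly created directories:

import glob
# count the unpacked image files in each directory
print "train:", len(glob.glob("mitcbcl_pgm_set2/train/*"))
print "test :", len(glob.glob("mitcbcl_pgm_set2/test/*"))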

The IO.unzip command creates two directories, “train” and “test”. You can omit “print”, which is only used to show the status of these commands. Each directory contains images with faces (“face_*”) and other images (“cmu_*”). Note that the “_” in the file name is important, since it helps to identify the image type. The “train” directory has about 1500 files with images of faces and 13700 files with other types of images. Let us look at one image and study its properties. We will use the ImageJ (ij) Java package. Append the following code to your previous lines:

from ij import *
imp = IJ.openImage("mitcbcl_pgm_set2/train/face_00001.pgm")
print "Width:", imp.width," Hight:", imp.height
imp.show() # show this image in a frame
ip = imp.getProcessor().convertToFloat()
pixels = ip.getPixels() # get array of pixels
print pixels # print array with pixels
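
Since getPixels() returns the raster as a flat Java array of floating-point values, you can also compute simple statistics of the image. A minimal sketch (it assumes the pixels array obtained above):

vals = list(pixels)    # convert the Java float array to a Python list
print "Min:", min(vals), " Max:", max(vals), " Mean:", sum(vals)/len(vals)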

These ImageJ commands show the image in a frame, print its size (19×19 pixels) and print the array of pixel values of the PGM image. Now let us write a program that does the following:

  • Read the images from the “train/” directory
  • Read the images from the “test/” directory
  • Initialize the CNN with several convolutional and pooling layers
  • Run 50 training iterations (you can increase or decrease this number depending on the required precision of image identification)
  • During each iteration, compute the probability of correctly identifying the face images from the “test/” directory and save the CNN to a file
  • At the end of the training, read the trained CNN back from the file and perform a final run over the test images, printing the predictions

Copy the following lines and save them in the file “example.py”. Then run this code inside DataMelt:

from jhplot import *
print Web.get("http://jwork.org/dmelt/examples/data/mitcbcl_pgm_set2.zip")
print IO.unzip("mitcbcl_pgm_set2.zip")
NMax=50 # Total runs. Reduce this number to get results faster
from org.ea.javacnn.data import DataBlock,OutputDefinition,TrainResult
from org.ea.javacnn.layers import DropoutLayer,FullyConnectedLayer,InputLayer,LocalResponseNormalizationLayer
from org.ea.javacnn.layers import ConvolutionLayer,RectifiedLinearUnitsLayer,PoolingLayer
from org.ea.javacnn.losslayers import SoftMaxLayer
from org.ea.javacnn.readers import ImageReader,MnistReader,PGMReader,Reader
from org.ea.javacnn.trainers import AdaGradTrainer,Trainer
from org.ea.javacnn import JavaCNN
from java.util import ArrayList,Arrays
from java.lang import System
layers = ArrayList(); de = OutputDefinition()
print "Total number of runs=", NMax
print "Reading train sample.."
mr = PGMReader("mitcbcl_pgm_set2/train/")
print "Total number of trainning images=",mr.size()," Nr of types=",mr.numOfClasses()
print "Read test sample .."
mrTest = PGMReader("mitcbcl_pgm_set2/test/")
print "Total number of test images=",mrTest.size()," Nr of types=",mrTest.numOfClasses()
modelName = "model.ser" # save NN to this file
layers.add(InputLayer(de, mr.getSizeX(), mr.getSizeY(), 1))
layers.add(ConvolutionLayer(de, 5, 32, 1, 2)) # uses different filters
layers.add(RectifiedLinearUnitsLayer()) # applies the non-saturating activation function
layers.add(PoolingLayer(de, 2,2, 0)) # creates a smaller zoomed out version
layers.add(ConvolutionLayer(de, 5, 64, 1, 2))       # second convolutional layer
layers.add(RectifiedLinearUnitsLayer())
layers.add(PoolingLayer(de, 2,2, 0))
layers.add(FullyConnectedLayer(de, 1024))           # fully connected layer with 1024 neurons
layers.add(LocalResponseNormalizationLayer())
layers.add(DropoutLayer(de))                        # dropout layer to reduce overfitting
layers.add(FullyConnectedLayer(de, mr.numOfClasses()))  # one output per image class
layers.add(SoftMaxLayer(de))                        # converts the outputs to class probabilities
print "Training.."
net = JavaCNN(layers)
trainer = AdaGradTrainer(net, 20, 0.001)
from jarray import zeros
numberDistribution,correctPredictions = zeros(10, "i"),zeros(10, "i")
start = System.currentTimeMillis()
db = DataBlock(mr.getSizeX(), mr.getSizeY(), 1, 0)
for j in range(NMax):
    loss = 0
    for i in range(mr.size()):                       # loop over all training images
        db.addImageData(mr.readNextImage(), mr.getMaxvalue())
        tr = trainer.train(db, mr.readNextLabel())
        loss = loss + tr.getLoss()
        if (i != 0 and i % 500 == 0):
            print "Nr of images: ",i," Loss: ",(loss/float(i))
    print "Loss: ", (loss / float(mr.size())), " for run=",j
    mr.reset()
    print 'Wait.. Calculating predictions for labels=', mr.getLabels()
    Arrays.fill(correctPredictions, 0)
    Arrays.fill(numberDistribution, 0)
    for i in range(mrTest.size()):                   # loop over all test images
        db.addImageData(mrTest.readNextImage(), mr.getMaxvalue())
        net.forward(db, False)
        correct = mrTest.readNextLabel()
        prediction = net.getPrediction()
        if (correct == prediction): correctPredictions[correct] += 1
        numberDistribution[correct] += 1
    mrTest.reset()
    print " -> Testing time: ",int(0.001*(System.currentTimeMillis() - start))," s"
    print " -> Current run:",j
    print net.getPredictions(correctPredictions, numberDistribution, mrTest.size(), mrTest.numOfClasses())
    print " -> Save current state to ",modelName
    net.saveModel(modelName)
print "Read trained network from ",modelName," and make the final test"
cnn = net.loadModel(modelName)
Arrays.fill(correctPredictions, 0)
Arrays.fill(numberDistribution, 0)
for i in range(mrTest.size()):                       # final pass over the test images
    db.addImageData(mrTest.readNextImage(), mr.getMaxvalue())
    net.forward(db, False)
    correct = mrTest.readNextLabel()
    prediction = net.getPrediction()
    if (correct == prediction): correctPredictions[correct] += 1
    numberDistribution[correct] += 1
print "Final test:"
print net.getPredictions(correctPredictions, numberDistribution, mrTest.size(), mrTest.numOfClasses())

Fifty iterations usually take a few hours. The final probability of correctly identifying images with human faces is close to 85%. Given the complexity of this task, this is rather decent performance.
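
Once the training has finished, the saved model can be reused without retraining. The sketch below is a minimal example that relies only on the JavaCNN and PGMReader calls already used in the script above (it assumes the net, db, mr and mrTest objects are still available); it reloads the trained network and prints the predicted and true class indices for the first few test images:

cnn = net.loadModel(modelName)          # read the trained network back from "model.ser"
print "Class labels:", mr.getLabels()   # mapping between class indices and names
mrTest.reset()
for i in range(5):                      # classify only the first five test images
    db.addImageData(mrTest.readNextImage(), mr.getMaxvalue())
    net.forward(db, False)              # forward pass only, no training
    print "image", i, " predicted:", net.getPrediction(), " true:", mrTest.readNextLabel()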