
Image identification using a convolutional neural network

This blog explores a typical image identification task using a convolutional (“Deep Learning”) neural network. For this purpose we will use the simple JavaCNN package by D. Persson, and make our example small and concise using the Python scripting language. This example can also be rewritten in Java, Groovy, JRuby or any other scripting language supported by the Java virtual machine.

This example uses images in the grayscale PGM format. The name “PGM” is an acronym for “Portable Gray Map”; pixel values range from 0 to 255. The files are typically binary (they carry the magic value “P5”, which you can see by opening one of them), but you can convert them to “plain” (uncompressed) PGM, where each pixel in the raster is represented as an ASCII decimal number.
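
For instance, a short sketch such as the one below (plain Python, no extra libraries; the file name is only a placeholder for any PGM file) reads the PGM header and prints its magic value, image dimensions and maximum gray value:

f = open("face_00001.pgm", "rb")
magic = f.readline().strip()        # "P5" for binary PGM, "P2" for plain PGM
line = f.readline()
while line.startswith("#"):         # skip optional comment lines
    line = f.readline()
width, height = [int(v) for v in line.split()]
maxval = int(f.readline())          # maximum gray value, typically 255
f.close()
print "Magic:", magic, " Size:", width, "x", height, " Max gray value:", maxval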

Our input images come from a publicly available database (the CBCL Face Database of the MIT Center for Biological and Computational Learning). Let us download a zip file with this (slightly modified) database and unzip it. To do this, install the most recent DataMelt program, create a file “example.py” and run these commands using DataMelt:

from jhplot import *
print Web.get("http://jwork.org/dmelt/examples/data/mitcbcl_pgm_set2.zip")
print IO.unzip("mitcbcl_pgm_set2.zip")
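
To verify that the download and unpacking succeeded, a quick check such as the sketch below (it uses Python's standard glob module; the directory name matches the unzipped archive) counts the images in each of the newly created directories:

import glob
# count the unpacked image files in each directory
print "train:", len(glob.glob("mitcbcl_pgm_set2/train/*"))
print "test :", len(glob.glob("mitcbcl_pgm_set2/test/*"))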

The IO.unzip command creates two directories, “train” and “test”. You can omit “print”, which is only used to show the status of these commands. Each directory contains images with faces (“face_*”) and other images (“cmu_*”). Note that the “_” in the file name is important, since it helps to identify the image type. The “train” directory has about 1500 files with images of faces and 13700 files with other types of images. Let us look at one image and study its properties. We will use the ImageJ (ij) Java package. Append the following code to your previous lines:

from ij import *
imp = IJ.openImage("mitcbcl_pgm_set2/train/face_00001.pgm")
print "Width:", imp.width," Hight:", imp.height
imp.show() # show this image in a frame
ip = imp.getProcessor().convertToFloat()
pixels = ip.getPixels() # get array of pixels
print pixels # print array with pixels
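
Since getPixels() returns the raster as a flat Java array of floating-point values, you can also compute simple statistics of the image. A minimal sketch (it assumes the pixels array obtained above):

vals = list(pixels)    # convert the Java float array to a Python list
print "Min:", min(vals), " Max:", max(vals), " Mean:", sum(vals)/len(vals)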

These ImageJ commands show the image in a frame, print its size (19×19 pixels) and print the array of pixel values of the PGM image. Now let us write a program that does the following:

  • Read the images from the “train/” directory
  • Read the images from the “test/” directory
  • Initialize the CNN with several convolutional and pooling layers
  • Run 50 training iterations (you can increase or decrease this number depending on the required precision of image identification)
  • During each iteration, compute the probability of correctly identifying the face images from the “test/” directory and save the CNN to a file
  • At the end of the training, read the trained CNN back from the file and perform a final run over the test images, printing the predictions

Copy the following lines and save them in the file “example.py”. Then run this code inside DataMelt:

from jhplot import *
print Web.get("http://jwork.org/dmelt/examples/data/mitcbcl_pgm_set2.zip")
print IO.unzip("mitcbcl_pgm_set2.zip")
NMax=50 # Total runs. Reduce this number to get results faster
from org.ea.javacnn.data import DataBlock,OutputDefinition,TrainResult
from org.ea.javacnn.layers import DropoutLayer,FullyConnectedLayer,InputLayer,LocalResponseNormalizationLayer
from org.ea.javacnn.layers import ConvolutionLayer,RectifiedLinearUnitsLayer,PoolingLayer
from org.ea.javacnn.losslayers import SoftMaxLayer
from org.ea.javacnn.readers import ImageReader,MnistReader,PGMReader,Reader
from org.ea.javacnn.trainers import AdaGradTrainer,Trainer
from org.ea.javacnn import JavaCNN
from java.util import ArrayList,Arrays
from java.lang import System
layers = ArrayList(); de = OutputDefinition()
print "Total number of runs=", NMax
print "Reading train sample.."
mr = PGMReader("mitcbcl_pgm_set2/train/")
print "Total number of trainning images=",mr.size()," Nr of types=",mr.numOfClasses()
print "Read test sample .."
mrTest = PGMReader("mitcbcl_pgm_set2/test/")
print "Total number of test images=",mrTest.size()," Nr of types=",mrTest.numOfClasses()
modelName = "model.ser" # save NN to this file
layers.add(InputLayer(de, mr.getSizeX(), mr.getSizeY(), 1))
layers.add(ConvolutionLayer(de, 5, 32, 1, 2)) # uses different filters
layers.add(RectifiedLinearUnitsLayer()) # applies the non-saturating activation function
layers.add(PoolingLayer(de, 2,2, 0)) # creates a smaller zoomed out version
layers.add(ConvolutionLayer(de, 5, 64, 1, 2))       # second convolutional layer
layers.add(RectifiedLinearUnitsLayer())
layers.add(PoolingLayer(de, 2,2, 0))
layers.add(FullyConnectedLayer(de, 1024))           # fully connected layer with 1024 neurons
layers.add(LocalResponseNormalizationLayer())
layers.add(DropoutLayer(de))                        # dropout layer to reduce overfitting
layers.add(FullyConnectedLayer(de, mr.numOfClasses()))  # one output per image class
layers.add(SoftMaxLayer(de))                        # converts the outputs to class probabilities
print "Training.."
net = JavaCNN(layers)
trainer = AdaGradTrainer(net, 20, 0.001)
from jarray import zeros
numberDistribution,correctPredictions = zeros(10, "i"),zeros(10, "i")
start = System.currentTimeMillis()
db = DataBlock(mr.getSizeX(), mr.getSizeY(), 1, 0)
for j in range(NMax):
    loss = 0
    for i in range(mr.size()):                       # loop over all training images
        db.addImageData(mr.readNextImage(), mr.getMaxvalue())
        tr = trainer.train(db, mr.readNextLabel())
        loss = loss + tr.getLoss()
        if (i != 0 and i % 500 == 0):
            print "Nr of images: ",i," Loss: ",(loss/float(i))
    print "Loss: ", (loss / float(mr.size())), " for run=",j
    mr.reset()
    print 'Wait.. Calculating predictions for labels=', mr.getLabels()
    Arrays.fill(correctPredictions, 0)
    Arrays.fill(numberDistribution, 0)
    for i in range(mrTest.size()):                   # loop over all test images
        db.addImageData(mrTest.readNextImage(), mr.getMaxvalue())
        net.forward(db, False)
        correct = mrTest.readNextLabel()
        prediction = net.getPrediction()
        if (correct == prediction): correctPredictions[correct] += 1
        numberDistribution[correct] += 1
    mrTest.reset()
    print " -> Testing time: ",int(0.001*(System.currentTimeMillis() - start))," s"
    print " -> Current run:",j
    print net.getPredictions(correctPredictions, numberDistribution, mrTest.size(), mrTest.numOfClasses())
    print " -> Save current state to ",modelName
    net.saveModel(modelName)
print "Read trained network from ",modelName," and make the final test"
cnn = net.loadModel(modelName)
Arrays.fill(correctPredictions, 0)
Arrays.fill(numberDistribution, 0)
for i in range(mrTest.size()):                       # final pass over the test images
    db.addImageData(mrTest.readNextImage(), mr.getMaxvalue())
    net.forward(db, False)
    correct = mrTest.readNextLabel()
    prediction = net.getPrediction()
    if (correct == prediction): correctPredictions[correct] += 1
    numberDistribution[correct] += 1
print "Final test:"
print net.getPredictions(correctPredictions, numberDistribution, mrTest.size(), mrTest.numOfClasses())

Fifty iterations usually take a few hours. The final probability of correctly identifying images with human faces is close to 85%. Given the complexity of this task, this is rather decent performance.
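
Once the training has finished, the saved model can be reused without retraining. The sketch below is a minimal example that relies only on the JavaCNN and PGMReader calls already used in the script above (it assumes the net, db, mr and mrTest objects are still available); it reloads the trained network and prints the predicted and true class indices for the first few test images:

cnn = net.loadModel(modelName)          # read the trained network back from "model.ser"
print "Class labels:", mr.getLabels()   # mapping between class indices and names
mrTest.reset()
for i in range(5):                      # classify only the first five test images
    db.addImageData(mrTest.readNextImage(), mr.getMaxvalue())
    net.forward(db, False)              # forward pass only, no training
    print "image", i, " predicted:", net.getPrediction(), " true:", mrTest.readNextLabel()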