In my earlier articles, I discussed the application of Big Data for gathering insights on the green revolution and covered a research work on supply chain management (SCM) using big data analytics in agriculture. Building on that, I got an opportunity to apply a data science methodology (a game theory approach) to make the results of SCM incentive compatible. In this article, I discuss large-scale digital image processing of time-series photographs of agricultural fields, together with sensor data for field parameters, carried out in parallel with the help of Big Data Analytics, so that the results of this work can substantially improve the SCM process.
We are focusing on using deep learning and machine learning techniques to identify patterns for prediction and decision making on large-scale stored or near-real-time data sets. With these, we can identify the crop type, crop quality and the maturity period for harvesting, detect bugs and diseases early, assess soil quality attributes, and spot the need for soil nourishment early on large farms. This automation considerably reduces the manual work on large farms.
For this, we need to work with two kinds of images fetched in near real time by sensors, i.e., vector and bitmap (raster) images. A vector image is represented by mathematical vectors, so it can be resized to any scale without loss of quality. A bitmap image, however, is made of mapped bits: the pixels are organized in rows and columns (a pixel matrix), and each pixel (picture element) has only one color, so enlarging it beyond its native resolution cannot add detail.
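To make the vector/bitmap distinction concrete, here is a minimal pure-Python sketch. It assumes a grayscale bitmap stored as a list of rows (the pixel matrix described above) and a hypothetical vector shape, a circle stored as centre and radius. Upscaling the bitmap only repeats existing pixels, while the vector shape is scaled exactly and rasterized only when pixels are needed. The function names are illustrative, not from any particular library.

```python
import math

# A bitmap: a matrix of pixels (rows x columns), one value per pixel.
bitmap = [
    [0, 255],
    [255, 0],
]

def upscale_nearest(img, factor):
    """Nearest-neighbour upscaling: each pixel is repeated factor x factor times.
    No new detail is created; edges simply become larger blocks."""
    return [
        [img[r // factor][c // factor] for c in range(len(img[0]) * factor)]
        for r in range(len(img) * factor)
    ]

# A hypothetical vector shape: stored as geometry, not as pixels.
circle = {"cx": 1.0, "cy": 1.0, "r": 0.5}

def scale_vector(shape, factor):
    """Scaling a vector shape multiplies its coordinates, so it stays exact."""
    return {k: v * factor for k, v in shape.items()}

def rasterize_circle(shape, width, height):
    """Render the circle into a bitmap only when pixels are actually needed."""
    return [
        [255 if math.hypot(c + 0.5 - shape["cx"], r + 0.5 - shape["cy"]) <= shape["r"] else 0
         for c in range(width)]
        for r in range(height)
    ]

big = upscale_nearest(bitmap, 2)
for row in big:
    print(row)
sharp = rasterize_circle(scale_vector(circle, 4), 8, 8)
```

The upscaled bitmap is just the original four pixels drawn as 2x2 blocks, whereas the scaled circle can be rendered crisply at whatever output size is requested.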
We need an intelligent deep learning methodology on Big Data so that, with each iteration, the machine itself learns the minute details of the patterns, enabling the system to give increasingly accurate results over time. We can exploit the ability of Deep Learning to extract large-scale, high-level, complex abstractions and data representations from the given data sets, especially unlabeled data (such as our digital imagery), which makes it valuable for Big Data analysis. More specifically, we can classify the problem areas in a large agricultural field by means of image tagging, automated semantic indexing, data tagging, fast information retrieval and discriminative modeling, all of which are better addressed with Deep Learning together with automated machine learning. Traditional machine learning and feature extraction algorithms, however, are not efficient enough at extracting the complex, non-linear patterns generally observed in large-scale images in Big Data sets. By extracting such features, Deep Learning lets us apply relatively simple linear models to analytics tasks such as categorization, classification and prediction.
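While working on machine learning for classification and semantic building, we need to keep the following image-handling practices and pitfalls in mind:
- Treat images as data
- Keep a backup of the originals
- Make only simple adjustments, applied to the entire image
- Crop (if we are working with only a portion of the image)
- Compare only images acquired and processed under identical conditions
- Avoid manipulating one area of an image differently from the rest
- Filters degrade data
- Cloning degrades data
- Make intensity measurements on uniformly processed data
- Lossy compression degrades data
- Be aware of issues with magnification and resolution
- Be aware of issues with resizing pixels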
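A minimal sketch of three of these practices (keep a backup of the original, apply simple adjustments uniformly to the entire image, and be aware that lossy compression degrades data), assuming a tiny hypothetical grayscale tile. The `quantize` function is only a crude stand-in for lossy compression, not how JPEG actually works.

```python
import copy

# Hypothetical 2x3 grayscale tile (values 0..255); not real field data.
original = [[12, 200, 37], [90, 145, 250]]

def adjust_brightness(img, offset):
    """Simple global adjustment: the same offset for every pixel, clipped to 0..255.
    Returns a new image; the input is not modified."""
    return [[max(0, min(255, p + offset)) for p in row] for row in img]

def quantize(img, step):
    """Crude stand-in for lossy compression: snap each pixel to a multiple of `step`.
    Real codecs such as JPEG work differently, but they are likewise irreversible."""
    return [[min(255, round(p / step) * step) for p in row] for row in img]

backup = copy.deepcopy(original)        # keep an untouched copy of the original
adjusted = adjust_brightness(original, 10)
lossy = quantize(original, 32)

# Largest per-pixel error introduced by the lossy step; it cannot be undone.
max_error = max(abs(a - b) for ra, rb in zip(original, lossy) for a, b in zip(ra, rb))
print(adjusted)
print(lossy, max_error)
```

Because the original is never modified, any measurement can always be repeated on the unprocessed data instead of on the degraded copy.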
When working in a multicore or cloud computing environment with parallel programming models (such as MapReduce), we can significantly improve the efficiency of our analysis and, because far more data can be processed, potentially its accuracy as well. This is why such models are a sought-after solution to big data problems.
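The MapReduce pattern can be sketched in a single Python process as below, assuming hypothetical records of `(field_id, tile)` and computing the mean pixel intensity per field. In Hadoop or Spark, the map and reduce calls would run on different nodes; here they only illustrate the data flow.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical records: (field_id, tile), where a tile is a small pixel matrix.
records = [
    ("field_A", [[10, 20], [30, 40]]),
    ("field_B", [[200, 210], [220, 230]]),
    ("field_A", [[50, 60], [70, 80]]),
]

def map_phase(record):
    """Map: emit (key, (sum_of_pixels, pixel_count)) for one tile."""
    field_id, tile = record
    pixels = [p for row in tile for p in row]
    return field_id, (sum(pixels), len(pixels))

def shuffle_phase(pairs):
    """Shuffle: group the intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(values):
    """Reduce: combine partial (sum, count) pairs, then compute the mean."""
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
    return total / count

mapped = map(map_phase, records)
result = {key: reduce_phase(vals) for key, vals in shuffle_phase(mapped).items()}
print(result)  # {'field_A': 45.0, 'field_B': 215.0}
```

The mapper emits partial `(sum, count)` pairs rather than per-tile means because sums and counts can be combined in any order, which is what lets a reducer merge results arriving from many nodes.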
Parallelizing the application is necessary when working with near-real-time data sets (such as scheduled survey imagery from agricultural fields). We need to decide how to distribute the workload, that is, how to decompose an algorithm into parts that can be processed in parallel; how to map those tasks onto the various computing nodes and execute the subtasks in parallel; and how to coordinate and communicate between the subtasks on those nodes. These questions can be addressed by categorizing the images based on their features, applying a specific methodology to each group, and combining the results.
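The decompose, map and combine steps can be sketched with Python's standard `concurrent.futures` module. The brightness-based grouping and the two per-group "methodologies" are hypothetical placeholders for real feature-based categories and algorithms.

```python
from concurrent.futures import ThreadPoolExecutor

def mean_intensity(img):
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)

def categorize(img):
    """Hypothetical feature-based grouping: bright vs dark images by mean intensity."""
    return "bright" if mean_intensity(img) >= 128 else "dark"

def process_bright(img):
    """Placeholder methodology for bright images: count near-saturated pixels."""
    return sum(p >= 250 for row in img for p in row)

def process_dark(img):
    """Placeholder methodology for dark images: count near-black pixels."""
    return sum(p <= 5 for row in img for p in row)

HANDLERS = {"bright": process_bright, "dark": process_dark}

def run_subtask(task):
    group, img = task
    return group, HANDLERS[group](img)

def run(images, workers=4):
    # 1. Decompose: every image becomes an independent subtask tagged with its group.
    tasks = [(categorize(img), img) for img in images]
    # 2. Map subtasks onto workers and execute them in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(run_subtask, tasks))
    # 3. Combine: aggregate the partial results of each group.
    combined = {}
    for group, value in outputs:
        combined[group] = combined.get(group, 0) + value
    return combined

images = [
    [[255, 250], [200, 100]],
    [[0, 3], [100, 60]],
    [[252, 251], [255, 130]],
]
print(run(images))  # {'bright': 5, 'dark': 2}
```

Threads keep the sketch simple and portable; for CPU-bound pure-Python work, CPython's global interpreter lock limits the speed-up, so a real deployment would more likely use processes (`ProcessPoolExecutor`) or a cluster framework.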
In short, when working with MapReduce or an equivalent tool for parallel computation on large-scale digital imagery, we need to follow the steps below to obtain the outputs:
- Image Integration
- Image Processing
- Feature Generation
- Feature selection and extraction
  - Boundary, edge and curve detection
  - Brightness gradient
  - Texture gradient
  - Color gradient
  - Contour maps
  - Multi-scale gradient magnitude
  - Second moment matrices
  - Segmentation induced by scale invariance
  - Video scene segments
- Classifier selection
- Testing of selected classifiers
- Evaluation of classifiers
- Machine Learning for automatic annotations
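As an example of the brightness-gradient and edge-detection features listed above, the sketch below computes a gradient magnitude with the Sobel operator in pure Python. The choice of Sobel is an assumption (the article does not name an operator); texture and color gradients, and second moment matrices (built from products of the same x and y derivatives), extend the same idea.

```python
import math

# Sobel kernels for horizontal (x) and vertical (y) brightness changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def convolve3x3(img, kernel):
    """Apply a 3x3 kernel to interior pixels (border pixels are left at 0).
    Strictly this is cross-correlation, which is what most image libraries apply."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = sum(
                kernel[i][j] * img[r + i - 1][c + j - 1]
                for i in range(3) for j in range(3)
            )
    return out

def gradient_magnitude(img):
    """Per-pixel brightness gradient magnitude sqrt(gx^2 + gy^2)."""
    gx = convolve3x3(img, SOBEL_X)
    gy = convolve3x3(img, SOBEL_Y)
    return [[math.hypot(x, y) for x, y in zip(rx, ry)] for rx, ry in zip(gx, gy)]

# Hypothetical 4x4 tile with a vertical edge between dark and bright columns.
edge = [[0, 0, 255, 255] for _ in range(4)]
mag = gradient_magnitude(edge)
print(mag[1])  # [0.0, 1020.0, 1020.0, 0.0]
```

Large magnitudes mark the boundary between the two regions, which is the raw material for the edge, curve and contour features in the list.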
From a predictive modeling point of view, three subsets of all extracted instances are generated:
- training instances,
- test instances and
- predicted instances.
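A minimal sketch of the training, test and predicted instances, together with classifier testing and evaluation, assuming hypothetical two-dimensional feature vectors (for example a greenness score and a texture score) labelled "healthy" or "diseased". A nearest-centroid classifier stands in for whichever classifiers the selection step would actually compare, and the data are invented for illustration.

```python
import random

def split(instances, test_fraction=0.25, seed=0):
    """Shuffle labelled instances and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = instances[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_centroids(train):
    """Nearest-centroid classifier: the average feature vector of each class label."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += f
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    return min(centroids, key=lambda lbl: sum((a - b) ** 2 for a, b in zip(centroids[lbl], features)))

def accuracy(centroids, test):
    return sum(predict(centroids, f) == y for f, y in test) / len(test)

# Hypothetical labelled instances: (features, label).
labelled = [
    ((0.80, 0.20), "healthy"), ((0.82, 0.18), "healthy"),
    ((0.78, 0.22), "healthy"), ((0.81, 0.19), "healthy"),
    ((0.30, 0.70), "diseased"), ((0.28, 0.72), "diseased"),
    ((0.32, 0.68), "diseased"), ((0.29, 0.71), "diseased"),
]
train, test = split(labelled)                     # training and test instances
model = fit_centroids(train)
test_accuracy = accuracy(model, test)             # evaluation on held-out data
unlabelled = [(0.79, 0.21), (0.31, 0.69)]
predicted = [predict(model, f) for f in unlabelled]   # predicted instances
print(test_accuracy, predicted)
```

Only the test instances are used to judge the classifier; the predicted instances are new, unlabelled data whose labels become the automatic annotations.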
The automatic annotations act as semantics for the next level of classification and computation for decision making. However, the outputs of image processing are abstract representations. An important advantage of more abstract representations is that they can be invariant to local changes in the input data. Learning such invariant features is an ongoing major goal in pattern recognition with machine learning. Deep learning models are represented as architectures of consecutive layers: each layer applies a nonlinear transformation to its input (the output of the previous layer) and provides a representation at its output. The objective is to learn a complicated and abstract representation of the data in a hierarchical manner by passing the data through multiple transformation layers. The near-real-time digital imagery from agricultural fields (the pixels of an image) is fed to the first layer, and the output of each layer is provided as input to the next layer.
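The layer-by-layer flow described above can be sketched as a forward pass through two fully connected layers with ReLU nonlinearities. The weights are hand-picked, hypothetical values rather than learned ones; a real model would learn them by backpropagation in a framework such as PyTorch or TensorFlow and would typically use convolutional layers for images.

```python
def relu(v):
    """Nonlinear transformation applied element-wise."""
    return [max(0.0, x) for x in v]

def dense(v, weights, bias):
    """Affine map: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def forward(pixels, layers):
    """Pass the input through consecutive layers; each layer's output feeds the next."""
    activation = pixels
    for weights, bias in layers:
        activation = relu(dense(activation, weights, bias))
    return activation

# Hypothetical, untrained weights: 4 inputs -> 3 hidden units -> 2 outputs.
layers = [
    ([[1, -1, 1, -1], [0.5, 0.5, 0.5, 0.5], [-1, 1, -1, 1]], [0, 0, 0]),
    ([[1, 1, 1], [1, -1, 0]], [0, 0.5]),
]
tile = [0.0, 1.0, 0.0, 1.0]   # a flattened, normalised 2x2 pixel tile
print(forward(tile, layers))  # [3.0, 0.0]
```

Each intermediate list is one layer's representation of the tile; stacking more layers gives the hierarchical, increasingly abstract representation the paragraph describes.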
In this way, Deep Learning extracts features that add nonlinearity to the data analysis, ties the machine learning tasks closely to Artificial Intelligence, and lets us apply relatively simple linear analytical models to the extracted features, which is important when we use Big Data Analytics.
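The claim that nonlinear features make simple linear models sufficient can be illustrated with the classic XOR problem, which no linear rule can solve on the raw inputs. In the sketch below, a hypothetical hand-made feature (the product of the two inputs) stands in for the features a deep network would learn; with it, a single linear threshold rule classifies all four cases correctly.

```python
from itertools import product

def linear_rule(features, weights, bias):
    """A linear threshold model: predicts 1 if w.x + b > 0, else 0."""
    return int(sum(w * f for w, f in zip(weights, features)) + bias > 0)

def extract(x1, x2):
    """Hypothetical nonlinear feature map standing in for learned deep features:
    keep the raw inputs and add their product."""
    return (x1, x2, x1 * x2)

# XOR labels: not linearly separable in the raw (x1, x2) space.
data = {(x1, x2): x1 ^ x2 for x1, x2 in product([0, 1], repeat=2)}

weights, bias = (1.0, 1.0, -2.0), -0.5
predictions = {x: linear_rule(extract(*x), weights, bias) for x in data}
print(predictions == data)  # True
```

All of the nonlinearity lives in `extract`; the model that makes the decision stays linear, which is what keeps it cheap to train and apply at Big Data scale.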
I will share the implemented outputs on an agricultural field (rice) in the next article…