We are indeed living in interesting times, where we celebrate human-built machines defeating the best human minds at a variety of activities: IBM's Deep Blue beating chess champion Garry Kasparov in 1997, IBM Watson acing Jeopardy! in 2011, and now Google DeepMind reportedly winning at Go with high precision, hailed as a major breakthrough in AI; even Facebook claims its team came close to acing the game as well.
DeepMind goes up against the reigning Go champion, in a match to be streamed live for the world to witness.
These feats are undoubtedly remarkable, and it is understandable that they are creating quite a buzz in the AI community: they offer a glimpse of a future previously seen only in sci-fi. Yet, as exciting as it all sounds, it leaves a few questions before us.
It is fascinating to see a representative of the human race compete against a machine built, or rather 'acquired', by Google. A computer program trained for days on past games played by the best players in the world finally managed to play like a pro. How long would the same system need to learn a game that is fundamentally the same as the one it was trained on? Can the learning from this game be transferred to another game?
Being an AI enthusiast, I could not help wondering: while Go and Jeopardy! are completely different games, could DeepMind win Jeopardy!, and could Deep Blue win at Go? Maybe they could, but only after the painstaking process of being trained again for that specific task.
That brings us to the problem of transferring, or generalizing, representations learnt from one experience or domain to another.
While the next segment of this post may seem riddled with jargon to many readers, these terms are sure to become commonly heard concepts in the near future. In fact, for AI to develop to a stage where it can be called 'general AI', it is important that we witness advancements in the following areas of work.
Let's try to understand these concepts in simple language.
It is safe to say that most learning algorithms operate on the premise that the test data is drawn from the same feature space and a similar distribution as the training data. If the distribution changes, the model is left unusable; it does not adapt to the new task.
"Although, If you know how to drive a car, learning to drive a truck is not at all a problem."
The learning and experience gained from one activity help humans pick up a new activity very quickly, with a limited amount of training time. The same is not true for machine learning: the model has to be trained again for the new task.
Inductive transfer, or transfer learning, is the concept that can help us in this regard. For example: a gifting portal lists thousands of products that need to be categorized by their fitment to an occasion or relationship, based on their product features. Traditional learning would require a large amount of annotated data on which a supervised algorithm could be trained to learn the relation between product features and an occasion. What if a new product with entirely different product attributes gets added to the listing? Can the trained model generalize to such an unknown instance? Nope! This is exactly where transfer learning steps in.
A simple example of transfer learning: having learnt to recognize a cat helps the system learn to recognize a tiger with a minimum of training instances.
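To make the cat-to-tiger idea concrete, here is a minimal sketch of parameter transfer on synthetic data: a logistic-regression "cat" model is trained on plenty of source data, and its weights then warm-start a "tiger" model that sees only ten labelled examples. All the data, feature dimensions, and hyperparameters here are made up for illustration; real transfer learning would reuse a far richer pretrained representation.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, w=None, lr=0.1, epochs=200):
    """Batch gradient descent on logistic loss; `w` warm-starts the weights."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

def make_data(n, shift):
    """Toy 'images': class-1 samples are shifted along every feature."""
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, 5)) + shift * y[:, None]
    return np.hstack([X, np.ones((n, 1))]), y   # append a bias column

# Source task ("cat vs. not-cat"): plenty of labelled data.
Xs, ys = make_data(500, shift=3.0)
w_source = train_logreg(Xs, ys)

# Target task ("tiger vs. not-tiger"): only a handful of labelled examples,
# but the classes share structure with the source task.
Xt, yt = make_data(10, shift=3.0)
w_target = train_logreg(Xt, yt, w=w_source.copy(), epochs=20)  # warm start

# Evaluate the transferred model on fresh target-task data.
Xe, ye = make_data(200, shift=3.0)
acc = np.mean((Xe @ w_target > 0).astype(int) == ye)
print(f"target-task accuracy after transfer: {acc:.2f}")
```

Because the target model starts from the source weights instead of zeros, a few epochs on ten examples are enough for it to generalize to unseen target data.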
Inductive transfer learning: the source and target tasks are different (whether or not the domains are), and some of the patterns learnt on the source task have to be induced into the target task.
Transductive transfer learning: the source and target domains are different but related, while the source and target tasks are the same.
Unsupervised transfer learning: as in the inductive setting, the source and target tasks are different but related, with the focus on unsupervised tasks such as clustering and no labelled data available in either domain.
Instance transfer: re-weight some labelled data in the source domain for use in the target domain.
Feature-representation transfer: find a good feature representation that reduces the difference between the source and target domains as well as the model error.
Parameter transfer: discover shared parameters or priors between the source- and target-domain models that can benefit transfer learning.
Relational-knowledge transfer: build a mapping of relational knowledge between the source and target domains. Both domains are relational, and the i.i.d. assumption is relaxed in each.
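The first of these approaches, instance transfer, can be sketched in a few lines: re-weight each labelled source example by an estimated density ratio p_target(x)/p_source(x), so that source points which look like target data count more when training for the target task. The one-dimensional Gaussian domains below are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same task in both domains, but the feature distribution has shifted:
x_src = rng.normal(-1.0, 1.0, size=200)   # labelled source samples
x_tgt = rng.normal(+1.0, 1.0, size=200)   # unlabelled target samples

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Importance weight per source point: estimated p_target(x) / p_source(x),
# using Gaussian fits to each sample.
weights = (normal_pdf(x_src, x_tgt.mean(), x_tgt.std())
           / normal_pdf(x_src, x_src.mean(), x_src.std()))

# Source points that resemble target data are up-weighted; the re-weighted
# sample can then be handed to any cost-sensitive learner for the target task.
w_near_target = weights[np.argmin(np.abs(x_src - 1.0))]
w_near_source = weights[np.argmin(np.abs(x_src + 1.0))]
print(w_near_target, w_near_source)
```

A source example sitting near the target mean receives a far larger weight than one sitting at the source mean, which is exactly the re-weighting the instance-transfer approach relies on.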
When asked to explain the basics of supervised learning in layman's terms, I often use this example. A father and son are standing at the roadside when a red Ferrari passes by. The father says to the son, "Look, son, that's a sports car." After a while, a red Maruti hatchback passes by, and the son shouts, "Look, Dad, that's a sports car!" Consider this scenario as a car-classification problem: the kid is the 'learner', the father the 'data geek', and the two cars the 'instances'. The first instance was the 'training set', and the kid was then presented with unknown test data to predict on. He predicted incorrectly, a false positive. So what went wrong? Too few instances, and a spurious correlation: red does not always mean a sports car. Since the learner was not given an ample amount of training data, it was bound not to generalize or predict well on an unknown data sample. This brings us to the question: "How much data is enough?"
That depends on who is being trained! A human brain does not need a huge sample of training data; a machine does. A few pictures and sightings are enough for a human to recognize a 'cat', while a machine needs thousands of images to recognize one with similar precision, and when it does, it is seen as an accomplishment!
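The father-and-son scenario can be replayed as a tiny nearest-neighbour classifier. With a single training instance, any test car inherits that one label, reproducing the son's false positive; a few more instances break the spurious red-means-sports-car correlation. The feature encoding and speed figures below are invented for illustration, not real specifications.

```python
import math

def predict_1nn(x, X, y):
    """Nearest-neighbour prediction: copy the label of the closest training car."""
    dists = [math.dist(x, xi) for xi in X]
    return y[dists.index(min(dists))]

# Features: (is_red, top_speed_kmh).
ferrari = (1, 330)   # the training instance the father pointed out
maruti = (1, 140)    # the red hatchback the son misjudged

# One training instance: whatever passes next gets labelled "sports car".
one_shot = predict_1nn(maruti, [ferrari], ["sports car"])

# A few more instances let speed, not colour, drive the decision.
X = [ferrari, (0, 320), (1, 130), (0, 150)]
y = ["sports car", "sports car", "not sports car", "not sports car"]
better = predict_1nn(maruti, X, y)
print(one_shot, "->", better)
```

With only the Ferrari to go on, the learner calls the Maruti a sports car; with four instances it correctly matches the hatchbacks instead.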
Can machines be trained on specific hypotheses or concepts with a minimum of training samples?
Machine teaching is the inverse of machine learning. Instead of feeding the machine a set of training data with inputs and outputs labelled, and asking it to come up with a function that relates input to output with minimum error, we provide a specific hypothesis to the machine and let it come up with an optimal training set, of the smallest possible size, that enables it to understand the concept and generalize over unknown samples with high precision.
"For example, consider a "student" who runs the Support Vector Machine learning algorithm. Imagine a teacher who wants to teach the student a specific target hyperplane in some feature space (never mind how the teacher got this hyperplane in the first place). The teacher constructs a training set D=(x1,y1) ... (xn, yn), where xi is a feature vector and yi a class label, to train the student. What is the smallest training set that will make the student learn the target hyperplane? It is not hard to see that n=2 is sufficient with the two training items straddling the target hyperplane. " -Source
The next post will present a detailed analysis of 'machine teaching', along with the following topics:
1. Inverse Reinforcement Learning
2. Active learning
3. Representation learning - mapping representation entity resolution (matching objects), schema matching (matching predicates) and ontology alignment (matching concepts)
4. Learning to learn