Meet Neo! Neo is a talented developer who loves building stuff. One fine morning, Neo decides to take the road less travelled and build a chatbot! After a couple of keyword searches and skimming through dozens of articles with titles like “build a chatbot in 5 mins” and “chatbot from scratch”, Neo figures out the basic components: intent detection, Named Entity Recognition, and text matching for QnA. Another 30 mins of Google searching and Neo has collected his arsenal, the state-of-the-art implementations for these 3 components: the almighty BERT for NER, ULMFiT for text classification, and RoBERTa for text matching.
Excited to have decoded the path to greatness, Neo arranges annotated data, sets up the pipeline, and calls it ‘The Matrix’. Little did he know, the name would turn out to be his nemesis. Neo tests out some happy flows and deploys the system, gleefully awaiting well-behaved users.
To Neo’s horror, the AI system proved to be poles apart from the expected promise of natural language understanding. ‘The Matrix’ was as awkward as a cow on roller skates.
Neo’s users queried about everything ‘The Matrix’ was not trained for, fooling it more often than not. Happy flows turned out to be a myth. We humans, as always, beat the AI bot to death, as if our intelligence supremacy were being challenged. Battered and defeated, Neo decides to fix ‘The Matrix’ by hook or by crook. Not getting into the debate of enhancing NLP (Natural Language Processing) vs. plugging the gaps using…
At NIPS 2016, an unprecedented story was building up, something that got every AI enthusiast agog about an unknown AI startup, ‘Rocket AI’.
The names associated with the hot startup were pioneers in the AI field, and the media was informed that a major announcement was soon to come. There was even a workshop where one of the researchers explained the concept of Temporally Recurrent Optimal Learning to a full house of researchers and media personnel.
The whole community was abuzz with jargon such as “Temporally Recurrent Optimal Learning”, “Jacobian Optimized Kernel Expansion”, and “Fully Automatic Kernel Expansion”, a couple of which were coined by a leading AI researcher. These made the rounds on the web with hype so strong that 5 major VCs reached out to them about investment.
There were rumours about Rocket AI’s acquisition as well, all within a day.
Only later did it all turn out to be a joke!
Temporally Recurrent Optimal Learning = TROL
Jacobian Optimized Kernel Expansion = JOKE
Fully Automatic Kernel Expansion = FAKE
If you still don’t get the joke, you should probably get yourself the highly coveted…
This post is 'not' intended to teach people how to use popular predictive modelling APIs for free, although, to your surprise, this isn't a far-fetched possibility. A trained machine learning model is basically a function that maps feature vectors to the output variable. When queried with a test instance, the model predicts an outcome, assigning probability scores to all the possible classes. Google, Amazon, etc. provide public-facing APIs to train predictive models on the subscriber's data; the model can then be used for prediction. This service comes at a cost: a pay-per-query model, a monthly subscription, etc.
Let's consider a scenario: a user subscribes to such a service on a trial basis for a fraction of the cost and queries the system for as long as he can. With these queries and the model's subsequent outputs, can the user reverse engineer the system to emulate the exact or an equivalent model, and also replicate the underlying algorithm? Can the stolen model leak sensitive training data as well? Can the feature extraction methods employed behind the scenes also be decoded?
How many queries would he need for this? It depends!
"Amazon uses logistic regression for classification and provides black-box access to trained models. It uses one-hot-encoding for categorical variables and quantile binning for numeric ones."
Say, for example, that the algorithm used to train on the data was logistic regression. The confidence value in the case of logistic regression is nothing but a log-linear function 1/(1 + e^{-(w·x + β)}) of the d-dimensional input vector x. Applying the logit to a returned confidence value gives one linear equation in w and β, so all one needs to do is solve for the d + 1 unknown parameters w and β. Any user who wishes to make more than d + 1 queries to a model would then minimize the prediction cost by first running a cross-user model…
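As a rough illustration of the equation-solving idea above, here is a minimal sketch assuming a binary logistic regression model that returns exact confidence values. The names `black_box`, `true_w`, `true_beta` and `extract_logistic_params` are hypothetical stand-ins for illustration, not part of any real prediction API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def extract_logistic_params(query_fn, d, rng):
    """Recover (w, beta) from d + 1 confidence queries to a black-box model."""
    # d + 1 random query points; a random Gaussian matrix is almost surely
    # well-posed for the linear solve below.
    X = rng.normal(size=(d + 1, d))
    p = np.array([query_fn(x) for x in X])
    # The logit inverts the sigmoid: log(p / (1 - p)) = w.x + beta
    logits = np.log(p / (1.0 - p))
    # Append a column of ones so beta is solved for together with w.
    A = np.hstack([X, np.ones((d + 1, 1))])
    theta = np.linalg.solve(A, logits)
    return theta[:d], theta[d]

# Hypothetical "secret" model standing in for the remote service.
rng = np.random.default_rng(0)
d = 5
true_w = rng.normal(size=d)
true_beta = 0.3

def black_box(x):
    return sigmoid(true_w @ x + true_beta)

w_hat, beta_hat = extract_logistic_params(black_box, d, rng)
print(np.allclose(w_hat, true_w), np.isclose(beta_hat, true_beta))
```

The sketch relies on the service returning unrounded confidence scores; if the scores were rounded or only class labels were returned, more queries and a different approach would be needed.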
Originally posted by the author on LinkedIn: Link
It is very tempting for data science practitioners to opt for the best-known algorithms for a given problem. However, it’s not the algorithm alone that provides the best solution; a model built on carefully engineered and selected features can provide far better results.
"Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius -- and a lot of courage -- to move in the opposite direction." - Albert Einstein
Complex models are not easily interpretable and are tougher to tune. Simpler algorithms with better features or more data can perform far better than a complex model built on weak assumptions.
Better features mean flexibility, simpler models, and better results. The presence of irrelevant features hurts generalization. Thus feature selection and feature engineering should not be considered mutually exclusive activities and should be performed in conjunction with each other. With the help of an effective feature engineering process, we intend to come up with an effective representation of the data. The question arises: what is considered a good or bad representation?
A representation is as good as the information it contains.
Entropy: the higher the entropy, the more information the data contains. Variance: the higher the variance, the more information. Projection for better separation: the projection onto the basis with the highest variance holds the most information. Feature-to-class association and similar measures: all of these describe the information in the data.
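To make the entropy and variance measures above concrete, here is a small sketch on two made-up toy feature columns. The helper name `shannon_entropy` and the arrays are illustrative assumptions, not taken from the post.

```python
import numpy as np

def shannon_entropy(values):
    """Shannon entropy (in bits) of a discrete feature column."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Hypothetical toy features: one constant, one evenly split.
constant = np.array([1, 1, 1, 1])
balanced = np.array([0, 1, 0, 1])

for name, col in [("constant", constant), ("balanced", balanced)]:
    print(name, "entropy:", shannon_entropy(col), "variance:", float(np.var(col)))
```

A constant column scores zero on both measures, which matches the intuition that it carries no information for a model to use.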
Feature engineering is a vital component…
It's a known fact that bagging (an ensemble technique) works well on unstable algorithms like decision trees and artificial neural networks, and not on stable algorithms like Naive Bayes. The well-known ensemble algorithm random forest thrives on bagging, which leverages the 'instability' of decision trees to help build a better classifier.
Even though random forest attempts to handle the issues caused by highly correlated trees, does it completely solve the issue? Can the decision trees be made even more unstable than they are in random forest, so that the learner is even more accurate?
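One way to see why correlated trees matter is the standard formula for the variance of an average of identically distributed estimators with pairwise correlation rho: rho * sigma^2 + (1 - rho) * sigma^2 / B. The sketch below just evaluates that formula; the function name and the example numbers are illustrative assumptions.

```python
def variance_of_average(sigma2, rho, n_trees):
    """Variance of the mean of n_trees estimators, each with variance sigma2
    and pairwise correlation rho."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees

# With uncorrelated trees the variance keeps shrinking as trees are added;
# with correlated trees it levels off near rho * sigma2.
for rho in (0.0, 0.5):
    print(rho, [variance_of_average(1.0, rho, b) for b in (1, 10, 100)])
```

Under this formula, adding more trees cannot push the variance below rho * sigma^2, which is why lowering the correlation between trees is worth pursuing.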
1. Discards pruning: No more early stopping. If trees are sufficiently deep, they have very low bias.
Mean Squared Error = Variance + (Bias)^2.
This explains why discarding pruning works for random forest.
2. The most important parameters to tune while building a random forest model are mtry, i.e. the number of variables tried at each split, and ntree, i.e. the number of trees to ensemble. The optimal 'mtry' can be estimated using 'tuneRF'. tuneRF takes as its default starting value the square root of the total number of variables (let's say 'n') for classification problems, and n/3 for regression problems. It then calculates the out-of-bag error. Further, it searches to the left and to the right, setting 'mtry' to (default value) / (step factor) and (default value) * (step factor) respectively, and calculates the out-of-bag error on both the…
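The left/right search described above can be sketched roughly as follows. This is a loose Python approximation of the idea, not the R implementation of tuneRF: the function `tune_mtry`, its `improve` threshold, and the toy `toy_oob_error` curve are all assumptions made for illustration. In practice `oob_error` would train a forest with the given mtry and return its out-of-bag error.

```python
import math

def tune_mtry(oob_error, n_features, start, step_factor=2.0, improve=0.05):
    """Search left and right of `start` for an mtry with lower OOB error.

    `oob_error` is any callable mapping an mtry value to an error estimate.
    """
    errors = {start: oob_error(start)}
    best = start
    for direction in (-1, +1):
        current = start
        while True:
            if direction < 0:
                candidate = max(1, math.ceil(current / step_factor))
            else:
                candidate = min(n_features, math.floor(current * step_factor))
            if candidate == current:
                break  # hit the boundary, nothing new to try
            if candidate not in errors:
                errors[candidate] = oob_error(candidate)
            if errors[best] == 0:
                break  # cannot improve on zero error
            gain = 1.0 - errors[candidate] / errors[best]
            if gain <= improve:
                break  # relative improvement too small, stop this direction
            best = current = candidate
    return best, errors

# Hypothetical OOB error curve with its minimum at mtry = 8.
def toy_oob_error(mtry):
    return 0.10 + 0.01 * abs(mtry - 8)

n_features = 16
start = int(math.sqrt(n_features))  # classification default: sqrt(n)
best_mtry, tried = tune_mtry(toy_oob_error, n_features, start)
print(best_mtry, sorted(tried))
```

Stopping a direction as soon as the relative gain drops below `improve` keeps the number of forests trained small, at the risk of missing a better value further out.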
We are indeed living in interesting times, where we celebrate human-built machines defeating the best human minds at a variety of activities: IBM Deep Blue's win against chess champion Garry Kasparov in 1997, IBM Watson acing Jeopardy in 2011, and now Google DeepMind reportedly winning at 'Go' with high precision, which is being cited as a major breakthrough in AI. Even Facebook claims that its team came close to acing the game as well.
DeepMind goes up against the 'Go' champion, to be streamed live for the world to witness.
These feats are undoubtedly remarkable and, understandably, are creating quite a buzz in the AI community, as they provide a glimpse of a future seen only in sci-fi. As exciting as it may sound, it leaves a few questions before us.
It feels fascinating to see a representative of the human race compete against a machine built, or rather 'acquired', by Google. A computer program trained for days on past games played by the best players in the world finally managed to play like a pro. How long would the same system need to learn a game that is fundamentally the same as the one it has been trained on? Can the learning from this game be transferred to another game?
Being an AI enthusiast, I was wondering: while Go and Jeopardy are completely different games, can DeepMind win Jeopardy and Deep Blue win Go? Maybe they can, but only with the painstaking process of training them again for that specific task.