Rohit Walimbe has not received any gifts yet
Outliers is one of those issues we come across almost every day in a machine learning modelling. Wikipedia defines outliers as “an observation point that is distant from other observations.” That means, some minority cases in the data set are different from the majority of the data. I would like to classify outlier data in to two main categories: Non-Natural and Natural.
The non-natural outliers are those which are caused by measurement errors,…Continue
Consider a problem where you are working on a machine learning classification problem. You get an accuracy of 98% and you are very happy. But that happiness doesn’t last long when you look at the confusion matrix and realize that majority class is 98% of the total data and all examples are classified as majority class. Welcome to the real world of imbalanced data sets!!…Continue
Any time series classification or regression forecasting involves the Y prediction at 't+n' given the X and Y information available till time T. Obviously no data scientist or statistician can deploy the system without back testing and validating the performance of model in history. Using the future actual information in training data which could be termed as "Look Ahead Bias" is probably the gravest mistake a data scientist can make. Even the sentence “we cannot make use future…Continue