Any organisation needs talented, hardworking and skilled employees, irrespective of department, business unit or team. But finding and nurturing such talent can be challenging. In the data science field, where technology and demand are changing rapidly, many organisations have set up data science teams. A successful data science team has three major strengths: (A) availability of data, (B) infrastructure and, most importantly, (C) the “right” data scientists.
The…
Added by Rohit Walimbe on June 9, 2019 at 6:03am
When building machine learning models, data scientists spend most of their time on two main tasks.
Pre-processing and Cleaning
The major portion of time goes into collecting, understanding, analysing and cleaning the data, and then building features. All of these steps are critical to building successful machine learning…
Added by Rohit Walimbe on April 21, 2019 at 9:00pm
Outliers are one of those issues we come across almost every day in machine learning modelling. Wikipedia defines an outlier as “an observation point that is distant from other observations.” That means a minority of cases in the data set differ from the majority of the data. I would like to classify outliers into two main categories: non-natural and natural.
The non-natural outliers are those which are caused by measurement errors,…
Added by Rohit Walimbe on April 9, 2018 at 2:30am
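To make the "distant from other observations" idea in the outlier post concrete, here is a minimal sketch that flags points far from the bulk of the data. It assumes Tukey's interquartile-range rule, which is a common convention and not necessarily the approach the post goes on to describe; the function name `iqr_outliers`, the multiplier `k` and the sample readings are all hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    Assumption: Tukey's IQR rule, chosen only for illustration.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 95]
flagged = iqr_outliers(readings)
print(flagged)
```

A rule like this only flags candidates. Whether a flagged point is a non-natural outlier (for example a measurement error) or a natural one still has to be judged from the context of the data.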
Suppose you are working on a machine learning classification problem. You get an accuracy of 98% and you are very happy. But that happiness doesn’t last long when you look at the confusion matrix and realize that the majority class makes up 98% of the total data and every example has been classified as the majority class. Welcome to the real world of imbalanced data sets!…
Added by Rohit Walimbe on April 24, 2017 at 10:00pm
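The 98% accuracy trap described in the imbalanced data post can be reproduced in a few lines. This is a minimal sketch on made-up labels (98 majority-class examples, 2 minority-class examples) with a hypothetical "model" that always predicts the majority class; the helper `confusion_counts` is an illustrative name, not from the post.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


# Made-up data: class 0 is the majority (98%), class 1 the minority (2%).
y_true = [0] * 98 + [1] * 2
# A degenerate classifier that labels everything as the majority class.
y_pred = [0] * len(y_true)

cm = confusion_counts(y_true, y_pred)
accuracy = (cm["tp"] + cm["tn"]) / len(y_true)
# Recall on the minority class: the share of true positives that were found.
minority_recall = cm["tp"] / (cm["tp"] + cm["fn"])

print(cm)
print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")
```

The accuracy looks excellent while the minority-class recall is zero, which is why the confusion matrix, and not accuracy alone, is worth checking on imbalanced data.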
Any time series classification or regression forecasting involves predicting Y at time 't+n' given the X and Y information available up to time 't'. Obviously, no data scientist or statistician can deploy such a system without backtesting and validating the model's performance on historical data. Using future actual information in the training data, which can be termed "look-ahead bias", is probably the gravest mistake a data scientist can make. Even the sentence “we cannot make use future…
Added by Rohit Walimbe on April 21, 2017 at 6:00am
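One way to guard against look-ahead bias when backtesting is a walk-forward split in which the training set at each forecast time contains only rows whose targets were already observed. The sketch below is an illustration under stated assumptions, not the method from the post: row i is assumed to hold the features at time i and the target Y at time i + horizon, and the names `walk_forward_splits`, `first_test` and `horizon` are hypothetical.

```python
def walk_forward_splits(n_samples, first_test, horizon):
    """Yield (train_indices, test_index) pairs for a walk-forward backtest.

    Assumption: row i pairs features at time i with the target at
    time i + horizon. At forecast time t, the target of row i is known
    only if i + horizon <= t, so later rows are excluded from training.
    """
    if first_test < horizon:
        raise ValueError("first_test must be >= horizon to have training data")
    for t in range(first_test, n_samples):
        # Rows 0 .. t - horizon have targets observed by time t.
        train_indices = list(range(0, t - horizon + 1))
        yield train_indices, t


# Example: 6 time steps, forecasting 2 steps ahead, first forecast at t = 3.
splits = list(walk_forward_splits(n_samples=6, first_test=3, horizon=2))
for train_indices, test_index in splits:
    print(f"forecast at t={test_index}: train on rows {train_indices}")
```

Note that the training window stops `horizon` rows before the forecast time, not one row before it. Including rows t - horizon + 1 through t - 1 would leak targets that are not yet known at time t.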