Rohit Walimbe's Blog (5)

Hiring the right data scientist for the organisation

Any organisation needs talented, hardworking and skilled employees, irrespective of department, business unit or team. But finding and nurturing such talent can be challenging. In the data science field, rapid change in the technology and growing demand for it have led many organisations to set up data science teams. A successful data science team has three major strengths: (A) availability of data, (B) infrastructure and, most importantly, (C) the “right” data scientists.

The…

Added by Rohit Walimbe on June 9, 2019 at 6:03am — No Comments

Building machine learning models in Apache Spark using Scala in 6 steps

Introduction:

When building machine learning models, data scientists spend most of their time on two main tasks:

Pre-processing and Cleaning

Most of the time goes into collecting, understanding, analysing and cleaning the data, and then building features. All of these steps are critical to building successful machine learning…

Added by Rohit Walimbe on April 21, 2019 at 9:00pm — 1 Comment
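
The post itself works in Spark with Scala. As a framework-agnostic illustration of the cleaning and feature-building steps described in the excerpt, here is a minimal plain-Python sketch. The records, the column names (`age`, `income`, `label`), dropping unlabeled rows, median imputation and the derived `income_per_age` feature are all hypothetical assumptions, not the steps from the post.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
raw = [
    {"age": 34, "income": 52000.0, "label": 1},
    {"age": None, "income": 61000.0, "label": 0},
    {"age": 45, "income": None, "label": 0},
    {"age": 29, "income": 38000.0, "label": None},  # unlabeled row: dropped
    {"age": 51, "income": 75000.0, "label": 1},
]


def clean(records, numeric_cols):
    """Drop rows without a label, then fill missing numerics with the column median."""
    labelled = [dict(r) for r in records if r["label"] is not None]  # copies, raw stays intact
    for col in numeric_cols:
        fill = median(r[col] for r in labelled if r[col] is not None)
        for r in labelled:
            if r[col] is None:
                r[col] = fill
    return labelled


def add_features(records):
    """Build one hypothetical derived feature: income per year of age."""
    for r in records:
        r["income_per_age"] = r["income"] / r["age"]
    return records


rows = add_features(clean(raw, ["age", "income"]))
print(rows[1]["age"], rows[2]["income"])  # 45 61000.0
```

In Spark, comparable steps would typically use the DataFrame `na.drop` functions and the `Imputer` transformer from Spark ML (which supports a median strategy); this sketch does not reproduce that API.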

Is it ‘always’ necessary to treat outliers in a machine learning model?

Outliers are one of those issues we come across almost every day in machine learning modelling. Wikipedia defines an outlier as “an observation point that is distant from other observations.” In other words, a small number of cases in the data set differ markedly from the majority of the data. I would like to classify outliers into two main categories: non-natural and natural.

Non-natural outliers are those caused by measurement errors,…

Added by Rohit Walimbe on April 9, 2018 at 2:30am — No Comments
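
Whether to treat a flagged point is the question the post raises; flagging itself is often done with a simple rule. A minimal sketch, assuming Tukey's IQR fences with the conventional k = 1.5 for detection and capping (clipping values to the fences) as one possible treatment. Neither is necessarily the author's method, and the readings are made up. Deciding whether a flagged point is natural or non-natural still needs domain knowledge.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences."""
    lo, hi = iqr_bounds(values, k)
    return [v for v in values if v < lo or v > hi]


def cap_outliers(values, k=1.5):
    """One possible treatment: clip every value to the fences."""
    lo, hi = iqr_bounds(values, k)
    return [min(max(v, lo), hi) for v in values]


# Made-up readings: 95 might be a recording error (non-natural)
# or a genuine extreme event (natural); the rule alone cannot tell which.
readings = [10, 12, 11, 13, 12, 11, 95]
print(flag_outliers(readings))  # [95]
print(cap_outliers(readings))
```

Keeping detection and treatment as separate functions makes it easy to flag a point, inspect it, and only then decide whether capping (or removal) is justified.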

Handling imbalanced datasets in supervised learning using the SMOTE family of algorithms

Suppose you are working on a machine learning classification problem. You get an accuracy of 98% and you are very happy. But that happiness doesn’t last long: when you look at the confusion matrix, you realize that the majority class makes up 98% of the data and every example has been classified as the majority class. Welcome to the real world of imbalanced data sets!…

Added by Rohit Walimbe on April 24, 2017 at 10:00pm — No Comments
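
The accuracy trap described in the excerpt, and the core idea behind SMOTE, can be sketched in plain Python. This is a minimal illustration under stated assumptions: the 98/2 labels are invented, the "model" always predicts the majority class, and `smote` is a hypothetical helper that implements only the basic SMOTE interpolation step (pick a minority sample, pick one of its k nearest minority neighbours, and create a synthetic point at a random position on the segment between them). It covers none of the other members of the SMOTE family.

```python
import math
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of actual minority-class examples that were predicted as minority."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    return tp / sum(t == positive for t in y_true)


# Invented data: 98 majority (0) and 2 minority (1) labels,
# and a "model" that always predicts the majority class.
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100
print(accuracy(y_true, y_pred), recall(y_true, y_pred))  # 0.98 0.0


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE step (hypothetical helper): interpolate between a random
    minority sample and a random one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment x -> nb, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=5, k=2)
```

For real work, the imbalanced-learn library provides `SMOTE` and variants such as `BorderlineSMOTE` and `SVMSMOTE` in `imblearn.over_sampling`; oversampling should be applied to the training split only, so synthetic points never leak into evaluation.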

Avoiding Look-Ahead Bias in Time Series Modelling

Any time series classification or regression forecast involves predicting Y at time t+n given the X and Y information available up to time t. Obviously, no data scientist or statistician can deploy such a system without backtesting it and validating the performance of the model on historical data. Using actual future information in the training data, which can be termed "look-ahead bias", is probably the gravest mistake a data scientist can make. Even the sentence “we cannot make use of future…

Added by Rohit Walimbe on April 21, 2017 at 6:00am — No Comments
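
One common guard against look-ahead bias is walk-forward (expanding-window) backtesting. A minimal sketch, assuming each training pair uses features at time s and the target at s + n (called `horizon` here): at forecast origin t, a pair may enter training only if its target was already observed, i.e. s + n <= t. The helper names and the mean-of-past-targets "model" are hypothetical placeholders, not the author's method.

```python
def walk_forward_splits(n_obs, horizon, min_train=1):
    """Yield (train_origins, t) for each forecast origin t.

    A training pair uses features at time s and the target at s + horizon,
    so it is allowed only if s + horizon <= t (its target was already
    observed at the origin). The test pair uses features at t and the
    target at t + horizon.
    """
    for t in range(n_obs - horizon):
        train = [s for s in range(t + 1) if s + horizon <= t]
        if len(train) >= min_train:
            yield train, t


def backtest(y, horizon, min_train=1):
    """Absolute errors of a placeholder model: the mean of past observed targets."""
    errors = []
    for train, t in walk_forward_splits(len(y), horizon, min_train):
        forecast = sum(y[s + horizon] for s in train) / len(train)
        errors.append(abs(y[t + horizon] - forecast))
    return errors


print(list(walk_forward_splits(6, horizon=2)))  # [([0], 2), ([0, 1], 3)]
print(backtest([1, 2, 3, 4, 5, 6], horizon=2))  # [2.0, 2.5]
```

Dropping the `s + horizon <= t` condition would let training targets observed after t leak into the fit, which is the look-ahead bias the post warns about.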

© 2020 TechTarget, Inc.