Every organisation needs talented, hardworking and skilled employees, irrespective of department, business unit or team. But finding and nurturing such talent can be challenging. In data science, where technology and demand change rapidly, many organisations have set up dedicated data science teams. A successful data science team has three major strengths: (A) availability of data, (B) infrastructure and, most importantly, (C) the “right” data scientists.

The…

Added by Rohit Walimbe on June 9, 2019 at 6:03am (no comments)

When building machine learning models, data scientists spend most of their time on two main tasks.

**Pre-processing and Cleaning**

The major portion of the time goes into collecting, understanding, analysing and cleaning the data, and then building features. All of these steps are critical to building successful machine learning…

Added by Rohit Walimbe on April 21, 2019 at 9:00pm (1 comment)

Outliers are one of those issues we come across almost every day in machine learning modelling. Wikipedia defines an outlier as *“an observation point that is distant from other observations.”* That is, a minority of cases in the data set differ from the majority of the data. I would like to classify outliers into two main categories: non-natural and natural.

The non-natural outliers are those which are caused by measurement errors,…

Added by Rohit Walimbe on April 9, 2018 at 2:30am (no comments)
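To make the definition quoted in the excerpt above concrete, here is a minimal sketch of one common way to flag observations that are "distant" from the rest: Tukey's interquartile-range (IQR) fences. This is a generic rule of thumb, not necessarily the approach the post goes on to describe; the function name `iqr_outliers`, the multiplier `k=1.5` and the sample readings are all assumptions made for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences (hypothetical helper).

    The fences are Q1 - k*IQR and Q3 + k*IQR, where IQR = Q3 - Q1.
    k=1.5 is the conventional default, not a value taken from the post.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(readings))
```

The sketch only flags candidates. Whether a flagged point is a measurement error or a genuine extreme, and whether it should be treated at all, is a separate decision.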

Suppose you are working on a machine learning classification problem. You get an accuracy of 98% and you are very happy. But that happiness doesn’t last long when you look at the confusion matrix and realize that the majority class makes up 98% of the data and every example has been classified as the majority class. Welcome to the real world of imbalanced data sets!…

Added by Rohit Walimbe on April 24, 2017 at 10:00pm (no comments)
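The accuracy trap described in the excerpt above can be reproduced in a few lines. The sketch below assumes a made-up data set of 1,000 labels with a 98/2 class split and a degenerate "model" that always predicts the majority class; the helper `confusion_counts` is a hypothetical name introduced here, not something from the post.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem (hypothetical helper)."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


# Assumed data: 980 majority-class (0) and 20 minority-class (1) examples.
y_true = [0] * 980 + [1] * 20

# A "model" that ignores its input and always predicts the majority class.
majority_class = Counter(y_true).most_common(1)[0][0]
y_pred = [majority_class] * len(y_true)

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
minority_recall = tp / (tp + fn)

print(f"accuracy        = {accuracy:.2f}")         # 0.98
print(f"minority recall = {minority_recall:.2f}")  # 0.00
```

Accuracy comes out at 98% even though not a single minority example is found, which is why the confusion matrix (or recall on the minority class) is the more honest check on imbalanced data.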

Any time series classification or regression forecast involves predicting Y at time 't+n' given the X and Y information available up to time 't'. Obviously, no data scientist or statistician can deploy such a system without back-testing and validating the model's performance on historical data. Using future actual information in the training data, which could be termed *"Look Ahead Bias"*, is probably the gravest mistake a data scientist can make. Even the sentence *“we cannot make use future…*

Added by Rohit Walimbe on April 21, 2017 at 6:00am (no comments)
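One way to keep a back-test free of the look-ahead bias described in the excerpt above is a walk-forward split, sketched below under stated assumptions. Row `s` is assumed to pair the features X[s] with the target Y[s + n]; at forecast origin `t` only targets observed up to time `t` may be used, so the training rows are those with `s + n <= t`. The function name `walk_forward_splits` and its parameters are hypothetical, and this is not necessarily the procedure the post itself recommends.

```python
def walk_forward_splits(n_samples, horizon, min_train):
    """Yield (train_rows, origin) pairs for a walk-forward back-test.

    Row s pairs features X[s] with target Y[s + horizon]. At origin t
    only targets up to time t are known, so training rows must satisfy
    s + horizon <= t. Origins with fewer than min_train usable rows
    are skipped.
    """
    # Stop early enough that the target Y[t + horizon] exists for scoring.
    for t in range(n_samples - horizon):
        train_rows = list(range(0, t - horizon + 1))
        if len(train_rows) < min_train:
            continue
        yield train_rows, t


horizon = 2
for train_rows, origin in walk_forward_splits(n_samples=8, horizon=horizon, min_train=3):
    # Every training target was already observed at the forecast origin.
    assert all(s + horizon <= origin for s in train_rows)
    print(f"train rows {train_rows} -> predict Y[{origin + horizon}] from X[{origin}]")
```

Note that the naive alternative, training on all rows up to `t`, would quietly include targets from times `t+1` to `t+n`, which is exactly the leak the excerpt warns about.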

- Hiring the right data scientist for the organisation
- Building machine learning models in Apache Spark using SCALA in 6 steps
- Is it ‘always’ necessary to treat outliers in a machine learning model?
- Handling imbalanced dataset in supervised learning using family of SMOTE algorithm.
- Avoiding Look Ahead Bias in Time Series Modelling

© 2021 TechTarget, Inc.