
So, How Many ML Models Have You NOT Built?

What a weird question. That's probably what you thought on reading the headline. Perhaps you even thought the word “NOT” was a mistake.

Hmm, for the past few years many of us have come across articles like:

  • “Top 10 Machine Learning Algorithms Every Data Scientist SHOULD KNOW”
  • “Top 20 R Packages Every Data Scientist SHOULD KNOW”
  • “Top 30 Python Libraries Every Data Scientist SHOULD KNOW”

The list is endless. Many new Data Science aspirants wave the white flag at the mere sight of these “SHOULD KNOW” articles on the internet.

At the end of the day, an aspirant does not know where to even start, given the overwhelming amount of information.

What I have described above is a problem from an aspiring Data Scientist's point of view.

Build Me An ML Model

The “SHOULD KNOW” type of articles cause a bigger problem, and the ones bearing it are companies, both startups and big MNCs.

What problems, you ask?

Everybody wants a piece of the latest in-thing, “Data Science”.

Many companies want to do Data Science, and because the field is new to many of them, the job descriptions are often strange and the interview processes even stranger.

Some of these companies, influenced by “SHOULD KNOW” articles, tell the job applicant:

“Here is our problem. What Machine Learning algorithms can be applied?”

Newly minted Data Scientists quickly blurt out two or three ML algorithms, and the enamored company hires him/her. In due course, the algorithms are implemented. The Data Scientist impresses the company with the models' accuracy. The models are put into production. But lo and behold, the models do not earn the company the ROI it hoped for. What happened?

Well, what happened was that the Data Scientist lacked business acumen and assumed his/her KPI was simply building ‘good’ ML models. The company had business acumen but not the Machine Learning / Statistics knowledge. The ideal marriage never happened.

The Ship Repair Man Story

We have all heard this story, or some variant of it.

A ship company hired an Engineer to fix the engine of its ship. The Engineer had all the tools in his toolkit. After some analysis, the Engineer took out a hammer and hit one of the engine's components. The engine started to work. The next day, the Engineer sent the ship company an invoice for a whopping $10,000 for barely five minutes of work.

The ship company's manager was taken aback and asked the Engineer to itemize the invoice. The bill read as follows:

Hitting with hammer: $2
Knowing where to hit: $9,998

Now, you may think I am laying emphasis on Domain Knowledge and Experience. Yes, you guessed it right.

The Ship Repair Man - Data Scientist Analogy

The Engineer in the story had all the tools in his toolkit, yet chose only the hammer (perhaps the simplest tool) to fix the engine. Most importantly, he knew where the problem was. Similarly, shouldn't a Data Scientist first try to solve problems with basic analytics, rather than implementing Machine Learning algorithms straightaway?

Minimizing Loss Function

“All models are wrong, but some are useful.” (George E. P. Box)

In most Machine Learning algorithms, we try to minimize a loss function.

Models are an abstraction of reality. The key word here is abstraction: a model is not the real thing.

If you think about it, the process of building Machine Learning models itself has a larger ‘Loss Function’: the gap between our models and reality.

So, shouldn't we build fewer models to minimize this larger ‘Loss Function’?

Hey Data Scientist, Think like a CEO

Often we Data Scientists get pigeonholed into purely technical thinking. We think only about which ML algorithm can be applied to problem x, y or z, how to do feature selection, how to reduce the number of features, and how to improve the accuracy of our models.

What we don't think about is how the ML algorithms will benefit the company. How much money will my ML algorithm save or earn for the company? Will the ROI be positive?

The most important question we forget to ask is: “Is a Machine Learning algorithm really required for this business problem?”

I know that last question may have set the cat among the pigeons. Many of you might be alarmed and ask, “Are you trying to put us out of a job?”

On the contrary, no.

Many business problems do require Machine Learning approaches, but not all of them. Most business problems can be solved through simple analytics or a baseline approach.
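To make “a baseline approach” concrete, here is a minimal Python sketch of checking simple baselines before reaching for ML. It assumes a hypothetical forecasting problem with made-up weekly sales figures; the helper names (`naive_last_value_forecast`, `historical_mean_forecast`, `worth_building`) and the 10% improvement threshold are illustrative assumptions, not an established rule.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def naive_last_value_forecast(history, horizon):
    """Baseline: repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def historical_mean_forecast(history, horizon):
    """Baseline: predict the average of the history for every future step."""
    return [mean(history)] * horizon


def worth_building(baseline_error, model_error, min_relative_gain=0.10):
    """Hypothetical rule of thumb: a model must beat the baseline by a set margin."""
    if baseline_error == 0:
        return False  # the baseline is already perfect; no model can beat it
    gain = (baseline_error - model_error) / baseline_error
    return gain >= min_relative_gain


# Hypothetical weekly sales figures, for illustration only.
train = [120, 130, 125, 140, 135, 138]
test = [142, 139, 145]

baselines = {
    "last value": naive_last_value_forecast(train, len(test)),
    "historical mean": historical_mean_forecast(train, len(test)),
}
for name, forecast in baselines.items():
    print(f"{name}: MAE = {mean_absolute_error(test, forecast):.2f}")
# prints:
# last value: MAE = 4.00
# historical mean: MAE = 10.67
```

With these numbers, a hypothetical ML model would need an MAE of about 3.6 or lower just to clear the assumed 10% margin over the trivial “last value” rule. If a candidate model cannot beat the simplest baseline by enough to justify its build and maintenance cost, that is a strong hint the problem may not need ML at all.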

What will put us out of our jobs is Machine Learning overkill. I have seen Machine Learning algorithms applied to very frivolous problems and, worse still, companies investing heavily in the idea. It is a ticking time bomb. The moment these companies realize that the ROI is negative, they will shun the Data Science practice altogether. We all know how difficult it is to win back a disillusioned customer. No Data Science, no Data Scientist.

Cometh The Hour, Cometh The Data Science Auditor

The industry is both excited and wary about the prospects of Data Science. Many who have implemented Data Science solutions have been left disenchanted by the poor ROI.

Enter the Data Science Auditor

I foresee a new job role being created, “THE DATA SCIENCE AUDITOR”, where companies would hire experienced Data Scientists (statisticians / applied mathematicians) to audit their Data Science projects.

In one of my recent consulting projects, I felt exactly like an auditor. I was asked to improve an ML model built by a Data Scientist, but on analysis I found that not only was the ML algorithm applied wrong, but no ML algorithm would work for the given business problem!

The client had simply been taken for a ride.

The repercussion: the client came away with a poor opinion of Data Scientists and felt cheated, both emotionally and monetarily.

So next time, perhaps don't ask a Data Scientist, “How many ML algorithms have you built?”

Rather, ask:

"How Many ML Algorithms Have You NOT Built?"


If you liked my article, give it a like, and feel free to comment below with your thoughts on it.

You can reach out to me on

LinkedIn

Twitter


Comment by Venkat Raman on February 8, 2018 at 10:10pm

Thanks Arnuld. Glad to know you liked my article. Yes the hype is overwhelming. 

Comment by arnuld on February 8, 2018 at 9:47pm

"Go back to the Source", that is exactly what your article reminds me, back to basics approach. Thumbs-Up for taking time to write it up. There is so much hype and so many courses on Data Science and ML and now DNN. It is really very good to see someone writing (and hence teaching) fundamentals.

I have been a computer programmer for 5 years, and one of the most important things I have learned about is business requirements, not just code. Code comes after business.

Comment by Venkat Raman on February 7, 2018 at 8:15am

Thanks Roberto. Glad you liked my article. I don't think you insulted your interviewers; rather, I would say your approach was spot on. Sooner or later your prospective employer will see the difference between you and others. Best of luck!

Comment by Roberto Jourdain on February 7, 2018 at 7:29am

Excellent!!! I have many years of experience in business and I have been trying to break into the data science field. I have taken multiple courses in Machine Learning and the like; however, I always find it interesting that in the interviews I have done, nobody cares about knowledge of business in general. They only care about how many algorithms you have lots of experience with.

When they ask me what my approach to a data set is, I tell them that I don't care about the data set at first. My first question is what the problem is... sometimes I get the feeling that I insulted them by saying so...


© 2018 Data Science Central
