
Guest blog by Seth Dobrin and Daniel Hernandez.

Companies have been sold on the alchemy of data science. They have been promised transformative results. They modeled their expectations after their favorite digital-born companies. They have piled a ton of money into hiring expensive data scientists and ML engineers. They invested heavily in software and hardware. They spent considerable time ideating. Yet despite all this effort and money, many of these companies are enjoying little to no meaningful benefit. This is primarily because they have spent all these resources on too much experimentation, on projects with no clear business purpose, and on activity that doesn’t align with organizational priorities.

When the music stops and the money dries up, the purse strings will tighten and the funding behind this work will disappear. It’s then that data science will be accused of being a scam.

To turn data science from a scam into a source of value, enterprises need to consider turning their data science programs from research endeavors into integral parts of their business and processes. At the same time, they need to consider laying down a true information architecture foundation. We frame this as the AI ladder: data foundation, analytics, machine learning, AI/cognitive.

To break the current pattern of investing in data science without realizing the returns, businesses can address key areas:

  1. Finding, retaining, and building the right talent and teams
  2. Formulating an enterprise strategy for data and data science
  3. Operationalizing data science
  4. Overcoming culture shock

Finding, retaining and building the right talent and teams

Our two previous VentureBeat articles cover the composition of a data science team and the skills we look for in a data scientist. To recap, great data science teams rely on four skillsets: Data Engineer, Machine Learning Engineer, Optimization Engineer, and Data Journalist. If you want to maximize the number of qualified applicants, try posting roles with those four titles and skill sets instead of seeking out generic “Data Scientists”.

Retaining talent requires attention on several fronts. First, the team needs to be connected to the value they’re driving: How is their project impacting the line of business and the enterprise? Second, they need to feel empowered and know you have their backs. Finally, when planning for your team, build in 20–25% free time to work on innovative, blue-sky projects, to jump into Kaggle-like competitions, and to learn new tools and skills. Carving out that much time might seem pricey in terms of productivity, but it provides an avenue for the team to build the skills that accelerate future use cases; it’s also far more efficient than hiring and training new talent.

Formulating an enterprise strategy for data and data science

Identify, Value, and Prioritize Decisions

Map out the decisions being made and align them to tangible value, specifically, cost avoidance, cost savings, or net new revenue. This is the most important step in this process and the first step in shifting data science from research to an integral part of your business. We’ve previously mapped out a process for doing this in Six Steps Ups, but briefly, it requires direct conversations with business owners (VPs or their delegates) about the decisions they’re making. Ask about the data they use to make those decisions, its integrity, whether there’s adequate data governance, and how likely the business is to use any of the models already developed.

You can drive decisions using a dashboard that’s integrated directly into processes and applications. However, beware of situations where data simply supports preconceived notions. Instead, look for chances to influence truly foundational decisions:

“Where should we position product for optimal availability at minimal cost?”

“What are our most likely opportunities for cross-sell/up-sell for specific customers?”

“Which are my top-performing teams? Bottom-performing teams?”

“How can I cut costs from my supply chain by optimizing x given y constraints?”

Value each decision. Making decisions more quickly and with greater efficacy avoids costs, saves costs, or creates additional revenue. Express this value using whatever methodologies and terms your CFO advocates.

Prioritize the decision portfolio. This exercise creates a decision portfolio, which can serve as the basis for a data science backlog. Prioritize the backlog by assessing the likelihood of success, the ease of implementation, and the value (based on the scoring metric in the table above). We’ve developed a framework for building and prioritizing the portfolio by going through this exercise ourselves.
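A weighted-scoring pass over a decision portfolio can be sketched as follows. This is a minimal illustration, not the framework referenced above: the decisions, 1–5 scores, and weights are all hypothetical assumptions.

```python
# Illustrative sketch: rank a decision portfolio by a weighted score of
# likelihood of success, ease of implementation, and business value.
# All decisions, scores (1-5), and weights below are hypothetical.

decisions = [
    {"name": "optimal product positioning", "likelihood": 4, "ease": 2, "value": 5},
    {"name": "cross-sell/up-sell targets",  "likelihood": 3, "ease": 4, "value": 4},
    {"name": "supply chain cost cuts",      "likelihood": 5, "ease": 3, "value": 3},
]

# Weights reflect how much each criterion matters to the business.
WEIGHTS = {"likelihood": 0.3, "ease": 0.2, "value": 0.5}

def priority_score(decision):
    """Weighted sum of the three criteria; higher means do it sooner."""
    return sum(WEIGHTS[k] * decision[k] for k in WEIGHTS)

# The sorted portfolio becomes the data science backlog.
backlog = sorted(decisions, key=priority_score, reverse=True)
for d in backlog:
    print(f"{d['name']}: {priority_score(d):.1f}")
```

In practice the weights would come out of the CFO-aligned valuation exercise above, and the scores from the business-owner conversations.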

Discrete Deliverables. Next, take your top decisions and break them into manageable chunks that you can deliver in small sprints. This starts by identifying the minimum viable product (MVP) and then working back from there. Consider three-week sprints that can start delivering value (however small) after two sprints.

Operationalizing data science

Moving data science from a research project to an integral part of your company requires operationalizing your data science program. In addition to building the team and setting the strategy, it requires integrating the models into processes, applications, and dashboards. Also plan for continual monitoring and retraining of deployed models.

Truly integrating the models means they can’t be deployed as CSV files sent by email or code tossed over the wall to a development team. They need to be deployable as reusable and trusted services: versioned RESTful APIs output directly from the data science platform. Delivering models as CSV files severs the connection to the process, and with it the feedback that comes from the implementation. Tossing R or Python code to a development team to convert it into an API is inefficient at best. But be prepared for some work. Setting up a robust process can often take three to six months, and it needs to be configured as a feedback loop that easily allows your team to retrain and redeploy the models.
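The versioning idea can be sketched framework-agnostically. In a real deployment a dispatcher like this would sit behind a REST layer on the data science platform; the registry class, model names, and input fields here are hypothetical illustrations, not a specific product’s API.

```python
# Sketch of the "versioned, reusable service" idea: models are registered
# under an explicit version so callers pin what they depend on, and a
# retrained model can be deployed as a new version without breaking
# existing consumers. All names and fields below are hypothetical.

class ModelRegistry:
    def __init__(self):
        self._models = {}

    def register(self, name, version, model_fn):
        """Deploy a model callable under an explicit (name, version) key."""
        self._models[(name, version)] = model_fn

    def predict(self, name, version, payload):
        """Route a request to the exact model version the caller pinned."""
        try:
            model = self._models[(name, version)]
        except KeyError:
            raise LookupError(f"no model {name!r} at version {version!r}")
        return model(payload)

registry = ModelRegistry()

# v1: a trivial churn rule; v2: a retrained replacement deployed alongside it.
registry.register("churn", "v1", lambda x: x["tenure_months"] < 6)
registry.register("churn", "v2",
                  lambda x: x["tenure_months"] < 6 or x["complaints"] > 2)

print(registry.predict("churn", "v2", {"tenure_months": 24, "complaints": 3}))
```

The point of the pattern is the contrast with emailed CSVs: consumers call a stable, versioned interface, and retraining becomes a redeploy rather than a re-integration.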

Applying predictive or prescriptive analytics to your business inevitably requires retraining the models, both to keep pace with the accelerated rate of change they drive and to incorporate the feedback that flows back from the outcomes themselves. We’ve seen instances where a team develops more than one hundred models to drive a single decision over the course of a year, only to develop zero models the following year because the team is now focused entirely on monitoring and retraining its existing models. It’s important to recognize that this isn’t a defect in their approach; they needed to build that many models to solve the problem. The issue is that, in the course of operationalizing the model deployments, they didn’t automate the monitoring and retraining of those models.
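The automation the paragraph above calls for can start very small: compare live accuracy against the accuracy recorded at deployment and flag the model for retraining when it degrades past a tolerance. The threshold, metric, and example data here are illustrative assumptions.

```python
# Minimal sketch of automated model monitoring: flag a deployed model for
# retraining once its live accuracy drifts below the baseline recorded at
# deployment time. Threshold and example data are hypothetical.

RETRAIN_TOLERANCE = 0.05  # allowed drop in accuracy before retraining

def needs_retraining(deployed_accuracy, recent_predictions, recent_outcomes):
    """True when live accuracy has drifted below the deployed baseline."""
    hits = sum(p == o for p, o in zip(recent_predictions, recent_outcomes))
    live_accuracy = hits / len(recent_outcomes)
    return live_accuracy < deployed_accuracy - RETRAIN_TOLERANCE

# Example: a model deployed at 90% accuracy, now scoring 7/10 on feedback
# captured from the process it is embedded in.
preds   = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
actuals = [1, 0, 1, 0, 0, 1, 1, 0, 0, 1]
print(needs_retraining(0.90, preds, actuals))
```

Run on a schedule against the feedback loop described above, a check like this turns "one hundred models to monitor" from a manual burden into a retraining queue.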

Unless you’ve already executed a large number of data science projects for the enterprise, the challenges of operationalizing can come as a surprise — but they are very real.

Derived data products. We can often overlook the fact that our engineered features are valuable data in and of themselves. As part of model building and engineering, consider deploying this new data as APIs and integrating them into the appropriate data assets rather than letting them remain proprietary. For example, if a data science team engineers a feature that combines customer data, product data, and finance data, deploy the new feature as an API and have the corresponding model consume that new API.
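The customer/product/finance example above can be sketched in miniature. All field names, the feature formula, and the decision threshold are hypothetical; the point is only the structure, i.e. the engineered feature is its own reusable service (in production, an API) and the model consumes it rather than re-deriving it.

```python
# Sketch of a "derived data product": an engineered feature combining
# customer, product, and finance data is exposed as its own reusable
# function, and the model consumes that function instead of the raw
# tables. Every field name and number here is a hypothetical example.

def customer_value_feature(customer, product, finance):
    """Engineered feature: margin-weighted, attach-rate-adjusted spend."""
    margin = finance["revenue"] / max(finance["cost"], 1)
    return customer["monthly_spend"] * product["attach_rate"] * margin

def upsell_model(customer, product, finance):
    """The model consumes the feature service rather than raw data."""
    score = customer_value_feature(customer, product, finance)
    return score > 100  # hypothetical decision threshold

customer = {"monthly_spend": 80}
product  = {"attach_rate": 0.6}
finance  = {"revenue": 500, "cost": 200}
print(upsell_model(customer, product, finance))
```

Because the feature lives behind its own interface, other teams (and other models) can reuse it, which is what makes it a data product rather than a private byproduct of one model.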

Overcoming culture shock

Among the various reasons that data science becomes a scam at so many enterprises, one looms large: cultural resistance. To break through resistance from management, focus on any of their peers who are excited to engage. Once they start applying the data and models in their processes and applications, the advocates may start to outperform the resisters. At some point, managers will ask what the advocates are doing differently, and the resisters may feel pressure to shift their positions. Think of this as leading through shame. The value you demonstrate to managers is often about outperforming their peers by avoiding costs, saving money, or creating net new value.

Individual contributors might resist the shift for a few different reasons. They might be worried they’ll be replaced by the machine, or that the people who built it don’t fully understand the process or environment. Both are valid concerns. Earn credibility by being honest and addressing concerns head-on. However, in most cases you won’t actually be automating anyone out of a job, but rather making each job safer or more efficient. Help the team to see this directly. For the concern that the data science team doesn’t really understand what they do, consider pulling one of the hold-outs off the floor and asking them to work directly on the project as a product owner or subject matter expert. That gives the other resisters an advocate who is “one of us”. When that team member returns to their regular job, you’ll have an advocate for the current data science approach, as well as for future implementations and deployments.

Finally, you can overcome the culture shock by raw mass. Identify a use case and build a related hack-a-thon that’s sponsored by senior executives. The hack-a-thon should include basic presentations on machine learning, cloud, and APIs, as well as more advanced presentations and conversations on the same topics. Let the teams work hands-on with the use case and allow individuals across the company to participate, independent of their training and background.

To turn the alchemy of data science into gold, enterprises must align their data science efforts to business outcomes with real and tangible value. They must stop focusing on experimentation and shift their efforts to data science as an integral part of their business models and align these with corporate priorities. If you follow the methodology above, the music will keep on playing, the funding will keep flowing, and data science will not be a scam in your enterprise.

Originally posted here.

About the Author

Seth Dobrin, PhD, is VP & CDO, IBM Analytics. He is an exponential change leader and life-long learner with a proven track record of transforming businesses via data science, information technology, automation, and molecular techniques. He is highly skilled at leading through influence across complex corporate organizations, as demonstrated by having built, developed, and executed corporate-wide strategies.

Comment by Phil Curtis on August 1, 2018 at 12:14am

Good observations and suggestions - the potential "Data Science" scam is easy for companies to fall into and risks future investment.

Perhaps similar to Steve's Derived Data Products, I would consider splitting out good ol' "BI" below its current "Analytics" on your ladder (but above "Data & IA" fundamentals) - not to split hairs but to differentiate, because BI apps (PowerBI, Tableau, Qlik, TIBCO etc.) can be quick to employ for immediate cross-analysis of categories, boundaries, and targets, helping the business focus on where the more predictive or algorithmic analytics should add value. BI can also be used to help quantify the benefits post-project. Why wait 1.5 months for first sprint value from deeper analytics when you could get BI heads-up and decision value providing guidance within half that? It seems that as the phrase "BI" is gradually replaced by "Analytics", many orgs just don't have an adequate BI level of sight to act as the foundation for data science, including its provision of good measures of success.

Comment by Steve Kamotho on July 30, 2018 at 12:35pm

Great insights Seth. I think you understate the "derived data products". This area is a separate ROI stream as data products, if viewed as a proper object rather than just a derived byproduct, can be a goldmine. A decision should be made whether the mainstream data science team has the mandate to pursue this or a focus team is needed. 

Comment by Vincent Granville on July 30, 2018 at 11:23am

John - you are right: there will be unfortunate casualties: very promising junior data scientists who won't get a job anymore. It has already started to happen: I see more and more junior data scientists from all over the world who want to connect with me on LinkedIn. Many have this statement in their job title: "actively looking for work / internship." I am not sure how best to change this situation. For anyone with a degree (be it engineering, math, data science, operations research, MBA) from a respected school (MIT, Stanford, Berkeley, NYU, Carnegie Mellon, etc.), I accept the invite.

Yet it makes me a bit sad: I was not born into wealth, and I come from a small, little-known university (Namur, Belgium), but somehow managed to succeed despite my modest background. I have ideas about how to change things (see first doctorship in data science), but at this point, I am interested in partnering with professionals who want to make a positive impact on our profession: I don't have the time or resources to do it alone, being extremely busy working on a number of projects.

Comment by John L. Ries on July 30, 2018 at 10:37am


I do have a concern about lower level people losing their jobs and possibly having to change careers because of the sins of their bosses, but that happens in every bubble and is probably unavoidable (I'm sure you're old enough to know the term "dot bomb").  But I agree that  "data scientists" who have proven their value will have no problem keeping their jobs or getting new ones (karma usually works well in the job market); if nothing else, the occasional recession is a guarantee of full employment for competent econometricians.

The problem I have with the term "data scientist" is that it is overly vague.  If you think about it, all scientists are data scientists because all real science is driven by data (such is the nature of the scientific method); and forget about "theory free" data science; there is no such thing.  But I'm a mere "data engineer" (aka a statistical programmer), so what do I know?

Comment by Vincent Granville on July 30, 2018 at 10:18am

John, one of the issues I see is that anyone can call herself a data scientist. It is not a regulated profession, unlike calling yourself a lawyer or a doctor of medicine. And I think it should not be regulated. Eventually, employers will learn who to hire and who not to hire. It's no different than hiring an SEO expert (80% of them are scammers and have destroyed the reputation associated with that profession). Yes, I might change my job title again, moving away from calling myself a data scientist, possibly shifting to entrepreneur (another much-abused word).

In a different context, I am now purchasing a new big home in Anacortes. People who lend money have now figured out who they can trust: a guy (me) not on a payroll for over 6 years, yet after due diligence the lender knows we can pay everything. People with a big W2 income may be rejected for that kind of loan. The same will happen with data science: very soon, hiring managers will be able to easily figure out who can truly deliver ROI, and those who are not known for delivering ROI, won't get a job anymore. 

Comment by John L. Ries on July 30, 2018 at 9:54am

It is somewhat of a concern, but not as big of one as it would have been if "data science" were actually new.  The bubble will assuredly burst, but those researchers who have been straightforward with their superiors and clients about what they can do for the organization will probably come out of it all right, though they might have to go back to calling themselves "statisticians" or "data miners".  And those who have been overselling their capabilities deserve what they get.

Comment by Michael Bryan on July 30, 2018 at 6:54am

Well done Seth, Daniel.

In particular, the decomposition of the mystical data scientist into component roles. For those who struggle with staffing advanced analytics, this helps immensely - the variety of personality in these roles will allow recruiters and leaders to manage beyond 'alchemy'.

It's also good news in wolf's clothing.  Hiring the few scientists now means hiring more, defining their methods and relating their roles to mission, objectives. Growing pains are signs of growth, maturing.

Again, good note. Well said.

Comment by Robert D. Brown III on July 29, 2018 at 9:57pm

Seth - I began discussing this issue in various outlets 6 years ago after reading a WSJ article entitled “So, What’s Your Algorithm?” (Berman, Jan 4, 2012) 

I also wrote another LinkedIn article on Oct 29, 2017 where I laid out my concerns for the hype surrounding “big data analytics.” The concern being that the business case value of data analytics was not being established and that was leading many people to allocate analytic resources to irrelevant problems, which was ultimately leading to their eventual failure. 

The real point is that data analytics needs decision management to increase the likelihood of it being effective to the organization that uses it. I think on this point we concur.

Comment by Seth Dobrin on July 29, 2018 at 8:23pm

Bill, The goal of this post was to lay out what it takes to build a data science team and more importantly what you should and shouldn't do when building an entire practice from the ground up in a fortune 500 company. It was not to go into any mathematical details about the topics you describe.

The points you make are critical, but they are also meaningless if you can't get them used or valued in a large company.

Comment by Seth Dobrin on July 29, 2018 at 8:16pm

This was originally published on Medium on March 16th of this year.
