Guest blog by Seth Dobrin and Daniel Hernandez.

Companies have been sold on the alchemy of data science. They have been promised transformative results. They modeled their expectations after their favorite digital-born companies. They have piled a ton of money into hiring expensive data scientists and ML engineers. They have invested heavily in software and hardware. They have spent considerable time ideating. Yet despite all this effort and money, many of these companies are enjoying little to no meaningful benefit. This is primarily because they have spent all these resources on too much experimentation, projects with no clear business purpose, and activity that doesn’t align with organizational priorities.

When the music stops and the money dries up, the purse strings will tighten up and the resources that are funding this work will die. It’s then that data science will be accused of being a scam.

To turn data science from a scam into a source of value, enterprises need to consider turning their data science programs from research endeavors into integral parts of their business and processes. At the same time, they need to consider laying down a true information architecture foundation. We frame this as the AI ladder: data foundation, analytics, machine learning, and AI/cognitive.

To break the current pattern of investing in data science without realizing the returns, businesses can address four key areas:

  1. Finding, retaining, and building the right talent and teams
  2. Formulating an enterprise strategy for data and data science
  3. Operationalizing data science
  4. Overcoming culture shock

Finding, retaining and building the right talent and teams

Our two previous VentureBeat articles cover the composition of a data science team and the skills we look for in a data scientist. To recap, great data science teams rely on four skillsets: Data Engineer, Machine Learning Engineer, Optimization Engineer, and Data Journalist. If you want to maximize the number of qualified applicants, try posting roles with those four titles and skill sets instead of seeking out generic “Data Scientists”.

Retaining talent requires attention on several fronts. First, the team needs to be connected to the value they’re driving: How is their project impacting the line of business and the enterprise? Second, they need to feel empowered and know you have their backs. Finally, when planning for your team, build in 20–25% of free time to work on innovative, blue-sky projects, to jump into Kaggle-like competitions, and to learn new tools and skills. Carving out that much time might seem pricey in terms of productivity, but it provides an avenue for the team to build the skills that accelerate future use cases, and it’s far more efficient than hiring and training new talent.

Formulating an enterprise strategy for data and data science

Identify, Value, and Prioritize Decisions

Map out the decisions being made and align them to tangible value, specifically, cost avoidance, cost savings, or net new revenue. This is the most important step in this process and the first step in shifting data science from research to an integral part of your business. We’ve previously mapped out a process for doing this in Six Steps Ups, but briefly, it requires direct conversations with business owners (VPs or their delegates) about the decisions they’re making. Ask about the data they use to make those decisions, its integrity, whether there’s adequate data governance, and how likely the business is to use any of the models already developed.

You can drive decisions using a dashboard that’s integrated directly into processes and applications. However, beware of situations where data simply supports preconceived notions. Instead, look for chances to influence truly foundational decisions:

“Where should we position product for optimal availability at minimal cost?”

“What are our most likely opportunities for cross-sell/up-sell for specific customers?”

“Which are my top-performing teams? Bottom-performing teams?”

“How can I cut costs from my supply chain by optimizing x given y constraints?”

Value each decision. Making decisions more quickly and with greater efficacy avoids costs, saves costs, or creates additional revenue. Express this value using whatever methodologies and terms your CFO advocates.

Prioritize the decision portfolio. This exercise creates a decision portfolio, which can serve as the basis for a data science backlog. Prioritize the backlog by assessing the likelihood of success, the ease of implementation, and the value (based on the scoring metric in the table above). We’ve developed a framework for building and prioritizing the portfolio by going through this exercise ourselves.
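As a minimal sketch of how such a backlog could be ranked, the snippet below multiplies estimated value by likelihood of success and ease of implementation. The `Decision` class, the multiplicative score, and all of the numbers are hypothetical illustrations, not the scoring metric or framework referred to in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Decision:
    """One entry in the decision portfolio (all fields are illustrative)."""
    name: str
    annual_value: float        # estimated value in dollars, in your CFO's terms
    success_likelihood: float  # 0.0 to 1.0
    ease: float                # 0.0 (hard to implement) to 1.0 (easy)

    def score(self) -> float:
        # Assumed scoring rule: value discounted by risk and by difficulty.
        return self.annual_value * self.success_likelihood * self.ease


def prioritize(portfolio: List[Decision]) -> List[Decision]:
    """Return the portfolio ordered from highest to lowest score."""
    return sorted(portfolio, key=lambda d: d.score(), reverse=True)


portfolio = [
    Decision("Product positioning", 2_000_000, 0.5, 0.4),
    Decision("Cross-sell targeting", 800_000, 0.8, 0.9),
    Decision("Supply chain cost optimization", 1_500_000, 0.6, 0.5),
]

backlog = prioritize(portfolio)
for rank, decision in enumerate(backlog, start=1):
    print(f"{rank}. {decision.name}: {decision.score():,.0f}")
```

In this made-up portfolio the smaller but easier cross-sell decision ranks first, which illustrates why value alone is not a sufficient ordering.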

Discrete Deliverables. Next, take your top decisions and break them into manageable chunks that you can deliver in small sprints. This starts by identifying the minimum viable product (MVP) and then working back from there. Consider three-week sprints that can start delivering value (however small) after two sprints.

Operationalizing data science

Moving data science from a research project to an integral part of your company requires operationalizing your data science program. In addition to building the team and setting the strategy, it requires integrating the models into processes, applications, and dashboards. Also plan for continual monitoring and retraining of model deployments.

Truly integrating the models means they can’t be deployed as CSV files sent by email or code tossed over the wall to a development team. They need to be deployable as reusable and trusted services: versioned RESTful APIs output directly from the data science platform. Delivering models as CSV files severs the connection to the process, and with it the feedback that comes from the implementation. Tossing R or Python code to a development team to convert it into an API is inefficient at best. But be prepared for some work. Setting up a robust process can often take three to six months, and it needs to be configured as a feedback loop that easily allows your team to retrain and redeploy the models.
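To make the idea of a versioned model service concrete, here is a small sketch using only the Python standard library. It routes REST-style paths such as `/models/churn/v2/predict` to a registered model and logs every prediction so that outcomes can later be joined back for retraining. The `ModelRegistry` class, the route layout, and the toy churn models are all assumptions made for illustration; a real deployment would sit behind whatever web framework or data science platform you use.

```python
import json
from typing import Callable, Dict, List, Tuple

Model = Callable[[Dict[str, float]], float]


class ModelRegistry:
    """Maps (model name, version) to a callable and logs each prediction."""

    def __init__(self) -> None:
        self._models: Dict[Tuple[str, str], Model] = {}
        self.prediction_log: List[dict] = []  # raw material for the feedback loop

    def register(self, name: str, version: str, model: Model) -> None:
        self._models[(name, version)] = model

    def handle(self, path: str, body: str) -> Tuple[int, str]:
        """Handle a request to /models/<name>/<version>/predict."""
        parts = path.strip("/").split("/")
        if len(parts) != 4 or parts[0] != "models" or parts[3] != "predict":
            return 404, json.dumps({"error": "unknown route"})
        name, version = parts[1], parts[2]
        model = self._models.get((name, version))
        if model is None:
            return 404, json.dumps({"error": "unknown model or version"})
        try:
            features = json.loads(body)
        except json.JSONDecodeError:
            return 400, json.dumps({"error": "body must be valid JSON"})
        prediction = model(features)
        self.prediction_log.append(
            {"model": name, "version": version,
             "features": features, "prediction": prediction}
        )
        return 200, json.dumps(
            {"model": name, "version": version, "prediction": prediction}
        )


# Two hypothetical versions of a toy churn score, served side by side.
registry = ModelRegistry()
registry.register("churn", "v1", lambda f: min(1.0, 0.25 * f["late_payments"]))
registry.register(
    "churn", "v2",
    lambda f: min(1.0, 0.25 * f["late_payments"]
                  + 0.125 * f.get("support_tickets", 0)),
)

status, response = registry.handle(
    "/models/churn/v2/predict",
    '{"late_payments": 2, "support_tickets": 2}',
)
print(status, response)
```

Keeping the version in the path lets consuming applications pin to a known model while a retrained version is rolled out next to it.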

Applying predictive or prescriptive analytics to your business inevitably requires you to retrain the models to stay current with the accelerated rate of change they are driving and based on the feedback to the models from the outcomes themselves. We’ve seen instances where a team develops more than one hundred models to drive a single decision over the course of a year only to develop zero models the following year because the team is now focused entirely on monitoring and retraining of their existing models. It’s important to recognize that this isn’t a defect in their approach. They needed to build that many models to solve the problem. The issue is that in the course of operationalizing the model deployments, they didn’t automate the monitoring and retraining of those models.
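One way to automate that monitoring and retraining is sketched below, under simplifying assumptions: the wrapper tracks a rolling mean absolute error as real outcomes arrive and retrains on recent data once the error crosses a threshold. The `MonitoredModel` class, the window size, the threshold, and the toy `train_mean` trainer are hypothetical stand-ins for your own training pipeline and drift criteria.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Example = Tuple[float, float]  # (input, observed outcome)
Trainer = Callable[[List[Example]], Callable[[float], float]]


def train_mean(data: List[Example]) -> Callable[[float], float]:
    """Toy trainer: always predicts the mean outcome seen in training."""
    mean = sum(y for _, y in data) / len(data)
    return lambda x: mean


class MonitoredModel:
    """Wraps a model, tracks rolling error, and retrains when it degrades."""

    def __init__(self, train: Trainer, initial_data: List[Example],
                 window: int = 50, max_mae: float = 0.5) -> None:
        self._train = train
        self._history: List[Example] = list(initial_data)
        self._model = train(self._history)
        self._errors: Deque[float] = deque(maxlen=window)
        self._window = window
        self._max_mae = max_mae
        self.retrain_count = 0

    def predict(self, x: float) -> float:
        return self._model(x)

    def record_outcome(self, x: float, actual: float) -> bool:
        """Feed back a real outcome; return True if this triggered a retrain."""
        self._errors.append(abs(self._model(x) - actual))
        self._history.append((x, actual))
        window_full = len(self._errors) == self._window
        mae = sum(self._errors) / len(self._errors)
        if window_full and mae > self._max_mae:
            # Retrain on the most recent window so the model tracks the shift.
            self._model = self._train(self._history[-self._window:])
            self._errors.clear()
            self.retrain_count += 1
            return True
        return False


# The outcome shifts from 10 to 20; the fourth observation triggers a retrain.
monitored = MonitoredModel(train_mean, [(0.0, 10.0)] * 5, window=4, max_mae=2.0)
retrained = [monitored.record_outcome(0.0, 20.0) for _ in range(4)]
print(retrained, monitored.predict(0.0))
```

The point of the sketch is that the retraining decision lives in code that runs on every outcome, rather than in a person's calendar.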

Unless you’ve already executed a large number of data science projects for the enterprise, the challenges of operationalizing can come as a surprise — but they are very real.

Derived data products. We can often overlook the fact that our engineered features are valuable data in and of themselves. As part of model building and engineering, consider deploying this new data as APIs and integrating them into the appropriate data assets rather than letting them remain proprietary. For example, if a data science team engineers a feature that combines customer data, product data, and finance data, deploy the new feature as an API and have the corresponding model consume that new API.
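The example in the paragraph above could look roughly like the following sketch, in which one function plays the role of the feature API and the model consumes it instead of recomputing the join itself. The in-memory tables, the `customer_margin_feature` function, and the `high_value_model` rule are all hypothetical; in practice the feature would be served from your data platform.

```python
from typing import Dict, List

# Hypothetical source data: customer, product (orders), and finance (margins).
CUSTOMERS: Dict[str, Dict[str, str]] = {"c1": {"segment": "retail"}}
ORDERS: Dict[str, List[Dict[str, object]]] = {
    "c1": [{"product": "p1", "units": 3}, {"product": "p2", "units": 1}],
}
PRODUCT_MARGIN: Dict[str, float] = {"p1": 4.0, "p2": 10.0}  # margin per unit


def customer_margin_feature(customer_id: str) -> Dict[str, object]:
    """Derived data product: total margin per customer, with a version tag."""
    orders = ORDERS.get(customer_id, [])
    total = sum(o["units"] * PRODUCT_MARGIN[o["product"]] for o in orders)
    return {
        "customer_id": customer_id,
        "segment": CUSTOMERS[customer_id]["segment"],
        "total_margin": total,
        "feature_version": "v1",
    }


def high_value_model(customer_id: str, threshold: float = 20.0) -> bool:
    """Toy model that consumes the shared feature rather than raw tables."""
    feature = customer_margin_feature(customer_id)
    return feature["total_margin"] >= threshold


print(customer_margin_feature("c1"))
print(high_value_model("c1"))
```

Because the feature has a single owner and a version tag, other teams can reuse it in dashboards or other models without re-deriving it.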

Overcoming culture shock

Among the various reasons that data science becomes a scam at so many enterprises, one reason in particular looms large: cultural resistance. To break through resistance from management, focus on any of their peers who are excited to engage. Once they start applying the data and models in their processes and applications, the advocates may start to outperform the resistors. At some point, managers will ask what they are doing differently, and the resistors may feel pressure to shift their positions. Think of this as leading through shame. The value you demonstrate to managers is often about out-performing their peers by avoiding costs, saving money, or creating net new value.

Individual contributors might resist the shift for a few different reasons. They might be worried they’ll be replaced by the machine or that the people who built it don’t fully understand the process or environment. Both are valid concerns. Buy credibility by being honest and addressing concerns head-on. However, in most cases you won’t actually be automating anyone out of a job, but rather making each job safer or more efficient. Help the team to see this directly. For the concern that the data science team doesn’t really understand what they do, consider pulling one of the hold-outs off the floor and asking them to work directly on the project as a product owner or subject matter expert. That provides other resisters an advocate who is “one of us”. When that team member returns to their regular job, you’ll have an advocate for the current data science approach, as well as an advocate for future implementations and deployments.

Finally, you can overcome the culture shock by raw mass. Identify a use case and build a related hack-a-thon that’s sponsored by senior executives. The hack-a-thon should include basic presentations on machine learning, cloud, and APIs, as well as more advanced presentations and conversations on the same topics. Let the teams work hands-on with the use case and allow individuals across the company to participate, independent of their training and background.

To turn the alchemy of data science into gold, enterprises must align their data science efforts to business outcomes with real and tangible value. They must stop focusing on experimentation and shift their efforts to data science as an integral part of their business models and align these with corporate priorities. If you follow the methodology above, the music will keep on playing, the funding will keep flowing, and data science will not be a scam in your enterprise.

Originally posted here.

About the Author

Seth Dobrin, PhD, is VP & CDO, IBM Analytics. Exponential Change Leader and Life-Long Learner with a proven track record of transforming businesses via data science, information technology, automation and molecular techniques. Highly skilled at leading thru influence across complex corporate organizations as demonstrated by having built, developed and executed corporate wide strategies.


Comment by Bill Luker Jr on July 28, 2018 at 9:48am

No Vince, you didn't read my blog post over at PATimes. And I've got a PhD, too. And I'm not talking about "breeds" of CS-IT Data Sciences, where everything is algorithmically based, and scientific methodology of asking questions first, then doing the analysis, has been thrown out the window based on idiotic articles in Wired. I am talking about all the sciences that use data, and which have all adopted statistical approaches to their work. CS and IT people almost never ask the questions that are demanded of any kind of analytic endeavor--questions like "What do we want to know?" And they do not know how to help their management friends pose and formulate those questions so they are tractable. It's all about data management, dude, not data analysis. Just ask around. Management is hopping mad.  

Comment by Vincent Granville on July 28, 2018 at 9:37am

Bill, you can do data science with or without statistics, be it big or small data. I spent the first 15 years of my career  deeply involved in advanced statistical modeling -- including earning a PhD in computational statistics, doing a postdoc at the Statslab at Cambridge University, and working for NISS. Despite my strong statistical background, I have found over my long corporate career (including productizing algorithms to process trillions of credit card transactions or Internet data) that you can do just as well, if not better, with a data-driven, model-free approach in many cases. This is particularly true for machine-to-machine communications and black-box or automated data science, designing tools that non-statisticians can use, understand, implement, and maintain. In short, favoring simplicity, robustness, and ease of use, over technicalities, arcane modeling, or jargon, that many executives don't understand anyway. You can even compute confidence intervals or perform statistical tests of hypotheses without knowing basic probabilities, random variables, or statistical distributions: see here. Engineers love it.

As for data scientists, I agree that there are many different breeds. I wrote about this in my article Six categories of data scientists, as well as in 16 disciplines compared to data science

Comment by Bill Luker Jr on July 28, 2018 at 9:09am

This cat from IBM goes through 1200 words or so to tell you a way to organize a data analysis shop. He says nothing about what you have to do to get valid analytical results, how to uncover relationships that are meaningful, or what the data science scam is really all about.

For example, I think it's fair to say that most "data scientists" think that “bigness" mitigates whatever may be wrong with data that might bias findings from analytical operations on it. But the problem is that the authors and the rest of the mob at Data Science Central do not recognize that the data science of Computer Science and IT is not the only type of "data science." It is but one type, and there are dozens, if not hundreds. (see Bill Luker, Jr. (2018) “The “Data Science” of Computer Science and IT Is Not the Only Data ....” February 02. https://www.predictiveanalyticsworld.com/patimes/data-science-compu...)

This statement about "bigness" is similar to other claims of CS-IT Data Science “evangelists” (salesmen) such as “no need for statistics (i.e., ‘not invented here’),” “NO THEORY ALLOWED Past This Point,” or “no sampl(ers) need apply.” Pardon the puns, but these are mere slogans for logical and analytical dead-ends. They are not supported (and may not be supportable) with evidence from independent research, i.e., not carried out by vendors. Not IBM, not SAS, or any of the lesser ones. I don’t see anybody doing that.

Big data is and has been less easy to build, manage, and most importantly, analyze, than originally claimed. I think everyone recognizes this. But the reasons are unfortunately not clear to the two groups most likely to be misled by Big Data marketing about the scientific facts of life in the data and predictive analysis world: People who run companies and other organizations  and want answers; and the CS-IT Data Scientists who are often mis-employed in finding them.

The facts are that we simply cannot analyze Big Datasets without the tools and empirically grounded theory from what I call the Statistical Data Sciences (see the reference to my blog post, above)—known everywhere as just plain statistics. And CS-IT Data Science (with perhaps the exception of machine learning, in its main role as automated applied statistics) has backed itself into a blind alley by dismissing statistics. An approach that embraces this dismissal--and it is widespread--is doomed to failure in the industry settings in which it is being applied. A recognition and rejection of these slogans, and adoption of proven Statistical Data Science analytical methods that have been properly adapted to the high dimensionality of Big Data will go further than any organizational methods (please, it is not a "methodology," methodology is the study of the logic of method) proposed in this highly unsatisfactory and narrowly conceived piece. But what else would I expect from a computer "scientist"? You guys need to get with it or this whole thing will come crumbling down upon our heads, and CS-IT Data Science, as currently practiced and thought about, will push the analytics revolution in business and other endeavors back 25 years.  

Comment by Vincent Granville on July 28, 2018 at 7:58am

One role that is overlooked in enterprise for an expert data scientist is that of adviser. There are plenty of "unicorn" data scientists, contrary to popular belief, with considerable knowledge, business acumen, and expertise across multiple domains and industries -- computer science, data engineering, statistics, operations research. Rather than spending millions of dollars to build a team of pure geeks, you might get better results with a part-time adviser, for $20k per year, who will work and help implement solid, proven solutions with your executive team and software engineers already in place in your company. These advisers typically have decades of experience working for all sorts of companies, and can talk two different languages: one that your CEO understands, and one that your software engineers understand, thus being the missing bridge between the decision making process / ROI creation and maintenance, and the coding / implementation / development process. Thus far, I don't know anyone working in this capacity, but I know that I could if I was not busy working on other stuff (managing my own companies, in particular.)

Currently, these "unicorns" are misused or underused in enterprise, usually assigned a role of developer or coder -- sometimes because the job they signed up for is totally different than the one advertised in the first place and discussed during the job interview. This is not an efficient way to do business.
