Home » Uncategorized

Bridging the Gap Between Data Science and DevOps

What’s the real value of data science? Hailed as the sexiest job of the 21st century just a few years ago, there are rumors that it’s not quite proving its worth. Gianmario Spacagna, a data scientist for Barclays bank in London, told Computing magazine at Spark Summit Europe in October 2015 that, in many instances, there’s not enough impact from data science teams – “It’s not a playground. It’s not academic” he’s quoted as saying. His solution sounds simple. We need to build a bridge between data science and DevOps:

If you’re a start-up, the smartest person you want to hire is your DevOps guy, not a data scientist. And you need engineers, machine learning specialists, mathematicians, statisticians, agile experts. You need to cover everything otherwise you have a very hard time to actually create proper applications that bring value.

This idea makes a lot of sense. It’s become clear over the past few years that ‘data’ itself isn’t enough; it might even be distracting for some organizations. Sometimes too much time is spent in spreadsheets and not enough time is spent actually doing stuff. Making decisions, building relationships, building things – that’s where real value comes from.

What Spacagna has identified is ultimately a strategic flaw within how data science is used in many organizations. There’s often too much focus on what data we have and what we can get, rather than who can access it and what they can do with it. If data science isn’t joining the dots, DevOps can help. True, a large part of the problem is strategic, but DevOps engineers can also provide practical solutions by building dashboards and creating APIs. These sort of things immediately give data additional value by making they make it more accessible and, put simply, more usable. Even for a modest medium sized business, data scientists and analysts will have minimal impact if they are not successfully integrated into the wider culture.

While it’s true that many organizations still struggle with this, Airbnb demonstrate how to do it incredibly effectively. Take a look at their Airbnb Engineering and Data Science publication on Medium. In this post, they talk about the importance of scaling knowledge effectively. Although they don’t specifically refer to DevOps, it’s clear that DevOps thinking has informed their approach. In the products they’ve built to scale knowledge, for example, the team demonstrate a very real concern for accessibility and efficiency. What they build is created so people can do exactly what they want and get what they need from data. It’s a form of strict discipline that is underpinned by a desire for greater freedom.

If you keep reading Airbnb’s publication, another aspect of ‘DevOps thinking’ emerges: a relentless focus on customer experience. By this, I don’t simply mean that the work done by the Airbnb engineers is specifically informed by a desire to improve customer experiences; that’s obvious. Instead, it’s the sense that tools through which internal collaboration and decision making take place should actually be similar to a customer experience. They need to be elegant, engaging, and intuitive. This doesn’t mean seeing every relationship as purely transactional, based on some perverse logic of self-interest, but rather having a deeper respect for how people interact and share ideas. If DevOps is an agile methodology that bridges the gap between development and operations, it can also help to bridge the gap between data and operations.

This isn’t a new idea. As much as I’d like to, I can’t claim credit for inventing ‘DataOps’. But there’s not really much point in asserting that distinction. DataOps is simply another buzzword for the managerial class. And while some buzzwords have value, I’m not so sure that we need another one. More importantly, why create another gap between Data and Development? That gap doesn’t make sense in the world we’re building with software today. Even for web developers and designers, the products they are creating are so driven by data that separating the data from the dev is absurd.

Perhaps then, it’s not enough to just ask more from our data science as Gianmario Spacagna does. DevOps offers a solution, but we’re going to miss out on the bigger picture if we start asking for more DevOps engineers and some space for them to sit next to the data team. We also need to ask how data science can inform DevOps too. It’s about opening up a dialogue between these different elements. While DevOps evangelists might argue that DevOps has already started that, the way forward is to push for more dialogue, more integration and more collaboration.

As we look towards the future, with the API economy becoming more and more important to the success of both startups and huge corporations, the relationships between all these different areas are going to become more and more complex. If we want to build better and build smarter we’re going to have to talk more. DevOps is a good place to start the conversation, but it’s important to remember it’s just the start.