Data Ops: What it is, and why you should care

Data Ops brings together principles of Agile, DevOps and Lean Manufacturing, but it isn't right for every situation, argues James Lupton, CTO of Cynozure

You've heard of DevOps, DevSecOps, NoOps and potentially AIOps, but Data Ops is unlikely to be on the radar of many IT leaders just yet.

Data Ops brings together principles of Agile, DevOps and Lean Manufacturing, and is designed to help organisations get more value from their data, and faster to boot.

That makes it potentially of huge value, given the emphasis organisations place on data today, much as some dislike the overused phrase 'data is the new oil'.

Computing spoke to James Lupton, CTO at data and analytics consultancy Cynozure, to find out more.

CTG: What is Data Ops and why should IT leaders care about it?

James Lupton: DataOps brings together principles of Agile, DevOps and Lean Manufacturing, with an aim to help organisations rapidly produce insight, turn that insight into operational tools, and continuously improve analytic operations and performance.

It emphasises communication, collaboration, integration, automation, measurement and cooperation between data scientists, analysts, data/ETL engineers, information technology (IT), and quality assurance/governance. Successfully implemented, it will increase the velocity, reliability, and quality of data analytics.

These core principles are all looked at through a data lens though, as working with data presents a whole load of new challenges compared to traditional software engineering due to the fluid nature of data.

Much like Agile, DataOps has its own manifesto: https://www.dataopsmanifesto.org.

How is it different from DevOps?

One of the challenges when comparing these topics is that definitions vary, some more tightly scoped and some more broadly. In many ways, the core mission of the two concepts is the same - to drive quality and become more efficient at delivering new 'products'.

As I've already suggested, data presents a number of challenges that software engineering, and by extension DevOps, don't have to deal with. That could be the variety of skills and roles involved, such as analysts and data scientists who aren't as comfortable with hardcore software engineering and coding, or the mercurial nature of data and the testing challenges it drives, such as managing large volumes or securing data across environments.

What sorts of problems is Data Ops designed to solve?

DataOps is all about getting more value from your data, faster. It looks to achieve this through a number of mechanisms, but the two primary ones are 1) reducing the time it takes to prototype and release new products and 2) improving the quality of delivered work through automation.

This is particularly relevant for large teams of engineers, analysts, support staff and so on, who need to coordinate with one another and who would otherwise struggle to get access to the right data and tools to do their work.

One of the chief complaints we hear from analytics teams that DataOps can solve is getting access to a sandbox environment with the right tools and data they need for the analysis they are doing. Often the solution ends up being a new silo in a cloud somewhere that has none of the right controls in place. DataOps looks to automate and standardise the process for requesting and deploying these environments to make doing the right thing the easiest thing.
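Standardising that request process is easier to picture with a concrete sketch. The snippet below is purely illustrative, not part of any specific DataOps product: the tool whitelist, dataset names and 30-day expiry policy are all invented assumptions, but they show how "the right controls" can be encoded so an approved request can be provisioned automatically.

```python
# Hypothetical sketch: a standardised sandbox request, validated against
# governance policy before provisioning, so controls apply by default.
from dataclasses import dataclass

APPROVED_TOOLS = {"jupyter", "dbt", "spark"}      # assumed org whitelist
APPROVED_DATASETS = {"sales_masked", "web_logs"}  # assumed governed datasets

@dataclass
class SandboxRequest:
    requester: str
    tools: set
    datasets: set
    ttl_days: int  # environments expire instead of becoming silos

def validate(req: SandboxRequest) -> list:
    """Return a list of policy violations; an empty list means safe to provision."""
    issues = []
    if not req.tools <= APPROVED_TOOLS:
        issues.append(f"unapproved tools: {req.tools - APPROVED_TOOLS}")
    if not req.datasets <= APPROVED_DATASETS:
        issues.append(f"ungoverned datasets: {req.datasets - APPROVED_DATASETS}")
    if req.ttl_days > 30:
        issues.append("sandboxes must expire within 30 days")
    return issues

req = SandboxRequest("analyst1", {"jupyter"}, {"sales_masked"}, ttl_days=14)
print(validate(req))  # an empty list means the environment can be auto-provisioned
```

In practice the "provision" step would hand off to infrastructure-as-code tooling; the point of the sketch is that a single governed request path makes doing the right thing the easiest thing.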

When might Data Ops not be appropriate?

A full-fledged DataOps process covering the entire tool chain from environment management, governance and test automation through to continuous integration and deployment, orchestration and monitoring isn't going to suit everyone.

This could be a particularly costly exercise for smaller organisations, who stand to benefit less from the efficiency savings, making the investment in tools particularly hard to swallow. DataOps doesn't have to be implemented in an all-or-nothing way though, and different parts can be picked and customised to suit an organisation's most pressing needs.

How might IT leaders build a business case for Data Ops?

There are two ways that you can approach this. As with most data initiatives, the best place to start is with use cases. This means having a clear view of the use cases you plan to deliver and the value they will add to your organisation.

If you're able to demonstrate that instead of delivering three use cases a year you can deliver five due to improved efficiency, you're able to talk about the value add those additional use cases will drive for the organisation. This all comes back to having a clear data strategy and good understanding of where value is hiding in your data.

Alternatively, if you're a huge organisation with a big head count in data, then you can look at the efficiencies and savings you can drive through a well implemented DataOps approach. A recent Gartner study showed that the average loss organisations incur due to poor data quality is $15m.

That figure can be significantly reduced through DataOps tooling and processes such as test automation, governance and pipeline monitoring.

The evidence from the DevOps world is also fairly compelling, showing that well implemented DevOps processes can save a lot of money. For example, it costs on average five times as much to fix a bug in production as it does to catch it earlier in the development cycle, and the average cost of software defects to a $1bn organisation is $6m annually.

How should Data Ops projects/initiatives start?

When rolling out a DataOps initiative you'll need to start with people and process. It's easy to want to jump straight into technology, but if you're not clear about how things are going to run, what role people are going to play and what tools people need access to, you're going to run into problems quickly.

When it comes to technology, if it isn't already in place then I'd recommend getting a tool like Jira properly set up, so you can really get to grips with what's being worked on and how long things are taking.

The key here is to identify bottlenecks. If most projects take six months to deliver and three of those are spent waiting to get access to the right environment then you know environment management and provisioning is where you should focus to reap the maximum benefit as soon as possible. Likewise, if development work gets done quickly but it takes a long time to release to production then a continuous deployment technology may be where you should be focusing initially.
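The bottleneck analysis described above can be sketched from workflow-tracker data. The tickets, stage names and durations below are hypothetical (not a real Jira export or API), but they show the basic idea: average the time spent in each stage and focus on the worst offender.

```python
# A minimal sketch of bottleneck analysis over ticket stage durations.
from collections import defaultdict

tickets = [  # days each (invented) project ticket spent in each stage
    {"id": "DATA-1", "stages": {"dev": 10, "env_provisioning": 45, "release": 5}},
    {"id": "DATA-2", "stages": {"dev": 12, "env_provisioning": 60, "release": 8}},
    {"id": "DATA-3", "stages": {"dev": 9,  "env_provisioning": 50, "release": 4}},
]

def slowest_stage(tickets):
    """Average the days spent per stage and return the worst, plus all averages."""
    totals = defaultdict(float)
    for t in tickets:
        for stage, days in t["stages"].items():
            totals[stage] += days
    averages = {s: d / len(tickets) for s, d in totals.items()}
    return max(averages, key=averages.get), averages

stage, averages = slowest_stage(tickets)
print(stage)  # 'env_provisioning' -> focus automation effort there first
```

With real tracker data the stages would come from your own workflow, but the decision logic is the same: invest first where the waiting is.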

What are the challenges or potential pitfalls of a Data Ops approach? Are cultural challenges common?

The pitfalls on the technical side are easy to state, but perhaps less easy to avoid. The best thing I can recommend here is to look at the entire toolchain you will need and decide whether you want a single provider (or as few as possible) for that tooling, or whether you're picking best-in-class components.

Either way, it's crucial to ensure that all components integrate with the other elements of the DataOps toolchain as well as with any core underlying data platform technologies. A piecemeal approach to your architecture means you may find incompatibilities down the line.

The one thing most likely to challenge an organisation culturally is the cross functional nature of the teams. There's a good chance that your business operates with different capabilities such as data engineering, analytics and testing as separate teams.

DataOps brings together a range of skills from across functions and focuses them on delivering something of value. This new cross-functional way of working can be a major departure from the norm for many organisations, and coupled with agile principles and increased automation it can signify a significant upset to the status quo.

What are the benefits of implementing a Data Ops culture?

As DataOps is all about delivering value faster and more reliably, it can help change the conversation around data projects. Instead of focusing on deploying some technology or getting some data into a data lake, people will start to look at what value a specific deliverable adds to the business. This change in thinking, not just within the data teams but across the business as a whole, can help bring the best ideas to the forefront, support prioritisation and ultimately help data deliver more value to the organisation.

What's the future for Data Ops in terms of its adoption, and any evolution you see in its approach and implementation?

It's still early days for DataOps, but in 2020 I'm expecting to see a significant increase in its adoption as firms look to maximise the data investments they have made over the last few years.

While the process side of DataOps draws on some fairly established methodology, I expect the technology that enables it to change considerably. On the one hand, elements of it will more commonly come as built-in features of platforms. On the other, this will be complemented by new offerings coming to market that focus on specific issues - test automation for data, for example, is an area ripe for innovation.

The wider availability of these tools will expose more organisations to the concepts of DataOps, and the associated skillsets will become more widely expected of data professionals.

DataOps' big advantage is that it keeps its focus squarely on value, and that's going to make it an easier concept for businesses to grasp and ultimately adopt as the default way of working for data teams.