The next stage of DevOps: Bringing the data along for the ride

Google, database and visualisation luminaries discuss the next stages in data democratisation

According to the DevOps creation myth, early Devs used to haul their code to a mythical wall. They would then throw it over and wait for the Ops folks to get it running, which could take many moons. These delays (which truth be told were often down to indolence and sloth on the part of the Devs rather than any fault of the Ops) were the cause of much wailing and gnashing of teeth and led to bad blood between the two groups, which in turn caused the code to be late and of poor quality. The Customer looked upon it and was displeased. But having been shown the error of their ways, the warring tribes came together in a spirit of collaboration and lo, the code was good.

Many working at the cutting edge of large-scale cloud native software and machine learning believe a similar intervention is required with data. And it's not just at the cutting edge where access to data needs to be rethought. The movement to democratise data access beyond the domain of technologists has barely got started. Moribund pipelines now stand in the way of progress the way snowflake servers did in the pre-DevOpsian era.

The need to free data from its current constraints via some kind of DataOps was a key topic of conversation during a DataStax-hosted video call last week.

Everyone should be part data scientist

Melody Meckfessel, formerly a data engineer at Google and now CEO at Observable, the team behind the D3.js open-source data visualisation tool, comes at the problem from a right-brain point of view. Tapping into the human visual system helps us think more effectively, she believes.

"I want everyone to be part data scientist," she said. "We don't want any barriers around exploring the data and getting to the insight that we need.

"Bringing the data scientists, and the developers, and the data analysts, and the business analyst, and the financial analyst and the hobbyist who's working on a project together to be able to collaborate and share, I think that's the journey for the next five to 10 years."

Currently, there's something of a data scientist - DevOps divide that mirrors the old Dev vs Ops schism, Meckfessel said.

"Data scientists use their own toolset, working in their own environment, and when they have something interesting, they throw it over the wall to the developer team to turn it into something that's interactive and can reach more folks within the organisation or out in the world."

The ideal environment for idea sharing and collaboration around code is visual rather than text-based, she said, requiring no compilation and thus allowing for rapid iteration in support of more open experimentation.

"We have the potential to reach many more folks who don't consider themselves developers, right? Because now they have accessibility to code because they're starting from an example, and they can tweak it and learn much faster."

A million small databases

Google's Eric Brewer, VP of infrastructure and Google fellow, was instrumental in the development of Kubernetes. He tackles the issue of data liberation in terms of pipelines and metadata.

"I think there's plenty more to do on how do you do state management in something like Kubernetes," he said.

"The models we have are not quite right. So for example, within Google there are storage teams, and if you're not on a storage team, you don't get to do state management. That's very hard to do."

While this setup has, for now, increased Google's developer productivity, since developers don't need to worry about the storage layer, it doesn't fit with Brewer's vision of the future, in which individuals will be able to 'fork' data in the same way that they can fork parts of a codebase before eventually merging their work back into a central repository. For this vision to be fulfilled there will need to be a lot more automation, he said, and that will require much more metadata concerning data usage, security, units of measurement, axes and provenance.

"If you want a million small databases, that's actually a hard problem today. We need lots of databases to give the autonomy to teams, even if they're built out of fractions of larger databases. But we don't quite have the right tools for it, and we definitely don't have the right metadata for it," Brewer explained.

"So, if you want to collaborate and pick up data from someone else, you need to know a lot about that data if you want to use any automation at all. It doesn't need to have a full database schema, but it needs to have enough metadata that automation is possible. We're missing that layer today."

Cloud native is really fundamentally transformative

For Sam Ramji, chief strategy officer at DataStax, databases are going to have to change further. Most databases are still designed to be a single repository sitting on a single machine or multiple machines, whereas the future will be much more heterogeneous: an ever-changing array of loosely coupled sources. Cloud native means breaking monoliths into component parts and dealing with messaging that is fundamentally asynchronous.

"The move to cloud native is really fundamentally transformative," he said.

"Cassandra used to have to take care of all the lower layers by itself, so of course it did it in a slightly monolithic and quirky way. But being able to embrace that idea of, of small pieces loosely joined, letting go, kind of exhaling and trusting Kubernetes to take on a bunch of that also lets us improve the lives of operators," he said.

Ramji continued: "There are a lot of people who are working on cloud native data. I don't think any of us has the right solution."

The next stage, he added, will be enabling data to follow applications as they become almost infinitely scalable.

"In the 2010s when Melody and Eric and others built a compute infrastructure that could scale to billions of nodes, right, billions of workloads. This next piece is where we bring data along for that ride. That's what gets my propeller hat spinning these days."