What's happening with open-source databases?

Cloud has changed the rules of engagement, says Aiven CEO Oskari Saarenmaa

After operating systems and web servers, the area where open-source software made a real early impact was databases. What started with Postgres and MySQL continued with MongoDB, Couchbase and Redis and this has carried on into the data streaming age with technologies like Kafka, Spark and Flink.

Databases and streaming are also where open-source licensing has become controversial with MongoDB, Redis Labs and Confluent adding restrictive clauses to some parts of their products to prevent their IP being monetised by cloud vendors.

Helsinki-headquartered Aiven is a managed service provider that hosts and manages exclusively open-source databases, streaming and messaging systems. As such, CEO Oskari Saarenmaa is well positioned to spot developments in the open-source data management landscape.

Computing: Have you seen much impact from the licensing changes at companies like MongoDB?

Saarenmaa: So, there's still some friction between especially AWS, and maybe, to a lesser extent, some of the other hyperscalers and some companies that were built around open source technology. But what we can see in the market and what we hear about more is that cloud has won. It's the de facto deployment platform for new greenfield applications.

Companies are using cloud and they also increasingly want to use open source. From our perspective, what these licensing changes have done is maybe actually increase the adoption of our services which are really fully open source.

We don't host MongoDB at Aiven. It doesn't really fit our criteria as it was always controlled by a single company. Anyway, they changed their licence from AGPL and that would probably prohibit companies like ours from using it.

There have been some issues with the licensing of some of the technology like Elasticsearch. Elastic has sued Search Guard [which provided that technology to AWS]. So companies like Elastic and MongoDB and the others are very much following through with those process changes and these companies [that resell their products] are possibly violating their licence terms.

Why are your customers looking specifically at open-source for data management?

They want to make sure that the systems that hold their data are open source and can be deployed in the cloud or on-prem if needed. I recently talked to a number of companies who are working on an open-source-only basis because they need to be able to run their software services not just in AWS or Google Cloud, but also in private data centres and in regions and countries where there are no cloud providers to help them manage these systems. If they went for proprietary services they would have no chance of deploying into these countries and regions where AWS is not available.

We hear a lot about companies using Kubernetes now to deploy applications. How is that affecting your model?

We don't use Kubernetes in any way right now, but we do have an increasing number of customers who do. They deploy stateless applications to run their business logic on Kubernetes into many different kinds of cloud and physical hardware environments. They use the same tools like Terraform for setting up both their applications and their managed database and data streaming services like ours. So, we've put in a bit of work and effort in the last year into improving the integration that we have with tools like Terraform, and making sure that our APIs are very flexible and open to accommodate the different kinds of automation tools and DevOps tools that companies are now adopting.

It's still rather difficult to operate your stateful production data systems on top of Kubernetes, so organisations are relying on specialised providers such as us or AWS managed services like Kinesis or RDS to hold onto the actual stateful data, with the applications running on Kubernetes containers.

Since you work across cloud vendors, are you seeing much of a shift to multi-cloud?

We do have customers that operate on top of AWS and GCP and Azure - or maybe two out of those three. But many companies that are just starting out, I don't think that's the way they look to set up their initial environments.

Is there any demand for graph databases?

It's a topic that comes up quite frequently, but I don't think we've seen the frequency increase over the years so we don't have immediate plans for including a graph database in our portfolio. I think they are very useful for some use cases, but they still seem to be just behind the mainstream adoption curve.

One characteristic of open source is the breadth of choice. There may be hundreds of tools for any particular job. How do you decide which databases or streaming services to support?

The way we look at it is what gaps do we have in our current portfolio, what are the new use cases and those that are not so well served and what are the complex use cases that are difficult to manage and set up, where you typically want help?

So, we look at big scalable systems that are a bit hairy and complicated to set up. And maybe they're just emerging and coming to market, similar to how we saw Kafka five years ago.

We are adding a new time-series database called M3DB to our platform, but it's not really about including more databases or streaming tools but making it easier for users to integrate these technologies into their products.