Picking winners in open source. An interview with DataStax CEO Chet Kapoor
Kapoor talks about the decisions made as the company built its open source stack
DataStax began life in 2010 as a proprietary fork of the NoSQL database Apache Cassandra. Since then the company has changed direction, mending bridges with the core Cassandra developers, contributing to the last major release, and building out a full stack that includes a streaming platform, a DBaaS and an API gateway, focussed on supporting real-time applications.
Chet Kapoor has been CEO for two years, arriving at the company from Google after it acquired Apigee, the API management company that he founded in 2004.
Arriving at a time of change, Kapoor insists that DataStax has the potential to be a $1 billion company in revenue terms, so long as it remains focussed on developers' needs and on adding tangible value for its customers. This means picking winners when building the stack. Open source software changes rapidly and in some ways tech companies are spoiled for choice, but choosing wisely is essential over the long term, because the stack is the bedrock and changing its elements is hard to do.
With its roots in Apache Cassandra, the database element of the stack is set in stone. This imposes a certain baseline, in that organisations don't change enterprise databases on a whim: once a decade or more, according to Kapoor. "Databases are sticky."
So the company is all in on Cassandra, but what of the other components? Technology companies need to take risks with new developments, but at the same time enterprise customers expect their choices to be conservative; they don't want to run their business on immature software.
And as ever, timing is everything, and sometimes the emergence of a dominant technology can change the equation. There's an art to picking winners.
When I came in we had zero revenue on cloud. I mean zero
"We were late to the cloud. When I came in we had zero revenue on cloud. I mean zero," Kapoor said. By contrast, competitor MongoDB launched its managed service Atlas in 2016. However, he continued, being first carries its own risks.
"It was not clear to MongoDB that they had to be Kubernetes-based, but it was very clear to us, and so we had the advantage of building serverless and cloud-native Kubernetes infrastructure from day one."
Like Cassandra, Kubernetes is now embedded in DataStax's offerings, being the basis of both its managed cloud service Astra and a Kubernetes-based distribution of Cassandra called K8ssandra (pronounced Kate-Sandra).
Going all-in on Kubernetes carries a low risk, he added. Even if an alternative were to emerge tomorrow, it would take years to reach the maturity of the incumbent, leaving ample time for refactoring.
"I think our bet on Kubernetes is a very safe bet. Enterprises don't change as fast as software companies do."
Kafka worked out 50 per cent more expensive than Pulsar
Other choices might seem a little riskier, though.
"About 18 months ago, we said 'we have to be in the streaming business'," said Kapoor. "There's a lot of people that want to do data in motion for messaging as well as for data science. How do we go off and do that?"
The result, called CDC (Change Data Capture) for Astra, is built on Apache Pulsar, a cloud-native, distributed messaging and streaming platform, rather than the industry-standard Apache Kafka.
"We looked at Kafka in great detail. Remember, as a software company, you pick an architecture for a decade, right? We found Kafka had a great ecosystem, for sure. But there were two problems. One was from a resource, from a TCO perspective: it was really expensive, right, because it's not efficient with resources. In fact it worked out 50 per cent more expensive than Pulsar."
The second issue is tenancy. While the newer Pulsar lacks Kafka's substantial ecosystem, a point in its favour was the fact that it is multi-tenant, making it worth the punt, Kapoor went on.
"Serverless is one of our design principles, and it was very easy for us to look at Pulsar and say we can make queues and go up and down very easily with Pulsar and make it really cloud native."
Kafka is still supported via the API should customers want to use it, he added, so no bridges are being burned.
The Stargate API gateway is another key element of the stack, sitting between the application and the database. This was developed in-house by DataStax to allow access to the various features of Cassandra. In some ways it de-risks the database for enterprises, since Cassandra is acknowledged to have a steep learning curve. Thus, the API had to focus on developer experience, Kapoor said, speaking about Astra.
"We knew we couldn't just offer Cassandra as a service, we had to leapfrog everything because we're building something for the next decade.
"Cassandra is not easy to manage. Well, you solve that if you provide a service. But then you have to make it easy to develop too. Because at the end of the day, the puck has moved and developers are making the database choices, not the operators, and so you have to meet them where they are, make sure it's REST, it's GraphQL, you've got JSON, you've got gRPC and everything else."
As the environment becomes more hybridised, making the right component bets is especially important for a tech company if it is to be able to support different use cases, as it must. It's not as if everything is moving to the cloud, Kapoor concluded.
"There's a bunch of people on premise whose on-premise stuff is growing. They are lifting and shifting, but also they are modernising as they go on this journey to the cloud. They take steps and hops as they go there. It's not zero to 60 and in two days you're on the cloud."