All change: How MariaDB handles transactional processing and analytics on one database platform

Computing talks to MariaDB's Shane Johnson about some of the factors driving change in the database market, and the adoption of hybrid database technology that can do much more than conventional database products.

Q. Database technology used to be a settled argument. What's happened to re-open the debate?

The status quo is once again being disrupted by a convergence of changes. As companies undergo digital transformation and shift to online customer engagement, they face several challenges.

First, they need to shift budget to development and architecture. So they're looking to cut their costs and reliance on proprietary software, and use the savings to innovate where it matters most - applications.

Second, they need to become cloud native.

And third, after having embraced agile development, organisations are trying to achieve operational agility, for example by using Kubernetes and containers to deploy microservices. When companies try to do this with Oracle, Microsoft or IBM, it doesn't take long to realise they're trying to fit a square peg into a round hole. It just doesn't work.

Q. With the MariaDB Platform, there are two datastores to handle different workloads. What does this enable organisations to do that could not be done with two separate database products, such as MariaDB TX and MariaDB AX?

MariaDB Platform introduces two hybrid transactional/analytical processing (HTAP) capabilities. The first is change-data-capture. All inserts, updates and deletes on row storage are automatically applied to columnar storage to support analytics on near real-time data.

The second is query routing. When a query is sent to the MariaDB Platform, it can be routed to row storage or columnar storage depending on whether it is a transactional query or an analytical query. Together, these two capabilities hide the complexity of using two separate database products from application developers, and simplify administration for operational teams.
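To make the routing idea concrete, here is a minimal sketch in Python. It is not how MariaDB Platform decides where a query goes; the `route_query` function and its keyword heuristic are assumptions made purely for illustration. It sends writes and simple lookups to row storage, and queries that look like aggregations to columnar storage.

```python
import re

# Hypothetical heuristic: a SELECT that uses GROUP BY or an aggregate
# function is treated as analytical. A real router would use far more
# information than this pattern.
ANALYTICAL_PATTERN = re.compile(
    r"\bgroup\s+by\b|\b(?:sum|avg|count|min|max)\s*\(",
    re.IGNORECASE,
)


def route_query(sql: str) -> str:
    """Return 'columnar' for analytical-looking queries, otherwise 'row'."""
    statement = sql.strip()
    # Inserts, updates and deletes always go to row storage; change-data-capture
    # is what carries them over to columnar storage afterwards.
    if not statement.lower().startswith("select"):
        return "row"
    if ANALYTICAL_PATTERN.search(statement):
        return "columnar"
    return "row"
```

The application only ever calls one entry point; whether the statement lands on row or columnar storage is decided behind it, which is the abstraction described above.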

Q. What does Change Data Capture (CDC) bring to the platform? What data is synchronized and how?

CDC ensures that all transactional data, including all inserts, updates and deletes, is available for analytics with minimal delay (often a matter of seconds). The DBA can specify what data is replicated to columnar storage for analytics.

In many cases, it will not be practical to replicate all tables, in particular tables with a small number of rows. The DBA can therefore decide which tables should be replicated to columnar storage for analytics and which should not.

With CDC, organisations no longer have to use expensive third-party tools to move data from a transactional environment to an analytical one, and administrators don't have to worry about the complexity and reliability involved when it comes to creating and maintaining in-house batch ETL processes.
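As a rough illustration of the idea, the Python sketch below applies a stream of change events from row storage to a toy column-oriented copy, skipping tables that have not been selected for replication. The event format, the `ColumnarStore` class and the `REPLICATED_TABLES` set are all hypothetical; none of them reflects MariaDB's actual CDC implementation or configuration.

```python
from collections import defaultdict

# Hypothetical: the set of tables the DBA has chosen to replicate.
REPLICATED_TABLES = {"orders"}


class ColumnarStore:
    """Toy columnar copy: each column is a dict of {primary_key: value}."""

    def __init__(self):
        # table name -> column name -> {primary key: value}
        self.columns = defaultdict(lambda: defaultdict(dict))

    def apply(self, event):
        """Apply one change event; return False if the table is not replicated."""
        table, op, pk = event["table"], event["op"], event["pk"]
        if table not in REPLICATED_TABLES:
            return False  # not selected for analytics, so ignore the change
        if op in ("insert", "update"):
            for column, value in event.get("row", {}).items():
                self.columns[table][column][pk] = value
        elif op == "delete":
            for values in self.columns[table].values():
                values.pop(pk, None)
        else:
            raise ValueError(f"unknown operation: {op}")
        return True

    def column_sum(self, table, column):
        """Aggregate by scanning a single column rather than whole rows."""
        return sum(self.columns[table][column].values())
```

Feeding the store an insert, an update and a delete for `orders` keeps `column_sum("orders", "total")` in step with the row data, while events for any other table are dropped.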

Q. Does HTAP require a single data store running in-memory?

It does NOT require in-memory storage. The benefit of the MariaDB Platform is that both row storage (for transactions) and columnar storage (for analytics) are kept on disk. This is both efficient and cost-effective.

Note, memory will be used where possible. The columnar storage engine will cache as much data in memory as possible, but it is not required. If there is enough memory to hold the entire data set, it will be stored both on disk and in memory.

Q. If there is no cache tier, doesn't this present problems in terms of data synchronization between the two instances?

No. All data is asynchronously replicated from row storage to columnar storage in much the same way as data from a primary database is replicated to one or more secondary databases for high availability and/or read scaling. There is no need for a caching tier because the columnar storage engine will cache as much data in memory as possible.

Q. What kind of use cases have you seen of the MariaDB Platform?

Any use case requiring more historical data and more powerful analytics. Many transactional databases can only store a few months of data. This is particularly true for databases storing columnar data in memory - there simply isn't enough memory to store years of data. MariaDB Platform enables businesses to make years' worth of data available to customer-facing applications for real-time analytics. The use can be indirect (for example, to provide real-time recommendations) or direct (providing self-service analytics to businesses in the case of SaaS or B2B). To review specific sample use cases for MariaDB Platform, read our sample Platform X3 implementation for transactional and analytical workloads.

Q. Finally, hybridization. In the case of on-premises transactional clusters and analytics clusters in the cloud, how does CDC handle the synchronization required between the two?

It is no different from a fully on-premises deployment. The data written on-premises (transactions on row storage) is asynchronously replicated to the cloud (analytics on columnar storage). As far as the MariaDB Platform is concerned, it does not matter where the transactional and/or analytical instances run.

Shane Johnson is senior director, product marketing, MariaDB Corporation. To contact Shane, or for more details, please contact MariaDB directly.