Google shoots for "limitless" data with BigLake and new alliance
Google has announced the preview of a new storage engine aimed at making it easier for enterprises to analyse the information in their data lakes and data warehouses, without having to worry about the underlying storage format or systems.
Known as BigLake, Google touts the product as a unified storage engine designed to provide a single interface and uniform fine-grained access control across any storage layer, whether multi-cloud, data warehouses or data lakes, regardless of the underlying format.
The open source architecture is meant to integrate all of Google's massive data storage resources with a variety of other sources, which may be in competing clouds or on a customer's own servers.
Google Cloud data analytics product manager Sudhir Hasbe said BigLake will be "vital" in bringing disparate worlds together.
BigLake eliminates the need to "copy the data, move the data across your object stores, like in Google Cloud Storage, S3, or Azure in a multi-cloud environment," he said, meaning users will be able to access all of their data from a single place.
BigLake supports all open file formats, such as Parquet, as well as open source processing engines like Apache Spark or Beam, and numerous table formats, including Iceberg and Delta.
The engine extends the capabilities of Google's 11-year-old BigQuery to data lakes on Google Cloud Storage.
BigQuery is a Google Cloud-managed, serverless, multicloud data warehouse customers can use to perform real-time analyses on massive volumes of data: it processes over 110 terabytes of client data per second, on average.
"BigLake extends BigQuery's fine-grained row- and column-level security to tables on data resident object stores such as Amazon S3, Azure Data Lake Storage Gen2, and Google Cloud Storage. BigLake decouples access to the table from the underlying cloud storage data through access delegation," said Google.
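Conceptually, this kind of fine-grained policy sits between the query engine and the raw storage: rows are filtered by predicate and columns are masked before results ever reach the user. The sketch below is purely illustrative and is not BigLake's API; every name in it is hypothetical.

```python
# Illustrative sketch only: a toy model of row- and column-level
# security of the kind BigLake applies to object-store tables.
# All names here are hypothetical, not part of any Google API.

def apply_policy(rows, allowed_columns, row_predicate):
    """Return rows passing the predicate, restricted to allowed columns."""
    visible = []
    for row in rows:
        if row_predicate(row):  # row-level security: drop disallowed rows
            # column-level security: project only permitted fields
            visible.append({k: v for k, v in row.items() if k in allowed_columns})
    return visible

table = [
    {"customer": "acme", "region": "EU", "revenue": 100},
    {"customer": "globex", "region": "US", "revenue": 250},
]

# An analyst cleared only for EU rows and barred from the revenue column:
print(apply_policy(table, {"customer", "region"}, lambda r: r["region"] == "EU"))
# → [{'customer': 'acme', 'region': 'EU'}]
```

The point of access delegation is that the engine, not the end user, holds the credentials to the object store, so policies like this are enforced centrally rather than per storage bucket.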
BigLake will sit at the heart of Google Cloud's data platform strategy, Hasbe said, with the cloud provider ensuring that all of its tools and capabilities integrate with it.
"When you think about limitless data, it is time that we end the artificial separation between managed warehouses and data lakes," said Gerrit Kazmaier, Google Cloud's vice president and general manager for database, data analytics and Looker.
"Google is doing this in a unique way."
Also this week, Google announced the Data Cloud Alliance, which it has founded alongside a group of partners to enhance data portability. The members will provide infrastructure, APIs and integration support to make it easier to move data, and to access multiple platforms and products, across multiple environments.
Members include Google, Confluent, Databricks, Dataiku, Deloitte, Elastic, Fivetran, MongoDB, Neo4j, Redis, and Starburst.
At the same time, the company unveiled Vertex AI Workbench, a web interface aimed at making it easier for 'the average user' to explore their data using AI, machine learning and statistics. Workbench has been built to work with BigQuery, Serverless Spark and Dataproc, as well as a range of other AI and data solutions.
Google is also offering a mechanism for people to exchange machine learning models through the Vertex AI Model Registry.
The Model Registry is currently in preview, and is meant to be a central repository for finding, utilising, and governing machine learning models, including those stored in BigQuery ML.