I have been saying for several years that success with streaming data requires enterprises to manage data in motion alongside data at rest, rather than treating streaming as a niche activity. Software providers have also been moving in this direction. Many established data management providers have added the ability to manage, store and process streaming data alongside their existing batch data processing capabilities. At the same time, providers closely associated with streaming data, such as Confluent, have increased their support for batch processing, with the aim of delivering a holistic view of all data across an enterprise.
Confluent was founded in 2014 by the creators of the open-source Apache Kafka distributed event streaming platform.
While Confluent is still best known in relation to Apache Kafka, the company has expanded its capabilities well beyond messaging and event processing.
Confluent also recently announced the general availability of Tableflow, which automatically materializes Apache Kafka topics and schemas as Parquet files to be persisted in a data warehouse, data lake or cloud storage using open table formats. Support for Apache Iceberg is generally available, while the company also announced early access support for the Delta Lake format. The combination of batch and stream processing was also the cornerstone of the company’s announcements at its Current London event in May, including early access to a new feature called Snapshot Queries in Confluent Cloud for Apache Flink. Snapshot Queries automatically bound data sets to provide batch-style processing, enabling the use of Flink SQL to perform unified stream and batch processing. The ability to query both historical and real-time data extends to data stored in Apache Iceberg and Delta table formats via Tableflow. Also new in Confluent Cloud for Apache Flink is support for private networking on AWS and Microsoft Azure, as well as IP Filtering for Flink and Schema Registry for additional security.
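To illustrate the underlying concept of bounded versus unbounded execution, the sketch below uses standard open-source Flink SQL, where the same query can run in streaming or batch runtime mode. This is not Confluent's Snapshot Queries syntax, which may differ, and the `orders` table and its columns are hypothetical:

```sql
-- Illustrative only: standard open-source Flink SQL, not Confluent's
-- Snapshot Queries syntax. The 'orders' table is a hypothetical example.

-- Unbounded, streaming execution (the Flink SQL default): the query
-- runs continuously and emits updated results as new events arrive.
SET 'execution.runtime-mode' = 'streaming';
SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id;

-- Bounded, batch-style execution over the same table: the query
-- processes the data set as a finite snapshot and then terminates.
SET 'execution.runtime-mode' = 'batch';
SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id;
```

The appeal of the unified approach is that the same Flink SQL statement serves both cases; only the execution mode changes.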
While Confluent Cloud is the company’s flagship offering, Confluent Platform remains an important part of the portfolio for self-managed deployment on-premises and in the cloud. Version 8.0 of Confluent Platform was released in June and is based on version 4.0 of Apache Kafka. Both are significant updates as they remove the previous dependency on the Apache ZooKeeper distributed coordination project. While ZooKeeper was historically important for enabling the configuration and coordination of Apache Kafka in distributed environments, it has now been completely replaced by KRaft, which is an implementation of the Raft distributed consensus algorithm for Kafka. KRaft mode in Confluent Platform 8.0 provides native metadata management to deliver fault tolerance and scalability without the need to deploy and maintain ZooKeeper as a separate system. Confluent also recently introduced the latest version of Confluent Control Center, which provides operational monitoring and management for Confluent Platform deployments. Confluent Control Center has now been enhanced with Confluent Manager for Apache Flink, providing the ability to create, modify and monitor Flink environments and applications.
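As an illustration of what removing ZooKeeper means in practice, a minimal KRaft-mode configuration for open-source Apache Kafka might look like the following sketch. The node ID, host names, ports and log directory are placeholder assumptions, and Confluent Platform layers its own configuration on top:

```properties
# Minimal single-node KRaft configuration sketch (server.properties).
# Values are illustrative placeholders, not a production setup.
process.roles=broker,controller
node.id=1
# Raft quorum for metadata; no zookeeper.connect setting is required.
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
log.dirs=/tmp/kraft-combined-logs
```

The key difference from a ZooKeeper-based deployment is that cluster metadata is managed by the Kafka controllers themselves via the Raft quorum, so there is no separate coordination system to deploy, secure and monitor.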
Confluent was rated as Exemplary in the ISG 2025 Buyers Guides for Real-Time Data, Streaming Data and Streaming Analytics, and Innovative in the 2025 ISG Buyers Guide for Messaging and Event Processing, as well as a Provider of Merit in the 2024 ISG Buyers Guide for Data Governance and Data Integration. Although capabilities such as Tableflow and Snapshot Queries have enhanced its support for long-term persistence and batch-based processing of historical event data, the company remains best known as an event and streaming data specialist. Nevertheless, I would encourage enterprises evaluating data architectures that provide a holistic view of all data—in motion and at rest—to consider streaming data platforms and Confluent alongside more traditional data platforms and providers.
Regards,
Matt Aslett