ISG Research is happy to share insights gleaned from our latest Buyers Guide, an assessment of how well software providers’ offerings meet buyers’ requirements. The Data Pipelines: ISG Research Buyers Guide is the distillation of a year of market and product research by ISG Research.
Developing, testing and deploying data pipelines is a fundamental accelerator of data-driven strategies. Pipelines enable enterprises to extract data generated by the operational applications that run the business and transport it into the analytic data platforms used to analyze the business.
ISG Research defines data pipelines as the systems used to transport, process and deliver data produced by operational data platforms and applications into analytic data platforms and applications for consumption. Healthy data pipelines are necessary to ensure data is ingested, processed and loaded in the sequence required to generate business intelligence (BI) and artificial intelligence (AI).
The concept of the data pipeline is not new. It is, however, increasingly critical as business decisions become more dependent on data-driven processes that require agile, continuous data processing as part of a DataOps approach to data management. In their simplest form, data pipelines move data between production and consumption applications. However, data-driven enterprises are increasingly thinking of the steps involved in extracting, integrating, aggregating, preparing, transforming and loading data as a continual process orchestrated to facilitate data-driven analytics.
Demand for more agile data pipelines is driven by the need for real-time data processing. Almost a quarter (22%) of enterprises participating in ISG’s Analytics and Data Benchmark Research are currently analyzing data in real time, with an additional 10% analyzing data every hour. More frequent data analysis requires that data be made available through a continuous and agile process.
Data pipelines are commonly associated with data integration, which is the software that enables enterprises to extract data from sources such as applications and databases and combine it for analysis to generate business insights. However, data pipelines can move and process data from a single source without any need for integration.
Data integration is performed using data pipelines, but not all data pipelines perform data integration. While the ISG Data Integration Buyers Guide focused specifically on the requirements for data integration pipelines, the Data Pipelines Buyers Guide addresses the wider requirements for data pipelines of all types, as well as higher-level data operations tasks related to the testing and deployment of multiple data pipelines.
Compared to the Data Integration Buyers Guide, the Data Pipelines Buyers Guide places greater emphasis on agile and collaborative practices. This includes integration with the wider ecosystem of DevOps, data management, DataOps, BI and AI tools and applications.
The development, testing and deployment of data pipelines can be automated and orchestrated to provide further agility by reducing the need for manual intervention. Specifically, the batch extraction of data can be scheduled at regular intervals of a set number of minutes or hours, while the various stages in a data pipeline are managed as orchestrated workflows using data engineering workflow management platforms.
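To illustrate how these stages can be scheduled and orchestrated, the following sketch defines a three-stage pipeline as an hourly workflow using Apache Airflow 2.x conventions. The DAG name, schedule and task bodies are hypothetical placeholders for illustration, not a reference implementation from any of the providers assessed.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


# Placeholder stage implementations; a real pipeline would call source and
# target systems here.
def extract():
    pass


def transform():
    pass


def load():
    pass


# Each pipeline stage becomes a task in a workflow that the scheduler triggers
# every hour without manual intervention.
with DAG(
    dag_id="hourly_orders_pipeline",       # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval=timedelta(hours=1),  # batch extraction at a set interval
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Orchestrate the stages in sequence: extract, then transform, then load.
    extract_task >> transform_task >> load_task
```

Expressing the stages as an ordered workflow allows the scheduler, rather than a person, to trigger each run and enforce the sequence between stages.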
Data observability also has a complementary role to play in monitoring the health of data pipelines and associated workflows as well as the quality of the data itself. Many products for data pipeline development, testing and deployment also offer functionality for monitoring and managing pipelines and are integrated with data orchestration and/or observability functionality.
The combination of healthy and well-orchestrated data pipelines and data observability is also complementary to developing and delivering data products, ensuring that data consumers can trust the provenance and quality of data made available across the enterprise.
Traditionally, data pipelines have involved batch extract, transform and load processes designed to extract data from a source (typically a database supporting an operational application), transform it in a dedicated staging area and then load it into a target environment (typically a data warehouse or data lake) for analysis. The need for real-time data processing is driving demand for continuous data processing and more agile data pipelines that are adaptable to changing business conditions and requirements, including the increased reliance on streaming data and events.
There are multiple approaches to increasing the agility of data pipelines. For example, we see an increased focus on extract, load and transform processes that reduce upfront delays in transforming data by pushing transformation execution to the target data platform. These pipelines require only a more lightweight staging tier to extract data from the source and load it into the target data platform.
Rather than relying on a separate transformation stage prior to loading, as with an ETL pipeline, ELT pipelines use pushdown optimization, taking advantage of the data processing functionality and processing power of the target data platform to transform the data. Pushing transformation execution to the target data platform results in a more agile extraction and loading phase that is more adaptable to changing data sources.
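As a minimal sketch of the ELT pattern, assuming a DB-API-style warehouse connection with sqlite3-style convenience methods and hypothetical table names, the example below lands raw rows in the target platform and then pushes the transformation down as SQL executed by that platform's own engine.

```python
def run_elt(warehouse_conn, raw_rows):
    """Minimal ELT sketch: load raw data first, then transform it in place
    on the target platform (pushdown), rather than in a separate staging tier."""
    # Load: land extracted rows in a raw table with no upfront transformation.
    warehouse_conn.executemany(
        "INSERT INTO raw_orders (id, amount, created_at) VALUES (?, ?, ?)",
        raw_rows,
    )
    # Transform: push the aggregation down to the target platform's SQL engine,
    # using its processing power instead of an external transformation stage.
    warehouse_conn.execute(
        """
        CREATE TABLE IF NOT EXISTS daily_revenue AS
        SELECT DATE(created_at) AS order_date, SUM(amount) AS revenue
        FROM raw_orders
        GROUP BY DATE(created_at)
        """
    )
    warehouse_conn.commit()
```

The point of the pattern is that the transformation logic travels to where the data already sits, so changes to source structures affect only the lightweight extract-and-load step.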
Additionally, so-called zero-ETL approaches have emerged to make operational data from a single source available instantly for real-time analytics. Zero-ETL can be seen as a form of ELT that automates extraction and loading and has the potential to remove the need for transformation, assuming that schema is strictly enforced when the data is generated. Meanwhile, reverse ETL tools can help improve actionable responsiveness by extracting transformed and integrated data from the analytic data platforms and loading it back into operational systems.
Both ETL and ELT approaches can be accelerated using change data capture techniques. CDC is similarly not new but has come into greater focus given the increasing need for real-time data processing. As the name suggests, CDC is the process of capturing data changes. Specifically, CDC identifies and tracks changes to tables in the source database as they are inserted, updated or deleted. CDC reduces complexity and increases agility by synchronizing changed data rather than the entire dataset. The data changes can be synchronized incrementally or in a continuous stream.
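A simple way to picture incremental synchronization is a high-water-mark query, sketched below with hypothetical table and column names and sqlite3-style connections. Production CDC implementations typically read the source database's transaction log rather than polling timestamps, which also lets them capture deletes.

```python
def sync_changes(source_conn, target_conn, last_sync_ts):
    """Polling-based sketch of incremental change capture: copy only rows
    changed since the last sync instead of reloading the entire table."""
    # Identify rows inserted or updated after the current high-water mark.
    changed = source_conn.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?",
        (last_sync_ts,),
    ).fetchall()
    # Upsert only the changed rows into the target (SQLite-style upsert shown,
    # assuming id is the primary key of the replica table).
    target_conn.executemany(
        "INSERT INTO orders_replica (id, amount, updated_at) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET amount = excluded.amount, "
        "updated_at = excluded.updated_at",
        changed,
    )
    target_conn.commit()
    # Advance the watermark so the next run picks up only newer changes.
    return max((row[2] for row in changed), default=last_sync_ts)
```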
ISG asserts that by 2026, three-quarters of enterprises will adopt data engineering processes that span data integration, transformation and preparation, producing repeatable data pipelines that create more agile information architectures. Additionally, while machine learning (ML) is already used to provide recommendations for building data pipelines, there is also growing interest in applying generative AI to automatically generate or recommend data pipelines in response to natural language descriptions of desired outcomes.
There remains a need for traditional batch ETL pipelines, not least to support existing data integration and analytic processes. However, ELT and CDC approaches have a role to play, alongside automation and orchestration, in increasing data agility. We recommend that all enterprises explore the potential benefits and evaluate data integration software providers offering capabilities that support multiple approaches, increasing the focus on consumption-driven rather than production-driven data and analytics.
The ISG Buyers Guide™ for Data Pipelines evaluates software providers and products in key areas, including data pipeline development, data pipeline testing, and data pipeline deployment. This research evaluates the following software providers that offer products addressing key elements of data pipelines that meet our definition: Airbyte, Alteryx, Astronomer, AWS, BMC, Census, Dagster Labs, Databricks, DataKitchen, DataOps.live, dbt Labs, Google, Hitachi, IBM, Informatica, Infoworks, K2view, Keboola, Mage, Matillion, Microsoft, Nexla, Prefect, Qlik, Rivery, SAP, Y42 and Zoho.
This research-based index evaluates the full business and information technology value of data pipelines software offerings. We encourage you to learn more about our Buyers Guide and its effectiveness as a provider selection and RFI/RFP tool.
We urge organizations to do a thorough job of evaluating data pipelines offerings and to use this Buyers Guide both as the result of our in-depth analysis of these software providers and as an evaluation methodology. The Buyers Guide can be used to evaluate existing suppliers and also provides evaluation criteria for new projects. Using it can shorten the cycle time for an RFP and the definition of an RFI.
The Buyers Guide for Data Pipelines in 2024 finds Microsoft first on the list, followed by Alteryx and Databricks.
Software providers that rated in the top three of any category, including the product and customer experience dimensions, earn the designation of Leader.
The Leaders in Product Experience are:
- Microsoft.
- Informatica.
- Alteryx.
- Google.
The Leaders in Customer Experience are:
- Databricks.
- Microsoft.
- SAP.
The Leaders across any of the seven categories are:
- Informatica, which has achieved this rating in five of the seven categories.
- Microsoft in four categories.
- Databricks in three categories.
- Google and SAP in two categories.
- Alteryx, AWS, DataOps.live, Keboola and Qlik in one category.

The overall performance chart provides a visual representation of how providers rate across product and customer experience. Software providers with products scoring higher in a weighted rating of the five product experience categories place farther to the right. The combination of ratings for the two customer experience categories determines their placement on the vertical axis. As a result, providers that place closer to the upper-right are “exemplary” and rated higher than those closer to the lower-left and identified as providers of “merit.” Software providers that excelled at customer experience over product experience have an “assurance” rating, and those excelling instead in product experience have an “innovative” rating.
Note that close provider scores should not be taken to imply that the packages evaluated are functionally identical or equally well-suited for use by every enterprise or process. Although there is a high degree of commonality in how organizations handle data pipelines, there are many idiosyncrasies and differences that can make one provider’s offering a better fit than another.
ISG Research has made every effort to encompass in this Buyers Guide the overall product and customer experience from our data pipelines blueprint, which we believe reflects what a well-crafted RFP should contain. Even so, there may be additional areas that affect which software provider and products best fit an enterprise’s particular requirements. Therefore, while this research is complete as it stands, utilizing it in your own organizational context is critical to ensure that products deliver the highest level of support for your projects.
You can find more details on our community as well as on our expertise in the research for this Buyers Guide.