“How to Evaluate and Select a Data Pipeline Product”
Expert panelists from our sponsoring vendors shared insights on the evolution of data pipeline management, modern requirements, and product evaluation criteria. Here are some highlights.
- Mark Van de Wiel, Field CTO of Fivetran, observed that companies had minimal cloud infrastructure when his company HVR (now part of Fivetran) launched in 2012. Then cloud adoption exploded, along with rising fragmentation of data sources and rising volumes of data changes for pipelines to accommodate. Today data teams evaluate ingestion tools based on their ability to handle these rising volumes in real time as well as their security, reliability, breadth of source support, and cost of ownership.
- Mike Pickett, VP of Product Growth at StreamSets (now part of Software AG), described how data gravity forces larger companies to maintain some legacy systems on premises even as they move other data to the cloud. To manage these hybrid environments, data teams need tools that offer secure data access and simplify management with visual pipeline design. They also need to manage data drift, i.e., the inexorable changes to schema, metadata, semantics, and infrastructure versions that can break pipelines if mishandled. Pickett’s sage observation: “shift happens.”
- Data teams today have a short-term objective of reducing time to value and a longer-term objective of scaling their data pipelines, according to Taylor McGrath, VP of Solutions Engineering at Rivery. SaaS-based tools contribute to both objectives. They reduce time to value by letting teams build or change pipelines quickly, and they assist scalability by leveraging elastic cloud resources with no fixed upper limit. SaaS tools also help data teams improve reliability and reduce cost of ownership.
- Data products give data consumers a more composable way of finding and using data, according to Saket Saurabh, CEO and co-founder of Nexla. Stakeholders of various types can create, use, and reuse these packages of data, metadata, schema, and pipeline code to improve the efficiency and scale of their environments. Saurabh also suggested we’re entering an era of multi-modal pipeline tools that support ETL, ELT, and other delivery patterns, similar to the way that iPhones today consolidate capabilities that used to require a BlackBerry, Garmin, iPod, and other devices.
- Justin Mullen, CEO and co-founder of DataOps.live, described the key tenets of DataOps. This “battle-hardened” discipline helps companies meet business and data requirements by accelerating and parallelizing the efforts of pipeline developers. DataOps tools help these data engineers boost their productivity, agility, and governance with capabilities such as automated regression testing and reuse of pipeline artifacts. Mullen also underscored the need to orchestrate elements such as tools, pipeline jobs, and data quality checks across hybrid environments. Companies should evaluate tools based on these requirements as well as the need for orchestrating different versions of interrelated data products.
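
To make the data drift point concrete, here is a minimal Python sketch of one narrow kind of drift check: comparing an expected schema against the schema observed in an incoming batch. It is an illustration only, not how any panelist's product works. The function name `detect_schema_drift`, the column names, and the type labels are all hypothetical, and schemas are assumed to be simple mappings from column name to type name.

```python
from typing import Dict, List

# Assumption: a schema is a plain mapping of column name -> type name.
Schema = Dict[str, str]


def detect_schema_drift(expected: Schema, observed: Schema) -> Dict[str, List[str]]:
    """Report columns that were added, removed, or changed type (hypothetical helper)."""
    expected_cols = set(expected)
    observed_cols = set(observed)
    return {
        # Columns present in the new batch but not in the expected schema.
        "added": sorted(observed_cols - expected_cols),
        # Columns the pipeline expects but the new batch no longer carries.
        "removed": sorted(expected_cols - observed_cols),
        # Columns present in both whose declared type differs.
        "retyped": sorted(
            col for col in expected_cols & observed_cols
            if expected[col] != observed[col]
        ),
    }


# Hypothetical example data.
expected = {"id": "int", "email": "str", "created_at": "timestamp"}
observed = {"id": "int", "email": "str", "created_at": "str", "plan": "str"}

drift = detect_schema_drift(expected, observed)
print(drift)
```

A pipeline could run a check like this before loading each batch and then alert, quarantine the batch, or evolve the target schema, depending on policy. Drift in semantics or infrastructure versions, which Pickett also mentioned, would need different checks.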