{"id":10,"date":"2022-06-09T12:48:33","date_gmt":"2022-06-09T12:48:58","guid":{"rendered":"https:\/\/eck.uidev.site\/?page_id=10"},"modified":"2023-05-31T16:36:27","modified_gmt":"2023-05-31T16:36:27","slug":"overview-2","status":"publish","type":"page","link":"https:\/\/markets.eckerson.com\/modern-data-pipelines\/overview-2\/","title":{"rendered":"Definition of a Data Pipeline"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\"><\/h2>\n\n\n<div class=\"entry-content\">\n\t<div style=\" z-index: 2\" class=\"simplesub wow zoomIn bgblue b100w notoppover footunder cwidetext withtitle singleno\">\n\t\t\t\t<h1>Definition of a Data Pipeline<\/h1>\n\t\t\n\t\t<div class=\"textblockwrap\">\n\n\t\t\t\t\t\t<div class=\"txt\">\n\t\t\t<p><span style=\"font-weight: 400\">A data pipeline refers to a workflow that ingests multi-structured data, schemas, and other types of metadata from sources to targets and transforms that data for analytics. Ingestion entails one or more of the following tasks:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><b>Extracting or capturing<\/b><span style=\"font-weight: 400\"> data from a source, such as one or many records from a database<\/span><\/li>\n<li style=\"font-weight: 400\"><b>Streaming <\/b><span style=\"font-weight: 400\">data messages in memory between sources and targets, for example to enable real-time transformation, delivery, and\/or analytics<\/span><\/li>\n<li style=\"font-weight: 400\"><b>Loading<\/b><span style=\"font-weight: 400\"> either batch data or incremental updates into a target such as a data lake<\/span><\/li>\n<li style=\"font-weight: 400\"><b>Appending <\/b><span style=\"font-weight: 400\">data to a target by adding it to existing datasets<\/span><\/li>\n<li style=\"font-weight: 400\"><b>Merging<\/b><span> data into a target by combining it with existing objects such as tables or files<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">Transformation, meanwhile, includes tasks such as the following. 
It can take place before or after the pipeline loads data to the target.<\/span><\/p>\n<ul>\n<li><b>Filtering <\/b><span style=\"font-weight: 400\">data to identify and remove unneeded subsets such as columns, tables, or images, for example to protect personally identifiable information<\/span><\/li>\n<li><b>Combining <\/b><span style=\"font-weight: 400\">multi-sourced data, for example to add columns to a table or join tables for a query<\/span><\/li>\n<li><b>Formatting <\/b><span style=\"font-weight: 400\">data, for example by converting various tables to a single format such as Parquet<\/span><\/li>\n<li><b>Structuring <\/b><span style=\"font-weight: 400\">data, for example by applying a schema to organize tables and columns in a database<\/span><\/li>\n<li><b>Cleansing <\/b><span style=\"font-weight: 400\">data by removing duplicates, fixing errors, or taking other steps to improve data quality<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">Modern pipelines span on-premises, hybrid, cloud, and multi-cloud ecosystems that include various pipelines, languages, open-source projects, interfaces, tools, and now AI bots, as shown in the examples in this diagram.<\/span><\/p>\n\t\t\t<\/div>\n\t\t\t\t\t\t<div class=\"sub\">\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t<div class=\"quote\">\n\t\t\t\t\t<p>A data pipeline refers to a workflow that ingests multi-structured data, schemas, and other types of metadata from sources to targets and transforms that data for analytics<\/p>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\n\t\t\t\t\n\n\t\t\t\t\t\t\t\t<div class=\"cfigure bgnone\">\n\t\t\t\t\t<h2><\/h2>\n\t\t\t\t\t\n\n\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/markets.eckerson.com\/modern-data-pipelines\/wp-content\/uploads\/sites\/18\/2023\/05\/Definition-of-a-Data-Pipeline.png\" alt=\"\" \/>\n\t\t\t\t\t\n\n\n\n\n\t\t\t\t\t\t\t\t\t\t<a class=\"close\" 
href=\"https:\/\/markets.eckerson.com\/modern-data-pipelines\/wp-content\/uploads\/sites\/18\/2023\/05\/Definition-of-a-Data-Pipeline.png\">Close<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\n\n\n\n\n\t\t\t\t\t\t\t\t<div class=\"vidimg\">\n\t\t\t\t\t<div class=\"img\">\n\t\t\t\t\t\t\t\t\t\t\t<div class=\"iframelink\">youtu.be\/ReMctdGLs70<\/div>\n\t\t\t\t\t\t<iframe class=\"videoem\" src=\"https:\/\/www.youtube.com\/embed\/ReMctdGLs70?rel=0\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen=\"\"><\/iframe>\n\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<div class=\"txtlnk\">\n\t\t\t\t\t\t<p>What is a data pipeline?<\/p>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t<\/div>\n\t\t<\/div>\t\n\t<\/div>\n<\/div>\t\n\n<div id=\"figure\" class=\"entry-content\">\n<div style=\" z-index: 3\" class=\"figure wow fadeIn bgnone b80w topover footover\" data-wow-delay=\"0.4s\">\n\t\t\n\t\t\t\n\n\t\n\t\t\n\t\n\t\n\t\n\t\n\t\n\t\t\n\t\n\t<div class=\"ref\"><\/div>\n\t\n\t<a class=\"close\" href=\"\">View Large<\/a>\n<\/div>\n<\/div>\n\n<div class=\"entry-content\">\n\t<div style=\" z-index: 1\" class=\"wow fadeInDown resources bgblue topunder colsingle b60w\">\n\t<h2>Additional Resources<\/h2>\n\t<ul>\n\t\t\t\t<li><a class=\"icon sml iwebsite\" target=\"_blank\" href=\"https:\/\/www.eckerson.com\/articles\/data-pipeline-design-patterns\">Data Pipeline Design 
Patterns<\/a><\/li>\n\t\t\t<\/ul>\n<\/div>\n\t<\/div>","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":1333,"parent":0,"menu_order":5,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"class_list":["post-10","page","type-page","status-publish","has-post-thumbnail","hentry"],"acf":[],"_links":{"self":[{"href":"https:\/\/markets.eckerson.com\/modern-data-pipelines\/wp-json\/wp\/v2\/pages\/10","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/markets.eckerson.com\/modern-data-pipelines\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/markets.eckerson.com\/modern-data-pipelines\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/markets.eckerson.com\/modern-data-pipelines\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/markets.eckerson.com\/modern-data-pipelines\/wp-json\/wp\/v2\/comments?post=10"}],"version-history":[{"count":138,"href":"https:\/\/markets.eckerson.com\/modern-data-pipelines\/wp-json\/wp\/v2\/pages\/10\/revisions"}],"predecessor-version":[{"id":2092,"href":"https:\/\/markets.eckerson.com\/modern-data-pipelines\/wp-json\/wp\/v2\/pages\/10\/revisions\/2092"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/markets.eckerson.com\/modern-data-pipelines\/wp-json\/wp\/v2\/media\/1333"}],"wp:attachment":[{"href":"https:\/\/markets.eckerson.com\/modern-data-pipelines\/wp-json\/wp\/v2\/media?parent=10"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}