Data observability is an emerging discipline that studies the health of enterprise data environments. It uses techniques adapted from governance tools and application performance management to address modern use cases and environments. Data observability tools apply machine learning to monitor the accuracy and timeliness of data delivery, with a particular focus on cloud environments. Data observability contributes to DataOps programs by providing intelligent monitoring that improves testing, CI/CD, and orchestration of data pipelines. This helps optimize data delivery across distributed architectures for both analytics and operations.
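To make the idea of intelligent monitoring concrete, here is a minimal Python sketch that flags late data deliveries by learning a baseline from historical arrival intervals. It uses a simple statistical rule as a stand-in for the machine learning models production tools apply; the table arrival times and the z-score threshold are assumptions for illustration, not any specific tool's behavior.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev

# Hypothetical arrival times for a daily table load (assumed data).
minutes = [5, 12, 3, 20, 8, 15, 2, 10, 18, 7]
arrivals = [datetime(2024, 1, d + 1, 6, m) for d, m in enumerate(minutes)]

# Learn a baseline from historical inter-arrival intervals, in hours.
intervals = [(b - a).total_seconds() / 3600 for a, b in zip(arrivals, arrivals[1:])]
baseline, spread = mean(intervals), stdev(intervals)

def delivery_is_late(last_seen: datetime, now: datetime, z: float = 3.0) -> bool:
    """Flag the current gap as anomalous if it exceeds the learned
    baseline by more than z standard deviations."""
    gap_hours = (now - last_seen).total_seconds() / 3600
    return gap_hours > baseline + z * spread

# A 30-hour gap against a roughly 24-hour baseline is flagged as late.
print(delivery_is_late(arrivals[-1], arrivals[-1] + timedelta(hours=30)))  # True
```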
Data observability includes:
- Data quality observability, which studies the quality and timeliness of data. It observes data in flight or at rest, for example by validating sample values and checking metadata such as value distributions, data volumes, schema, and lineage (see the first sketch after this list).
- Data pipeline observability, which studies the quality and performance of data pipelines, including the infrastructure that supports them. It observes pipeline elements such as data sources, compute clusters, landing zones, targets, and applications by studying their logs, traces, and metrics (see the second sketch after this list).
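As a rough sketch of the data quality checks described above, the function below compares a new batch of records against expectations for schema, row volume, and a value distribution. The column names, ranges, and thresholds are illustrative assumptions, as if learned from historical batches; they do not come from any particular product.

```python
from statistics import mean

# Illustrative expectations, as if learned from historical batches (assumed values).
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "country": str}
EXPECTED_ROW_RANGE = (900, 1100)       # typical daily volume
EXPECTED_MEAN_AMOUNT = (40.0, 60.0)    # typical range for mean order amount

def check_batch(rows: list[dict]) -> list[str]:
    """Return data quality findings for one batch of records."""
    findings = []

    # Schema check: a sample row carries the expected columns and types.
    sample = rows[0]
    for col, typ in EXPECTED_SCHEMA.items():
        if not isinstance(sample.get(col), typ):
            findings.append(f"schema drift: column {col!r}")

    # Volume check: row count stays inside the historical range.
    if not EXPECTED_ROW_RANGE[0] <= len(rows) <= EXPECTED_ROW_RANGE[1]:
        findings.append(f"volume anomaly: {len(rows)} rows")

    # Distribution check: mean amount stays inside the historical range.
    amounts = [r["amount"] for r in rows if isinstance(r.get("amount"), float)]
    if amounts and not EXPECTED_MEAN_AMOUNT[0] <= mean(amounts) <= EXPECTED_MEAN_AMOUNT[1]:
        findings.append(f"distribution shift: mean amount {mean(amounts):.2f}")

    return findings

# A batch that is both too small and unusually expensive triggers two findings.
batch = [{"order_id": i, "amount": 95.0, "country": "US"} for i in range(100)]
print(check_batch(batch))
```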
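And as a minimal sketch of data pipeline observability, the decorator below instruments pipeline steps with the three signal types the second bullet names: a log line, a duration metric, and a simple trace of step ordering. The stage names are hypothetical, and a real tool would export these signals to a monitoring backend rather than hold them in memory.

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

METRICS: dict[str, float] = {}   # step name -> last duration in seconds
TRACE: list[str] = []            # ordered record of executed steps

def observed(step_name: str):
    """Instrument a pipeline step with a log line, a duration metric,
    and a trace entry."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            TRACE.append(step_name)
            start = time.monotonic()
            try:
                result = fn(*args, **kwargs)
                logging.info("step %s succeeded", step_name)
                return result
            except Exception:
                logging.error("step %s failed", step_name)
                raise
            finally:
                METRICS[step_name] = time.monotonic() - start
        return wrapper
    return decorator

@observed("extract_orders")      # hypothetical stage name
def extract_orders():
    return [{"order_id": 1, "amount": 42.0}]

@observed("load_warehouse")      # hypothetical stage name
def load_warehouse(rows):
    time.sleep(0.1)              # stand-in for a real load step

load_warehouse(extract_orders())
print(METRICS, TRACE)
```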
Data observability contributes to three other types of observability:
- Business observability, which studies business metrics and their trends, correlations, and anomalies.
- Model observability, which studies the performance, accuracy, and compliance of ML or other analytics models.
- Operations observability, which studies the performance, availability, and utilization of applications and infrastructure.