Our next-generation Data Observability platform covers the whole data lifecycle.

Our Data Observability framework combines the following best practices:

EASE OF USE:

  • Distributed as an Open Source platform

  • Set up the Data Observability framework in less than 5 minutes after finding our website

  • Custom Data Quality checks are easy to implement with Python

  • Simple Data Quality Checks are implemented as SQL templates that use a Django-compatible template syntax

  • Custom quality rules and thresholds are easy to implement using a full Python Data Science toolkit (Pandas, Anaconda, etc.); a minimal rule sketch follows this list

  • The Data Quality configuration (the list of tables and columns, enabled checks, and threshold settings) may be edited in your editor of choice (VSCode, IntelliJ) with autocompletion

  • You have absolute freedom to pick the CLI, shell, or Web UI interface to perform any action

  • Execute Data Quality checks from anywhere: a server, the cloud, or a developer's laptop, on Windows, Linux, or macOS
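
To give a feel for how a custom rule can look, below is a minimal sketch of a threshold rule written as a plain Python function. The rule name, parameters, and result type are illustrative assumptions, not the actual DQO.ai rule interface.

    # Minimal sketch of a custom data quality rule written as a plain Python
    # function. The rule name, parameters, and result shape are hypothetical
    # illustrations of the pattern, not the actual DQO.ai rule interface.
    from dataclasses import dataclass

    @dataclass
    class RuleResult:
        passed: bool           # True when the sensor readout satisfies the rule
        expected_value: float  # the threshold the readout was compared against

    def min_row_count_rule(actual_row_count: float, min_count: float = 1000.0) -> RuleResult:
        """Fails the check when the observed row count drops below the threshold."""
        return RuleResult(passed=actual_row_count >= min_count, expected_value=min_count)

    # A readout of 250 rows fails against the default threshold of 1000.
    print(min_row_count_rule(250))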

PLAYS NICE WITH DATA ENGINEERING:

  • All Data Quality and Data Observability rules may be stored in the source repository (Git) and versioned along with the data pipeline code

  • Data Observability may be enabled on new data sources in minutes

  • Data Quality and Data Observability checks may be easily moved across environments: prepare the checks on a development database, then run them in the production environment

  • CI/CD friendly

  • Execute Data Quality and Data Observability checks from your data pipeline; a sketch of such a pipeline step follows this list

  • Or just schedule the checks on the server
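
As a rough sketch of what calling the checks from a pipeline step could look like, the snippet below shells out to a command line tool and fails the step on a non-zero exit code. The command name and its arguments are hypothetical placeholders, not the documented CLI syntax.

    # Minimal sketch of running data quality checks as one step of a data
    # pipeline. The "dqo check run ..." command and its arguments are
    # hypothetical placeholders, not the documented CLI syntax.
    import subprocess
    import sys

    def run_quality_checks(connection: str, table: str) -> None:
        """Runs the checks for one table and fails the pipeline step on errors."""
        result = subprocess.run(
            ["dqo", "check", "run", f"--connection={connection}", f"--table={table}"],
            capture_output=True,
            text=True,
        )
        print(result.stdout)
        if result.returncode != 0:
            # A failing quality gate stops this step, so downstream tasks
            # never load the dirty data.
            sys.exit(result.returncode)

    if __name__ == "__main__":
        run_quality_checks("warehouse", "public.orders")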

OBSERVES ANYTHING:

  • Observe databases in the cloud or on-premise

  • Analyze flat files and unstructured data on S3, Azure Storage Accounts and GCP buckets

  • Observe new data arriving in your Data Lake

  • Observe and compare data across different clouds

  • Observe REST APIs and backend services

  • Observe logs from custom logging frameworks

  • Observe Business Intelligence tools and business applications

WORKS AT SCALE:

  • Multi-cloud Data Observability is possible using remote agents deployed close to the observed system

  • Large databases (petabyte scale) are supported with partitioning and partition pruning

  • Observe only the newest data in time-partitioned tables; see the partition pruning sketch below
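
To illustrate the idea of analyzing only the newest partitions, the minimal sketch below builds a row count query restricted to the most recent daily partitions, which lets the database prune everything older. The table and column names are illustrative.

    # Minimal sketch of incremental observability on a time-partitioned table:
    # the date filter lets the database prune partitions and scan only the
    # newest data. Table and column names are illustrative.
    from datetime import date, timedelta

    def build_daily_row_count_query(table: str, partition_column: str, days_back: int = 1) -> str:
        """Builds a query that counts rows only in the most recent daily partitions."""
        since = date.today() - timedelta(days=days_back)
        return (
            f"SELECT {partition_column} AS partition_date, COUNT(*) AS row_count "
            f"FROM {table} "
            f"WHERE {partition_column} >= DATE '{since.isoformat()}' "
            f"GROUP BY {partition_column}"
        )

    print(build_daily_row_count_query("analytics.fact_sales", "sale_date", days_back=3))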

REALLY PROTECTS YOU FROM DIRTY DATA:

  • Data Observability may be activated directly from your data pipelines

  • Data Pipelines may be stopped when the quality gates are not met; a quality gate sketch follows this list

  • Data Pipelines for downstream tables may be paused until clean data arrives

  • Dependencies between upstream and downstream tables are used to halt the flow of dirty data

  • Data Curation is possible while the data is on hold, until the alert is cleared

  • The Data Steward may clear an alert, which releases the paused downstream data pipelines
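
The quality gate pattern can be pictured with the minimal sketch below, in which a downstream load is paused whenever any upstream check has failed. The data structures and function names are illustrative assumptions, not the platform's API.

    # Minimal sketch of a quality gate between an upstream load and a
    # downstream transformation. The check_results structure and the gate
    # logic are illustrations of the pattern, not the DQO.ai API.

    class DataQualityGateError(Exception):
        """Raised to pause downstream processing until the alert is cleared."""

    def quality_gate(check_results: dict) -> None:
        """Halts the pipeline when any enabled check failed on the upstream table."""
        failed = [name for name, passed in check_results.items() if not passed]
        if failed:
            raise DataQualityGateError(f"Quality gate failed, downstream load paused: {failed}")

    def load_downstream_table(check_results: dict) -> None:
        quality_gate(check_results)  # stop here when the upstream data is dirty
        print("Upstream data is clean, loading the downstream table...")

    # The nulls check failed, so the downstream load is paused until a
    # Data Steward clears the alert and the pipeline is re-run.
    try:
        load_downstream_table({"row_count": True, "nulls_percent": False})
    except DataQualityGateError as error:
        print(error)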

FRIENDLY FOR OPERATIONS:

  • Alerts are deduplicated and raised only once until the data changes; see the deduplication sketch after this list

  • Partitions may be monitored separately

  • Time-based partitions are pruned and filtered to analyze only the most recent data

  • External logs generated by custom logging frameworks may be imported as flat files or via API calls
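
One simple way to picture the alert deduplication mentioned above: an alert for a given check and table is raised only when the observed value changes. The sketch below only illustrates the idea, not the platform's actual deduplication logic.

    # Minimal sketch of alert deduplication: an alert for a given check and
    # table is raised only once until the observed value changes. This only
    # illustrates the idea, not the platform's actual logic.
    raised_alerts = {}

    def should_raise_alert(check_name: str, table: str, observed_value: float) -> bool:
        key = (check_name, table)
        if raised_alerts.get(key) == observed_value:
            return False  # the same value was already alerted, suppress the duplicate
        raised_alerts[key] = observed_value
        return True

    print(should_raise_alert("nulls_percent", "public.orders", 12.5))  # True, first alert
    print(should_raise_alert("nulls_percent", "public.orders", 12.5))  # False, duplicate
    print(should_raise_alert("nulls_percent", "public.orders", 3.0))   # True, the data changed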

PROVIDES A BIG PICTURE OF YOUR DATA:

  • Integrate with data catalog tools

  • Import a summary of data observability for tables into a data catalog

  • Pull metadata from a data catalog

LEARNS FROM YOUR DATA:

  • Use AI to predict alerts and anomalies before they happen; a simplified anomaly detection sketch follows this list

  • Predict future values and future anomalies

  • Detect similar alerts and treat them as the same issue
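
As a very simplified illustration of anomaly detection on a data quality metric, the sketch below applies a z-score outlier test to a short history of daily row counts. It is only a statistical stand-in, not the platform's actual prediction models.

    # Minimal sketch of anomaly detection on a data quality metric time series
    # using a z-score outlier test. This is only a simple statistical stand-in,
    # not the platform's actual prediction models.
    import statistics

    def is_anomaly(history: list, new_value: float, threshold: float = 3.0) -> bool:
        """Flags the new readout when it deviates too far from the recent history."""
        mean = statistics.mean(history)
        stdev = statistics.stdev(history)
        if stdev == 0:
            return new_value != mean
        return abs(new_value - mean) / stdev > threshold

    daily_row_counts = [10120.0, 10340.0, 9980.0, 10210.0, 10150.0]
    print(is_anomaly(daily_row_counts, 10190.0))  # False, within the normal range
    print(is_anomaly(daily_row_counts, 2300.0))   # True, the row count collapsed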

THE BUSINESS WILL LOVE TO PAY FOR IT:

  • Verify business metrics as data quality checks; just check the data quality metrics that the business cares about

  • Connect to the Data Observability Data Warehouse with any Business Intelligence tool to build KPI scorecards

  • Run Ground Truth Checks to compare data across systems; no more mismatches between different applications

  • Observe sensitive and highly restricted data

So far, almost the whole vision is already supported by DQO.ai, or our team has already provided such services for our customers.

Contact us to learn more.