Data Observability For Data Ingestion

Ensure that your data sources are pulled correctly

How often do you hear that the data received from external sources is wrong, but it was correct in the past?

Data Observability is a way to define Data Quality rules to monitor your ingestion tables. Detect schema changes, data format changes, missing data or inconsistent delays in the data delivery.

Data format and ranges

Detect data format and data range issues in source data before the data pipeline fails in the transformation steps.


Validity checks such as data format, not null, data range or uniqueness checks are defined for each source table. Data Quality checks are executed after the source data is loaded into your ingestion tables. Any Data Quality issues are easy to understand. Has the order of columns in a CSV file changed, so that the data was loaded into different columns? Data format or distinct column count checks will detect it.

  • Define data format and data range rules for source data
  • Run Data Quality checks at the end of the data ingestion pipeline
  • Detect data format issues by detecting unexpected changes to distinct counts of column values
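
The validity checks described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DQO.ai's implementation: the `orders` columns (`order_id`, `order_date`, `amount`, `country`), the date format and the amount range are all hypothetical.

```python
from datetime import datetime

# Hypothetical rules for an illustrative "orders" ingestion table.
DATE_FORMAT = "%Y-%m-%d"
AMOUNT_MIN, AMOUNT_MAX = 0, 10_000


def is_valid_date(value):
    """Return True when value is a string matching DATE_FORMAT."""
    try:
        datetime.strptime(value, DATE_FORMAT)
        return True
    except (TypeError, ValueError):
        return False


def run_validity_checks(rows):
    """Count failing rows for each check: format, range, not null, uniqueness."""
    failures = {"date_format": 0, "amount_range": 0,
                "country_not_null": 0, "duplicate_id": 0}
    seen_ids = set()
    for row in rows:
        if not is_valid_date(row.get("order_date")):
            failures["date_format"] += 1
        amount = row.get("amount")
        if not isinstance(amount, (int, float)) or not AMOUNT_MIN <= amount <= AMOUNT_MAX:
            failures["amount_range"] += 1
        if not row.get("country"):
            failures["country_not_null"] += 1
        if row.get("order_id") in seen_ids:
            failures["duplicate_id"] += 1
        seen_ids.add(row.get("order_id"))
    return failures


def distinct_count(rows, column):
    """Number of distinct values in a column; a sudden jump hints at shifted columns."""
    return len({row.get(column) for row in rows})
```

Running such a function as the last step of the ingestion pipeline and comparing `distinct_count` against the value from previous loads is one way to notice that a CSV column moved: a `country` column that suddenly holds dates will have far more distinct values than before.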

Schema change detection

Monitor unexpected changes to the table schema that will affect downstream tables. Find out that a column size was increased and that some rows may no longer load into downstream tables.


DQO.ai captures the table schema when the table metadata is retrieved for the first time. The list of columns is hashed to detect schema changes. Additionally, column data types are compared every time the schema checks are executed.

  • Detect column data type changes
  • Detect column count changes, added or removed columns
  • Detect a change to the order of columns that may make SELECT * queries unusable
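
The idea of hashing the column list and comparing data types can be sketched as follows. This is an assumption-laden illustration rather than DQO.ai's actual algorithm: the choice of SHA-256, the separator and the function names `schema_hash` and `compare_schemas` are hypothetical.

```python
import hashlib


def schema_hash(columns):
    """Hash the ordered list of column names from (name, data_type) pairs."""
    # The unit separator avoids collisions such as ["ab", "c"] vs ["a", "bc"].
    joined = "\x1f".join(name for name, _ in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compare_schemas(baseline, current):
    """Describe the differences between a captured baseline schema and the current one."""
    changes = []
    baseline_types = dict(baseline)
    current_types = dict(current)

    added = [name for name, _ in current if name not in baseline_types]
    removed = [name for name, _ in baseline if name not in current_types]
    if added:
        changes.append("columns added: " + ", ".join(added))
    if removed:
        changes.append("columns removed: " + ", ".join(removed))

    # Data types are compared column by column for columns present in both schemas.
    for name, dtype in current:
        old = baseline_types.get(name)
        if old is not None and old != dtype:
            changes.append(f"type changed: {name} {old} -> {dtype}")

    # Same set of columns but a different hash means the order changed.
    if not added and not removed and schema_hash(baseline) != schema_hash(current):
        changes.append("column order changed")
    return changes
```

Because the hash covers the names in order, a reordered table produces a different hash even when no column was added or removed, which is the case that breaks SELECT * consumers.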

Data delays and stale tables

Analyze the growth (row count increase) and data lag (age of the most recent rows) to detect tables that have not been refreshed recently.


DQO.ai can analyze the timeliness of the data to measure the average data lag. When your tables do not have a timestamp column, DQO.ai can analyze the growth of the row count by measuring the row count every day and learning the average table growth rate.

  • Detect tables that have not been refreshed recently
  • Find out which tables are not updated frequently in their data sources (inconsistent refresh)
  • Detect tables that receive less data than the average
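
Both approaches, lag from a timestamp column and growth from daily row counts, can be sketched in Python. The thresholds (`max_lag` of one day, a `tolerance` of 50% of the average growth) and the function names are assumptions chosen for illustration, not DQO.ai defaults.

```python
from datetime import datetime, timedelta
from statistics import mean


def is_stale(latest_row_timestamp, now, max_lag=timedelta(days=1)):
    """A table is stale when its most recent row is older than max_lag."""
    return now - latest_row_timestamp > max_lag


def average_daily_growth(daily_row_counts):
    """Average day-over-day increase of the row count."""
    increases = [later - earlier
                 for earlier, later in zip(daily_row_counts, daily_row_counts[1:])]
    return mean(increases)


def growth_below_average(daily_row_counts, tolerance=0.5):
    """Flag the latest day when it grew by less than tolerance * the earlier average."""
    if len(daily_row_counts) < 3:
        raise ValueError("need at least three daily row counts")
    baseline = average_daily_growth(daily_row_counts[:-1])
    latest_growth = daily_row_counts[-1] - daily_row_counts[-2]
    return latest_growth < tolerance * baseline
```

The row count variant needs no timestamp column at all: it only requires that the row count is recorded once per day, so the learned average becomes the baseline for the next measurement.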

Ground Truth Checks

Compare the source data with other trusted sources to ensure that your database shows accurate data.


DQO.ai can run accuracy checks that compare the table with real-world reference data. Just define the other table or a query that returns the same aggregated data (sum, count, etc.) grouped by the same business dimension. You can also compare the data with flat files that you load into the Data Quality database.

  • Compare the data with real-world reference data
  • Detect issues at a business-relevant dimension (date, country, city, department, state, etc.)
  • Ensure that you really trust the source data and you can prove it
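
A comparison against reference data grouped by a business dimension can be sketched like this. The column names (`country`, `revenue`), the 1% tolerance and the helper names are hypothetical; in practice both sides would usually come from SQL queries rather than in-memory rows.

```python
from collections import defaultdict


def aggregate(rows, dimension, measure):
    """Sum a measure grouped by a business dimension, e.g. revenue per country."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


def compare_to_reference(source_totals, reference_totals, max_relative_diff=0.01):
    """Return {dimension value: (actual, expected)} for every group that disagrees."""
    mismatches = {}
    for key in sorted(set(source_totals) | set(reference_totals)):
        actual = source_totals.get(key)
        expected = reference_totals.get(key)
        # A group missing on either side is always a mismatch.
        if actual is None or expected is None:
            mismatches[key] = (actual, expected)
            continue
        # Fall back to an absolute difference when the expected value is zero.
        denominator = abs(expected) if expected else 1.0
        if abs(actual - expected) / denominator > max_relative_diff:
            mismatches[key] = (actual, expected)
    return mismatches
```

Reporting mismatches per dimension value, rather than one grand total, shows where the source data diverges from the trusted reference, for example a single country or date.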

No one can understand your data like we do!