| | **RAW** | **STAGING** | **INTERMEDIATE** | **MART** | | ----------------------------------------- | -------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- | | **Data Quality tests** | - Source freshness checks based on ingestion timestamp<br> | - Uniqueness (on primary key)<br>- non-null (primary key) | - Uniqueness (on primary key)<br>- non-null (primary key) | perform relationship integrity tests between dim& fact tables | | **Store test results** | Store dbt test failures in Snowflake | Store dbt test failures in Snowflake | Store dbt test failures in Snowflake | Store dbt test failures in Snowflake | | **DQ test threshold failure tolerance %** | start with 0% failure tolerance. <br>Evaluate & adjust if needed on model-level | start with 0% failure tolerance. <br>Evaluate & adjust if needed on model-level | start with 0% failure tolerance. <br>Evaluate & adjust if needed on model-level | start with 0% failure tolerance. <br>Evaluate & adjust if needed on model-level | ## dbt data quality packages - [dbt-expectations](https://github.com/calogica/dbt-expectations) - data testing - [dbt-utils](https://github.com/dbt-labs/dbt-utils) - data testing ## dbt data quality tests dbt provides four generic tests that can be applied in a dbt project without having to install any external packages: not_null, unique, relationships, and accepted_values. #### Source freshness - [https://docs.getdbt.com/docs/build/sources#snapshotting-source-data-freshness](https://docs.getdbt.com/docs/build/sources#snapshotting-source-data-freshness)[https://docs.getdbt.com/reference/resource-properties/freshness](https://docs.getdbt.com/reference/resource-properties/freshness) - dbt built in check - dbt-expectations - [expect_row_values_to_have_recent_data](https://github.com/calogica/dbt-expectations#expect_row_values_to_have_recent_data) #### Incorrect data type - dbt-expectations - [expect_column_values_to_be_of_type](https://github.com/calogica/dbt-expectations#expect_column_values_to_be_of_type) - dbt-expectations - [expect_column_values_to_be_in_type_list](https://github.com/calogica/dbt-expectations#expect_column_values_to_be_in_type_list) #### Missing data - dbt-expectations - [expect_column_to_exist](https://github.com/calogica/dbt-expectations#expect_column_to_exist) for capturing any potential **schema changes** - dbt-expectations - [expect_column_values_to_not_be_null](https://github.com/calogica/dbt-expectations#expect_column_values_to_not_be_null) for null testing with the ability to add a **row condition** (can specify a filter to be applied with the test) #### Uniqueness - dbt-expectations - [expect_column_values_to_be_unique](https://github.com/calogica/dbt-expectations#expect_column_values_to_be_unique) with the ability to add a **row condition** #### Volume / size - dbt-expectations - [expect_column_count_to_equal](https://github.com/calogica/dbt-expectations#expect_table_column_count_to_equal) for capturing any potential **schema changes** - dbt-expectations - [expect_table_row_count_to_equal](https://github.com/calogica/dbt-expectations#expect_table_row_count_to_equal) / [expect_table_row_count_to_equal_other_table](https://github.com/calogica/dbt-expectations#expect_table_row_count_to_equal_other_table) if row count needs to stay constant #### Referential integrity - dbt-core - [https://docs.getdbt.com/reference/resource-properties/data-tests#relationships](https://docs.getdbt.com/reference/resource-properties/data-tests#relationships) validates that all of the records in a child table have a corresponding record in a parent table #### Cardinality equality - dbt-utils - [cardinality_equality](https://github.com/dbt-labs/dbt-utils/blob/main/README.md#cardinality_equality-source) ensure that the cardinality between entities are maintained Catch data quality issue upstream Use the dbt test description field to explain which data quality issues the tests catch, the dbt test owner field to notify owners, the dbt test tag field to group tests together, and the dbt test severity field to trigger a response