dbt hub

dbt test: expect_column_values_to_be_unique

USE CASE

Uniqueness

APPLIES TO

Column

This page describes the expect_column_values_to_be_unique test from the dbt-expectations package. This test is essential for ensuring that all values in a specified column are unique, which is especially critical for columns that serve as identifiers or have a unique constraint placed on them.

How it Works

The expect_column_values_to_be_unique test assesses whether each entry in a chosen column across the entire dataset is distinct. This is key for maintaining the integrity of data where values are expected to be unique, such as in primary key columns or columns storing unique identifiers.

Steps and Conditions:

  1. Column Selection: Identify the column that needs to be checked for uniqueness.
  2. Optional Configuration:
    • Row Condition: You can specify a condition to filter the rows that are included in the test using the row_condition parameter. This helps test uniqueness for specific segments of the data.
  3. Execution: After applying any row conditions, the test evaluates whether each value in the column is unique.
  4. Outcome:
    • Pass: The test passes if every value in the column is unique.
    • Fail: The test fails if any duplicate values are found in the column. This indicates a violation of the uniqueness condition, necessitating further investigation or corrective actions.

Example Usage: E-commerce

For an E-commerce company, maintaining unique product identifiers in the product_id column of the product_catalog table is vital for accurate inventory management and order processing.

Consider a scenario where the product_catalog table includes various attributes of items being sold, and the product_id serves as a unique identifier for each product.


models:
  - name: product_catalog
    columns:
      - name: product_id
        tests:
          - dbt_expectations.expect_column_values_to_be_unique:
              row_condition: "product_id is not null"

In this example, the expect_column_values_to_be_unique test checks that every product_id in the product_catalog model is unique, provided that the product_id is not null. This ensures that each product is distinctively recognized and can be accurately referenced in transactions and stock levels, thereby supporting efficient e-commerce operations.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph