dbt hub

dbt test: expect_compound_columns_to_be_unique

USE CASE

Uniqueness

APPLIES TO

Model
Seed
Source

This dbt test, expect_compound_columns_to_be_unique, is designed to ensure the uniqueness of combinations of values across multiple specified columns within a table or model. It is particularly useful in scenarios where unique constraints need to apply across a group of columns, often utilized as a form of composite or multi-column primary key.

How it Works

The expect_compound_columns_to_be_unique test checks that every combination of values in the specified columns is unique across all rows in the dataset. It is applicable to models, seeds, and sources.

Steps and Conditions:

  1. Column List Specification: Define a list of column names in column_list which should together be unique.
  2. Row Exclusion Based on Null Values:
    • Utilize the ignore_row_if parameter to handle rows with null values:
      • 'all_values_are_missing': excludes rows where all specified columns are null (default setting).
      • 'any_value_is_missing': excludes rows if any of the specified columns are null.
  3. Optional Settings:
    • Quote Columns: Toggle quoting of column names with the quote_columns parameter.
    • Row Filtering: Apply additional filters with row_condition to narrow down the rows that are considered for uniqueness checking.
  4. Execution: Run the test across the dataset; it processes according to the specified configurations and verifies that all combinations of values in the listed columns are unique.
  5. Outcome:
    • Pass: Uniqueness is confirmed if no duplicate combinations are found.
    • Fail: Detection of any duplicate combination results in a test failure, indicating the need to address these data inconsistencies.

Example Usage: E-commerce

For an E-commerce company, managing product listings across different sellers and ensuring that no duplicate listings for a product exist on a specific date can be crucial. This situation can utilize the expect_compound_columns_to_be_unique test to ensure the uniqueness of product listings by a combination of the listing date and the product ID.

Consider a table product_listings that stores information about various sellers’ listings for products each day. Specifying that the combination of the listing_date and product_id should be unique helps prevent multiple listings of the same product from the same date, which could confuse customers and disrupt inventory management.


models:
  - name: product_listings
    tests:
      - dbt_expectations.expect_compound_columns_to_be_unique:
          column_list: ["listing_date", "product_id"]
          ignore_row_if: "any_value_is_missing"
          quote_columns: false
          row_condition: "seller_id is not null"

In this example, the expect_compound_columns_to_be_unique is applied to the listing_date and product_id columns of the product_listings model. By setting ignore_row_if to "any_value_is_missing," the test ensures that listings with incomplete data are not considered in the uniqueness validation. This setup is integral for maintaining an accurate and reliable online catalog where each listing is uniquely identifiable.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph
An illustration showing a debugging console with dbt run commands and an error message. The image includes stylized icons: a smiling orange bar chart, the purple dbt logo, and a magnifying glass containing a pink error icon.