dbt test: expect_compound_columns_to_be_unique

SOURCE

github.com/calogica/dbt-expectations

USE CASE

Uniqueness

APPLIES TO

Model

Seed

Source

This dbt test, expect_compound_columns_to_be_unique, is designed to ensure the uniqueness of combinations of values across multiple specified columns within a table or model. It is particularly useful in scenarios where unique constraints need to apply across a group of columns, often utilized as a form of composite or multi-column primary key.

How it Works

The expect_compound_columns_to_be_unique test checks that every combination of values in the specified columns is unique across all rows in the dataset. It is applicable to models, seeds, and sources.

Steps and Conditions:

Column List Specification: Define a list of column names in column_list which should together be unique.
Row Exclusion Based on Null Values:
- Utilize the ignore_row_if parameter to handle rows with null values:
  - 'all_values_are_missing': excludes rows where all specified columns are null (default setting).
  - 'any_value_is_missing': excludes rows if any of the specified columns are null.
Optional Settings:
- Quote Columns: Toggle quoting of column names with the quote_columns parameter.
- Row Filtering: Apply additional filters with row_condition to narrow down the rows that are considered for uniqueness checking.
Execution: Run the test across the dataset; it processes according to the specified configurations and verifies that all combinations of values in the listed columns are unique.
Outcome:
- Pass: Uniqueness is confirmed if no duplicate combinations are found.
- Fail: Detection of any duplicate combination results in a test failure, indicating the need to address these data inconsistencies.

Example Usage: E-commerce

For an E-commerce company, managing product listings across different sellers and ensuring that no duplicate listings for a product exist on a specific date can be crucial. This situation can utilize the expect_compound_columns_to_be_unique test to ensure the uniqueness of product listings by a combination of the listing date and the product ID.

Consider a table product_listings that stores information about various sellers’ listings for products each day. Specifying that the combination of the listing_date and product_id should be unique helps prevent multiple listings of the same product from the same date, which could confuse customers and disrupt inventory management.


models:
  - name: product_listings
    tests:
      - dbt_expectations.expect_compound_columns_to_be_unique:
          column_list: ["listing_date", "product_id"]
          ignore_row_if: "any_value_is_missing"
          quote_columns: false
          row_condition: "seller_id is not null"

In this example, the expect_compound_columns_to_be_unique is applied to the listing_date and product_id columns of the product_listings model. By setting ignore_row_if to "any_value_is_missing," the test ensures that listings with incomplete data are not considered in the uniqueness validation. This setup is integral for maintaining an accurate and reliable online catalog where each listing is uniquely identifiable.

AI Agents

Detection & Coverage

Governance

Triage and Response

Enabling business users

Integrations

Cloud vs. OSS

Webinars

dbt Tests Hub

Customer stories

Blog

Cloud Demo Videos

About Us

Documentation

The Elementary Community

dbt test: expect_compound_columns_to_be_unique

SOURCE

USE CASE

APPLIES TO

How it Works

Steps and Conditions:

Example Usage: E-commerce

The only data observability platform built into your dbt code

Product

Resources

Company

Live Demo