This dbt test, expect_compound_columns_to_be_unique
, is designed to ensure the uniqueness of combinations of values across multiple specified columns within a table or model. It is particularly useful in scenarios where unique constraints need to apply across a group of columns, often utilized as a form of composite or multi-column primary key.
How it Works
The expect_compound_columns_to_be_unique
test checks that every combination of values in the specified columns is unique across all rows in the dataset. It is applicable to models, seeds, and sources.
Steps and Conditions:
- Column List Specification: Define a list of column names in
column_list
which should together be unique. - Row Exclusion Based on Null Values:
- Utilize the
ignore_row_if
parameter to handle rows with null values:'all_values_are_missing'
: excludes rows where all specified columns are null (default setting).'any_value_is_missing'
: excludes rows if any of the specified columns are null.
- Utilize the
- Optional Settings:
- Quote Columns: Toggle quoting of column names with the
quote_columns
parameter. - Row Filtering: Apply additional filters with
row_condition
to narrow down the rows that are considered for uniqueness checking.
- Quote Columns: Toggle quoting of column names with the
- Execution: Run the test across the dataset; it processes according to the specified configurations and verifies that all combinations of values in the listed columns are unique.
- Outcome:
- Pass: Uniqueness is confirmed if no duplicate combinations are found.
- Fail: Detection of any duplicate combination results in a test failure, indicating the need to address these data inconsistencies.
Example Usage: E-commerce
For an E-commerce company, managing product listings across different sellers and ensuring that no duplicate listings for a product exist on a specific date can be crucial. This situation can utilize the expect_compound_columns_to_be_unique
test to ensure the uniqueness of product listings by a combination of the listing date and the product ID.
Consider a table product_listings
that stores information about various sellers’ listings for products each day. Specifying that the combination of the listing_date
and product_id
should be unique helps prevent multiple listings of the same product from the same date, which could confuse customers and disrupt inventory management.
In this example, the expect_compound_columns_to_be_unique
is applied to the listing_date
and product_id
columns of the product_listings
model. By setting ignore_row_if
to "any_value_is_missing," the test ensures that listings with incomplete data are not considered in the uniqueness validation. This setup is integral for maintaining an accurate and reliable online catalog where each listing is uniquely identifiable.