dbt hub

dbt test: expect_column_distinct_values_to_equal_set

USE CASE

Values set

APPLIES TO

Column

This page outlines the expect_column_distinct_values_to_equal_set test from the dbt-expectations package. It's a valuable test to maintain strict control over the contents of a column by making sure only a specific set of values are present, neither more nor less. This test is necessary in scenarios requiring a fixed categorization or limited set of identifiers.

How it Works

The expect_column_distinct_values_to_equal_set test evaluates a specified column to ensure that the set of all distinct values exactly matches a predetermined set of values. This guarantees not just the inclusion, but the exclusivity of the values in the column, aligning perfectly with enforced data constraints.

Steps and Conditions:

  1. Column and Value Set Definition: A target column is specified along with a set of values that must exactly match the column’s distinct entries.
  2. Optional Configurations:
    • Quote Values: This argument determines if values should be quoted, useful for string data types, and defaults to true.
    • Row Condition: Defines a condition to filter rows for the test. This can be applied to exclude specific rows based on defined criteria.
  3. Execution: The test checks the distinct values found in the selected column against the defined set. It ensures that no additional values are present, and no predefined values are missing.
  4. Outcome:
    • Pass: If the column’s distinct values match exactly with the provided set, the test passes.
    • Fail: If any discrepancies exist between the column's distinct values and the expected set, the test fails.

Example Usage: E-commerce

For an E-commerce platform, it is crucial to categorize products accurately under predefined categories. Consider the products table, which includes a category column delineating each product's category.


models:
  - name: products
    columns:
      - name: category
        tests:
          - dbt_expectations.expect_column_distinct_values_to_equal_set:
              value_set: ['electronics', 'apparel', 'home', 'beauty']
              row_condition: "inventory_status != 'discontinued'"

In this example, the expect_column_distinct_values_to_equal_set test verifies that the category column of the products model includes exactly the categories ‘electronics’, ‘apparel’, ‘home’, and ‘beauty’. Applying this test ensures that each product is assigned to a recognized category, thus maintaining consistency and accuracy in product categorization, crucial for inventory management and customer browsing experience.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph