dbt hub

dbt test: expect_column_distinct_values_to_contain_set

USE CASE

Values set

APPLIES TO

Column

This page explains the expect_column_distinct_values_to_contain_set test from the dbt-expectations package, which ensures that a set of specific values is present among the distinct values in a column. This test is useful for verifying the presence of critical categorical data elements within a dataset.

How it Works

The expect_column_distinct_values_to_contain_set test identifies distinct values in a chosen column and checks whether all values listed in a predefined set are included among these distinct values. It is particularly beneficial for confirming the representation of essential values in the dataset, without requiring that all entries in the column are limited to this set.

Steps and Conditions:

  1. Column Selection: Choose the column that you intend to verify for the presence of certain values.
  2. Specify Required Values: Define the set of values that must appear at least once among the column's distinct entries.
  3. Optional Configuration:
    • Quote Values: You may decide whether to treat string values as quoted (i.e., treated as strings). By default, this is set to true.
    • Row Condition: Specify a condition to limit the rows evaluated in this test. For instance, you might exclude rows where IDs are null if such rows are irrelevant or expected to be incomplete.
  4. Execution: The test compiles the distinct values from the selected column, applies the row condition if present, and then checks if every value from the predefined set appears in this list of distinct column values.
  5. Outcome:
    • Pass: If all values from the specified set are found among the distinct values of the column, the test passes.
    • Fail: If one or more specified values are not found, the test fails, indicating missing expected values which might affect further analysis or operations.

Example Usage: E-commerce

An E-commerce company could use this test to ensure that key product categories are represented in the product listings. Ensuring that certain categories are always available can be essential for maintaining a diverse and attractive product portfolio.

Consider a scenario where the products table has a category column that categorizes each product. To verify that important product categories like "electronics" and "furniture" are always present in the catalog, you could set up this test as follows:


models:
  - name: products
    columns:
        - name: category
          tests:
            - dbt_expectations.expect_column_distinct_values_to_contain_set:
                value_set: ['electronics', 'furniture']
                row_condition: "status = 'active'"

In this configuration, the test checks that the distinct categories of all active products must include "electronics" and "furniture". This is critical for the E-commerce platform to ensure that they continuously meet diverse customer demands across these significant product categories.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph