dbt hub

dbt test: expect_column_values_to_be_in_set

USE CASE

Values set

APPLIES TO

Column

This page details the expect_column_values_to_be_in_set test from the dbt-expectations package. This test verifies that each value in a specific column falls within a defined set of permissible values. It's useful for ensuring the values in a column are restricted to a specified list, such as identifiers, categories, or other predefined data.

How it Works

The expect_column_values_to_be_in_set test checks each value in a specified column against a set of allowed values, ensuring all data entries in that column conform to a predefined list. This test is essential for data validation and integrity, especially when managing standardized datasets where deviation from expected values could lead to errors in data processing or analysis.

Steps and Conditions:

  1. Column Specification: Confirm the column to be tested.
  2. Set Definition: Define a set of values (value_set) that are acceptable for the column entries.
  3. Optional Configuration:
    • Quote Values: Determine if the values within value_set should be quoted (true by default).
    • Row Condition: Apply any specific conditions to filter the rows that will be tested (e.g., excluding null or certain IDs).
  4. Execution: The test is run where each entry in the column is checked. If all entries are within the defined value_set, the process moves to the next step.
  5. Outcome:
    • Pass: If every column value matches one of the values in the set, the test passes, affirming that all entries in the column are valid and expected.
    • Fail: If any column value does not match the values in the set, the test will fail, indicating the presence of invalid or unexpected entries.

Example Usage: E-commerce

In an E-commerce platform, managing product categories effectively is crucial to ensure customer satisfaction and efficient inventory management. The expect_column_values_to_be_in_set test can be applied to the category_id column in the products table to verify that each product's category is correctly listed and compliant with existing category definitions.

Consider a scenario where the products table includes details about items for sale and the category_id column specifies the product category. It's vital that the category IDs used are from a controlled list to maintain consistency and accuracy in product categorization.


models:
  - name: products
    columns:
        - name: category_id
          tests:
            - dbt_expectations.expect_column_values_to_be_in_set:
                value_set: ['ELEC', 'HOME', 'TOYS', 'BOOKS']

In this implementation, every product’s category in the e-commerce system is validated against a predefined set of category IDs: ['ELEC', 'HOME', 'TOYS', 'BOOKS']. This test ensures that each item is categorized correctly, vital for functions like sorting, filtering, and promotion targeting on the platform.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph