dbt hub

dbt test: expect_column_values_to_not_be_in_set

USE CASE

Values set

APPLIES TO

Column

This page explains the expect_column_values_to_not_be_in_set test from the dbt-expectations package. This test is key for ensuring that the values in a specified column do not include any disallowed or undesirable entries. It is useful for filtering out unwanted data entries that can negatively impact the integrity of the dataset.

How it Works

The expect_column_values_to_not_be_in_set test verifies that none of the values in a targeted column appear in a specified set of unacceptable values. By enforcing this, the test helps maintain the purity of the data by preventing unwanted values from being present.

Steps and Conditions:

  1. Column Selection: Choose the column you wish to check for the presence of disallowed values.
  2. Define Unacceptable Value Set: Specify the set of values that should not be found in the chosen column using the value_set argument.
  3. Optional Configuration:
    • Quote Values: Use the quote_values argument to decide if the values in value_set should be treated as quoted. The default setting is true.
    • Row Condition: The row_condition argument allows you to specify conditions that filter which rows are considered in the test, such as excluding rows where specific fields might be null.
  4. Execution: The test applies the optional row condition and then checks each row in the specified column to ensure that its value is not included in the value_set.
  5. Outcome:
    • Pass: If none of the values in the column are found in the unacceptable value set, the test passes.
    • Fail: If any value in the column is detected in the set of disallowed values, the test fails, indicating a contamination of data that requires review and cleanup.

Example Usage: Fintech

For a Fintech company, maintaining accuracy and compliance in transaction categorization is crucial. Incorrect or unwanted transaction categories can lead to issues in reporting and regulatory compliance.

Consider a scenario where the transactions table includes a category column that classifies each financial transaction. Ensuring that this column does not include deprecated or erroneous categories is vital.


models:
  - name: transactions
    columns:
      - name: category
        tests:
          - dbt_expectations.expect_column_values_to_not_be_in_set:
              value_set: ['undefined', 'miscategorized', 'deprecated']
              row_condition: "amount > 0"

In this example, expect_column_values_to_not_be_in_set ensures that the category column in the transactions table does not contain the values 'undefined', 'miscategorized', or 'deprecated', particularly for transactions where the amount is greater than zero. This setup helps the fintech company to avoid misclassification and maintain accurate and compliant financial records.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph