dbt hub

dbt test: expect_column_pair_values_to_be_in_set

USE CASE

Columns relationship
Values set

APPLIES TO

Model
Seed
Source

This page discusses the expect_column_pair_values_to_be_in_set test in dbt (data build tool). This test checks that pairs of values from two specified columns match any pair in a predefined set of valid value pairs. It is particularly useful for ensuring data integrity in relational datasets or where column values are interdependent.

How it Works

This dbt test verifies that every specified pair of values from two columns are among an expected set of valid pairs. These pairs are defined beforehand, ensuring that only certain combinations of values are allowed, which can be critical for data validation in systems with complex relational rules or dependencies.

Steps and Conditions:

  1. Column Pair Selection: Identify two columns, column_A and column_B, whose values will be paired and checked.
  2. Set of Valid Pairs Definition: Define a list of valid pairs. Each pair is expressed as a list containing the acceptable values for column_A followed by column_B.
  3. Optional Row Condition:
    • You may specify a row_condition to filter the rows that should be included in the test, such as checking only rows where an "id" is not null.
  4. Checking Value Pairs: During execution, the test iterates through the dataset, extracting values from the selected columns, forming pairs, and then comparing each pair against the defined valid pairs set.
  5. Outcome:
    • Pass: If every column pair from the dataset matches one of the defined valid pairs, the test passes.
    • Fail: If any column pair does not match the valid pairs set, the test fails, indicating a potential issue or anomaly that should be further investigated.

Example Usage: Fintech

In a Fintech application, certain transactions might require specific linked actions between two accounts, encoded in account_type_A and account_type_B columns. For example, transferring funds might only be valid between certain types of accounts.

Consider a scenario where a transaction_details table records every transaction attempt between accounts with the types of accounts involved captured in account_type_A and account_type_B. To ensure transaction validity according to predefined rules, the expect_column_pair_values_to_be_in_set test could be employed.


tests:
  - dbt_expectations.expect_column_pair_values_to_be_in_set:
      column_A: account_type_A
      column_B: account_type_B
      value_pairs_set: [['savings', 'checking'], ['checking', 'investment']]

In this example, the test validates every transaction attempt by comparing the pairs of account types involved against a set of valid account type pairs (['savings', 'checking'] and ['checking', 'investment']). This ensures transactions are only initiated between permissible account types, preventing operational issues or fraudulent activities involving incompatible account types.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph