This page discusses the expect_column_pair_values_to_be_in_set
test in dbt (data build tool). This test checks that pairs of values from two specified columns match any pair in a predefined set of valid value pairs. It is particularly useful for ensuring data integrity in relational datasets or where column values are interdependent.
How it Works
This dbt test verifies that every specified pair of values from two columns are among an expected set of valid pairs. These pairs are defined beforehand, ensuring that only certain combinations of values are allowed, which can be critical for data validation in systems with complex relational rules or dependencies.
Steps and Conditions:
- Column Pair Selection: Identify two columns,
column_A
andcolumn_B
, whose values will be paired and checked. - Set of Valid Pairs Definition: Define a list of valid pairs. Each pair is expressed as a list containing the acceptable values for
column_A
followed bycolumn_B
. - Optional Row Condition:
- You may specify a
row_condition
to filter the rows that should be included in the test, such as checking only rows where an "id" is not null.
- You may specify a
- Checking Value Pairs: During execution, the test iterates through the dataset, extracting values from the selected columns, forming pairs, and then comparing each pair against the defined valid pairs set.
- Outcome:
- Pass: If every column pair from the dataset matches one of the defined valid pairs, the test passes.
- Fail: If any column pair does not match the valid pairs set, the test fails, indicating a potential issue or anomaly that should be further investigated.
Example Usage: Fintech
In a Fintech application, certain transactions might require specific linked actions between two accounts, encoded in account_type_A
and account_type_B
columns. For example, transferring funds might only be valid between certain types of accounts.
Consider a scenario where a transaction_details
table records every transaction attempt between accounts with the types of accounts involved captured in account_type_A
and account_type_B
. To ensure transaction validity according to predefined rules, the expect_column_pair_values_to_be_in_set
test could be employed.
In this example, the test validates every transaction attempt by comparing the pairs of account types involved against a set of valid account type pairs (['savings', 'checking']
and ['checking', 'investment']
). This ensures transactions are only initiated between permissible account types, preventing operational issues or fraudulent activities involving incompatible account types.