This page describes the expect_table_columns_to_not_contain_set
test from the dbt-expectations package. This test is designed to ensure certain values are absent in specific columns of a model, seed, or source, ensuring data cleanliness and integrity by verifying that prohibited values do not appear in specified columns.
How it Works
The expect_table_columns_to_not_contain_set
test checks that none of the values in the specified list appear in the selected columns. This is crucial for maintaining the integrity of data by ensuring that specific unwanted or invalid values are excluded from data columns.
Steps and Conditions:
- Column Selection: Select one or multiple columns to apply the test to.
- Define Prohibited Values: List the values that should not be present in the specified columns.
- Optional Transformation:
- Transform: Apply a transformation like
upper
orlower
to the data in the columns before performing the check. This ensures uniformity in data comparison, especially if case sensitivity could affect the test results.
- Transform: Apply a transformation like
- Execution: The test examines each value in the specified columns, applying any defined transformations, and verifies that none of the values listed as prohibited are present.
- Outcome:
- Pass: If none of the prohibited values are found in the selected columns, the test passes.
- Fail: If any prohibited values are detected in the columns, the test fails, indicating the presence of unwanted data.
Example Usage: Fintech
For a Fintech company, it's essential to maintain strict control over financial transaction categories to ensure they comply with regulations. Using expect_table_columns_to_not_contain_set
, a Fintech firm can enforce restrictions on the categories of transactions processed.
Consider a scenario where the transactions
table contains a category
column that classifies each transaction. Regulatory guidelines mandate that certain categories, like 'gambling' and 'bribery', should not be processed.
In this example, the expect_table_columns_to_not_contain_set
is applied to ensure that prohibited transaction categories such as 'gambling' and 'bribery' are absent from the category
column in the transactions
table. This test verifies compliance with regulatory standards avoiding potential legal and reputational risks.