This page describes the expect_column_values_to_match_regex_list
test from the dbt-expectations package, which ensures that string entries in a specified column match one or more defined regular expressions. This test is useful for validating string formats, such as email addresses, phone numbers, or any custom pattern that needs to be strictly followed within the data.
How it Works
The expect_column_values_to_match_regex_list
test checks if the string values of a specified column conform to a list of regular expressions. This ensures that the data in the column adheres to expected textual patterns, enhancing data integrity and reliability.
Steps and Conditions:
- Column and Regex List Setup: Select the column to check and define a list of regular expressions that the column values should match.
- Matching Logic:
- Match on 'any' (default): A column value passes the test if it matches any one of the regular expressions.
- Match on 'all': A column value passes the test only if it matches all provided regular expressions.
- Adjust Regex Behavior:
- Is Raw: If set to True, treat the pattern as a raw string where escape sequences are not translated (default is False).
- Flags: Specify flags such as 'i' for case-insensitive matching to customize how the regex is applied.
- Row Condition: Apply the test only to rows that meet a specific condition (optional).
- Outcome:
- Pass: All applicable column values match the regex pattern(s) as per the 'match_on' setting.
- Fail: One or more column values do not conform to the required regex pattern(s), indicating discrepancies that need attention.
Example Usage: Fintech Company
For a Fintech company, ensuring that transaction codes or identifiers in their systems conform to a specific format is crucial for maintaining secure and consistent financial record-keeping.
Consider a scenario where the transaction_details
table includes a transaction_code
column, and each code needs to follow a specific format, such as beginning with a country code followed by a specific set of characters.
In this example, the regex patterns ^USA\d{3}-[A-Z]{4}$
and ^CAN\d{3}-[A-Z]{4}$
ensure that transaction codes start with 'USA' or 'CAN', followed by three digits, a dash, and four uppercase letters. The is_raw
setting is used to avoid interpreting the backslashes as escape sequences. This setup validates that confirmed transactions are correctly formatted, reflecting tight adherence to data quality requirements.