dbt hub

dbt test: expect_column_values_to_match_regex_list

USE CASE

String matching

APPLIES TO

Column

This page describes the expect_column_values_to_match_regex_list test from the dbt-expectations package, which ensures that string entries in a specified column match one or more defined regular expressions. This test is useful for validating string formats, such as email addresses, phone numbers, or any custom pattern that needs to be strictly followed within the data.

How it Works

The expect_column_values_to_match_regex_list test checks if the string values of a specified column conform to a list of regular expressions. This ensures that the data in the column adheres to expected textual patterns, enhancing data integrity and reliability.

Steps and Conditions:

  1. Column and Regex List Setup: Select the column to check and define a list of regular expressions that the column values should match.
  2. Matching Logic:
    • Match on 'any' (default): A column value passes the test if it matches any one of the regular expressions.
    • Match on 'all': A column value passes the test only if it matches all provided regular expressions.
  3. Adjust Regex Behavior:
    • Is Raw: If set to True, treat the pattern as a raw string where escape sequences are not translated (default is False).
    • Flags: Specify flags such as 'i' for case-insensitive matching to customize how the regex is applied.
  4. Row Condition: Apply the test only to rows that meet a specific condition (optional).
  5. Outcome:
    • Pass: All applicable column values match the regex pattern(s) as per the 'match_on' setting.
    • Fail: One or more column values do not conform to the required regex pattern(s), indicating discrepancies that need attention.

Example Usage: Fintech Company

For a Fintech company, ensuring that transaction codes or identifiers in their systems conform to a specific format is crucial for maintaining secure and consistent financial record-keeping.

Consider a scenario where the transaction_details table includes a transaction_code column, and each code needs to follow a specific format, such as beginning with a country code followed by a specific set of characters.


models:
  - name: transaction_details
    columns:
      - name: transaction_code
        tests:
          - dbt_expectations.expect_column_values_to_match_regex_list:
              regex_list: ["^USA\\d{3}-[A-Z]{4}$", "^CAN\\d{3}-[A-Z]{4}$"]
              is_raw: True
              flags: i
              row_condition: "status = 'confirmed'"

In this example, the regex patterns ^USA\d{3}-[A-Z]{4}$ and ^CAN\d{3}-[A-Z]{4}$ ensure that transaction codes start with 'USA' or 'CAN', followed by three digits, a dash, and four uppercase letters. The is_raw setting is used to avoid interpreting the backslashes as escape sequences. This setup validates that confirmed transactions are correctly formatted, reflecting tight adherence to data quality requirements.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph
An illustration showing a debugging console with dbt run commands and an error message. The image includes stylized icons: a smiling orange bar chart, the purple dbt logo, and a magnifying glass containing a pink error icon.