dbt test: expect_column_values_to_match_regex_list

SOURCE

USE CASE

String matching

APPLIES TO

Column

This page describes the expect_column_values_to_match_regex_list test from the dbt-expectations package, which ensures that string entries in a specified column match one or more defined regular expressions. This test is useful for validating string formats, such as email addresses, phone numbers, or any custom pattern that needs to be strictly followed within the data.

How it Works

The expect_column_values_to_match_regex_list test checks if the string values of a specified column conform to a list of regular expressions. This ensures that the data in the column adheres to expected textual patterns, enhancing data integrity and reliability.

Steps and Conditions:

Column and Regex List Setup: Select the column to check and define a list of regular expressions that the column values should match.
Matching Logic:
- Match on 'any' (default): A column value passes the test if it matches any one of the regular expressions.
- Match on 'all': A column value passes the test only if it matches all provided regular expressions.
Adjust Regex Behavior:
- Is Raw: If set to True, treat the pattern as a raw string where escape sequences are not translated (default is False).
- Flags: Specify flags such as 'i' for case-insensitive matching to customize how the regex is applied.
Row Condition: Apply the test only to rows that meet a specific condition (optional).
Outcome:
- Pass: All applicable column values match the regex pattern(s) as per the 'match_on' setting.
- Fail: One or more column values do not conform to the required regex pattern(s), indicating discrepancies that need attention.

Example Usage: Fintech Company

For a Fintech company, ensuring that transaction codes or identifiers in their systems conform to a specific format is crucial for maintaining secure and consistent financial record-keeping.

Consider a scenario where the transaction_details table includes a transaction_code column, and each code needs to follow a specific format, such as beginning with a country code followed by a specific set of characters.


models:
  - name: transaction_details
    columns:
      - name: transaction_code
        tests:
          - dbt_expectations.expect_column_values_to_match_regex_list:
              regex_list: ["^USA\\d{3}-[A-Z]{4}$", "^CAN\\d{3}-[A-Z]{4}$"]
              is_raw: True
              flags: i
              row_condition: "status = 'confirmed'"

In this example, the regex patterns ^USA\d{3}-[A-Z]{4}$ and ^CAN\d{3}-[A-Z]{4}$ ensure that transaction codes start with 'USA' or 'CAN', followed by three digits, a dash, and four uppercase letters. The is_raw setting is used to avoid interpreting the backslashes as escape sequences. This setup validates that confirmed transactions are correctly formatted, reflecting tight adherence to data quality requirements.

AI Agents

Detection & Coverage

Governance

Triage and Response

Enabling business users

Integrations

Cloud vs. OSS

Webinars

dbt Tests Hub

Customers

Blog

Cloud Demo Videos

About Us

Docs

Community

Evaluating Observability Tools?

dbt test: expect_column_values_to_match_regex_list

SOURCE

USE CASE

APPLIES TO

How it Works

Steps and Conditions:

Example Usage: Fintech Company

The only data observability platform built into your dbt code

Product

Resources

Company

AI Agents Demo