dbt hub

dbt test: expect_column_values_to_match_regex

USE CASE

String matching

APPLIES TO

Column

This page describes the expect_column_values_to_match_regex test in dbt, which ensures that column values adhere to a specified regular expression pattern. This test is crucial for validating the format of string data within a database to confirm it meets expected textual patterns.

How it Works

The expect_column_values_to_match_regex test applies a regular expression (regex) to each value in a specific column to check if the values conform to the defined pattern. This functionality is essential for maintaining structured data that often requires specific formatting, such as phone numbers, email addresses, or custom identifiers.

Steps and Conditions:

  1. Column and Regex Selection: The column to be tested is defined, and a regex pattern is specified to determine the valid format for the values in this column.
  2. Regular Expression Match: The test evaluates each value in the column to see if it matches the regex pattern.
  3. Optional Configuration:
    • Raw String: Set is_raw to true if the regex pattern needs to escape characters, marking it as a raw string.
    • Regex Flags: The flags parameter can include flags such as 'i' for case-insensitive matching, enhancing the flexibility of the match requirements.
    • Row Condition: The row_condition can specify conditions to filter which rows are included in the test.
  4. Outcome:
    • Pass: If all evaluated column values match the regex, the test passes, verifying the data conforms to the desired format.
    • Fail: If any value does not match the regex, the test fails, indicating discrepancies that require attention.

Example Usage: Fintech

In a Fintech company, ensuring that transaction IDs are formatted correctly is crucial for data integrity and reliable transaction tracking. The expect_column_values_to_match_regex test can be used to validate the format of the transaction ID in a transactions table.

Consider a scenario where each transaction ID should follow a specific format, for example, two letters followed by six numbers (e.g., "AB123456").


models:
  - name: transactions
    columns:
        - name: transaction_id
          tests:
            - dbt_expectations.expect_column_values_to_match_regex:
                regex: "^[A-Za-z]{2}\d{6}$"
                is_raw: True

This setup ensures that the transaction_id column in the transactions model of the Fintech platform contains values that all conform to the specified regex pattern. This is crucial for maintaining systematic transaction records and supporting efficient, accurate data retrieval and processing.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph