dbt hub

dbt test: expect_column_values_to_not_match_regex

USE CASE

String matching

APPLIES TO

Column

This page introduces the expect_column_values_to_not_match_regex test from the dbt-expectations package, which is designed to ensure that strings in a particular column do not match a specified regular expression pattern. This test is essential for validating the format of data entries and ensuring they adhere to expected patterns or rules that define what should not appear in the data.

How it Works

The expect_column_values_to_not_match_regex test evaluates each entry in a specified column, ensuring that none of these entries match a given regular expression pattern. This test is useful for enforcing data integrity by confirming that unwanted patterns or characters do not appear in the data values.

Steps and Conditions:

  1. Regular Expression Definition: Define the regular expression pattern that should not appear in any of the string values of the targeted column.
  2. Column Validation: Apply the test to the column, checking every value against the defined regular expression.
  3. Optional Configurations:
    • Row Condition: Specify a condition that filters which rows are considered during the validation.
    • Raw Regular Expression: Determine whether the regular expression is treated as a raw string, which affects how special characters in the regex are interpreted.
    • Regex Flags: Set any flags that modify the behavior of the regex matching, such as case-insensitivity.
  4. Outcome:
    • Pass: The test passes if none of the column's values match the regex pattern.
    • Fail: The test fails if at least one value matches the pattern, indicating that the data contains unexpected or forbidden patterns.

Example Usage: Fintech Company

For a Fintech company, maintaining the security and format of sensitive data like personal identifiers or account numbers is critical. Consider using the expect_column_values_to_not_match_regex test on a customer_identifiers column in a database that stores sensitive customer information.


models:
  - name: customer_data
    columns:
        - name: customer_identifiers
          tests:
            - dbt_expectations.expect_column_values_to_not_match_regex:
                regex: "^(?!.*\b(SSN|CreditCard)\b).*$" # Ensure identifiers do not contain SSN or CreditCard
                flags: i # Case-insensitive matching

In this example, the regex pattern is designed to fail if it finds any strings that contain "SSN" or "CreditCard," regardless of case. This setup ensures that sensitive data types are not stored improperly in the customer_identifiers column, aligning with best practices for data security in the financial sector.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph