dbt hub

dbt test: expect_column_values_to_not_match_like_pattern_list

USE CASE

String matching

APPLIES TO

Column

This page explains the expect_column_values_to_not_match_like_pattern_list test from the dbt-expectations package, which checks that strings in a specified column do not match a list of SQL LIKE patterns. This test helps ensure that column entries are free from unwanted or problematic characters or patterns that could interfere with processing or analysis.

How it Works

The test evaluates each value in the selected column to confirm that none of the values match any undesirable patterns specified in the like_pattern_list. By ensuring that these patterns are not present, it helps maintain the integrity and cleanliness of the data.

Steps and Conditions:

  1. Column Selection: Choose the column you want to evaluate.
  2. Pattern Specification: Define a list of SQL LIKE patterns that values in the column should not match.
  3. Optional Configuration:
    • Match Criteria: By default, if any value matches any pattern (any), the test will fail. Setting it to all requires all patterns to match for the test to fail.
    • Row Condition: Apply a condition to select specific rows for testing (e.g., only consider rows where id is not null).
  4. Execution: The test runs through each selected row's column value, checking for matches against the specified patterns.
  5. Outcome:
    • Pass: No values in the column match any of the listed patterns, confirming that the data does not contain the specified undesirable characteristics.
    • Fail: If one or more values match any of the listed patterns, the test fails, indicating the presence of undesirable data patterns that need attention.

Example Usage: E-commerce

In an E-commerce setting, ensuring that product descriptions or customer reviews do not contain problematic patterns (like SQL injection elements or profanity encoded with special symbols) is crucial for maintaining the reliability of product information and the safety of the platform.

Consider a scenario where the product_reviews table stores customer feedback, and you're particularly concerned about preventing SQL injection through user input, which could be attempted via product review submissions.


models:
  - name: product_reviews
    columns:
        - name: review_text
          tests:
            - dbt_expectations.expect_column_values_to_not_match_like_pattern_list:
                like_pattern_list: ["%;--%", "%' OR '%"]
                row_condition: "created_at > '2023-01-01'"

In this example, the expect_column_values_to_not_match_like_pattern_list test is applied to the review_text column of the product_reviews model, checking that no review submitted after January 1, 2023, contains the patterns %;--% or %' OR '%'. These patterns are often used in SQL injection attacks. By filtering out these patterns, the E-commerce site can better safeguard its data and systems from potential security threats posed by injurious review content.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph