This page explains the expect_column_values_to_not_match_like_pattern_list
test from the dbt-expectations package, which checks that strings in a specified column do not match a list of SQL LIKE
patterns. This test helps ensure that column entries are free from unwanted or problematic characters or patterns that could interfere with processing or analysis.
How it Works
The test evaluates each value in the selected column to confirm that none of the values match any undesirable patterns specified in the like_pattern_list
. By ensuring that these patterns are not present, it helps maintain the integrity and cleanliness of the data.
Steps and Conditions:
- Column Selection: Choose the column you want to evaluate.
- Pattern Specification: Define a list of SQL
LIKE
patterns that values in the column should not match. - Optional Configuration:
- Match Criteria: By default, if any value matches any pattern (
any
), the test will fail. Setting it toall
requires all patterns to match for the test to fail. - Row Condition: Apply a condition to select specific rows for testing (e.g., only consider rows where
id is not null
).
- Match Criteria: By default, if any value matches any pattern (
- Execution: The test runs through each selected row's column value, checking for matches against the specified patterns.
- Outcome:
- Pass: No values in the column match any of the listed patterns, confirming that the data does not contain the specified undesirable characteristics.
- Fail: If one or more values match any of the listed patterns, the test fails, indicating the presence of undesirable data patterns that need attention.
Example Usage: E-commerce
In an E-commerce setting, ensuring that product descriptions or customer reviews do not contain problematic patterns (like SQL injection elements or profanity encoded with special symbols) is crucial for maintaining the reliability of product information and the safety of the platform.
Consider a scenario where the product_reviews
table stores customer feedback, and you're particularly concerned about preventing SQL injection through user input, which could be attempted via product review submissions.
In this example, the expect_column_values_to_not_match_like_pattern_list
test is applied to the review_text
column of the product_reviews
model, checking that no review submitted after January 1, 2023, contains the patterns %;--%
or %' OR '%'
. These patterns are often used in SQL injection attacks. By filtering out these patterns, the E-commerce site can better safeguard its data and systems from potential security threats posed by injurious review content.