This page explains the expect_column_values_to_match_like_pattern_list
test from the dbt-expectations package, which checks that string entries in a specified column conform to any one or multiple provided SQL LIKE
patterns. Particularly suited for validating formats or specific string content in a column, this test is a powerful tool for ensuring data consistency and reliability.
How it Works
The expect_column_values_to_match_like_pattern_list
test applies one or several LIKE
patterns to string values in a column to confirm that each entry matches expected string formats. This test is useful for verifying that data adheres to format specifications, such as containing specific symbols or patterns.
Steps and Conditions:
- Column and Patterns Specification: Choose the column you want to validate and specify the list of
LIKE
patterns that the values should match. - Pattern Matching:
- Match Mode: Decide whether the values should match any pattern in the list (
any
) or all of them (all
). The typical setting isany
, which is used unless specified otherwise. - Optional Configurations:
- Row Condition: Include a condition to filter the rows that this test will apply to (e.g., exclude null values).
- Match Mode: Decide whether the values should match any pattern in the list (
- Execution: The test filters the data based on any row conditions, then checks each value in the selected column against the specified
LIKE
patterns using the chosen match mode. - Outcome:
- Pass: All checked values in the column match the specified patterns according to the chosen match mode, indicating conformity with expected string formats.
- Fail: If one or more values do not comply with the specified patterns, the test fails, signaling possible inconsistencies in data entry or processing.
Example Usage: Fintech Company
For a Fintech company, ensuring accurate and consistent formatting in transaction data, like transaction IDs or references, is crucial for data tracking and integrity.
Consider a scenario where the transaction_data
table records details about user transactions, including a transaction_reference
column that stores unique identifiers for each transaction. It's important that these identifiers maintain a specific format for ease of tracking and consistency.
In this example, the expect_column_values_to_match_like_pattern_list
test verifies that each transaction reference follows one of the set formats, either starting with "TRX-" or ending with "-2023". This ensures that all entries are in line with format expectations, facilitating reliable transaction identification and processing.