dbt hub

dbt test: expect_column_value_lengths_to_be_between

USE CASE

String format

APPLIES TO

Column

This page describes the expect_column_value_lengths_to_be_between test from the dbt-expectations package. This test confirms the length of string values in a given column fall within a specified range. It is essential for ensuring data conforms to expected formats, potentially influencing data processing or user interface design where specific string lengths are required.

How it Works

The expect_column_value_lengths_to_be_between test verifies that all string lengths in a designated column lie between an inclusive minimum and maximum value. This verification prevents errors related to data format mismatches, like truncation or misalignment in reports and data feeds.

Steps and Conditions:

  1. Column Selection: Choose the column whose string values are to be tested for length.
  2. Set Length Parameters: Define the acceptable range for string lengths using the min_value and max_value parameters.
  3. Optional Configuration:
    • Row Condition: Apply row_condition to filter which rows are to be included in the test, based on specific criteria.
    • Strict Comparison: By setting strictly to false (default), the test includes the boundary values in the comparison. If strictly is true, the boundaries are exclusive, not including the minimum and maximum lengths.
  4. Execution: During test execution, it evaluates each string's length in the specified column against the defined range.
  5. Outcome:
    • Pass: The test passes if all string lengths are within the defined range, ensuring consistency in data entry.
    • Fail: The test fails if any string length is outside the specified range, indicating an anomaly that might need correction or further investigation.

Example Usage: Fintech

For a Fintech company, ensuring accuracy in data such as account identifiers, which are often formatted as strings of specific lengths, is crucial. Incorrectly formatted account numbers can lead to processing errors affecting transactions and customer satisfaction.

Imagine the customer_accounts table includes a column account_number intended to store account identifiers as 10-digit strings.


models:
  - name: customer_accounts
    columns:
      - name: account_number
        tests:
          - dbt_expectations.expect_column_value_lengths_to_be_between:
              min_value: 10
              max_value: 10
              row_condition: "status = 'active'"

In this scenario, the expect_column_value_lengths_to_be_between test checks the length of the account_number for all active customer accounts to ensure each is exactly 10 characters long. This setup helps in maintaining the integrity and usability of account data across the company's processing systems.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph