dbt hub

dbt test: expect_column_unique_value_count_to_be_between

USE CASE

Values distribution
Uniqueness

APPLIES TO

Column

This page outlines the expect_column_unique_value_count_to_be_between test from the dbt-expectations package. This test ensures that the number of unique values in a specified column falls within a defined range, offering flexibility in validating data uniqueness across different scenarios.

How it Works

The expect_column_unique_value_count_to_be_between test assesses whether the count of unique values in a selected column is between a minimum and maximum value. This test is useful for ensuring that columns maintain a desired level of uniqueness, which could vary based on the data's nature and the specific requirements of your database schema.

Steps and Conditions:

  1. Column Selection: Choose the column for which you need to verify the unique value count.
  2. Define Range: Set the minimum and maximum numbers of unique values you expect in the column using min_value and max_value arguments. These limits can include the values themselves depending on the strictly parameter.
  3. Optional Configuration:
    • Group By: If specified using the group_by argument, the test will assess uniqueness within the defined groups, allowing for segmented validation.
    • Row Condition: Using the row_condition argument, define any specific conditions for the rows that should be included in the test, such as excluding rows with null IDs.
    • Strict Comparison: The strictly flag determines whether the boundary values are included (false) or strictly excluded (true) in the range check.
  4. Execution: Apply the configured conditions and groupings, then count the unique values in the selected column. Compare this count against the specified range.
  5. Outcome:
    • Pass: If the number of distinct values falls within the inclusive or exclusive range based on the strictly flag, the test passes.
    • Fail: If the count of unique values lies outside the established range, the test fails.

Example Usage: E-commerce

For an E-commerce company, maintaining a precise range of unique product categories can be vital for managing inventory efficiently and streamlining product search functionalities.

Consider the products table where each row represents a different product, and the category_id column specifies the category of each product:


models:
  - name: products
    tests:
      - dbt_expectations.expect_column_unique_value_count_to_be_between:
          column: category_id
          min_value: 10
          max_value: 50
          row_condition: "active = true"

In this example, the expect_column_unique_value_count_to_be_between test checks that the number of unique, active product categories ranges from 10 to 50. This setup helps ensure that the E-commerce platform offers a diversified but manageable array of product categories, enhancing user experience and operational efficacy.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph