This page outlines the expect_column_unique_value_count_to_be_between
test from the dbt-expectations package. This test ensures that the number of unique values in a specified column falls within a defined range, offering flexibility in validating data uniqueness across different scenarios.
How it Works
The expect_column_unique_value_count_to_be_between
test assesses whether the count of unique values in a selected column is between a minimum and maximum value. This test is useful for ensuring that columns maintain a desired level of uniqueness, which could vary based on the data's nature and the specific requirements of your database schema.
Steps and Conditions:
- Column Selection: Choose the column for which you need to verify the unique value count.
- Define Range: Set the minimum and maximum numbers of unique values you expect in the column using
min_value
andmax_value
arguments. These limits can include the values themselves depending on thestrictly
parameter. - Optional Configuration:
- Group By: If specified using the
group_by
argument, the test will assess uniqueness within the defined groups, allowing for segmented validation. - Row Condition: Using the
row_condition
argument, define any specific conditions for the rows that should be included in the test, such as excluding rows with null IDs. - Strict Comparison: The
strictly
flag determines whether the boundary values are included (false
) or strictly excluded (true
) in the range check.
- Group By: If specified using the
- Execution: Apply the configured conditions and groupings, then count the unique values in the selected column. Compare this count against the specified range.
- Outcome:
- Pass: If the number of distinct values falls within the inclusive or exclusive range based on the
strictly
flag, the test passes. - Fail: If the count of unique values lies outside the established range, the test fails.
- Pass: If the number of distinct values falls within the inclusive or exclusive range based on the
Example Usage: E-commerce
For an E-commerce company, maintaining a precise range of unique product categories can be vital for managing inventory efficiently and streamlining product search functionalities.
Consider the products
table where each row represents a different product, and the category_id
column specifies the category of each product:
In this example, the expect_column_unique_value_count_to_be_between
test checks that the number of unique, active product categories ranges from 10 to 50. This setup helps ensure that the E-commerce platform offers a diversified but manageable array of product categories, enhancing user experience and operational efficacy.