dbt hub

dbt test: expect_column_distinct_count_to_equal

USE CASE

Distinct count

APPLIES TO

Column

This page explains the expect_column_distinct_count_to_equal test from the dbt-expectations package. This test verifies that the number of distinct values in a specified column exactly matches a specified number, ensuring precise data representation and consistency.

How it Works

The expect_column_distinct_count_to_equal test challenges the data by confirming that the variety or count of unique entries in a designated column is exactly what is expected, underlining the precision of data categorization or encoding.

Steps and Conditions:

  1. Column Identification: Select the column whose distinct values will be counted.
  2. Setting Expectations: Define the exact number of distinct values you expect in the column using the value argument.
  3. Optional Configurations:
    • Quote Values: Determine if the values should be quoted (default is true).
    • Group By: Provide one or more columns that you wish to group by prior to performing the distinct count.
    • Row Condition: Set up a condition to filter rows, allowing only certain records to be evaluated.
  4. Execution: Apply any row conditions, grouping, and then count the distinct values in the selected column. The outcome is compared against the specified expected count.
  5. Outcome:
    • Pass: Achieved when the count of distinct values matches the expected count exactly.
    • Fail: Occurs if the distinct count diverges from the anticipated number, indicating discrepancies that require attention.

Example Usage: E-commerce

When managing product listings in an E-commerce platform, it's crucial to have distinct product IDs for each unique product. Ensuring the correct number of unique product IDs prevents inventory and listing errors.

Consider a scenario where an inventory table lists each product's information, and the product_id column contains identifiers for each product.


models:
  - name: inventory
    columns:
      - name: product_id
        tests:
          - dbt_expectations.expect_column_distinct_count_to_equal:
              value: 150
              row_condition: "inventory_date >= '2023-01-01'"

In this example, the expect_column_distinct_count_to_equal test verifies that there are exactly 150 distinct product IDs in the inventory from January 1, 2023, onwards. This exact count ensures that the E-commerce platform correctly displays and manages the intended number of unique products.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph