dbt hub

dbt test: expect_column_quantile_values_to_be_between

USE CASE

Values distribution

APPLIES TO

Column

This test from the dbt-expectations package assesses whether specific quantiles of a column's values fall within a set range. It is tailored for ensuring that quantiles, such as the 95th percentile, do not exceed or fall below predetermined limits, allowing for fine-grained analysis of data distribution.

How it Works

The expect_column_quantile_values_to_be_between test analyzes quantiles within a column to ascertain if they lie between the specified minimum and maximum values. This functionality is critical for monitoring distribution changes in key metrics or fields within your dataset.

Steps and Conditions:

  1. Quantile Selection: Define the quantile(s) to be tested (e.g., 0.95 for the 95th percentile).
  2. Range Specification: Set the minimum and maximum values that the selected quantile should lie between.
  3. Optional Settings:
    • Group By: Apply the test to quantiles within specific groups by defining one or more grouping columns.
    • Row Filter: Include a condition to filter which rows are considered for quantile calculation based on specific criteria.
    • Strict Comparison: Decide whether the range comparison includes the boundary values by setting strictly to true or false (default is false).
  4. Calculation and Validation: Calculate the defined quantile for the selected column and verify whether it falls within the specified range.
  5. Outcome:
    • Pass: The quantile value lies within the given range, confirming that the data distribution meets the expectations.
    • Fail: The quantile value is outside the set range, indicating discrepancies that might need examination or action.

Example Usage: E-commerce

In an E-commerce context, understanding the distribution of purchase amounts can be vital, particularly in spotting outliers or unusual trends.

Consider a hypothetical scenario where an e-commerce company keeps track of customer purchases in a transactions table. Assessing the 95th percentile of transaction amounts could be imperative to gauge high-value purchases.


models:
  - name: transactions
    columns:
      - name: amount
        tests:
          - dbt_expectations.expect_column_quantile_values_to_be_between:
              quantile: .95
              min_value: 200
              max_value: 5000
              row_condition: "amount > 0"
              group_by: [customer_segment]
              strictly: true

In this example, the test ensures that the 95th percentile of the amount column in the transactions model lies between $200 and $5000 for different customer segments, considering only positive transaction amounts. This range is crucial for understanding spending habits among the top spending customers, allowing the e-commerce company to tailor marketing and pricing strategies effectively.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph