This test from the dbt-expectations package assesses whether specific quantiles of a column's values fall within a set range. It is tailored for ensuring that quantiles, such as the 95th percentile, do not exceed or fall below predetermined limits, allowing for fine-grained analysis of data distribution.
How it Works
The expect_column_quantile_values_to_be_between
test analyzes quantiles within a column to ascertain if they lie between the specified minimum and maximum values. This functionality is critical for monitoring distribution changes in key metrics or fields within your dataset.
Steps and Conditions:
- Quantile Selection: Define the quantile(s) to be tested (e.g., 0.95 for the 95th percentile).
- Range Specification: Set the minimum and maximum values that the selected quantile should lie between.
- Optional Settings:
- Group By: Apply the test to quantiles within specific groups by defining one or more grouping columns.
- Row Filter: Include a condition to filter which rows are considered for quantile calculation based on specific criteria.
- Strict Comparison: Decide whether the range comparison includes the boundary values by setting
strictly
to true or false (default is false).
- Calculation and Validation: Calculate the defined quantile for the selected column and verify whether it falls within the specified range.
- Outcome:
- Pass: The quantile value lies within the given range, confirming that the data distribution meets the expectations.
- Fail: The quantile value is outside the set range, indicating discrepancies that might need examination or action.
Example Usage: E-commerce
In an E-commerce context, understanding the distribution of purchase amounts can be vital, particularly in spotting outliers or unusual trends.
Consider a hypothetical scenario where an e-commerce company keeps track of customer purchases in a transactions
table. Assessing the 95th percentile of transaction amounts could be imperative to gauge high-value purchases.
In this example, the test ensures that the 95th percentile of the amount
column in the transactions
model lies between $200 and $5000 for different customer segments, considering only positive transaction amounts. This range is crucial for understanding spending habits among the top spending customers, allowing the e-commerce company to tailor marketing and pricing strategies effectively.