This page describes the expect_column_stdev_to_be_between
test in dbt, part of the dbt-expectations package. This test ensures the standard deviation of a specified column's values falls within a defined range. Standard deviation is a statistical measure of the amount of variation or dispersion in a set of values, and controlling it can be valuable for maintaining consistency in dataset characteristics.
How it Works
The expect_column_stdev_to_be_between
test calculates the sample standard deviation (normalized by N-1, where N is the sample size) of a given column and checks that this value lies between a specified minimum and maximum value.
Steps and Conditions:
- Column Selection: Choose the column from your model to assess for its variability in terms of standard deviation.
- Configure Values: Define
min_value
andmax_value
to set the acceptable range for the standard deviation. These parameters are optional, allowing for one-sided checks if needed. - Optional Configuration:
- Group By: The
group_by
feature allows defining specific groups within your data for which the standard deviation should be calculated separately. - Row Condition: Use
row_condition
to include only certain rows in the calculation, based on a specified condition. - Strict Comparison: The
strictly
parameter, when set tofalse
, includes the boundary values in the acceptable range (i.e., "or equal to"). If set totrue
, the test requires the standard deviation to fall strictly between the minimum and maximum values.
- Group By: The
- Execution: After applying any row conditions and grouping if specified, the test calculates the standard deviation for the column and checks if it falls within the defined range.
- Outcome:
- Pass: The test passes if the standard deviation of the column is within the specified range, indicating acceptable variability.
- Fail: If the standard deviation falls outside the specified range, the test fails, signaling an unexpected level of variability that may need attention.
Example Usage: Fintech Company
For a Fintech company, maintaining controlled variability in transaction amounts can be crucial for spotting unusual patterns that might indicate fraud or errors in transaction processing systems.
Consider a scenario in which a transactions
table records each transaction's details, including the amount
column that logs the transaction value.
In this example, the test monitors the standard deviation of the amount
column in the transactions
model for each customer_id
, considering only transactions from January 1, 2023, onward. By setting min_value
to 10 and max_value
to 200, with strictly
true, the setup ensures that each customer's transaction amounts are varied within a controlled range, which aids in detecting anomalies in transaction patterns.