This page describes the expect_column_values_to_be_within_n_moving_stdevs
test from the dbt-expectations package. This test is designed to detect anomalies in time-series data by checking if logged differences in metric values between periods are within a specified number of standard deviations from a moving average. It's particularly useful for monitoring the stability and consistency of metrics over time.
How it Works
The expect_column_values_to_be_within_n_moving_stdevs
test focuses on evaluating whether the variations in a column's values, especially in time-series data, are within a normal range defined by standard deviations from a moving average. It assumes that these differences follow a log-normal distribution when logged.
Steps and Conditions:
- Time Periods and Interval Configurations: Define the relevant time periods for the test, including the baseline date column, the period for comparison (e.g., day), look-back periods to consider, and the periods over which trends and tests should be applied.
- Standard Deviation Thresholds: Specify the number of standard deviations (
sigma_threshold
) that the metric values can deviate from the moving average before they are considered anomalies. - Calculation of Logged Differences: If
take_logs
is true, the test computes the logarithm of differences between metric values from the current period andN
periods ago. - Grouping (Optional): Apply the test within grouped subsets of data if specified using the
group_by
parameter. - Outcome:
- Pass: If the logged differences for all considered periods are within the specified number of standard deviations from the moving average, the test passes.
- Fail: If any logged difference exceeds this threshold, the test fails, indicating a potential anomaly or issue worth investigating.
Example Usage: Fintech
For a Fintech company, monitoring the stability of transaction volumes or customer spending is crucial. Fluctuations beyond normal levels can indicate issues or significant changes in customer behavior.
Consider a scenario where the daily_transactions
table records the volume of transactions processed each day, including a date column and a transaction_volume
metric.
In this example, the expect_column_values_to_be_within_n_moving_stdevs
test is configured to ensure that the day-to-day logged differences in transaction volumes don't deviate more than 3 standard deviations from a 7-day moving average, calculated over a span of 14 days. Grouping by transaction_type
allows for nuanced anomaly detection across different types of transactions, providing a tailored and effective monitoring mechanism for unexpected shifts in transaction volumes.