dbt hub

dbt test: expect_column_values_to_be_within_n_moving_stdevs

USE CASE

Values distribution
Anomaly detection

APPLIES TO

Column

This page describes the expect_column_values_to_be_within_n_moving_stdevs test from the dbt-expectations package. The test detects anomalies in time-series data by checking whether the logged period-over-period differences in a metric stay within a specified number of standard deviations of a moving average. It is particularly useful for monitoring the stability and consistency of metrics over time.

How it Works

The expect_column_values_to_be_within_n_moving_stdevs test evaluates whether the period-to-period variations in a column's values, especially in time-series data, fall within a normal range defined by standard deviations around a moving average. It assumes the metric values are approximately log-normally distributed, so that the differences of their logarithms are approximately normal.
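Under that assumption, the quantity being monitored is the difference of logarithms between periods, i.e. the log of the period-over-period ratio. A minimal sketch of that calculation (the sample values are hypothetical, not from the test itself):

```python
import math

# Hypothetical daily metric values, oldest first.
values = [100.0, 105.0, 98.0, 110.0]

# Logged differences between consecutive periods (lookback of 1):
# log(current) - log(previous), i.e. the log of the period-over-period ratio.
logged_diffs = [math.log(curr) - math.log(prev)
                for prev, curr in zip(values, values[1:])]
```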

Steps and Conditions:

  1. Time Periods and Interval Configurations: Define the relevant time periods for the test, including the baseline date column, the period for comparison (e.g., day), look-back periods to consider, and the periods over which trends and tests should be applied.
  2. Standard Deviation Thresholds: Specify the number of standard deviations (sigma_threshold) that the metric values can deviate from the moving average before they are considered anomalies.
  3. Calculation of Logged Differences: If take_logs is true, the test takes the logarithm of the metric values and then computes the differences between the current period and N periods earlier (equivalent to the log of their ratio).
  4. Grouping (Optional): Apply the test within grouped subsets of data if specified using the group_by parameter.
  5. Outcome:
    • Pass: If the logged differences for all considered periods are within the specified number of standard deviations from the moving average, the test passes.
    • Fail: If any logged difference exceeds this threshold, the test fails, indicating a potential anomaly or issue worth investigating.
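The steps above can be sketched in Python. This is an illustrative approximation under stated assumptions (a simple trailing window, a lookback of one period), not the actual SQL that dbt-expectations generates:

```python
import math
import statistics

def flag_anomalies(values, trend_periods=7, sigma_threshold=3.0, take_logs=True):
    """Return indices of period-over-period differences that deviate more than
    sigma_threshold standard deviations from the trailing moving average.

    Illustrative sketch only; the real test computes this in SQL over a
    date column, with configurable lookback and test periods."""
    diffs = []
    for prev, curr in zip(values, values[1:]):
        if take_logs:
            diffs.append(math.log(curr) - math.log(prev))
        else:
            diffs.append(curr - prev)

    anomalies = []
    for i in range(trend_periods, len(diffs)):
        window = diffs[i - trend_periods:i]   # trailing trend window
        moving_avg = statistics.fmean(window)
        moving_stdev = statistics.stdev(window)
        if moving_stdev > 0 and abs(diffs[i] - moving_avg) > sigma_threshold * moving_stdev:
            anomalies.append(i)               # this period would fail the test
    return anomalies
```

A series with only small day-to-day noise passes, while a sudden spike far outside the trailing window is flagged.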

Example Usage: Fintech

For a Fintech company, monitoring the stability of transaction volumes or customer spending is crucial. Fluctuations beyond normal levels can indicate issues or significant changes in customer behavior.

Consider a scenario where the daily_transactions table records the volume of transactions processed each day, including a date column and a transaction_volume metric.


models:
  - name: daily_transactions
    columns:
      - name: transaction_volume
        tests:
          - dbt_expectations.expect_column_values_to_be_within_n_moving_stdevs:
              date_column_name: date
              period: day
              lookback_periods: 1
              trend_periods: 7
              test_periods: 14
              sigma_threshold: 3
              take_logs: true
              group_by: ['transaction_type']

In this example, the expect_column_values_to_be_within_n_moving_stdevs test checks that day-over-day logged differences in transaction volume stay within 3 standard deviations of a 7-day moving average, evaluated over the most recent 14 days. Grouping by transaction_type applies the check separately to each type of transaction, so an anomaly in one type is not masked by stable volumes in another, providing a tailored monitoring mechanism for unexpected shifts in transaction volumes.
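The effect of group_by is that each group's series is evaluated against its own moving average rather than one global average. A sketch of that partitioning, using invented rows and transaction types for illustration:

```python
from collections import defaultdict

# Hypothetical (transaction_type, transaction_volume) rows, ordered by date.
rows = [
    ("card", 1000.0), ("ach", 50.0),
    ("card", 1020.0), ("ach", 49.0),
    ("card", 990.0),  ("ach", 52.0),
]

# Partition the metric into one series per group, as group_by does,
# so each transaction_type gets its own moving average and stdev.
series = defaultdict(list)
for transaction_type, volume in rows:
    series[transaction_type].append(volume)
```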
