dbt hub

dbt test: expect_column_stdev_to_be_between

USE CASE

Values distribution

APPLIES TO

Column

This page describes the expect_column_stdev_to_be_between test in dbt, part of the dbt-expectations package. This test ensures the standard deviation of a specified column's values falls within a defined range. Standard deviation is a statistical measure of the amount of variation or dispersion in a set of values, and controlling it can be valuable for maintaining consistency in dataset characteristics.

How it Works

The expect_column_stdev_to_be_between test calculates the sample standard deviation (normalized by N-1, where N is the sample size) of a given column and checks that this value lies between a specified minimum and maximum value.

Steps and Conditions:

  1. Column Selection: Choose the column from your model to assess for its variability in terms of standard deviation.
  2. Configure Values: Define min_value and max_value to set the acceptable range for the standard deviation. These parameters are optional, allowing for one-sided checks if needed.
  3. Optional Configuration:
    • Group By: The group_by feature allows defining specific groups within your data for which the standard deviation should be calculated separately.
    • Row Condition: Use row_condition to include only certain rows in the calculation, based on a specified condition.
    • Strict Comparison: The strictly parameter, when set to false, includes the boundary values in the acceptable range (i.e., "or equal to"). If set to true, the test requires the standard deviation to fall strictly between the minimum and maximum values.
  4. Execution: After applying any row conditions and grouping if specified, the test calculates the standard deviation for the column and checks if it falls within the defined range.
  5. Outcome:
    • Pass: The test passes if the standard deviation of the column is within the specified range, indicating acceptable variability.
    • Fail: If the standard deviation falls outside the specified range, the test fails, signaling an unexpected level of variability that may need attention.

Example Usage: Fintech Company

For a Fintech company, maintaining controlled variability in transaction amounts can be crucial for spotting unusual patterns that might indicate fraud or errors in transaction processing systems.

Consider a scenario in which a transactions table records each transaction's details, including the amount column that logs the transaction value.


models:
  - name: transactions
    tests:
      - dbt_expectations.expect_column_stdev_to_be_between:
          column: amount
          min_value: 10
          max_value: 200
          group_by: [customer_id]
          row_condition: "transaction_date >= '2023-01-01'"
          strictly: true

In this example, the test monitors the standard deviation of the amount column in the transactions model for each customer_id, considering only transactions from January 1, 2023, onward. By setting min_value to 10 and max_value to 200, with strictly true, the setup ensures that each customer's transaction amounts are varied within a controlled range, which aids in detecting anomalies in transaction patterns.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph