dbt hub

dbt test: expect_column_mean_to_be_between

USE CASE

Values distribution

APPLIES TO

Column

This page describes the expect_column_mean_to_be_between test from the dbt-expectations package. This test ensures that the average value of a numeric column lies within a specified range, inclusive of minimum and maximum values. It is essential for verifying that data points in a particular column maintain expected averages, which can be critical for operational metrics, financial calculations, or quality control measures.

How it Works

The expect_column_mean_to_be_between test calculates the mean (average) of a specified column and checks if this mean value falls between the defined minimum and maximum values. This test helps in monitoring the consistency and expected standards of data values over time.

Steps and Conditions:

  1. Column Selection: Choose the numeric column you want to evaluate for its average value.
  2. Define Range: Set the minimum (min_value) and maximum (max_value) thresholds that the column's mean should fall between.
  3. Optional Configuration:
    • Group By: You may specify one or more columns to group the data before calculating the mean. This is useful for segmented analysis.
    • Row Condition: Provides the ability to include only certain rows in the calculation based on a condition, like filtering out rows where IDs are null.
    • Strictness: Decide whether the comparison to the threshold is strict (does not include the boundary values) or not, using the strictly argument.
  4. Execution: After applying any groupings and row conditions, the test computes the mean of the column and evaluates whether it falls within the specified range.
  5. Outcome:
    • Pass: The mean value is within the set bounds, including boundaries if strictly is false.
    • Fail: The mean value is outside the specified range, indicating a deviation from expected values.

Example Usage: Fintech Company

For a Fintech company, maintaining accurate financial metrics is crucial, especially when dealing with transaction amounts. Ensuring that the average transaction values are within expected limits can help in identifying anomalies or shifts in user behavior.

Consider a scenario where the transactions table records details of monetary transactions and includes a transaction_amount column.


tests:
  - dbt_expectations.expect_column_mean_to_be_between:
      column: transaction_amount
      min_value: 50
      max_value: 500
      row_condition: "transaction_date >= '2023-01-01'"

In this example, the expect_column_mean_to_be_between test checks that the average transaction_amount for all transactions since January 1, 2023, stays between $50 and $500. This range could be indicative of typical user transaction behavior, and maintaining this range is essential for the stability of financial planning and anomaly detection in the system.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph