dbt hub

dbt test: expect_column_median_to_be_between

USE CASE

Values distribution

APPLIES TO

Column

This page provides insights into the expect_column_median_to_be_between test from the dbt-expectations package. This test is geared towards ensuring that the median value of a specific column falls within a set range. It can be vital for monitoring central tendencies in datasets to avoid skewed data distributions which might affect business decisions or data modelling.

How it Works

The expect_column_median_to_be_between test checks that the median of a designated column lies within a specified minimum and maximum value. Adjusting for central values within a specific range offers a controlled view for ensuring data consistency and reliability.

Steps and Conditions:

  1. Column Identification: The test pinpoints the column whose median needs assessment.
  2. Range Specification: Set the minimum and maximum values (inclusive) that form the permissible range for the median value.
  3. Optional Configuration:
    • Group By: You can specify one or more columns to segregate the data into specific groups, then evaluate the median for each group separately.
    • Row Condition: Apply a condition to filter rows that are included in the test analysis. This step can refine the dataset to more relevant subsets.
    • Strictness: Determine whether the comparison to the defined range should be strict or if it should include the boundary values.
  4. Execution: The test computes the median for the chosen column after applying any row conditions and grouping. It then verifies if this median lies within the defined range of values.
  5. Outcome:
    • Pass: The median value of the column falls within the specified range, indicating appropriate data centralization.
    • Fail: The median falls outside the defined limits, suggesting possible outliers or skewed data that might require further investigation.

Example Usage: B2B SaaS

For a B2B SaaS company, it's crucial to monitor the median session times or user actions per session to understand customer engagement levels. High or excessively low median values could suggest usability issues or software performance problems.

Consider a scenario in which the user_sessions table logs each session's total active time, and we aim to keep the median session time demonstratively indicative of sustained user engagement.


tests:
  - dbt_expectations.expect_column_median_to_be_between:
      column: session_duration_minutes
      min_value: 5
      max_value: 30
      row_condition: "session_end_date >= '2023-01-01'"

In this setup, the expect_column_median_to_be_between test verifies that the median duration of user sessions, starting from the beginning of 2023, remains between 5 and 30 minutes. This helps ensure that the platform maintains an optimal level of engagement, hinting at satisfactory user interaction and effective application performance.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph