dbt hub

dbt test: expect_grouped_row_values_to_have_recent_data

USE CASE

Freshness

APPLIES TO

Model
Seed
Source

This page describes the expect_grouped_row_values_to_have_recent_data test from the dbt-expectations package. This test assures that each group in a specified set has data entries no older than a defined interval. It is particularly important for continually updated datasets where fresh data is critical.

How it Works

The expect_grouped_row_values_to_have_recent_data test checks for the presence of recent data within each group defined by the group_by columns. It ensures that the latest data entry in each group is not older than a specified interval back from the current date and time.

Steps and Conditions:

  1. Group Definition: The test first defines which columns to group by using the group_by argument.
  2. Identify Timestamp Column: Points out which column holds the timestamp data (timestamp_column).
  3. Set Time Interval: Specifies the acceptable age of the data with the interval and datepart (e.g., day, hour) arguments.
  4. Apply Row Condition (optional): A condition (row_condition) can be applied to exclude certain rows from the test.
  5. Check Recent Data: For each group, it checks if the most recent entry's timestamp is within the allowed interval from the current timestamp.
  6. Outcome:
    • Pass: If every group's most recent data is as new as or newer than the defined interval, the test passes.
    • Fail: If any group contains data older than the allowable interval, the test fails, indicating outdated or missing entries that require investigation.

Example Usage: Fintech Company

For a Fintech company managing real-time financial transactions, it's critical to have the latest data for accurate reporting and decision-making.

Consider a scenario where the transaction_records table stores each transaction with timestamps, and transactions are grouped by transaction_type and region.


models:
  - name: transaction_records
    tests:
      - dbt_expectations.expect_grouped_row_values_to_have_recent_data:
          group_by: [transaction_type, region]
          timestamp_column: transaction_timestamp
          datepart: hour
          interval: 2
          row_condition: "transaction_status = 'completed'"

In this example, the expect_grouped_row_values_to_have_recent_data test checks that the most recent transaction for each type and region is not older than 2 hours. This setup ensures that the data reflects the current state of affairs and supports timely insights and operations.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph
An illustration showing a debugging console with dbt run commands and an error message. The image includes stylized icons: a smiling orange bar chart, the purple dbt logo, and a magnifying glass containing a pink error icon.