This page describes the expect_grouped_row_values_to_have_recent_data
test from the dbt-expectations package. This test assures that each group in a specified set has data entries no older than a defined interval. It is particularly important for continually updated datasets where fresh data is critical.
How it Works
The expect_grouped_row_values_to_have_recent_data
test checks for the presence of recent data within each group defined by the group_by
columns. It ensures that the latest data entry in each group is not older than a specified interval back from the current date and time.
Steps and Conditions:
- Group Definition: The test first defines which columns to group by using the
group_by
argument. - Identify Timestamp Column: Points out which column holds the timestamp data (
timestamp_column
). - Set Time Interval: Specifies the acceptable age of the data with the
interval
anddatepart
(e.g., day, hour) arguments. - Apply Row Condition (optional): A condition (
row_condition
) can be applied to exclude certain rows from the test. - Check Recent Data: For each group, it checks if the most recent entry's timestamp is within the allowed
interval
from the current timestamp. - Outcome:
- Pass: If every group's most recent data is as new as or newer than the defined interval, the test passes.
- Fail: If any group contains data older than the allowable interval, the test fails, indicating outdated or missing entries that require investigation.
Example Usage: Fintech Company
For a Fintech company managing real-time financial transactions, it's critical to have the latest data for accurate reporting and decision-making.
Consider a scenario where the transaction_records
table stores each transaction with timestamps, and transactions are grouped by transaction_type
and region
.
In this example, the expect_grouped_row_values_to_have_recent_data
test checks that the most recent transaction for each type and region is not older than 2 hours. This setup ensures that the data reflects the current state of affairs and supports timely insights and operations.