dbt hub

dbt test: expect_row_values_to_have_data_for_every_n_datepart

USE CASE

Completeness

APPLIES TO

Model
Seed
Source

This page explains the expect_row_values_to_have_data_for_every_n_datepart test in dbt (data build tool). The test is vital for ensuring that data exists for every specified division of time (date part) within a specific column, which is crucial for accurate time series analysis and maintaining the integrity of chronological data.

How it Works

The test checks for the presence of data in a model for each unit of a specified date_part within a date_col. This process is pivotal for confirming that no gaps exist in the data for the time frames expected, particularly in scenarios where continuous data availability is necessary for analysis, reporting, or operational purposes.

Steps and Conditions:

  1. Date Column Specification: Identify the column (date_col) containing the date information.
  2. Date Part Specification: Define the date part (date_part) to check for continuous data, such as 'day', 'month', or 'year'. The default is 'day'.
  3. Date Range Configuration:
    • Optionally specify a starting date (test_start_date) and an ending date (test_end_date) to focus the test within a particular timeframe.
  4. Row and Exclusion Conditions:
    • The row_condition allows the exclusion of certain rows from the test based on specified conditions.
    • The exclusion_condition offers further granular control over which data points to exclude, such as excluding dates on weekends or holidays.
  5. Execution:
    • The test is run to ensure data presence for each unit of the specified date_part within the chosen date range in date_col.
  6. Outcome:
    • Pass: If data is present for every specified date part in the range, the test passes, showing no gaps.
    • Fail: If any expected date part is missing data, the test fails, indicating gaps that may require attention.

Example Usage: Fintech Company

For a Fintech company, it's critical to ensure continuous transaction data for precise financial tracking and regulatory compliance. Consider a scenario where the daily_transactions table records each transaction's date in the transaction_date column.


models:
  - name: daily_transactions
    tests:
      - dbt_expectations.expect_row_values_to_have_data_for_every_n_datepart:
          date_col: transaction_date
          date_part: day 
          test_start_date: '2023-01-01'
          test_end_date: '2023-01-31'
          row_condition: "transaction_type = 'completed'"
          exclusion_condition: "DAYOFWEEK(transaction_date) IN (1, 7)" # Exclude weekends

In this example, the test ensures no days are missing transaction data for completed transactions in January 2023, excluding weekends. This setup helps the company monitor and maintain uninterrupted transaction records deem necessary for daily operations, financial analysis, and compliance with financial regulations.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph