dbt test: expect_table_row_count_to_equal_other_table_times_factor

USE CASE

Volume
Table relationships

APPLIES TO

Model
Seed
Source

This page delves into the expect_table_row_count_to_equal_other_table_times_factor test from the dbt-expectations package. This test is essential for validating the relationship between the row counts of two tables, scaled by a predefined factor. It ensures that the data growth or reduction across tables maintains a predictable ratio, which is critical in many analytical contexts.

How it Works

The expect_table_row_count_to_equal_other_table_times_factor test compares the row count of one table to the row count of another table multiplied by a specified factor. This verification can highlight discrepancies that may suggest data issues such as duplication errors, missing entries, or improper integrations.

Steps and Conditions:

  1. Table and Scale Factor Selection: Choose the primary table and another table to compare against, along with a multiplication factor by which the row count of the second table is scaled.
  2. Conditional Parameters:
    • Group By: Optionally, define columns for grouping data before counting rows in both the primary and comparison tables.
    • Row Condition: Define conditions to filter the rows included in the count for both tables, which can refine and target the test more precisely.
  3. Execution: The test performs the row count, applies any groupings or conditions, and then scales the row count of the comparison table by the specified factor. It checks if this scaled value matches the row count of the primary table.
  4. Outcome:
    • Pass: If the primary table's row count equals the scaled row count of the comparison table, the test passes, indicating consistency with the expected ratio.
    • Fail: If there is a mismatch in counts, indicating an unexpected ratio of row counts between the two tables, the test fails. This may warrant a closer inspection of data handling practices.
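
To make the comparison concrete, here is a simplified SQL sketch of the logic described in the steps above. The table names (my_table, other_table), the segment grouping column, the status filter, and the factor of 2 are placeholder assumptions for illustration; the query that dbt-expectations actually generates is structured differently, but it performs an equivalent check.

with primary_counts as (
    -- row count of the table under test, after optional filtering and grouping
    select segment, count(*) as row_count
    from my_table
    where status = 'active'          -- row_condition
    group by segment                 -- group_by
),
compare_counts as (
    -- row count of the comparison table, scaled by the factor
    select segment, count(*) * 2 as expected_count
    from other_table
    where status = 'active'          -- compare_row_condition
    group by segment                 -- compare_group_by
)
-- any group returned by this final select has a mismatched count, and the test fails
select p.segment, p.row_count, c.expected_count
from primary_counts p
full outer join compare_counts c on p.segment = c.segment
where p.row_count is distinct from c.expected_count

When no grouping is configured, the same comparison is made on a single pair of whole-table counts.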

Example Usage: Fintech

In a Fintech context, maintaining accurate transaction records across different systems or tables is crucial for financial reporting and analysis. Let's consider a scenario where a processed_transactions table needs to stay consistent with the upstream transactions table it is derived from, scaled by a factor because a small share of transactions fails during processing.


models:
  - name: processed_transactions
    tests:
      - dbt_expectations.expect_table_row_count_to_equal_other_table_times_factor:
          compare_model: ref("transactions")
          factor: 0.95
          group_by: ['transaction_type']
          compare_group_by: ['transaction_type']
          row_condition: "status = 'processed'"
          compare_row_condition: "status = 'completed'"

In this example:

  • The row count of processed_transactions is expected to be 95% of the transactions row count, reflecting a 5% transaction failure rate.
  • Both tables are grouped by transaction_type and include only rows marked as processed or completed, respectively, ensuring that counts are compared within relevant subsets of data.
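
Once the test is added to the model's schema file, it runs alongside the rest of your test suite, for example with dbt test --select processed_transactions, and fails whenever any transaction_type group drifts from the expected 95% ratio.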
