dbt test: equal_rowcount

USE CASE: Volume, Tables relationship

APPLIES TO: Model

This page explains the equal_rowcount test, available to dbt (data build tool) projects through the dbt_utils package. The test verifies that two datasets (or relations) contain the same number of rows. It is useful for validating data completeness after transformations, where no records should be added or lost.

How it Works

The equal_rowcount test counts the rows in two specified datasets and confirms that the counts match. It fits scenarios where data volume should stay constant from one processing stage to the next, or where two tables are expected to hold the same number of records.

Steps and Conditions:

  1. Dataset Selection: The test targets two datasets (models, seeds, or sources) for comparison.
  2. Row Count Comparison: It calculates the number of rows in each dataset and compares these counts to ensure they are equal.
  3. Outcome:
    • Pass: If both datasets have the same number of rows, the test passes, indicating that the datasets are aligned in terms of row count, as expected.
    • Fail: If the datasets have a different number of rows, the test fails. This discrepancy signals potential issues with data processing, transformation, or integrity that require investigation.
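
The steps above can be sketched outside dbt to make the logic concrete. The following is a minimal illustration in Python using an in-memory SQLite database as a stand-in for a warehouse. It is not the dbt_utils implementation, and the helper names (row_count, equal_rowcount) and tables (orders_a, orders_b) are hypothetical.

```python
import sqlite3


def row_count(conn, table):
    """Count the rows in one table (hypothetical helper)."""
    # Table names cannot be bound as query parameters, so the identifier is quoted.
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def equal_rowcount(conn, model, compare_model):
    """Return (passed, model_count, compare_count) for two tables."""
    model_count = row_count(conn, model)
    compare_count = row_count(conn, compare_model)
    return model_count == compare_count, model_count, compare_count


# Illustrative data: two tables with three rows each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_a (id INTEGER);
    CREATE TABLE orders_b (id INTEGER);
    INSERT INTO orders_a VALUES (1), (2), (3);
    INSERT INTO orders_b VALUES (10), (20), (30);
""")

passed, count_a, count_b = equal_rowcount(conn, "orders_a", "orders_b")
print(passed, count_a, count_b)  # True 3 3
```

Note that the check passes even though the two tables hold different values: only the counts are compared, not the contents.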

Example Usage: Fintech

In a fintech application, transaction records must stay consistent across processing stages for reporting and analysis to be accurate. Consider a scenario where the transactions_raw table contains raw transaction data imported from various sources, and the transactions_processed table stores the cleaned and categorized transactions ready for analysis. The equal_rowcount test can compare the two tables to ensure no records are lost during processing.


models:
  - name: transactions_processed
    tests:
      - dbt_utils.equal_rowcount:
          compare_model: ref('transactions_raw')

In this example, the equal_rowcount test checks that the number of rows in the transactions_processed model matches the number of rows in the transactions_raw model. A passing test indicates that the data cleansing and categorization steps did not change the total number of transaction records, so a net loss or duplication of rows is caught before it affects financial analysis and reporting.
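
To illustrate what a failure looks like in this scenario, here is a small sketch in Python with an in-memory SQLite database. The columns (txn_id, amount, category), the sample rows, and the cleansing rule that drops rows with a missing amount are all assumptions made for the example. They are not part of the original models, and the code is not how dbt runs the test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions_raw (txn_id INTEGER, amount REAL);
    INSERT INTO transactions_raw VALUES (1, 25.0), (2, NULL), (3, 12.5), (4, 99.9);

    -- Hypothetical cleansing step that silently drops rows with a missing amount.
    CREATE TABLE transactions_processed AS
        SELECT txn_id, amount, 'uncategorized' AS category
        FROM transactions_raw
        WHERE amount IS NOT NULL;
""")

# Count both tables in one query, then compare the counts.
raw_count, processed_count = conn.execute("""
    SELECT
        (SELECT COUNT(*) FROM transactions_raw),
        (SELECT COUNT(*) FROM transactions_processed)
""").fetchone()

diff = abs(raw_count - processed_count)
status = "pass" if diff == 0 else "fail"
print(f"raw={raw_count} processed={processed_count} diff={diff} -> {status}")
# raw=4 processed=3 diff=1 -> fail
```

Here the filter removes one record, so the counts differ by one and the check fails, pointing to the cleansing step as the place to investigate.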
