This page provides an overview of the expect_table_aggregation_to_equal_other_table
test from the dbt-expectations package, which is designed to assert the equivalence of aggregated results between two tables. This test is essential for validating that summaries or computed results from different sources or stages within a dataset are consistent with each other.
How it Works
The expect_table_aggregation_to_equal_other_table
test compares the results of an aggregation expression executed on one table to the results of the same or a different expression on another table. This is beneficial for ensuring consistency in data processing, especially when data is transferred or transformed across tables or models.
Steps and Conditions:
- Expression Setup:
- Expression: Define an aggregation expression to evaluate on the primary model.
- Compare Expression: Optionally, specify a different aggregation expression for the comparison model.
- Model Selection:
- Primary Model: The initial table or model on which the primary expression is executed.
- Comparison Model: The table or model against which the primary model's results are compared (specified by
compare_model
).
- Grouping and Conditions:
- Group By: Optional grouping criteria for the primary expression.
- Compare Group By: Optional grouping criteria for the comparison expression.
- Row Conditions: Optional conditions to filter rows in both the primary and comparison expressions.
- Tolerance Settings:
- Tolerance: Optional absolute tolerance for differences between the results.
- Tolerance Percent: Optional percentage tolerance, expressed as a decimal, to accommodate minor variations in the results.
- Outcome:
- Pass: The test passes if the aggregated results match within the specified tolerances.
- Fail: The test fails if the results differ beyond the acceptable tolerance levels, indicating a discrepancy that needs resolution.
Example Usage: B2B SaaS
In the context of a B2B SaaS company, accurately tracking and reconciling revenue across different systems or data stages is crucial. Consider a scenario where revenue_monthly
and revenue_archived
are two models storing current and previously archived revenue data, respectively.
In this example, the test ensures that the total monthly revenue per account matches between the revenue_monthly
table and the revenue_archived
table within a tolerance of 1%. This helps to verify the integrity and consistency of revenue data as it moves from active records to archives, which is fundamental for accurate billing and financial reporting.