This page details the expect_table_row_count_to_be_between
test from the dbt-expectations package, which is used to ensure that the number of rows in a data table stays within a specified range. This test is critical for validating that a table retains a predictable amount of data, which can be essential for monitoring data processing flows and ensuring dataset completeness.
How it Works
The expect_table_row_count_to_be_between
test verifies that the count of rows in a table falls within a defined minimum and maximum range. This ensures control over the quantity of data being processed and stored, which can help detect issues like data truncation, duplication, or unexpected data loss.
Steps and Conditions:
- Model Selection: Choose the table (model, seed, or source) to apply the test.
- Defining Limits: Set the minimum (
min_value
) and maximum (max_value
) number of rows expected in the table. - Optional Configurations:
- Group By: Allows aggregating by certain fields to apply the row count check on grouped data subsets.
- Row Condition: Specify conditions to select certain rows for evaluating the test, such as filtering out rows where a particular column is null.
- Strict Comparison: The
strictly
parameter defines whether the comparison includes the boundary values (true
indicates>=
and<=
, whilefalse
is>
and<
).
- Execution: The number of rows is calculated after applying any groupings or row conditions, and checked against the set boundaries.
- Outcome:
- Pass: The table row count is within the specified range, indicating expected data volume is present.
- Fail: The row count is outside of the set boundaries, signaling potential issues with data handling or system performance.
Example Usage: Fintech
In a Fintech company, managing precise data in tables related to transaction history is crucial for compliance and operational auditing. Assume a table named daily_transactions
logs every transaction each day.
In this example, the test ensures that the number of transactions recorded every day in the daily_transactions
table is between 100 and 500, a range considered normal for daily operations based on historical data. This setup helps the fintech firm monitor transaction processing accurately, ensuring no unexpected drops or spikes in transaction volumes that could indicate issues like transaction failures or fraudulent activity spikes.