dbt hub

dbt test: expect_column_most_common_value_to_be_in_set

USE CASE

Values distribution
Values set

APPLIES TO

Column

This page introduces the expect_column_most_common_value_to_be_in_set test from the dbt-expectations package. This test verifies that the most frequently occurring value in a specified column is part of a predefined set of acceptable values. This is crucial for ensuring that the most common data entries meet expected criteria or standards, especially in scenarios where specific values are anticipated to be predominant.

How it Works

The test assesses a chosen column to identify its most frequently occurring value, ensuring that this common value is included in a set of specified acceptable values. This test is particularly useful for validating the data's consistency and adherence to expected norms or limits.

Steps and Conditions:

  1. Column and Value Set Selection: Choose the column to check, and define the value_set which contains the acceptable, most common values.
  2. Frequency Analysis: The test determines the top occurring value(s) in the specified column based on the top_n parameter.
  3. Validation: Compares the identified most frequent value against the predefined value_set.
  4. Optional Configurations:
    • Data Type: Specifies the data type for comparison, defaulting to "decimal".
    • Quote Values: States whether the values should be quoted, default is true.
    • Strictly Comparison: Adds flexibility in comparison (strict or not strict), default is false.
  5. Outcome:
    • Pass: If the most common value falls within the acceptable set, the test passes.
    • Fail: If the most frequent value is not in the set, the test fails, indicating a deviation from expected values.

Example Usage: Fintech

In a Fintech company, it's critical to monitor the most commonly selected interest rates for loans to ensure they align with the expected standard rates set by regulatory or internal guidelines. Consider a loans table that has a column interest_rate where various rates are applied based on client profiles and loan types.


tests:
  - dbt_expectations.expect_column_most_common_value_to_be_in_set:
      column: interest_rate
      value_set: [2.5, 3.0, 3.5]
      top_n: 1

In this example, the expect_column_most_common_value_to_be_in_set test confirms that the most frequently occurring interest rate in the interest_rate column is either 2.5%, 3.0%, or 3.5%. This check ensures that the most common rates are within the set parameters, reflecting adherence to the expected pricing strategy and regulatory compliance.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph