dbt hub

dbt test: expect_select_column_values_to_be_unique_within_record

USE CASE

Uniqueness
Columns relationship

APPLIES TO

Model
Seed
Source

This page describes the expect_select_column_values_to_be_unique_within_record test in dbt (data build tool). This test is designed to ensure that the values across specified columns within a single record are unique, helping to prevent potential data integrity issues from duplicate values in the same record.

How it Works

The expect_select_column_values_to_be_unique_within_record test validates that for each record, the values in the selected columns are unique. It is critical in scenarios where multiple fields in a record should not have repeating values, ensuring data accuracy and uniqueness within single records.

Steps and Conditions:

  1. Column Selection: Choose multiple columns that need to be checked for unique values within each record.
  2. Row Exclusion (Optional): Define conditions to exclude certain rows from the test based on presence of null values or specific identifiers. The default behavior excludes rows if _all_ specified columns are null.
  3. Configurations (Optional):
    • Ignore Row If: Decide whether to ignore rows where any or all values are missing. Default is "all_values_are_missing".
    • Quote Columns: Specify whether the column names should be quoted. Default is false.
    • Row Condition: Apply additional filters to specify which rows should be included based on a condition.
  4. Validation: For each included row, the test checks if all selected columns contain unique values.
  5. Outcome:
    • Pass: If all checked records have unique values across selected columns, the test passes.
    • Fail: If any record has duplicate values in the selected columns, the test fails.

Example Usage: Fintech

In a Fintech company, ensuring that transaction records are unique and correctly recorded is crucial, especially when each transaction involves multiple identifiers or references that should not repeat within the same record.

Consider a scenario where the transactions table stores information about user transactions, including two fields: transaction_id and reference_code.


models:
  - name: transactions
    tests:
      - dbt_expectations.expect_select_column_values_to_be_unique_within_record:
          column_list: ["transaction_id", "reference_code"]
          ignore_row_if: "any_value_is_missing"

In this example, the expect_select_column_values_to_be_unique_within_record test checks that the transaction_id and reference_code within each transaction record are unique to that record. This prevents issues such as duplication of transaction IDs or reference codes, which could lead to financial discrepancies and challenges in transaction verification.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph