dbt hub

dbt test: expect_table_columns_to_match_set

USE CASE

Schema

APPLIES TO

Model
Seed
Source

The expect_table_columns_to_match_set test in dbt (data build tool) checks that the columns in a model align with a specified list of column names. This test is important for ensuring that data models, sources, or seeds conform to expected schema specifications, which can help prevent integration issues, reporting errors, and enhance maintainability.

How it Works

The test verifies that all columns in a specified model, seed, or source exactly match a predefined list of column names. It ensures consistency and accuracy in schema implementation across different environments and versions.

Steps and Conditions:

  1. Model/Seed/Source Selection: Identify the model, seed, or source to be tested.
  2. Column List Specification: Define a list of expected column names that should be present in the selected model, seed, or source.
  3. Optional Transformation:
    • Transformation on Columns: Use the transform argument (optional) to apply a transformation to the column names in the column_list (e.g., upper to convert names to uppercase) before the comparison is made.
  4. Comparison Execution: The test checks that the list of actual column names in the specified model, seed, or source matches the transformed (if applicable) column_list.
  5. Outcome:
    • Pass: If the set of actual column names matches the expected list completely, the test passes, verifying schema accuracy.
    • Fail: If there is any discrepancy between the actual set of columns and the expected list, the test fails, signaling a potential issue with the schema implementation.

Example Usage: E-commerce

In an E-commerce platform, maintaining a consistent database schema across different environments (like development, staging, and production) is critical for accurate reporting and operational efficiency. The expect_table_columns_to_match_set test can be instrumental for ensuring that the product_catalog table adheres to the required schema specifications.

Consider a scenario where the product_catalog table is expected to have the specific columns product_id, product_name, price, category, stock_quantity.


models:
  - name: product_catalog
    tests:
      - dbt_expectations.expect_table_columns_to_match_set:
          column_list: ["product_id", "product_name", "price", "category", "stock_quantity"]
          transform: lower  # Ensuring the column names are checked in lowercase

This setup confirms that the product_catalog model precisely contains the columns listed and in the required format if transformations are applied, critical for tasks like inventory management, pricing updates, and category analyses.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph