dbt hub

dbt test: expect_table_columns_to_contain_set

USE CASE

Schema

APPLIES TO

Model
Seed
Source

This page explains the expect_table_columns_to_contain_set test from the dbt-expectations package, which checks if all columns specified in a model contain a predefined list of column names. This test is important for validating the schema integrity, ensuring that all necessary columns are present and correctly named within the table or model.

How it Works

The expect_table_columns_to_contain_set test is used to verify that the specified columns in a model conform to a designated set of column names.

Steps and Conditions:

  1. Column List Definition: Define the list of column names that are expected to be present in the model.
  2. Optional Transformation: Apply any transformations to the column names in the model before the check (e.g., converting all columns to uppercase to ensure case insensitivity in the comparison).
  3. Verification:
    • The test checks each column name in the model against the predefined list.
    • If transformations are specified, they are applied to the column names before comparison.
  4. Outcome:
    • Pass: If every column listed in the model matches the predefined list (considering any transformations), the test passes.
    • Fail: If any column in the model does not match the predefined list, the test fails. This indicates a discrepancy in the expected schema.

Example Usage: E-commerce

For an E-commerce platform, maintaining accurate and consistent product data is essential. Let's consider a scenario in which the product_details model should include specific columns such as 'product_id', 'product_name', and 'category'.


models:
  - name: product_details
    tests:
      - dbt_expectations.expect_table_columns_to_contain_set:
          column_list: ["product_id", "product_name", "category"]
          transform: lower

In this example, the expect_table_columns_to_contain_set test is applied to ensure that the product_details table has the necessary columns 'product_id', 'product_name', and 'category'. The transformation to lower case ensures that the column name comparison is case-insensitive, which helps prevent common errors due to case mismatches in the column names. This check is crucial for ensuring data consistency across different parts of the E-commerce system, from inventory management to customer-facing product listings.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph