Quality Checks

Quality checks run automatically after each model is materialised during interlace run. They validate the model's output data and store the results in the state database, so you can verify that your pipeline produces correct data.

Configuration

Quality checks can be defined in two places: on the model decorator or in config.yaml. Decorator-level checks take precedence over config-level checks for the same model.

Decorator Level

Attach checks directly to a model with the quality_checks parameter:

from interlace import model

@model(
    name="users",
    materialise="table",
    quality_checks=[
        {"type": "not_null", "column": "id", "severity": "error"},
        {"type": "unique", "column": "email"},
        {"type": "accepted_values", "column": "status", "values": ["active", "inactive"]},
    ],
)
def users(raw_users):
    # active is a boolean column, so it can be passed to filter directly
    return raw_users.filter(raw_users.active)

Config Level

Define checks in config.yaml under the quality key:

quality:
  enabled: true
  fail_on_error: false    # Set to true to stop the pipeline on error-severity failures
  checks:
    users:
      - type: not_null
        column: id
        severity: error
      - type: unique
        column: email
    orders:
      - type: row_count
        min_count: 100
        severity: warn
      - type: freshness
        column: updated_at
        max_age_hours: 24
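
The precedence rule can be sketched in plain Python. This is not Interlace's implementation. resolve_checks and QUALITY_CONFIG are hypothetical names. The sketch assumes that when a model declares any decorator-level checks, that list replaces the model's config.yaml list entirely and is not merged with it.

```python
# Sketch only: resolve_checks is a hypothetical helper, not an Interlace API.
# Assumption: a non-empty decorator list replaces the config.yaml list for
# that model wholesale (no per-check merging).
from typing import Any, Optional

Check = dict[str, Any]

# The quality block from config.yaml above, as it would look once parsed.
QUALITY_CONFIG: dict[str, Any] = {
    "enabled": True,
    "fail_on_error": False,
    "checks": {
        "users": [
            {"type": "not_null", "column": "id", "severity": "error"},
            {"type": "unique", "column": "email"},
        ],
        "orders": [
            {"type": "row_count", "min_count": 100, "severity": "warn"},
            {"type": "freshness", "column": "updated_at", "max_age_hours": 24},
        ],
    },
}


def resolve_checks(
    model_name: str,
    decorator_checks: Optional[list[Check]],
    config: dict[str, Any],
) -> list[Check]:
    """Return the checks that would run for one model."""
    if decorator_checks:
        # Decorator-level checks take precedence for this model.
        return list(decorator_checks)
    # Otherwise fall back to the config.yaml entry (or nothing at all).
    return list(config.get("checks", {}).get(model_name, []))


# users declares its own checks, so its config entry is ignored.
print(resolve_checks("users", [{"type": "not_null", "column": "id"}], QUALITY_CONFIG))
# orders has no decorator checks, so the two config checks apply.
print([c["type"] for c in resolve_checks("orders", None, QUALITY_CONFIG)])
```

A model that appears in neither place ends up with an empty list, so no checks run for it.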

Check Types

Interlace includes six built-in check types. Five can be used from both YAML config and decorators. The expression check is Python-only.

Type | Description | Key parameters
not_null | No NULL values in a column | column
unique | All values are unique | column or columns
accepted_values | Values are in a whitelist | column, values
freshness | Timestamp column is recent | column, max_age_hours / max_age_days / max_age_minutes
row_count | Row count is within range | min_count, max_count
expression | Custom ibis boolean expression | expression (callable), name

not_null

Verify that a column contains no NULL values.

- type: not_null
  column: id
  severity: error

Parameters:

Param | Type | Required | Description
column | str | Yes | Column to check for NULLs
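
The rule can be shown with a plain-Python sketch that applies it to a list of row dictionaries. Interlace runs its checks against the materialised table, so check_not_null and its result dictionary are illustrative assumptions. The generated name follows the not_null_id pattern shown under Results Storage.

```python
# Plain-Python sketch of not_null semantics; not Interlace's implementation.
from typing import Any


def check_not_null(rows: list[dict[str, Any]], column: str) -> dict[str, Any]:
    # A missing key or an explicit None both count as NULL here.
    failed = sum(1 for row in rows if row.get(column) is None)
    return {
        "check_name": f"not_null_{column}",
        "status": "passed" if failed == 0 else "failed",
        "failed_rows": failed,
        "total_rows": len(rows),
    }


rows = [{"id": 1}, {"id": None}, {"id": 3}]
print(check_not_null(rows, "id"))
# → {'check_name': 'not_null_id', 'status': 'failed', 'failed_rows': 1, 'total_rows': 3}
```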

unique

Verify that all values are unique. Supports single columns and composite keys.

# Single column
- type: unique
  column: email

# Composite key
- type: unique
  columns: ["tenant_id", "user_id"]

Parameters:

Param | Type | Required | Description
column | str | One of column or columns | Single column to check
columns | list[str] | One of column or columns | Multiple columns for composite uniqueness
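
The single-column and composite-key cases differ only in how the key is built. The plain-Python sketch below shows this. check_unique is a hypothetical helper. The sketch assumes that exactly one of column or columns is given, and that every row whose key appears more than once counts as a failed row. Interlace may count duplicates differently.

```python
# Plain-Python sketch of unique semantics; not Interlace's implementation.
from collections import Counter
from typing import Any, Optional


def check_unique(
    rows: list[dict[str, Any]],
    column: Optional[str] = None,
    columns: Optional[list[str]] = None,
) -> int:
    """Return the number of rows whose key is shared with another row."""
    if (column is None) == (columns is None):
        raise ValueError("pass exactly one of column or columns")
    key_cols = [column] if column is not None else list(columns)
    # A single column becomes a 1-tuple key; a composite key a longer tuple.
    keys = [tuple(row[c] for c in key_cols) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] > 1)


rows = [
    {"tenant_id": 1, "user_id": 10, "email": "a@x.io"},
    {"tenant_id": 1, "user_id": 11, "email": "b@x.io"},
    {"tenant_id": 2, "user_id": 10, "email": "a@x.io"},
]
print(check_unique(rows, column="email"))                    # → 2
print(check_unique(rows, columns=["tenant_id", "user_id"]))  # → 0
```

Here user_id 10 repeats, but it never repeats within the same tenant, so the composite key passes while email fails.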

accepted_values

Verify that all values in a column are within a specified set.

- type: accepted_values
  column: status
  values: ["active", "inactive", "pending"]

Parameters:

Param | Type | Required | Description
column | str | Yes | Column to check
values | list | Yes | Allowed values

When this check fails, it reports a sample of the invalid values found for debugging.
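
Both the count and the debugging sample can be shown with a plain-Python sketch. check_accepted_values is hypothetical. The sample size of 5 and the treatment of None as invalid are assumptions of this sketch, not documented Interlace behaviour.

```python
# Plain-Python sketch of accepted_values; not Interlace's implementation.
from typing import Any


def check_accepted_values(
    rows: list[dict[str, Any]],
    column: str,
    values: list[Any],
    sample_size: int = 5,
) -> tuple[int, list[Any]]:
    """Return (failed_rows, sample of distinct invalid values)."""
    allowed = set(values)
    # None is treated as invalid here; Interlace's NULL handling may differ.
    invalid = [row[column] for row in rows if row[column] not in allowed]
    # dict.fromkeys de-duplicates while keeping first-seen order.
    sample = list(dict.fromkeys(invalid))[:sample_size]
    return len(invalid), sample


rows = [{"status": "active"}, {"status": "deleted"}, {"status": "banned"}, {"status": "deleted"}]
print(check_accepted_values(rows, "status", ["active", "inactive", "pending"]))
# → (3, ['deleted', 'banned'])
```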

freshness

Verify that a timestamp column has recent data. Useful for detecting stale data or broken upstream pipelines.

# Data must be no older than 24 hours
- type: freshness
  column: updated_at
  max_age_hours: 24

# Data must be no older than 7 days
- type: freshness
  column: created_at
  max_age_days: 7

Parameters:

Param | Type | Required | Description
column | str | Yes | Timestamp column to check
max_age_hours | float | At least one | Maximum age in hours
max_age_days | float | At least one | Maximum age in days
max_age_minutes | float | At least one | Maximum age in minutes

Age parameters are additive, so you can combine them: max_age_days: 1 together with max_age_hours: 6 gives a 30-hour window. Empty tables are skipped automatically.
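
The additive rule and the empty-table skip can be sketched with datetime.timedelta. Both helpers are hypothetical. The sketch assumes the check compares the newest timestamp in the column against the current time, treating the limit as inclusive. The exact comparison Interlace performs is not documented here.

```python
# Plain-Python sketch of freshness; not Interlace's implementation.
from datetime import datetime, timedelta, timezone
from typing import Optional


def max_age(
    max_age_days: Optional[float] = None,
    max_age_hours: Optional[float] = None,
    max_age_minutes: Optional[float] = None,
) -> timedelta:
    """Sum whichever age parameters are set into a single window."""
    parts = {"days": max_age_days, "hours": max_age_hours, "minutes": max_age_minutes}
    given = {unit: value for unit, value in parts.items() if value is not None}
    if not given:
        raise ValueError("set at least one of max_age_days, max_age_hours, max_age_minutes")
    return timedelta(**given)


def check_freshness(
    timestamps: list[datetime], window: timedelta, now: Optional[datetime] = None
) -> str:
    if not timestamps:
        return "skipped"  # empty tables are skipped
    now = now or datetime.now(timezone.utc)
    # Assumption: the newest timestamp must fall inside the window (inclusive).
    return "passed" if now - max(timestamps) <= window else "failed"


window = max_age(max_age_days=1, max_age_hours=6)
print(window)  # → 1 day, 6:00:00
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
latest = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)  # 28 hours before now
print(check_freshness([latest], window, now))  # → passed
```

The timestamps use timezone-aware datetimes. Subtracting a naive datetime from the aware default for now would raise a TypeError.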

row_count

Verify that the table row count falls within an expected range. At least one of min_count or max_count is required.

# At least 100 rows
- type: row_count
  min_count: 100

# Between 1,000 and 10,000 rows
- type: row_count
  min_count: 1000
  max_count: 10000

Parameters:

Param | Type | Required | Description
min_count | int | At least one | Minimum row count (inclusive)
max_count | int | At least one | Maximum row count (inclusive)
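
The inclusive bounds and the at-least-one rule translate directly into a small predicate. check_row_count is a hypothetical helper that mirrors the parameter table. It is not Interlace's code.

```python
# Plain-Python sketch of row_count; both bounds are inclusive.
from typing import Optional


def check_row_count(
    row_count: int,
    min_count: Optional[int] = None,
    max_count: Optional[int] = None,
) -> bool:
    if min_count is None and max_count is None:
        raise ValueError("row_count needs min_count, max_count, or both")
    if min_count is not None and row_count < min_count:
        return False
    if max_count is not None and row_count > max_count:
        return False
    return True


print(check_row_count(1000, min_count=1000, max_count=10000))   # → True
print(check_row_count(10001, min_count=1000, max_count=10000))  # → False
```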

expression

Custom ibis boolean expression for checks that do not fit the built-in types. This check is Python-only: it cannot be defined in YAML config because it requires a callable.

from interlace import model
from interlace.quality import ExpressionCheck

@model(
    name="orders",
    materialise="table",
    quality_checks=[
        ExpressionCheck(
            expression=lambda t: t["amount"] > 0,
            name="positive_amount",
        ),
        ExpressionCheck(
            expression=lambda t: t["start_date"] <= t["end_date"],
            name="valid_date_range",
            severity="warn",
        ),
    ],
)
def orders(raw_orders):
    return raw_orders

Parameters:

Param | Type | Required | Description
expression | callable | Yes | Function that takes an ibis.Table and returns a boolean column (True = pass, False = fail)
name | str | Yes | Name for this check
invert | bool | No | If True, invert the expression (True = fail)
severity | str | No | error (default) or warn
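
The way invert flips the meaning of the boolean column can be shown without ibis. In the plain-Python sketch below, a "table" is a dict of column lists. The expression returns one bool per row, standing in for the ibis boolean column that Interlace actually evaluates. count_failures is a hypothetical helper, and integers stand in for dates.

```python
# Plain-Python sketch of scoring an expression check; not Interlace's code.
from typing import Callable

Table = dict[str, list]


def count_failures(
    table: Table,
    expression: Callable[[Table], list[bool]],
    invert: bool = False,
) -> int:
    """Return how many rows fail the expression."""
    mask = expression(table)
    # Normally False marks a failing row; with invert=True, True does.
    return sum((bool(ok) if invert else not ok) for ok in mask)


orders = {
    "amount": [25.0, -3.5, 0.0],
    "start_date": [1, 5, 2],
    "end_date": [2, 4, 2],
}
print(count_failures(orders, lambda t: [a > 0 for a in t["amount"]]))  # → 2
print(count_failures(orders, lambda t: [s <= e for s, e in zip(t["start_date"], t["end_date"])]))  # → 1
print(count_failures(orders, lambda t: [a < 0 for a in t["amount"]], invert=True))  # → 1
```

The last call shows the inverted form: it describes the bad rows (negative amounts) rather than the good ones.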

Severity Levels

Each check has a severity that controls pipeline behaviour on failure:

Severity | Default | Behaviour
error | Yes | Marks the model as failed when quality.fail_on_error: true is set in config
warn | No | Logs a warning but the pipeline continues

If severity is omitted, it defaults to error.

Set quality.fail_on_error: false (the default) to run all checks without stopping the pipeline, regardless of severity. This is useful during development when you want visibility into data issues without blocking runs.
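
The interaction between severity and fail_on_error reduces to a short decision rule. The sketch below illustrates it and is not Interlace's code. model_fails is hypothetical, and the check names in the example are illustrative. Results whose status is error (the check itself could not run) are ignored here, because this page does not say how Interlace treats them.

```python
# Plain-Python sketch of severity handling; not Interlace's implementation.
def model_fails(results: list[dict], fail_on_error: bool) -> bool:
    """Decide whether a model is marked as failed after its checks run."""
    if not fail_on_error:
        return False  # the default: record every result, never stop the run
    return any(
        r["status"] == "failed" and r.get("severity", "error") == "error"
        for r in results
    )


results = [
    {"check_name": "unique_email", "status": "failed"},  # severity omitted, so error
    {"check_name": "row_count", "status": "failed", "severity": "warn"},
]
print(model_fails(results, fail_on_error=True))      # → True
print(model_fails(results, fail_on_error=False))     # → False
print(model_fails(results[1:], fail_on_error=True))  # → False
```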

Results Storage

Quality check results are persisted in the state database after each run. Each result includes:

Field | Description
check_name | Auto-generated or custom name (e.g. not_null_id)
check_type | The check type (not_null, unique, etc.)
table_name | Model that was checked
status | passed, failed, skipped, or error
severity | error or warn
message | Human-readable result description
failed_rows | Number of rows that failed
total_rows | Total rows checked
duration_seconds | How long the check took

Results are accessible via the REST API at /api/quality/results.
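
To work with stored results in your own scripts, you can model each one as a dataclass that uses the fields listed above. QualityResult and status_counts are illustrative names, not part of Interlace, and the example values are made up.

```python
# Illustrative record type built from the documented result fields.
from collections import Counter
from dataclasses import dataclass


@dataclass
class QualityResult:
    check_name: str
    check_type: str
    table_name: str
    status: str        # "passed", "failed", "skipped" or "error"
    severity: str      # "error" or "warn"
    message: str
    failed_rows: int
    total_rows: int
    duration_seconds: float

    @property
    def failure_rate(self) -> float:
        # Guard against skipped checks on empty tables (total_rows == 0).
        return self.failed_rows / self.total_rows if self.total_rows else 0.0


def status_counts(results: list[QualityResult]) -> dict[str, int]:
    """Tally results per status, e.g. for a run summary."""
    return dict(Counter(r.status for r in results))


r = QualityResult("not_null_id", "not_null", "users", "failed", "error",
                  "2 NULL values in id", 2, 100, 0.01)
print(r.failure_rate)     # → 0.02
print(status_counts([r]))  # → {'failed': 1}
```

This page does not document the JSON shape returned by /api/quality/results. If each item uses these same field names, QualityResult(**item) would load it, but treat that as an assumption to verify against your deployment.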

Example: Full Pipeline

from interlace import model

@model(
    name="customers",
    materialise="table",
    strategy="merge_by_key",
    primary_key=["id"],
    quality_checks=[
        {"type": "not_null", "column": "id", "severity": "error"},
        {"type": "not_null", "column": "email", "severity": "error"},
        {"type": "unique", "column": "email"},
        {"type": "accepted_values", "column": "tier", "values": ["free", "pro", "enterprise"]},
        {"type": "row_count", "min_count": 1, "severity": "warn"},
    ],
)
def customers(raw_customers):
    # active is a boolean column, so it can be passed to filter directly
    return raw_customers.filter(raw_customers.active)

With matching config:

quality:
  enabled: true
  fail_on_error: true

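During interlace run, after customers is materialised, all five checks execute automatically. Because fail_on_error is true, any failing error-severity check marks the model as failed and stops the pipeline. Such failures include a NULL id or email, a duplicate email, or a tier outside free, pro and enterprise. Checks without an explicit severity default to error. If the table is empty, the warn-severity row_count check logs a warning but execution continues.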

Key Points

  • Checks run automatically after materialisation; no extra step is needed.
  • Ephemeral models skip quality checks (no persisted table to validate against).
  • Quality results are stored per-run for historical tracking and trend analysis.
  • Use warn severity for non-blocking checks during development.
  • Decorator-level checks take precedence over config-level checks for the same model.
  • The expression check type is Python-only and cannot be defined in YAML.