Configuration Reference

Complete reference for config.yaml.

Basic Structure

name: my-pipeline

connections:
  default:
    type: duckdb
    path: ./data/warehouse.duckdb

models:
  default_schema: public
  default_materialize: table
  default_strategy: replace

Connection Settings

Common Fields

All connections support these fields:

connections:
  my_connection:
    type: duckdb # Required: backend type
    access: readwrite # Optional: "read" or "readwrite" (default)
    shared: false # Optional: true to share across environments

DuckDB

connections:
  default:
    type: duckdb
    path: ./data/warehouse.duckdb # Path to database file, or ":memory:"
    attach: # Optional: attach external databases
      - name: pg_db
        type: postgres
        read_only: true
        config:
          host: localhost
          database: mydb

PostgreSQL

connections:
  warehouse:
    type: postgres
    config:
      host: localhost
      port: 5432
      database: mydb
      user: myuser
      password: ${POSTGRES_PASSWORD}
    pool:
      max_size: 10
      timeout: 30.0
    health_check:
      enabled: true
      interval: 60.0

Generic ibis Backends

Any other ibis-supported backend (Snowflake, BigQuery, MySQL, etc.) uses the same type and config layout:

connections:
  snowflake_wh:
    type: snowflake
    config:
      account: myorg-myaccount
      user: ${SNOWFLAKE_USER}
      password: ${SNOWFLAKE_PASSWORD}
      database: ANALYTICS
      warehouse: COMPUTE_WH

Install the required extra: pip install 'ibis-framework[snowflake]'

DuckDB ATTACH Types

attach:
  # PostgreSQL
  - name: pg_db
    type: postgres
    read_only: true
    config: { host: ..., database: ..., user: ..., password: ... }

  # MySQL
  - name: mysql_db
    type: mysql
    read_only: true
    config: { host: ..., database: ..., user: ..., password: ... }

  # SQLite
  - name: sqlite_db
    type: sqlite
    path: ./data/local.sqlite
    read_only: true

  # Another DuckDB file
  - name: shared_db
    type: duckdb
    path: ./data/sources.duckdb
    read_only: true

  # DuckLake (lakehouse with time travel)
  - name: lakehouse
    type: ducklake
    catalog: 'postgres:postgresql://host/catalog_db'
    data_path: 's3://bucket/path/'
    read_only: true

Model Settings

models:
  default_schema: public # Default schema for models
  default_materialize: table # Default materialisation type
  default_strategy: replace # Default strategy

Model Decorator Parameters

See the Models and API Reference pages for the full list of @model parameters.

@model(
    name="model_name",                   # Model name (defaults to function name)
    schema="public",                      # Schema/database name
    connection="default",                 # Connection name from config
    materialise="table",                  # "table", "view", "ephemeral", "none"
    strategy="merge_by_key",              # "replace", "append", "merge_by_key", "scd_type_2", "none"
    primary_key="id",                     # Primary key column(s)
    tags=["source"],                      # Tags for organisation
    cache={"ttl": "7d", "strategy": "ttl"},  # Source cache policy
    schema_mode="safe",                   # "strict", "safe", "flexible", "lenient", "ignore"
    quality_checks=[...],                 # Quality checks (see Quality Checks guide)
    cursor="event_id",                    # Cursor column for incremental processing
    schedule={"cron": "0 * * * *"},       # Cron or interval scheduling
    export={"format": "csv", "path": "..."},  # Export after materialisation
    retry_policy=RetryPolicy(...),        # Retry config for transient failures
)

Cache Configuration

cache={
    "ttl": "7d",              # Time-to-live: "30s", "5m", "24h", "7d", "2w"
    "strategy": "ttl",        # "ttl", "if_exists", or "always"
}
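
The TTL strings above follow a number-plus-unit pattern. As a rough illustration of how such values can be turned into durations, here is a minimal sketch; parse_ttl is a hypothetical helper written for this page, not part of Interlace, and the unit letters are inferred from the examples in the comment above.

```python
import re
from datetime import timedelta

# Hypothetical mapping, inferred from the examples "30s", "5m", "24h", "7d", "2w".
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_ttl(text: str) -> timedelta:
    """Convert a TTL string such as "7d" into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhdw])", text.strip())
    if match is None:
        raise ValueError(f"unrecognised TTL: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


print(parse_ttl("7d"))   # a 7-day duration
print(parse_ttl("30s"))  # a 30-second duration
```

Under this reading, a cached source would be considered stale once its age exceeds the parsed duration.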

Environment Settings

environments:
  source_connection: sources # Connection for shared source data
  fallback_connections: # Connections to search for missing deps
    - sources

State Management

state:
  connection: default # Which connection stores state
  schema: interlace # Schema for state tables

Retry Policies

retry:
  default_policy:
    max_attempts: 3
    initial_delay: 1.0
    max_delay: 30.0
    backoff_multiplier: 2.0
    jitter: true
  circuit_breaker:
    failure_threshold: 5
    recovery_timeout: 60.0
  dlq:
    enabled: true
    persist_to_db: true
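
To make the default_policy fields concrete, the sketch below computes the wait before each retry under a conventional exponential backoff. It assumes the delay grows as initial_delay * backoff_multiplier^n, capped at max_delay, and that jitter scales each delay by a random factor ("full jitter"). How Interlace actually applies these fields is not stated here, so treat backoff_delays as a hypothetical helper.

```python
import random


def backoff_delays(max_attempts=3, initial_delay=1.0, max_delay=30.0,
                   backoff_multiplier=2.0, jitter=False, rng=random.random):
    """Return the wait before each retry (max_attempts - 1 waits in total)."""
    delays = []
    for retry in range(max_attempts - 1):
        # Exponential growth, capped at max_delay.
        delay = min(initial_delay * backoff_multiplier ** retry, max_delay)
        if jitter:
            # Assumed "full jitter": a uniform value in [0, delay).
            delay *= rng()
        delays.append(delay)
    return delays


print(backoff_delays())                # waits for the values shown above, jitter off
print(backoff_delays(max_attempts=8))  # later waits are capped at max_delay
```

With the values shown above, the third attempt is the last, so only two waits occur.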

Quality Checks

quality:
  enabled: true
  fail_on_error: false       # Stop pipeline on error-severity failures
  checks:
    users:
      - type: not_null       # Verify no NULLs
        column: id
        severity: error
      - type: unique         # Verify uniqueness
        column: email
      - type: accepted_values
        column: status
        values: [active, inactive, pending]
    orders:
      - type: row_count      # Verify row count range
        min_count: 100
        severity: warn
      - type: freshness      # Verify data recency
        column: created_at
        max_age_hours: 24
      - type: expression     # Custom SQL expression
        expression: "amount > 0"

Check types: not_null, unique, accepted_values, row_count, freshness, expression. See the Quality Checks guide for details.
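
As a rough mental model of what four of these check types assert, the sketch below evaluates them against in-memory rows. run_check is a hypothetical function for illustration only; Interlace runs its checks against the materialised tables, and the freshness and expression types are omitted here.

```python
def run_check(rows, check):
    """Return True if the rows satisfy a single check definition."""
    kind = check["type"]
    if kind == "not_null":
        return all(row.get(check["column"]) is not None for row in rows)
    if kind == "unique":
        values = [row.get(check["column"]) for row in rows]
        return len(values) == len(set(values))
    if kind == "accepted_values":
        allowed = set(check["values"])
        return all(row.get(check["column"]) in allowed for row in rows)
    if kind == "row_count":
        return len(rows) >= check.get("min_count", 0)
    raise ValueError(f"unsupported check type in this sketch: {kind}")


users = [
    {"id": 1, "email": "a@example.com", "status": "active"},
    {"id": 2, "email": "b@example.com", "status": "archived"},
]
print(run_check(users, {"type": "not_null", "column": "id"}))
print(run_check(users, {"type": "accepted_values", "column": "status",
                        "values": ["active", "inactive", "pending"]}))
```

The second call fails because "archived" is not in the accepted list.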

Service (Authentication & Rate Limiting)

service:
  auth:
    enabled: true
    api_keys:
      - name: "production"
        key: "${INTERLACE_API_KEY}"
        permissions: [read, write, execute]
      - name: "monitoring"
        key: "${MONITORING_KEY}"
        permissions: [read]
    whitelist:                  # Paths that skip authentication
      - /health
      - /api/docs
      - /api/openapi.yaml
    rate_limit:
      requests_per_second: 100
      burst: 200               # Token bucket burst capacity

See the REST API & Service guide for endpoint documentation.
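
The burst comment above refers to a token bucket. The following is a generic sketch of that algorithm, showing how requests_per_second and burst interact. The TokenBucket class and its clock parameter are illustrative assumptions, not Interlace's implementation.

```python
import time


class TokenBucket:
    """Generic token bucket: refills at `rate` tokens/second up to `burst`."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)  # start with a full bucket
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=100, burst=200)
print(bucket.allow())  # the first request finds a full bucket
```

Under this model, a client may send up to 200 requests at once, after which the sustained rate settles at 100 per second.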

Scheduler

scheduler:
  enabled: true
  timezone: UTC

Models with a schedule parameter are automatically registered with the background scheduler when running interlace serve.

Observability

observability:
  metrics:
    enabled: true
    port: 9090
  tracing:
    enabled: true
    exporter: console
  logging:
    format: human
    level: INFO

Environment Variables

Reference environment variables with ${VAR_NAME}. The ${VAR_NAME:-default} form supplies a default value:

connections:
  warehouse:
    type: postgres
    config:
      password: ${DB_PASSWORD}
      port: ${DB_PORT:-5432} # With default value
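
The two placeholder forms in this example can be read as a simple substitution rule. The sketch below is one way to implement it; expand_env is a hypothetical helper, and raising an error for an unset variable without a default is an assumption about the behaviour, not something this page specifies.

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(value: str, environ=None) -> str:
    """Replace ${NAME} and ${NAME:-default} placeholders in a string."""
    env = os.environ if environ is None else environ

    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        # Assumed behaviour: fail loudly rather than substitute an empty string.
        raise KeyError(f"environment variable {name} is not set")

    return _VAR.sub(replace, value)


print(expand_env("${DB_PORT:-5432}", {}))                   # falls back to the default
print(expand_env("${DB_PORT:-5432}", {"DB_PORT": "6543"}))  # uses the variable
```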

The {env} placeholder is replaced with the active environment name:

connections:
  default:
    type: duckdb
    path: data/{env}/main.duckdb # data/dev/main.duckdb, data/prod/main.duckdb

Connections with shared: true skip {env} substitution, so every environment uses the same path.
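
The rule can be summarised in a few lines. resolve_path is a hypothetical name used only for this sketch.

```python
def resolve_path(path: str, env: str, shared: bool = False) -> str:
    """Substitute {env} unless the connection is marked shared."""
    if shared:
        return path  # shared connections are left untouched
    return path.replace("{env}", env)


print(resolve_path("data/{env}/main.duckdb", "dev"))
print(resolve_path("data/{env}/main.duckdb", "prod"))
```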

Environment Overlays

Create environment-specific config files:

config.yaml           # Base configuration
config.dev.yaml       # Dev overrides
config.staging.yaml   # Staging overrides
config.prod.yaml      # Production overrides

Run with: interlace run --env dev

Each environment file is deep-merged over the base config, so it only needs to contain the keys that differ.
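
For readers unfamiliar with the term, a deep merge combines nested mappings key by key instead of replacing whole top-level sections. The sketch below shows the usual semantics. deep_merge is a generic illustration, and replacing lists wholesale (rather than concatenating them) is an assumption about how Interlace treats them.

```python
from copy import deepcopy


def deep_merge(base: dict, overlay: dict) -> dict:
    """Return a new dict with overlay values merged over base, recursively."""
    merged = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists from the overlay replace the base value.
            merged[key] = deepcopy(value)
    return merged


base = {
    "name": "my-pipeline",
    "connections": {"default": {"type": "duckdb", "path": "./data/warehouse.duckdb"}},
}
dev = {"connections": {"default": {"path": ":memory:"}}}

print(deep_merge(base, dev))
```

Here the dev overlay changes only the path; name and the connection type are inherited from the base file.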