Prerequisites
Participants should have the following skills before attending this training:
- Basic SQL knowledge, including SELECT, JOIN, GROUP BY and simple subqueries
- Familiarity with any cloud data warehouse (Snowflake preferred but not mandatory)
- Basic command-line experience for running Python/DBT commands
- Basic Git usage, such as cloning, committing, and pushing changes
- Understanding of raw vs transformed data and general data pipeline concepts
- Basic exposure to Python or YAML, enough to understand the simple syntax used in DBT
- General analytical thinking to follow multi-step data transformations
Topics
- Transition from traditional ETL to cloud-based ELT
- Why modern platforms rely on SQL-driven transformations
- Importance of scalable, modular data pipelines
- Raw, staging, intermediate and mart layers
- Ingestion vs transformation responsibilities
- How data flows across the modern data stack
- DBT as the transformation layer
- Modular SQL development through ref()
- Tests, documentation, lineage and macros
- Why DBT is standard for analytics engineering
- Separation of compute and storage
- Warehouses, databases and schemas
- How Snowflake handles scaling and concurrency
- Creating the required Snowflake objects for DBT
- Minimal role and permission structure for development
- Schema organization for DEV, TEST and PROD
- Micro-partitions (concept-level understanding)
- Time Travel and cloning basics
- Interpreting simple query profile elements
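A minimal sketch of the Snowflake objects a DBT development setup along these lines might use; the warehouse, database, role and user names are placeholders, not names prescribed by the course:

```sql
-- Illustrative Snowflake setup for DBT development (all names are placeholders)
CREATE WAREHOUSE IF NOT EXISTS DBT_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;

CREATE DATABASE IF NOT EXISTS ANALYTICS;
CREATE SCHEMA IF NOT EXISTS ANALYTICS.DEV;
CREATE SCHEMA IF NOT EXISTS ANALYTICS.TEST;
CREATE SCHEMA IF NOT EXISTS ANALYTICS.PROD;

-- Minimal role and permission structure for development
CREATE ROLE IF NOT EXISTS TRANSFORMER;
GRANT USAGE ON WAREHOUSE DBT_WH TO ROLE TRANSFORMER;
GRANT USAGE ON DATABASE ANALYTICS TO ROLE TRANSFORMER;
GRANT ALL ON SCHEMA ANALYTICS.DEV TO ROLE TRANSFORMER;
GRANT ROLE TRANSFORMER TO USER DBT_DEV_USER;  -- replace with a real user
```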
- Consistent aliasing and naming
- Predictable formatting for long-term maintainability
- Designing SQL for readability rather than shortcuts
- INNER and LEFT join usage
- Identifying incorrect join logic
- Ensuring complete and accurate data through proper join conditions
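As a small illustration of the join topics above, the sketch below contrasts an INNER JOIN, which silently drops customers without orders, with a LEFT JOIN that keeps them; the table and column names are hypothetical:

```sql
-- INNER JOIN: customers with no orders disappear from the result
SELECT c.customer_id, SUM(o.amount) AS total_amount
FROM customers AS c
INNER JOIN orders AS o
  ON o.customer_id = c.customer_id
GROUP BY c.customer_id;

-- LEFT JOIN: every customer is kept; missing amounts are coalesced to 0
SELECT c.customer_id, COALESCE(SUM(o.amount), 0) AS total_amount
FROM customers AS c
LEFT JOIN orders AS o
  ON o.customer_id = c.customer_id
GROUP BY c.customer_id;
```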
- WITH clause structure
- Breaking transformations into logical steps
- Replacing nested subqueries with clearer CTE chains
- Source standardization CTEs
- Enrichment and transformation CTEs
- CTEs for dimensional and factual modeling
- When subqueries are acceptable
- Trade-offs vs CTE usage
- Common SQL mistakes in analytical pipelines
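The following sketch shows the WITH-clause structure described above, replacing a nested subquery with a short chain of named steps; the source table and columns are illustrative only:

```sql
-- Illustrative CTE chain: source standardization -> enrichment -> final select
WITH source_orders AS (          -- standardize raw input
    SELECT
        order_id,
        customer_id,
        CAST(order_date AS DATE)       AS order_date,
        CAST(amount AS NUMBER(12, 2))  AS amount
    FROM raw.orders
),

customer_totals AS (             -- enrichment step
    SELECT
        customer_id,
        COUNT(*)    AS order_count,
        SUM(amount) AS lifetime_amount
    FROM source_orders
    GROUP BY customer_id
)

SELECT                           -- final, readable result
    customer_id,
    order_count,
    lifetime_amount
FROM customer_totals
WHERE lifetime_amount > 0;
```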
- Purpose of dbt_project.yml
- Organizing models, macros, tests and seeds
- Folder structure for scalable team development
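A minimal dbt_project.yml sketch along these lines; the project name and folder-level materializations are assumptions, not a required layout:

```yaml
# dbt_project.yml (illustrative)
name: analytics
version: "1.0.0"
config-version: 2
profile: analytics          # must match a profile in profiles.yml

model-paths: ["models"]
macro-paths: ["macros"]
test-paths: ["tests"]
seed-paths: ["seeds"]

models:
  analytics:
    staging:
      +materialized: view
    marts:
      +materialized: table
```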
- Creating Snowflake profiles
- Managing DEV, TEST and PROD targets
- Using environment variables for credential safety
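A sketch of a Snowflake profile with DEV and PROD targets that reads credentials from environment variables; the account, role, warehouse and database names are placeholders:

```yaml
# ~/.dbt/profiles.yml (illustrative)
analytics:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: TRANSFORMER
      warehouse: DBT_WH
      database: ANALYTICS
      schema: DEV
      threads: 4
    prod:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: TRANSFORMER
      warehouse: DBT_WH
      database: ANALYTICS
      schema: PROD
      threads: 8
```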
- Initializing a repository
- Branching and commit conventions
- Version control practices for DBT teams
- dbt debug
- dbt run
- dbt test
- dbt build
- dbt deps
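A typical local sequence for these commands; the --target flag assumes targets like the ones sketched in the profile above:

```bash
dbt deps                 # install packages listed in packages.yml
dbt debug                # verify the profile and Snowflake connection
dbt run --target dev     # build models into the DEV schema
dbt test --target dev    # run schema and data tests
dbt build --target dev   # run + test + seed + snapshot in dependency order
```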
- How ref() defines dependencies
- Execution order
- Lineage graph interpretation
- Table
- View
- Incremental (introduction before full treatment later)
- When to use each materialization
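The sketch below shows how ref() declares upstream dependencies and how a materialization is configured per model; the model names are hypothetical:

```sql
-- models/marts/customer_orders.sql (illustrative)
{{ config(materialized='table') }}   -- sets the materialization for this model only

SELECT
    c.customer_id,
    c.customer_name,
    COUNT(o.order_id) AS order_count
FROM {{ ref('stg_customers') }} AS c      -- ref() makes stg_customers an upstream dependency
LEFT JOIN {{ ref('stg_orders') }} AS o
    ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.customer_name
-- dbt resolves ref() to fully qualified names and orders execution along the lineage graph
```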
- Standardizing raw input
- Casting, renaming and formatting
- No business logic in staging
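A staging model along these lines, limited to casting, renaming and formatting; the source and column names are assumptions:

```sql
-- models/staging/stg_orders.sql (illustrative)
-- Staging: standardize raw input only; no business logic
SELECT
    order_id,
    customer_id,
    CAST(order_ts AS TIMESTAMP_NTZ)  AS ordered_at,
    CAST(amount AS NUMBER(12, 2))    AS order_amount,
    UPPER(TRIM(order_status))        AS order_status
FROM {{ source('shop', 'orders') }}
```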
- Combining datasets
- Consolidation and enrichment logic
- Preparing data for downstream marts
- Dimensional modeling principles
- Grain definition and surrogate key strategy
- Designing dimensions and fact tables for analytics
- Ingesting raw sources through source.yml
- Cleaning and standardizing inputs
- Multiple-join transformations
- Using CTE chains for clarity and modularity
- Fact table measures and foreign keys
- Dimension attribute modeling
- Ensuring referential consistency through tests
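A fact-table sketch with a surrogate key and foreign keys pointing at dimensions; it assumes the dbt_utils package and hypothetical staging models:

```sql
-- models/marts/fct_orders.sql (illustrative)
-- Grain: one row per order; surrogate key derived from the natural key
SELECT
    {{ dbt_utils.generate_surrogate_key(['o.order_id']) }} AS order_key,
    o.order_id,
    o.customer_id   AS customer_key,   -- FK to dim_customers, checked with a relationships test
    o.ordered_at,
    o.order_amount  AS amount          -- additive measure
FROM {{ ref('stg_orders') }} AS o
```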
- Structuring source.yml
- Organizing columns, metadata and schema references
- Applying standard tests
- Freshness warnings and SLAs
- Handling upstream schema changes
- unique
- not_null
- relationships
- accepted_values
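A source.yml sketch combining source declaration, a freshness SLA and the standard tests listed above; the database, schema and status values are placeholders:

```yaml
# models/staging/source.yml (illustrative)
version: 2

sources:
  - name: shop
    database: RAW
    schema: SHOP
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
        columns:
          - name: order_id
            tests: [unique, not_null]
          - name: customer_id
            tests:
              - not_null
              - relationships:
                  to: source('shop', 'customers')
                  field: customer_id
          - name: order_status
            tests:
              - accepted_values:
                  values: ['PLACED', 'SHIPPED', 'RETURNED']
      - name: customers
```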
- Writing SQL-based test logic
- Parameterizing custom tests
- Organizing reusable test macros
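A custom generic test sketch, parameterized on model and column; this is one way to express reusable SQL test logic, not the course's exact macro. A generic test passes when its query returns zero rows, and it is then applied in YAML like any built-in test:

```sql
-- tests/generic/test_positive_values.sql (illustrative)
{% test positive_values(model, column_name) %}

SELECT {{ column_name }}
FROM {{ model }}
WHERE {{ column_name }} <= 0

{% endtest %}
```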
- Using CSV files for reference data
- Managing seed loading and updates
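Seeds are plain CSV files in the seed path; a tiny illustrative mapping file and the commands that load it:

```csv
country_code,country_name
US,United States
DE,Germany
IN,India
```

```bash
dbt seed                  # load seed CSVs into the warehouse as tables
dbt seed --full-refresh   # rebuild the table after the CSV structure changes
```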
- SCD Type 2 fundamentals
- Snapshot SQL and configuration
- Deciding between snapshots and incremental models
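A snapshot sketch using the timestamp strategy for SCD Type 2 tracking; the source, schema and column names are assumptions:

```sql
-- snapshots/customers_snapshot.sql (illustrative)
{% snapshot customers_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

SELECT * FROM {{ source('shop', 'customers') }}

{% endsnapshot %}
```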
- Why incremental processing reduces warehouse cost
- Insert-only vs insert-update logic
- Using materialized: incremental
- Configuring unique_key
- Applying is_incremental() correctly
- Late-arriving data
- Backfill strategies
- Converting incremental to full-refresh models
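An incremental model sketch with unique_key and is_incremental(); the timestamp filter is one common pattern, not the only one, and the model names are hypothetical. Running with dbt run --full-refresh rebuilds it from scratch when logic changes or a backfill is needed:

```sql
-- models/marts/fct_events.sql (illustrative)
{{
    config(
        materialized='incremental',
        unique_key='event_id'
    )
}}

SELECT
    event_id,
    user_id,
    event_type,
    event_ts
FROM {{ ref('stg_events') }}

{% if is_incremental() %}
  -- On incremental runs, only pick up rows newer than what is already loaded;
  -- {{ this }} refers to the existing target table
  WHERE event_ts > (SELECT MAX(event_ts) FROM {{ this }})
{% endif %}
```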
- Control structures, variables and conditions
- Dynamic SQL generation
- Macro structure and arguments
- Creating reusable SQL blocks
- Centralizing business logic
- Standardizing dates
- Column cleanup templates
- Reusable join macros
- Pre-hooks and post-hooks
- Running Snowflake SQL utilities via run-operation
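A reusable macro sketch for date standardization, plus one way a post-hook might grant access on the built table; the macro name, role and model are illustrative. Utility macros that are not tied to a model can be invoked directly with dbt run-operation:

```sql
-- macros/to_date_utc.sql (illustrative reusable macro)
{% macro to_date_utc(column_name) %}
    CAST(CONVERT_TIMEZONE('UTC', {{ column_name }}) AS DATE)
{% endmacro %}
```

```sql
-- models/staging/stg_orders_dates.sql (illustrative usage with a post-hook)
{{ config(post_hook="GRANT SELECT ON {{ this }} TO ROLE REPORTER") }}

SELECT
    order_id,
    {{ to_date_utc('order_ts') }} AS order_date
FROM {{ ref('stg_orders') }}
```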
- DEV, TEST and PROD setup
- Schema and database isolation
- Promotion strategy
- Using profile targets
- Using tags to manage selective runs
- Run strategies for different environments
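Tag-based selection and profile targets combine like this; the tag names are assumptions:

```bash
# Run only models tagged 'daily' against the PROD target
dbt build --select tag:daily --target prod

# Lightweight TEST run that skips long-running models
dbt build --target test --exclude tag:heavy
```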
- Triggering workflows on PR or commit
- Installing DBT
- Running deps, build and test
- Automatic builds on merges to main
- Use of environment variables
- Integration with DBT Cloud via API
- Regenerating docs site
- Publishing in CI steps
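A GitHub Actions sketch covering the steps above; the secret names, Python version and committed CI profiles directory are assumptions, not the course's exact workflow:

```yaml
# .github/workflows/dbt.yml (illustrative)
name: dbt-ci

on:
  pull_request:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    env:
      SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
      SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
      SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
      DBT_PROFILES_DIR: ./ci        # assumes a profiles.yml committed for CI
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install DBT
        run: pip install dbt-snowflake
      - name: Build and test
        run: |
          dbt deps
          dbt build --target test
      - name: Generate docs
        run: dbt docs generate --target test
```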
- Pipeline concept
- Jenkinsfile structure
- Stages and steps
- Using Jenkins credentials for Snowflake
- Simplified environment variables
- Running dbt debug, deps and build
- Managing environment-specific runs
- Using Jenkins to promote code
- Handling test failures and pipeline control
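A declarative Jenkinsfile sketch covering the stages above; the credential IDs, target names and promotion rule are placeholders:

```groovy
// Jenkinsfile (illustrative declarative pipeline)
pipeline {
    agent any

    environment {
        // Jenkins credentials; the IDs are placeholders
        SNOWFLAKE_ACCOUNT  = credentials('snowflake-account')
        SNOWFLAKE_USER     = credentials('snowflake-user')
        SNOWFLAKE_PASSWORD = credentials('snowflake-password')
    }

    stages {
        stage('Setup') {
            steps {
                sh 'pip install dbt-snowflake'
                sh 'dbt deps'
                sh 'dbt debug --target test'
            }
        }
        stage('Build and test') {
            steps {
                sh 'dbt build --target test'
            }
        }
        stage('Promote to PROD') {
            when { branch 'main' }
            steps {
                sh 'dbt build --target prod'
            }
        }
    }

    post {
        failure {
            echo 'dbt build or tests failed; pipeline stopped before promotion'
        }
    }
}
```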
- Jobs
- Schedules
- Notifications
- Collaborative development
- When teams use DBT Cloud
- Raw → staging → intermediate → mart
- Applying tests
- Generating documentation
- Reviewing lineage
- Running DBT in GitHub Actions or Jenkins
- Full build and validation
- Final review and Q&A
