• DBT & Snowflake Data Engineering

    Duration: 56 hours

Prerequisites

Participants should have the following skills before attending this training:
  • Basic SQL knowledge, including SELECT, JOIN, GROUP BY and simple subqueries
  • Familiarity with any cloud data warehouse (Snowflake preferred but not mandatory)
  • Basic command-line experience for running Python/DBT commands
  • Basic Git usage, such as cloning, committing and pushing changes
  • Understanding of raw vs transformed data and general data pipeline concepts
  • Basic exposure to Python or YAML, enough to understand the simple syntax used in DBT
  • General analytical thinking to follow multi-step data transformations

Topics

Modern ELT Foundations and Snowflake Essentials

1. Modern ELT Foundations
1.1 Evolution of Data Engineering
  • Transition from traditional ETL to cloud-based ELT
  • Why modern platforms rely on SQL-driven transformations  
  • Importance of scalable, modular data pipelines
1.2 ELT Architecture Overview
  • Raw, staging, intermediate and mart layers  
  • Ingestion vs transformation responsibilities  
  • How data flows across the modern data stack
1.3 Role of DBT in the ELT Workflow
  • DBT as the transformation layer  
  • Modular SQL development through ref()  
  • Tests, documentation, lineage and macros  
  • Why DBT is standard for analytics engineering
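
For example, a minimal sketch of a model wired together with ref() (all model and column names here are illustrative):

    -- models/marts/fct_orders.sql
    select
        o.order_id,
        o.order_date,
        c.customer_name
    from {{ ref('stg_orders') }} as o
    left join {{ ref('stg_customers') }} as c
        on o.customer_id = c.customer_id

Because ref() declares the dependencies, DBT builds stg_orders and stg_customers first and derives the lineage graph automatically.
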
2. Snowflake Essentials
2.1 Snowflake Architecture
  • Separation of compute and storage
  • Warehouses, databases and schemas  
  • How Snowflake handles scaling and concurrency
2.2 Working with Snowflake for DBT
  • Creating the required Snowflake objects for DBT
  • Minimal role and permission structure for development  
  • Schema organization for DEV, TEST and PROD
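
A minimal bootstrap script might look like this (run as a privileged role; the warehouse, database and role names are placeholders):

    create warehouse if not exists dbt_wh
        with warehouse_size = 'xsmall' auto_suspend = 60 auto_resume = true;
    create database if not exists analytics;
    create role if not exists transformer;
    grant usage on warehouse dbt_wh to role transformer;
    grant usage on database analytics to role transformer;
    grant create schema on database analytics to role transformer;
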
2.3 Storage & Query Concepts
  • Micro-partitions (concept-level understanding)  
  • Time Travel and cloning basics  
  • Interpreting simple query profile elements
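
Both features reduce to a single statement each, e.g. (table names are illustrative):

    -- Time Travel: query the table as it was one hour ago
    select * from analytics.raw.orders at(offset => -3600);

    -- Zero-copy clone for safe experimentation
    create table analytics.dev.orders_clone clone analytics.raw.orders;
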

SQL Foundations for DBT
3. SQL Development Standards
3.1 Writing Clean Analytical SQL
  • Consistent aliasing and naming
  • Predictable formatting for long-term maintainability
  • Designing SQL for readability rather than shortcuts
3.2 Join Patterns in Analytics
  • INNER and LEFT join usage
  • Identifying incorrect join logic
  • Ensuring complete and accurate data through proper join conditions
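
A small sketch of the distinction (tables are illustrative):

    -- LEFT JOIN keeps every order even when no matching customer exists;
    -- an INNER JOIN here would silently drop those orders
    select
        o.order_id,
        c.customer_name
    from orders as o
    left join customers as c
        on o.customer_id = c.customer_id
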
4. CTE-Centric SQL for DBT
4.1 CTE Fundamentals
  • WITH clause structure
  • Breaking transformations into logical steps  
  • Replacing nested subqueries with clearer CTE chains
4.2 CTE Patterns for DBT Models
  • Source standardization CTEs 
  • Enrichment and transformation CTEs  
  • CTEs for dimension and fact table modeling
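
A typical DBT model built as a CTE chain, one logical step per CTE (names are illustrative):

    with orders as (
        select * from {{ ref('stg_orders') }}
    ),

    customers as (
        select * from {{ ref('stg_customers') }}
    ),

    customer_orders as (
        select
            c.customer_id,
            c.customer_name,
            count(o.order_id) as order_count
        from customers as c
        left join orders as o
            on o.customer_id = c.customer_id
        group by c.customer_id, c.customer_name
    )

    select * from customer_orders
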
4.3 Subqueries and SQL Best Practices
  • When subqueries are acceptable
  • Trade-offs vs CTE usage
  • Common SQL mistakes in analytical pipelines
DBT Project Setup and Core Development Workflow
5. DBT Project Initialization
5.1 Project Structure
  • Purpose of dbt_project.yml  
  • Organizing models, macros, tests and seeds  
  • Folder structure for scalable team development
5.2 profiles.yml and Environment Configuration
  • Creating Snowflake profiles  
  • Managing DEV, TEST and PROD targets  
  • Using environment variables for credential safety
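
A minimal Snowflake profile, with credentials read from environment variables rather than hard-coded (the profile, role and schema names are placeholders):

    # ~/.dbt/profiles.yml
    my_project:
      target: dev
      outputs:
        dev:
          type: snowflake
          account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
          user: "{{ env_var('SNOWFLAKE_USER') }}"
          password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
          role: transformer
          warehouse: dbt_wh
          database: analytics
          schema: dbt_dev
          threads: 4
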
5.3 Git Integration
  • Initializing a repository
  • Branching and commit conventions  
  • Version control practices for DBT teams
6. Core DBT Development Workflow
6.1 DBT Commands
  • dbt debug
  • dbt run
  • dbt test  
  • dbt build
  • dbt deps
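
A typical development loop with these commands:

    dbt debug                      # verify the connection and profiles.yml
    dbt deps                       # install packages from packages.yml
    dbt run --select stg_orders+   # run one model plus everything downstream
    dbt test                       # execute data tests
    dbt build                      # run + test + seed + snapshot in DAG order
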
6.2 Understanding DBT DAG
  • How ref() defines dependencies  
  • Execution order  
  • Lineage graph interpretation
6.3 Materializations
  • Table
  • View
  • Incremental (introduced here; covered in depth later)
  • When to use each materialization
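
Materializations are set either inline in a model or per folder in dbt_project.yml, e.g. (project name is a placeholder):

    -- inline, at the top of a model file
    {{ config(materialized='view') }}

    # or per folder in dbt_project.yml
    models:
      my_project:
        staging:
          +materialized: view
        marts:
          +materialized: table
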

DBT Modeling: Staging, Intermediate and Mart Layers
7. Modeling Design Principles
7.1 Staging Models
  • Standardizing raw input  
  • Casting, renaming and formatting
  • No business logic in staging
7.2 Intermediate Models
  • Combining datasets  
  • Consolidation and enrichment logic
  • Preparing data for downstream marts
7.3 Mart Models (Fact and Dimension)  
  • Dimensional modeling principles  
  • Grain definition and surrogate key strategy
  • Designing dimensions and fact tables for analytics
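
A dimension sketch using a hashed surrogate key (this assumes the dbt_utils package is installed; names are illustrative):

    -- models/marts/dim_customers.sql
    select
        {{ dbt_utils.generate_surrogate_key(['customer_id']) }} as customer_key,
        customer_id,
        customer_name,
        customer_segment
    from {{ ref('stg_customers') }}
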

8. Hands-on Model Development
8.1 Building Staging Models
  • Declaring raw sources in source.yml
  • Cleaning and standardizing inputs
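
A representative staging model: rename, cast and standardize, nothing more (source and column names are illustrative):

    -- models/staging/stg_orders.sql
    select
        id                        as order_id,
        user_id                   as customer_id,
        cast(order_date as date)  as order_date,
        lower(status)             as order_status
    from {{ source('shop', 'raw_orders') }}
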
8.2 Building Intermediate Models
  • Multiple-join transformations  
  • Using CTE chains for clarity and modularity
8.3 Developing Fact and Dimension Models
  • Fact table measures & foreign keys  
  • Dimension attribute modeling
  • Ensuring referential consistency through tests

Sources, Tests, Seeds and Snapshots
9. Source Management
9.1 Defining Sources
  • Structuring source.yml  
  • Organizing columns, metadata and schema references
9.2 Source Testing and Freshness
  • Applying standard tests  
  • Freshness warnings and SLAs  
  • Handling upstream schema changes
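
A source definition combining schema references with freshness thresholds (all names are placeholders):

    # models/staging/source.yml
    version: 2
    sources:
      - name: shop
        database: raw
        schema: shop
        loaded_at_field: _loaded_at
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
        tables:
          - name: raw_orders
          - name: raw_customers

Freshness is then checked with dbt source freshness.
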
10. Tests, Seeds and Snapshots
10.1 Built-in Data Tests
  • unique
  • not_null  
  • relationships  
  • accepted_values
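
All four built-in tests applied in one schema file (the model and value lists are illustrative):

    # models/staging/schema.yml
    version: 2
    models:
      - name: stg_orders
        columns:
          - name: order_id
            tests: [unique, not_null]
          - name: customer_id
            tests:
              - relationships:
                  to: ref('stg_customers')
                  field: customer_id
          - name: order_status
            tests:
              - accepted_values:
                  values: ['placed', 'shipped', 'completed', 'returned']
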
10.2 Custom Tests
  • Writing SQL-based test logic
  • Parameterizing custom tests  
  • Organizing reusable test macros
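
A parameterized generic test is just templated SQL that returns failing rows (the test name is illustrative):

    -- tests/generic/positive_value.sql
    {% test positive_value(model, column_name) %}
    select *
    from {{ model }}
    where {{ column_name }} <= 0
    {% endtest %}

It is then applied in schema.yml like any built-in test, as "- positive_value" under a column.
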
10.3 Seeds
  • Using CSV files for reference data
  • Managing seed loading and updates
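
For example, a small reference file (contents are illustrative), loaded with dbt seed and referenced like any model:

    # seeds/country_codes.csv
    country_code,country_name
    US,United States
    IN,India

    -- downstream usage
    select * from {{ ref('country_codes') }}
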
10.4 Snapshots
  • SCD Type 2 fundamentals  
  • Snapshot SQL and configuration
  • Deciding between snapshots and incremental models
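
A minimal timestamp-strategy snapshot (schema, key and column names are placeholders):

    -- snapshots/orders_snapshot.sql
    {% snapshot orders_snapshot %}
    {{
        config(
            target_schema='snapshots',
            unique_key='order_id',
            strategy='timestamp',
            updated_at='updated_at'
        )
    }}
    select * from {{ source('shop', 'raw_orders') }}
    {% endsnapshot %}

Each run of dbt snapshot closes out changed rows and inserts new versions, producing SCD Type 2 history.
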
Incremental Models, Jinja and Macros
11. Incremental Model Development
11.1 Incremental Concepts
  • Why incremental processing reduces warehouse cost  
  • Insert-only vs insert-update logic
11.2 Implementing Incremental Models
  • Using materialized: incremental  
  • Configuring unique_key
  • Applying is_incremental() correctly
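
Putting the three pieces together (model and column names are illustrative):

    -- models/marts/fct_events.sql
    {{ config(materialized='incremental', unique_key='event_id') }}

    select event_id, event_type, event_ts
    from {{ ref('stg_events') }}
    {% if is_incremental() %}
    -- on incremental runs, only pick up rows newer than what this table holds
    where event_ts > (select max(event_ts) from {{ this }})
    {% endif %}

For backfills, dbt run --full-refresh rebuilds the table from scratch.
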
11.3 Managing Incremental Model Challenges
  • Late arriving data
  • Backfill strategies  
  • Converting incremental to full-refresh models
12. Jinja and Macros in DBT
12.1 Jinja Templating
  • Control structures, variables and conditions  
  • Dynamic SQL generation
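
A classic example of dynamic SQL generation, pivoting with a Jinja loop (column values are illustrative):

    {% set methods = ['card', 'cash', 'voucher'] %}
    select
        order_id,
        {% for m in methods %}
        sum(case when payment_method = '{{ m }}' then amount else 0 end)
            as {{ m }}_amount{% if not loop.last %},{% endif %}
        {% endfor %}
    from {{ ref('stg_payments') }}
    group by order_id
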
12.2 Building Macros
  • Macro structure and arguments  
  • Creating reusable SQL blocks  
  • Centralizing business logic
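
A small reusable macro and its call site (macro and column names are illustrative):

    -- macros/cents_to_dollars.sql
    {% macro cents_to_dollars(column_name, precision=2) %}
        round({{ column_name }} / 100.0, {{ precision }})
    {% endmacro %}

    -- usage in a model
    select {{ cents_to_dollars('amount_cents') }} as amount_dollars
    from {{ ref('stg_payments') }}
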
12.3 Transformer Scripts and Utility Macros
  • Standardizing dates  
  • Column cleanup templates  
  • Reusable join macros
12.4 Hooks and run-operation
  • Pre-hooks and post-hooks  
  • Running Snowflake SQL utilities via run-operation
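
Sketches of both mechanisms (role, schema and macro names are placeholders):

    -- a post-hook attached to a model
    {{ config(post_hook="grant select on {{ this }} to role reporter") }}

    -- macros/drop_tmp_schema.sql: a utility invoked directly
    {% macro drop_tmp_schema() %}
        {% do run_query('drop schema if exists analytics.dbt_tmp') %}
    {% endmacro %}

    # invoked from the command line
    dbt run-operation drop_tmp_schema
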
Environments, CI/CD, DBT Cloud and Capstone
13. Environment and Deployment Management
13.1 Multi-Environment Structure
  • DEV, TEST and PROD setup
  • Schema and database isolation
  • Promotion strategy
13.2 Environment-Specific Configuration
  • Using profile targets  
  • Using tags to manage selective runs  
  • Run strategies for different environments
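
For example, tagging a folder and running it selectively per environment (project and tag names are placeholders):

    # dbt_project.yml
    models:
      my_project:
        marts:
          +tags: ['nightly']

    # then, in the PROD scheduler
    dbt build --select tag:nightly --target prod
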

14. CI/CD with GitHub Actions

14.1 CI Pipeline Behavior
  • Triggering workflows on PR or commit  
  • Installing DBT
  • Running deps, build and test
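
A minimal workflow sketch (the secret names, Python version and the "ci" target are placeholders):

    # .github/workflows/dbt_ci.yml
    name: dbt CI
    on: pull_request
    jobs:
      build:
        runs-on: ubuntu-latest
        env:
          SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
          SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: '3.11'
          - run: pip install dbt-snowflake
          - run: dbt deps
          - run: dbt build --target ci
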

14.2 Deployment Scenarios
  • Automatic builds on main merges  
  • Use of environment variables  
  • Integration with DBT Cloud via API

14.3 Automation of Documentation
  • Regenerating docs site
  • Publishing in CI steps
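
The docs step typically reduces to:

    dbt docs generate    # writes the static docs site into target/
    # publishing target/ (e.g. to GitHub Pages or an S3 bucket) is a deployment choice
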

15. CI/CD with Jenkins (Simplified and Real Execution)
15.1 Jenkins Overview for DBT
  • Pipeline concept  
  • Jenkinsfile structure
  • Stages and steps

15.2 Running Real DBT Commands on Snowflake
  • Using Jenkins credentials for Snowflake
  • Simplified environment variables  
  • Running dbt debug, deps and build
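
A simplified declarative Jenkinsfile along these lines (credential IDs and the target name are placeholders):

    pipeline {
      agent any
      environment {
        SNOWFLAKE_PASSWORD = credentials('snowflake-password')
      }
      stages {
        stage('Setup')  { steps { sh 'pip install dbt-snowflake && dbt deps' } }
        stage('Verify') { steps { sh 'dbt debug' } }
        stage('Build')  { steps { sh 'dbt build --target test' } }
      }
    }
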

15.3 Deployment Using Jenkins
  • Managing environment-specific runs  
  • Using Jenkins to promote code
  • Handling test failures and pipeline control

16. DBT Cloud Overview
16.1 Key Components
  • Jobs  
  • Schedules  
  • Notifications

16.2 DBT Cloud vs CLI
  • Collaborative development  
  • When teams use DBT Cloud

17. Capstone Project

17.1 End-to-End Build
  • Raw → staging → intermediate → mart  
  • Applying tests  
  • Generating documentation
  • Reviewing lineage

17.2 CI/CD Integration
  • Running DBT in GitHub Actions or Jenkins  
  • Full build and validation
  • Final review and Q&A