Prerequisites
Participants should have the following skills before attending this training:
- Basic SQL knowledge, including SELECT, JOIN, GROUP BY and simple subqueries
- Familiarity with any cloud data warehouse (Snowflake preferred but not mandatory)
- Basic command-line experience for running Python/DBT commands
- Basic Git usage, such as cloning, committing, and pushing changes
- Understanding of raw vs transformed data and general data pipeline concepts
- Basic exposure to Python or YAML, enough to understand the simple syntax used in DBT
- General analytical thinking to follow multi-step data transformations
Topics
- Transition from traditional ETL to cloud-based ELT
- Why modern platforms rely on SQL-driven transformations
- Importance of scalable, modular data pipelines
- Raw, staging, intermediate and mart layers
- Ingestion vs transformation responsibilities
- How data flows across the modern data stack
- DBT as the transformation layer
- Modular SQL development through ref()
- Tests, documentation, lineage and macros
- Why DBT is standard for analytics engineering
- Separation of compute and storage
- Warehouses, databases and schemas
- How Snowflake handles scaling and concurrency
- Creating the required Snowflake objects for DBT
- Minimal role and permission structure for development
- Schema organization for DEV, TEST and PROD
- Micro-partitions (concept-level understanding)
- Time Travel and cloning basics
- Interpreting simple query profile elements
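A minimal sketch of the Snowflake objects a DBT development setup along these lines might use; the warehouse, database, role and user names are placeholders, not names prescribed by the course:

```sql
-- Illustrative Snowflake setup for DBT development (all names are placeholders)
CREATE WAREHOUSE IF NOT EXISTS DBT_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;

CREATE DATABASE IF NOT EXISTS ANALYTICS;
CREATE SCHEMA IF NOT EXISTS ANALYTICS.DEV;
CREATE SCHEMA IF NOT EXISTS ANALYTICS.TEST;
CREATE SCHEMA IF NOT EXISTS ANALYTICS.PROD;

-- Minimal role and permission structure for development
CREATE ROLE IF NOT EXISTS TRANSFORMER;
GRANT USAGE ON WAREHOUSE DBT_WH TO ROLE TRANSFORMER;
GRANT USAGE ON DATABASE ANALYTICS TO ROLE TRANSFORMER;
GRANT ALL ON SCHEMA ANALYTICS.DEV TO ROLE TRANSFORMER;
GRANT ROLE TRANSFORMER TO USER DBT_DEV_USER;  -- replace with a real user
```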
- Consistent aliasing and naming
- Predictable formatting for long-term maintainability
- Designing SQL for readability rather than shortcuts
- INNER and LEFT join usage
- Identifying incorrect join logic
- Ensuring complete and accurate data through proper join conditions
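As a small illustration of the join topics above, the sketch below contrasts an INNER JOIN, which silently drops customers without orders, with a LEFT JOIN that keeps them; the table and column names are hypothetical:

```sql
-- INNER JOIN: customers with no orders disappear from the result
SELECT c.customer_id, SUM(o.amount) AS total_amount
FROM customers AS c
INNER JOIN orders AS o
  ON o.customer_id = c.customer_id
GROUP BY c.customer_id;

-- LEFT JOIN: every customer is kept; missing amounts are coalesced to 0
SELECT c.customer_id, COALESCE(SUM(o.amount), 0) AS total_amount
FROM customers AS c
LEFT JOIN orders AS o
  ON o.customer_id = c.customer_id
GROUP BY c.customer_id;
```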
- WITH clause structure
- Breaking transformations into logical steps
- Replacing nested subqueries with clearer CTE chains
- Source standardization CTEs
- Enrichment and transformation CTEs
- CTEs for dimensional and factual modeling
- When subqueries are acceptable
- Trade-offs vs CTE usage
- Common SQL mistakes in analytical pipelines
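The following sketch shows the WITH-clause structure described above, replacing a nested subquery with a short chain of named steps; the source table and columns are illustrative only:

```sql
-- Illustrative CTE chain: source standardization -> enrichment -> final select
WITH source_orders AS (          -- standardize raw input
    SELECT
        order_id,
        customer_id,
        CAST(order_date AS DATE)       AS order_date,
        CAST(amount AS NUMBER(12, 2))  AS amount
    FROM raw.orders
),

customer_totals AS (             -- enrichment step
    SELECT
        customer_id,
        COUNT(*)    AS order_count,
        SUM(amount) AS lifetime_amount
    FROM source_orders
    GROUP BY customer_id
)

SELECT                           -- final, readable result
    customer_id,
    order_count,
    lifetime_amount
FROM customer_totals
WHERE lifetime_amount > 0;
```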
- Purpose of dbt_project.yml
- Organizing models, macros, tests and seeds
- Folder structure for scalable team development
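A minimal dbt_project.yml sketch along these lines; the project name and folder-level materializations are assumptions, not a required layout:

```yaml
# dbt_project.yml (illustrative)
name: analytics
version: "1.0.0"
config-version: 2
profile: analytics          # must match a profile in profiles.yml

model-paths: ["models"]
macro-paths: ["macros"]
test-paths: ["tests"]
seed-paths: ["seeds"]

models:
  analytics:
    staging:
      +materialized: view
    marts:
      +materialized: table
```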
- Creating Snowflake profiles
- Managing DEV, TEST and PROD targets
- Using environment variables for credential safety
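A sketch of a Snowflake profile with DEV and PROD targets that reads credentials from environment variables; the account, role, warehouse and database names are placeholders:

```yaml
# ~/.dbt/profiles.yml (illustrative)
analytics:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: TRANSFORMER
      warehouse: DBT_WH
      database: ANALYTICS
      schema: DEV
      threads: 4
    prod:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: TRANSFORMER
      warehouse: DBT_WH
      database: ANALYTICS
      schema: PROD
      threads: 8
```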
- Initializing a repository
- Branching and commit conventions
- Version control practices for DBT teams
- dbt debug
- dbt run
- dbt test
- dbt build
- dbt deps
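A typical local sequence for these commands; the --target flag assumes targets like the ones sketched in the profile above:

```bash
dbt deps                 # install packages listed in packages.yml
dbt debug                # verify the profile and Snowflake connection
dbt run --target dev     # build models into the DEV schema
dbt test --target dev    # run schema and data tests
dbt build --target dev   # run + test + seed + snapshot in dependency order
```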
- How ref() defines dependencies
- Execution order
- Lineage graph interpretation
- Table
- View
- Incremental (introduction before full treatment later)
- When to use each materialization
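The sketch below shows how ref() declares upstream dependencies and how a materialization is configured per model; the model names are hypothetical:

```sql
-- models/marts/customer_orders.sql (illustrative)
{{ config(materialized='table') }}   -- sets the materialization for this model only

SELECT
    c.customer_id,
    c.customer_name,
    COUNT(o.order_id) AS order_count
FROM {{ ref('stg_customers') }} AS c      -- ref() makes stg_customers an upstream dependency
LEFT JOIN {{ ref('stg_orders') }} AS o
    ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.customer_name
-- dbt resolves ref() to fully qualified names and orders execution along the lineage graph
```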
- Standardizing raw input
- Casting, renaming and formatting
- No business logic in staging
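A staging model along these lines, limited to casting, renaming and formatting; the source and column names are assumptions:

```sql
-- models/staging/stg_orders.sql (illustrative)
-- Staging: standardize raw input only; no business logic
SELECT
    order_id,
    customer_id,
    CAST(order_ts AS TIMESTAMP_NTZ)  AS ordered_at,
    CAST(amount AS NUMBER(12, 2))    AS order_amount,
    UPPER(TRIM(order_status))        AS order_status
FROM {{ source('shop', 'orders') }}
```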
- Combining datasets
- Consolidation and enrichment logic
- Preparing data for downstream marts
- Dimensional modeling principles
- Grain definition and surrogate key strategy
- Designing dimensions and fact tables for analytics
- Ingesting raw sources through source.yml
- Cleaning and standardizing inputs
- Multiple-join transformations
- Using CTE chains for clarity and modularity
- Fact table measures and foreign keys
- Dimension attribute modeling
- Ensuring referential consistency through tests
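A fact-table sketch with a surrogate key and foreign keys pointing at dimensions; it assumes the dbt_utils package and hypothetical staging models:

```sql
-- models/marts/fct_orders.sql (illustrative)
-- Grain: one row per order; surrogate key derived from the natural key
SELECT
    {{ dbt_utils.generate_surrogate_key(['o.order_id']) }} AS order_key,
    o.order_id,
    o.customer_id   AS customer_key,   -- FK to dim_customers, checked with a relationships test
    o.ordered_at,
    o.order_amount  AS amount          -- additive measure
FROM {{ ref('stg_orders') }} AS o
```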
- Structuring source.yml
- Organizing columns, metadata and schema references
- Applying standard tests
- Freshness warnings and SLAs
- Handling upstream schema changes
- unique
- not_null
- relationships
- accepted_values
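A source.yml sketch combining source declaration, a freshness SLA and the standard tests listed above; the database, schema and status values are placeholders:

```yaml
# models/staging/source.yml (illustrative)
version: 2

sources:
  - name: shop
    database: RAW
    schema: SHOP
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
        columns:
          - name: order_id
            tests: [unique, not_null]
          - name: customer_id
            tests:
              - not_null
              - relationships:
                  to: source('shop', 'customers')
                  field: customer_id
          - name: order_status
            tests:
              - accepted_values:
                  values: ['PLACED', 'SHIPPED', 'RETURNED']
      - name: customers
```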
- Writing SQL-based test logic
- Parameterizing custom tests
- Organizing reusable test macros
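A custom generic test sketch, parameterized on model and column; this is one way to express reusable SQL test logic, not the course's exact macro. A generic test passes when its query returns zero rows, and it is then applied in YAML like any built-in test:

```sql
-- tests/generic/test_positive_values.sql (illustrative)
{% test positive_values(model, column_name) %}

SELECT {{ column_name }}
FROM {{ model }}
WHERE {{ column_name }} <= 0

{% endtest %}
```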
- Using CSV files for reference data
- Managing seed loading and updates
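Seeds are plain CSV files in the seed path; a tiny illustrative mapping file and the commands that load it:

```csv
country_code,country_name
US,United States
DE,Germany
IN,India
```

```bash
dbt seed                  # load seed CSVs into the warehouse as tables
dbt seed --full-refresh   # rebuild the table after the CSV structure changes
```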
- SCD Type 2 fundamentals
- Snapshot SQL and configuration
- Deciding between snapshots and incremental models
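A snapshot sketch using the timestamp strategy for SCD Type 2 tracking; the source, schema and column names are assumptions:

```sql
-- snapshots/customers_snapshot.sql (illustrative)
{% snapshot customers_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

SELECT * FROM {{ source('shop', 'customers') }}

{% endsnapshot %}
```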
- Why incremental processing reduces warehouse cost
- Insert-only vs insert-update logic
- Using materialized: incremental
- Configuring unique_key
- Applying is_incremental() correctly
- Late-arriving data
- Backfill strategies
- Converting incremental to full-refresh models
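An incremental model sketch with unique_key and is_incremental(); the timestamp filter is one common pattern, not the only one, and the model names are hypothetical. Running with dbt run --full-refresh rebuilds it from scratch when logic changes or a backfill is needed:

```sql
-- models/marts/fct_events.sql (illustrative)
{{
    config(
        materialized='incremental',
        unique_key='event_id'
    )
}}

SELECT
    event_id,
    user_id,
    event_type,
    event_ts
FROM {{ ref('stg_events') }}

{% if is_incremental() %}
  -- On incremental runs, only pick up rows newer than what is already loaded;
  -- {{ this }} refers to the existing target table
  WHERE event_ts > (SELECT MAX(event_ts) FROM {{ this }})
{% endif %}
```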
- Control structures, variables and conditions
- Dynamic SQL generation
- Macro structure and arguments
- Creating reusable SQL blocks
- Centralizing business logic
- Standardizing dates
- Column cleanup templates
- Reusable join macros
- Pre-hooks and post-hooks
- Running Snowflake SQL utilities via run-operation
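A reusable macro sketch for date standardization, plus one way a post-hook might grant access on the built table; the macro name, role and model are illustrative. Utility macros that are not tied to a model can be invoked directly with dbt run-operation:

```sql
-- macros/to_date_utc.sql (illustrative reusable macro)
{% macro to_date_utc(column_name) %}
    CAST(CONVERT_TIMEZONE('UTC', {{ column_name }}) AS DATE)
{% endmacro %}
```

```sql
-- models/staging/stg_orders_dates.sql (illustrative usage with a post-hook)
{{ config(post_hook="GRANT SELECT ON {{ this }} TO ROLE REPORTER") }}

SELECT
    order_id,
    {{ to_date_utc('order_ts') }} AS order_date
FROM {{ ref('stg_orders') }}
```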
- DEV, TEST and PROD setup
- Schema and database isolation
- Promotion strategy
- Using profile targets
- Using tags to manage selective runs
- Run strategies for different environments
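Tag-based selection and profile targets combine like this; the tag names are assumptions:

```bash
# Run only models tagged 'daily' against the PROD target
dbt build --select tag:daily --target prod

# Lightweight TEST run that skips long-running models
dbt build --target test --exclude tag:heavy
```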
- Triggering workflows on PR or commit
- Installing DBT
- Running deps, build and test
- Automatic builds on merges to main
- Use of environment variables
- Integration with DBT Cloud via API
- Regenerating docs site
- Publishing in CI steps
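A GitHub Actions sketch covering the steps above; the secret names, Python version and committed CI profiles directory are assumptions, not the course's exact workflow:

```yaml
# .github/workflows/dbt.yml (illustrative)
name: dbt-ci

on:
  pull_request:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    env:
      SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
      SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
      SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
      DBT_PROFILES_DIR: ./ci        # assumes a profiles.yml committed for CI
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install DBT
        run: pip install dbt-snowflake
      - name: Build and test
        run: |
          dbt deps
          dbt build --target test
      - name: Generate docs
        run: dbt docs generate --target test
```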
- Pipeline concept
- Jenkinsfile structure
- Stages and steps
- Using Jenkins credentials for Snowflake
- Simplified environment variables
- Running dbt debug, deps and build
- Managing environment-specific runs
- Using Jenkins to promote code
- Handling test failures and pipeline control
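A declarative Jenkinsfile sketch covering the stages above; the credential IDs, target names and promotion rule are placeholders:

```groovy
// Jenkinsfile (illustrative declarative pipeline)
pipeline {
    agent any

    environment {
        // Jenkins credentials; the IDs are placeholders
        SNOWFLAKE_ACCOUNT  = credentials('snowflake-account')
        SNOWFLAKE_USER     = credentials('snowflake-user')
        SNOWFLAKE_PASSWORD = credentials('snowflake-password')
    }

    stages {
        stage('Setup') {
            steps {
                sh 'pip install dbt-snowflake'
                sh 'dbt deps'
                sh 'dbt debug --target test'
            }
        }
        stage('Build and test') {
            steps {
                sh 'dbt build --target test'
            }
        }
        stage('Promote to PROD') {
            when { branch 'main' }
            steps {
                sh 'dbt build --target prod'
            }
        }
    }

    post {
        failure {
            echo 'dbt build or tests failed; pipeline stopped before promotion'
        }
    }
}
```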
- Jobs
- Schedules
- Notifications
- Collaborative development
- When teams use DBT Cloud
- Raw → staging → intermediate → mart
- Applying tests
- Generating documentation
- Reviewing lineage
- Running DBT in GitHub Actions or Jenkins
- Full build and validation
- Final review and Q&A
