Modern data pipelines with dbt - Big Data Technology Warsaw Summit
In this one-day workshop you will learn how to create modern data transformation pipelines managed by dbt. Discover how to improve the quality of your pipelines and your data team's workflow with a tool that standardizes good practices across the team: version control, testing, monitoring, snapshotting, and a consistent way of writing SQL transformation scripts. We will work through typical data transformation problems you may encounter on the way to delivering fresh and reliable data, and show how dbt can help solve them.
Target Audience
Data analysts, analytics engineers, and data engineers who want to learn to build and deploy data transformation workflows faster. Data analysts who would like to leverage their SQL skills and start working on data transformation tasks.
Requirements
- SQL fluency: the ability to write data transformation queries.
- Basic understanding of ETL processes.
- Basic experience with the command line.
- A laptop with a stable internet connection (participants will connect to Jupyter Notebooks pre-created on Google Cloud Platform).
Participant’s ROI
- Concise and practical knowledge of applying dbt to solve typical data pipeline problems in a modern way: managing run sequence, handling data quality issues, monitoring, and switching smoothly between environments.
- Hands-on coding experience under the supervision of data engineers experienced in maintaining dbt pipelines.
- Tips about real-world applications and best practices.
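To make the "managing run sequence" point above concrete, here is a minimal Python sketch of dependency-ordered execution, the general idea behind running transformations in the right order. It is not dbt's implementation: the model names and the `MODEL_DEPENDENCIES` mapping are hypothetical, and the ordering is done with the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models (names are illustrative assumptions, not workshop material).
# Each key maps to the set of models it selects from.
MODEL_DEPENDENCIES = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},
    "daily_revenue": {"orders_enriched"},
}


def run_order(dependencies):
    """Return model names ordered so that each model comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(MODEL_DEPENDENCIES)
print(order)
```

The relative order of independent models (here the two staging models) is not guaranteed, so downstream code should rely only on the property that a model never appears before anything it depends on.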
Training Materials
All participants will receive training materials as PDF files: slides covering the theory and an exercise manual with a detailed description of every exercise. During the workshop, participants will follow a shared step-by-step guide that shows how dbt augments a data team's workflow. A Jupyter Notebook environment will be provided for each participant.
Time Box
This is a one-day event (9:00-16:00) with breaks between sessions.
Agenda
Session #1 - Introduction to dbt
- Framework overview
- Typical use cases
- Impact on data transformation development
Session #2 - Core concepts of dbt
- Data models
- Seeds, sources
- Tests
- Documentation, maintenance and data lineage
- Hands-on exercises
Session #3 - Advanced dbt features
- Macros & hooks
- Snapshots
- Extensions
- Other tools to integrate with (overview only)
- Hands-on exercises
Session #4 - Scheduling, deployment, workflow
- GID DataOps Data Platform
- Airflow
- dbt Cloud
- Hands-on exercises
Keywords: data warehouse, data analytics, dbt, ETL, ELT, data transformation, SQL
Session leader:
GetInData