Build reliable data pipelines using Modern Data Stack in the cloud

In this one-day workshop, you will learn how to create modern data transformation pipelines managed by dbt and orchestrated with Apache Airflow. You will discover how to improve the quality of your pipelines and your data team’s workflow by introducing a set of tools that standardize good practices within the team: version control, testing, monitoring, change data capture, and easy scheduling. We will work through typical data transformation problems you may encounter on the way to delivering fresh and reliable data, and show how modern tooling can help solve them. All hands-on exercises will be carried out in a public cloud environment (e.g. GCP or AWS).

   Target Audience

Data analysts, analytics engineers and data engineers who are interested in learning how to build and deploy data transformation workflows faster than ever before. Anyone who would like to leverage their SQL skills and start building data pipelines more easily.

    Requirements

  • SQL fluency: ability to write data transforming queries
  • Basic understanding of ETL processes
  • Basic experience with a command-line interface
  • Laptop with a stable internet connection (participants will connect to Jupyter Notebooks pre-created in a cloud environment)

    Participant’s ROI

  • Concise and practical knowledge of applying dbt to solve typical problems with data pipelines in a modern way: managing run sequence, data quality issues, monitoring, and scheduling transformations with Apache Airflow
  • Hands-on coding experience under the supervision of Data Engineers experienced in maintaining dbt pipelines
  • Tips about real-world applications and best practices

    Training Materials

During the workshop, participants will follow a shared step-by-step guide that presents dbt from the perspective of improving a data team’s workflow. A Jupyter Notebook environment will be supplied for each participant. Pre-generated datasets will be provided so that all participants can work through an example real-life use case.

    Time Box

1-day event

    Agenda

Session #1 - Introduction to Modern Data Stack

  • Introduction: what is the Modern Data Stack?
  • Key components of MDS
  • Core concepts of dbt
    • Data models
    • Seeds, sources
    • Tests
    • Documentation
  • Hands-on exercises
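
As a rough illustration of the data model concept listed above: dbt models reference one another with `ref()`, and the run sequence follows from those references. The sketch below mimics that idea in plain Python using only the standard library. It is not dbt itself; the model names, the SQL text and the helper functions `find_refs` and `run_order` are hypothetical and exist only for this example.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models: name -> SQL text containing dbt-style ref() calls.
MODELS = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "customer_orders": """
        select c.customer_id, count(o.order_id) as order_count
        from {{ ref('stg_customers') }} c
        left join {{ ref('stg_orders') }} o using (customer_id)
        group by 1
    """,
}

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def find_refs(sql):
    """Return the set of model names referenced in a SQL string."""
    return set(REF_PATTERN.findall(sql))


def run_order(models):
    """Order models so that every model comes after the models it references."""
    graph = {name: find_refs(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


order = run_order(MODELS)
print(order)
```

In this toy project, both staging models are placed before `customer_orders`, because `customer_orders` references them.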

Session #2 - Simple end-to-end data pipeline

  • Data discovery (data search, usage statistics, data lineage)
  • Data profiling & exploration
  • Transforming data using SQL with dbt
  • Data consumption with BI tools
  • Hands-on exercises

Session #3 - Data pipeline - scheduling, deployment & advanced features

  • Apache Airflow as a workflow scheduler
  • Data testing & data observability
  • Exploring transformed data with a BI tool
  • Hands-on exercises
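
To give a flavour of the data testing topic above, here is a minimal sketch of two checks in the spirit of dbt's `not_null` and `unique` tests: each check is a query that counts offending records, and the check passes when the count is zero. The example runs against an in-memory SQLite database rather than a cloud warehouse; the `orders` table, its columns and the helper function names are assumptions made for illustration only.

```python
import sqlite3


def count_null_rows(conn, table, column):
    """not_null-style check: count rows where the column is NULL."""
    query = f"select count(*) from {table} where {column} is null"
    return conn.execute(query).fetchone()[0]


def count_duplicate_values(conn, table, column):
    """unique-style check: count distinct values that occur more than once."""
    query = f"""
        select count(*) from (
            select {column} from {table}
            where {column} is not null
            group by {column}
            having count(*) > 1
        )
    """
    return conn.execute(query).fetchone()[0]


# Hypothetical sample data with one NULL customer_id and one duplicated order_id.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer, customer_id integer)")
conn.executemany(
    "insert into orders values (?, ?)",
    [(1, 10), (2, None), (2, 11), (3, 12)],
)

results = {
    "not_null_customer_id": count_null_rows(conn, "orders", "customer_id"),
    "unique_order_id": count_duplicate_values(conn, "orders", "order_id"),
    "not_null_order_id": count_null_rows(conn, "orders", "order_id"),
}

for name, failures in results.items():
    status = "PASS" if failures == 0 else "FAIL"
    print(f"{name}: {status} ({failures} failing)")
```

In a scheduled pipeline, checks like these would typically run after each transformation step, so that a failing check stops bad data from reaching downstream consumers.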

    Participants limit

20 participants

    Session leader:

Data Analyst / Analytics Engineer
GetInData | Part of Xebia
Data Engineer
GetInData | Part of Xebia

BIG DATA TECHNOLOGY
WARSAW SUMMIT 2024

10-11th of April 2024

ORGANIZER

Evention sp. z o.o

Rondo ONZ 1 Str,

Warsaw, Poland

www.evention.pl

CONTACT

Weronika Warpas
