Data Streaming: Analyze Your Data in Real-Time With Flink

Conference Center ADN, room 14

In this one-day workshop you will learn how to build streaming analytics applications that continuously deliver instant results on data-intensive streams. You will discover how to configure streaming pipelines, transformations, aggregations and triggers using SQL and Python in a user-friendly development environment built on open-source tools: Apache Flink, Apache Kafka and GetInData OSS projects.

We will also teach you how to incorporate good engineering practices such as version control, testing and monitoring into your applications. We have prepared an environment that covers the analyst's whole workflow, from designing an application to deploying it to production, and that does not require you to be a software engineer. We will work through typical streaming problems you can encounter on the way to delivering fresh and reliable data, and show how modern tooling can help solve them. All hands-on exercises will be carried out in a public cloud environment (e.g. GCP or AWS), with all tools already installed and remotely accessible.

    Target Audience

Data scientists, data engineers and analytics engineers who want to solve complex problems on streaming data using Apache Flink and to deploy their streaming solutions to production.

    Requirements

  • SQL and Python fluency: ability to write data-transforming queries and scripts
  • Basic understanding of ETL processes
  • Basic experience with a command-line interface
  • Laptop with a stable internet connection (participants will connect to a pre-created cloud development environment)

    Participant’s ROI

  • Concise and practical knowledge of applying stream processing, and specifically Apache Flink, to solve business problems.
  • Hands-on coding experience under the supervision of experienced Flink data engineers.
  • Tips about real-world applications and best practices.

    Training Materials

All participants will receive training materials as PDF files: slides covering the theory and an exercise manual with a detailed description of every exercise.

    Time Box

This is a one-day event with breaks between sessions.

    Agenda

Session #1 - Introduction to Apache Kafka

Session #2 - Introduction to Apache Flink

  • Key concepts behind stream processing
  • Building a streaming pipeline with Flink SQL
  • Hands-on exercises

Session #3 - Timely Stream Processing

  • Flink’s notions of time, windowing and aggregations
  • Joining multiple data streams or data sets
  • Hands-on exercises

Session #4 - Pattern Matching

  • Matching patterns with the MATCH_RECOGNIZE clause
  • Hands-on exercises

Session #5 - Productization

  • Deploying Flink jobs to production
  • Hands-on exercises

Session #6 - Advanced Flink concepts (theory only)

  • High-level Python Flink Table API
  • Low-level Python Flink DataStream API
  • Temporal Table Functions

    Session leader:

Data Engineer - ML / Stream processing
GetInData | Part of Xebia
Big Data Analytics & Data Science
GetInData | Part of Xebia

BIG DATA TECHNOLOGY
WARSAW SUMMIT

ORGANIZER

Evention sp. z o.o.

Rondo ONZ 1 Str,

Warsaw, Poland

www.evention.pl

CONTACT

Weronika Warpas