Data Streaming: Analyze Your Data in Real-Time With Flink - Big Data Technology Warsaw Summit
In this one-day workshop you will learn how to build streaming analytics applications that deliver instant, continuous results on data-intensive streams. You will discover how to configure streaming pipelines, transformations, aggregations and triggers using SQL and Python in a user-friendly development environment built on open-source tools: Apache Flink, Apache Kafka and GetInData OSS projects.
We will also teach you how to incorporate good engineering practices such as version control, testing and monitoring into your applications. We have prepared an environment that covers the analyst's workflow, from designing an application to deploying it to production, and that does not require you to be a software engineer. We will work through typical streaming problems you may encounter on the way to delivering fresh and reliable data, and show how modern tooling can help solve them. All hands-on exercises will be carried out in a public cloud environment (e.g. GCP or AWS) with all tools already installed and remotely accessible.
Target Audience
Data scientists, data engineers and analytics engineers who are interested in solving complex problems on streaming data using Apache Flink and in deploying their streaming solutions to production.
Requirements
- SQL and Python fluency: ability to write data-transforming queries and scripts
- Basic understanding of ETL processes
- Basic experience with a command-line interface
- Laptop with a stable internet connection (participants will connect to a pre-created cloud development environment)
Participant’s ROI
- Concise and practical knowledge of applying stream processing, and specifically Apache Flink, to solve business problems.
- Hands-on coding experience under the supervision of experienced Flink data engineers.
- Tips about real-world applications and best practices.
Training Materials
All participants will receive training materials as PDF files: slides covering the theory and an exercise manual with detailed descriptions of all exercises.
Time Box
This is a one-day event with breaks between sessions.
Agenda
Session #1 - Introduction to Apache Kafka
Session #2 - Introduction to Apache Flink
- Key concepts behind stream processing
- Building a streaming pipeline with Flink SQL
- Hands-on exercises
Session #3 - Timely Stream Processing
- Flink’s notions of time, windowing and aggregations
- Joining multiple data streams or data sets
- Hands-on exercises
Session #4 - Pattern Matching
- Matching patterns with the MATCH_RECOGNIZE clause
- Hands-on exercises
Session #5 - Productization
- Deploying Flink jobs to production
- Hands-on exercises
Session #6 - Flink advanced concepts (theory only)
- High-level Python Flink Table API
- Low-level Python Flink DataStream API
- Temporal Table Functions
Session leader:
GetInData | Part of Xebia