Real-Time Stream Processing - Big Data Technology Warsaw Summit
Real-Time Stream Processing
In this one-day workshop you will learn how to process unbounded streams of data in real time using popular open-source frameworks. We focus mostly on Apache Flink and Apache Kafka, two of the most promising open-source stream processing technologies, both increasingly used in production.
During the course we simulate a real-world end-to-end scenario: processing, in real time, the logs generated by users interacting with a mobile application. The technologies that we use include Kafka and Flink. All exercises will be done either in a local Docker environment or within your IDE.
Target Audience
Data engineers who are interested in leveraging large-scale and distributed tools to process streams of data in real-time.
Requirements
Some experience coding in Java or Scala and basic familiarity with Big Data tools (HDFS, YARN).
Participant’s ROI
- Concise and practical knowledge of applying stream processing to solve business problems.
- Hands-on coding experience under the supervision of experienced Flink engineers.
- Tips about real-world applications and best practices.
Training Materials
All participants will receive training materials as PDF files: slides covering the theory and an exercise manual with detailed descriptions of all exercises. During the workshop, the exercises can be done either in a local Docker environment or within your IDE.
Time Box
This is a one-day event with breaks between sessions.
Agenda
Session #1 - Introduction to Apache Kafka
Session #2 - Introduction to Apache Flink
- Key concepts behind stream processing
- Building a streaming pipeline with Flink DataStream API
- Hands-on exercises
Session #3 - Timely Stream Processing
- Flink’s notions of time, windowing and aggregations
- Hands-on exercises
Session #4 - Connecting to the external world
- Integration with Apache Kafka
- Hands-on exercises
Session #5 - Stateful Stream Processing
- Fault tolerance
- Advanced time handling
- Stateful operations
- Hands-on exercises
Session #6 - Summary and comparison with other stream processing engines
Keywords: Kafka, Flink, Real Time Processing, Low Latency Stream Processing
Session leader:
GetInData