Big Data on Kubernetes - Big Data Technology Warsaw Summit
Workshop Big Data on Kubernetes - February 23-24, 2021 ONLINE
This two-day workshop teaches participants how to use Kubernetes on AWS and run various Big Data tools on top of it.
During the course we simulate a real-world architecture: a real-time data processing pipeline that reads data from web applications, processes it, and stores the results in distributed storage. The technologies we will use include Kafka, Spark 3.0 and S3.
All exercises will be done on remote Kubernetes clusters.
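To give a feel for the shape of the pipeline described above, here is a minimal sketch in plain Python. It is not workshop material: the event fields, the 60-second window, and the in-memory dictionary standing in for distributed storage are all assumptions made for illustration, and no Kafka, Spark or S3 APIs are used.

```python
from collections import Counter

WINDOW_SECONDS = 60  # hypothetical tumbling-window size


def aggregate_by_window(events, window_seconds=WINDOW_SECONDS):
    """Processing stage: count events per (window start, page) pair."""
    counts = Counter()
    for event in events:
        # Round the timestamp down to the start of its window.
        window_start = event["ts"] - event["ts"] % window_seconds
        counts[(window_start, event["page"])] += 1
    return counts


def store(counts, storage):
    """Storage stage: group counts under one key per window."""
    for (window_start, page), n in sorted(counts.items()):
        storage.setdefault(f"window={window_start}", {})[page] = n
    return storage


# Ingestion stage: hypothetical click events from a web application.
events = [
    {"ts": 5, "page": "/home"},
    {"ts": 42, "page": "/home"},
    {"ts": 61, "page": "/cart"},
    {"ts": 75, "page": "/home"},
]

# A plain dict stands in for the distributed storage layer.
storage = store(aggregate_by_window(events), {})
print(storage)
```

In the workshop setting, the roles of these three stages would be played by Kafka (ingestion), Spark (processing) and S3 (storage).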
Target Audience
Engineers who are interested in Big Data and Kubernetes.
Requirements
Some experience with Docker and programming.
Participant’s ROI
♦ Concise and practical knowledge of using Kubernetes
♦ Hands-on experience with simulated real-life use cases
♦ Tips on real-world applications and best practices from experienced professionals
Training Materials
All participants will get training materials in the form of PDF files: slides covering the theory and an exercise manual with a detailed description of every exercise. During the workshop, exercises will be done on a remote Kubernetes cluster. If you want to redo the exercises later on your own, you can use minikube.
Time Box
This is a two-day event, 4 hours per day, with breaks between sessions.
February 23, 2021, Day 1 (9 am - 1 pm)
Session #1 - Introduction to Kubernetes
♦ Docker recap
♦ Basic Kubernetes concepts and architecture
♦ Hands-on exercise: connecting to a Kubernetes cluster
Session #2 - Helm
♦ Introduction to Helm
♦ Hands-on exercise: deploying a Helm app
February 24, 2021, Day 2 (9 am - 1 pm)
Session #3 - Apache Kafka
♦ Running Apache Kafka on Kubernetes
♦ Using Kafka Connect to migrate data from Kafka to S3
♦ Leveraging Kafka REST in your web application
♦ Hands-on exercise: deploying a data pipeline on Kubernetes
Session #4 - Apache Spark 3.0
♦ Spark as a stream processing engine
♦ Deploying Spark on Kubernetes
♦ Hands-on exercise: real-time data aggregation using Spark Streaming
Keywords: Kubernetes, Docker, Helm, Kafka, Spark
Session leader:
GetInData