All three workshops will take place on the 23rd and 24th of February
- 4 hours on each of these two days (8 hours in total per workshop)
BIG DATA ON KUBERNETES
How do you use Kubernetes on AWS and run various Big Data tools on top of it? We simulate a real-world architecture: a real-time data processing pipeline that reads data from web applications, processes it and stores the results in distributed storage. The technologies we will use include Kafka, Spark 3.0 and S3. All exercises will be done on remote Kubernetes clusters.
REAL-TIME STREAM PROCESSING
How do you process unbounded streams of data in real time using popular open-source frameworks? We focus mostly on Apache Flink and Apache Kafka. We simulate a real-world end-to-end scenario: processing, in real time, the logs generated by users interacting with a mobile application. The technologies we will use include Kafka, Flink, HDFS and YARN. All exercises will be done on remote multi-node clusters.
FOUNDATIONS OF DATA ENGINEERING WITH GOOGLE CLOUD
While getting familiar with services such as Google Cloud Storage, BigQuery and Dataflow, we will walk through the common data flow patterns adopted by companies migrating to the cloud. The workshop consists of a series of exercises that will give you hands-on experience with Google Cloud Platform, as well as an opportunity to discuss best practices, security, scalability and cost management.