Workshops: 26th or 28th of February – date of your choice upon registration

9 am – 5 pm

We will work in a group of no more than 20 people.

Developing a production-ready Spark application

DESCRIPTION

During this workshop we will create a fully functioning, production-ready Spark application using day-to-day tools such as Scala, sbt and IntelliJ.

The workshop’s target audience is semi-professionals with at least a little programming background. We will provide the necessary project setup, an introduction to the Scala language, and the tools required for building the application. Previous Scala knowledge is not mandatory; general IT skills are enough.

AGENDA

Session #1 Introduction to Scala and Spark. Presentation of the workshop’s goals.

– brief introduction to Scala programming,

– discuss the workshop’s project structure,

– present an end-to-end setup for testing processing logic

 

Session #2 Write application code to process JSON data from HDFS to Hive with Spark

– implement input data processor and formatting,

– apply custom transformations to the data,

– tune processing logic and performance
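
The processing logic built in this session (read JSON, transform it, write to Hive) is easiest to keep production-ready and tunable when the row-level transformation is a pure function. A minimal Scala sketch of that idea – the record shape, field names and conversion rate are hypothetical, and in the real job the rows would come from `spark.read.json(...)` on HDFS and be written to Hive via `saveAsTable`:

```scala
// Hypothetical record shapes for the workshop's input JSON; in the real job
// these rows would come from spark.read.json("hdfs://...") as a Dataset.
case class RawEvent(userId: String, amount: Double, currency: String)
case class CleanEvent(userId: String, amountPln: Double)

object EventTransform {
  // Keeping the transformation a pure function makes it trivially unit-testable
  // before wiring it into Spark with dataset.flatMap(transform).
  def transform(e: RawEvent): Option[CleanEvent] =
    if (e.userId.trim.isEmpty || e.amount < 0) None // drop malformed rows
    else {
      val rate = if (e.currency == "EUR") 4.3 else 1.0 // hypothetical fixed rate
      Some(CleanEvent(e.userId.trim, e.amount * rate))
    }
}
```

Separating the pure logic from the I/O also pays off in the next session, since the same function can be tested without a cluster.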

 

Session #3 Implement testing logic to validate processing

– run and test application code,

– exercise testing skills
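
Because the processing logic can be expressed as pure functions, the validation written in this session can start without a cluster: run the logic over a small in-memory sample and assert on the result. A minimal sketch using a plain `assert` instead of a test framework (the word-count logic is a hypothetical example, not necessarily the workshop's actual exercise):

```scala
object WordCountLogic {
  // The kind of core logic worth extracting from a Spark job so it can be
  // exercised directly on in-memory data before running on the cluster.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
}

object WordCountSpec {
  // A cluster-free test: fixed input, expected output, plain assertion.
  def run(): Unit = {
    val result = WordCountLogic.wordCount(Seq("Spark loves spark", "test"))
    assert(result == Map("spark" -> 2, "loves" -> 1, "test" -> 1))
  }
}
```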

 

Session #4 Wrap up

– quick overview,

– discuss deployment and maintenance of Spark jobs

TIME BOX

This is a one-day event; there will be coffee breaks and a one-hour lunch break (included in the price).

We will work in a group of no more than 20 people.

Workshop trainers:

Paweł Kubit

Data Engineer, GetInData

Patrycjusz Sienkiewicz

Data Engineer, GetInData

Real-Time stream processing

DESCRIPTION

In this one-day workshop you will learn how to process unbounded streams of data in real time using popular open-source frameworks. We focus mostly on Apache Flink and Apache Kafka – two of the most promising open-source stream processing technologies, and ones that are more and more frequently used in production.

 

During the course we simulate a real-world, end-to-end scenario – processing, in real time, logs generated by users interacting with a mobile application. The technologies we use include Kafka, Flink, HDFS and YARN. All exercises will be done on remote multi-node clusters.

TARGET AUDIENCE

Data engineers who are interested in leveraging large-scale, distributed tools to process streams of data in real time.

REQUIREMENTS

Some experience coding in Java or Scala and basic familiarity with Big Data tools (HDFS, YARN).

PARTICIPANT'S ROI

  • Concise and practical knowledge of applying stream processing to solve business problems.
  • Hands-on coding experience under the supervision of experienced Flink engineers.
  • Tips about real-world applications and best practices.

TRAINING MATERIALS

All participants will receive training materials as PDF files: slides with the theory and an exercise manual with a detailed description of all exercises. During the workshop, exercises will be done on a remote Hadoop cluster. If you want to redo the exercises later on your own, you can use a virtual machine (e.g. Hortonworks Sandbox or Cloudera QuickStart, which can be downloaded from each vendor’s site).

TIME BOX

The workshop will last for 8 full hours, so you should reserve a full day. Of course, there will be coffee and lunch breaks during the training.

We will work in a group of no more than 20 people.

AGENDA

8.45 - 9.15

Coffee and socializing

9.15 - 10.15

Session #1 - Introduction to Apache Kafka + hands-on exercises
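
A core idea from the Kafka introduction is that records with the same key always land in the same partition, which preserves per-key ordering. A simplified, dependency-free sketch of keyed partitioning (Kafka's real default partitioner uses murmur2 hashing; plain `hashCode` is used here only for illustration):

```scala
object KeyPartitioner {
  // Same key -> same partition, so all events for one user stay ordered
  // relative to each other within that partition.
  def partitionFor(key: String, numPartitions: Int): Int =
    math.abs(key.hashCode % numPartitions)
}
```

Consumers in one consumer group then split the partitions among themselves, which is how Kafka scales out reads.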

10.15 - 10.30

Coffee break

10.30 - 12.30

Session #2 - Apache Flink

  • Introduction and key concepts
  • Basic Flink API
  • Hands-on exercises

12.30 - 13.30

Lunch

13.30 - 15.00

Session #3 - Flink cont.

  • Time & Windows
  • Integration with Kafka
  • Hands-on exercises
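
Tumbling windows, the simplest windowing covered in this session, assign each event to a fixed, non-overlapping time bucket based on its timestamp. A minimal, dependency-free sketch of that assignment plus a per-window count (event times are hypothetical epoch milliseconds; in Flink this bookkeeping is done by the window operator, not by hand):

```scala
object TumblingWindows {
  // A tumbling window of size `sizeMs` covers [start, start + sizeMs);
  // an event's window start is its timestamp rounded down to the window size.
  def windowStart(timestampMs: Long, sizeMs: Long): Long =
    timestampMs - (timestampMs % sizeMs)

  // Count events per window - the classic "events per minute" aggregation.
  def countPerWindow(timestampsMs: Seq[Long], sizeMs: Long): Map[Long, Int] =
    timestampsMs.groupBy(windowStart(_, sizeMs)).map { case (w, ts) => (w, ts.size) }
}
```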

15.00 - 15.15

Coffee break

15.15 - 16.45

Session #4 - Flink cont.

  • Stateful operations
  • Best practices
  • Daemons and cluster infrastructure
  • Hands-on exercises
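
Stateful operations such as a keyed running count are central to this session. Conceptually, the engine keeps per-key state that is updated as each event arrives; a dependency-free sketch of that behavior (in Flink the Map below would be managed, fault-tolerant keyed state rather than a plain value):

```scala
object KeyedCount {
  // Fold over a stream of keyed events, emitting the running count per key
  // after each event - the behavior of a keyed stateful counter.
  def runningCounts(events: Seq[String]): Seq[(String, Int)] =
    events.scanLeft(Map.empty[String, Int]) { (state, key) =>
      state.updated(key, state.getOrElse(key, 0) + 1)  // update per-key state
    }.tail.zip(events).map { case (state, key) => (key, state(key)) }
}
```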

16.45 - 17.00

Coffee break

17.00 - 17.30

Session #5 - Summary and comparison with other stream processing engines


Keywords: Kafka, Flink, Real Time Processing, Low Latency Stream Processing

Workshop trainer:

Krzysztof Zarzycki

Big Data Architect, CTO and Co-founder, GetInData

Big Data on Kubernetes

DESCRIPTION

This one-day workshop teaches participants how to use Kubernetes on AWS and run different Big Data tools on top of it.

During the course we simulate a real-world architecture – a real-time data processing pipeline: reading data from web applications, processing it and storing the results in distributed storage.
The technologies we will be using include Kafka, Spark and S3.

All exercises will be done on remote Kubernetes clusters.

TARGET AUDIENCE

Engineers who are interested in Big Data and Kubernetes.

REQUIREMENTS

Some experience with Docker and programming.

PARTICIPANT'S ROI

  • Concise and practical knowledge of using Kubernetes.
  • Hands-on experience with simulated real-life use cases.
  • Tips about real-world applications and best practices from experienced professionals.

TRAINING MATERIALS

All participants will receive training materials as PDF files: slides with the theory and an exercise manual with a detailed description of all exercises. During the workshop, exercises will be done on a remote Kubernetes cluster. If you want to redo the exercises later on your own, you can use minikube.

TIME BOX

This is a one-day event; there will be coffee breaks and a one-hour lunch break (included in the price).

We will work in a group of no more than 20 people.

AGENDA

Session 1 – Introduction to Kubernetes

  • Docker recap
  • Basic Kubernetes concepts and architecture
  • Hands-on exercise: connecting to Kubernetes cluster

 

Session 2 – Helm

  • Introduction to Helm
  • Hands-on exercise: deploying Helm app

 

Session 3 – Apache Kafka

  • Running Apache Kafka on Kubernetes
  • Using Kafka Connect to migrate data from Kafka to S3
  • Leveraging Kafka REST in your web application
  • Hands-on exercise: deploying data pipeline on Kubernetes

 

Session 4 – Apache Spark

  • Spark as streaming processing engine
  • Deploying Spark on Kubernetes
  • Hands-on exercise: Real-time data aggregation using Spark Streaming

Keywords: Kubernetes, Docker, Helm, Kafka, Spark

Workshop trainer:

Maciej Bryński

Big Data Architect