The conference (February 27) and workshops (February 28) will take place at the Hotel Marriott Warszawa,
Al. Jerozolimskie 65/79, 00-697 Warsaw

DESCRIPTION

This is a one-day event dedicated to everyone who wants to understand and get a hands-on taste of working with Big Data and the Hadoop ecosystem. We will be talking about technologies such as Hadoop, Hive and Spark.
During the workshop you’ll act as a Big Data specialist working for a fictional company called StreamRock that creates a music streaming application (similar to Spotify). The main goal of your work is to take advantage of Big Data technologies to analyze data about the users and the songs they play. You will process the data to discover answers to many business questions and to power product features that StreamRock is building. Every exercise will be executed on a remote multi-node Hadoop cluster.
The workshop is highly focused on practical experience. The instructor will share interesting and practical insights gained from several years of working with Big Data technologies.

TARGET AUDIENCE

The workshop is intended for everyone interested in Big Data and analytics: engineers, managers and others.

REQUIREMENTS

All you need to fully participate in the workshop is a laptop with a web browser, a terminal (e.g. PuTTY) and a Wi-Fi connection. No prior knowledge of Big Data technologies is assumed.

PARTICIPANT'S ROI

  • Carefully curated knowledge of the Hadoop Ecosystem
  • Intuition about when and why to use different Big Data tools
  • Hands-on experience with simulated real-life use cases
  • Tips about real-world applications and best practices from experienced professionals

TRAINING MATERIALS

All participants will receive training materials in the form of PDF files containing the theory slides and an exercise manual with a detailed description of all exercises. During the workshop, exercises will be done on a remote Hadoop cluster. If you want to redo the exercises later on your own, you can use a virtual machine (e.g. Hortonworks Sandbox or Cloudera QuickStart, which can be downloaded from each vendor’s site).

TIME BOX

This is a one-day event; there will be coffee breaks and a one-hour lunch break (included in the price).

AGENDA

8.45 - 9.15

Coffee and socializing

9.15 - 10.45

Session #1 - Introduction to Big Data and Apache Hadoop

  • Introduction to the fictional StreamRock company and the opportunities and challenges that Big Data technologies bring to it
  • Introduction to core Hadoop technologies such as HDFS and YARN
  • Hands-on exercise: Accessing a remote multi-node Hadoop cluster (see the sketch below)
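
To give a flavour of that exercise, here is a minimal Python sketch of browsing the cluster’s file system through WebHDFS with the "hdfs" client library; the NameNode address, user and paths are placeholders, and the actual exercise may use different tools (e.g. SSH and the hdfs command line).

    # Minimal sketch: browsing HDFS from Python via WebHDFS (addresses are hypothetical).
    from hdfs import InsecureClient

    # Connect to the NameNode's WebHDFS endpoint of the workshop cluster.
    client = InsecureClient('http://namenode.example.com:9870', user='workshop')

    # List the contents of the home directory and upload a local file.
    print(client.list('/user/workshop'))
    client.upload('/user/workshop/streams.csv', 'streams.csv')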

10.45 - 11.00

Coffee break

11.00 - 12.30

Session #2 - Providing data-driven answers to business questions using a SQL-like solution

  • Introduction to Apache Hive
  • Hands-on exercise: Importing structured data into the cluster using HUE
  • Hands-on exercise: Ad-hoc analysis of the structured data with Hive (see the sketch below)
  • Hands-on exercise: Visualisation of results using HUE
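
As a flavour of the ad-hoc analysis exercise, here is a minimal sketch of a HiveQL query submitted from Python with the PyHive client; the connection details, database, table and column names are made up for illustration.

    # Minimal sketch: an ad-hoc HiveQL query submitted via PyHive (all names are illustrative).
    from pyhive import hive

    conn = hive.Connection(host='hive.example.com', port=10000, username='workshop')
    cursor = conn.cursor()

    # Top 10 most played artists - the kind of business question explored in this session.
    cursor.execute("""
        SELECT artist, COUNT(*) AS plays
        FROM streamrock.song_plays
        GROUP BY artist
        ORDER BY plays DESC
        LIMIT 10
    """)
    for artist, plays in cursor.fetchall():
        print(artist, plays)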

12.30 - 13.30

Lunch

13.30 - 15.30

Session #3 - Implementing scalable ETL processes on the Hadoop cluster

  • Introduction to Apache Spark, Spark SQL and Spark DataFrames
  • Hands-on exercise: Implementation of an ETL job to clean and massage input data using Spark (see the sketch below)
  • Quick explanation of the Avro and Parquet binary data formats
  • Practical tips for implementing ETL processes, such as process scheduling, schema management and integration with existing systems
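
Here is a minimal PySpark sketch of the ETL idea covered in this session, assuming hypothetical input paths and columns: read raw CSV events, clean them, and write the result as partitioned Parquet.

    # Minimal sketch: a Spark ETL job (paths and column names are hypothetical).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("streamrock-etl").getOrCreate()

    raw = spark.read.option("header", "true").csv("/data/raw/song_plays")

    cleaned = (raw
               .dropDuplicates(["event_id"])                       # remove duplicate events
               .filter(F.col("user_id").isNotNull())               # drop malformed rows
               .withColumn("played_at", F.to_timestamp("played_at"))
               .withColumn("play_date", F.to_date("played_at")))   # partition key

    # Store the cleaned data in a columnar format, partitioned by day.
    cleaned.write.mode("overwrite").partitionBy("play_date").parquet("/data/clean/song_plays")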

15.30 - 15.45

Coffee break

15.45 - 16.45

Session #4 - Wrap up

  • Quick overview of other Hadoop Ecosystem tools and technologies
  • Most popular Big Data use-cases

16.45 - 17.00

Coffee break

17.00 - 17.30

Session #5 - Summary and Q&A

  • Big Data Jeopardy game

Keywords: Hadoop Ecosystem, Hive, Spark, Big Data Analytics, Big Data ETL

Workshop speaker:

Piotr Krewski

Big Data Consultant and Co-founder, GetInData

DESCRIPTION

This is a one-day event dedicated to everyone who wants to understand and get a hands-on taste of training models on Big Data sets in Apache Spark. During the workshop, you’ll act as a Data Scientist working for a fictional company. The main goal of your work is to build an ML model that will help the company optimize its operations. Every exercise will be executed on a remote multi-node Hadoop cluster.

TARGET AUDIENCE

The workshop is intended for everyone interested in Big Data analytics.

REQUIREMENTS

All you need to fully participate in the workshop is a laptop with a web browser, a terminal (e.g. PuTTY) and a Wi-Fi connection. Participants should have at least basic knowledge of building ML models in R or Python.

PARTICIPANT'S ROI

  • Knowledge of how to move from a local-machine PoC to highly scalable ML models trained on Big Data sets
  • Hands-on experience with a simulated real-life use case
  • Tips about real-world applications and best practices from experienced professionals

TRAINING MATERIALS

All participants will receive training materials in the form of PDF files containing the theory slides and an exercise manual with a detailed description of all exercises. During the workshop, exercises will be done on a remote Hadoop cluster. If you want to redo the exercises later on your own, you can use a virtual machine (e.g. Hortonworks Sandbox or Cloudera QuickStart, which can be downloaded from each vendor’s site).

TIME BOX

This is a one-day event; there will be coffee breaks and a one-hour lunch break (included in the price).

AGENDA

8.45 - 9.15

Coffee and socializing

9.15 - 10.15

Session #1 - Introduction to Apache Spark and MLLib

10.15 - 10.30

Coffee break

10.30 - 12.30

Session #2 - Data preparation (transformations, feature extraction,…)

12.30 - 13.30

Lunch

13.30 - 15.00

Session #3 - Model training and hyperparameter tuning

15.00 - 15.15

Coffee break

15.15 - 16.45

Session #4 - Embedding everything into a pipeline
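
To give a flavour of Sessions #2–#4, here is a minimal PySpark MLlib sketch that assembles features, tunes a logistic regression model with cross-validation, and wraps the steps in a single Pipeline; the input path, feature columns and model choice are illustrative assumptions, not the workshop’s actual dataset.

    # Minimal sketch: feature preparation, tuning and a Pipeline in Spark MLlib.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    spark = SparkSession.builder.appName("ml-workshop").getOrCreate()
    df = spark.read.parquet("/data/features")          # hypothetical feature table

    assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="label")
    pipeline = Pipeline(stages=[assembler, lr])

    # Grid search over the regularization parameter with 3-fold cross-validation.
    grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1, 1.0]).build()
    cv = CrossValidator(estimator=pipeline,
                        estimatorParamMaps=grid,
                        evaluator=BinaryClassificationEvaluator(labelCol="label"),
                        numFolds=3)

    model = cv.fit(df)                                  # distributed training on the cluster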

16.45 - 17.00

Coffee break

17.00 - 17.30

Session #5 - Summary and Q&A

Keywords: Spark, Big Data Analytics, Machine Learning, Hadoop

Workshop speaker:

Tomasz Żukowski

Data Analyst, GetInData

DESCRIPTION

This one-day workshop teaches participants how to use Kubernetes on AWS and run various Big Data tools on top of it.
During the course we simulate a real-world architecture: a real-time data processing pipeline that reads data from web applications, processes it and stores the results in distributed storage.
The technologies we will be using include Kafka, Flink and S3.
All exercises will be done on remote multi-node Kubernetes clusters.

TARGET AUDIENCE

Engineers who are interested in Big Data and Kubernetes.

REQUIREMENTS

Some experience with Docker and programming.

PARTICIPANT'S ROI

  • Concise and practical knowledge of using Kubernetes
  • Hands-on experience with simulated real-life use cases
  • Tips about real-world applications and best practices from experienced professionals

TRAINING MATERIALS

All participants will receive training materials in the form of PDF files containing the theory slides and an exercise manual with a detailed description of all exercises. During the workshop, exercises will be done on a remote Kubernetes cluster. If you want to redo the exercises later on your own, you can use minikube.

TIME BOX

This is a one-day event; there will be coffee breaks and a one-hour lunch break (included in the price).

AGENDA

8.45 - 9.15

Coffee and socializing

9.15 - 11.15

Session #1 - Introduction to Kubernetes

  • Docker recap
  • Basic Kubernetes concepts and architecture
  • Hands-on exercise: Connecting to a Kubernetes cluster (see the sketch below)
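
As a flavour of that exercise, here is a minimal sketch using the official Python kubernetes client; it assumes a kubeconfig for the workshop cluster is already available locally.

    # Minimal sketch: connecting to a Kubernetes cluster with the Python client.
    from kubernetes import client, config

    config.load_kube_config()            # reads the local ~/.kube/config
    v1 = client.CoreV1Api()

    # List pods in all namespaces to verify that the connection works.
    for pod in v1.list_pod_for_all_namespaces().items:
        print(pod.metadata.namespace, pod.metadata.name, pod.status.phase)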

11.15 - 11.30

Coffee break

11.30 - 12.30

Session #2 - Helm

  • Introduction to Helm
  • Hands-on exercise: Deploying an app with Helm (see the sketch below)
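
To illustrate the Helm workflow, here is a minimal sketch that drives the helm CLI from Python; the repository, chart and release names are only examples, not necessarily the ones used in the workshop.

    # Minimal sketch: installing a chart with Helm, driven from Python.
    import subprocess

    def helm(*args):
        subprocess.run(["helm", *args], check=True)

    # Add a public chart repository and install a release into its own namespace.
    helm("repo", "add", "bitnami", "https://charts.bitnami.com/bitnami")
    helm("repo", "update")
    helm("install", "demo-kafka", "bitnami/kafka",
         "--namespace", "workshop", "--create-namespace")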

12.30 - 13.30

Lunch

13.30 - 15.30

Session #3 - Apache Kafka

  • Running Apache Kafka on Kubernetes
  • Using Kafka Connect to migrate data from Kafka to S3
  • Leveraging Kafka REST in your web application
  • Hands-on exercise: Deploying a data pipeline on Kubernetes (see the sketch below)
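
As a flavour of the pipeline exercise, here is a minimal kafka-python sketch of the web-application side: producing JSON events into a Kafka topic. The broker address and topic name are placeholders, and the actual exercise may use Kafka REST instead.

    # Minimal sketch: producing JSON events into Kafka (broker and topic are hypothetical).
    import json
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="kafka.workshop.svc:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    event = {"user_id": 42, "page": "/home", "action": "click"}
    producer.send("web-events", event)
    producer.flush()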

15.30 - 15.45

Coffee break

15.45 - 16.45

Session #4 - Apache Flink

  • Flink as a stream processing engine
  • Deploying Flink on Kubernetes
  • Hands-on exercise: Real-time data aggregation with Flink (see the sketch below)
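
To give a flavour of the aggregation exercise, here is a minimal PyFlink Table API sketch; a small in-memory table stands in for the Kafka-fed stream used in the workshop, and all names are illustrative.

    # Minimal sketch: a streaming aggregation with the PyFlink Table API.
    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

    # In-memory stand-in for the stream of web events arriving from Kafka.
    events = t_env.from_elements(
        [("alice", 1), ("bob", 1), ("alice", 1)],
        ["user_name", "clicks"],
    )
    t_env.create_temporary_view("events", events)

    # Count events per user - the same shape of aggregation as in the exercise.
    t_env.execute_sql(
        "SELECT user_name, SUM(clicks) AS total_clicks FROM events GROUP BY user_name"
    ).print()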

16.45 - 17.00

Coffee break

17.00 - 17.30

Session #5 - Summary and Q&A

Keywords: Kubernetes, Docker, Helm, Kafka, Flink

Workshop speaker:

Maciej Bryński

Big Data Architect, XCaliber