We invite you to the workshops!
February 26th in “Centrum Zielna”, Zielna 37, Warsaw.
February 28th in Conference Center “Golden Floor”, Sienna 39, Warsaw (2nd floor).

*Due to high interest in the workshops, we are adding a new date – February 26th at a second location.
See the location tab for details.

DESCRIPTION

This is a one-day event dedicated to everyone who wants to understand and get some hands-on experience of working with Big Data and the Hadoop ecosystem. We will be talking about technologies such as Hadoop, Hive and Spark.
During the workshop you’ll act as a Big Data specialist working for a fictional company called StreamRock that builds a music streaming application (similar to Spotify). The main goal of your work is to use Big Data technologies to analyze data about the users and the songs they play. You will process the data to discover answers to many business questions and power product features that StreamRock is building. Every exercise will be executed on a remote multi-node Hadoop cluster.
The workshop is highly focused on practical experience. The instructor will share interesting and practical insights gained over several years of working with Big Data technologies.

TARGET AUDIENCE

The workshop is dedicated to everyone who is interested in Big Data and analytics – engineers, managers and others.
We will work in a group of no more than 20 people.

REQUIREMENTS

All you need to fully participate in the workshop is a laptop with a web browser, a terminal (e.g. PuTTY) and a Wi-Fi connection. No prior knowledge of Big Data technologies is assumed.

PARTICIPANT'S ROI

  • Carefully curated knowledge of the Hadoop ecosystem
  • Intuition about when and why to use different Big Data tools
  • Hands-on experience with simulated real-life use cases
  • Tips about real-world applications and best practices from experienced professionals

TRAINING MATERIALS

All participants will receive training materials as PDF files: slides covering the theory and an exercise manual with detailed descriptions of all exercises. During the workshop, exercises will be done on a remote Hadoop cluster. If you want to redo the exercises later on your own, you can use a virtual machine (e.g. Hortonworks Sandbox or Cloudera QuickStart, which can be downloaded from each vendor’s site).

TIME BOX

This is a one-day event; there will be coffee breaks and a one-hour lunch break (included in the price).

AGENDA

8.45 - 9.15

Coffee and socializing

9.15 - 10.45

Session #1 - Introduction to Big Data and Apache Hadoop

  • Introduction to the StreamRock company and the opportunities and challenges that Big Data technologies bring it
  • Introduction to core Hadoop technologies such as HDFS or YARN
  • Hands-on exercise: Accessing a remote multi-node Hadoop cluster

10.45 - 11.00

Coffee break

11.00 - 12.30

Session #2 - Providing data-driven answers to business questions using a SQL-like solution

  • Introduction to Apache Hive
  • Hands-on exercise: Importing structured data into the cluster using HUE
  • Hands-on exercise: Ad-hoc analysis of the structured data with Hive
  • Hands-on exercise: The visualisation of results using HUE
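Hive itself runs HiveQL over data stored in HDFS, but the query shape is plain SQL. As a taste of the kind of ad-hoc analysis the exercises above involve, here is a minimal sketch run against SQLite so it works on any machine; the StreamRock table and column names are made up for the example:

```python
# Illustrative only: the kind of ad-hoc SQL the Hive exercises express,
# run here against in-memory SQLite (table/columns are invented).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plays (user_id TEXT, song TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO plays VALUES (?, ?, ?)",
    [
        ("u1", "Song A", "PL"),
        ("u2", "Song A", "PL"),
        ("u3", "Song B", "DE"),
        ("u1", "Song A", "PL"),
    ],
)

# Business question: which songs are most played, and where?
top_songs = conn.execute(
    """
    SELECT country, song, COUNT(*) AS plays
    FROM plays
    GROUP BY country, song
    ORDER BY plays DESC
    """
).fetchall()
print(top_songs)  # [('PL', 'Song A', 3), ('DE', 'Song B', 1)]
```

The same GROUP BY/ORDER BY query, pointed at a Hive table, is what the ad-hoc analysis exercise builds up to.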

12.30 - 13.30

Lunch

13.30 - 15.30

Session #3 - Implementing scalable ETL processes on the Hadoop cluster

  • Introduction to Apache Spark, Spark SQL and Spark DataFrames.
  • Hands-on exercise: Implementation of the ETL job to clean and massage input data using Spark.
  • Quick explanation of the Avro and Parquet binary data formats.
  • Practical tips for implementing ETL processes, such as scheduling, schema management and integration with existing systems.
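As a rough illustration of what "clean and massage" means here, the row-level cleaning logic looks like this in plain Python; in the workshop the same idea is expressed as a Spark job over DataFrames, and the record layout below is invented for the example:

```python
# Illustrative only: the kind of cleaning an ETL job performs on raw
# play events (in the workshop this runs as a Spark job; the CSV layout
# below is made up).
from datetime import datetime, timezone

raw_events = [
    "u1,song-42,1554120000",
    "u2,song-13,not-a-timestamp",   # malformed timestamp: dropped
    "u1,,1554120060",               # missing song id: dropped
    "u3,song-42,1554120120",
]

def clean(line):
    """Parse one CSV line; return a dict, or None if the row is unusable."""
    parts = line.split(",")
    if len(parts) != 3 or not all(parts[:2]):
        return None
    user, song, ts = parts
    try:
        played_at = datetime.fromtimestamp(int(ts), tz=timezone.utc)
    except ValueError:
        return None
    return {"user": user, "song": song, "played_at": played_at.isoformat()}

events = [e for e in map(clean, raw_events) if e is not None]
print(len(events))  # 2 clean rows survive
```

In Spark the filter-and-transform would be a `map`/`filter` over a DataFrame or RDD, with the cleaned output typically written out in Avro or Parquet, as covered in this session.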

15.30 - 15.45

Coffee break

15.45 - 16.45

Session #4 - Wrap up

  • Quick overview of other Hadoop Ecosystem tools and technologies
  • Most popular Big Data use-cases

16.45 - 17.00

Coffee break

17.00 - 17.30

Session #5 - Summary and Q&A

  • Big Data Jeopardy game

Keywords: Hadoop Ecosystem, Hive, Spark, Big Data Analytics, Big Data ETL

Workshop speaker:

Piotr Krewski

Big Data Consultant and Co-founder, GetInData

DESCRIPTION

This is a one-day event dedicated to everyone who wants to understand and get some hands-on experience of training models on Big Data sets in Apache Spark. During the workshop, you’ll act as a Data Scientist working for a fictional company. The main goal of your work is to build an ML model that will help the company optimize its operations. Every exercise will be executed on a remote multi-node Hadoop cluster.

TARGET AUDIENCE

The workshop is dedicated to everyone who is interested in Big Data analytics.
We will work in a group of no more than 20 people.

REQUIREMENTS

All you need to fully participate in the workshop is a laptop with a web browser, a terminal (e.g. PuTTY) and a Wi-Fi connection. Participants should have at least basic knowledge of building ML models in R or Python.

PARTICIPANT'S ROI

  • Knowledge of how to move from a local-machine PoC to highly scalable ML models trained on Big Data sets
  • Hands-on experience with a simulated real-life use case
  • Tips about real-world applications and best practices from experienced professionals

TRAINING MATERIALS

All participants will receive training materials as PDF files: slides covering the theory and an exercise manual with detailed descriptions of all exercises. During the workshop, exercises will be done on a remote Hadoop cluster. If you want to redo the exercises later on your own, you can use a virtual machine (e.g. Hortonworks Sandbox or Cloudera QuickStart, which can be downloaded from each vendor’s site).

TIME BOX

This is a one-day event; there will be coffee breaks and a one-hour lunch break (included in the price).

AGENDA

8.45 - 9.15

Coffee and socializing

9.15 - 10.15

Session #1 - Introduction to Apache Spark and MLLib

10.15 - 10.30

Coffee break

10.30 - 12.30

Session #2 - Data preparation (transformations, feature extraction,…)

12.30 - 13.30

Lunch

13.30 - 15.00

Session #3 - Model training and hyperparameter tuning

15.00 - 15.15

Coffee break

15.15 - 16.45

Session #4 - Embedding everything into a pipeline

16.45 - 17.00

Coffee break

17.00 - 17.30

Session #5 - Summary and Q&A

Keywords: Spark, Big Data Analytics, Machine Learning, Hadoop

Workshop speaker:

Tomasz Żukowski

Data Analyst, GetInData

DESCRIPTION

This one-day workshop teaches participants how to use Kubernetes on AWS and run different Big Data tools on top of it.
During the course we simulate a real-world architecture – a real-time data processing pipeline: reading data from web applications, processing it and storing the results in distributed storage.
The technologies that we will be using include Kafka, Flink and S3.
All exercises will be done on remote multi-node Kubernetes clusters.

TARGET AUDIENCE

Engineers who are interested in Big Data and Kubernetes.
We will work in a group of no more than 20 people.

REQUIREMENTS

Some experience with Docker and programming.

PLEASE INSTALL IntelliJ IDEA and JDK 8.

PARTICIPANT'S ROI

  • Concise and practical knowledge of using Kubernetes
  • Hands-on experience with simulated real-life use cases
  • Tips about real-world applications and best practices from experienced professionals

TRAINING MATERIALS

All participants will receive training materials as PDF files: slides covering the theory and an exercise manual with detailed descriptions of all exercises. During the workshop, exercises will be done on a remote Kubernetes cluster. If you want to redo the exercises later on your own, you can use Minikube.

TIME BOX

This is a one-day event; there will be coffee breaks and a one-hour lunch break (included in the price).

AGENDA

8.45 - 9.15

Coffee and socializing

9.15 - 11.15

Session #1 - Introduction to Kubernetes

  • Docker recap
  • Basic Kubernetes concepts and architecture
  • Hands-on exercise: connecting to a Kubernetes cluster

11.15 - 11.30

Coffee break

11.30 - 12.30

Session #2 - Helm

  • Introduction to Helm
  • Hands-on exercise: deploying an app with Helm

12.30 - 13.30

Lunch

13.30 - 15.30

Session #3 - Apache Kafka

  • Running Apache Kafka on Kubernetes
  • Using Kafka Connect to migrate data from Kafka to S3
  • Leveraging the Kafka REST Proxy in your web application
  • Hands-on exercise: deploying a data pipeline on Kubernetes

15.30 - 15.45

Coffee break

15.45 - 16.45

Session #4 - Apache Flink

  • Flink as a stream processing engine
  • Deploying Flink on Kubernetes
  • Hands-on exercise: Real-time data aggregation on Flink
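As a rough sketch of what the real-time aggregation exercise computes, here is tumbling-window counting in plain Python; Flink runs the same idea continuously over a Kafka stream, and the event shape below is invented for the example:

```python
# Illustrative only: counting events per key in one-minute tumbling
# windows, the aggregation pattern the Flink exercise implements over
# a live stream (here: a plain list of made-up events).
from collections import Counter

WINDOW_SECONDS = 60

events = [  # (epoch seconds, page)
    (0,  "/home"),
    (10, "/home"),
    (70, "/home"),   # falls into the next window
    (75, "/cart"),
]

windows = Counter()
for ts, page in events:
    # Align each event to the start of its window.
    window_start = (ts // WINDOW_SECONDS) * WINDOW_SECONDS
    windows[(window_start, page)] += 1

print(sorted(windows.items()))
# [((0, '/home'), 2), ((60, '/cart'), 1), ((60, '/home'), 1)]
```

In Flink the equivalent is a keyed stream with a tumbling event-time window; the framework additionally handles out-of-order events, state and fault tolerance, which a batch sketch like this glosses over.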

16.45 - 17.00

Coffee break

17.00 - 17.30

Session #5 - Summary and Q&A

Keywords: Kubernetes, Docker, Helm, Kafka, Flink

Workshop speaker:

Maciej Bryński

Big Data Architect, XCaliber

DESCRIPTION

This is a one-day event dedicated to everyone who wants to understand and get some hands-on experience of data management and analytics on Google Cloud Platform. We will be talking about technologies such as BigQuery, Dataflow and Data Studio.

During the workshop you’ll wear many hats:

  1. A Data Analyst hat, where you will use BigQuery to analyze a dataset and prepare a report.
  2. A Data Engineer hat, where you will use Dataflow to process and transform a dataset.
  3. A Data Scientist hat, where you will use BigQuery ML to build a model, predict prices and visualize your findings using Data Studio.

Every exercise will be executed on Google Cloud Platform.

The workshop is highly focused on practical experience. The instructor will share interesting and practical insights gained while working with Google’s technologies.

TARGET AUDIENCE

The workshop is dedicated to everyone who is interested in using Google Cloud Platform’s Big Data stack. Keep in mind that some coding will be required, so this workshop is recommended for data analysts, data scientists and engineers.

REQUIREMENTS

Please register on Qwiklabs (https://google.qwiklabs.com). You will be given access codes to specially crafted labs. All you need to fully participate in the workshop is a laptop with a web browser and a Wi-Fi connection.

No prior knowledge of Big Data technologies is assumed and no additional software is required.

Basic SQL and basic coding skills in Python/Java are required.

PARTICIPANT'S ROI

  • Knowledge of how to build data pipelines on Google Cloud Platform
  • Some knowledge of the Google Cloud ecosystem and intuition about when and why to use different Big Data tools
  • Hands-on experience with simulated real-life use cases
  • Tips about real-world applications and best practices from experienced professionals

TRAINING MATERIALS

All participants will get training materials in the form of Qwiklabs, which contain instructions, steps and links to source code. The materials can be accessed after the training. If any steps change, the source code will be shared in a Git repository. During the workshop, exercises will be done on Google Cloud Platform.

TIME BOX

This is a one-day event; there will be coffee breaks and a one-hour lunch break (included in the price).

AGENDA

8.45 - 9.15

Coffee and socializing

9.15 - 10.45

Session #1 - Introduction to Google Cloud Platform

  • History of Google, Hadoop and Google Cloud
  • High level overview of GCP stack

10.45 - 11.00

Coffee break

11.00 - 12.30

Session #2 - BigQuery - Cloud data warehouse for analytics

  • Introduction to BigQuery
  • Hands-on exercise: Importing structured data into BigQuery
  • Hands-on exercise: Robust SQL analysis with BigQuery

12.30 - 13.30

Lunch

13.30 - 15.30

Session #3 - Dataflow - Simplified stream and batch processing

  • Introduction to Apache Beam and Dataflow
  • Hands-on exercise: Implementation of the Dataflow ETL job to clean and transform input data
  • Hands-on exercise: Implementation of a streaming Dataflow pipeline that cleans and transforms messages from Pub/Sub

15.30 - 15.45

Coffee break

15.45 - 16.45

Session #4 - BigQuery ML - building ML models in a fraction of the time

  • Introduction to BigQuery ML
  • Hands-on exercise: Implementation of BigQuery ML model and visualization of predictions using Data Studio
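BigQuery ML trains models directly in SQL (e.g. `CREATE MODEL … OPTIONS (model_type = 'linear_reg')`). As a back-of-envelope illustration of what such a linear regression fits, here is the closed-form least-squares line for a single feature in plain Python; the dataset, column names and numbers are made up:

```python
# In BigQuery ML the price model would be trained roughly like:
#   CREATE MODEL demo.price_model
#   OPTIONS (model_type = 'linear_reg') AS
#   SELECT size_sqm AS feature, price AS label FROM demo.listings;
# (dataset and columns are hypothetical). Below: the closed-form
# least-squares fit for one feature, which is what linear_reg learns.
xs = [30.0, 50.0, 70.0, 90.0]       # size in square metres
ys = [150.0, 250.0, 350.0, 450.0]   # price in thousands

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    / sum((x - mean_x) ** 2 for x in xs)
)
intercept = mean_y - slope * mean_x

print(slope, intercept)        # 5.0 0.0
print(slope * 60 + intercept)  # predicted price for 60 sqm: 300.0
```

In the exercise itself, predictions come from `ML.PREDICT` in SQL and the results are then charted in Data Studio.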

16.45 - 17.00

Coffee break

17.00 - 17.30

Session #5 - Summary and Q&A

  • Recommendations, further reading, next steps
  • Q&A

Keywords: Google Cloud Platform, BigQuery, Dataflow, BigQuery ML, Data Studio

Workshop speaker:

Radek Stankiewicz

Strategic Cloud Engineer, Google

Mateusz Pytel

Google Certified Professional - Cloud Architect, GetInData