This year we have added half-day tutorials*, held on the 26th and 28th of February.

You choose your preferred date when you register.

Time:

1. February 26th, 2 pm – 6 pm

2. February 28th, 9 am – 1 pm

Place:

Golden Floor Conference & Workshops Center, Aleje Jerozolimskie 123A, 02-017 Warsaw.

*We will work in a group of no more than 20 people.

Detect, capture, and ingest changed data from RDBMS to Hadoop

DESCRIPTION

This is a “how-to” tutorial in which we will configure a change data capture (CDC) pipeline. It addresses the common problem of tracking continuously changing data in an RDBMS using a Hadoop environment. The result is reliable data that is optimised for querying and further analytics.

TARGET AUDIENCE

Data engineers who are interested in the concept of change data capture.

WHAT YOU WILL LEARN

  • How to configure Debezium with Kafka Connect so that changes made to the source table are captured and written to Apache Kafka;
  • How to read the captured data from an Apache Kafka topic and save it to HDFS using Apache NiFi;
  • How to set up a job that updates the content of the table in Apache Hive.
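
As a rough illustration of the first point, the sketch below builds the request that registers a Debezium connector through the Kafka Connect REST API (`POST /connectors`). It is not the workshop's configuration: the source database is assumed to be MySQL, the property names follow Debezium 2.x (older releases use different names for some of them), and every host name, credential, topic and table name is hypothetical.

```python
import json
import urllib.request


def build_connector_request(connect_url, name, config):
    """Build (but do not send) the POST request that registers a connector."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url.rstrip('/')}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# All values below are hypothetical placeholders, assuming a MySQL source
# and Debezium 2.x property names.
debezium_config = {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql.example.internal",
    "database.port": "3306",
    "database.user": "cdc_user",
    "database.password": "change-me",
    "database.server.id": "184054",
    "topic.prefix": "shop",
    "table.include.list": "inventory.customers",
    "schema.history.internal.kafka.bootstrap.servers": "kafka.example.internal:9092",
    "schema.history.internal.kafka.topic": "schema-changes.shop",
}

request = build_connector_request(
    "http://localhost:8083", "customers-cdc", debezium_config
)
# To actually register the connector against a running Kafka Connect worker:
# urllib.request.urlopen(request)
```

The request is only constructed here, so the snippet runs without a Kafka Connect worker; sending it is left as a commented-out line.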

 

During the event we will discuss the whole process and what can happen when conditions in our environment change.

TIME BOX

The tutorial will last for 4 full hours. There will be coffee breaks during the training.

AGENDA

Session #1 – Track changes on source table – Kafka, Kafka Connect, Debezium

  • What are we doing here?
  • Configure Kafka Connect with Debezium
  • Troubleshooting
  • Discuss possible improvements

Session #2 – Write data to HDFS and update Hive table – Kafka, NiFi, HDFS, Hive

  • Download data from Kafka to HDFS using NiFi
  • Update table data in Hive
  • Troubleshooting
  • Discuss possible improvements
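
To give a feel for what "updating the table" means in a CDC pipeline, here is a minimal sketch in plain Python rather than Hive. It assumes events shaped like Debezium's change event envelope (`op`, `before`, `after`) and a hypothetical primary key column `id`; the function name and sample rows are made up for illustration, and the job built in the session may work quite differently.

```python
def apply_change_events(table, events, key="id"):
    """Apply Debezium-style change events, in order, to a dict keyed by primary key."""
    for event in events:
        op = event["op"]
        if op in ("c", "r", "u"):  # create, snapshot read, update
            row = event["after"]
            table[row[key]] = row
        elif op == "d":  # delete: only the "before" image is populated
            table.pop(event["before"][key], None)
        else:
            raise ValueError(f"unsupported op: {op!r}")
    return table


# Hypothetical sample events for a table with columns (id, email).
events = [
    {"op": "c", "before": None, "after": {"id": 1, "email": "a@example.com"}},
    {"op": "c", "before": None, "after": {"id": 2, "email": "b@example.com"}},
    {
        "op": "u",
        "before": {"id": 1, "email": "a@example.com"},
        "after": {"id": 1, "email": "a2@example.com"},
    },
    {"op": "d", "before": {"id": 2, "email": "b@example.com"}, "after": None},
]

current = apply_change_events({}, events)
```

After the four events are replayed, `current` holds a single row: id 1 with the updated email. Event order matters, which is one reason troubleshooting and possible improvements get their own agenda items.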

Tutorial conducted by:

Bartosz Kotwica

Data Engineer, GetInData

From event to insights

DESCRIPTION

During this meeting we will walk you through the common data flow patterns adopted by companies moving to Google Cloud Platform. In the workshop you will learn:

* How to publish events from your app ecosystem and collect log data from multiple sources.
* How to use a message queue and stream processing to collect, validate and land the data in a robust data warehouse.
* How to combine incoming data with operational data sources and use SQL & ML to gain insights, which we will then present in the form of a fancy, easy-to-share visual dashboard.
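
The "collect, validate and land" step can be pictured with a small sketch in plain Python, independent of any particular Google Cloud service. The required fields, the function names and the dead-letter structure are all assumptions made for this example and are not taken from the workshop material.

```python
import json

# Hypothetical event schema: field name -> expected Python type.
REQUIRED_FIELDS = {"event_id": str, "event_type": str, "timestamp": str}


def validate_event(raw):
    """Return (event, None) if the message is valid, else (None, reason)."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(event, dict):
        return None, "event is not a JSON object"
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(event.get(field), expected):
            return None, f"missing or mistyped field: {field}"
    return event, None


def route(raw_messages):
    """Split messages into rows to land and rejected messages to inspect later."""
    landed, dead_letter = [], []
    for raw in raw_messages:
        event, reason = validate_event(raw)
        if event is not None:
            landed.append(event)
        else:
            dead_letter.append({"raw": raw, "reason": reason})
    return landed, dead_letter


messages = [
    '{"event_id": "e1", "event_type": "page_view", "timestamp": "2024-01-01T00:00:00Z"}',
    '{"event_id": "e2", "event_type": "purchase"}',
    "not json",
]
landed, dead_letter = route(messages)
```

Keeping rejected messages together with a reason, instead of dropping them, makes it possible to fix and replay them later without blocking the valid events.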

Additionally, we will discuss cost, maintenance and scalability, and how architecture decisions could affect them as your business grows.

TIME BOX

The tutorial will last for 4 full hours. Of course there will be coffee breaks during the training.

AGENDA

Tutorial conducted by: 

Mateusz Pytel

Google Certified Professional - Cloud Architect, GetInData
