Online Webinars - Big Data Technology Warsaw Summit
Big Data Tech presents FREE ONLINE WEBINARS
Join our online webinars, during which we will focus on Big Data, real-time streaming, Apache Kafka, Flink, SQL and many related topics!
Through these online meetings, we will prepare for this year's conference and gain some practical knowledge. This is a chance to meet online! Take part in our competition during the meetings and win an invitation to the Big Data Technology Summit 2022, plus access to the recordings from the last edition! Information, agenda and registration are below.
Real-time streaming webinar
Description
During this meeting we will talk about two topics: "Apache Kafka and Flink: Stateful Streaming Data Pipelines made easy with SQL" and "Flink's Table & DataStream API: A Perfect Symbiosis".
Take part in our competition and WIN an invitation to the Big Data Technology Summit 2022 and access to the recordings from the last edition!
Speakers
Agenda
14.50 – 15.00 Introduction and presentation of experts
GetInData | Part of Xebia
15.00 – 15.30 Apache Kafka and Flink: Stateful Streaming Data Pipelines made easy with SQL
A stateful streaming data pipeline needs both a solid base and an engine to drive the data. Apache Kafka is an excellent choice for storing and transmitting high-throughput, low-latency messages. Apache Flink adds the cherry on top with a distributed stateful compute engine available in a variety of languages, including SQL. In this session we'll explore how Apache Flink operates in conjunction with Apache Kafka to build stateful streaming data pipelines, and the problems we can solve with this combination. We will explore Flink's SQL client, showing how to define connections and transformations with the best-known and most beloved language in the data industry. This session is aimed at data professionals who want to reduce the barrier to streaming data pipelines by making them configurable as a set of simple SQL commands.
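To give a taste of what the session covers, here is a minimal sketch in Flink SQL of a Kafka-backed table plus a stateful windowed aggregation, the kind of statements you could run in Flink's SQL client. The topic, column names and connector options are illustrative assumptions, not taken from the talk.

-- Illustrative only: a table backed by a Kafka topic.
CREATE TABLE orders (
    order_id STRING,
    amount   DOUBLE,
    ts       TIMESTAMP(3),
    WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kafka',
    'topic' = 'orders',
    'properties.bootstrap.servers' = 'localhost:9092',
    'format' = 'json',
    'scan.startup.mode' = 'earliest-offset'
);

-- A stateful transformation: total order value per one-minute window.
SELECT window_start, window_end, SUM(amount) AS total_amount
FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR(ts), INTERVAL '1' MINUTES))
GROUP BY window_start, window_end;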
Aiven
15.30 – 15.40 Q&A Session
15.40 – 16.10 Flink's Table & DataStream API: A Perfect Symbiosis
The Table API is not a new kid on the block. But the community has worked hard on reshaping its future. Today, it is one of the core abstractions in Flink next to the DataStream API. The Table API can deal with bounded and unbounded streams in a unified and highly optimized ecosystem inspired by databases and SQL. Various connectors and catalogs integrate with the outside world. But this doesn't mean that the DataStream API will become obsolete any time soon. In this talk, we would like to demo what the Table API is capable of today. We present how the API solves different scenarios: as a batch processor, a changelog processor, or a streaming ETL tool with many built-in functions and operators for deduplicating, joining, and aggregating data. We discuss the main differences between the Table and DataStream APIs and elaborate on their preferred usage. We also show hybrid pipelines in which both APIs interact in symbiosis and contribute their unique strengths.
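To make one of those built-in patterns concrete, here is a small sketch of deduplication expressed in Flink SQL (the SQL surface of the Table API): keep only the latest row per key and emit the result as a changelog. The table and column names are invented for the example; the Table-DataStream interplay itself is best seen in the speakers' demo.

-- Illustrative only: keep the most recent row per user_id.
SELECT user_id, address, updated_at
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY updated_at DESC) AS row_num
    FROM user_updates
)
WHERE row_num = 1;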
Ververica
16.10 – 16.20 Q&A Session
16.20 – 16.30 Announcement of the competition winners!
AI, ML and Data Science webinar
Description
During this meeting we will talk about two topics: "Tweet-Topic Classification: The Real-Life Perspective" and "No-code ML predictions in real-time with ksqlDB and MLeap".
Take part in our competition and WIN an invitation to the Big Data Technology Summit 2022 and access to the recordings from the last edition!
Speakers
Agenda
14.50 – 15.00 Introduction and presentation of experts
GetInData | Part of Xebia
15.00 – 15.30 Tweet-Topic Classification: The Real-Life Perspective
In this talk I'm going to present an ML-driven approach to topic classification that we use at Twitter. Although I'll share some details regarding the modelling, my focus will be on the challenges of deploying and maintaining a model in production. To begin with, I'll talk about the project background and the requirements that have shaped our design choices, among them the legacy systems in place and the characteristics of the data. I'll cover the BERT-based model architecture that we've designed, along with the evaluation methodology. I'll describe how our classifier is deployed and how it interplays with other systems. The problem of data gathering and bias will be of particular importance. I'll describe the problems with recall that we've faced and the active learning-inspired steps that we've taken to overcome them. Finally, I'll describe the continuous training and deployment pipeline that we've designed to ensure model freshness. To sum up, I'll present the whole process that was needed to build an ML system and ensure it performs well in a production environment.
15.30 – 15.40 Q&A Session
15.40 – 16.10 No-code ML predictions in real-time with ksqlDB and MLeap.
Training and publishing your ML model into your company's model registry usually does not end the journey. While it's straightforward to generate a batch of predictions in a scheduled job or to serve an API that returns the model's response, both methods require some development time to deliver the business value provided by the model to its destination.
But there is a third way - if your company already uses Kafka as a message hub, you can aggregate events into ML features using ksqlDB in real time and then get the model's response, even for models trained in scikit-learn or Spark MLlib. With a set of additional tools, like the MLflow Model Registry and Kafka Connect, you can deliver the insights produced by the model with no development, except a few lines of SQL and a couple of JSON files.
Join the webinar to walk through the entire process with me! No coding skills required 😉
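For a rough idea of what those few lines of SQL can look like, here is a sketch of ksqlDB statements that turn raw events into per-user features in real time. The stream, topic and column names are invented for illustration, and the model-serving step with MLeap is not shown here.

-- Illustrative only: declare a stream over an existing Kafka topic...
CREATE STREAM clicks (user_id VARCHAR, url VARCHAR)
    WITH (KAFKA_TOPIC = 'clicks', VALUE_FORMAT = 'JSON');

-- ...and aggregate the raw events into windowed per-user features.
CREATE TABLE user_features AS
    SELECT user_id,
           COUNT(*) AS clicks_10m,
           COUNT_DISTINCT(url) AS distinct_urls_10m
    FROM clicks
    WINDOW TUMBLING (SIZE 10 MINUTES)
    GROUP BY user_id
    EMIT CHANGES;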
GetInData | Part of Xebia
16.10 – 16.20 Q&A Session
16.20 – 16.30 Announcement of the competition winners!