Online Webinars - Big Data Technology Warsaw Summit
Big Data Tech presents FREE ONLINE WEBINARS
Join our online webinars, during which we will focus on Big Data, real-time streaming, Apache Kafka, Flink, SQL and many related topics!
Through these online meetings, we will prepare for this year's conference and gain some practical knowledge. This is a chance to meet online! Take part in our competition during the meetings and win an invitation to the Big Data Technology Summit 2022, plus access to the recordings from the last edition! Information, agenda and registration are below.
Real-time streaming webinar
Description
During this meeting we will talk about two topics: "Apache Kafka and Flink: Stateful Streaming Data Pipelines made easy with SQL" and "Flink's Table & DataStream API: A Perfect Symbiosis".
Take part in our competition and WIN an invitation to the Big Data Technology Summit 2022 and access to the recordings from the last edition!
Speakers
Agenda
14.50 – 15.00 Introduction and presentation of experts
GetInData | Part of Xebia
15.00 – 15.30 Apache Kafka and Flink: Stateful Streaming Data Pipelines made easy with SQL
A stateful streaming data pipeline needs both a solid base and an engine to drive the data. Apache Kafka is an excellent choice for storing and transmitting high-throughput, low-latency messages. Apache Flink adds the cherry on top with a distributed stateful compute engine available in a variety of languages, including SQL. In this session we'll explore how Apache Flink operates in conjunction with Apache Kafka to build stateful streaming data pipelines, and the problems we can solve with this combination. We will explore Flink's SQL client, showing how to define connections and transformations with the best-known and most beloved language in the data industry. This session is aimed at data professionals who want to reduce the barrier to streaming data pipelines by making them configurable as a set of simple SQL commands.
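To give a taste of what the session covers, here is a minimal sketch in Flink SQL of a Kafka-backed table plus a stateful windowed aggregation, the kind of statements you could run in Flink's SQL client. The topic, column names and connector options are illustrative assumptions, not taken from the talk.

-- Illustrative only: a table backed by a Kafka topic.
CREATE TABLE orders (
    order_id STRING,
    amount   DOUBLE,
    ts       TIMESTAMP(3),
    WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kafka',
    'topic' = 'orders',
    'properties.bootstrap.servers' = 'localhost:9092',
    'format' = 'json',
    'scan.startup.mode' = 'earliest-offset'
);

-- A stateful transformation: total order value per one-minute window.
SELECT window_start, window_end, SUM(amount) AS total_amount
FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR(ts), INTERVAL '1' MINUTES))
GROUP BY window_start, window_end;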
Aiven
15.30 – 15.40 Q&A Session
15.40 – 16.10 Flink's Table & DataStream API: A Perfect Symbiosis
The Table API is not a new kid on the block. But the community has worked hard on reshaping its future. Today, it is one of the core abstractions in Flink next to the DataStream API. The Table API can deal with bounded and unbounded streams in a unified and highly optimized ecosystem inspired by databases and SQL. Various connectors and catalogs integrate with the outside world. But this doesn't mean that the DataStream API will become obsolete any time soon. In this talk, we would like to demo what the Table API is capable of today. We present how the API solves different scenarios: as a batch processor, a changelog processor, or a streaming ETL tool with many built-in functions and operators for deduplicating, joining, and aggregating data. We discuss the main differences between the Table and DataStream APIs and elaborate on their preferred usage. We also show hybrid pipelines in which both APIs interact in symbiosis and contribute their unique strengths.
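To make one of those built-in patterns concrete, here is a small sketch of deduplication expressed in Flink SQL (the SQL surface of the Table API): keep only the latest row per key and emit the result as a changelog. The table and column names are invented for the example; the Table-DataStream interplay itself is best seen in the speakers' demo.

-- Illustrative only: keep the most recent row per user_id.
SELECT user_id, address, updated_at
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY updated_at DESC) AS row_num
    FROM user_updates
)
WHERE row_num = 1;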
Ververica
16.10 – 16.20 Q&A Session
16.20 – 16.30 Announcement of the competition winners!
AI, ML and Data Science webinar
Description
During this meeting we will talk about two topics: "Tweet-Topic Classification: The Real-Life Perspective" and "No-code ML predictions in real-time with ksqlDB and MLeap".
Take part in our competition and WIN an invitation to the Big Data Technology Summit 2022 and access to the recordings from the last edition!
Speakers
Agenda
14.50 – 15.00 Introduction and presentation of experts
GetInData | Part of Xebia
15.00 – 15.30 Tweet-Topic Classification: The Real-Life Perspective
In this talk I'm going to present an ML-driven approach to topic classification that we use at Twitter. Although I'll share some details regarding the modelling, my focus will be on the challenges of deploying and maintaining a model in production. To begin with, I'll talk about the project background and the requirements that have shaped our design choices, among them the legacy systems in place and the characteristics of the data. I'll cover the BERT-based model architecture that we've designed, along with the evaluation methodology. I'll describe how our classifier is deployed and how it interplays with other systems. The problem of data gathering and bias will be of particular importance. I'll describe the problems with recall that we've faced and the active learning-inspired steps that we've taken to overcome them. Finally, I'll describe the continuous training and deployment pipeline that we've designed to ensure model freshness. To sum up, I'll present the whole process that was needed to build an ML system and ensure it performs well in a production environment.
15.30 – 15.40 Q&A Session
15.40 – 16.10 No-code ML predictions in real-time with ksqlDB and MLeap.
Training and publishing your ML model into your company's model registry usually does not end the journey. While it's straightforward to generate a batch of predictions in a scheduled job or to serve an API that returns the model's response, both methods require some development time to deliver the business value provided by the model to its destination.
But there is a third way - if your company already uses Kafka as a message hub, you can aggregate events into ML features using ksqlDB in real time and then get the model's response, even for models trained in scikit-learn or Spark MLlib. With a set of additional tools, like the MLflow Model Registry and Kafka Connect, you can deliver the insights produced by the model with no development, except a few lines of SQL and a couple of JSON files.
Join the webinar to walk through the entire process with me! No coding skills required 😉
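For a rough idea of what those few lines of SQL can look like, here is a sketch of ksqlDB statements that turn raw events into per-user features in real time. The stream, topic and column names are invented for illustration, and the model-serving step with MLeap is not shown here.

-- Illustrative only: declare a stream over an existing Kafka topic...
CREATE STREAM clicks (user_id VARCHAR, url VARCHAR)
    WITH (KAFKA_TOPIC = 'clicks', VALUE_FORMAT = 'JSON');

-- ...and aggregate the raw events into windowed per-user features.
CREATE TABLE user_features AS
    SELECT user_id,
           COUNT(*) AS clicks_10m,
           COUNT_DISTINCT(url) AS distinct_urls_10m
    FROM clicks
    WINDOW TUMBLING (SIZE 10 MINUTES)
    GROUP BY user_id
    EMIT CHANGES;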
GetInData | Part of Xebia
16.10 – 16.20 Q&A Session
16.20 – 16.30 Announcement of the competition winners!