The conference (February 9) and workshops* (February 8) will take place at the Sheraton Warsaw Hotel.

*Because of the high interest in the “Introduction to Big Data Technology” and “Data Science” workshops, we are launching an additional date – February 7, 2017.

At the end of the workshops (February 8), we would like to invite all attendees to an informal evening meeting at Browarmia. The party starts at 6:00 PM.

AGENDA

8.15 - 9.00

Registration and coffee

9.00 - 9.15

Conference opening

Przemysław Gamdzyk

CEO & Meeting Designer, Evention

Adam Kawa

Data Engineer and Founder, GetInData

9.15 - 9.45

TBD

Presentation subject will be published soon
Google representative (TBD)

9.45 - 10.15

Machine Learning the Product

A/B testing is a popular method for learning about your product. However, with traditional A/B testing techniques we can only learn from an A/B test in a rather superficial way – we can measure the size of an effect, but often don’t know the cause of that effect. In this presentation, I will introduce a different, machine-learning approach used at Spotify for analyzing A/B tests, aiming to reveal the cause of an effect and maximize learning.
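
To make the limitation concrete, here is a hedged sketch of a traditional A/B analysis – it quantifies the effect but says nothing about its cause (all numbers hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical per-user engagement metrics from two A/B test groups.
rng = np.random.default_rng(42)
control = rng.normal(loc=10.0, scale=3.0, size=5000)
treatment = rng.normal(loc=10.4, scale=3.0, size=5000)

# Traditional analysis: effect size and significance -- the "what".
lift = treatment.mean() - control.mean()
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"lift={lift:.3f}, p={p_value:.4f}")

# What this does NOT tell us is the "why": which user segments or
# behaviours drive the lift -- the question a machine-learning analysis
# (e.g. modelling the treatment effect against user features) addresses.
```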

Boxun Zhang

Data Scientist, Spotify

10.15 - 10.45

Managing the Margins: Big Data case study - Prescriptive Analysis for Semiconductor Manufacturing

The semiconductor industry is the backbone of the digital age. Sector innovations drive the ability to do more on ever smaller machines, but perhaps equally important is the ability to optimize the manufacturing processes. For example, in the digital printing of semiconductor components, a one-in-a-billion droplet failure rate may sound acceptable. It is less so when you consider that up to 50 million droplets can be pushed per second, which translates into an unacceptable defect rate of one every 20 seconds. Pre-emptive analytics on streaming sensor and image data play a key role in finding indications of where and when defects are looming. This presentation will focus on an industry use case for combining SAS and open source analytics to tackle these essential big data challenges, and will also provide some insights on applications in other sectors.
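
The quoted defect rate follows from simple arithmetic, sketched below as a sanity check:

```python
droplets_per_second = 50_000_000   # up to 50 million droplets per second
failure_rate = 1e-9                # one-in-a-billion droplet failures

defects_per_second = droplets_per_second * failure_rate   # 0.05
seconds_per_defect = 1 / defects_per_second               # 20.0
print(f"one defect every {seconds_per_defect:.0f} seconds")
```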

Sascha Schubert

SAS Institute

10.45 - 11.15

Coffee break

Simultaneous sessions

Operations & Deployment

This track is dedicated to system administrators and people with DevOps skills who are interested in technologies and best practices for planning, installing, managing and securing their Big Data infrastructure in enterprise environments – both on-premises and in the cloud.

Data Application Development

This track is the place for developers to learn about tools, techniques and innovative solutions to collect and process large volumes of data. It covers topics like data ingestion, ETL, process scheduling, metadata and schema management, distributed datastores and more.

Analytics & Data Science

This track includes real case studies demonstrating how Big Data is used to address a wide range of business problems. You will find here talks about large-scale Machine Learning, A/B tests, data visualization, as well as various analyses that enable making data-driven decisions and feed the personalized features of data-driven products.

Real-Time Processing

This track covers technologies, strategies and use-cases for real-time data ingestion and deriving real-time actionable insights from the flow of events coming from sensors, devices, users, and front-end systems.

11.15 - 11.45

Creating Redundancy for Big Hadoop Clusters is Hard

Criteo had a Hadoop cluster with 39 PB of raw storage, 13,404 CPUs, 105 TB of RAM, 40 TB of data imported per day and over 100,000 jobs per day. This cluster was critical in both storage and compute, but had no backups. After much effort to increase our redundancy, we now have two clusters that, combined, have more than 2,000 nodes, 130 PB, two different versions of Hadoop and 200,000 jobs per day – but these clusters do not yet provide a redundant solution to all our storage and compute needs. This talk discusses the choices we made and the issues we solved in creating a 1,200-node cluster with new hardware in a new data centre. Some of the challenges involved in running two different clusters in parallel will be presented. We will also analyse what went right (and wrong) in our attempt to achieve redundancy, and our plans to improve our capacity to handle the loss of a data centre.
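
For readers unfamiliar with the tooling involved: bulk replication between two Hadoop clusters is commonly done with DistCp, which runs the copy as a distributed MapReduce job. A hedged sketch (cluster names and paths hypothetical; not necessarily how Criteo does it):

```python
import subprocess

# Hypothetical cross-cluster copy with DistCp; -update copies only
# changed files, -p preserves file attributes. Paths are made up.
subprocess.run([
    "hadoop", "distcp", "-update", "-p",
    "hdfs://cluster-old/data/events/2017-02-08",
    "hdfs://cluster-new/data/events/2017-02-08",
], check=True)
```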
Stuart Pook

Senior DevOps Engineer, Criteo

11.15 - 11.45

DataOps or how I learned to love production

A plethora of data processing tools, most of them open source, is available to us. But who actually runs data pipelines? What about dynamically allocating resources to data pipeline components? In this talk we will discuss options to operate elastic data pipelines with modern, cloud-native platforms such as DC/OS with Apache Mesos, Kubernetes and Docker Swarm. We will review good practices, from containerizing workloads to making things resilient, and show elastic data pipelines in action.
Michael Hausenblas

Developer Advocate, Mesosphere

11.15 - 11.45

Alchemists 2.0: Turning data into gold

How to bring money to the table with Data Science. Practical examples of Data Science “in action” from recent projects. When to use Linear Regression vs XGBoost in business applications. What is the monetary impact of using Data Science?
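
A hedged sketch of the kind of comparison the talk refers to, using scikit-learn and XGBoost on synthetic data (all parameters hypothetical):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

# Hypothetical dataset standing in for a business problem.
X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)

# Interpretable baseline: linear regression.
linear_r2 = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2").mean()

# Gradient-boosted trees: usually stronger on non-linear interactions,
# at the cost of interpretability.
xgb_r2 = cross_val_score(XGBRegressor(n_estimators=200, max_depth=4),
                         X, y, cv=5, scoring="r2").mean()

print(f"linear R^2: {linear_r2:.3f}, XGBoost R^2: {xgb_r2:.3f}")
```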
Paweł Godula

Expert, BCG Gamma

11.15 - 11.45

Real-Time Data Processing at RTB House – Architecture & Lessons Learned

Our platform, which purchases and runs advertisements in the Real-Time Bidding model, processes 250K bid requests and generates 20K events every second, which gives 3 TB of data every day. For machine learning, system monitoring and financial settlements, we need to filter, store, aggregate and join these events together. As a result, processed events and aggregated statistics are available in Hadoop, Google BigQuery and Postgres. The most demanding are the business requirements: events that should be joined together can appear up to 30 days apart, we are not allowed to create any duplicates, we have to minimize possible data losses, and there cannot be any differences between generated data outputs. We have designed and implemented a solution which has reduced the delay of availability of this data from 1 day to 15 seconds.

We will present: 1. our first approach to the problem (end-of-day batch jobs) and the final solution (real-time stream processing), 2. a detailed description of the current architecture, 3. how we tested the new data flow before it was deployed and how it is being monitored now, 4. our one-click deployment process, 5. the decisions we made, with their advantages and disadvantages, and our future plans to improve the current solution.

We would like to share our experience with scaling the solution over clusters of computers in several data centers. We will focus on the current architecture, but also on testing and monitoring issues and our deployment process. Finally, we would like to provide an overview of the projects involved, such as Kafka, MirrorMaker, Storm, Aerospike, Flume, Docker etc. We will describe what we have gained from these open-source projects and some problems we have come across.
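
To make the join-and-deduplicate requirement concrete, here is a deliberately simplified, single-machine sketch of the logic (event fields hypothetical; the production system implements this at scale with Kafka and Storm):

```python
from datetime import timedelta

JOIN_WINDOW = timedelta(days=30)   # matching events may arrive up to 30 days apart

seen_ids = set()     # deduplication by unique event id
pending_bids = {}    # bid_id -> (timestamp, bid event)

def process(event):
    """Join click events to their bid events, without duplicates."""
    if event["id"] in seen_ids:          # drop duplicate deliveries
        return None
    seen_ids.add(event["id"])

    if event["type"] == "bid":
        pending_bids[event["bid_id"]] = (event["ts"], event)
        return None

    if event["type"] == "click":
        match = pending_bids.get(event["bid_id"])
        if match and event["ts"] - match[0] <= JOIN_WINDOW:
            return {"bid": match[1], "click": event}   # joined record
    return None
```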
Bartosz Łoś

Software Developer, RTB House

11.45 - 11.50

Technical break

11.50 - 12.20

Scalable Analytics for Microservices Architecture

Avito is the third biggest classifieds site in the world, after Craigslist and China’s 58.com. Avito nowadays is not a single website, but dozens of specialized vertical sites and applications.

All sites and applications are being transformed into a microservices architecture, spawning hundreds of microservices. It is critical to implement a common BI infrastructure, able to collect, process, combine and analyse data from all microservices, regardless of new ones being launched and old ones being switched off or changed. The analytics of Avito is based on the HP Vertica MPP database, a highly normalized data lake and an asynchronous event bus. All those tools give Avito the ability to use all types of Machine Learning and Reporting tools to manage its sites, applications and microservices.
Nikolay Golov

Chief Data Warehousing Architect, Avito

11.50 - 12.20

Ab initio magnus data

Presentation description will be published soon.

Firat Tekiner

Data Scientist and Big Data Architect, AB Initio

11.50 - 12.20

SAS Viya – the fundamentals of analytics architecture of the future

Since the inception of modern analytical platforms, companies have been trying to out-smart each other to perform analytics faster than ever.

SAS Institute has been leading the analytics industry for over 40 years in the area of advanced analytics, with innovations including MVA, In-Database and In-Memory computing. SAS has recently released its 3rd-generation In-Memory platform, SAS Viya, designed from the ground up for scalable analytics to solve the problems of the future, powered by the CAS (Cloud Analytic Services) server. This session will give you an overview of the new and exciting features of SAS Viya and CAS, and how it differs from some of the other in-memory platforms on the market. We will discuss scalability, memory management, Hadoop infrastructure integration, and integration with open source tools like Python, R and others.
Muhammad Asif Abbasi

Principal Business Solutions Manager, SAS Institute

11.50 - 12.20

Use-cases where Flink is better than technologies like Hive, Spark, Spark Streaming and why

While there are many popular open-source technologies for processing large datasets, Apache Flink is the one that excites me the most. Not because it provides sub-second latency at scale, exactly-once semantics or a single solution for batch and stream processing, but because… it lets you accurately process your data with little effort – something that is hard, or usually ignored, with Spark, Storm, Hive or Scalding. In this talk I will explain the unique capabilities, ideas and design patterns of Flink & Kafka for accurate and simplified stream processing in batch and real-time.
Adam Kawa

Data Engineer and Founder, GetInData

12.20 - 12.25

Technical break

12.25 - 12.55

Spotify’s Event Delivery

Spotify is currently one of the most popular music streaming services in the world, with over 100 million monthly active users. Over the last few years we have had phenomenal growth, which has now pushed our backend infrastructure out of our data centers and into the cloud. Earlier this year we announced that we are transitioning all of our backend into the Google Cloud Platform (GCP).

Our event delivery system is a key component in our data infrastructure that delivers billions of events per day, with predictable latency and a well-defined interface for our developers. This data is used to produce Discover Weekly, Spotify Party, Year in Music and many more Spotify features. This talk will focus on the evolution of the event delivery service and the lessons learned, and present the design of our new system based on Google Cloud Platform technologies.
Nelson Arapé

Backend Developer, Spotify

12.25 - 12.55

Enabling 'Log Everything' at Skyscanner

Skyscanner is a leading global travel search site offering a comprehensive and free flight search service as well as online comparisons for hotels and car hire. We believe that data should be at the heart of every decision at Skyscanner, so it’s important that our engineers have the tools to seamlessly log the data that will help them with those decisions. In this talk, we discuss the approach we’ve taken to enable this and reflect on some of the challenges and lessons learnt. Technologies used include Kafka, Logstash, Elasticsearch, Secor, AWS (S3, Lambda), Samza, Protocol Buffers and others.
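
A minimal sketch of what a “log everything” entry point can look like, assuming the kafka-python client (topic name and event schema hypothetical; the real pipeline also involves Protocol Buffers, Secor and S3):

```python
import json
from datetime import datetime, timezone
from kafka import KafkaProducer

# Hypothetical producer: engineers emit structured events to a Kafka
# topic, from where the pipeline fans out to S3, Elasticsearch, etc.
producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda e: json.dumps(e).encode("utf-8"),
)

producer.send("app-events", {
    "event": "search_performed",
    "ts": datetime.now(timezone.utc).isoformat(),
    "origin": "EDI", "destination": "WAW",
})
producer.flush()
```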
Arthur Vivian

Software Engineer, Skyscanner

Robin Tweedie

Senior Software Engineer, Skyscanner

12.25 - 12.55

Meta-Experimentation at Etsy

Experimentation abounds, but how do we test our tests? I’ll share some ways we at Etsy proved our experimentation methods broken, and the approach we took to fixing them. I’ll discuss multiple ways of running A/A tests (as opposed to A/B tests), and a statistical method called bootstrapping, which we used to remedy our experiment analysis.
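
A hedged sketch of the bootstrapping idea the abstract mentions – resampling an observed metric to estimate its sampling distribution empirically (data hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-visitor conversions from one experiment bucket.
observed = rng.binomial(1, 0.031, size=20_000)

# Bootstrap: resample with replacement many times and look at the
# spread of the recomputed metric.
boot_means = np.array([
    rng.choice(observed, size=observed.size, replace=True).mean()
    for _ in range(5_000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={observed.mean():.4f}, 95% bootstrap CI=({lo:.4f}, {hi:.4f})")

# In an A/A test, a well-calibrated method should flag a "significant"
# difference about 5% of the time at alpha = 0.05 -- much more than
# that signals a broken analysis.
```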
Emily Sommer

Software Engineer, Etsy

12.25 - 12.55

Stream Analytics with SQL on Apache Flink

SQL is undoubtedly the most widely used language for data analytics, for many good reasons. It is declarative, many database systems and query processors feature advanced query optimizers and highly efficient execution engines, and, last but not least, it is the standard that everybody knows and uses. With stream processing technology becoming mainstream, a question arises: “Why isn’t SQL widely supported by open source stream processors?”. One answer is that SQL’s semantics and syntax were not designed with the characteristics of streaming data in mind. Consequently, systems that want to provide support for SQL on data streams have to overcome a conceptual gap. One approach is to support standard SQL, which is known by users and tools but comes at the cost of cumbersome workarounds for many common streaming computations. Other approaches are to design custom SQL-inspired stream analytics languages, or to extend SQL with streaming-specific keywords. While such solutions tend to result in more intuitive syntax, they suffer from not being established standards and thereby exclude many users and tools.

Apache Flink is a distributed stream processing system with very good support for streaming analytics. Flink features two relational APIs, the Table API and SQL. The Table API is a language-integrated relational API with stream-specific features. Flink’s SQL interface implements the plain SQL standard. Both APIs are semantically compatible and share the same optimization and execution path, based on Apache Calcite.

In this talk we present the future of Apache Flink’s relational APIs for stream analytics, discuss their conceptual model, and showcase their usage. The central concept of these APIs is the dynamic table. We explain how streams are converted into dynamic tables and vice versa without losing information, due to the stream-table duality. Relational queries on dynamic tables behave similarly to materialized view definitions and produce new dynamic tables. We show how dynamic tables are converted back into changelog streams, or are written as materialized views to external systems such as Apache Kafka or Apache Cassandra and updated in place with low latency. We conclude our talk by demonstrating the power and expressiveness of Flink’s relational APIs, presenting how common stream analytics use cases can be realized.
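
As a conceptual illustration (not Flink’s API), the stream-table duality can be sketched in a few lines: fold a changelog stream into a materialized table, and let a continuous query emit its own changelog as its result changes.

```python
# Toy illustration of the stream <-> table duality: a changelog stream
# of (op, key, value) records is folded into a table, and a continuous
# "SELECT value, COUNT(*) GROUP BY value" query emits its own changelog.
table = {}    # materialized state of the input stream: key -> value
counts = {}   # materialized query result: value -> count

def apply_change(op, key, value=None):
    """Apply one changelog record; return updates to the query result."""
    updates = []
    if op == "upsert":
        old = table.get(key)
        if old == value:
            return updates                      # no-op
        if old is not None:                     # retract the old value
            counts[old] -= 1
            updates.append(("upsert", old, counts[old]))
        table[key] = value
        counts[value] = counts.get(value, 0) + 1
        updates.append(("upsert", value, counts[value]))
    elif op == "delete" and key in table:
        old = table.pop(key)
        counts[old] -= 1
        updates.append(("upsert", old, counts[old]))
    return updates                              # result changelog records

print(apply_change("upsert", "user1", "rock"))  # [('upsert', 'rock', 1)]
print(apply_change("upsert", "user1", "jazz"))  # retraction + new count
```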
Fabian Hueske

Software Engineer, data Artisans

12.55 - 13.50

Lunch

Operations & Deployment

Data Application Development

Analytics & Data Science

Real-Time Processing

13.50 - 14.20

Key challenges in building large distributed full-text search systems based on Apache Solr and Elasticsearch

Large distributed search platforms are built on the two most popular search engines: Apache Solr and Elasticsearch. For a long time now, these two technologies have been able to do much more than full-text search. They are scalable and highly productive NoSQL (document-oriented) databases, able to store massive amounts of data and serve vast numbers of requests. This is why we can discuss Solr and Elasticsearch in terms of big data projects. Let’s discuss the challenges connected with indexing and searching data, configuring clusters, scaling them and distributing them between data centers. The presentation will give an overview of available features and issues, but it won’t be another comparison of Solr and Elasticsearch. Both technologies are well-proven software, and instead of favoring one of them I would like to present all their possibilities.
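
For orientation, a minimal full-text query against Elasticsearch using its official Python client (host, index and field names hypothetical; an equivalent Solr request differs mainly in syntax):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # hypothetical single node

# Full-text search with a filter -- the kind of query both engines
# serve at scale when used as document-oriented NoSQL stores.
result = es.search(index="products", body={
    "query": {
        "bool": {
            "must": {"match": {"title": "wireless headphones"}},
            "filter": {"range": {"price": {"lte": 100}}},
        }
    }
})
print(result["hits"]["total"])
```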
Tomasz Sobczak

Senior Consultant, Findwise

13.50 - 14.20

One Jupyter to rule them all

If you tell your colleagues you develop Hadoop applications, they probably consider you a geek who knows Java, MapReduce, Scala and a lot of APIs for submitting, scheduling and monitoring jobs – and who, of course, is a Kerberos expert. This might have been quite true a few years ago, but nowadays the Big Data ecosystem contains many tools that enable Big Data for everyone, including non-technical people. At Allegro we have simplified the way of creating applications that gain value from datasets. See how we maintain the full development process, from the very first line of code to production deployment; in particular how we: develop and maintain code inside Jupyter, using pySpark as the Big Data framework; store the codebase in git repositories and perform the code-review process; create and maintain unit tests and integration tests for pySpark applications; and schedule and monitor these processes on a Hadoop cluster. We will also explain why using the CLI for Big Data is becoming obsolete.
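
As a hedged illustration of one practice mentioned above – unit tests for pySpark code – a minimal pytest sketch (transformation and data hypothetical, not Allegro’s actual code):

```python
import pytest
from pyspark.sql import SparkSession

def add_revenue(df):
    """Transformation under test: revenue = price * quantity."""
    return df.withColumn("revenue", df.price * df.quantity)

@pytest.fixture(scope="session")
def spark():
    # Local Spark session -- no cluster needed to test the logic.
    return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()

def test_add_revenue(spark):
    df = spark.createDataFrame([(2.0, 3), (5.0, 1)], ["price", "quantity"])
    rows = add_revenue(df).collect()
    assert [r.revenue for r in rows] == [6.0, 5.0]
```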
Mariusz Strzelecki

Senior Data Engineer, Allegro Group

13.50 - 14.20

H2O Deep Water - Making Deep Learning Accessible to Everyone

Deep Water is H2O’s integration with multiple open source deep learning libraries such as TensorFlow, MXNet and Caffe. On top of the performance gains from GPU backends, Deep Water naturally inherits all H2O properties in scalability, ease of use and deployment. In this talk, I will go through the motivation and benefits of Deep Water. After that, I will demonstrate how to build and deploy deep learning models with or without programming experience, using H2O’s R/Python/Flow (Web) interfaces.
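
A minimal sketch of the H2O Python workflow such a demo builds on; shown here with H2O’s standard deep learning estimator, since Deep Water exposes a similar estimator-style API with TensorFlow/MXNet/Caffe backends (file and column names hypothetical):

```python
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()  # starts or connects to a local H2O cluster

# Hypothetical CSV with a binary "label" column.
frame = h2o.import_file("train.csv")
frame["label"] = frame["label"].asfactor()

model = H2ODeepLearningEstimator(hidden=[64, 64], epochs=10)
model.train(y="label", training_frame=frame)
print(model.auc())
```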
Jo-fai Chow

Data Scientist, H2O.ai

13.50 - 14.20

Blink - Alibaba's Improvements to Flink

A large portion of transactions on Alibaba’s e-commerce Taobao platform is initiated through its Alibaba Search engine. Real-time stream processing is one of the cornerstones of Alibaba’s search infrastructure. Among all the streaming solutions, Flink is the closest to meeting our requirements. In this talk, we present the design and implementation of Blink, an improved runtime engine for Flink that is better integrated with YARN. It also addresses various scale and reliability issues we encountered in production. Since the changes are at the runtime layer, Blink is fully compatible with the Flink API and its machine learning libraries. We will also share our experience of running Blink in production in a Hadoop cluster of more than one thousand servers in Alibaba Search. We are actively working with the community to contribute the changes back to Apache Flink.
Xiaowei Jiang

Senior Director, Alibaba Search Division

14.20 - 14.25

Technical break

14.25 - 14.55

ING CoreIntel - collect and process network logs across data centers in near realtime

Krzysztof Adamski

Solutions Architect (Big Data), ING Services Poland

Krzysztof Żmij

Expert IT / Hadoop, ING Services Poland

14.25 - 14.55

Orchestrating Big Data pipelines @ Fandom

Fandom is the largest entertainment fan site in the world. With more than 360,000 fan communities and a global audience of over 190 million monthly uniques, we are the fan’s voice in entertainment. Being the largest entertainment site, Wikia generates massive volumes of data, ranging from clickstream, user activities, API requests, ad delivery, A/B testing and much more. The big challenge is not just the volume but the orchestration involved in combining various sources of data with varying periodicity and volumes, and making sure the processed data is available to consumers within the expected time – thus helping to gain the right insights at the right time. A conscious decision was made to choose the right open source tool to solve the problem of orchestration; after evaluating various tools, we decided to use Apache Airflow. This presentation will give an overview and comparison of existing tools and explain why we chose Airflow, and how Airflow is being used to create a stable, reliable orchestration platform that enables non-data-engineers to seamlessly access data by democratizing it. We will focus on some tricks and best practices for developing workflows with Airflow and show how we are using some of its features, as sketched below.
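
A hedged sketch of the kind of Airflow DAG such a platform schedules, using the Airflow 1.x API that was current at the time (task logic and names hypothetical):

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

# Daily pipeline: ingest a clickstream partition, then aggregate it.
dag = DAG("clickstream_daily", default_args=default_args,
          start_date=datetime(2017, 1, 1), schedule_interval="@daily")

ingest = BashOperator(task_id="ingest",
                      bash_command="echo ingest {{ ds }}", dag=dag)
aggregate = BashOperator(task_id="aggregate",
                         bash_command="echo aggregate {{ ds }}", dag=dag)

ingest >> aggregate   # aggregate runs only after ingest succeeds
```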
Krystian Mistrzak

Data Engineer, Fandom powered by Wikia

Thejas Murthy

Data Engineer, Fandom powered by Wikia

14.25 - 14.55

Big data in genomics

Marek Wiewiórka

Solution Architect, GetInData

Genomic population studies have the storing, analyzing and interpretation of various kinds of genomic variants as their central issue. When the exomes and genomes of thousands of patients are being sequenced, there is a growing need for efficient database storage systems, querying engines and powerful tools for statistical analyses. Scalable big data solutions such as Apache Impala, Apache Kudu, Apache Phoenix or Apache Kylin can address many of the challenges in large-scale genomic analyses. The presentation will cover some of the lessons learned from a project aiming at creating a data warehousing solution for storing and analyzing genomic variant information at the Department of Medical Genetics of the Medical University of Warsaw. An overview of the existing big data projects for analyzing data from next-generation sequencing will be given as well. The presentation will conclude with a brief summary and a discussion of future directions.

14.25 - 14.55

Real-Time AdTech reporting & targeting with Apache Apex

AdTech companies need to address data growing at breakneck speed, along with customer demands for insights and analytical reports. At PubMatic we receive billions of events and several TBs of data per day from various geographic regions. This high-volume data needs to be processed in real time to derive actionable insights such as campaign decisions and audience targeting, and to provide a feedback loop to the AdServer for making efficient ad serving decisions. In this talk we will share how we designed and implemented these scalable, low-latency, real-time data processing solutions for our use cases using Apache Apex.
Ashish Tadose

Senior Data Architect, PubMatic

14.55 - 15.00

Technical break

15.00 - 15.30

TBD

15.00 - 15.30

TBD

Sky Bet is one of the largest UK online bookmakers and introduced a Hadoop platform four years ago. This session explains how the platform addresses two common problems in the gambling industry – knowing your current liability position and helping potentially irresponsible gamblers before they identify themselves. These use cases are linked by a common need for data from the same source systems, and highlight the different uses of the data that can co-exist on a shared Hadoop cluster. The journey of replacing a traditional data warehouse with the promised land of Hadoop will be explained, without forgetting the wrong turns and slips made along the way – this is no idealistic proof-of-concept talk; real-world implementations are difficult. The journey starts with the first use case, meeting the needs of sportsbook traders to manage liabilities in a competitive and high-frequency environment, and how that led, years later, to completely decommissioning the legacy data warehouse. The platform has evolved to support a Data Science team and the ability to create predictive models that warn of potentially irresponsible gamblers. This more recent use case illustrates a completely different way of using the same data, and how the engineering approach accommodates it. There is no code in this talk; the aim is to explain how a real-world system delivered real-world use cases, and the teams needed to deliver them.
Mark Pybus

Head of Data Engineering, Sky Betting & Gaming

15.00 - 15.30

TBD

15.00 - 15.30

Hopsworks: Secure Streaming-as-a-Service with Kafka/Flink/Spark

Since June 2016, Kafka, Spark and Flink-as-a-service have been available to researchers and companies in Sweden from the Swedish ICT SICS Data Center at www.hops.site, using the HopsWorks platform (www.hops.io). Flink and Spark applications are run within a project on a YARN cluster, with the novel property that applications are metered and charged to projects. Projects are also securely isolated from each other and include support for project-specific Kafka topics. That is, Kafka topics are protected from access by users who are not members of the project. In this talk we will discuss the challenges in building multi-tenant streaming applications on YARN that are metered and easy to debug. We show how we use the ELK stack (Elasticsearch, Logstash, and Kibana) for logging and debugging running streaming applications, how we use Grafana and Graphite for monitoring streaming applications, and how users can debug and optimize terminated Spark Streaming jobs using Dr Elephant. We will also discuss the experiences of our users (over 120 users as of October 2016): how they manage their Kafka topics and quotas, patterns for how users share topics between projects, and our novel solutions for helping researchers debug and optimize Spark applications. Hopsworks is entirely UI-driven, with an Apache v2 open source license.

15.30 - 15.55

Coffee break

ROUNDTABLE SESSIONS

15.55 - 16.00

Intro

Parallel roundtable discussions are the part of the conference that engages all participants. They serve several purposes. First of all, participants have the opportunity to exchange opinions and experiences about a specific issue that is important to their group. Secondly, participants can meet and talk with the leader/host of the roundtable discussion – selected professionals with vast knowledge and experience.

16.00 - 16.45

Round I

16.45 - 17.30

Round II

17.30 - 17.45

Coffee break

17.45 - 18.15

Panel discussion – Big Data implementations: how to achieve a justified ROI

Big Data brings a lot of promises about potential benefits, but experience proves it’s not always so easy. How to make Big Data projects great? How to get quick wins? How to avoid expensive mistakes? How to communicate with others – the business side or a client – to make a project viable? What are the major success factors, and where are the easily missed obstacles that can derail Big Data projects?

18.15 - 18.30

Closing & Summary

Przemysław Gamdzyk

CEO & Meeting Designer, Evention

Adam Kawa

Data Engineer and Founder, GetInData