Agenda 2020

February 27, 2020


The conference agenda is almost ready. There are morning and afternoon plenary sessions, 5 simultaneous tracks and a roundtable discussion session. Plus, of course, an evening networking party!

8.00 - 9.00

Registration and welcome coffee

9.00 - 9.15

Conference opening

Przemysław Gamdzyk

CEO & Meeting Designer, Evention

Adam Kawa

CEO and Co-founder, GetInData

9.15 - 10.45

Plenary Session

9.15 - 9.45

To be announced soon

9.45 - 10.15

Credit Risk in practice on a global scale. New technology platform and methodologies in practice

Open source technologies and new machine learning methods have been changing regulatory credit risk in recent times. We will talk about how we handle data globally across the entire ING, which technologies we use to support the modelers' experience, and which machine learning tools we incorporate, all while remaining compliant with stricter regulatory frameworks than ever. Handling risk on a balance sheet 1.5 times the size of Poland's GDP is a really interesting task for modelers, data scientists and data engineers.

Keywords: #CreditRisk, #Python, #BigData, #UseCase, #MachineLearning, #StatisticalModeling, #DataScience

Marcin Brożek

Credit Risk Modelling Expert, ING Tech Poland

Konrad Wypchło

Senior Chapter Lead, ING Tech Poland

10.15 - 10.45

Leveraging hybrid cloud for real-time insights with the new Cloudera Data Platform

The new Cloudera solutions for hybrid cloud environments. Adding Apache Flink integration to CDP. Solving real-life challenges based on use cases from the Polish market. Apache Flink and CSP roadmap.

Keywords: #hybrid_cloud, #data_in_motion, #Cloudera_Data_Platform, #stream_processing

Marton Balassi

Manager, Streaming Analytics, Cloudera

Kamil Folkert

CTO, Member of the Board, 3Soft

10.45 - 11.15

Coffee break

11.15 – 15.30 Simultaneous sessions

Architecture, Operations and Cloud

Data Engineering


Streaming and Real-Time Analytics


Artificial Intelligence and Data Science

Data Strategy and ROI


11.15 - 11.45

From Containers to Kubernetes Operators for a Datastore

Keywords: #docker #container #kubernetes #operator #orchestration

Philipp Krenn

Developer, Elastic

11.15 - 11.45

Will we see driverless cars in the 2020s?

Keywords: #autonomousdriving #dataingestion #petabytescale #hardwareintheloop #mapr #spark #openshift

Piotr Frejowski

System Architect, DXC Robotic Drive Program, DXC Technology

11.15 - 11.45

Creating an extensible Big Data Platform for advanced analytics - 100s of Petabytes with Real-time access

Keywords: #bigdata #scalability #hadoop #spark #analytics #datascience #dataplatform

Reza Shiftehfar

Engineering Management & Leadership, Uber

11.15 - 11.45

Building Recommendation Platform for ESPN+ and Disney+. Lessons Learned

Keywords: #recommendersystems #ML #cloud #experimentation

Grzegorz Puchawski

Data Science and Recommendation, Disney Streaming Services

11.15 - 11.45

From bioreactors to Kibana dashboards

Fabian Wiktorowski

IT Expert, Roche

11.45 - 11.50

Technical break

11.50 - 12.20

Replication Is Not Enough for 450 PB: Try an Extra DC and a Cold Store

Keywords: #Hadoop #datasecurity #resilience #in-house #storage

Stuart Pook

Senior Site Reliability Engineer, Criteo

11.50 - 12.20

Data Platform at Bolt: challenges of scaling data infrastructure in a hyper growth startup

Keywords:  #aws #datalake #datawarehouse #preprocessing #machinelearning

Łukasz Grądzki

Engineering Manager, Bolt

11.50 - 12.20

Interactive Analytics at Alibaba


Yuan Jiang

Senior Staff Engineer, Alibaba

11.50 - 12.20

Building a Factory for Machine Learning at Spotify

Keywords: #ml #kubeflow #tensorflow #ml-infra

Josh Baer

Product Lead, Machine Learning Platform, Spotify

11.50 - 12.20

Abstraction matters

Keywords: #lowcode, #executionabstraction, #datavirtualization

Konrad Hoszowski

Technical Account Manager, Ab Initio

12.20 - 12.25

Technical break

12.25 - 12.55

How to make your Data Scientists like you and save a few bucks while migrating to cloud - Truecaller case study

Keywords: #cloudmigration #bigquery #airflow #kafka

Juliana Araujo

Data Product Manager, Truecaller

Fouad Alsayadi

Senior Data Engineer, Truecaller

Tomasz Żukowski

Data Analyst, GetInData

12.25 - 12.55

Kafka-workers, Parallelism First

Keywords: #kafka, #dataprocessing, #high-performance

Tomasz Uliński

Software Developer, RTB House

12.25 - 12.55

An adventure in Complex Event Processing at a telco


Jakub Błachucki

Big Data Engineer, Orange

Maciej Czyżowicz

Technical Leader for Analytics Stream, Orange

Paweł Pinkos

Big Data Engineer, Orange

12.25 - 12.55

Utilizing Machine Learning To Optimize Marketing Spend Through Attribution Modelling

Keywords: #attribution #datascience #statisticalmodeling #marketingmix #interdisciplinary

Arunabh Singh

Lead Data Scientist, HiQ International AB

12.25 - 12.55

It's 2020. Why are we still using 1980s tech?

Keywords: #Analytics #SQL #DWH #CaseStudy #BigData

Arnon Shimoni

Product Manager and Solutions Architect, SQream

12.55 - 13.50


13.50 - 14.20

DevOps best practices in AWS cloud

Keywords: #aws_cloud #devops #best_practices #infrastructure_as_code

Kamil Szkoda

DevOps Team Leader and Product Owner, StepStone Services

Adam Kurowski

Senior DevOps, StepStone Services

13.50 - 14.20

Presto @ Zalando: A cloud journey for Europe’s leading online retailer

Keywords:  #CloudAnalytics #Presto #DataVirtualization #SQL-on-Hadoop #DWH

Wojciech Biela

Co-founder & Senior Director of Engineering, Starburst

Max Schultze

Data Engineer, Zalando SE

13.50 - 14.20

Network monitoring, data processing, forecasting, fraud and anomaly detection using Spark, Elasticsearch, Machine Learning and Hadoop

Keywords: #spark #elasticsearch #machinelearning #hadoop #dataprocessing

Kamil Szpakowski

Big Data Main Specialist, T-Mobile

13.50 - 14.20

Feature store: Solving anti-patterns in ML systems

Andrzej Michałowski

Head of AI Research & Development, Synerise 

13.50 - 14.20

Omnichannel Personalization as an example of creating data ROI - from separate use cases to a complete operational data ecosystem

Keywords:  #ROI #real-timeomnichannelpersonalization #scalingdataecosystem #businessengagement #harvesting

Tomasz Burzyński

Business Insights Director, Orange

Mateusz Krawczyk

Personalization Solutions Product Owner, Orange

14.20 - 14.25

Technical break

14.25 - 14.55

The Big Data Bento: Diversified yet Unified

Keywords: #bigdatabento #cloud #unifiedanalyticsplatform #unifieddataanalyticsplatform #spark

Michael Shtelma

Solutions Architect, Databricks

14.25 - 14.55

Towards enterprise-grade data discovery and data lineage at ING with Apache Atlas and Amundsen

Keywords: #BigData, #DataDiscovery, #DataIngestion, #Lineage, #MetadataGovernance, #Data-Driven

Verdan Mahmood

Software Engineer, ING

Marek Wiewiórka

Big Data Architect, GetInData

14.25 - 14.55

Monitoring & Analysing Communication and Trade Events as Graphs

Keywords:  #graphAnalytics #transactionProcessing #FlinkGelly #Elasticsearch #Kibana

Christos Hadjinikolis

Senior Consultant, Lead ML Engineer, Data Reply UK

14.25 - 14.55

Neural Machine Translation: achievements, challenges and the way forward

Keywords:  #machinetranslation #deeplearning #adversarialexamples #datascience

Katarzyna Pakulska

Data Science Technology Leader, Findwise

Barbara Rychalska

Senior Data Scientist and Data Science Section Leader, Findwise

14.25 - 14.55

Data Science @ PMI: Journey from a business problem to data product industrialization

Keywords: #UseCase #CI/CD #BestPracticesForDataScience #DataProduct #ReproducibleResearch

Michał Dyrda

Senior Enterprise Data Scientist, Philip Morris International

Maciej Marek

Enterprise Data Scientist, Philip Morris International

14.55 - 15.00

Technical break

15.00 - 15.30

How to send 16,000 servers to the cloud in 8 months?

Keywords: #OpenX #gcp #scale #adtech #migration

Marcin Mierzejewski

Engineering Director, OpenX

Radek Stankiewicz

Strategic Cloud Engineer, Google Cloud

15.00 - 15.30

Optimize your Data Pipeline without Rewriting it

Keywords:  #data-driven #optimize #data-pipeline #operation #improvement

Magnus Runesson

Senior Data Engineer, Tink

15.00 - 15.30

Flink on a trip - a real-time car insurance system in a nut(shell)

Wojciech Indyk

Streaming Analytics and All Things Data Black Belt Ninja,

15.00 - 15.30

Reliability in ML - how to manage changes in data science projects?

Keywords: #datascience #datamanagement #revisioncontrol #datapipeline

Kornel Skałkowski

Senior AI Engineer, Consonance Solutions

15.00 - 15.30

Using data to build Products

Keywords:  #NewProducts #MachineLearning #DataFueledGrowth #DataGuidedProductDevelopment #ScalingNewProduct

Ketan Gupta

Product Leader,

15.30 - 16.00

Coffee break

16.00 - 17.30 Roundtable sessions

16.00 - 16.05


Parallel roundtable discussions are the part of the conference that engages all participants. They serve a few purposes. First, participants have the opportunity to exchange opinions and experiences about a specific issue that is important to their group. Second, participants can meet and talk with the leader/host of each roundtable discussion: selected professionals with vast knowledge and experience.

There will be 2 rounds of discussion, so every conference participant can take part in 2 discussions.


16.05 – 16.45    1st round

16.50 – 17.30    2nd round

Snorkel Beambell – Real-time Weak Supervision on Apache Beam

1. Deep learning models have led to massive growth in real-world machine learning, allowing machine learning practitioners to achieve state-of-the-art scores on benchmarks without any hand-engineered features.
2. The challenge with continuous retraining is that one needs to maintain prior state (e.g., the labeling functions in the case of weak supervision, or a pretrained model like BERT or Word2Vec for transfer learning) that is shared across multiple streams.
3. Apache Beam’s stateful stream processing capabilities are a perfect match for adding support for scalable weak supervision.

Suneel Marthi

Principal Technologist - AI/ML, Amazon Web Services

Real-life machine learning at scale using Kubernetes and Kubeflow

How do you build a machine learning pipeline that processes 1500 TB of data daily in a fast and cost-effective way on Google Cloud Platform using Kubeflow? How do you serve a TensorFlow model handling almost 1M requests per second with latency < 10 ms on Kubernetes? Are Kubernetes and Kubeflow ready to serve data scientists?

Michał Bryś

Data Scientist, OpenX

Michał Żyliński

Customer Engineer, Google

Managing workflows at scale

How to build and maintain thousands of pipelines in the organisation? What are the biggest pain points in orchestrating hundreds of ETLs? What open source and managed solutions are available?

Paweł Kupidura

Data Engineer, Bolt

Practical application of AI

Industry 4.0 and AI – are we ready for the 4th industrial transformation? Who should be the beneficiary of Industry 4.0? Key barriers to the implementation of AI projects in organizations. Real cases of AI in industry.

Natalia Szóstak

Head of R&D, TIDK

The need for explainable AI

With the spread of AI-based solutions, more and more organizations would like to understand the reasons behind system decisions. This is especially important in regulated industries. The session will cover so-called white-box methods, as well as modern approaches to AI explainability that make it possible to understand more complex models.

Kacper Łukawski

Data Science Lead, Codete

A.I. and its vulnerabilities

Nowadays AI and ML are everywhere, so much so that a project seems not cool without them, and they are able to outperform humans in some tasks.
But can you always trust the math behind them? Do we really always need A.I. and ML?
An open discussion on the vulnerabilities and easy hacks intrinsic to mathematical models, and their impact on real-life scenarios and our security.

Sebastiano Galazzo


Deploying ML models in real-time using stream processors

Today, streaming engines enable data pipelines to continuously deliver AI-based outcomes, moving towards what is frequently called continuous intelligence. What are the most important architectural choices when latency is the top requirement? How can we scale to thousands of models in production? In what ways do real-time environments help ML models evolve with zero downtime?

Andrea Spina

CTO, Radicalbit

Data discovery – building trust around your data

The worldwide growth of data has changed the business landscape forever. Many organizations are undergoing transformations triggered by the data revolution. While the benefits of collecting bigger data volumes are clear, doing so has revealed additional challenges when trying to use that data effectively. The need to explore the data and increasing compliance demands force us to think about solutions that leverage the power of metadata. Data description is evolving from a simple schema definition to capturing application context, behavior and how the data changes over time.
Let’s discuss data discovery in the context of use cases, technologies and possible challenges.

Damian Warszawski

Software Engineer, ING Tech Poland

What to do with my HDP/CDH cluster under the new Cloudera licensing model

After the merger with Hortonworks, Cloudera has become the single vendor building a distribution that consists of the major components of the so-called Hadoop ecosystem (e.g. Hadoop, Spark, Hive, Ranger). While these components themselves are open source, access to the binaries that are critical for installing and upgrading clusters will be limited to customers who purchase a paid subscription. This means that thousands of companies that currently use Hadoop for free will need to decide what to do next. Should I pay for a subscription, or compile my own binaries to build my own distribution? Should I stop using on-premise Hadoop and move to the public cloud instead? During this discussion we will explore this topic and try to answer these questions based on our vendor-neutral experience working with customers who have large production installations of HDP/CDH clusters.

Krzysztof Zarzycki

Big Data Architect, CTO and Co-founder, GetInData

Being an efficient data engineer: tools, ecosystem, skills and ways of learning

What does it mean to be a productive (data) engineer? Is it about the tools we use? Is it the mindset we have? Is it the environment we are surrounded by? Let’s share and discuss war stories, learning resources, methodologies and libraries that help us escape the gumption traps in the daily life of an engineer. The discussion will be divided into 4 areas: debugging, implementation, communication and learning.

Rafał Wojdyła

Data Engineer,

Databases in Kubernetes: from bare metal to cloud native

Initial cloud-native conquests started with stateless services but gradually turned towards data management systems. DBMSs have relied on centralized, bare-metal servers for decades, so cloud-native architectures are a big technology shift for such systems. Many commercial and open source databases already provide cloud-native adaptations of their products. Among the most interesting cloud-native converts are analytic databases, which impose additional requirements on storage and clustering techniques in order to run fast analytic queries over billions, or even trillions, of rows. Examples include the MySQL clustering project Vitess, which has recently reached CNCF graduation level, and ClickHouse, an extremely fast and scalable analytical database that is being converted to cloud-native operation by Altinity.

Join me to discuss various aspects of running databases in Kubernetes. This is a new technology with a lot of caveats, such as storage. At the same time, databases in Kubernetes promise substantial benefits both to the users of such applications and to the companies that operate them. We will explore these issues as well as the path to maturity.

Alexander Zaitsev

Co-founder & CTO, Altinity

Handling Data Science experiments efficiently – methodologies, tools, collaboration

A typical data science project usually requires a lot of playing around with the data. What’s the best way to keep track of all the experiments within a team? A spreadsheet, a custom tool, the recently popular MLflow or Neptune, or maybe other tools? Let’s talk about what works best depending on the type of project, the team’s size and other aspects.

Michał Rudko

Big Data Analyst / Architect, GetInData

Managing a Big Data project – how to make it all work well?

How do you ensure successful adoption of Big Data and Analytics systems? It is a challenge for most organizations. Let’s discuss how to promote a user-centric approach, leverage experience design and manage user expectations on Big Data projects. I would be happy to hear your opinions and answer your questions, based on my practical experience applying Design Thinking and architecture design methodologies. I believe this conversation will be interesting for Architects, Tech Leaders, Product Managers and C-level folks.

Michał Rudko

Big Data Analyst / Architect, GetInData

Analytics and Customer Experience Management on top of Big Data

How do you ensure successful adoption of Big Data and Analytics systems? It is a challenge for most organizations. Let’s discuss how to promote a user-centric approach, leverage experience design and manage user expectations on Big Data projects. I would be happy to hear your opinions and answer your questions, based on my practical experience applying Design Thinking and architecture design methodologies. I believe this conversation will be interesting for Architects, Tech Leaders, Product Managers and C-level folks.

Taras Bachynskyy

Director, Big Data & Analytics, SoftServe

Challenges of building a modern & future-proof data processing platform

The pace of change in IT and in our companies never seems to stop increasing, especially in the field of Big Data. To keep up with it we need to move fast and be smart about it, but how do we actually achieve that? How can we predict future needs for processing and tools? How can we prepare for them? What kinds of trade-offs can we make? We’ll try to answer these questions together and share good practices and experience during this session.

Monika Puchalska

Engineering Manager, Zendesk

Serverless data warehousing

Defining what a serverless warehouse is. A list of solutions that are considered serverless. Data ingestion. Data storage. Data processing, pricing and cost efficiency. Advantages and disadvantages of both serverless and on-premise approaches.

Arkadiusz Gąsior

Data Engineer, GetInData

Scale Your Logs, Metrics, and Traces with the Elastic Stack: From Traditional Applications to Microservices and Kubernetes

How do you tackle your monitoring and observability problems? There is a high chance that you are using the Elastic (ELK) Stack, and this session is all about making it scale: from easier collection of data, to scalable multi-tier architectures, to the lifecycle of your data, including its deletion.

Philipp Krenn

Developer, Elastic

Full-Text Search with Elasticsearch and Apache Lucene: Finding What Is Relevant

“You know, for search” is the tagline here. With ever-increasing data volumes, finding what is relevant in a performant way is a common requirement. This discussion covers common problems, recommended architectures, best practices, and new developments around the most widely used search technologies.

Philipp Krenn

Developer, Elastic

SQL on Big Data for batch, ad-hoc & streaming processing

Piotr Findeisen

Software Engineer, Starburst

Bring Data as Products to consumers

How do you define data products? What mindset and approach should we have to make a product approach to data possible? Let’s brainstorm about implementing this approach and the opportunities and value it brings.

Łukasz Pakuła

RGITSC Team Manager - DataOps, Roche

Data visualization: how to visualize large, complex and dirty data, and what tools to use

Adrian Mróz

, Allegro

Addressing challenges of modern analytics with Snowflake

Thomas Scholz

Sales Engineering Manager for EMEA, Snowflake

Stream processing engines – features, performance, comparison
From on-premise to the cloud: an end to end cloud migration journey
Elasticsearch for search of logs, metrics and user data
Choosing and using the right BI platform for efficient data exploration, reports and analytics
Beyond pre-computed answers – interactive, sub-second OLAP queries with Druid, Kylin and ClickHouse
Big Data on Kubernetes

Tomasz Kogut

Technical Lead, Adform

The Latest and Greatest of Apache Spark
Best tools for alerting and monitoring of the data platforms
Data Auditing – How to get a clear view of your pipeline
Optimizing Spark-based data pipelines

17.30 - 17.45

Coffee break

17.45 - 18.15

Panel discussion: Ways to make large-scale ML actually work

Despite the spread of dedicated AI platforms, ready-to-use ML libraries and tons of available data, running successful large-scale AI/ML projects still faces technical and organizational challenges. According to some studies, 8 out of 10 such projects fail. This panel will explore the technical prerequisites that a company should introduce to build ML-based solutions efficiently. These include, for example, organizing the data (e.g. data discovery, data lineage, data quality), experimenting with models (e.g. notebooks, libraries, collaboration), one-click deployment of a model (e.g. AI/ML platforms, infrastructure) and more. While many of these challenges are not that hard when working with small data, everything gets more complex and time-consuming as the data and scale grow.


Marcin Choiński

Head of Big Data & Analytics Ecosystem, TVN


Josh Baer

Product Lead, Machine Learning Platform, Spotify

Marek Wiewiórka

Big Data Architect, GetInData

Paweł Zawistowski

Lead Data Scientist, Adform, Assistant Professor, Warsaw University of Technology

18.15 - 18.30

Closing & Summary

Przemysław Gamdzyk

CEO & Meeting Designer, Evention

19.00 - 22.00

Networking party for all participants and speakers

At the end of the conference, we would like to invite all attendees to an informal evening meeting at the “Dekada” Club, located at Grójecka 19/25, 02-021 Warszawa.