AGENDA 2019

Changes in the order of presentations may occur

8.00 - 9.00

Registration and welcome coffee

9.00 - 9.15

Conference opening

9.15 - 10.45

Plenary Session

Stream Processing for Business Insights at Netflix

130 million members from over 190 countries enjoy the Netflix service. This leads to trillions of events and petabytes of data flowing through the Keystone stream processing infrastructure to help glean business insights and improve the customer experience. The self-serve infrastructure enables users to focus on extracting insights rather than worrying about building out scalable infrastructure. I’ll share my experience building this platform with Flink, and the lessons learned.

 

Monal Daxini

Engineering Lead, Netflix

10.45 - 11.15

Coffee break

11.15 - 15.30 Simultaneous sessions

Architecture, Operations and Cloud

This track is dedicated to architects, administrators and people with DevOps skills who are interested in technologies, techniques and best practices for planning, building, installing, managing and securing their Big Data infrastructure in enterprise environments – both on-premises and in the cloud.

Data Engineering

This track is the place for engineers to learn about tools, techniques and battle-proven solutions to collect, store and process large amounts of data. It covers topics like data collection and ingestion, ETL, job scheduling, metadata and schema management, distributed processing engines, distributed datastores and more.

Artificial Intelligence and Data Science

This track includes real-world case studies demonstrating how data and technology are used together to address a wide range of business problems such as product recommendations, predictive analytics, decision optimization and automation. Here you will find talks about innovative analytics applications and systems for machine learning, statistics, visualization, natural language processing and deep learning.

Streaming and Real-Time Analytics

This track covers technologies, strategies and valid use cases for building streaming systems and implementing real-time applications that enable actionable insights and interactions not previously possible with classic batch systems. This includes solutions for data stream ingestion and applying various real-time algorithms and machine learning models to derive valuable insights from the flow of events coming from IoT sensors, devices, users, and front-end applications.

11.15 - 11.45

From legacy to cloud: an end to end data integration journey

Raw data collection from cloud and legacy data centers. Standard data preparation (e.g. binary conversion, partitioning). User-driven analytics and machine learning. Challenges and experiences of building and operating data pipelines and computation as a service for hundreds of teams operating at petabyte scale.

Keywords: dataLake, dataPipelines, infrastructure, Spark

Max Schultze

Data Engineer, Zalando SE

11.15 - 11.45

Building Machine Learning platform for Real-Time Bidding

How to build a platform for fast machine learning model development? How to serve machine learning models under heavy load and tight timing constraints? How to establish effective cooperation between data scientists and data engineers?

Keywords: machinelearning, bigdata, fastdata, rtb

Tomasz Kogut

Technical Lead, Adform

11.15 - 11.45

Large Scale Land use of Satellite Imagery

Leveraging convolutional neural network models in a streaming pipeline for segmentation of satellite images for agricultural use.

Keywords: Computer Vision, Apache Beam, Convolutional Neural Networks

Suneel Marthi

Principal Technologist - AI/ML, Amazon Web Services

11.15 - 11.45

Streaming Visualization

Batch and streaming visualization in a big data reference architecture, architecture blueprints for streaming visualization, and implementations of the blueprints in a fast data solution.

Keywords: streaming visualization, kafka, bigdata architecture

Guido Schmutz

Solution Architect, Trivadis AG

11.45 - 11.50

Technical break

11.50 - 12.20

The Data Analytics Platform or how to make data science in a box possible

The state of data platforms in the tech industry. ING WBAA’s vision of the future of data analytics. Highlights of the main components and features of the ING Data Analytics Platform.

Keywords: kubernetes, data platform, data governance, spark

Rob Keevil

Data Analytics Platform Lead, ING

Krzysztof Adamski

Data Infrastructure Architect, ING

11.50 - 12.20

8 Takeaways from building Rakuten Analytics

Introduction – what Rakuten Analytics is, the team and its roles. Motivation for building an on-premise petabyte-scale analytics service. 8 takeaways from building Rakuten Analytics. Tech and business impact.

Keywords: analytics, data sketches, microservices, real-time

Juan Paulo Gutierrez

Lead Data Engineer and Architect, Rakuten

11.50 - 12.20

Data Science at PMI - The Tools of The Trade

Introduction to Data Products, CI & CD, Modus Operandi and Agile Data Science on the Data Ocean. Best Practices: Docker Containers, Project Templates, Programming Style Standards, etc. Reproducible Data Science.

Keywords: CI/CD, Best Practices for Data Science, Data Product, Reproducible research

Maciej Marek

Enterprise Data Scientist, Philip Morris International

Michał Dyrda

Senior Enterprise Data Scientist, Philip Morris International

11.50 - 12.20

Driving your marketing automation with multi-armed bandits in real time

Multi-armed bandits vs. simple A/B testing. Architecture of the solution – how to connect Flink, Nussknacker and R? Other use cases – what are other good fits for a similar architecture?

Keywords: Multi-armed bandit, Marketing automation, Streaming/Flink, R

Wit Jakuczun

Founder and Co-owner, WLOG Solutions

Maciej Próchniak

Software Lead Developer, TouK

12.20 - 12.25

Technical break

12.25 - 12.55

Scalable machine-learned model serving

Online evaluation of machine-learned models (model serving) is difficult to scale to large data sets. Vespa.ai is an open-source solution to this problem, in use today on some of the largest such systems in the world, such as the content pages of the Yahoo network and the world’s third-largest ad network.
This talk will explain the problem and the architectural solution, show how Vespa can be used to achieve scalable serving of TensorFlow and ONNX models, and present benchmarks comparing performance and scalability to TensorFlow Serving.

Keywords: bigdata, opensource, vespa, ml

Jon Bratseth

Distinguished Architect, Oath (formerly Yahoo)

12.25 - 12.55

Enterprise Adoption at Klarna. Software Engineering methods bringing order to the Big Data Galaxy

Klarna provides instant consumer credit at the point of sale and allows flexible credit lines after purchase. Credit is issued using Klarna’s Checkout product, which is integrated with almost 100,000 merchants in many markets. In order to perform automated credit and fraud decisions with low-latency guarantees, it is mission critical for Klarna that all required data is available at all times. Furthermore, it is important that all decisions are traceable. Finally, our data infrastructure must facilitate engineer and analyst productivity.

Keywords: Teams, Tools, Processes, Performance and scalability, Validation

Erik Zeitler

Lead software engineer, Klarna

12.25 - 12.55

Evolution of search: From a complicated problem to a simplified search experience

What goes into making Booking.com the leading player for booking a place to stay? At Booking, with millions of users visiting our platforms every day, we have very rich behavioural data about our users – what they type, what they search for, which properties they look at, which filters they apply, how much time they spend on the page of a property they book compared to the pages of properties they don’t book, etc. Each of these behavioural data points has helped us build a powerful, personalized search, making the experience of booking a trip easier for our users.

Keywords: search, ranking, bigdata

Arihant Gupta

Software Developer, Booking.com

Priyanka Prakash

Product Owner, Booking.com

12.25 - 12.55

Detecting Patterns in Event Streams with Flink SQL

An introduction to SQL-on-streams concepts, an explanation of how the new SQL MATCH_RECOGNIZE clause brings the power of pattern matching to (streaming) SQL, and a demonstration of advanced stream analytics with Flink SQL.

Keywords: Apache Flink, SQL, stream analytics, stream processing

Dawid Wysakowicz

Software Engineer, data Artisans

12.55 - 13.50

Lunch

13.50 - 14.20

Reliable logging infrastructure: Building trust on logs @ Slack

An overview of the logging infrastructure at Slack and why you should care about it, a summary of why reliability is critical for logging infrastructure to gain customer trust, and lessons learned and best practices for building logging infrastructure.

Keywords: logging, kafka, reliability

Ananth Packkildurai

Senior Data Engineer, Slack Technologies Inc

13.50 - 14.20

Metadata Driven Access Control in Practice

The importance of data access governance is continuously growing due to new regulations, such as GDPR, and industry policies. Managing access policies for each individual dataset is a hassle. In this talk, we will show how Svenska Spel uses metadata about datasets to generate access policies. We use it to create policies for access, retention, and anonymization.

Keywords: DataGovernance, GDPR, Security

Magnus Runesson

Data Engineer, Tink

13.50 - 14.20

I’m a data scientist and engineers don’t hate me

What is the role of a data scientist in the development of a new project? Do they only produce charts while engineers do the real work? In this presentation, I will share a story of data science and engineering collaboration at Twitter that resulted in shipping a successful new product feature. I’ll talk about our tools and environment and specifically focus on inter-team dynamics: What practices did we use? How did we divide tasks? What benefits did each team derive? Why do we still want to work together?

Keywords: collaboration, cross-functional, best practices, data scientist’s environment

Mateusz Fedoryszak

Data Scientist, Twitter

13.50 - 14.20

Streaming topic model training and inference with Apache Flink

How to use stateful stream processing and Flink’s dynamic processing capabilities to continuously train topic models from unlabelled text and use such models to extract topics from the data itself.

Keywords: Streaming, Topic Modeling, NLP, Keyword extraction

Suneel Marthi

Principal Technologist - AI/ML, Amazon Web Services

Jörn Kottmann

Senior Software Developer, Sandstone SA

14.20 - 14.25

Technical break

14.25 - 14.55

We are going to add the topics of all presentations soon.

14.25 - 14.55

We are going to add the topics of all presentations soon.

14.25 - 14.55

Data Science in Roche Diagnostics: From Exploration to Productionization

Data science applications in Roche Diagnostics – from exploration to productionization of data science initiatives. Real use case #1: machine learning and image processing for quality control. Real use case #2: time series analysis of financial data for business planning.

Keywords: Data Science in Roche Diagnostics, Financial Time Series Analysis, Automated Quality Control, Deep Learning on Product Images, Machine learning, Image Processing

Dr Mohammadjavad Faraji

Data Scientist, Roche

14.25 - 14.55

The Changing Face of ETL: Event-Driven Architectures for Data Engineers

The power of events and unbounded data. Streaming is not just for real-time applications – it’s for everyone. Where a streaming platform fits in an analytic architecture. How event-driven architectures can enable greater scalability and flexibility of systems both now and in the future.

Keywords: kafka, event-driven architecture, streaming, integration

Robin Moffatt

Developer Advocate, Confluent

14.55 - 15.00

Technical break

15.00 - 15.30

We are going to add the topics of all presentations soon.

15.00 - 15.30

We are going to add the topics of all presentations soon.

15.00 - 15.30

We are going to add the topics of all presentations soon.

15.00 - 15.30

We are going to add the topics of all presentations soon.

15.30 - 16.00

Coffee break

16.00 - 17.25 Roundtable sessions

16.00 - 16.05

Intro

Parallel roundtable discussions are the part of the conference that engages all participants. They serve a few purposes. First of all, participants have the opportunity to exchange opinions and experiences about a specific issue that is important to that group. Secondly, participants can meet and talk with the leaders/hosts of the roundtable discussions – selected professionals with vast knowledge and experience.

There will be 2 rounds of discussion, so every conference participant can take part in 2 discussions.

 

16.05 - 16.45    1st round

16.50 - 17.25    2nd round

Maciej Bryński

Big Data Architect, XCaliber

Darko Marjanović, Things Solver & Goran Pavlović, Vip Mobile

Radosław Kita

Team Lead, Adform

17.25 - 17.45

Coffee break

17.45 - 18.15

Panel discussion

18.15 - 18.30

Closing & Summary

Przemysław Gamdzyk

CEO & Meeting Designer, Evention

19.00 - 22.00

Networking party for all participants and speakers

At the end of the conference we would like to invite all attendees to an informal evening meeting.