Check the ONSITE conference location: LOCATION

 

In this year's edition of the conference, we will focus on the areas:
Artificial Intelligence and Data Science, Streaming and Real-Time Analytics,
Data Strategy and ROI, Data Engineering, Architecture Operations & Cloud.

 


26.04.2022 - WORKSHOP DAY

9.00 - 16.00

PARALLEL WORKSHOPS (independent workshops, paid entry) | on-site, WARSAW

FIND OUT MORE ABOUT WORKSHOPS

Introduction to Machine Learning Operations (MLOps)

 

DESCRIPTION:

SESSION LEADER:

Machine Learning Engineer
GetInData

Real-Time Stream Processing

 

DESCRIPTION:


SESSION LEADERS:

Data Engineer
GetInData
Software developer
GetInData

Modern data pipelines with dbt

 

DESCRIPTION:

SESSION LEADER:

Software/Data Engineer
GetInData

19.00 - 22.00

EVENING SPEAKERS MEETING (Only for Speakers) on-site, WARSAW

 

 

27.04.2022 - 1ST CONFERENCE DAY | HYBRID: ONLINE + ONSITE

 

8.30 - 9.00

Morning coffee and networking time

9.00 - 9.10

Plenary Session
Conference opening
CEO & Meeting Designer
Evention
CEO and Co-founder
GetInData

9.10 - 11.25

PLENARY SESSION

9.10 - 9.30

Plenary Session
Big data at Microsoft: The story behind the tech that powers an exabyte-scale data lake
[su_expand more_text="Read more" less_text="Read less"]Join us at this session to learn about the key technologies and strategy that drove Microsoft's decade-long big data journey. Using our company as an example, we'll also explore the realities of organizations working at massive scale while still maintaining a fast pace of innovation. We will touch on multiple technologies during the presentation, from non-Microsoft ones (Spark, Python, R, HDFS, YARN) to Microsoft-proprietary cloud services (Azure Data Lake Analytics, Azure Data Lake Store, HDInsight).

 #BigData #Technology #Microsoft

[/su_expand]

Speaker:

Group Product Manager – Azure Engineering
Microsoft

9.30 - 9.55

KEYNOTE PRESENTATION

Plenary Session
Data Mesh in Practice - How to set up a data driven organization
[su_expand more_text="Read more" less_text="Read less"]The Data Mesh paradigm is a strong candidate to supersede the centralized data lake and data warehouse as the dominant architectural patterns in data and analytics. It promotes the concept of domain-focused Data Products which go beyond sharing of files and towards guarantees of quality and acknowledgment of data ownership.

Through personal experience with applying the Data Mesh concept in practice, as well as dedicated field research, the presenter discovered the most common pain points at different stages of the journey and identified successful approaches to overcome those challenges. In this talk, you will gain both technical and organizational insights ranging from companies that are just starting to promote a mindset shift of working with data, to companies that are already in the process of transforming their data infrastructure landscape, to advanced companies that are working on federated governance setups for a sustainable data-driven future.

#DataMesh #DataProduct #DataOwnership #DataManagement #Data

[/su_expand]

Speaker:

Data Engineering Manager
Zalando

9.55 - 10.10

BREAK

10.10 - 10.35

Plenary Session
And then the magic happens: 9 ways to put your big data platform migration at risk
This session offers learnings from multiple big data platform migration engagements that involved Google's Professional Services organization. Each presented customer case had different constraints, necessitating a variety of approaches to ensure success despite critical risks to the deadline.

Data SCE at Professional Services Organization
Google Cloud Poland
Cloud Customer Engineering Manager
Google Cloud Poland

10.35 - 11.00

Plenary Session
Why data analytics is critical for business at Kambi
[su_expand more_text="Read more" less_text="Read less"]As a Vertica customer for more than 10 years, Kambi tracks and reports on 150+ data sources for 425+ users worldwide, while maintaining regulatory and GDPR compliance in this highly regulated industry. Successful sports betting companies offer their customers many betting options, ease in financial transactions and great customer service, all of which generates repeat business and competitive advantages. Please join Andrew Hedengren as he describes how Vertica delivers a centralized version of the truth for a “simple and scalable” solution.

#BigData #PlatformArchitect #dataanalytics #Vertica #Kambi #dataplatform #dataanalyticssolution

[/su_expand]
Data Platform Architect
Kambi

11.00 - 11.25

Plenary Session
Oh my god Hybrid Cloud! Two or more data architectures fused together, how?
[su_expand more_text="Read more" less_text="Read less"]According to 451 Research, 96% of enterprises are actively pursuing a hybrid IT strategy. That’s great but how can I implement that? I just selected my one and only Cloud provider - so what goes where now? Should I migrate to hybrid right away or in phases? Or maybe there is another way?

This session has been designed for you to learn how CDP Hybrid Cloud delivers freedom of choice - private and public clouds, performance and cost management, security and self-service, flexibility and control. Join us to find out how your job can get a whole lot easier!

[/su_expand]

Speaker:

Solutions Engineer
Cloudera

 

Solutions Engineer
Cloudera

11.25 - 11.50

BREAK

11.50 - 13.20

PARALLEL SESSIONS

Host:

BigData Engineer
GetInData

Host:

Data Engineer
GetInData

Host:

Senior Cloud Data Engineer
GetInData

11.50 - 12.20

Data Engineering

Parallel Session
Cloud infrastructure for human beings
[su_expand more_text="Read more" less_text="Read less"]Using components of the public cloud seems pretty straightforward. You have a nice and modern UI. If you're a console-oriented guy (or freak), there's even a built-in web shell to make you happy. Problems start when each team uses various components, each in a different way, and your task is to make those components meet the company's policies, standardize the deployment process and, in the end, make deployment user-friendly. Let me show you how we've automated this process with the help of Terraform and provided a user-friendly way to build infrastructure that keeps all of its complexity out of sight. All the examples will be based on infrastructure creation tasks for big data processing projects.

#public #cloud #terraform #python #infrastructure #data

[/su_expand]

Speaker:

Data Platform Engineer
Allegro

Artificial Intelligence and Data Science

Parallel Session
Where data science meets software and ML engineering – a practical example
[su_expand more_text="Read more" less_text="Read less"]It is well known that Data Science, Software Engineering and ML/AI are highly intertwined fields. When building future-proof advanced analytical solutions originally devised by data scientists, certain best practices from Software Engineering, e.g. pair programming, are not only handy but also help achieve, e.g., low coupling and high cohesion in ML/AI systems. In this talk we show how this was achieved, using a concrete example of a predictive solution for understanding the impact of production process parameters on final product attributes.

#softwareengineering #SEbestpractices #ArtificialIntelligence #MachineLearning #industry4.0 #crossfunctional

[/su_expand]

Speaker:

Associate Data Scientist
Philip Morris International

Speaker:

Lead Data Scientist Operations
Philip Morris International

Architecture Operations & Cloud

Parallel Session
ING Data Analytics Platform 3 years later. Lessons learned
[su_expand more_text="Read more" less_text="Read less"]Three years ago, the ING Wholesale Banking Advanced Analytics team set an ambitious goal: to gather in one place a curated portfolio of internal data sources together with a large-scale compute platform. At its core was the idea of giving internal projects access to a rich toolset of open-source and industry-standard frameworks and preprocessed data, so they could validate business ideas in a secure exploration environment. Extensive growth, with over 300 projects so far and more than 2000 internal users, proves that advanced analytics capabilities (ML, AI, NLP) should be easily consumable not only by specialised, dedicated teams but also by subject matter experts. Join this session, where Krzysztof shares what are, in our opinion, the key elements of the strategy and what still lies ahead for the ING Data Analytics Platform on the way to having 50% of ING employees take part in it.

#cloudnative #analytics #democratization

[/su_expand]

Speaker:

Technical Lead for Data Analytics Platform
ING Hubs Poland

12.20 - 12.25

TECHNICAL BREAK

12.25 - 12.55

Data Engineering

Parallel Session
Auditing your data and answering the life long question, is it the end of the day yet?
[su_expand more_text="Read more" less_text="Read less"]In this talk I’m going to present the design process behind Nielsen's Data Auditing system, Life Line: from tracking and producing to analysing and storing auditing information, using technologies such as Kafka, Avro, Spark, AWS Lambda functions and complex SQL queries. The data auditing project was one of our main pillars in 2020. The extensive design process we went through paid off and tremendously raised the quality of our data. We’re going to cover:
* A lot of data arrival and integrity pain points
* Designing your metadata and the use of Avro
* Producing and consuming auditing data
* Designing and optimizing your auditing table: what does this data look like anyway?
* Creating an alert-based monitoring system and some other add-ons
* Answering the most important question of all: is it the end of the day yet?

#data #auditing #kafka #architecture #sql

[/su_expand]

Speaker:

Senior Data Engineer
Aidoc

Artificial Intelligence and Data Science

Parallel Session
Understanding Query Semantics at eBay
[su_expand more_text="Read more" less_text="Read less"]User queries at an e-commerce site exhibit a plethora of information, ranging from brands and sizes to intent or even desires. A better understanding of users' intent leads to a better user experience. How can we exploit plain natural language texts to extract semantic and syntactic information? And how can such information help us improve the site's behaviour? We will talk about enhanced language processing and understanding techniques for semantifying queries using advanced sequence learning techniques. We will also discuss how to design an offline evaluation to quantify model performance at a large scale. #nlp #queryunderstanding #ecommerce #sequencelearning #ml

[/su_expand]

Speaker:

Staff Data Scientist
eBay Inc

Data Strategy and ROI

Parallel Session
Building a backbone of a data-driven enterprise: Big Delta Lake
[su_expand more_text="Read more" less_text="Read less"]Becoming a data-driven organization is a hot topic that keeps a lot of companies busy. Many leaders and executives see value in unleashing analytics potential and using it to impact business growth. We are going to talk and share experiences about how we:
• democratized data across the different levels of the organization by consolidating, integrating and automating data workflows into a single Data Lake;
• designed and implemented a cloud-based, scalable and secure architecture;
• impacted business performance by unlocking various use cases;
• executed a project based on the consolidated data for one of the largest nutrition brands.

#Datademocratization #businessimpact #Spark #datalake #digitization

[/su_expand]

Speaker:

Data Science Manager
Reckitt

12.55 - 13.00

TECHNICAL BREAK

13.00 - 13.30

Data Engineering

Parallel Session
10mln events per day from hearing aids to reports - a case study of bigdata analytics with Azure & more
[su_expand more_text="Read more" less_text="Read less"]Hearing aids nowadays can be treated as a cloud of millions of IoT devices that produce a huge number of events. Multiple R&D engineering teams need insights into how their products perform and are used in the field to support better, data-driven product development. This means creating reports that analyse data from hearing aids and from digital products such as mobile apps for wearers and fitting software used by hearing care professionals. However, getting from the devices to the report is not a simple path, especially since we need to take into account GDPR, transformations, data democratization and cost optimization in Azure.

#Azure #EventHub #Datalake #Databricks #Lakehouse #ETL

[/su_expand]
Software Architect
Demant

Real-Time Streaming

Parallel Session
NetWorkS! project - real-time analytics that controls 50% of mobile network in Poland
[su_expand more_text="Read more" less_text="Read less"]The ability to analyze data in real time for a mobile network is crucial for diagnostics and for ensuring the quality of service for end customers. To achieve this, we have built a real-time ingestion and analytics platform that processes 2.2 billion messages a day from mobile network hardware. During the talk we will show how we used Flink and Flink SQL to build this platform. The solution includes the calculation of more than 5000 KPIs and 1500 aggregations defined in SQL, on 750 Kafka topics. We will describe how we manage Flink jobs at scale using Ververica and Kubernetes, how we monitor the platform using ClickHouse and what problems we needed to overcome in the project.

#streaming #flink #real-time #operationalmonitoring #telco

[/su_expand]

Speaker:

Big Data Lead
GetInData

Speaker:

IT Operations Manager
NetWorkS!

Data Strategy and ROI

Parallel Session
Analytics Translator: The New Must-Have Role for Data-Driven Businesses
[su_expand more_text="Read more" less_text="Read less"]What is an analytics translator? What skills and competencies are required? How to build a data-driven business with analytics translators? Do you fit the profile of an analytics translator? Do you like quizzes? #AnalyticsTranslator #DataDrivenBusiness #Transformation #PeopleAnalytics #Data-Science

[/su_expand]

Speaker:

Associate Professor, Data Driven Business & People Analytics
University of Applied Sciences Utrecht

13.30 - 14.25

LUNCH BREAK

14.25 - 16.05

CASE STUDY

14.25 - 14.55

Data Engineering

Parallel Session
Scaling your data lake with Apache Iceberg
[su_expand more_text="Read more" less_text="Read less"]
- Common issues with data lakes
- What is Apache Iceberg, and what problems does it solve?
- Building a CDC archive at Shopify using Iceberg
- Management and considerations when using Iceberg
- Brief intro to what's next on deck for Shopify + Iceberg (Type-1 dimensions using Iceberg's V2 spec with row-level deletion)

#iceberg #datalake #columnardata #dataplatform #CDC

[/su_expand]

Speaker:

Senior Data Developer
Shopify

Architecture Operations & Cloud

Parallel Session
Developing and Operating a real-time data pipeline at Microsoft's scale - lessons from the last 7 years
[su_expand more_text="Read more" less_text="Read less"]Microsoft is a data-driven company. All client-side software is well instrumented and emits telemetry. Designing, developing and operating (in a DevOps model) a big data pipeline that gathers this data at Microsoft's scale (the pipeline has 100k+ Azure cores, 13 Data Centers, hundreds of PBs) is a great learning opportunity. In this presentation I will show what we've learned over the last 7 years and describe the DevOps process we use. This will be a journey spanning our design principles, testing approach, ops mindset (monitoring, automation, continuous improvement), our strategy for rollout across 13 Data Centers, and more.

#devops #telemetry #continuousimprovement #scale

[/su_expand]

Speaker:

Principal Software Engineering Manager
Microsoft

Artificial Intelligence and Data Science

Parallel Session
Eliminating Bias in the Deployment of AI and Machine Learning
[su_expand more_text="Read more" less_text="Read less"]The primary source of bias in machine learning is not in the algorithms deployed, but rather the data used as input to build the predictive models. In this talk we will discuss why this is a huge problem and what to do about it. Different sources of bias will be identified along with possible solutions for remedying the situation when deploying machine learning. We will also speak about the importance of transparency when using machine learning to predict outcomes that impact critical decisions.
• Learn why most predictive models are biased.
• Learn about the sources of bias in predictive models.
• Learn how to reduce the negative impact of potential bias in predictive models.

#MachineLearning #ArtificialIntelligence #EliminationofBias

[/su_expand]

Speaker:

Chief Technology Officer
Teradata Corporation

14.55 - 15.00

TECHNICAL BREAK

15.00 - 15.30

Data Strategy and ROI

Parallel Session
Digital Twins 101
[su_expand more_text="Read more" less_text="Read less"]Keep up with a growing trend in the software world! Learn more about Digital Twins and what role Big Data/AI/XR/IoT and Robotics play in this story. Digital Twin is a broad topic with a number of business areas where it can be applied and a wide list of technologies under the hood. In this session you will learn about a framework intended to handle the underlying complexity and facilitate the design and adoption of Digital Twins; technical aspects and considerations; and real-life examples covering success stories and delivered value. This session may be of interest to different groups: architects, tech-savvy business leaders, Product Managers, technology experts and consultants.

#DigitalTwins #AI/XR/IoT #Robotics

[/su_expand]

Speaker:

AVP Technology
SoftServe

Real-Time Streaming

Parallel Session
TerrariumDB as a streaming database for real-time analytics
[su_expand more_text="Read more" less_text="Read less"]TerrariumDB is a column and row store engine designed specifically for behavioral intelligence and real-time data processing, and it is the core of the Synerise platform. It simultaneously processes data-heavy analytics while executing various business scenarios in real time. TerrariumDB was designed to analyse behavioural data, where data order and time are important for making business decisions. During the talk, we will describe why we are developing our distributed database engine, where the challenges and pitfalls were, which use cases TerrariumDB fits best, and how it handles billions of queries per day where the 99th percentile matters.

#Streaming Database #RealtimeAnalytics #ML #OperationalAnalytics

[/su_expand]

Speaker:

Co-Founder and CTO
Synerise

Artificial Intelligence and Data Science

Parallel Session
Feed your model with Feast Feature Store
[su_expand more_text="Read more" less_text="Read less"]What is a feature store? Why do we need it? How do we use it? In this session, I would like to show how to use the Feast feature store to build a complete MLOps process: starting with fetching historical data and model training, through model versioning and the deployment process, and finally online feature materialization and real-time model inference.

#featurestore #feast #mlops

[/su_expand]

Speaker:

Solutions Architect
Bank Millennium

15.30 - 15.35

TECHNICAL BREAK

15.35 - 16.05

Data Engineering

Parallel Session
Let your analysts build data pipelines on Modern Data Platform using SQL, DBT and the framework developed by GetInData
[su_expand more_text="Read more" less_text="Read less"]Data Engineering used to be a hard problem that only people with a software engineering background could solve. Additionally, the number of use cases and analytics needs waiting to be solved in companies exceeds the capacity of data engineering teams, burdening them and making business departments wait for ages for their use cases to be implemented. On the other side, business departments would love to implement the data pipelines themselves. But for a long time they couldn't do it well, mainly because they lacked the engineering skills required to work efficiently and deliver technical quality. Today, we are witnessing this obstacle disappear thanks to the maturity of modern data platforms and to tools that make it easy to implement pipelines according to the best DataOps practices. Like DBT, if you have heard of it.

Tools like DBT are great, but they are just pieces of a bigger puzzle. What we at GetInData do is take those pieces, these advancements in data tools, and combine them into a coherent, unified data pipelines framework that guides analytics engineers step by step in developing pipelines end-to-end, from the idea to production.

Come to our presentation to hear about modern data platforms, DBT and our framework.

#analytics engineering, #self-service data pipelines, #cloud, #DBT, #SQL

 

[/su_expand]

Speaker:

Software/Data Engineer
GetInData

Real-Time Streaming

Parallel Session
Cloud Native Stateful Stream Processing with Apache Flink
[su_expand more_text="Read more" less_text="Read less"]Stream processing has become vastly popular in the past couple of years. As you're already familiar with Kubernetes, you have the necessary toolbox for deploying and operating low latency mission-critical streaming data pipelines with Apache Flink. After this talk, you'll get a broad understanding of how the complex stream processing applications fit into the cloud-native era and get comfortable with introducing them to your organization.

#k8s #apacheflink #streaming #sql #cloudnative

[/su_expand]

Speaker:

Lead Software Engineer
Ververica

Architecture Operations & Cloud

Parallel Session
COVID-19 is a cloud security catalyst
[su_expand more_text="Read more" less_text="Read less"]Let's discover together how COVID-19 affected cloud adoption and what the most common cloud security mistakes are that teams make.

#security #cloud #adoption #architecture

[/su_expand]

Speaker:

Group Head of Cloud Delivery
Endava

16.05 - 16.30

BREAK

PEER2PEER SHARING

16.30 - 17.30

ROUNDTABLES (ONLINE or ONSITE)

Parallel roundtable discussions are the part of the conference that engages all participants. They serve several purposes. First, participants have the opportunity to exchange opinions and experiences about a specific issue that is important to their group. Second, participants can meet and talk with the leader/host of the roundtable discussion, a selected professional with vast knowledge and experience.

There will be roundtable sessions, so every conference participant can take part in 2 discussions, one on each day of the conference.

 

 

Roundtable discussion
1. The unexpected journey to data cleansing
The data we use is often unstructured and varied. Conducting a simple analysis could become a burden: Where is the data I need? How do I make sure it’s accurate? Why are my queries taking so long? Designing the correct solutions to answer these questions in the optimal way could be painfully challenging, with very few success stories along the way. In this discussion we will share different approaches and solutions for data normalization. We'll meet people who've experienced different issues and hear how they tackled them.

Moderator:

Software Engineer
Neuralight
Roundtable discussion
2. Super-charge your Pandas code with Apache Spark
Pandas is a fast and powerful open-source data analysis and manipulation framework written in Python. Apache Spark is an open-source unified analytics engine for distributed large-scale data processing. Both are widely adopted in the data engineering and data science communities. Even though there’s great value in combining them in terms of productivity, scalability, and performance, this is often overlooked. Join us for a live discussion, where you will hear and share experiences of combining Spark and Pandas to get the best of both worlds! We welcome all levels of expertise, from intermediate to advanced.

Moderator:

Senior Solutions Architect
Databricks

Moderator:

Senior Solutions Architect
Databricks
Roundtable discussion
3. Vector databases and vector search engines
Vector databases store data with vector embeddings, which are computed with Machine Learning models. Indexed vector embeddings enable fast similarity search and retrieval. An open-source vector search engine like Weaviate can be used to do semantic search, similarity search of text, images and other types of unstructured data, one-shot labeling, etc. These features of vector search engines enable you to scale ML models, build recommendation systems or do anomaly detection.

In this discussion we will talk about vector databases. You can learn about vector search engines, share your experiences, get updates on the latest techniques, meet people working in a similar field and get feedback on your ideas. Whether you're new to vector databases or identify as an experienced user, all are welcome to join!

Moderator:

Community Solution Engineer
SeMI
Roundtable discussion
4. CI/CD good practices for data pipelines
[su_expand more_text="Read more" less_text="Read less"]CI/CD has a significant positive impact on software delivery. Typically when building out data pipelines, the extent of our CI/CD process is: to run unit tests, build packages, and deploy. That is a good start, but I believe we can do better. During our session let's try to answer the below questions:

What are good practices for automated tests of data pipelines? What are good practices for the deployment of data pipelines? What additional steps should we run in CI/CD pipelines? What are innovative ideas to further improve CI/CD pipelines? [/su_expand]

Moderator:

Manager of Data Analytics & BI
TrueBlue
Roundtable discussion
5. Data Governance in Modern Organizations
Traditionally, data governance has been defined as managing data integrity and the access of enterprise systems. Usually it consists of a centralized team with a steering committee, data stewards, process workflows and policies. In a traditional environment when you have a centralized data team, this can work well, but in today’s world where each department has their own analysts creating different analyses, such a project will likely fail. We'll discuss how today's organizations are approaching data governance given the fast-changing, decentralized environment of data today.

Moderator:

Founder & CEO
Select Star
Roundtable discussion
6. Dashboarding Nightmares: What most people forget to scope
Organizations are disappointed with the return on investment of their dashboarding efforts. At the same time, trends like natural language querying, data catalogs, and metric stores are emerging. Are dashboards dead, or have we not seen their best days yet?

Moderator:

Analytics Engineer
GoDataDriven
Roundtable discussion
7. From Data Lakes to Data Mesh - applying software architecture paradigms to data
[su_expand more_text="Read more" less_text="Read less"]Over the last few years, multiple data transformation projects included significant investments to provide a centralized data lake, which could swallow any type and quantity of data. There was a promise of a single point of data for analytics and of the democratization of data. However, the concept brought many expensive problems in both implementation and maintenance. Other issues include a lack of clear ownership and division between data contexts, which lowered the return on investment.

These issues were well known to software architects. Many lessons could be shared from software architecture patterns, including sociotechnical, strategic, and implementation patterns. For example, the concept of Data Mesh connects really well with Domain-Driven Design on multiple levels; the two complement each other. The point of our discussion will be to share learnings from applying software architecture patterns to data architecture paradigms.

[/su_expand]

Moderator:

Head of Product Engineering
Revolut
Roundtable discussion
8. AI act - do we need to regulate AI? When and how to do this?
[su_expand more_text="Read more" less_text="Read less"]AI is like nuclear power. On the one hand, it brings enormous opportunities, but on the other, mistakes at scale can bring disaster. A growing number of examples of poorly tested models are dampening enthusiasm by showing the dark side of AI, discriminating, with hard-to-find errors and rapidly declining performance. This has led to work on the AI act, and legal regulations for AI development and deployment.

At this roundtable, we will share best practices and experiences on how to debug for errors in AI systems, how to build AI systems responsibly, how future AI regulations may affect our business, and what to do to manage these risks.

[/su_expand]

Moderator:

Founder
MI2.AI
Roundtable discussion
9. Being an efficient data scientist. What skills, tools, and mindset are needed to become a data master
[su_expand more_text="Read more" less_text="Read less"]Every year, more and more people are taking jobs as data scientists. During this panel, we will try to answer the question of what you need to be an outstanding specialist and what is the fastest way to become one. Discussion points include e.g.: 

What technical skills and best practices are the traits of a great data scientist? Are soft or business skills important, and how do you develop them? How do you learn new skills, and what tips and tricks help you stand out as a data scientist? [/su_expand]

Moderator:

Senior Data Scientist
GetInData

17.30 - 17.35

SUMMARY & PRIZE GIVEAWAY

CEO & Meeting Designer
Evention
CEO and Co-founder
GetInData

18.00 - 22.00

EVENING NETWORKING SESSION | on-site, WARSAW

Let's get together! To talk, to meet new people, to see old colleagues. We invite you for a face 2 face interaction onsite.
More information HERE

28.04.2022 - 2ND CONFERENCE DAY | ONLINE

 

9.30 - 12.00

PARALLEL WORKSHOPS (ONLINE)

Data Vault on BigQuery

 

DESCRIPTION:

SESSION LEADER:

Customer Engineer
Google Cloud Poland

Data SCE at Professional Services Organization
Google Cloud Poland

What is a Data Quality Fabric and what’s in it for you?

 

DESCRIPTION:

SESSION LEADER:

Managing Consultant
Ataccama

Deep Dive into Data Science with Snowflake

 

DESCRIPTION:

SESSION LEADER:

Principal Sales Engineer, Data Science
Snowflake

12.00 - 13.00

BREAK

13.00 - 13.10

OPENING

13.10 - 13.35

KEYNOTE PRESENTATION

Artificial Intelligence and Data Science

Plenary Session
Benefits of a homemade ML Platform
[su_expand more_text="Read more" less_text="Read less"]Building your own platform is often ostracized these days. Everyone is encouraged to reuse existing solutions for known reasons. But using a ready-made platform / tool should not be a mindless process. Reusability is an art. During this presentation, you will learn why we decided to build our own MLOps platform while not re-inventing the wheel by using ready-made components with a touch of custom components. What are the benefits of this, but also what limitations and hurdles we have encountered. We hope that our experience will help you make the right decisions in your projects. Sometimes, maybe more risky ones.

#ML/AI, #MLOPS, #GOOGLE CLOUD, #OPEN-SOURCE

[/su_expand]

Speaker:

Software developer
GetInData

 

Data Science Lead @ Search
Truecaller

13.40 - 14.10

PARALLEL SESSIONS

DATA ENGINEERING

Parallel Session
The Lakehouse - a new architecture to unify your data warehousing and AI use cases
[su_expand more_text="Read more" less_text="Read less"]The Lakehouse is a new architecture that enables Business Intelligence and Artificial Intelligence directly on vast amounts of data stored in Data Lakes. Its open architecture combines the best elements of Data Lakes and Data Warehouses, while providing support for both batch and realtime processing. The Lakehouse is enabled by a new system design: implementing data structures and data management features (similar to those in a Data Warehouse) directly on top of low cost cloud storage in open formats (typically used by Data Lakes). This unified approach simplifies the modern data stack by eliminating the data silos that traditionally separate and complicate data engineering, analytics, BI, data science and machine learning. The Lakehouse is built on open source and open standards to maximize flexibility.

[/su_expand]

Speaker:

Solution Architect
Databricks

Architecture Operations & Cloud

Parallel Session
Fine-Tuning Kubernetes Clusters for Data Analytics
[su_expand more_text="Read more" less_text="Read less"]Kubernetes is an excellent vehicle for driving your analytics, big-data, and ML workloads, but a one-size-fits-all configuration won’t give you all the benefits the platform provides. I want to walk you through the things to consider when creating a high-performance Kubernetes cluster for your workloads, from resource utilization and topology to application settings and consumption of hardware components.

[/su_expand]

Speaker:

Cloud Lead
Mindbox

Architecture Operations & Cloud

Parallel Session
Rise up and reach the cloud - evolution of a modern global healthcare data warehouse
[su_expand more_text="Read more" less_text="Read less"]Processing Electronic Medical Records (EMRs) is super complex in itself, but it gets infinitely more complex when you want to do it at scale, covering multiple countries (and even continents). How did the IQVIA EMRs Factory team build what is probably the largest global Electronic Medical Records data platform in the world? How was a combination of Cloudera, Spark, Kafka and Looker used to build a solid foundation, and where are we taking it now on a hybrid-cloud architecture? How and why are we evolving, adopting Snowflake and Databricks to enable rapid business growth? We'll talk about what drove our design decisions (including requirements critical in healthcare like privacy, security and governance) and what we learned over the years running and modernising the platform. We'll also cover the importance of data residency and its impact on the system architecture.

#Cloudera #Spark #Scala #Java #Looker #Cloud #Snowflake #Databricks

[/su_expand]

Speaker:

Senior Technical Architect
IQVIA

14.15 - 14.45

CASE STUDY

DATA ENGINEERING

Parallel Session
Simplifying Data Architectures with Snowflake’s Snowpark
[su_expand more_text="Read more" less_text="Read less"]Increasingly the traditional boundaries between core data platforms and ML platforms, between data lakes and data warehouses, between analysts and data scientists are blurring. Simultaneously many IT organizations are under increasing pressure to simplify what have become incredibly complex data architectures. Both of these trends run perpendicular to data democratization which brings new users with new language- and framework-specific needs. In this session you will see why Snowflake’s Snowpark extensibility framework is receiving so much attention from the market as it allows users to talk to their data in the language and framework of choice without moving and copying governed data to secondary compute platforms. Snowpark is allowing Snowflake's customers to decrease governance risk, reduce cost and complexity and bring the power of the data cloud to many new users with broader requirements.

#python #ML #Snowflake #dataplatform #datascience #dataframe

[/su_expand]

Speaker:

Principal Data Platform Architect, Field CTO Office
Snowflake

Real-Time Streaming

Parallel Session
Ingesting trillions of events per day with Apache Spark
[su_expand more_text="Read more" less_text="Read less"]Streaming trillions of messages per day (and meeting the agreed SLAs) is a challenging job. In this session I will present how we are handling some critical aspects like reliability, performance, skewed data, debugging and performance testing. This session is targeted at software engineers passionate about performance and handling large amounts of data.

#spark #streaming #trillions

[/su_expand]

Speaker:

Software engineer
CrowdStrike

Architecture Operations & Cloud

Parallel Session
Data Quality on Data Lakehouse: Implementation at Point72
[su_expand more_text="Read more" less_text="Read less"]Data quality is a key measure of business success for all organizations, particularly those that are data-driven and operate in the financial sector. In this session I will present the implementation of a data quality framework on our cloud platform, focusing on the business needs of our financial organization. We will talk about:
• main goals of data quality in fintech
• estimation of potential data issues
• integration of the data quality framework into the cloud
• data quality checks applied at each step of data processing

#dataquality #dataplatform #aws #microservices

[/su_expand]

Speaker:

Data Platform Associate
Point72

PEER2PEER SHARING

14.45 - 15.40

ROUNDTABLES (ONLINE)

Parallel roundtable discussions are the part of the conference that engages all participants. They serve a few purposes. First, participants have the opportunity to exchange opinions and experiences about a specific issue that is important to the group. Second, participants can meet and talk with the leader/host of the roundtable discussion; the hosts are selected professionals with vast knowledge and experience.

There will be roundtable sessions, so every conference participant can take part in 2 discussions, one on each day of the conference.

 

Roundtable discussion
1. Hadoop is legacy. Does the future belong to cloud data warehouses and delta lakes?
[su_expand more_text="Read more" less_text="Read less"]• What impact does the trend towards cloud warehouses (like Snowflake) and delta lakes (like Databricks) have on existing data platforms?
• How are you using these solutions - migrating legacy workloads, implementing new use cases?
• Are you fully committed to one platform (and using all native features, e.g. Snowpark and Databricks Delta Live Tables) or trying to stay platform agnostic?
• Which features have the biggest impact on your use cases (elasticity, speed, data sharing, ML integration, built-in jobs orchestration, …)?
• What roadblocks have you hit when implementing Snowflake and Databricks?

#Snowflake #Databricks #Cloud

[/su_expand]

Moderator:

Director, IT Architecture, Real World & Analytics Solutions
IQVIA
Roundtable discussion
2. The Data Lakehouse – just another buzzword? Or a concept you can really use?
[su_expand more_text="Read more" less_text="Read less"]The way things get named in this industry can be pretty odd. A lot of people have been hearing noise about how cool a “data lakehouse” is, since it combines the power of a data lake with the power of a data warehouse. There are two ideas at work here, having to do with “big data” – the need to store massive quantities of information in the lake, and the need to analyse that data in the warehouse. The business value becomes clear when you have a properly constructed data lakehouse at your disposal. You can:
• Support data science as well as BI using the same data sets
• Access all of your data, wherever it may reside
• Save on the labour and cost of moving data from one place to another, just to do analytics.

OR, why not go beyond this and unify how all teams do analytics? Why not unify user experiences and ways of accessing the data by building the “Unified Analytics Environment” to simplify the architecture and manageability of the environment?

Let’s discuss the advantages of investing in a data lakehouse and challenges related to it, and see how it can make a big difference to your organization.

[/su_expand]

Moderator:

Solution Engineer
Vertica
Roundtable discussion
3. CloudOps: Specialization or standardization?
[su_expand more_text="Read more" less_text="Read less"]What's the optimal strategy for building operations teams in the multi- and hybrid-cloud era? Are we going to see ops-related engineering roles like DevOps, DataOps or MLOps specialize even more and drift apart across the spectrum of responsibilities? Or should we rather expect them to consolidate, which will mandate a new way of leveraging cross-discipline skilled CloudOps talent as a foundation of the operations teams? And how does it actually relate to hyper-automation, software-defined infrastructure and massive adoption of data analysis across organizations? Let's meet and talk about the challenges we will face as Big Data specialists and managers in this vivid and emerging area of high-complexity technology, limited human resources and endless business opportunities.

[/su_expand]

Moderator:

Chief Technology Officer
3Soft
Roundtable discussion
4. Engineers to engineers - Everything you always wanted to know about Big Data at Microsoft
Are you curious about what an engineer’s life at Microsoft looks like? Would you like to know how our engineering teams work, learn, and collaborate effectively? Interested in how we build our services and solutions for Big Data workloads? If you answered YES to any of these questions, this roundtable is for you. Join our hosts from engineering teams at this “ask me anything” session.

Moderator:

Principal Software Engineering Manager
Microsoft

 

Senior Program Manager
Microsoft

 

Group Product Manager – Azure Engineering
Microsoft
Roundtable discussion
5. CI and CD in Data Science projects
[su_expand more_text="Read more" less_text="Read less"]Continuous integration and delivery can be a challenging topic when it comes to Data Science projects. In many projects this area is neglected, yielding an increased number of bugs and issues in the provided results. In some cases this causes projects to fail.

How does your team approach continuous integration and delivery in Data Science projects? Which tools/frameworks are in use? What kind of testing/QA do your teams apply? Do you impose any data quality restrictions? What level of test coverage should be preferred?

[/su_expand]

Moderator:

Senior Data Scientist
GetInData

15.40 - 16.45

CASE STUDY

15.40 - 16.10

Data Strategy and ROI

Parallel Session
Implementation of BigData Platform in Digital vs Analog Financial company - case study.
[su_expand more_text="Read more" less_text="Read less"]A retrospective overview of the process, methodology and implementation details of a Data Platform in a new, young digital company versus an old-fashioned global Wall Street financial institution.

#BigData #TechLeadership #Global #Projectimplementations

[/su_expand]

Speaker:

Head of Architecture, Data and Infrastructure
W1TTY

Artificial Intelligence and Data Science

Parallel Session
Digging the online gold - interpretable ML models for online advertising optimization
[su_expand more_text="Read more" less_text="Read less"]The targeting of online advertising is like gold seeking - in our case, the interpretable machine learning models serve as the diggers that discover the most profitable targeting criteria for a particular campaign to maximize the final profit. In an environment consisting of multiple advertising products and traffic counted in millions of PV daily, they need to be fast and reliable to find the best spot before the others do.

In this talk, I will present how we met the business goals of KPI maximization while ensuring scalability and stability of the solution. The journey starts with presenting the kind of gold that we search for, defined by the business context of online advertising. Next, I will present our gold-digging machine learning techniques and discuss technical details of architecture design and implementation in the AWS environment integrated with our in-house ad server. Finally, I will discuss the performance of the deployed models, which are constantly monitored in production.

#Machine Learning #Realworldsystem #Adoptimization #Modelinterpretability

[/su_expand]

Speaker:

Data Scientist
Ringier Axel Springer Polska

Architecture Operations & Cloud

Parallel Session
Managing Microsoft Azure at a leading capital markets fintech
[su_expand more_text="Read more" less_text="Read less"]As a global bank we need to follow strict financial regulations and certifications. Our goal is an agile organization, where developers can provision and operate cloud resources, while we stay in control and remain compliant. We provide a controlled development environment, where guardrails enforce a high level of security while we stay agile. This includes custom pipeline tasks, template repositories and automated change control. Get an insight into how we control resources, security and cost in automated processes.

#azure #devops #compliant #agile #selfservice

[/su_expand]

Speaker:

Chief Cloud Engineer
Saxo Bank

16.15 - 16.45

DATA ENGINEERING

VOD
Analytical cubes in the service of data analysis
[su_expand more_text="Read more" less_text="Read less"]In the world of AI, ML and Big Data analysis, we have forgotten about our main clients: people who are not interested in querying databases or waiting for the result of data preparation, and who want to easily play with data themselves. In this presentation, I will explain:
- what analytical cubes are
- who will use them and how they can do it
- what the differences are between Apache Kylin and Microsoft Analysis Services, and how to prepare a cube in these environments

#kylin #analysisservices #analyticalcubes #ssas #apache

[/su_expand]

Speaker:

Engineering Manager for the Data Team
OLX Group

Artificial Intelligence and Data Science

Parallel Session
Learning From Experiments Without A/B Testing - Case Study From Willa (Swedish FinTech)
[su_expand more_text="Read more" less_text="Read less"]One of the more annoying challenges product businesses face is in establishing causal relationships while trying to determine the impact of making product changes. Frequently, A/B testing is used to this end, but that is expensive and time consuming for engineering and design teams, and often requires tooling as well as randomization. Here at Willa (a payments and invoicing app for US-based freelancers), we used an econometric technique called Difference-In-Differences (D-I-D) regression to tease out the causal impact of simplifying the invoicing process using our app, on the invoice creation rate per user. We took advantage of natural variation in product usage between different kinds of users, and were able to reach statistically significant results cheaper and faster than via A/B testing.

The technologies used were BigQuery and Jupyter/Databricks, and GCP more broadly, although the approach is platform agnostic. Overall, the presentation will be interdisciplinary in nature, so anyone interested in economics, data science and engineering, cloud technologies and general dynamics of product businesses is welcome to attend.

[/su_expand]

Speaker:

Data Scientist
Willa

Speaker:

Director of Data Science
Willa

Architecture Operations & Cloud

Parallel Session
Lessons Learned from Containerizing Data Infrastructure at Uber
[su_expand more_text="Read more" less_text="Read less"]Since Data infrastructure was set up at Uber, we have been managing our own server fleet. Age-old practices of managing hosts posed several challenges that stood in the way of innovation. We did an entire ground-up re-architecture of our deployment stack, embraced the DevOps model and automated away operational tasks. This effort gained us a lot of benefits across several areas (efficiency, security, etc.) and strategically positioned us to leverage the cloud.

In this talk, we'll briefly discuss the challenges we faced as part of our containerization journey, our strategies/solutions to overcome these challenges and mainly focus on lessons we learned along the way.

#containerization #architecture #devops #automation #massive-migration

[/su_expand]

Speaker:

Sr Software Engineer II
Uber

16.50 - 17.20

Parallel Session
How to accelerate your data-driven journey with an analytics framework?
What is a framework? How can the framework help you achieve a faster start? How can working with frameworks help you achieve long-term speed and agility?

#AnalyticsFramework #BusinessValue #EndToEnd #Scaling

Speaker:

Director of data & advanced analytics
ICA Gruppen

17.20 - 17.30

SUMMARY & CLOSING

CEO & Meeting Designer
Evention
CEO and Co-founder
GetInData

ONLINE EXPO + KNOWLEDGE ZONE

Free participation
We have a great set of presentations in the CONTENT ZONE, available in advance to conference participants as pre-recorded Video on Demand.

VOD
How to keep the Data Lake clean instead of ending up with the Data Swamp using Data Layers a.k.a Bronze / Silver / Gold
[su_expand more_text="Read more" less_text="Read less"]1) When Data Lake becomes a Data Swamp
2) Data access patterns and sharing - what could possibly go wrong?
3) Why do we need to introduce Data Layers
4) Bronze / Silver / Gold Data Layers and the concept behind it
5) How to make it work - tips and tricks.

#data_swamp #good_practices #sharing_data #thinking_about_users #data_lineage

[/su_expand]

Speaker:

CEO and Co-founder
Datumo
VOD
Deduplication and entity resolution with Zingg: open source tool using Spark and ML?
[su_expand more_text="Read more" less_text="Read less"]- Where entity resolution and deduplication are needed
- Why it is a tough problem to solve
- Demonstration of open source technologies to solve this

#entityresolution #recordlinkage #deduplication #dataquality #ml

[/su_expand]

Speaker:

Founder
Zingg
VOD
Introduce Azure into your environment
[su_expand more_text="Read more" less_text="Read less"]- A look at the features within Azure that can help add functionality to your existing environment
- A look at setting up your Azure environment, with some best-practice advice
- Cost-saving advice

#AzureArc #Hybrid #Azure #WindowsServer

[/su_expand]

Speaker:

VOD
Microservice Data Lakehouse
[su_expand more_text="Read more" less_text="Read less"]- Microservices
- Industrialized data ingestion
- Data virtualization
- Data privacy
- Operational data store

#Kubernetes #microservices #spark #datalakehouse

[/su_expand]

Speaker:

Chief Architect
Point72

Speaker:

Tribe Technical Lead
T-Mobile PL
VOD
See what’s underground via Machine Learning eyes (powered by cloud solutions)
[su_expand more_text="Read more" less_text="Read less"]- How to transform from building one deep learning model per month into evaluating and deploying hundreds of them in a single week?
- Building an MLOps solution with CI/CD practices using CDK
- How to detect underground structures based on a bunch of radar signals and no labels?
- Should we avoid manual steps in the automatic Machine Learning pipeline?
- Can we use lambda aliases to differentiate between dev and prod environments?

#MLOps #CDK #DeepLearning #Automatisation #Python

[/su_expand]

Speaker:

Junior Data Scientist
SGPR.TECH

Speaker:

Senior Data Scientist
SGPR.TECH
VOD
AdTech Big Data in 10 minutes
[su_expand more_text="Read more" less_text="Read less"]Watch this short 10-minute video to see how technologies like Kafka, .NET Core, Scala and Java are used in AdTech, and learn more about the journey of data in AdTech based on the Adform example. The presentation goes through the 5 V’s of Big Data (volume, variety, velocity, veracity, value), shows where all this data comes from, and gives a glance at the data pipelines.

#AdTech #BigData #Adform #DataPipelines #kafka #scala #.NET #java

[/su_expand]

Speaker:

Big Data Group Director
Adform
VOD
How Incorta Direct Data Platform can change your business
[su_expand more_text="Read more" less_text="Read less"]Traditional data platforms are complex, expensive and slow to deliver business value. Why not consider a solution that your business users would love to use? Why does a typical analytical project have to take months to deliver? Let me show you, in action, how a single, fully self-service platform can provide all services orchestrated together to address your analytical needs!

#analytics, #bigdata, #ROI, #datadriven, #selfservice

[/su_expand]

Speaker:

Head of Presales and Professional Services
MDSap
VOD
Implementing AI Strategy. You should know – ML is not enough
[su_expand more_text="Read more" less_text="Read less"]AI is as much a management challenge as it is a technological challenge. There has been, and still is, a big debate about data-driven management. We've long been encouraging managers and decision makers to use whatever data they have available, to draw insights and to make better-informed and verified decisions. As a result we can observe an increase in Data Analytics and Data Science employment. What we can also observe is the increasing complexity of systems, tools, applications, regulations and policies around their day-to-day work. Now is the time to encourage everyone to think about how to organize and manage all of the teams, tools, pipelines, and applications that we've brought to life in Data Science.

#ai, #strategy, #mlops, #data science, #ml, #management

[/su_expand]

Speaker:

Centre of Expertise - AI
ING Bank Śląski
VOD
Implementing Augmented Analytics Platform
[su_expand more_text="Read more" less_text="Read less"]During the presentation we will show how we delivered an Augmented Analytics platform implementation project for one of our clients. We will describe the client’s challenges and how the platform was able to solve them.

#Business Intelligence, #AI, #Artificial Intelligence, #Augmented Analytics

[/su_expand]

Speaker:

Advanced Data Analytics Competence Center Manager
ASTEK Polska
VOD
Data Preparation for Machine Learning
[su_expand more_text="Read more" less_text="Read less"]Data preparation is a key phase in any machine learning/data mining process. In most cases it is complicated and undocumented, and there are serious issues in moving the whole process from the development environment to the production environment. I want to show that, with the right tools, it does not have to be a problem. Ab Initio is a platform that allows for processing, transforming and managing data in every single aspect, from low-level details to high-abstraction concepts. At the core of that process are graphs, which are responsible for data transformations. They provide a declarative, streaming approach to data processing. During this short presentation I will show how to load the data, process it into the required form, and use it to build an ML model. This approach is so flexible that it will be useful for many people with different technical skills. Programmers can use graphs as a normal programming language, data scientists can operate on data without touching programming details, and managers can understand the whole process and have full documentation of it.

#AI/ML #mlops #datamining #ETL #declarativeprogramming

[/su_expand]

Speaker:

Field Consultant
Ab Initio
VOD
Ataccama ONE Platform
[su_expand more_text="Read more" less_text="Read less"]Introduction to Ataccama ONE, an AI-powered metadata-driven data management and governance platform.

#DQ #MDM #DG #DataQuality #MasterDataManagement #DataGovernance #DataFabric #AI #Automation #AtaccamaONE

[/su_expand]

Speaker:

Managing Consultant
Ataccama

BIG DATA TECHNOLOGY
WARSAW SUMMIT 2022

April 26th-28th, 2022
Let's go virtual!

ORGANIZER

Evention sp. z o.o
Rondo ONZ 1 Str,
Warsaw, Poland
www.evention.pl

CONTACT

Weronika Warpas
m: +48 570 611 811
e: weronika.warpas@evention.pl

© 2022