Webinar online 2023 - Big Data Technology Warsaw Summit
JOIN Online Webinars and WIN a ticket for Big Data Tech 2023!
Big Data Technology Warsaw Summit is around the corner, but before we meet in Warsaw, we are happy to invite you to two free online webinars!
We have prepared two events for you, each with two presentations in the fields of data, analytics, ML and cloud. On February 16th we will host guests from Big Data Institute and Dremio.io; on March 9th, experts from GetInData | Part of Xebia and QuantumBlack. Please check the details of the presentations below.
During the events you will be able to exchange knowledge with the experts, but also take part in our competition and WIN an invitation to the Big Data Technology Summit 2023, plus access to the recordings from the last edition!
Sign up once and get an invitation to both of them!
Agenda
Apache Airflow is the core of many data warehouses. It's a very mature technology, proven to be stable and flexible, a Swiss Army knife for Data Engineers. In 2021 the project bumped its major version from 1.X to 2.X, modifying all of the components and making the upgrade process non-trivial, especially when company-critical processes run on it. And that's exactly the challenge we recently faced with one of the major GetInData clients.
The legacy Airflow 1.10.5, extended with many plugins and custom operators, was managed by Ansible and running on VMs. We not only bumped the version, but also moved the components to run entirely on Kubernetes. During the webinar I'd like to share with you the challenges we faced, what we did to mitigate the risks, and what didn't go exactly as planned. If you're still on Airflow 1.X, the content will help you transition smoothly. But even if you're not an Airflow 1.X user, I will convince you why upgrading components as often as possible is key to success.
The pandas library is one of the key factors that enabled the growth of Python in the Data Science industry, and it continues to help data scientists thrive almost 15 years after its creation. Because of this success, several open-source projects now claim to improve on pandas in various ways. Polars is one of those new dataframe libraries: it’s backed by Arrow and Rust, and offers an expressive API for dataframe manipulation with excellent performance. In this webinar I will show you how to combine Polars for your data manipulation needs with Kedro, a data science framework that will help you write more maintainable code.
Check out the recordings of the Big Data Tech webinar experts!
To get access to the recordings, just complete the application form - a link to the recordings will be sent to your email address.
Data Lakes have been built to democratize data - to allow more and more people, tools, and applications to make use of data. A key capability needed to achieve this is hiding the complexity of the underlying data structures and physical data storage from users. The de-facto standard has been the Hive table format, which addresses some of these problems, but falls short at data, user, and application scale.
Apache Iceberg is an open table format designed specifically to address these problems. However, querying hundreds of petabytes of data demands optimized query speed, especially as data accumulates over time. We have to ensure that queries remain efficient, because over time you may end up with a lot of small files and your data might not be optimally laid out for queries.
In this talk, we will go through the various data and file optimization strategies available by default in Apache Iceberg, such as compaction, hierarchical sorting and Z-order clustering, that help achieve robust, fast performance in data lakes.
Specifically, we will cover:
- Small file problem in Iceberg: Compaction strategy
- Reorganization of data within data files
- Sorting, Hierarchical sorting
- Problems with normal sorting strategies
- Z-order clustering for multiple dimensions
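To build intuition for the last point, here is a toy, self-contained sketch in plain Python (not Iceberg's actual implementation) of Z-order, i.e. Morton-code, interleaving: the bits of two columns are interleaved into a single sort key, so rows that are close in both dimensions land close together on disk, letting queries on either column skip more files:

```python
def z_order_key(x: int, y: int, bits: int = 8) -> int:
    """Interleave the bits of x and y into a single Morton code."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # bit i of x -> even position
        key |= ((y >> i) & 1) << (2 * i + 1)  # bit i of y -> odd position
    return key

# Sorting 2-D points by their Morton code keeps spatial neighbours together.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)]
ordered = sorted(points, key=lambda p: z_order_key(*p))
print(ordered)  # → [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)]
```

Iceberg applies the same idea when rewriting data files with a Z-order sort strategy, so that min/max statistics on multiple columns stay useful at the same time.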
Successful data projects are built on solid foundations. What happens when we’re misled or unaware of what a solid foundation for data teams means? When a data team is missing or understaffed, the entire project is at risk of failure.
This talk will cover the importance of a solid foundation and what management should do to fix a weak one. To do this, I'll be sharing a real-life analogy to show how we can be misled and what that means for our success rates.
We will talk about the teams within data teams: data science, data engineering, and operations. This will include detailing what each team is, what it does, and the unique skills it requires. It will also cover what happens when a team is missing and the effect on the other teams.
The analogy comes from my own experience with a house that had major cracks in its foundation. We were simply going to remodel the kitchen; we were never told about the cracks, and the house needed a completely new foundation. In a similar way, most managers think adding advanced analytics such as machine learning is a simple addition (remodeling the kitchen). However, management is never told that you need all three data teams to do it right. Instead, management has to go all the way back to the foundation and fix it. If they don't, the house (the team) will crumble under the strain.
Registration
"*" indicates required fields