Agenda of 2024 Edition
Our agenda is packed with presentations arranged into 9 categories – find the topics that interest you most!
9.04.2024 - WORKSHOP DAY
Registration and networking breakfast for onsite participants
Registration of participants at room 12 in the ADN Conference Center on the 4th floor
PARALLEL WORKSHOPS (independent workshops, paid entry)
Join us for a one-day workshop on Generative AI and large language models. This event aims to provide participants with in-depth knowledge of the latest advancements in natural language processing, computer vision, and machine learning techniques for Gen AI.
The workshop will explore real-life applications of large language models using cutting-edge models such as GPT, PaLM, Gemini, and open-source LLMs. Participants will also learn how to use industry-standard LLMs with APIs, fine-tune models on their data, and deploy private LLM-based assistants.
Upon completing the workshop, attendees will gain a comprehensive understanding of integrating Generative AI into data solutions.
In this one-day workshop you will learn how to build streaming analytics apps that continuously deliver instant results on data-intensive streams. You will discover how to configure streaming pipelines, transformations, aggregations, and triggers using SQL and Python in a user-friendly development environment built on open-source tools: Apache Flink, Apache Kafka, and GetInData OSS projects.
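For a flavour of what participants will build, here is a minimal PyFlink sketch of a continuous SQL aggregation over a Kafka topic. It is an illustrative sketch only, assuming a local Kafka broker and the Flink SQL Kafka connector JAR on the classpath; the topic name and schema are made up, not part of the workshop material.

```python
# Minimal Flink SQL streaming pipeline sketch (assumes a local Kafka broker
# and the flink-sql-connector-kafka JAR on the classpath; schema is illustrative).
from pyflink.table import EnvironmentSettings, TableEnvironment

env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: a Kafka topic of raw click events (hypothetical schema).
env.execute_sql("""
    CREATE TABLE clicks (
        user_id STRING,
        url     STRING,
        ts      TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'clicks',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json',
        'scan.startup.mode' = 'latest-offset'
    )
""")

# Continuous aggregation: clicks per user in 1-minute tumbling windows.
result = env.sql_query("""
    SELECT user_id,
           TUMBLE_START(ts, INTERVAL '1' MINUTE) AS window_start,
           COUNT(*) AS clicks
    FROM clicks
    GROUP BY user_id, TUMBLE(ts, INTERVAL '1' MINUTE)
""")
result.execute().print()
```

The query keeps running and emits updated per-user counts as new events arrive on the topic.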
Meeting and dinner for all speakers and panelists of Big Data Technology Warsaw Summit 2024!
We only invite people actively participating in the conference agenda. Participation for speakers and panelists of the Big Data Technology Warsaw Summit conference is free of charge.
Separate registration is required for the meeting.
10.04.2024 - FIRST CONFERENCE DAY | HYBRID: ONLINE + ONSITE
Registration of participants at the hotel
PLENARY SESSION
OpenAI’s mission is to build safe AI, and ensure AI’s benefits are as widely and evenly distributed as possible. Building on the foundation of our API released in 2020 and the insights gained from our Labs platform with DALL-E 2, we have expanded our reach and impact in the field of artificial intelligence. The rapid adoption of ChatGPT presented challenges in scaling and addressing user needs, but it has also highlighted the potential of AI to impact a broad audience. Join us as we discuss how we navigated the challenges of scaling, and the exciting trajectory that lies ahead.
The presentation addresses the hype surrounding Generative AI technologies and the imperative to construct an end-to-end platform for AI readiness. Highlighting infrastructure, platform architecture, and models as foundational pillars, the discussion emphasizes the need for sustainability and security across the AI stack. It navigates through the complexities of integrating Generative AI practically across industries, unveiling innovative strategies for long-term viability and progress in the AI landscape. Join us as we delve into this transformative journey, paving the way for AI as an ally in our technological advancement.
#bigdata #cloud #AI #insights
In this presentation, we delve into the realm of Machine Learning Operations (MLOps), focusing on the practical lessons learned at Roche. We'll discuss the nuances of assembling MLOps teams, building MLOps platforms, and the intricacies involved in deploying machine learning models into production.
#MLOps #Production #Deployment #DevOps
Hadoop has played an outstanding role in shaping the big data landscape, with numerous Hadoop clusters deployed across enterprises. However, managing these clusters comes with challenges such as complexity, high operational costs, and limitations in scaling on-premises infrastructure. These impact the optimization and efficiency of big data workloads. As a result, companies are increasingly turning to cloud-based solutions that offer alternatives to traditional Hadoop setups, avoiding dependencies on native Hadoop YARN schedulers.
Databricks provides an enterprise-ready Spark runtime within a cloud-native ecosystem and modern infrastructure, offering a compelling alternative. However, migrating from Spark on Hadoop to Databricks presents its own set of challenges. In this presentation, we'll dive into the motivations behind such migrations and explore the complexities involved in making the transition at IQVIA.
#bigdata #spark #hadoop #yarn #databricks #cloud #autoscaling #devops
The panel will focus on the practical applications and implementation of artificial intelligence (AI). The discussion will revolve around how organizations and businesses are leveraging AI technologies to enhance productivity, solve complex problems, and drive innovation. Panelists will discuss challenges, solutions, success stories & failures, the impact on job market and workforce, new opportunities and address the future trends and advancements in AI.
BREAK
PARALLEL SESSIONS
AB Grand Ballroom
(2nd floor)
EF Grand Ballroom
(2nd floor)
Wawel & Syrena
(2nd floor)
Wisła (3rd floor)
Parallel session no 1
Unveiling the integration of Databricks and Delta Lake to orchestrate a seamless Data Mesh architecture. Strategies for developing and implementing a user-friendly data catalog, enhancing searchability and accessibility. Lessons learned in elevating data quality, with a focus on real-world applications and measurable outcomes. Insights into fostering a culture of data observability and its impact on data-driven decision-making.
Parallel session no 2
High-quality data is the backbone of data-driven decision making. Unfortunately, in today's world we often focus on technical complexity and rarely define clear expectations for the data we process, let alone evaluate those expectations automatically as part of our data processing frameworks. In this talk we will describe how HelloFresh went from paper agreements around data that were neither enforceable nor up to date, to config-driven, automatable data contracts that significantly lower the entry barrier to easy governance and automatic enforcement of strong data quality guarantees. We will showcase different types of data quality measurements, from pre-flight checks that prevent breaking changes from reaching production to post-publishing checks that prove the quality of data content.
#data #contracts #trust #quality #automation
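As an illustration of the config-driven idea (not HelloFresh's actual implementation), here is a minimal sketch in which a data contract is a plain config dict and a pre-flight check evaluates it against a pandas DataFrame:

```python
# Illustrative config-driven data contract check; schema and rules are hypothetical.
import pandas as pd

contract = {
    "columns": {
        "order_id": {"dtype": "int64", "nullable": False},
        "amount":   {"dtype": "float64", "nullable": False, "min": 0},
    }
}

def enforce(df: pd.DataFrame, contract: dict) -> list[str]:
    """Return a list of contract violations (empty means the data passes)."""
    violations = []
    for col, rules in contract["columns"].items():
        if col not in df.columns:
            violations.append(f"missing column: {col}")
            continue
        if str(df[col].dtype) != rules["dtype"]:
            violations.append(f"{col}: expected {rules['dtype']}, got {df[col].dtype}")
        if not rules.get("nullable", True) and df[col].isna().any():
            violations.append(f"{col}: contains nulls")
        if "min" in rules and (df[col] < rules["min"]).any():
            violations.append(f"{col}: values below {rules['min']}")
    return violations

df = pd.DataFrame({"order_id": [1, 2], "amount": [9.99, 24.50]})
assert enforce(df, contract) == []  # pre-flight check passes
```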
Parallel session no 3
Handling the complexity of integrating multiple data sources in various formats into an organization's single source of truth. How to provide operational and analytical approaches to the collected data.
#challenge #complexity #DataPlatform
Parallel session no 4
In the magnitude of travel data you may find clues that uncover global events. We learned this during the COVID-19 pandemic: we witnessed the outbreak, saw increased flight traffic to the future European disease epicenter, followed changing country regulations, and then the industry's recovery. In our presentation we will guide you through COVID history seen from the flight big data perspective (8 PB of data daily that had to be processed, cleansed, and analyzed), and we will show how we extracted the most crucial elements from this wealth of data.
Parallel session no 5
A presentation covering NatWest Group's MLOps platform, starting with the problems to overcome and ideas on how they were solved, and ending with the implementation – focusing on the Data Scientist's and ML Engineer's points of view.
#mlops #sagemaker
Parallel session no 1
From “hello world” to the real world. We will present our FMCG customer's journey that resulted in a complex retail AI-driven solution built on the Cloudera stack.
What architectural decisions were made? Where did we succeed? What challenges did we face?
#ML & Data Science #MLOps #LLMOps #AIOps
Parallel session no 2
Data Fabric architecture harmonizes the data landscape across on-premise, cloud, and hybrid environments, offering numerous benefits including enhanced data quality, improved data visibility, and faster time to market. However, integrating metadata across the corporate landscape creates ongoing challenges. In our session, we will share our experiences with clients from pharmaceutical and finance verticals on overcoming these obstacles through the implementation of data products and data contracts within the Dawiso platform.
#datagovernance #datafabric #data #datastrategy
Parallel session no 3
In this presentation Jonathan will focus on why data agility is critical to enduring organizational value, the typical challenges enterprises face, and how thinking about data ecosystems from first principles can identify new assets that are fundamental to driving automation and creativity.
#dataecosystem #activemetadata #datastrategy #dataproduct
Parallel session no 4
Application of Machine Learning methods in dealing with data quality issues, with examples from the Credit Risk domain.
#dataquality #machinelearning #creditrisk
Parallel session no 5
Let me tell you a story about how we automated the ML training pipeline for the intent detection model at Zendesk. It is a critical ML model for the next generation of Zendesk products that already helps our customers more than 1 million times every day, so the ML training process needs to be reliable, trustworthy, and compliant with data protection regulations in different countries.
We decided that we would use Metaflow and apply best MLOps practices to accomplish that goal. Did it live up to its promise?
#Machine_Learning #MLOps
#Architecture_Operations&Cloud
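As a rough picture of what an automated Metaflow training pipeline looks like, here is a minimal flow sketch; the steps, data, and baseline model are placeholders rather than Zendesk's actual intent-detection pipeline.

```python
# Minimal Metaflow flow sketch; data and model are illustrative placeholders.
from metaflow import FlowSpec, step

class IntentTrainingFlow(FlowSpec):

    @step
    def start(self):
        # In reality this would load versioned, compliant training data.
        self.texts = ["hello there", "please cancel my order"]
        self.labels = ["greeting", "cancellation"]
        self.next(self.train)

    @step
    def train(self):
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline
        # A simple baseline stands in for the real intent model.
        self.model = make_pipeline(TfidfVectorizer(), LogisticRegression())
        self.model.fit(self.texts, self.labels)
        self.next(self.end)

    @step
    def end(self):
        print("training complete")

if __name__ == "__main__":
    IntentTrainingFlow()
```

Run with `python intent_flow.py run`; Metaflow versions each run and its artifacts automatically, which is part of what makes such pipelines auditable.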
Parallel session no 1
AI is powering change in every industry. From generative AI and speech recognition to medical imaging and improved supply chain management, AI is providing enterprises with the computing power, tools, and algorithms their teams need to do their life's work. Since its founding in 1993, NVIDIA has been a pioneer in accelerated computing. The company's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. From the cloud to the office to the data center to the edge, NVIDIA provides solutions that deliver breakthrough performance on enterprise AI and HPC workloads at any scale, driving business decisions in real time and resulting in faster time to value.
#AI #GenerativeAI #NVIDIA #CUDA
Parallel session no 2
Over the last 15 years the mBank data environment has grown large in volume and very complex. As the Bank decided to switch technologies and get ready for cloud migration, a transition strategy had to be prepared. The DWH team, with the support of PwC, carried out a comprehensive analysis and planning exercise that led to a cloud migration strategy, including the area of Big Data. During the presentation, representatives of this team will present the main challenges and obstacles in cloud migration for corporations operating in a strongly regulated environment.
The main emphasis will be on Big Data technologies.
#cloud #migration #bigdata #banking #data #datawarehouse #strategy
Parallel session no 3
Reliable data is one of the foundations of every modern business, including Allegro. As we've grown, keeping our data clean and accurate has become more complex. In this presentation, I will walk you through our approach to data quality control and discuss the major challenges.
#DataQuality #DataGovernance #DataReliability #DataEcosystem
Parallel session no 4
This presentation will discuss how Ververica Cloud solves different use cases for real-time Clickstream Analytics on E-commerce websites. We will explore how VERA (Ververica Runtime Assembly) sets a course toward streamlined operations, resource efficiency, and enhanced productivity for Apache Flink applications running on Ververica Cloud. Ultimately, we will show how easy it is to get started and do real-time Clickstream Analytics on E-commerce website data using FlinkSQL on Ververica Cloud.
#apacheflink #streaming #streamprocessing #cdc #dataengineering
Parallel session no 5
Most data analysis would benefit us more if we could run it as soon as possible. Combining transactional data with historical context allows us to provide better products or services. That's why Snowflake provides HTAP (Hybrid Transactional & Analytical Processing) based on real-time ingest and hybrid tables. Check how it works during our presentation and see how you can apply Hybrid Tables to your own use cases while remaining aware of their current limitations.
#hybridProcessing #snowflake #dataArchitecture #performance
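For the curious, this is roughly what using a hybrid table looks like from the Snowflake Python connector; the connection parameters and schema are placeholders, and hybrid-table availability depends on your Snowflake account:

```python
# Sketch of creating and querying a Snowflake hybrid table via the Python
# connector. All identifiers and credentials below are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account", user="your_user", password="your_password",
    warehouse="your_wh", database="your_db", schema="public",
)
cur = conn.cursor()

# Hybrid tables require a primary key and support fast row-level access
# alongside analytical queries on the same data.
cur.execute("""
    CREATE HYBRID TABLE IF NOT EXISTS orders (
        order_id INT PRIMARY KEY,
        customer_id INT,
        amount NUMBER(10, 2)
    )
""")
cur.execute("INSERT INTO orders VALUES (1, 42, 99.90)")
cur.execute("SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id")
print(cur.fetchall())
```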
Parallel session no 1
Get ready for the story of how a team of over 20 people transformed their mindset and productivity approach in 12 months - from measuring success by lines of code and closed tickets to delivering value to customers based on behavioral data. We have gone from mercenaries to missionaries.
How many times have you attended meetings filled with blame, arguments, and complaints? In my presentation, I will show how everything changed over 12 months. Our conversion rate improved by 5x, and in top markets by 6x. Our reliability increased by 90%, and we went from taking 4 weeks to deliver the smallest feature to just 2 days. We implemented Domain-Driven Design into our daily work, making every refinement session much more productive, with 100% engagement.
Parallel session no 2
The presentation delves into Watchdog, a robust tool for monitoring data processes and quality. Developed in-house by the Data&AI team at the Team Internet Group, which boasts over 40 brands, Watchdog is divided into three main facets: Input Validation, Service Monitoring, and Data Insights. While I'll briefly mention the latter two, my primary focus will be on Input Validation.
The Input Validation process serves as a vigilant service, stressing the importance of proactive data quality monitoring from the data's entry point. This ensures data ecosystem robustness, early discrepancy identification, and pipeline referential integrity. We achieve this through the application of rigorous Data Quality rules to uploaded data, utilizing technologies such as Amazon Web Services (AWS) and Python.
Parallel session no 3
In today’s world, managing data wisely is not just good for business—it’s crucial for the planet. Spotify, known for streaming your favorite tunes, is also playing a leading role in sustainability by committing to reaching net zero greenhouse gas (GHG) emissions by 2030 and leveraging our platform to raise awareness and drive engagement among millions of listeners and creators. In this talk, we will share how we’re making big strides in reducing our carbon footprint by optimizing our Databases and Storage Infrastructure at scale.
Parallel session no 4
An important challenge faced by numerous data-mature organizations is the reliable and efficient deployment of a substantial number of machine learning models. At FreeNow, we have tackled this issue by developing a pipeline that enables us to deploy over 70 machine learning models. To accomplish this, we leverage a range of open-source tools, including GitLab CI, MLflow, BentoML, Docker, and Kubernetes.
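As one concrete piece of such a pipeline, here is a hedged sketch of the model-registration step using MLflow; the experiment and model names are illustrative, and registering by name assumes an MLflow tracking server with a model registry behind it:

```python
# Sketch of the registration step a deployment pipeline might run.
# Experiment/model names are illustrative assumptions.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

mlflow.set_experiment("demand-forecasting")
with mlflow.start_run():
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Once registered, a CI job can resolve the model by name/version,
    # containerize it (e.g. with BentoML/Docker), and roll it out to Kubernetes.
    mlflow.sklearn.log_model(model, "model",
                             registered_model_name="demand-forecaster")
```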
Parallel session no 5
- Educational software functionalities that can be created owing to AI advancements
- AI algorithms that make such functionalities possible, incl. both LLMs and classic methods
- Challenges of using these algorithms, solutions and best practices
LUNCH BREAK
PARALLEL SESSIONS
AB Grand Ballroom
(2nd floor)
EF Grand Ballroom
(2nd floor)
Wawel & Syrena
(2nd floor)
Wisła (3rd floor)
Parallel session no 1
Over the years, companies have modernised their data platforms by moving to the cloud or by building Kubernetes-based platforms that run on-premise or in a hybrid way. In our presentation, using examples from our experiences on the PLAY project, we showcase how to migrate a Hadoop cluster to Kubernetes, as well as how to operate additional software such as HDFS, Airflow, Spark, Jupyter, Superset, Metastore, Trino, Ranger etc. in this environment in a reliable and cost-efficient way. We will demonstrate that this solution is cloud-ready and 100% open-source, so it can also be implemented in any cloud environment, for example, using cloud storage, on-premise, or as a hybrid, all in a fully automated manner.
#Kubernetes #open-source #dataplatform #hybrid
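To make the Spark-on-Kubernetes part concrete, here is a minimal PySpark sketch that targets a Kubernetes master; the API-server URL, namespace, and container image are placeholders for your own cluster:

```python
# Minimal PySpark session targeting a Kubernetes master.
# The URL, namespace, and image below are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("k8s://https://kubernetes.default.svc:443")
    .appName("spark-on-k8s-demo")
    .config("spark.kubernetes.namespace", "data-platform")
    .config("spark.kubernetes.container.image", "registry.example.com/spark:3.5.0")
    .config("spark.executor.instances", "4")
    .getOrCreate()
)

spark.range(1_000_000).selectExpr("sum(id) AS total").show()
```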
Parallel session no 2
The initial concept of Data Mesh, as introduced by Zhamak Dehghani in 2019, is described as a socio-technological approach. It has implications for how data is organized and worked with. This talk will reflect on a real-world Data Mesh implementation journey and the recurring question: how to do it right? Every organization has its specific requirements, so introducing a Data Mesh as a data operating model cannot follow a one-size-fits-all approach. We'll share the pitfalls we had to deal with and how they can be avoided.
#DataMesh #DataBeyondTheory #DomainDrivenDesign #DataPlatform #DataProduct
Parallel session no 3
The wide adoption of table formats is helping many engineers replace their lambda architectures with kappa architectures. Lambda architecture forces you to develop, deploy, and maintain two different applications to implement your business use case: the batch layer and the streaming layer. Imagine how easy your life would be if you had to handle the lifecycle of just one application. It sounds like a dream, doesn't it? But now imagine you also need to write 3 million messages every 30 minutes to a 200 TB table, you have tens of tables like these, and for each one you need to guarantee near-real-time analysis. It's clear that, without the right tricks, the dream can quickly turn into a nightmare. In this talk, we'll explore our journey migrating a data-intensive application from writing plain Parquet files (with a lambda architecture) to an Iceberg table (with a kappa architecture). Our stress test of this cool piece of technology uncovers the main challenges you need to face with complex use cases, as fascinating as they are difficult.
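As a taste of the kappa-style single writer described above, here is a PySpark sketch that appends micro-batches to an Iceberg table; the catalog configuration, runtime package version, and table names are illustrative assumptions:

```python
# Sketch of a single writer appending micro-batches to an Iceberg table.
# Catalog name, warehouse path, and package version are assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-append")
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "/tmp/warehouse")
    .getOrCreate()
)

batch = spark.createDataFrame([(1, "click"), (2, "view")], ["id", "event"])
batch.writeTo("lake.db.messages").createOrReplace()  # first run creates the table
batch.writeTo("lake.db.messages").append()           # later micro-batches append
```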
Parallel session no 4
- Explanation of the problem of gathering lineage from streaming jobs
- Quick introduction to OpenLineage and why it's a good fit for gathering lineage from streaming jobs
- Description of what interfaces were available for collecting lineage, and which exist now
- What you can do to collect streaming lineage today (a minimal emission sketch follows below)
- What's coming in the future to further improve the quality of streaming job observability
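The sketch below shows the minimal shape of emitting a lineage event with the openlineage-python client; the backend URL, namespace, and job name are placeholders (e.g. pointing at a local Marquez instance):

```python
# Minimal lineage emission with the openlineage-python client.
# URL, namespace, and job name are placeholders.
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.run import Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")  # e.g. local Marquez

event = RunEvent(
    eventType=RunState.START,
    eventTime=datetime.now(timezone.utc).isoformat(),
    run=Run(runId=str(uuid4())),
    job=Job(namespace="streaming", name="clicks_enrichment"),
    producer="https://example.com/jobs/clicks_enrichment",
)
client.emit(event)
```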
Parallel session no 5
Today's world is all about AI models; they are everywhere. Unfortunately, most of these models heavily depend on some kind of feedback, such as labels. The higher the quality and accuracy of our labels, the better our model can be. Our model is only as good as our labels.
How do we acquire such labels? How can we monitor and evaluate their quality? What is the correct metric? Where should we start? In this talk, Sivan will go over different types of annotation cycles and the phases they contain. She will describe the impact each phase has on the quality and usability of the output labels. In addition, she will review the considerations and tradeoffs that need to be taken into account when defining a new label collection project. By the end of this talk you will know where to put your attention next time, in order not to throw your labels away.
Parallel session no 1
Parallel session no 2
Semantic search based on vector similarity is crucial for various modern applications, including those based on Large Language Models, such as GPT. Things become challenging when we go from thousands to millions or even billions of embeddings and still want to keep the same performance. We went through the harsh lessons of scaling a vector database at Qdrant and would like to share what we've learned on that journey. This talk will be a detailed description of our design choices and the infrastructure that backs them up.
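To ground the topic, here is a tiny qdrant-client example running in in-memory mode; the collection name, vectors, and payloads are toy values:

```python
# Minimal vector similarity search with qdrant-client (in-process mode).
# Collection, vectors, and payloads are illustrative toy values.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")  # in-process mode, handy for experimentation

client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={"title": "intro"}),
        PointStruct(id=2, vector=[0.4, 0.3, 0.2, 0.1], payload={"title": "scaling"}),
    ],
)
# Newer client versions also offer query_points; search remains widely used.
hits = client.search(collection_name="docs", query_vector=[0.1, 0.2, 0.3, 0.35], limit=1)
print(hits[0].payload)
```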
Parallel session no 3
Over the past year we have observed the rise of open-source Large Language Models (LLMs) that in many cases outperform proprietary state-of-the-art (SOTA) ones. On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLM context and prompt augmentation. In this presentation, we will show how we built a robust Data Copilot upon these three concepts, combining both proprietary and open-source LLMs, that can boost the performance of everyone working with data platforms.
#LLM #RAG #Copilot #DataOps #dbt
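For readers new to RAG, the pattern reduces to "retrieve relevant context, then prompt the model with it". The toy sketch below uses TF-IDF retrieval and a stubbed LLM call; it illustrates the pattern only, not the presenters' actual Data Copilot stack:

```python
# Toy sketch of the RAG pattern: retrieve context, then prompt an LLM.
# TF-IDF stands in for a vector store; `llm` is a stub, not a real model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "dbt models are materialized as tables or views.",
    "Airflow DAGs schedule batch pipelines.",
]
vectorizer = TfidfVectorizer().fit(docs)
doc_vecs = vectorizer.transform(docs)

def llm(prompt: str) -> str:
    return "stubbed answer"  # placeholder for a proprietary or open-source LLM

def retrieve(question: str, k: int = 1) -> list[str]:
    scores = cosine_similarity(vectorizer.transform([question]), doc_vecs)[0]
    return [docs[i] for i in scores.argsort()[::-1][:k]]

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    return llm(f"Answer using only this context:\n{context}\n\nQ: {question}")

print(answer("How are dbt models materialized?"))
```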
Parallel session no 4
Exploring the power of the Net Adoption Score (NAS) as a data-driven tool enabling companies to measure customer adoption by product. This innovative approach serves as a leading indicator for product usage, empowering stakeholders to make strategic decisions.
Parallel session no 5
Parallel session no 1
Let's dive into the tech behind Revolut's own Event Streaming and Event Store backbone solutions. This powerhouse not only juggles billions of events monthly but also underpins financial services for a whopping 35M retail and 500K business users globally.
What's on the menu?
1. Event Streaming 101: Building one from scratch.
2. Our secret sauce: Why we DIY'd our Event Streaming and Event Store.
3. Scale it like Revolut: Keeping it smooth for millions.
4. Playing by the rules: Adhering to global regulations like a boss.
5. Lessons from the trenches: What we learned from this epic tech adventure.
And here's a cherry on top - but let me keep it secret for now!
Parallel session no 2
1. Apache Iceberg Overview: Presenting an overview of Apache Iceberg's features that are pertinent to addressing these challenges.
2. Implementation Insights: Sharing key strategies and steps in integrating Apache Iceberg into the existing pipeline, the main challenges and the lessons learned.
3. Results and Improvements: Explaining the significant improvements like query performance and costs, discussing the future improvements.
Parallel session no 3
The session will provide an overview of how to achieve end-to-end governance in a data mesh environment, highlighting the key principles and practices that enable organizations to manage and govern their data effectively in a distributed and decentralized setting.
#datagovernance #datamesh #data
Parallel session no 4
Exploring the transformative impact of data infrastructure on decision-making within Truecaller. Delving into the platform's data ecosystem, unveiling how robust data infrastructure improves decision making.
#dataengineering #datainfrastructure #datadriven
Parallel session no 5
In this talk, we'll explore the exciting world of digital experimentation and its crucial role in businesses that deal with a lot of data. We'll start by demystifying what digital experimentation is, with a focus on A/B testing—a simple yet powerful tool for improving decisions and strategies. You'll learn how to apply these concepts to your business and discover the main tools to help you. We'll also touch on the connection between this kind of experimentation and causal inference, which allows you to understand the why behind your business outcomes. Finally, we'll look ahead to see how A/B testing is evolving in the fast-paced world of AI, giving you a glimpse into the future of digital experimentation. This talk aims to give you a clear understanding of how these methods can be a game changer for your business in the AI-driven landscape.
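To make the A/B-testing core tangible, here is the classic two-proportion z-test on made-up conversion counts:

```python
# Worked A/B-testing example: two-proportion z-test (numbers are made up).
from statsmodels.stats.proportion import proportions_ztest

conversions = [480, 530]     # control, variant
visitors    = [10_000, 10_000]

z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests the variant's conversion rate differs from
# control beyond what chance alone would explain.
```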
BREAK
ROUNDTABLES (ONSITE only)
Parallel roundtable discussions are the part of the conference that engages all participants. They serve several purposes. First of all, participants have the opportunity to exchange opinions and experiences about a specific issue that is important to the group. Secondly, participants can meet and talk with the leader/host of the roundtable discussion – selected professionals with vast knowledge and experience.
There will be one roundtable session, so every conference participant can take part in one discussion.
In order to implement high-quality solutions, it is crucial to involve users and business analysts from the very beginning of the software development life cycle (SDLC). BDD makes it possible to ensure that users/business analysts collaborate with developers right from the start of the project, from creating Epics and User Stories through to automated acceptance testing. BDD plays a special role during the implementation and deployment of solutions in the area of banking processes (e.g. KYC), where the complexity of the process requires cooperation between users/business analysts and developers from the earliest stage of project work. The practical application of the Cucumber tool to support the BDD process in Big Data environments (Hadoop/Spark) will be addressed; a minimal sketch of the idea follows the list below.
- Behaviour-Driven Development
- Improving the quality of the SDLC process
- Cucumber Framework
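As a minimal illustration of the BDD idea (using the Python behave library rather than Cucumber itself, and an invented KYC scenario), step definitions bind Gherkin sentences to executable checks:

```python
# BDD sketch with the `behave` library; the KYC scenario is illustrative.
#
# features/kyc.feature:
#   Feature: KYC screening
#     Scenario: Flag an unverified customer
#       Given a customer without verified documents
#       When the KYC check runs
#       Then the customer is flagged for review

from behave import given, when, then

@given("a customer without verified documents")
def step_given_customer(context):
    context.customer = {"id": 1, "documents_verified": False}

@when("the KYC check runs")
def step_when_check(context):
    context.flagged = not context.customer["documents_verified"]

@then("the customer is flagged for review")
def step_then_flagged(context):
    assert context.flagged is True
```

The same Gherkin scenario stays readable for business analysts while the steps remain automated acceptance tests.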
Let’s ignite the DataOps discussion and delve into its historical roots, current landscape, and future trajectory. Join us as we shape our conversation around the key aspects outlined below, unraveling the intricate layers of DataOps in the ever-evolving tech landscape.
- How do you personally describe DataOps?
- Are you actively using DataOps in your daily professional endeavours?
- What challenges did you notice when introducing DataOps?
- What's the biggest benefit that DataOps brings to the table in your organisation?
- What's the biggest tooling gap hindering the seamless implementation of DataOps principles?
- Does DataOps demand an extension or a redefinition in your perspective?
Join us for a roundtable discussion covering the challenges of the data engineering landscape and development process, where we explore the dynamic changes in the world of low-code and code-focused solutions for data engineering, highlighting how both approaches are evolving and incorporating modern platforms and engineering practices. This conversation aims to shed light on the important decision points for data professionals in adopting either approach or a hybrid model, focusing on aspects such as development efficiency, team collaboration, scalability, and the integration of advanced technologies including AI.
This discussion will focus on the perspectives of Product Owners and Managers regarding ML and LLM projects. The growth in popularity of ChatGPT is unmatched, with everyone talking about it these days. Yet, not all companies are integrating it into their systems. What are the challenges associated with integrating LLMs? What potential do they bring to the table? And let’s not overlook traditional ML - how do we decide which to choose? During the discussion, we will explore:
- Avoiding the hype and selecting the right solution - traditional ML, LLM, or something else.
- Deciding on the right scope and ensuring ROI is achievable.
- What’s beyond a simple PoC? - how we can build solutions that scale and are maintainable in the future.
- Cost calculation - from hardware and cost bills to the less obvious prices you have to pay.
- Legal traps.
We will talk about building an analytical platform in the cloud and migrating from an on-prem solution.
This roundtable discussion will try to answer questions such as:
• how to collect requirements
• how to divide the implementation
• how to find a business application and move smoothly through the subsequent stages
1. Integration of ML in Online Customer Engagement at PAYBACK
2. Optimization and Automation of Notifications
3. Measurable Business Impact
As the Generative AI revolution accelerates, organizations are increasingly looking to the technology to offer novel solutions to business problems. While product managers consider its external application to solve customer problems, operational leaders explore internal use-cases to help teams perform their jobs more efficiently. Join this roundtable to discuss how savvy organizations can leverage Gen AI to serve both end customers as well as internal teams. We’ll explore:
- Low-hanging fruit organizations can build for customers to quickly deliver value with Gen AI
- Data platform preparations to accommodate rapid Gen AI feature building
- Data privacy concerns that restrict use of LLMs and how teams can appropriately adjust
- Rolling out Gen AI-powered internal tools persuasively
Let's discuss the importance of data governance for AI success and explore how different aspects of data management can help you build the right data products that your teams and algorithms can use to drive business value. Topics of this roundtable include data trust, data quality, data contracts, data catalogs, data mesh, and many more.
PLENARY SESSION
We propose a novel approach to a personalized shopping experience: starting from a text description or an image, our assistant can help the customer shop for IKEA items or advise on furnishing their room – providing suggestions, feedback, and information about the products.
#generativeAI #multimodal #personalization #interaction
EVENING NETWORKING SESSION
Let's celebrate the 10th anniversary of BIG DATA TECHNOLOGY WARSAW SUMMIT together!
On 10th April at 19:00 we invite all participants of the Big Data Technology Warsaw Summit 2024 conference to an evening meeting, which will be an opportunity to get to know each other, exchange experiences, and talk business.
Where?
"Dzień i Noc" Restaurant, Plac Mirowski 1, 00-138 Warsaw Poland
Start: 7:00 PM
11.04.2024 - SECOND DAY (ONLINE TRAINING SESSION) | ONLINE ONLY
ONLINE TRAINING SESSION
The workshop will be conducted in the form of a lecture with examples of real projects. Participants will have the opportunity to ask questions and get answers. At the end of the workshop we will conduct a short test of the knowledge gained. The language of the lecture will be English.
In today’s interconnected business landscape, local market execution teams play a pivotal role in driving growth and customer engagement, especially in multinational organizations. This workshop dives deep into designing a comprehensive model that bridges global data initiatives with local execution needs. We’ll explore the trifecta of technology, people, and processes, ensuring seamless data adoption across organizational layers. From technical stack to user personas, and governance guidelines to communication channels, this workshop equips data practitioners, leaders, and managers with actionable insights to build agile and effective data structures and data teams, combining a global ambition for streamlining and governing data with local teams’ need to use it as soon as possible.
How to build an MS Teams bot with your own knowledge databases based on Bot Framework v4 and Azure's AI offerings?
A beginner-friendly introduction to the technologies used in developing your very own chatbot, backed by Azure cloud AI offerings, so that your app can understand conversational language and provide human-like responses. The lecture will go through the general UI elements used in the Bot Framework, show where the AI pieces fit, demonstrate the behaviour of those components individually, and finally show all of them working together in a demo of an existing bot built on top of these technologies.
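As a starting point, this is roughly what a minimal Bot Framework v4 message handler looks like in Python (botbuilder-core); the echo logic below stands in for the Azure AI calls the lecture will cover:

```python
# Minimal Bot Framework v4 handler sketch; the echo reply is a placeholder
# for calls to Azure AI services (e.g. language understanding or search).
from botbuilder.core import ActivityHandler, MessageFactory, TurnContext

class KnowledgeBot(ActivityHandler):
    async def on_message_activity(self, turn_context: TurnContext):
        # In a real bot, user_text would be sent to an Azure AI endpoint
        # and the response grounded in your knowledge databases.
        user_text = turn_context.activity.text
        await turn_context.send_activity(
            MessageFactory.text(f"You said: {user_text}")
        )
```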
From raw data to ML predictions: building an entire Data, Analytics, and ML pipeline within one cloud.
BREAK
ONLINE TRAINING SESSION
Serverless functions are a great fit for dynamic workloads, but they have traditionally had a poor developer experience and restrictive hardware/GPU constraints. Modal sets out to fix this.
Join us as one of Modal's engineers guides you through an idea-to-production coding session for a GPU-powered data app.
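To preview the shape of a Modal app, here is a minimal sketch of a GPU-backed function; the GPU type, app name, and function body are illustrative placeholders:

```python
# Minimal Modal app sketch with a GPU-backed function; the body is a stub.
import modal

app = modal.App("gpu-data-app")

@app.function(gpu="T4")
def embed(texts: list[str]) -> list[list[float]]:
    # A real app would load a model here (heavy imports run remotely);
    # we return dummy vectors to keep the sketch self-contained.
    return [[float(len(t))] for t in texts]

@app.local_entrypoint()
def main():
    print(embed.remote(["hello", "world"]))
```

Run with `modal run app.py`; Modal provisions the GPU container on demand and tears it down when the function returns.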
- What is the Streamhouse Architecture
- Flink SQL, Flink CDC, and Apache Paimon
- What's the value for business and data teams
Open-source Large Language Models have gained popularity due to their advanced capabilities in natural language understanding and text generation. Even though the number of applications is constantly increasing, transitioning them from research to production-ready solutions remains a complex and demanding challenge. In my workshop I will demonstrate how Rust's unique capabilities can effectively address these challenges. Several topics, such as open-source LLM inference and serving, will be addressed.
#LLMOps #Rust #ModelsServing
Lack of trust in data is a common problem for data leaders, data practitioners, and business leaders alike. Common challenges include:
- No prioritization of data assets
- Silos in both organization and tools
- Tedious process of setting up data quality rules, with significant maintenance over time
- Difficulty understanding where and why an issue occurred
Learn how Validio's Data Trust Platform helps companies tackle these challenges head-on by combining data catalogue, data lineage, and data quality in one platform.
ONLINE EXPO + KNOWLEDGE ZONE
Free participation
We have a great set of presentations available in the CONTENT ZONE, pre-recorded and available as video on demand for conference participants in advance.
Let me tell you a story of introducing a new data platform at Zendesk. I aim to cover its challenges from a technical (mixing technologies of AWS and Snowflake) and organisational perspective (Zendesk is a heavily remote and geographically scattered workplace).
Data is at the heart of Zendesk, so getting things right with a tight deadline was vital to help the business move forward at the right pace.
The circumstances presented opportunities (diverse technologies and workforce) and brought difficulties simultaneously.
#DataLake #AWS #Snowflake