Data Engineer at Zalando on data integration at petabyte scale, best practices, and technology tools. An interview with Max Schultze.

What are the main challenges of building an end-to-end data integration platform at petabyte scale? Max Schultze [MS]: At Zalando, building a Data Lake was not the very first thing the company had in mind. As the company grew from a startup to the thousands of employees it has now, the technical landscape grew organically, and so did the data landscape. Realizing that classical analytical processing was no longer sufficient to sustain the company’s growth – not even speaking of future-oriented use cases like machine learning – the first team of data engineers was

Data Scientist at Roche on generating business value, biggest challenges and great opportunities. An interview with Dr Mohammadjavad Faraji.

Can data science generate significant medical and business value at a non-IT company like Roche? Mohammadjavad Faraji [MF]: Definitely yes! The combined strengths of our pharmaceutical and diagnostic businesses under one roof have already made Roche the leader in personalised healthcare (PHC), offering comprehensive diagnostics and targeted therapies for people with cancer and other severe diseases. The digitalisation of healthcare now also brings the ability to understand and interpret unprecedented volumes of data, which allow a higher-resolution view of each individual patient than ever before. We are committed to delivering on this opportunity and are drawing on our unique

People, processes, and tools: an interview with Mateusz Fedoryszak

What is the secret of successful collaboration between a data scientist and a data engineer? Respect and humility. When you work with people whose skills complement yours, it is easy to think: we solve the real problems, and a high-school student could do their tasks. Often you do not realize why deploying a small service or drawing a simple chart can be a challenge. On the other hand, even people who do not fully understand your field can offer valuable suggestions and feedback. Is a strict division of responsibilities the solution? Sometimes part of the problem is forcing data scientists to do engineering tasks. Does it happen less often the other way round? We had the opposite problem: it seemed

Big Data Technology Warsaw 2019 Recap: from technology to people

The rise of Kubernetes, open source in the cloud, market consolidation, and a shortage of data science and data engineering skills top the Big Data Technology Warsaw Summit 2019 takeaways. Big Data has always evolved fast. Not so long ago, the Hadoop and open source revolution reshaped the data analytics landscape. But the big data and AI technology landscape is still changing quite rapidly. Today we see new megatrends that might completely change the Big Data landscape: containerisation, hybrid and public cloud, and ML/AI adoption.

Flink committer on the new-generation big data framework and processing engine, project development plans, and why it is great to contribute to open source projects and the community – an interview with Dawid Wysakowicz, Software Engineer at Ververica

Why can Apache Flink be considered the best choice for processing streaming data? Dawid Wysakowicz [DW]: One reason is that it addresses all streaming use cases: bounded streams (aka batch), streaming analytics, event-driven applications, etc. It also has best-in-class support for state management and event time. It is also industry-proven, as it runs at Netflix scale. In contrast to other open-source stream processors, Flink not only provides true stream processing at event granularity, i.e., no micro-batching, but also handles many batch use cases very well. What are the main directions of Flink development? DW: Right now