Using cutting-edge open-source technologies to build one of the biggest industrial Data Lakes in the world
In this talk, we will discuss how DataSprints is using cutting-edge open-source technologies, such as Dremio and dbt, to build one of the biggest industrial Data Lakes in the world. It serves more than 200 reports and processes more than 300 GB of data, including datasets with 2,000+ columns, in near real time (8 s), while keeping cloud costs very low. And by very low, we mean VERY LOW. We will walk through the architecture now in production, along with the challenges faced and lessons learned on this project.
About Allan Sene:
Allan has a background in Computer Science and Statistics and has worked with data since 2010, on everything from genetic data transformation to complex industrial data lakes. He is Co-Founder & CTO at DataSprints and Co-Founder of Data Hackers.