We’ve heard of data warehouses, but what about data lakehouses?
Dael Williamson is Databricks’ CTO for Europe, the Middle East and Africa. He helps clients navigate the challenges of data-driven digital transformation: defining their digital strategy, designing their data organisation, figuring out how to innovate in their sector and transforming their business.
In the latest episode of The Agile CTO podcast, Dael talks about his work in data at one of the top 10 Fastest Growing Private Tech Companies in the world.
Building a new digital world
Databricks is a unicorn business with 7,000 clients across traditional, commercial and emerging businesses. Think everything from disruptive companies to start-ups. Some of them are scaling up, and some of them are incredibly well known. “This is a lot of fun,” says Dael. “You see time in motion as some companies grow and others are forming.”
According to Dael, the data and AI company aims to democratise data and AI to solve the world’s problems. While that may sound like a lofty goal, Dael finds the idea fascinating. He says, “One of our clients is on a mission to tap into the power of genomic data to bring new medicines to patients in need. One of the use cases was looking at a drug for chronic liver disease, and they found the gene responsible using machine learning. So, that showcases some of what Databricks is capable of. And we do this with something we call the lakehouse platform.”
Lakehouse is a new category: think of it as the point where data management and the evolution of database engines converge. Headquartered in San Francisco, Databricks was founded by the original creators of Apache Spark, the multi-language engine for data engineering, data science and machine learning. The founders have also created a number of open-source frameworks and projects under the Apache Software Foundation and the Linux Foundation.
Adds Dael, “The best way to describe it (Delta Lake) is an open data protocol that applies ACID transactionality across files and folders. So, instead of needing a traditional proprietary database engine, you can apply it to open data formats like Parquet. For any data scientists out there thinking that they spend too much time finding data, this (MLflow) is one of those accelerators that helps you spend more time building models, experimenting and deploying them into production. It’s like how DevOps helped us in software engineering, helping data scientists achieve the same kind of productivity. In terms of the addressable market, it’s a unified architectural paradigm for machine learning, data science, data processing and engineering.”
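The core idea Dael describes, a transaction log that makes plain files behave like an ACID table, can be illustrated with a toy sketch in plain Python. To be clear, this is not the real Delta Lake protocol; the class, file names and log layout here are invented purely to show the principle of making data visible only via an atomic commit record:

```python
import json
import os
import tempfile

class TinyTableLog:
    """Toy transaction log: data files become readable only after an
    atomic commit record is written, mimicking ACID-style visibility.
    Purely illustrative -- not the Delta Lake format."""

    def __init__(self, path):
        self.path = path
        self.log = os.path.join(path, "_log")
        os.makedirs(self.log, exist_ok=True)

    def commit(self, rows):
        version = len(os.listdir(self.log))
        data_file = os.path.join(self.path, f"part-{version}.json")
        with open(data_file, "w") as f:
            json.dump(rows, f)               # write the data file first...
        commit_file = os.path.join(self.log, f"{version:020d}.json")
        tmp = commit_file + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"add": os.path.basename(data_file)}, f)
        os.replace(tmp, commit_file)         # ...then commit atomically

    def read(self):
        # Readers trust only the log: uncommitted files are invisible.
        rows = []
        for entry in sorted(os.listdir(self.log)):
            with open(os.path.join(self.log, entry)) as f:
                added = json.load(f)["add"]
            with open(os.path.join(self.path, added)) as f:
                rows.extend(json.load(f))
        return rows

with tempfile.TemporaryDirectory() as d:
    table = TinyTableLog(d)
    table.commit([{"id": 1}, {"id": 2}])
    table.commit([{"id": 3}])
    print(table.read())  # [{'id': 1}, {'id': 2}, {'id': 3}]
```

Because a write becomes visible only when its commit record lands (via an atomic rename), a reader never sees a half-written version, which is the kind of guarantee Delta Lake layers over open formats like Parquet at much larger scale.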
The path to upskilling
Dael assures that there are very accessible ways to become a Databricks data engineer, as the platform supports multiple programming languages like Python, R and SQL. It’s also easier and more intuitive to use, thanks to productivity tools that take away much of the pain data scientists would normally go through. “We’re seeing many people upskilling themselves as part of STEM programmes at university. We’re also seeing people joining software houses and system integrators, and with partnerships with us, there’s lots of potential for unlocking more training,” he adds.
While data engineering may be a bit hard to get your head around in the beginning, due to differences from traditional coding modalities, Dael says uptake is growing massively. “We’re seeing huge adoption of it. It’s a developer-enabling platform, but a new type of developer-enabling platform. And it’s very accessible. You’ve got environments to learn from, so upskilling is not hard. Many companies are now using Databricks because it unlocks so much value, and more creative ideas are starting to emerge.”
When it comes to the Databricks hiring process, Dael notes that there’s a large spectrum of functions and a long list of available roles. “There’s always a general buzz to grow. In the field engineering space, there’s a hiring velocity. We need more solution architects and specialist product advocates. At Databricks, there’s a top-notch kind of criticality. We’re looking for really talented people, but at the same time, we’re also looking for diversity. We want to grow an incredible culture. If you’re naturally curious, can keep up with the pace and enjoy a variety of work, there’s a huge amount of opportunity in this business and industry.”
Journey versus destination
For brand-new companies starting out in data, hiring a data scientist can be costly. “It would create problems with your current cash flow,” Dael warns. “Outsourcing is probably not the best plan either, as you’ve got to get the value and the return. I believe it’s a very good opportunity to bring in interns and grow them within the business.”
“In the beginning, you’re not going to have tonnes of data, and your use cases will be fairly rudimentary. As a new business, it’s about creativity around what you do and recognising that you don’t have to start with the most complicated thing. Don’t start by trying to achieve the hardest goal first. It’s about the journey, not just the destination,” he concludes.
This post is based on an episode of The Agile CTO podcast. Listen to the full episode on Apple Podcasts, Spotify, or our website.