azurecoder

What makes a data engineer?

Note: this post is mainly about Azure but it can apply to any cloud.

You reach a stage where you've heard enough competing definitions of what a job is to bring you to the point of nervous exhaustion. This is one job title that wears me out when thinking and talking with customers.

All of a sudden the market seems awash with data engineers. In this post I'd like to review some of the backgrounds people tend to convey when they call themselves a data engineer. At Elastacloud we have some clear definitions which might be helpful if you're looking to build a practice. Let's start with the backgrounds of people who think they're data engineers.

  1. The DBA. This is one of the most common origins of a data engineer. I've seen people completely reinvent themselves from a SQL background into data engineering. Pad out a CV with a little Databricks, Spark and PySpark and you're a data engineer.

  2. The Data Analyst. Data Analysts have a strong Excel background and sometimes upgrade themselves to data engineer by adding a little PySpark and Spark with some custom ETL.

  3. The Software Engineer. A background in distributed computing, maybe having gone through the days of DCOM/Remoting/RMI.

Okay, so it's pretty clear that I think (3) is the most coherent definition. People in this bracket understand jobs, HPC and scale-out compute naturally, as well as algorithms. They're the sort of people who could write parts of Spark or Storm, and read the code of the things they're writing "notebooks" for. That's why you need them. Period.

I've seen lots of type (1), mainly in the wake of the chaos that ensues when type (1) doesn't really understand the shift from Data Warehouse to Data Lake as the single source of truth, or how a data lake lifecycle works. A lot of our customers have this type of data engineer. We generally tend to find an overengineered morass of SQL, messy, lengthy "ingestion" pipelines which take nearly as long to execute as the huge batch window they need to fit in, and blame aimed at toolsets such as Spark for what is really a fundamental misunderstanding of data locality and partitioning.
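To make the partitioning point concrete, here's a minimal plain-Python sketch of the idea. In Spark you'd express the same thing with `df.write.partitionBy("event_date")`, so that a query filtering on `event_date` reads only the matching folders in the lake rather than scanning everything; the data and key names below are made up for illustration.

```python
from collections import defaultdict

# Toy events: (event_date, payload). In a lake, partitioning by event_date
# means each date lands in its own directory.
events = [
    ("2021-03-01", "a"), ("2021-03-02", "b"),
    ("2021-03-01", "c"), ("2021-03-03", "d"),
]

# Partition once, on the key that queries filter by.
partitions = defaultdict(list)
for event_date, payload in events:
    partitions[event_date].append(payload)

def query(event_date):
    # Partition pruning: only the matching "folder" is read,
    # not the whole dataset.
    return partitions.get(event_date, [])

print(query("2021-03-01"))  # ['a', 'c']
```

Choose the wrong partition key (or none at all) and every query degenerates into a full scan, which is exactly the failure mode being blamed on Spark above.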

Type (2) is fairly harmless. I've seen many of our customers pretend that type (2) are data engineers. This type tends to amble along doing small, irrelevant tasks without achieving much. They tend not to follow the SDLC and are quite nervous of pushing code into production, so in my experience projects awash with disguised analysts never make it into production. Type (1), for what it's worth, doesn't follow the SDLC either.

Type (3) is generally a software engineer. This type would be able to move between Spark and Azure Functions on a whim because they realise that they are two sides of the same coin: distributed compute. With big compute it's all about scale-out and fast execution, and understanding the limits of each part of a system so that you can distribute work and make it execute as quickly as possible. With big data the same paradigm is about how to separate data into chunks and transform it so that it can be put together in a way that satisfies queries quickly. I've seen some terrible use of Spark by type (1) that really exemplifies the lack of study of how to use a big data platform. It's weird, because type (1) generally understands and spends loads of time on SQL optimisation, but doesn't learn the basics of distributed compute.
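The "two sides of the same coin" shape can be sketched in a few lines of plain Python: split the work into chunks, map each chunk out to a worker, then reduce the partial results. This is the same pattern whether the workers are Spark tasks, Azure Functions invocations or Azure Batch nodes; the numbers and chunk size here are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "big compute" job: sum a range by splitting it into chunks.
numbers = list(range(1_000))
chunk_size = 100
chunks = [numbers[i:i + chunk_size] for i in range(0, len(numbers), chunk_size)]

# Scale out: each chunk is processed independently by a worker.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(sum, chunks))

# Reduce: combine the partial results.
total = sum(partials)
print(total)  # 499500
```

The engineering skill is in where you draw the chunk boundaries, which is exactly the partitioning question again, just viewed from the compute side.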

In summary, data engineers for me are programmers who can move between distributed frameworks like Azure Functions / Azure Batch / Service Fabric and big data platforms such as Databricks or HDInsight. They also have a strong understanding of software integration, with many data engineers having come from a messaging and integration background (e.g. in Azure: Event Hubs, Service Bus, Event Grid etc.). They have good programming experience, originally in OO languages and later in functional languages.

Most of these things come naturally to type (3), but over time they will also learn BI skills such as modelling data in Kimball models, and how to sequence and merge data into Databricks Delta so that the database / warehouse does very little work.
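For the merge step, here's a minimal plain-Python sketch of the semantics a Delta `MERGE` gives you: rows that match on the business key are updated, rows that don't are inserted. In Databricks the real thing would use `DeltaTable.merge(...)` with `whenMatchedUpdateAll()` / `whenNotMatchedInsertAll()`; the table, key and rows below are invented for illustration.

```python
# Current state of the "Delta table", keyed by business key.
target = {
    1: {"id": 1, "name": "alice", "score": 10},
    2: {"id": 2, "name": "bob", "score": 20},
}

# Incoming batch: one changed row (id 2) and one new row (id 3).
updates = [
    {"id": 2, "name": "bob", "score": 25},
    {"id": 3, "name": "carol", "score": 30},
]

# MERGE semantics: WHEN MATCHED THEN UPDATE, WHEN NOT MATCHED THEN INSERT.
for row in updates:
    target[row["id"]] = row  # update-or-insert on the join key

print(sorted(target))  # [1, 2, 3]
```

Doing the upsert once, upstream in Delta, is what lets the downstream database / warehouse serve queries without doing the reconciliation work itself.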

Happy trails and keep hiring type (3)s.

