haruki zaemon

The Future of the Data Engineer. Is the data engineer still the "worst"

  by Simon Harris
  May 14, 2023
  2 mins

A few years ago, we had our first go at developing a concrete Data Engineering strategy.

This article, which I read this morning, was validating: it touches on many of the challenges we identified and tried to address as part of our Data “Playbook”:

  • Data engineers operate on myriad fronts, and to any one partner or stakeholder it can seem like they are always working on other things.
  • The Data Warehouse reflects the organisation. Chaos in, chaos out. Lack of consensus in, lack of consensus out.
  • Data as a product, explicit and distributed governance and use of data, and modern tooling.
  • Move away from just getting things done towards more traditional engineering practices. That takes time.

Some of the data engineer’s biggest challenges: the job was hard, the respect was minimal, and the connection between their work and the actual insights generated was obvious but rarely recognized. Being a data engineer was a thankless but increasingly important job, with teams straddling between building infrastructure, running jobs, and fielding ad-hoc requests from the analytics and BI teams. As a result, being a data engineer was both a blessing and a curse. In fact, in Maxime’s opinion, the data engineer was the “worst seat at the table.”

[…]

It’s widely accepted that governance is distributed. Every team has their own analytic domain they own, forcing decentralized team structures around broadly standardized definitions of what “good” data looks like.

[…]

The data warehouse is the mirror of the organization in many ways. If people don’t agree on what they call things in the data warehouse or what the definition of a metric is, then this lack of consensus will be reflected downstream.

[…]

It’s not necessarily the sole responsibility of the data team to find consensus for the business, particularly if the data is being used across the company in different ways.

[…]

Nowadays, data teams are increasingly relying on DevOps and software engineering best practices to build stronger tooling and cultures that prioritize communication and data reliability.

[…]

While data team reporting structure and operational hierarchy are becoming more and more vertical, the scope of the data engineer is becoming increasingly horizontal and focused on performance and reliability, which is ultimately a good thing.

[…]

With the rise of these new technologies and workflows, engineers also have a fantastic opportunity to own the movement towards treating data like a product.