haruki zaemon

#software-engineering

The Future of the Data Engineer. Is the data engineer still the "worst seat at the table"?

  1. by Simon Harris
  2. May 14, 2023
  3. 2 mins

A few years ago, we had our first go at developing a concrete Data Engineering strategy.

An article I read this morning felt validating, as it touches on many of the challenges we identified and tried to address as part of our Data “Playbook”:

  • Data engineers operate on myriad fronts, and to any one partner or stakeholder it can seem like people are always working on other things.
  • The Data Warehouse reflects the organisation. Chaos in, chaos out. Lack of consensus in, lack of consensus out.
  • Data as a product, explicit and distributed governance and use of data, and modern tooling.
  • Moving away from just getting things done towards more traditional engineering practices. That takes time.

Some of the data engineer’s biggest challenges: the job was hard, the respect was minimal, and the connection between their work and the actual insights generated was obvious but rarely recognized. Being a data engineer was a thankless but increasingly important job, with teams straddling building infrastructure, running jobs, and fielding ad-hoc requests from the analytics and BI teams. As a result, being a data engineer was both a blessing and a curse. In fact, in Maxime’s opinion, the data engineer had the “worst seat at the table.”

[…]

It’s widely accepted that governance is distributed. Every team owns its own analytic domain, forcing decentralized team structures around broadly standardized definitions of what “good” data looks like.

[…]

The data warehouse is the mirror of the organization in many ways. If people don’t agree on what they call things in the data warehouse or what the definition of a metric is, then this lack of consensus will be reflected downstream.

[…]

It’s not necessarily the sole responsibility of the data team to find consensus for the business, particularly if the data is being used across the company in different ways.

[…]

Nowadays, data teams are increasingly relying on DevOps and software engineering best practices to build stronger tooling and cultures that prioritize communication and data reliability.
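One concrete form this takes is testing data the way software is tested: automated checks that run against every load, in CI, before anything is published downstream. A minimal sketch of the idea, in Python — the table and column names here are hypothetical, and real tools such as dbt tests or Great Expectations formalise exactly this pattern:

```python
# A minimal sketch of automated data-quality checks — the kind of
# software-engineering practice (tests run against every load, in CI)
# that tools like dbt and Great Expectations formalise.
# The `orders` table and its columns are hypothetical examples.

def check_not_null(rows, column):
    """Fail if any row is missing a value in `column`."""
    bad = [r for r in rows if r.get(column) is None]
    assert not bad, f"{len(bad)} rows have NULL {column}"

def check_unique(rows, column):
    """Fail if `column` contains duplicate values."""
    values = [r[column] for r in rows]
    assert len(values) == len(set(values)), f"duplicate values in {column}"

# Validate a (hypothetical) `orders` extract before publishing it.
orders = [
    {"order_id": 1, "customer_id": "a"},
    {"order_id": 2, "customer_id": "b"},
]
check_not_null(orders, "customer_id")
check_unique(orders, "order_id")
```

The point is less the checks themselves than where they run: encoding expectations as executable tests is what turns “fielding ad-hoc requests” into an engineering practice with a reliability contract.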

[…]

While data team reporting structures and operational hierarchies are becoming more and more vertical, the scope of the data engineer is becoming increasingly horizontal and focused on performance and reliability — which is ultimately a good thing.

[…]

With the rise of these new technologies and workflows, engineers also have a fantastic opportunity to own the movement towards treating data like a product.


Monoliths are not dinosaurs | All Things Distributed

  1. by Simon Harris
  2. May 6, 2023
  3. 1 min

Amazon Prime Video rearchitected their streaming service from a distributed microservices architecture to a monolith application, resulting in higher scale, resilience, and reduced costs.

Werner Vogels:

My rule of thumb has been that with every order of magnitude of growth you should revisit your architecture, and determine whether it can still support the next order level of growth.

If there is a set of services that always contribute to the response, have the exact same scaling and performance requirements, same security vectors, and most importantly, are managed by a single team, it is a worthwhile effort to see if combining them simplifies your architecture.


Evo AU #98 – Scaling A Development Team

  1. by Simon Harris
  2. May 2, 2023
  3. 1 min

It was lovely to meet and chat with such a great bunch of open, curious, and pragmatic leaders:

We discussed challenges, approaches, mistakes, and lessons we’ve learnt scaling teams.

Mostly though, I just enjoyed the conversation. It didn’t necessarily always stay on topic, in a good way.


Fast-forwarding engineering decision making

  1. by Simon Harris
  2. Apr 16, 2023
  3. 1 min

These scenarios certainly resonated with me as in many ways they speak to reducing cycle time.

All organisations waste a huge amount of time believing that they are making progress on decisions, when in fact they’re just involved in the theatre of decision making. This happens through indirect actions that feel like progress is being made, but in fact contribute nothing to it. Small changes can speed up progress dramatically.

Tangentially related, I often need to emphasise to my Aikido students the importance of reducing intervals between techniques. Reducing a 15-second changeover to 5 seconds could mean getting in another 10 practice runs.

If, like me, you believe in iterating to learn, reducing cycle time is critical.


Trading Design Pain for Runtime Pain

  1. by Simon Harris
  2. Jun 3, 2010
  3. 2 mins

So, since my post on functional programming in object-oriented languages I’ve continued to tread the path with a mixture of gratification and despair. This morning the latter became overwhelming; I’d just had enough. My brain hurt and I just wanted to pump out some code and run it. I threw out the concepts I had been using as my guide and fell back on years of “old-school” object-oriented code.

Unfortunately, I was no more productive. In fact, I’d argue I was less productive. Things began to fail in weird and unexpected ways. The number of tests I needed to write to catch errors at least doubled. I soon returned to the comfort of my hybrid world.

In hindsight, I had traded design pain for runtime pain. All the mental gymnastics that went into working out how to build classes that are inter-related and at the same time immutable, etc. was replaced with time spent writing tests for anticipated edge cases as well as debugging the unexpected ones.

I concluded that the “pain” I had been experiencing was largely the result of being forced to deal with the complexity of the underlying problem. Once solved however, the code fell out with few or no bugs. By contrast, when I reverted to my previous approach, the code flowed far more freely but I spent a lot more time working out how to ensure the code didn’t do nasty things to itself.

What’s perhaps just as interesting to me is that my designs are resulting in smaller and smaller classes. The more I think about problems in a functional way, the more I’m able to design solutions that are essentially pipelines. The irony being that even though we think of imperative code as being step-by-step, it more often than not turns out to be a big, intertwined blob. Functional code on the other hand is almost by definition a series of steps, or transformations applied one after the other on some input.
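The “pipeline” shape described above can be sketched in any language; here is a small illustration in Python (chosen for brevity — the post itself is about FP languages proper). The data and steps are invented for the example: each stage is a pure transformation on its input, composed one after the other, rather than an intertwined blob of mutating statements.

```python
from functools import reduce

def pipeline(*steps):
    """Compose functions left to right: pipeline(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), steps, x)

# A toy example: normalise a list of words as a series of transformations.
clean = pipeline(
    lambda words: [w.strip().lower() for w in words],  # normalise case/space
    lambda words: [w for w in words if w],             # drop empty strings
    sorted,                                            # put in order
)

print(clean(["  Banana", "apple ", "", "Cherry"]))  # ['apple', 'banana', 'cherry']
```

Each stage takes a value and returns a new one, so there is no shared mutable state to debug at runtime — the same trade described in the post: more thought up front in exchange for fewer “nasty things the code does to itself” later.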

These two observations are drawing me ever closer to just “getting over it” and using a functional language. The issue for me is the only FP language I know and actually like is Haskell and the only FP language I’d be likely to get into production is Clojure. Which is all I’ll say in public as I have no desire to start a flame war :)