Popularized by GAMFA companies (and considered a prerequisite skill for increasing levels of seniority), System design skills have become highly prized among developers. In a nutshell, system design is the practice of taking a problem expressed in real-world terms (e.g. “design a system that works like Twitter” or “design an efficient instant messaging system”) and translating it into real-world components and subsystems, without implementing them.
As GAMFA companies prove to be an inspiration, the practice of holding system design interviews has permeated through the industry as a testament to advanced delivery skills. …
In our last post, it seemed like we had full error coverage within our grasp.
Why aren’t we done? What is it that stops us from calling out every error our application makes as an event and raise it?
The amorphous nature of the term ‘error’.
True, some things are pretty much obvious errors. This usually applies to technical errors such as the infamous ‘500’ HTTP return code.
But our systems don’t just have to measure to technical standards. They have to be correct from a user’s perspective. What if our login button just turned invisible, for example?
In our previous posts, we’ve covered indicators and metrics. These were attributes of a system which we could define and observe. But we’ve also noticed that they have major limitations around analysis. Indicators supposedly circumvent this by virtue of definition and metrics by the operator’s analysis capability… surely there is something that has more context and detail. Assuming a problematic metric is flagged, we now need to look at the trail of happenings that preceded it. In fact, perhaps just knowing what were the events prior to the metric getting flagged would have sufficed and been more efficient.
I wrote these a few months ago and now seems like a good place to post these.
TW: self-deprecating and dark trans-woman-in-tech humor.
(Modelled after excuses made to management, product and end users):
In our previous post, we have dived into the lovely world of indicators and have explored its value but also its limits. These limits have clearly led us towards observable values that are past indicators. That’s also pretty intuitive, we have everyday examples of metrics.
If indicators were binary observables values, metrics are observables with a broader, more complex value.
Perhaps an example would work well for us here. The amount of fuel left in your car is one such example that comes to mind. This is a quantifiable value (liters or gallons) that can predict a problem (running out…
So, our previous post has defined monitoring for us. As a brief recap, our key takeaway was that good monitoring allows a human to derive insights about your business’ operation, in order to prevent or minimize damage.
In this post we will discuss the indicator, a basic and intuitive building block of monitoring systems. Chances are your monitoring strategy will include these and your code / component should keep these in mind.
An idealized indicator is simply a binary value. On vs Off. Valid vs Invalid. Running vs Stopped. The forces of good uptime vs the forces of…
This series of posts is designed to let you, the reader, broaden your perspectives about the world of tech monitoring by way of a “from scratch” exploration.
As a ‘foundations’ series, this should hopefully prove useful to you regardless of your specific technical domain, even if the focus will be online software systems.
If you’re an ops person or an architect (or the manager of), this should help you evaluate your existing monitoring situation and perhaps gain some more ideas and insights.
If you’re a developer (or the manager of), this should help you empower your ops team and…
A Gil, of all trades. DevOps roles are often called “a one man show”. As it turns out, I’m not a man and never was. Welcome to this one (trans) woman show.