System Design — When Coding Alone Doesn’t Cut It Anymore
Popularized by GAMFA companies (and considered a prerequisite for increasing levels of seniority), system design skills have become highly prized among developers. In a nutshell, system design is the practice of taking a problem expressed in real-world terms (e.g. “design a system that works like Twitter” or “design an efficient instant messaging system”) and translating it into concrete components and subsystems, without implementing them.
As GAMFA companies prove to be an inspiration, the practice of holding system design interviews has permeated the industry as a testament to advanced delivery skills. Books on the subject have also become popular, considered a staple of prepping for those interviews or otherwise learning the trade.
The rising demand for system design skills is not without cause, either. Knowing how to deliver a feature is one thing; knowing how to construct and assess an entire system is another. Modern systems may be composed of dozens of microservices interacting with each other, and even within a monolith, working on one part may affect other parts of the whole. Yes, that skill has practical, real-world value. The act of creating globally scalable services has long since left the exclusive domain of GAMFA companies.
Additionally, system design is indeed a very different beast than coding itself. The challenges presented are by their nature non-discrete. There’s no ‘leetcode’ for design questions and there won’t be, because no automated grader can weigh soft tradeoffs of applicability, feasibility, implementation complexity, scalability and cleanliness. Soft skills are needed; analysis and communication are commonly tested. It’s entirely possible for a programmer to excel at delivering features while failing to see the bigger picture and falling short in system design tasks.
Trouble in Paradise — Where System Design Interviews and Books Fail to Measure Up to the Industry
Browsing through system design reading material and sample interviews made me feel out of place, as if something was fundamentally off or in need of a refresh. As a DevOps engineer working in cloud-native environments in particular, the feeling was one of “That’s not how I handled myself at work” or “That’s not what my clients wanted from me.”
Not what I had expected, at all. Wasn’t this supposed to be a real-world practice?
Is the task of system design really that different between what a DevOps engineer may be asked to do and what a programmer would? Perhaps my professional affinity for public cloud has something to do with it?
Is it really bad to any extent? Coding exercises do differ from a programmer’s day-to-day job, don’t they? And assuming it is bad and there’s a gap that we want fixed, what would it be and how would we fix it?
Attempting to scope the problem, I discussed it with candidates and colleagues alike, and it became clear that it wasn’t just about interviews or books, either. Both differ significantly from the real-world work that’s been asked of me.
It looks like I was on to something interesting. Let’s dive deep into the differences.
Diving Deep Into the Differences
Cost, for one, is almost never asked about. Intuitively, this makes perfect sense: does the candidate really need to know how much a server costs, the cost of network connectivity, perhaps even colocation space? Surely not, unless they’re really in that business. Even on occasions where those do matter, they are usually handled by trained specialists.
The world of cloud computing turns many problems into ones of cost. Do you want to create a Facebook clone? A real Facebook clone that can handle billions of daily active users? You can. No one is saying it’s going to be easy, but if you have the money to fork out, all the infrastructure you need is there within a few API calls.
While bean-counting is out of the question, cost as a reflection of resource consumption seems like a highly prudent requirement. Not to mention a practical one that demonstrates value to a potential employer. I mean, who doesn’t use a public cloud these days?
Even companies (or particular groups within companies) that do not operate in a public cloud environment may have a private one or manage their infrastructure in a similar way. The technology that enables this is becoming extremely popular (*cough* k8s *cough*) in such enterprises as well. Being able to generate designs that use such infrastructure wisely applies to more than just public cloud.
Public cloud is more than just IaaS with dynamic resource allocation, too. In fact, DevOps dilemmas often turn into “build vs. buy” decisions, and into being a prudent buyer according to customer requirements, up to and including more complex PaaS and SaaS components.
A sample case is the interview section called ‘back-of-the-envelope calculations’: a candidate may be asked how much storage a system needs allocated.
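For illustration, here is a minimal sketch of such a calculation for a hypothetical messaging system. Every input figure below is an assumption made up for the example, not data from any real service:

```python
# Back-of-the-envelope storage estimate for a hypothetical messaging system.
# All input figures are illustrative assumptions.
daily_active_users = 50_000_000
messages_per_user_per_day = 40
avg_message_bytes = 200          # text payload plus metadata
retention_days = 365 * 5         # keep messages for five years

bytes_per_day = daily_active_users * messages_per_user_per_day * avg_message_bytes
total_bytes = bytes_per_day * retention_days

print(f"Ingest per day: {bytes_per_day / 1e12:.1f} TB")      # 0.4 TB
print(f"Total over retention: {total_bytes / 1e15:.2f} PB")  # 0.73 PB
```

The point of the exercise is less the arithmetic and more what the candidate does with the result: whether it translates into provisioned disks, a managed object store, or a tiered retention policy.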
I’m tempted to brush that one off with a dismissive “good luck filling up S3!” and relegate it to cargo-cult status, but in truth, it’s just one of those “build vs. buy” facets, probably the clearest of them. S3 is out there for you to buy as much of as needed, and so are other, more advanced services: queue systems, databases and whatnot. The wealth of services on offer is staggering, as are the materially different ways to purchase them — a hard ‘reservation’ allocation, an on-demand per-instance model or a pay-as-you-use plan may all be available for a given service, with widely differing cost implications.
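The purchase-model choice itself can be framed as a small design question. The sketch below compares a committed (reserved) rate against an on-demand rate as a function of utilization; both prices are made-up placeholders, not quotes from any real provider:

```python
# Break-even between on-demand and reserved pricing for a single instance.
# Hourly prices are fictional placeholders for illustration only.
ON_DEMAND_PER_HOUR = 0.10
RESERVED_PER_HOUR = 0.06   # effective hourly rate of a committed plan

def cheaper_plan(utilization: float) -> str:
    """Pick the cheaper plan given the fraction of hours the instance runs."""
    on_demand_cost = ON_DEMAND_PER_HOUR * utilization
    reserved_cost = RESERVED_PER_HOUR  # paid whether the instance runs or not
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"

print(cheaper_plan(0.9))   # busy most of the time -> "reserved"
print(cheaper_plan(0.3))   # mostly idle -> "on-demand"
```

A steady, always-on workload favors the commitment; a bursty or experimental one favors paying per use. That is exactly the kind of tradeoff a cloud-aware design discussion can surface.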
(Now, being honest: some problems are worth solving again as a method of proving capabilities. It would be silly to let a programming exercise about parsing JSON be answered with `import json; json.loads(input)`. But here, pretending that advanced services, or even basic ones like S3, aren’t there is the norm rather than the exception.)
The Developer Culture Implications of Exempting Cloud Skills
So, we don’t ask that of candidates right now. Why? There’s one possible answer, which I really dislike: that those are indeed the ‘realm’ of the DevOps people, or perhaps the FinOps ones in particular; that system design is truly a different art between roles, sharing not much more than a name.
But hey, wasn’t DevOps about culture rather than a role? If we start saying “Developers do this and DevOps people do that,” we violate that tenet of the movement. We’ll end up paying a price for it (and I don’t mean cloud bill shock, even though that would apply nicely).
Would we excuse developers from security-conscious practices, just because security positions and DevSecOps positions exist? Of course not. Why are some aspects of DevOps and FinOps exempt?
And it really is a shame, because a significant amount of talent out there is already versed in those skills. It’s definitely something aspiring programmers can and should learn. Complex perhaps, but not rocket science. I believe that with a suitable amount of training and teaching, there’s no reason bright developers can’t pick these up very swiftly. Real-world experience suggests they do.
A Fix and a Summary
I think it’s time for a fix — and a real-world takeaway. This is a problem that has ramifications, but the path to remediation is clear.
Make cloud-aware design the basic requirement, the standard rather than the exception. Add such dilemmas to your system design interviews. Add SLA requirements that generate constraints on resource provisioning. Add cost control questions. Add situations that require dynamic handling of assigned capacity.
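As one example of what such a constraint could look like in an interview, the sketch below turns a peak-load requirement into a provisioned instance count. The inputs (peak rate, per-instance capacity, headroom fraction) are hypothetical figures chosen for illustration:

```python
import math

# Translate a peak-load requirement into provisioned capacity.
# All numbers are hypothetical interview-style inputs.
peak_requests_per_sec = 12_000
capacity_per_instance = 500   # requests/sec one instance sustains within SLA
headroom = 0.30               # spare capacity for failover and traffic spikes

instances_needed = math.ceil(
    peak_requests_per_sec * (1 + headroom) / capacity_per_instance
)
print(instances_needed)  # 32
```

The follow-up discussion writes itself: should those 32 instances be statically provisioned, or should an autoscaler track demand between a floor and that ceiling, and what does each choice do to the bill?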
If real-world system design has been modernized, our interview procedures and books should reflect that. Candidates will quickly pick up on it and brush up their skills accordingly. Everybody stands to gain from this higher standard: candidates improve their real-world marketability, companies improve the quality of their hires, and colleagues enjoy a greater ability to deliver.