Falling Into the Stargate of Hidden Microservices Costs

This article is the second in a series. Read Part 1

Proponents of microservices claim greater development velocity and reliability; more comprehensive testing and vertical or horizontal scaling with a container orchestrator; tons of flexibility around tool choice. They’re not wrong: When you build with a microservices architecture, you’re likely going to see cost improvements early in your software development life cycle (SDLC), driven mostly by the decoupling of services.

These folks use this early microservices win to argue monoliths are inflexible and impossible to fully understand. While yes, monoliths might not be perfect, their conclusion neglects a fundamental law: You can’t siphon energy from a closed system, and you can’t magically simplify your SDLC and make it cheaper to operate without some kind of reaction down the line.

Last time, we talked about how an intentionally built monolith almost always beats out microservices on the axes of technical complexity and developer ergonomics. Microservices can certainly do those things well, but only with additional expense, and these unexpected expenses almost always flow away from your organization. Away from your control and governance.

The only way to truly understand what this facade of abstraction costs you is to chase the very idea of monoliths, like Dr. David Bowman in “2001: A Space Odyssey,” into the vast stargate of energy, cost, and missed opportunity.

Standardization

One of the often-touted benefits of microservices is that developers have the flexibility to use different languages, frameworks, libraries and data stores to build the services they’re responsible for. The reality is that developers err on the side of familiarity, choosing the environments they’re most efficient in and contributing to a sprawl of tech stacks. A new discipline, platform engineering, has formed to combat this pattern with “golden paths” — a platform of languages, tools, libraries, configurations and dependencies — to apply some governance around internal development practices.

Who benefits? Your cloud provider. The last stop on the standardization line is the moment of deployment, where your API or app is abstracted down to a handful of endpoints, found at a specific domain name, taking requests and returning data. The less cohesive your microservices are, the less control you have over their deployments, and the more you rely on the flexibility and resilience of your cloud provider to get the job done. The more you rely on a service, the more likely you are to have a chat with a nice salesperson about upgrading to their enterprise pricing tier.

Automation

Suppose every team relies on the flexibility of microservices to build the way that best suits them. Your production infrastructure now requires six different data stores, a mix of server/serverless resources and shared libraries. If you make each team responsible for configuring and provisioning the infrastructure they need, you’ll have a messy sea of complexity that no DevOps or ITOps team could hope to understand, much less maintain.

Who benefits? Your CI/CD providers. The only way these fragmented stacks get deployed in a reliable and scalable fashion is through automation, which means you’re far more dependent on your pipelines to work reliably and to alert you quickly when one of a sea of fragmented changes suddenly breaks something. Each CI/CD run adds generously to your monthly tally of minutes and storage, not to mention runners of varying compute power, and you’ll need to pay for additional seats to keep folks in the loop.
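To make that multiplication concrete, here’s a minimal sketch of a GitHub Actions-style workflow, assuming a hypothetical repository where each team’s stack ships with its own Makefile (the service names and targets are invented, not taken from any real system):

```yaml
# Hypothetical workflow: one push fans out across every team's stack,
# and each matrix entry burns its own billable runner minutes.
name: deploy-services

on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Each service brings its own language and toolchain, so each
        # needs its own build, test and deploy steps.
        service: [payments-go, catalog-node, search-python, billing-jvm]
    steps:
      - uses: actions/checkout@v4
      - name: Build and test ${{ matrix.service }}
        run: make -C services/${{ matrix.service }} test build
      - name: Deploy ${{ matrix.service }}
        run: make -C services/${{ matrix.service }} deploy
```

Four services means four runners per push; forty services means forty, each with its own caches, artifacts and minutes on the bill.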

Network calls

David Heinemeier Hansson, CTO of Basecamp and creator of Ruby on Rails, once wrote: “Replacing method calls and module separations with network invocations and service partitioning within a single, coherent team and application is madness in almost all cases.”

These calls, which traverse both great physical distance and many more hoops of networks, providers, API gateways and so on, are inherently expensive. They introduce new and often hard-to-observe ways for your app to fail. You’ll need to develop new defense mechanisms against these errors, like batching and asynchronous communication, and layer in new observability tooling to help catch graceful API failures that don’t affect the end-user experience but indicate a larger problem under the hood.
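As a minimal sketch of what those defenses look like, here’s a hypothetical retry wrapper in Python using the requests library; the inventory service, its URL and the backoff parameters are all stand-ins, not a prescription:

```python
import random
import time

import requests  # third-party HTTP client; the endpoint below is hypothetical


def call_with_retries(url, max_attempts=4, base_delay=0.2, timeout=2.0):
    """Fetch JSON from a downstream service, retrying transient failures
    with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()  # treat 4xx/5xx responses as errors
            return response.json()
        except requests.RequestException:
            if attempt == max_attempts:
                raise  # out of retries; surface the failure to the caller
            # Exponential backoff plus jitter to avoid thundering herds.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))


# What used to be an in-process method call...
#   inventory = warehouse.get_inventory(sku)
# ...is now a network round trip with failure modes all its own:
inventory = call_with_retries("https://inventory.internal/api/v1/skus/1234")
```

Every one of those defensive lines is code a method call inside a monolith never needed, and every retry is another billable request through your gateway.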

Who benefits? Your third-party providers for API gateways, service meshes, messaging queues and so on. What began as a decoupling of services leads to near-total disconnection, and the only way to pull them back together is to complicate your infrastructure and pay someone to collect and control the energy you thought you stripped from your system.

Operations and observability

In a microservices architecture, you can take one of two paths: First, each team is responsible for monitoring the health and performance of their service(s), all the way down to on-call shifts and remediation playbooks. Their deployment velocity is tied directly to their mean time to resolution (MTTR), because when issues arise, they’re also responsible for developing and deploying the proper fix. 

That works for isolated bugs, but what about issues that mysteriously traverse multiple microservices?

That’s where the second path comes in. You have a dedicated team — call them ITOps/DevOps/DevSecOps/platform engineering, and so on — responsible for creating a “single pane of glass” observability dashboard. With enough investment, they’ll supposedly be able to see all, know all and solve the most complex of issues, the ones that can’t be replicated in a local testing environment due to your infrastructure’s complexity.

Who benefits? Expensive observability platforms. You’ll pay handsomely for each node, container and serverless function in your microservices architecture. Tack on more for every seat you need for individual teams. Even more for the machine learning-driven insights you’ll inevitably need to assist you in pinpointing anomalies and rummaging through a dozen API calls to find a root cause.

Scaling

When you deploy an app in a microservices architecture, you have many options for scale. That is, fundamentally, the core value prop responsible for the rise of microservices: Tech-driven startups have long organized their entire engineering efforts around the hypothetical fear that their company will grow so quickly and profoundly that they won’t be able to keep up with the load.

In a Kubernetes environment, for example, each service can be independently scaled vertically (with more compute power) or horizontally (spread across more pods/nodes) by editing single lines of YAML configurations. Seeing a service running a little “hot” in your observability platform? Bump up its CPU requests. That didn’t do the trick? Try doubling its replicas from three to six and let the ReplicaSet take care of the rest.
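For illustration, here’s what those one-line knobs look like on a minimal, hypothetical Deployment manifest (the service name, image and resource figures are invented):

```yaml
# Hypothetical Kubernetes Deployment showing the two "one-line" scaling knobs.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-service
spec:
  replicas: 6                  # horizontal: doubled from 3
  selector:
    matchLabels:
      app: checkout-service
  template:
    metadata:
      labels:
        app: checkout-service
    spec:
      containers:
        - name: checkout-service
          image: registry.example.com/checkout-service:1.4.2
          resources:
            requests:
              cpu: "500m"      # vertical: bumped from 250m after running hot
              memory: "512Mi"
```

Two small edits, zero code changes, and nobody has to ask why the service ran hot in the first place.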

The simplicity of microservices scaling readily masks underlying inefficiencies. Take, for example, Amazon Prime Video’s experience with a microservices-based tool for identifying block corruption and sync problems. They quickly ran into a hard scaling limit at around 5% of the expected load due to the way they were using AWS Lambda and Step Functions.

Even with an enterprise-scale budget, the only solution was to abandon the microservices architecture and consolidate everything into a single-process monolith that could rely on scalable Amazon EC2 and Amazon ECS capacity.

Who benefits? Once again, your cloud provider. They give you scalability promises and a colorful Band-Aid solution for a higher monthly bill. In return, you get the luxury of pretending, for a little while longer, that a refactoring isn’t looming on the horizon.

Monoliths reduce costs and return benefits to you

We’re not going to argue that monoliths are perfect. But an intentionally designed monolith has a comprehensible solution to each flaw, and unlike a microservices architecture, each flaw you resolve creates a feedback loop of improvement that stays inside your organization.

To improve your monolith in some dimension — performance scaling, the ease of onboarding for new developers, the sheer quality of your code — you need to invest in the application itself, not abstract the problem to a third party or accept a higher cloud computing bill, hoping that scale will solve your problems.

Of their experience, the Amazon Prime Video team wrote, “Moving our service to a monolith reduced our infrastructure cost by over 90%. It also increased our scaling capabilities. … The changes we’ve made allow Prime Video to monitor all streams viewed by our customers and not just the ones with the highest number of viewers. This approach results in even higher quality and an even better customer experience.”

Since the Amazon Prime Video engineering team published their blog post, many have argued about whether their move is a major win for monoliths, the same old microservices architecture with new branding or simply a semantic argument over what a “service” is. No matter the reality, their story is still a compelling illustration of the many ways microservices’ benefits are sold as infinite, as though complexity is allowed to simply leave the system. In reality, it remains hidden, waiting for you to run up against one of many hard limits.

These challenges and hard limits still exist with monoliths. Scalability, more comprehensive monitoring and a better end-user experience are the types of efforts we’re never fully satisfied with. Getting them even halfway right requires time, expertise, grit, creativity and lots of expense. But with a monolith, at least those costs are yours to own. So are your wins.

Another great way to own your costs and wins? Build not just a monolith with intention, but do so in a monorepo. We’ll be back with the third and final part of our mono-centric series, where we’ll focus on a single repository as the magical point where code, conversation and collaboration meet.