Let’s start with a statement everyone can agree on: AI coding is now a permanent fixture of the software development lifecycle.
Even if your team hasn’t introduced it officially yet, it’s only a matter of time before one of your peers kicks off the discussion or pushes a PR that, perhaps even unbeknownst to you, is entirely AI-generated.
Now, let's rub some folks the wrong way: AI coding in a microservices architecture is a recipe for disaster. Instead of merely complex systems, you'll have complex systems that no single human on your team is responsible for or even understands.
AI-generated microservices become dark matter: you know they're there, but you don't know anything about them. When the time comes to troubleshoot an incident and figure out a remediation, you'll wish you had another human, not an <input> field in your AI assistant of choice, to help you figure out what to do next.
Monoliths are the only answer to our AI-coded future.
The joys and unknown unknowns of working with AI-generated code
We're generally bullish on AI coding tools like GitHub Copilot and OpenAI Codex. We think they'll help developers and engineers build faster and focus on more immediately impactful work, like configuring testing suites or cost-optimized deployments, or more exciting work, like building proofs of concept for customers or unique integrations.
We also recognize that AI-generated code is getting much better very quickly. More usage and feedback create better training data, which creates better functions for you to copy-paste into your project and move on to the next thing.
It also folds new and unsolved problems into your software development lifecycle.
Sheer volume
After analyzing hundreds of millions of lines of code from commercial products and open-source projects, the folks at GitClear produced a report with some harrowing numbers: since the rise of AI code assistants around 2021, the number of commits of most types (adding, deleting, updating, and copy-pasting code from one file to another) has more than doubled. Changed LOCs increased from nearly 37 million in 2021 to 66 million in 2023.
AI coding helps developers work faster, but at what cost in technical debt and unreviewed code?
Novel implementations
AI coding tools are very good at generating functional code for specific actions or user stories.
If you ask ChatGPT to write you a Ruby on Rails controller that calls the Coffee API and returns a list of names and descriptions, it'll quickly give you step-by-step directions for bundling httparty to fetch from the API and for creating an appropriate view and route. The result will almost always be a function/controller/method that works straightaway.
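Here's a minimal sketch of the kind of controller it tends to hand back (the endpoint URL, response fields, and route are hypothetical, for illustration only):

```ruby
# app/controllers/coffees_controller.rb
# Hypothetical output in the spirit of what an assistant generates; the
# endpoint and JSON fields are assumptions, not a real Coffee API.
require "httparty"

class CoffeesController < ApplicationController
  COFFEE_API_URL = "https://api.example.com/coffees".freeze

  # GET /coffees (config/routes.rb: resources :coffees, only: [:index])
  def index
    response = HTTParty.get(COFFEE_API_URL)
    # Keep only the fields the view needs: name and description
    @coffees = JSON.parse(response.body).map do |coffee|
      { name: coffee["name"], description: coffee["description"] }
    end
    render json: @coffees
  end
end
```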
The problem is that working code isn't the same as code that follows your organization's standards; it may lean on unconventional patterns that aren't immediately understandable without multiple jumps through a single file's LOCs. The more complex the request to your AI coding assistant, and the more microservices involved, the likelier you are to find implementations that work for end users, but not for you.
Resource bloat
Your AI coding assistant might quickly create a new view or function but use less-performant algorithms or methods.
Performance optimization is a highly specialized function that requires proper observability platforms, deep experience, and intuition for root cause analysis. Without those, you’re more likely to let performance bottlenecks reach production and negatively impact the end-user experience.
Bias and data recency
Your AI code assistant only understands how to write code because it's been trained on a specific dataset. If that data is biased toward certain conventions, like how it names variables and functions or structures data and components, it could easily create patterns that are deeply incongruous with how your team already works, creating rework.
If that data is buggy, so are the results, creating even more complex rework.
There are few guarantees that AI companies will continue to have access to new training data to consistently improve their models. Just days after Stack Overflow and OpenAI announced a new partnership to pipe answers into ChatGPT, the users who supplied those enormously valuable answers started deleting and defacing them in protest.
Imprecise quality and security
A recent Stanford study found developers with access to an AI coding assistant were “significantly more likely to provide an insecure solution” to common and well-documented attack vectors, like SQL injection. They were far more likely to use “trivial ciphers” and fail to validate user input.
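To make that concrete, here's a hedged sketch of the gap in an ActiveRecord query (the model and parameter names are illustrative):

```ruby
# Vulnerable: interpolating raw user input straight into SQL, the kind of
# shortcut "working" generated code can take, and a classic injection vector.
User.where("name = '#{params[:name]}'")

# Safer: pass the value to ActiveRecord so it is parameterized and escaped,
# and validate the input before it ever reaches the query.
User.where(name: params[:name])
```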
That GitClear report also projected that code churn (the percentage of LOCs reverted or updated less than two weeks after being committed) will double compared to its 2021, pre-AI baseline, up nearly 40% from 2022 to 2023 alone. Churn is a leading indicator of poor initial quality, and a strong signal that AI-generated code requires more post-commit clean-up than its human-written counterpart.
Lack of documentation
By default, AI assistants don’t layer documentation into their code. They might explain their methods as part of the larger output, but capturing that “knowledge” becomes another task for you to handle.
You can prod the assistant into documenting a specific method, but there’s no guarantee it’ll write inline documentation in a way that’s valuable outside of code review.
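For instance, on the hypothetical Coffee API controller above, the inline documentation you actually want might look something like this (a sketch of the goal, not something you can count on getting by default):

```ruby
# Fetches the upstream coffee catalog and returns only the fields the
# frontend needs. Handles GET /coffees.
#
# Returns: a JSON array of { name:, description: } objects.
# Caveats: no caching or retries; network errors from the HTTP call
#          bubble up to the client as a 500.
def index
  # ...
end
```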
During an incident, you’ll need to reverse engineer the AI code to understand how it operates, how it’s gone wrong, and which LOCs you’ll need to fix to get your app functioning again.
Lacking skills
Meaning your skills. Yes, the human who’s reading this.
We have the utmost confidence in your skills as a developer and reviewer of your peers’ code. But are you a good reviewer and maintainer of AI-generated code? Are you as good at the latter as the former? Given how recent this phenomenon is, the answer is most assuredly not.
Given all the issues bundled into AI-generated code commits, you’ll have plenty of opportunities to practice. But will you get enough practice to keep up with the sheer volume?
Microservices are uniquely at risk
Even the folks most bullish on microservices architectures and the cloud native ecosystem can agree that microservices systems on Kubernetes clusters are inherently complex, even at their best. The Cynefin framework is a good way to model this conversation.
For the most part, microservices don't solve technical problems nearly as often as they solve the organizational and collaborative problems that most affect massive tech companies with dozens of teams and hundreds of developers. Those companies enjoy the microservice architecture because it provides clear boundaries of responsibility, codified into API contracts, instead of fuzzy, porous zones kept distinct only by programming discipline. Your organization likely isn't working at the scale where such a complex solution does you enough good.
Any microservices architecture trades the simplicity of each service's API surface for services so loosely coupled that their interactions become riddled with unknown unknowns that are extremely difficult to discover and diagnose. Even if you have deep experience in troubleshooting systems, your intuition only gets you so far, because the relationship between cause and effect is rarely a straight line. Root cause analysis and building a remediation feel like playing detective: pinning clues to the wall and hoping the answer eventually comes into view.
This process will only become harder once you have no human peers to ask for help. Once AIs own the code domain, you will have to own the puzzle-solving domain—entirely from scratch and as quickly as possible, because even if ChatGPT wrote your code, the buck still stops with you.
Your mitigation strategies?
The clearest solution to the challenges around AI coding, particularly in a microservices environment, is to layer in even more observability around reliability and application performance monitoring (APM).
If you can catch anomalies during the pull request phase, perhaps thanks to a staging environment generated by your CI/CD pipeline that actively monitors performance, you’ll have a less time-crunched path to remediation. Naturally, this comes with a significant escalation in the size of your observability infrastructure, the cost of aggregating data from dozens of microservices, and ongoing maintenance. Not to mention the brainpower necessary to understand the many flows and correlate them.
You'll also probably need more complex integration and regression testing suites. The more your AI-generated code interacts with existing services that were designed and developed by humans, the more modes you'll find in which their interactions fail in hard-to-identify ways.
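Even in the simplest case, that means pinning down contracts with tests. Here's a minimal regression test for the hypothetical /coffees endpoint from earlier, assuming the WebMock gem is available to stub the upstream API:

```ruby
# test/integration/coffees_flow_test.rb
# Minimal sketch: asserts the endpoint's contract without depending on the
# (hypothetical) upstream Coffee API actually being reachable.
require "test_helper"
require "webmock/minitest"

class CoffeesFlowTest < ActionDispatch::IntegrationTest
  test "returns names and descriptions from the upstream API" do
    stub_request(:get, "https://api.example.com/coffees")
      .to_return(
        status: 200,
        body: [{ name: "Flat White", description: "Espresso and steamed milk" }].to_json,
        headers: { "Content-Type" => "application/json" }
      )

    get "/coffees"

    assert_response :success
    assert_equal "Flat White", JSON.parse(response.body).first["name"]
  end
end
```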
AI coding tools meet monolithic apps
In our experience, the simpler architecture and better developer experience of monolithic applications will bring much-needed clarity to the inevitable future of AI-generated code.
1. Narrow and deep (not wide and shallow) monitoring. In a monolith, tracing the flow of execution and debugging are more straightforward because all application components are contained within a single process. With a far smaller process/service landscape to cover, your observability platform can instead go deep on the nuances, collecting and aggregating application performance data into useful APM dashboards with actionable insights.
That single source of comprehensive detail will eventually help you recognize and debug complex issues faster—especially during the stress of an outage or incident.
2. Easier path from development and testing to production. The simplicity starts early. Because you're operating from a well-documented codebase as you develop, you maintain better visibility into how all your code (even the LOCs your AI assistant took care of) interacts with other functions/methods.
The same idea applies to your testing frameworks: because there's no orchestration between services, you run all end-to-end and integration tests in one unified environment to protect end users from show-stopping bugs.
You deploy your production monolith as a single unit, minimizing the operational overhead typically associated with scaling microservices. That’s time you get back to owning potentially massive wins around standardization, automation, and networking simplicity.
3. Consistent and reliable performance. A monolith avoids the latency inherent in cross-service communication, where Service A and Service B might operate from different regions or even different cloud providers. That matters even more right now, when most AI-generated code is not optimized for distributed systems.
A monolith also minimizes the number of discrete services that must function in lockstep to deliver the right end-user experience, which in turn minimizes the possible vectors for bugs and unexpected behavior.
We can't claim a monolith will always be more performant than the same application written as microservices, but we can say with certainty that its performance will be far less affected by external factors. When you're ready to dig into performance optimization, implementing a robust APM solution takes only a few minutes (see the sketch at the end of this list), not weeks of work and the enormous per-node cost that comes with microservices-specific observability.
4. Deeper, more intuitive understanding. A monolith is inherently a single source of truth. One that’s far easier to understand holistically.
With a monolith, there’s no long-winded discovery process of scanning multiple codebases and coordinating with many teams. You’ll be grateful for that simplicity during your next incident as you’re trying to understand on the fly how an individual function, seemingly gone haywire, works in relation to the big picture.
Unlike in a microservices architecture, these qualities don't disappear, or even get meaningfully more complex, once you throw AI-generated code into the mix. You'll still need to deal with issues around volume, quality, bias, and resource bloat, but with a more comprehensible suite of testing and APM platforms.
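As for what "only a few minutes" looks like in practice, here's roughly how instrumenting a Rails monolith with Scout goes; the app name and key below are placeholders, and Scout's install docs remain the authoritative reference:

```ruby
# Gemfile: pull in the Scout agent, then bundle install.
gem "scout_apm"

# config/scout_apm.yml (shown here as a comment sketch):
#   common: &defaults
#     name: "YourAppName"     # placeholder
#     key: "YOUR_AGENT_KEY"   # placeholder from your Scout account
#     monitor: true
#   production:
#     <<: *defaults
#
# Deploy, send some traffic, and transaction traces start appearing in
# your Scout dashboard with no per-service collectors to maintain.
```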
How Scout APM monitors AI (and human)-generated monoliths
Over the next few months, we’re preparing our customers for the volume, novelty, and imprecision of AI-generated code with new features designed for observing monolithic applications without the complexity and cost of platforms like Splunk, AppSignal, and Sentry:
1. A new free tier that includes all functionality and unlimited users, with up to 10,000 daily transactions to monitor the performance of your next Rails or Django MVP project, exactly the environment in which you or your peers are most likely to leverage AI-generated code.
2. Log aggregation (coming soon) that lets you ingest logs from all your applications for a unified search experience, which will significantly streamline the process of understanding an incident and determining whether human- or AI-generated code is responsible.
3. A durable insights history that provides more context for your traces, helping you track performance progress over time and spot those pesky intermittent issues that otherwise go unnoticed… except by your end users.
What’s next?
If you’re thinking about building a new app with an AI assistant at your side, don’t make your life even more complex by throwing microservices into the mix. If you care about future-proofing your application in a very unpredictable time, monoliths are your best and safest bet.
That said, no matter which route you choose, you should always apply the same standard of quality to both human-generated and AI-generated code through comprehensive code review and revision.
When you make the right choice and go with a monolith, know that Scout APM is here to help you monitor your application’s performance throughout its lifecycle and over time. Join us on the new free MVP tier to quickly build the knowledge and intuition of your application’s performance so you can commit new code—even the AI-generated kind—with confidence.
Honestly, it's the only hope many of us have of even seeing, much less keeping up with, the spread of AI-generated dark matter that's still to come.