Monorepos: HAL 9000-Approved Code Management and Collaboration

This is the third in a series. Read Part 1 and Part 2.

“2001: A Space Odyssey” doesn’t spend much time on how its antagonist, HAL 9000, was trained. The only clues come at its “death,” when it regresses to its earliest lessons, singing “Daisy Bell” as it winds down for the last time. That suggests it started with the very basics and grew more complex only with time and practice.

In our present version of the future the movie projected, AI models are often built by training deep neural networks to identify, and eventually understand, the patterns and relationships between entities in a vast corpus of unstructured data. The same presumably goes for HAL 9000, which would have had to learn natural language, facial expressions, emotional expressiveness and spacecraft piloting to build the intelligence its mission required. And don’t forget high-skill chess.

A neural network, like the human brain, is a messy web of connections that works only because of the sheer size and complexity of the data it was trained on. There is no formal schema, and you can’t isolate one facet of intelligence from the next. The result is a single point of reference, knowledge and conversation for the crew of astronauts on their way to Jupiter.

The same goes for a code repository: the more isolated its parts, the less coherent and effective the whole. The “mess” of a monorepo is actually its power. Once this distinction is clear, the next time your CTO, Maeve, interrupts your standup to complain about your monorepo and demand you break it apart, you might finally have the confidence to say:

“I’m afraid I can’t do that, Maeve.”

A monorepo ≠ a monolith

It’s an important distinction: a monorepo and a monolith are not the same thing, though they are close cousins.

If you haven’t been following our three-part series on what we love about intentionally built monolithic applications, you might want to start with the first installment, “Monoliths: A Space Odyssey to Better DX via Intentional Design,” followed by the second, “Falling Into the Stargate of Hidden Microservices Cost.” Now that we’re all on the same page…

A monorepo relates to how you store, organize and version-control your code. Much like the closet that holds all your childhood toys, a monorepo is a single repository, organized into a hierarchy of folders for independent projects, features and services. Whatever you’re looking for, it’s just a few clicks or a search away in your IDE or on GitHub. The alternative is to break your app into multiple logical components and version-control each one in its own repository.

A monolith relates to the architecture of an app or service. In most cases, a monolithic app is a tightly coupled collection of features and services deployed, on premises or in the cloud, as a single artifact: either a binary or a cohesive unit of interpretable source code. The opposite of a monolith is microservices, where you orchestrate dozens of independently deployed services, often spread across as many repositories, using serverless functions or a container orchestrator like Kubernetes.

Most teams develop monoliths in a monorepo, but not always. It’s possible to compile multiple repositories together during deployment, so that the app is a monolith only in production. The flexibility runs the other way, too: a monorepo can hold many independent apps and services. Google famously uses a monorepo with more than 2 billion lines of code to version-control and organize massive swaths of its complex apps and infrastructure.

The many reasons to love monorepos

With that resolved, let’s talk about what makes monorepos so flexible, powerful and fun to work in day to day.

Monorepos offer a single source of truth. The more you split your work into small systems or repositories, the more siloed your knowledge and peers become. It’s easier to forget about the development of services or shared libraries you don’t touch regularly, which feels like a luxury at first, then becomes a hindrance when it’s time to get back up to speed. A monorepo encourages code sharing, and that simplicity is key to building an intentional monolith that outlives the usual causes of a monolith’s demise.

Monorepos simplify cross-project development and refactoring. If you open a pull request (PR) that introduces breaking changes to a shared library, your responsibility extends to informing other folks of what you’ve “fixed.” In a monorepo, this usually happens via your CI pipeline running a test suite, which catches the broken calls or unexpected data formats and gives you and others context on what to do next.

In a multi-repo environment, you need to manually jump into downstream repos to tell others what needs fixing, and that assumes they can figure out how to test your PR against their service.
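To make the CI side of this concrete, here is a minimal Python sketch of the “affected projects” idea, similar in spirit to what Nx offers with `nx affected`: map each changed file to the project that owns it, then walk the dependency graph backward to find every project that could break. The folder layout, the `DEPENDS_ON` graph and the function names are hypothetical; real tools derive the graph from build files or imports rather than a hand-written dictionary.

```python
from collections import defaultdict, deque

# Hypothetical monorepo: each project is a top-level folder, mapped to the projects it imports.
DEPENDS_ON = {
    "libs/shared": set(),
    "services/billing": {"libs/shared"},
    "services/search": {"libs/shared"},
    "apps/web": {"services/billing"},
    "apps/docs": set(),
}


def owning_project(path, projects):
    """Return the project folder that contains `path`, or None if no project owns it."""
    for project in projects:
        if path == project or path.startswith(project + "/"):
            return project
    return None


def affected_projects(changed_files, depends_on):
    """Projects touched by the change, plus every project that transitively depends on them."""
    # Invert the graph so we can ask "who depends on this project?"
    dependents = defaultdict(set)
    for project, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(project)

    # Seed the search with the projects that own the changed files.
    affected = set()
    queue = deque()
    for path in changed_files:
        project = owning_project(path, depends_on)
        if project is not None and project not in affected:
            affected.add(project)
            queue.append(project)

    # Breadth-first walk along reverse-dependency edges.
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


if __name__ == "__main__":
    changed = ["libs/shared/dates.py"]
    print(sorted(affected_projects(changed, DEPENDS_ON)))
    # → ['apps/web', 'libs/shared', 'services/billing', 'services/search']
```

Because the walk follows reverse edges, a change to `libs/shared` pulls every downstream service into the test run, while a docs-only change tests only `apps/docs`.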

Easier collaboration and code sharing. A monorepo handles code ownership flexibly, breaking down the strict silos that separate repositories create. At the same time, a properly configured CODEOWNERS file routes changes to the teams that own them for approval while still allowing everyone to contribute and learn more about the system, giving you the best of both worlds.
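For a feel of how ownership routing works, the sketch below parses a small, hypothetical CODEOWNERS file and resolves the owners of a given path. It mirrors one real rule of GitHub’s format, where the last matching pattern takes precedence, but the `matches` function handles only a simplified subset of the gitignore-style patterns GitHub actually supports, so treat it as an illustration rather than a drop-in parser. The team handles are invented.

```python
from fnmatch import fnmatchcase

# Hypothetical CODEOWNERS file. As in GitHub's format, each rule is a pattern followed
# by one or more owners, and the LAST matching rule wins.
CODEOWNERS = """
# Default reviewers for anything not matched below
*                     @acme/platform
/libs/shared/         @acme/core-libs
/services/billing/    @acme/payments
*.sql                 @acme/dba
"""


def parse_codeowners(text):
    """Turn CODEOWNERS text into an ordered list of (pattern, owners) rules."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def matches(pattern, path):
    """Simplified matcher covering only a few gitignore-style cases (not the full spec)."""
    if pattern.endswith("/"):
        # Directory rule, treated here as rooted: match everything beneath it.
        return path.startswith(pattern.lstrip("/"))
    if pattern.startswith("/"):
        # Rooted file rule: match against the full path from the repository root.
        return fnmatchcase(path, pattern.lstrip("/"))
    # Unrooted rule such as "*.sql": match the file name anywhere in the tree.
    return fnmatchcase(path.rsplit("/", 1)[-1], pattern)


def owners_for(path, rules):
    """Return the owners of the last rule matching `path` (empty list if none match)."""
    owners = []
    for pattern, rule_owners in rules:
        if matches(pattern, path):
            owners = rule_owners
    return owners


if __name__ == "__main__":
    rules = parse_codeowners(CODEOWNERS)
    for path in ["README.md", "libs/shared/dates.py", "services/billing/migrations/001.sql"]:
        print(path, owners_for(path, rules))
```

A SQL migration under `services/billing/` lands with `@acme/dba`, not `@acme/payments`, because `*.sql` appears later in the file. Ordering rules from general to specific keeps that routing predictable.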

When two teams collaborate on a new feature that requires their services to work in harmony, they can more easily write new code and make the necessary changes across a monorepo, without creating an entirely new repository of questionable ownership. Instead of multiple PRs in multiple repositories, which must be merged and deployed in a particular order, a monorepo narrows it down to one, or at least a short sequence in a single changelog.
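That “particular order” is a topological sort in disguise. The sketch below, assuming Python 3.9+ for the standard-library `graphlib` module, computes one valid merge-and-deploy sequence for a hypothetical cross-repo change; the repository names and PR numbers are invented for illustration. In a monorepo, the same change would typically collapse into a single PR, and the graph would disappear.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical cross-repo feature: each PR lists the PRs that must merge and deploy first.
PR_DEPENDENCIES = {
    "shared-lib#42": set(),
    "billing-service#118": {"shared-lib#42"},
    "search-service#77": {"shared-lib#42"},
    "web-app#305": {"billing-service#118", "search-service#77"},
}


def merge_order(dependencies):
    """Return one valid merge/deploy sequence; raises CycleError on circular dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    try:
        print(" -> ".join(merge_order(PR_DEPENDENCIES)))
    except CycleError as err:
        # graphlib stores the offending cycle as the second element of err.args.
        print("PRs depend on each other:", err.args[1])
```

`static_order()` only guarantees that each PR comes after its prerequisites, so `billing-service#118` and `search-service#77` may appear in either order. If two PRs depend on each other, `TopologicalSorter` raises `CycleError` instead of guessing.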

Smoother dependency sharing. Once multiple services require common transformations and functions, logic tells you to build a shared library to keep folks from reinventing the wheel. But a shared library, stuffed into yet another repository, isn’t a panacea. It’s too easy for services to pin a specific version, instead of the latest, because they don’t want to deal with change. Soon, no one in your organization can see who’s using the library, for what purpose or whether it’ll keep working in the future.

In a monorepo, your shared library becomes a first-class citizen. Every consumer builds against its current state, not whichever version last worked for them. Consistent, quality implementation is almost always worth a little extra work.
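To see the pinning problem in numbers, here is a hypothetical drift report of the kind a multi-repo organization ends up scripting: it compares each service’s pinned version of a shared library against the latest release. The service names and versions are invented, and `parse_version` assumes plain `MAJOR.MINOR.PATCH` strings with no pre-release tags. In a monorepo, every consumer builds against the same in-tree code, so there would be nothing to report.

```python
# Hypothetical multi-repo snapshot: the shared-library version each service pins.
PINNED = {
    "billing-service": "2.4.1",
    "search-service": "1.9.0",
    "web-app": "3.0.0",
}
LATEST = "3.0.0"


def parse_version(version):
    """Turn a plain 'MAJOR.MINOR.PATCH' string into a tuple that compares correctly."""
    return tuple(int(part) for part in version.split("."))


def drift_report(pinned, latest):
    """Map each service that lags `latest` to how many major versions behind it is."""
    latest_parsed = parse_version(latest)
    report = {}
    for service, version in pinned.items():
        current = parse_version(version)
        if current < latest_parsed:
            report[service] = latest_parsed[0] - current[0]
    return report


if __name__ == "__main__":
    for service, majors_behind in drift_report(PINNED, LATEST).items():
        print(f"{service}: {majors_behind} major version(s) behind")
```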

Simpler deployments with clear visibility into shared logic. In a monorepo, you can deploy a new major version with a single pull request. Multiple repositories, especially when organizing a microservices architecture, require a far more granular approach, with more synchronized moves that must be carefully orchestrated and monitored on the big day. Each task incurs hidden costs and a deeper reliance on third-party platforms, like your cloud provider or CI/CD platform.

With a single point of visibility into what’s changing, monorepos leave far fewer opportunities for the unknown unknowns of release day to rear their fearsome heads.

Faster code reviews and tests. A monorepo gives developers a far better experience for tracking changes and delivering insightful PR reviews. With a proper CODEOWNERS file, required stakeholders are notified about relevant changes and, when it’s paired with a branch protection rule that requires code owner review, can block the merging of code they haven’t explicitly approved. No one is prevented from cross-team collaboration, including offering code quality suggestions for services far outside their purview, and everyone stays synced on the impact each change will have on the larger system.

Better team culture. A monorepo is a technological decision that, for once, minimizes technology. Where multiple repositories encourage isolation and less shared responsibility while adding technical complexity, a monorepo is more about team culture, transparent process and collaboration across functions.

A monorepo might look intimidating at first, but one you build with intention offers clear points of ingress for new developers to contribute, track changes and ultimately learn more about the system as a whole.

Building a monorepo with intention (and love)

As with a monolithic app, you shouldn’t build a large monorepo just because it’s the default for a new app, or even because your framework encourages it. Instead, do so intentionally and with full awareness of the technical hurdles, like git performance, code ownership, merge/rebase strategies, long-lived branches and more.

You also have plenty of technology at your back these days, with projects like Bazel, Gradle, Lerna and Nx all moving quickly to address the inevitable growing pains of running a monorepo at even a fraction of Google’s scale. For more short-term concerns, you can use shallow clones in git for better performance, and the right CI platform goes a long way toward dispelling common misconceptions about monorepos, with features like blue-green deployments that can replace complex rolling deployments.
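Since blue-green deployments come up here, a toy model may help: two identical environments, one serving traffic while the other receives the new release, with a single switch to cut over and an equally quick switch back. The `BlueGreenRouter` class and its `health_check` callback are hypothetical stand-ins for what a real CI/CD platform or load balancer would do; the point is the shape of the process, not an implementation of any particular product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class BlueGreenRouter:
    """Toy blue-green model: two identical environments, only one live at a time."""

    # Which release each environment is running (None = nothing deployed yet).
    versions: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"blue": None, "green": None}
    )
    live: str = "blue"

    @property
    def idle(self) -> str:
        """The environment not currently receiving traffic."""
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version: str, health_check: Callable[[str], bool]) -> str:
        """Deploy to the idle environment; flip traffic only if the health check passes."""
        target = self.idle
        self.versions[target] = version
        if health_check(version):
            # One switch moves all traffic; the old environment stays up for rollback.
            self.live = target
        return self.live

    def rollback(self) -> str:
        """Point traffic back at the other environment, which was never torn down."""
        self.live = self.idle
        return self.live


if __name__ == "__main__":
    router = BlueGreenRouter(versions={"blue": "v1", "green": None})
    print(router.deploy("v2", health_check=lambda v: True))  # green now serves v2
    print(router.rollback())  # back to blue, still serving v1
```

Compared with a rolling deployment, which replaces instances gradually and can leave old and new versions serving side by side, the cutover here is all-or-nothing, and the previous environment stays intact for rollback.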

Conway’s Law argues that the structure of a system is determined by the communication structure of the organization that built it. Would you rather that structure be well-groomed but ultimately fractured, or could your monorepo operate like a central hub, where all your developers and engineers can meet, chat and build? It’s a bit like a less antagonistic HAL 9000: everything you need, all in one welcoming place (or voice), and no risk of being locked out of the pod bay doors.