HACKER Q&A
📣 factorialboy

What are the pros / cons of using monorepos?


Or in other words, when would you recommend using them, and when would you avoid them?


  👤 gen220 Accepted Answer ✓
I've worked in environments across the version-control gamut. The best-run places have had monorepos. But, for the life of me, I would not trust the companies that didn't have monorepos to operate one.

To go mono is to make an org-level engineering and cultural commitment: you're going to invest in build tools, dependency-graph management, third-party vendoring, trunk-based development, and CI/CD infrastructure.

Can you make a monorepo work without all of those things? Yes, but you are sacrificing most of its benefits.

If your eng org cannot afford to make those investments (i.e. headcount is middling but there is zero business tolerance for investing in developer experience, or the company is old and the eng org is best described as a disconnected graph), forcing a monorepo is probably not the right idea for you.

Monorepo vs. microrepos is analogous in some ways to the static-vs-dynamic typing debate. A well-managed monorepo prevents entire classes of problems, as does static typing. Dynamic typing has much lower table stakes for a "running program", as do microrepos.

edit:

It's worth noting that open source solutions for build tooling, dependency-graph management, etc. have gotten extremely good in the last 10 years. At my present company (eng headcount ~200), we spend roughly two engineer-years per year on upgrades and maintenance of this infrastructure. These tools are still quite complex, but the table stakes for a monorepo are lower today than they were 10 years ago.


👤 joshuapark
Things that change together, go together. Monorepos save people from having to dig through every corner of your version-control system to find all the pieces of your application. At the same time, if you have 10 microservices which are accessed by 1 frontend, it may be a little messy to keep all that code in the same place.

Common sense (which is not that common) is what should be used to decide. Ask yourself some questions:

- Do these repositories change together?

- If the code is not kept together, how should one repo be linked to the other (a git tag? a version number?)?

- Are all pieces of the application contained within one context, or can they be split (for example, 10 microservices and 1 frontend, but 5 of those microservices are also used by another frontend in another project)?

The main goal is to make things easier; some PoCs and experiments may make the choice clearer, because it really depends on the situation.


👤 lallysingh
Pros:

* Single version / branching for everything

* Commits that go across components/apps are atomic.

Cons:

* When it gets big, those features matter less

* Churn from other devs' work gets into your merge/rebase work.

* 'git log' and other commands can be painfully slow

* Mistakes in the repo (e.g., committing a password) now affect many more people.

Use them for highly coupled source bases, where releasing together and atomic commits are very useful. Every other time, you're probably better off splitting early and taking a little time to build the coordination facilities needed for handling dependency versioning across your different repos.

Note: I used to really prefer monorepos, but I've had some time away now to take a better look at it. Now they feel like megaclasses where small things are easier (because everything is in one place) but large things are way harder (because there's no good split point).


👤 dcolkitt
For me the biggest pro and con are the same thing. With a monorepo your respective projects' codebases become tightly coupled.

Why can that be a bad thing? Good software engineering practices tend to be associated with loose coupling, modularity, and small, well-defined interface boundaries. Putting each project into its own repo encourages developers to think of it as a standalone product that will be consumed by third-party users at arm's length. That engenders better design, more careful documentation, and less exposed surface area than the informal cross-chatter that happens when separate projects live in the same codebase.

Why can that be a good thing? The cost of the multi-repo alternative is that each project has to be treated as a standalone product with its own release schedule, lifecycle support, and backwards-compatibility considerations. Say you want to deprecate an internal API in a monorepo: just find all the places it's called and update them accordingly. With multiple repos it's nowhere near as easy. You'll find yourself supporting old-style calling conventions well past the point you'd prefer, to avoid breaking the build for downstream consumers.
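That "find all the call sites and replace accordingly" step is often a single repo-wide rewrite landed in one commit. A minimal sketch, with hypothetical function names (`old_api`, `new_api`); a real codemod would use an AST-aware tool rather than a regex:

```python
import re
from pathlib import Path

def rename_call_sites(root: str, old: str, new: str) -> int:
    """Rewrite every call site of `old` to `new` under `root`.

    Returns the number of files changed. In a monorepo this one pass,
    plus one CI run, completes the whole migration.
    """
    pattern = re.compile(rf"\b{re.escape(old)}\(")
    changed = 0
    for path in Path(root).rglob("*.py"):
        text = path.read_text()
        updated = pattern.sub(f"{new}(", text)
        if updated != text:
            path.write_text(updated)
            changed += 1
    return changed
```

With multiple repos, the equivalent change means a release of the library plus a version-bump PR in every consumer.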


👤 fetbaffe
It usually begins as

let’s split our project into multiple libraries so it can be reused within the company or even better if we put it on GitHub everyone can use it.

After a few months you notice nobody within the company cares about your nicely versioned libraries and on GitHub you get more and more complaints that the libraries are very limited and need more features.

After that you merge everything back into your monorepo and try to forget all the time wasted on git bureaucracy, versioning, dependency handling and syncing changes between repos.


👤 dehrmann
Monorepos:

- Work best when you have an open culture

- PCI compliance will be annoying

- The obvious--everything in one place

- Need good tooling around keeping master building

- As they grow, become an uphill battle to use with an IDE

- Tests are likely to slow down as the repo grows, so you need tooling around tests

- Usually lead to a rat's nest of dependencies

- Third-party library upgrades can be painful

- Coupled with CD (and it really needs to be coupled with CD), it's easy to get surprise breaks

Multirepos:

- Every team will need to dabble in build and release engineering

- Changes across repos are slow and painful (I claim this is a feature because it makes you think about versioning and deployment)

- Library developers have to think more about versioning

- You'll probably need a binary repository like Artifactory

- More time and tooling needed to do library upgrades (especially interesting for security issues)

- Harder for people to switch teams


👤 gregcoombe
Google posted a paper detailing their reasons for choosing monorepo: https://research.google/pubs/pub45424/

Caveat: your company probably isn't Google, so your challenges may be different.


👤 scarmig
It depends. At a high level, "at scale" you'll have to solve all the same problems for both, to the point where you have a dedicated team or teams solving those problems. Monorepos don't automatically solve issues of version skew or code search or code consistency, and multirepos don't automatically solve problems of access control or performance or partial checkouts. All a monorepo strategy does is say that all your source files will share the same global namespace, and all a multirepo strategy does is say that they can have different namespaces (often corresponding to a binary or grouping of closely coupled binaries). Everything after that is an orthogonal concern. As far as it goes, conceptually monorepos appeal to me, and they offer more discoverability and a simpler, more consistent interface than multirepos. It's also worth considering that there must be some kind of trade-off if you need to pull in the abstraction of "separate repos" to handle code: typically you have fewer guarantees about the way source files will interact when they're in separate namespaces, which makes some things harder.

But if you're just starting out, you're going to be going with off-the-shelf components. Usually this is git hosted on GitHub, GitLab, or something similar; there's a good chance you're going to be using git. Vanilla git works sub-optimally once you reach a certain number of contributors, a certain number of different binaries being generated, and a certain number of lines of code, as a lot of its assumptions (and the assumptions of folks who host git) focus on "small" organizations or single developers. You aren't going to have a good time using a vanilla git monorepo with tens of millions of lines of code, and hundreds of developers, and dozens of different projects, even though in principle you could have a different source control system that would function perfectly well as a monorepo at that scale.

My general approach would be to start with a git monorepo, do all development within it, and once that becomes a pain point migrate to multirepo.


👤 quotemstr
You always have a monorepo whether you realize it or not. The only difference is how your monorepo is organized.

If you split your codebase into lots of little SCM repositories and manage dependencies by pushing prebuilt artifacts here and there, all you've done is create a new meta-repo system on top of your SCM. And usually, this dynamic meta-repo coordination system is not only implicit, not only undocumented, but also not even understood by any single human being --- yet your project's success depends on the operation of this meta-repository built accidentally and unconsciously from coordinated development in tons of little repositories.

I'm strongly in team formal-monorepo. Just make a single big repository that holds everything you depend on. Be honest and direct about the interactions of your components --- explicit, not accidental the way they are in a microrepo ecosystem.

The performance-based arguments against monorepos don't apply as strongly as they once did: Mercurial and Git (more so the former) have seen a lot of work on optimizing big repositories, and both support a sparse-checkout model that lets people working on specific tasks focus on a small part of the larger repository without splitting the repository apart.


👤 anonymoushn
I wish I could use monorepos more often!

The only legitimate reason I have encountered so far for not using a monorepo is open-source dependencies that the company maintains patches to/contributes changes to. In this sort of situation, you'll want to have them in their own repository so that you can go from "foolib v5 plus our changes" to "foolib v6 plus our changes" without too much fuss.


👤 dec0dedab0de
Use them whenever things are one project, only split them up when it's legitimately separate projects.

At work I set up a repo with our back-end code in one directory, the front-end in another, and the Ansible playbooks/other deploy stuff in a third.

This allows us to automatically create servers for every branch because we know that everything goes together. Working on other projects where they have all of these separate is a nightmare. Especially when they use microservices and each service has its own repo and each library has a repo... I want to smack anyone who ever thinks that is a good idea


👤 marcrosoft
Reduced overhead for ci/cd, code review, code syncing and branching.

No need to sync multiple repos and have complicated dependency versioning for in house libraries.

In house libraries are easier to refactor as dependent code can be updated in a single PR.

Code tends to be more reusable and you tend to make more reusable libraries as it is as easy as creating a new directory in the repo.

Services and tools can still have separation of control with all the monorepo advantages.

The only reason to not monorepo is if you are thinking of open sourcing the lib/tool you’re building but even then you can cut it out of the monorepo.


👤 mcherm
The big advantage of a monorepo is that dependency on changes to a "library" or component that gets reused in multiple places can be made explicit. A user can commit a change that safely modifies a library or component in a non-backward-compatible fashion AND changes all the places that use it. This is far simpler than the alternative of releasing a non-compatible version of the library/component and maintaining both in parallel for a period while the users transition.

The big advantage of separate repositories is that the work done on them is very clearly marked as independent. It becomes easier to take one component out and utilize it elsewhere... but at the cost of having to deal with dependency management. Dependency management is simple and automatic in easy cases and really terrible in hard cases (like when two components you use mandate mutually exclusive versions of a third component).

So I think the driving factor in the decision is the tradeoff between the degree to which you are willing to invest in careful packaging of each component (and the corresponding dependency management), versus your willingness to force everyone and everything to utilize the monorepo.

(Notice I didn't mention performance issues. I really don't think they are important enough to control the decision.)


👤 closeparen
A monorepo is a transfer of convenience from people who make services to people who make libraries. A library author's change automatically runs the tests (pre land) and CI/CD pipelines (post land) of every dependent service, where previously it would have taken months of wrangling to distribute a library release this widely. But service author laptops are now struggling under the weight of every line of code at the company. Git operations take long enough for you to lose interest and tab over to HN. After a git pull, many of your dependencies have changed which can break things (at worst) or just invalidate a lot of cached compilation and make the next build slow (always). You need the build system to generate a "mask" for your IDE so that code intelligence doesn't try to index the whole thing in memory, etc. Buck and Bazel are less familiar and less ergonomic than languages' native or customary toolchains. You need further tools (or a lot of manual effort) to keep the monorepo build system's own configuration files in sync.

👤 Communitivity
Mono-repos are either a shortcut to avoid release management and dependency management, or are a way to manage development at scales approximating Google's.

First, for almost all companies copying non-selectively what Google does is harmful - you are not Google.

Second, if you are a small team and your code only produces one binary that is shared across team boundaries, then you might be able to get by with a monorepo. But if you are refactoring pieces of your code into libraries (you probably should be) and sharing those libraries with other teams in your company (you maybe should be), then you probably shouldn't use a monorepo.

However, converting may be a painful process. Using multiple repos, one per shared product (library, SDK, web site, documentation package) is harder to manage because you need to manage your dependencies much better and pull in the correct versions for the current product.

You also will need to communicate changes on products better across your teams, ideally with automatic notification of new releases that include a changelog and examples of new/changed features.

As part and parcel of the above you'll need better release management, including a release repository. What you use will depend on your environment, the language(s) your team uses, and your budget. There are some good open source dependency repositories out there that can accommodate npm-style dependencies, Java dependencies, and others (e.g., Artifactory).

In summary, migrating away from monorepos is going to mean investing in good DevOps people and giving them what they need to create/install/manage good processes, then learning those new processes and dealing with the added work they impose on developers. But it will also likely give you better products and, over time, both speed up development and reduce defects (in part through more well-defined API touchpoints).


👤 bergie
Pros: it is easy to make changes across multiple parts of a system

Cons: it is easy to make changes across multiple parts of a system


👤 dinkleberg
Mono repos can be useful if you have different services that are highly dependent on each other. This lets you package releases in a simple way by just tagging the repo. That tag/release gives you the exact version of all the components that you need.

But CI becomes substantially more complicated. You have to figure out with every commit what actually changed and based on that change what needs to be tested and built. You don’t want to run a build of all components if you only changed a small piece of one component.
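One common way to handle that "what actually changed" question is a path-to-component mapping that CI consults against the output of `git diff --name-only`. A hedged sketch, with invented component names and directory layout:

```python
from fnmatch import fnmatch

# Hypothetical mapping from monorepo paths to the components whose
# pipelines must run when those paths change. Note the shared library
# paths appear under every component that consumes them.
COMPONENT_GLOBS = {
    "api": ["services/api/*", "libs/common/*"],
    "web": ["frontend/*", "libs/common/*"],
    "billing": ["services/billing/*"],
}

def affected_components(changed_files):
    """Given paths (e.g. from `git diff --name-only origin/main...HEAD`),
    return the set of components to build and test."""
    hit = set()
    for name, globs in COMPONENT_GLOBS.items():
        if any(fnmatch(f, g) for f in changed_files for g in globs):
            hit.add(name)
    return hit
```

A commit touching only `services/billing/` then triggers only the billing pipeline, while a change to the shared library fans out to every consumer.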


👤 bbotond
I don't have a lot of experience with monorepos but I've found them very useful for:

- Simple projects with a server and an SPA component - frontend and backend code for the same feature is on the same feature branch, can be tested and reviewed together.

- Projects with a couple of microservices that share some common libraries - these libs can be directly referenced by the microservices instead of being published on an internal package server. This of course has its drawbacks too but overall, the benefits have outweighed them.


👤 jzoch
I find internal libraries incredibly frustrating to manage with microservices. I spend more time changing the library, updating each service to the new version, getting PR approvals, and deploying than I would with a monorepo. Additionally, verifying that changes in my library don't break any of my clients is more difficult with distributed repos.

This is all with a fairly robust CI/CD pipeline, so it's not like I'm marred by inefficiencies elsewhere; it's just an annoying process that makes me (and others) not want to write libraries.

edit: I should qualify that this is most annoying when you own the library and the services that consume the library. If the burden of the upgrade was left to my customers then the tradeoff might be different for me :)


👤 corytheboyd
I’ve pondered the same thing in my time working for web startups.

I think a good goal to instead shoot for is being able to extremely trivially build and ship a “latest” of your entire product.

You know your product doesn't grow in sane, predictable ways, and you know you won't have resources dedicated to maintaining internal dependencies, so why try to half-ass a complex internal dependency tree? What ALWAYS happens is that unexpected feature requests break the clean APIs set up by engineers, which creates all this fake work of "making the breaking change" across the dozens of dependents.

Something like a monorepo could help prevent issues like this, but it’s not a guarantee you’re making it easier to ship one latest version of your software.


👤 influx

  Twitter: Monorepo
  Facebook: Monorepo
  Google: Monorepo
  Amazon: Multirepo
Of these companies, compare AWS and the number of services, new features and quick turn around time with their competitors, and get back to me.

👤 jiveturkey
This isn't answerable in a short-form comment thread like you would find here.

Besides the structure of the repo itself, you have to consider the "base" SCM software, as well as the interfaces that are and can be built around it.

You are probably limiting yourself to a specific implementation of git, and for that, a specific and relatively concise answer can probably be given. But the issue in general is much, much more expansive than that. The question as posed doesn't give us enough information to give a proper answer. As such, you can see how the answers so far are all over the board.


👤 jayd16
Pros:

- You can track changes across projects.

- Finding source for a project can be easier.

- You can more easily switch to a code dependency instead of a binary dependency. With things like Gradle becoming able to pull and build git commits, this is less of a win.

- It's easier to kick off downstream builds, because everything is in the monorepo and you can more easily track dependencies.

Cons:

- It's a technical challenge. Git isn't great at it, although it's getting better.

- If you do want to lean into linking source instead of binaries, you now need a more consistent build system across your codebases.

I prefer working in them, and in my opinion it's approaching a personal-preference choice.


👤 simonaco
Pros:

1. Single source-code location

2. Easier dependency management

3. Related changes can go in the same commit

Cons:

1. Abstractions and boundaries may diverge across packages

2. Build times can become unpredictable

3. CI can be unwieldy to set up

The creator of Nx Tools wrote about this recently: https://aka.ms/createmonorepo


👤 KerrickStaley
Having experienced both the monorepo approach (at Google and Lyft's L5 autonomous division) and the manyrepo approach (at Lyft's main rideshare division), my conclusion is that keeping as much code in a single repository as possible (i.e. a monorepo) is generally the best approach.

The downside of manyrepos is that you often have to merge multiple changes into different repos in order to achieve a single logical change, and each of these changes requires a code review, waiting for CI, etc. For example, you may have a library that is shared by several services that your team owns. If you want to change some logic that lives in the library and propagate that change to a service, you have to make the change in the library and then bump the library version in the service. The latter change, while simple, is pure busywork, and at Lyft this second step easily adds 1-2 hours of overhead between waiting for code reviews and waiting for CI. Monorepos therefore make it much easier to share code and meaningfully reduce overhead for your team.

One of the cited downsides of monorepos, VCS scalability, only really kicks in for very large teams. At Lyft L5, we have a single shared Git monorepo hosted on GitHub that hundreds of engineers contribute to daily, and to my knowledge we haven't hit serious problems with Git itself (although I last worked in that org about a year ago). We did run into a few peripheral issues though:

- People kept inadvertently merging things that broke master. This would happen when two incompatible changes were merged at nearly the same time, or two incompatible changes were merged a few days apart but the second change was based off a stale master from before the first change, so tests pass on the branch but not after merge. We ended up solving this by having "deliver" branches that are basically master branches for a single team; if you break the deliver branch the only people you have to answer to are your immediate teammates, and it doesn't stop all other merges across the org. Deliver branches are periodically merged into master by a release manager who handles merge conflicts.

- CI got progressively slower. We addressed this by making improvements to the build system, optimizing tests, granularizing dependency graphs, using C++ idioms like forward declarations and PImpl to reduce dependency chains, and so on.

If you have a monorepo, you probably want to use a tool like Bazel. And since Bazel is, to my knowledge, the best tool in the world for doing what Bazel does, that means you probably want to use Bazel. Bazel has you specify your dependency structure as a DAG, and then allows you to quickly re-run tests on PRs based only on the code that changed. It also makes builds for C++ and other compiled languages blazingly fast, and if you're building C++ I don't think there's a better build system out there, monorepo or no. Bazel is a complex tool though so I'd encourage you to read through its highly detailed user manual, and if you have any questions ask on Stack Overflow where you'll often get a response directly from one of the core maintainers.


👤 solarhess
We are breaking our monorepo into 3 kind-of monorepos based on release cadence: a repo for each of two large applications, plus a third repo for shared library code. We import the shared library code into the app repos using a git submodule. Has anyone else used git submodules to import common code?

👤 parsley27
I've heard many teams are breaking repos up by team, and I think that makes sense for many non-Google companies, but what I wonder about is whether or not clients and backends should ever be bundled in one repo.

My gut tells me they should be separate, but I'm curious what other people think.


👤 collyw
Is there any webpage or book that explains how to set up a monorepo and what not to do? I haven't really thought about it until now, but certain things sound kind of complicated, though I could see it simplifying other things.

👤 2rsf
Simplicity, and being able to check out and work on single folders. This isn't directly related to the repo being mono, but you usually wouldn't use git as your VCS when working with monorepos.

👤 lumost
Code reviews and ownership boundaries are less clear on a large codebase in a monorepo. Depending on your team culture, this will either be a pro or a con. While careful design of the build system can mitigate build times, there is likely to be some disagreement on basic practices, such as how to launch a service or when to use Spring vs. Guice vs. Rails, that will be tougher than simply letting everyone go their own way.

Refactors and cross-service deployments can become much simpler in a monorepo.


👤 x87678r
Do people with big monorepos always clone the whole thing? Our CI/CD (TeamCity) always seems to check out the whole repo even if you only want one subdirectory.

👤 ssivark
It would also be interesting to hear about the tooling available/recommended for maintaining monorepos (ideally FOSS tooling, but good proprietary options as well).

👤 snicker7
Tangentially: How does one manage a monorepo based on git?

👤 dboreham
Pros: some tooling features just don't work across repos (e.g. npm dependency resolution). A monorepo works around these limitations.

Cons: un-separation of concerns.


👤 ajhurliman
Pros: reusable tooling, consistency across projects

Cons: huge blast radius, makes it difficult for groups to be truly independent


👤 quicklime
I won't try to be exhaustive here, but I think it's worth mentioning a few things:

One con is that most open source (as well as publicly available but proprietary) tooling is geared towards the non-monorepo approach. So if you want to use a monorepo, you're going to have to fight a bit of an uphill battle because most tools and processes assume you have lots of little repos. For example, build triggers in a lot of CI/CD tools operate on a per-repo basis.

There are some random benefits - it's easier to make everything public-by-default, which encourages people to look at source code written by other teams, and creates a culture of transparency and internal openness.

But the big thing in my opinion isn't directly the monorepo itself, but what's known inside Google as the "One Version Rule"[1]. Basically this means that only a single version of any package should be in the repo at a time. There are exceptions, but that requires going to extra effort to exempt your package from this rule.

I guess Chrome and Android are examples of this - they are made up of lots of little Git repos that are stitched together, but they generally follow the One Version Rule. On the other hand, if you just stick a lot of npm modules in there and every single one has a separate package.json file, then it's technically a "monorepo" but it's not following the One Version Rule.

You also need good test coverage. Not just in terms of line coverage or some other artificial metric, but to the point where you could say "I feel reasonably comfortable that a random change will be caught by my tests". This lets people in other teams catch regressions without having to have a detailed understanding of your team's codebase.

So once you've got these three things - 1) a monorepo 2) with everyone following the One Version Rule and 3) lots of tests - it means that dependency owners can update all of their consumers at once without much effort. They just make a change to their base library, and the build system will walk the dependency tree and figure out all the consumers that could possibly break, and runs all the appropriate regression tests.
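That "walk the dependency tree and figure out all the consumers that could possibly break" step is essentially a breadth-first search over the reversed dependency graph. A small illustrative sketch with made-up package names; real build systems answer this with graph queries (e.g. Bazel's `rdeps`):

```python
from collections import deque

# Hypothetical dependency edges: package -> packages it depends on.
DEPS = {
    "app_a": ["libs/http", "libs/json"],
    "app_b": ["libs/http"],
    "libs/http": ["libs/json"],
    "libs/json": [],
}

def consumers_of(package, deps=DEPS):
    """Return every package, direct or transitive, that could break
    when `package` changes, by BFS over the reversed graph."""
    rdeps = {}
    for pkg, ds in deps.items():
        for d in ds:
            rdeps.setdefault(d, []).append(pkg)
    seen, queue = set(), deque([package])
    while queue:
        cur = queue.popleft()
        for consumer in rdeps.get(cur, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen
```

Changing `libs/json` here flags `libs/http` and both apps for regression testing, while changing `app_a` flags nothing else.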

This is the inverse of how it normally works at large companies, where each team pulls in their dependencies and pins them to a specific version. At most companies, updates require extra effort, so the default is to let everything go stale. This is especially problematic when security vulnerabilities are released (e.g. to an ancient version of jQuery) but teams can't update until they migrate off of an old API. It also means that library owners regularly have to maintain multiple old branches for months or even years after the initial release, because everyone's too afraid to update.

I personally think it's a myth that you need to be "Google scale" to benefit from a monorepo. In my opinion, you only need a few tens of repos before all the different combinations of semvers get unwieldy. For me, going from Google's monorepo to a company that is built around lots of little repos in GitHub Enterprise felt like going back to the CVS/RCS days, where every single file had a separate revision number and changes weren't made atomically.

[1]: https://opensource.google/docs/thirdparty/oneversion/


👤 sakisv
A couple of disclaimers first.

1. I've only used a monorepo professionally once, in a big org, and it was SVN-based.

2. I'm going to look at monorepos from the perspective of git and GitHub.

The main argument I can think of in favor of monorepos is keeping cohesion as high as possible between different parts of your system. For example, if you want to make a change to the load balancer regarding TTL, you can also make the same change in your API and your mobile clients, and in the end you create one single PR and have your tests run against that single revision.

Compare and contrast the same scenario in the traditional multi-repo approach: you make your changes to your `infra` repo, then you make the same change in your `api`, `android-client`, and `ios-client` repos, and you end up creating 4 PRs that need to be reviewed at the same time. Which one would you prefer?

Potential arguments against are:

1. Too much noise - If multiple people are working on the same repo, you'd be getting emails for PRs for parts of the code that you may not care about.

2. Longer clone/pull times - In the same vein as before, the initial `git clone` and maybe every `git pull` after that will be bringing in a lot of code that you may not care about, increasing the time of each operation and increasing your frustration.

3. Access control - How do you limit who has write access to which part of your repo? AFAIK this isn't possible in git itself, but it may be possible with CODEOWNERS on GitHub; I don't know.

4. Organization - How do you structure your monorepo? Do you know in advance how many components it's going to have? How is a restructuring going to affect your history and/or dependencies between different components of your code?

5. History and code sharing - With a monorepo it's more difficult to share/open source just part of your code and keep the history intact.
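On point 3 above: git itself has no per-directory write permissions, but a GitHub CODEOWNERS file can at least gate merges by requiring review from the owners of the paths a PR touches. A hypothetical sketch (team names invented; the last matching pattern wins, so the catch-all comes first):

```
# .github/CODEOWNERS (hypothetical teams)
*                   @acme/eng-leads
/infra/             @acme/platform
/services/api/      @acme/backend
/clients/mobile/    @acme/mobile
```

This is review-gating rather than true access control, but combined with branch protection it covers much of the practical need.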

Having said all that, I think the monorepo is a good match at a service level, not at an organization level. I know Google and Facebook have gone all in at an org level, but first of all they don't use git and second, they have the luxury and the resources to make it work for their use case. At a service level you should have a good idea of your boundaries, your applications and your infrastructure and assuming a relatively small engineering team (~10-12 people) eventually everyone would be confident enough with all parts of your system.

Personally I really like the idea of having one hash define the entire state of the world. I have a few common utilities that I use across my projects and I find it too annoying to keep them all up-to-date between different repos. I hope that eventually I will find some time (or be annoyed enough) that I will converge everything in one repo.


👤 nendroid
Always use a monorepo.

There is zero difference between using repos to organize things and using folders to organize things, except that dependencies spread across repos require extra steps to update when they change. More repos = more annoyance.

Use folders to organize things because common sense.

Additionally, don't use classes to organize your code, use combinators.

Folders and combinators will, imo, solve basically 95 percent of all organizational problems related to design.

Things like microservices, multiple repos, and GoF design patterns only serve to make the organizational problem worse.