To go mono is to make an org-level engineering and cultural commitment: you're going to invest in build tools, dependency graph management, third-party vendoring, trunk-based development, and CI/CD infrastructure.
Can you make a mono repo work without all of those things? Yes, but you are sacrificing most of its benefits.
If your eng org cannot afford to make those investments (e.g., headcount is middling but there is zero business tolerance for investing in developer experience, or the company is old and the eng org is best described as a disconnected graph), forcing a monorepo is probably not the right idea for you.
Monorepo vs microrepos is analogous in some ways to the static vs dynamic typing debate. A well-managed monorepo prevents entire classes of problems, as does static typing. Dynamic typing has much lower table stakes for a "running program", as do microrepos.
edit:
It's worth noting that open-source solutions for build tooling, dependency graph management, etc. have gotten extremely good in the last 10 years. At my present company (eng headcount ~200), we spend roughly two engineers' worth of time per year on upgrades and maintenance of this infrastructure. These tools are still quite complex, but the table stakes for a monorepo are lower today than they were 10 years ago.
Common sense (which is not that common) should be what decides it. Ask yourself some questions:
- Do these repositories change together?
- If the code is not together, how should one repo be linked to the other (a git tag? a version number? a pinned submodule, as sketched below)?
- Are all pieces of the application contained within one context, or can they be split (for example, 10 microservices and 1 frontend, but 5 of those microservices are also used by another frontend in another project)?
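If the repositories do stay separate, a pinned git submodule is one concrete answer to the linking question; a minimal sketch with hypothetical URL, paths, and tag:

```
git submodule add https://example.com/shared-lib.git libs/shared-lib
git -C libs/shared-lib checkout v2.3.0   # pin to a released tag
git add .gitmodules libs/shared-lib      # record the pinned commit
git commit -m "Pin shared-lib at v2.3.0"
```

The parent repo now records an exact commit of the dependency, so checkouts are reproducible; the cost is that every bump is another commit in the parent repo.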
The main goal is to make things easier; some PoCs and experiments may make it clearer, because it really depends on the situation.
Pros:
* Single version / branching for everything
* Commits that go across components/apps are atomic.
Cons:
* When it gets big, those features matter less
* Churn from other devs' work gets into your merge/rebase work.
* 'git log' and other commands can be painfully slow
* Mistakes in the repo (e.g., committing a password) now affect many more people.
Use them for highly coupled source bases, where releasing together and atomic commits are very useful. Every other time, you're probably better off splitting early and taking a little time to build the coordination facilities needed for handling dependency versioning across your different repos.
Note: I used to really prefer monorepos, but I've had some time away now to take a better look at it. Now they feel like megaclasses where small things are easier (because everything is in one place) but large things are way harder (because there's no good split point).
Why that can be a good thing? Good software engineering practices tend to be associated with loose coupling, modularity, and small, well-defined interface boundaries. Putting each project into its own repo encourages developers to think of it as a standalone product that will be consumed by third-party users at arm's length. That's going to engender better design, more careful documentation, and less exposed surface area than the informal cross-chatter that happens when separate projects live in the same codebase.
Why that can be a bad thing? The cost is that each project has to be treated as a standalone product with its own release schedule, lifecycle support, and backwards-compatibility considerations. Say you want to deprecate an internal API. In a monorepo, you just find all the call sites and replace them accordingly. With a multi-repo it's nowhere near as easy: you'll find yourself having to support old-style calling conventions well past the point you'd prefer, to avoid breaking the build for downstream consumers.
Let's split our project into multiple libraries so they can be reused within the company, or even better, put them on GitHub so everyone can use them.
After a few months you notice that nobody within the company cares about your nicely versioned libraries, and on GitHub you get more and more complaints that the libraries are very limited and need more features.
After that you merge everything back into your monorepo and try to forget all the time wasted on git bureaucracy, versioning, dependency handling and syncing changes between repos.
Monorepos:
- Work best when you have an open culture
- PCI compliance will be annoying
- The obvious--everything in one place
- Need good tooling around keeping master building
- As they grow, become an uphill battle to use with an IDE
- Tests are likely to slow down as the repo grows, so you'll need tooling around tests
- Usually lead to a rat's nest of dependencies
- Third-party library upgrades can be painful
- Coupled with CD (and it really needs to be coupled with CD), it's easy to get surprise breaks
Multirepos:
- Every team will need to dabble in build and release engineering
- Changes across repos are slow and painful (I claim this is a feature because it makes you think about versioning and deployment)
- Library developers have to think more about versioning
- You'll probably need a binary repository like Artifactory
- More time and tooling needed to do library upgrades (especially interesting for security issues)
- Harder for people to switch teams
Caveat: your company probably isn't Google, so your challenges may be different.
But if you're just starting out, you're going to be going with off-the-shelf components: usually git hosted on GitHub, GitLab, or something similar. Vanilla git works sub-optimally once you reach a certain number of contributors, a certain number of different binaries being generated, and a certain number of lines of code, as a lot of its assumptions (and the assumptions of folks who host git) focus on "small" organizations or single developers. You aren't going to have a good time using a vanilla git monorepo with tens of millions of lines of code, hundreds of developers, and dozens of different projects, even though in principle a different source control system could function perfectly well as a monorepo at that scale.
My general approach would be to start with a git monorepo, do all development within it, and once that becomes a pain point migrate to multirepo.
If you split your codebase into lots of little SCM repositories and manage dependencies by pushing prebuilt artifacts here and there, all you've done is create a new meta-repo system on top of your SCM. And usually, this dynamic meta-repo coordination system is not only implicit, not only undocumented, but also not even understood by any single human being --- yet your project's success depends on the operation of this meta-repository built accidentally and unconsciously from coordinated development in tons of little repositories.
I'm strongly in team formal-monorepo. Just make a single big repository that holds everything you depend on. Be honest and direct about the interactions of your components --- explicit, not accidental, as in a microrepo ecosystem.
The performance-based arguments against monorepos don't apply as strongly as they once did: Mercurial and Git (more so the former) have seen a lot of work on optimizing big repositories, and both support a sparse checkout model that allows people focused on specific tasks to work on a small part of the larger repository without splitting the repository apart.
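For git specifically, the sparse-checkout workflow looks roughly like this (hypothetical URL and paths; partial clone also needs server-side support, which the major hosts have):

```
git clone --filter=blob:none --sparse https://example.com/mono.git
cd mono
# Materialize only the directories you actually work on:
git sparse-checkout set services/payments libs/auth
```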
The only legitimate reason I have encountered so far for not using a monorepo is open-source dependencies that the company maintains patches to/contributes changes to. In this sort of situation, you'll want to have them in their own repository so that you can go from "foolib v5 plus our changes" to "foolib v6 plus our changes" without too much fuss.
At work I set up a repo with our back-end code in one directory, the front-end in another, and the ansible playbooks/other deploy stuff in a third.
This allows us to automatically create servers for every branch because we know that everything goes together. Working on other projects where they have all of these separate is a nightmare. Especially when they use microservices and each service has its own repo and each library has a repo... I want to smack anyone who ever thinks that is a good idea
No need to sync multiple repos and maintain complicated dependency versioning for in-house libraries.
In-house libraries are easier to refactor, as dependent code can be updated in a single PR.
Code tends to be more reusable, and you tend to make more reusable libraries, since creating one is as easy as adding a new directory to the repo.
Services and tools can still have separation of control with all the monorepo advantages.
The only reason not to monorepo is if you are thinking of open-sourcing the lib/tool you're building, but even then you can cut it out of the monorepo.
The big advantage of separate repositories is that the work done on them is very clearly marked as independent. It becomes easier to take one component out and utilize it elsewhere... but at the cost of having to deal with dependency management. Dependency management is simple and automatic in easy cases and really terrible in hard cases (like when two components you use mandate mutually exclusive versions of a third component).
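A concrete illustration of the hard case, with hypothetical package names:

```
# requirements.txt of an app pulling in two internal components
component-a==1.4.0   # itself requires shared-lib>=2.0
component-b==3.1.0   # itself requires shared-lib<2.0
# pip's resolver will abort with ResolutionImpossible: no single version
# of shared-lib satisfies both constraints.
```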
So I think the driving factor in the decision is the tradeoff between the degree to which you are willing to invest in careful packaging of each component (and the corresponding dependency management), versus your willingness to force everyone and everything to utilize the monorepo.
(Notice I didn't mention performance issues. I really don't think they are important enough to control the decision.)
First, for almost all companies, non-selectively copying what Google does is harmful: you are not Google.
Second, if you are a small team and your code only produces one binary that is shared across team boundaries, then you might be able to make do with a monorepo. But if you are refactoring pieces of your code into libraries (you probably should be) and sharing those libraries for use by other teams in your company (you maybe should be), then you probably shouldn't use a monorepo.
However, converting may be a painful process. Using multiple repos, one per shared product (library, SDK, web site, documentation package) is harder to manage because you need to manage your dependencies much better and pull in the correct versions for the current product.
You also will need to communicate changes on products better across your teams, ideally with automatic notification of new releases that include a changelog and examples of new/changed features.
As part and parcel of the above, you'll need to have better release management, including a release repository. What you use will depend on your environment, the language(s) your team uses, and your budget. There are some good open-source dependency repositories out there that can accommodate NPM-style dependencies, Java dependencies, and others (e.g., Artifactory).
In summary migrating away from monorepos is going to mean: investing in good DevOps people and giving them what they need to create/install/manage good processes; learning these new processes and dealing with the added work they impose on developers. But, it also will likely give you better products and over time both speed up development time and reduce defects (in part through more well-defined API touchpoints).
Pros: it is easy to make changes across multiple parts of a system
But CI becomes substantially more complicated. You have to figure out with every commit what actually changed and based on that change what needs to be tested and built. You don’t want to run a build of all components if you only changed a small piece of one component.
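A minimal sketch of that change detection, assuming a hypothetical layout with one top-level directory per component:

```python
#!/usr/bin/env python3
"""Sketch of path-based CI target selection for a monorepo."""
import subprocess

def changed_components(base="origin/master"):
    # List files touched between the merge base with `base` and HEAD.
    out = subprocess.run(
        ["git", "diff", "--name-only", base + "...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Treat each file's top-level directory as the component that owns it.
    return {path.split("/", 1)[0] for path in out.splitlines() if "/" in path}

if __name__ == "__main__":
    for component in sorted(changed_components()):
        print("would build and test:", component)
```

Real tools (Bazel, Nx, Pants) improve on this by consulting the dependency graph, so a change to a shared library also triggers builds of its consumers.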
- Simple projects with a server and an SPA component - frontend and backend code for the same feature is on the same feature branch, can be tested and reviewed together.
- Projects with a couple of microservices that share some common libraries - these libs can be directly referenced by the microservices instead of being published on an internal package server (see the sketch below). This of course has its drawbacks too, but overall the benefits have outweighed them.
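For Python, "directly referenced" can be as simple as a path dependency; a sketch assuming a hypothetical layout:

```
# requirements.txt of a microservice inside the monorepo
# Install the shared lib from its directory in the same repo instead of
# pulling a published version from an internal package server:
-e ../../libs/common
```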
This is all with a fairly robust CI/CD pipeline, so it's not like I'm mired in inefficiencies elsewhere; it's just an annoying process that makes me (and others) not want to write libraries.
edit: I should qualify that this is most annoying when you own the library and the services that consume the library. If the burden of the upgrade was left to my customers then the tradeoff might be different for me :)
I think a good goal to shoot for instead is being able to trivially build and ship a “latest” of your entire product.
You know your product doesn’t grow in sane, predictable ways, and you know you won’t have resources dedicated to maintaining internal dependencies, so why try to half-ass a complex internal dependency tree? What ALWAYS happens is that unexpected feature requests break the clean APIs set up by engineers, which creates all this fake work of “making the breaking change” across the dozens of dependents.
Something like a monorepo could help prevent issues like this, but it’s not a guarantee you’re making it easier to ship one latest version of your software.
Twitter: Monorepo
Facebook: Monorepo
Google: Monorepo
Amazon: Multirepo
Of these companies, compare AWS and the number of services, new features and quick turn around time with their competitors, and get back to me.
Besides the structure of the repo itself, you have to consider the "base" SCM software, as well as the interfaces that are and can be built around it.
You are probably limiting yourself to a specific implementation of git, and for that, a specific and relatively concise answer can probably be given. But the issue in general is much, much more expansive than that. The question as posed doesn't give us enough information to give a proper answer. As such, you can see how the answers so far are all over the board.
Pros:
- You can track changes across projects.
- Finding source for a project can be easier.
- You can more easily switch to code dependency instead of binary dependency. With things like Gradle becoming able to pull and build git commits, this is less of a win.
- It's easier to kick off downstream builds, because everything is in the monorepo and you can more easily track dependencies.
Cons:
- It's a technical challenge. Git isn't great at it, although it's getting better.
- If you do want to lean into linking source instead of binaries you now need to have a more consistent build system across your codebases.
I prefer working in them, and in my opinion it's approaching a personal-preference choice.
Pros: 1. single source code location; 2. easier dependency management; 3. related changes can go in the same commit.
Cons: 1. abstractions and boundaries may diverge across packages; 2. build times can become unpredictable; 3. CI can be unwieldy to set up.
The creator of Nx Tools wrote about this recently: https://aka.ms/createmonorepo
The downside of manyrepos is that you often have to merge multiple changes into different repos in order to achieve a single logical change, and each of these changes requires a code review, waiting for CI, etc. For example, you may have a library that is shared by several services that your team owns. If you want to change some logic that lives in the library and propagate that change to a service, you have to make the change in the library and then bump the library version in the service. The latter change, while simple, is pure busywork, and at Lyft this second step easily adds 1-2 hours of overhead between waiting for code reviews and waiting for CI. Monorepos therefore make it much easier to share code and meaningfully reduce overhead for your team.
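To make the busywork concrete, that second change is often nothing more than a one-line version bump in the service repo (package name and versions hypothetical):

```
# PR 2 of 2, after the library PR merged and was released:
-geometry-lib==1.7.2
+geometry-lib==1.8.0
```

In a monorepo, the library change and its call-site updates land in a single atomic commit, so this PR never exists.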
One of the cited downsides of monorepos, VCS scalability, only really kicks in for very large teams. At Lyft L5, we have a single shared Git monorepo hosted on GitHub that hundreds of engineers contribute to daily, and to my knowledge we haven't hit serious problems with Git itself (although I last worked in that org about a year ago). We did run into a few peripheral issues though:
- People kept inadvertently merging things that broke master. This would happen when two incompatible changes were merged at nearly the same time, or two incompatible changes were merged a few days apart but the second change was based off a stale master from before the first change, so tests pass on the branch but not after merge. We ended up solving this by having "deliver" branches that are basically master branches for a single team; if you break the deliver branch the only people you have to answer to are your immediate teammates, and it doesn't stop all other merges across the org. Deliver branches are periodically merged into master by a release manager who handles merge conflicts.
- CI got progressively slower. We addressed this by making improvements to the build system, optimizing tests, granularizing dependency graphs, using C++ idioms like forward declarations and PImpl to reduce dependency chains, and so on.
If you have a monorepo, you probably want to use a tool like Bazel. And since Bazel is, to my knowledge, the best tool in the world for doing what Bazel does, that means you probably want to use Bazel. Bazel has you specify your dependency structure as a DAG, and then allows you to quickly re-run tests on PRs based only on the code that changed. It also makes builds for C++ and other compiled languages blazingly fast, and if you're building C++ I don't think there's a better build system out there, monorepo or no. Bazel is a complex tool though so I'd encourage you to read through its highly detailed user manual, and if you have any questions ask on Stack Overflow where you'll often get a response directly from one of the core maintainers.
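As a rough sketch of what that DAG looks like in practice, a BUILD file declares each node and its edges explicitly (the library and test names here are hypothetical):

```python
# BUILD -- one node of the dependency DAG
py_library(
    name = "geometry",
    srcs = ["geometry.py"],
)

py_test(
    name = "geometry_test",
    srcs = ["geometry_test.py"],
    deps = [":geometry"],  # edge in the DAG: the test depends on the library
)
```

Because the edges are explicit, a change to `geometry.py` invalidates only the targets downstream of `:geometry`; everything else is served from cache.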
My gut tells me they should be separate, but I'm curious what other people think.
Refactors and cross-service deployments can become much simpler in a monorepo.
Cons: un-separation of concerns.
Cons: huge blast radius, makes it difficult for groups to be truly independent
One con is that most open source (as well as publicly available but proprietary) tooling is geared towards the non-monorepo approach. So if you want to use a monorepo, you're going to have to fight a bit of an uphill battle because most tools and processes assume you have lots of little repos. For example, build triggers in a lot of CI/CD tools operate on a per-repo basis.
There are some random benefits - it's easier to make everything public-by-default, which encourages people to look at source code written by other teams, and creates a culture of transparency and internal openness.
But the big thing in my opinion isn't directly the monorepo itself, but what's known inside Google as the "One Version Rule"[1]. Basically this means that only a single version of any package should be in the repo at a time. There are exceptions, but that requires going to extra effort to exempt your package from this rule.
I guess Chrome and Android are examples of this - they are made up of lots of little Git repos that are stitched together, but they generally follow the One Version Rule. On the other hand, if you just stick a lot of npm modules in there and every single one has a separate package.json file, then it's technically a "monorepo" but it's not following the One Version Rule.
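Concretely, a One Version Rule tree might look like this (a hypothetical sketch, not any company's actual layout):

```
repo/
  third_party/
    requests/      # exactly one vendored version for the whole repo
  services/
    payments/      # no private pin of its own; builds against third_party/
    search/        # same single version, so upgrades happen in one place
```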
You also need good test coverage. Not just in terms of line coverage or some other artificial metric, but to the point where you could say "I feel reasonably comfortable that a random change will be caught by my tests". This lets people in other teams catch regressions without having to have a detailed understanding of your team's codebase.
So once you've got these three things - 1) a monorepo 2) with everyone following the One Version Rule and 3) lots of tests - it means that dependency owners can update all of their consumers at once without much effort. They just make a change to their base library, and the build system will walk the dependency tree and figure out all the consumers that could possibly break, and runs all the appropriate regression tests.
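With Bazel-style tooling, that dependency-tree walk is a query away; a sketch with hypothetical target names:

```
# Every test target that transitively depends on the base library:
bazel query 'tests(rdeps(//..., //libs/base:base))'
# CI can then feed exactly that set to `bazel test`.
```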
This is the inverse of how it normally works at large companies, where each team pulls in their dependencies and pins them to a specific version. At most companies, updates require extra effort, so the default is to let everything go stale. This is especially problematic when security vulnerabilities are disclosed (e.g. in an ancient version of jQuery) but teams can't update until they migrate off of an old API. It also means that library owners regularly have to maintain multiple old branches for months or even years after the initial release, because everyone's too afraid to update.
I personally think it's a myth that you need to be "Google scale" to benefit from a monorepo. In my opinion, you only need a few tens of repos before all the different combinations of semvers get unwieldy. For me, going from Google's monorepo to a company that is built around lots of little repos in GitHub Enterprise felt like going back to the CVS/RCS days, where every single file had a separate revision number and changes weren't made atomically.
Two caveats: 1. I've only used a monorepo professionally once, in a big org, and it was SVN-based. 2. I'm going to look at monorepos from the perspective of git and GitHub.
The main argument I can think of in favor of monorepos is to keep cohesion as high as possible between different parts of your system. For example, if you want to make a change to the load balancer regarding TTL, you can also go and make the same change in your API and your mobile clients, and in the end you create one single PR and have your tests run against that single revision.
Compare and contrast the same scenario in the traditional multi-repo approach: you make your changes to your `infra` repo, then you make the same change in your `api`, `android-client`, and `ios-client` repos, and in the end you create 4 PRs that need to be reviewed at the same time. Which one would you prefer to do?
Potential arguments against are:
1. Too much noise - If multiple people are working on the same repo, you'd be getting emails for PRs for parts of the code that you may not care about.
2. Longer clone/pull times - In the same vein as before, the initial `git clone` and maybe every `git pull` after that will be bringing in a lot of code that you may not care about, increasing the time of each operation and increasing your frustration.
3. Access control - How do you limit who has write access to which part of your repo? AFAIK this isn't possible in git itself; GitHub's CODEOWNERS gets you part of the way, though it enforces required reviewers rather than write access (see the sketch after this list).
4. Organization - How do you structure your monorepo? Do you know in advance how many components it's going to have? How is a restructuring going to affect your history and/or dependencies between different components of your code?
5. History and code sharing - With a monorepo it's more difficult to share/open source just part of your code and keep the history intact.
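On the access-control point above, a minimal CODEOWNERS sketch with hypothetical paths and teams; combined with branch protection, this gives per-directory required reviews rather than write restrictions:

```
# .github/CODEOWNERS
/services/payments/  @acme/payments-team
/libs/auth/          @acme/auth-team
```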
Having said all that, I think the monorepo is a good match at a service level, not at an organization level. I know Google and Facebook have gone all in at an org level, but first of all they don't use git and second, they have the luxury and the resources to make it work for their use case. At a service level you should have a good idea of your boundaries, your applications and your infrastructure and assuming a relatively small engineering team (~10-12 people) eventually everyone would be confident enough with all parts of your system.
Personally I really like the idea of having one hash define the entire state of the world. I have a few common utilities that I use across my projects and I find it too annoying to keep them all up-to-date between different repos. I hope that eventually I will find some time (or be annoyed enough) that I will converge everything in one repo.
There is zero difference between using repos to organize things and using folders to organize things, except that dependencies shared across repos require extra steps to update when they change. More repos = more annoyance.
Use folders to organize things because common sense.
Additionally, don't use classes to organize your code, use combinators.
Folders and combinators will, imo, solve basically 95 percent of all organizational problems related to design.
Things like microservices, multiple repos, and GoF design patterns only serve to make the organizational problem worse.