HACKER Q&A
📣 tcgv

How to successfully manage a long-term codebase?


My company's codebase is approaching seven years since its first line was written. I know that's not that old, but as I talk with fellow developers from other companies I can spot a few differences resulting from how each codebase has been managed that directly affect team productivity, engagement and happiness:

- Perception of architectural consistency

- Perception of quality and robustness

- Willingness to adopt "new trends"

- Amount of technical debt

- Frequency of partial/total rewrites

- Operational complexity and costs

About a year ago I posted a few learnings from my own experience managing our codebase since its inception:

- https://thomasvilhena.com/2019/11/system-design-coherence

So I'm interested in your experience and learnings managing a long-term code base, both what your team did right and what went wrong.

Thanks in advance!


  👤 poletopole Accepted Answer ✓
Like k0t0n0 said, but it’s probably more like a trillion-dollar question. Like yourself, I worked at a company for seven years before I left, and it was a complete exercise in what not to do. As a result I learned many things, but what I concluded was that code grows in complexity exponentially in proportion to its network and I/O effects.

The hard pill to swallow, imo at least, is that we need the freedom to have multiplexed protocol diversity, in the same vein as how we use Docker as an industry, not ruled and dictated by the IETF or tech giants. The way the industry uses HTTP today, for example, is like trying to write Shakespeare with a choice of only 5 words. I won’t go into the details of what I call the “L8” protocol level that would be the answer to this trillion-dollar question because (even after 5 years of contemplation) I really am not 100% sure; I’m just saying that it’s needed.

Beyond that, we need a modern way of representing, encoding, encrypting, and transmitting data with zero trust, while also being able to measure its Shannon entropy and semantic consistency in a distributed network topology. If we want a truly free and open semantic web not ruled by Google and SEO bad apples, we need to develop a means of understanding data and network topology in an ambient, isomorphic way. Here, the answers may lie in homology, representation theory, number theory, etc., for all I know, but no single person will solve this challenge that we really can’t put into words.

Lastly, the solution will need to use existing industry standards such as HTTP/TCP/IP, XML/JSON, WASM, MIME, UTF-8, etc., but will also need something like IPFS or Dat, used in a way that doesn’t require the whole internet to change—most P2P protocols don’t get that fact of life. The sad part is no one is talking about this elephant in the room, because it’s hard to put into words, except perhaps for Alan Kay’s attempts.

👤 aprdm
I work at a company with ~20-year-old code. Huge codebase as well (TBs of code).

It is mostly team dependent... the company adopted a monorepo for the majority of its code, and within the monorepo everything has a place and you know where to find it. When you open a merge request, it automatically CCs the team leads responsible for that path of the monorepo.
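That path-based review routing can be sketched with a CODEOWNERS-style file, as supported by GitHub and GitLab. The paths and team handles below are hypothetical, and the poster's actual mechanism may differ:

```
# Hypothetical CODEOWNERS file at the monorepo root.
# Each path pattern maps to the leads who get CC'd automatically
# on merge requests touching that path. On GitHub, the LAST
# matching pattern wins, so the catch-all goes first.
*                @codebase-stewards

/billing/        @billing-leads
/rendering/      @rendering-leads
/tools/ci/       @infra-leads
```

The nice property is that ownership lives in the repo itself and evolves through the same review process as the code.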

Everything also either works together or doesn't, enforced through CI/builds. It uses trunk-based development; the trunk of the repo is always released.
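A trunk-based setup like that usually boils down to one pipeline gating the trunk. The stage names and scripts below are assumptions for illustration (GitLab-CI-flavored), not the poster's actual configuration:

```yaml
# Hypothetical CI config: merge requests and trunk commits run the
# same build + test gate, so "works together or doesn't" is decided
# by the pipeline; only green trunk commits get released.
stages: [build, test, release]

build:
  stage: build
  script: ./build.sh        # build everything affected in the monorepo

test:
  stage: test
  script: ./run_tests.sh    # a failure here blocks the merge

release:
  stage: release
  script: ./release.sh
  only:
    - trunk                 # only trunk is ever released
```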

Also, avoid adding dependencies at all costs. Those are the hardest things to maintain over long periods of time.

It has worked well so far ...


👤 antoineMoPa
I joined a company with a very old codebase this year (code started in the 80s).

It looks to me like adherence to very strict standards is what made the code so maintainable today. The standards are precisely defined by a multi-team committee. This is combined with just enough careful evolution of the codebase over the years to use more current tools and libraries.


👤 k0t0n0
It's a million-dollar question. I've been searching for the answer for the last 9 years.