HACKER Q&A
📣 crummy

How do you manage staging database content?


You have a staging environment, or maybe a bunch of them, that you push code to. The code is short-lived, but you have a database as well. What do you put in this staging database? Some options I can think of:

1. Staging DB spins up empty. You create a user while testing; the data probably never gets cleaned up, and reproducing prod issues is kind of a pain.

2. Staging DB is populated with some dummy data from scripts, possibly as part of deployment. Nice but you have to maintain the scripts.

3. Staging gets a copy of prod. Great for reproducing issues from prod, and possibly viable at small scale, but it has some security issues: you'd probably need to censor some columns.

Perhaps there are other options, or ways to alleviate the pain here?


  👤 tony-vlcek Accepted Answer ✓
Also 2, as part of migrations:

There are 3 types of migration files/scripts: structure, basic-data, dummy-data.

- structure: new tables, added columns, etc. go here
- basic-data: e.g. default config values go here
- dummy-data: gets used on local and staging

Run migrations with a flag to include the dummy-data migrations.
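A minimal sketch of that setup, assuming SQLite and Python's standard `sqlite3` module. The `Migration` type, the `kind` labels, the `schema_migrations` bookkeeping table and the `include_dummy_data` flag are hypothetical names for illustration, not any particular migration tool's API:

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Migration:
    version: int
    kind: str  # hypothetical labels: "structure", "basic-data" or "dummy-data"
    sql: str


# Hypothetical example migrations, applied in version order.
MIGRATIONS = [
    Migration(1, "structure", "CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT NOT NULL)"),
    Migration(2, "basic-data", "INSERT INTO config (key, value) VALUES ('currency', 'EUR')"),
    Migration(3, "structure", "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"),
    Migration(4, "dummy-data", "INSERT INTO users (email) VALUES ('alice@example.test'), ('bob@example.test')"),
]


def migrate(conn: sqlite3.Connection, include_dummy_data: bool = False) -> list[int]:
    """Apply pending migrations; dummy-data ones only when the flag is set."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    ran = []
    for m in sorted(MIGRATIONS, key=lambda m: m.version):
        if m.version in applied:
            continue
        if m.kind == "dummy-data" and not include_dummy_data:
            continue  # left pending, so prod never records or runs it
        with conn:  # commits after each migration
            conn.execute(m.sql)
            conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (m.version,))
        ran.append(m.version)
    return ran
```

Usage: prod would call `migrate(conn)`, while local and staging call `migrate(conn, include_dummy_data=True)`. Because skipped dummy-data migrations stay pending rather than being marked as applied, each environment has to use the flag consistently.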


👤 alganet
It depends on where the rest of the team is.

Have they gone through the experience of having a persistent staging environment that slowly drifts from production (1)? If they haven't, they can't possibly understand why that is a bad idea. I'll just go with the flow until they realize. Maybe I'll hint at the possible issues sometimes.

Have they figured out that copying prod is a bad idea (3)? If they haven't, same thing. They can't understand why that sucks and why that's not true reproducibility.

Finally, (2). Fixtures! It's also a journey. There are so many things that can go wrong. Knowing those things depends on having gone through those journeys with persistent staging environments and production copies.

There is no relief from the pain. No magic bullet. No product or solution that will ever solve this. You have to go through those stages. If you're lucky, someone will guide you through them (in practice). The journey can be sped up, but I haven't seen a shortcut that works (like forcing the team to adopt a practice without them internalizing it).


👤 noir_lord
4) Same as 3, but you take a subset of the data with correct relations, anonymise it carefully, and use that. It has a lot of complexity at scale but is about as faithful as I've seen. It's also one of those tools that forever requires maintenance as the core DB mutates.
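A minimal sketch of option 4, assuming a toy two-table schema (`users`, plus `orders` with a foreign key to it) in SQLite. The table names, the `copy_subset` and `pseudonym` helpers and the secret key are all hypothetical. It copies a chosen set of users plus only the rows that reference them, and replaces personal columns with keyed-hash pseudonyms:

```python
import hashlib
import hmac
import sqlite3

# Hypothetical schema shared by the prod copy and the staging subset.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     user_id INTEGER NOT NULL REFERENCES users(id),
                     total_cents INTEGER NOT NULL);
"""


def pseudonym(secret: bytes, value: str) -> str:
    # Keyed hash: the same input always maps to the same token,
    # but it can't be reversed or recomputed without the secret.
    return hmac.new(secret, value.encode(), hashlib.sha256).hexdigest()[:12]


def copy_subset(src: sqlite3.Connection, dst: sqlite3.Connection,
                user_ids: list[int], secret: bytes) -> None:
    dst.executescript(SCHEMA)
    marks = ",".join("?" * len(user_ids))
    users = src.execute(
        f"SELECT id, name, email FROM users WHERE id IN ({marks})", user_ids).fetchall()
    for uid, _name, email in users:
        token = pseudonym(secret, email)
        # Replace PII; keep the primary key so relations still line up.
        dst.execute("INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
                    (uid, f"User {token}", f"{token}@example.invalid"))
    # Follow the foreign key: copy only orders owned by the sampled users.
    orders = src.execute(
        f"SELECT id, user_id, total_cents FROM orders WHERE user_id IN ({marks})",
        user_ids).fetchall()
    dst.executemany("INSERT INTO orders (id, user_id, total_cents) VALUES (?, ?, ?)", orders)
    dst.commit()
```

How the `user_ids` sample is picked is left open here. Every new table or personal-data column needs a matching rule in a script like this, which is the ongoing maintenance cost described above.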

For local devs, seeding the database with plausible, correct data works pretty well.
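A minimal sketch of such a seed script, assuming SQLite and a hypothetical `users`/`orders` schema. The `seed` function, the name and domain lists and the date range are illustrative only. A fixed RNG seed gives every developer the same data, and the generator respects the invariants real data would (orders reference existing users and never predate the user's signup):

```python
import random
import sqlite3
from datetime import date, timedelta

# Hypothetical sample values; example.com/.org are reserved domains.
FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret", "Ken"]
DOMAINS = ["example.com", "example.org"]


def seed(conn: sqlite3.Connection, n_users: int = 20, seed_value: int = 42) -> None:
    rng = random.Random(seed_value)  # fixed seed: identical data on every machine
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                            email TEXT NOT NULL UNIQUE, signed_up TEXT NOT NULL);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             user_id INTEGER NOT NULL REFERENCES users(id),
                             placed_on TEXT NOT NULL,
                             total_cents INTEGER NOT NULL CHECK (total_cents > 0));
    """)
    start = date(2023, 1, 1)
    for uid in range(1, n_users + 1):
        name = rng.choice(FIRST_NAMES)
        signed_up = start + timedelta(days=rng.randrange(365))
        # The id suffix keeps emails unique even when names repeat.
        email = f"{name.lower()}{uid}@{rng.choice(DOMAINS)}"
        conn.execute("INSERT INTO users VALUES (?, ?, ?, ?)",
                      (uid, name, email, signed_up.isoformat()))
        for _ in range(rng.randrange(4)):  # 0 to 3 orders per user
            # Plausible: an order is never placed before its user signed up.
            placed_on = signed_up + timedelta(days=rng.randrange(60))
            conn.execute(
                "INSERT INTO orders (user_id, placed_on, total_cents) VALUES (?, ?, ?)",
                (uid, placed_on.isoformat(), rng.randrange(100, 50_000)))
    conn.commit()
```

Storing dates as ISO `YYYY-MM-DD` strings keeps them comparable as plain text in SQLite, so invariant checks can be simple queries.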