Besides these automated methods, another way to monitor data integrity is to have a human manually analyze the data for inconsistencies, and to ask other domain experts at your company to look over the data for issues. This is sometimes called the “smell test”. It can be an effective way to find deeper integrity issues that would be hard to catch with automated tests.
Maintaining data integrity is very difficult, domain-dependent, and takes a lot of work. Good luck!
For example, you could have mapping data telling drivers how to travel between sites, but what if a site's entrance moves? What if there's temporary road construction? What if a new bridge is built that shortens the route? All of these harm your data's integrity without the data itself changing.
So my only strategy is:
- When you find yourself correcting bad data, consider whether a constraint could have prevented it (see the sketch after this list).
- Referential constraints are superior to logical ones, but both are better than nothing (logical constraints risk encoding business rules, which can balloon over time).
- Data types are "super weapon" constraints. Define exactly what you plan on storing, don't future-proof, and avoid nullable types whenever possible. Storing "magic strings" is often wrong; create a table instead.
- Simpler data sets generally stay higher quality for longer. Consider simplifying and streamlining your actual data design (e.g. using generic building blocks and references for common domain tasks).
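To make the list concrete, here is a minimal sketch of what those constraints can look like, using Python's built-in sqlite3 module purely for illustration. The sites/routes/status schema is hypothetical and not from the text above; the same ideas apply to any relational database: a lookup table instead of magic strings, exact non-nullable types, a referential (foreign key) constraint, and logical CHECK constraints.

```python
import sqlite3

# Hypothetical schema for illustration only; any relational database works the same way.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

conn.executescript("""
-- A lookup table instead of a "magic string" status column.
CREATE TABLE route_status (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);

CREATE TABLE site (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL                                        -- exact, non-nullable type
);

CREATE TABLE route (
    id          INTEGER PRIMARY KEY,
    origin_id   INTEGER NOT NULL REFERENCES site(id),         -- referential constraints
    dest_id     INTEGER NOT NULL REFERENCES site(id),
    status_id   INTEGER NOT NULL REFERENCES route_status(id),
    distance_km REAL    NOT NULL CHECK (distance_km > 0),     -- logical constraint
    CHECK (origin_id <> dest_id)                              -- a route needs two distinct sites
);
""")

conn.executemany("INSERT INTO route_status(name) VALUES (?)", [("open",), ("closed",)])
conn.execute("INSERT INTO site(name) VALUES ('Depot'), ('Warehouse')")

# A valid row is accepted...
conn.execute(
    "INSERT INTO route(origin_id, dest_id, status_id, distance_km) VALUES (1, 2, 1, 12.5)"
)

# ...while a row that violates the constraints is rejected at write time,
# instead of becoming bad data you have to find and correct later.
try:
    conn.execute(
        "INSERT INTO route(origin_id, dest_id, status_id, distance_km) VALUES (1, 1, 99, -3.0)"
    )
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The point is that every rule you already know about the domain is written into the schema, so bad writes fail immediately rather than silently eroding integrity.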
In general, finding bad data or low integrity is like being able to completely define the whole problem domain, and then deciding not to use that same information for constraints; it doesn't really make sense. If you consistently find yourself looking for and correcting data integrity problems, go back to the design and fix it there.