HACKER Q&A
📣 vira28

Is giving AI agents DB access the new BI-tool problem?


A couple of early-to-mid-stage startups I'm consulting with are asking the same question: their AI/ML team wants production Postgres data, and nobody's quite sure how to give it to them.

I've handled this before for BI teams — read replica with a generous `max_standby_streaming_delay` and `hot_standby_feedback` on, accepting the occasional bloat on the primary. Worked fine. But the AI/ML ask feels different in ways I can't fully articulate yet, which is part of why I'm asking.

A few things I'm trying to calibrate:

Where does the agent actually connect? Primary with RLS, read replica, warehouse (Snowflake/BigQuery/Redshift), lakehouse (Iceberg/Delta on S3), or something else?

If you're not doing this — is it compliance, cost fear, bad experiences (runaway queries, PII in prompts), or something else?

And the one I'm most curious about: does this actually feel different from giving BI tools DB access, or is it the same problem wearing new clothes?

Not looking for product recommendations. Trying to get a real sense from people who've actually faced this challenge.


  👤 matrixgard Accepted Answer ✓
It is different from BI and the gap is worth naming. BI connects deterministic queries from a known operator schema. An agent is an unbounded query generator, so your risk surface includes both what it asks for and what it synthesises from what it gets back. RLS alone won't help you, because the moment the agent gets handed a view it is already reasoning over rows you wanted to protect.

The practical answer I've seen hold up: push column-level redaction before the agent layer, not after. A logical replica with PII columns replaced by null or a stable hash gives you the same query surface, plus one audit row per session at the connection pooler, not the app. The AI team gets its data, you get a hard boundary that doesn't rely on prompt engineering.

The harder question is ownership. In a startup where the ML lead, the infra person and the security person are often the same tired CTO at 10pm, the right answer depends on who gets paged when a hallucinated query wakes up the primary. Usually the answer is nobody, which is the real problem behind the technical one.