HACKER Q&A
📣 ReDeiPirati

How are you evaluating your LLMs in production?


Hello HN! Which tools do you use to evaluate your LLMs and agents in production?


  👤 znpy Accepted Answer ✓
Sysadmin here ("cloud engineer" is what's in my contract).

> Which tools do you use to evaluate your LLMs and agents in production?

None for my work. I still use LLMs from time to time to generate boring Terraform code or boring SQL queries, but I'm essentially not going to let some AI bs near the infrastructure I curate.

It's all fun and games until prod is down, or the cloud bill is 10x the previous month's bill (or both).

So unless I can blame it on the AI and take no responsibility, I'm not going to let anything AI-powered near production.