How do I use LLMs to generate test cases for groundedness benchmarks?
👤 this_steve_j Accepted Answer ✓
What are some ways to avoid common methodological pitfalls when automating the generation of test cases for "groundedness" benchmarks?
Confirmation bias is one obvious pitfall, but I also wonder how reproducibility can be achieved when the input is stochastic.