This is frustrating for two reasons:
Firstly, LLMs are famously bad at counting characters (e.g. the number of "r"s in "strawberry"), so it's no wonder this approach of generating and counting characters doesn't work very well.
Secondly, balancing parentheses is trivial for traditional, non-LLM algorithms, so this feels like an entirely avoidable problem, one that shouldn't require resorting to larger, more expensive models.
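To illustrate how little code the non-LLM version needs, here is a minimal sketch of a stack-based balance checker. The function name `check_parens` is my own (it is not the Emacs command), and it assumes a simplified Lisp syntax: `;` line comments, double-quoted strings, and backslash escapes. It doesn't handle block comments like `#| ... |#` or dialect-specific reader syntax, so treat it as a rough illustration rather than a real reader.

```python
def check_parens(source):
    """Return None if delimiters balance, else (line, column, message).

    Hypothetical helper assuming simplified Lisp syntax: ';' line comments,
    double-quoted strings, and backslash escapes. Columns are 1-based.
    """
    closers = {")": "(", "]": "[", "}": "{"}
    stack = []  # entries are (opening char, line, column)
    line, col = 1, 0
    in_string = in_comment = skip_next = False

    for ch in source:
        if ch == "\n":
            line, col = line + 1, 0
            in_comment = False  # line comments end at the newline
            skip_next = False
            continue
        col += 1
        if in_comment:
            continue
        if skip_next:  # this character was escaped by a backslash
            skip_next = False
            continue
        if ch == "\\":
            skip_next = True
        elif in_string:
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == ";":
            in_comment = True
        elif ch in "([{":
            stack.append((ch, line, col))
        elif ch in closers:
            if not stack:
                return (line, col, f"unexpected '{ch}'")
            opener, _, _ = stack.pop()
            if opener != closers[ch]:
                return (line, col, f"'{ch}' does not match '{opener}'")

    if stack:
        # Report the innermost delimiter that was never closed.
        opener, open_line, open_col = stack[-1]
        return (open_line, open_col, f"unclosed '{opener}'")
    return None
```

Something like this could run as a check after each LLM edit, feeding the reported line and column back to the model instead of asking it to count.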
Is anyone using LLMs successfully on Lispy projects? If so, what workflows, tooling, etc. have you found to work well? I've tried guiding them to use Emacs's `check-parens` rather than counting "manually", but maybe inferring structure from indentation would work better? Perhaps tree-based generation/tools would avoid introducing such problems in the first place?