The challenge is doing that quickly and reliably. It's easy to end up in a situation where the user waits 30 seconds only for the UI to fail to render properly.
I would look into Groq and Cerebras for fast inference and, as an interesting research direction, the experiments in generating games frame by frame.