The secret sauce is using a big LLM to generate many “responses” for a given set of intents, then using Phi or something cheap to do intent detection and pick one of the pre-made responses.
You can generate hundreds of responses per intent, so the user may never get the same response twice.
Ofc it depends on your use case, but smoke and mirrors are your friend.
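Rough sketch of the idea, in case it helps. Everything here is made up for illustration: the `RESPONSES` pool stands in for what the big LLM would generate offline, and `detect_intent` is a dumb keyword matcher standing in for Phi or whatever cheap model you'd actually use. The `Responder` class and the intent names are hypothetical too.

```python
import random
import re

# Hypothetical pool: in practice a large LLM would generate these offline,
# and there would be hundreds per intent rather than two or three.
RESPONSES = {
    "greeting": ["Hey there!", "Hello! What can I do for you?", "Hi, good to see you."],
    "refund": ["I can help with that refund.", "Let's get your money back.",
               "Refunds usually start with an order number."],
    "fallback": ["Sorry, I didn't catch that.", "Could you rephrase that?"],
}

# Stand-in for the cheap intent model (assumption: keyword overlap only).
KEYWORDS = {
    "greeting": {"hi", "hello", "hey"},
    "refund": {"refund", "money", "return"},
}


def detect_intent(text: str) -> str:
    """Return the intent whose keywords overlap the text most, else 'fallback'."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    scores = {intent: len(tokens & words) for intent, words in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "fallback"


class Responder:
    """Serves pre-made responses, cycling through a shuffled pool per intent."""

    def __init__(self, responses, seed=None):
        self._responses = responses
        self._rng = random.Random(seed)
        self._unused = {}  # intent -> responses not yet served in this cycle

    def reply(self, text: str) -> str:
        intent = detect_intent(text)
        pool = self._unused.get(intent)
        if not pool:
            # Pool is empty or missing: refill it and shuffle for a new cycle.
            pool = list(self._responses[intent])
            self._rng.shuffle(pool)
            self._unused[intent] = pool
        return pool.pop()


if __name__ == "__main__":
    bot = Responder(RESPONSES)
    for message in ["hi", "hello again", "I want my money back", "what's the weather"]:
        print(message, "->", bot.reply(message))
```

Popping from a shuffled pool means nobody sees a repeat until every response for that intent has been used once, which is most of the "never the same response twice" effect with a big enough pool.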