Does such a thing exist?
If you're looking for an API, check out AssemblyAI, Google Cloud transcription, or Deepgram. I have a list here: https://llm-utils.org/List+of+AI+APIs
We ended up using Otter.ai. If I remember correctly, its speaker separation model wasn't as good, but it was good enough for the price: https://otter.ai/
There's also the much more expensive, human-powered Rev: https://www.rev.com/
But I do not think it can distinguish between speakers.
How well does Whisper work in terms of correctness for single speakers?