What is your recommended speech to text/audio transcription tool?

Question

Currently, I use a GUI for Whisper AI (https://github.com/Const-me/Whisper) to upload MP3s of interviews to get text transcripts. However, I'm hoping to find another tool that would recognize and split out the text per speaker. Does such a thing exist?

tikkun · Accepted Answer

For an end-user application, Otter.ai is the best I've seen. I wish there were a better, faster one built on top of Whisper, but I haven't seen a good one. If you're looking for an API, check AssemblyAI, Google Cloud transcription, and Deepgram. I have a list here: https://llm-utils.org/List+of+AI+APIs

solardev · Answer

Descript.com was pretty good at it when I tried it, but it's pretty expensive: https://www.descript.com/transcription
We ended up using Otter.ai, which, if I remember correctly, didn't have as good a speaker separation model, but it was good enough for the price: https://otter.ai/
There's also the much more expensive, human-powered Rev: https://www.rev.com/

tmaly · Answer

Microsoft has a tool that accepts WAV or MP3 and transcribes it. But I do not think it can distinguish between speakers. How well does Whisper work in terms of correctness for single speakers?
