Is there no good OCR available?

Question

I'm wondering whether tools like Tesseract are still the open-source (and offline) gold standard. All the large cloud providers now offer document intelligence services, but there still doesn't seem to be a usable AI model that does good OCR (images, not necessarily scans, to text). Do you know of any active projects or resources in that field?

latexr · Accepted Answer

Apple's operating systems have been doing stellar OCR since 2019. When the feature was announced I was uninterested, but now I'm surprised how much I use it. It works without any extra setup in Preview, Safari, and other apps. You can call it programmatically via Shortcuts or the Vision APIs: https://developer.apple.com/documentation/vision/recognizing...

solardev · Answer

(Edit: Never mind, sorry. I misread your question. I think you're mainly interested in free offline apps.)
Does it have to be an "AI" model in the modern sense of the term (LLMs, etc.)?
In the past, I found Google's Cloud Vision API to be pretty good for this sort of thing (text in images): https://cloud.google.com/vision?hl=en#demo
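For anyone who wants to try it outside the demo page, here is a rough sketch of calling the Cloud Vision REST endpoint from Python using only the standard library. It assumes API-key authentication is enabled for your project; the helper names (`build_ocr_request`, `extract_text`, `ocr_image`) are made up for illustration and are not part of any Google client library.

```python
import base64
import json
import urllib.request

# Public REST endpoint for Cloud Vision image annotation.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes: bytes) -> dict:
    """Build the JSON body for a single-image TEXT_DETECTION request."""
    return {
        "requests": [
            {
                # The API expects the raw image bytes as a base64 string.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_text(response: dict) -> str:
    """Return the recognized text from an annotate response, or "" if none."""
    responses = response.get("responses", [])
    if not responses:
        return ""
    return responses[0].get("fullTextAnnotation", {}).get("text", "")


def ocr_image(path: str, api_key: str) -> str:
    """Send the image at `path` to the API (needs network access and a valid key).

    Assumption: the project allows API-key auth via the `key` query parameter.
    """
    with open(path, "rb") as f:
        body = json.dumps(build_ocr_request(f.read())).encode("utf-8")
    req = urllib.request.Request(
        f"{ANNOTATE_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.load(resp))
```

Usage would be something like `ocr_image("photo.jpg", api_key)`, where the file name and key are placeholders. Note that this sends the image to Google, so it does not cover the offline case.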
AFAIK Tesseract was never state of the art; it was just free. The commercial offerings (in my limited experience) were usually much more accurate.