HACKER Q&A
📣 leokster

Is there no good OCR available?


I'm wondering whether tools like Tesseract are still the open-source (and offline) gold standard. All the large cloud providers now offer document intelligence services, but there still doesn't seem to be a usable AI model that does good OCR (images to text, not necessarily scans). Do you know of any active projects or resources in that field?


  👤 latexr Accepted Answer ✓
Apple’s operating systems have been doing stellar OCR since 2019. When the feature was announced I was uninterested, but now I’m surprised how much I use it. It works without any extra setup in Preview, Safari, and other apps. You can call it programmatically via Shortcuts or the Vision APIs.

https://developer.apple.com/documentation/vision/recognizing...


👤 solardev
(Edit: Never mind, sorry. I misread your question. I think you're mainly interested in free offline apps.)

Does it have to be an "AI" model in the modern sense of the term (LLMs, etc.)?

In the past, I found Google's Cloud Vision API to be pretty good for this sort of thing (text in images): https://cloud.google.com/vision?hl=en#demo
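
For anyone who wants to try it outside the demo page, here is a rough sketch of calling the Cloud Vision REST endpoint (`images:annotate` with a `TEXT_DETECTION` feature) using only the Python standard library. The function names, the file path, and the use of a plain API key are my own assumptions for illustration; a real setup may prefer the official client library and service-account credentials.

```python
import base64
import json
import urllib.request

# Public REST endpoint for Cloud Vision image annotation.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes: bytes) -> dict:
    """Build the JSON body for a single-image text detection request."""
    return {
        "requests": [
            {
                # The API expects the raw image bytes as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_text(response: dict) -> str:
    """Pull the full recognized text out of a response, or '' if none was found."""
    first = (response.get("responses") or [{}])[0]
    return first.get("fullTextAnnotation", {}).get("text", "")


def ocr_image(path: str, api_key: str) -> str:
    """Send one image file to the API (hypothetical helper; needs network and a valid key)."""
    with open(path, "rb") as f:
        body = json.dumps(build_ocr_request(f.read())).encode("utf-8")
    req = urllib.request.Request(
        f"{VISION_ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.load(resp))
```

Usage would look something like `print(ocr_image("photo.jpg", "YOUR_API_KEY"))`. Keep in mind this uploads the image to Google, so it does not help with the offline requirement.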

AFAIK Tesseract was never state of the art; it was just free and cheap. The commercial offerings (in my limited experience) were usually much more accurate.
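
If you still want an offline baseline to compare against, a minimal sketch of driving the Tesseract command-line tool from Python might look like this. It assumes the `tesseract` binary (version 4 or later) and the relevant language data are installed and on your PATH; the helper names are made up for this example.

```python
import shutil
import subprocess
from typing import List


def tesseract_command(image_path: str, lang: str = "eng", psm: int = 3) -> List[str]:
    """Build the CLI invocation; 'stdout' tells tesseract to print the text instead of writing a file."""
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def ocr_offline(image_path: str, lang: str = "eng", psm: int = 3) -> str:
    """Run tesseract locally and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        tesseract_command(image_path, lang, psm),
        capture_output=True,
        text=True,
        check=True,  # raise if tesseract exits with an error
    )
    return result.stdout.strip()
```

The `--psm` value is the page segmentation mode. The default of 3 assumes a regular page layout, so for photos with scattered text it may be worth trying `psm=11` (sparse text) before concluding the accuracy is poor.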