Is there an OCR that might be able to handle field datasheets?

Question

I am an ecologist looking for an OCR tool that can take .pdf scans of my Rite-in-the-Rain field notebooks (which are sometimes quite dirty) and extract the length measurements recorded in them. I've tried tesseract in R, but it doesn't handle them well. I plan to use this as an additional QC step after entering the data by hand. Thanks in advance!
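For the QC step itself, once some OCR tool has produced text, the comparison against the hand-entered data could look roughly like this minimal Python sketch. It assumes (hypothetically) that each OCR'd line corresponds to one datasheet row, in the same order as the hand-entered values, and it treats every number on a line as a candidate length. `extract_lengths` and `flag_mismatches` are illustrative names, not part of any library.

```python
import re

# Matches integers or decimals such as "12" or "12.4"; units and labels are ignored.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_lengths(ocr_line: str) -> list[float]:
    """Return every number found in one line of OCR output, in order."""
    return [float(token) for token in NUMBER.findall(ocr_line)]


def flag_mismatches(
    entered: list[float], ocr_lines: list[str], tol: float = 1e-9
) -> list[tuple[int, float, list[float]]]:
    """Compare hand-entered lengths with OCR output, row by row.

    Assumes (hypothetically) that ocr_lines[i] is the OCR text of the
    datasheet row whose hand-entered length is entered[i].
    Returns (row index, entered value, numbers OCR found) for every row
    where no OCR number is within tol of the entered value.
    """
    if len(entered) != len(ocr_lines):
        raise ValueError("row counts differ; align OCR lines with entries first")
    flagged = []
    for row, (value, line) in enumerate(zip(entered, ocr_lines)):
        candidates = extract_lengths(line)
        # Row IDs and other numbers also count as candidates, so a pass means
        # "some number on the line agrees", not proof the length is correct.
        if not any(abs(value - c) <= tol for c in candidates):
            flagged.append((row, value, candidates))
    return flagged
```

For example, `flag_mismatches([12.4, 16.1, 9.8], ["Fish 1  TL 12.4 cm", "Fish 2  TL 15.1 cm", "Fish 3  TL 9.8 cm"])` returns `[(1, 16.1, [2.0, 15.1])]`, so only the second row needs re-checking against the scan. A flag can mean either a data-entry typo or an OCR misread; either way, it tells you which rows to look at.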

keepsweet · Accepted Answer

I've also tried tesseract on handwritten notes in the past, and the results weren't very accurate. I then looked into commercial solutions and tried many different tools, but the only one that could handle my handwriting was Klippa DocHorizon: https://www.klippa.com/en/ocr/ It uses machine learning on top of OCR, rather than plain OCR like tesseract, so it might be worth looking into. You can also test it at https://www.klippa.com/en/ocr/tools/ I've been using it for a while and would highly recommend it. Hopefully it works for your use case.

solardev · Answer

In my limited experience, Google Cloud Vision API was much better than Tesseract: https://cloud.google.com/vision#demo

atsaloli · Answer

Have you tried AI chatbots that accept image uploads? They are pretty good at OCR nowadays.