HACKER Q&A
📣 clamlady

Is there an OCR that might be able to handle field datasheets?


I am an ecologist looking for OCR that can take .pdf scans of my Rite-in-the-Rain field notebooks (which are sometimes quite dirty) and extract the length measurements written in them. I've tried tesseract in R, but it doesn't handle them well. I plan on using this as an additional QC step after I enter the measurements by hand. Thanks in advance!


  👤 keepsweet Accepted Answer ✓
I've also tried Tesseract on handwritten notes in the past, and the results weren't very accurate. I then looked into commercial solutions and tried several tools, but the only one that could handle my handwriting was Klippa DocHorizon: https://www.klippa.com/en/ocr/ It uses machine learning and OCR instead of just plain OCR like Tesseract does, so it might be an option to look into. You can also test it out at https://www.klippa.com/en/ocr/tools/

I've been using it for a while and would highly recommend it. Hopefully it works out for your use case.


👤 solardev
In my limited experience, Google Cloud Vision API was much better than Tesseract: https://cloud.google.com/vision#demo
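
Whichever engine you end up with, the QC step from the question could look roughly like this. It's only a sketch: it assumes the OCR output is available as plain text with one measurement per line in the same order as the hand-entered data, and the function names and the 0.05 tolerance are made up for illustration, not part of any OCR library.

```python
import re

# Matches plain decimal numbers such as "12", "12.5", or "7.05".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_lengths(ocr_text):
    """Pull every decimal number out of raw OCR text, in reading order."""
    return [float(m) for m in NUMBER.findall(ocr_text)]


def flag_mismatches(hand_entered, ocr_values, tol=0.05):
    """Compare hand-entered values to OCR values position by position.

    Returns a list of (index, hand_value, ocr_value) for every position
    that differs by more than `tol`. `ocr_value` is None where the OCR
    produced fewer numbers than were entered by hand.
    """
    flagged = []
    for i, hand in enumerate(hand_entered):
        ocr = ocr_values[i] if i < len(ocr_values) else None
        if ocr is None or abs(hand - ocr) > tol:
            flagged.append((i, hand, ocr))
    return flagged


if __name__ == "__main__":
    ocr_text = "12.5 mm\n18.0\n"      # pretend OCR output for one page
    hand_entered = [12.5, 13.0, 7.1]  # values typed in from the notebook
    for row in flag_mismatches(hand_entered, extract_lengths(ocr_text)):
        print(row)
```

The comparison is purely positional, so a number the OCR drops or splits will shift everything after it. For dirty pages it's probably safer to run this per page or per row and to treat flagged entries as "look at the scan again" rather than as errors.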

👤 atsaloli
Have you tried AI chatbots? They are pretty good at OCR nowadays.