Looking for Library to Parse Receipts

Question

I am building an app to make splitting bills easier (https://copay.digital/). Currently I am using Amazon Textract, which works well but requires the user to have a stable network connection (I'm using websockets to communicate with AWS), which of course is not always the case for mobile apps. I am therefore looking for a library or SDK that I can use on-device to achieve the same thing. I am not overly concerned about accuracy, as users can always go and edit the result if something is off.
I have looked into Apple's VisionKit library and have a basic implementation working, but I've ended up implementing a lot of heuristics to extract things like line items, which isn't great given the unstructured nature of receipts. I'm using React Native, so I would be happy to use something like react-native-fast-tflite and deploy an actual model, but I'm not sure where to find one or whether any known "good ones" exist. I would love to be able to train my own, but ML is not my forte.
Any help or advice would be much appreciated!

solardev · Accepted Answer

The challenge is doing this offline/on-device. There are a billion receipt scanner APIs and services in the cloud that are basically "upload image, get back structured receipt data", but that's not what you want?
Some links...
- A 2019 machine vision competition about this exact use case: https://rrc.cvc.uab.es/?ch=13. Some of the submissions in the results table will discuss their methods in depth (usually in Python though).
- Implementing this without AI/ML, just using Tesseract (open-source OCR, but poor performance compared to commercial apps or ML): https://pyimagesearch.com/2021/10/27/automatically-ocring-re...
- Using Google's ML Kit (on-device Document Scanner SDK): https://teresa-wu.medium.com/googles-ml-kit-text-recognition... The tabular data is hard to work with, though. This writeup goes one step further: https://hackernoon.com/using-google-mlkit-text-recognition-t...
- An older HN discussion on the same: https://news.ycombinator.com/item?id=10338199
- A cloud OCR vendor (Nanonets) discussing their ML-based pipeline: https://www.linkedin.com/pulse/how-build-ocr-receipt-scanner...
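For a sense of why the tabular data is hard to work with, here is a minimal sketch of the kind of heuristic you end up writing on top of raw on-device OCR output: group the text boxes into visual rows, then treat any row ending in a price as a line item. Everything here is an assumption for illustration. The `OcrBox` shape, the function names, the half-box-height row tolerance, and the price regex are all made up and do not come from VisionKit, ML Kit, or any other SDK, so you would have to map your OCR library's results into this shape yourself.

```typescript
// Hypothetical shape for one recognized text fragment. Real OCR SDKs expose
// similar data (text plus a bounding box) under different names.
interface OcrBox {
  text: string;
  x: number;      // left edge, in pixels
  y: number;      // top edge, in pixels
  height: number; // box height, in pixels
}

interface LineItem {
  description: string;
  price: number;
}

// Group boxes into visual rows. A box joins the current row when its vertical
// center is within half a box height of the center of the row's first box.
function groupIntoRows(boxes: OcrBox[]): OcrBox[][] {
  const sorted = [...boxes].sort((a, b) => a.y - b.y);
  const rows: OcrBox[][] = [];
  for (const box of sorted) {
    const center = box.y + box.height / 2;
    const row = rows[rows.length - 1];
    if (row) {
      const first = row[0];
      const firstCenter = first.y + first.height / 2;
      if (Math.abs(center - firstCenter) <= first.height / 2) {
        row.push(box);
        continue;
      }
    }
    rows.push([box]);
  }
  // Within each row, read left to right.
  return rows.map((r) => r.sort((a, b) => a.x - b.x));
}

// Assumed price format: optional currency symbol, digits, then exactly two
// decimals separated by "." or ",". Thousands separators are not handled.
const PRICE_AT_END = /^(.*\S)\s+[$€£]?(\d+[.,]\d{2})$/;

// Summary rows to skip. A real receipt needs a longer, localized list.
const SUMMARY_ROW = /^(subtotal|total|tax|vat)\b/i;

function parseLineItems(boxes: OcrBox[]): LineItem[] {
  const items: LineItem[] = [];
  for (const row of groupIntoRows(boxes)) {
    const line = row.map((b) => b.text.trim()).join(" ");
    const match = PRICE_AT_END.exec(line);
    if (!match) continue; // no trailing price, so not a line item
    const description = match[1];
    if (SUMMARY_ROW.test(description)) continue;
    items.push({
      description,
      price: parseFloat(match[2].replace(",", ".")),
    });
  }
  return items;
}
```

This breaks down quickly on skewed photos, wrapped descriptions, and quantity or unit-price columns, which is roughly where a trained model starts to pay off.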
----------
(Edit: Cloud options)
Last time I looked into this (for a similar use case) I didn't get very far, and decided that using the cloud-based Nanonets receipts API (https://nanonets.com/ocr-api/receipt-ocr) or the Google Cloud document classifier (https://cloud.google.com/blog/products/application-moderniza...) would work a lot better, and be a lot easier, than anything I could do on my own. But that's probably no better than Textract.
There are just so many entrants in this space, and so many huge companies working on their own proprietary APIs, that the small open-source libs don't really stand much of a chance in comparison. It's too bad :(