My handwriting is not great!
None of the off the shelf solutions come even close to recognizing my handwriting.
Can you think of anything better than just opening every single file and manually transcribing it?
I have been thinking about training a model to first divide the images into lines of text. Then, it will be easier to transcribe, and automatically those transcriptions will be associated with areas of the image, in case I figure out a good handwriting model.
If a note's a minute, 1000 notes are around 16 hours of reading. Scale time needed depending on if it takes less or more than a minute to read. Add a note reference to the start of each recording, like a zettelkasten, so the scanned file, recording and text cross-reference.
If assessing other solutions, that's at least an upper bound on the cost of any other solution.
https://cloud.google.com/vision/docs/handwriting
I threw together a basic UI with the transcribed text in an editable area next to the image where I would edit any adjustments as it wasn't 100% perfect.
It is designed to do exactly what you are looking for, and has been used very successfully by many others for that same purpose (I’m the founder).
It is not as cheap per page as Google Document AI, for example, but it does tend to be much more accurate for handwriting, so usually ends up cheaper when editing time is factored in.
If you find it does work well with your handwriting, please get in touch and I can try to fit the pricing to your use case.
I found an app online (I wont even name it) which promised incredibly accurate handwriting transcription. Signed up and found it was true, but they were just sending images directly to chatGPT and returning the result and then charging a fee on top.
I started working on an open source version. It took me only a few hours and I'm sure anyone else could pull it together. used chatGPT example code to connect to API and send an image with a prompt along the lines of "please transcribe the text in this image and return only that, nothing else". even with that instruction it still sometimes prefaces with "sure! I can do that.", which I think is the AI equivalent of Homer Simpson writing "ok" in the "please leave this section blank" part of the form. Anyhoo, I had a basic job queue written, pull in images in order of file creation date and fire them off, append the text to a text file after. There was some cleanup of the file required (weird line breaks) but it saved me days of typing.
You still need a chatGPT API key for it but it does take a good bit of the work out.
At the moment I'm investigating using a free local model. LLava is just as accurate but takes longer than sending it to ChatGPT. but if you were worried about burning credits it would be the way to go.
I put special stop words like highlight/return so then I can post process and ensure the markdown formatting looks good.
My notes will have instructions to reach the black mass state, a computer image scanner will try to learn my handwritings, take them as instructions, connect dots etc.
The design of this system is cryptic and challenging. because, side effect to create a computational program will result in a circling thoughts for me. And its hard for me to convert it into an action.
Taking that as an inspiration, this program is a circling program, which means, it will constantly spiral upwards in a value that is definitive to its actions in the past.
All my notes has information or points or ideas about this fictional concept. I burned the notes which were repetitive, kept the rest.
When I did that, It created more head space for me. The headspace, helped to solve problems and have more space for more learnings.
(Likely under the hood Mathpix has done exactly what you're proposing, with image segmentation, text/image/math classification, then transcription.)
I've been using an Apple Shortcuts automation that turns my handwritten PDFs into notes in Obsidian, with the transcription up top and the PDF embedded below. Could pretty easily be adapted to turn a library of PDFs into a folder of Obsidian markdown notes. Here's a writeup: https://riddle.press/a-marriage-between-handwritten-notes-an...
Works with English and Japanese. Sadly I'm no longer with the team there but the work is solid. Try it out.
If you're into dreaming up cool solutions, you could try using smart pens or tablets to write stuff and then teach a model to recognize your handwriting. But for now, it's just a dream.
You have to think about what your goal is. Handwritten notes can be perfectly digitized into handwritten notes. What do you need the ocr for? Publishing? … transcribe what you need, or better, rewrite.
Searching? As you scan, make a basic index so that you can refer to the notes. Organize the folders properly with your notes, use a useful naming scheme.
Try LLMwhisperer[1] pdf extraction API. You are only one "curl" command away from extracting your handwritten text.
The best thing is it preserves the layout of your notes, which means it can keep tables as tables and lists as lists.
Check this screen grab for extracting handwritten notes > https://imgur.com/fXk0tcR
[1]: https://llmwhisperer.unstract.com/ [2]: Try it with your document here > https://pg.llmwhisperer.unstract.com/
[edited] added links
It will take time but you will have a pretty tailored solution.
Also of course: first of all try to process the images so that they only are white and black (not greyscale, actual B/W pictures)
Take scans of your journal pages, split the jpegs/pics into word fragments, display a couple of fragments to captcha clients, generate completed journal entries when the consensus gets reasonably high for each word fragment.
Not sure how captcha services start from scratch - probably ask around/check with google search.
Privacy goes out the door, but you should be able to show disjointed word fragments so no one could reconstruct enough of a single journal entry to expose your more personal info unless they were very determined. Or maybe split the scans into individual letter fragments instead?
Then monetize this for other people in the same situation...
1. Scan them (or take photos of each page) 2. put them files in a directory 3. Make a Python script that sends them to OpenAI GOT-4o 4. Store the text as a new file in the directory.
There's a free trial so you can check if it works for your handwriting.
If your notes are anything like mine there might be arrows, drawings or text effects like underlines and circling of words that you'd want to conserve.
You can later ocr the whole thing.
imho. (!)
* if you have a lot of "uniform" pages - read something like A4 -, get yourself a scanner with an automatic sheet-feeder
or throw some rainy-weekend afternoons on it & scan your notes with some decent SOHO scanner
* don't get too excessive with resolution, 400+ pixels/inch are enough for OCR ...
i always scan with 1200 and reduce the images to 600 px via simple batch-processing / for example imagemagick "convert".
* get yourself a decent OCR software, which is able to read your notes ...
i'm a big fan of abbyys "finereader", but sadly its prohibitively expensive ... ;)
idk how well FOSS OCR software a la tesseract works for hand-written notes.
* create pdfs with automatically detected text in the background for search and the scanned image of the notes.
it additionally generates XML-metadata & from there: whatever you want (web frontend ... :)
just my 0.02€
That will guide what's important. Inaccuracies aren't as much of a problem if you're using it as a search index where you'll return the image of your writing.
I'm sorry but the type it out solution seems the the best choice. You will probably remember something interesting by doing it that way.
If it works then scan all the pages and run though it with a script.
Shouldn't take you more than about an hour to code ( with Chat GPT!) in Python.
No, because the work of manual transcription is a way of telling if transcribing them is worth doing. Or maybe pay someone to transcribe it. Spending money is also a good way to tell if something matters (assuming you have sufficient money).
Orthogonally, maybe building a system is what you really want to do (for many people that would be more enjoyable than revisiting old journal content).
Finally, starting from hand transcription is an entry point into rewriting what you wrote. Rewriting is writing and if there's publication on your roadmap, you will be rewriting anyway.
There's no easy way to write well. Good luck.
greener pastures