What am I missing? Why isn't this a solved problem by now?
https://youtube.com/watch?v=shnW3VerkiM
https://youtube.com/watch?v=VQhS6Uh4-sI
The first one looks more impressive. The second one is more reliable.
I think the real hard part is nobody wants to maintain these, and nobody really wants to pay to use them either. It's a lot of work and not something people do for free. It's no surprise these emerged (and won) in hackathons.
All the major operating system vendors are putting their full effort into this, so it doesn't make much sense to raise money and build it yourself.
If they wanted to be easy to work with, they'd offer a simple API or a plain HTML form interface.
Whilst they are a massive step forward, we still have a long way to go for that.
Why not try it yourself with Ollama, a large model, and some rented hardware? You will get something, but it will not be consistent.
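If you want to put a number on "not consistent", here is a minimal sketch of one way to do it: ask the same thing several times and see how often the most common answer comes back. Everything here is an assumption for illustration: `consistency` and `make_fake_model` are made-up names, and the fake model just cycles through canned replies so the snippet runs without any hardware. In a real test you would pass in a function that calls your own model instead.

```python
from collections import Counter
from itertools import cycle
from typing import Callable


def consistency(run_once: Callable[[], str], trials: int = 10) -> float:
    """Return the fraction of trials that gave the most common answer.

    Answers are compared after trimming whitespace and lowercasing,
    which is a crude normalization chosen only for this sketch.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    answers = [run_once().strip().lower() for _ in range(trials)]
    _, top_count = Counter(answers).most_common(1)[0]
    return top_count / trials


def make_fake_model() -> Callable[[], str]:
    """Hypothetical stand-in for a model call: cycles through canned replies."""
    replies = cycle(["click submit", "click submit", "click cancel", "Click Submit "])
    return lambda: next(replies)


if __name__ == "__main__":
    score = consistency(make_fake_model(), trials=4)
    print(f"agreement with the most common answer: {score:.2f}")
```

A score of 1.0 means every run agreed. Anything much lower means you cannot hand the task over unattended, which is the point being made above.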
I will list some of the simpler problems:
1. Some sort of reliable screen reader, able to handle all sorts of screen output (not just HTML-like or other already structured markup).
2. Some sort of universal optimizer, capable of solving any task that a human can solve in a simplified computer environment.
3. Some sort of reliable "Understanding Engine" that accepts queries in simplified language that is easy for a human to use. In theory we could build it in a few different ways (I list only the two best known).
3a. Some deep learning AI.
3b. Some huge implementation of semantic AI.
Presently, doing this requires a fair bit of continuous work.
Many websites don't want bots on them and are actively using countermeasures that would block operators in the same way they block scrapers. There is a ton of stuff a website can do to break those bots and they do it. Some even feed back "phantom" data to make the process less reliable.
There are a lot of businesses out there where the business model breaks if someone else can see the whole board.
There's an intersection between "high accuracy" and "low cost" that AI has not quite reached yet for this sort of task, when compared to simpler and cheaper alternatives.
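To make the accuracy versus cost trade-off concrete, here is a rough sketch of the arithmetic. It assumes you retry a failed task until it succeeds, that attempts are independent, and that noticing a failure costs nothing, so the expected cost per completed task is the cost per attempt divided by the success rate. The function name and all the prices and success rates are hypothetical, picked only to show the shape of the comparison.

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per completed task when retrying until success.

    With independent attempts the expected number of tries is
    1 / success_rate, so the expected cost is cost_per_attempt / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    if cost_per_attempt < 0:
        raise ValueError("cost_per_attempt must be non-negative")
    return cost_per_attempt / success_rate


if __name__ == "__main__":
    # Hypothetical numbers, not measurements.
    agent = cost_per_success(cost_per_attempt=0.50, success_rate=0.80)
    script = cost_per_success(cost_per_attempt=0.01, success_rate=0.99)
    print(f"AI agent:     ${agent:.4f} per completed task")
    print(f"plain script: ${script:.4f} per completed task")
    print(f"ratio:        {agent / script:.1f}x")
```

With numbers like these the agent has to get a lot cheaper or a lot more accurate before it beats the boring alternative, and that is before counting the cost of failures nobody notices.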