Also, training an AI to navigate and test every type of app takes serious setup. Every app works differently, and the AI needs context: what's the goal? what's a normal result? what counts as broken? Without that, it just pokes around randomly or follows a script, which isn't much better than traditional automation.
That said, we’re getting close. Some teams already use LLMs to write test cases or spot UI issues in screenshots. Give it a couple years, and you might actually see TestFlight bots pointing out bugs before users ever get there.
https://engineering.fb.com/2018/05/02/developer-tools/sapien...
The AIs I find useful are still just LLMs, and an LLM's power comes from having a massive amount of text to work with, stringing the word math together to come up with something OK. That's a lot of data coming together to get things... kinda right... sometimes.
I don't think there's that data set for "use an app" yet.
We've seen some pretty spectacular failures come out of the "AI plays games" efforts, and "use an app" looks like a different problem altogether.
I’d want more control over what’s remembered and when. Curious if anyone here has used this yet — is it actually helpful in practice?
It works without AI, but there's an MCP server and such, so you should be able to connect Claude etc. to your emulator/device now.
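For anyone curious what that wiring roughly looks like, here's a sketch of an MCP server exposing a single "tap" tool backed by adb. This is just my guess at how you'd glue it together with the @modelcontextprotocol/sdk TypeScript package, not how the project above actually does it; the names "emulator-tools" and "tap" are made up for illustration.

    // Hypothetical minimal MCP server that lets an MCP client (e.g. Claude)
    // tap the screen of a connected Android emulator/device via adb.
    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
    import { z } from "zod";
    import { execFile } from "node:child_process";
    import { promisify } from "node:util";

    const run = promisify(execFile);

    const server = new McpServer({ name: "emulator-tools", version: "0.1.0" });

    // Register a "tap" tool: the model supplies screen coordinates,
    // we forward them to `adb shell input tap`.
    server.tool(
      "tap",
      { x: z.number().int(), y: z.number().int() },
      async ({ x, y }) => {
        await run("adb", ["shell", "input", "tap", String(x), String(y)]);
        return { content: [{ type: "text", text: `tapped (${x}, ${y})` }] };
      }
    );

    // The client talks to this process over stdio.
    await server.connect(new StdioServerTransport());

Point Claude's MCP config at that script and the model can call "tap" as one step in whatever flow it's poking at; in practice you'd add tools for screenshots, text input, etc.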
Current AI is not even designed to do that. It is just a very sophisticated auto-complete.
It is sophisticated enough to fool some VCs into believing you can chop the round peg down to fit the square hole. But there are no real grounds to expect a scalable solution.