Also, training an AI to navigate and test every type of app takes serious setup. Every app works differently, and the AI needs context: what's the goal? what's a normal result? what counts as broken? Without that, it just pokes around randomly or follows a script, which isn't much better than traditional automation.
That said, we’re getting close. Some teams already use LLMs to write test cases or spot UI issues in screenshots. Give it a couple years, and you might actually see TestFlight bots pointing out bugs before users ever get there.
https://engineering.fb.com/2018/05/02/developer-tools/sapien...
The AIs I find useful are still just LLMs, and an LLM's power comes from having a massive amount of text to work with, stringing the word math together to come up with something OK. That's a lot of data coming together to get things... kinda right... sometimes.
I don't think there's that data set for "use an app" yet.
We've seen some pretty spectacular failures come out of the "AI plays games" efforts, and "use an app" looks like a different problem altogether.
I’d want more control over what’s remembered and when. Curious if anyone here has used this yet — is it actually helpful in practice?
It works without AI, but there's an MCP server and such, so you should be able to connect Claude etc. to your emulator/device now.
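For anyone curious what that wiring roughly looks like, here's a sketch of an MCP server exposing a single "tap" tool backed by adb. This is just my guess at how you'd glue it together with the @modelcontextprotocol/sdk TypeScript package, not how the project above actually does it; the names "emulator-tools" and "tap" are made up for illustration.

    // Hypothetical minimal MCP server that lets an MCP client (e.g. Claude)
    // tap the screen of a connected Android emulator/device via adb.
    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
    import { z } from "zod";
    import { execFile } from "node:child_process";
    import { promisify } from "node:util";

    const run = promisify(execFile);

    const server = new McpServer({ name: "emulator-tools", version: "0.1.0" });

    // Register a "tap" tool: the model supplies screen coordinates,
    // we forward them to `adb shell input tap`.
    server.tool(
      "tap",
      { x: z.number().int(), y: z.number().int() },
      async ({ x, y }) => {
        await run("adb", ["shell", "input", "tap", String(x), String(y)]);
        return { content: [{ type: "text", text: `tapped (${x}, ${y})` }] };
      }
    );

    // The client talks to this process over stdio.
    await server.connect(new StdioServerTransport());

Point Claude's MCP config at that script and the model can call "tap" as one step in whatever flow it's poking at; in practice you'd add tools for screenshots, text input, etc.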
Current AI is not even designed to do that. It is just a very sophisticated auto-complete.
It is sophisticated enough to fool some VCs into believing you can chop the round peg down to fit the square hole. But there are no real grounds to expect a scalable solution.