Yet it seems like with AI all the investors/founders/PMs don’t really care and just ship a broken product anyway
I feel like I’m going crazy seeing all the AI stuff shipped in products that give straight-up wrong outputs
It’s like a big collective delusion where we just ignore it or hand-wave that it’ll magically get fixed eventually
Once I started seeing these behaviors in our robots, they became far more apparent every time I dug deeply into a proposed ML system: autonomous vehicles, robotic assistants, chatbots, and LLMs.
As I've had time to reflect on our challenges, I think that neural networks tend to overfit very quickly, and deep neural networks are overfitted to an incomparable degree. That condition makes them sensitive to hidden attractors, which cause the system to break down, catastrophically, when it operates near those regions.
How do we define "near"? That would have to be determined using some topological method. But these systems are so complicated that we can't analyze their networks' topology or even brute-force probe their activations. Further, the larger, deeper, and more highly connected the network, the more challenging these hidden attractors are to find.
I was bothered by this topic a decade ago, and nothing I have seen today has alleviated my concern. We are building larger, deeper, and more connected networks on the premise that we'll eventually get to a state so unimaginably overfitted that it becomes stable again. I am unnerved by this idea and by the amount of money flowing in that direction with reckless abandon.
I feel like I can't trust anything it says. Mostly I use it to parse things I don't understand and then do my own verification that it's correct.
All that to say, from my perspective, they're losing some small amount of ground. The other side is that the big corps that run them don't want their golden geese to be cooked. So they keep pushing them and shoving them into everything unnecessarily, and we just have to eat it.
So I think it's a perception thing. The corps want us to think it's super useful so it continues to give them record profits, while the rest of us are slowly waking up to how useless these tools are when they confidently give us incorrect answers, and are moving away from them.
So you may just be seeing sleazy marketing at work here.
LLMs are not factual databases. They are not trained to retrieve or produce factual statements.
LLMs give you the most likely word after some prior words. They are incredibly accurate at estimating the probabilities of the next word.
It is a weird accident that you can use auto-regressive next word prediction to make a chat bot. It's even weirder that you can ask the chatbot questions and give it requests and it appears to produce coherent answers and responses.
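To make the "next word predictor" framing concrete, here is a toy sketch of the autoregressive loop. The bigram counter below is a stand-in for a real model (actual LLMs are transformers over subword tokens), so treat it as an illustration of the loop, not of the architecture.

```python
import random
from collections import Counter, defaultdict

# Toy stand-in for "predict the most likely next word given prior words".
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word(prev: str) -> str:
    """Sample the next word in proportion to how often it followed `prev`."""
    counts = following[prev]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Autoregressive generation: each output word is fed back in as context.
word = "the"
generated = [word]
for _ in range(6):
    word = next_word(word)
    generated.append(word)
print(" ".join(generated))
```

Nothing in that loop knows or cares whether the generated sentence is true; it only knows what tends to follow what.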
LLMs are best thought of as language generators (or "writers") not as repositories of knowledge and facts.
LLM chatbots were a happy and fascinating (and, for some, very helpful) accident. But they were not designed to be "factually correct"; they were designed to predict words.
People don't care about (or are willing to accept) the "wrong answers" because there are enough use cases for "writing" that don't require factual accuracy. (See, for instance, the entire genre of fiction writing.)
I would argue that it is precisely LLMs' ability to escape the strict accuracy requirements of the rest of CS and just write/hallucinate some fiction that makes this tech fascinating and uniquely novel.
There is, additionally, the fact that there is no easy (or even medium-difficult) way to fix this aspect of LLMs, which means the choices are either: 1) ship it now anyway and hope people pay for it regardless, or 2) admit that this is a niche product, useful in certain situations but not for most
Option 1 means you get a lot of money (at least for a little while). Option 2 doesn't.
What gets difficult is evaluating the response, but let's not pretend that's any easier to do when interacting with a human. Experts give wrong answers all the time. It's generally other experts who point out wrong answers provided by one of their peers.
My solution? Query multiple LLMs. I'd like to have three so I can establish a quorum on an answer, but I only have two. If they agree then I'm reasonably confident the answer is correct. If they don't agree - well, that's where some digging is required.
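For what it's worth, the comparison step can be fairly mechanical once you have the responses. A rough sketch of the quorum idea follows; the answer strings are made up, and how you actually fetch them depends on whichever providers you use.

```python
from collections import Counter

def normalize(answer: str) -> str:
    """Crude normalization so trivially different phrasings can still match.
    Long-form answers would need something smarter, e.g. extracting the key
    fact first or doing a semantic similarity check."""
    return " ".join(answer.lower().split()).rstrip(".")

def quorum(answers: list[str], threshold: int = 2) -> str | None:
    """Return an answer at least `threshold` models agree on, else None."""
    counts = Counter(normalize(a) for a in answers)
    best, votes = counts.most_common(1)[0]
    return best if votes >= threshold else None

# Stand-ins for responses from three different LLMs.
answers = ["Paris.", "paris", "Lyon"]
print(quorum(answers))  # "paris": two of three agree, so accept it
```

With only two models you are really just checking for agreement; a third is what turns it into an actual quorum.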
To your point, nobody is expecting these systems to be infallible because I think we intuitively understand that nothing knows everything. Wouldn't be surprised if someone wrote a paper on this very topic.
Garry Tan from YC is a great example of this.
It's not that he doesn't care. It's just that he believes the next model will be the one that fixes it, and that companies which jump on board now can simply update their model and be in prime position. It's similar to how Tesla FSD is always two weeks away from perfection, and when it happens they will dominate the market.
And because companies are experimenting with how to apply AI, these startups are making money. So investors jump in on the optimism.
The problem is that for many use cases (e.g. AI agents, assistants, search, process automation) they very much do care about accuracy. And they are starting to run out of patience with the empty promises. So there is a reckoning coming for AI in the next year or two, and it will be brutal. Especially in this fundraising environment.
AI is like that right now. It's only right sometimes. You need to use judgement. Still useful though.
The question then becomes, "How wrong can it be and still be useful?" This depends on the use case. Wrongness matters far more for applications that require highly deterministic output and less for those that do not. So yes, it does produce wrong outputs, but what matters is what the output is and your tolerance for variation. In a question-and-answer context, where there is only one right answer, it may seem wrong, but it could also provide the right answer in three different ways. Therefore, understanding your tolerance for variation is the most important thing, in my humble opinion.
From a coding perspective, proper technical systems already have checks and balances (e.g. test cases) to catch bad code, and that's something that's important to have regardless of generative AI usage.
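As a concrete (and entirely hypothetical) illustration of what such a check looks like: imagine `days_between` is code an LLM drafted for you. The tests are the check-and-balance; they pin the behavior you actually need, including the edge cases a generator is most likely to fumble.

```python
from datetime import date

def days_between(start: str, end: str) -> int:
    """Stand-in for LLM-drafted code: days from `start` to `end`,
    both given as ISO dates (YYYY-MM-DD)."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days

# Tests pin the required behavior regardless of who (or what) wrote the code.
def test_same_day():
    assert days_between("2024-07-01", "2024-07-01") == 0

def test_leap_year():
    assert days_between("2024-02-28", "2024-03-01") == 2  # 2024 is a leap year

def test_reversed_range_is_negative():
    assert days_between("2024-07-02", "2024-07-01") == -1
```

If a generated implementation botches the leap-year case, the suite fails the same way it would for a human-written bug.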
From a creative/informational perspective, there are stories every day of hallucinations and the tech companies are correctly dunked on because of it. That's more product management error than AI error.
AI hallucination isn't a showstopper issue, it just has to be worked around.
If this is correct, then it's less of "people don't care" and more "the hype is louder than them."
That said: I, too, am completely perplexed by people within the tech community using LLMs heavily in making software while, unironically, saying that they have to keep an eye on it since it might produce incorrect work.
Do the mundane stuff in school/college/boot camp. Do the cool stuff at work.
- Slack
- GitHub
- Microsoft
- Google
- Atlassian
- Notion
- ClickUp
- HubSpot
So ask yourself: Who benefits from the hype? And who would benefit from a better general understanding of the flaws?
It’s the same reason why we heard about blockchain for years despite it having near zero practical uses
Granted, it might have to do with how I use LLMs. If you just blindly ask a question, you increase the chance of hallucinations. If you give a lengthy input, and the output is highly dependent on that input, then you will get better results. Think email rewriting, summarizing, translation.
The tech industry is an environment composed almost entirely of companies running at a loss to prove viability (and that don't see the irony in it) in order to raise more funding from investors. AI is just the latest in a long series of empty hype to keep the gravy train running (last year it was VR), and at this point the whole thing looks like it's teetering on a cliff. It's a bunch of MBAs scrambling for a sales pitch.
LLMs are useful. But "extremely lossy compression of documents with natural language lookup built in" doesn't sell endless subscriptions as well as "we created a mind." So they sell hype, which of course they cannot live up to because LLMs aren't minds.
LLMs are language models, not magical information models with all information in the world somehow fit into several gigabytes. Use them right.
As for why people are paying for a product that returns incorrect results, could be any number of reasons:
- People buy into the hype/marketing and actually think AI-thing is going to replace some part of their workflow
- People want to experiment and see how well it does at replacing part of their workflow
- Whatever the AI-thing does for a customer is not reliant on it being correct, so producing incorrect output simply doesn't matter
A good example would be my company's corporate IT AI bot, which is effectively a very poor search engine for internal corporate wiki self-help articles on IT- and HR-related stuff. The actual IT/HR portal has a traditional search that, if you know the terms to search for, does a much better job. So most people ignore the AI, but I'm pretty sure we bought the engine from someone.
I suppose there is also some hope that the hallucination problem will erode as more effort/compute is poured into training. There may need to be a paradigm shift, though; the current structure of generating tokens based on probabilities seems like it will forever be a 'regurgitator'.
Nonetheless, as with autopilot, you don't want to substitute paying attention with it. "Trust, but verify" as Reagan said.
Tolerable pizza delivery is ruined. The Internet is a walled wasteland now. Far too much "content" that doesn't need to exist. Everything is an ad.
None of our lives have been improved by software.
A lot of money has poured into AI, potentially well in excess of the return on investment over the next several years. The field, from investors to CEOs on down to developers, is in a state of collective suspension of disbelief. There are going to be a lot of people out of work when reality reasserts itself.
Unfortunately that's good enough for a lot of people, especially when you don't actually care and just need an output to give to someone else (office jobs etc).
Developers work with what we have on the table, not what we may have years later.
There is a belief, cynical or otherwise, that AI will make (a very small number of) people extraordinarily wealthy. The drive to stuff it into every facet of the digital experience reflects this belief.
Every AI chatbot I've ever interacted with has been unable to help me. The things I've had them write do usually pass the Turing Test, but they are rarely anywhere near as good as what I could write myself. (I admit, having been self-employed for a long time, I can avoid a lot of busywork that many people cannot, so I may be missing lots of great use cases there. I never find myself having to write something that doesn't need to be great and just wanting to get it over with. AI might be great if you do.)
I’ve been trying to use image/video creation to do lots of other things and I’ve not even come close to getting anything usable.
I appreciate certain things (ability to summarize, great voice to text transcription, etc.) but find a lot of it to be not very useful and overhyped in its current form.
(2) It's a big topic that could be addressed in different ways, but I'll boil it down to "people are sloppy": many people become uncomfortable with complex problems that have high-stakes answers and will trade correctness for good vibes.
(3) LLMs are good at seducing people. To take an example, I know that I was born on the same day as a famous baseball player who was himself born exactly a year before an even more famous cricket player. I tried to get Microsoft's Copilot to recognize this situation, but it struggled, thinking they were born on the same day or a day apart rather than a whole year. Once I laid it out explicitly, along with my own personal connection, it offered effusive praise and said I must be really happy to be connected to sports legends like that, which I am. That kind of praise works on people.
(4) A lot of people think that fixing LLMs is going to be easy. For instance, I'll point out that Copilot is completely unable to put items into any order that isn't trivially easy (such as US states in reverse alphabetical order), and others will point out that Copilot could just write a Python program that does the sorting.
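The suggested workaround amounts to something like this (a handful of states standing in for the full fifty):

```python
# "Just write a Python program that does the sorting" -- the built-in sort
# handles the reverse-alphabetical ordering Copilot stumbles over.
states = ["Ohio", "Texas", "Alabama", "Wyoming", "California"]

for state in sorted(states, reverse=True):
    print(state)
# Wyoming, Texas, Ohio, California, Alabama
```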
That's right, and it is part of the answer, but it just puts off the problem. What's really irksome about Copilot's inability to sort is that it doesn't know it can't sort: ask it for the probability that it will sort a list in the right order and it will tell you it is very high. It's not so easy to know what is possible in terms of algorithms either; see
https://en.wikipedia.org/wiki/Collatz_conjecture
as evidence that it's (practically) impossible to completely understand very simple programs. See the book
https://en.wikipedia.org/wiki/G%C3%B6del,_Escher,_Bach
for interesting meditations on what a chatbot can and can't do. My take is that LLMs as we know them will reach an asymptote and not improve explosively with more investment, but who knows?
Earlier today I asked ChatGPT to give me a Go script to parse a Go codebase (making heavy use of the Go AST libraries, which I never normally use), and it gave me a 90% good solution which saved me a lot of time. To be clear, the solution was non-functional on its own, but it still saved me from doing exploration work and gave me a quick overview of the APIs I would need.
A few days ago it helped me generate code for some obscure AWS API using aws-sdk-go-v2. It was again almost fully working, and better than the examples I could find online.
I have examples like this every week. It’s not as amazing as some people say, but still pretty useful. I rejected AI stuff at first but don’t regret adding LLMs to my toolbelt.
It hasn't been flagged.
Just today, ChatGPT-4o screwed up a rudimentary arithmetic problem ( https://i.imgur.com/2jNXPBF.png ) that I'd swear the previous GPT-4 model would have gotten right.
And then there's this shitshow: https://news.ycombinator.com/item?id=40894167 It is still happening as of this morning, only now all my previous history is gone. Nothing left but links to other people's chats. If someone at OpenAI still cares what they are doing, it's not obvious.
Move fast and break things, and don't pay anyone, but when you do that long enough and burn billions in VC money, you end up rich. Why does that work?
Why can someone like Trump lie and lie and lie and be convicted for felonies and turn up on the worst people list and nobody seems to care?
There are no more consequences. You break software, people don't care if it's the only thing available in the walled garden. You fuck up games, people don't care if you shove a TB worth of updates down their pipes later. You rugpull millions of dollars and walk out unscathed, as long as someone made a profit they will keep praising you.
You used to be actually shunned and driven out of the village for shit behavior. Not anymore. We find all kinds of ways to justify being terrible at stuff.
So along comes tech that costs us barely anything to use and produces meh results most of the time. That's amazing. It used to take thousands of talentless hacks to come up with all that mediocre wrong shit and they all wanted a paycheck. It's progress in a world where nothing means anything anymore.