HACKER Q&A
📣 elC0mpa

How do systems (or people) detect when a text is written by an LLM


Hello guys, just curious how people or systems (computers) can detect when a text was written by an LLM. My question is mainly focused on whether there is some API or similar for detecting this. Thanks!!!


  👤 Someone1234 Accepted Answer ✓
They cannot.

Unfortunately many believe they can, and it is impossible to disprove. So now real people need to write while avoiding certain styles, because a lot of other people have decided those are "LLM clues": bullets, em dashes, certain common English phrases or words (e.g. delve, vibrant, additionally, etc.)[0].

Basically you need to sprinkle in subtle mistakes, or lower the quality of your written communications, to avoid accusations that will side-track whatever you're writing into a "you're a witch" argument. Ironically, LLM accusations are now a sign of high-quality writing.

[0] https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing


👤 PufPufPuf
I "detect" them through overuse of some patterns, like "It's not X. It's Y."

This is an artifact of the default LLM writing style, cross-poisoned through training on outputs -- not a "universal" property.
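For fun, a toy heuristic for exactly that pattern (the regex is hand-rolled for illustration and will both over- and under-match on real text):

```python
import re

# Flag the "It's not X. It's Y." contrastive-negation construction
# described above. Purely illustrative, not a real detector.
CONTRAST = re.compile(
    r"\b(?:it'?s|this is|that'?s)\s+not\s+(?:just\s+)?[^.;]{1,60}[.;]\s*"
    r"(?:it'?s|this is|that'?s)\s+",
    re.IGNORECASE,
)

def count_contrastive_negations(text: str) -> int:
    """Count occurrences of the 'It's not X. It's Y.' construction."""
    return len(CONTRAST.findall(text))
```

A single hit proves nothing, of course; it's the density of these constructions across a whole text that people react to.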


👤 moonu
Pangram is probably the best-known example of a detector with low false positives; they have a research paper here: https://arxiv.org/pdf/2402.14873. They do have an API, but I'm not sure whether you need to request access to it.

For humans I think it just comes down to interacting with LLMs enough to recognize their quirks, but that's not really foolproof.


👤 mjlee
People Look For:

Specific language tells, such as: unusual punctuation, including em–dashes and semicolons; hedged, safe statements, but not always; and text that showcases certain words such as “delve”.

Here’s the kicker. If you happen to include any of these words or symbols in your post they’ll stop reading and simply comment “AI slop”. This adds even less to the conversation than the parent, who may well be using an LLM to correct their second or third language and have a valid point to make.


👤 dipb
Humans detect them mostly through pattern matching. For systems, my guess is that an ML model is trained on AI-generated texts in order to detect them.

👤 gwbas1c
I don't think you can 100% detect AI content, because at some point someone will just prompt the AI to not sound like AI.

I think the better question to ask is: What are your goals? Is it to prevent AI SPAM, or to discourage people from copy-pasting AI? Those are two very different problems: in the case of AI SPAM you look for patterns of usage (i.e., unusually high interaction from a single IP, timing patterns around when things are read and when the response comes in), and in the other case it all comes down to cultural norms.


👤 m_w_
I don’t think there’s a reliable system or API for doing so, and it’s unclear that the arms race will ever favor the side of the detectors.

As far as how I and other people do it: there are some obvious styles that reek of LLMs; I think it’s ChatGPT.

There’s a very common structure of “nice post, the X to Y is real. miscellaneous praise — blah blah blah. Also curious about how you asjkldfljaksd?”

From today:

This comment is almost certainly AI-generated: https://news.ycombinator.com/item?id=47658796

And I'm suspicious of this one too - https://news.ycombinator.com/item?id=47660070 - reads just a bit too glazebot-9000 to believe it's written by a person.


👤 sigotirandolas
I don't look at whether the text is written by an LLM but at whether it has substance and whether the writer understands what they are doing and is respecting my time.

If the text is full of punchy three-word phrases or nonsense GenAI images, that's an obvious sign. But so is a revolutionary project with great results whose author can't really explain why their solution works where presumably many have failed before (or it's a word salad, or some lengthy writing that never shows any sign of getting you to an "aha, that's some great insight" moment).

A good sign is also if the author had something interesting going before 2022, and they didn't fall into the earliest low quality LLM waves. Unfortunately some genuinely talented people have started using LLMs to turbocharge their output while leaving some quality on the table nowadays, so I don't really know. I'm becoming a lot more sceptical of the Internet, to be honest.


👤 tatrions
The principled approaches are statistical. Things like DetectGPT look at per-token log probabilities (DetectGPT specifically tests whether small perturbations of the text drop its log probability, i.e. whether the text sits at a local maximum of the model's likelihood). LLM text clusters tightly around the model's typical set; human writing has more variance (burstiness). Works decently when you know the model and have enough text, breaks down fast otherwise.
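A minimal sketch of the mean/variance side of that idea, assuming you already have per-token log-probabilities from some scoring model (getting those, e.g. via transformers, is omitted, and the cutoffs below are invented for illustration):

```python
import statistics

# LLM text tends to have a high mean token log-probability with low
# spread ("low burstiness"); human text is spikier. Both thresholds
# are made-up placeholders, not calibrated values.
def looks_machine_generated(token_logprobs,
                            mean_cutoff=-2.5,
                            std_cutoff=1.5):
    mean = statistics.fmean(token_logprobs)
    spread = statistics.pstdev(token_logprobs)
    # Tightly clustered, high-probability tokens -> suspicious.
    return mean > mean_cutoff and spread < std_cutoff
```

In practice you'd calibrate the cutoffs against held-out human and model text, and even then the decision only makes sense relative to the specific scoring model.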

Stylistic tells like 'delve' and bullet formatting are just RLHF training artifacts. They're already shifting between model versions: compare GPT-4 to 4o output and the word-frequency distributions changed noticeably.

Long term the only thing with real theoretical legs is watermarking at generation time, but that needs provider buy-in and it slightly hurts output quality so adoption has been basically nonexistent.
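To give a feel for the detection side of a watermark, here is a toy green-list sketch (in the spirit of Kirchenbauer et al.'s scheme; the hash, the 50/50 vocabulary split, and the z-test are all simplifications I've made up for illustration):

```python
import hashlib
import math

# The generator biases sampling toward a pseudorandom "green" half of
# the vocabulary keyed on the previous token; the detector, sharing the
# key, counts how often the green half was hit.
def is_green(prev_token: str, token: str) -> bool:
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0  # pseudorandom 50/50 split

def watermark_z_score(tokens):
    hits = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    # Under the null hypothesis (no watermark), hits ~ Binomial(n, 0.5),
    # so a large positive z-score suggests watermarked text.
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)
```

The detector needs the same key the generator used, which is exactly why this only works with provider buy-in.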


👤 noufalibrahim
It's a lot easier to detect when you mostly interact with non English speakers.

I asked an LLM to rewrite this to make it nicer and got the following. I'd flag the first because I don't usually hear "majority of your interactions" in conversation but I might miss it. The second will probably get by me. As for the third, I never say "considerably easier" unless I'm trying to sound artificially posh.

1. It becomes much more noticeable when the majority of your interactions are with non-native English speakers.

2. It tends to stand out more when most of the people you interact with speak English as a second language.

3. It's considerably easier to identify when most of your interactions involve people whose primary language isn't English.


👤 leumon
You can try using an AI detector; here is a leaderboard of the best ones according to this benchmark: https://raid-bench.xyz/leaderboard Results should of course always be taken with a grain of salt, but in most cases the detectors are quite good, in my opinion.

👤 dezgeg
For HN comments, the LLMs seem to really like responses that are two or three paragraphs long. It's pretty obvious when you click through a profile's comments and see every comment with that exact same structure.

👤 mghackerlady
Overuse of "it's not X, it's Y" writing, strange shifts in writing or thinking patterns, and excessive formatting (or, especially on Wikipedia, ineffective formatting, such as using Markdown where it isn't supported).

👤 kaindume
I believe that if you have access to the training data of the specific LLM and the generated text is long enough, you might be able to tell statistically whether it's LLM-generated.

I am writing an LLM captcha system; here is the proof of concept: https://gitlab.com/kaindume/llminate


👤 blanched
I don't think there's any reliable way to tell.

To me, it often feels like the text version of the uncanny valley.

But again, that's just "feels", I don't have proof or anything.


👤 Havoc
You don't really.

There are a couple of tells like em dashes and similar patterns but you should be able to suppress that with even a simple prompt.


👤 rcxdude
There are some systems which use the LLMs themselves to detect machine writing (basically, if the text matches what the LLM would predict too well, it's probably LLM-generated), but they are far from infallible, with both false positives and false negatives. There are also certain tropes and quirks which LLMs tend to overuse that can be fairly obvious tells, but they can be suppressed, and they do represent how some people actually write.

👤 block_dagger
Em dashes, “it’s x, not y”, excessive emojis and arrows.

👤 RestartKernel
People look for tells, systems detect word distributions. Though neither is as reliable as active fingerprinting using an encoded watermark.

👤 rwc
Contrastive negation continues to be a dead giveaway.

👤 fwip
You can smell it.

👤 booleandilemma
I'm not going to tell you. I don't want that information going into the dark forest :)