HACKER Q&A
📣 nerder92

Do GPT-3-generated texts follow Benford's law?


As in the title, I was wondering whether it would be possible to detect a GPT-3-generated blog post by the fact that it might not respect Benford's law for word frequency. I tried this myself, using as a sample the article that tricked people here on HN (https://adolos.substack.com/p/feeling-unproductive-maybe-you-should), and it does seem to have a weird word-frequency pattern compared with other human-written articles. But I'm not sure of my findings, especially since the sample is quite small to draw a conclusion from. Does this make sense? It would be nice if someone could help me figure it out.
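For anyone who wants to reproduce the kind of check described above: a minimal sketch (the sample text is a stand-in for a real article) that counts word frequencies, takes the leading digit of each count, and compares the digit distribution to Benford's expected one.

```python
import math
import re
from collections import Counter

def benford_expected(d):
    # Benford's law: P(d) = log10(1 + 1/d) for leading digit d in 1..9
    return math.log10(1 + 1 / d)

def leading_digit_distribution(text):
    # Count word frequencies, then tally the leading digit of each count.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    digits = Counter(int(str(c)[0]) for c in counts.values())
    total = sum(digits.values())
    return {d: digits.get(d, 0) / total for d in range(1, 10)}

# Toy sample; replace with the full article text for a real test.
sample = "the cat sat on the mat and the dog sat on the log " * 3
observed = leading_digit_distribution(sample)
for d in range(1, 10):
    print(d, round(observed[d], 3), round(benford_expected(d), 3))
```

Note that with a short text most counts are small single digits, so the observed distribution will be noisy regardless of who wrote it, which is the sample-size worry in the question.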


  👤 bjourne Accepted Answer ✓
Benford's law doesn't apply to word occurrences. Analyzing word frequencies (1-grams) generally doesn't work because it overlooks the order of words. Shuffling the words of this comment doesn't affect its 1-gram frequencies, yet turns it into gibberish.
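The shuffling point is easy to verify: permuting the words of a sentence leaves its 1-gram counts exactly unchanged, even though the result reads as nonsense.

```python
import random
from collections import Counter

sentence = "shuffling the words of this comment does not affect one gram frequencies"
words = sentence.split()
shuffled = words[:]
random.shuffle(shuffled)

# Same multiset of words, so identical 1-gram frequencies.
print(Counter(words) == Counter(shuffled))  # True
print(" ".join(shuffled))  # almost certainly gibberish
```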

👤 ksaj
Interesting thinking. People have fairly predictable, individual word-usage patterns and modes. GPT-3 is trained on the words of far more than one person, so that would probably skew a word-frequency analysis quite a lot - even on a paragraph-by-paragraph basis.