Within the last year it's become much worse with AI bots vacuuming my content on my dime.
Why is there not some community-driven attempt to block them? Is this problem just not solvable?
I’ve done a fair bit of scraping for various things. Companies that attempt to defeat scraping often ruin the user experience along the way.
As one example I'm aware of, there's an airline that keeps doing all sorts of things to defeat scrapers.
Problem is, the site now constantly throws errors for regular users doing searches, and people regularly get banned for doing too many searches.
And they still haven't achieved much beyond making scrapers increase their retry counts.
Still in beta, but already successfully protecting millions of requests daily.
During the beta I'm offering the service for free in exchange for user feedback. Do register if interested!
Not at all. I block them on my silly hobby sites with great success. [1] I've never used a CDN on my personal stuff.
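For anyone curious what "blocking them" can look like without a CDN: a minimal sketch is just refusing requests whose User-Agent matches known AI crawlers (GPTBot, CCBot, ClaudeBot, etc.). This is an illustration of the general approach, not the commenter's actual setup, and it only stops crawlers that identify themselves honestly; real deployments usually pair it with robots.txt and rate limiting.

```python
# Minimal sketch: block self-identified AI crawlers by User-Agent,
# using only the Python standard library. Bot names are examples of
# commonly cited AI crawler agents, not an exhaustive list.
from http.server import BaseHTTPRequestHandler, HTTPServer

BLOCKED_AGENTS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider", "PerplexityBot")

class BlockingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        if any(bot in ua for bot in BLOCKED_AGENTS):
            # Refuse to serve content to known AI crawlers.
            self.send_response(403)
            self.end_headers()
            self.wfile.write(b"Forbidden\n")
            return
        # Everyone else gets the normal response.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Hello, human (or at least an honest User-Agent)\n")

if __name__ == "__main__":
    HTTPServer(("", 8080), BlockingHandler).serve_forever()
```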
What's wrong with Cloudflare?
Anything that a human can read can be used for AI training. You can maybe avoid that by paywalling, or by limiting access to people you have actually spoken to. But maybe you can't trust them all.
There are commercial offerings to help solve this, but there's a reason you need to pay.