Currently we use a simple question/answer addon at registration time - it works against all untargeted bots and is just a little "what is 4 plus six" or "what is the abbreviation for this website" type of question. It's worked fine for years and we don't really get general untargeted spam.
I am somewhat ethically disinclined to use reCAPTCHA, and some older members can't reasonably be expected to solve hCaptcha. The same goes for heavy fingerprinting and other privacy-invading methods. The site is also donation-run, so enterprise services that would block something like this (such as Distil) are both out of budget and out of ethics.
Is there a way I can possibly solve this? Negotiation is not really on the table; the last time one of the other volunteers responded at all, we got a ~150Gbps volumetric attack.
I've tried some basic things, like requiring cookie and JS support via middleware; they moved from a Java HTTP-library script to some kind of Selenium equivalent afterward. They also use a massive number of proxies, largely compromised machines being sold for abuse.
* Don't delete banned accounts and don't notify them in any way; instead, tag their IPs and cookies to auto-shadow-ban any sock puppets, so that these don't even make it into an approval queue.
* Use heuristics to automate the approval process, e.g. if they looked around prior to registering, or if they took time to fill in the form, etc.
* Add a content filter for messages, including heuristics such as ASCII art as a first post, and shadow-ban based on that.
* Hook it up to StopForumSpam to auto shadow-ban known spammers by email address / IP.
* Optionally, check for people coming from Tor and VPN IPs, and act on that.
Basically, make it so that if they spam once, they will need both to change the IP and to clear the cookies to NOT be auto shadow-banned. You'd be surprised how effective this trivial tactic is.
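A minimal Python sketch of that tagging tactic; the names are illustrative (in practice the tag sets would live in your database), not from any particular forum package:

    # Illustrative storage; persist these sets server-side in practice.
    banned_ips: set[str] = set()
    banned_cookie_ids: set[str] = set()

    def shadow_ban(ip: str, cookie_id: str) -> None:
        """Tag the spammer's IP and long-lived browser cookie; never tell them."""
        banned_ips.add(ip)
        banned_cookie_ids.add(cookie_id)

    def is_sock_puppet(ip: str, cookie_id: str) -> bool:
        """Matching EITHER tag means a silent auto-shadow-ban, so the spammer
        must both rotate the IP and clear cookies to get a visible account."""
        return ip in banned_ips or cookie_id in banned_cookie_ids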
All in all, the point is not to block trolls and tell them about it, but to block them quietly - to discourage and to frustrate.
Focus on making it not fun to troll. Never acknowledge the disruption. Make all your countermeasures as silent as possible. Never address the script kiddie directly. Don't accidentally create a "leaderboard" or similar by publishing counts of bans/deleted posts, etc.
Eventually it just becomes a waste of time to scream into nothingness and they will go elsewhere.
So you point various compression algorithms at your community content to establish baseline statistics (average, median, and so on). Try both compressing each post individually and compressing everything as one giant text corpus, counting the size growth a post generates by being added. These are your measurement points.
A user must solve a captcha to be able to post; however, the likelihood of passing it is tied to the compressibility of the post.
A highly compressible post is likely to be spam or ASCII art, so the captcha fails even if the answer was entered correctly. IIRC I used a relationship of 'min(1, sqrt(1/compress_factor)-1.05)'.
A non-compressible post is not only likely to pass the captcha; it might pass even if the answer was actually wrong.
The entire point is that it shifts balances. Trolls will have to submit their posts a few times and re-solve captchas, which slows them down. Making content that does not compress well across a variety of compression algorithms, especially if you also account for the existing text corpus, is a very hard problem to solve. They'd have to start adding crap to the post to bloat it up, at which point you can counter with the next weapon.
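A rough sketch of the measurement, under my reading of the half-remembered formula: compress_factor is compressed size over raw size, and the result is the probability of forcing a captcha failure.

    import zlib

    def compress_factor(post: str, corpus: bytes = b"") -> float:
        """Compressed size over raw size; small values = highly compressible.
        With a corpus, use the marginal growth the post adds instead."""
        raw = post.encode("utf-8")
        if corpus:
            growth = len(zlib.compress(corpus + raw)) - len(zlib.compress(corpus))
            return max(growth, 1) / max(len(raw), 1)
        return len(zlib.compress(raw)) / max(len(raw), 1)

    def forced_failure_probability(factor: float) -> float:
        """min(1, sqrt(1/compress_factor) - 1.05), clamped below at zero:
        a highly compressible post fails the captcha with probability ~1,
        while an incompressible one is never forced to fail."""
        return max(0.0, min(1.0, (1.0 / factor) ** 0.5 - 1.05))

The corpus variant is the stronger check: text that merely repeats what the forum already contains adds almost no compressed size, so copy-paste floods score as highly compressible even when each post looks novel in isolation.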
Repeat all of the above, except instead of compression, you estimate entropy. Blocking on high entropy means you can also catch messages padded with compression decoys.
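A sketch of the entropy variant; the thresholds are illustrative guesses on my part (ordinary English prose usually lands around 4 to 4.5 bits per character), not from the comment:

    import math
    from collections import Counter

    def shannon_entropy(text: str) -> float:
        """Bits per character of the post's character distribution."""
        counts = Counter(text)
        n = len(text)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    def suspicious(text: str, low: float = 2.0, high: float = 5.0) -> bool:
        """Low entropy catches repetitive junk (ASCII art); high entropy
        catches posts padded with random bytes to defeat compression."""
        h = shannon_entropy(text)
        return h < low or h > high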
I don't know if it was a coincidence, but after some sleuthing I found his real name and filled in the online FBI tip-off form about his emails to us. He had a bad history and may have been on bail.
It stopped pretty promptly after that; I'm guessing he got a phone call.
I attempted half a dozen mitigation strategies to prevent spam on one forum I ran: honeypots, questionnaires, other captchas, and proxying services to block bots. They slowed the bots at best, and when there's a torrent of bad actors it really doesn't matter if you slow them down by 50%.
I finally installed reCaptcha and it solved the problem instantly. Not a single bot has signed up in 6 months. I started getting suspicious that signups were just broken, but I tested it and it was fine.
After that experience, I'm very much on team reCaptcha. I tried hCaptcha as well (on a different project), but found it was much harder to solve.
This won't block all spammers, but it will raise the attacker's server costs (even with Selenium) to the point where they'd need GPU instances, which are too expensive for a script kiddie.
This is what Cloudflare is sort of doing when they say "verifying your browser":
https://www.cloudflare.com/products/bot-management/
https://www.akamai.com/us/en/products/security/bot-manager.j...
Edit: I didn't see your comment about budget. I expect Akamai may be out of reach; not sure about Cloudflare's options. Most bot detection is going to need to fingerprint the behavior of the interaction with the site (CAPTCHAs as well). If that data is handled correctly (not sold or made available to a third party, and destroyed after use), I believe it can be done ethically. Obviously my ethics are not yours.
Track the network of invites and shadow-ban linked accounts when you detect the spammer popping up. The spammer will eventually run out of invitees.
You can combine this with short "no invitation required" periods, where you make changes to the signup flow, spam detection, etc., and keep the window short enough that the spammer doesn't have time to adjust their bots.
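A sketch of the invite-tree ban, assuming you record who invited whom; the names are illustrative:

    from collections import defaultdict

    invites: defaultdict[str, list[str]] = defaultdict(list)  # inviter -> invitees

    def record_invite(inviter: str, invitee: str) -> None:
        invites[inviter].append(invitee)

    def shadow_ban_subtree(spammer: str) -> set[str]:
        """Quietly ban the spammer plus everyone they transitively invited."""
        banned, stack = set(), [spammer]
        while stack:
            user = stack.pop()
            if user not in banned:
                banned.add(user)
                stack.extend(invites.get(user, []))
        return banned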
There are also alternatives to reCAPTCHA that might be more ethical, for example https://www.phpcaptcha.org/ - there are some image-matching ones too, but I don't know any specific ones.
Their list is a combo of https://en.wikipedia.org/wiki/Special:GlobalBlockList and https://en.wikipedia.org/wiki/Special:BlockList?wpTarget=&wp... and Tor (which is handled automatically). [There is also an API version in JSON format.]
It's more work as well, but when you whois some of the attacking machines, you can find out the abuse@ email for them and contact it. That puts the provider on notice if you later pursue legal action.
Or add 2FA with a text message at sign-up. That is a lot harder to automate, and unless he is willing to spend a ton of money on extra phone numbers, he should run out of them quickly.
For an example, check a WordPress plugin I made 2 years ago: https://wordpress.org/plugins/la-sentinelle-antispam/
There is also the slider thing on AliExpress that you could check out. I haven't looked into it, so I'm not sure how exactly it works.
Alternatively, you can take away the instant gratification by adding a cooldown of, say, three days for each created account. Then he'll have to register them in bulk and hope the humans don't spot the patterns.
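A trivial sketch of the cooldown check, using the three-day figure above:

    import time

    COOLDOWN_SECONDS = 3 * 24 * 3600  # the three days suggested above

    def may_post(account_created_at: float) -> bool:
        """No (visible) posting until the account has aged past the cooldown."""
        return time.time() - account_created_at >= COOLDOWN_SECONDS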
You could also try using Bayesian filtering, but you'd have to block the ASCII art first.
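For reference, a bare-bones naive-Bayes score with Laplace smoothing, assuming you have labelled spam/ham posts to train on (presumably the ASCII art has to go first because it defeats word tokenization):

    import math
    from collections import Counter

    def word_counts(posts: list[str]) -> Counter:
        c: Counter = Counter()
        for p in posts:
            c.update(p.lower().split())
        return c

    def spam_log_odds(post: str, spam: Counter, ham: Counter) -> float:
        """Positive = more spam-like, under a bag-of-words model."""
        s_total, h_total = sum(spam.values()), sum(ham.values())
        vocab = len(set(spam) | set(ham)) or 1
        score = 0.0
        for w in post.lower().split():
            p_s = (spam[w] + 1) / (s_total + vocab)
            p_h = (ham[w] + 1) / (h_total + vocab)
            score += math.log(p_s / p_h)
        return score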
“Thank you for registering. Please send an SMS to number XXX with code YYY to activate your account.”
Kind of like a reverse 2FA.
https://stackoverflow.com/questions/33225947/can-a-website-d...
Remember, the goal is to flag accounts for cheap bulk rejection, without telegraphing to the attacker.
We launched Shibboleth (a CAPTCHA service) about a year ago. You can select from a variety of CAPTCHA types, including some non-traditional ones; different types have different strengths and fun factors: https://www.nettoolkit.com/shibboleth/demo There are a variety of options you can set, and you can also review users' attempts at solving CAPTCHAs to decide whether to make the settings more or less difficult.
Recently, we launched Gatekeeper ( https://www.nettoolkit.com/gatekeeper/about ), which competes with Distil and others, but without fingerprinting. Instead, site operators can configure custom rules and draw on IP intelligence (e.g. this visit is coming from Amazon AWS, or this IP address has ignored ten CAPTCHAs in two minutes), and Gatekeeper will tell your website how to respond to a request based on your rules. There's also other functionality built in, such as server-side analytics. Some light technical integration is required, but we're happy to help with that if need be.
As with all NetToolKit services, we have priced both of these services very economically ($10 for 100,000 credits, each visit or CAPTCHA display using one credit).
We would very much appreciate a conversation, even if it is only for you to tell us why you think our solutions don't fit what you are looking for. I would be happy to talk to you over the phone if you send me your phone number via our contact form: https://www.nettoolkit.com/contact
I run a gaming community with several thousand members and we regularly have to fend off attacks on both the community (spam bots in Discord) and the game servers themselves (targeted DDOS attacks usually in the 200-300Gbps range.)
From my experience, they tend to get bored and move on rather quickly, so oftentimes whatever we have to implement is more temporary in nature and doesn't really affect the existing community much, if at all.
You don't provide many details of what you do and don't have at your disposal in terms of skills, tech stack, access to log files, etc., so this is a non-expert cut-and-paste from StackOverflow [1]. Yeah, I know it doesn't even relate directly to your problem, but if you read the long bit below it might give you a bit of blue-sky thinking.
>> The next is determining what behavior constitutes a possible bot. For your stackoverflow example, it would be perhaps a _certain number of page loads in a given small time frame from a single user (not just IP based, but perhaps user agent, source port, etc.)_
Next, you build the engine that contains these rules, collects tracking data, monitors each request to analyze against the criteria, and flags clients as bots. I would think you would want this engine to run against the web logs and not against live requests for performance reasons, but you could load test this.
I would imagine the system would work like this (using your stackoverflow example): The engine reads a log entry of a web hit, then adds it to its database of webhits, aggregating that hit with all other hits by that unique user on that unique page, and record the timestamp, so that two timestamps get recorded, that of the first hit in the series, and that of the most recent, and the total hit count in the series is incremented.
Then query that list by subtracting the time of the first hit from the time of the last for all series that have a hit count over your threshold. Unique users which fail the check are flagged. Then on the front-end you simply check all hits against that list of flagged users, and act accordingly. Granted, my algorithm is flawed as I just thought it up on the spot.
If you google around, you will find that there is lots of free code in different languages that has this functionality. The trick is thinking up the right rules to flag bot behavior. <<
[1] https://stackoverflow.com/questions/6979285/protecting-from-...
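A condensed sketch of the engine the quote describes: aggregate hits per (user, page) series with first/last timestamps and a count, then flag series exceeding a rate threshold. The threshold and window here are placeholders.

    from dataclasses import dataclass

    @dataclass
    class Series:
        first: float
        last: float
        count: int = 0

    hits: dict[tuple[str, str], Series] = {}  # (user, page) -> series

    def record_hit(user: str, page: str, ts: float) -> None:
        s = hits.setdefault((user, page), Series(first=ts, last=ts))
        s.last = ts
        s.count += 1

    def flagged_users(threshold: int = 50, window: float = 60.0) -> set[str]:
        """Users with more than `threshold` hits on one page within `window` seconds."""
        return {user for (user, _), s in hits.items()
                if s.count > threshold and (s.last - s.first) < window}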
One of the forums that I frequent has a "newbie" section, which is not visible to full members or guests (who are not logged in). Whoever registers on the website needs to collect a predefined number of "Likes" on their posts. Not every post gets a "Like" - only those that contribute to the discussion do (not everyone needs to agree; debates are welcome as long as they are civil).
This helps maintain the quality of the forum to an outside viewer and cuts out a large amount of spam.
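The gate itself is trivial; the threshold here is hypothetical, not the forum's actual number:

    LIKES_REQUIRED = 5  # hypothetical threshold

    def graduates_from_newbie_section(likes_received: int) -> bool:
        return likes_received >= LIKES_REQUIRED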
When I also started building scripts and wrecked his website, he basically realized the harm he was doing, apologized, and stopped.
Reminds me of the good old times when you could trap script kiddies on MSN.
"I think you are lying and not capable of hacking my computer. I'm waiting for you, my IP is 127.42.196.8"
Some of the best solutions involve very minimal/quick captchas, or simple checks for things like JavaScript support.
That's hilarious. Have you tried trolling them back, like just enjoying their company? Start saying "pool's closed" and things like that.
Also, do you have Cloudflare in front of you?
You add a JS snippet that does some work, but if you detect a bot, you make it do increasingly more work. Think Bitcoin mining, but not actually that.
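The usual shape of this is hashcash-style proof of work: the server issues a challenge, and the client must find a nonce whose hash has a required prefix, with the difficulty raised for suspected bots. A Python sketch of both sides (in production the solving side would run in the client's JS):

    import hashlib, os

    def make_challenge() -> str:
        return os.urandom(16).hex()

    def solve(challenge: str, difficulty: int) -> int:
        """What the client's JS would do; cost grows ~16x per difficulty step."""
        nonce = 0
        while not hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest().startswith("0" * difficulty):
            nonce += 1
        return nonce

    def verify(challenge: str, nonce: int, difficulty: int) -> bool:
        """Cheap server-side check of the submitted nonce."""
        return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest().startswith("0" * difficulty)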
Captchas only benefit Google and the like, who couldn't care less about the community or content. Captchas make honest content (and spam cleanup) more expensive than the spam! It's a losing proposition that only looks good if you don't consider all the situations.
Make honest content (and spam) easy to post, but make cleanup even easier. Things like: every user can flag something; after a certain number of flags, automatically remove other content from the same IP (or from the same bundle of users registered within a close time window). And of course, a feature for admins to ban and erase content from users who wrongly flag honest content.
It's harder than captcha, but it is an actual solution. Captcha is lazy and ableist.
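A sketch of that flag-cascade cleanup; the threshold and data layout are illustrative:

    FLAG_THRESHOLD = 3  # illustrative

    posts_by_ip: dict[str, list[int]] = {}  # ip -> post ids
    flags: dict[int, set[str]] = {}         # post id -> users who flagged it

    def flag_post(post_id: int, flagger: str, poster_ip: str) -> list[int]:
        """Returns the post ids to bulk-remove once the threshold is hit."""
        flags.setdefault(post_id, set()).add(flagger)
        if len(flags[post_id]) >= FLAG_THRESHOLD:
            return posts_by_ip.pop(poster_ip, [])
        return []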