HACKER Q&A
📣 NiceWayToDoIT

How does HN sanitize input?


I have noticed that it is not possible to enter HTML or ASCII emoticons, but on the other hand, it will allow a wide range of letters from different languages.

How is it done? Is it a hardcoded Unicode map, a set of rules, or an algorithm? Does it have anything to do with the mdtext2 data type?

I have tried looking into the https://github.com/arclanguage/anarki/tree/master/apps source code (I do not know how relevant it is, or whether it is up to date with the deployed version), but I have not found anything useful. As I have not done anything with Arc before, I would appreciate a bit of nudging in the right direction.


  👤 mattmanser Accepted Answer ✓
Try emailing them if you don't get an answer here. If you can convince dang you're not trying to hack them, he might tell you!

Email's the contact link in the footer.

Last time I saw the mods mention their email I think they said they try and respond to every email, but it can take a few days, maybe even a week or two.


👤 krapp
The Anarki codebase has diverged quite a bit from the Hacker News codebase, which itself has diverged from the default Arc language implementation. Searching Anarki to see how HN does things may therefore not be useful much of the time, although much of the code is likely to be the same or similar, simply because it changes so infrequently.

Hacker News seems to have added Unicode escaping on their own; as far as I know, it's not in either Anarki or the Arc forum. I assume HN just has a function that strips Unicode from comments while they're being processed, based either on a blacklist or on killing everything outside of the ASCII range. That might happen in the general area of their version of process-story[0] in news.arc, but that's just a guess on my part.
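
To make those two guesses concrete, here is a minimal sketch in Python (not Arc, and not HN's actual code). The function names and the choice of the Unicode "So" (Symbol, other) category as the blacklist are my own assumptions for illustration. Note that only the blacklist variant would match what the question describes, since the ASCII-only variant would also drop accented and non-Latin letters.

```python
import unicodedata

# Hypothetical blacklist: Unicode general categories to drop.
# "So" (Symbol, other) is an assumption; it covers most emoji and pictographs.
BLOCKED_CATEGORIES = {"So"}


def strip_blocked(text: str) -> str:
    """Blacklist approach: drop characters whose category is blocked."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) not in BLOCKED_CATEGORIES
    )


def strip_non_ascii(text: str) -> str:
    """ASCII-only approach: drop every code point above 127."""
    return "".join(ch for ch in text if ord(ch) < 128)
```

With this sketch, strip_blocked keeps letters from other scripts and removes only the symbol characters, while strip_non_ascii removes anything that is not plain ASCII.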

HTML generation and entity escaping are done in html.arc[1] and more or less just consist of escaping some entities. You might also check app.arc and server.arc.
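
As a rough illustration of what "escaping some entities" means, here is a small Python sketch. It is not a transcription of html.arc; the function name and the exact set of characters and entity spellings are assumptions on my part.

```python
# Hypothetical escape table. "&" must come first so that the
# ampersands introduced by later replacements are not escaped again.
ESCAPES = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
]


def escape_entities(text: str) -> str:
    """Replace HTML-significant characters with entities."""
    for raw, entity in ESCAPES:
        text = text.replace(raw, entity)
    return text
```

Escaping like this is why typed HTML shows up as literal text instead of being rendered. Python's standard library offers html.escape for the same job.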

Also, comments are stored as raw HTML in Arc tables in flat files, which implies there is no sanitizing of content being sent to the reader, except for downconverting to markdown and back when editing.

[0]https://github.com/arclanguage/anarki/blob/573e6833289823385...

[1]https://github.com/arclanguage/anarki/blob/master/lib/html.a...