But why do people trust it? How do you know the pages you're archiving haven't been tampered with selectively to change history? This is just out of sheer curiosity, and I am not saying they do this.
A few things make this more interesting:
- Analytics from various Russian providers, instead of self-hosted (FYI: I consider GA to be equally privacy-violating as Metrika or Mail.ru)
- A large number of reverse proxies on questionable or bulletproof hosting providers
- Doing this indefinitely can't be cheap at scale. Who is paying for it?
- Demanding tracking or else blocking your access to the site: they block any resolver that doesn't send them the first 3 octets of your IP (edns-client-subnet)
- Explicitly tracking you in odd ways: they repeatedly load pixels/do DNS preconnect/preload from wildcard subdomains containing a cookied number, IP, country, tracking IDs. View any archived page and ^F "pixel.archive.is"
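To make the edns-client-subnet point concrete, here is a rough sketch of what "the first 3 octets of your IP" means in practice. The function name `ecs_prefix` and the example addresses are made up for illustration, and the 56-bit IPv6 default is my assumption based on common resolver practice; this is not what any particular resolver or archive.is actually runs.

```python
import ipaddress

def ecs_prefix(client_ip: str, v4_bits: int = 24, v6_bits: int = 56) -> str:
    """Return the truncated network a resolver could forward as the client subnet.

    Hypothetical helper: 24 bits for IPv4 matches "the first 3 octets";
    the IPv6 prefix length is an assumed default.
    """
    addr = ipaddress.ip_address(client_ip)
    bits = v4_bits if addr.version == 4 else v6_bits
    # strict=False zeroes the host bits instead of raising an error
    network = ipaddress.ip_network(f"{addr}/{bits}", strict=False)
    return str(network)

# Documentation-range addresses, not real users
print(ecs_prefix("203.0.113.77"))              # 203.0.113.0/24
print(ecs_prefix("2001:db8:1234:5678::1"))     # 2001:db8:1234:5600::/56
```

The last octet is dropped, but the remaining /24 is usually still enough to place you in a specific ISP and region, which is why some resolvers decline to send it at all.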
At their present scale, going through and manually changing (tampering with) saved content for propaganda (or similar) purposes would have very little impact, realistically close to zero. It'd be quite the chore for very little return.
If they become important some day, with dramatically greater scale of usage, then getting answers to these questions might be important.
If they eventually betray trust, they're trivial to replace. Other competing variations of archive.is exist now. It's a relatively easy service to create. Someone should probably challenge them just on the basis of how bad their UI and UX are.
At scale, if they began abusing their position, it would become well known, they would get a reputation, and it would kill their service. The barrier to competition is extremely low.
An additional concern: they've shown signs in the past of being capricious, or at least easily annoyed by (subjectively) insignificant slights. They continue to block Cloudflare DNS users, last I checked. The "reason" is that Cloudflare, as a way of protecting its own users' privacy, doesn't send along the EDNS Client Subnet. [1]
I would argue this means archive.today / archive.is can't be trusted to have the best interests of the community at heart. It's not a public service in the way that archive.org is.
[1] This bad behavior is actually mentioned in their Wikipedia article, along with the additional uncited claim that they throttle users to 20 MB of data per day, after which they apparently ban your IP address. I haven't verified the latter claim. https://en.wikipedia.org/wiki/Archive.today#Worldwide
Worth noting, it’s probably not that expensive to run. Most of the hosting services they use would be offering “unmetered” bandwidth, so the cost is probably fixed per month, likely under $1000.