Archive.org: the Good, the Bad, and the Ugly

Some of my thoughts on archive.org, which I can direct people to whenever they're confused about why I have mixed feelings.


The Good

There is a real need for a repository that archives lost products and knowledge. At the moment, archive.org is the only great general-purpose source for this. And if you include the Wayback Machine, it is indispensable.

None of us live forever, so an "archive.org" is necessary for every future generation.


The Bad

A lot of software that's still under active copyright, or even currently supported, gets uploaded to archive.org. That isn't preservation, it's straight up piracy. It also defeats the whole point of preservation, since these are things that don't (yet) need preserving.

Archive.org has been waging a worthless legal crusade against book publishers over illegally letting people read copyrighted material (while asking for donations... money it could have avoided burning entirely). They should never have engaged in this, in my opinion. And even with the "digital checkout system", it gives anyone much easier access to pirate material still actively sold by publishers. Lending should be limited to legitimately rare and abandoned materials. That would also help reduce the "needle in a haystack" problem described below. Why flood the service, or allow it to be burdened, with books that can easily be purchased?

There is no guarantee that anything on archive.org will actually be preserved or that the service will stand the test of time. There are in fact certain things whose only copies exist on archive.org, which is why mirrors are important... which leads into the next problem: archive.org is too big to mirror. It might be more doable if the wheat were separated from the chaff, but that would require a team of people and wouldn't solve the problem of duplicate/bad/illegal uploads.

The website and account profiles are broken. I spent a whole day trying to change a profile pic on archive.org, and the site just refused to do it, in any browser.

In my opinion, they have a slovenly response to security and don't really take it seriously. The September 2024 breach resulted in a lot of personal data being leaked everywhere.

It goes down, a lot.


The Ugly

The searches. Oh, how annoying the searches are, as archive.org has a horrible method of parsing metadata. You can spend a lot of time trying to find something that its search engine either doesn't list at all or buries far from the top of the results. I have a handful of techniques I use to work around this. It's like they took a page from Windows 10's infamously broken search :p
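One possible workaround, offered only as a rough sketch and not necessarily one of the techniques mentioned above, is to skip the main search box and build a fielded query against archive.org's advanced search endpoint, so that only the title field is matched. The function name, the example search words, and the choice of returned fields below are all illustrative assumptions, and whether this surfaces better results for any given item is untested.

```python
from urllib.parse import urlencode

# Advanced search endpoint on archive.org; it takes a Lucene-style query in "q".
SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def build_search_url(title_words, mediatype=None, rows=50):
    """Build a search URL that matches only on the title field.

    title_words: list of words that should appear in the item title
    mediatype:   optional filter such as "software" or "texts"
    rows:        maximum number of results to request

    This helper is hypothetical; it only assembles the URL and does not
    contact the site.
    """
    clauses = ["title:(" + " ".join(title_words) + ")"]
    if mediatype:
        clauses.append("mediatype:(" + mediatype + ")")

    params = [
        ("q", " AND ".join(clauses)),
        ("fl[]", "identifier"),  # fields to return for each hit
        ("fl[]", "title"),
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return SEARCH_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Example search terms are made up for illustration.
    print(build_search_url(["norton", "commander"], mediatype="software", rows=10))
```

The printed URL can be pasted into a browser or fetched with any HTTP client. Restricting the match to the title and media type is a design choice meant to cut down on noise from loosely related metadata; it will also miss items whose uploaders put the real name somewhere other than the title.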

The needle in a haystack problem: uploads are not audited for quality, and many things are just re-uploaded over and over and over and over and over, and some uploaders have worse etiquette than others. Great... now I have to sift through many copies of the same thing to find something unique and rare. I could provide many examples (even of things archived on here that people have duplicated over there in droves), but it's just beating a dead horse at this point.

If the "archive" ever gets shut down, it'll take the Wayback Machine along with it, which is, in my opinion, one of the most important services in the world right now. Without it, so much would have been lost. And there continues to be legacy website material that many people can only obtain from it. I almost feel that it needs to be a separate legal entity.

False sense of archival security: this one is an interesting phenomenon. When I tell others that I have not been able to find certain rare software, they either assume I haven't done my due diligence and checked archive.org/Wayback already, or they provide links to something I've already told them is the wrong thing (e.g. a shareware version vs. the full version). Not really a problem with archive.org itself per se, just the social phenomenon it has generated around assumptions.


Bittersweet

And that is the conclusion. While archive.org is a great and necessary tool, there are a lot of things that need to be done that will probably never happen, and that continues to place it in legal jeopardy. You may have noticed it suffers from many of the same issues as Wikipedia. Such is the inevitable result for open community projects that lack auditing and/or carry bias. I don't really know of a good way to solve that problem, honestly... since 'people' are the issue.

'My' solution to the problem is to use archive.org as a source for rebuilding / mirroring content here, so that you also have a bespoke front end for the content rather than having to search the site across thousands of disparate files. Of course I won't be around forever, and there may not be an infinite line of people to carry the torch for hosting my sites, so a solution like archive.org is always needed... in the end...
