Is there an open source package that the Internet Archive runs? What is it? I assume sites like archive.is run the same. I’d like to know if I can also run it for self-hosted archiving.
I believe they used Heritrix at one point. The important bit is that they store captures in a standard archive format, WARC (Web ARChive, standardized as ISO 28500). Several tools support it, both for capturing to it and for viewing it, and it lets you capture a website in a ‘working’ condition along with its history. I’m a bit fuzzy on the details since it’s been some time since I looked into it.
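To give a rough idea of what a WARC file actually contains, here is a minimal Python sketch (standard library only) that builds one uncompressed WARC/1.1 `response` record wrapping a raw HTTP response. The function name `warc_response_record` is made up for illustration. Real crawlers like Heritrix also write other record types (`warcinfo`, `request`, etc.) and usually gzip each record, so for actual self-hosted archiving a maintained library such as warcio would be a better starting point than hand-rolling this.

```python
import uuid
from datetime import datetime, timezone


def warc_response_record(target_uri: str, http_response: bytes) -> bytes:
    """Build one uncompressed WARC/1.1 'response' record (illustrative sketch)."""
    # WARC-Date is a UTC timestamp in W3C ISO 8601 form.
    date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.1",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {date}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http;msgtype=response",
        # Content-Length counts only the record block (the raw HTTP response).
        f"Content-Length: {len(http_response)}",
    ]
    # Header lines end in CRLF, and a blank line separates headers from the block.
    head = ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8")
    # Each record's block is followed by two CRLFs.
    return head + http_response + b"\r\n\r\n"


if __name__ == "__main__":
    raw = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/html\r\n"
        b"Content-Length: 13\r\n\r\n"
        b"<p>Hello</p>\n"
    )
    print(warc_response_record("http://example.com/", raw).decode("utf-8"))
```

Concatenating records like this one after another is essentially what a `.warc` file is; replay tools then index the records by URI and date so an old capture can be served back as a working page.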
It looks like all of their software lives under the GitHub organization that hosts Heritrix: https://github.com/orgs/internetarchive/repositories?type=all