Do you use anything to archive content for yourself or others? (research, videos, articles, and anything that could be lost to time or censorship)
I saw this post and I was curious what was out there.
https://neuromatch.social/@jonny/113444325077647843
I'd like to put my lab servers to work archiving US federal data that's likely to get pulled - climate and biomed data seem most likely. The most obvious strategy to me seems like setting up mirror torrents on academictorrents. Anyone compiling a list of at-risk data yet?
I have a script that archives to:
Internet Archive: Digital Library of Free & Borrowable Texts, Movies, Music & Wayback Machine
Webpage archive
Ghostarchive, a website archive
Self-hosted https://archivebox.io/
I used to solely depend on archive.org, but after the recent attacks, I expanded my options.
Script: https://gist.github.com/YasserKa/9a02bc50e75e7239f6f0c8f04fe4cfb1
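For anyone who just wants the general idea, here is a minimal sketch of the Internet Archive part in Python. It is not taken from the linked gist. It assumes the public Save Page Now endpoint at https://web.archive.org/save/ followed by the target URL; the function names and the User-Agent string are made up for illustration, and the other services would need their own submit functions.

```python
from urllib.request import Request, urlopen

# Assumption: the public "Save Page Now" endpoint, which takes the target URL
# appended to this prefix.
WAYBACK_SAVE_PREFIX = "https://web.archive.org/save/"


def wayback_save_url(url: str) -> str:
    """Build the Save Page Now address for a page (hypothetical helper)."""
    return WAYBACK_SAVE_PREFIX + url


def submit_to_wayback(url: str, timeout: float = 60.0) -> int:
    """Ask the Wayback Machine to capture `url` and return the HTTP status code.

    Performs a real network request, so it can fail or be rate limited.
    """
    request = Request(
        wayback_save_url(url),
        headers={"User-Agent": "personal-archiver-sketch/0.1"},  # illustrative value
    )
    with urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Only prints the address that would be requested; call submit_to_wayback()
    # to actually trigger a capture.
    print(wayback_save_url("https://example.com/article"))
```

Keeping one small function per service makes it easy to carry on when one of them is down, which is the whole point of not depending on a single archive.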
EDIT: Added script. Note that the script doesn't include archiving to ArchiveBox, since its API isn't available in a stable version yet. You can add a function depending on your setup. Personally, I rely on Caddy and Docker, so I use a Caddy module [1] to execute commands, configured in my Caddyfile.
[1] https://github.com/abiosoft/caddy-exec
isn't this prone to a
or something similar at the end of the URL?
If you can docker exec, you have a lot of privileges already, so make sure this is not a danger.
Thank you for the warning. You are correct. It's prone to command injection. I will validate the URL before executing it. This should suffice until ArchiveBox's REST API is available in stable.
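A rough sketch of what "validate the URL before executing it" could look like, in Python. This is not the author's setup: the container name `archivebox`, the `--user=archivebox` flag, and the helper names are assumptions for illustration. The two ideas are to reject anything that is not a plain http(s) URL, and to pass the URL as a single argument in an argv list so no shell ever parses it.

```python
import subprocess
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_safe_url(url: str) -> bool:
    """Accept only http(s) URLs with a host and no whitespace or control characters."""
    if not url or any(ch.isspace() or ord(ch) < 32 for ch in url):
        return False
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. malformed IPv6 brackets
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.hostname)


def build_archivebox_command(url: str, container: str = "archivebox") -> list[str]:
    """Return the argv list for adding `url` to ArchiveBox inside a container.

    `container` is a hypothetical name; adjust it to your own setup.
    """
    if not is_safe_url(url):
        raise ValueError(f"refusing to archive suspicious URL: {url!r}")
    # One list element per argument: the URL is never interpreted by a shell,
    # so a trailing "; some-command" cannot run as a separate command.
    return ["docker", "exec", "--user=archivebox", container, "archivebox", "add", url]


def archive(url: str) -> int:
    """Run the command without a shell and return its exit code."""
    return subprocess.run(build_archivebox_command(url), check=False).returncode


if __name__ == "__main__":
    print(build_archivebox_command("https://example.com/article"))
```

The scheme check also rejects inputs starting with a dash, which could otherwise be read as command-line options.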
Would you be willing to share it?
Sure.
I hope you are also donating to the projects for uploading multiple copies to different services.