How do I know which pics on my Lemmy instance are safe to delete

maor@lemmy.org.il to Selfhosted@lemmy.world – 33 points –
$ cd lemmy-dir
$ du -sh *
456K    lemmy-ui
15G     pictrs
4.3G    postgres

Guys, this is no longer funny. I feel literally chased by the "no space left" message. Please help: I don't need those pics, I did not upload them.


Have you posted this question on the lemmy_admin community over on lemmy.ml? Or possibly joined their matrix chat as linked on their github project? I suspect you will be able to get much more targeted support directly from the team or their community rather than the selfhosted community which is more general to all kinds of self hosting.

Thanks a lot, I was looking for this exact kind of community. Posted there <3

Did you get any solution?

Haha I'm literally on it right now. My instance crashed a couple of hours ago because of it, so I emptied ~/.rustup to get some time, but idk how to go about it from here. LPP didn't do anything. That seems really curious, does literally everyone use S3?

Okay, you may not like it, but I rented a 1TB storage box from Hetzner for 3 euros a month, just to get that foot off my neck. It's omega cheap and mountable via CIFS, so life is good for now. I'm still interested in what I described in the OP, and I even started scribbling some Python, but I'm too scared of fucking anything up as of now.

The annoying part in writing that script was discovering that the filenames on disk don't match the filenames in the URLs. E.g., given this URL:
https://lemmy.org.il/pictrs/image/e6a0682b-d530-4ce8-9f9e-afa8e1b5f201.png
You'd expect that somewhere inside volumes/pictrs you'd find e6a0682b-d530-4ce8-9f9e-afa8e1b5f201.png, right...? That's not how it works: the filenames on disk have the exact same format, but they don't match the ones in the URLs.

So my plan was to find non-local posts in the post table and check whether the thumbnail_url column points at lemmy.org.il (assuming that means my instance cached it). Then I'd find the file on disk by downloading it via the URL and scanning the pictrs directory for files whose size in bytes exactly matches the download. Once a candidate is found, compare checksums to be sure it's the same file, then delete it and delete its post entry in the database.
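The matching step of that plan could be sketched roughly like this. It is only a sketch under assumptions: the function names are made up for illustration, and it only works if the bytes served at the URL are identical to the bytes stored on disk, which nobody in the thread has confirmed. It reports matches and deliberately deletes nothing.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_size(pictrs_dir: Path) -> dict[int, list[Path]]:
    """Map file size in bytes to every regular file of that size."""
    index: dict[int, list[Path]] = {}
    for path in pictrs_dir.rglob("*"):
        if path.is_file():
            index.setdefault(path.stat().st_size, []).append(path)
    return index


def find_on_disk(downloaded: bytes, index: dict[int, list[Path]]) -> list[Path]:
    """Return files whose size and SHA-256 both match the downloaded bytes."""
    wanted = hashlib.sha256(downloaded).hexdigest()
    candidates = index.get(len(downloaded), [])
    return [p for p in candidates if sha256_of(p) == wanted]
```

Building the size index once and hashing only same-size candidates avoids checksumming the whole 15G directory for every thumbnail.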

When I get close to 1TB I'll get back here for this idea... :P

Sort by date created and delete oldest? Idk, I have no clue how Lemmy self-hosting works, but I guess that any picture you delete is a post that will be missing a picture.

Best solution? Just download more RAM 😉

I should've mentioned it in the post, but I already tried deleting pics modified more than X days ago. The catch is that I don't wanna delete pics uploaded to my server, I just want to delete pics cached from other instances :(

They're thumbnails of other instances' posts. I suggest migrating pictrs to S3 for cheaper/easier storage.

S3 isn't always cheaper though... It's highly redundant storage (multiple copies in multiple data centers) so it's often going to cost more than a single copy on a single VPS or dedicated server or whatever. I guess in some cases it might end up cheaper compared to upgrading your storage to something larger though.

If you do want to migrate your images "to the cloud", Backblaze B2 should end up cheaper than S3.

  • You don’t pay for storing on multiple servers. I never saw something like this on any provider I know.
  • Upgrading storage is not cheaper. Instance media storage reaches 500GB in a month and S3 is always cheaper than data volumes with given options for pictrs.
  • Backblaze is not cheapest. It has egress fees, so it will cost much more than others, although it's cheaper than AWS.

You don’t pay for storing on multiple servers.

For services like S3, it's included in the price.

Instance media storage reaches 500GB in a month and S3 is always cheaper than data volumes

Not sure where you got the idea that S3 would always be cheaper. $5/TB/month is a standard benchmark price for storage "in the cloud", and S3 is way more than that.

As an example, a Hetzner storage box is around $3.50/month (+ VAT if you're in Europe) for 1TB of space with unlimited traffic. The same amount of space with S3 is $23/month, plus the traffic.
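To put those numbers side by side, here is a tiny back-of-the-envelope sketch. The prices are just the ones quoted in this comment (they have probably changed since), the S3 figure excludes traffic, the storage box figure excludes VAT, and the function name is made up for illustration.

```python
# Monthly prices for 1 TB as quoted in the thread (USD); likely outdated.
S3_1TB = 23.00           # storage only, traffic is billed on top
STORAGE_BOX_1TB = 3.50   # before VAT, traffic included
BENCHMARK_1TB = 5.00     # the "$5/TB/month" rule of thumb


def price_ratio(price: float, baseline: float) -> float:
    """How many times more expensive `price` is than `baseline`."""
    return price / baseline


print(f"S3 vs storage box: {price_ratio(S3_1TB, STORAGE_BOX_1TB):.1f}x")
print(f"S3 vs benchmark:   {price_ratio(S3_1TB, BENCHMARK_1TB):.1f}x")
```

With these figures S3 comes out at roughly 6.6 times the storage box price, before traffic is even counted.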

For caches of media files, you don't need redundant storage like what S3 provides. You can save money by using a cheaper option.

Backblaze is not cheapest.

I didn't say it was the cheapest, just that it's cheaper than S3. Cheapest would probably be a Kimsufi server or something similar.

This is for an SSD-based volume, which you really don't need for media storage. If you're using Hetzner, just get a storage box.

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:

Fewer Letters   More Letters
NUC             Next Unit of Computing brand of Intel small computers
SSD             Solid State Drive mass storage
VPS             Virtual Private Server (opposed to shared hosting)

3 acronyms in this thread; the most compressed thread commented on today has 8 acronyms.

[Thread #73 for this sub, first seen 21st Aug 2023, 23:45]

I ran this query:

select distinct thumbnail_url as url from post where not local and thumbnail_url like 'https://campfyre.nickwebster.dev/pictrs%'

(replace with your instance's url)

I then sent delete requests to /internal/purge on pictrs to delete all of those old thumbnails, which cleared out a lot of space. After deleting the thumbnails I ran an UPDATE query to set all of those old thumbnail URLs to null in the DB. I also patched the version of lemmy that I run to stop caching thumbnails in the future. Hope this helps!
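For anyone wanting to script the purge step, a rough sketch follows. Several things here are assumptions rather than something stated above: the pict-rs address, the use of the last URL path segment as the alias, the `alias` query parameter, the POST method, and the `X-Api-Token` header. Check all of them against the pict-rs docs for your version. The helper names are made up, and the loop defaults to a dry run that only prints what it would send.

```python
import urllib.request
from urllib.parse import urlencode, urlsplit

# Assumption: pict-rs is reachable here from where the script runs.
PICTRS_INTERNAL = "http://127.0.0.1:8080"


def alias_from_thumbnail_url(url: str) -> str:
    """Assumption: the last path segment of the URL is the pict-rs alias."""
    return urlsplit(url).path.rsplit("/", 1)[-1]


def build_purge_request(url: str, api_key: str,
                        method: str = "POST") -> urllib.request.Request:
    """Build (but do not send) a purge request for one thumbnail URL."""
    query = urlencode({"alias": alias_from_thumbnail_url(url)})
    return urllib.request.Request(
        f"{PICTRS_INTERNAL}/internal/purge?{query}",
        method=method,
        # Assumption: the internal API key is passed in this header.
        headers={"X-Api-Token": api_key},
    )


def purge(urls, api_key: str, dry_run: bool = True) -> None:
    """Print each request; only send it when dry_run is False."""
    for url in urls:
        req = build_purge_request(url, api_key)
        if dry_run:
            print(req.get_method(), req.full_url)
            continue
        with urllib.request.urlopen(req) as resp:
            print(resp.status, url)
```

Feed it the URLs returned by the query, look over the dry-run output, and take a backup before running it with `dry_run=False`.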

I'm interested in this as well. Mastodon has RAILS_ENV=production /home/mastodon/live/bin/tootctl preview_cards remove and RAILS_ENV=production /home/mastodon/live/bin/tootctl media remove but I don't know if something equivalent exists for lemmy.