Lemmy self-hosters. What is your image cleanup process?

idle@158436977.xyz to Selfhosted@lemmy.world – 53 points –

I'm self-hosting the docker containers and I noticed the pictrs directory is steadily growing because of the cached images. Does anyone know if it gets cleaned up automatically or are hosters running scripts to clean it up after a certain amount of time? The install guides make no mention of it from what I can find.

Honestly, if I can get posts to stay synced up, that will be a good day for me.

Seriously, federation/sync issues are not fun.

I've had a lot of issues with lemmy.ml. I just unsubscribed from everything over there since zero comments were federating over to my instance.

I noticed that they'll show up eventually, where "eventually" could be something like 10-12 hours.

I suspect that they're just absolutely slammed to the point they can't actually push the federated content out to subscribers because EVERYONE is subscribing.

Might be an architectural thing due to not having a sufficiently scalable job queue/worker thread infrastructure, or just like, not enough CPU cycles to do it.

It's hard to say. I don't know if the admins of Lemmy.ml have been public about their issues or not. I know that Lemmy.world hasn't been having the same issues, at least from my perspective. Makes me think it's less an architectural or design problem, but rather a lack of server resources like CPU, as you suggested.

I read somewhere that Lemmy.ml has basically maxed out its VPS with its provider, so they’re stuck for the time being, whereas Lemmy.world actually just upgraded its server hardware. Hoping they’ll migrate to a beefier server soon.

Yup, I've read something similar. Hopefully they're able to get things sorted out soon!

Beehaw has been my bigger problem child.

However, tonight it's smooth as butter. Things are syncing, I'm getting alerts.

Could be due to some of the maintenance I did earlier too.

I've not personally noticed any federation issues with Beehaw on my instance. Glad to hear things are better tonight.

IIRC, I've read comments elsewhere that pictrs caches for 6 months, but I can't independently verify. I hope this gets a broader answer because I'm still on the fence about getting an instance set up for myself and some small communities.

I believe the activity table in Postgres is retained for 6 months (although I’m purging mine daily) and the pict-rs cache is 168 hours (1 week).

I knew I read something was kept for 6 months ;)

Glad to see that even here, the best way to get the right answer on the internet is to provide a wrong one.

Only 1 week? That should be fine. Thanks!

I was starting to sweat a little because my instance, that only I use, already has 600MB of pictures after less than 24 hours. The server has more than enough space, but I still wouldn't like it. A week is far more swallow-able.
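
For anyone who wants to watch the growth before deciding whether it's a problem, here is a minimal Python sketch that totals the size of a directory tree and reports how much of it is older than a cutoff. The `./volumes/pictrs` path and the 7-day cutoff are assumptions (the cutoff is borrowed from the 1-week figure mentioned above), and `directory_usage` is a hypothetical helper, not part of pict-rs. It only reads file metadata and deletes nothing.

```python
import os
import time


def directory_usage(root, older_than_days=7.0, now=None):
    """Return (total_bytes, stale_bytes, stale_count) for files under root.

    A file counts as "stale" when its modification time is older than
    `older_than_days`. Nothing is deleted; this is read-only.
    """
    now = time.time() if now is None else now
    cutoff = now - older_than_days * 86400
    total = stale = stale_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            total += st.st_size
            if st.st_mtime < cutoff:
                stale += st.st_size
                stale_count += 1
    return total, stale, stale_count


if __name__ == "__main__":
    # Assumed location of the pictrs volume; adjust to your own setup.
    total, stale, count = directory_usage("./volumes/pictrs")
    print(f"total: {total / 1e6:.1f} MB")
    print(f"older than 7 days: {stale / 1e6:.1f} MB in {count} files")
```

Running it once a day and comparing the numbers should show whether the directory levels off after a week or keeps growing. If the "older than 7 days" figure stays large, that would suggest the cache is not being pruned the way the comment above expects.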

How do you purge daily? Also, does that delete any post history or anything in a similar vein?

I’m running the following SQL, although I’m not actually sure it’s as necessary since 0.18.3. It doesn’t delete any post history or anything.

DELETE FROM activity WHERE published < NOW() - INTERVAL '1 day';
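
To run that statement on a schedule, one option is a small wrapper that calls `psql` inside the database container. The following is a rough sketch, not a tested deployment: the Compose service name `postgres`, the user `lemmy`, and the database `lemmy` are assumptions about the docker-compose setup, and the helper names are made up for illustration. The table and column names are taken from the statement above and may not apply to every Lemmy version.

```python
import re
import subprocess

# Only accept simple intervals such as "1 day" or "12 hours", so the value
# can be placed into the SQL string without risk of injection.
_INTERVAL_RE = re.compile(r"^\d+ (hour|day|week|month)s?$")


def build_purge_sql(retention="1 day"):
    """Build the DELETE statement from the thread for a given retention."""
    if not _INTERVAL_RE.match(retention):
        raise ValueError(f"unsupported retention interval: {retention!r}")
    return (
        "DELETE FROM activity "
        f"WHERE published < NOW() - INTERVAL '{retention}';"
    )


def build_command(sql, service="postgres", user="lemmy", database="lemmy"):
    """Build the argv list; service, user and database are assumed names."""
    return [
        "docker", "compose", "exec", "-T", service,
        "psql", "-U", user, "-d", database, "-c", sql,
    ]


def run_purge(retention="1 day"):
    """Run the purge; raises CalledProcessError if psql exits non-zero."""
    return subprocess.run(build_command(build_purge_sql(retention)), check=True)
```

Calling `run_purge()` from a daily cron job or systemd timer, with the working directory set to the folder containing the compose file, would reproduce the daily purge described above. Swapping `DELETE` for `SELECT COUNT(*)` in a manual `psql` session first is a cheap way to see how many rows would be affected.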

Related note, pictrs is super cool. It's like an OSS imgur backend, but no one really talks much about it or its potential.

I'm just letting mine do whatever it wants; I've got plenty of local storage. If/when I have storage issues I'll add an S3 bucket. It's pretty easy to modify the entrypoint for pictrs to pass S3 connection info in the docker-compose deployment.

Remote images are not cached or proxied right now as far as I know. Edit: seems I was wrong and there is some image caching happening. For sure for the small image thumbnails, but also sometimes for other pictures, but it seems very inconsistent.

Your growing pictrs directory might also be due to the extremely verbose default logging that pictrs (and the Lemmy backend too, btw) uses.

When I look in the directories, it's 100s of images that are definitely from posts. Maybe it only caches the images I clicked on?

No, I was wrong and caching is happening somehow, but not always. I think there might be a strict time-out or something like that for pict-rs trying to cache the images, which is why most images do not get cached in my experience.

In any case, a week's retention is fine by me. I have a couple hundred gigs available, so as long as it's getting cleaned up at some point it's not a problem for me.