PSA: Pictures are back!

Lionir [he/him]@beehaw.org (mod) to Beehaw Support@beehaw.org – 34 points

We have to move to object storage to have efficient image storage. We are currently at 70% of our disk storage.

We are sorry for the short notice and we don't know how long it will take though we believe it will take multiple hours.


Small post mortem :

  1. The migration required us to be down - we thought we could run without images
  2. It took much longer than initially anticipated
  3. It broke at close to 2:30AM - 3 hours into the migration and I couldn't fix it so the server has been down doing nothing for hours now
  4. We're booting back up from the backup made before the migration was attempted, we'll try another strategy.

There might've been some data loss in images, we're looking into it. In the meanwhile, if your profile picture or banner is broken, feel free to re-upload them.


Update :

Hi Beeple!

We're trying this again.

This time, Beehaw should remain available, though no pictures can be uploaded during the migration. The error message will likely be odd, because Lemmy will think uploads are possible while we block them from happening.

We'll take a snapshot of the pictures in 60 minutes, at 21:00 UTC, which should take around one hour. After that, we'll do the migration and testing on our own before shipping it on Beehaw.

Once we've resolved all these kinks, Beehaw will momentarily go down and then back up with the migration complete without losing any old pictures.


Well that was fun.

Didn't go as planned of course, restored from backups, pre migration attempt. Thank you for your patience while we try to get all these moving parts working well together. Sorry for the troubles.

I once caused an AWS outage that impacted 20% of their customers in their largest region. They called my manager to ask why we were performing around 10k writes per second to a bucket. It was fun times

That was you?!

(jk, I'm on a different cloud 😂)

They don't limit that?! I've worked with a lot of AWS services and most have built in rate limits. That's wild lol

They do now...

lol, that's how rules get made

I can get into it in more detail if anyone's interested. But basically, they had a rate limit on direct writes, but not a rate limit on cross bucket replication if you connected many buckets to replicate into a single bucket

Thanks for the update and hope you have less trouble in the future! Don't worry about the downtime I really appreciate that here it's serving a clear purpose unlike Twitter lol

I can't upload my puppy and flower pics!!! Fucking damn you!!! WTF did I sign up for!!!?!?!??!

Thank you for making it possible to share endless pictures of beans in the future! It will never get old.

Beans, beans, beans, more beans, perhaps a cat, beans, beans, never gets old!, beans.

If it fails, you can always tell users to upload images to pixelfed and share the link here (I'm joking, don't take this seriously)

Surprised beehaw hosts images at all. Sounds like that could become very expensive very quickly.

It could, and will. Hopefully they are taking advantage of CDNs for image delivery so they aren't paying high egress costs and can keep it in slow, cheap storage.

I'm honestly surprised that Lemmy hasn't embraced distributed community hosting. Many existing niche communities (outside of Lemmy) let others run part of their service to serve up images and media, or to act as workers (by running the worker application or container) for computationally expensive operations like compression or encoding. Some even gamify it, as in the case of e-hentai.

Hard drives at home are incredibly cheap compared to cloud storage costs (even including networking, server, redundancy...etc hardware costs), but come with reliability concerns, which is where a distributed community becomes critical.

I feel like Lemmy definitely needs to embrace distributed computing in some fashion. I have no interest in hosting my own instance, but I'm not against running a docker image that would offload some of the processing requirements large instances need. It would just need to be relatively straightforward for me to set up.

Distributed computing isn't really a good fit for low computational tasks like forum software. It's good for heavy calculations like "Could you please fold proteins to see if there's any interesting stuff to be found" and "Here are 50 years of radio data. See if any of it is anomalous." You need a sufficiently complex, long-running task to warrant the computational overhead of a supervisor process assigning tasks and receiving their outputs. LLMs, epigenetics, and deep space analysis are all good candidates for distributed computing. Lemmy is more of a candidate for an autoscaling, clustered, multi-tenant approach. The computational tasks are basic, but there are a lot of them. Further, the computational needs are not constant. A fantastic case study for making the most of resources in the Fediverse is mastodon.world and lemmy.world running on the same server and making scale up and scale down requests to the docker daemon. The ideal world topology, in my opinion, for a Fediverse application ecosystem would be a Kubernetes cluster with three supervisor nodes and a minimum of two worker nodes, all with autoscaling enabled. The idea would be that your database resources can hold multiple databases (Lemmy, Mastodon, Peertube) AND can scale. The mechanism you would use to do this would depend on your hosting decisions.
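To make "scale up and scale down" a bit more concrete, here is a rough Python sketch of the replica calculation that Kubernetes documents for its Horizontal Pod Autoscaler: desired = ceil(current replicas * current metric / target metric). The function name, the clamping defaults, and the example numbers are my own assumptions for illustration, and the sketch leaves out the tolerance and stabilization behaviour a real autoscaler applies.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,   # assumed lower bound, not from the thread
    max_replicas: int = 10,  # assumed upper bound, not from the thread
) -> int:
    """Return how many replicas to run for the observed load.

    Uses ceil(current_replicas * current_metric / target_metric),
    then clamps the result to [min_replicas, max_replicas].
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Two workers at 90% CPU against a 60% target: scale up.
print(desired_replicas(2, 90.0, 60.0))
# Four workers at 20% CPU against a 60% target: scale down.
print(desired_replicas(4, 20.0, 60.0))
```

The point is that load, not a fixed server size, decides how many copies of the service run at any moment.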

Digression now on database solutions. There are three basic ways I could see running the perfect Fediverse database cluster. The first, and least beholden to any given cloud provider, is to run Postgres in a Kubernetes cluster either on a single machine emulated cluster at your house, or within several clustered machines. The upside to this is that no one but you controls your infrastructure. The downside is that your ability to scale is hard capped to the amount of RAM and CPU resources you physically have in your house. Next would be a similar set-up on a hosted Kubernetes cluster through a cloud provider such as Google, Microsoft, IBM, or AWS. The downside here is that tech giants are all, for various reasons, shit. Google has the best eco-friendliness score, so they're listed first. They're still shit, though, and one of the platforms I'm suggesting hosting is a direct competitor to one of their golden goose products.

Your next option is to just pay one of those cloud providers to host a database cluster for you, rather than using an ad hoc Kubernetes cluster solution. It will cost you more money, but the tools available to you for managing databases through these cloud providers are much better. In terms of user experience and performance, this is a clear upgrade over hosting your databases on your Kubernetes cluster. The final option I'd want to talk about is called "Aurora Serverless." So far, I've only discussed ways you can scale up to meet demand, but Aurora Serverless allows you to scale down. This will be the cheapest option if you run a small instance with clear peaks and valleys of load. It's not the best answer for a user like Beehaw, but would come with the lowest cost in terms of management and money for someone running an instance for a low number of people.

So, does that solve the image hosting problem? No. Not really. Postgres is TERRIBLE for image hosting. Right now, Beehaw is, per my understanding, using the simplest image storing solution, which is "Just keep it on the server." This is great for a first pass at hosting a web service, and will remain fine long term for a low user instance, but will fast run into issues with any instance that hosts numerous users uploading pictures. Basically, a single server has a finite amount of disk space. Once it fills up, the only solution is to bring the service down and put in bigger disks. Eventually, you reach the upper limit of how big disks are manufactured, and of how many disks you can attach via the interfaces that connect to a motherboard. A much better solution, and in fact the best solution, is what Beehaw is implementing right now: object storage. If I'm going to start with the DIY "I'm a strong independent Fediverse citizen, and I don't need no corporations" approach, I'll recommend Ceph. Ceph can run on Kubernetes and will provide object storage backed by Kubernetes persistent volumes. But more likely, you will want to aim for something with effectively unlimited storage, and your only real options for that currently are the cloud providers. You don't have to worry about disks running out of space, and they do not charge you very much money.
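For anyone wondering what "moving images from the server's disk to object storage" looks like in principle, here is a minimal Python sketch. It is not how Beehaw or Lemmy actually performs the migration: `InMemoryObjectStore` is a hypothetical stand-in for a real bucket, and `migrate_directory` is a name I made up. The idea it shows is simply that each file becomes a flat key in the store, and each copy is verified before it is counted.

```python
import hashlib
from pathlib import Path
from typing import Dict


class InMemoryObjectStore:
    """Hypothetical stand-in for an S3-style bucket: flat keys mapped to bytes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def migrate_directory(root: Path, store: InMemoryObjectStore) -> int:
    """Copy every file under root into the store and return the file count.

    The object key is the file's path relative to root, so the old
    on-disk layout can still be looked up after the move.
    """
    copied = 0
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        key = path.relative_to(root).as_posix()
        data = path.read_bytes()
        store.put(key, data)
        # Read the object back and compare checksums before trusting the copy.
        if hashlib.sha256(store.get(key)).digest() != hashlib.sha256(data).digest():
            raise IOError(f"checksum mismatch for {key}")
        copied += 1
    return copied
```

Because the source directory is only read, never modified, a failed run leaves the original images in place, which is the same reason taking a snapshot first is a sensible precaution.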

I get where you're coming from, though. "How do we all own the images so that the instances don't run out of space but without being beholden to the corporations who own the storage?" The closest we come right now is peer 2 peer solutions, but all of them have a discovery and durability problem. In terms of discovery, the problem is "how does a server providing the Lemmy service find the peer 2 peer hosted files?" There's no way to perform get object operations to serve the files via HTTP other than for the host server to fetch (download) the file from the peer 2 peer network and then deliver that to the user who made the request. The problem with this is that the server synced the file to its local storage, and is now hosting it, thus defeating the purpose of the peer 2 peer hosting solution. The other problem, the durability problem, is what happens when a low number of people are interested in an image, and the last person online hosting the image closes their laptop. Now no one can get the image as there was never a canonically available version of the file. The only solutions that I know of that come close to solving these problems right now are Nostr and Secure Scuttlebutt. There are major issues with these protocols as they stand right now. Firstly, people already find joining the Fediverse too hard. For Nostr you have to generate a cryptographic key pair to create your identity. This isn't... horrible, but it definitely takes some work and some doing. You have to generate the keys and then load them into your Nostr client. Secure Scuttlebutt is based on a protocol where to follow someone, someone has to invite you to follow them. People already complain about Beehaw asking you a question about what you like about Beehaw to make sure you read the rules. Imagine the frustration with a pure invite only social network where you can't join until someone you know has joined.
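The durability problem can be put into rough numbers. As a back-of-the-envelope Python sketch, assume each peer holding an image is online independently with the same probability (a simplifying assumption of mine, since real peers tend to go offline at correlated times such as night). The chance that at least one copy is reachable is then 1 - (1 - p)^n.

```python
def availability(peers: int, p_online: float) -> float:
    """Probability that at least one of `peers` holders is online.

    Assumes every peer is online independently with probability p_online.
    """
    if peers < 0:
        raise ValueError("peers must be non-negative")
    if not 0.0 <= p_online <= 1.0:
        raise ValueError("p_online must be between 0 and 1")
    return 1.0 - (1.0 - p_online) ** peers


# An unpopular image held by 3 laptops that are each online half the time
# versus a popular one held by 20 such laptops.
for holders in (3, 20):
    print(holders, availability(holders, 0.5))
```

With zero holders online-capable the function returns 0, which is exactly the "last person closes their laptop" case.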

The second problem is moderation. Secure Scuttlebutt is fine for this. You only ever follow people you like, you only ever see updates from people you like. Fantastic. Nostr has basically no moderation at all. If you've spent any time at all on the internet, you've probably realized by now that this is TERRIBLE. My time on Nostr was basically opening the app, seeing an entire feed full of pro-Russian propaganda, and then uninstalling it. I do think there's something to be said for the idea of a pure peer 2 peer social network, but I don't think we're anywhere close to implementing it yet. So, where does that leave us?

The Fediverse. It was designed for a distributed governance system in which each instance acts as its own country with its own rules and governance, and it accidentally has some pretty neat clustering features that help it perform better under heavy load and keep data more permanent and durable. I want to emphasize that, too. The current computational and architectural benefits of the Fediverse are accidental. They're side effects of the distributed governance, not the core purpose. I don't expect anyone to put focus into enhancing these aspects of the Fediverse, at least not for a while. We're much more likely to see someone design a community based social network from the ground up on peer to peer technologies. I'd be excited about that, but it will need to have more open signups than Secure Scuttlebutt, and moderation tools like... At all, unlike Nostr. The most likely solution for the latter would be collaborative blocklists. Maybe me and two of my friends have a shared view of what is and isn't hate speech. So, we all spend some time just blocking the shit out of users. But no single one of us writes the block list. The block list itself is a peer 2 peer distributed construct, so we don't all have to reach consensus about "Hey, was this guy being a jerkass?"
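One way a shared block list can work without anyone reaching consensus is to treat it as a grow-only set: everyone keeps their own list, and merging is just a set union. The Python sketch below is only an illustration of that idea under my own assumptions; the function names, the post format, and the user handles are all made up, and it ignores unblocking entirely.

```python
from typing import Dict, FrozenSet, Iterable, List


def merge_blocklists(*blocklists: Iterable[str]) -> FrozenSet[str]:
    """Union any number of personal block lists into one shared list.

    Union is commutative, associative, and idempotent, so peers can
    exchange lists in any order, any number of times, and still converge.
    """
    merged = set()
    for blocklist in blocklists:
        merged.update(blocklist)
    return frozenset(merged)


def visible_posts(posts: List[Dict[str, str]], blocked: FrozenSet[str]) -> List[Dict[str, str]]:
    """Drop posts whose author appears on the shared block list."""
    return [post for post in posts if post["author"] not in blocked]


# Three friends block independently; nobody has to agree with anybody.
mine = {"spammer@example.social"}
friend_a = {"troll@example.social", "spammer@example.social"}
friend_b = {"bigot@example.social"}
shared = merge_blocklists(mine, friend_a, friend_b)

feed = [
    {"author": "troll@example.social", "text": "bait"},
    {"author": "beans@example.social", "text": "beans"},
]
print(visible_posts(feed, shared))
```

The trade-off of a pure union is that one overzealous friend blocks someone for everybody, so a real design would likely need a way to weigh or revoke entries.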

Lemmy definitely needs to embrace distributed computing in some fashion

It would just need to be relatively straightforward for me to setup

Pick one.

Every technical bump in the road we hit now is one we won't hit/will know how to handle quickly in the future! Thank you for doing what you do for Beehaw!

I concur. A minor inconvenience on occasion is a small price to pay for your amazing efforts! Thank you for doing what you do.

Yeah, moving to object storage is best to do now. Arguably, we should've done it sooner since the longer we've waited, the more it was gonna catch up to us and cost us in time and money.

Guess you guys will have to wait a little longer to see my grandma in her latest night gowns...

You guys are the best!

I did an ADHD, and misread as you saying you were turning off pictures for good, but given how much I'm enjoying the Beehaw community and the hard work you guys do to keep it online, I wasn't even that upset about that! A short, well telegraphed, partial outage is nothing in comparison!

Thanks to all you wonderful people!

Just to be clear, this is just a moving of images, and they will be back, correct? Just a temporary measure?

No worries on the short notice, thank you for the heads up! Sincerely appreciate the transparency.

AM or PM?

Yes.

(Sorry, I’m from Reddit. It’s hard to resist.)

It was AM. It has already passed, and we need to retry because it failed.

I saw that after my comment, but left it since I still found it odd that the announcement didn't include AM/PM

I am utterly ignorant of technological mumbojumbo. I was just trying to add a pic to the Creative sub, but nothing uploaded. Is that why? Can I stop trying to make it work?

Yes. Be patient. Assume that when Fediverse stuff is not working exactly right, it's not you, it's probably the Fediverse. These are early days of self-organized effort, like thousands of people trying to lash rafts and boats together in the middle of the ocean. They're busy trying to make sure the whole thing doesn't sink - don't worry about the photos.

With kindness, I very much suggest against dismissing both the technology and your ability to understand it by calling it "mumbojumbo". Don't let the engineers make this stuff something only they can understand and work with.