/kbin server update - or how the server didn't blow up

ernest@kbin.social to /kbin meta@kbin.social – 1100 points – 1 year ago

Currently, on the main instance, people have created 40191 accounts (+214 marked as deleted). I don't know how many are active because I don't monitor it, but once again, I greet all of you here :) In recent days, the traffic on the website has been overwhelming. It's definitely too much for the basic docker-compose setup, primarily designed for development use. I was aware of the possible consequences of the situation happening on Reddit, but I assumed that most people would migrate to one of the Lemmy instances, which already has an established position. I hoped that a few stray enthusiasts would find their way to kbin ;)

The first step was to upgrade the VPS to a higher tier (66.91EUR). It quickly turned out that it wasn't enough. I had to enable CF protection just to keep the website responsive, but the response times were still very slow. At this stage, the instance was practically unusable. The next step was a full migration to a dedicated server (100EUR, the current hardware). That can be done relatively quickly, so it only required a 5-minute technical break. Despite the much better specs, it didn't get any better. It became clear that the problem didn't lie in the hardware. I'm really frustrated when it comes to server administration. That was the moment when I started looking for help. Or rather, it found me.

A couple of days ago I wrote about how kbin qualified for the Fast Forward program. To be honest, I applied out of pure curiosity and completely forgot about it because a lot was happening during that time. During the biggest fire, Hannah ( @haubles ) reached out with an offer to help. I outlined the situation (in short: the server is dying, I don't even know what I need, help! ;). She quickly connected us with Vlad ( @vvuksan ) and Renaud ( @renchap ). I was probably too tired, because I don't know if the whole operation lasted 60 minutes or 6 hours, but after a series of precise questions and getting an understanding of the situation, the guys took care of the entire setup themselves. I love working with experts, and it's not often that you come across individuals so well-versed in the fediverse. Thanks to Hannah's kindness, we will be staying there a bit longer. Currently, fastly.com handles the caching layer and processes images. Hence those cool moving thumbnails ;)

Things were going well at that point. I could disable Cloudflare protection. Probably thanks to that, many of you are here today, and we got to know each other a bit better :) However, even then, when I tried to enable federation, the server would stop working.

Around the same time, Piotr ( @piotrsikora ), whom I already knew from the Polish fediverse, contacted me. He is the administrator of the Polish Mastodon instance pol.social, operates within the ftdl.pl foundation, and specializes in administering applications with a very similar tech stack. I made the decision to grant him server access. It only took him a few moments, and he came back to me with a few tips that allowed us to enable federation. More tips followed in the next few days, and we managed to reach the current level. I think it's not too bad.

Nevertheless, managing the instance has taken up about 60% or more of my time so far, which prevents me from fully focusing on current tasks. That's why I would like to collaborate with Piotr and hand over full care of the server to him. Piotr will also take care of the security side. Now I have to take this much more seriously. We still need to work out the terms of cooperation, but I want you to know the direction I intend to pursue.

We also need to migrate to a new environment because one server will sooner or later become insufficient. This time, I want to be prepared for it. The migration may cause temporary issues with the website in the coming days.

The next two updates will still be about project funding (I still can't believe what happened) and moderation. The following ones will be more technical, with descriptions of changes and what contributors are doing on Codeberg. I would like to be here more often, but not as an admin, just as myself.

Thank you all for this.

P.S. In private messages, I also received numerous offers of help that I didn't even have a chance to read and respond to. You are the best!

Hey @Ernest and @piotrsikora,

I haven't looked too closely at how kbin is architected yet, but would it benefit from horizontal scaling? I do full-time development of tooling to administrate very large k8s clusters for a company that you've probably interacted with today without knowing it. Not sure if k8s is the right orchestration system for you, but I'd be more than happy to provide some input on a potential migration to k8s (if kbin is a good fit there). I know there's a community on Matrix as well. I'll try to reach out there too, although it may take me a bit.

If the post is anything to go by, it's using the included "mostly for dev work only" docker-compose files. It could absolutely be scaled out, since at its core it's just a webapp with workers. The app is already configured to use Redis for session storage, so it should be able to go super wide.

The only limitation is how performant you can make your Postgres cluster.
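
For anyone wondering why shared session storage matters for going wide, here is a rough Python sketch. It is not kbin's code: the class names are made up for illustration, and a plain dict stands in for Redis. The point is only that when replicas keep no session state of their own, any replica can answer any request.

```python
import itertools


class SharedSessionStore:
    """Hypothetical stand-in for an external store such as Redis."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id)

    def set(self, session_id, value):
        self._data[session_id] = value


class WebReplica:
    """Hypothetical stateless app replica: all session data lives in the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, session_id, user):
        self.store.set(session_id, {"user": user})
        return f"{self.name}: logged in {user}"

    def whoami(self, session_id):
        session = self.store.get(session_id)
        user = session["user"] if session else "anonymous"
        return f"{self.name}: {user}"


# Three replicas behind a naive round-robin "load balancer".
store = SharedSessionStore()
balancer = itertools.cycle([WebReplica(f"web-{i}", store) for i in range(3)])

first = next(balancer).login("abc123", "ernest")   # handled by one replica
second = next(balancer).whoami("abc123")           # handled by a different one
print(first)
print(second)
```

The second request lands on a different replica and still sees the login, which is why adding more replicas is easy. The database is the one piece every replica still has to share.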

Bullseye

Hi @Badabinski

K8s is one option, but we decided to use a mix of bare metal and Docker Swarm.

Almost everything is prepared to scale horizontally. The only problem (as always) is the database. We also want flexible software that runs on both a big cluster and a small node without changes to the code.

Give us a few days, and after that we will show something ;)

@haubles @vvuksan @renchap @ernest

I was thinking the same thing. Shouldn't this be one of the cases where k8s shines with a horizontal autoscaler? I wouldn't want to manage my own k8s though, so I imagine managed k8s is the best option. Whether it's the cheapest option is another question.

@Badabinski do you know if there are other horizontal autoscaling options besides k8s?

As @BiggestBulb said, most cloud providers have container platforms that support horizontal scaling, although generally not as elegantly as k8s (imo, others may disagree). Also totally agree about managed providers. EKS, AKS, and GKE weren't suitable for what we use k8s for (very large shared clusters) until recently, so we've been administrating our own custom k8s distro. The managed stuff has gotten a lot better, and I'd definitely recommend that for running kbin. Running k8s yourself is hard, etcd is an evil bastard. I've had plenty of chances to see what works and what doesn't in my role, however. There are some development/deployment patterns that are robust, and there are many that are not.

I'm not familiar with the architecture of the app (nor where it's hosted), but if it happens to be on AWS then you should be able to spin up an ECS cluster (especially since it's already containerized) and load balance it that way with an ALB configured during setup. Imo that would be the fastest way to do it (again, assuming this is on AWS).

I read somewhere that it's on Hetzner.