Lemmy World outages

Lemmy.World Announcements@lemmy.world – 3639 points – 1 year ago

Hello there!

It has been a while since our last update, but it's about time to address the elephant in the room: downtimes. Lemmy.World has been going down multiple times a day for quite a while now, and we want to take the time to address some of the concerns and misconceptions that have been spread in chatrooms, memes and various comments in Lemmy communities.

So let's go over some of these misconceptions together.

"Lemmy.World is too big and that is bad for the fediverse"

While it is true that we are the biggest Lemmy instance, we are far from the biggest instance in the Fediverse. If you want actual numbers, you can have a look here: https://fedidb.org/network

The entire Lemmy fediverse is still in its infancy, and even though we don't like to compare ourselves to Reddit, it offers a useful point of comparison. The total number of Lemmy users on all instances combined is currently 444,876, which is still nothing compared to a medium-sized subreddit. There is a case to be made that it is better to spread users and communities across other instances, but let us make it clear that this is not a technical problem.

And even in a decentralised system, there will always be bigger and smaller blocks within; such is the nature of any platform that is shaped by its members.

"Lemmy.World should close down registrations"

Lemmy.World is being linked in a number of Reddit subreddits and in Lemmy apps. Imagine if new users land here and have no way to sign up. We have to assume that most new users have no idea how the Fediverse works, and making them read a full page of what's what would scare a lot of them off. They probably wouldn't even take the time to read why registrations were closed; they would just move on and not join the Fediverse at all. What we want to do instead is inform users before they sign up, without closing registrations. The option is already built into Lemmy but is only available on Lemmy.ml, so a ticket was created with the development team to make it available to other instance admins. Here is the post on the Lemmy GitHub.

Which brings us to the third point:

"Lemmy.World cannot handle the load, that's why the server is down all the time"

This is simply not true. Money is not an obstacle to upgrading the hardware, should that be required, but that is not the solution to this problem.

The problem is that for a couple of hours every day we are under a DDoS attack. It's a never-ending game of whack-a-mole where we close one attack vector and they start using another one. Without going into too much detail and exposing too much, there are some very 'expensive' SQL queries in Lemmy: actions or features that take seconds instead of milliseconds to execute. By executing them by the thousand a minute, you can overload the database server.
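To put rough numbers on why a handful of slow queries can take down a database, here is a simplified back-of-the-envelope sketch based on Little's law (average queries in flight = arrival rate × time per query). The 5 ms and 2 s query times, the rate of 1,000 requests a minute and the limit of 16 queries the database can usefully run at once are hypothetical figures for illustration, not Lemmy.World's real numbers, and a real database degrades through CPU and I/O contention rather than a fixed number of slots.

```python
def concurrent_queries(query_seconds: float, queries_per_minute: float) -> float:
    """Average number of queries in flight (Little's law: L = arrival rate * time in system)."""
    arrivals_per_second = queries_per_minute / 60.0
    return arrivals_per_second * query_seconds


def is_overloaded(query_seconds: float, queries_per_minute: float, db_slots: int) -> bool:
    """True if demand exceeds what the database can run at once, so the backlog keeps growing."""
    return concurrent_queries(query_seconds, queries_per_minute) > db_slots


# Hypothetical figures: the same 1,000 requests a minute, once as a cheap 5 ms query
# and once as an "expensive" 2 s query, against a database that can run 16 at a time.
DB_SLOTS = 16
cheap = concurrent_queries(0.005, 1000)    # roughly 0.08 queries in flight
expensive = concurrent_queries(2.0, 1000)  # roughly 33 queries in flight

print(f"cheap: {cheap:.2f} in flight, overloaded={is_overloaded(0.005, 1000, DB_SLOTS)}")
print(f"expensive: {expensive:.1f} in flight, overloaded={is_overloaded(2.0, 1000, DB_SLOTS)}")
```

The same request rate that is harmless for a millisecond query keeps about twice the available capacity busy when each query takes seconds, so new queries queue up behind the slow ones and everyone's response time grows.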

So who is attacking us? One thing that is clear is that those responsible for these attacks know the ins and outs of Lemmy. They know which database requests are the most taxing, and they are always quick to find another as soon as we close one off. That's one of the few things we know for sure about our attackers. Being the biggest instance and having defederated from a couple of instances has made us a target.

"Why do they need another sysop who works for free?"

Everyone involved with LW works as a volunteer. The money that is donated goes to operational costs only - so hardware and infrastructure. And while we understand that working as a volunteer is not for everyone, nobody is forcing anyone to do anything. As a volunteer you decide how much of your free time you are willing to spend on this project, a service that is also being provided for free.

We will leave this thread pinned locally for a while and we will try to reply to genuine questions or concerns as soon as we can.

Are you guys using a load balancer at all? How about a tool like CrowdSec?

I use that and the nginx Bad Bot Blocker to stop malicious shits on the sites I operate (medium-large e-commerce) with great success. We used to get scraped heavily by competitors, but now they get the middle finger.

I presume you have fail2ban too?
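For readers unfamiliar with fail2ban: it watches log files for lines matching a pattern and bans an IP once that IP produces `maxretry` matches within `findtime` seconds. The Python sketch below reproduces only that counting idea over already-parsed log events; it is not fail2ban's code, and the event format and the addresses (taken from the RFC 5737 documentation ranges) are assumptions for illustration.

```python
from collections import defaultdict


def ips_to_ban(events, maxretry: int, findtime: float) -> set:
    """Return IPs with at least `maxretry` matching events inside any `findtime`-second window.

    `events` is an iterable of (timestamp_seconds, ip) pairs, one per log line that
    matched a "bad request" pattern (a hypothetical, pre-parsed format).
    """
    stamps_by_ip = defaultdict(list)
    for ts, ip in events:
        stamps_by_ip[ip].append(ts)

    banned = set()
    for ip, stamps in stamps_by_ip.items():
        stamps.sort()
        start = 0
        for end in range(len(stamps)):
            # Shrink the window from the left until it spans at most `findtime` seconds.
            while stamps[end] - stamps[start] > findtime:
                start += 1
            if end - start + 1 >= maxretry:
                banned.add(ip)
                break
    return banned


events = [
    (0, "198.51.100.7"), (2, "198.51.100.7"), (4, "198.51.100.7"),   # a quick burst
    (0, "203.0.113.9"), (100, "203.0.113.9"), (200, "203.0.113.9"),  # spread out
]
banned = ips_to_ban(events, maxretry=3, findtime=10)
print(banned)  # {'198.51.100.7'}
```

Because it works from log lines, this kind of banning only reacts after the offending requests have already been served.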

CrowdSec can only monitor and execute ban actions, which doesn't help with SQL execution attacks. Same with f2b.
Blocklists only work for known bad actors, and are usually pretty old or stale. You need to be able to catch and stop new attacks quickly.
Looks like lemmy.world is using Cloudflare, so you'd need to block entrance at the network edge there. CrowdSec could do this, but only after a successful attack was identified, by which point the queries would already have executed, so it doesn't help.
SQL attacks in parallel only need a few good clients firing off a number of parallel requests at a time to lock up a DB. Block them, and the attacker can just get a new source IP and repeat. The fix is to not let those kinds of executions happen.
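One way to read "don't let those executions happen" is to cap how many copies of a known-expensive query may run at the same time and refuse the rest immediately, instead of letting them pile up on the database. Below is a minimal asyncio sketch, assuming the expensive queries have already been identified; `ExpensiveQueryGate` and `Overloaded` are hypothetical names, not part of Lemmy.

```python
import asyncio


class Overloaded(Exception):
    """Raised when an expensive query is refused because the cap is already reached."""


class ExpensiveQueryGate:
    """Allows at most `max_concurrent` expensive queries to run at once; extra calls fail fast."""

    def __init__(self, max_concurrent: int):
        self.max_concurrent = max_concurrent
        self._running = 0

    async def run(self, make_query):
        # Check and increment happen with no `await` in between, so this is safe on a
        # single asyncio event loop (it is NOT safe across threads or processes).
        if self._running >= self.max_concurrent:
            raise Overloaded("too many expensive queries in flight")
        self._running += 1
        try:
            return await make_query()
        finally:
            self._running -= 1


async def demo() -> list:
    gate = ExpensiveQueryGate(max_concurrent=2)

    async def slow_query():
        await asyncio.sleep(0.05)  # stand-in for a multi-second database query
        return "rows"

    # Five parallel requests for the same expensive query: two run, three are refused.
    tasks = [asyncio.create_task(gate.run(slow_query)) for _ in range(5)]
    return await asyncio.gather(*tasks, return_exceptions=True)


results = asyncio.run(demo())
print(["ok" if r == "rows" else type(r).__name__ for r in results])
```

Refusing early keeps the rest of the site responsive, but it does not make the query itself cheaper. On PostgreSQL, the `statement_timeout` setting is a server-side backstop that aborts any statement running longer than the configured limit.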

Are bad actors able to access the database directly to execute queries, or is it through the main front-end site, hitting API endpoints over and over? If it's the latter, surely they can be blocked at that point?

These attacks are just through the public API, not malicious SQL-injection attacks. They are just non-optimized queries that regular users can execute, which bog down the system enough to make it crawl, at which point intervention is needed to either kill the running slow queries or just restart the DB.
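The manual intervention described here, killing the slow queries, can be sketched as a small watchdog. On PostgreSQL, which Lemmy uses, running statements are listed in the `pg_stat_activity` view (with `pid`, `state`, `query_start` and `query` columns), and `pg_cancel_backend(pid)` cancels a backend's current query. The Python below only decides which PIDs to cancel from rows that have already been fetched; the `ActiveQuery` type, the 10-second threshold and the sample rows are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ActiveQuery:
    """One row as it might be read from PostgreSQL's pg_stat_activity view."""
    pid: int
    state: str            # e.g. "active" or "idle"
    query_start: datetime
    query: str


def pids_to_cancel(rows, now: datetime, max_runtime: timedelta) -> list:
    """PIDs of statements that are still executing and have run longer than max_runtime."""
    return [r.pid for r in rows
            if r.state == "active" and now - r.query_start > max_runtime]


def cancel_statements(pids) -> list:
    """SQL a watchdog could send; pg_cancel_backend stops the query but keeps the connection."""
    return [f"SELECT pg_cancel_backend({int(pid)});" for pid in pids]


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
rows = [
    ActiveQuery(101, "active", now - timedelta(seconds=30), "<slow listing query>"),
    ActiveQuery(102, "active", now - timedelta(seconds=1), "<normal query>"),
    ActiveQuery(103, "idle", now - timedelta(seconds=60), "<finished query>"),
]
slow = pids_to_cancel(rows, now, max_runtime=timedelta(seconds=10))
print(cancel_statements(slow))  # ['SELECT pg_cancel_backend(101);']
```

Cancelling individual queries is gentler than restarting the database, since every other connection keeps working.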

Lemmy.world should just start charging to use the API. That'll stop them /s

Then surely those routes can be protected with various methods such as CrowdSec, which would help keep the slow endpoints from being overwhelmed? Especially if the attacks come from known IPs. Or at least repeat offenders (x requests in 1s from an IP, for example) can get blocked straight away.
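The "x requests in 1s from an IP" rule suggested here is a per-client rate limit. Below is a minimal sliding-window version, assuming the real client IP is already known (behind Cloudflare it would come from a header set by the proxy, such as `CF-Connecting-IP`, rather than from the socket); the class name and limits are hypothetical. As pointed out earlier in the thread, an attacker who rotates source IPs can stay under a per-IP limit, so this raises the cost of an attack rather than stopping it.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client in any rolling `window_seconds` period."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client ip -> timestamps of accepted requests

    def allow(self, client_ip: str, now: float) -> bool:
        hits = self._hits[client_ip]
        # Forget requests that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # rejected requests are not recorded, so they don't extend the block
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=1.0)
burst = [limiter.allow("198.51.100.7", t) for t in (0.0, 0.1, 0.2, 0.3)]
later = limiter.allow("198.51.100.7", 1.05)  # the request at 0.0 has expired
other = limiter.allow("203.0.113.9", 0.3)    # a different client has its own budget
print(burst, later, other)  # [True, True, True, False] True True
```

A production version would also need to evict idle clients from `_hits` so memory does not grow with every IP ever seen.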

I found a lot of crawlers were using HTTP/1.1, so I just blanket-denied anything that wasn't HTTP/2 at the lowest level. It certainly helped with that small menace!
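The commenter doesn't say which server they did this in. As one illustration, here is a Python ASGI middleware that refuses any HTTP request whose `http_version` in the ASGI scope is not `"2"` (the ASGI specification reports `"1.0"`, `"1.1"` or `"2"` there). Two caveats: behind a reverse proxy or CDN such as Cloudflare, the application sees the protocol of the proxy's connection rather than the visitor's, so a check like this only means something at the edge; and it also turns away legitimate clients that only speak HTTP/1.1. The `require_http2` name and the 403 response are assumptions.

```python
import asyncio


def require_http2(app):
    """Wrap an ASGI app so that HTTP requests not made over HTTP/2 get a 403."""

    async def middleware(scope, receive, send):
        if scope["type"] == "http" and scope.get("http_version") != "2":
            await send({
                "type": "http.response.start",
                "status": 403,
                "headers": [(b"content-type", b"text/plain")],
            })
            await send({"type": "http.response.body", "body": b"HTTP/2 required\n"})
            return
        await app(scope, receive, send)  # everything else passes through unchanged

    return middleware


async def hello_app(scope, receive, send):
    """Tiny stand-in application for the demo."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def status_for(http_version: str) -> int:
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await require_http2(hello_app)({"type": "http", "http_version": http_version}, receive, send)
    return sent[0]["status"]


statuses = {v: asyncio.run(status_for(v)) for v in ("1.0", "1.1", "2")}
print(statuses)  # {'1.0': 403, '1.1': 403, '2': 200}
```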

There has to be a way to stop the pricks

Well... I think you miss the point, though. These aren't dangerous queries of the kind that normally need to be protected. They are just normal ways to interact with the server.

They CAN be exploited by clever people who know how to make them cost a lot of execution time, though. Lemmy is open source, so finding those weaknesses is not hard. Patching them and keeping things running is way more difficult.

Well, yes, of course, but the API route should still be guarded, both internally and externally. If it's something like a fetch-all-posts request with certain filters and parameters, then running it over and over, thousands of times in the space of a few seconds, takes up execution time on the database. Identifying that is easy, as is preventing it: rate limiting and banning undesirable requests. No normal user will be executing grandiose requests multiple times a second. That's what constitutes a denial of service.
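The idea of limiting "grandiose" requests more strictly than ordinary ones can be sketched as a per-client token bucket in which each request spends tokens according to how expensive it is. The endpoint names and cost values below are hypothetical; estimating those costs reliably, especially for queries whose expense depends on their parameters, is the hard part, and it is where this approach can break down.

```python
class CostBudget:
    """Per-client token bucket where each request spends tokens equal to its estimated cost."""

    def __init__(self, capacity: float, refill_per_second: float):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._state = {}  # client -> (tokens_left, timestamp_of_last_update)

    def try_spend(self, client: str, cost: float, now: float) -> bool:
        tokens, last = self._state.get(client, (self.capacity, now))
        # Refill for the time elapsed since the last request, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_second)
        allowed = cost <= tokens
        if allowed:
            tokens -= cost
        self._state[client] = (tokens, now)
        return allowed


# Hypothetical relative costs: a heavily filtered "fetch all posts" is far pricier than one post.
ENDPOINT_COST = {"get_post": 1.0, "list_all_posts_filtered": 25.0}

budget = CostBudget(capacity=50, refill_per_second=5)
heavy = [budget.try_spend("198.51.100.7", ENDPOINT_COST["list_all_posts_filtered"], t)
         for t in (0, 1, 2)]
cheap = budget.try_spend("198.51.100.7", ENDPOINT_COST["get_post"], 2)
print(heavy, cheap)  # [True, True, False] True
```

The client is cut off from the expensive listing after two calls but can still read individual posts, which is the distinction a plain request count cannot make.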

Anyway, you do you.

Pal, if you have a clever way of discerning the difference between normal and malicious patterns for publicly available endpoints, we are lining up to give you some HJs.