Some system load graphs of last 24h

Ruud@lemmy.world (mod) to Lemmy.World Announcements@lemmy.world – 1197 points

For those who find it interesting, enjoy!

It gets me every time seeing people using the product I build 🥹

You worked on Grafana? Your product is awesome, I use it in my homelab for performance metrics

Yes, I'm one of the designers 👍🏾

Poggers. Couldn't live without it. Thank you for your work!

Love Grafana, especially the new UI. Great work, man. :)

That's so cool! Grafana is awesome, the whole team did a great job

Thank you very much.

Do you work on Loki too?

Grafana is one of those tools which everyone should use if they have something they maintain themselves. Superb tool.

Grafana is the most essential application in my job. I can use Notepad to code in a world without IDEs. I couldn't keep a damn thing running in the real world without Grafana. And I've been forced against my will to use alternatives in the past.

How did you learn it?

Basically brute force. I'm not great with it, but I was the one on my team responsible for setting up our dashboards. I wrote the Prometheus metric collection in our microservices and built the dashboards from that data.

There are tons of free dashboards though for monitoring resources and such so a lot of things I use are just downloaded from the Grafana website. And the docs are good too. So looking at examples + documentation is how I learn. It would be helpful if I was better with math though.

I really enjoy your transparency and style of communication!

Comparing it to Spez and how Reddit became prior to the migration, this is such a refreshing change

/u/Ruud is like /u/Spez but only if /u/Spez was actually cool.

Sooo… /u/Ruud is nothing like /u/Spez? Same energy as "Communism is like Capitalism but only if Capitalism got rid of the concept of capital".

It's been very snappy today, nice work! Is it all under Docker Compose with the node handling Nginx and Postgres as well?

Yes.

Why did you guys roll back the UI to .7 from .10? I enjoyed some of the UI improvements, but I guess there were some bugs?

Edit: I see it's back to .10. Maybe I had a browser tab open from before that I never refreshed.

I'm really grateful for your and your colleagues' work. Thank you for letting us lemmy around here!!!

I can't believe how fast you've managed to crowdsource and fix things on this instance. I haven't seen many problems at all sharing comments and things.

This is awesome! As a systems engineer for my day job, I love seeing stuff like this!

Dang that's a lot of RAM

31.2% load is damn fine considering how much attention Lemmy.world has been getting lately. The server has been up for 3 weeks already, so I guess that's when you upgraded it?

How can I throw some bucks in your direction?

From the lemmy.world front page:

Donations

If you would like to make a donation to support the cost of running this platform, please do so at the mastodon.world donation URLs:

    https://opencollective.com/mastodonworld
    https://patreon.com/mastodonworld

Where in the frontpage can we see this?

Edit: thank you all!

It's on the right-hand sidebar of lemmy.world:

Awesome! I'm on mobile, so I cannot see it. Will check it out when I get to my computer.

You can view sidebar on mobile. I think it's in the three dots, but it's somewhere!

EDIT: On Jerboa it's under Community Info, under the three dots. On the mobile web app for L.W. there's a sidebar button.

Damn that's a huge chunk of (what looks like) a 64 core CPU there. Impressive!

It's cool it can aggressively cache that much. Although I am perplexed why one would have a swap file configured in this case? What does it give you here? Sorry, not trying to be an elitist or anything, I just have no idea what advantage you get!

To be honest I tend to use swap less and less. But this was in the build that Hetzner does and I didn't remove it.

If your application goes wild with RAM usage, a properly configured swap will make sure the underlying OS remains responsive enough to deal with it.

The OOM killer is usually triggered after it starts hitting the disk. Which means your system is unresponsive for a long time until it finally kills something.

Using something like oomd can help trigger before it hits swap but then why are you using swap in the first place?

The bigger issue is that the kernel sometimes ignores the swappiness and will evict code/data pages long before file cache even when set to 0 or 1. I'm still not sure if that was because of an Ubuntu patch or if it was an issue that's been resolved in the years since I last saw this

How far do you see lemmy.world capable of scaling to? One thing I've been noticing is the centralisation of Lemmy users on a few top servers, surely that cannot be healthy for federation? What are your thoughts on this?

How much is this costing you? Also who is your host? Is it on a virtual machine?

They have a dedicated server: https://lemmy.world/post/75556

It's actually pretty funny to see him mention the growth (almost 12k users!) considering they've added, what 50k or so users recently?

I signed up three days before that post. They were the largest instance with open signups. Almost 1000 users.

Dedicated means local?

Dedicated usually means it's not splitting CPU time with another instance. It could mean a local machine but it does not have to be one.

Tbh I'd see it hard to be local, so maybe it is cloud computing but a standalone instance as you just said.

This is so cool to see. Thanks for posting! Lemmy.world has been super smooth today

pretty gauges. the instance seems to be more stable/responsive today

I know that the RAM cache is just taking advantage of otherwise free RAM and will be dropped in favor of anything else, but it does stress me out a bit to see it "full" like that.

It would stress me even more to see a lot of RAM doing nothing, that would be a shame! ;-)

Difference between Windows and Linux. Windows would only use what it needs. Linux pre-empts more and fills the RAM for what could be needed.

It used to stress the shit out of me when I switched to Linux as I'd gotten used to opening task manager and seeing 90% free RAM. On Linux I'd be seeing 10% free and panicking thinking it was a resource hog.

The Linux-way is the best way.

I use Arch btw ;)

Both OSes do pre-caching and for both the standard tools to check usage nowadays ignore pre-cached elements when counting RAM usage.

I had a feeling that 'factoid' may be out of date! Since I learnt it about the time of Windows XP when we were shown examples of how Linux and Windows memory management differed. It all made sense why Linux seemed to have full RAM even after a big upgrade but WinXP gave the 'illusion' of having lots of free RAM to use. ~ 20yrs ago!

I think we used SuSE Linux 7.3!

I still hold a savage hatred of all RPM-based distros after dealing with the hell of early 2000's editions (Redhat, Mandrake & Suse). Though I did like SuSE KDE's colours when it worked!

But Windows also does pre caching?

It probably just didn't mark that memory as "used" in the task manager.

I discovered this about 20yrs ago and there's been a lot of drugs & drink since then.

I do remember I could open my shit-hot 256Mb RAM desktop with Windows XP taskmanager and it shows a whopping 128Mb free RAM. 😎

Then I'd boot into my '733T H4X0r' Suse Linux 7.3 and top would show 5Mb free RAM. 😱

This caused much upset until I found out the two OS's have (had?) fundamentally different memory utilisation philosophies.

May not be the case anymore but it was late 90s/early 00s.

That's how it's supposed to work, free RAM does nothing :)

It's free real estate!

If you had this much buffer memory what are the reasons to have swap space as well?

With my servers I'm paranoid having swap enabled will inadvertently slow stuff down. Perhaps there's a reason to have it that I'm unaware of?

If you had this much buffer memory what are the reasons to have swap space as well?

Many programs do stuff once during startup that they never do again, sometimes creating redundant data objects that will never get accessed in the configuration it's being run in. Eventually the kernel memory manager figures out that some pages are never used, but it can't just delete them. If swap is enabled it can swap them to disk instead. That frees up the RAM for something more important. It's usually minor but every few MB helps.

I personally like having some swap as during low memory situations (which lemmy gets at least once a day on my small instance) everything slows down rather than getting culled by the oom killer. It's not a replacement for monitoring, but it does extend the timeframe to react to things.

Memcache usually takes all the assigned memory regardless of usage so seeing high usage isn't always unusual. That's assuming the lemmy servers are using some kind of session caching solution.

Some of my usage is in this data and I like that.

Looks Awesome! Glad to see the patches seem to be working.

The entire team is doing an amazing job. Lemmy is getting smoother with each passing day. I hope it keeps growing (and none of you get too burnt out in the process)!

I hate that radial graphs are so popular with Grafana dashboards. Radial/pie charts are terrible representations for humans to interpret. I tend to try and convert them either to a stat with the line/time display or a bar chart. Humans are better at judging linear relationships than radial ones.

Radial graphs are a bit of a meme where I work as one of the C-suite managers despises them for precisely that reason.

This is indeed interesting, thanks again for the service!

I think you can export the dashboard the way it looks to you - into Grafana cloud. Like a snapshot. Click "Share" then "upload" and share the link.

We won't be able to see historical data, as it only takes a dashboard snapshot with the visible data.

Would be cool, wouldn't it?

Can someone give me a hand. I see tons of posts of people talking about a picture in the OP but i see nothing. Am i doing something wrong? Is my connection bad? This seems to be happening quite a lot. For example the meme instance has almost zero pictures but i know just about every post should have one.

hmm yeah it was gone.. need to investigate..

I notice your defederation list is completely depopulated today. Is that intentional?

No it's just moved to the bottom of the page apparently. I preferred it on the side. Maybe a tab would be better.

So I'm currently planning to host an instance myself. This graph helped me quite a lot to get an idea of what system resources are required.

Do you use any reverse proxy in front of it?

Lemmy world has a lot of users. So your instance initially will require a lot less resources ✌️

Yeah I saw that. I'm a big fan of minimalistic, yet super performant architectures and I'm just trying to get a feeling on how I could solve this problem. I try to avoid any downtime, whenever possible

Nginx runs on the server, proxying to the Lemmy Docker containers

That's what I had in mind. To run nginx on a separate VPS, so I can scale it more easily. Run fediverse instances in the back, either all on one VPS or on different VPSes. This way I could provide a hub while increasing performance (due to compression and caching) and provide redundancy/load balancing if necessary.

What's the typical traffic you experience? Peak (Gbit/s) and average/daily traffic (GB)

I was hoping to see some uptime, but thanks for the window into your server! Are you still having to kill the instance every half hour?

It says uptime is 3.3 weeks in the top right.

Hmmm... maybe the instance uptime is different from the server uptime.

Nice! That's a nice-looking dashboard, would you mind sharing its JSON config? Thanks!

I could share the template, if ya like.

Thanks for the offering, but no worries, some user posted it and I found it already

I have a love and hate relationship with Grafana but it probably feels the same

Ahh look at all those nice charts and diagrams, that's true server porn lol.

Again thank you very much for your awesome job. We all really appreciate that <3

Does Lemmy have a memory leak?

Lemmory meak?

Yes at least until yesterday's version...

From those graphs, memory usage is very low. Most of it is being used for disk caching, which is what linux does with memory it has no other use for (may as well use it for something).
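To illustrate the point about disk cache, here is a minimal Python sketch that separates reclaimable cache from memory held by applications. It parses text in the `/proc/meminfo` format. The sample figures are invented for illustration and are not lemmy.world's real numbers, and the "apps" figure is only a rough approximation (total minus free minus cache), not necessarily what any particular monitoring tool reports.

```python
# Hypothetical sample in /proc/meminfo format; the numbers are made up.
SAMPLE = """\
MemTotal:       131072000 kB
MemFree:          2000000 kB
MemAvailable:   110000000 kB
Buffers:          1000000 kB
Cached:         100000000 kB
SReclaimable:     5000000 kB
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kiB values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value"
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def summarize(mem):
    """Split memory into cache and a rough 'held by applications' figure."""
    total = mem["MemTotal"]
    # Buffers, page cache and reclaimable slab can be dropped under pressure.
    cache = mem.get("Buffers", 0) + mem.get("Cached", 0) + mem.get("SReclaimable", 0)
    apps = total - mem["MemFree"] - cache  # approximation, see lead-in
    return {
        "total_kib": total,
        "cache_kib": cache,
        "apps_kib": apps,
        "available_kib": mem.get("MemAvailable", 0),
    }


summary = summarize(parse_meminfo(SAMPLE))
for name, kib in summary.items():
    print(f"{name}: {kib} ({100 * kib / summary['total_kib']:.1f}% of total)")
```

On a Linux machine the same functions can be fed the real file, for example `summarize(parse_meminfo(open("/proc/meminfo").read()))`. A box that looks "full" will usually show a large cache share and a much smaller share held by applications.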

Yes, but we still restart the containers every 30 min. I'm gonna see if that's still needed after the recent changes.

Ah, so that's the reason for the regular dips in the memory graph I assume? They do indeed seem to be spaced every 30 minutes.

The consistent, sharp dips every 15 minutes made me assume that the container was being restarted.

You just can't beat the dopamine hit from "pointy chaos graph go smooth". Delicious. Great work!

Every time I open a post and go back to the previous page it scrolls back to the top. Is this fixable? I'm on Windows 11, Chrome.

That's ~19 cores pegged at 100%, eating 128GiB of RAM (OS disk cache included) and bleeding onto swap. 🤯

I think you're misreading it. The olive green in the CPU chart is idle. RAM cache taking up most of system memory is also normal on most Linux systems, even on desktop. That cache is freed for applications to use as needed.

Welp, my only calculation was "64 cpu threads * 30% load -> ~19 cores busy", I may be guilty of rounding up too much... The RAM usage is interesting however, since the kernel seems to be caching all it can, to the point of ejecting unneeded data into swap in order to retain the disk cache. If more RAM is reserved by running processes, the (likely pict-rs, database services) disk access times will begin to degrade.
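The back-of-the-envelope calculation above can be written out as a small Python sketch. The 64-thread count is the commenters' guess from the graph, not a confirmed spec, and both function names are made up for this example. It also shows why an aggregate load figure alone cannot tell you whether some cores are pegged or all of them are lightly used.

```python
def busy_thread_equivalent(threads, load_percent):
    """Aggregate load expressed as a number of fully busy threads."""
    return threads * load_percent / 100.0


def aggregate_load(per_thread_percent):
    """Average utilisation across all threads, in percent."""
    return sum(per_thread_percent) / len(per_thread_percent)


# 64 threads (assumed) at 30% aggregate load
print(busy_thread_equivalent(64, 30))  # 19.2

# Two very different distributions with the same aggregate figure:
evenly_spread = [30] * 64                      # every thread lightly used
mostly_pegged = [100] * 19 + [20] + [0] * 44   # 19 threads pegged, rest idle
print(aggregate_load(evenly_spread))  # 30.0
print(aggregate_load(mostly_pegged))  # 30.0
```

Telling the two cases apart needs per-core data rather than the single aggregate gauge.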

It could also be all 64 threads being used lightly with the scheduler trying to spread the load out evenly.

Not sure what the exact situation is regarding swap in that graph, but I've also had the kernel preemptively use swap for rarely used chunks of memory in favor of cache when running long-running processes. It's probably relatively normal.

Is there anything Grafana can't do?

I have so many things pumping data "into" Grafana these days I'm surprised they haven't tried to force me to pay for an enterprise license.

Anyway, thanks for sharing these, @ruud@lemmy.world. As a performance engineer, I love to see this level of detail and commitment on your part to keep the user experience for lemmy.world at acceptable levels.

It can't make me pancakes.

Wrong tool for the job, but if you want to order pizza, you can use terraform:

https://registry.terraform.io/providers/MNThomson/dominos/latest/docs

I suppose you could then feed your Terraform runs into Grafana and use it to track your pizza consumption.

Bwahaha:

4) Even if you do want a pizza, you should probably be careful with this provider. In testing, I once nearly ordered every item on the Domino's menu, which would probably have been expensive and embarrassing.

Reminds me of the old adage:

A computer lets you make more mistakes faster than any invention in human history -- with the possible exceptions of hand guns and tequila.

In the early days of the pandemic…and the early days of my Ansible learning…I set up a playbook to scrape several websites for hand sanitizer and Clorox wipes.

If it found one in stock, it would email my cell phone carrier's SMS gateway. Tasker would then make a loud audible alert.

Ran for weeks before it found some in stock. And then it did. At 2am. And again at 2:05, and 2:10, and 2:15…

And it was an error on the shop's webpage. It wasn't actually orderable…once it got in your cart, it wouldn't let you check out.