(Newbie question) Did i handle my system crashing correctly?
I've just installed Linux (Fedora 40 KDE) on my main PC over the weekend, so i'm a complete newbie and i apologize if some of my questions are nonsensical. Yesterday evening the system seemed to completely lock up at a certain point while playing Red Dead Redemption 2 for the first time (installed & run via steam using proton experimental). I'd love to know if i handled this situation correctly and how to avoid this or handle it more gracefully in the future. I'll begin by recounting what happened and then ask my questions:
The game froze during a cutscene and continued to play audio for a bit after it froze visually but then that stopped too. I have two monitors, the second completely black screened and the first one was frozen on the last frame of the game. As far as i could tell nothing in KDE was still responding to normal key presses or the mouse.
After some searching online i decided to try the ctrl + alt + (f2, f3, ... , f6) key combinations to get into a console, but that didn't work. As a last resort i tried alt + sysreq (print screen) + REISUB to safely reboot it. That ALSO didn't work. It was p. damn late in the day so i just decided to risk it and use the power button on my pc.
I was prepared for it not to boot anymore due to data corruption or sth, but it seemed mostly fine? My KDE panels were slightly messed up (but that took like 10 sec to fix) and besides that the only odd thing i've found so far is that steam refused to start properly and i had to reinstall it.
So did i handle this situation correctly? Specifically:
- did alt + printscreen + REISUB save my system or do nothing? As i said it didn't reboot when i did it so i thought it was useless. But after i forcibly restarted my pc and looked it up some more it seems all but alt + printscreen + S may have been disabled, so was alt + printscreen + S responsible for my system still starting without too many problems after i forcibly shut it down?
- why did this happen & how to prevent it? My system should be powerful enough to run RDR2 (Radeon RX 6800, Ryzen 5 5600X, 32GB ram) and i had nearly no problems up until the crash. So what's at fault? On protondb RDR2 has p. good ratings, did i just get unlucky and find one of the few edge cases where it breaks? But even then, why would a proton/game crash take seemingly the whole OS with it?
- is it a bad or a good idea to try and trigger this again on purpose? I'd really like to know if this was a freak accident or a consistent problem (and if it's consistent, whether e.g. switching to proton 9.0.1 alleviates it). So was i lucky that nothing on my PC got badly damaged from this incident and i shouldn't try to trigger it again for fear of permanent damage? Or can i expect that having to reinstall Steam every time it crashes is the worst that could happen while testing this?
UPDATE: I went back and did the same part of the game again, this time running it with proton 9.0.1, and the crash still occurred, in the exact same spot in the cutscene too. For reference, it crashed both times during this cutscene: https://www.youtube.com/watch?v=7UHv0SiVhWY @ around 1:23 when the explosion goes off (i only get to hear it briefly; the visuals freeze seemingly just before it explodes).
Trying ctrl + alt + f keys didn't seem to do anything again. I had at least enabled the sysreq keys and REISUB appeared to work and got me back into the system this time without having to adjust KDE panels or reinstalling Steam. Visually the crash was a little different this time, i hit win/meta soon after it happened which after a second or two exchanged the stuck game visuals for a half cutoff browser window on my main monitor (and black otherwise) and my secondary monitor was filled with black and white noise with a bit of color in between.
UPDATE 2 (17/06/2024): I tried it again for the first time since the original post. I'm now on kernel 6.9.4 and the crash occurred in the exact same spot, looking more or less as described in the previous instances. I managed to get back into a normal state using alt + sysreq + i (alt + sysreq + k didn't seem to have any effect).
UPDATE 3 (16/09/2024): I've tried it again with proton 9.0-2 and kernel 6.10.9, and it's still crashing at the exact same point as usual. Only difference is that this time alt + sysreq + REIB didn't seem to have any effect. Tho i might have forgotten "I" now that i think about it again. I had to do a hard restart using the power button, but it doesn't seem like anything broke.
UPDATE 3.5 (16/09/2024): Tried the next newest proton version steam has (experimental). Now the dialogue during the gameplay bit just before the cutscene doesn't trigger, then the game goes into "cutscene mode" (i think; i get black bars top and bottom and the menu becomes unavailable) but no cutscene plays and i (presumably) get softlocked. I tried waiting in case it was playing and i just couldn't see it; i waited 5 min or so and it never ended.
you probably got a kernel panic, which froze the system. it's like a BSOD on windows, except on linux, there isn't a proper stack to handle them when they happen while you have a graphical session running, so it kinda just freezes
i don't think reisub would do anything, because the kernel was probably already dead
you don't risk corrupting much data by hard-resetting your pc on linux -- journaling filesystems, like ext4 or btrfs, are built to be resilient to sudden power loss (or kernel crashing). if a program was writing a file at the time the kernel crashed, this one file may be corrupted, because the program would get killed before it finished writing the file, but all in all, it's pretty unlikely. outside of fs bugs, which are thankfully few and far between on time-tested filesystems like ext4, you shouldn't have to worry much about sudden power loss!
unfortunately, figuring out the cause of these issues can be challenging -- i've had many such occurrences, and you have no logs to go off of (because the system doesn't have time to save them), so you'd most likely need to figure out a way to send your kernel logs onto another system to record them
as general mitigation steps, you should try monitoring your cpu temperature a bit closer - it could be high temperature tripping the safeties of your motherboard/cpu to avoid physical damage to them - in which case, try installing a daemon to control your cpu frequency, like auto-cpufreq, or something like thermald specifically made to throttle your cpu if it gets too hot (though i think that one is intel specific)
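For the temperature monitoring part, here is a minimal sketch that reads the sensors the kernel exposes under /sys/class/hwmon (values there are in millidegrees Celsius). The function names `millideg_to_c` and `print_temps` are made up for this example, and which chips show up depends on your hardware; on OP's machine i'd expect `k10temp` for the Ryzen CPU and `amdgpu` for the graphics card, but that is an assumption.

```shell
# Hypothetical helpers, not part of any package.

# Convert a hwmon reading (millidegrees Celsius) to whole degrees.
millideg_to_c() {
    echo $(( $1 / 1000 ))
}

# Print every temperature input found under /sys/class/hwmon.
print_temps() {
    for input in /sys/class/hwmon/hwmon*/temp*_input; do
        # Skip the literal pattern if nothing matched, or unreadable sensors.
        [ -r "$input" ] || continue
        chip=$(cat "$(dirname "$input")/name" 2>/dev/null)
        raw=$(cat "$input" 2>/dev/null) || continue
        [ -n "$raw" ] || continue
        echo "${chip:-unknown} $(basename "$input" _input): $(millideg_to_c "$raw") C"
    done
}

print_temps
```

Running `while true; do print_temps; sleep 2; done` in a terminal on the second monitor while the game is running should show whether temperatures climb right before the freeze.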
I always thought a kernel panic ended the graphical session... Turns out I was wrong. Again.
This is not very common knowledge, but it is no longer recommended to press S or U before B for SysRq. The official documentation of sysrq has stopped recommending this practice, as it may be harmful to modern filesystems. Writing to a storage device while the kernel is in a bad state has the potential to cause corruption, and modern journaling filesystems like EXT4 and BTRFS are designed to survive crashes like this with minimal (or no) corruption. Instead, you'll likely want to use Alt+SysRq+REIB (and make sure you are waiting multiple seconds between each keypress, as they do not complete instantly!).
You may instead try to kill the most memory intensive non-vital process with Alt+SysRq+RF, which may stop you from crashing to begin with (this works especially well for memory leaks). SysRq+F will invoke the oom (out of memory) killer, which will kill the most memory intensive non-vital process without causing a kernel panic.
If you need to restart, the most ideal situation is to enter a TTY and cleanly reboot, in which case you can do Alt+SysRq+R to grab control from the display manager, then Ctrl+Alt+F3 or Ctrl+Alt+F4 (I believe most distros have the first login session run on the TTY accessible from Ctrl+Alt+F2) to switch to another TTY. You can then log in and do `sudo systemctl reboot` if your computer is still responsive. You may need to kill some processes before your system becomes responsive enough to log in on a TTY, which is where Alt+SysRq+F is useful, but in extreme situations it may require Alt+SysRq+EIB.
So a basic order of steps to try may look like:
`sudo systemctl reboot`. Else move onto 3.
In the spirit of other users giving mnemonic devices, you could remember REIB with Reboot Even If Broken, or the oom killer RF with Resolve Freeze (someone else can probably think of something better for RF; I'm not great at making mnemonic devices).
TL;DR: There are SysRq combinations that are less prone to damage/corruption than Alt+SysRq+REISUB, so use the above flowchart, or just remove the S and U for Alt+SysRq+REIB (if you don't want to troubleshoot first) for less chance of filesystem corruption from a bad kernel. You can often recover the system without having to hard reset (Alt+SysRq+B). And ALWAYS wait between SysRq keys, as they do not finish instantly.
This might be just me, but I prefer remembering what the keys actually do:
Also good to know:
Thanks for the detailed explanation on the sysreq keys & when & how to use them for unlocking a frozen system :D. Also for the `systemctl` bit because i wasn't even sure what to do if i had gotten to a console lol.
There's also the option of ssh-ing in to remove the offending process, and possibly restart the display manager.
You'll need to have another device available of course.
Your overall process is perfect: first try to solve it from the UI, then the console, then the magic sysreq key.
The fact that your kernel was not responding to the sysreq key could mean a couple things: is it enabled on your install? (cat /proc/sys/kernel/sysrq to check)
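The number that `cat /proc/sys/kernel/sysrq` prints is a bitmask, so it may help to decode it. Below is a rough sketch; `decode_sysrq` is a made-up helper name, and the bit meanings are taken from the kernel's sysrq documentation as far as I remember it. I believe Fedora ships with 16 by default, which would match OP's finding that only S was enabled, but treat that as an assumption and check your own value.

```shell
# decode_sysrq is a hypothetical helper, not a standard tool.
# It lists which SysRq function groups a kernel.sysrq value enables.
decode_sysrq() {
    value=$1
    if [ "$value" -eq 0 ]; then echo "disabled"; return 0; fi
    if [ "$value" -eq 1 ]; then echo "all functions enabled"; return 0; fi
    [ $((value & 2)) -ne 0 ]   && echo "console log level control"
    [ $((value & 4)) -ne 0 ]   && echo "keyboard control (R, K)"
    [ $((value & 8)) -ne 0 ]   && echo "debugging dumps"
    [ $((value & 16)) -ne 0 ]  && echo "sync (S)"
    [ $((value & 32)) -ne 0 ]  && echo "remount read-only (U)"
    [ $((value & 64)) -ne 0 ]  && echo "signalling processes (E, I, F)"
    [ $((value & 128)) -ne 0 ] && echo "reboot/poweroff (B)"
    [ $((value & 256)) -ne 0 ] && echo "nicing of RT tasks"
    return 0
}

# Decode the value of the running system, if the file is available.
if [ -r /proc/sys/kernel/sysrq ]; then
    decode_sysrq "$(cat /proc/sys/kernel/sysrq)"
fi
```

To enable everything until the next reboot, `sudo sysctl kernel.sysrq=1` should do it; for a permanent change, a line `kernel.sysrq = 1` in a file under /etc/sysctl.d/ is the usual route.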
Before trying to understand why the kernel locked up, are you sure everything is solid on the hardware side? ie. Did you overclock anything? If yes did you burn test the PC on some GPU demo?
`journalctl`, `dmesg` and your steam logs (in `~/.steam/steam/logs` usually) could be worth a look, or worth showing someone else at least if you aren't sure what's going on in there.
r-e-i-s-u-b handle it more gracefully than a forced shutdown at least!
The linked article is really helpful in learning why Raising Elephants Is Utterly Boring ;-)
I like the Arch wiki's version: Reboot Even If System Utterly Broken.
Nice. Both versions easily get stuck in the head.
Thanks for the link! I managed to set up sysrq with it, which might have saved me from reinstalling steam when the crash occurred the second time (see the update in my post).
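Picking up the `journalctl` suggestion from a few comments up: after a forced reboot, the interesting messages are in the previous boot, not the current one. A small sketch, assuming the journal is kept on disk between boots (I think that is the Fedora default, but I'm not certain). `filter_gpu_lines` and `previous_boot_gpu_log` are made-up names, and the keyword pattern is only a guess at what is worth looking for.

```shell
# Hypothetical helpers for digging through the previous boot's kernel log.

# Keep only lines that mention the GPU driver stack.
# The pattern is a guess at useful keywords, not an official list.
filter_gpu_lines() {
    grep -iE 'amdgpu|drm'
}

# Kernel messages from the previous boot (the one that froze).
# -k: kernel messages only, -b -1: previous boot, --no-pager: plain output.
previous_boot_gpu_log() {
    journalctl -k -b -1 --no-pager | filter_gpu_lines
}
```

Something like `previous_boot_gpu_log | tail -n 40` shows the last GPU-related lines before the freeze. If the kernel died hard enough, the final messages may never have reached the disk, so an empty result does not prove the GPU was fine.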
You're much more patient than I am, my method is hold the power button down until it shuts off lol
Only additional thing I would do would be to try to ssh into it too. Sounds like that wouldn't have worked anyway. But if you can ssh into it while it's in a degraded-but-not-completely-borked state you can poke around, troubleshoot, and of course cleanly reboot.
Sounds to me like the kernel or the video driver died. Try pressing caps-lock a few times -- if the keyboard's LEDs don't change, your inputs are dead and pretty much your only option is to power down or reset the computer. Most modern filesystems, like ext4 and btrfs (you likely have one of those) are very robust and can easily handle an ungraceful shutdown. When you start the OS again, it'll run `fsck` on the root partition and get it into a functional state. Data loss can still occur if the computer dies while a process is still writing a file, but I think it was inevitable the moment the OS froze.
Unfortunately I can't offer much advice other than to use a numbered Proton version instead of experimental, and to try again at a lower quality setting. You should also try Gamemode to temporarily optimize your system for running games.
I've played RDR2 on a weaker system than yours. It's a very intensive game to run in terms of memory usage, streaming from mass storage, and CPU/GPU. Install it on an SSD to give it the best chance, and use a system monitor like `bpytop` or `htop` to check the RAM/CPU stats and temperatures.
That capslock idea was pretty good. Next time I'll start with that to see if all hope is lost.
That's about what I would have done. Try to switch to a console first like you did, but if that doesn't work, force reboot and pray.
The worst that could happen is that you might have some data loss or mess up the OS (which has happened to me before). That's why you should always have an up-to-date backup! It'll save you one of these days.
Either way, try switching to a different proton version, or do some other fixes and test it out until it stops crashing. This honestly sounds like a freak (but common) problem. I wouldn't worry about data loss or corruption that much if there's no other signs of it (e.g., SMART reporting a dying drive), but a backup is generally helpful for any scenario.
You did fine, as a new user I am surprised that you even attempted to get to the console and so on.
With forcefully powering off, the biggest (or I guess most likely) thing that could happen is losing data. As long as your install wasn't working on updates or major system configuration changes, you'll probably be fine in that regard; it would boot back up just fine.
GG.
Ah, Fedora 40 is on kernel version 6.8.9 it seems, the bad one for amdgpu, and it doesn't look like their current build is patched to fix it, so it could be this bug. It's fixed in 6.8.10 and 6.9 if you have the ability to upgrade to those. Otherwise you might want to try reverting to the previous kernel version if that's easy in fedora.
(Edit to add that I didn't see the "im a complete newbie" bit... I'm just very aware of this recent bug because it gave me some trouble. Sorry if you did happen to start with a version that has this problem. It's really bad luck if so. But I don't really know.)
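A quick way to see which side of that fix the running kernel is on, as a sketch only. `version_lt` and `kernel_base_version` are made-up helper names, `sort -V` is assumed to be available (it is in GNU coreutils), and being older than 6.8.10 does not by itself prove the machine is hit by this bug.

```shell
# Hypothetical helpers for comparing kernel versions.

# Exit status 0 if version $1 is older than version $2.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Strip the distro suffix: "6.8.9-300.fc40.x86_64" becomes "6.8.9".
kernel_base_version() {
    echo "${1%%-*}"
}

running=$(kernel_base_version "$(uname -r)")
if version_lt "$running" "6.8.10"; then
    echo "kernel $running is older than 6.8.10"
else
    echo "kernel $running is 6.8.10 or newer"
fi
```

On Fedora a newer kernel normally arrives through a regular `sudo dnf upgrade` followed by a reboot, so there is usually no need to change kernels by hand.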
Honestly idk how i'd even begin to do that lol, and i'd also maybe rather not start my first week of linux use by immediately trying to change the kernel version on my own XD (either down or up). I did hear about an issue with rdr2 and kernel 6.8.9 from a reddit post which i found through someone writing about problems with the game on its protondb page. But i thought i was fine as my game worked normally until i encountered the crash & because the reddit and protondb posts say it's solved by enabling rebar, which (iirc) i already have.
However idk if that reddit post's issue is the same as/related to the one you linked. Since the rest of the game and my system seem to be mostly fine i think i'll either just not play the game or specifically avoid the cutscene when i do (it's in an optional quest luckily). And then i'll maybe return to it after the updated kernel arrives on fedora to see if it solves the crash or not.
I don't remember that scene from RDR2, I guess I need to go back and play it from the beginning again!
Regarding the crash, here are my two cents. I've played RDR2 on Pop!_OS and EndeavourOS and have not really had any major issues (I'll get to that in a second). My specs are similar but not quite the same as yours:
The issues I've had have not been as major as the game/system crashing the same way yours did. However, I have had serious screen tearing and worsened performance (although not very often at all). Every time, I have updated the graphics driver or kernel (or both) and upon rebooting and starting the game again the issue was fixed. This may very well be the case for yourself as others have already mentioned.
Do the sysreq sometime when your system isn't hung. If it isn't enabled, welp you have to enable it harder.
Having ssh set up would be a way in when the whole graphics stack falls over (but the kernel is still alive in there). On intel there are /sys entries to dump GPU state, ATI probably has something similar. You have a reproducible bug, if you can get in and grab data while the gpu is in la-la-land, you might be able to submit a valuable bug report.