Diagnose system crash during gameplay?

Carter@feddit.uk to Linux Gaming@lemmy.ml – 18 points

I'm trying to play Horizon Zero Dawn on openSUSE Tumbleweed, but every 10 minutes or so my entire system crashes, leaving both monitors displaying a green screen and forcing me to pull the plug on my PC. This appears to be the only game with the issue. I've tried multiple versions of Proton, including GE, but all suffer from the same crashes. How do I go about diagnosing the issue?

PC specs, if relevant:

i7-4790K

AMD 5700XT

32GB RAM

2TB NVMe SSD

sounds like your power supply might be dying, or your system is pulling more than the psu can handle. mine recently blew a capacitor with a loud bang and magic electronics smoke, after a few months of odd shutdowns in the middle of intensive games. i think its fan died about a year ago, because my temps had been higher than they used to be, and after it popped the side of the case by the psu was very warm. i then found that my gpu upgrade had pushed my power draw to 450w when my psu was only rated for 530w, not 750w like i thought i remembered. makes sense that it was pushed to its limit. maybe recheck your power requirements with an online psu calculator and compare them to your psu rating. also check that its fan is still running and not clogged up with dust.

I only just bought a new PSU to go with the GPU upgrade. It's 850W so should be more than enough power. Just seems odd that it's only an issue with one game.

Check the temps of your hardware and the power cables inside the PC, then have a look at the system logs.
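
For the log check, here is a rough sketch of what that could look like on a systemd distro such as Tumbleweed. It assumes the journal is kept across reboots (so the boot that crashed is still on disk) and that the GPU runs on the amdgpu kernel driver. The helper name `gpu_lines` is made up for illustration.

```shell
# Hypothetical helper: keep only lines that mention the amdgpu kernel driver.
# It reads log text on stdin, so it works on any saved log, not just journalctl.
gpu_lines() {
    grep -i 'amdgpu'
}

# Usage after rebooting from a crash (not run here):
#   journalctl --list-boots                                # is the previous boot still there?
#   journalctl -b -1 -p err --no-pager                     # errors from the boot that crashed
#   journalctl -k -b -1 --no-pager | gpu_lines | tail -n 40
```

If `--list-boots` only shows the current boot, the journal is probably not being stored persistently, and there will be nothing left to read from the crash.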

Temps are absolutely fine and sometimes the game crashes immediately before anything can even get hot. Only happens with Horizon. All other games run fine.

I have had the same issue for two years and it has progressively gotten worse. I use Ubuntu and have very similar hardware. The piece of hardware I have traced my issue to is the 5700XT. I ran some Phoronix stress tests on my video card and reproduced the same crashes. I also looked up the MCE (machine check exception) errors that were being produced and they also pointed towards that card. I’ve given up trying to fix it and I will be purchasing a different graphics card soon.

To diagnose your issue: do you see any MCE errors when your system reboots? If none show up on screen, run journalctl and grep its output for ‘MCE’.
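
A minimal sketch of that search. One wrinkle: a case-sensitive grep for 'MCE' may miss lines where the tag is written in lowercase, so this version ignores case and also matches the spelled-out "machine check". The helper name `mce_lines` is hypothetical.

```shell
# Hypothetical helper: keep lines that look like machine check reports.
# -i ignores case, -E enables the "a|b" alternation.
mce_lines() {
    grep -iE 'mce|machine check'
}

# Usage (not run here):
#   journalctl -k -b -1 --no-pager | mce_lines    # kernel log of the previous boot
#   journalctl -k --no-pager | mce_lines          # kernel log of the current boot
```

No output from the previous boot does not rule out the GPU. A hard lockup can stop the machine before anything is written to disk.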