AMD GPUs are cursed for me

aksdb@lemmy.world to Linux@lemmy.ml – 101 points

Each time I try AMD graphics, something is fucked for me. Back in the fglrx days, fglrx just sucked, so I used Nvidia. Then I had an AMD card right around when they finally had open-source drivers, but it was still buggy as hell. So I went with Nvidia again (first a GTX 970, then a GTX 1060). In the meantime I got a new work notebook where I also went with an AMD APU, and it had driver crashes for a long time whenever I was in video calls and it had to decode multiple streams. That thankfully stabilized with Linux 6.4.

Since sooo many people in the community swear by AMD, I thought "dammit, let's try it again for my new desktop" and got a 7800 XT... and I have to reboot ~5 times until I finally make it to a running X server or Wayland session. Apparently I am hit by this problem (at least I hope so). But even that doesn't read nicely... the fix seems to be to revert another fix for power management. So I either have a mostly non-booting card or suboptimal power management.

I'm starting to regret having chosen AMD... again :-/ I seem to be cursed.


And here I am with a 3090, having more issues than I have time for, wishing I'd gone with an AMD card. Sadly, as we can both see, the grass ain't necessarily greener.

Thanks for that perspective. At least that makes me regret my decision less.

I've tried the open-source drivers, the proprietary DKMS variant, and the standard proprietary drivers, and all of them give me issues.

What kind of issues do you get? Generic instability?

Wow, I can't believe I missed your response. Sorry for such a late reply.

General instability, absolutely. Multi display issues. And seemingly no matter what I do Wayland on KDE is basically unusable for me.

Ah, I can relate then. I also ran my previous Nvidia on X11, with only occasional experiments with Wayland. Since X11 was good enough for me, I wasn't too sad about this.

Even with X11 I've had nothing but instability, sadly.

I wanted to switch to Arch like I did for my laptop, but the cons outweighed the pros ultimately for me.

This reads like an alternate reality to me. I bought a new 3060 Ti and using Wayland with it is nearly impossible. I tried it on Ubuntu and had tons of errors, and on Debian/KDE it won't even log in without X11 enabled.

When you go to protondb.com, every game has tons of fixes for Nvidia cards and every forum has fixes for Nvidia cards, while AMD mostly works out of the box.

Try out COSMIC with the NVIDIA 550 beta driver.

No. I'm not letting my GPU dictate what OS I can use.

Nvidia is just shit.

The guy shilling the COSMIC DE is also a lead developer at the company making it lmfao

Self shill lol

Yo… good eye. I can't believe how many people are just plain manipulative.

Then have fun with your bad experience. NVIDIA is working quite well in Wayland on COSMIC.

Great that it works well on COSMIC. They don't want to use COSMIC and that's their choice. You don't have to be so salty about it.

What makes you think I'm "salty"? I'm not the one complaining about NVIDIA not working in Wayland, or saying that I'm going to sell my GPU.

The only person who is salty is the one who would rather sell their GPU than use a Wayland desktop environment that supports NVIDIA as a first class citizen.

Pretty sure the 7000 series is known not to be well supported yet since the cards are new and haven't had massive uptake, so I don't want to be that guy but...

Some research beforehand on which AMD GPU to get wouldn't hurt?

I've got a 6800XT and had absolutely 0 issues since I got it about a year ago. I see from your replies you're on Arch, so I guess just wait for things to improve unfortunately.

I’ve been running a 7900 XTX for months without issue. The only thing missing was some stuff around power settings, fan curves, etc., but I think even that has been fixed in recent kernels.

As I said... I had a lot of trouble in the past and went with Nvidia most of the time. I didn't just pick that specific AMD card on a whim, and my research ended up looking positive. The 6000 series wouldn't have cut it, since its AV1 encoder isn't good enough (or maybe not present at all; I forget). I'm also buying this thing to last a few years, so having to take a card from the last generation would certainly have been the point where I just picked Nvidia again.

Since people normally only report on negative experiences: I was lucky enough to get a reference AMD 6900 XT during the GPU shortages.

Switched from Ubuntu to Fedora for it because Ubuntu didn't have firmware for it yet.

Ever since then it has been a rock solid GPU. Never even had such a stable GPU under Windows.

Have been running Fedora with Wayland for more than 2 years now and can count the crashes on a single hand, most were my fault.

I'm sure once that issue is sorted out that GPU is going to ride along for years with minimal maintenance required.

(You might want to downgrade your kernel until then though)

I couldn't get my 6900XT to drive my G9 at 240Hz, but 120 isn't too bad. I should probably try again soon.

Been 20+yrs of some random flavor of driver problems for me, since my 9700 Pro at the very least.

Over DisplayPort? That's interesting. I knew AMD can't do HDMI 2.1 on Linux, but there shouldn't be a problem with DP.

Might wanna try a proper new certified DP 2.1 cable, just to be safe.

I "only" drive an AW3423DW, but I have no issues at 3440x1440 with 165Hz.

Indeed over DP. It works fine at 240Hz in Windows, but of course the graphics quality in games is not as good as with Nvidia.

Anything interesting going on in the kernel log while the connection doesn't work?

If so, you could maybe write a bug report at the amdgpu repo.

One thing I could imagine that is happening is that Linux chooses a higher chroma subsampling than on Windows. Had that issue before with a monitor that had a wrong EDID. Unfortunately it's a real pain to set the chroma subsampling on Linux with AMD.

Yup I'm hit by the exact same bug currently. But I was able to go back to before I updated with Snapper and now I'll wait until the fix is in the Tumbleweed repos.

But other than that I'm much happier with the AMD than with my Nvidia (on Linux that is). VRR with Wayland on multiple monitors just works without issues. And before this week I never had any issues at all with the 7800XT.

I need to give the LTS kernel a shot tomorrow, but I could swear I tried that and had the same issue. Which now makes me fear that I might have a different problem. Argh.

Dammit, same symptoms. Which, I guess, is not a good sign. Maybe my issue is different or I have another issue on top.

What kernel version are you using? 6.7? Unfortunately, using the latest and greatest kernel means you'll be among the first to get bitten by new bugs. Does the issue also occur on 6.6 and 6.1?

Funnily enough, I only run AMD now for the same reasons, except with Nvidia as the PITA. Always ongoing driver issues, power management problems, or fans running like jet turbines.... Last 3 machines AMD, no issues with the GPUs/drivers.

On EndeavourOS I haven’t had issues with a Vega 64 and now with a 6800XT. I followed the AMD GPU guides from the Arch wiki to get everything up and running, but that was back when I started the build with the Vega 64. After the upgrade I didn’t even need to touch anything, and all non-anti-cheat games work quite well. Maybe I got lucky though.

Sorry to hear that. For what it's worth, I've had no problems with integrated AMD graphics, so maybe it's a PCIe issue?

Hmm, interesting idea. I need to investigate that. The dmesg output is full of amdgpu irq errors, but of course that could also happen with an issue on the board.

I would rule out a generic hardware issue, since 1) I get graphics during boot up until it needs to do a modeswitch (I guess) and 2) it works fine so far on Windows.

I did have a similar issue after the first boot on Windows as well and assumed so far that the modeswitch after the initial driver install caused the problem. But Windows likely also installed chipset drivers at that time, so PCIe could be a possibility. Then again... I know that Windows reloads graphics drivers on-the-fly... but chipset drivers? Probably not. Which would speak against that theory.

I have no clue how Linux initiates the communication with a PCIe board, and whether the amdgpu driver would take care of that. But hardware excluded, some misconfiguration on the driver's part could be present. Good luck!


I just got a 7600XT. My only complaint is that it isn’t pushing quite enough frames, so I would need something beefier, but then I would also lose G-Sync because of my monitors, so I will probably simply return it and go back to the 3080. The lower TDP and thermals were quite nice though, and Wayland was much less buggy. No crashes, I’m on Ubuntu tho.

The 3080 has ass performance on Wayland

My favorite bug is when I resume from suspend and everything becomes rainbow colored.

Same but with a Vega APU. I also love it when it merges the console screen with whatever was on there before suspend and it's just a text-graphics rainbow mess.

Nvidia by default does not preserve video memory when you suspend.

Relatively easy to fix if you follow the Arch wiki.

Blah, I kinda tried, but no dice yet; I only managed to stop my suspend from working. I have a modprobe.d/nvidia.conf with the tmpfile option, updated the initramfs, added the services... but only my monitors turn off. I can probably live without it for now though.
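For a suspend setup like that, one way to check whether the options actually reached the loaded driver is to read back the module's live parameters. This is a minimal sketch. It assumes the proprietary driver exposes them as "Name: value" lines in /proc/driver/nvidia/params, with the NVreg_ prefix dropped. The two names checked correspond to the NVreg_PreserveVideoMemoryAllocations and NVreg_TemporaryFilePath options from the Arch wiki setup. If the value still reads 0 after a reboot, the modprobe option probably never made it into the initramfs the module was loaded from.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: the proprietary driver lists its active module parameters here
# as "Name: value" lines (names without the NVreg_ prefix).
PARAMS_FILE = Path("/proc/driver/nvidia/params")


def parse_params(text: str) -> dict[str, str]:
    """Turn 'Name: value' lines into a dict, dropping surrounding quotes."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            params[name.strip()] = value.strip().strip('"')
    return params


def check_suspend_options(params: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means both options look set."""
    problems = []
    if params.get("PreserveVideoMemoryAllocations") != "1":
        problems.append("PreserveVideoMemoryAllocations is not 1")
    if not params.get("TemporaryFilePath"):
        problems.append("TemporaryFilePath is empty")
    return problems


def main() -> None:
    if not PARAMS_FILE.exists():
        print(f"{PARAMS_FILE} not found; is the proprietary nvidia module loaded?")
        return
    issues = check_suspend_options(parse_params(PARAMS_FILE.read_text()))
    print("\n".join(issues) if issues else "Both suspend options are active.")


if __name__ == "__main__":
    main()
```

The script only reads values. It cannot tell whether the nvidia-suspend/resume services are enabled, so check those separately.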

I had a rock solid AMD RX 580 up until the release of kernel version 6.7. Now I'm lucky to get a system that can remain up for longer than thirty minutes. Sticking to 6.6 has worked for me and definitely something you should try as well, but it's worth noting that any amount of time spent on the issue tracker for AMD GPU stuff will reveal tons of issues from 6.6 as well.

I have a similar story with an RX 580. I replaced my GTX 1060 3GB with an 8GB RX 580, mostly because the 3GB of VRAM were an issue for BeamNG.

Now I can't record my 3 displays with the RX 580; it just fails when trying to do so, and 2 displays result in constant encoder overloads, something the 1060 had no issues with at all. Also, my colors are off when recording and I have no idea why; it even happens when recording with the CPU:

https://bbs.archlinux.org/viewtopic.php?id=292196

Also, kernel 6.6 broke power reporting on all Polaris GPUs. Thankfully that was fixed recently in kernel 6.7.2, but holy shit, it took like 6 months to fix that.

I probably shouldn't have read tests and forums, but simply searched for crashes and open bugs to get a feeling for what I am getting into. Then again I also read from people with very ugly problems with nvidia, so it's not a really good measure.

I really want AMD to be good; they offer more VRAM where Nvidia always seems to cheap out in pretty suspicious ways. Then again, Nvidia seems to be more power efficient.

My time with nvidia on linux was 0 issues in performance or usability.

The only sort of issue I had was that the GTX 1060 drew 20W at idle when using the 3 displays; this was a bug that Nvidia fixed for the RTX 20 series and newer cards but never fixed for Pascal lol.

But even with BeamNG, there was a period where the native Linux version didn't work on Mesa while it worked on Nvidia. To be fair to AMD, this was because BeamNG's Vulkan implementation is horrible, and right now it doesn't work on either lol.

Polaris GPUs had very weak video encoders; I also had an RX 580 and had issues on Linux as well as Windows. To my knowledge the AMF encoders worked better for those, but I could never get them working with OBS.

Oh, I did try the AMF driver. My first attempt ended with i3 crashing on startup. What was worse is that even after removing those drivers and putting Mesa back, it still crashed on startup; good thing I had a btrfs snapshot from before messing with that.

On my second attempt I was able to use AMF in OBS, but it still failed to record the 3 displays.

My biggest issue right now is the issue with the colors, I don't care if I have to use the cpu to record at this point.

I've always had great experiences with AMD and not with Nvidia. Maybe it's just their newer cards.

RX 6700 XT here... once I refreshed the thermal pads and the thermal paste, it works great on Windows and Linux: Ubuntu, Mint, Fedora, Bazzite (immutable Fedora but for gaming). It had no issues with the built-in amdgpu driver on any of them.

It's a completely new card, so I will not fiddle around with it. Also it runs almost flawless on Windows (aside from a similar crash on the very first boot during driver install).

It could be your monitor or even the monitor cable. I have a monitor that absolutely fucking refuses to work with AMD over HDMI. If you have inexplicable system sleep issues, black screen issues, startup issues, etc., it could be the monitor at fault.

Thanks for the suggestion!

While it's a possibility, I think it's unlikely, since the machine works fine with Windows. I also compiled the tkg 6.7.2 kernel which includes the revert-patch for the offending change and so far the machine booted three times without issues, so it seems to fit.

That doesn't rule out the possibility of display issues tho. Back when I had the faulty monitor it was much more severe under Linux, and I never managed to track it down (I've been using AMD hardware for over 10 years now, and this one issue busted my nuts pretty hard).

If you have a TV or something, at least try it to rule out possible outside factors

It can't hurt. I'll grab another display and another cable and try a few combinations. Thanks!


Oh man, reading the comments fills me with fear, as I just ordered a new computer after stretching my old laptop for 8 years or so. I was super close to getting an AMD but went with Nvidia in the end... but there's so much bad juju in the comments for Nvidia too...

You may wish to pick a distro that makes a point of nvidia compatibility.

I use Nobara, which has a few options in the welcome script specifically to improve compatibility with Nvidia. I've also heard Pop!_OS mentioned several times as one people have liked with Nvidia.

Some only ship with or distribute alternative open source nvidia drivers that tank performance.

Thanks for the advice, but a distro change would be a huge annoyance for me. I haven't had issues with my laptop's Nvidia 1060 on Arch, and never had issues with the proprietary driver.

My worry is that even though mature GPUs are probably well supported, I bought a relatively new one (4070 Ti Super), so maybe the new models have some issues due to having more features/being more extreme. Most complaints here are about the 30/40 series after all.

Yeah, until this thread I was convinced I should stay away from nvidia GPUs when building a new PC with Linux in mind, but I'm not so sure anymore.

I've been using an AMD RX 7900 with an AMD 7950X processor for almost a year with GNOME / Wayland on Arch. No problems so far. Yes, I am a gamer too.

As others said it depends on the distribution you use.

Arch is not exactly homogeneous. Which Kernel package and version do you use? Which firmware package and version?

I use Arch, btw, and have these issues with default kernel 6.7.2, 6.7.3 and lts kernel 6.6.14. Firmware package seems to be from 2024-01-15, IIRC.

It also matters what Linux distro you have. Some of them are horrible. I'm super happy with AMD graphics on Arch and have no issues whatsoever, with probably 30 games in my Steam library that all work very well.

So I think it may be your system and what drivers you installed, or some other config.

I have a 6900 XT card, latest kernel, latest drivers. But I've had this graphics card since kernel 5.8 I think, with no issues.

I am running Arch here as well. Most people who referenced the issue I linked also seem to come from Arch. So it seems like a problem due to the "latest" kernel. I don't accept that as an excuse, though. It's still a stable kernel, and I don't expect drivers to be published that were not tested in advance. And it looks like this has happened here. Maybe it's bad timing on my part and this was/is the only hiccup in a long time (see "cursed"), but I guess I'll find out.

Having a bleeding edge kernel can and will come back to bite you. There's a reason why many distros hold back with kernel updates for so long, there's issues that only can be found with user feedback.

From experience, "stable" in the kernel world doesn't mean much unfortunately. I encountered dozens of issues over various versions and different hardware already and it's the main reason I don't run rolling release distros on my main rig.

There's also been enough times where the latest Nvidia driver borked my test system at work so I'm fine with just not running the latest kernel instead.

I have the same problem with the LTS kernel. Just tried. First time it booted but locked up on shutdown. The next cold boot it immediately went to black screen after loading amdgpu. ([drm:amdgpu_gfx_enable_kcq [amdgpu]] *ERROR* KCQ enable failed). Next boot too. All with kernel 6.6.14.

That's strange, 6.6.14 is the same version that's on Fedora currently. My friend with a 7900 XTX is still on 6.5.0 so I can't get him to test that version right now.

The fix is already merged, though: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9c2f0338bbd132a4b12b988004d796798609d297 It should hopefully not be long before it is backported.

In the linked issue, the user uncle jack wrote:

I've set up Fedora Workstation 39 with hibernate to rule out potential issues caused by my current system. Rebooting from Linux to Linux still leads to dark boots before reaching the OS. This has kernel 6.6.12. Cold boots seem to be fine after having cut power to the PSU earlier.

So I fear that there is still a deeper issue somewhere. But I'll see what happens after those fixes get backported to 6.7 and hit Arch Linux. Until then I might have to live with Windows or avoid rebooting too often. I have yet to figure out how long I need to keep this machine unplugged for it to behave like a cold boot. 10s apparently didn't cut it in my latest try (it's a new PC, so I still have to learn its quirks).

KDE Neon user here, I have not touched any graphics settings and my AMD card runs 32:9 120 Hz flawlessly

Me with a Vega 64... the forgotten platform. A few games will just straight up reset my GPU with certain instructions, taking the whole system with it. I can't even play Minecraft with any Mesa version from the last 2 years anymore due to regressions.

Good thing to know 7800 XT is also cursed though, I was planning on getting that one to escape my situation. lol.

Kinda weird. Is the first-gen Vega APU different enough not to have these problems? Because I've been pushing that thing hard enough that it's starting to have actual hardware faults, and I've very rarely had software-related crashes that couldn't be resolved with a temporary kernel rollback.

My PC mysteriously resets itself every now and then, and I have a Vega 64. Is it because of that?

Using an AMD RX 6600... Mostly going fine, tho I haven't tried any big heavy games. One thing tho... Every time I turn on my computer, no display. I reboot it and then it works fine, but it never does the first time. One path I'll investigate is the monitor: my monitors are both older and use DVI or VGA ports, so I have to use converters. I might try to get my hands on a more recent monitor to see if I still get the same problem. But if I do, I'm not even sure where to ask. I don't even think it's a Linux problem, because I tried swapping my Linux drive for one with Windows and the problem remains. I was also using Mint when the problem started and have since switched to Arch (btw), and that doesn't change a thing.

I had a similar problem which was resolved by disabling the motherboard integrated graphics in bios settings.

Thank you! It didn't seem to work on its own, but I also noticed I wasn't booting in EFI mode, so maybe if I change my boot partition and combine that with your advice it'll work...

Mine went back to no display only on boot, so I guess it didn't work for me either :( Good luck tho!!

Aw, too bad :( Good luck to you as well, tho! I've bookmarked your comment, so I'll be back to tell you if I find the solution, however long it takes!

I still haven't found the solution, have you had any luck with yours?

I tried switching every UEFI setting that seemed to have something to do with booting or GPUs, reinstalling the GPU BIOS, upgrading the motherboard BIOS, getting a monitor I could plug in without a switch... All to no avail.

Well, I think before upgrading the BIOS, one thing had a slightly different result: setting the boot mode to UEFI and disabling CSM made it display "no GOP (Graphics Output Protocol)" after a few minutes, and it offered to either take me to the UEFI settings or load defaults (which implied going back to CSM), after which it would boot that time and then go back to doing the same thing.

I don't think I've had this error since the motherboard BIOS upgrade, but there's still no display unless I reboot, or unless the computer had been turned on until recently. I'm kinda out of ideas...

…unfortunately no. I work around it by knowing what buttons to press, but it’s pretty stupid.

The most bug-free GPU experience I've had with Linux is Nvidia GPU + KDE X11 with the compositor disabled. Pure bliss. I had a 6700XT and it was terrible too; now I have a 4070. For my laptops, the Intel iGPU works decently well with Wayland KDE, but there are a few bugs, like having to clear some apps' GPU cache (VS Code) quite often.

At least with my 1060 compositing wasn't an issue. But true, I rarely used Wayland. Do you have specific issues when compositing is enabled or do you just prefer the simpler rendering?

I prefer it without for the aesthetics but also for functionality: compositing on X11 with multiple monitors of different refresh rates is still broken, and everything becomes locked at 60Hz instead of each monitor's max.

Run sudo dmesg | grep amdgpu and look for errors.

You may have a firmware file missing, for instance. If that’s the case, it’s an easy fix - just download the firmware files from the kernel tree and put them wherever your system wants them.

This is how I do it on Debian but it should be easy enough to adapt to whatever distribution you’re using (it might be exactly the same tbh): https://blog.c10l.cc/09122023-debian-gaming#firmware
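When firmware is missing, the kernel's firmware loader logs it in that dmesg output as a line of the form "Direct firmware load for <file> failed with error <code>". Below is a small sketch that pulls those failures for amdgpu/ files out of a saved log. The script name check_fw.py and saving the log with sudo dmesg > dmesg.txt are just assumptions for illustration.

```python
from __future__ import annotations

import re
import sys
from pathlib import Path

# The kernel's firmware loader logs failures as
# "Direct firmware load for <file> failed with error <errno>".
FW_FAIL = re.compile(r"Direct firmware load for (\S+) failed with error (-?\d+)")


def missing_amdgpu_firmware(log_text: str) -> list[tuple[str, int]]:
    """Return unique (firmware path, error code) pairs for amdgpu/ files."""
    found: list[tuple[str, int]] = []
    for match in FW_FAIL.finditer(log_text):
        entry = (match.group(1), int(match.group(2)))
        if entry[0].startswith("amdgpu/") and entry not in found:
            found.append(entry)
    return found


def main() -> None:
    if len(sys.argv) != 2:
        print("usage: check_fw.py <file containing 'sudo dmesg' output>")
        return
    failures = missing_amdgpu_firmware(Path(sys.argv[1]).read_text(errors="replace"))
    for path, err in failures:
        # -2 is ENOENT: the file was not found in the firmware search path
        print(f"{path}: error {err}")
    if not failures:
        print("No failed amdgpu firmware loads found.")


if __name__ == "__main__":
    main()
```

Usage would be python3 check_fw.py dmesg.txt. Any file it lists is a candidate to fetch from the kernel firmware tree as described above.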

Thanks for the idea!

dmesg shows the same errors as in the referenced bug ticket. So I don't think missing firmware is the issue. I would not be surprised however, if the problem in general is a combination of amdgpu and firmware behavior. (IMO the hardware should not crash as hard as it does, so the firmware seems to be a bit wonky too)

Ohh, so that's the bug I've been experiencing ever since Fedora 39 updated to kernel 6.7. But I only get this on restarts, so cold starts work just fine. I actually have a 7800 XT as well.

But other than that I only noticed one issue: video playback in Firefox sometimes shows visual artifacts across the screen while a game is running in the background (well, with Baldur's Gate 3 at least). Fedora 39, KDE Plasma. Kernel 6.6 or 6.7 (or 6.5 for that matter). That said, I also had some suboptimal experiences with browser video playback on an AMD APU notebook under Windows (severe framedrops), so I'm not sure where to point the finger.

Other than that it's honestly been great. I switched from Windows + Nvidia to Linux + AMD basically January 1st of this year and only ever booted Windows twice to transfer game saves over for the few games that don't have Steam Cloud.

Turns out most of the problems I had with the Linux desktop were with Nvidia. I spent more time troubleshooting than actually using software. AMD isn't perfect on Linux, and with new kernel versions you're liable to run into more issues, but AMD (and Intel) mostly work out of the box.

If it makes you feel any better, you're not the only one. I also have this problem. Whenever it was time to upgrade my video card I'd try ATI and later AMD, and it would always have some annoying issues. Meanwhile I'm on my 7th or 8th Nvidia card over the years and they're always great.

I've had similar issues. I don't understand the love for AMD. My whole rig is AMD, but it's constantly having GPU crashes. All games run at high FPS and my CPU temps seem nominal. But the games will crash. Everything from RimWorld to Baldur's Gate 3. They all run pinned at 60fps but randomly crash. I've tried a thousand different configurations and drivers. I've tried Ubuntu and Linux Mint. I'm now just accepting that I can't rely on it as a gaming rig. I like that AMD is trying to be progressive with open-source drivers, but the quality doesn't seem to be there. My next rig might be Nvidia and Intel. But we will see.

Weird. I would check PSU next.

My issue was the GPU fan and the PSU fan would blow into each other. I opened the PSU and reversed the fan

Hah, I would not expect that to kill it. Maybe a small build. The other day I was switching the cards and realized my CPU fan and case fan were both disconnected, idk how the hell it was running without overheating.. except I always have the side of the case off because the 3080 will shut me down otherwise.

Yeah, but having the fans off just means the heat is passively dissipated. Having another fan blow the hot air back in is worse since it just stays there

Did you check the system logs to see what caused it?

Many things can result in seemingly random crashes. Any overclock (including XMP and EXPO), undervolt, or even a BIOS version can be problematic.

I would check first if it's stable on windows.

It's not stable on Windows either. But I haven't looked at logs because I didn't really know what - or how - to check.

Most distros use systemd and its logging solution: journald. You can use journalctl to read the logs around the time of the crash for e.g.:

  • journalctl -S -5m this shows the last 5 minutes. Use this when a game crashes but the system continues working and did not reboot.
  • journalctl -b -1 -S -10m this shows the last 10 minutes from the previous boot. Use this if the crash froze the whole system and rebooted.

Look for red lines (errors) and what wrote them. AMD GPU faults usually have the 'amdgpu' mentioned, memory errors could appear as 'protection fault'.
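The same keyword search also works on a saved journal. Below is a minimal sketch. It assumes the output of one of the journalctl commands above was redirected to a file, for example journalctl -b -1 -S -10m > crash.txt. The script name find_gpu_errors.py is hypothetical. It only keeps lines containing the two keywords mentioned above, so it narrows the log down rather than diagnosing anything.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Keywords from the advice above: amdgpu faults usually name the driver,
# memory errors may appear as "protection fault".
KEYWORDS = ("amdgpu", "protection fault")


def suspicious_lines(log_text: str, keywords: tuple[str, ...] = KEYWORDS) -> list[str]:
    """Return lines mentioning any keyword (case-insensitive), in log order."""
    wanted = [k.lower() for k in keywords]
    return [
        line
        for line in log_text.splitlines()
        if any(k in line.lower() for k in wanted)
    ]


def main() -> None:
    if len(sys.argv) != 2:
        print("usage: find_gpu_errors.py <saved journalctl output>")
        return
    text = Path(sys.argv[1]).read_text(errors="replace")
    for line in suspicious_lines(text):
        print(line)


if __name__ == "__main__":
    main()
```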

journalctl -S -5m

Looks like this is the errors I'm seeing. I know it's not helpful to just drop this in the chat, but I'm doing it for posterity (and to let you know your comment did in fact help me)!

Feb 04 16:47:40 computer kernel: [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
Feb 04 16:47:40 computer kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=17063130, emitted seq=17063132
Feb 04 16:47:40 computer kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process GameThread pid 161654 thread redDispatcher9 pid 161668
Feb 04 16:47:40 computer kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
Feb 04 16:47:40 computer kernel: amdgpu 0000:0b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Feb 04 16:47:40 computer kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Feb 04 16:47:40 computer kernel: amdgpu 0000:0b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Feb 04 16:47:40 computer kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Feb 04 16:47:40 computer kernel: [drm:gfx_v10_0_cp_gfx_enable.isra.0 [amdgpu]] *ERROR* failed to halt cp gfx

Happy to help! Though you are right, this is a rather generic error that doesn't help much; it just confirms that the GPU is the issue.

At this point it could be a driver issue since there are similar open bug reports. A hardware problem is still possible since you previously said that it's unstable on windows too, and power related issues can also lead to this error message.

EDIT: Tentative solution: CoreCtrl

CoreCtrl allowed me to underclock my Radeon 5600XT GPU (currently set to 800MHz for the GPU and 500MHz for the memory). I say "tentative" because this problem has been persistent for years, but I've been running Cyberpunk for 1 hour at 60FPS on High settings (and mostly 60FPS on Ultra, but I had some FPS drops). Even if this solution isn't 100% perfect, I think some combination of changing the GPU values is probably going to make my rig much more functional.
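If you go the underclocking route, it can help to confirm the limit is actually in effect while a game runs. This is a sketch under two assumptions. First, the card is card0. Second, amdgpu lists its core clock levels in /sys/class/drm/card0/device/pp_dpm_sclk as lines like "1: 800Mhz *", with the asterisk marking the active level. The exact levels vary per card, and the script only reads values; it does not change anything CoreCtrl set.

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Optional

# Assumption: the Radeon is card0; check /sys/class/drm/ if there are several GPUs.
SCLK_FILE = Path("/sys/class/drm/card0/device/pp_dpm_sclk")
# Lines look like "1: 800Mhz *"; the asterisk marks the active level.
LEVEL = re.compile(r"^\s*(\d+):\s*(\d+)Mhz\s*(\*)?\s*$", re.IGNORECASE)


def parse_dpm_levels(text: str) -> tuple[list[int], Optional[int]]:
    """Return (all listed core clock levels in MHz, active level in MHz or None)."""
    levels: list[int] = []
    active: Optional[int] = None
    for line in text.splitlines():
        match = LEVEL.match(line)
        if match:
            mhz = int(match.group(2))
            levels.append(mhz)
            if match.group(3):
                active = mhz
    return levels, active


def main() -> None:
    if not SCLK_FILE.exists():
        print(f"{SCLK_FILE} not found; adjust the card number")
        return
    levels, active = parse_dpm_levels(SCLK_FILE.read_text())
    print(f"levels (MHz): {levels}, active: {active}")


if __name__ == "__main__":
    main()
```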

I found CoreCtrl based on a Reddit thread last night but didn't have time to test it until this evening after work. Seems to have made a world of a difference.


Yeah, I've tried just about every feasible kernel parameter for the amdgpu module, updated my kernel to 6.2 on Linux Mint, and tried several different BIOS settings. My system runs everything reasonably. Even Cyberpunk 2077 generally holds 60FPS. But after about 5 minutes of gaming in Cyberpunk 2077, it crashes. Other games last longer, which is why I use Cyberpunk 2077 to stress test my system.

These are my system specs:

  • PSU: 850 Watt 80 PLUS Gold Fully Modular ATX
  • CPU: AMD Ryzen 7 2700 Eight-Core Processor × 8
  • GPU: Radeon 5600XT
  • RAM: G.Skill DDR4-3600 CL16-19-19-39 1.35V (2x16GB = 32GB total system memory)
  • SSD: Samsung (MZ-V7E500BW) 970 EVO SSD 500GB - M.2 NVMe
  • MOBO: Asus x470 Pro
  • Other: TP-Link AC1200 PCIe WiFi card (Archer T5E), Bluetooth 4.2, dual-band wireless network card, installed in PCIEx1_3. It seems like a variable I should remove, but I've tried removing it and didn't see any change in behavior. I've tried various PCIEx1_* slots with similar results.

I don't really see where I might be going wrong here. I bought this all ~4 years ago and I've always had these intermittent crashes. It's admittedly worse on Linux, but it still occurred on Windows.

Anyways, I spent about 5 hours last night reading bug forums, testing various amdgpu mod parameters, settings in my BIOS, and even re-configuring my fans to provide (potentially) more optimal cooling. None of this really made a difference. I run two 1080p monitors (not exactly breaking the bank here). I had a lot of hope regarding one forum about ring gfx_1.0.0 errors related to how AMD reads the GPU in Linux. My graphics card is detected as: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] and apparently some machines used to accidentally use the total allocated memory for 5700XT instead of the 5600XT. This resulted in some form of corrupt memory allocation. That sort of behavior would make sense for my system since it runs well, but just fails suddenly.
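One way to at least see what memory size the driver thinks the card has: amdgpu reports total VRAM in bytes in /sys/class/drm/card0/device/mem_info_vram_total. Below is a sketch, assuming card0 is the Radeon. A 5600 XT should report about 6 GiB, whereas 8 GiB would match a 5700 XT. This only shows what the driver reports; it does not by itself confirm or rule out the bug described in that thread.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: the Radeon is card0.
VRAM_FILE = Path("/sys/class/drm/card0/device/mem_info_vram_total")
GIB = 1024 ** 3
EXPECTED_GIB = 6  # RX 5600 XT: 6 GB of VRAM (the RX 5700 XT has 8 GB)


def vram_gib(raw: str) -> float:
    """Convert the byte count amdgpu reports into GiB."""
    return int(raw.strip()) / GIB


def matches_expected(raw: str, tolerance_gib: float = 0.5) -> bool:
    """True if the reported VRAM is within tolerance_gib of EXPECTED_GIB."""
    return abs(vram_gib(raw) - EXPECTED_GIB) <= tolerance_gib


def main() -> None:
    if not VRAM_FILE.exists():
        print(f"{VRAM_FILE} not found; adjust the card number")
        return
    raw = VRAM_FILE.read_text()
    verdict = "as expected" if matches_expected(raw) else "UNEXPECTED"
    print(f"driver reports {vram_gib(raw):.2f} GiB of VRAM ({verdict})")


if __name__ == "__main__":
    main()
```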

Other errors I've seen are:

Feb 04 20:17:01 computer kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=116669, emitted seq=116671
Feb 04 20:17:01 computer kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process GameThread pid 3668 thread redDispatcher12 pid 3684
...
Feb 04 20:26:16 computer kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=34068, emitted seq=34071
Feb 04 20:26:16 computer kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process GameThread pid 4208 thread redDispatcher13 pid 4232
Feb 04 20:26:17 computer kernel: [drm:do_aquire_global_lock.isra.0 [amdgpu]] *ERROR* [CRTC:77:crtc-0] hw_done or flip_done timed out
...
Feb 04 21:00:43 computer kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring comp_1.3.0 timeout, signaled seq=3085, emitted seq=3086
Feb 04 21:00:43 computer kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process GameThread pid 3771 thread redDispatcher8 pid 3783
...
Feb 04 22:28:50 computer kernel: [drm:amdgpu_device_ip_early_init [amdgpu]] *ERROR* early_init of IP block  failed -19
Feb 04 22:28:50 computer kernel: [drm:amdgpu_device_ip_early_init [amdgpu]] *ERROR* early_init of IP block  failed -19
Feb 04 22:36:57 computer kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=171774, emitted seq=171776
Feb 04 22:36:57 computer kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process GameThread pid 4122 thread redDispatcher5 pid 4131
...
Feb 04 22:45:46 computer kernel: [drm:do_aquire_global_lock.isra.0 [amdgpu]] *ERROR* [CRTC:77:crtc-0] hw_done or flip_done timed out
Feb 04 22:45:56 computer kernel: [drm:do_aquire_global_lock.isra.0 [amdgpu]] *ERROR* [CRTC:80:crtc-1] hw_done or flip_done timed out
Feb 04 22:46:19 computer kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring comp_1.1.0 timeout, signaled seq=123, emitted seq=124
Feb 04 22:46:19 computer kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process GameThread pid 4187 thread redDispatcher8 pid 4202
...
Feb 04 23:49:45 computer kernel: [drm:gfx_v10_0_priv_reg_irq [amdgpu]] *ERROR* Illegal register access in command stream
Feb 04 23:49:45 computer kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=435155, emitted seq=435157
Feb 04 23:49:45 computer kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process GameThread pid 3668 thread redDispatcher12 pid 3690
...
Feb 04 23:58:58 computer kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=66268, emitted seq=66270
Feb 04 23:58:58 computer kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process GameThread pid 4180 thread redDispatcher11 pid 4196
Feb 04 23:58:58 computer kernel: [drm:do_aquire_global_lock.isra.0 [amdgpu]] *ERROR* [CRTC:77:crtc-0] hw_done or flip_done timed out

^ These are all errors which occurred from various tests of amdgpu module settings and/or BIOS settings. The common thread is some form of ring XXXX timeout.
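To see which rings time out, and how often, across many test runs, the timeout lines can be tallied. This is a sketch, assuming two things. First, the journal was saved to a text file. Second, the lines keep the format shown above ("ring <name> timeout, signaled seq=..., emitted seq=..."). The script name ring_timeouts.py is hypothetical. The difference between emitted and signaled is the number of submissions on that ring that had not completed when the timeout fired.

```python
from __future__ import annotations

import re
import sys
from collections import Counter
from pathlib import Path

# Matches e.g. "ring gfx_0.0.0 timeout, signaled seq=116669, emitted seq=116671"
TIMEOUT = re.compile(r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)")


def ring_timeouts(log_text: str) -> Counter:
    """Count amdgpu ring timeouts per ring name (gfx_0.0.0, comp_1.3.0, ...)."""
    return Counter(m.group(1) for m in TIMEOUT.finditer(log_text))


def pending_at_timeout(log_text: str) -> list[int]:
    """For each timeout, emitted minus signaled: submissions not yet completed."""
    return [int(m.group(3)) - int(m.group(2)) for m in TIMEOUT.finditer(log_text)]


def main() -> None:
    if len(sys.argv) != 2:
        print("usage: ring_timeouts.py <saved journalctl output>")
        return
    text = Path(sys.argv[1]).read_text(errors="replace")
    for ring, count in ring_timeouts(text).most_common():
        print(f"{ring}: {count} timeout(s)")


if __name__ == "__main__":
    main()
```

On the excerpts above, gfx_0.0.0 dominates, with occasional compute ring (comp_*) timeouts.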

These two threads seemed like my best chance, but their proposed solutions didn't help:

  1. https://bugzilla.kernel.org/show_bug.cgi?id=201957
  2. https://bugzilla.kernel.org/show_bug.cgi?id=202665#c7

Update: today I was able to update to kernel 6.7.5 and the issue disappeared for me.

I don't know if you've tried it yet, but having recently installed 6.7.3 I noticed a whole lot of amdgpu fixes in the changelog. Maybe it will help?

Linux is cursed for me

Sorted that for you

Not really. I've been using Linux as my daily driver since about 2006. My Intel laptops and the Nvidia GPUs I mentioned work fine.