Super weird error, what's happening?

SuperSpruce@lemmy.zip to Linux@lemmy.ml – 56 points –

I'm not sure if this is the best community to post in, but I just bought a used computer and slotted in an RX480 as the GPU. I installed KDE Neon 5.27 on it, and it worked flawlessly for 2 days.

Then, even though it was working earlier today, it slept and then would not wake up. So I turned off the power and turned it back on again, and was greeted with this error screen:

The only prior error message I'd gotten from the system was when I tried to install wine for one application, it told me some packages weren't up to date, without a way to fix it. I can enter the BIOS just fine.

What is going on? How do I fix this?


None of what's visible helps identify the error. Try journalctl -xb as the screen suggests; it might show more relevant information.

Edit: oops, should've been journalctl instead of journal.

I tried to do that, and it couldn't find the journal package. So I tried to install it, but neither apt, flatpak, nor snap could find the package to install.

This was probably supposed to say "journalctl -xb"

Okay, that command works for me. The last line says that /etc/hosts:7: hostname "SuperSpruce_Iron_3900X" is not valid, ignoring.

Not sure if this is the root cause of your boot failure, but underscores in hostnames are not allowed. Letters (A-Z, case-insensitive), digits (0-9) and the hyphen (-) are the only allowed characters.
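If you want to check a name before putting it in /etc/hostname or /etc/hosts, here is a rough sketch of the rule above. The function name is_valid_hostname is made up for this example, and the extra limits (labels of 1 to 63 characters, 253 characters overall, no label starting or ending with a hyphen) are the usual DNS hostname conventions rather than anything taken from the error message.

```shell
# Hypothetical helper (not from the thread): succeeds only if the name is made
# of dot-separated labels of 1-63 characters using letters, digits and hyphens,
# with no label starting or ending in a hyphen.
is_valid_hostname() {
    label='[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?'
    # Overall length limit for a hostname.
    [ "${#1}" -le 253 ] || return 1
    # -x makes grep match the whole line, so no anchors are needed.
    printf '%s\n' "$1" | grep -Eqx "${label}(\.${label})*"
}

# Compare the rejected name with a hyphenated variant.
for name in SuperSpruce_Iron_3900X SuperSpruce-Iron-3900X; do
    if is_valid_hostname "$name"; then
        echo "$name: looks valid"
    else
        echo "$name: not a valid hostname"
    fi
done
```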

See. Stuff like that is why I started going non-Systemd.

You're welcome to use whatever init system you want, but Systemd solves a lot of the bullshit problems and limitations that come from init.d init scripts. Systemd also has a lot of its own bullshit and bloat, but it does an excellent job at actually being an init system and service manager if you know how to properly use it.

solves a lot of the bullshit problems and limitations that come from init.d init scripts.

So do the other ~7 init systems developed since then. And, as far as I know, all of them print their relevant trouble directly to stderr. Who cares about SysV still?

Hey guys, why all the downvotes? Systemd is known for throwing all the irrelevant stuff at you, making it troublesome to debug. Which is why I switched. And I can confirm: Runit, S6, OpenRC and even simple Dinit are way better in that regard (and they do make less trouble generally).

Almost everything you said is mere brochureware perpetuated by a tribe stronger than the vi mafia.

Sysvinit starts fast, starts well, and doesn't try to control mounts, cron, Getty, and everything else.

The "but it retries things" whine was a solved problem in 2001. So easy.

The EL6 machines I have in storage start faster than the el7 machines joining them. PCLinuxOS is a very valid non-systemd system that only lacks a documented kickstart emulant.

Shit's broke yo.

Sleep/wake issues with AMD GPU and platform drivers are super, super, super common. Fish back through your kernel journal after a reboot (journalctl -kb -1 should do it) and look for the driver errors immediately after the wake event. If this has been fixed in a later kernel release, update your kernel; if not, go report it to either the Ubuntu folks or on the amdgpu GitLab.
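To cut the previous boot's kernel log down to the interesting part, something like this sketch might help. The function name filter_wake_errors and the keyword list are my own guesses at what is worth searching for, not an official list, so adjust them to whatever your log actually contains.

```shell
# Hypothetical filter: keeps log lines that mention the GPU drivers or
# suspend/resume activity. Reads log text on stdin, case-insensitively.
filter_wake_errors() {
    grep -iE 'amdgpu|drm|suspend|resume|PM:'
}

# Intended use, with the command suggested above (kernel messages from the
# previous boot; --no-pager keeps journalctl from opening less):
#   journalctl -kb -1 --no-pager | filter_wake_errors
```

Reading the filter's input from stdin means you can also point it at a saved log file, for example one copied off the machine from a live USB.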

Can you remove the GPU and use onboard?

It is mentioning gpu in the errors, so it would be the first thing I would try, to see if the errors change, because I have no idea what's going on here

The computer is running a Ryzen 9 3900X, which does not have onboard graphics unfortunately.

Have you tried booting in with a live usb? You might be able to do some sort of recovery from there.

Having said that, I'm still very much a Linux noob.

Before doing anything, if your screen allows it, swap DP to HDMI or HDMI to DP as output, that may fix this to the point of being able to actually boot and further fix the issue.

I've had this before with drivers where suddenly it would fail on either port but would still run on one of the others.

Try updating your DM / your entire system from the emergency mode.

How do I get into emergency mode?

The Shell where you typed "systemctl reboot" and "exit".

If you are running KDE neon, try "apt update" and "apt upgrade". If it doesn't work, do "sudo apt update" and "sudo apt upgrade".

I ran apt update and some index files failed to download. It was just a warning though.

But systemctl reboot and exit still fail the same way.

Run "journalctl --lines 200" and send photos of output.

NOTE: This is the computer's system log, and it's long (that command selects the last 200 entries), so you might have to scroll down using the PageDown key (or arrow down) in order to take photos of everything.

The RAID1 seems to be failing according to that screenshot. That breaks the "Local File Systems" task and since quite a lot of things tend to depend on that, many things usually end up failing in an annoying cascade failure. It's also failing with a timeout instead of a strict error, which is odd.

Either way, I'd try commenting out that line for /mnt/raid in /etc/fstab for now and seeing if that makes the system boot. It's possible that journalctl -u dev-md0.device or systemctl status dev-md0.device might tell you more (systemd tracks block devices like /dev/md0 as .device units rather than .service units), but it's 50/50 if it'll be anything useful.

How do I edit /etc/fstab if I'm not even able to boot the system? Or am I already booted in the system, just in a CLI environment?

You're most likely booted, otherwise you might need a live USB. Hopefully, the system isn't in read-only mode. What I'd recommend doing is:

cp /etc/fstab /etc/fstab.backup

To make a copy once. Then, nano /etc/fstab to run nano, a basic CLI editor. You can use the arrow keys to navigate and type freely in it. The hints like ^O shown on the bottom mean ctrl+o.

You'd use the arrow keys to go down to the line that probably says /dev/md0 /mnt/raid morecrap, put a # in front of it, then press ctrl+o and enter to save. If that worked, ctrl+x to exit and try a reboot again.

Obviously can't promise this is "the" error preventing the system from booting, but it's generally a good idea to disable broken stuff like this to get the system working again, then fix it from there. Hopefully, this does the trick. Your RAID setup will not be activated on reboot after you do this but it's not going to permanently delete data or anything.
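If you'd rather not do it by hand in nano, the same backup-then-comment-out steps can be sketched as a small script. The function name comment_out_mount is made up for this example, and it assumes the entry can be identified by its mount point (the second field); try it on a copy of the file first.

```shell
# Hypothetical helper: comments out every active fstab line whose second
# field (the mount point) equals the given mount point, after saving a
# backup copy next to the original.
comment_out_mount() {
    fstab=$1
    mountpoint=$2
    cp "$fstab" "$fstab.backup" || return 1
    # Read from the backup and rewrite the original file.
    awk -v mp="$mountpoint" '
        $1 !~ /^#/ && $2 == mp { print "#" $0; next }  # disable matching entry
        { print }                                      # keep everything else
    ' "$fstab.backup" > "$fstab"
}

# Intended use (as root), matching the steps above:
#   comment_out_mount /etc/fstab /mnt/raid
```

To undo it, copy the .backup file back over the original.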

I used nano to edit /etc/fstab and commented out the last line and the system booted into GUI mode!

This leaves me with some questions:

  1. Why does fstab fail to mount the NTFS raid array?
  2. Why does the raid array failing to mount block the EDID signal? It's not like the OS lives on the raid array.
  3. How do I properly mount the raid array and how do I automate it every boot if I can't use fstab?

Looks like you need to look for messages about /dev/md0 and why it may be timing out. Also maybe add nofail to the raid entry in fstab so you can still boot if it fails, as long as the root fs is not on it (is root on NTFS possible or good?)

I don't think the edid message is a problem, just an artifact of your monitor not talking to your video card?

Maybe NTFS is the problem, I think it needs special options to automatically remove the dirty bit and replay the journal
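For the nofail idea, here is a cautious sketch that only prints what the edited fstab would look like, without touching the file. The function name print_with_nofail is invented for this example, and it assumes the entry has the usual layout with the mount options in the fourth field. Note that awk rewrites the changed line with single spaces between fields, so any column alignment on that line is lost.

```shell
# Hypothetical helper: prints the given fstab file with "nofail" appended to
# the options (fourth field) of the active entry mounted at the given mount
# point. The file itself is not modified.
print_with_nofail() {
    awk -v mp="$2" '
        # Only touch active entries for this mount point that lack nofail.
        $1 !~ /^#/ && $2 == mp && $4 !~ /(^|,)nofail(,|$)/ { $4 = $4 ",nofail" }
        { print }
    ' "$1"
}

# Intended use: review the output before copying the line into /etc/fstab.
#   print_with_nofail /etc/fstab /mnt/raid
```

With nofail set, a mount that fails is not supposed to hold up the rest of the boot. It doesn't explain why the array times out, though.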

Note: The computer has an SSD where the OS lives and two HDDs, sda and sdb, set up in RAID 1 because the computer is 3.5 years old.


Before the "systemctl" command: try removing the GPU and booting up without it. If it works, you can skip the "systemctl" commands.


Read the messages on the screen. They tell you how to check the logs for the error.