System sometimes crashes with CPU error
Hi,
My system sometimes crashes suddenly and reboots itself. It's random: browsing the web, idling, checking mail. I couldn't find the trigger. This is the only log I could find about the crash:
mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 5: baa0000000030150
microcode: CPU23: patch_level=0x0a201025
fbcon: Taking over console
mce: [Hardware Error]: TSC 0 MISC d012000100000000 SYND 4d000002 IPID 500b000000000
mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1689019332 SOCKET 0 APIC 2 microcode a201025
EDIT: my thermals are fine btw: 40°C at idle and 70°C max under heavy load.
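To see what the first log line is saying, the 64-bit value after "Bank 5:" is the bank's status register. Below is a minimal sketch that decodes only the architectural status bits (63 down to 57) that Intel's and AMD's machine-check architecture share, plus the low 16-bit MCA error code and the next 16 model-specific bits. The helper names and the regex are hypothetical, written for the exact line format shown above; the bank-specific meaning of the model-specific field is not decoded here.

```python
import re

# Architectural MCi_STATUS / MCA_STATUS bits common to Intel and AMD.
STATUS_BITS = [
    (63, "VAL", "register contents are valid"),
    (62, "OVER", "another error occurred while this one was still logged"),
    (61, "UC", "error was not corrected"),
    (60, "EN", "error reporting was enabled"),
    (59, "MISCV", "MISC register holds extra information"),
    (58, "ADDRV", "ADDR register holds an address"),
    (57, "PCC", "processor state may be corrupted"),
]

# Matches e.g. "CPU 1: Machine Check: 0 Bank 5: baa0000000030150" (hypothetical pattern).
BANK_LINE = re.compile(r"CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-f]{16})")


def decode_status(status):
    """Split a bank status value into set flag names and its error-code fields."""
    return {
        "flags": [name for bit, name, _ in STATUS_BITS if (status >> bit) & 1],
        "mca_error_code": status & 0xFFFF,           # bits 15:0
        "model_specific": (status >> 16) & 0xFFFF,   # bits 31:16
    }


def parse_bank_line(line):
    """Return (cpu, bank, status) from an mce log line, or None if it doesn't match."""
    m = BANK_LINE.search(line)
    if m is None:
        return None
    cpu, bank, status = m.groups()
    return int(cpu), int(bank), int(status, 16)


if __name__ == "__main__":
    line = "mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 5: baa0000000030150"
    cpu, bank, status = parse_bank_line(line)
    info = decode_status(status)
    print(f"CPU {cpu}, bank {bank}: {' '.join(info['flags'])}, "
          f"MCA code {info['mca_error_code']:#06x}, "
          f"model-specific {info['model_specific']:#06x}")
```

For this log it reports VAL, UC, EN, MISCV and PCC: an uncorrected error with possibly corrupted processor state, which the kernel cannot recover from, so a sudden reboot is the expected outcome.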
The only software thing to try would be making sure your CPU's microcode and BIOS/firmware are up to date.
Otherwise, that definitely points to a hardware issue. It could be the PSU if you have somewhat unclean power where you are; it takes just a tiny dip to cause the CPU to miscompute and report a machine check error.
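A quick way to see which microcode and BIOS versions are actually loaded on Linux is to read them from /proc/cpuinfo (the "microcode" field on x86) and the DMI files under /sys/class/dmi/id. This is a minimal sketch with hypothetical helper names; it only reports the versions in use, and you still have to compare them against the board vendor's latest BIOS and your distro's microcode package yourself.

```python
from pathlib import Path


def microcode_revisions(cpuinfo_text):
    """Collect the distinct 'microcode' values from /proc/cpuinfo text (x86 only)."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


def read_text_or_none(path):
    """Read a small text file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cpuinfo = read_text_or_none("/proc/cpuinfo") or ""
    # All cores should normally report the same revision.
    print("microcode:", sorted(microcode_revisions(cpuinfo)) or "not reported")
    print("BIOS version:", read_text_or_none("/sys/class/dmi/id/bios_version"))
    print("BIOS date:", read_text_or_none("/sys/class/dmi/id/bios_date"))
```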
My microcode and BIOS are up to date. Yesterday I had a power outage while the PC was on, so maybe the PSU got damaged?
Did it start after that, or was it happening before? Also, did you update anything since?
If you didn't update your computer, changed nothing, and it definitely started after the power outage, then yes, the clues definitely point towards the PSU.
It's really a process of elimination: if you had it before the power outage, then it can't be the power outage. If it started after but you also installed a bunch of updates, now you have two potential things to blame.
Yes, I'm 99% sure it started after the power outage. In any case, I disabled PBO on my CPU, and if it restarts again I will look at the PSU. Thanks for your support.
Woohoo, an MCE. If it's always the same core, you could disable it with something like 'echo 0 > /sys/devices/system/cpu/cpu3/online'
This would have to be run on every boot; there may be kernel options to do the same thing.
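The echo trick above can be wrapped in a small script. This is a minimal sketch with hypothetical helper names, assuming a Linux kernel with CPU hotplug support. One assumption worth flagging: "CPU 1" in the log is a logical CPU, and with SMT its sibling thread shares the same physical core, so the sketch also reads topology/thread_siblings_list to find every logical CPU on that core. Read the siblings before offlining, since an offline CPU may no longer expose its topology directory.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '1,13' or '0-3,8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        start, _, end = part.partition("-")
        cpus.update(range(int(start), int(end or start) + 1))
    return sorted(cpus)


def thread_siblings(cpu, root=SYSFS_CPU):
    """Logical CPUs sharing a physical core with `cpu` (including `cpu` itself)."""
    path = root / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    return parse_cpu_list(path.read_text())


def set_cpu_online(cpu, online, root=SYSFS_CPU):
    """Same effect as 'echo 0|1 > .../cpuN/online'; needs root on a real system."""
    control = root / f"cpu{cpu}" / "online"
    if not control.exists():
        # Some CPUs (often cpu0 on x86) expose no 'online' file and cannot be taken offline.
        raise FileNotFoundError(f"{control} is missing; cpu{cpu} cannot be hot-unplugged")
    control.write_text("1" if online else "0")
```

Run as root, something like `for c in thread_siblings(1): set_cpu_online(c, False)` would take the whole core offline. As noted above, the change does not survive a reboot.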
TIL
I can save like 20 W per real core. Nice tip for a home server.
Lol.
This is barbaric and I love it.
Lol, those cores are totally there for redundancy... Right? :P
I have an old Itanium server that 'boots' with like 3/8 working cores... Unfortunately, the hardware has some other unknown issues that panic Linux shortly after loading. Somehow the EFI system seems to be stable...
I recently replaced my computer. How long have you been using yours?
It's more than 5 years old, but I upgraded many parts.