How to remotely reboot a Linux host if SSH fails to connect?

sylverstream@lemmy.nz to Selfhosted@lemmy.world – 65 points –

Edit2: Thanks all for your responses! I have checked the logs, https://lemmy.nz/comment/6192604, and based on that removed tracker-miner-fs as it's a search/index tool which I don't need. No idea why it took over all memory. I'll also get a WiFi Smartplug as a kill switch. Hopefully that solves it. Thanks again heaps!


I've got an HP ProDesk G3 that I'm using as a home server, running Ubuntu. Earlier this week the services I host on it (Immich & Frigate) stopped. I tried to SSH in, but it just hung after asking for a password. I could ping it, but it was otherwise unresponsive.

I had to force reboot it manually. This is fine, but I'm not always at home.

The chip has Intel vPro as far as I know, which could be an option, but I have no idea how it works. The documentation on the Intel site seems focused on enterprises. I tried to connect with RealVNC, which did not work, so I think I've got to install/configure something on the server first.

I also asked Bing Chat, but it came up with non-existent packages & commands. I'd welcome your thoughts!

/edit: I just found this, which seems to be exactly what I need: https://manpages.ubuntu.com/manpages/focal/en/man7/amt-howto.7.html

Check if your motherboard has a watchdog function. If the OS can't ping the watchdog every 5 min or whatever you set it to, the board resets.

This is how we handled camera servers at one of my former jobs. We just set up HP SFF desktops with Windows and the software and turned on the watchdog timer. It always did the trick when power outages or system hangs happened.

Thanks, I've got an HP SFF as well. Not 100% sure how to turn it on from Ubuntu though. There's a software-based version: https://manpages.ubuntu.com/manpages/xenial/en/man8/watchdog.8.html

But I guess that's not the one using the motherboard watchdog function.
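
One way to see which watchdog the kernel has actually registered is to look under `/sys/class/watchdog`. Below is a minimal Python sketch, assuming the kernel exposes watchdog sysfs attributes (not every build does). `list_watchdogs` is a hypothetical helper name, not part of any package.

```python
from pathlib import Path


def list_watchdogs(sysfs_root="/sys/class/watchdog"):
    """Return {device_name: identity} for each watchdog the kernel registered."""
    found = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        # No watchdog driver loaded, or this kernel has no watchdog class in sysfs.
        return found
    for entry in sorted(root.iterdir()):
        identity_file = entry / "identity"
        # The identity attribute is only present when the kernel was built
        # with watchdog sysfs support (assumption, see the note above).
        if identity_file.is_file():
            found[entry.name] = identity_file.read_text().strip()
        else:
            found[entry.name] = "unknown"
    return found
```

Calling `print(list_watchdogs())` on the server should give a hint: an identity such as `iTCO_wdt` (the driver for the timer found on many Intel chipsets) would likely point to a hardware timer, while `Software Watchdog` typically comes from the `softdog` module. An empty result may just mean no driver is loaded or the BIOS setting is off.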

You need an app running in the OS and a setting in the BIOS. The app at the OS level sends a heartbeat to the watchdog module on the motherboard. If it misses some heartbeats, the firmware on the motherboard sends the reset command.
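
The OS-side heartbeat described above can be sketched against the Linux watchdog device. This is a minimal illustration, not a replacement for the `watchdog` daemon. It assumes a driver has created `/dev/watchdog`, that the script runs as root, and that the hardware timeout is longer than the chosen interval. `feed_watchdog` and its parameters are hypothetical names.

```python
import time


def feed_watchdog(device_path="/dev/watchdog", interval_s=10.0,
                  beats=None, sleep=time.sleep):
    """Send a heartbeat to the watchdog device every interval_s seconds.

    beats=None runs until interrupted; an integer stops after that many
    heartbeats and then disarms the timer.
    """
    sent = 0
    # Opening the device arms the timer. buffering=0 makes every write
    # reach the driver immediately instead of sitting in a Python buffer.
    with open(device_path, "wb", buffering=0) as dev:
        while beats is None or sent < beats:
            dev.write(b"\0")  # any write resets the countdown
            sent += 1
            sleep(interval_s)
        # "Magic close": writing 'V' right before closing asks the driver
        # to disarm. This line is only reached on a clean exit; if the
        # process hangs or is killed, the timer keeps running and the
        # board resets once it expires.
        dev.write(b"V")
    return sent
```

Running `feed_watchdog()` with the defaults would keep the timer fed until the process stops. The `V` is deliberately written only on a clean exit, so a hung system still triggers the reset. Some drivers can be configured to ignore the magic close, in which case the timer cannot be stopped once armed.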

This is how you lose data. Hope you have a good backup on a NAS?

No, this is a tool that can be used in a well-designed architecture. Would I do this with a single database server? Probably not. Would I ever run a single database server? Also probably not.

Also, by this point, you've probably already kernel panicked or something. There's not much left that can be saved and you probably needed that backup five minutes before the host came up.