If you had to redo your self hosting setup, what would you do differently this time around?

hogofwar@lemmy.world to Selfhosted@lemmy.world – 142 points

I would document everything as I go.

I am a hobbyist running a Proxmox server with a Docker host for my media server, a Plex host, a NAS host, and a Home Assistant host.

I feel that if it were to break, it would take me a long time to rebuild.

Ansible everything and automate as you go. It is slower, but if it's not your first time setting something up it's not too bad. Right now I literally couldn't care less if the SD card on one of my Raspberry Pis dies, or my monitoring backend needs to be reinstalled.
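As a rough sketch of what that looks like, assuming an inventory group called `pis` with Debian-based hosts (the group name, paths, and stack name are made up for illustration):

```yaml
# site.yml: re-runnable, so rebuilding a dead SD card is just one command:
#   ansible-playbook -i inventory site.yml
- hosts: pis
  become: true
  tasks:
    - name: Install Docker
      ansible.builtin.apt:
        name: docker.io
        state: present
    - name: Push the monitoring stack's compose directory
      ansible.builtin.copy:
        src: files/monitoring/
        dest: /opt/monitoring/
```

Because every task is idempotent, re-running the play on a fresh OS image converges it back to the same state with no manual steps.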

IMO ansible is over kill for my homelab. All of my docker containers live on two servers. One remote and one at home. Both are built with docker compose and are backed up along with their data weekly to both servers and third party cloud backup. In the event one of them fails I have two copies of the data and could have everything back up and running in under 30 minutes.
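A minimal sketch of that kind of backup, assuming each service lives in its own directory containing its compose file and bind-mounted data (the layout and paths are hypothetical):

```shell
# backup_stacks: archive each service directory (compose file + data)
# into a dated tarball, ready to rsync to the other server or the cloud.
backup_stacks() {
  stack_dir="$1"
  backup_dir="$2"
  mkdir -p "$backup_dir"
  for svc in "$stack_dir"/*/; do
    [ -d "$svc" ] || continue            # skip if the glob matched nothing
    name=$(basename "$svc")
    tar -czf "$backup_dir/$name-$(date +%F).tar.gz" -C "$stack_dir" "$name"
  done
}
```

Restoring on the second box is then just untarring the archive and running `docker compose up -d` in the service directory.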

I also don’t like that Ansible is owned by RedHat. They’ve shown recently they have zero care for their users.

if by "their users" you mean people who use rebuilds of RHEL ig

I didn't realize that about Ansible. I've always thought it was overkill for me as well, but I figured I'd learn it eventually. Not anymore lol.

I would have taken a deep dive into docker and containerised pretty much everything.

Converting my environment to be mostly containerized was a bit of a slow process that taught me a lot, but now I can try out new applications and configurations at such an accelerated rate it's crazy. Once I got the hang of Docker (and Ansible) it became so easy to try new things, tear them down and try again. Moving services around, backing up or restoring data is way easier.

I can't overstate how impactful containerization has been to my self hosting workflow.

I'm mostly Docker. I want to self-host Lemmy but there's no one-click Docker Compose / Portainer installer yet (for SWAG / Nginx Proxy Manager), so I won't until it's ready.

Same for me. I've known about Docker for many years now but never understood why I would want to use it when I can just as easily install things directly and just never touch them. Then I ran into dependency problems where two pieces of software required different versions of the same library. Docker just made this problem completely trivial.

Same, but I've never once touched Docker and am doing everything old skool on top of Proxmox. Others may or may not like this approach, but it has many of the benefits in terms of productivity (ease of experimentation, migration, upgrade etc)

I wouldn't change anything, I like fixing things as I go. Doing things right the first time is only nice when I know exactly what I'm doing!

That being said, in my current environment I made a mistake when I discovered Docker Compose. I saw how wonderfully simple it made deployment and how it helped with version control, and decided to dump every single service into one singular docker-compose.yaml. Next time I would separate services into at least their relevant categories, for ease of making changes later.

Better yet I would automate deployment with Ansible... But that's my next step in learning and I can fix both mistakes while I go next time!

I do the same. I use the Caddy reverse proxy, and find it useful to use the container name for the URL, with no ports exposed.
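For anyone curious what that looks like: a Caddyfile along these lines works when Caddy and the app share a Docker network, since Docker's internal DNS resolves container names (the hostname and container name here are placeholders):

```
jellyfin.example.com {
    # "jellyfin" is the container name; 8096 is the port inside the container,
    # so nothing needs to be published on the host at all
    reverse_proxy jellyfin:8096
}
```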

What is the benefit for making changes with separate files?

If you have relevant containers (e.g. the *arr stack) then you can bring all of them up with a single docker compose command (or pull fresh versions etc.). If everything is in a single file then you have to manually pull/start/stop each container or else you have to do it to everything at once.
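Concretely, the split might look like one compose file per category, e.g. a hypothetical `media/docker-compose.yml` (image names are just examples):

```yaml
# media/docker-compose.yml
# `docker compose -f media/docker-compose.yml up -d` starts or updates
# only this category, leaving the other stacks untouched.
services:
  sonarr:
    image: lscr.io/linuxserver/sonarr
    restart: unless-stopped
  radarr:
    image: lscr.io/linuxserver/radarr
    restart: unless-stopped
```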

This. In addition, I've read that it's best practice, as it makes adding and removing services less of a pain.

You're not messing with stacks that benefit from extended uptime just to mess around with a few new projects. Considering my wife uses networks that the homelab influences, it would be a smarter choice for me long term to change things up.

Go with used & refurb business PCs right out of the gate instead of fucking around with SBCs like the Pi.

Go with "1-liter" aka Ultra Small Form Factor right away instead of starting with SFF. (I don't have a permanent residence at the moment so this makes sense for me)

Ah, but now you have a stack of Pis to screw around with, separate from all the stuff you actually use.

I should have learned Ansible earlier.

Docker compose helped me get started with containers but I kept having to push out new config files and manually cycle services. Now I have Ansible roles that can configure and deploy apps from scratch without me even needing to back up config files at all.
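A role task along these lines replaces the push-and-restart cycle: re-running the play re-templates the config and recreates the container when anything changed. The module is from the `community.docker` collection; the app, paths, and handler name are illustrative:

```yaml
# roles/grafana/tasks/main.yml (sketch)
- name: Template the app config
  ansible.builtin.template:
    src: grafana.ini.j2
    dest: /opt/grafana/grafana.ini
  notify: restart grafana

- name: Run the container
  community.docker.docker_container:
    name: grafana
    image: grafana/grafana
    restart_policy: unless-stopped
    volumes:
      - /opt/grafana:/etc/grafana
```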

Most of my documentation has gone away entirely, I don't need to remember things when they are defined in code.

For me:

  • Document things (configs, ports, etc) as I go
  • Uniform folder layout for everything (my first couple of servers were a bit wild-westy)
  • Choosing and utilizing some reasonable method of assigning ports to things. I do not even want to explain what I need to do when I forget what port something in this setup is using.

I would have gone with an Intel CPU to make use of iGPU for transcoding and probably larger hard drives.

I also would have written down my MariaDB admin password... Whoops

I already did a few months ago. My setup was a mess: everything tacked onto the host OS, some stuff installed directly, other things in Docker, and the firewall was just a bunch of hand-written iptables rules...

I got a newer motherboard and CPU to replace my ageing i5-2500K, so I decided to start from scratch.

First order of business: Something to manage VMs and containers. Second: a decent firewall. Third: One app, one container.

I ended up with:

  • Proxmox as VM and container manager
  • OPNsense as firewall. The server has 3 network cards (1 built-in, 2 on PCIe slots); the 2 add-on cards are passed through to OPNsense, and the built-in one is for managing Proxmox and for the containers.
  • A whole bunch of LXC containers running all sorts of stuff.

Things look a lot more professional and clean, and it's all much easier to manage.

Does that setup allow access to PCIe GPUs for CUDA inference from containers or VMs?

Yes, you can pass through any GPU to containers pretty easily, and if you are starting with a new VM you can also pass through easily there, but if you are trying to use an existing VM you can run into problems.

Can't say anything about CUDA because I don't have Nvidia cards nor do I work with AI stuff, but I was able to pass the built-in GPU on my Ryzen 2600G to the Jellyfin container so it could do hardware transcoding of videos.

You need the drivers for the GPU installed on the host OS, then you link the devices under /dev into the container. For AMD this is easy, because the drivers are open source and included in the distro (Proxmox is Debian-based); for Nvidia you'd have to deal with the proprietary stuff both on the host and in the containers.
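On Proxmox that linking is a couple of lines in the container's config file. The container ID below is an example, and the device major number is the standard one for /dev/dri render devices; check `ls -l /dev/dri` on your own host:

```
# /etc/pve/lxc/101.conf
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
```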

Instead of a 4-bay NAS, I would have gone with a 6-bay.

You only realize just how expensive it is to expand on your space when you have to REPLACE HDDs rather than simply adding more.

Yes, but you'll be wishing you had 8 bays when you fill the 6 :) At some point you have to replace disks to really increase space, so don't make your RAID volumes consist of more disks than you can reasonably afford to replace at one time.

Second lesson: if you have spare drive bays, use them as part of your upgrade strategy, not as additional storage. I started this last iteration with 6x3TB drives in a raidz2 vdev, then opted to add another 6x3TB vdev instead of biting the bullet and upgrading. Now, to add more storage, I need to replace 6 drives. Instead, I built a second NAS to back up the primary, and am pulling all 12 disks and dropping back to 6. If/when I increase storage, I'll drop 6 new ones in and MOVE the data instead of adding capacity.
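For reference, the replace-to-grow path on ZFS is roughly this (pool and device names are placeholders, and this is an untested sketch):

```
zpool set autoexpand=on tank         # let the vdev grow once all disks are bigger
zpool replace tank sda new-disk      # swap one disk and wait for the resilver...
zpool status tank                    # ...then repeat for each remaining disk in the vdev
```

The vdev only expands after every member disk has been replaced, which is exactly why a 6-wide vdev means buying 6 drives at once.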

This. And build my own instead of going with synology.

I ended up getting a Raspberry Pi 4 and the Argon EON case. It all goes through one USB 3 channel, however, and for some reason I am stuck at 10MB/s transfer speeds even though the USB 3 standard supports much more.

I would like an SBC which supports SATA. I suppose there is the Raspberry Pi CM4, although there are no cases for it that support multiple drives.

I've got the Argon ONE V2 with an M.2 drive. It works well, though I haven't tested speeds. It's not used as a NAS though.

I've been pretty happy with my Synology NAS. Literally trouble-free, worry-free, and "just works". My only real complaint is them getting rid of features in the Photos app, which is why I'm still on their old OS.

But I'd probably build a second NAS on the cheap, just to see how it compares :)

What OS would you go with if you had to build one?

I'm happy with Synology too, for the most part, but I'd like a bit more flexibility. I'd probably build one and use TrueNAS or Unraid.

Set up for high availability. I have a hard time taking things down now, since other people rely on my setup being up.

Actually plan things and research. Too many of my decisions come back to bite me because I don't plan out stuff like networking, resources, or hard drive layouts.

also documentation for sure

I always redo it lol, which is kind of a waste but I enjoy it.

Maybe a related question is what I wish I could do if I had the time (which I will do eventually; some of it I plan to do very soon):

  • self host wireguard instead of using tailscale
  • self host an ACME-like setup for self-signed certificates for TLS and HTTPS
  • self host encrypted git server for private stuff
  • set up a file watcher on clients to sync my notes on-save automatically using rsync (yes, I know I can use Syncthing. Don't wanna!)
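The watcher idea can be sketched two ways: event-driven, with `inotifywait` piping filenames into `rsync`, or, as below, a tiny poll-based helper that copies anything modified since the last run. Plain `cp` stands in for `rsync` so the sketch is self-contained; in practice you'd rsync to the remote, and all paths here are hypothetical:

```shell
# sync_changed SRC DEST STAMP: copy files modified since the stamp file was
# last touched, then refresh the stamp. Run it from a loop or a systemd timer.
sync_changed() {
  src="$1"; dest="$2"; stamp="$3"
  mkdir -p "$dest"
  [ -f "$stamp" ] || : > "$stamp"   # first run only establishes the baseline
  # swap `cp` for e.g. `rsync -az` to a remote host in a real setup
  find "$src" -type f -newer "$stamp" -exec cp -t "$dest" {} +
  touch "$stamp"
}
```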

WireGuard is super quick and easy to set up and use; I'd highly recommend doing that now. I don't understand the recent obsession with Tailscale, apart from bypassing CGNAT.

Tailscale is an abstraction layer built on top of WireGuard. It handles things like assigning IP addresses, sharing public keys, and building a mesh network without you having to do any manual work. People like easy solutions, which is why it's popular.

To manually build a mesh with Wireguard, every node needs to have every other node listed as a peer in their config. I've done this manually before, or you could automate it (eg using Ansible or a tool specifically for Wireguard meshes). With Tailscale, you just log in using one of their client apps, and everything just works automatically.
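For a concrete picture, each node's config in a hand-built mesh looks like this, with one `[Peer]` block per other node (all keys, addresses, and hostnames below are placeholders):

```
# /etc/wireguard/wg0.conf on node A
[Interface]
PrivateKey = <node-a-private-key>
Address = 10.0.0.1/24
ListenPort = 51820

# node B: a mesh of N nodes needs N-1 of these blocks on every node
[Peer]
PublicKey = <node-b-public-key>
AllowedIPs = 10.0.0.2/32
Endpoint = b.example.com:51820
```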

self host wireguard instead of using tailscale

You can self-host a Headscale server, which is an open-source implementation of the Tailscale coordination server. The Tailscale client apps can connect to it.

What is the downside of using tailscale over wireguard?

I don't think there's any significant downsides. I suppose you are dependent on their infrastructure and uptime. If they ever go down, or for any reason stop offering their services, then you're out of luck. But yeah that's not significant.

The reason I want to do this is it gives me more control over the setup in case I ever wanted to customize it or the wireguard config, and also teaches me more in general, which will enable me to better debug.

I suppose you are dependent on their infrastructure and uptime

AFAIK their infra is only used for configuring the VPN. The VPN itself is a regular peer-to-peer Wireguard VPN. If their infra goes down while a VPN tunnel is connected, the tunnel should keep working. I've never tested that, though.

You can self-host your own Headscale server to avoid using their infra.

Make sure my proxmox desktop build can do GPU passthrough.

My current homelab is running on a single Dell R720xd with 12x6TB SAS HDDs. I have ESXi as the hypervisor, with a pfSense gateway and a TrueNAS Core VM. It's compact, has lots of redundancy, can run everything I want and more, has IPMI, and ECC RAM. Great, right?

Well, it sucks back about 300w at idle, sounds like a jet engine all the time, and having everything on one machine is fragile as hell.

Not to mention the Aruba Networks switch and Eaton UPS that are also loud.

I had to beg my dad to let it live at his house, because no matter what I did (custom fan curves, better C-state management, a custom enclosure with sound isolation and ducting), I could not dump heat fast enough to make it quiet, and it was driving me mad.

I'm in the process of doing it better. I'm going to build a small NAS using consumer hardware and big, quiet fans; I have a fanless N6005 box as a gateway; and I'm going to convert my old gaming machine to a hypervisor using Proxmox, with each VM managed with either Docker Compose, Ansible, or NixOS.

...and I'm now documenting everything.

I’ve had an R710 at the foot of my bed for the past 4 years and only decommissioned it a couple of months ago. I haven’t configured anything but I don’t really notice the noise. I can tell that it’s there but only when I listen for it. Different people are bothered by different sounds maybe?

I had an r710 before the r720xd. The r710 was totally fine, the r720xd is crazy loud.

Huh that’s interesting, thanks!

That's crazy to me! I had an R710 and that thing was so loud. I could hear it across the house.

For me if I can hear it at all when sitting near it in a quiet room, it's a no-go.

I'm generally pretty happy with it, though I'd have used podman rather than docker if I were starting now.

I'd use Terraform and Ansible from the start. I'm slowly migrating my current setup to these tools, but that's obviously harder than starting from scratch. At least I did document everything in some way. That documentation plus state on the server is definitely enough to do this transition.

Not accidentally buy a server that takes 2.5 inch hard drives. Currently I'm using some of the ones it came with and 2 WD Red drives that I just have sitting on top of the server with SATA extension cables going down to the server.

Get a more powerful but quieter device. My 10th gen NUC is loud and sluggish when a mobile client connects.

I'd put my storage in a proper NAS machine rather than having 25TB strewn across 4 boxes.

I already have to do it every now and then, because I insisted on buying bare-metal servers (at Scaleway) rather than VMs. These things die very abruptly, and I learnt the hard way how important backups and config management systems are.

If I had to redo EVERYTHING, I would use terraform to provision servers, and go with a "backup, automate and deploy" approach. Documentation would be a plus, but with the config management I feel like I don't need it anymore.

Also I'd encrypt all disks.

Also I’d encrypt all disks.

What's the point on a rented VPS? The provider can just dump the decryption key from RAM.

bare metal servers (at scale way) rather than VMs. These things die very abruptly

Had this happen to me with two Dedibox (scaleway) servers over a few months (I had backups, no big deal but annoying). wtf do they do with their machines to burn through them at this rate??

I don't know if they can "just" dump the key from RAM on a bare metal server. Nevertheless, it covers my ass when they retire the server after I used it.

And yeah, I've had quite a few servers die on me (usually the hard drive). At this point I'm wondering if it isn't planned obsolescence to force you into buying their new hardware every now and then. Regardless, I'm slowly moving off Scaleway, as their support is now mediocre in these cases, and their cheapest servers don't support console access anymore, which means you're bound to using their distro.

I’d encrypt all disks. Nevertheless, it covers my ass when they retire the server after I used it.

Good point. How do you unlock the disk at boot time? dropbear-initramfs and enter the passphrase manually every time it boots? Unencrypted /boot/ and store the decryption key in plaintext there?

I run OpenBSD on all my servers, so I would be entering the passphrase manually at boot time. Saving the key on an unencrypted /boot is basically locking your door and leaving the key in it :)
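For the Linux route mentioned in the question, the dropbear-initramfs setup is roughly this on Debian-based systems (package name and paths per recent Debian, the authorized_keys location varies by release; untested sketch):

```
apt install dropbear-initramfs
# add your SSH key for the initramfs environment
echo "ssh-ed25519 AAAA... you@laptop" >> /etc/dropbear/initramfs/authorized_keys
update-initramfs -u
# at boot: SSH into the initramfs, run `cryptroot-unlock`,
# and type the LUKS passphrase remotely
```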

I would use terraform to provision servers, and go with a “backup, automate and deploy” approach. Documentation would be a plus

Yea, this is what I do. Other than my Synology, I use Terraform to provision everything locally. And all my Pi-holes are controlled by Ansible.

Also, everything is documented in Trilium.

Whole server regularly gets backed up multiple times, one is encrypted and the other via syncthing to my local desktop.

Terraform is the only missing brick in my case, but that's also because I still rent real hardware :) I'm not fond of my backup system though; it works, but it's not included in the automated configuration of each service, which is not ideal IMO.

Not go as HAM on commercial server hardware. iLO is really nice for management though...

I have things scattered around different machines (a hangover from my previous network configuration that was running off two separate routers) so I’d probably look to have everything on one machine.

Also I kind of rushed setting up my Dell server and I never really paid any attention to how it was set up for RAID. I also currently have everything running on separate VMs rather than in containers.

I may at some point copy the important stuff off my server and set it up from scratch.

I may also move from using a load balancer to manage incoming connections to doing it via Cloudflare Tunnels.

The thing is there’s always something to tinker with and I’ve learnt a lot building my little home lab. There’s always something new to play around with and learn.

Is my setup optimal? Hell no. Does it work? Yep. 🙂

I'd plan out what machines do what according to their drive sizes, rather than finding out the hard way that the one I used as a mail server only has a few GB spare. I'd certainly document what I have going; if my machine Francesco explodes one day, it'll take months to remember what was actually running on it.

I'd also not risk years of data on a single SSD that just stopped functioning for my "NAS" (it's not really a true NAS, just a shitty drive with a terabyte), and have a better backup plan.

That's a pretty good question: Since I am new-ish to the self-hosting realm, I don't think I would have replaced my consumer router with the Dell OptiPlex 7050 that I decided on. Of course this does make things very secure considering my router is powered by OpenBSD. Originally, I was just participating in DN42 which is one giant VPN semi-mesh network. Out of that hatched the idea to yank stuff out of the cloud. Instead, I would have put the money towards building a dedicated server instead of using my desktop as a server. At the time I didn't realize how cheap older Xeon processors are. I could have cobbled together a powerhouse multi-core, multi-threaded Proxmox or xcp-ng server for maybe around 500-600 bucks. Oh well, lesson learned.

I recently did this for the second time. Started on FreeNAS, switched to TrueNAS Scale when it released and just switched to Debian. Scale was too reliant on TrueCharts which would break and require a fresh install every couple of months. I should've just started with Debian in the first place.

To be honest, nothing. I'm running my home server on a NUC with Proxmox and an 8-bay Synology NAS (though I'm glad that I went with 8 bays back then!). As a router I have OPNsense running on a low-powered mini PC.

All in all I couldn't wish for more (low power, high performance, easy to maintain) for my use case, but I'll soon need storage and RAM upgrades on the Proxmox server.

I would've gone with a less powerful NAS and got a separate unit for compute. I got a Synology NAS with a decent amount of compute so I could run all my stuff on the NAS, and the proprietary locked-down OS drives me a bit nuts; it causes all sorts of issues. If I had a separate compute box I could just be running some flavor of Linux, probably Ubuntu, and have things behave much more nicely.

I have ended up with 6x2TB disks, so if I was starting again I'd go with 2x10TB and use an IT-mode HBA and software RAID 1. I'd also replace my 2 Netgear switches and 1 basic smart TP-Link switch and go full TP-Link Omada for switching, with PoE ports on 2 of them; I have an Omada WAP and it's very good. Otherwise I'm pretty happy.

Use actual NAS drives. Do not use shucked external drives; they are cheaper for a reason and not meant for 24/7 operation. Though I guess they did get me through a couple of years, and hard drive prices seem to keep falling.

Probably splurge just a bit more for CMR hard drives in my ZFS setup. I've had some pretty scary moments in my current setup.

Getting a better rack. My 60cm deep rack with a bunch of rack shelves and no cable management is not very pretty and moving servers around is pretty hard.

Hardware-wise I'm mostly fine with it, although I would use a platform with IPMI instead of AM4 for my hypervisor.

The only real pain point I have is my hard drive layout. I've got a bunch of different drive sizes that are hard to expand on without wasting space or spending a ton.

Depending on your comfort level and setup, you could use LVM. Then the differently sized hard drives wouldn’t be such a problem.
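Roughly, pooling mixed-size drives with LVM looks like this (device names are placeholders; untested sketch):

```
pvcreate /dev/sda /dev/sdb /dev/sdc       # e.g. a 14TB, a 16TB, and a 7TB drive
vgcreate storage /dev/sda /dev/sdb /dev/sdc
lvcreate -l 100%FREE -n data storage      # one logical volume spanning all of them
mkfs.ext4 /dev/storage/data
```

Note this alone gives no redundancy; you'd layer it on top of RAID (or use `lvcreate --type raid1`) if you want that.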

Or if you want a much more complex situation, you could set up Ceph. It will also give you redundancy, but it’s a really steep learning curve.

Or mergerfs if you are not too concerned with performance

Btrfs also allows for mixed-size drives. It's the reason why I use it.

Edit: autocorrect

I'm on btrfs. I have a 14TB, a 16TB, and two 7TB drives in RAID1. I'm running out of space for all my Linux ISOs, and I'd really like to transition to some sort of 3:1 or 4:1 parity RAID, but you're not supposed to use that on btrfs, and I don't see a clear path to a ZFS pool or something.

I built a compact NAS. While it's enough for the drives I need, even for upgrades, I only have 1 PCIe x4 slot, which is becoming a bit limiting. I didn't think I'd have a need for either a tape drive or a graphics card, and I have some things I want to do that require both. Well, I can only do one unless I get a different motherboard and case, which means I'm basically doing a new build, and I don't want to do either of the projects I had in mind enough to bother with that.

I would go smaller with lower-power hardware. I currently have Proxmox running on an R530 for my VMs, plus an external NAS for all my storage. I feel like I could run a few 7050 Micros together with Proxmox and downsize my NAS to use fewer but higher-density disks.

Also, having a 42U rack makes me want to fill it up with UPSes and lots of backup options that could be simplified if I took the time not to Frankenstein my solutions in there. But here we are...