I am using Talos Linux. It’s a Linux distribution built specifically for Kubernetes. It’s been super simple to run and very reliable so far.
Also, if I mess something up badly, it’s really fast to get back to a clean k8s install.
For storage, I am using local-path-provisioner for fast local NVMe storage. I run Postgres with CNPG, which handles clustering at the Postgres level, so the volumes themselves don’t need to be redundant. https://github.com/rancher/local-path-provisioner
There is an open issue about backing up PVCs: https://github.com/rancher/local-path-provisioner/issues/85
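To make the local-path plus CNPG combination a bit more concrete, here is a minimal sketch that builds a CNPG `Cluster` manifest pointing at local storage. It assumes the CNPG operator is already installed and that the provisioner's StorageClass uses its default name, `local-path`. The cluster name, instance count, and volume size are made-up placeholders, not values from my setup.

```python
import json


def cnpg_cluster(name, instances=3, size="10Gi", storage_class="local-path"):
    """Build a CloudNativePG Cluster manifest as a plain dict.

    All defaults are illustrative assumptions: "local-path" is the
    StorageClass name local-path-provisioner ships with by default,
    and the instance count and size are placeholders.
    """
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            # CNPG replicates between instances, so each one can sit on
            # its own non-redundant node-local volume.
            "instances": instances,
            "storage": {"storageClass": storage_class, "size": size},
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so this output can be piped
    # straight into `kubectl apply -f -`.
    print(json.dumps(cnpg_cluster("pg-example"), indent=2))
```

Saved as, say, `cluster.py` (a hypothetical file name), this could be applied with `python cluster.py | kubectl apply -f -`. Each instance gets its own PVC on whichever node it lands on, which is why the backup issue linked above matters if you rely on this kind of storage.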
For distributed/redundant storage I use Rook Ceph. https://rook.io/docs/rook/v1.9/ceph-storage.html
Again, super simple to set up.
Docs for backups:
https://rook.github.io/docs/rook/latest/Troubleshooting/disaster-recovery/?h=veler#steps
My current workload is mostly short-lived stuff, so I’ve focused on making deployments automated and reproducible and haven’t dug into backups, which limits how much help I can provide there.
Lemmy.ml is the instance run by the developers. Pretty sure there are some discussions there.
Other than that, there are the GitHub issues. I’m surprised they haven’t enabled GitHub Discussions.