k8s storage (CSI)
from eutampieri@feddit.it to selfhosted@lemmy.world on 15 Feb 00:34
https://feddit.it/post/26790529

I’m looking for storage classes for a multi-node cluster. I’m currently using Longhorn and NFS, but I’m not happy with the performance. My cluster doesn’t have beefy nodes, so Ceph/Rook is out of the question (for now).

Nodes:

  1. 8 GB RAM, 4 cores VM, control plane. 256 GB SSD
  2. 4 GB RAM, 2 cores, control plane, currently cordoned. 128 GB SSD
  3. 8 GB RAM, 4 cores, ARM, control plane. 512 GB SSD
  4. 8 GB RAM, 4 cores. 256 GB SSD
  5. 16 GB RAM, 6 cores. 256 GB SSD + 1 TB HD
  6. RPi 4, 4 GB RAM. 128 GB SSD

#selfhosted


h3ron@lemmy.zip on 15 Feb 07:20

I have two storage nodes and one is much faster than the other.

I’m currently evaluating a juicefs deployment based on two minio instances (one per node, replicated with async bucket replication) through a load balancer (sidekick) in failover. Because juicefs also needs a db for metadata, I went with valkey + sentinel.

Juicefs provides a CSI driver that supports ReadWriteMany volumes and CSI snapshots and manages both read and write cache. Performance is much much better than Ceph. In theory it should be riskier (because of the async replication) but in practice I haven’t yet lost a bit.

eutampieri@feddit.it on 15 Feb 14:00

Thanks! A bit more involved than I’d have thought, but still worth considering! Could you update us after your evaluation?

h3ron@lemmy.zip on 16 Feb 06:30

Well actually it is very easy to spin up in docker and most of the configuration happens through env variables.

JuiceFS itself only exists on the client side, so you basically only have to install and configure the CSI driver with Helm.

As it took me a few days to come up with this solution, I’d be happy to share my config files.

Performance-wise, it is quite fast on sequential reads (it saturates my 2.5 Gbit/s link) and slower than I expected on sequential writes (for me it caps at 60 MB/s). PostgreSQL seems happy. I saw no visible performance degradation with Authentik, Immich and OpenCloud. The Nextcloud installation took ages. I’ve yet to try it with Jellyfin and the *arr suite.
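To put those two figures side by side, here is a minimal sketch. It assumes decimal units and ignores protocol overhead, and the function name is made up for illustration:

```python
def gbit_to_mbyte_per_s(gbit_per_s: float) -> float:
    """Convert a link rate in Gbit/s to MB/s (decimal units, no overhead)."""
    return gbit_per_s * 1000 / 8


# Figures quoted in the comment above.
link_mb_s = gbit_to_mbyte_per_s(2.5)  # theoretical ceiling of a 2.5 Gbit/s link
write_mb_s = 60.0                     # observed sequential write cap

# Share of the link that sequential writes actually use.
write_fraction = write_mb_s / link_mb_s

print(f"link ceiling: {link_mb_s:.1f} MB/s")
print(f"writes use about {write_fraction:.0%} of the link")
```

Under those assumptions the write cap sits well below the link ceiling, which suggests the network is not what limits sequential writes.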

A simple NFS share would be faster, but it doesn’t support replication, failover and CSI snapshots.

supersheep@lemmy.world on 15 Feb 00:38

I’m currently using Piraeus / LINSTOR and am quite happy with it: github.com/piraeusdatastore/piraeus

eutampieri@feddit.it on 15 Feb 01:02

I found a Reddit thread that says LINSTOR has lower CPU usage (which is my main gripe with Longhorn). Might as well try this and report back. Is there a good way to migrate PVs and PVCs?

nebula@lemmy.ca on 16 Feb 08:09

It’s great. I spent 2 years looking for the perfect CSI for my homelab and landed on Piraeus. The best part is that you get the full read performance of your local disk, so I didn’t have to use 10G. Writes are limited by the network link between nodes, but that hasn’t been a problem for me. Also, they’re super responsive to any issues/bugs you hit.

Let me know if you have any specific questions about this.

eutampieri@feddit.it on 16 Feb 11:51

Thanks, I will!

Decronym@lemmy.decronym.xyz on 15 Feb 03:20

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:

Fewer Letters   More Letters
NFS             Network File System, a Unix-based file-sharing protocol known for performance and efficiency
SSD             Solid State Drive mass storage
k8s             Kubernetes container management package

[Thread #95 for this comm, first seen 15th Feb 2026, 11:20]

performation@feddit.org on 18 Feb 12:52

I started setting up Longhorn today. My hardware is a little bit beefier, but I’m still curious: what performance problems did you run into exactly?

eutampieri@feddit.it on 19 Feb 00:46

My greatest problem is that the CPU load is too high, and I ran into an issue with iSCSI that would occasionally peg one core at 100%. It would also make nodes go NotReady when too many PVs were scheduled.

performation@feddit.org on 19 Feb 22:09

What CPUs are you running exactly? I was under the assumption that my greatest bottleneck would be a 1 Gbps LAN.

eutampieri@feddit.it on 19 Feb 22:45

Just don’t go below 4 cores of x86 and you’ll be fine