Date: 2024-07-25

The Summary

Previously, I mentioned my AI stuff. It's been a lot of fun, but I'd set it up
on a test box. Once I got past the PoC phase, I wanted to move everything
into k8s.

The Setup

Setting this up involved figuring out a few things:
  1. Prioritizing container startup time vs. build time
  2. GPU scheduling
  3. Helm Charts

Startup Time vs. Build Time:
I'm running this setup on a desktop, so I need to be careful about how big my
containers are. Models tend to be pretty big, especially once you have a few
different ones: my Stable Diffusion setup currently has 19 models totaling
33 GB, and that doesn't include my LLMs. Baking all of that into an image would
make the container slow to build and slow to move around, so I decided to keep
the models on NFS instead. The trade-off is that swapping models out takes
longer, but since I don't do that often, it's not a big issue.
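
As a rough illustration of keeping the models out of the image, here is a
minimal Python sketch that builds a Pod manifest with the model directory
mounted from an NFS share. It is a sketch, not my actual manifests: the
`model_volume_pod` helper, the server `nfs.example.lan`, the export path
`/exports/models`, the mount point `/models`, and the image name are all
hypothetical placeholders. The only Kubernetes features it relies on are the
built-in `nfs` volume type and `volumeMounts`.

```python
import json


def model_volume_pod(name: str, image: str, nfs_server: str, nfs_path: str,
                     mount_path: str = "/models") -> dict:
    """Build a Pod manifest (as a dict) whose model directory is an NFS mount.

    The models live on the share rather than inside the image, so the image
    stays small and changing models does not require rebuilding it.
    All names passed in are placeholders chosen by the caller.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                # Expose the shared model directory inside the container.
                "volumeMounts": [{"name": "models",
                                  "mountPath": mount_path,
                                  "readOnly": True}],
            }],
            # Kubernetes' built-in NFS volume type; the name must match
            # the volumeMount above.
            "volumes": [{"name": "models",
                         "nfs": {"server": nfs_server,
                                 "path": nfs_path,
                                 "readOnly": True}}],
        },
    }


if __name__ == "__main__":
    # Hypothetical image and NFS export, for illustration only.
    pod = model_volume_pod("stable-diffusion",
                           "registry.example.lan/sd:latest",
                           "nfs.example.lan", "/exports/models")
    print(json.dumps(pod, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output could be saved
to a file and applied with `kubectl apply -f`. The mount is read-only in this
sketch because the inference containers only need to read the models; adding
or replacing models would happen on the share itself.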

GPU Scheduling and Helm Charts:
I hadn't used Helm charts before, so there was a slight learning curve. I
needed them for GPU scheduling, since that is NVIDIA's official method. They
turned out to be easier than I expected, as a chart is essentially a
packaged-up set of k8s manifests. NVIDIA's documentation could be a bit
better, but I eventually got everything set up and working. It currently takes
~20 minutes to roll out a new update to my AI stuff, though some of that is me
manually building a few things that could probably be automated by now.

I already had a few systems set up to run on specific hardware, since my
cluster includes a few Raspberry Pis. They have different CPU architectures,
and one of them has a special HAT for MPD. So targeting the GPU machine was
easy, especially as the GPU setup lives in its own namespace.
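
One common way to do this kind of hardware pinning is a `nodeSelector`, plus a
GPU resource limit for the GPU workloads. Below is a minimal Python sketch
under those assumptions; it is not necessarily how my cluster does it. It
assumes NVIDIA's device plugin is installed, which advertises GPUs as the
extended resource `nvidia.com/gpu`. The `pin_to_hardware` helper, the
`example.lan/gpu` node label, and the `ai` namespace are hypothetical;
`kubernetes.io/arch` is a standard label the kubelet sets on every node.

```python
import copy
import json
from typing import Optional


def pin_to_hardware(pod: dict, node_labels: dict, gpus: int = 0,
                    namespace: Optional[str] = None) -> dict:
    """Return a copy of a Pod manifest that only schedules on matching nodes.

    node_labels becomes the Pod's nodeSelector; gpus > 0 adds an
    nvidia.com/gpu limit to the first container; namespace, if given,
    is written into the metadata.
    """
    pinned = copy.deepcopy(pod)  # leave the caller's manifest untouched
    spec = pinned.setdefault("spec", {})
    # The scheduler only places the Pod on nodes carrying ALL of these labels.
    spec.setdefault("nodeSelector", {}).update(node_labels)
    if gpus > 0:
        # GPUs are an extended resource exposed by NVIDIA's device plugin
        # and are requested through limits.
        container = spec["containers"][0]
        limits = container.setdefault("resources", {}).setdefault("limits", {})
        limits["nvidia.com/gpu"] = gpus
    if namespace is not None:
        pinned.setdefault("metadata", {})["namespace"] = namespace
    return pinned


if __name__ == "__main__":
    base = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "stable-diffusion"},
        "spec": {"containers": [{"name": "stable-diffusion",
                                 "image": "registry.example.lan/sd:latest"}]},
    }
    # Hypothetical label applied to the GPU node by hand.
    gpu_pod = pin_to_hardware(base, {"example.lan/gpu": "true"}, gpus=1,
                              namespace="ai")
    # Standard kubelet-set label; steers a Pod onto the ARM Pis.
    pi_pod = pin_to_hardware(base, {"kubernetes.io/arch": "arm64"})
    print(json.dumps(gpu_pod, indent=2))
    print(json.dumps(pi_pod, indent=2))
```

A custom label like the hypothetical `example.lan/gpu=true` would be attached
to the GPU node with `kubectl label nodes <node-name> example.lan/gpu=true`.
Putting the GPU workloads in their own namespace keeps them grouped, but it is
the `nodeSelector` and the GPU limit that actually steer scheduling.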

Overall, I'm pretty happy with the setup. Everything is becoming more uniform,
which takes a lot off my cognitive load. The fewer exceptions and one-offs,
the better.