Homelab - Part 4: From the ground up

October 30, 2021 on Jonathan's Blog

Tags: homelab, devops, self-hosting, ansible, k8s


For the last month I have been working on restructuring my Kubernetes cluster to make everything a bit more organised and sensible. During the process I also scripted the setup of the host using Ansible and made some changes to the way I do things.

If you’re just interested in seeing the GitOps repository, you can check that out here: https://github.com/Jonnobrow/coffee-shop.

Motivation for change

Before I dive into the new structure of my cluster, the things I have added and the things I have changed, I want to explain why this was necessary (or at least why I felt it was).

Three main things prompted the changes, which the sections below cover.

Some new tools

A couple of new tools were integrated into my workflow during this process. Some replaced tools I was already using while others simply improved my quality of life or were necessary for the automation I was trying to accomplish.

Mozilla SOPS [2]

The GitHub repository describes SOPS (Secrets OPerationS) as a "simple and flexible tool for managing secrets". It supports a wide range of file formats and can encrypt using a few different mechanisms depending on your needs.

For me it replaces Bitnami Sealed Secrets [3], but it also gives me a way to encrypt secrets that don't live within my cluster, like the secrets used in my Ansible deployments.

Some examples of usage are shown below.
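As a minimal sketch (assuming an age key; the file names and recipient are illustrative, not from my setup), encrypting and editing a Kubernetes Secret with SOPS looks something like this:

# Encrypt only the data/stringData fields of a Secret manifest, in place
sops --encrypt --age age1examplepublickey... \
  --encrypted-regex '^(data|stringData)$' \
  --in-place secret.yaml

# Open the encrypted file in $EDITOR, transparently decrypting and re-encrypting
sops secret.yaml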

FluxCD provides a guide [4] on using Mozilla SOPS, as does Ansible [5].

Task [6]

Task is a task runner … that aims to be simpler and easier to use than … GNU Make.

Instead of the Makefile format, Task uses YAML, which I am much more familiar with. It also feels more natural to use (for me), and I found the ability to use templates, include other task files and define dynamic variables really useful for the kinds of things I wanted to do with it.

I predominantly use Task for running small admin tasks.

The documentation [6] explains how to install and use Task, and I recommend checking it out for a little more automation in all of your projects. For an example of multiple task files, see my GitOps repository; a rough sketch is shown below.
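To give a flavour, here is a minimal Taskfile sketch. The include path, variable and task names are illustrative, not taken from my repository:

# Taskfile.yml
version: "3"

includes:
  flux: ./tasks/FluxTasks.yml  # a separate, included task file

vars:
  CLUSTER: homelab             # a variable usable in task templates

tasks:
  lint:
    desc: Run all pre-commit checks
    cmds:
      - pre-commit run --all-files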

pre-commit [7]

This tool manages pre-commit hooks for Git. These hooks mean that on every commit my code or manifests are run through a linter and any other tools I want.

So far I am only using this for a few checks on my manifests.

However, in the future I can see myself using this to run linters and tests before committing Python code, and much more.

A small configuration file is included in my GitOps repository, and simply running pre-commit install will add the hooks to that repository, causing them to run on every git commit command and stopping the commit if a check fails.
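Such a configuration might look like the following sketch. The hook selection and rev pins are illustrative, not my actual config:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:
      - id: check-yaml
        args: [--allow-multiple-documents]  # manifests often hold multiple docs
      - id: trailing-whitespace
  - repo: https://github.com/adrienverge/yamllint
    rev: v1.26.3
    hooks:
      - id: yamllint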

Still Messy Manifests

My original plan was to just put the manifests into a new directory structure, perhaps move some services into more sensible namespaces, and then be done. However, I am somewhat prone to diving deep into the rabbit hole [10] and found myself looking at other GitOps Kubernetes clusters on k8s-at-home/awesome-home-kubernetes. This quickly led to me excitedly creating a new branch on my own GitOps repository, and the procrastination had officially begun.

A new cluster is born

I run a single-node cluster, so by "a new cluster" what I really mean is "a new node". In order to test the redone manifests I decided it was best to create a separate environment, because I wasn't sure how long it would take and I didn't want to be without some of my services for an unknown amount of time.

For ease, I decided to look into using k3s [11] rather than doing everything from scratch, and here began the first descent into the rabbit hole.

Many of the aforementioned repositories contain a directory with deployments using Ansible, Terraform or a combination of the two. Of course, I liked the idea of having a single command I could run that would install dependencies and configuration files and set up k3s - so I made it happen. It was a chance to learn some more about Ansible [12] and also make my life easier in the future should I decide to move to new hardware or rebuild my cluster again.
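A hedged sketch of what such a play might contain, using the official k3s install script (the task names and the choice to disable the bundled Traefik are assumptions, not my exact playbook):

- name: Download the k3s install script
  ansible.builtin.get_url:
    url: https://get.k3s.io
    dest: /tmp/k3s-install.sh
    mode: "0755"

- name: Run the k3s installer as a single server node
  ansible.builtin.command: /tmp/k3s-install.sh
  environment:
    INSTALL_K3S_EXEC: "server --disable traefik"  # assumed: ingress is managed separately
  args:
    creates: /usr/local/bin/k3s  # makes the task idempotent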

So after about two days' work I had a new cluster up and running, but I still hadn't done anything that I originally set out to do. Oh well.

Making use of Kustomize

Another thing that was common across most of the repositories linked in the awesome-home-kubernetes project was the use of Kustomize [13] and the Flux Kustomize Controller [14].

Kustomize is a powerful tool that can do some really cool stuff, like generating secrets and config maps from the actual files so you aren't managing two copies of something. It can also apply patches, perform replacements and much more. Coupled with the Kustomize Controller for Flux, it gives me quite fine-grained control over my manifests, and I had to have that.

Those are the things I mainly use Kustomize for: generating secrets and config maps from files, and patching manifests.

Kustomizations also mean that I can specify exactly which manifests should be applied, rather than the alternative, which seemed quite flaky to me. I can reconcile a whole Kustomization in the knowledge that my volumes, secrets and configs will also get applied, whereas before only the Helm release itself would be updated.
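For illustration, a per-service kustomization.yaml might look like this sketch (the generator entry and file names are assumptions):

# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - helm-release.yaml   # the HelmRelease for the service
  - config-pvc.yaml     # the service's volume claim
configMapGenerator:
  - name: app-settings  # generated from the real file, so only one copy is managed
    files:
      - settings.ini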

Once again, this is all very cool, but at this point I'm three days in and not a lot has changed in the way of removing messy manifests.

Setting up Tasks and SOPS

Of course, things must come in threes, so here we are with a third change before the real work even begins. As mentioned in the "Some new tools" section, I wanted to use SOPS to manage secrets and Task to run tasks. I set up both of these tools when creating the Ansible deployments so that I could easily keep secret variables and run playbooks. However, I needed to add a couple more tasks and generate some extra keys before I could use everything with Kubernetes.

I wrote a couple of tasks that simply run Flux commands I use all the time, so instead of running flux reconcile source git flux-system I can type task flux:sync, which saves me quite a few key presses over a day of reconciliations.
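Such a task is only a thin wrapper; a sketch of how it might be defined in an included task file (the file layout is an assumption):

# tasks/FluxTasks.yml, included as "flux" from Taskfile.yml
version: "3"

tasks:
  sync:
    desc: Reconcile the flux-system Git source
    cmds:
      - flux reconcile source git flux-system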

Additionally, I followed the instructions on the Flux website [4] for using SOPS and generated a separate key that the cluster would use to decrypt secrets. I then added some tasks to my task files for generating those keys again in the future, although hopefully I will never have to, as that would be very bad! I also created a task to generate the secret in the cluster that Flux uses to decrypt everything, and that was that.
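Assuming an age key (the Flux guide also covers GPG), the two steps look roughly like this:

# Generate a key pair for the cluster
age-keygen -o age.agekey

# Store the private key where Flux's kustomize-controller can find it
kubectl create secret generic sops-age \
  --namespace=flux-system \
  --from-file=age.agekey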

Finally getting somewhere

Now that there is a new cluster, I have decided to use Kustomize, and I have set up some new tools, I can finally start re-organising my services. I started by copying a structure from the repos I was using as inspiration.

/coffee-shop
├── cluster
│  ├── apps
│  ├── base
│  ├── core
│  └── crds
├── server # Notes live elsewhere, see above
└── Taskfile.yml # Contains tasks

Let's break that down a bit: everything Flux applies lives under cluster, which is split into apps (my services), base, core and crds (Custom Resource Definitions); server is covered elsewhere (see above), and Taskfile.yml contains the tasks.

Okay, so now a nice new structure is in place, what does a typical service look like in that structure? Here is an example using Jellyfin:

/coffee-shop
└── cluster                        # Top level
   └── apps                       # Jellyfin is a service, so under apps
      └── media                   # Jellyfin is a "Free Software Media System"
         ├── _pvc                 # Namespace-level Persistent Volume Claims
         └── jellyfin             # Jellyfin gets its own directory
            ├── helm-release.yaml  # The Helm release itself
            ├── kustomization.yaml # A kustomization saying what should be included
            └── config-pvc.yaml    # Service-level Persistent Volume Claims

By grouping resources into common namespaces I avoid having separate persistent volume claims and persistent volumes for each service. I have six services in the media namespace that all use the same volumes, but I only have one definition for each, not six like in my old setup.
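To make the layout concrete, the helm-release.yaml for a service like Jellyfin might look roughly like this (the chart source, interval and values are assumptions, not my actual manifest):

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: jellyfin
  namespace: media
spec:
  interval: 5m
  chart:
    spec:
      chart: jellyfin
      sourceRef:
        kind: HelmRepository
        name: k8s-at-home          # assumed HelmRepository, defined elsewhere
        namespace: flux-system
  values:
    persistence:
      config:
        enabled: true
        existingClaim: jellyfin-config  # bound by config-pvc.yaml above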

The logical next step would be moving everything over, but of course the rabbit hole re-opened and I couldn’t help but jump in.

Switching to Traefik [15]

My cluster had been running fine using ingress-nginx [16] for a few months, so I had no real reason to change to a different proxy. However, the main aim of all of this is to learn AND to follow trends, so naturally I took a look at something new that goes by the name Traefik. Traefik is "The Cloud Native Application Proxy" and provides a lot of nice features, such as annotation-based service discovery and a flexible middleware system.

Further to this, Traefik seems to be the in-thing, so with the aim of staying somewhat current with my homelab, I decided to move to it.

The process didn't go off without a hitch though. For most services it was as simple as adding some annotations for Traefik service discovery. Something like:

annotations:
  traefik.ingress.kubernetes.io/router.entrypoints: "websecure"
  traefik.ingress.kubernetes.io/router.middlewares: "list,of,middlewares"

However, for services like Nextcloud that required more configuration, I had to define middlewares (see my Nextcloud middleware here) in order to replace the inline nginx configuration snippets from my old setup. This took a bit of working out and quite a few commits to get sorted.

As for some of the benefits, I could now define middlewares as an extra layer of security. For example, I have a middleware that only allows traffic from Cloudflare IP addresses for my external traffic, and another that only allows RFC1918 [17] IP addresses for my local traffic. I could also then define multiple ingresses for some of my services, like those with a /api path, so that only things on my local network could access the API and only traffic through Cloudflare's proxy could access the rest [18].
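As a sketch, the local-only middleware might be defined like this (the name and namespace are illustrative; the API group matches Traefik v2 as of 2021):

apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: local-only
  namespace: networking
spec:
  ipWhiteList:
    sourceRange:          # the RFC1918 private ranges
      - 10.0.0.0/8
      - 172.16.0.0/12
      - 192.168.0.0/16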
