The code and data behind xeiaso.net

my kubernetes adventure

Signed-off-by: Xe Iaso <me@xeiaso.net>

Xe Iaso ab2fbe45 e278b211

+1337 -6
+2 -2
lume/src/_components/XeblogConv.tsx
··· 15 15 }: XeblogConvProps) => { 16 16 const nameLower = name.toLowerCase(); 17 17 name = name.replace(" ", "_"); 18 - const size = standalone ? 128 : 64; 18 + const size = standalone ? 256 : 128; 19 19 20 20 return ( 21 21 <> 22 22 <div className="my-4 flex space-x-4 rounded-md border border-solid border-fg-4 bg-bg-2 p-3 dark:border-fgDark-4 dark:bg-bgDark-2 max-w-full min-h-fit"> 23 23 <div className="flex max-h-16 shrink-0 items-center justify-center self-center"> 24 24 <img 25 - style="max-height:6rem" 25 + style={`max-height:${standalone ? "6" : "4"}rem`} 26 26 alt={`${name} is ${mood}`} 27 27 loading="lazy" 28 28 src={`https://cdn.xeiaso.net/sticker/${nameLower}/${mood}/${size}`}
+6 -2
lume/src/_includes/blog.njk
··· 3 3 --- 4 4 5 5 <article class="prose dark:prose-invert max-w-none"> 6 - <h1>{{title}}</h1> 7 - <p class="text-sm text-fg-3 dark:text-fgDark-3 mb-2"> 6 + <h1 class="mb-2">{{title}}</h1> 7 + <p class="text-sm text-fg-3 dark:text-fgDark-3 my-1"> 8 8 Published on <time datetime={{date | date("DATE")}}>{{date | date("DATE_US")}}</time>, {{ readingInfo.words }} words, {{ readingInfo.minutes }} minutes to read 9 9 </p> 10 + 11 + {% if desc %} 12 + <p class="text-sm font-serif text-fg-3 dark:text-fgDark-3 my-1">{{desc}}</p> 13 + {% endif %} 10 14 11 15 {% if patronExclusive %} 12 16 <div class="bg-yellow-50 border-l-4 border-yellow-400 py-1 px-4 mb-4">
+1326
lume/src/blog/2024/homelab-v2.mdx
··· 1 + --- 2 + title: "Rebuilding my homelab: Suffering as a service" 3 + desc: With additional Kubernetes mode! 4 + date: 2024-05-15 5 + tags: 6 + - Homelab 7 + - RockyLinux 8 + - FedoraCoreOS 9 + - TalosLinux 10 + - Kubernetes 11 + - Ansible 12 + - Longhorn 13 + - Nginx 14 + - CertManager 15 + - ExternalDNS 16 + hero: 17 + ai: "Photo by Xe Iaso, Canon EOS R6 mark II with a Rokinon Cine DSX 85mm T1.5 lens" 18 + file: ../xedn/dynamic/766623e0-26d1-4068-9a63-a91d274f23d0 19 + prompt: "A field of dandelion flowers in the sun, heavy depth of field. A thin strip of the field is in focus, the rest is a blur." 20 + --- 21 + 22 + I have a slight problem where I have too many computers in my office. These extra computers are my [homelab](https://www.reddit.com/r/homelab/), or a bunch of slack compute that I can use to run various workloads at home. I use my homelab to have a place to "just run things" like [Plex](https://plex.tv) and the whole host of other services that I either run or have written for my husband and I. 23 + 24 + <Conv name="Cadey" mood="hug"> 25 + I want to have my own platform so that I can run things that I used to run in 26 + the cloud. If I can "just run things locally", I can put my slack compute 27 + space to work for good. This can help me justify the power bill of these nodes 28 + to my landlord! 29 + </Conv> 30 + 31 + Really, I just wanna be able to use this to mess around, try new things, and turn the fruit of those experiments into blogposts like this one. Until very recently, everything in my homelab ran NixOS. [A friend of mine](https://fasterthanli.me) has been goading me into trying Kubernetes again, and in a moment of weakness, I decided to see how bad the situation was to get Kubernetes running on my own hardware at home. 32 + 33 + - `kos-mos`, a small server that I use for running some CI things and periphery services. It has 32 GB of ram and a Core i5-10600. 34 + - `ontos`, identical to `kos-mos` but with an RTX 2060 6 GB. 
35 + - `logos`, identical to `kos-mos` but with an RTX 3060 12 GB. 36 + - `pneuma`, my main shellbox and development machine. It is a handbuilt tower PC with 64 GB of ram and a Ryzen 9 5900X. It has a GPU (AMD RX5700 non-XT w/8GB of vram) because the 5900X doesn't have integrated graphics. It has a bunch of random storage devices in it. It also handles the video transcoding for xesite video uploads. 37 + - `itsuki`, the NAS. It has all of our media and backups on it. It runs Plex and a few other services, mostly managed by docker compose. It has 16 GB of ram and a Core i5-10600. 38 + - `chrysalis`, an old Mac Pro from 2013 that I mostly use as my Prometheus server. It has 32 GB of ram and a Xeon E5-1650. It also runs the IRC bot `[Mara]` in `#xeserv` on Libera.chat (it announces new posts on my blog). It's on its last legs in multiple ways, but it works for now. I've been holding off on selling it because I won it in a competition involving running an IRC network in Docker containers. Sentimental value is a bitch, eh? 39 + 40 + <Conv name="Mara" mood="hacker"> 41 + When the homelab was built, the Core i5-10600 was a "last generation" 42 + processor. It also met a perfect balance between compute oomph, onboard iGPU 43 + support, power usage, and not requiring a massive cooler to keep it running 44 + happily. We could probably get some more use out of newer processors, but that 45 + will probably have to wait for one or more of our towers/their parts to get 46 + cycled out in regular upgrades. That probably won't happen for a year or two, 47 + but it'll be nice to get a Ryzen 9 5950x or two into the cluster eventually. 48 + </Conv> 49 + 50 + Of these machines, `kos-mos` is the easiest to deal with because it normally doesn't have any services dedicated to it. In the past, I had to move some workloads off of it for various reasons. 51 + 52 + I have no plans to touch my shellbox or the NAS, those have complicated setups that I don't want to mess with. 
I'm okay with my shellbox being different because that's where I do a lot of development and development servers are almost always vastly different from production servers. I'm also scared to touch the NAS because that has all my media on it and I don't want to risk losing it. It has more space than the rest of the house combined. 53 + 54 + A rebuild of the homelab is going to be a fair bit of work. I'm going to have to take this one piece at a time and make sure that I don't lose anything important. 55 + 56 + <Conv name="Numa" mood="delet"> 57 + Foreshadowing is a literary technique in which... 58 + </Conv> 59 + 60 + This post isn't going to be like my other posts. This is a synthesis of a few patron-exclusive notes that described my steps in playing with options and had my immediate reactions as I was doing things. If you want to read those brain-vomit notes, you can [support me on Patreon](https://patreon.com/cadey) and get access to them. 61 + 62 + When I was considering what to do, I had a few options in mind: 63 + 64 + - [Rocky Linux](https://rockylinux.org/) (or even [Oracle Linux](https://yum.oracle.com/)) with Ansible 65 + - Something in the [Universal Blue](https://universal-blue.org/) ecosystem 66 + - [Fedora CoreOS](https://fedoraproject.org/coreos/) 67 + - [K3os](https://k3os.io/) 68 + - [Talos Linux](https://talos.dev) 69 + - Giving up on the idea of having a homelab, throwing all of my computers into the sun (or selling them on Kijiji), and having a simpler life 70 + 71 + <Conv name="Aoi" mood="wut"> 72 + Wait, hold up. You're considering _Kubernetes_ for your _homelab_? I thought 73 + you were as staunchly anti-Kubernetes as it got. 74 + </Conv> 75 + <Conv name="Cadey" mood="coffee"> 76 + I am, but hear me out. Kubernetes gets a lot of things wrong, but it does get 77 + one thing so clearly right that it's worth celebration: you don't need to SSH 78 + into a machine to look at logs, deploy new versions of things, or see what's 79 + running. 
Everything is done via the API. You also don't need to worry about 80 + assigning workloads to machines, it just does it for you. Not to mention I 81 + have to shill a [Kubernetes product for 82 + work](https://fly.io/docs/kubernetes/fks-quickstart/) at some point so having 83 + some experience with it would be good. 84 + </Conv> 85 + <Conv name="Aoi" mood="facepalm"> 86 + Things really must be bad if you're at this point... 87 + </Conv> 88 + <Conv name="Cadey" mood="enby"> 89 + Let's be real, the latest release is actually, real life, unironically named 90 + uwubernetes. I can't _not_ try it. I'd be betraying my people. 91 + </Conv> 92 + <Conv name="Aoi" mood="facepalm"> 93 + You really weren't kidding about technology decisions being made arbitrarily 94 + in the [Shashin talk](/talks/2024/shashin/), were you. How do you exist? 95 + </Conv> 96 + 97 + I ran a poll on [Mastodon](https://pony.social/@cadey/112345742472623188) to see what people wanted me to do. The results were overwhelmingly in favor of Rocky Linux. As an online "content creator", who am I to not give the people what they want? 98 + 99 + ## Rocky Linux 100 + 101 + [Rocky Linux](https://rockylinux.org/) is a fork of pre-Stream CentOS. It aims to be a 1:1 drop-in replacement for CentOS and RHEL. It's a community-driven project that is sponsored by the [Rocky Enterprise Software Foundation](https://resf.org/). 102 + 103 + For various reasons involving my HDMI cable being too short to reach the other machines, I'm gonna start with `chrysalis`. Rocky Linux has a GUI installer and I can hook it up to the sideways monitor that I have on my desk. For extra fun, whenever the mac tries to display something on the monitor, the EFI framebuffer dances around until the OS framebuffer takes over. 104 + 105 + <Video path="video/2024/oneoff-mac-boot" /> 106 + 107 + <Conv name="Cadey" mood="coffee"> 108 + I really hope one of the GPUs isn't dying. That would totally ruin the resale 109 + value of that machine. 
I wasn't able to recreate this on my 1080p crash cart 110 + monitor, so I think that it's just the combination of that mac, the HDMI cable 111 + I used, and my monitor. It's really weird though. 112 + </Conv> 113 + 114 + The weird part about `chrysalis` is that it's a Mac Pro from 2013. Macs of that vintage can boot normal EFI partitions and binaries, but they generally prefer to have your EFI partition be a HFS+ volume. This is normally not a problem because the installer will just make that weird EFI partition for you. 115 + 116 + <Picture 117 + path="blog/2024/homelab-v2/IMG_0256" 118 + desc="an error message saying: resource to create this format macefi is unavailable" 119 + /> 120 + 121 + However, the Rocky Linux installer doesn't make that magic partition for you. They ifdeffed out the macefi installation flow because Red Hat ifdeffed it out. 122 + 123 + <Conv name="Cadey" mood="coffee"> 124 + I get that they want to be a 1:1 drop-in replacement (which means that any bug 125 + RHEL has, they have), but it is massively inconvenient to me in particular. 126 + </Conv> 127 + 128 + As a result, you have to do a very manual install that looks something like this [lifted from the Red Hat bug tracker](https://bugzilla.redhat.com/show_bug.cgi?id=1751311#c16): 129 + 130 + > - Boot Centos/RHEL 8 ISO Normally (I used 8.1 of each) 131 + > - Do the normal setup of network, packages, etc. 
132 + > - Enter disk partitioning 133 + > - Select your disk 134 + > - At the bottom, click the "Full disk summary and boot loader" text 135 + > - Click on the disk in the list 136 + > - Click "Do not install boot loader" 137 + > - Close 138 + > - Select "Custom" (I didn't try automatic, but it probably would not create the EFI partition) 139 + > - Done in the top left to get to the partitioning screen 140 + > - Delete existing partitions if needed 141 + > - Click + 142 + > - CentOS 8: create /boot/efi mountpoint, 600M, Standard EFI partition 143 + > - RHEL 8: create /foo mountpoint, 600M, Standard EFI partition, then edit the partition to be on /boot/efi 144 + > - Click + repeatedly to create the rest of the partitions as usual (/boot, / swap, /home, etc.) 145 + > - Done 146 + > - During the install, there may be an error about the mactel package, just continue 147 + > - On reboot, both times I've let it get to the grub prompt, but there's no grub.cfg; not sure if this is required 148 + > - Boot off ISO into rescue mode 149 + > - Choose 1 to mount the system on /mnt/sysimage 150 + > - At the shell, chroot /mnt/sysimage 151 + > - Check on the files in /boot to make sure they exist: ls -l /boot/ /boot/efi/EFI/redhat (or centos) 152 + > - Run the create the grub.cfg file: grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg 153 + > - I got a couple reload ioctl errors, but that didn't seem to hurt anything 154 + > - exit 155 + > - Next reboot should be fine, and as mentioned above it'll reboot after SELinux labelling 156 + 157 + <Conv name="Cadey" mood="percussive-maintenance" standalone> 158 + Yeah, no. I'm not going to do that. Another solution I found involved you 159 + manually booting the kernel from the GRUB rescue shell. I'm not going to do 160 + that either. I hate myself enough to run Kubernetes on metal in my homelab, 161 + but not so much that I'm gonna unironically slonk the grub rescue shell in 162 + anger. 163 + </Conv> 164 + 165 + So, that's a wash. 
In the process of figuring this out I also found out that when I wiped the drive, I took down my IRC bot (and lost the password, thanks `A_Dragon` for helping me recover that account). I'm going to have to fix that eventually. 166 + 167 + <Conv name="Aoi" mood="facepalm"> 168 + Yep, called it. 169 + </Conv> 170 + 171 + I ended up moving the announcer IRC bot to be a part of [`within.website/x/cmd/mimi`](https://github.com/Xe/x/tree/master/cmd/mimi). `mimi` is a little bot that has claws into a lot of other things, including: 172 + 173 + - Status page updates for the [fly.io community Discord](https://discord.gg/V4bE5uhtUg)'s #status channel 174 + - Announcing new blogposts on [#xeserv on libera.chat](https://web.libera.chat/#xeserv) 175 + - Google Calendar and its own Gmail account for a failed experiment to make a bot that could read emails forwarded to it and schedule appointments based on what a large language model parsed out of the email 176 + 177 + <Conv name="Cadey" mood="enby"> 178 + I really need to finish that project. Someday! Maybe that upcoming AI 179 + hackathon would give me a good excuse to make it happen. 180 + </Conv> 181 + 182 + ### Ansible 183 + 184 + As a bonus round, let's see what it would look like to manage things with Ansible on Rocky Linux should I have been able to install Rocky Linux anyways. Ansible is a Red Hat product, so I expect that it would be the easiest thing to use to manage things. 185 + 186 + Ansible is a "best hopes" configuration management system. It doesn't really authoritatively control what is going on, it merely suggests what should be going on. As such, you influence what the system does with "tasks" like this: 187 + 188 + ```yaml 189 + - name: Full system update 190 + dnf: 191 + name: "*" 192 + state: latest 193 + ``` 194 + 195 + This is a task that tells the system to update all of its packages with dnf. 
However, when I ran the linter on this, I got told I need to instead format things like this: 196 + 197 + ```yaml 198 + - name: Full system update 199 + ansible.builtin.dnf: 200 + name: "*" 201 + state: latest 202 + ``` 203 + 204 + You need to use the fully qualified module name because [you might install other collections that have the name `dnf` in the future](https://docs.ansible.com/ansible/latest/collections/index.html). This kinda makes sense at a glance, I guess, but it's probably overkill for this usecase. However, it makes the lint errors go away and it is fixed mechanically, so I guess that's fine. 205 + 206 + <Conv name="Mara" mood="hacker"> 207 + Programming idiom for you: the output of the formatter is always the correct 208 + way to write your code/configuration. Less linter warnings, less problems. 209 + </Conv> 210 + 211 + What's not fine is how you prevent Ansible from running the same command over and over. You need to make a folder full of empty semaphore files that get touched when the command runs: 212 + 213 + ```yaml 214 + - name: Xe's semaphore flags 215 + ansible.builtin.shell: mkdir -p /etc/xe/semaphores 216 + args: 217 + creates: /etc/xe/semaphores 218 + 219 + - name: Enable CRB repositories # CRB == "Code-Ready Builder" 220 + ansible.builtin.shell: | 221 + dnf config-manager --set-enabled crb 222 + 223 + touch /etc/xe/semaphores/crb 224 + args: 225 + creates: /etc/xe/semaphores/crb 226 + ``` 227 + 228 + And then finally you can install a package: 229 + 230 + ```yaml 231 + - name: Install EPEL repo lists 232 + ansible.builtin.dnf: 233 + name: "epel-release" 234 + state: present 235 + ``` 236 + 237 + This is about the point where I said "No, I'm not going to deal with this". I haven't even created user accounts or installed dotfiles yet, I'm just trying to install a package repository so that I can install other packages. 
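For comparison, here is a minimal sketch of the same two setup steps written without semaphore files. It is not part of the original playbook. It assumes the `community.general` collection (8.2.0 or newer) is installed, which the post does not use, and that its `dnf_config_manager` module can enable the `crb` repository. `ansible.builtin.file` checks the current state of the path before acting, so it needs no `creates:` guard.

```yaml
# Sketch only, not from the original playbook.
# Assumption: the community.general collection (>= 8.2.0) is installed.

# The file module only reports a change when the directory is missing,
# so re-running it is harmless.
- name: Create a directory idempotently
  ansible.builtin.file:
    path: /etc/xe/semaphores
    state: directory
    mode: "0755"

# Replaces the shell call to `dnf config-manager --set-enabled crb`
# and the touch of /etc/xe/semaphores/crb.
- name: Enable CRB repositories
  community.general.dnf_config_manager:
    name: crb
    state: enabled
```

If the second task works as assumed, the semaphore directory is no longer needed for this step, and the first task is only worth keeping for commands that have no module equivalent.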
238 + 239 + <Conv name="Aoi" mood="wut"> 240 + Do you even really need any users but root or your dotfiles on production 241 + servers? Ideally those should be remotely managed anyways. Logging into them 242 + should be a situation of last resort, right? 243 + </Conv> 244 + <Conv name="Numa" mood="delet"> 245 + Assuming that you don't have triangular cows in the mix yeah. 246 + </Conv> 247 + 248 + So I'm not going with Ansible (or likely any situation where Ansible would be required), even on the machines where installing Rocky Linux works without having to enter GRUB rescue shell purgatory. 249 + 250 + <Conv name="Cadey" mood="coffee" standalone> 251 + One of my patrons pointed out that I need to use [Ansible 252 + conditionals](https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_conditionals.html) 253 + in order to prevent these same commands from running over and over. Of course, 254 + in an ideal world these commands would be idempotent (meaning that they can be 255 + run over and over without changing the system), but that's not always the 256 + case. I'm going to dig deeper into this once I have virtualization working on 257 + the cluster. 258 + 259 + Apparently you're supposed to use pre-made roles for as much as you can, such 260 + as from [Ansible Galaxy](https://galaxy.ansible.com/) or [Linux System 261 + Roles](https://linux-system-roles.github.io/). I don't know how I feel about 262 + this (doing things with NixOS got me used to a combination of defining things 263 + myself and then using third party things only when I really have to, but that 264 + was probably because the documentation for anything out of the beaten path is 265 + so poor there), but if this is the "right" way to do things then I'll do it. 266 + 267 + Thanks for the tip, Tudor! 268 + 269 + </Conv> 270 + 271 + ## CoreOS 272 + 273 + Way back when my career was just starting, CoreOS was released. CoreOS was radical and way ahead of its time. 
Instead of having a mutable server that you can SSH into and install packages at will on, CoreOS had the view that thou must put all your software into Docker containers and run them that way. This made it impossible to install new packages on the server, which they considered a feature. 274 + 275 + I loved using CoreOS when I could because of one part that was absolutely revolutionary: [Fleet](https://github.com/coreos/fleet). Fleet was a distributed init system that let you run systemd services _somewhere_, but you could care where it ran when you needed to. Imagine a world where you could just spin your jobs somewhere, that was Fleet. 276 + 277 + The really magical part about Fleet was the fact that it was deeply integrated into the discovery mechanism of CoreOS. Want 4 nodes in a cluster? Provision them with the same join token and Fleet would just figure it out. Newly provisioned nodes would also accept new work as soon as it was issued. 278 + 279 + <Conv name="Numa" mood="happy"> 280 + Fleet was glorious. It was what made me decide to actually learn how to use 281 + systemd in earnest. Before I had just been a "bloat bad so systemd bad" pleb, 282 + but once I really dug into the inner workings I ended up really liking it. 283 + Everything being composable units that let you build _up_ to what you want 284 + instead of having to be an expert in all the ways shell script messes with you 285 + is just such a better place to operate from. Not to mention being able to 286 + restart multiple units with the same command, define ulimits, and easily 287 + create "oneshot" jobs. If you're a "systemd hater", please actually give it a 288 + chance before you decry it as "complicated bad lol". Shit's complicated 289 + because life is complicated. 290 + </Conv> 291 + 292 + And then it became irrelevant in the face of Kubernetes after CoreOS got bought out by Red Hat and then IBM bought out Red Hat. 
293 + 294 + Also, "classic" CoreOS is no more, but its spirit lives on in the form of [Fedora CoreOS](https://fedoraproject.org/coreos/), which is like CoreOS but built on top of [rpm-ostree](https://coreos.github.io/rpm-ostree/). The main difference between Fedora CoreOS and actual CoreOS is that Fedora CoreOS lets you install additional packages on the system. 295 + 296 + <Conv name="Mara" mood="hacker"> 297 + Once Red Hat announced that CoreOS would be deprecated in favor of Fedora 298 + CoreOS, Kinvolk forked "classic" CoreOS to [Flatcar 299 + Linux](https://www.flatcar.org/), where you can still use it to this day. This 300 + post didn't end up evaluating it because it doesn't let you change Ignition 301 + configurations without reimaging the machine, which is unworkable for reasons 302 + that will become obvious later in the article. 303 + 304 + They are using [systemd-sysext](https://www.flatcar.org/blog/2024/04/os-innovation-with-systemd-sysext/) in order to extend the system with more packages, which is reminiscent of rpm-ostree layering. 305 + 306 + </Conv> 307 + 308 + ### Fedora CoreOS 309 + 310 + For various reasons involving divine intervention, I'm going to be building a few of my own RPM packages. I'm also going to be installing other third party programs on top of the OS, such as [yeet](https://github.com/Xe/x/tree/master/cmd/yeet). 311 + 312 + Fedora CoreOS is a bit unique because you install it by declaring the end result of the system, baking that into an ISO, and then plunking that onto a flashdrive to assimilate the machine. If you are using it from a cloud environment, then you plunk your config into the "user data" section of the instance and it will happily boot up with that configuration. 313 + 314 + This is a lot closer to the declarative future I want, with the added caveat that changing the configuration of a running system is a bit more involved than just SSHing into the machine and changing a file. 
You have to effectively blow away the machine and start over. 315 + 316 + <Conv name="Aoi" mood="wut"> 317 + What? That sounds like a _terrible_ idea. How would you handle moving state 318 + around? 319 + </Conv> 320 + <Conv name="Cadey" mood="aha"> 321 + Remember, this is for treating machines as replaceable _cattle_, not _pets_ 322 + that you imprint on. I'm sure that this will be a fun learning experience at 323 + the very least. 324 + </Conv> 325 + <Conv name="Numa" mood="delet"> 326 + Again, foreshadowing is a literary technique in which... 327 + </Conv> 328 + 329 + I want to build this on top of rpm-ostree because I want to have the best of both worlds: an immutable system that I can still install packages on. This is an absolute superpower and I want to have it in my life. Realistically I'm gonna end up installing only one or two packages on top of the base system, but those one or two packages are gonna make so many things so much easier. Especially for my WireGuard mesh so I can route the pod/service subnets in my Kubernetes cluster. 330 + 331 + As a more practical example of how rpm-ostree works, let's take a look at [Bazzite Linux](https://bazzite.gg). Bazzite is a spin of Fedora Silverblue (desktop Fedora built on top of rpm-ostree) that has the Steam Deck UI installed on top of it. This turns devices like the [ROG Ally](https://www.asus.com/ca-en/site/gaming/rog/handheld-consoles/rog-ally/) into actual game consoles instead of handheld gaming PCs. 332 + 333 + <Conv name="Cadey" mood="coffee"> 334 + I went into this distinction more in my failed review video of the ROG Ally. I 335 + plan to post this to [my Patreon](https://patreon.com/cadey) in case you want 336 + to see what could have been. The video is probably fine all things considered, 337 + I just don't think it's up to my standards and don't have the time/energy to 338 + heal it right now. 
339 + </Conv> 340 + 341 + In Bazzite, rpm-ostree lets you layer on additional things like the Fanatec steering wheel drivers and wheel managers like [Oversteer](https://github.com/berarma/oversteer). This allows you to _add_ optional functionality without having to worry about breaking the base system. Any time updates are installed, rpm-ostree will layer Oversteer on top of it for you so that you don't have to worry about it. 342 + 343 + This combined with my own [handrolled RPMs with `yeet`](https://github.com/Xe/x/tree/master/cmd/yeet) means that I could add software to my homelab nodes (like I have with Nix/NixOS) without having to worry about it being rebuilt from scratch or its distribution. This is a superpower that I want to keep in my life. 344 + 345 + It's not gonna be as nice as the Nix setup, but something like this: 346 + 347 + ```js 348 + ["amd64", "arm64"].forEach((goarch) => 349 + rpm.build({ 350 + name: "yeet", 351 + description: "Yeet out actions with maximum haste!", 352 + homepage: "https://within.website", 353 + license: "CC0", 354 + goarch, 355 + 356 + build: (out) => { 357 + go.build("-o", `${out}/usr/bin/`); 358 + }, 359 + }) 360 + ); 361 + ``` 362 + 363 + is so much easier to read and manage than it is to do with RPM specfiles. It really does get closer to what it's like to use Nix. 364 + 365 + <Conv name="Cadey" mood="coffee"> 366 + Not to mention if I did my Go packaging the full normal way with RPM 367 + specfiles, I'd have to have my own personal dependencies risk fighting the 368 + system-level dependencies. I don't want to do that, but you can if you want 369 + to. I'd also like my builds to publish one package, not 50-100. 370 + </Conv> 371 + 372 + I'd also need to figure out how to [fix Gitea's RPM package serving support so that it signs packages for me](https://github.com/go-gitea/gitea/pull/27069), but that would be solvable. Most of the work is already done, I'd just need to take over the PR and help push it over the finish line. 
373 + 374 + ### Installing Fedora CoreOS 375 + 376 + The method I'm going to be using to install Fedora CoreOS is to use [`coreos-installer`](https://coreos.github.io/coreos-installer/) to build an ISO image with a configuration file generated by [`butane`](https://coreos.github.io/butane/). 377 + 378 + To make things extra _fun_, I'm writing this on a Mac, which means I will need to have a Fedora environment handy to build the ISO because Fedora only ships Linux builds of `coreos-installer` and `butane`. 379 + 380 + <Conv name="Mara" mood="hacker"> 381 + This installation was adapted from [this 382 + tutorial](https://devnonsense.com/posts/k3s-on-fedora-coreos-bare-metal/), 383 + with modifications made because I'm using a MacBook instead of a Fedora 384 + machine. 385 + </Conv> 386 + 387 + First, I needed to install [Podman Desktop](https://podman-desktop.io/), which is like the Docker Desktop app except it uses the [Red Hat Podman](https://podman.io/) stack instead of the Docker stack. For the purposes of this article, they are functionally equivalent. 388 + 389 + I made a new repo/folder and then started up a Fedora container: 390 + 391 + ``` 392 + podman run --rm -itv .:/data fedora:latest 393 + ``` 394 + 395 + I then installed the necessary packages: 396 + 397 + ``` 398 + dnf -y install coreos-installer butane ignition-validate 399 + ``` 400 + 401 + And then I copied over the template from the Fedora CoreOS k3s tutorial into `chrysalis.bu`. I edited it to have the hostname `chrysalis`, loaded my SSH keys into it, and then ran the script to generate a prebaked install ISO. I loaded it on a flashdrive and then stuck it into the same Mac Pro from the last episode. 402 + 403 + <Conv name="Cadey" mood="coffee"> 404 + Annoyingly, it seems that the right file extension for Butane configs is `.bu` 405 + and that there isn't a VSCode plugin for it. 
If I stick with Fedora CoreOS, 406 + I'll have to make something that makes `.bu` files get treated as YAML files 407 + or something. I just told VSCode to treat them as YAML files for now. 408 + </Conv> 409 + 410 + It installed perfectly. I suspect that the actual Red Hat installer can be changed to just treat this machine as a normal EFI platform without any issues, but that is a bug report for another day. Intel Macs are quickly going out of support anyways, so it's probably not going to be the highest priority for them even if I did file that bug. 411 + 412 + I got k3s up and running and then I checked the version number. My config was having me install k3s version 1.27.10, which is much older than the current version [1.30.0 "Uwubernetes"](https://kubernetes.io/blog/2024/04/17/kubernetes-v1-30-release/). I fixed the butane config to point to the new version of k3s and then I tried to find a way to apply it to my running machine. 413 + 414 + <Conv name="Aoi" mood="wut"> 415 + That should be easy, right? You should just need to push the config to the 416 + server somehow and then it'll reconverge, right? 417 + </Conv> 418 + 419 + Yeah, about that. It turns out that Fedora CoreOS is very much on the side of "cattle, not pets" when it comes to datacenter management. The Fedora CoreOS view is that any time you need to change out the Ignition config, you should reimage the machine. This makes sense for a lot of hyperconverged setups where this is as simple as pushing a button and waiting for it to come back. 420 + 421 + <Conv name="Cadey" mood="wat"> 422 + I'm not sure what the ideal Fedora CoreOS strategy for handling disk-based 423 + application state is. Maybe it's "don't fuck around with prod enough that this 424 + is an issue", which is reasonable enough. 
I remember that with normal CoreOS 425 + the advice was "please avoid relying on local storage as much as you can", but 426 + they probably solved that by this point, either with a blessed state partition 427 + or by continuing the advice to avoid local storage as much as you can. Further 428 + research would be required. 429 + </Conv> 430 + 431 + However, my homelab is many things, but it isn't a hyperconverged datacenter setup. It's where I fuck around so I can find out (and then launder that knowledge through you to the rest of the industry for Patreon money and ad impressions). If I want to adopt an OS in the homelab, I need the ability to change my mind without having to burn four USB drives and reflash my homelab. 432 + 433 + This was a bummer. I'm gonna have to figure out something else to get Kubernetes up and running for me. 434 + 435 + ## Other things I evaluated and ended up passing on 436 + 437 + I was told by a coworker that [k3OS](https://k3os.io/) is a great way to have a "boot to Kubernetes" environment that you don't have to think about. This is by the Rancher team, which I haven't heard about in ages since I played with [RancherOS](https://rancher.com/docs/os/v1.x/en/) way back in the before times. 438 + 439 + RancherOS was super wild for its time. It didn't have a package manager, it had the Docker daemon. Two Docker daemons in fact, one for the "system" docker daemon that handled things like TTY sessions, DHCP addresses, device management, system logs, and the like. The other Docker daemon was for the userland, which was where you ran your containers. 440 + 441 + <Conv name="Cadey" mood="coffee"> 442 + I kinda miss how wild RancherOS was. It was great for messing around with at 443 + one of my former workplaces. We didn't use it for anything super critical, but 444 + it was a great hypervisor for a Minecraft server. 445 + </Conv> 446 + 447 + I tried to get K3os up and running, but then I found out that it's deprecated. 
That information isn't on the website, it's on the [getting started documentation](https://github.com/rancher/k3os/blob/master/README.md#quick-start). It's apparently replaced by [Elemental](https://elemental.docs.rancher.com/), which seems to be built on top of OpenSUSE (kinda like how Fedora CoreOS is built on Fedora). 448 + 449 + <Conv name="Aoi" mood="wut"> 450 + Didn't Rancher get bought out by SUSE? That'd explain why everything is 451 + deprecated except for something based on OpenSUSE. 452 + </Conv> 453 + <Conv name="Cadey" mood="coffee"> 454 + Oh. Right. That makes sense. I guess I'll have to look into Elemental at some 455 + point. Maybe I'll do that in the future. 456 + </Conv> 457 + 458 + I'm gonna pass on this for now. Maybe in the future. 459 + 460 + ## The Talos Principle 461 + 462 + [Straton of Stageira](https://talosprinciple.fandom.com/wiki/Straton_of_Stageira) once argued that the mythical construct Talos (an automaton that experienced qualia and had sapience) proved that there was nothing special about mankind. If a product of human engineering could have the same kind of qualia that people do, then realistically there is nothing special about people when compared to machines. 463 + 464 + To say that [Talos Linux](https://www.talos.dev/) is minimal is a massive understatement. It only has literally [12 binaries in it](https://www.siderolabs.com/blog/there-are-only-12-binaries-in-talos-linux/). I've been conceptualizing it as "what if [gokrazy](/blog/gokrazy/) was production-worthy?". 465 + 466 + My main introduction to it was last year at [All Systems Go!](https://media.ccc.de/v/all-systems-go-2023-202-talos-linux-trustedboot-for-a-minimal-immutable-os) by a fellow speaker. I'd been wanting to try something like this out for a while, but I haven't had a good excuse to sample those waters until now. It's really intriguing because of how damn minimal it is. 
467 + 468 + So I downloaded the arm64 ISO and set up a VM on my MacBook to fuck around with it. Here's a few of the things that I learned in the process: 469 + 470 + <Conv name="Cadey" mood="enby"> 471 + If you haven't tried out [UTM](https://mac.getutm.app) yet, you are really 472 + missing out. It's the missing virtual machine hypervisor for macOS. It's one 473 + of the best apps I know of for running virtual machines on Apple Silicon. I 474 + mostly use it to run random Linux machines on my MacBook, but I've also heard 475 + of people using it to play [Half-Life on an 476 + iPad](https://youtu.be/LrLDKYFyLMM). Highly suggest. 477 + </Conv> 478 + 479 + UTM has two modes it can run a VM in. One is "Apple Virtualization" mode that gives you theoretically higher performance at the cost of less options when it comes to networking (probably because `Hypervisor.framework` has less knobs available to control the VM environment). In order to connect the VM to a shared network (so you can poke it directly with `talosctl` commands without needing overlay VPNs or crazy networking magic like that), you need to create it without "Apple Virtualization" checked. This does mean you can't expose Rosetta to run amd64 binaries (and performance might be theoretically slower in a way you can't perceive given the minimal linux distros in play), but that's an acceptable tradeoff. 480 + 481 + <Picture 482 + path="xedn/dynamic/4bda0ab5-46db-4abd-b37b-8f14d2882e60" 483 + desc="UTM showing off the 'Shared Network' pane, you want this enabled to get access to the 192.168.65.0/24 network to poke your VM directly." 484 + /> 485 + 486 + Talos Linux is completely declarative for the base system and really just exists to make Kubernetes easier to run. One of my favorite parts has to be the way that you can combine different configuration snippets together into a composite machine config. 
Let's say you have a base "control plane config" in `controlplane.yaml` and some host-specific config in `hosts/hostname.yaml`. Your `talosctl apply-config` command would look like this: 487 + 488 + ```sh 489 + talosctl apply-config -n kos-mos -f controlplane.yaml -p @patches/subnets.yaml -p @hosts/kos-mos.yaml 490 + ``` 491 + 492 + This allows your `hosts/kos-mos.yaml` file to look like this: 493 + 494 + ```yaml 495 + cluster: 496 + apiServer: 497 + certSANs: 498 + - 100.110.6.17 499 + 500 + machine: 501 + network: 502 + hostname: kos-mos 503 + install: 504 + disk: /dev/nvme0n1 505 + ``` 506 + 507 + which allows me to do generic settings cluster-wide _and then_ specific settings for each host (just like I have with my Nix flakes repo). For example, I have a few homelab nodes with nvidia GPUs that I'd like to be able to run AI/large langle mangle tasks on. I can set up the base config to handle generic cases and then enable the GPU drivers only on the nodes that need them. 508 + 509 + <Conv name="Cadey" mood="coffee"> 510 + By the way, resist the temptation to install the nvidia GPU drivers on 511 + machines that do not need them. It will result in the nvidia GPU drivers 512 + trying to load in a loop, then complaining that they can't find the GPU, and 513 + then trying to load again. In order to unstuck yourself from that situation, 514 + you have to reimage the machine by attaching a crash cart and selecting the 515 + "wipe disk and boot into maintenance mode" option. This was fun to figure out 516 + by hand, but it was made easier with the `talosctl dashboard` command. 517 + </Conv> 518 + 519 + ### The Talosctl Dashboard 520 + 521 + I just have to take a moment to gush about the `talosctl dashboard` command. It's a TUI interface that lets you see what your nodes are doing. When you boot a metal Talos Linux node, it opens the dashboard by default so you can watch the logs as the system wakes up and becomes active. 
522 + 523 + When you run it on your laptop, it's as good as if not better than having SSH access to the node. All the information you could want is right there at a glance and you can connect to multiple machines at once. Just look at this: 524 + 525 + <Picture 526 + path="xedn/dynamic/f6bb22c4-f26d-41aa-868d-56dc7af841b3" 527 + desc="The talosctl dashboard, it's a TUI interface that lets you see what is going on with your nodes." 528 + /> 529 + 530 + Those three nodes can be swapped between by pressing the left and right arrow keys. It's the best kind of simple, the kind that you don't have to think about in order to use it. No documentation needed, just run the command and go on instinct. I love it. 531 + 532 + You can press F2 to get a view of the processes, resource use, and other errata. It's everything you could want out of htop, just without the ability to run Doom. 533 + 534 + ### Making myself a Kubernete 535 + 536 + <Conv name="Cadey" mood="coffee"> 537 + A little meta note because it's really easy to get words conflated here: 538 + whenever I use CapitalizedWords, I'm talking about the Kubernetes concepts, 539 + not the common English words. It's really hard for me to avoid talking about 540 + the word "service" given the subject matter. Whenever you see "Service", 541 + "Deployment", "Secret", "Ingress", or the like; know that I'm talking about 542 + the Kubernetes definition of those terms. 543 + </Conv> 544 + 545 + Talos Linux is built to do two things: 546 + 547 + 1. Boot into Linux 548 + 2. Run Kubernetes 549 + 550 + That's it. It's beautifully brutalist. I love it so far. 551 + 552 + I decided to start with `kos-mos` arbitrarily. I downloaded the ISO, tried to use balenaEtcher to flash it to a USB drive and then Windows decided that now was the perfect time to start interrupting me with bullshit related to Explorer desperately trying to find and mount USB drives. 
553 + 554 + <Conv name="Cadey" mood="coffee"> 555 + Lately Windows has been going out of its way to actively interfere when I try 556 + to do anything fancy or convenient. I only tolerate it for games, but I am 557 + reconsidering my approach. If only Wayland supported accessibility hooks. 558 + </Conv> 559 + 560 + I was unable to use balenaEtcher to write it, but then I found out that [Rufus](https://rufus.ie/en/) can write ISOs to USB drives in a way that doesn't rely on Windows to do the mounting or writing. That worked and I had `kos-mos` up and running in short order. 561 + 562 + <Conv name="Cadey" mood="enby"> 563 + This is when I found out about the hostname patch yaml trick, so it booted 564 + into a randomly generated `talos-whatever` hostname by default. This was okay, 565 + but I wanted to have the machine names be more meaningful so I can figure out 566 + what's running where at a glance. Changing hostnames was trivial though, you 567 + can do it from the dashboard worst case. I'm aware that this is defeating the 568 + point of the "cattle, not pets" flow that a lot of modern Linux distributions 569 + want you to go down, but my homelab servers are my pets. 570 + </Conv> 571 + 572 + After bootstrapping etcd and exposing the subnet routes, I made an nginx deployment with a service as a "hello world" to ensure that things were working properly. 
Here's the configuration I used: 573 + 574 + ```yaml 575 + --- 576 + apiVersion: apps/v1 577 + kind: Deployment 578 + metadata: 579 + name: nginx 580 + labels: 581 + app.kubernetes.io/name: nginx 582 + spec: 583 + replicas: 3 584 + selector: 585 + matchLabels: 586 + app.kubernetes.io/name: nginx 587 + template: 588 + metadata: 589 + labels: 590 + app.kubernetes.io/name: nginx 591 + spec: 592 + containers: 593 + - name: nginx 594 + image: nginx:1.14.2 595 + ports: 596 + - containerPort: 80 597 + --- 598 + apiVersion: v1 599 + kind: Service 600 + metadata: 601 + name: nginx 602 + spec: 603 + selector: 604 + app.kubernetes.io/name: nginx 605 + ports: 606 + - protocol: TCP 607 + port: 80 608 + targetPort: 80 609 + type: ClusterIP 610 + ``` 611 + 612 + <Conv name="Mara" mood="hacker"> 613 + For those of you that don't grok k8s yaml, this configuration creates two things: 614 + 615 + - A `Deployment` (think of it as a set of `Pods` that can be scaled up or down and upgraded on a rolling basis) that runs three copies of [nginx](https://nginx.org/en/) showing the default "welcome to nginx" page, with port 80 marked as "open" to other things. 616 + - A `ClusterIP Service` that exposes the nginx `Deployment`'s port 80 to a stable IP address within the cluster. This cluster IP will be used by other services to talk to the nginx `Deployment`. 617 + 618 + Normally these `ClusterIP` services are only exposed in the cluster (as the name implies), but when you have overlay networks and subnet routing in the mix, you can do anything, such as poking the service from your laptop: 619 + 620 + </Conv> 621 + 622 + <Picture 623 + path="xedn/dynamic/c4d36dd3-c8f1-4115-a504-48b9e6412fc8" 624 + desc="The 'welcome to nginx' page on the url http://nginx.default.svc.alrest.xeserv.us, which is not publicly exposed to you." 625 + /> 626 + 627 + Once this is up, you're golden. You can start deploying more things to your cluster and then they can talk to each other. 
One of the first things I deployed was a Reddit/Discord bot that I maintain for a community I've been in for a long time. It's a simple stateless bot that only needs a single deployment to run. You can see its source code and deployment manifest [here](https://github.com/Xe/x/tree/master/cmd/sapientwindex). 628 + 629 + The only weird part here is that I needed to set up secrets for handling the bot's Discord webhook. I don't have a secret vault set up (looking into setting up the 1Password one out of convenience because I already use it at home), so I yolo-created the secret with `kubectl create secret generic sapientwindex --from-literal=DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/1234567890/ABC123` and then mounted it into the pod as an environment variable. The relevant yaml snippet is under the `bot` container's `env` key: 630 + 631 + ```yaml 632 + env: 633 + - name: DISCORD_WEBHOOK_URL 634 + valueFrom: 635 + secretKeyRef: 636 + name: sapientwindex 637 + key: DISCORD_WEBHOOK_URL 638 + ``` 639 + 640 + This is a little more verbose than I'd like, but I understand why it has to be this way. Kubernetes is the most generic tool you can make, as such it has to be able to adapt to any workflow you can imagine. Kubernetes manifests can't afford to make too many assumptions, so they simply elect not to as much as possible. As such, you need to spell out all your assumptions by hand. 641 + 642 + I'll get this refined in the future with templates or whatever, but for now my favorite part about it is that it works. 643 + 644 + <Conv name="Aoi" mood="wut"> 645 + Why are you making your secrets environment variables instead of mounting them 646 + as a filesystem? 647 + </Conv> 648 + <Conv name="Cadey" mood="aha"> 649 + I want to have this as an environment variable because this bot was made with 650 + the [12 factor app](https://12factor.net/) methodology in mind. 
It's a 651 + stateless bot that only needs a single environment variable to run, so I'm 652 + going to keep it that way. The bot is also already programmed to read from the 653 + environment variable (but I could have it read the environment variable from 654 + the 655 + [flagconfyg](https://github.com/Xe/x/tree/master/internal/confyg/flagconfyg) 656 + file if I needed to). If there were more than 10 variables, I'd probably mount 657 + the secret as a flagconfyg or .env file instead. If I wanted to support 658 + secrets as a filesystem, I'd need to write some extra code to import a 659 + directory tree as flag values as my /x/ repo (and other projects of mine) use 660 + [package flag](https://pkg.go.dev/flag) for managing secrets and other 661 + configuration variables. I'm lazy. 662 + </Conv> 663 + 664 + After I got that working, I connected some other nodes and I've ended up with this: 665 + 666 + ``` 667 + $ kubectl get nodes 668 + NAME STATUS ROLES AGE VERSION 669 + chrysalis Ready control-plane 20h v1.30.0 670 + kos-mos Ready control-plane 20h v1.30.0 671 + ontos Ready control-plane 20h v1.30.0 672 + ``` 673 + 674 + The next big thing to get working is to get a bunch of operators working so that I can have my cluster dig its meaty claws into various other things. 675 + 676 + ## What the hell is an operator anyways? 677 + 678 + In Kubernetes land, an operator is a thing you install into your cluster that makes it integrate with another service or provides some functionality. For example, the [1Password operator](https://developer.1password.com/docs/k8s/k8s-operator/) lets you import 1Password data into your cluster as Kubernetes secrets. It's effectively how you extend Kubernetes to do more things with the same Kubernetes workflow you're already used to. 679 + 680 + One of the best examples of this is the 1Password operator I mentioned. It's how I'm using 1Password to store secrets for my apps in my cluster. 
I can then edit them with the 1Password app on my PC or MacBook and the relevant services will restart automatically with the new secrets. 681 + 682 + So I installed the operator with Helm and then it worked the first time. I was surprised, given how terrible Helm is in my experience. 683 + 684 + <Conv name="Aoi" mood="wut"> 685 + Why is Helm bad? It's the standard way to install reusable things in 686 + Kubernetes. 687 + </Conv> 688 + <Conv name="Cadey" mood="coffee"> 689 + Helm uses string templating to template structured data. It's like using `sed` 690 + to template JSON. It works, but you have to abuse a lot of things like the 691 + [`indent`](https://helm.sh/docs/chart_template_guide/yaml_techniques/#indenting-and-templates) 692 + function in order for things to be generically applicable. It's a mess, but 693 + only when you try and use it in earnest across your stack. It's what made me 694 + nearly burn out of the industry. 695 + </Conv> 696 + <Conv name="Aoi" mood="coffee"> 697 + Why is so much of this stuff just one or two steps away from being really 698 + good? 699 + </Conv> 700 + <Conv name="Numa" mood="delet"> 701 + Venture capital! They used to have a way to do structured templates but it was 702 + deprecated and removed in Helm 3.0, so we all get to suffer together. 703 + </Conv> 704 + 705 + The only hard part I ran into was that it wasn't obvious how I should assemble the reference strings for 1Password secrets. When you create the 1Password secret syncing object, it looks like this: 706 + 707 + ```yaml 708 + apiVersion: onepassword.com/v1 709 + kind: OnePasswordItem 710 + metadata: 711 + name: sapientwindex 712 + spec: 713 + itemPath: "vaults/lc5zo4zjz3if3mkeuhufjmgmui/items/cqervqahekvmujrlhdaxgqaffi" 714 + ``` 715 + 716 + This tells the operator to create a secret named `sapientwindex` in the default namespace with the item path `vaults/lc5zo4zjz3if3mkeuhufjmgmui/items/cqervqahekvmujrlhdaxgqaffi`. 
The item path is made up of the vault ID (`lc5zo4zjz3if3mkeuhufjmgmui`) and the item ID (`cqervqahekvmujrlhdaxgqaffi`). I wasn't sure how to get these in the first place, but I found the vault ID with the `op vaults list` command and then figured out you can enable right-clicking in the 1Password app to get the item ID. 717 + 718 + To enable this, go to Settings -> Advanced -> Show debugging tools in the 1Password app. This will let you right-click any secret and choose "Copy item UUID" to get the item ID for these secret paths. 719 + 720 + This works pretty great, I'm gonna use this extensively going forward. It's gonna be slightly painful at first, but once I get into the flow of this (and realistically write a generator that pokes the 1Password CLI to scrape this information more easily) it should all even out. 721 + 722 + ## Trials and storage tribulations 723 + 724 + As I said at the end of [my most recent conference talk](/talks/2024/shashin/), storage is one of the most annoying bullshit things ever. It's extra complicated with Talos Linux in particular because of how it uses the disk. Most of the disks of my homelab are Talos' "ephemeral state" partitions, which are used for temporary storage and wiped when the machine updates. This is great for many things, but not for persistent storage with [PersistentVolumes/PersistentVolumeClaims](https://kubernetes.io/docs/concepts/storage/persistent-volumes/). 725 + 726 + <Conv name="Mara" mood="hacker"> 727 + If you haven't used PersistentVolumes before, they are kinda like [Fly 728 + Volumes](https://fly.io/docs/reference/volumes/) or Docker Volumes. The main 729 + difference is that a PersistentVolume is usually shared between hosts, so that 730 + you can mount the same PersistentVolume on Pods located on multiple cluster 731 + Nodes. It's really neat. 732 + [StorageClasses](https://kubernetes.io/docs/concepts/storage/storage-classes/) 733 + let you handle things like data locality, backup policies, and more. 
This lets 734 + you set up multiple providers so that you can have some things managed by your 735 + cluster-local storage provider, some managed by the cloud provider, and some 736 + super-yolo ones directly mounted to the host filesystem. 737 + </Conv> 738 + 739 + I have tried the following things: 740 + 741 + - [Longhorn](https://longhorn.io/): a distributed block storage thing for Kubernetes by the team behind Rancher. It's pretty cool, but I got stuck at trying to get it actually running on my cluster. The pods were just permanently stuck in the `Pending` state due to etcd not being able to discover itself. 742 + - [OpenEBS](https://github.com/openebs/openebs): another distributed block storage thing for Kubernetes by some team of some kind. It claims to be the most widely used storage thing for Kubernetes, but I couldn't get it to work either. 743 + 744 + Among the things I've realized when debugging this is that _no matter what_, many storage things for Kubernetes will hardcode the cluster DNS name to be `cluster.local`. I made my cluster use the DNS name `alrest.xeserv.us` following the advice of one of my SRE friends to avoid using "fake" DNS names as much as possible. This has caused me no end of trouble, as many things in the Kubernetes ecosystem assume that the cluster DNS name is `cluster.local`. It turns out that many Kubernetes ecosystem tools hard-assume the DNS name because the CoreDNS configs in many popular Kubernetes platforms (like AWS EKS, Azure whatever-the-heck, and GKE) have broken DNS configs that make relative DNS names not work reliably. As a result, people have hardcoded the DNS name to `cluster.local` in many places in both configuration and code. 745 + 746 + <Conv name="Aoi" mood="coffee"> 747 + Yet again pragmatism wins out over correctness in the most annoying ways. Why 748 + does everything have to be so _bad_? 
749 + </Conv> 750 + <Conv name="Mara" mood="hacker"> 751 + To be fair to the Kubernetes ecosystem maintainers, they are faced with a 752 + pretty impossible task. They have to be able to be flexible enough to handle 753 + random bespoke homelab clusters and whatever the cloud providers think is 754 + sensible. That is such a wide range of configurations that I don't think it's 755 + really possible to do anything _but_ make those assumptions about how things 756 + work. It's a shame that changing the cluster DNS name breaks so much, but it's 757 + understandable because most cloud providers don't expose that setting to 758 + users. It always sucks to use "fake" DNS names because they can and will 759 + become top-level domains [like what happened with 760 + `.dev`](https://prinsfrank.nl/2019/02/26/With-the-new-dev-domains-googles-dont-be-evil-phase-is-a-distant-memory). 761 + It would be nice if Kubernetes encouraged people to choose their own "real" 762 + domain names, but it's understandable that they ended up with `cluster.local` 763 + because `.local` [is registered as a "special-use domain 764 + name"](https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml) 765 + by the IETF. 766 + </Conv> 767 + 768 + Fixing this was easy, I had to edit the CoreDNS ConfigMap to look like this: 769 + 770 + ```yaml 771 + data: 772 + Corefile: |- 773 + .:53 { 774 + errors 775 + health { 776 + lameduck 5s 777 + } 778 + ready 779 + log . { 780 + class error 781 + } 782 + prometheus :9153 783 + 784 + kubernetes cluster.local alrest.xeserv.us in-addr.arpa ip6.arpa { 785 + pods insecure 786 + fallthrough in-addr.arpa ip6.arpa 787 + } 788 + forward . /etc/resolv.conf 789 + cache 30 790 + loop 791 + reload 792 + loadbalance 793 + } 794 + ``` 795 + 796 + I prepended the `cluster.local` "domain name" to the `kubernetes` block. Then I deleted the CoreDNS pods in the `kube-system` namespace and they were promptly restarted with the new configuration. 
This at least got me to the point where normal DNS things worked again. 797 + 798 + <Conv name="Mara" mood="hacker"> 799 + I later found out I didn't need to do this. When CoreDNS sees the ConfigMap 800 + update, it'll automatically reload the config. However, SRE instinct kicks in 801 + when you're dealing with unknowns and sometimes the placebo effect of 802 + restarting the damn thing by hand makes you feel better. Feeling better can be 803 + way more important than actually fixing the problem, especially when you're 804 + dealing with a lot of new technology. 805 + </Conv> 806 + <Conv name="Numa" mood="delet"> 807 + There's no kill like overkill after all! 808 + </Conv> 809 + 810 + However, this didn't get Longhorn working. The manager container was just stuck trying to get created. Turns out the solution was really stupid and I want to explain what's going on here so that you can properly commiserate with me over the half a day I spent trying to get this working. 811 + 812 + Talos Linux sets a default security policy that blocks the Longhorn manager from running. This is because the Longhorn manager runs as root and Talos Linux is paranoid about security. In order to get Longhorn running, I had to add the following labels to the Longhorn namespace: 813 + 814 + ```yaml 815 + apiVersion: v1 816 + kind: Namespace 817 + metadata: 818 + name: longhorn-system 819 + labels: 820 + pod-security.kubernetes.io/enforce: privileged 821 + pod-security.kubernetes.io/enforce-version: latest 822 + pod-security.kubernetes.io/audit: privileged 823 + pod-security.kubernetes.io/audit-version: latest 824 + pod-security.kubernetes.io/warn: privileged 825 + pod-security.kubernetes.io/warn-version: latest 826 + ``` 827 + 828 + After you do this, you need to delete the longhorn-deployer _Pod_ and then wait about 10-15 minutes for the entire system to converge. 
For some reason it doesn't automatically restart when labels are changed, but that is very forgivable given how many weird things are at play with this ecosystem. Either way, getting this working _at all_ was a huge relief. 829 + 830 + <Conv name="Aoi" mood="wut"> 831 + Wasn't Longhorn part of the SUSE acquisition? 832 + </Conv> 833 + <Conv name="Cadey" mood="enby"> 834 + Yes, but they also donated Longhorn to the CNCF, so it's going to be 835 + maintained until it's inevitably deprecated in favor of yet another storage 836 + product. Hopefully there's an easy migration path, but I'm not going to worry 837 + about this until I have to. 838 + </Conv> 839 + 840 + Once Longhorn starts up, you can create a PersistentVolumeClaim and attach it to a pod: 841 + 842 + ```yaml 843 + apiVersion: v1 844 + kind: PersistentVolumeClaim 845 + metadata: 846 + name: longhorn-volv-pvc 847 + namespace: default 848 + spec: 849 + accessModes: 850 + - ReadWriteOnce 851 + storageClassName: longhorn 852 + resources: 853 + requests: 854 + storage: 256Mi 855 + --- 856 + apiVersion: v1 857 + kind: Pod 858 + metadata: 859 + name: volume-test 860 + namespace: default 861 + spec: 862 + restartPolicy: Always 863 + containers: 864 + - name: volume-test 865 + image: nginx:stable-alpine 866 + imagePullPolicy: IfNotPresent 867 + livenessProbe: 868 + exec: 869 + command: 870 + - ls 871 + - /data/lost+found 872 + initialDelaySeconds: 5 873 + periodSeconds: 5 874 + volumeMounts: 875 + - name: vol 876 + mountPath: /data 877 + ports: 878 + - containerPort: 80 879 + volumes: 880 + - name: vol 881 + persistentVolumeClaim: 882 + claimName: longhorn-volv-pvc 883 + ``` 884 + 885 + <Conv name="Cadey" mood="facepalm"> 886 + I feel so dumb right now. It was just a security policy mismatch. 887 + </Conv> 888 + <Conv name="Numa" mood="happy"> 889 + Hey, at least it was a dumb problem. Dumb problems are always so much easier 890 + to deal with than the not-dumb problems. 
The not-dumb problems end up sucking 891 + so much and drain you of your soul energy. 892 + </Conv> 893 + 894 + Longhorn ended up working, so I [set up backups](https://longhorn.io/docs/1.6.1/snapshots-and-backups/scheduling-backups-and-snapshots/) to [Tigris](https://tigrisdata.com) and then I plan to not think about it until I need to. The only catch is that I need to label every PersistentVolumeClaim with `recurring-job-group.longhorn.io/backup: enabled` to make my backup job run: 895 + 896 + ```yaml 897 + apiVersion: longhorn.io/v1beta1 898 + kind: RecurringJob 899 + metadata: 900 + name: backup 901 + namespace: longhorn-system 902 + spec: 903 + cron: "0 0 * * *" 904 + task: "backup" 905 + groups: 906 + - default 907 + retain: 4 908 + concurrency: 2 909 + ``` 910 + 911 + <Conv name="Cadey" mood="enby"> 912 + Thanks for the backup space Ovais & co! I wonder how efficient this really is 913 + because most of the blocks (based on unscientific random clicking around in 914 + the Tigris console) are under the threshold for [being inlined to 915 + FoundationDB](https://www.tigrisdata.com/docs/overview/#fast-small-object-retrieval). 916 + I'll have to ask them about it once I get some more significant data workloads 917 + in the mix. Realistically, it's probably fine and will end up being a decent 918 + stress test for them. 919 + </Conv> 920 + 921 + Hopefully I won't need to think about this for a while. At its best, storage is invisible. 922 + 923 + ## The ~~factory~~ cluster must grow 924 + 925 + I dug `logos` out of mothballs and then I plugged in the Talos Linux USB. I then ran the `logos` command to install Talos Linux on the machine. It worked perfectly and I had a new homelab node up and running in no time. 
All I had to do was: 926 + 927 + - Get it hooked up to Ethernet and power 928 + - Boot it off of the Talos Linux USB stick 929 + - Apply the config with `talosctl` from my macbook 930 + - Wait for it to reboot and everything to green up in `kubectl` 931 + 932 + That's it. This is what every homelab OS should strive to be. 933 + 934 + I also tried to add my [Win600](/blog/anbernic-win600-review/) to the cluster, but I don't think Talos supports wi-fi. I'm asking in the Matrix channel and in a room full of experts. I was able to get it to connect to ethernet over USB in a hilariously jankriffic setup though: 935 + 936 + <Picture 937 + path="xedn/dynamic/a1f2dea0-158d-4ee4-b708-3802f54a734e" 938 + desc="An Anbernic Win600 with its screen sideways booted into Talos Linux. It is precariously mounted on the floor with power going in on one end and ethernet going in on the other. It is not a production-worthy setup." 939 + /> 940 + 941 + <Conv name="Aoi" mood="coffee"> 942 + Why would you do this to yourself? 943 + </Conv> 944 + <Conv name="Numa" mood="happy"> 945 + Science isn't about why, it's about why not! 946 + </Conv> 947 + 948 + I seriously can't believe this works. It didn't work well enough to stay in production, but it's worth a laugh or two at least. I ended up removing this node so that I can have floor space back. I'll have to figure out how to get it on the network properly later, maybe after DevOpsDays KC. 949 + 950 + ## ingressd and related fucking up 951 + 952 + I was going to write about a super elegant hack that I'm doing to get ingress from the public internet to my homelab here, but I fucked up again and I potentially got to do etcd surgery. 953 + 954 + The hack I was trying to do was creating a userspace wireguard network for handling HTTP/HTTPS ingress from the public internet. I chose to use the network `10.255.255.0/24` for this (I had a TypeScript file to configure the WireGuard keys and everything). 
Apparently Talos Linux configured etcd to prefer anything in `10.0.0.0/8` by default. This has led to the following bad state: 955 + 956 + ``` 957 + $ talosctl etcd members -n 192.168.2.236 958 + NODE ID HOSTNAME PEER URLS CLIENT URLS LEARNER 959 + 192.168.2.236 3a43ba639b0b3ec3 chrysalis https://10.255.255.16:2380 https://10.255.255.16:2379 false 960 + 192.168.2.236 d07a0bb98c5c225c kos-mos https://10.255.255.17:2380 https://192.168.2.236:2379 false 961 + 192.168.2.236 e78be83f410a07eb ontos https://10.255.255.19:2380 https://192.168.2.237:2379 false 962 + 192.168.2.236 e977d5296b07d384 logos https://10.255.255.18:2380 https://192.168.2.217:2379 false 963 + ``` 964 + 965 + This is uhhh, not good. The normal strategy for recovering from an etcd split brain involves stopping etcd on all nodes and then recovering one of them, but I can't do that because `talosctl` doesn't let you stop etcd: 966 + 967 + ``` 968 + $ talosctl service etcd -n 192.168.2.196 stop 969 + error starting service: 1 error occurred: 970 + * 192.168.2.196: rpc error: code = Unknown desc = service "etcd" doesn't support stop operation via API 971 + ``` 972 + 973 + When you get etcd into this state, it is generally very hard to convince it otherwise without doing database surgery and suffering the pain of having fucked it up. Fixing this is a very _doable_ process, but I didn't really wanna deal with it. 974 + 975 + I ended up blowing away the cluster and starting over. I tried using TESTNET (192.0.2.0/24) for the IP range but ran into issues where my super hacky userspace WireGuard code wasn't working right. I gave up at this point and ended up using my existing WireGuard mesh for ingress. I'll have to figure out how to do this properly later. 
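
For anyone who hits the same etcd address mixup: one possible way to avoid it (a sketch I did not test on this cluster) is to tell Talos which subnet etcd is allowed to advertise, using the `cluster.etcd.advertisedSubnets` machine config field. The file name `patches/etcd.yaml` is hypothetical, and `192.168.2.0/24` is an assumption based on the node addresses in the `talosctl etcd members` output above.

```yaml
# Hypothetical patch file: patches/etcd.yaml
cluster:
  etcd:
    # Only pick etcd's advertised address from the home LAN.
    # 192.168.2.0/24 is assumed from the node IPs shown above;
    # change it to match your own network.
    advertisedSubnets:
      - 192.168.2.0/24
```

In principle this would be passed as one more `-p @patches/etcd.yaml` argument to `talosctl apply-config`, alongside the other patches. Whether it would have prevented this exact failure is a guess on my part.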
976 + 977 + <Conv name="Cadey" mood="facepalm"> 978 + While I was resetting the cluster, I ran into a kinda hilarious problem: 979 + asking Talos nodes to wipe their disk and reset all state makes them wipe 980 + _everything_, including the system partition. I did ask it to wipe 981 + _everything_, but I didn't think it would nuke the OS too. It was kind of a 982 + hilarious realization when I ended up figuring out what I did, but it's good 983 + to know that "go die" means that it will kill everything. That's kind of a 984 + dangerous call to expose without some kind of confirmation, but I guess that 985 + is at the pay-to-win tier with [Sidero Labs 986 + support](https://www.siderolabs.com/pricing/). I'll probably end up paying for 987 + a hobby subscription at some point, just to support the company powering my 988 + homelab's hopes and dreams. Money does tend to let one buy goods and services. 989 + </Conv> 990 + 991 + `ingressd` ended up becoming a simple TCP proxy with added [PROXY protocol](http://www.haproxy.org/download/1.8/doc/proxy-protocol.txt) support so that ingress-nginx could get the right source IP addresses. It's nothing to write home about, but it's my simple TCP proxy that I probably could have used something off the shelf for. 992 + 993 + <Conv name="Numa" mood="delet"> 994 + The not-invented-here is strong with this one. 995 + </Conv> 996 + <Conv name="Cadey" mood="aha"> 997 + Something something future expansion when I have time/energy. 998 + </Conv> 999 + 1000 + ### Using `ingressd` (why would you do this to yourself) 1001 + 1002 + If you want to install `ingressd` for your cluster, here's the high level steps: 1003 + 1004 + <Conv name="Mara" mood="hmm"> 1005 + Keep in mind, `ingressd` has no real support. If you run it, you are on your 1006 + own. Good luck if so! 1007 + </Conv> 1008 + 1009 + 1. 
Gather the secret keys needed for this Terraform manifest (change the domain for Route 53 to your own domain): https://github.com/Xe/x/blob/master/cmd/ingressd/tf/main.tf 1010 + 2. Run `terraform apply` in the directory with the above manifest 1011 + 3. Go a level up and run `yeet` to build the ingressd RPM 1012 + 4. Install the RPM on your ingressd node 1013 + 5. Create the file `/etc/ingressd/ingressd.env` with the following contents: 1014 + 1015 + ``` 1016 + HTTP_TARGET=serviceIP:80 1017 + HTTPS_TARGET=serviceIP:443 1018 + ``` 1019 + 1020 + <Conv name="Mara" mood="hacker"> 1021 + Fetch the service IP from `kubectl get svc -n ingress-nginx` and replace 1022 + `serviceIP` with it. 1023 + </Conv> 1024 + 1025 + This assumes you are subnet routing your Kubernetes node and service network over your WireGuard mesh of choice. If you are not doing that, you can't use `ingressd`. 1026 + 1027 + <Conv name="Cadey" mood="aha"> 1028 + This is why I wanted to use a userspace WireGuard connection for this. If I 1029 + end up implementing this properly, I'm probably gonna end up using two 1030 + binaries: one on the public ingressd host, and an ingressd-buddy that runs in 1031 + the cluster. 1032 + </Conv> 1033 + 1034 + Also make sure to run the magic firewalld unbreaking commands: 1035 + 1036 + ``` 1037 + firewall-cmd --zone=public --add-service=http 1038 + firewall-cmd --zone=public --add-service=https 1039 + ``` 1040 + 1041 + <Conv name="Cadey" mood="percussive-maintenance"> 1042 + I always forget the magic firewalld unbreaking commands. 1043 + </Conv> 1044 + 1045 + ## It's always DNS 1046 + 1047 + Now that I have [ingress working](http://ingressd.cetacean.club/), it's time for one of the most painful things in the universe: DNS. Since I last used Kubernetes, [External DNS](https://github.com/kubernetes-sigs/external-dns) has become production-ready. I'm going to use it to manage the DNS records for my services. 
1048 + 1049 + In typical Kubernetes fashion, it seems that it has gotten incredibly complicated since the last time I used it. It used to be fairly simple, but now installing it requires you to really consider what the heck you are doing. There's also no Helm chart, so you're _really_ on your own. 1050 + 1051 + After reading some documentation, I ended up on the following Kubernetes manifest: [external-dns.yaml](https://gist.githubusercontent.com/Xe/8d4d960bcad372a7a2b04265b9eba21c/raw/1b430cf723f1877e764f72d9db720da95f95616b/external-dns.yaml). So that I can have this documented for me as much as it is for you, here is what this does: 1052 + 1053 + 1. Creates a namespace `external-dns` for `external-dns` to live in. 1054 + 2. Creates the [`external-dns` Custom Resource Definitions (CRDs)](https://kubernetes-sigs.github.io/external-dns/v0.14.1/contributing/crd-source/) so that I can make DNS records manually with Kubernetes objects should the spirit move me. 1055 + 3. Creates a service account for `external-dns`. 1056 + 4. Creates a cluster role and cluster role binding for `external-dns` to be able to read a small subset of Kubernetes objects (services, ingresses, and nodes, as well as its custom resources). 1057 + 5. Creates a [1Password secret](https://developer.1password.com/docs/k8s/k8s-operator/) to give `external-dns` Route53 god access. 1058 + 6. Creates two deployments of `external-dns`: 1059 + - One for the CRD powered external DNS to weave DNS records with YAML 1060 + - One to match on newly created ingress objects and create DNS records for them 1061 + 1062 + <Conv name="Aoi" mood="coffee"> 1063 + Jesus christ that's a lot of stuff. It makes sense when you're explaining how 1064 + it all builds up, but it's a lot. 1065 + </Conv> 1066 + <Conv name="Numa" mood="happy"> 1067 + Welcome to Kubernetes! It's YAML turtles all the way down. 
1068 + </Conv> 1069 + 1070 + If I ever need to create a DNS record for a service, I can do so with the following YAML: 1071 + 1072 + ```yaml 1073 + apiVersion: externaldns.k8s.io/v1alpha1 1074 + kind: DNSEndpoint 1075 + metadata: 1076 + name: something 1077 + spec: 1078 + endpoints: 1079 + - dnsName: something.xeserv.us 1080 + recordTTL: 180 1081 + recordType: TXT 1082 + targets: 1083 + - "We're no strangers to love" 1084 + - "You know the rules and so do I" 1085 + - "A full commitment's what I'm thinking of" 1086 + - "You wouldn't get this from any other guy" 1087 + - "I just wanna tell you how I'm feeling" 1088 + - "Gotta make you understand" 1089 + - "Never gonna give you up" 1090 + - "Never gonna let you down" 1091 + - "Never gonna run around and hurt you" 1092 + - "Never gonna make you cry" 1093 + - "Never gonna say goodbye" 1094 + - "Never gonna tell a lie and hurt you" 1095 + ``` 1096 + 1097 + Hopefully I'll never need to do this, but I bet that _something_ will make me need to make a DNS TXT record at some point, and it's probably better to have that managed in configuration management somehow. 1098 + 1099 + ## cert-manager 1100 + 1101 + Now that there's ingress from the outside world and DNS records for my services, it's time to get HTTPS working. I'm going to use [cert-manager](https://cert-manager.io/) for this. It's a Kubernetes native way to manage certificates from Let's Encrypt and other CAs. 1102 + 1103 + Unlike nearly everything else in this process, installing cert-manager was relatively painless. I just had to install it with Helm. I also made Helm manage the Custom Resource Definitions, so that way I can easily upgrade them later. 1104 + 1105 + <Conv name="Mara" mood="hacker"> 1106 + This is probably a mistake, Helm doesn't handle Custom Resource Definition 1107 + updates gracefully. This will be corrected in the future, but right now the 1108 + impetus to care is very low. 
1109 + </Conv> 1110 + 1111 + The only gotcha here is that there are annotations for Ingresses that you need to add to get cert-manager to issue certificates for them. Here's an example: 1112 + 1113 + ```yaml 1114 + apiVersion: networking.k8s.io/v1 1115 + kind: Ingress 1116 + metadata: 1117 + name: kuard 1118 + annotations: 1119 + cert-manager.io/cluster-issuer: "letsencrypt-prod" 1120 + spec: 1121 + # ... 1122 + ``` 1123 + 1124 + <Conv name="Aoi" mood="wut"> 1125 + What's the difference between a label and an annotation anyways? So far it 1126 + looks like you've been using them interchangeably. 1127 + </Conv> 1128 + <Conv name="Cadey" mood="aha"> 1129 + Labels are intended to be used to help find and select objects, while annotations are for more detailed information. Label values are limited to 63 characters. The most common label you will see is the `app` or `app.kubernetes.io/name` label which points to the "app" an object is a part of. Annotations are much more intended for storing metadata about the object, and all of an object's annotations together can add up to 256 KiB. They are intended to be used for things like machine-readable data, like the cert-manager issuer annotation. 1130 + </Conv> 1131 + <Conv name="Aoi" mood="wut"> 1132 + Why is Longhorn [using labels for subscribing Volumes to backup jobs](https://longhorn.io/docs/1.6.1/snapshots-and-backups/scheduling-backups-and-snapshots/#using-the-kubectl-command) then? 
1133 + </Conv> 1134 + <Conv name="Cadey" mood="aha"> 1135 + Because label selectors let you query the Kubernetes API for objects by label, observe: 1136 + 1137 + ``` 1138 + $ kubectl get Volume --all-namespaces -l=recurring-job-group.longhorn.io/backup=enabled 1139 + NAMESPACE NAME DATA ENGINE STATE ROBUSTNESS SCHEDULED SIZE NODE AGE 1140 + longhorn-system pvc-e1916e66-7f7b-4322-93cd-52dc1fc418f7 v1 attached healthy 2147483648 logos 43h 1141 + ``` 1142 + 1143 + That also extends to other labels, such as `app.kubernetes.io/name`: 1144 + 1145 + ``` 1146 + $ kubectl get all,svc,ing -n mi -l app.kubernetes.io/name=mi 1147 + NAME READY STATUS RESTARTS AGE 1148 + pod/mi-6bd6d8bb44-cg7tf 1/1 Running 0 43h 1149 + 1150 + NAME READY UP-TO-DATE AVAILABLE AGE 1151 + deployment.apps/mi 1/1 1 1 43h 1152 + 1153 + NAME DESIRED CURRENT READY AGE 1154 + replicaset.apps/mi-6bd6d8bb44 1 1 1 43h 1155 + 1156 + NAME CLASS HOSTS ADDRESS PORTS AGE 1157 + ingress.networking.k8s.io/mi-public nginx mi.cetacean.club 100.109.37.97 80, 443 20h 1158 + ``` 1159 + 1160 + Annotations are more useful for meta-information and machine-readable data, like the cert-manager issuer annotation. You could also use it to attribute deployments to a specific git commit or something like that. 1161 + 1162 + </Conv> 1163 + 1164 + This will make `cert-manager` issue a certificate for the `kuard` ingress using the `letsencrypt-prod` issuer. You can also use `letsencrypt-staging` for testing. The part that you will fuck up is that the documentation mixes `ClusterIssuer` and `Issuer` resources and annotations. Here's what my Let's Encrypt staging and prod issuers look like: 1165 + 1166 + ```yaml 1167 + apiVersion: cert-manager.io/v1 1168 + kind: ClusterIssuer 1169 + metadata: 1170 + name: letsencrypt-staging 1171 + spec: 1172 + acme: 1173 + # You must replace this email address with your own. 1174 + # Let's Encrypt will use this to contact you about expiring 1175 + # certificates, and issues related to your account. 
1176 + email: user@example.com 1177 + server: https://acme-staging-v02.api.letsencrypt.org/directory 1178 + privateKeySecretRef: 1179 + # Secret resource that will be used to store the account's private key. 1180 + name: letsencrypt-staging-acme-key 1181 + solvers: 1182 + - http01: 1183 + ingress: 1184 + ingressClassName: nginx 1185 + --- 1186 + apiVersion: cert-manager.io/v1 1187 + kind: ClusterIssuer 1188 + metadata: 1189 + name: letsencrypt-prod 1190 + spec: 1191 + acme: 1192 + # You must replace this email address with your own. 1193 + # Let's Encrypt will use this to contact you about expiring 1194 + # certificates, and issues related to your account. 1195 + email: user@example.com 1196 + server: https://acme-v02.api.letsencrypt.org/directory 1197 + privateKeySecretRef: 1198 + # Secret resource that will be used to store the account's private key. 1199 + name: letsencrypt-prod-acme-key 1200 + solvers: 1201 + - http01: 1202 + ingress: 1203 + ingressClassName: nginx 1204 + ``` 1205 + 1206 + These `ClusterIssuers` are what the `cert-manager.io/cluster-issuer:` annotation in the ingress object refers to. You can also use `Issuer` resources if you want to scope the issuer to a single namespace, but realistically I know you're lazier than I am so you're going to use `ClusterIssuer`. 
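Since mixing up the two annotations is the easy mistake here, this is a minimal Python sketch of the naming rule: `cert-manager.io/cluster-issuer` refers to a `ClusterIssuer`, and `cert-manager.io/issuer` refers to a namespaced `Issuer`. This is not cert-manager's code. The `issuer_ref` helper and the `KUARD` dict are made up for illustration, and raising an error when both annotations are present is this sketch's own choice, not documented cert-manager behavior.

```python
CLUSTER_ISSUER_ANNOTATION = "cert-manager.io/cluster-issuer"
ISSUER_ANNOTATION = "cert-manager.io/issuer"

# Mirrors the metadata of the `kuard` Ingress used in this post.
KUARD = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {
        "name": "kuard",
        "annotations": {CLUSTER_ISSUER_ANNOTATION: "letsencrypt-prod"},
    },
}


def issuer_ref(ingress):
    """Return (kind, name) of the issuer an Ingress asks for, or None."""
    annotations = ingress.get("metadata", {}).get("annotations", {})
    has_cluster = CLUSTER_ISSUER_ANNOTATION in annotations
    has_namespaced = ISSUER_ANNOTATION in annotations
    if has_cluster and has_namespaced:
        # Assumption of this sketch: treat the ambiguous case as an error.
        raise ValueError("both issuer annotations are set; pick one")
    if has_cluster:
        return ("ClusterIssuer", annotations[CLUSTER_ISSUER_ANNOTATION])
    if has_namespaced:
        return ("Issuer", annotations[ISSUER_ANNOTATION])
    return None


if __name__ == "__main__":
    print(issuer_ref(KUARD))
```

If the annotation key and the resource kind don't line up (for example, `cert-manager.io/issuer` pointing at a name that only exists as a `ClusterIssuer`), the lookup targets the wrong kind of object, which is the trap the documentation makes easy to fall into.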
1207 + 1208 + The flow of all of this looks kinda complicated, but you can visualize it with this handy diagram: 1209 + 1210 + ![A diagram explaining the cert-manager flow](/static/blog/cert-manager-flow.svg) 1211 + 1212 + Breaking this down, let's assume I've just created this Ingress resource: 1213 + 1214 + ```yaml 1215 + apiVersion: networking.k8s.io/v1 1216 + kind: Ingress 1217 + metadata: 1218 + name: kuard 1219 + annotations: 1220 + cert-manager.io/cluster-issuer: "letsencrypt-prod" 1221 + spec: 1222 + ingressClassName: nginx 1223 + tls: 1224 + - hosts: 1225 + - kuard.xeserv.us 1226 + secretName: kuard-tls 1227 + rules: 1228 + - host: kuard.xeserv.us 1229 + http: 1230 + paths: 1231 + - path: / 1232 + pathType: Prefix 1233 + backend: 1234 + service: 1235 + name: kuard 1236 + port: 1237 + name: http 1238 + ``` 1239 + 1240 + <Conv name="Mara" mood="hacker"> 1241 + This is an Ingress named `kuard` with the `letsencrypt-prod` certificate 1242 + issuer. It's specifically configured to use ingress-nginx (this is probably 1243 + not required if your cluster has only one ingress defined, but it's better to 1244 + be overly verbose) and matches the HTTP hostname `kuard.xeserv.us`. It points 1245 + to the service `kuard`'s named `http` port (whatever that is). The TLS block 1246 + tells ingress-nginx (and cert-manager) to expect a cert in the secret 1247 + `kuard-tls` for the domain name `kuard.xeserv.us`. 1248 + </Conv> 1249 + 1250 + When I create this Ingress with the cluster-issuer annotation, it's discovered by both external-dns and cert-manager. external-dns creates DNS records in Route 53 (and tracks them using DynamoDB). At the same time, cert-manager creates a Cert resource for the domains I specified in the `spec.tls.hosts` field of my Ingress. The Cert resource discovers that the secret `kuard-tls` has no valid certificate in it, so it creates an Order for a new certificate. 
The Order creates a Challenge by poking Let's Encrypt to get the required token and then configures its own Ingress to handle the HTTP-01 strategy. 1251 + 1252 + Once it's able to verify that it can pass the Challenge locally (this requires external-dns to finish pushing DNS routes and for DNS to globally converge, but usually happens within a minute), it asks Let's Encrypt to test it. Once that test passes, Let's Encrypt signs my certificate, the Challenge Ingress is deleted, the signed certificate is saved to the right Kubernetes secret, the Order is marked as fulfilled, the Cert is marked ready, and nginx is reloaded to point to that newly minted certificate. 1253 + 1254 + <Conv name="Aoi" mood="wut"> 1255 + I guess that does make sense when you spell it all out, but that is a _lot_ of boilerplate and interplay to do something that [autocert](https://pkg.go.dev/golang.org/x/crypto/acme/autocert) does for you for free. 1256 + </Conv> 1257 + <Conv name="Cadey" mood="aha"> 1258 + The way you should interpret this is that each of the Kubernetes components is _stateless_ as much as possible. All of the state is stored externally in Kubernetes objects. This means that any of the components involved can restart, get moved around between nodes, or crash without affecting any in-flight tasks. I've been understanding this as having all of the inherent complexity of what is going on laid bare in front of you, much like how Rust makes it very obvious what is going on at a syntax level. 1259 + 1260 + You're probably used to a lot of this being handwaved away by the platforms you use, but this all isn't really that hard. It's just a verbose presentation of it. Besides, most of this is describing how I'm handwaving all of this away for my own uses. 1261 + 1262 + </Conv> 1263 + <Conv name="Aoi" mood="cheer"> 1264 + I get it, you're basically making your homelab into your own platform, right? This means that you need to provide all those building blocks for yourself. 
I guess this explains why there are so many moving parts. 1265 + </Conv> 1266 + 1267 + ## Shipping the lab 1268 + 1269 + At this point, my lab is stable, useful, and ready for me to put jobs on it. I have: 1270 + 1271 + - A cluster of four machines running Talos Linux that I can submit jobs to with `kubectl` 1272 + - Persistent storage with Longhorn 1273 + - Backups of said persistent storage to [Tigris](https://www.tigrisdata.com/) 1274 + - Ingress from the public internet with `ingressd` and crimes 1275 + - DNS records managed by `external-dns` 1276 + - HTTPS certificates managed by `cert-manager` 1277 + 1278 + And this all adds up to a place I can just throw jobs at and be confident that they will run. I'm going to be using this to run a bunch of other things that have previously been spread across a bunch of VPSes that I don't want to pay for anymore. Even though they are an excellent tax break right now. 1279 + 1280 + <Conv name="Cadey" mood="enby"> 1281 + I guess I did end up using Rocky Linux after all because the ingressd node runs 1282 + it. It's a bone-stock image with automatic updates and a single RPM built by 1283 + `yeet`. Realistically I could probably get away with running a few ingressd 1284 + nodes, but I'm lazy and I don't want to have to manage more than one. High 1285 + availability is for production, not janky homelab setups. 1286 + </Conv> 1287 + 1288 + ## The parts of a manifest for my homelab 1289 + 1290 + When I deploy things to my homelab cluster, I can divide them into three basic classes: 1291 + 1292 + - Automation/bots that don't expose any API or web endpoints 1293 + - Internal services that do expose API or web endpoints 1294 + - Public-facing services that should be exposed to the public internet 1295 + 1296 + Automation/bots are the easiest. In the ideal case all I need is a Deployment and a Secret to hold the API keys to poke external services. 
For an example of that, see [within.website/x/cmd/sapientwindex](https://github.com/Xe/x/tree/master/cmd/sapientwindex). 1297 + 1298 + Internal services get a little bit more complicated. Depending on the service in question, it'll probably get: 1299 + 1300 + - A Namespace to hold everything 1301 + - A PersistentVolumeClaim for holding state (SQLite, JSONMutexDB, etc.) 1302 + - A Secret or two for the various private/API keys involved in the process 1303 + - A Deployment with one or more containers that actually runs that internal service's code 1304 + - A Service that exposes the internal ports to the cluster on well-known port numbers 1305 + 1306 + For most internal services, this is more than good enough. If I need to poke it, I can do so by connecting to `svcname.ns.svc.alrest.xeserv.us`. It'd work great for a Minecraft server or something else like that. 1307 + 1308 + However, I also do need to expose things to the public internet sometimes. When I need to do that, I define an Ingress that has the right domain name so the rest of the stack will just automatically make it work. 1309 + 1310 + This gives me the ability to just push things to the homelab without fear. Once new jobs get defined, the rest of the stack will converge, order certificates, and otherwise make things Just Work™️. It's kinda glorious now that it's all set up. 1311 + 1312 + ## What's next? 1313 + 1314 + Here are the things I want to play with next: 1315 + 1316 + - [KubeVirt](https://kubevirt.io/): I want to run some VMs on my cluster. This looks like it could be the basis for an even better [waifud](/blog/series/waifud/) setup. All it's missing is a decent admin UI, which I can probably make with Tailwind and HTMX. The biggest thing I want to play with is live migration of VMs between nodes. 1317 + - I want to get a Minecraft server running on my cluster and figure out some way to make it accessible to my patrons. 
I have no idea how I'll go about doing the latter, but I'm sure I'll figure it out. Worst case I think one of you nerds in the (patron-only) Discord has _some_ ideas. 1318 + - I want to resurrect [kubermemes](https://tulpa.dev/cadey/kubermemes) for generating my app deployments. I've had a lot of opinions change since I wrote that so many years ago, but overall the shape of my deployments is going to be "small file with some resource requests" that compiles into "YAML means 'Yeah, A Massive List of stuff'". 1319 + - I may also want to get AI stuff running on Talos Linux. I have two GPUs in that cluster and kinda want to have stable diffusion at home again. It'll be a good way to get back into the swing of things with AI stuff. 1320 + - ??? who knows what else will come up through my binges into weird GitHub projects and Hacker News. 1321 + 1322 + I'm willing to declare my homelab a success at this point. It's cool that I can spread the load between my machines so much more cleanly now. I'm excited to see what I can do with this, and I hope that you are excited that you get more blog posts out of it. 1323 + 1324 + <Conv name="Aoi" mood="coffee"> 1325 + What a way to kill a week of PTO, eh? 1326 + </Conv>
+2 -2
lume/src/notes/2024/homelab-v2/05.mdx
··· 29 29 30 30 - Get it hooked up to Ethernet and power 31 31 - Boot it off of the Talos Linux USB stick 32 - - Apply the config with `talosctl` 33 - - Wait for it to reboot and everything to green up 32 + - Apply the config with `talosctl` from my macbook 33 + - Wait for it to reboot and everything to green up in `kubectl` 34 34 35 35 That's it. This is what every homelab OS should strive to be. 36 36
+1
lume/src/static/blog/cert-manager-flow.svg
··· 1 + <svg aria-roledescription="flowchart-v2" role="graphics-document document" viewBox="-8 -8 1214.53125 293.75" style="max-width: 100%;" xmlns="http://www.w3.org/2000/svg" width="100%" id="graph-div" height="100%" xmlns:xlink="http://www.w3.org/1999/xlink"><style>@import url("https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.2/css/all.min.css");'</style><style>#graph-div{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;fill:#ccc;}#graph-div .error-icon{fill:#a44141;}#graph-div .error-text{fill:#ddd;stroke:#ddd;}#graph-div .edge-thickness-normal{stroke-width:2px;}#graph-div .edge-thickness-thick{stroke-width:3.5px;}#graph-div .edge-pattern-solid{stroke-dasharray:0;}#graph-div .edge-pattern-dashed{stroke-dasharray:3;}#graph-div .edge-pattern-dotted{stroke-dasharray:2;}#graph-div .marker{fill:lightgrey;stroke:lightgrey;}#graph-div .marker.cross{stroke:lightgrey;}#graph-div svg{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;}#graph-div .label{font-family:"trebuchet ms",verdana,arial,sans-serif;color:#ccc;}#graph-div .cluster-label text{fill:#F9FFFE;}#graph-div .cluster-label span,#graph-div p{color:#F9FFFE;}#graph-div .label text,#graph-div span,#graph-div p{fill:#ccc;color:#ccc;}#graph-div .node rect,#graph-div .node circle,#graph-div .node ellipse,#graph-div .node polygon,#graph-div .node path{fill:#1f2020;stroke:#81B1DB;stroke-width:1px;}#graph-div .flowchart-label text{text-anchor:middle;}#graph-div .node .katex path{fill:#000;stroke:#000;stroke-width:1px;}#graph-div .node .label{text-align:center;}#graph-div .node.clickable{cursor:pointer;}#graph-div .arrowheadPath{fill:lightgrey;}#graph-div .edgePath .path{stroke:lightgrey;stroke-width:2.0px;}#graph-div .flowchart-link{stroke:lightgrey;fill:none;}#graph-div .edgeLabel{background-color:hsl(0, 0%, 34.4117647059%);text-align:center;}#graph-div .edgeLabel rect{opacity:0.5;background-color:hsl(0, 0%, 34.4117647059%);fill:hsl(0, 0%, 34.4117647059%);}#graph-div 
.labelBkg{background-color:rgba(87.75, 87.75, 87.75, 0.5);}#graph-div .cluster rect{fill:hsl(180, 1.5873015873%, 28.3529411765%);stroke:rgba(255, 255, 255, 0.25);stroke-width:1px;}#graph-div .cluster text{fill:#F9FFFE;}#graph-div .cluster span,#graph-div p{color:#F9FFFE;}#graph-div div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:12px;background:hsl(20, 1.5873015873%, 12.3529411765%);border:1px solid rgba(255, 255, 255, 0.25);border-radius:2px;pointer-events:none;z-index:100;}#graph-div .flowchartTitleText{text-anchor:middle;font-size:18px;fill:#ccc;}#graph-div :root{--mermaid-font-family:"trebuchet ms",verdana,arial,sans-serif;}</style><g><marker orient="auto" markerHeight="12" markerWidth="12" markerUnits="userSpaceOnUse" refY="5" refX="6" viewBox="0 0 10 10" class="marker flowchart" id="graph-div_flowchart-pointEnd"><path style="stroke-width: 1; stroke-dasharray: 1, 0;" class="arrowMarkerPath" d="M 0 0 L 10 5 L 0 10 z"></path></marker><marker orient="auto" markerHeight="12" markerWidth="12" markerUnits="userSpaceOnUse" refY="5" refX="4.5" viewBox="0 0 10 10" class="marker flowchart" id="graph-div_flowchart-pointStart"><path style="stroke-width: 1; stroke-dasharray: 1, 0;" class="arrowMarkerPath" d="M 0 5 L 10 10 L 10 0 z"></path></marker><marker orient="auto" markerHeight="11" markerWidth="11" markerUnits="userSpaceOnUse" refY="5" refX="11" viewBox="0 0 10 10" class="marker flowchart" id="graph-div_flowchart-circleEnd"><circle style="stroke-width: 1; stroke-dasharray: 1, 0;" class="arrowMarkerPath" r="5" cy="5" cx="5"></circle></marker><marker orient="auto" markerHeight="11" markerWidth="11" markerUnits="userSpaceOnUse" refY="5" refX="-1" viewBox="0 0 10 10" class="marker flowchart" id="graph-div_flowchart-circleStart"><circle style="stroke-width: 1; stroke-dasharray: 1, 0;" class="arrowMarkerPath" r="5" cy="5" cx="5"></circle></marker><marker orient="auto" 
markerHeight="11" markerWidth="11" markerUnits="userSpaceOnUse" refY="5.2" refX="12" viewBox="0 0 11 11" class="marker cross flowchart" id="graph-div_flowchart-crossEnd"><path style="stroke-width: 2; stroke-dasharray: 1, 0;" class="arrowMarkerPath" d="M 1,1 l 9,9 M 10,1 l -9,9"></path></marker><marker orient="auto" markerHeight="11" markerWidth="11" markerUnits="userSpaceOnUse" refY="5.2" refX="-1" viewBox="0 0 11 11" class="marker cross flowchart" id="graph-div_flowchart-crossStart"><path style="stroke-width: 2; stroke-dasharray: 1, 0;" class="arrowMarkerPath" d="M 1,1 l 9,9 M 10,1 l -9,9"></path></marker><g class="root"><g class="clusters"></g><g class="edgePaths"><path marker-end="url(#graph-div_flowchart-pointEnd)" style="fill:none;" class="edge-thickness-normal edge-pattern-solid flowchart-link LS-ing LE-ednsRecord" id="L-ing-ednsRecord-0" d="M70.258,89.4L87.336,79.75C104.414,70.1,138.57,50.8,186.822,41.15C235.073,31.5,297.419,31.5,346.854,31.5C396.289,31.5,432.813,31.5,459.063,31.5C485.313,31.5,501.289,31.5,529.826,31.5C558.362,31.5,599.458,31.5,645.618,31.5C691.779,31.5,743.003,31.5,796.274,31.5C849.546,31.5,904.865,31.5,950.415,31.5C995.965,31.5,1031.747,31.5,1049.637,31.5L1067.528,31.5"></path><path marker-end="url(#graph-div_flowchart-pointEnd)" style="fill:none;" class="edge-thickness-normal edge-pattern-solid flowchart-link LS-ing LE-certManager" id="L-ing-certManager-0" d="M70.258,103.314L87.336,100.429C104.414,97.543,138.57,91.771,176.613,88.886C214.657,86,256.586,86,277.551,86L298.516,86"></path><path marker-end="url(#graph-div_flowchart-pointEnd)" style="fill:none;" class="edge-thickness-normal edge-pattern-solid flowchart-link LS-certManager LE-Cert" id="L-certManager-Cert-0" d="M415.715,86L424.652,86C433.589,86,451.462,86,465.059,91.493C478.656,96.986,487.975,107.972,492.635,113.465L497.295,118.958"></path><path marker-end="url(#graph-div_flowchart-pointEnd)" style="fill:none;" class="edge-thickness-normal edge-pattern-solid flowchart-link LS-Cert 
LE-Order" id="L-Cert-Order-0" d="M540.195,131.016L556.922,122.638C573.648,114.26,607.102,97.505,643.966,97.219C680.829,96.934,721.104,113.117,741.242,121.209L761.379,129.301"></path><path marker-end="url(#graph-div_flowchart-pointEnd)" style="fill:none;" class="edge-thickness-normal edge-pattern-solid flowchart-link LS-Order LE-ednsRecord" id="L-Order-ednsRecord-0" d="M822.156,129.289L845.161,118.407C868.165,107.526,914.174,85.763,955.085,71.566C995.995,57.368,1031.806,50.736,1049.711,47.42L1067.617,44.105"></path><path marker-end="url(#graph-div_flowchart-pointEnd)" style="fill:none;" class="edge-thickness-normal edge-pattern-solid flowchart-link LS-Order LE-Challenge" id="L-Order-Challenge-0" d="M822.156,142.5L845.161,142.5C868.165,142.5,914.174,142.5,958.457,147.926C1002.74,153.352,1045.296,164.203,1066.574,169.629L1087.853,175.055"></path><path marker-end="url(#graph-div_flowchart-pointEnd)" style="fill:none;" class="edge-thickness-normal edge-pattern-solid flowchart-link LS-LE LE-Challenge" id="L-LE-Challenge-0" d="M847.539,228.538L866.313,227.406C885.087,226.275,922.635,224.013,962.677,218.7C1002.718,213.388,1045.253,205.027,1066.52,200.846L1087.788,196.665"></path><path marker-end="url(#graph-div_flowchart-pointEnd)" style="fill:none;" class="edge-thickness-normal edge-pattern-solid flowchart-link LS-Challenge LE-Order" id="L-Challenge-Order-0" d="M1092.988,206.346L1070.854,216.247C1048.72,226.147,1004.452,245.949,959.744,239.084C915.035,232.22,869.887,198.69,847.313,181.925L824.738,165.16"></path><path marker-end="url(#graph-div_flowchart-pointEnd)" style="fill:none;" class="edge-thickness-normal edge-pattern-solid flowchart-link LS-Order LE-Cert" id="L-Order-Cert-0" d="M766.297,148.452L745.34,152.919C724.383,157.385,682.469,166.317,645.639,166.567C608.809,166.817,577.063,158.384,561.191,154.168L545.318,149.952"></path><path marker-end="url(#graph-div_flowchart-pointEnd)" style="fill:none;" class="edge-thickness-normal edge-pattern-solid flowchart-link 
LS-Cert LE-ing" id="L-Cert-ing-0" d="M494.336,157.809L490.169,160.591C486.003,163.373,477.669,168.936,455.241,171.718C432.813,174.5,396.289,174.5,346.854,174.5C297.419,174.5,235.073,174.5,187.62,166.78C140.167,159.06,107.607,143.62,91.327,135.899L75.047,128.179"></path></g><g class="edgeLabels"><g class="edgeLabel"><g transform="translate(0, 0)" class="label"><foreignObject height="0" width="0"><div style="display: inline-block; white-space: nowrap;" xmlns="http://www.w3.org/1999/xhtml"><span class="edgeLabel"></span></div></foreignObject></g></g><g transform="translate(172.7265625, 86)" class="edgeLabel"><g transform="translate(-77.46875, -12)" class="label"><foreignObject height="24" width="154.9375"><div style="display: inline-block; white-space: nowrap;" xmlns="http://www.w3.org/1999/xhtml"><span class="edgeLabel">creates Cert resource</span></div></foreignObject></g></g><g class="edgeLabel"><g transform="translate(0, 0)" class="label"><foreignObject height="0" width="0"><div style="display: inline-block; white-space: nowrap;" xmlns="http://www.w3.org/1999/xhtml"><span class="edgeLabel"></span></div></foreignObject></g></g><g transform="translate(640.5546875, 80.75)" class="edgeLabel"><g transform="translate(-75.359375, -24)" class="label"><foreignObject height="48" width="150.71875"><div style="display: inline-block; white-space: nowrap;" xmlns="http://www.w3.org/1999/xhtml"><span class="edgeLabel">if no cert<br/>discovered in secrets</span></div></foreignObject></g></g><g transform="translate(960.18359375, 64)" class="edgeLabel"><g transform="translate(-75.70703125, -12)" class="label"><foreignObject height="24" width="151.4140625"><div style="display: inline-block; white-space: nowrap;" xmlns="http://www.w3.org/1999/xhtml"><span class="edgeLabel">testing the challenge</span></div></foreignObject></g></g><g transform="translate(960.18359375, 142.5)" class="edgeLabel"><g transform="translate(-87.64453125, -36)" class="label"><foreignObject height="72" 
width="175.2890625"><div style="display: inline-block; white-space: nowrap;" xmlns="http://www.w3.org/1999/xhtml"><span class="edgeLabel">Let's Encrypt<br/>challenge to<br/>prove domain ownership</span></div></foreignObject></g></g><g transform="translate(960.18359375, 221.75)" class="edgeLabel"><g transform="translate(-72.0390625, -12)" class="label"><foreignObject height="24" width="144.078125"><div style="display: inline-block; white-space: nowrap;" xmlns="http://www.w3.org/1999/xhtml"><span class="edgeLabel">validating challenge</span></div></foreignObject></g></g><g transform="translate(960.18359375, 265.75)" class="edgeLabel"><g transform="translate(-59.515625, -12)" class="label"><foreignObject height="24" width="119.03125"><div style="display: inline-block; white-space: nowrap;" xmlns="http://www.w3.org/1999/xhtml"><span class="edgeLabel">passes challenge</span></div></foreignObject></g></g><g transform="translate(640.5546875, 175.25)" class="edgeLabel"><g transform="translate(-59.15234375, -24)" class="label"><foreignObject height="48" width="118.3046875"><div style="display: inline-block; white-space: nowrap;" xmlns="http://www.w3.org/1999/xhtml"><span class="edgeLabel">signed cert from<br/>Let's Encrypt</span></div></foreignObject></g></g><g transform="translate(359.765625, 174.5)" class="edgeLabel"><g transform="translate(-84.5703125, -12)" class="label"><foreignObject height="24" width="169.140625"><div style="display: inline-block; white-space: nowrap;" xmlns="http://www.w3.org/1999/xhtml"><span class="edgeLabel">certificate ready to use</span></div></foreignObject></g></g></g><g class="nodes"><g transform="translate(35.12890625, 109.25)" data-id="ing" data-node="true" id="flowchart-ing-7980" class="node default default flowchart-label"><rect height="63" width="70.2578125" y="-31.5" x="-35.12890625" ry="0" rx="0" style="" class="basic label-container"></rect><g transform="translate(-27.62890625, -24)" style="" class="label"><rect></rect><foreignObject 
height="48" width="55.2578125"><div style="display: inline-block; white-space: nowrap;" xmlns="http://www.w3.org/1999/xhtml"><span class="nodeLabel">Ingress<br/>created</span></div></foreignObject></g></g><g transform="translate(1135.6796875, 31.5)" data-id="ednsRecord" data-node="true" id="flowchart-ednsRecord-7981" class="node default default flowchart-label"><rect height="63" width="125.703125" y="-31.5" x="-62.8515625" ry="0" rx="0" style="" class="basic label-container"></rect><g transform="translate(-55.3515625, -24)" style="" class="label"><rect></rect><foreignObject height="48" width="110.703125"><div style="display: inline-block; white-space: nowrap;" xmlns="http://www.w3.org/1999/xhtml"><span class="nodeLabel">External DNS<br/>creates records</span></div></foreignObject></g></g><g transform="translate(359.765625, 86)" data-id="certManager" data-node="true" id="flowchart-certManager-7983" class="node default default flowchart-label"><rect height="39" width="111.8984375" y="-19.5" x="-55.94921875" ry="0" rx="0" style="" class="basic label-container"></rect><g transform="translate(-48.44921875, -12)" style="" class="label"><rect></rect><foreignObject height="24" width="96.8984375"><div style="display: inline-block; white-space: nowrap;" xmlns="http://www.w3.org/1999/xhtml"><span class="nodeLabel">cert-manager</span></div></foreignObject></g></g><g transform="translate(517.265625, 142.5)" data-id="Cert" data-node="true" id="flowchart-Cert-7985" class="node default default flowchart-label"><rect height="39" width="45.859375" y="-19.5" x="-22.9296875" ry="0" rx="0" style="" class="basic label-container"></rect><g transform="translate(-15.4296875, -12)" style="" class="label"><rect></rect><foreignObject height="24" width="30.859375"><div style="display: inline-block; white-space: nowrap;" xmlns="http://www.w3.org/1999/xhtml"><span class="nodeLabel">Cert</span></div></foreignObject></g></g><g transform="translate(794.2265625, 142.5)" data-id="Order" 
data-node="true" id="flowchart-Order-7987" class="node default default flowchart-label"><rect height="39" width="55.859375" y="-19.5" x="-27.9296875" ry="0" rx="0" style="" class="basic label-container"></rect><g transform="translate(-20.4296875, -12)" style="" class="label"><rect></rect><foreignObject height="24" width="40.859375"><div style="display: inline-block; white-space: nowrap;" xmlns="http://www.w3.org/1999/xhtml"><span class="nodeLabel">Order</span></div></foreignObject></g></g><g transform="translate(1135.6796875, 187.25)" data-id="Challenge" data-node="true" id="flowchart-Challenge-7991" class="node default default flowchart-label"><rect height="39" width="85.3828125" y="-19.5" x="-42.69140625" ry="0" rx="0" style="" class="basic label-container"></rect><g transform="translate(-35.19140625, -12)" style="" class="label"><rect></rect><foreignObject height="24" width="70.3828125"><div style="display: inline-block; white-space: nowrap;" xmlns="http://www.w3.org/1999/xhtml"><span class="nodeLabel">Challenge</span></div></foreignObject></g></g><g transform="translate(794.2265625, 231.75)" data-id="LE" data-node="true" id="flowchart-LE-7992" class="node default default flowchart-label"><rect height="39" width="106.625" y="-19.5" x="-53.3125" ry="0" rx="0" style="" class="basic label-container"></rect><g transform="translate(-45.8125, -12)" style="" class="label"><rect></rect><foreignObject height="24" width="91.625"><div style="display: inline-block; white-space: nowrap;" xmlns="http://www.w3.org/1999/xhtml"><span class="nodeLabel">Let's Encrypt</span></div></foreignObject></g></g></g></g></g></svg>