The code and data behind xeiaso.net

feat(notes): dns fight

Signed-off-by: Xe Iaso <me@xeiaso.net>

Xe Iaso a6f33fd5 9490a7e9

+74 -1
+73
lume/src/notes/2026/dns-fight.mdx
```diff
@@ -0,0 +1,73 @@
+---
+title: "Homelab downtime update: The fight for DNS supremacy"
+desc: "Turns out everything DID NOT go offline somehow. Yay!"
+date: 2026-03-18
+---
+
+Hey all, quick update continuing from yesterday's announcement that my homelab went down. This is stream of consciousness and unedited. Enjoy!
+
+Turns out the entire homelab didn't go down and two Kubernetes nodes survived the power outage somehow.
+
+Two Kubernetes controlplane nodes.
+
+Kubernetes really wants there to be an odd number of controlplane nodes and my workloads are too heavy for any single node to run and Longhorn really wants there to be at least three nodes online. So I had to turn them off.
+
+How did I get in? The Mac mini that I used for Anubis CI. It somehow automatically powered on when the grid reset and/or survived the power outage.
+
+```text
+xe@t-elos:~$ uptime
+09:45:55 up 66 days, 9:51, 4 users, load average: 0.37, 0.22, 0.18
+```
+
+Holy shit, that's good to know!
+
+Anyways the usual suspects for trying to debug things didn't work (kubectl get nodes got a timeout, etc.), so I did an nmap across the entire home subnet. Normally this is full of devices and hard to read. This time there's basically nothing. What stood out was this:
+
+```text
+Nmap scan report for kos-mos (192.168.2.236)
+Host is up, received arp-response (0.00011s latency).
+Scanned at 2026-03-18 09:23:09 EDT for 1s
+Not shown: 996 closed tcp ports (reset)
+PORT      STATE SERVICE   REASON
+3260/tcp  open  iscsi     syn-ack ttl 64
+9100/tcp  open  jetdirect syn-ack ttl 64
+50000/tcp open  ibm-db2   syn-ack ttl 64
+50001/tcp open  unknown   syn-ack ttl 64
+MAC Address: FC:34:97:0D:1E:CD (Asustek Computer)
+
+Nmap scan report for ontos (192.168.2.237)
+Host is up, received arp-response (0.00011s latency).
+Scanned at 2026-03-18 09:23:09 EDT for 1s
+Not shown: 996 closed tcp ports (reset)
+PORT      STATE SERVICE   REASON
+3260/tcp  open  iscsi     syn-ack ttl 64
+9100/tcp  open  jetdirect syn-ack ttl 64
+50000/tcp open  ibm-db2   syn-ack ttl 64
+50001/tcp open  unknown   syn-ack ttl 64
+MAC Address: FC:34:97:0D:1F:AE (Asustek Computer)
+```
+
+Those two machines are Kubernetes controlplane nodes! I can't SSH into them because they're running Talos Linux, but I can use `talosctl` (via port 50000) to shut them down:
+
+```text
+$ ./bin/talosctl -n 192.168.2.236 shutdown --force
+WARNING: 192.168.2.236: server version 1.9.1 is older than client version 1.12.5
+watching nodes: [192.168.2.236]
+* 192.168.2.236: events check condition met
+
+$ ./bin/talosctl -n 192.168.2.237 shutdown --force
+WARNING: 192.168.2.237: server version 1.9.1 is older than client version 1.12.5
+watching nodes: [192.168.2.237]
+* 192.168.2.237: events check condition met
+```
+
+And now it's offline until I get home.
+
+This was causing the sponsor panel to be offline because the external-dns pod in the homelab was online and fighting my new cloud deployment for DNS supremacy. The sponsor panel is now back online (I should have put it in the cloud in the first place, that's on me) and peace has been restored to most of the galaxy, at least as much as I can from here.
+
+Action items:
+
+- Figure out why ontos and kos-mos came back online
+- Make all nodes in the homelab resume power when wall power exists again
+- Review homelab for PSU damage
+- Re-evaluate usage of Talos Linux, switch to Rocky?
```
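The note says Kubernetes wants an odd number of controlplane nodes. That preference comes from the majority quorum used by etcd, the datastore behind the Kubernetes API: a cluster of n members needs ⌊n/2⌋ + 1 of them to agree before it can make progress. A minimal sketch of that arithmetic, assuming the plain majority rule; `quorum` and `faultTolerance` are hypothetical helper names for illustration, not part of any Kubernetes or Talos API:

```go
package main

import "fmt"

// quorum returns how many members of an n-member majority-quorum
// cluster (assumption: etcd-style majority rule) must agree.
func quorum(n int) int {
	return n/2 + 1
}

// faultTolerance returns how many members can fail while the
// remaining members can still form a quorum.
func faultTolerance(n int) int {
	return n - quorum(n)
}

func main() {
	// Print quorum size and tolerated failures for small clusters.
	for n := 1; n <= 6; n++ {
		fmt.Printf("members=%d quorum=%d tolerates=%d failure(s)\n",
			n, quorum(n), faultTolerance(n))
	}
}
```

Going from three members to four raises the quorum from 2 to 3 but still tolerates only one failure, which is why odd cluster sizes are the usual recommendation.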
+1 -1
manifest/sponsor-panel/deployment.yaml
```diff
@@ -23,7 +23,7 @@
     spec:
       containers:
       - name: web
-        image: ghcr.io/xe/site/sponsor-panel:main
+        image: ghcr.io/xe/site/sponsor-panel:latest
         imagePullPolicy: Always
         resources:
           limits:
```