The code and data behind xeiaso.net

My first deploys for a new Kubernetes cluster

Signed-off-by: Xe Iaso <me@xeiaso.net>

Xe Iaso 01aebde2 937f9642

+463
+13
lume/src/_components/Disclaimer.jsx
export default function TecharoDisclaimer({ children }) {
  return (
    <>
      <link
        rel="stylesheet"
        href="https://cdn.xeiaso.net/file/christine-static/static/font/inter/inter.css"
      />
      <div className="font-['Inter'] text-lg mx-auto mt-4 mb-2 rounded-lg bg-bg-2 p-4 dark:bg-bgDark-2 md:max-w-3xl font-extrabold xe-dont-newline">
        {children}
      </div>
    </>
  );
}
+450
lume/src/notes/2024/essential-k8s.mdx
---
title: "My first deploys for a new Kubernetes cluster"
desc: "This is documentation for myself, but you may enjoy it too"
date: 2024-11-03
hero:
  ai: Photo by Xe Iaso, iPhone 13 Pro
  file: cloudfront
  prompt: "An airplane window looking out to cloudy skies."
---

I'm setting up some cloud Kubernetes clusters for a bit coming up on the blog. As a result, I need some documentation on what a "standard" cluster looks like. This is that documentation.

<Conv name="Mara" mood="hacker">
  Every Kubernetes term is WrittenInGoPublicValueCase. If you aren't sure what
  one of those terms means, google "site:kubernetes.io KubernetesTerm".
</Conv>

I'm assuming that the cluster is named `mechonis`.

For the "core" of a cluster, I need these services set up:

- Secret syncing with the [1Password operator](https://developer.1password.com/docs/k8s/k8s-operator/)
- Certificate management with [cert-manager](https://cert-manager.io/)
- DNS management with [external-dns](https://kubernetes-sigs.github.io/external-dns/v0.15.0/)
- HTTP ingress with [ingress-nginx](https://kubernetes.github.io/ingress-nginx/)
- High-latency high-volume storage with [csi-s3](https://github.com/yandex-cloud/k8s-csi-s3) pointed to [Tigris](https://tigrisdata.com) (technically optional, but including it for consistency)
- The [metrics-server](https://github.com/kubernetes-sigs/metrics-server) so [k9s](https://k9scli.io) can see how much free CPU and RAM the cluster has

These all cover different aspects of the three core features of any cloud deployment: compute, network, and storage. Most of my data will be hosted in the default StorageClass implementation provided by the platform (or in the case of baremetal clusters, something like [Longhorn](https://longhorn.io)), so the csi-s3 StorageClass is more of a "I need lots of data but am cheap" option than anything.

Most of this will be managed with [helmfile](https://github.com/helmfile/helmfile), but 1Password can't be.

## 1Password

The most important thing at the core of my k8s setups is the [1Password operator](https://developer.1password.com/docs/k8s/k8s-operator/). This syncs 1Password secrets to my Kubernetes clusters, so I don't need to define them in Secrets manually or risk putting the secret values into my OSS repos. This is done separately because I'm not able to manage it with helmfile.

After you have [the `op` command set up](https://developer.1password.com/docs/cli/get-started/), create a new server with access to the `Kubernetes` vault:

```
op connect server create mechonis --vaults Kubernetes
```

Then install the 1Password Connect Helm release with `operator.create` set to `true`:

```
helm repo add \
  1password https://1password.github.io/connect-helm-charts/
helm install \
  connect \
  1password/connect \
  --set-file connect.credentials=1password-credentials.json \
  --set operator.create=true \
  --set operator.token.value="$(op connect token create --server mechonis --vault Kubernetes)"
```

Now you can deploy OnePasswordItem resources as normal:

```yaml
apiVersion: onepassword.com/v1
kind: OnePasswordItem
metadata:
  name: falin
spec:
  itemPath: vaults/Kubernetes/items/Falin
```

## cert-manager, ingress-nginx, metrics-server, and csi-s3

In the cluster folder, create a file called `helmfile.yaml`. Copy these contents:

<details>
<summary>helmfile.yaml</summary>

```yaml
repositories:
  - name: jetstack
    url: https://charts.jetstack.io
  - name: csi-s3
    url: cr.yandex/yc-marketplace/yandex-cloud/csi-s3
    oci: true
  - name: ingress-nginx
    url: https://kubernetes.github.io/ingress-nginx
  - name: metrics-server
    url: https://kubernetes-sigs.github.io/metrics-server/

releases:
  - name: cert-manager
    kubeContext: mechonis
    chart: jetstack/cert-manager
    createNamespace: true
    namespace: cert-manager
    version: v1.16.1
    set:
      - name: installCRDs
        value: "true"
      - name: prometheus.enabled
        value: "false"
  - name: csi-s3
    kubeContext: mechonis
    chart: csi-s3/csi-s3
    namespace: kube-system
    set:
      - name: "storageClass.name"
        value: "tigris"
      - name: "secret.accessKey"
        value: ""
      - name: "secret.secretKey"
        value: ""
      - name: "secret.endpoint"
        value: "https://fly.storage.tigris.dev"
      - name: "secret.region"
        value: "auto"
  - name: ingress-nginx
    chart: ingress-nginx/ingress-nginx
    kubeContext: mechonis
    namespace: ingress-nginx
    createNamespace: true
  - name: metrics-server
    kubeContext: mechonis
    chart: metrics-server/metrics-server
    namespace: kube-system
```

</details>

Create a new admin access token in the [Tigris console](https://console.tigris.dev) and copy its access key ID and secret access key into `secret.accessKey` and `secret.secretKey` respectively.

Run `helmfile apply`:

```
$ helmfile apply
```

This will take a second to think, and then everything should be set up. The LoadBalancer Service may take a minute or ten to get a public IP depending on which cloud you are setting things up on, but once it's done you can proceed to setting up DNS.
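
If you'd rather not keep refreshing while the LoadBalancer provisions, a small polling helper can do the waiting. This is a minimal sketch and not part of the original setup: the function names `lb_addresses` and `wait_for_lb` are made up for illustration, and it assumes the chart named its Service `ingress-nginx-controller` (the usual default for a release called `ingress-nginx`; check with `kubectl get service` if it isn't found). Some clouds publish a hostname rather than an IP, in which case the `.ip` field stays empty and `.hostname` is the field to query instead.

```shell
# Hypothetical helper: print the external IPs of the ingress-nginx LoadBalancer.
# Assumes the Service is named ingress-nginx-controller in the ingress-nginx Namespace.
lb_addresses() {
  kubectl --context mechonis --namespace ingress-nginx \
    get service ingress-nginx-controller \
    --output jsonpath='{.status.loadBalancer.ingress[*].ip}'
}

# Hypothetical helper: poll every 10 seconds until at least one address shows up,
# then print the addresses (space-separated). If the Service name is wrong this
# loops forever, so interrupt it with Ctrl-C and check the name.
wait_for_lb() {
  until [ -n "$(lb_addresses)" ]; do
    sleep 10
  done
  lb_addresses
}
```

Running `wait_for_lb` after `helmfile apply` should block until the cloud has handed out an address and then print it.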

## external-dns

The next kinda annoying part is getting [external-dns](https://kubernetes-sigs.github.io/external-dns/latest/) set up. It's something that looks like it should be packageable with something like Helm, but realistically it's such a generic tool that you're really better off making your own manifests and deploying it by hand. In my setup, I use these features of external-dns:

- The [AWS Route 53](https://aws.amazon.com/route53/) DNS backend
- The [AWS DynamoDB](https://aws.amazon.com/dynamodb/) registry to remember what records should be set in Route 53

You will need two DynamoDB tables:

- `external-dns-crd-mechonis`: for records created with DNSEndpoint resources
- `external-dns-ingress-mechonis`: for records created with Ingress resources

Create a terraform configuration for setting up these DynamoDB tables:

<details>
<summary>main.tf</summary>

```hcl
terraform {
  backend "s3" {
    bucket = "within-tf-state"
    key    = "k8s/mechonis/external-dns"
    region = "us-east-1"
  }
}

resource "aws_dynamodb_table" "external_dns_crd" {
  name           = "external-dns-crd-mechonis"
  billing_mode   = "PROVISIONED"
  read_capacity  = 1
  write_capacity = 1
  table_class    = "STANDARD"

  attribute {
    name = "k"
    type = "S"
  }

  hash_key = "k"
}

resource "aws_dynamodb_table" "external_dns_ingress" {
  name           = "external-dns-ingress-mechonis"
  billing_mode   = "PROVISIONED"
  read_capacity  = 1
  write_capacity = 1
  table_class    = "STANDARD"

  attribute {
    name = "k"
    type = "S"
  }

  hash_key = "k"
}
```

</details>

Create the tables with `terraform apply`:

```
terraform init
terraform apply --auto-approve # yolo!
```

While that cooks, head over to `~/Code/Xe/x/kube/rhadamanthus/core/external-dns` and copy the contents to `~/Code/Xe/x/kube/mechonis/core/external-dns`. Then open `deployment-crd.yaml` and replace the DynamoDB table in the `crd` container's args:

```diff
  args:
    - --source=crd
    - --crd-source-apiversion=externaldns.k8s.io/v1alpha1
    - --crd-source-kind=DNSEndpoint
    - --provider=aws
    - --registry=dynamodb
    - --dynamodb-region=ca-central-1
-   - --dynamodb-table=external-dns-crd-rhadamanthus
+   - --dynamodb-table=external-dns-crd-mechonis
```

And in `deployment-ingress.yaml`:

```diff
  args:
    - --source=ingress
-   - --default-targets=rhadamanthus.xeserv.us
+   - --default-targets=mechonis.xeserv.us
    - --provider=aws
    - --registry=dynamodb
    - --dynamodb-region=ca-central-1
-   - --dynamodb-table=external-dns-ingress-rhadamanthus
+   - --dynamodb-table=external-dns-ingress-mechonis
```

Apply these configs with `kubectl apply`:

```
kubectl apply -k .
```

Then write a DNSEndpoint pointing to the created LoadBalancer. You may have to look up the IP addresses in the admin console of the cloud platform in question.

<details>
<summary>load-balancer-dns.yaml</summary>

```yaml
apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: load-balancer-dns
spec:
  endpoints:
    - dnsName: mechonis.xeserv.us
      recordTTL: 3600
      recordType: A
      targets:
        - whatever.ipv4.goes.here
    - dnsName: mechonis.xeserv.us
      recordTTL: 3600
      recordType: AAAA
      targets:
        - 2000:something:goes:here:lol
```

</details>

Apply it with `kubectl apply`:

```
kubectl apply -f load-balancer-dns.yaml
```

This will point `mechonis.xeserv.us` to the LoadBalancer, which will point to ingress-nginx based on Ingress configurations, which will route to your Services and Deployments, using Certs from cert-manager.

## cert-manager ACME issuers

Copy the contents of `~/Code/Xe/x/kube/rhadamanthus/core/cert-manager` to `~/Code/Xe/x/kube/mechonis/core/cert-manager`. Apply them as-is; no changes are needed:

```
kubectl apply -k .
```

This will create `letsencrypt-prod` and `letsencrypt-staging` ClusterIssuers, which will allow the creation of Let's Encrypt certificates in their production and staging environments. 9 times out of 10, you won't need the staging environment, but when you are doing high-churn things involving debugging the certificate issuing setup, the staging environment is very useful because it has a [much higher rate limit](https://letsencrypt.org/docs/staging-environment/) than [the production environment](https://letsencrypt.org/docs/rate-limits/) does.

## Deploying a "hello, world" workload

<Conv name="Mara" mood="hacker">
  Nearly every term for "unit of thing to do" is taken by different aspects of
  Kubernetes and its ecosystem. The only one that isn't taken is "workload". A
  workload is a unit of work deployed somewhere; in practice this boils down to
  a Deployment, its Service, any PersistentVolumeClaims, Ingresses, or other
  resources that it needs in order to run.
</Conv>

Now you can put everything to the test by making a simple "hello, world" workload. This will include:

- A ConfigMap to store HTML to show to the user
- A Deployment to run nginx pointed at the contents of the ConfigMap
- A Service to give an internal DNS name for that Deployment's Pods
- An Ingress to route traffic to that Service from the public Internet

Make a folder called `hello-world` and put these files in it:

<details>
<summary>configmap.yaml</summary>

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hello-world
data:
  index.html: |
    <html>
      <head>
        <title>Hello World!</title>
      </head>
      <body>Hello World!</body>
    </html>
```

</details>
<details>
<summary>deployment.yaml</summary>

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world
spec:
  selector:
    matchLabels:
      app: hello-world
  replicas: 1
  template:
    metadata:
      labels:
        app: hello-world
    spec:
      containers:
        - name: web
          image: nginx
          ports:
            - containerPort: 80
          volumeMounts:
            - name: html
              mountPath: /usr/share/nginx/html
      volumes:
        - name: html
          configMap:
            name: hello-world
```

</details>
<details>
<summary>service.yaml</summary>

```yaml
apiVersion: v1
kind: Service
metadata:
  name: hello-world
spec:
  ports:
    - port: 80
      protocol: TCP
  selector:
    app: hello-world
```

</details>
<details>
<summary>ingress.yaml</summary>

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-world
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - hello.mechonis.xeserv.us
      secretName: hello-mechonis-xeserv-us-tls
  rules:
    - host: hello.mechonis.xeserv.us
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hello-world
                port:
                  number: 80
```

</details>
<details>
<summary>kustomization.yaml</summary>

```yaml
resources:
  - configmap.yaml
  - deployment.yaml
  - service.yaml
  - ingress.yaml
```

</details>

Then apply it with `kubectl apply`:

```
kubectl apply -k .
```

It will take a minute for it to work, but here are the things that will be done in order so you can validate them:

- The Ingress object has the `cert-manager.io/cluster-issuer: "letsencrypt-prod"` annotation, which triggers cert-manager to create a Cert for the Ingress
- The Cert notices that there's no data in the Secret `hello-mechonis-xeserv-us-tls` in the default Namespace, so it creates an Order for a new certificate from the `letsencrypt-prod` ClusterIssuer (set up in the cert-manager apply step earlier)
- The Order creates a new Challenge for that certificate, setting a DNS record in Route 53 and then waiting until it can validate that the Challenge matches what it expects
- cert-manager asks Let's Encrypt to check the Challenge
- The Order succeeds and the certificate data is written to the Secret `hello-mechonis-xeserv-us-tls` in the default Namespace
- ingress-nginx is informed that the Secret has been updated and rehashes its configuration accordingly
- HTTPS routing is set up for the `hello-world` service so every request to `hello.mechonis.xeserv.us` points to the Pods managed by the `hello-world` Deployment
- external-dns checks for the presence of newly created Ingress objects it doesn't know about, and creates Route 53 entries for them

This results in the `hello-world` workload going from nothing to fully working in about 5 minutes tops. Usually this can be less depending on how lucky you get with the response time of the Route 53 API. If it doesn't work, run through resources in this order in [k9s](https://k9scli.io/):

- The `external-dns-ingress` Pod logs
- The `cert-manager` Pod logs
- Look for the Cert: is it marked as Ready?
- Look for that Cert's Order: does it show any errors in its list of events?
- Look for that Order's Challenge: does it show any errors in its list of events?

<Conv name="Mara" mood="hacker">
  By the way: k9s is fantastic. You should have it installed if you deal with
  Kubernetes. It should be baked into kubectl. It's a near perfect tool.
</Conv>

## Conclusion

From here you can deploy anything else you want, as long as the workload configuration kinda looks like the `hello-world` configuration. Namely, you MUST have the following things set:

- Ingress objects MUST have the `cert-manager.io/cluster-issuer: "letsencrypt-prod"` annotation; if they don't, no TLS certificate will be minted
- Ingress objects MUST have the `nginx.ingress.kubernetes.io/ssl-redirect: "true"` annotation to ensure that all plain HTTP traffic is upgraded to HTTPS
- Sensitive data MUST be managed in 1Password via OnePasswordItem objects

Happy kubeing all!