name: E2E Tests

on:
  push:
  pull_request:

jobs:
  test-e2e:
    name: Run on Ubuntu
    runs-on: ubuntu-latest
    steps:
      - name: Clone the code
        uses: actions/checkout@v4

      - name: Setup Go
        uses: actions/setup-go@v5
        with:
          go-version-file: go.mod

      - name: Install the latest version of kind
        run: |
          curl -Lo ./kind https://kind.sigs.k8s.io/dl/latest/kind-linux-amd64
          chmod +x ./kind
          sudo mv ./kind /usr/local/bin/kind

      - name: Verify kind installation
        run: kind version

      - name: Running Test e2e
        run: |
          go mod tidy
          make test-e2e
.github/workflows/test.yml
name: Tests

on:
  push:
  pull_request:

jobs:
  test:
    name: Run on Ubuntu
    runs-on: ubuntu-latest
    steps:
      - name: Clone the code
        uses: actions/checkout@v4

      - name: Setup Go
        uses: actions/setup-go@v5
        with:
          go-version-file: go.mod

      - name: Running Tests
        run: |
          go mod tidy
          make test
.gitignore
# Binaries for programs and plugins
*.exe
*.exe~
*.dll
*.so
*.dylib
bin/*
Dockerfile.cross

# Test binary, built with `go test -c`
*.test

# Output of the go coverage tool, specifically when used with LiteIDE
*.out

# Go workspace file
go.work

# Kubernetes Generated files - skip generated files, except for vendored files
!vendor/**/zz_generated.*

# editor and IDE paraphernalia
.idea
.vscode
*.swp
*.swo
*~
.tangled/workflows/workflow-amd64.yaml
engine: kubernetes
image: golang:1.24-bookworm
architecture: amd64

steps:
  - name: download dependencies
    command: |
      go mod download

  - name: run tests
    command: |
      make test

  - name: build manager binary
    command: |
      make build

  - name: verify build artifacts
    command: |
      ls -lh bin/
      file bin/manager
.tangled/workflows/workflow-arm64.yaml
engine: kubernetes
image: golang:1.24-bookworm
architecture: arm64

steps:
  - name: download dependencies
    command: |
      go mod download

  - name: run tests
    command: |
      make test

  - name: build manager binary
    command: |
      make build

  - name: verify build artifacts
    command: |
      ls -lh bin/
      file bin/manager
ARCHITECTURE.md
# Loom Architecture

Loom is a Kubernetes operator that runs tangled.org Spindle with a Kubernetes-native execution engine.

## Overview

```
┌─────────────────────────────────────────────┐
│           Loom Operator Pod                 │
│                                             │
│  ┌────────────────────────────────────┐     │
│  │  Controller Manager                │     │
│  │  - Watches SpindleSet CRD          │     │
│  │  - Monitors Kubernetes Jobs        │     │
│  │  - Reports status to spindle DB    │     │
│  │  - Creates Service endpoints       │     │
│  └────────────────────────────────────┘     │
│                                             │
│  ┌────────────────────────────────────┐     │
│  │  Embedded Spindle Server           │     │
│  │                                    │     │
│  │  HTTP Endpoints:                   │     │
│  │  - GET /events (WebSocket)         │     │
│  │  - GET /logs/{knot}/{rkey}/{name}  │     │
│  │  - POST /xrpc/sh.tangled.repo.*    │     │
│  │                                    │     │
│  │  Components:                       │     │
│  │  - EventConsumer (knot firehose)   │     │
│  │  - Database (SQLite)               │     │
│  │  - Queue (job queue)               │     │
│  │  - Vault (secrets manager)         │     │
│  │  - KubernetesEngine ──────────┐    │     │
│  └───────────────────────────────│────┘     │
└──────────────────────────────────│──────────┘
                                   │
                                   │ creates
                                   ▼
                      ┌──────────────────────────┐
                      │    Kubernetes Jobs       │
                      │    (one per workflow)    │
                      │                          │
                      │ ┌──────────────────────┐ │
                      │ │ Init Container:      │ │
                      │ │ - Clone repo         │ │
                      │ │ - Checkout commit    │ │
                      │ └──────────────────────┘ │
                      │                          │
                      │ ┌──────────────────────┐ │
                      │ │ Main Container:      │ │
                      │ │ - Execute all steps  │ │
                      │ │   in sequence        │ │
                      │ └──────────────────────┘ │
                      └──────────────────────────┘
```
## Components

### Loom Operator

The Loom operator is a standard Kubernetes controller that:
1. Watches `SpindleSet` custom resources
2. Embeds a spindle server instance
3. Creates Kubernetes Services to expose spindle HTTP endpoints
4. Monitors Job status and reports to spindle's database

### Embedded Spindle Server

Instead of deploying separate spindle server instances, Loom **embeds** the spindle server:
- Runs in the same process as the controller
- Uses `spindle.New()` to initialize with `KubernetesEngine`
- Handles all spindle functionality: WebSocket connections, XRPC endpoints, database, queue, etc.

### KubernetesEngine

A custom engine implementation that uses Kubernetes Jobs instead of Docker containers.
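For orientation, here is a minimal sketch of the shape such an engine takes. The method names follow the lifecycle described in the next section; the real interface lives in `tangled.org/core/spindle/models`, and its exact signatures may differ from this simplified version.

```go
package engine

import (
	"context"

	"k8s.io/client-go/kubernetes"
)

// Workflow is a stand-in for spindle's workflow model (simplified here).
type Workflow struct {
	ID    string
	Steps []string
}

// Engine is a simplified sketch of the contract discussed in this document;
// the actual interface is defined in tangled.org/core/spindle/models.
type Engine interface {
	SetupWorkflow(ctx context.Context, wf *Workflow) error   // create the Kubernetes Job
	RunStep(ctx context.Context, wf *Workflow, i int) error  // no-op: steps already run inside the Job
	DestroyWorkflow(ctx context.Context, wf *Workflow) error // delete the Job (TTL as backstop)
}

// KubernetesEngine satisfies the contract using the cluster API.
type KubernetesEngine struct {
	client kubernetes.Interface
	ns     string
}
```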
## Execution Model: KubernetesEngine vs NixeryEngine

### NixeryEngine Pattern (Docker)

```
┌─────────────────────────────────────────────┐
│ SetupWorkflow()                             │
│  - docker create <image> cat                │
│  - docker start <container>                 │
│  - Keep container running                   │
└─────────────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────┐
│ For each step:                              │
│   RunStep()                                 │
│    - docker exec <container> bash -c <cmd>  │
│    - Stream logs to WorkflowLogger          │
│    - Wait for completion                    │
│    - Return exit code                       │
└─────────────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────┐
│ DestroyWorkflow()                           │
│  - docker stop <container>                  │
│  - docker rm <container>                    │
└─────────────────────────────────────────────┘
```

**Key characteristics:**
- One long-running container per workflow
- Steps executed sequentially via `docker exec`
- Spindle controls step execution timing
- Fine-grained logging per step

### KubernetesEngine Pattern (Jobs)
```
┌─────────────────────────────────────────────┐
│ SetupWorkflow()                             │
│  - Generate bash script with ALL steps      │
│  - Create Kubernetes Job                    │
│    - Init container: clone repo             │
│    - Main container: execute script         │
│  - Job runs to completion                   │
└─────────────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────┐
│ RunStep() - NO-OP                           │
│ (Steps are already running in the Job)      │
└─────────────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────┐
│ DestroyWorkflow()                           │
│  - Delete Kubernetes Job                    │
│  - TTL controller auto-cleans after 1 hour  │
└─────────────────────────────────────────────┘
```

**Key characteristics:**
- One Kubernetes Job per workflow
- All steps baked into a single bash script
- Job runs autonomously after creation
- Logs captured as a single stream per Job
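The one-hour cleanup mentioned above maps to the Job's `ttlSecondsAfterFinished` field. Below is a sketch of the Job shape this pattern produces; it is a simplified version of what `pkg/jobbuilder` assembles, and the helper name and omitted fields (init container, affinity, resources) are illustrative.

```go
package jobbuilder

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildJob sketches the pattern above: one Job per workflow, a single main
// container running the generated script, and a one-hour TTL so the TTL
// controller garbage-collects the Job after it finishes.
func buildJob(name, image, script string) *batchv1.Job {
	ttl := int32(3600)  // auto-clean one hour after completion
	backoff := int32(0) // the script handles step failures; don't retry the Job
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: batchv1.JobSpec{
			TTLSecondsAfterFinished: &ttl,
			BackoffLimit:            &backoff,
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:    "workflow",
						Image:   image,
						Command: []string{"/bin/bash", "-c", script},
					}},
				},
			},
		},
	}
}
```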
## Why Different Execution Models?

### Docker: exec pattern is lightweight
- `docker exec` is a cheap operation
- Easy to run commands in an existing container
- Low overhead for sequential execution

### Kubernetes: exec pattern is heavyweight
- `kubectl exec` involves API calls, auth, streaming setup
- Each exec requires network round-trips
- Kubernetes is designed for declarative workloads, not imperative step-by-step execution

### Jobs are idiomatic Kubernetes
- Jobs are purpose-built for batch workloads
- Native features: status tracking, TTL cleanup, retry policies
- Declarative: describe the work, Kubernetes handles execution
- Better isolation: each workflow is a separate Job

### Script generation is efficient
- Single script with all steps: `BuildStepExecutionScript()`
- GitHub Actions-compatible environment passing
- Error handling built into the script
- Timestamps and step boundaries in logs
- No coordinator overhead

## Trade-offs

### What we gain with Jobs:
✅ Idiomatic Kubernetes
✅ Simpler implementation
✅ Better isolation per workflow
✅ Native Job status and cleanup
✅ No persistent connections needed
✅ Works with standard Kubernetes features

### What we lose vs Docker exec:
❌ No per-step control from spindle
❌ Can't pause/resume between steps
❌ Logs are bundled (not separated by step)
❌ `RunStep()` is a no-op
❌ Can't dynamically change step execution

### Is this okay?

**Yes!** The execution model difference is intentional and appropriate for Kubernetes. We still:
- Run all steps in order
- Handle errors correctly
- Pass environment between steps
- Report status to spindle DB
- Stream logs
- Support secrets

The interface contract with spindle's Engine is satisfied, just with a different implementation strategy.
## Step Script Generation

See `pkg/jobbuilder/script_builder.go` for how we generate the bash script.

### Features:
- **GitHub Actions compatibility**: `GITHUB_ENV`, `GITHUB_PATH`, `GITHUB_OUTPUT`
- **Environment passing**: Steps can set variables for later steps
- **Error handling**: Script exits on first failure
- **Timestamps**: Every step logs start/end times
- **Step boundaries**: Clear markers in logs

### Example generated script:

```bash
#!/bin/bash
set -e
set -o pipefail

# Setup GITHUB_ENV, GITHUB_PATH, GITHUB_OUTPUT
export GITHUB_ENV=/tmp/github/env
export GITHUB_PATH=/tmp/github/path
export GITHUB_OUTPUT=/tmp/github/output

# Helper functions
add_to_env() { echo "$1=$2" >> $GITHUB_ENV; export "$1"="$2"; }
add_to_path() { echo "$1" >> $GITHUB_PATH; export PATH="$1:$PATH"; }
load_github_env() { source "$GITHUB_ENV" 2>/dev/null || true; }

##############################################################################
# Step 1: Build
##############################################################################
echo "2025-11-06T15:00:00.000Z ===== Starting Step 1: Build ====="
load_github_env
cd /tangled/workspace
go build ./... || { echo "Step 'Build' failed"; exit 1; }
echo "2025-11-06T15:00:10.000Z ===== Completed Step 1: Build ====="

##############################################################################
# Step 2: Test
##############################################################################
echo "2025-11-06T15:00:10.000Z ===== Starting Step 2: Test ====="
load_github_env
cd /tangled/workspace
go test ./... || { echo "Step 'Test' failed"; exit 1; }
echo "2025-11-06T15:00:20.000Z ===== Completed Step 2: Test ====="

exit 0
```
## Configuration

Loom reads spindle configuration from environment variables (via `tangled.org/core/spindle/config`):

```bash
SPINDLE_SERVER_LISTEN_ADDR=0.0.0.0:6555
SPINDLE_SERVER_DB_PATH=/data/spindle.db
SPINDLE_SERVER_HOSTNAME=loom.example.com
SPINDLE_SERVER_OWNER=did:web:example.com
SPINDLE_SERVER_QUEUE_SIZE=100
SPINDLE_SERVER_MAX_JOB_COUNT=2
SPINDLE_SERVER_LOG_DIR=/var/log/spindle
SPINDLE_SERVER_SECRETS_PROVIDER=sqlite
```

These are typically set in the Loom Deployment manifest.
## Status Reporting

Status flows through spindle's event system:

1. **Job status changes** detected by the controller
2. **Write to spindle DB**: `db.StatusPending/Running/Success/Failed()`
3. **Broadcast via Notifier**: `notifier.NotifyAll()`
4. **Stream to subscribers**: knots connected to the `/events` WebSocket receive updates

This follows spindle's standard pattern; no custom status reporter is needed.
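As a sketch, the controller-side handling of a succeeded Job might look like the following. The `StatusSuccess` and `NotifyAll` names are taken from the flow above, but the receiver types and signatures shown here are assumptions, not spindle's actual API.

```go
package controller

import (
	"context"
	"fmt"
)

// Stand-ins for the embedded spindle components (hypothetical shapes).
type statusDB interface{ StatusSuccess(pipelineID string) error }
type eventNotifier interface{ NotifyAll(ctx context.Context) }

type Reconciler struct {
	db       statusDB
	notifier eventNotifier
}

// handleJobSucceeded sketches steps 2-3 of the flow above for a Job that
// completed; a failed Job would record StatusFailed instead.
func (r *Reconciler) handleJobSucceeded(ctx context.Context, pipelineID string) error {
	if err := r.db.StatusSuccess(pipelineID); err != nil { // write to spindle DB
		return fmt.Errorf("record success for %s: %w", pipelineID, err)
	}
	r.notifier.NotifyAll(ctx) // broadcast to /events subscribers
	return nil
}
```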
## Networking

### Internal (Cluster)
- Spindle HTTP server listens on `:6555` inside the operator pod
- Controller creates a Kubernetes Service to expose it
- Service name: `<spindleset-name>-spindle`
- Endpoints available at: `http://<service>.<namespace>.svc.cluster.local:6555`

### External Access
Options for external access:
1. **NodePort**: Change the Service type to expose on a node IP
2. **LoadBalancer**: Create a cloud load balancer
3. **Ingress**: Route external traffic to the Service
4. **Port forwarding**: For testing: `kubectl port-forward svc/spindleset-sample-spindle 6555:6555`

## Multi-Architecture Support

Jobs can target specific node architectures using NodeAffinity:

```yaml
spec:
  template:
    architecture: arm64  # or amd64
```

The jobbuilder generates NodeAffinity rules:
```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values: [arm64]
```

## Future Enhancements

### 1. Structured Logs Per Step
Have the Job script POST progress updates:
```bash
curl -X POST $SPINDLE_STATUS_URL -d '{"step": 1, "status": "start"}'
go build ./...
curl -X POST $SPINDLE_STATUS_URL -d '{"step": 1, "status": "complete"}'
```

### 2. Step-level Artifacts
Mount a PVC for step outputs:
```yaml
volumes:
  - name: artifacts
    persistentVolumeClaim:
      claimName: workflow-artifacts
```

### 3. Parallel Steps
Support running independent steps in parallel (separate containers in the same pod).

### 4. Dynamic Step Execution
For advanced use cases, implement the exec pattern:
- Create a long-running Pod (not a Job)
- Implement `RunStep()` with `kubectl exec`
- Trade efficiency for flexibility

## Comparison with Other Engines

| Feature        | NixeryEngine    | KubernetesEngine |
|----------------|-----------------|------------------|
| Runtime        | Docker          | Kubernetes       |
| Isolation      | Container       | Job (Pod)        |
| Step execution | Sequential exec | Baked script     |
| RunStep()      | Active          | No-op            |
| Overhead       | Low             | Medium (K8s API) |
| Idiomatic      | Yes (Docker)    | Yes (K8s)        |
| Scalability    | Limited         | High             |
| Multi-tenancy  | Shared host     | Cluster-native   |
## References

- **Spindle Core**: `/home/data/core/spindle/`
- **NixeryEngine**: `/home/data/core/spindle/engines/nixery/engine.go`
- **KubernetesEngine**: `/home/data/loom/internal/engine/kubernetes_engine.go`
- **Job Builder**: `/home/data/loom/pkg/jobbuilder/`
- **Script Builder**: `/home/data/loom/pkg/jobbuilder/script_builder.go`
CLAUDE.md
# Working with Claude Code on Loom

This document describes how Claude Code was used to develop the Loom operator and provides guidelines for continuing development with AI assistance.

## Project Context

**What is Loom?**
Loom is a Kubernetes operator that coordinates tangled.org Spindles - ephemeral CI/CD runners inspired by GitHub's Actions Runner Controller (ARC). It enables running pipeline workflows in Kubernetes in response to events from tangled.org knots.

**Key Characteristics:**
- Built with operator-sdk and Kubebuilder
- Integrates with tangled.org's AT Protocol-based event system
- Reuses components from `tangled.org/core/spindle`
- Implements a Kubernetes-native execution engine

## Development Approach

### Research Phase
Claude Code researched three key areas before implementation:

1. **GitHub Actions Runner Controller (ARC)**
   - Studied the three-tier controller architecture
   - Learned JIT token patterns for security
   - Understood the ephemeral runner lifecycle
   - Reviewed auto-scaling mechanisms

2. **Chainguard Kaniko Fork**
   - Investigated rootless container builds
   - Understood security benefits
   - Evaluated integration patterns (deferred for MVP)

3. **tangled.org Spindles**
   - Analyzed the existing implementation in `/home/data/core/spindle`
   - Studied the Engine interface and models
   - Understood WebSocket event ingestion
   - Reviewed Nixery integration (simplified for MVP)

### Architectural Decisions

**Simplifications Made:**
1. **No Nixery for MVP**: Use standard Docker images instead of dynamic Nix-based images
2. **Ephemeral Jobs**: Scale-to-zero approach, one Job per pipeline
3. **Kubernetes-native logging**: Stream logs via the K8s API instead of disk-based logging
4. **Direct WebSocket**: Maintain a persistent connection to the knot (not polling)

**Code Reuse Strategy:**
- Import models and interfaces from `tangled.org/core/spindle`
- Reuse WebSocket client logic
- Implement a new `KubernetesEngine` for Job-based execution
- Avoid duplicating existing functionality

## Claude Code Usage Guidelines

### When to Use Claude Code

**Good Use Cases:**
- Implementing boilerplate (CRDs, controllers, builders)
- Generating Kubernetes manifests and RBAC
- Creating test scaffolding
- Refactoring for clarity
- Adding logging and metrics
- Updating documentation

**Not Recommended:**
- Security-critical authentication logic (review carefully)
- Complex AT Protocol interactions (defer to existing code)
- Performance-critical paths (benchmark first)

### Prompting Best Practices

**Provide Context:**
```
I want to add [feature]. We're using the KubernetesEngine from
internal/engine/kubernetes_engine.go which implements the Engine
interface from tangled.org/core/spindle/models. The feature should
[specific behavior].
```

**Reference Existing Code:**
```
Look at how core/spindle/engines/nixery/engine.go handles [X].
We need similar logic in our KubernetesEngine but adapted for
Kubernetes Jobs instead of Docker containers.
```

**Specify Constraints:**
```
Implement [feature] but:
1. Reuse the existing Job builder pattern
2. Add Prometheus metrics
3. Follow the error handling pattern from the controller
4. Don't break multi-arch support
```

### Common Patterns in This Project

**1. Job Creation Pattern**
```go
// Jobs are owned by SpindleSet for automatic cleanup
ctrl.SetControllerReference(spindleSet, job, r.Scheme)

// Jobs labeled for querying
labels := map[string]string{
    "loom.j5t.io/spindleset": spindleSet.Name,
    "loom.j5t.io/workflow":   workflowName,
}
```

**2. Status Update Pattern**
```go
// Always update status in a defer to ensure updates even on errors
defer func() {
    if err := r.Status().Update(ctx, spindleSet); err != nil {
        log.Error(err, "Failed to update SpindleSet status")
    }
}()
```

**3. Multi-Arch Node Targeting**
```go
// Architecture specified in the workflow, translated to node affinity
affinity := &corev1.Affinity{
    NodeAffinity: &corev1.NodeAffinity{
        RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
            NodeSelectorTerms: []corev1.NodeSelectorTerm{{
                MatchExpressions: []corev1.NodeSelectorRequirement{{
                    Key:      "kubernetes.io/arch",
                    Operator: corev1.NodeSelectorOpIn,
                    Values:   []string{workflow.Architecture},
                }},
            }},
        },
    },
}
```

**4. Metrics Pattern**
```go
// Register metrics in init()
func init() {
    metrics.Registry.MustRegister(runningSpindles, completedSpindles)
}

// Update metrics in the controller
runningSpindles.Set(float64(spindleSet.Status.RunningJobs))
```

## Development Workflow

### Standard Development Cycle

1. **Plan**: Update `PLAN.md` with the new feature/phase
2. **Implement**: Generate code with Claude Code
3. **Test**: Run unit tests (`make test`)
4. **Generate**: Update manifests (`make manifests`)
5. **Deploy**: Test in a cluster (`make deploy`)
6. **Iterate**: Fix issues, update docs

### Key Commands

```bash
# Generate CRDs and code
make manifests generate

# Run tests
make test

# Build operator
make build

# Deploy to cluster
make deploy IMG=<your-registry>/loom:tag

# Run locally (for debugging)
make install run

# Clean up
make undeploy
```

### Testing Strategy

**Unit Tests:**
- Mock Kubernetes clients
- Test Job builder logic
- Test affinity generation (see the sketch below)
- Test the script builder
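As an illustration, a unit test for affinity generation might look like the following. The `buildAffinity` helper is hypothetical; it wraps the node-affinity pattern shown under "Multi-Arch Node Targeting" above.

```go
package jobbuilder

import (
	"testing"

	corev1 "k8s.io/api/core/v1"
)

// buildAffinity is a hypothetical helper wrapping the node-affinity
// pattern from "Multi-Arch Node Targeting".
func buildAffinity(arch string) *corev1.Affinity {
	return &corev1.Affinity{
		NodeAffinity: &corev1.NodeAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
				NodeSelectorTerms: []corev1.NodeSelectorTerm{{
					MatchExpressions: []corev1.NodeSelectorRequirement{{
						Key:      "kubernetes.io/arch",
						Operator: corev1.NodeSelectorOpIn,
						Values:   []string{arch},
					}},
				}},
			},
		},
	}
}

// TestBuildAffinityTargetsArch checks that the generated affinity pins
// Jobs to the requested architecture.
func TestBuildAffinityTargetsArch(t *testing.T) {
	for _, arch := range []string{"amd64", "arm64"} {
		req := buildAffinity(arch).NodeAffinity.
			RequiredDuringSchedulingIgnoredDuringExecution.
			NodeSelectorTerms[0].MatchExpressions[0]
		if req.Key != "kubernetes.io/arch" || req.Values[0] != arch {
			t.Errorf("affinity for %s: got key=%q values=%v", arch, req.Key, req.Values)
		}
	}
}
```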
**Integration Tests:**
- Deploy to a test cluster (kind/k3s)
- Create a SpindleSet CR
- Simulate pipeline events
- Verify Jobs are created correctly
- Check logs and status

**Manual Testing:**
- Deploy to a real cluster with mixed amd64/arm64 nodes
- Connect to a real tangled.org knot
- Trigger actual pipeline runs
- Monitor with Prometheus/Grafana

## Code Organization

### Package Structure

```
loom/
├── api/v1alpha1/        # CRD types (SpindleSet)
├── internal/
│   ├── controller/      # Reconciliation logic
│   └── engine/          # KubernetesEngine implementation
├── pkg/
│   ├── ingester/        # WebSocket client
│   ├── jobbuilder/      # Job template generation
│   └── knot/            # Knot API client
└── config/              # Kubernetes manifests
```

### Import Guidelines

**From tangled.org/core:**
```go
// Models and interfaces
import "tangled.org/core/spindle/models"
import "tangled.org/core/api/tangled"

// Adapt, don't import directly:
// - WebSocket client (pkg/ingester)
// - Status reporting (pkg/knot)
```

**Kubernetes:**
```go
import batchv1 "k8s.io/api/batch/v1"
import corev1 "k8s.io/api/core/v1"
import ctrl "sigs.k8s.io/controller-runtime"
```

## Common Tasks

### Adding a New Field to SpindleSet

1. Edit `api/v1alpha1/spindleset_types.go`
2. Add the field with proper kubebuilder markers (example below)
3. Run `make manifests generate`
4. Update sample CRs in `config/samples/`
5. Update controller logic to use the field
6. Add tests for the new behavior
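For step 2, a field with kubebuilder markers might look like this. The field itself is illustrative, not part of the current API:

```go
// JobTTLSeconds controls how long finished workflow Jobs are retained
// before the TTL controller deletes them. (Illustrative example field.)
// +kubebuilder:validation:Minimum=60
// +kubebuilder:default=3600
// +optional
JobTTLSeconds *int32 `json:"jobTTLSeconds,omitempty"`
```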
### Changing Job Template

1. Edit `pkg/jobbuilder/job_template.go`
2. Update the Job generation logic
3. Run unit tests (`make test`)
4. Deploy and test (`make deploy`)
5. Update documentation if needed

### Adding Prometheus Metrics

1. Define the metric in the controller file
2. Register it in the `init()` function
3. Update the metric in the reconciliation loop
4. Add the metric to the Prometheus config
5. Document it in the `PLAN.md` metrics section

## Troubleshooting

### Common Issues

**Issue: CRD not updating**
```bash
# Regenerate and reinstall
make manifests
make install
```

**Issue: Controller not reconciling**
```bash
# Check logs
kubectl logs -n loom-system deployment/loom-controller-manager

# Check RBAC
kubectl auth can-i create jobs --as=system:serviceaccount:loom-system:loom-controller-manager
```

**Issue: Jobs stuck pending**
```bash
# Check job spec
kubectl get job <job-name> -o yaml

# Check events
kubectl describe job <job-name>

# Check node availability
kubectl get nodes -L kubernetes.io/arch
```

**Issue: WebSocket disconnecting**
```bash
# Check SpindleSet status
kubectl get spindleset -o wide

# Check controller logs for connection errors
kubectl logs -n loom-system deployment/loom-controller-manager | grep -i websocket
```

## Contributing

### Before Asking Claude Code for Help

1. Read the existing code in the affected area
2. Check `PLAN.md` for architectural context
3. Review similar implementations in the codebase
4. Check tangled.org/core for reusable components

### After Getting Code from Claude Code

1. Review the generated code for correctness
2. Run tests (`make test`)
3. Check for proper error handling
4. Verify metrics are updated
5. Update documentation
6. Test in a real cluster

## Resources

- **Kubebuilder Book**: https://book.kubebuilder.io/
- **Controller Runtime**: https://pkg.go.dev/sigs.k8s.io/controller-runtime
- **GitHub ARC**: https://github.com/actions/actions-runner-controller
- **tangled.org Core**: /home/data/core/
- **Operator SDK**: https://sdk.operatorframework.io/

## Notes

- This project prioritizes simplicity over completeness for the MVP
- Code reuse from tangled.org/core is preferred over reimplementation
- Multi-architecture support is a first-class concern
- Prometheus metrics are essential, not optional
- Security (RBAC, secrets) should be carefully reviewed
CONFIGURATION.md
···11+# Loom Configuration Guide
22+33+Loom is configured via environment variables that are passed to the embedded spindle server.
44+55+## Quick Start
66+77+1. Create namespace: `kubectl create namespace loom-system`
88+2. Apply CRDs: `kubectl apply -f config/crd/bases/`
99+3. Apply RBAC: `kubectl apply -f config/rbac/`
1010+4. Deploy operator with config (see below)
1111+5. Create SpindleSet: `kubectl apply -f config/samples/`
1212+1313+## Required Environment Variables
1414+1515+These must be set in the Loom operator Deployment:
1616+1717+```yaml
1818+apiVersion: apps/v1
1919+kind: Deployment
2020+metadata:
2121+ name: loom-controller-manager
2222+ namespace: loom-system
2323+spec:
2424+ replicas: 1
2525+ selector:
2626+ matchLabels:
2727+ control-plane: controller-manager
2828+ template:
2929+ metadata:
3030+ labels:
3131+ control-plane: controller-manager
3232+ spec:
3333+ containers:
3434+ - name: manager
3535+ image: loom:latest
3636+ env:
3737+ # Required: Spindle server configuration
3838+ - name: SPINDLE_SERVER_LISTEN_ADDR
3939+ value: "0.0.0.0:6555"
4040+4141+ - name: SPINDLE_SERVER_DB_PATH
4242+ value: "/data/spindle.db"
4343+4444+ - name: SPINDLE_SERVER_HOSTNAME
4545+ value: "loom.example.com" # Change to your domain
4646+4747+ - name: SPINDLE_SERVER_OWNER
4848+ value: "did:web:example.com" # Change to your DID
4949+5050+ - name: SPINDLE_SERVER_JETSTREAM_ENDPOINT
5151+ value: "wss://jetstream1.us-west.bsky.network/subscribe"
5252+5353+ # Optional: Adjust queue settings
5454+ - name: SPINDLE_SERVER_QUEUE_SIZE
5555+ value: "100"
5656+5757+ - name: SPINDLE_SERVER_MAX_JOB_COUNT
5858+ value: "2" # Max concurrent workflows
5959+6060+ # Optional: Log directory (not used much in K8s)
6161+ - name: SPINDLE_SERVER_LOG_DIR
6262+ value: "/var/log/spindle"
6363+6464+ # Optional: Development mode
6565+ - name: SPINDLE_SERVER_DEV
6666+ value: "false"
6767+6868+ # Optional: Secrets provider (default: sqlite)
6969+ - name: SPINDLE_SERVER_SECRETS_PROVIDER
7070+ value: "sqlite"
7171+7272+ ports:
7373+ - containerPort: 6555
7474+ name: http
7575+ protocol: TCP
7676+7777+ volumeMounts:
7878+ - name: data
7979+ mountPath: /data
8080+8181+ volumes:
8282+ - name: data
8383+ emptyDir: {} # Or use PersistentVolume for persistence
8484+```
8585+8686+## Environment Variable Reference
8787+8888+### SPINDLE_SERVER_LISTEN_ADDR
8989+**Required**: Yes
9090+**Default**: `0.0.0.0:6555`
9191+**Description**: Address and port for spindle HTTP server
9292+**Example**: `0.0.0.0:6555`
9393+9494+The spindle HTTP server exposes:
9595+- `GET /events` - WebSocket for status updates
9696+- `GET /logs/{knot}/{rkey}/{name}` - WebSocket for logs
9797+- `POST /xrpc/*` - XRPC endpoints
9898+9999+### SPINDLE_SERVER_DB_PATH
100100+**Required**: Yes
101101+**Default**: `spindle.db`
102102+**Description**: Path to SQLite database file
103103+**Example**: `/data/spindle.db`
104104+105105+**Important**: Use a persistent volume if you want to retain data across pod restarts.
106106+107107+### SPINDLE_SERVER_HOSTNAME
108108+**Required**: Yes
109109+**Default**: None
110110+**Description**: Hostname for this spindle instance (used in DID)
111111+**Example**: `loom.example.com`
112112+113113+This is used to construct the spindle's DID: `did:web:<hostname>`
114114+115115+### SPINDLE_SERVER_OWNER
116116+**Required**: Yes
117117+**Default**: None
118118+**Description**: DID of the spindle owner
119119+**Example**: `did:web:example.com` or `did:plc:abc123...`
120120+121121+The owner has full control over the spindle instance.
122122+123123+### SPINDLE_SERVER_JETSTREAM_ENDPOINT
124124+**Required**: Yes
125125+**Default**: `wss://jetstream1.us-west.bsky.network/subscribe`
126126+**Description**: Bluesky jetstream endpoint for ingesting member/repo records
127127+**Example**: `wss://jetstream1.us-west.bsky.network/subscribe`
128128+129129+Used to watch for:
130130+- `sh.tangled.spindleMember` records
131131+- `sh.tangled.repo` records
132132+- `sh.tangled.repoCollaborator` records
133133+134134+### SPINDLE_SERVER_QUEUE_SIZE
135135+**Required**: No
136136+**Default**: `100`
137137+**Description**: Maximum number of jobs that can be queued
138138+**Example**: `100`
139139+140140+If queue is full, new pipeline events will be rejected.
141141+142142+### SPINDLE_SERVER_MAX_JOB_COUNT
143143+**Required**: No
144144+**Default**: `2`
145145+**Description**: Maximum number of workflows running concurrently
146146+**Example**: `5`
147147+148148+Controls how many Kubernetes Jobs will run in parallel.
149149+150150+### SPINDLE_SERVER_LOG_DIR
151151+**Required**: No
152152+**Default**: `/var/log/spindle`
153153+**Description**: Directory for workflow logs (not heavily used in Kubernetes)
154154+**Example**: `/var/log/spindle`
155155+156156+KubernetesEngine logs are primarily captured via Kubernetes pod logs, not local files.
157157+158158+### SPINDLE_SERVER_DEV
159159+**Required**: No
160160+**Default**: `false`
161161+**Description**: Enable development mode
162162+**Example**: `true`
163163+164164+Development mode may skip some validations or enable verbose logging.
165165+166166+### SPINDLE_SERVER_SECRETS_PROVIDER
167167+**Required**: No
168168+**Default**: `sqlite`
169169+**Description**: Secrets storage backend
170170+**Options**: `sqlite`, `openbao`
171171+172172+#### SQLite Provider (default)
173173+Stores secrets in the SQLite database.
174174+175175+No additional configuration needed.
176176+177177+#### OpenBao Provider
178178+Stores secrets in OpenBao vault.
179179+180180+**Additional environment variables required:**
181181+```yaml
182182+- name: SPINDLE_SERVER_SECRETS_PROVIDER
183183+ value: "openbao"
184184+185185+- name: SPINDLE_SERVER_SECRETS_OPENBAO_PROXY_ADDR
186186+ value: "http://openbao:8200"
187187+188188+- name: SPINDLE_SERVER_SECRETS_OPENBAO_MOUNT
189189+ value: "spindle" # Default mount path
190190+```
191191+192192+## Persistence
193193+194194+### SQLite Database
195195+196196+The SQLite database contains:
197197+- Repos being watched
198198+- Spindle members
199199+- Pipeline events
200200+- Status history
201201+- Secrets (if using sqlite provider)
202202+203203+**For production, use a PersistentVolume:**
204204+205205+```yaml
206206+volumes:
207207+- name: data
208208+ persistentVolumeClaim:
209209+ claimName: spindle-data
210210+211211+---
212212+apiVersion: v1
213213+kind: PersistentVolumeClaim
214214+metadata:
215215+ name: spindle-data
216216+ namespace: loom-system
217217+spec:
218218+ accessModes:
219219+ - ReadWriteOnce
220220+ resources:
221221+ requests:
222222+ storage: 10Gi
223223+```
224224+225225+### Log Directory
226226+227227+Not critical for Kubernetes deployments since logs are captured via pod logs.
228228+229229+You can omit the log directory volume or use emptyDir.
230230+231231+## Service Configuration
232232+233233+The controller automatically creates a Kubernetes Service to expose the spindle HTTP server:
234234+235235+```yaml
236236+apiVersion: v1
237237+kind: Service
238238+metadata:
239239+ name: spindleset-sample-spindle
240240+ namespace: default
241241+spec:
242242+ selector:
243243+ control-plane: controller-manager # Selects Loom operator pod
244244+ ports:
245245+ - name: http
246246+ port: 6555
247247+ targetPort: 6555
248248+ type: ClusterIP
249249+```
250250+251251+### External Access
252252+253253+**Option 1: NodePort**
254254+```yaml
255255+spec:
256256+ type: NodePort
257257+ ports:
258258+ - name: http
259259+ port: 6555
260260+ targetPort: 6555
261261+ nodePort: 30655
262262+```
263263+264264+**Option 2: LoadBalancer**
265265+```yaml
266266+spec:
267267+ type: LoadBalancer
268268+ ports:
269269+ - name: http
270270+ port: 6555
271271+ targetPort: 6555
272272+```
273273+274274+**Option 3: Ingress**
275275+```yaml
276276+apiVersion: networking.k8s.io/v1
277277+kind: Ingress
278278+metadata:
279279+ name: spindle-ingress
280280+ namespace: loom-system
281281+spec:
282282+ rules:
283283+ - host: loom.example.com
284284+ http:
285285+ paths:
286286+ - path: /
287287+ pathType: Prefix
288288+ backend:
289289+ service:
290290+ name: spindleset-sample-spindle
291291+ port:
292292+ number: 6555
293293+```
294294+295295+**Option 4: Port Forward (Testing)**
296296+```bash
297297+kubectl port-forward -n loom-system svc/spindleset-sample-spindle 6555:6555
298298+```
299299+300300+## SpindleSet CRD
301301+302302+The SpindleSet resource configures workflow execution (not spindle server config):
303303+304304+```yaml
305305+apiVersion: loom.j5t.io/v1alpha1
306306+kind: SpindleSet
307307+metadata:
308308+ name: spindleset-sample
309309+ namespace: default
310310+spec:
311311+ # URL of the knot this spindle serves (currently not used)
312312+ knotUrl: https://knot1.tangled.sh
313313+314314+ # Kubernetes secret containing auth credentials (currently not used)
315315+ knotAuthSecret: spindle-auth
316316+317317+ # Maximum concurrent workflows (not enforced yet - uses SPINDLE_SERVER_MAX_JOB_COUNT)
318318+ maxConcurrentJobs: 5
319319+320320+ # Template for Kubernetes Jobs
321321+ template:
322322+ # Resource limits for workflow pods
323323+ resources:
324324+ requests:
325325+ cpu: "500m"
326326+ memory: "1Gi"
327327+ limits:
328328+ cpu: "2"
329329+ memory: "4Gi"
330330+331331+ # Node selector for scheduling
332332+ nodeSelector:
333333+ disktype: ssd
334334+335335+ # Tolerations for taints
336336+ tolerations:
337337+ - key: workload
338338+ operator: Equal
339339+ value: ci
340340+ effect: NoSchedule
341341+342342+ # Additional affinity rules (merged with architecture affinity)
343343+ affinity:
344344+ nodeAffinity:
345345+ preferredDuringSchedulingIgnoredDuringExecution:
346346+ - weight: 100
347347+ preference:
348348+ matchExpressions:
349349+ - key: node-role.kubernetes.io/worker
350350+ operator: In
351351+ values: [true]
352352+```
353353+354354+**Note**: Currently the SpindleSet is mainly used to trigger Service creation. The spindle server configuration comes from environment variables in the operator Deployment.
355355+356356+## Complete Deployment Example
357357+358358+See `config/manager/manager.yaml` for the full operator deployment manifest.
359359+360360+Minimal example:
361361+362362+```yaml
363363+apiVersion: v1
364364+kind: Namespace
365365+metadata:
366366+ name: loom-system
367367+368368+---
369369+apiVersion: v1
370370+kind: ServiceAccount
371371+metadata:
372372+ name: loom-controller-manager
373373+ namespace: loom-system
374374+375375+---
376376+# RBAC manifests from config/rbac/...
377377+378378+---
379379+apiVersion: apps/v1
380380+kind: Deployment
381381+metadata:
382382+ name: loom-controller-manager
383383+ namespace: loom-system
384384+spec:
385385+ replicas: 1
386386+ selector:
387387+ matchLabels:
388388+ control-plane: controller-manager
389389+ template:
390390+ metadata:
391391+ labels:
392392+ control-plane: controller-manager
393393+ spec:
394394+ serviceAccountName: loom-controller-manager
395395+ containers:
396396+ - name: manager
397397+ image: loom:latest
398398+ command:
399399+ - /manager
400400+ env:
401401+ - name: SPINDLE_SERVER_LISTEN_ADDR
402402+ value: "0.0.0.0:6555"
403403+ - name: SPINDLE_SERVER_DB_PATH
404404+ value: "/data/spindle.db"
405405+ - name: SPINDLE_SERVER_HOSTNAME
406406+ value: "loom.example.com"
407407+ - name: SPINDLE_SERVER_OWNER
408408+ value: "did:web:example.com"
409409+ - name: SPINDLE_SERVER_JETSTREAM_ENDPOINT
410410+ value: "wss://jetstream1.us-west.bsky.network/subscribe"
411411+ - name: SPINDLE_SERVER_QUEUE_SIZE
412412+ value: "100"
413413+ - name: SPINDLE_SERVER_MAX_JOB_COUNT
414414+ value: "2"
415415+ ports:
416416+ - containerPort: 6555
417417+ name: http
418418+ - containerPort: 8081
419419+ name: healthz
420420+ livenessProbe:
421421+ httpGet:
422422+ path: /healthz
423423+ port: healthz
424424+ readinessProbe:
425425+ httpGet:
426426+ path: /readyz
427427+ port: healthz
428428+ volumeMounts:
429429+ - name: data
430430+ mountPath: /data
431431+ volumes:
432432+ - name: data
433433+ emptyDir: {}
434434+435435+---
436436+apiVersion: loom.j5t.io/v1alpha1
437437+kind: SpindleSet
438438+metadata:
439439+ name: default-spindle
440440+ namespace: loom-system
441441+spec:
442442+ knotUrl: https://knot1.tangled.sh
443443+ maxConcurrentJobs: 5
444444+```
445445+446446+## Testing Connectivity
447447+448448+### 1. Check Service Created
449449+```bash
450450+kubectl get svc -A | grep spindle
451451+```
452452+453453+Should show: `default-spindle-spindle`
454454+455455+### 2. Port Forward to Access Locally
456456+```bash
457457+kubectl port-forward -n loom-system svc/default-spindle-spindle 6555:6555
458458+```
459459+460460+### 3. Test /events WebSocket
461461+```bash
462462+websocat ws://localhost:6555/events
463463+```
464464+465465+Should connect and start streaming events (or stay connected waiting for events).
466466+467467+### 4. Check Logs
468468+```bash
469469+kubectl logs -n loom-system -l control-plane=controller-manager
470470+```
471471+472472+Should show:
473473+- "spindle server initialized successfully"
474474+- "starting spindle HTTP server"
475475+- "Spindle HTTP service created successfully"
476476+477477+## Troubleshooting
478478+479479+### "spindle server error: address already in use"
480480+Port 6555 is already bound. Check if another process is using it or if you have multiple operator replicas.
481481+482482+### "failed to load spindle config: SPINDLE_SERVER_HOSTNAME is required"
483483+Missing required environment variable. Check your Deployment manifest.
484484+485485+### Service not created
486486+Check controller logs for errors. Ensure RBAC permissions for Services are granted.
487487+488488+### Jobs not creating
489489+- Check spindle can reach knot WebSocket
490490+- Verify EventConsumer is running (logs should show "starting knot event consumer")
491491+- Check if repos are registered in spindle database
492492+493493+### Database locked errors
494494+SQLite can't be shared across multiple pods. Set `replicas: 1` or use a different database solution.
495495+496496+## Next Steps
497497+498498+- [Architecture Overview](./ARCHITECTURE.md) - Understand how Loom works
499499+- [Workflow Configuration](./WORKFLOWS.md) - Write workflow YAML files
500500+- [Upstream Improvements](./TANGLED.md) - Contribute to tangled.org/core
Dockerfile
# Build the manager binary
FROM golang:1.24 AS builder
ARG TARGETOS
ARG TARGETARCH

WORKDIR /workspace
# Copy the core module (for the local replace directive)
COPY core/ core/
# Copy the Go Modules manifests
COPY loom/go.mod loom/go.mod
COPY loom/go.sum loom/go.sum
# Cache deps before building and copying source so that we don't need to re-download as much
# and so that source changes don't invalidate our downloaded layer
WORKDIR /workspace/loom
RUN go mod download

# Copy the go source
COPY loom/cmd/main.go cmd/main.go
COPY loom/api/ api/
COPY loom/internal/ internal/
COPY loom/pkg/ pkg/

# Build
# CGO is required for go-sqlite3
RUN CGO_ENABLED=1 GOOS=${TARGETOS:-linux} GOARCH=${TARGETARCH} go build -a -o manager cmd/main.go

# Use distroless base image with glibc (not static) since we need CGO
# Refer to https://github.com/GoogleContainerTools/distroless for more details
FROM gcr.io/distroless/base-debian12:nonroot
WORKDIR /
COPY --from=builder /workspace/loom/manager .
USER 65532:65532

ENTRYPOINT ["/manager"]
Kubernetes Operator for Github Actions Runner.md
# Kubernetes Operator for Github Actions Runner
Created at 2025-11-03 05:24:53 UTC

> @ 2025-11-03 05:24:53 UTC
>
> I want to implement a kubernetes operator similar to github actions runner controller. I want to primarily focus on the logic for spawning a "runner" container via jobs and polling logs as it's processing. I also want to make sure i can build docker containers within these runners. And the jobs should have logic to target nodes based on os architecture. This is supposed to be a rough poc, i already have go logic that works for reading a yaml file and figuring out what workflow to run. It just needs to send that job/step to a job in kubes

<details><summary>Thinking</summary>

The user is asking for help implementing a Kubernetes operator similar to GitHub Actions runner controller. They want to:
1. Spawn runner containers via Jobs
2. Poll logs as it's processing
3. Build docker containers within runners
4. Target nodes based on OS architecture
5. This is a rough POC and they already have Go logic to read YAML and figure out what workflow to run

I'll provide a comprehensive answer covering the implementation of these requirements.

</details>

I'll help you implement a Kubernetes operator for spawning runner containers similar to GitHub Actions Runner Controller. Here's a comprehensive approach for your POC:

## **Kubernetes Job Spawning Logic**

Here's the core Go code structure for creating and managing Jobs:

```go
package operator

import (
	"context"
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

type RunnerController struct {
	clientset *kubernetes.Clientset
	namespace string
}

func (rc *RunnerController) CreateRunnerJob(workflowID, stepName, stepCommand string, arch string) (*batchv1.Job, error) {
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			Name:      fmt.Sprintf("runner-%s-%s", workflowID, stepName),
			Namespace: rc.namespace,
			Labels: map[string]string{
				"workflow-id": workflowID,
				"step-name":   stepName,
				"type":        "runner",
			},
		},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: map[string]string{
						"workflow-id": workflowID,
						"step-name":   stepName,
					},
				},
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					NodeSelector: map[string]string{
						"kubernetes.io/arch": arch, // amd64, arm64, etc.
					},
					Containers: []corev1.Container{
						{
							Name:    "runner",
							Image:   "docker:dind", // Docker in Docker image
							Command: []string{"/bin/sh", "-c"},
							Args:    []string{stepCommand},
							Env: []corev1.EnvVar{
								{
									Name:  "DOCKER_HOST",
									Value: "tcp://localhost:2375",
								},
							},
							VolumeMounts: []corev1.VolumeMount{
								{
									Name:      "docker-graph-storage",
									MountPath: "/var/lib/docker",
								},
							},
						},
						{
							Name:    "dind-daemon",
							Image:   "docker:dind",
							Command: []string{"dockerd"},
							Args: []string{
								"--host=tcp://0.0.0.0:2375",
								"--storage-driver=overlay2",
							},
							SecurityContext: &corev1.SecurityContext{
								Privileged: &[]bool{true}[0],
							},
							VolumeMounts: []corev1.VolumeMount{
								{
									Name:      "docker-graph-storage",
									MountPath: "/var/lib/docker",
								},
							},
						},
					},
					Volumes: []corev1.Volume{
						{
							Name: "docker-graph-storage",
							VolumeSource: corev1.VolumeSource{
								EmptyDir: &corev1.EmptyDirVolumeSource{},
							},
						},
					},
				},
			},
		},
	}

	return rc.clientset.BatchV1().Jobs(rc.namespace).Create(context.TODO(), job, metav1.CreateOptions{})
}
```

## **Docker-in-Docker Configuration**

For building Docker containers within your runners, you have two main approaches [^6][^2]:

### **Option 1: Docker-in-Docker (DinD)**
The above example uses a sidecar container approach with the `docker:dind` image. This provides full Docker functionality but requires **privileged mode**.

### **Option 2: Kaniko (Rootless Alternative)**
For better security, consider using Kaniko for building images without a Docker daemon:

```go
// Alternative container spec for Kaniko
{
	Name:  "runner",
	Image: "gcr.io/kaniko-project/executor:latest",
	Args: []string{
		"--dockerfile=Dockerfile",
		"--context=git://github.com/your-repo.git",
		"--destination=your-registry/image:tag",
	},
}
```

## **Log Polling Implementation**

Here's how to poll and stream logs from your runner Jobs [^1][^3]:

```go
import (
	"bufio"
	"context"
	"fmt"
	"io"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func (rc *RunnerController) PollJobLogs(jobName string) error {
	// Wait for the pod to be created (polling; a watch would be more robust)
	time.Sleep(2 * time.Second)

	// Get pods for the job
	labelSelector := fmt.Sprintf("job-name=%s", jobName)
	pods, err := rc.clientset.CoreV1().Pods(rc.namespace).List(
		context.TODO(),
		metav1.ListOptions{LabelSelector: labelSelector},
	)
	if err != nil || len(pods.Items) == 0 {
		return fmt.Errorf("no pods found for job %s", jobName)
	}

	podName := pods.Items[0].Name

	// Stream logs
	req := rc.clientset.CoreV1().Pods(rc.namespace).GetLogs(podName, &corev1.PodLogOptions{
		Follow:     true,
		Container:  "runner",
		Timestamps: true,
	})

	stream, err := req.Stream(context.TODO())
	if err != nil {
		return err
	}
	defer stream.Close()

	// Read and process logs
	reader := bufio.NewReader(stream)
	for {
		line, err := reader.ReadString('\n')
		if err != nil {
			if err == io.EOF {
				break
			}
			return err
		}
		// Process log line (send to channel, write to file, etc.)
		fmt.Print(line)
	}

	return nil
}
```

## **Architecture-Based Node Targeting**

To target specific node architectures, use **nodeSelector** or **nodeAffinity** [^1]:

```go
// Enhanced node selection with affinity
Spec: corev1.PodSpec{
	Affinity: &corev1.Affinity{
		NodeAffinity: &corev1.NodeAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
				NodeSelectorTerms: []corev1.NodeSelectorTerm{
					{
						MatchExpressions: []corev1.NodeSelectorRequirement{
							{
								Key:      "kubernetes.io/arch",
								Operator: corev1.NodeSelectorOpIn,
								Values:   []string{arch}, // "amd64", "arm64"
							},
							{
								Key:      "kubernetes.io/os",
								Operator: corev1.NodeSelectorOpIn,
								Values:   []string{"linux"},
							},
						},
					},
				},
			},
		},
	},
	// ... rest of pod spec
}
```

## **Job Monitoring and Cleanup**

Add logic to monitor job completion and clean up:

```go
func (rc *RunnerController) WaitForJobCompletion(jobName string, timeout time.Duration) (*batchv1.Job, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return nil, fmt.Errorf("job %s timed out", jobName)
		case <-ticker.C:
			job, err := rc.clientset.BatchV1().Jobs(rc.namespace).Get(
				context.TODO(), jobName, metav1.GetOptions{},
			)
			if err != nil {
				return nil, err
			}

			if job.Status.Succeeded > 0 {
				return job, nil
			}
			if job.Status.Failed > 0 {
				return job, fmt.Errorf("job %s failed", jobName)
			}
		}
	}
}

func (rc *RunnerController) CleanupJob(jobName string) error {
	propagationPolicy := metav1.DeletePropagationBackground
	return rc.clientset.BatchV1().Jobs(rc.namespace).Delete(
		context.TODO(),
		jobName,
		metav1.DeleteOptions{
			PropagationPolicy: &propagationPolicy,
		},
	)
}
```

## **Integration with Your Workflow Parser**

Here's how to tie it all together with your existing YAML parser:

```go
type WorkflowExecutor struct {
	controller *RunnerController
}

func (we *WorkflowExecutor) ExecuteWorkflow(workflow *WorkflowSpec) error {
	for _, step := range workflow.Steps {
		// Create a job for the step
		job, err := we.controller.CreateRunnerJob(
			workflow.ID,
			step.Name,
			step.Command,
			step.Architecture, // from your YAML
		)
		if err != nil {
			return err
		}

		// Start log polling in a goroutine
		go we.controller.PollJobLogs(job.Name)

		// Wait for completion
		if _, err := we.controller.WaitForJobCompletion(job.Name, 30*time.Minute); err != nil {
			return err
		}

		// Optional: cleanup
		if workflow.CleanupOnComplete {
			we.controller.CleanupJob(job.Name)
		}
	}
	return nil
}
```

## **Key Considerations**

- **Security**: Running Docker-in-Docker requires privileged containers. Consider using Kaniko or Buildah for rootless builds
- **Resource Management**: Set appropriate resource limits on Jobs to prevent resource exhaustion
- **Persistent Storage**: For caching Docker layers between builds, consider using PersistentVolumes instead of EmptyDir
- **RBAC**: Ensure your operator has proper ServiceAccount permissions to create/delete Jobs and read logs

This POC gives you a solid foundation similar to the GitHub Actions Runner Controller [^4][^5], with the core functionality you need for spawning runners, building containers, and monitoring execution.


_References_:
[^1]: [Jobs | Kubernetes](https://kubernetes.io/docs/concepts/workloads/controllers/job/)
[^2]: [Jenkins: running workers in Kubernetes and Docker images build](https://dev.to/setevoy/jenkins-running-workers-in-kubernetes-and-docker-images-build-1lef)
[^3]: [kubectl logs - Kubernetes](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_logs/)
[^4]: [Kubernetes controller for GitHub Actions self-hosted runners](https://github.com/actions/actions-runner-controller)
[^5]: [GitHub - evryfs/github-actions-runner-operator: K8S operator for...](https://github.com/evryfs/github-actions-runner-operator)
[^6]: [Automating Docker Builds in Kubernetes | by Sebastien Goasguen](https://sebgoa.medium.com/automating-docker-builds-in-kubernetes-15ef047795ab)

> @ 2025-11-03 05:37:24 UTC
>
> Yeah i think i want to support buildah primarily. Not to say we can't have ways to run alternative images later. Why would i want to choose node affinity over nodeselector?

<details><summary>Thinking</summary>

The user is asking about why they would choose node affinity over nodeSelector, and they want to use Buildah primarily for container builds. Let me provide a comprehensive answer about the differences between nodeSelector and nodeAffinity, and then update the code examples to use Buildah instead of Docker-in-Docker.

Based on the research provided:
1. NodeSelector is simpler but less flexible
2. NodeAffinity provides more advanced scheduling options
3. Buildah can run rootless, which is better for security

Let me structure a response that addresses both points clearly.

</details>

Great choice on **Buildah**! It's much more secure since it can run rootless without requiring privileged containers. Let me explain the nodeAffinity vs nodeSelector decision and provide updated code for Buildah.

## **NodeSelector vs NodeAffinity: When to Choose Which**

### **NodeSelector**
- **Simple equality-based selection** [^4][^2]
- Only supports "AND" logic between labels
- More concise for basic requirements
- Example: `kubernetes.io/arch: amd64`

### **NodeAffinity**
NodeAffinity offers **significantly more flexibility** that you'll likely want for a runner controller [^2][^1]:

1. **Complex expressions**: Use operators like `In`, `NotIn`, `Exists`, `DoesNotExist`, `Gt`, `Lt`
2. **Soft preferences**: Use `preferredDuringSchedulingIgnoredDuringExecution` for optional requirements
3. **Multiple criteria**: Combine multiple conditions with OR logic
4. **Weighted preferences**: Assign weights to preferred nodes

For your runner controller, **nodeAffinity is better** because:
- You can handle fallback scenarios (prefer arm64 but accept amd64)
- You can exclude certain nodes (avoid nodes with specific taints)
- You can combine architecture with other requirements (GPU nodes, specific OS versions)

## **Buildah-Based Runner Implementation**

Here's the updated implementation using Buildah:

```go
package operator

import (
	"context"
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

type BuildahRunnerConfig struct {
	Registry      string
	RegistryAuth  string // base64 encoded auth
	StorageDriver string // overlay, vfs
	Architecture  string // amd64, arm64
}

func (rc *RunnerController) CreateBuildahRunnerJob(
	workflowID, stepName string,
	buildContext string,
	config BuildahRunnerConfig,
) (*batchv1.Job, error) {

	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			Name:      fmt.Sprintf("runner-%s-%s", workflowID, stepName),
			Namespace: rc.namespace,
			Labels: map[string]string{
				"workflow-id": workflowID,
				"step-name":   stepName,
				"builder":     "buildah",
			},
		},
		Spec: batchv1.JobSpec{
			BackoffLimit: &[]int32{2}[0],
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: map[string]string{
						"workflow-id": workflowID,
						"step-name":   stepName,
					},
					Annotations: map[string]string{
						"container.apparmor.security.beta.kubernetes.io/buildah": "unconfined",
					},
				},
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,

					// Using NodeAffinity for flexible architecture targeting
					Affinity: &corev1.Affinity{
						NodeAffinity: &corev1.NodeAffinity{
							RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
								NodeSelectorTerms: []corev1.NodeSelectorTerm{
									{
										MatchExpressions: []corev1.NodeSelectorRequirement{
											{
												Key:      "kubernetes.io/arch",
												Operator: corev1.NodeSelectorOpIn,
												Values:   []string{config.Architecture},
											},
										},
									},
								},
							},
							// Optional: Prefer nodes with more resources
							PreferredDuringSchedulingIgnoredDuringExecution: []corev1.PreferredSchedulingTerm{
								{
									Weight: 100,
									Preference: corev1.NodeSelectorTerm{
										MatchExpressions: []corev1.NodeSelectorRequirement{
											{
												Key:      "node-role.kubernetes.io/runner",
												Operator: corev1.NodeSelectorOpExists,
											},
										},
									},
								},
							},
						},
					},

					Containers: []corev1.Container{
						{
							Name:    "buildah",
							Image:   "quay.io/buildah/stable:latest",
							Command: []string{"/bin/bash", "-c"},
							Args: []string{fmt.Sprintf(`
								set -e
								echo "Starting build process..."

								# Clone or prepare build context
								%s

								# Build the image
								buildah bud \
									--storage-driver=%s \
									--format=docker \
									--tls-verify=false \
									-t %s/%s:%s \
									-f Dockerfile .

								# Push to registry (if configured)
								if [ -n "$REGISTRY_AUTH" ]; then
									echo "$REGISTRY_AUTH" | base64 -d > /tmp/auth.json
									buildah push \
										--authfile /tmp/auth.json \
										--tls-verify=false \
										%s/%s:%s
									rm /tmp/auth.json
								fi

								echo "Build completed successfully"
							`,
								buildContext,
								config.StorageDriver,
								config.Registry, workflowID, stepName,
								config.Registry, workflowID, stepName,
							)},

							SecurityContext: &corev1.SecurityContext{
								// Buildah can run rootless!
								RunAsUser:  &[]int64{1000}[0],
								RunAsGroup: &[]int64{1000}[0],
								Capabilities: &corev1.Capabilities{
									Add: []corev1.Capability{
										"SETUID",
										"SETGID",
									},
								},
							},

							Env: []corev1.EnvVar{
								{
									Name:  "STORAGE_DRIVER",
									Value: config.StorageDriver,
								},
								{
									Name:  "REGISTRY_AUTH",
									Value: config.RegistryAuth,
								},
								{
									Name:  "BUILDAH_ISOLATION",
									Value: "chroot",
								},
							},

							Resources: corev1.ResourceRequirements{
								Requests: corev1.ResourceList{
									corev1.ResourceCPU:    resource.MustParse("500m"),
									corev1.ResourceMemory: resource.MustParse("1Gi"),
								},
								Limits: corev1.ResourceList{
									corev1.ResourceCPU:    resource.MustParse("2"),
									corev1.ResourceMemory: resource.MustParse("4Gi"),
542542+ },
543543+ },
544544+545545+ VolumeMounts: []corev1.VolumeMount{
546546+ {
547547+ Name: "buildah-storage",
548548+ MountPath: "/var/lib/containers",
549549+ },
550550+ },
551551+ },
552552+ },
553553+554554+ Volumes: []corev1.Volume{
555555+ {
556556+ Name: "buildah-storage",
557557+ VolumeSource: corev1.VolumeSource{
558558+ EmptyDir: &corev1.EmptyDirVolumeSource{
559559+ Medium: corev1.StorageMediumMemory, // Use RAM for speed
560560+ },
561561+ },
562562+ },
563563+ },
564564+ },
565565+ },
566566+ },
567567+ }
568568+569569+ return rc.clientset.BatchV1().Jobs(rc.namespace).Create(context.TODO(), job, metav1.CreateOptions{})
570570+}
571571+```
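A hypothetical call site, to show how the pieces fit together — the clone command, registry, and `registryAuthB64` value are all placeholders:

```go
job, err := rc.CreateBuildahRunnerJob(
	"wf-123", "build-image",
	"git clone https://example.com/org/repo.git .", // build-context preparation (placeholder)
	BuildahRunnerConfig{
		Registry:      "registry.example.com",
		RegistryAuth:  registryAuthB64, // base64-encoded auth.json, e.g. read from a Secret
		StorageDriver: "vfs",           // slower than overlay, but needs no extra host setup
		Architecture:  "arm64",
	},
)
if err != nil {
	return fmt.Errorf("creating buildah job: %w", err)
}
log.Printf("created job %s", job.Name)
```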
572572+573573+## **Supporting Multiple Build Tools with Strategy Pattern**
574574+575575+Since you mentioned supporting alternative images later, here's an extensible approach:
576576+577577+```go
578578+type BuilderStrategy interface {
579579+ CreateJob(workflowID, stepName string, config BuildConfig) *batchv1.Job
580580+ GetImage() string
581581+ GetSecurityContext() *corev1.SecurityContext
582582+}
583583+584584+type BuildahStrategy struct{}
585585+type KanikoStrategy struct{}
586586+type DockerStrategy struct{}
587587+588588+func (b *BuildahStrategy) GetSecurityContext() *corev1.SecurityContext {
589589+ return &corev1.SecurityContext{
590590+ RunAsUser: &[]int64{1000}[0],
591591+ RunAsGroup: &[]int64{1000}[0],
592592+ Capabilities: &corev1.Capabilities{
593593+ Add: []corev1.Capability{"SETUID", "SETGID"},
594594+ },
595595+ }
596596+}
597597+598598+func (k *KanikoStrategy) GetSecurityContext() *corev1.SecurityContext {
599599+ return &corev1.SecurityContext{
600600+ RunAsUser: &[]int64{0}[0], // Kaniko needs root but doesn't need privileged
601601+ }
602602+}
603603+604604+// Factory pattern for builder selection
605605+func GetBuilderStrategy(builderType string) BuilderStrategy {
606606+ switch builderType {
607607+ case "kaniko":
608608+ return &KanikoStrategy{}
609609+ case "docker":
610610+ return &DockerStrategy{}
611611+ default:
612612+ return &BuildahStrategy{} // Default to Buildah
613613+ }
614614+}
615615+```
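Dispatch then becomes a one-liner at job-creation time. A sketch, assuming the workflow spec carries a `builder` field and that a `BuildConfig` value has been assembled from it:

```go
// Pick the builder named in the workflow (defaulting to Buildah) and let the
// strategy produce the Job spec; the controller only handles submission.
strategy := GetBuilderStrategy(workflow.Builder) // "buildah", "kaniko", or "docker"
job := strategy.CreateJob(workflowID, stepName, buildConfig)
if _, err := rc.clientset.BatchV1().Jobs(rc.namespace).Create(ctx, job, metav1.CreateOptions{}); err != nil {
	return fmt.Errorf("creating build job: %w", err)
}
```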
616616+617617+## **Advanced NodeAffinity Example**
618618+619619+Here's a more sophisticated nodeAffinity setup that showcases why it's superior for complex requirements:
620620+621621+```go
622622+Affinity: &corev1.Affinity{
623623+ NodeAffinity: &corev1.NodeAffinity{
624624+ RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
625625+ NodeSelectorTerms: []corev1.NodeSelectorTerm{
626626+ {
627627+ MatchExpressions: []corev1.NodeSelectorRequirement{
628628+ {
629629+ Key: "kubernetes.io/arch",
630630+ Operator: corev1.NodeSelectorOpIn,
631631+ Values: []string{"amd64", "arm64"}, // Either architecture
632632+ },
633633+ {
634634+ Key: "node.kubernetes.io/instance-type",
635635+ Operator: corev1.NodeSelectorOpNotIn,
636636+ Values: []string{"t2.micro", "t2.small"}, // Exclude small instances
637637+ },
638638+ },
639639+ },
640640+ },
641641+ },
642642+ PreferredDuringSchedulingIgnoredDuringExecution: []corev1.PreferredSchedulingTerm{
643643+ {
644644+ Weight: 100,
645645+ Preference: corev1.NodeSelectorTerm{
646646+ MatchExpressions: []corev1.NodeSelectorRequirement{
647647+ {
648648+ Key: "kubernetes.io/arch",
649649+ Operator: corev1.NodeSelectorOpIn,
650650+ Values: []string{config.PreferredArch}, // Prefer specific arch
651651+ },
652652+ },
653653+ },
654654+ },
655655+ {
656656+ Weight: 50,
657657+ Preference: corev1.NodeSelectorTerm{
658658+ MatchExpressions: []corev1.NodeSelectorRequirement{
659659+ {
660660+ Key: "workload-type",
661661+ Operator: corev1.NodeSelectorOpIn,
662662+ Values: []string{"build-optimized"},
663663+ },
664664+ },
665665+ },
666666+ },
667667+ },
668668+ },
669669+}
670670+```
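Note that `config.PreferredArch` and the `workload-type=build-optimized` label in this snippet are assumptions: the former would come from your runner configuration, and the latter is a convention you'd apply to your own build nodes. Because preferred terms only influence scoring, scheduling still succeeds when no node matches them — only the required block can leave a pod Pending.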
671671+672672+**Buildah** is an excellent choice for your runner controller because it:
673673+- Runs without root privileges (better security) [^5][^3]
674674+- Doesn't require a daemon
675675+- Has lower resource overhead
676676+- Supports multiple storage drivers
677677+- Works well in Kubernetes environments
678678+679679+The combination of Buildah with nodeAffinity gives you a secure, flexible runner system that can intelligently place workloads across your cluster.
680680+681681+682682+_References_:
683683+[^1]: [Mastering Node Placement in Kubernetes: Node Selectors vs. Node Affinity](https://medium.com/@rahulbansod519/mastering-node-placement-in-kubernetes-node-selectors-vs-node-affinity-f6a24cda0ddd)
684684+[^2]: [Node Selector vs Node Affinity in Kubernetes - Medium](https://medium.com/@reach2shristi.81/node-selector-vs-node-affinity-in-kubernetes-3292e9552aa0)
685685+[^3]: [Buildah and Podman Relationship | buildah.io](https://buildah.io/blogs/2018/10/31/podman-buildah-relationship.html)
686686+[^4]: [Kubernetes: Node Selector and Node Affinity | by Claire Lee | Medium](https://yuminlee2.medium.com/kubernetes-node-selector-and-node-affinity-ecb3a4d69165)
687687+[^5]: [Building a Buildah container image for Kubernetes](https://buildah.io/blogs/2018/03/01/building-buildah-container-image-for-kubernetes.html)
688688+689689+> @ 2025-11-03 05:43:40 UTC
690690+>
691691+> How does GitHub handle isolation? I believe each step in a job is the same runner, but different jobs are isolated? This code appears to make a single step into a job? Is there shared context between steps? Ie environment variables, cached files etc?
692692+693693+<details><summary>Thinking</summary>
694694+695695+The user is asking about GitHub's isolation model and noticing that my code creates a separate Kubernetes Job for each step, which is different from how GitHub Actions works. They're right - GitHub Actions runs all steps in a job on the same runner with shared context, while different jobs are isolated. Let me explain this architecture and provide updated code that better mimics GitHub's model.
696696+697697+</details>
698698+699699+Excellent observation! You're absolutely right - **GitHub Actions runs all steps within a job on the same runner**, sharing context between steps. Your current code creates separate Kubernetes Jobs for each step, which breaks this model. Let me explain GitHub's architecture and provide a better implementation.
700700+701701+## **GitHub Actions Isolation Model**
702702+703703+### **How GitHub Actually Works:**
704704+- **Job Level**: Each job gets its own runner (isolated environment) [^1]
705705+- **Step Level**: All steps within a job execute sequentially on the same runner [^1]
706706+- **Shared Context**: Steps share:
707707+ - Filesystem/workspace
708708+ - Environment variables
709709+ - Docker volumes [^4]
710710+ - Network context
711711+ - Installed tools/dependencies
712712+713713+### **Isolation Boundaries:**
714714+- **Between Jobs**: Complete isolation (different runners/pods)
715715+- **Between Steps**: No isolation (same container/environment)
716716+- **Between Workflows**: Complete isolation
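In workflow syntax, the boundary is easy to see — steps inside `build` share state, while `deploy` starts from a clean runner:

```yaml
jobs:
  build:                    # one runner; all steps below share its filesystem and env
    runs-on: ubuntu-latest
    steps:
      - run: echo "VERSION=1.2.3" >> "$GITHUB_ENV"
      - run: echo "building $VERSION"    # sees VERSION from the previous step
  deploy:
    needs: build            # ordering only; no state carries over
    runs-on: ubuntu-latest
    steps:
      - run: echo "VERSION is '$VERSION' here"   # empty: different runner
```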
717717+718718+## **Corrected Implementation: Job = Pod, Steps = Sequential Commands**
719719+720720+Here's a redesigned approach that properly mimics GitHub's model:
721721+722722+```go
723723+package operator
724724+725725+import (
726726+ "context"
727727+ "encoding/json"
728728+ "fmt"
729729+ "strings"
730730+ batchv1 "k8s.io/api/batch/v1"
731731+ corev1 "k8s.io/api/core/v1"
732732+ metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
733733+)
734734+735735+type WorkflowJob struct {
736736+ ID string
737737+ Name string
738738+ Steps []Step
739739+ Architecture string
740740+ Environment map[string]string
741741+}
742742+743743+type Step struct {
744744+ Name string
745745+ Command string
746746+ Env map[string]string
747747+ Uses string // For action references
748748+}
749749+750750+func (rc *RunnerController) CreateJobRunner(job WorkflowJob) (*batchv1.Job, error) {
751751+ // Build a single script that executes all steps sequentially
752752+ runnerScript := rc.buildRunnerScript(job.Steps, job.Environment)
753753+754754+ k8sJob := &batchv1.Job{
755755+ ObjectMeta: metav1.ObjectMeta{
756756+ Name: fmt.Sprintf("runner-job-%s", job.ID),
757757+ Namespace: rc.namespace,
758758+ Labels: map[string]string{
759759+ "workflow-id": job.ID,
760760+ "job-name": job.Name,
761761+ "type": "github-runner",
762762+ },
763763+ },
764764+ Spec: batchv1.JobSpec{
765765+ BackoffLimit: &[]int32{0}[0], // No retries for workflow jobs
766766+ Template: corev1.PodTemplateSpec{
767767+ Spec: corev1.PodSpec{
768768+ RestartPolicy: corev1.RestartPolicyNever,
769769+770770+ // Single init container to set up workspace
771771+ InitContainers: []corev1.Container{
772772+ {
773773+ Name: "setup-workspace",
774774+ Image: "alpine/git:latest",
775775+ Command: []string{"/bin/sh", "-c"},
776776+ Args: []string{`
777777+ # Clone repository or set up workspace
778778+ git clone ${REPO_URL} /workspace || true
779779+ cd /workspace
780780+ git checkout ${GITHUB_SHA:-main}
781781+ `},
782782+ VolumeMounts: []corev1.VolumeMount{
783783+ {
784784+ Name: "workspace",
785785+ MountPath: "/workspace",
786786+ },
787787+ },
788788+ },
789789+ },
790790+791791+ // Main runner container executes ALL steps
792792+ Containers: []corev1.Container{
793793+ {
794794+ Name: "runner",
795795+ Image: "quay.io/buildah/stable:latest",
796796+ Command: []string{"/bin/bash", "-c"},
797797+ Args: []string{runnerScript},
798798+799799+ WorkingDir: "/workspace",
800800+801801+ Env: rc.buildEnvironment(job),
802802+803803+ VolumeMounts: []corev1.VolumeMount{
804804+ {
805805+ Name: "workspace",
806806+ MountPath: "/workspace",
807807+ },
808808+ {
809809+ Name: "step-outputs",
810810+ MountPath: "/tmp/step-outputs",
811811+ },
812812+ {
813813+ Name: "github-env",
814814+ MountPath: "/tmp/github",
815815+ },
816816+ {
817817+ Name: "buildah-storage",
818818+ MountPath: "/var/lib/containers",
819819+ },
820820+ },
821821+822822+ SecurityContext: &corev1.SecurityContext{
823823+ RunAsUser: &[]int64{1000}[0],
824824+ RunAsGroup: &[]int64{1000}[0],
825825+ },
826826+ },
827827+ },
828828+829829+ Volumes: []corev1.Volume{
830830+ {
831831+ Name: "workspace",
832832+ VolumeSource: corev1.VolumeSource{
833833+ EmptyDir: &corev1.EmptyDirVolumeSource{},
834834+ },
835835+ },
836836+ {
837837+ Name: "step-outputs",
838838+ VolumeSource: corev1.VolumeSource{
839839+ EmptyDir: &corev1.EmptyDirVolumeSource{},
840840+ },
841841+ },
842842+ {
843843+ Name: "github-env",
844844+ VolumeSource: corev1.VolumeSource{
845845+ EmptyDir: &corev1.EmptyDirVolumeSource{},
846846+ },
847847+ },
848848+ {
849849+ Name: "buildah-storage",
850850+ VolumeSource: corev1.VolumeSource{
851851+ EmptyDir: &corev1.EmptyDirVolumeSource{
852852+ Medium: corev1.StorageMediumMemory,
853853+ },
854854+ },
855855+ },
856856+ },
857857+858858+ NodeSelector: map[string]string{
859859+ "kubernetes.io/arch": job.Architecture,
860860+ },
861861+ },
862862+ },
863863+ },
864864+ }
865865+866866+ return rc.clientset.BatchV1().Jobs(rc.namespace).Create(context.TODO(), k8sJob, metav1.CreateOptions{})
867867+}
868868+```
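The `buildEnvironment` helper referenced above isn't shown; a minimal sketch, combining job-level variables with the basics a GitHub-style script expects (field names follow the `WorkflowJob` type defined earlier):

```go
func (rc *RunnerController) buildEnvironment(job WorkflowJob) []corev1.EnvVar {
	env := []corev1.EnvVar{
		{Name: "CI", Value: "true"},
		{Name: "GITHUB_WORKSPACE", Value: "/workspace"},
		{Name: "WORKFLOW_ID", Value: job.ID},
	}
	// Job-level environment applies to every step; step-level env is handled
	// inside the generated script.
	for k, v := range job.Environment {
		env = append(env, corev1.EnvVar{Name: k, Value: v})
	}
	return env
}
```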
869869+870870+## **Step Execution Script Builder**
871871+872872+This critical function creates a bash script that maintains context between steps:
873873+874874+```go
875875+func (rc *RunnerController) buildRunnerScript(steps []Step, jobEnv map[string]string) string {
876876+ var script strings.Builder
877877+878878+ script.WriteString(`#!/bin/bash
879879+set -e
880880+881881+# GitHub Actions compatible environment setup
882882+export GITHUB_ENV=/tmp/github/env
883883+export GITHUB_PATH=/tmp/github/path
884884+export GITHUB_OUTPUT=/tmp/github/output
885885+export GITHUB_STEP_SUMMARY=/tmp/github/step-summary.md
886886+887887+# Initialize files
888888+touch $GITHUB_ENV $GITHUB_PATH $GITHUB_OUTPUT $GITHUB_STEP_SUMMARY
889889+890890+# Helper functions matching GitHub Actions
891891+add_to_env() {
892892+ echo "$1=$2" >> $GITHUB_ENV
893893+ export $1="$2"
894894+}
895895+896896+add_to_path() {
897897+ echo "$1" >> $GITHUB_PATH
898898+ export PATH="$1:$PATH"
899899+}
900900+901901+set_output() {
902902+ echo "$1=$2" >> $GITHUB_OUTPUT
903903+}
904904+905905+# Function to load environment between steps
906906+load_github_env() {
907907+ if [ -f "$GITHUB_ENV" ]; then
908908+ set -a
909909+ source $GITHUB_ENV
910910+ set +a
911911+ fi
912912+ if [ -f "$GITHUB_PATH" ]; then
913913+ while IFS= read -r line; do
914914+ export PATH="$line:$PATH"
915915+ done < $GITHUB_PATH
916916+ fi
917917+}
918918+919919+`)
920920+921921+ // Add each step
922922+ for i, step := range steps {
923923+ script.WriteString(fmt.Sprintf(`
924924+echo "==== Starting Step: %s ===="
925925+load_github_env
926926+927927+# Step %d environment variables
928928+`, step.Name, i+1))
929929+930930+ // Add step-specific environment variables
931931+ for k, v := range step.Env {
932932+ script.WriteString(fmt.Sprintf("export %s='%s'\n", k, v))
933933+ }
934934+935935+ // Execute step command
936936+ script.WriteString(fmt.Sprintf(`
937937+# Execute step
938938+(
939939+ %s
940940+) || {
941941+ echo "Step '%s' failed with exit code $?"
942942+ exit 1
943943+}
944944+945945+echo "==== Completed Step: %s ===="
946946+`, step.Command, step.Name, step.Name))
947947+ }
948948+949949+ return script.String()
950950+}
951951+```
952952+953953+## **Shared Context Implementation**
954954+955955+Here's how to properly share context between steps like GitHub Actions does [^2][^3]:
956956+957957+```go
958958+type StepOutput struct {
959959+ StepName string
960960+ Outputs map[string]string
961961+}
962962+963963+// Enhanced step structure with outputs
964964+type EnhancedStep struct {
965965+ Step
966966+ ID string // For referencing outputs
967967+ Outputs []string // Expected output keys
968968+}
969969+970970+func (rc *RunnerController) buildEnhancedRunnerScript(steps []EnhancedStep) string {
971971+ var script strings.Builder
972972+973973+ // ... (previous setup code) ...
975975+ for _, step := range steps { // loop index unused here
976976+ script.WriteString(fmt.Sprintf(`
977977+# Step: %s (id: %s)
978978+STEP_OUTPUT_FILE="/tmp/step-outputs/%s.json"
979979+980980+# Run step and capture outputs
981981+(
982982+ %s
983983+984984+ # Save step outputs if any were set
985985+ if [ -f "$GITHUB_OUTPUT" ]; then
986986+ # Parse and save outputs for this step
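 # NOTE: requires python3 in the runner image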
987987+ python3 -c "
988988+import json
989989+outputs = {}
990990+with open('$GITHUB_OUTPUT', 'r') as f:
991991+ for line in f:
992992+ if '=' in line:
993993+ key, value = line.strip().split('=', 1)
994994+ outputs[key] = value
995995+with open('$STEP_OUTPUT_FILE', 'w') as f:
996996+ json.dump(outputs, f)
997997+"
998998+ fi
999999+)
10001000+10011001+# Make outputs available as environment variables for next steps
10021002+if [ -f "$STEP_OUTPUT_FILE" ]; then
10031003+ export STEP_%s_OUTPUTS=$(cat $STEP_OUTPUT_FILE)
10041004+fi
10051005+`, step.Name, step.ID, step.ID, step.Command, strings.ToUpper(step.ID)))
10061006+ }
10071007+10081008+ return script.String()
10091009+}
10101010+```
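A later step can then read an earlier step's outputs from the exported blob — for example, assuming a step with id `build` that set an `image_tag` output (and python3 in the image). Note that the script above parses the accumulated `$GITHUB_OUTPUT` file, so a hardened version would truncate it between steps so each step records only its own outputs:

```bash
# Pull the 'image_tag' output recorded by the step with id 'build'
IMAGE_TAG=$(echo "$STEP_BUILD_OUTPUTS" | python3 -c 'import json,sys; print(json.load(sys.stdin)["image_tag"])')
echo "pushing $IMAGE_TAG"
```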
10111011+10121012+## **Multi-Container Pattern for Complex Workflows**
10131013+10141014+For workflows that need services (like databases), use a multi-container pod:
10151015+10161016+```go
10171017+// For jobs that need service containers (like GitHub Actions services)
10181018+func (rc *RunnerController) CreateJobWithServices(job WorkflowJob, services []ServiceContainer) (*batchv1.Job, error) {
10191019+ containers := []corev1.Container{
10201020+ // Main runner container
10211021+ {
10221022+ Name: "runner",
10231023+ // ... (as before)
10241024+ },
10251025+ }
10261026+10271027+ // Add service containers that run alongside the main runner
10281028+ for _, svc := range services {
10291029+ containers = append(containers, corev1.Container{
10301030+ Name: svc.Name,
10311031+ Image: svc.Image,
10321032+ Env: convertToEnvVar(svc.Environment),
10331033+ Ports: svc.Ports,
10341034+ // Services run for the lifetime of the job
10351035+ Command: svc.Command,
10361036+ })
10371037+ }
10381038+10391039+ // Update pod spec with all containers
10401040+ // ...
10411041+}
10421042+```
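The `ServiceContainer` type and `convertToEnvVar` helper used above aren't defined in the snippet; a minimal sketch:

```go
// ServiceContainer describes a sidecar service, analogous to GitHub Actions'
// 'services' (e.g. a Postgres container the job's steps can reach on localhost).
type ServiceContainer struct {
	Name        string
	Image       string
	Environment map[string]string
	Ports       []corev1.ContainerPort
	Command     []string
}

func convertToEnvVar(env map[string]string) []corev1.EnvVar {
	out := make([]corev1.EnvVar, 0, len(env))
	for k, v := range env {
		out = append(out, corev1.EnvVar{Name: k, Value: v})
	}
	return out
}
```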
10431043+10441044+## **Key Differences from Original Implementation**
10451045+10461046+| Aspect | GitHub Actions Model | Your Original Code | Corrected Approach |
10471047+|--------|---------------------|-------------------|-------------------|
10481048+| **Job Mapping** | Job = Runner Instance | Step = K8s Job | Job = K8s Job/Pod |
10491049+| **Step Execution** | Sequential in same environment | Separate pods | Sequential in same container |
10501050+| **Filesystem** | Shared across steps | Isolated per step | Shared via volumes |
10511051+| **Environment Variables** | Persistent across steps | Isolated | Persistent via GITHUB_ENV |
10521052+| **Docker/Buildah Cache** | Shared in job | Lost between steps | Shared via volume |
10531053+10541054+This corrected approach properly mimics GitHub Actions' execution model while maintaining the isolation boundaries that make sense in Kubernetes.
10551055+10561056+10571057+_References_:
10581058+[^1]: [Understanding GitHub Actions](https://docs.github.com/articles/getting-started-with-github-actions)
10591059+[^2]: [Store information in variables - GitHub Docs](https://docs.github.com/en/actions/how-tos/write-workflows/choose-what-workflows-do/use-variables)
10601060+[^3]: [Contexts - GitHub Docs](https://docs.github.com/en/actions/concepts/workflows-and-actions/contexts)
10611061+[^4]: [Workflow syntax for GitHub Actions](https://docs.github.com/actions/using-workflows/workflow-syntax-for-github-actions)
10621062+10631063+10641064+---
10651065+10661066+_Generated by [Kagi Assistant](https://kagi.com/assistant)_
+362
Makefile
···11+# VERSION defines the project version for the bundle.
22+# Update this value when you upgrade the version of your project.
33+# To re-generate a bundle for another specific version without changing the standard setup, you can:
44+# - use the VERSION as arg of the bundle target (e.g make bundle VERSION=0.0.2)
55+# - use environment variables to overwrite this value (e.g export VERSION=0.0.2)
66+VERSION ?= 0.0.1
77+88+# CHANNELS define the bundle channels used in the bundle.
99+# Add a new line here if you would like to change its default config. (E.g CHANNELS = "candidate,fast,stable")
1010+# To re-generate a bundle for other specific channels without changing the standard setup, you can:
1111+# - use the CHANNELS as arg of the bundle target (e.g make bundle CHANNELS=candidate,fast,stable)
1212+# - use environment variables to overwrite this value (e.g export CHANNELS="candidate,fast,stable")
1313+ifneq ($(origin CHANNELS), undefined)
1414+BUNDLE_CHANNELS := --channels=$(CHANNELS)
1515+endif
1616+1717+# DEFAULT_CHANNEL defines the default channel used in the bundle.
1818+# Add a new line here if you would like to change its default config. (E.g DEFAULT_CHANNEL = "stable")
1919+# To re-generate a bundle for any other default channel without changing the default setup, you can:
2020+# - use the DEFAULT_CHANNEL as arg of the bundle target (e.g make bundle DEFAULT_CHANNEL=stable)
2121+# - use environment variables to overwrite this value (e.g export DEFAULT_CHANNEL="stable")
2222+ifneq ($(origin DEFAULT_CHANNEL), undefined)
2323+BUNDLE_DEFAULT_CHANNEL := --default-channel=$(DEFAULT_CHANNEL)
2424+endif
2525+BUNDLE_METADATA_OPTS ?= $(BUNDLE_CHANNELS) $(BUNDLE_DEFAULT_CHANNEL)
2626+2727+# IMAGE_TAG_BASE defines the docker.io namespace and part of the image name for remote images.
2828+# This variable is used to construct full image tags for bundle and catalog images.
2929+#
3030+# For example, running 'make bundle-build bundle-push catalog-build catalog-push' will build and push both
3131+# j5t.io/loom-bundle:$VERSION and j5t.io/loom-catalog:$VERSION.
3232+IMAGE_TAG_BASE ?= j5t.io/loom
3333+3434+# BUNDLE_IMG defines the image:tag used for the bundle.
3535+# You can use it as an arg. (E.g make bundle-build BUNDLE_IMG=<some-registry>/<project-name-bundle>:<tag>)
3636+BUNDLE_IMG ?= $(IMAGE_TAG_BASE)-bundle:v$(VERSION)
3737+3838+# BUNDLE_GEN_FLAGS are the flags passed to the operator-sdk generate bundle command
3939+BUNDLE_GEN_FLAGS ?= -q --overwrite --version $(VERSION) $(BUNDLE_METADATA_OPTS)
4040+4141+# USE_IMAGE_DIGESTS defines if images are resolved via tags or digests
4242+# You can enable this value if you would like to use SHA Based Digests
4343+# To enable set flag to true
4444+USE_IMAGE_DIGESTS ?= false
4545+ifeq ($(USE_IMAGE_DIGESTS), true)
4646+ BUNDLE_GEN_FLAGS += --use-image-digests
4747+endif
4848+4949+# Set the Operator SDK version to use. By default, what is installed on the system is used.
5050+# This is useful for CI or a project to utilize a specific version of the operator-sdk toolkit.
5151+OPERATOR_SDK_VERSION ?= v1.41.1
5252+# Image URL to use all building/pushing image targets
5353+IMG ?= controller:latest
5454+5555+# Get the currently used golang install path (in GOPATH/bin, unless GOBIN is set)
5656+ifeq (,$(shell go env GOBIN))
5757+GOBIN=$(shell go env GOPATH)/bin
5858+else
5959+GOBIN=$(shell go env GOBIN)
6060+endif
6161+6262+# CONTAINER_TOOL defines the container tool to be used for building images.
6363+# Be aware that the target commands are only tested with Docker which is
6464+# scaffolded by default. However, you might want to replace it to use other
6565+# tools. (i.e. podman)
6666+CONTAINER_TOOL ?= docker
6767+6868+# Setting SHELL to bash allows bash commands to be executed by recipes.
6969+# Options are set to exit when a recipe line exits non-zero or a piped command fails.
7070+SHELL = /usr/bin/env bash -o pipefail
7171+.SHELLFLAGS = -ec
7272+7373+.PHONY: all
7474+all: build
7575+7676+##@ General
7777+7878+# The help target prints out all targets with their descriptions organized
7979+# beneath their categories. The categories are represented by '##@' and the
8080+# target descriptions by '##'. The awk command is responsible for reading the
8181+# entire set of makefiles included in this invocation, looking for lines of the
8282+# file as xyz: ## something, and then pretty-format the target and help. Then,
8383+# if there's a line with ##@ something, that gets pretty-printed as a category.
8484+# More info on the usage of ANSI control characters for terminal formatting:
8585+# https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_parameters
8686+# More info on the awk command:
8787+# http://linuxcommand.org/lc3_adv_awk.php
8888+8989+.PHONY: help
9090+help: ## Display this help.
9191+ @awk 'BEGIN {FS = ":.*##"; printf "\nUsage:\n make \033[36m<target>\033[0m\n"} /^[a-zA-Z_0-9-]+:.*?##/ { printf " \033[36m%-15s\033[0m %s\n", $$1, $$2 } /^##@/ { printf "\n\033[1m%s\033[0m\n", substr($$0, 5) } ' $(MAKEFILE_LIST)
9292+9393+##@ Development
9494+9595+.PHONY: manifests
9696+manifests: controller-gen ## Generate WebhookConfiguration, ClusterRole and CustomResourceDefinition objects.
9797+ $(CONTROLLER_GEN) rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
9898+9999+.PHONY: generate
100100+generate: controller-gen ## Generate code containing DeepCopy, DeepCopyInto, and DeepCopyObject method implementations.
101101+ $(CONTROLLER_GEN) object:headerFile="hack/boilerplate.go.txt" paths="./..."
102102+103103+.PHONY: fmt
104104+fmt: ## Run go fmt against code.
105105+ go fmt ./...
106106+107107+.PHONY: vet
108108+vet: ## Run go vet against code.
109109+ go vet ./...
110110+111111+.PHONY: test
112112+test: manifests generate fmt vet setup-envtest ## Run tests.
113113+ KUBEBUILDER_ASSETS="$(shell $(ENVTEST) use $(ENVTEST_K8S_VERSION) --bin-dir $(LOCALBIN) -p path)" go test $$(go list ./... | grep -v /e2e) -coverprofile cover.out
114114+115115+# TODO(user): To use a different vendor for e2e tests, modify the setup under 'tests/e2e'.
116116+# The default setup assumes Kind is pre-installed and builds/loads the Manager Docker image locally.
117117+# CertManager is installed by default; skip with:
118118+# - CERT_MANAGER_INSTALL_SKIP=true
119119+KIND_CLUSTER ?= loom-test-e2e
120120+121121+.PHONY: setup-test-e2e
122122+setup-test-e2e: ## Set up a Kind cluster for e2e tests if it does not exist
123123+ @command -v $(KIND) >/dev/null 2>&1 || { \
124124+ echo "Kind is not installed. Please install Kind manually."; \
125125+ exit 1; \
126126+ }
127127+ @case "$$($(KIND) get clusters)" in \
128128+ *"$(KIND_CLUSTER)"*) \
129129+ echo "Kind cluster '$(KIND_CLUSTER)' already exists. Skipping creation." ;; \
130130+ *) \
131131+ echo "Creating Kind cluster '$(KIND_CLUSTER)'..."; \
132132+ $(KIND) create cluster --name $(KIND_CLUSTER) ;; \
133133+ esac
134134+135135+.PHONY: test-e2e
136136+test-e2e: setup-test-e2e manifests generate fmt vet ## Run the e2e tests. Expects an isolated environment using Kind.
137137+ KIND_CLUSTER=$(KIND_CLUSTER) go test ./test/e2e/ -v -ginkgo.v
138138+ $(MAKE) cleanup-test-e2e
139139+140140+.PHONY: cleanup-test-e2e
141141+cleanup-test-e2e: ## Tear down the Kind cluster used for e2e tests
142142+ @$(KIND) delete cluster --name $(KIND_CLUSTER)
143143+144144+.PHONY: lint
145145+lint: golangci-lint ## Run golangci-lint linter
146146+ $(GOLANGCI_LINT) run
147147+148148+.PHONY: lint-fix
149149+lint-fix: golangci-lint ## Run golangci-lint linter and perform fixes
150150+ $(GOLANGCI_LINT) run --fix
151151+152152+.PHONY: lint-config
153153+lint-config: golangci-lint ## Verify golangci-lint linter configuration
154154+ $(GOLANGCI_LINT) config verify
155155+156156+##@ Build
157157+158158+.PHONY: build
159159+build: manifests generate fmt vet ## Build manager binary.
160160+ go build -o bin/manager cmd/main.go
161161+162162+.PHONY: run
163163+run: manifests generate fmt vet ## Run a controller from your host.
164164+ go run ./cmd/main.go
165165+166166+# If you wish to build the manager image targeting other platforms you can use the --platform flag.
167167+# (i.e. docker build --platform linux/arm64). However, you must enable docker buildKit for it.
168168+# More info: https://docs.docker.com/develop/develop-images/build_enhancements/
169169+.PHONY: docker-build
170170+docker-build: ## Build docker image with the manager.
171171+ cd .. && $(CONTAINER_TOOL) build -f loom/Dockerfile -t ${IMG} .
172172+173173+.PHONY: docker-push
174174+docker-push: ## Push docker image with the manager.
175175+ $(CONTAINER_TOOL) push ${IMG}
176176+177177+# PLATFORMS defines the target platforms for building the manager image, to provide support for multiple
178178+# architectures. (e.g. make docker-buildx IMG=myregistry/myoperator:0.0.1). To use this option you need to:
179179+# - be able to use docker buildx. More info: https://docs.docker.com/build/buildx/
180180+# - have enabled BuildKit. More info: https://docs.docker.com/develop/develop-images/build_enhancements/
181181+# - be able to push the image to your registry (i.e. if you do not set a valid value via IMG=<myregistry/image:<tag>> then the export will fail)
182182+# To adequately provide solutions that are compatible with multiple platforms, you should consider using this option.
183183+PLATFORMS ?= linux/arm64,linux/amd64,linux/s390x,linux/ppc64le
184184+.PHONY: docker-buildx
185185+docker-buildx: ## Build and push docker image for the manager for cross-platform support
186186+ # copy existing Dockerfile and insert --platform=${BUILDPLATFORM} into Dockerfile.cross, and preserve the original Dockerfile
187187+ sed -e '1 s/\(^FROM\)/FROM --platform=\$$\{BUILDPLATFORM\}/; t' -e ' 1,// s//FROM --platform=\$$\{BUILDPLATFORM\}/' Dockerfile > Dockerfile.cross
188188+ - $(CONTAINER_TOOL) buildx create --name loom-builder
189189+ $(CONTAINER_TOOL) buildx use loom-builder
190190+ - cd .. && $(CONTAINER_TOOL) buildx build --push --platform=$(PLATFORMS) --tag ${IMG} -f loom/Dockerfile.cross .
191191+ - $(CONTAINER_TOOL) buildx rm loom-builder
192192+ rm Dockerfile.cross
193193+194194+.PHONY: build-installer
195195+build-installer: manifests generate kustomize ## Generate a consolidated YAML with CRDs and deployment.
196196+ mkdir -p dist
197197+ cd config/manager && $(KUSTOMIZE) edit set image controller=${IMG}
198198+ $(KUSTOMIZE) build config/default > dist/install.yaml
199199+200200+##@ Deployment
201201+202202+ifndef ignore-not-found
203203+ ignore-not-found = false
204204+endif
205205+206206+.PHONY: install
207207+install: manifests kustomize ## Install CRDs into the K8s cluster specified in ~/.kube/config.
208208+ $(KUSTOMIZE) build config/crd | $(KUBECTL) apply -f -
209209+210210+.PHONY: uninstall
211211+uninstall: manifests kustomize ## Uninstall CRDs from the K8s cluster specified in ~/.kube/config. Call with ignore-not-found=true to ignore resource not found errors during deletion.
212212+ $(KUSTOMIZE) build config/crd | $(KUBECTL) delete --ignore-not-found=$(ignore-not-found) -f -
213213+214214+.PHONY: deploy
215215+deploy: manifests kustomize ## Deploy controller to the K8s cluster specified in ~/.kube/config.
216216+ cd config/manager && $(KUSTOMIZE) edit set image controller=${IMG}
217217+ $(KUSTOMIZE) build config/default | $(KUBECTL) apply -f -
218218+219219+.PHONY: undeploy
220220+undeploy: kustomize ## Undeploy controller from the K8s cluster specified in ~/.kube/config. Call with ignore-not-found=true to ignore resource not found errors during deletion.
221221+ $(KUSTOMIZE) build config/default | $(KUBECTL) delete --ignore-not-found=$(ignore-not-found) -f -
222222+223223+##@ Dependencies
224224+225225+## Location to install dependencies to
226226+LOCALBIN ?= $(shell pwd)/bin
227227+$(LOCALBIN):
228228+ mkdir -p $(LOCALBIN)
229229+230230+## Tool Binaries
231231+KUBECTL ?= kubectl
232232+KIND ?= kind
233233+KUSTOMIZE ?= $(LOCALBIN)/kustomize
234234+CONTROLLER_GEN ?= $(LOCALBIN)/controller-gen
235235+ENVTEST ?= $(LOCALBIN)/setup-envtest
236236+GOLANGCI_LINT = $(LOCALBIN)/golangci-lint
237237+238238+## Tool Versions
239239+KUSTOMIZE_VERSION ?= v5.6.0
240240+CONTROLLER_TOOLS_VERSION ?= v0.18.0
241241+#ENVTEST_VERSION is the version of controller-runtime release branch to fetch the envtest setup script (i.e. release-0.20)
242242+ENVTEST_VERSION ?= $(shell go list -m -f "{{ .Version }}" sigs.k8s.io/controller-runtime | awk -F'[v.]' '{printf "release-%d.%d", $$2, $$3}')
243243+#ENVTEST_K8S_VERSION is the version of Kubernetes to use for setting up ENVTEST binaries (i.e. 1.31)
244244+ENVTEST_K8S_VERSION ?= $(shell go list -m -f "{{ .Version }}" k8s.io/api | awk -F'[v.]' '{printf "1.%d", $$3}')
245245+GOLANGCI_LINT_VERSION ?= v2.1.0
246246+247247+.PHONY: kustomize
248248+kustomize: $(KUSTOMIZE) ## Download kustomize locally if necessary.
249249+$(KUSTOMIZE): $(LOCALBIN)
250250+ $(call go-install-tool,$(KUSTOMIZE),sigs.k8s.io/kustomize/kustomize/v5,$(KUSTOMIZE_VERSION))
251251+252252+.PHONY: controller-gen
253253+controller-gen: $(CONTROLLER_GEN) ## Download controller-gen locally if necessary.
254254+$(CONTROLLER_GEN): $(LOCALBIN)
255255+ $(call go-install-tool,$(CONTROLLER_GEN),sigs.k8s.io/controller-tools/cmd/controller-gen,$(CONTROLLER_TOOLS_VERSION))
256256+257257+.PHONY: setup-envtest
258258+setup-envtest: envtest ## Download the binaries required for ENVTEST in the local bin directory.
259259+ @echo "Setting up envtest binaries for Kubernetes version $(ENVTEST_K8S_VERSION)..."
260260+ @$(ENVTEST) use $(ENVTEST_K8S_VERSION) --bin-dir $(LOCALBIN) -p path || { \
261261+ echo "Error: Failed to set up envtest binaries for version $(ENVTEST_K8S_VERSION)."; \
262262+ exit 1; \
263263+ }
264264+265265+.PHONY: envtest
266266+envtest: $(ENVTEST) ## Download setup-envtest locally if necessary.
267267+$(ENVTEST): $(LOCALBIN)
268268+ $(call go-install-tool,$(ENVTEST),sigs.k8s.io/controller-runtime/tools/setup-envtest,$(ENVTEST_VERSION))
269269+270270+.PHONY: golangci-lint
271271+golangci-lint: $(GOLANGCI_LINT) ## Download golangci-lint locally if necessary.
272272+$(GOLANGCI_LINT): $(LOCALBIN)
273273+ $(call go-install-tool,$(GOLANGCI_LINT),github.com/golangci/golangci-lint/v2/cmd/golangci-lint,$(GOLANGCI_LINT_VERSION))
274274+275275+# go-install-tool will 'go install' any package with custom target and name of binary, if it doesn't exist
276276+# $1 - target path with name of binary
277277+# $2 - package url which can be installed
278278+# $3 - specific version of package
279279+define go-install-tool
280280+@[ -f "$(1)-$(3)" ] || { \
281281+set -e; \
282282+package=$(2)@$(3) ;\
283283+echo "Downloading $${package}" ;\
284284+rm -f $(1) || true ;\
285285+GOBIN=$(LOCALBIN) go install $${package} ;\
286286+mv $(1) $(1)-$(3) ;\
287287+} ;\
288288+ln -sf $(1)-$(3) $(1)
289289+endef
290290+291291+.PHONY: operator-sdk
292292+OPERATOR_SDK ?= $(LOCALBIN)/operator-sdk
293293+operator-sdk: ## Download operator-sdk locally if necessary.
294294+ifeq (,$(wildcard $(OPERATOR_SDK)))
295295+ifeq (, $(shell which operator-sdk 2>/dev/null))
296296+ @{ \
297297+ set -e ;\
298298+ mkdir -p $(dir $(OPERATOR_SDK)) ;\
299299+ OS=$(shell go env GOOS) && ARCH=$(shell go env GOARCH) && \
300300+ curl -sSLo $(OPERATOR_SDK) https://github.com/operator-framework/operator-sdk/releases/download/$(OPERATOR_SDK_VERSION)/operator-sdk_$${OS}_$${ARCH} ;\
301301+ chmod +x $(OPERATOR_SDK) ;\
302302+ }
303303+else
304304+OPERATOR_SDK = $(shell which operator-sdk)
305305+endif
306306+endif
307307+308308+.PHONY: bundle
309309+bundle: manifests kustomize operator-sdk ## Generate bundle manifests and metadata, then validate generated files.
310310+ $(OPERATOR_SDK) generate kustomize manifests -q
311311+ cd config/manager && $(KUSTOMIZE) edit set image controller=$(IMG)
312312+ $(KUSTOMIZE) build config/manifests | $(OPERATOR_SDK) generate bundle $(BUNDLE_GEN_FLAGS)
313313+ $(OPERATOR_SDK) bundle validate ./bundle
314314+315315+.PHONY: bundle-build
316316+bundle-build: ## Build the bundle image.
317317+ $(CONTAINER_TOOL) build -f bundle.Dockerfile -t $(BUNDLE_IMG) .
318318+319319+.PHONY: bundle-push
320320+bundle-push: ## Push the bundle image.
321321+ $(MAKE) docker-push IMG=$(BUNDLE_IMG)
322322+323323+.PHONY: opm
324324+OPM = $(LOCALBIN)/opm
325325+opm: ## Download opm locally if necessary.
326326+ifeq (,$(wildcard $(OPM)))
327327+ifeq (,$(shell which opm 2>/dev/null))
328328+ @{ \
329329+ set -e ;\
330330+ mkdir -p $(dir $(OPM)) ;\
331331+ OS=$(shell go env GOOS) && ARCH=$(shell go env GOARCH) && \
332332+ curl -sSLo $(OPM) https://github.com/operator-framework/operator-registry/releases/download/v1.55.0/$${OS}-$${ARCH}-opm ;\
333333+ chmod +x $(OPM) ;\
334334+ }
335335+else
336336+OPM = $(shell which opm)
337337+endif
338338+endif
339339+340340+# A comma-separated list of bundle images (e.g. make catalog-build BUNDLE_IMGS=example.com/operator-bundle:v0.1.0,example.com/operator-bundle:v0.2.0).
341341+# These images MUST exist in a registry and be pull-able.
342342+BUNDLE_IMGS ?= $(BUNDLE_IMG)
343343+344344+# The image tag given to the resulting catalog image (e.g. make catalog-build CATALOG_IMG=example.com/operator-catalog:v0.2.0).
345345+CATALOG_IMG ?= $(IMAGE_TAG_BASE)-catalog:v$(VERSION)
346346+347347+# Set CATALOG_BASE_IMG to an existing catalog image tag to add $BUNDLE_IMGS to that image.
348348+ifneq ($(origin CATALOG_BASE_IMG), undefined)
349349+FROM_INDEX_OPT := --from-index $(CATALOG_BASE_IMG)
350350+endif
351351+352352+# Build a catalog image by adding bundle images to an empty catalog using the operator package manager tool, 'opm'.
353353+# This recipe invokes 'opm' in 'semver' bundle add mode. For more information on add modes, see:
354354+# https://github.com/operator-framework/community-operators/blob/7f1438c/docs/packaging-operator.md#updating-your-existing-operator
355355+.PHONY: catalog-build
356356+catalog-build: opm ## Build a catalog image.
357357+ $(OPM) index add --container-tool $(CONTAINER_TOOL) --mode semver --tag $(CATALOG_IMG) --bundles $(BUNDLE_IMGS) $(FROM_INDEX_OPT)
358358+359359+# Push the catalog image.
360360+.PHONY: catalog-push
361361+catalog-push: ## Push a catalog image.
362362+ $(MAKE) docker-push IMG=$(CATALOG_IMG)
+416
PLAN.md
···11+# Loom Kubernetes Operator - Implementation Plan
22+33+## Project Overview
44+55+Loom is a Kubernetes operator for coordinating tangled.org Spindles - ephemeral CI/CD runners that execute pipelines in response to events from tangled.org knots. Inspired by GitHub's Actions Runner Controller (ARC) but adapted for tangled.org's AT Protocol-based, event-driven architecture.
66+77+## Architecture
88+99+### Core Principles
1010+- **Ephemeral Spindles**: One Kubernetes Job per pipeline event (scale-to-zero)
1111+- **Event-Driven**: WebSocket connection to tangled.org knot for pipeline events
1212+- **Code Reuse**: Leverage `tangled.org/core/spindle` for WebSocket, models, interfaces
1313+- **Simple Images**: Use standard Docker images (golang:1.24, node:20, etc.) - no Nixery for MVP
1414+- **Multi-Arch Support**: Schedule jobs on amd64/arm64 nodes based on workflow specification
1515+- **New Component**: Kubernetes-native Engine that spawns Jobs instead of Docker containers
1616+1717+### Key Components
1818+1919+1. **SpindleSet CRD**: Configures connection to tangled.org knot and job templates
2020+2. **SpindleSet Controller**: Maintains WebSocket connection, handles pipeline events
2121+3. **KubernetesEngine**: Implements tangled.org's Engine interface for Kubernetes Jobs
2222+4. **Job Builder**: Generates Job specs with multi-arch node affinity
2323+5. **Log Streamer**: Streams pod logs to knot via Kubernetes API
2424+6. **Status Reporter**: Reports workflow status back to tangled.org
2525+2626+---
2727+2828+## Phase 1: CRD Design & Basic Structure
2929+3030+### SpindleSet CRD
3131+```yaml
3232+apiVersion: loom.j5t.io/v1alpha1
3333+kind: SpindleSet
3434+metadata:
3535+ name: tangled-org-spindle
3636+spec:
3737+ # Knot configuration
3838+ knotUrl: https://tangled.org/@org/repo
3939+ knotAuthSecret: spindle-auth # Secret with auth token
4040+4141+ # Scaling configuration
4242+ maxConcurrentJobs: 10
4343+4444+ # Default template (can be overridden by workflow)
4545+ template:
4646+ resources:
4747+ requests:
4848+ cpu: 500m
4949+ memory: 1Gi
5050+ limits:
5151+ cpu: 2
5252+ memory: 4Gi
5353+5454+ # Node targeting defaults
5555+ nodeSelector: {}
5656+ tolerations: []
5757+ affinity: {}
5858+```
5959+6060+### Status Fields
6161+- `conditions`: Standard Kubernetes conditions
6262+- `pendingJobs`, `runningJobs`: Current job counts
6363+- `completedJobs`, `failedJobs`: Cumulative counters
6464+- `webSocketConnected`: WebSocket connection status
6565+- `lastEventTime`: Last received event timestamp
6666+6767+---
6868+6969+## Phase 2: Kubernetes Engine Implementation
7070+7171+### Workflow File Format
7272+```yaml
7373+# In tangled.org repository's .tangled/pipeline.yaml
7474+image: golang:1.24-bookworm
7575+architecture: amd64 # or arm64
7676+7777+steps:
7878+ - name: run tests
7979+ command: |
8080+ go test -v ./...
8181+8282+ - name: build binary
8383+ command: |
8484+ go build -o app ./cmd
8585+```
8686+8787+### Job Pod Structure
8888+- **Init container**: Clone repository from tangled.org
8989+- **Main container**:
9090+ - Image: `{workflow.image}` (e.g., `golang:1.24-bookworm`)
9191+ - Platform: `linux/{architecture}`
9292+ - Execute all steps sequentially
9393+- **Volumes**:
9494+ - `/tangled/workspace` - Shared workspace (emptyDir)
9595+ - `/tmp/step-outputs` - Step output communication
9696+ - `/tmp/github` - GITHUB_ENV-style env passing
9797+- **Node Affinity**: Based on `architecture` field
9898+9999+### Multi-Architecture Support
100100+```go
101101+func (e *KubernetesEngine) buildJobAffinity(arch string) *corev1.Affinity {
102102+ return &corev1.Affinity{
103103+ NodeAffinity: &corev1.NodeAffinity{
104104+ RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
105105+ NodeSelectorTerms: []corev1.NodeSelectorTerm{
106106+ {
107107+ MatchExpressions: []corev1.NodeSelectorRequirement{
108108+ {
109109+ Key: "kubernetes.io/arch",
110110+ Operator: corev1.NodeSelectorOpIn,
111111+ Values: []string{arch}, // amd64 or arm64
112112+ },
113113+ },
114114+ },
115115+ },
116116+ },
117117+ },
118118+ }
119119+}
120120+```
121121+122122+### Step Execution Model
123123+Generate bash script that executes all steps sequentially:
124124+- GitHub Actions-compatible environment variables (`GITHUB_ENV`, `GITHUB_OUTPUT`)
125125+- Environment passing between steps
126126+- Error handling and exit on failure
127127+- Step-level logging with timestamps
128128+129129+---
130130+131131+## Phase 3: WebSocket Integration & Event Handling
132132+133133+### WebSocket Client (Reuse from core/spindle)
134134+- Connect to `{knotUrl}/spindle/events`
135135+- Handle cursor-based backfill for missed events
136136+- Subscribe to live `sh.tangled.pipeline` events
137137+- Exponential backoff on connection failures
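The core/spindle client is the intended implementation; as a placeholder for the reconnect behavior, a sketch using `github.com/gorilla/websocket` (the URL and `handle` callback are assumptions, and cursor-based backfill would be layered on top):

```go
import (
	"context"
	"time"

	"github.com/gorilla/websocket"
)

// consumeEvents reconnects with exponential backoff and feeds each raw
// message to handle.
func consumeEvents(ctx context.Context, url string, handle func([]byte)) {
	backoff := time.Second
	for ctx.Err() == nil {
		conn, _, err := websocket.DefaultDialer.DialContext(ctx, url, nil)
		if err != nil {
			time.Sleep(backoff)
			if backoff < time.Minute {
				backoff *= 2
			}
			continue
		}
		backoff = time.Second // reset after a successful connect
		for {
			_, msg, err := conn.ReadMessage()
			if err != nil {
				conn.Close()
				break // fall through to reconnect
			}
			handle(msg)
		}
	}
}
```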
138138+139139+### Event Handler → Job Creation
140140+1. Parse pipeline event payload
141141+2. Extract workflow definition, repo, commit SHA
142142+3. Create Kubernetes Job with:
143143+ - Correct architecture node affinity
144144+ - Image from workflow spec
145145+ - Steps as bash script
146146+ - Owner reference to SpindleSet (for cleanup)
147147+4. Label Job with pipeline metadata
148148+149149+### SpindleSet Controller Reconciliation
150150+- Establish WebSocket connection to knot
151151+- Subscribe to pipeline events
152152+- Create Jobs on event received
153153+- Monitor running Jobs
154154+- Update SpindleSet status
155155+- Handle connection failures
156156+157157+---
158158+159159+## Phase 4: Status Reporting & Observability
160160+161161+### Job Status Tracking
162162+Watch Job events via Kubernetes API:
163163+- Job created → Report "running" to knot
164164+- Job succeeded → Report "success" to knot
165165+- Job failed → Report "failure" with error to knot
166166+- Job timeout → Report "timeout" to knot
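A sketch of the mapping, assuming a hypothetical `StatusReporter` interface over the knot client with the methods listed in the next section:

```go
// StatusReporter is a hypothetical wrapper around the knot client.
type StatusReporter interface {
	StatusSuccess(job string)
	StatusFailed(job, message string)
	StatusTimeout(job string)
}

// reportJobStatus translates a watched Job's terminal conditions into knot
// status updates. "DeadlineExceeded" is the reason Kubernetes sets when
// activeDeadlineSeconds is exceeded.
func reportJobStatus(job *batchv1.Job, knot StatusReporter) {
	for _, c := range job.Status.Conditions {
		if c.Status != corev1.ConditionTrue {
			continue
		}
		switch c.Type {
		case batchv1.JobComplete:
			knot.StatusSuccess(job.Name)
		case batchv1.JobFailed:
			if c.Reason == "DeadlineExceeded" {
				knot.StatusTimeout(job.Name)
			} else {
				knot.StatusFailed(job.Name, c.Message)
			}
		}
	}
}
```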
167167+168168+### Status Reporting to Knot
169169+Reuse `spindle/db` status update patterns:
170170+- `StatusRunning()` - When Job starts
171171+- `StatusSuccess()` - When Job succeeds
172172+- `StatusFailed()` - When Job fails with error message
173173+- `StatusTimeout()` - When Job exceeds timeout
174174+175175+### Prometheus Metrics
176176+```go
177177+loom_pending_spindles // Gauge: jobs pending
178178+loom_running_spindles // Gauge: jobs running
179179+loom_completed_spindles_total // Counter: total completed
180180+loom_failed_spindles_total // Counter: total failed
181181+loom_pipeline_duration_seconds // Histogram: execution duration
182182+```
183183+184184+Exposed via controller-runtime's metrics server.
185185+186186+---
187187+188188+## Phase 5: Log Streaming via Kubernetes API
189189+190190+### Implementation
191191+```go
192192+func (e *KubernetesEngine) StreamLogsToKnot(ctx context.Context, jobName string, knotClient *KnotClient) {
193193+ // 1. Get pod for job
194194+ // 2. Stream logs via K8s API
195195+ // 3. Forward each line to knot in real-time
196196+}
197197+```
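A minimal sketch of steps 2–3 with client-go (`sendLine` stands in for the knot forwarding; resolving the Job's pod is omitted):

```go
import (
	"bufio"
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
)

func streamPodLogs(ctx context.Context, cs kubernetes.Interface, ns, pod string, sendLine func(string)) error {
	req := cs.CoreV1().Pods(ns).GetLogs(pod, &corev1.PodLogOptions{Follow: true})
	stream, err := req.Stream(ctx)
	if err != nil {
		return err
	}
	defer stream.Close()

	// Forward each log line to the knot as it arrives.
	scanner := bufio.NewScanner(stream)
	for scanner.Scan() {
		sendLine(scanner.Text())
	}
	return scanner.Err()
}
```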
198198+199199+### Log Format
200200+Send to knot in tangled.org spindle format:
201201+```json
202202+{
203203+ "kind": "data", // or "control"
204204+ "content": "test output line",
205205+ "stepId": 0,
206206+ "stepKind": "user"
207207+}
208208+```
209209+210210+---
211211+212212+## Phase 6: Testing & Deployment
213213+214214+### Unit Tests
215215+- Job template generation with different architectures
216216+- Node affinity generation (amd64 vs arm64)
217217+- Step script builder
218218+- Mock WebSocket client
219219+220220+### Integration Tests
221221+```go
222222+// Test with real cluster
223223+func TestE2E_SimpleGoPipeline(t *testing.T) {
224224+ // 1. Deploy SpindleSet CR
225225+ // 2. Send test pipeline event
226226+ // 3. Verify Job created on correct arch node
227227+ // 4. Wait for completion
228228+ // 5. Check logs streamed to knot
229229+}
230230+```
231231+232232+### Manual Testing
233233+```bash
234234+# Deploy operator
235235+make deploy IMG=ghcr.io/you/loom:v0.1.0
236236+237237+# Create SpindleSet
238238+kubectl apply -f config/samples/spindleset_sample.yaml
239239+240240+# Push code to tangled.org with .tangled/pipeline.yaml
241241+242242+# Watch Jobs
243243+kubectl get jobs -l loom.j5t.io/spindleset=test-spindle -w
244244+245245+# Check pod placement
246246+kubectl get pods -o wide
247247+248248+# View logs
249249+kubectl logs -f job/runner-<hash>
250250+```
251251+252252+---
253253+254254+## File Structure
255255+256256+```
257257+loom/
258258+├── api/v1alpha1/
259259+│ ├── spindleset_types.go # SpindleSet CRD
260260+│ └── groupversion_info.go
261261+│
262262+├── internal/
263263+│ ├── controller/
264264+│ │ └── spindleset_controller.go # Main reconciliation loop
265265+│ │
266266+│ └── engine/
267267+│ └── kubernetes_engine.go # K8s-native Engine implementation
268268+│
269269+├── pkg/
270270+│ ├── ingester/
271271+│ │ └── websocket.go # WebSocket client (adapted from core)
272272+│ │
273273+│ ├── jobbuilder/
274274+│ │ ├── job_template.go # Generate Job specs
275275+│ │ ├── affinity.go # Multi-arch node affinity
276276+│ │ └── script_builder.go # Step execution script
277277+│ │
278278+│ └── knot/
279279+│ └── client.go # Knot API client for status/logs
280280+│
281281+├── config/
282282+│ ├── crd/ # Generated CRD manifests
283283+│ ├── rbac/ # RBAC for Job CRUD
284284+│ └── samples/
285285+│ └── spindleset_sample.yaml
286286+│
287287+└── cmd/main.go # Operator entrypoint
288288+```
289289+290290+---
291291+292292+## Dependencies
293293+294294+### From tangled.org/core
295295+```go
296296+import (
297297+ "tangled.org/core/spindle/models" // Engine interface
298298+ "tangled.org/core/spindle/config" // Config models
299299+ "tangled.org/core/api/tangled" // Pipeline types
300300+ // Adapt WebSocket logic from spindle/stream.go, ingester.go
301301+)
302302+```
303303+304304+### Kubernetes
305305+```go
306306+import (
307307+ batchv1 "k8s.io/api/batch/v1"
308308+ corev1 "k8s.io/api/core/v1"
309309+ "sigs.k8s.io/controller-runtime/pkg/client"
310310+)
311311+```
312312+313313+### Metrics
314314+```go
315315+import (
316316+ "github.com/prometheus/client_golang/prometheus"
317317+ "sigs.k8s.io/controller-runtime/pkg/metrics"
318318+)
319319+```
320320+321321+---
322322+323323+## Implementation Order
324324+325325+1. ✅ Create SpindleSet CRD (API types, generate manifests)
326326+2. ⏳ Implement Job builder (template generation, multi-arch affinity)
327327+3. ⏳ Implement KubernetesEngine (Engine interface for K8s Jobs)
328328+4. ⏳ Import WebSocket client (adapt from core/spindle)
329329+5. ⏳ Implement SpindleSet controller (reconciliation + event handling)
330330+6. ⏳ Add Job status monitoring (watch Jobs, report to knot)
331331+7. ⏳ Add log streaming (K8s API → knot)
332332+8. ⏳ Add Prometheus metrics (instrument controller)
333333+9. ⏳ Testing (unit + integration tests)
334334+10. ⏳ Documentation (usage guide, architecture diagrams)
335335+336336+---
337337+338338+## MVP Scope
339339+340340+### Include ✅
341341+- SpindleSet CRD with knot configuration
342342+- WebSocket connection to knot
343343+- Kubernetes Job creation per pipeline event
344344+- Multi-architecture support (amd64/arm64 node targeting)
345345+- Standard Docker images (golang:1.24, node:20, etc.)
346346+- Sequential step execution in single pod
347347+- Log streaming from K8s pods to knot via K8s API
348348+- Status reporting to knot (success/failure/timeout)
349349+- Prometheus metrics
350350+351351+### Exclude (Future Enhancements) ❌
352352+- Nixery integration (add later)
353353+- Kaniko/Buildah for container builds
354354+- Persistent Nix store caching
355355+- Multi-knot support
356356+- Advanced auto-scaling policies
357357+- Service containers (DB sidecars)
358358+- Matrix builds
359359+360360+---
361361+362362+## Key Design Decisions
363363+364364+1. **Ephemeral Jobs**: Scale-to-zero, one Job per pipeline event
365365+2. **Simple Images**: Use any Docker Hub image, no Nixery complexity for MVP
366366+3. **Multi-Arch Native**: Use Kubernetes node affinity for amd64/arm64 targeting
367367+4. **All steps in one pod**: GitHub Actions model (shared filesystem/env)
368368+5. **K8s API for logs**: Stream pod logs to knot, no disk-based logging needed
369369+6. **Reuse spindle models**: Maintain compatibility, adapt only execution layer
370370+7. **Prometheus metrics**: Standard observability from day one
371371+372372+---
373373+374374+## Future Enhancements
375375+376376+### Phase 7: Nixery Integration
377377+- Detect `dependencies.nixpkgs` in workflow spec
378378+- Generate Nixery image URL dynamically
379379+- Support both standard images and Nixery
380380+- Implement Nix store caching (PVC)
381381+382382+### Phase 8: Advanced Features
383383+- Kaniko/Buildah integration for container builds
384384+- Service containers (like GitHub Actions services)
385385+- Matrix builds (multiple arch/version combinations)
386386+- Caching strategies (build cache, dependencies)
387387+- Advanced auto-scaling (predictive scaling)
388388+389389+### Phase 9: Multi-Tenancy
390390+- Multiple SpindleSets per cluster
391391+- Resource quotas per SpindleSet
392392+- Network policies for isolation
393393+- Multi-knot support (one operator, many knots)
394394+395395+---
396396+397397+## Success Criteria
398398+399399+**MVP is complete when:**
400400+1. SpindleSet CRD can be deployed to cluster
401401+2. WebSocket connection to tangled.org knot established
402402+3. Pipeline events trigger Job creation
403403+4. Jobs execute on correct architecture nodes
404404+5. Logs stream back to knot in real-time
405405+6. Status updates sent to knot (success/failure)
406406+7. Prometheus metrics exposed
407407+8. Basic integration test passes
408408+409409+**Production-ready when:**
410410+1. Full test coverage (unit + integration)
411411+2. Error handling and retry logic robust
412412+3. Documentation complete
413413+4. Helm chart available
414414+5. Multi-arch container images published
415415+6. Performance benchmarked
416416+7. Security review completed
+23
PROJECT
···11+# Code generated by tool. DO NOT EDIT.
22+# This file is used to track the info used to scaffold your project
33+# and allow the plugins properly work.
44+# More info: https://book.kubebuilder.io/reference/project-config.html
55+domain: j5t.io
66+layout:
77+- go.kubebuilder.io/v4
88+plugins:
99+ manifests.sdk.operatorframework.io/v2: {}
1010+ scorecard.sdk.operatorframework.io/v2: {}
1111+projectName: loom
1212+repo: tangled.org/evan.jarrett.net/loom
1313+resources:
1414+- api:
1515+ crdVersion: v1
1616+ namespaced: true
1717+ controller: true
1818+ domain: j5t.io
1919+ group: loom
2020+ kind: SpindleSet
2121+ path: tangled.org/evan.jarrett.net/loom/api/v1alpha1
2222+ version: v1alpha1
2323+version: "3"
+135
README.md
···11+# loom
22+// TODO(user): Add simple overview of use/purpose
33+44+## Description
55+// TODO(user): An in-depth paragraph about your project and overview of use
66+77+## Getting Started
88+99+### Prerequisites
1010+- go version v1.24.0+
1111+- docker version 17.03+.
1212+- kubectl version v1.11.3+.
1313+- Access to a Kubernetes v1.11.3+ cluster.
1414+1515+### To Deploy on the cluster
1616+**Build and push your image to the location specified by `IMG`:**
1717+1818+```sh
1919+make docker-build docker-push IMG=<some-registry>/loom:tag
2020+```
2121+2222+**NOTE:** This image must be published to the personal registry you specified,
2323+and the working environment must have access to pull it.
2424+Make sure you have the proper permissions for the registry if the above commands don’t work.
2525+2626+**Install the CRDs into the cluster:**
2727+2828+```sh
2929+make install
3030+```
3131+3232+**Deploy the Manager to the cluster with the image specified by `IMG`:**
3333+3434+```sh
3535+make deploy IMG=<some-registry>/loom:tag
3636+```
3737+3838+> **NOTE**: If you encounter RBAC errors, you may need to grant yourself cluster-admin
3939+privileges or be logged in as admin.
4040+4141+**Create instances of your solution**
4242+You can apply the samples (examples) from the config/samples directory:
4343+4444+```sh
4545+kubectl apply -k config/samples/
4646+```
4848+>**NOTE**: Ensure that the samples have default values to test it out.
4949+5050+### To Uninstall
5151+**Delete the instances (CRs) from the cluster:**
5252+5353+```sh
5454+kubectl delete -k config/samples/
5555+```
5656+5757+**Delete the APIs(CRDs) from the cluster:**
5858+5959+```sh
6060+make uninstall
6161+```
6262+6363+**UnDeploy the controller from the cluster:**
6464+6565+```sh
6666+make undeploy
6767+```
6868+6969+## Project Distribution
7070+7171+The following are options to release and provide this solution to users.
7272+7373+### By providing a bundle with all YAML files
7474+7575+1. Build the installer for the image built and published in the registry:
7676+7777+```sh
7878+make build-installer IMG=<some-registry>/loom:tag
7979+```
8080+8181+**NOTE:** The makefile target mentioned above generates an 'install.yaml'
8282+file in the dist directory. This file contains all the resources built
8383+with Kustomize, which are necessary to install this project without its
8484+dependencies.
8585+8686+2. Using the installer
8787+8888+Users can just run 'kubectl apply -f <URL for YAML BUNDLE>' to install
8989+the project, e.g.:
9090+9191+```sh
9292+kubectl apply -f https://raw.githubusercontent.com/<org>/loom/<tag or branch>/dist/install.yaml
9393+```
9494+9595+### By providing a Helm Chart
9696+9797+1. Build the chart using the optional helm plugin
9898+9999+```sh
100100+operator-sdk edit --plugins=helm/v1-alpha
101101+```
102102+103103+2. A chart is generated under 'dist/chart', and users
104104+can obtain this solution from there.
105105+106106+**NOTE:** If you change the project, re-run the command above to sync
107107+the latest changes into the Helm Chart. Furthermore, if you create webhooks,
108108+you need to run the command with the '--force' flag and ensure that any
109109+custom configuration previously added to 'dist/chart/values.yaml' or
110110+'dist/chart/manager/manager.yaml' is manually re-applied afterwards.
112112+113113+## Contributing
114114+// TODO(user): Add detailed information on how you would like others to contribute to this project
115115+116116+**NOTE:** Run `make help` for more information on all potential `make` targets
117117+118118+More information can be found via the [Kubebuilder Documentation](https://book.kubebuilder.io/introduction.html)
119119+120120+## License
121121+122122+Copyright 2025 Evan Jarrett.
123123+124124+Licensed under the Apache License, Version 2.0 (the "License");
125125+you may not use this file except in compliance with the License.
126126+You may obtain a copy of the License at
127127+128128+ http://www.apache.org/licenses/LICENSE-2.0
129129+130130+Unless required by applicable law or agreed to in writing, software
131131+distributed under the License is distributed on an "AS IS" BASIS,
132132+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
133133+See the License for the specific language governing permissions and
134134+limitations under the License.
135135+
+431
TANGLED.md
···11+# Upstream Improvements for tangled.org/core
22+33+This document tracks issues and proposed improvements for the upstream `tangled.org/core` repository that would benefit Loom and other spindle implementations.
44+55+## Critical Issues
66+77+### 1. Spindle Server Constructor Not Exported
88+99+**Current State:**
1010+- `spindle.Run()` creates an unexported `Spindle` struct internally
1111+- Hardcodes `nixery.New()` engine initialization
1212+- Blocks on `http.ListenAndServe()` at the end
1313+- No way to inject custom engines or access spindle components
1414+1515+**Proposed Solution:**
1616+```go
1717+// spindle/server.go
1818+1919+// Spindle should be exported
2020+type Spindle struct {
2121+ jc *jetstream.JetstreamClient
2222+ db *db.DB
2323+ e *rbac.Enforcer
2424+ l *slog.Logger
2525+ n *notifier.Notifier
2626+ engs map[string]models.Engine
2727+ jq *queue.Queue
2828+ cfg *config.Config
2929+ ks *eventconsumer.Consumer
3030+ res *idresolver.Resolver
3131+ vault secrets.Manager
3232+}
3333+3434+// New creates a spindle server with custom engines
3535+func New(ctx context.Context, cfg *config.Config, engines map[string]models.Engine) (*Spindle, error) {
3636+ // All the initialization logic currently in Run()
3737+ // But accept engines as parameter instead of hardcoding nixery
3838+3939+ return &Spindle{...}, nil
4040+}
4141+4242+// Start starts the HTTP server (blocking)
4343+func (s *Spindle) Start(ctx context.Context) error {
4444+ s.l.Info("starting spindle server", "address", s.cfg.Server.ListenAddr)
4545+ return http.ListenAndServe(s.cfg.Server.ListenAddr, s.Router())
4646+}
4747+4848+// Run is a convenience function for standard spindle operation
4949+func Run(ctx context.Context) error {
5050+ cfg, err := config.Load(ctx)
5151+ if err != nil {
5252+ return err
5353+ }
5454+5555+ // Create default engines (nixery)
5656+ nixeryEng, err := nixery.New(ctx, cfg)
5757+ if err != nil {
5858+ return err
5959+ }
6060+6161+ s, err := New(ctx, cfg, map[string]models.Engine{
6262+ "nixery": nixeryEng,
6363+ })
6464+ if err != nil {
6565+ return err
6666+ }
6767+6868+ return s.Start(ctx)
6969+}
7070+```
7171+7272+**Benefits:**
7373+- Custom engine implementations (KubernetesEngine, DockerEngine, etc.)
7474+- Embedding spindle in other applications
7575+- Testing with mock engines
7676+- Multiple engines in single spindle instance
7777+7878+**Use Case in Loom:**
7979+```go
8080+// Loom can now use spindle properly
8181+s, err := spindle.New(ctx, cfg, map[string]models.Engine{
8282+ "kubernetes": kubeEngine,
8383+})
8484+8585+// Start in goroutine
8686+go s.Start(ctx)
8787+8888+// Access components for controller
8989+controller := &SpindleSetReconciler{
9090+ DB: s.DB(),
9191+ Queue: s.Queue(),
9292+ Engines: s.Engines(),
9393+}
9494+```
9595+9696+---
9797+9898+## Helpful Interface Additions
9999+100100+### 2. Engine Configuration Interface
101101+102102+**Issue:**
103103+Engines have different configuration needs but no standard way to express requirements.
104104+105105+**Proposed:**
106106+```go
107107+// models/engine.go
108108+109109+type EngineConfigurable interface {
110110+ // ConfigSchema returns JSON schema for engine-specific config
111111+ ConfigSchema() map[string]interface{}
112112+113113+ // ApplyConfig updates engine with workflow-specific config
114114+ ApplyConfig(config map[string]interface{}) error
115115+}
116116+```
117117+118118+**Benefit:**
119119+- Workflows can specify engine requirements
120120+- Dynamic engine selection based on config
121121+- Validation of engine compatibility
122122+123123+---
124124+125125+### 3. Repo Storage Backend Interface
126126+127127+**Issue:**
128128+Database operations for repos are tightly coupled to SQLite implementation.
129129+130130+**Proposed:**
131131+```go
132132+// spindle/storage/storage.go
133133+134134+type RepoStorage interface {
135135+ AddRepo(knot, owner, name string) error
136136+ GetRepo(knot, owner, name string) (*Repo, error)
137137+ ListRepos() ([]*Repo, error)
138138+ RemoveRepo(knot, owner, name string) error
139139+ Knots() ([]string, error)
140140+}
141141+142142+// Allow injection of custom storage
143143+func New(ctx context.Context, cfg *config.Config, storage RepoStorage, ...) (*Spindle, error)
144144+```
145145+146146+**Benefit:**
147147+- Different storage backends (PostgreSQL, etcd, etc.)
148148+- Better testing with mock storage
149149+- Cloud-native deployments
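As a sketch of the testing benefit, an in-memory implementation fits in a few
lines (the remaining methods are elided):

```go
// Sketch: an in-memory RepoStorage for tests; not a proposed upstream type.
type memRepoStorage struct {
	repos map[string]*Repo // keyed by "knot/owner/name"
}

func (m *memRepoStorage) AddRepo(knot, owner, name string) error {
	if m.repos == nil {
		m.repos = make(map[string]*Repo)
	}
	m.repos[knot+"/"+owner+"/"+name] = &Repo{}
	return nil
}

// GetRepo, ListRepos, RemoveRepo, and Knots follow the same map operations.
```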
150150+151151+---
152152+153153+### 4. Event Consumer Plugin Interface
154154+155155+**Issue:**
156156+EventConsumer is hardcoded to consume from knots. Other event sources require forking.
157157+158158+**Proposed:**
159159+```go
160160+// eventconsumer/source.go
161161+162162+type SourceProvider interface {
163163+ // Discover returns available event sources
164164+ Discover(ctx context.Context) ([]Source, error)
165165+166166+ // Watch notifies when new sources appear
167167+ Watch(ctx context.Context) (<-chan Source, error)
168168+}
169169+170170+// Allow registration of source providers
171171+func (c *Consumer) RegisterProvider(name string, provider SourceProvider) error
172172+```
173173+174174+**Benefit:**
175175+- GitHub webhook sources
176176+- GitLab sources
177177+- Custom CI/CD integrations
178178+- Kubernetes events as sources
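A trivial provider shows how small the contract is; a sketch, useful mostly
for tests:

```go
// Sketch: a static SourceProvider with a fixed set of sources.
type staticProvider struct {
	sources []Source
}

func (p *staticProvider) Discover(ctx context.Context) ([]Source, error) {
	return p.sources, nil
}

func (p *staticProvider) Watch(ctx context.Context) (<-chan Source, error) {
	ch := make(chan Source)
	close(ch) // the set is static, so no new sources will ever appear
	return ch, nil
}
```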
179179+180180+---
181181+182182+### 5. Status Reporter Interface
183183+184184+**Issue:**
185185+Status is written to DB + notifier, but no abstraction for other status destinations.
186186+187187+**Proposed:**
188188+```go
189189+// spindle/status/status.go
190190+191191+type StatusReporter interface {
192192+ ReportPending(ctx context.Context, wid models.WorkflowId) error
193193+ ReportRunning(ctx context.Context, wid models.WorkflowId) error
194194+ ReportSuccess(ctx context.Context, wid models.WorkflowId) error
195195+ ReportFailed(ctx context.Context, wid models.WorkflowId, err string, exitCode int64) error
196196+ ReportTimeout(ctx context.Context, wid models.WorkflowId) error
197197+}
198198+199199+// Allow multiple reporters
200200+func (s *Spindle) AddStatusReporter(reporter StatusReporter)
201201+202202+// Default implementation uses DB + Notifier
203203+type DBStatusReporter struct { ... }
204204+```
205205+206206+**Benefit:**
207207+- Send status to external systems (Prometheus, Datadog, etc.)
208208+- Custom status aggregation
209209+- Multi-destination status broadcasting
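A minimal second reporter could simply log transitions; a sketch:

```go
// Sketch: a log-only StatusReporter, added alongside the default DB reporter.
type logStatusReporter struct {
	l *slog.Logger
}

func (r *logStatusReporter) ReportPending(ctx context.Context, wid models.WorkflowId) error {
	r.l.Info("workflow pending", "workflow", wid)
	return nil
}

// ReportRunning, ReportSuccess, ReportFailed, and ReportTimeout follow the
// same pattern, logging at an appropriate level.
```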
210210+211211+---
212212+213213+### 6. Secrets Backend Plugin Interface
214214+215215+**Issue:**
216216+Only SQLite and OpenBao supported, no plugin system.
217217+218218+**Proposed:**
219219+```go
220220+// spindle/secrets/plugin.go
221221+222222+type SecretProvider interface {
223223+ Name() string
224224+ // Existing Manager interface methods...
225225+}
226226+227227+// Registry for secret providers
228228+var providers = make(map[string]func(*config.Config) (Manager, error))
229229+230230+func RegisterProvider(name string, factory func(*config.Config) (Manager, error)) {
231231+ providers[name] = factory
232232+}
233233+234234+// In config loading
235235+func CreateSecretsManager(cfg *config.Config) (Manager, error) {
236236+ factory, ok := providers[cfg.Server.Secrets.Provider]
237237+ if !ok {
238238+ return nil, fmt.Errorf("unknown provider: %s", cfg.Server.Secrets.Provider)
239239+ }
240240+ return factory(cfg)
241241+}
242242+```
243243+244244+**Benefit:**
245245+- AWS Secrets Manager
246246+- HashiCorp Vault
247247+- Kubernetes Secrets
248248+- Custom secret backends
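With the registry in place, a backend could self-register from its own
package; a sketch, where `newKubernetesSecretsManager` is a hypothetical
constructor:

```go
// Sketch: self-registration of a hypothetical Kubernetes-backed provider.
func init() {
	RegisterProvider("kubernetes", func(cfg *config.Config) (Manager, error) {
		return newKubernetesSecretsManager(cfg) // hypothetical constructor
	})
}
```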
249249+250250+---
251251+252252+### 7. Workflow Logger Interface
253253+254254+**Issue:**
255255+WorkflowLogger writes to disk at hardcoded `/var/log/spindle/`. No abstraction.
256256+257257+**Proposed:**
258258+```go
259259+// models/logger.go
260260+261261+type LogDestination interface {
262262+ Write(workflowId WorkflowId, stepIdx int, line string) error
263263+ Read(workflowId WorkflowId, stepIdx int) ([]string, error)
264264+ Stream(workflowId WorkflowId) (<-chan LogLine, error)
265265+}
266266+267267+type WorkflowLogger struct {
268268+ destinations []LogDestination
269269+}
270270+271271+// Built-in implementations
272272+type FileLogDestination struct { ... }
273273+type S3LogDestination struct { ... }
274274+type DatabaseLogDestination struct { ... }
275275+```
276276+277277+**Benefit:**
278278+- Cloud storage for logs (S3, GCS, Azure Blob)
279279+- Database log storage
280280+- Log streaming services (Loki, ElasticSearch)
281281+- Multiple destinations
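The multi-destination behaviour would be a straightforward fan-out; a sketch:

```go
// Sketch: fan writes out to every configured destination; first error wins.
func (w *WorkflowLogger) Write(workflowId WorkflowId, stepIdx int, line string) error {
	for _, d := range w.destinations {
		if err := d.Write(workflowId, stepIdx, line); err != nil {
			return err
		}
	}
	return nil
}
```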
282282+283283+---
284284+285285+### 8. Pipeline Validation Interface
286286+287287+**Issue:**
288288+No pre-execution validation of pipeline definitions.
289289+290290+**Proposed:**
291291+```go
292292+// models/validator.go
293293+294294+type PipelineValidator interface {
295295+ Validate(pipeline *tangled.Pipeline) error
296296+}
297297+298298+type EngineValidator interface {
299299+ ValidateWorkflow(workflow *Workflow) error
300300+}
301301+302302+// Register validators
303303+func (s *Spindle) AddValidator(v PipelineValidator)
304304+```
305305+306306+**Benefit:**
307307+- Catch errors before execution
308308+- Custom validation rules
309309+- Policy enforcement (no privileged containers, resource limits, etc.)
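For example, a policy validator might reject privileged containers; the
pipeline fields referenced here are hypothetical:

```go
// Sketch: a policy validator; Workflows and Privileged are assumed field names.
type noPrivilegedValidator struct{}

func (noPrivilegedValidator) Validate(p *tangled.Pipeline) error {
	for _, wf := range p.Workflows { // hypothetical field
		if wf.Privileged { // hypothetical field
			return fmt.Errorf("privileged workflows are not allowed")
		}
	}
	return nil
}
```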
310310+311311+---
312312+313313+## Configuration Improvements
314314+315315+### 9. Config File Support
316316+317317+**Current:** Only environment variables
318318+**Proposed:** Support YAML/JSON config files in addition to env vars
319319+320320+```go
321321+// Load from file or env
322322+func Load(ctx context.Context) (*Config, error) {
323323+ if path := os.Getenv("SPINDLE_CONFIG_FILE"); path != "" {
324324+ return LoadFromFile(path)
325325+ }
326326+ return LoadFromEnv(ctx)
327327+}
328328+```
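`LoadFromFile` is left unspecified above; a minimal sketch, assuming YAML via
gopkg.in/yaml.v3:

```go
// Sketch: load config from a YAML file.
func LoadFromFile(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg Config
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("parsing %s: %w", path, err)
	}
	return &cfg, nil
}
```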
329329+330330+---
331331+332332+### 10. Engine-Specific Config Sections
333333+334334+**Current:** Only NixeryPipelines config
335335+**Proposed:** Generic engine config map
336336+337337+```go
338338+type Config struct {
339339+ Server Server
340340+ EngineConfig map[string]map[string]interface{} // engine name -> config
341341+}
342342+```
343343+344344+Example:
345345+```yaml
346346+spindle:
347347+ server:
348348+ hostname: spindle.example.com
349349+ engines:
350350+ nixery:
351351+ nixery: nixery.tangled.sh
352352+ workflow_timeout: 5m
353353+ kubernetes:
354354+ namespace: default
355355+ default_service_account: spindle-runner
356356+```
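Combined with the `EngineConfigurable` interface from item 2, spindle startup
could hand each engine its own section; a sketch:

```go
// Sketch: distribute per-engine config sections during startup.
for name, eng := range engines {
	if c, ok := eng.(models.EngineConfigurable); ok {
		if err := c.ApplyConfig(cfg.EngineConfig[name]); err != nil {
			return fmt.Errorf("engine %q config: %w", name, err)
		}
	}
}
```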
357357+358358+---
359359+360360+## Testing Utilities
361361+362362+### 11. Mock Engine Implementation
363363+364364+**Proposed:** Include a mock engine for testing
365365+366366+```go
367367+// testing/mockengine.go
368368+369369+type MockEngine struct {
370370+ InitFunc func(...) (*models.Workflow, error)
371371+ SetupFunc func(...) error
372372+ RunStepFunc func(...) error
373373+ DestroyFunc func(...) error
374374+}
375375+376376+func (m *MockEngine) InitWorkflow(...) (*models.Workflow, error) {
377377+ if m.InitFunc != nil {
378378+ return m.InitFunc(...)
379379+ }
380380+ return &models.Workflow{}, nil
381381+}
382382+// ... etc
383383+```
384384+385385+---
386386+387387+## Documentation Needs
388388+389389+### 12. Engine Implementation Guide
390390+391391+Documentation for implementing custom engines:
392392+- Required interface methods
393393+- Lifecycle expectations
394394+- Error handling patterns
395395+- Testing strategies
396396+- Example implementations
397397+398398+### 13. Deployment Patterns
399399+400400+Documentation for different deployment scenarios:
401401+- Single server
402402+- High availability
403403+- Kubernetes operator (Loom)
404404+- Docker Compose
405405+- Systemd service
406406+407407+---
408408+409409+## Priority for Loom
410410+411411+**High Priority:**
412412+1. ✅ **Spindle Server Constructor** - Blocks Loom implementation completely
413413+2. Engine Configuration Interface - Needed for proper Kubernetes config
414414+415415+**Medium Priority:**
416416+3. Secrets Backend Plugin - For Kubernetes secrets integration
417417+4. Workflow Logger Interface - For pod log streaming
418418+5. Status Reporter Interface - For custom status handling
419419+420420+**Low Priority:**
421421+6. Other interfaces - Nice to have, workarounds exist
422422+423423+---
424424+425425+## Contributing Back
426426+427427+Once Loom is working, we should:
428428+1. Extract generic patterns from KubernetesEngine
429429+2. Propose these interfaces upstream
430430+3. Submit PRs for critical issues (#1)
431431+4. Share deployment patterns and documentation
+36
api/v1alpha1/groupversion_info.go
···11+/*
22+Copyright 2025 Evan Jarrett.
33+44+Licensed under the Apache License, Version 2.0 (the "License");
55+you may not use this file except in compliance with the License.
66+You may obtain a copy of the License at
77+88+ http://www.apache.org/licenses/LICENSE-2.0
99+1010+Unless required by applicable law or agreed to in writing, software
1111+distributed under the License is distributed on an "AS IS" BASIS,
1212+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1313+See the License for the specific language governing permissions and
1414+limitations under the License.
1515+*/
1616+1717+// Package v1alpha1 contains API Schema definitions for the loom v1alpha1 API group.
1818+// +kubebuilder:object:generate=true
1919+// +groupName=loom.j5t.io
2020+package v1alpha1
2121+2222+import (
2323+ "k8s.io/apimachinery/pkg/runtime/schema"
2424+ "sigs.k8s.io/controller-runtime/pkg/scheme"
2525+)
2626+2727+var (
2828+ // GroupVersion is group version used to register these objects.
2929+ GroupVersion = schema.GroupVersion{Group: "loom.j5t.io", Version: "v1alpha1"}
3030+3131+ // SchemeBuilder is used to add go types to the GroupVersionKind scheme.
3232+ SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}
3333+3434+ // AddToScheme adds the types in this group-version to the given scheme.
3535+ AddToScheme = SchemeBuilder.AddToScheme
3636+)
+136
api/v1alpha1/spindleset_types.go
···11+/*
22+Copyright 2025 Evan Jarrett.
33+44+Licensed under the Apache License, Version 2.0 (the "License");
55+you may not use this file except in compliance with the License.
66+You may obtain a copy of the License at
77+88+ http://www.apache.org/licenses/LICENSE-2.0
99+1010+Unless required by applicable law or agreed to in writing, software
1111+distributed under the License is distributed on an "AS IS" BASIS,
1212+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1313+See the License for the specific language governing permissions and
1414+limitations under the License.
1515+*/
1616+1717+package v1alpha1
1818+1919+import (
2020+ corev1 "k8s.io/api/core/v1"
2121+ metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
2222+)
2323+2424+// EDIT THIS FILE! THIS IS SCAFFOLDING FOR YOU TO OWN!
2525+// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.
2626+2727+// SpindleSetSpec defines the desired state of SpindleSet.
2828+type SpindleSetSpec struct {
2929+ // KnotUrl is the URL of the tangled.org knot to connect to (e.g., https://tangled.org/@org/repo).
3030+ // +kubebuilder:validation:Required
3131+ // +kubebuilder:validation:Pattern=`^https?://.*`
3232+ KnotUrl string `json:"knotUrl"`
3333+3434+ // KnotAuthSecret is the name of the Secret containing authentication credentials for the knot.
3535+ // The secret should contain a key "token" with the auth token.
3636+ // +kubebuilder:validation:Required
3737+ KnotAuthSecret string `json:"knotAuthSecret"`
3838+3939+ // MaxConcurrentJobs is the maximum number of concurrent spindle jobs that can run.
4040+ // Defaults to 10 if not specified.
4141+ // +kubebuilder:default=10
4242+ // +kubebuilder:validation:Minimum=1
4343+ // +kubebuilder:validation:Maximum=100
4444+ // +optional
4545+ MaxConcurrentJobs int32 `json:"maxConcurrentJobs,omitempty"`
4646+4747+ // Template is the default pod template configuration for spindle jobs.
4848+ // Individual workflows can override these settings.
4949+ // +optional
5050+ Template SpindleTemplate `json:"template,omitempty"`
5151+}
5252+5353+// SpindleTemplate defines the pod template configuration for spindle jobs.
5454+type SpindleTemplate struct {
5555+ // Resources defines the compute resource requirements for spindle jobs.
5656+ // +optional
5757+ Resources corev1.ResourceRequirements `json:"resources,omitempty"`
5858+5959+ // NodeSelector is a selector which must be true for the pod to fit on a node.
6060+ // +optional
6161+ NodeSelector map[string]string `json:"nodeSelector,omitempty"`
6262+6363+ // Tolerations allows pods to schedule onto nodes with matching taints.
6464+ // +optional
6565+ Tolerations []corev1.Toleration `json:"tolerations,omitempty"`
6666+6767+ // Affinity defines scheduling constraints for spindle job pods.
6868+ // +optional
6969+ Affinity *corev1.Affinity `json:"affinity,omitempty"`
7070+}
7171+7272+// SpindleSetStatus defines the observed state of SpindleSet.
7373+type SpindleSetStatus struct {
7474+ // Conditions represent the latest available observations of the SpindleSet's state.
7575+ // +optional
7676+ // +patchMergeKey=type
7777+ // +patchStrategy=merge
7878+ // +listType=map
7979+ // +listMapKey=type
8080+ Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
8181+8282+ // PendingJobs is the number of spindle jobs currently pending.
8383+ // +optional
8484+ PendingJobs int32 `json:"pendingJobs,omitempty"`
8585+8686+ // RunningJobs is the number of spindle jobs currently running.
8787+ // +optional
8888+ RunningJobs int32 `json:"runningJobs,omitempty"`
8989+9090+ // CompletedJobs is the total number of spindle jobs that have completed successfully.
9191+ // +optional
9292+ CompletedJobs int32 `json:"completedJobs,omitempty"`
9393+9494+ // FailedJobs is the total number of spindle jobs that have failed.
9595+ // +optional
9696+ FailedJobs int32 `json:"failedJobs,omitempty"`
9797+9898+ // WebSocketConnected indicates whether the WebSocket connection to the knot is active.
9999+ // +optional
100100+ WebSocketConnected bool `json:"webSocketConnected,omitempty"`
101101+102102+ // LastEventTime is the timestamp of the last pipeline event received from the knot.
103103+ // +optional
104104+ LastEventTime *metav1.Time `json:"lastEventTime,omitempty"`
105105+}
106106+107107+// +kubebuilder:object:root=true
108108+// +kubebuilder:subresource:status
109109+// +kubebuilder:printcolumn:name="Knot URL",type=string,JSONPath=`.spec.knotUrl`
110110+// +kubebuilder:printcolumn:name="Connected",type=boolean,JSONPath=`.status.webSocketConnected`
111111+// +kubebuilder:printcolumn:name="Running",type=integer,JSONPath=`.status.runningJobs`
112112+// +kubebuilder:printcolumn:name="Completed",type=integer,JSONPath=`.status.completedJobs`
113113+// +kubebuilder:printcolumn:name="Failed",type=integer,JSONPath=`.status.failedJobs`
114114+// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
115115+116116+// SpindleSet is the Schema for the spindlesets API.
117117+type SpindleSet struct {
118118+ metav1.TypeMeta `json:",inline"`
119119+ metav1.ObjectMeta `json:"metadata,omitempty"`
120120+121121+ Spec SpindleSetSpec `json:"spec,omitempty"`
122122+ Status SpindleSetStatus `json:"status,omitempty"`
123123+}
124124+125125+// +kubebuilder:object:root=true
126126+127127+// SpindleSetList contains a list of SpindleSet.
128128+type SpindleSetList struct {
129129+ metav1.TypeMeta `json:",inline"`
130130+ metav1.ListMeta `json:"metadata,omitempty"`
131131+ Items []SpindleSet `json:"items"`
132132+}
133133+134134+func init() {
135135+ SchemeBuilder.Register(&SpindleSet{}, &SpindleSetList{})
136136+}
+163
api/v1alpha1/zz_generated.deepcopy.go
···11+//go:build !ignore_autogenerated
22+33+/*
44+Copyright 2025 Evan Jarrett.
55+66+Licensed under the Apache License, Version 2.0 (the "License");
77+you may not use this file except in compliance with the License.
88+You may obtain a copy of the License at
99+1010+ http://www.apache.org/licenses/LICENSE-2.0
1111+1212+Unless required by applicable law or agreed to in writing, software
1313+distributed under the License is distributed on an "AS IS" BASIS,
1414+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1515+See the License for the specific language governing permissions and
1616+limitations under the License.
1717+*/
1818+1919+// Code generated by controller-gen. DO NOT EDIT.
2020+2121+package v1alpha1
2222+2323+import (
2424+ "k8s.io/api/core/v1"
2525+ metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
2626+ runtime "k8s.io/apimachinery/pkg/runtime"
2727+)
2828+2929+// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
3030+func (in *SpindleSet) DeepCopyInto(out *SpindleSet) {
3131+ *out = *in
3232+ out.TypeMeta = in.TypeMeta
3333+ in.ObjectMeta.DeepCopyInto(&out.ObjectMeta)
3434+ in.Spec.DeepCopyInto(&out.Spec)
3535+ in.Status.DeepCopyInto(&out.Status)
3636+}
3737+3838+// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SpindleSet.
3939+func (in *SpindleSet) DeepCopy() *SpindleSet {
4040+ if in == nil {
4141+ return nil
4242+ }
4343+ out := new(SpindleSet)
4444+ in.DeepCopyInto(out)
4545+ return out
4646+}
4747+4848+// DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.
4949+func (in *SpindleSet) DeepCopyObject() runtime.Object {
5050+ if c := in.DeepCopy(); c != nil {
5151+ return c
5252+ }
5353+ return nil
5454+}
5555+5656+// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
5757+func (in *SpindleSetList) DeepCopyInto(out *SpindleSetList) {
5858+ *out = *in
5959+ out.TypeMeta = in.TypeMeta
6060+ in.ListMeta.DeepCopyInto(&out.ListMeta)
6161+ if in.Items != nil {
6262+ in, out := &in.Items, &out.Items
6363+ *out = make([]SpindleSet, len(*in))
6464+ for i := range *in {
6565+ (*in)[i].DeepCopyInto(&(*out)[i])
6666+ }
6767+ }
6868+}
6969+7070+// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SpindleSetList.
7171+func (in *SpindleSetList) DeepCopy() *SpindleSetList {
7272+ if in == nil {
7373+ return nil
7474+ }
7575+ out := new(SpindleSetList)
7676+ in.DeepCopyInto(out)
7777+ return out
7878+}
7979+8080+// DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.
8181+func (in *SpindleSetList) DeepCopyObject() runtime.Object {
8282+ if c := in.DeepCopy(); c != nil {
8383+ return c
8484+ }
8585+ return nil
8686+}
8787+8888+// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
8989+func (in *SpindleSetSpec) DeepCopyInto(out *SpindleSetSpec) {
9090+ *out = *in
9191+ in.Template.DeepCopyInto(&out.Template)
9292+}
9393+9494+// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SpindleSetSpec.
9595+func (in *SpindleSetSpec) DeepCopy() *SpindleSetSpec {
9696+ if in == nil {
9797+ return nil
9898+ }
9999+ out := new(SpindleSetSpec)
100100+ in.DeepCopyInto(out)
101101+ return out
102102+}
103103+104104+// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
105105+func (in *SpindleSetStatus) DeepCopyInto(out *SpindleSetStatus) {
106106+ *out = *in
107107+ if in.Conditions != nil {
108108+ in, out := &in.Conditions, &out.Conditions
109109+ *out = make([]metav1.Condition, len(*in))
110110+ for i := range *in {
111111+ (*in)[i].DeepCopyInto(&(*out)[i])
112112+ }
113113+ }
114114+ if in.LastEventTime != nil {
115115+ in, out := &in.LastEventTime, &out.LastEventTime
116116+ *out = (*in).DeepCopy()
117117+ }
118118+}
119119+120120+// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SpindleSetStatus.
121121+func (in *SpindleSetStatus) DeepCopy() *SpindleSetStatus {
122122+ if in == nil {
123123+ return nil
124124+ }
125125+ out := new(SpindleSetStatus)
126126+ in.DeepCopyInto(out)
127127+ return out
128128+}
129129+130130+// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
131131+func (in *SpindleTemplate) DeepCopyInto(out *SpindleTemplate) {
132132+ *out = *in
133133+ in.Resources.DeepCopyInto(&out.Resources)
134134+ if in.NodeSelector != nil {
135135+ in, out := &in.NodeSelector, &out.NodeSelector
136136+ *out = make(map[string]string, len(*in))
137137+ for key, val := range *in {
138138+ (*out)[key] = val
139139+ }
140140+ }
141141+ if in.Tolerations != nil {
142142+ in, out := &in.Tolerations, &out.Tolerations
143143+ *out = make([]v1.Toleration, len(*in))
144144+ for i := range *in {
145145+ (*in)[i].DeepCopyInto(&(*out)[i])
146146+ }
147147+ }
148148+ if in.Affinity != nil {
149149+ in, out := &in.Affinity, &out.Affinity
150150+ *out = new(v1.Affinity)
151151+ (*in).DeepCopyInto(*out)
152152+ }
153153+}
154154+155155+// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SpindleTemplate.
156156+func (in *SpindleTemplate) DeepCopy() *SpindleTemplate {
157157+ if in == nil {
158158+ return nil
159159+ }
160160+ out := new(SpindleTemplate)
161161+ in.DeepCopyInto(out)
162162+ return out
163163+}
+302
cmd/main.go
···11+/*
22+Copyright 2025 Evan Jarrett.
33+44+Licensed under the Apache License, Version 2.0 (the "License");
55+you may not use this file except in compliance with the License.
66+You may obtain a copy of the License at
77+88+ http://www.apache.org/licenses/LICENSE-2.0
99+1010+Unless required by applicable law or agreed to in writing, software
1111+distributed under the License is distributed on an "AS IS" BASIS,
1212+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1313+See the License for the specific language governing permissions and
1414+limitations under the License.
1515+*/
1616+1717+package main
1818+1919+import (
2020+ "context"
2121+ "crypto/tls"
2222+ "flag"
2323+ "fmt"
2424+ "os"
2525+ "path/filepath"
2626+2727+ // Import all Kubernetes client auth plugins (e.g. Azure, GCP, OIDC, etc.)
2828+ // to ensure that exec-entrypoint and run can make use of them.
2929+ _ "k8s.io/client-go/plugin/pkg/client/auth"
3030+3131+ "k8s.io/apimachinery/pkg/runtime"
3232+ utilruntime "k8s.io/apimachinery/pkg/util/runtime"
3333+ clientgoscheme "k8s.io/client-go/kubernetes/scheme"
3434+ ctrl "sigs.k8s.io/controller-runtime"
3535+ "sigs.k8s.io/controller-runtime/pkg/certwatcher"
3636+ "sigs.k8s.io/controller-runtime/pkg/healthz"
3737+ "sigs.k8s.io/controller-runtime/pkg/log/zap"
3838+ "sigs.k8s.io/controller-runtime/pkg/metrics/filters"
3939+ metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
4040+ "sigs.k8s.io/controller-runtime/pkg/webhook"
4141+4242+ "tangled.org/core/spindle"
4343+ "tangled.org/core/spindle/config"
4444+ "tangled.org/core/spindle/models"
4545+4646+ loomv1alpha1 "tangled.org/evan.jarrett.net/loom/api/v1alpha1"
4747+ "tangled.org/evan.jarrett.net/loom/internal/controller"
4848+ "tangled.org/evan.jarrett.net/loom/internal/engine"
4949+ // +kubebuilder:scaffold:imports
5050+)
5151+5252+var (
5353+ scheme = runtime.NewScheme()
5454+ setupLog = ctrl.Log.WithName("setup")
5555+)
5656+5757+func init() {
5858+ utilruntime.Must(clientgoscheme.AddToScheme(scheme))
5959+6060+ utilruntime.Must(loomv1alpha1.AddToScheme(scheme))
6161+ // +kubebuilder:scaffold:scheme
6262+}
6363+6464+// initializeSpindle creates a spindle server with KubernetesEngine
6565+func initializeSpindle(ctx context.Context, cfg *config.Config, mgr ctrl.Manager) (*spindle.Spindle, error) {
6666+ // Initialize Kubernetes engine
6767+ // Namespace and template will be configured by controller based on SpindleSet CRD
6868+ kubeEngine := engine.NewKubernetesEngine(mgr.GetClient(), "default", loomv1alpha1.SpindleTemplate{})
6969+7070+ // Create engines map with kubernetes engine
7171+ engines := map[string]models.Engine{
7272+ "kubernetes": kubeEngine,
7373+ }
7474+7575+ // Use spindle's New() function to create spindle server with our custom engine
7676+ s, err := spindle.New(ctx, cfg, engines)
7777+ if err != nil {
7878+ return nil, fmt.Errorf("failed to create spindle: %w", err)
7979+ }
8080+8181+ return s, nil
8282+}
8383+8484+// nolint:gocyclo
8585+func main() {
8686+ var metricsAddr string
8787+ var metricsCertPath, metricsCertName, metricsCertKey string
8888+ var webhookCertPath, webhookCertName, webhookCertKey string
8989+ var enableLeaderElection bool
9090+ var probeAddr string
9191+ var secureMetrics bool
9292+ var enableHTTP2 bool
9393+ var tlsOpts []func(*tls.Config)
9494+ flag.StringVar(&metricsAddr, "metrics-bind-address", "0", "The address the metrics endpoint binds to. "+
9595+ "Use :8443 for HTTPS or :8080 for HTTP, or leave as 0 to disable the metrics service.")
9696+ flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
9797+ flag.BoolVar(&enableLeaderElection, "leader-elect", false,
9898+ "Enable leader election for controller manager. "+
9999+ "Enabling this will ensure there is only one active controller manager.")
100100+ flag.BoolVar(&secureMetrics, "metrics-secure", true,
101101+ "If set, the metrics endpoint is served securely via HTTPS. Use --metrics-secure=false to use HTTP instead.")
102102+ flag.StringVar(&webhookCertPath, "webhook-cert-path", "", "The directory that contains the webhook certificate.")
103103+ flag.StringVar(&webhookCertName, "webhook-cert-name", "tls.crt", "The name of the webhook certificate file.")
104104+ flag.StringVar(&webhookCertKey, "webhook-cert-key", "tls.key", "The name of the webhook key file.")
105105+ flag.StringVar(&metricsCertPath, "metrics-cert-path", "",
106106+ "The directory that contains the metrics server certificate.")
107107+ flag.StringVar(&metricsCertName, "metrics-cert-name", "tls.crt", "The name of the metrics server certificate file.")
108108+ flag.StringVar(&metricsCertKey, "metrics-cert-key", "tls.key", "The name of the metrics server key file.")
109109+ flag.BoolVar(&enableHTTP2, "enable-http2", false,
110110+ "If set, HTTP/2 will be enabled for the metrics and webhook servers")
111111+ opts := zap.Options{
112112+ Development: true,
113113+ }
114114+ opts.BindFlags(flag.CommandLine)
115115+ flag.Parse()
116116+117117+ ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))
118118+119119+ // if the enable-http2 flag is false (the default), http/2 should be disabled
120120+ // due to its vulnerabilities. More specifically, disabling http/2 will
121121+ // prevent from being vulnerable to the HTTP/2 Stream Cancellation and
122122+ // Rapid Reset CVEs. For more information see:
123123+ // - https://github.com/advisories/GHSA-qppj-fm5r-hxr3
124124+ // - https://github.com/advisories/GHSA-4374-p667-p6c8
125125+ disableHTTP2 := func(c *tls.Config) {
126126+ setupLog.Info("disabling http/2")
127127+ c.NextProtos = []string{"http/1.1"}
128128+ }
129129+130130+ if !enableHTTP2 {
131131+ tlsOpts = append(tlsOpts, disableHTTP2)
132132+ }
133133+134134+ // Create watchers for metrics and webhooks certificates
135135+ var metricsCertWatcher, webhookCertWatcher *certwatcher.CertWatcher
136136+137137+ // Initial webhook TLS options
138138+ webhookTLSOpts := tlsOpts
139139+140140+ if len(webhookCertPath) > 0 {
141141+ setupLog.Info("Initializing webhook certificate watcher using provided certificates",
142142+ "webhook-cert-path", webhookCertPath, "webhook-cert-name", webhookCertName, "webhook-cert-key", webhookCertKey)
143143+144144+ var err error
145145+ webhookCertWatcher, err = certwatcher.New(
146146+ filepath.Join(webhookCertPath, webhookCertName),
147147+ filepath.Join(webhookCertPath, webhookCertKey),
148148+ )
149149+ if err != nil {
150150+ setupLog.Error(err, "Failed to initialize webhook certificate watcher")
151151+ os.Exit(1)
152152+ }
153153+154154+ webhookTLSOpts = append(webhookTLSOpts, func(config *tls.Config) {
155155+ config.GetCertificate = webhookCertWatcher.GetCertificate
156156+ })
157157+ }
158158+159159+ webhookServer := webhook.NewServer(webhook.Options{
160160+ TLSOpts: webhookTLSOpts,
161161+ })
162162+163163+ // Metrics endpoint is enabled in 'config/default/kustomization.yaml'. The Metrics options configure the server.
164164+ // More info:
165165+ // - https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.21.0/pkg/metrics/server
166166+ // - https://book.kubebuilder.io/reference/metrics.html
167167+ metricsServerOptions := metricsserver.Options{
168168+ BindAddress: metricsAddr,
169169+ SecureServing: secureMetrics,
170170+ TLSOpts: tlsOpts,
171171+ }
172172+173173+ if secureMetrics {
174174+ // FilterProvider is used to protect the metrics endpoint with authn/authz.
175175+ // These configurations ensure that only authorized users and service accounts
176176+ // can access the metrics endpoint. The RBAC are configured in 'config/rbac/kustomization.yaml'. More info:
177177+ // https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.21.0/pkg/metrics/filters#WithAuthenticationAndAuthorization
178178+ metricsServerOptions.FilterProvider = filters.WithAuthenticationAndAuthorization
179179+ }
180180+181181+ // If the certificate is not specified, controller-runtime will automatically
182182+ // generate self-signed certificates for the metrics server. While convenient for development and testing,
183183+ // this setup is not recommended for production.
184184+ //
185185+ // TODO(user): If you enable certManager, uncomment the following lines:
186186+ // - [METRICS-WITH-CERTS] at config/default/kustomization.yaml to generate and use certificates
187187+ // managed by cert-manager for the metrics server.
188188+ // - [PROMETHEUS-WITH-CERTS] at config/prometheus/kustomization.yaml for TLS certification.
189189+ if len(metricsCertPath) > 0 {
190190+ setupLog.Info("Initializing metrics certificate watcher using provided certificates",
191191+ "metrics-cert-path", metricsCertPath, "metrics-cert-name", metricsCertName, "metrics-cert-key", metricsCertKey)
192192+193193+ var err error
194194+ metricsCertWatcher, err = certwatcher.New(
195195+ filepath.Join(metricsCertPath, metricsCertName),
196196+ filepath.Join(metricsCertPath, metricsCertKey),
197197+ )
198198+ if err != nil {
199199+ setupLog.Error(err, "Failed to initialize metrics certificate watcher")
200200+ os.Exit(1)
201201+ }
202202+203203+ metricsServerOptions.TLSOpts = append(metricsServerOptions.TLSOpts, func(config *tls.Config) {
204204+ config.GetCertificate = metricsCertWatcher.GetCertificate
205205+ })
206206+ }
207207+208208+ // Create context for spindle initialization
209209+ ctx := context.Background()
210210+211211+ // Load spindle configuration from environment
212212+ spindleCfg, err := config.Load(ctx)
213213+ if err != nil {
214214+ setupLog.Error(err, "failed to load spindle config")
215215+ os.Exit(1)
216216+ }
217217+218218+ mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
219219+ Scheme: scheme,
220220+ Metrics: metricsServerOptions,
221221+ WebhookServer: webhookServer,
222222+ HealthProbeBindAddress: probeAddr,
223223+ LeaderElection: enableLeaderElection,
224224+ LeaderElectionID: "d9c2f6b8.j5t.io",
225225+ // LeaderElectionReleaseOnCancel defines if the leader should step down voluntarily
226226+ // when the Manager ends. This requires the binary to immediately end when the
227227+ // Manager is stopped, otherwise, this setting is unsafe. Setting this significantly
228228+ // speeds up voluntary leader transitions as the new leader don't have to wait
229229+ // LeaseDuration time first.
230230+ //
231231+ // In the default scaffold provided, the program ends immediately after
232232+ // the manager stops, so would be fine to enable this option. However,
233233+ // if you are doing or is intended to do any operation such as perform cleanups
234234+ // after the manager stops then its usage might be unsafe.
235235+ // LeaderElectionReleaseOnCancel: true,
236236+ })
237237+ if err != nil {
238238+ setupLog.Error(err, "unable to start manager")
239239+ os.Exit(1)
240240+ }
241241+242242+ // Initialize spindle server with KubernetesEngine
243243+ s, err := initializeSpindle(ctx, spindleCfg, mgr)
244244+ if err != nil {
245245+ setupLog.Error(err, "failed to initialize spindle")
246246+ os.Exit(1)
247247+ }
248248+ defer s.Queue().Stop()
249249+250250+ setupLog.Info("spindle server initialized successfully")
251251+252252+ // Start spindle HTTP server in background
253253+ go func() {
254254+ setupLog.Info("starting spindle HTTP server", "address", spindleCfg.Server.ListenAddr)
255255+ if err := s.Start(ctx); err != nil {
256256+ setupLog.Error(err, "spindle HTTP server error")
257257+ }
258258+ }()
259259+260260+ // Setup controller with spindle components
261261+ if err := (&controller.SpindleSetReconciler{
262262+ Client: mgr.GetClient(),
263263+ Scheme: mgr.GetScheme(),
264264+ Config: mgr.GetConfig(),
265265+ Spindle: s,
266266+ }).SetupWithManager(mgr); err != nil {
267267+ setupLog.Error(err, "unable to create controller", "controller", "SpindleSet")
268268+ os.Exit(1)
269269+ }
270270+ // +kubebuilder:scaffold:builder
271271+272272+ if metricsCertWatcher != nil {
273273+ setupLog.Info("Adding metrics certificate watcher to manager")
274274+ if err := mgr.Add(metricsCertWatcher); err != nil {
275275+ setupLog.Error(err, "unable to add metrics certificate watcher to manager")
276276+ os.Exit(1)
277277+ }
278278+ }
279279+280280+ if webhookCertWatcher != nil {
281281+ setupLog.Info("Adding webhook certificate watcher to manager")
282282+ if err := mgr.Add(webhookCertWatcher); err != nil {
283283+ setupLog.Error(err, "unable to add webhook certificate watcher to manager")
284284+ os.Exit(1)
285285+ }
286286+ }
287287+288288+ if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
289289+ setupLog.Error(err, "unable to set up health check")
290290+ os.Exit(1)
291291+ }
292292+ if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil {
293293+ setupLog.Error(err, "unable to set up ready check")
294294+ os.Exit(1)
295295+ }
296296+297297+ setupLog.Info("starting manager")
298298+ if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
299299+ setupLog.Error(err, "problem running manager")
300300+ os.Exit(1)
301301+ }
302302+}
+1212
config/crd/bases/loom.j5t.io_spindlesets.yaml
···11+---
22+apiVersion: apiextensions.k8s.io/v1
33+kind: CustomResourceDefinition
44+metadata:
55+ annotations:
66+ controller-gen.kubebuilder.io/version: v0.18.0
77+ name: spindlesets.loom.j5t.io
88+spec:
99+ group: loom.j5t.io
1010+ names:
1111+ kind: SpindleSet
1212+ listKind: SpindleSetList
1313+ plural: spindlesets
1414+ singular: spindleset
1515+ scope: Namespaced
1616+ versions:
1717+ - additionalPrinterColumns:
1818+ - jsonPath: .spec.knotUrl
1919+ name: Knot URL
2020+ type: string
2121+ - jsonPath: .status.webSocketConnected
2222+ name: Connected
2323+ type: boolean
2424+ - jsonPath: .status.runningJobs
2525+ name: Running
2626+ type: integer
2727+ - jsonPath: .status.completedJobs
2828+ name: Completed
2929+ type: integer
3030+ - jsonPath: .status.failedJobs
3131+ name: Failed
3232+ type: integer
3333+ - jsonPath: .metadata.creationTimestamp
3434+ name: Age
3535+ type: date
3636+ name: v1alpha1
3737+ schema:
3838+ openAPIV3Schema:
3939+ description: SpindleSet is the Schema for the spindlesets API.
4040+ properties:
4141+ apiVersion:
4242+ description: |-
4343+ APIVersion defines the versioned schema of this representation of an object.
4444+ Servers should convert recognized schemas to the latest internal value, and
4545+ may reject unrecognized values.
4646+ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
4747+ type: string
4848+ kind:
4949+ description: |-
5050+ Kind is a string value representing the REST resource this object represents.
5151+ Servers may infer this from the endpoint the client submits requests to.
5252+ Cannot be updated.
5353+ In CamelCase.
5454+ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
5555+ type: string
5656+ metadata:
5757+ type: object
5858+ spec:
5959+ description: SpindleSetSpec defines the desired state of SpindleSet.
6060+ properties:
6161+ knotAuthSecret:
6262+ description: |-
6363+ KnotAuthSecret is the name of the Secret containing authentication credentials for the knot.
6464+ The secret should contain a key "token" with the auth token.
6565+ type: string
6666+ knotUrl:
6767+ description: KnotUrl is the URL of the tangled.org knot to connect
6868+ to (e.g., https://tangled.org/@org/repo).
6969+ pattern: ^https?://.*
7070+ type: string
7171+ maxConcurrentJobs:
7272+ default: 10
7373+ description: |-
7474+ MaxConcurrentJobs is the maximum number of concurrent spindle jobs that can run.
7575+ Defaults to 10 if not specified.
7676+ format: int32
7777+ maximum: 100
7878+ minimum: 1
7979+ type: integer
8080+ template:
8181+ description: |-
8282+ Template is the default pod template configuration for spindle jobs.
8383+ Individual workflows can override these settings.
8484+ properties:
8585+ affinity:
8686+ description: Affinity defines scheduling constraints for spindle
8787+ job pods.
8888+ properties:
8989+ nodeAffinity:
9090+ description: Describes node affinity scheduling rules for
9191+ the pod.
9292+ properties:
9393+ preferredDuringSchedulingIgnoredDuringExecution:
9494+ description: |-
9595+ The scheduler will prefer to schedule pods to nodes that satisfy
9696+ the affinity expressions specified by this field, but it may choose
9797+ a node that violates one or more of the expressions. The node that is
9898+ most preferred is the one with the greatest sum of weights, i.e.
9999+ for each node that meets all of the scheduling requirements (resource
100100+ request, requiredDuringScheduling affinity expressions, etc.),
101101+ compute a sum by iterating through the elements of this field and adding
102102+ "weight" to the sum if the node matches the corresponding matchExpressions; the
103103+ node(s) with the highest sum are the most preferred.
104104+ items:
105105+ description: |-
106106+ An empty preferred scheduling term matches all objects with implicit weight 0
107107+ (i.e. it's a no-op). A null preferred scheduling term matches no objects (i.e. is also a no-op).
108108+ properties:
109109+ preference:
110110+ description: A node selector term, associated with
111111+ the corresponding weight.
112112+ properties:
113113+ matchExpressions:
114114+ description: A list of node selector requirements
115115+ by node's labels.
116116+ items:
117117+ description: |-
118118+ A node selector requirement is a selector that contains values, a key, and an operator
119119+ that relates the key and values.
120120+ properties:
121121+ key:
122122+ description: The label key that the selector
123123+ applies to.
124124+ type: string
125125+ operator:
126126+ description: |-
127127+ Represents a key's relationship to a set of values.
128128+ Valid operators are In, NotIn, Exists, DoesNotExist. Gt, and Lt.
129129+ type: string
130130+ values:
131131+ description: |-
132132+ An array of string values. If the operator is In or NotIn,
133133+ the values array must be non-empty. If the operator is Exists or DoesNotExist,
134134+ the values array must be empty. If the operator is Gt or Lt, the values
135135+ array must have a single element, which will be interpreted as an integer.
136136+ This array is replaced during a strategic merge patch.
137137+ items:
138138+ type: string
139139+ type: array
140140+ x-kubernetes-list-type: atomic
141141+ required:
142142+ - key
143143+ - operator
144144+ type: object
145145+ type: array
146146+ x-kubernetes-list-type: atomic
147147+ matchFields:
148148+ description: A list of node selector requirements
149149+ by node's fields.
150150+ items:
151151+ description: |-
152152+ A node selector requirement is a selector that contains values, a key, and an operator
153153+ that relates the key and values.
154154+ properties:
155155+ key:
156156+ description: The label key that the selector
157157+ applies to.
158158+ type: string
159159+ operator:
160160+ description: |-
161161+ Represents a key's relationship to a set of values.
162162+ Valid operators are In, NotIn, Exists, DoesNotExist. Gt, and Lt.
163163+ type: string
164164+ values:
165165+ description: |-
166166+ An array of string values. If the operator is In or NotIn,
167167+ the values array must be non-empty. If the operator is Exists or DoesNotExist,
168168+ the values array must be empty. If the operator is Gt or Lt, the values
169169+ array must have a single element, which will be interpreted as an integer.
170170+ This array is replaced during a strategic merge patch.
171171+ items:
172172+ type: string
173173+ type: array
174174+ x-kubernetes-list-type: atomic
175175+ required:
176176+ - key
177177+ - operator
178178+ type: object
179179+ type: array
180180+ x-kubernetes-list-type: atomic
181181+ type: object
182182+ x-kubernetes-map-type: atomic
183183+ weight:
184184+ description: Weight associated with matching the
185185+ corresponding nodeSelectorTerm, in the range 1-100.
186186+ format: int32
187187+ type: integer
188188+ required:
189189+ - preference
190190+ - weight
191191+ type: object
192192+ type: array
193193+ x-kubernetes-list-type: atomic
194194+ requiredDuringSchedulingIgnoredDuringExecution:
195195+ description: |-
196196+ If the affinity requirements specified by this field are not met at
197197+ scheduling time, the pod will not be scheduled onto the node.
198198+ If the affinity requirements specified by this field cease to be met
199199+ at some point during pod execution (e.g. due to an update), the system
200200+ may or may not try to eventually evict the pod from its node.
201201+ properties:
202202+ nodeSelectorTerms:
203203+ description: Required. A list of node selector terms.
204204+ The terms are ORed.
205205+ items:
206206+ description: |-
207207+ A null or empty node selector term matches no objects. The requirements of
208208+ them are ANDed.
209209+ The TopologySelectorTerm type implements a subset of the NodeSelectorTerm.
210210+ properties:
211211+ matchExpressions:
212212+ description: A list of node selector requirements
213213+ by node's labels.
214214+ items:
215215+ description: |-
216216+ A node selector requirement is a selector that contains values, a key, and an operator
217217+ that relates the key and values.
218218+ properties:
219219+ key:
220220+ description: The label key that the selector
221221+ applies to.
222222+ type: string
223223+ operator:
224224+ description: |-
225225+ Represents a key's relationship to a set of values.
226226+ Valid operators are In, NotIn, Exists, DoesNotExist. Gt, and Lt.
227227+ type: string
228228+ values:
229229+ description: |-
230230+ An array of string values. If the operator is In or NotIn,
231231+ the values array must be non-empty. If the operator is Exists or DoesNotExist,
232232+ the values array must be empty. If the operator is Gt or Lt, the values
233233+ array must have a single element, which will be interpreted as an integer.
234234+ This array is replaced during a strategic merge patch.
235235+ items:
236236+ type: string
237237+ type: array
238238+ x-kubernetes-list-type: atomic
239239+ required:
240240+ - key
241241+ - operator
242242+ type: object
243243+ type: array
244244+ x-kubernetes-list-type: atomic
245245+ matchFields:
246246+ description: A list of node selector requirements
247247+ by node's fields.
248248+ items:
249249+ description: |-
250250+ A node selector requirement is a selector that contains values, a key, and an operator
251251+ that relates the key and values.
252252+ properties:
253253+ key:
254254+ description: The label key that the selector
255255+ applies to.
256256+ type: string
257257+ operator:
258258+ description: |-
259259+ Represents a key's relationship to a set of values.
260260+ Valid operators are In, NotIn, Exists, DoesNotExist. Gt, and Lt.
261261+ type: string
262262+ values:
263263+ description: |-
264264+ An array of string values. If the operator is In or NotIn,
265265+ the values array must be non-empty. If the operator is Exists or DoesNotExist,
266266+ the values array must be empty. If the operator is Gt or Lt, the values
267267+ array must have a single element, which will be interpreted as an integer.
268268+ This array is replaced during a strategic merge patch.
269269+ items:
270270+ type: string
271271+ type: array
272272+ x-kubernetes-list-type: atomic
273273+ required:
274274+ - key
275275+ - operator
276276+ type: object
277277+ type: array
278278+ x-kubernetes-list-type: atomic
279279+ type: object
280280+ x-kubernetes-map-type: atomic
281281+ type: array
282282+ x-kubernetes-list-type: atomic
283283+ required:
284284+ - nodeSelectorTerms
285285+ type: object
286286+ x-kubernetes-map-type: atomic
287287+ type: object
288288+ podAffinity:
289289+ description: Describes pod affinity scheduling rules (e.g.
290290+ co-locate this pod in the same node, zone, etc. as some
291291+ other pod(s)).
292292+ properties:
293293+ preferredDuringSchedulingIgnoredDuringExecution:
294294+ description: |-
295295+ The scheduler will prefer to schedule pods to nodes that satisfy
296296+ the affinity expressions specified by this field, but it may choose
297297+ a node that violates one or more of the expressions. The node that is
298298+ most preferred is the one with the greatest sum of weights, i.e.
299299+ for each node that meets all of the scheduling requirements (resource
300300+ request, requiredDuringScheduling affinity expressions, etc.),
301301+ compute a sum by iterating through the elements of this field and adding
302302+ "weight" to the sum if the node has pods which matches the corresponding podAffinityTerm; the
303303+ node(s) with the highest sum are the most preferred.
304304+ items:
305305+ description: The weights of all of the matched WeightedPodAffinityTerm
306306+ fields are added per-node to find the most preferred
307307+ node(s)
308308+ properties:
309309+ podAffinityTerm:
310310+ description: Required. A pod affinity term, associated
311311+ with the corresponding weight.
312312+ properties:
313313+ labelSelector:
314314+ description: |-
315315+ A label query over a set of resources, in this case pods.
316316+ If it's null, this PodAffinityTerm matches with no Pods.
317317+ properties:
318318+ matchExpressions:
319319+ description: matchExpressions is a list
320320+ of label selector requirements. The requirements
321321+ are ANDed.
322322+ items:
323323+ description: |-
324324+ A label selector requirement is a selector that contains values, a key, and an operator that
325325+ relates the key and values.
326326+ properties:
327327+ key:
328328+ description: key is the label key
329329+ that the selector applies to.
330330+ type: string
331331+ operator:
332332+ description: |-
333333+ operator represents a key's relationship to a set of values.
334334+ Valid operators are In, NotIn, Exists and DoesNotExist.
335335+ type: string
336336+ values:
337337+ description: |-
338338+ values is an array of string values. If the operator is In or NotIn,
339339+ the values array must be non-empty. If the operator is Exists or DoesNotExist,
340340+ the values array must be empty. This array is replaced during a strategic
341341+ merge patch.
342342+ items:
343343+ type: string
344344+ type: array
345345+ x-kubernetes-list-type: atomic
346346+ required:
347347+ - key
348348+ - operator
349349+ type: object
350350+ type: array
351351+ x-kubernetes-list-type: atomic
352352+ matchLabels:
353353+ additionalProperties:
354354+ type: string
355355+ description: |-
356356+ matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels
357357+ map is equivalent to an element of matchExpressions, whose key field is "key", the
358358+ operator is "In", and the values array contains only "value". The requirements are ANDed.
359359+ type: object
360360+ type: object
361361+ x-kubernetes-map-type: atomic
362362+ matchLabelKeys:
363363+ description: |-
364364+ MatchLabelKeys is a set of pod label keys to select which pods will
365365+ be taken into consideration. The keys are used to lookup values from the
366366+ incoming pod labels, those key-value labels are merged with `labelSelector` as `key in (value)`
367367+ to select the group of existing pods which pods will be taken into consideration
368368+ for the incoming pod's pod (anti) affinity. Keys that don't exist in the incoming
369369+ pod labels will be ignored. The default value is empty.
370370+ The same key is forbidden to exist in both matchLabelKeys and labelSelector.
371371+ Also, matchLabelKeys cannot be set when labelSelector isn't set.
372372+ items:
373373+ type: string
374374+ type: array
375375+ x-kubernetes-list-type: atomic
376376+ mismatchLabelKeys:
377377+ description: |-
378378+ MismatchLabelKeys is a set of pod label keys to select which pods will
379379+ be taken into consideration. The keys are used to lookup values from the
380380+ incoming pod labels, those key-value labels are merged with `labelSelector` as `key notin (value)`
381381+ to select the group of existing pods which pods will be taken into consideration
382382+ for the incoming pod's pod (anti) affinity. Keys that don't exist in the incoming
383383+ pod labels will be ignored. The default value is empty.
384384+ The same key is forbidden to exist in both mismatchLabelKeys and labelSelector.
385385+ Also, mismatchLabelKeys cannot be set when labelSelector isn't set.
386386+ items:
387387+ type: string
388388+ type: array
389389+ x-kubernetes-list-type: atomic
390390+ namespaceSelector:
391391+ description: |-
392392+ A label query over the set of namespaces that the term applies to.
393393+ The term is applied to the union of the namespaces selected by this field
394394+ and the ones listed in the namespaces field.
395395+ null selector and null or empty namespaces list means "this pod's namespace".
396396+ An empty selector ({}) matches all namespaces.
397397+ properties:
398398+ matchExpressions:
399399+ description: matchExpressions is a list
400400+ of label selector requirements. The requirements
401401+ are ANDed.
402402+ items:
403403+ description: |-
404404+ A label selector requirement is a selector that contains values, a key, and an operator that
405405+ relates the key and values.
406406+ properties:
407407+ key:
408408+ description: key is the label key
409409+ that the selector applies to.
410410+ type: string
411411+ operator:
412412+ description: |-
413413+ operator represents a key's relationship to a set of values.
414414+ Valid operators are In, NotIn, Exists and DoesNotExist.
415415+ type: string
416416+ values:
417417+ description: |-
418418+ values is an array of string values. If the operator is In or NotIn,
419419+ the values array must be non-empty. If the operator is Exists or DoesNotExist,
420420+ the values array must be empty. This array is replaced during a strategic
421421+ merge patch.
422422+ items:
423423+ type: string
424424+ type: array
425425+ x-kubernetes-list-type: atomic
426426+ required:
427427+ - key
428428+ - operator
429429+ type: object
430430+ type: array
431431+ x-kubernetes-list-type: atomic
432432+ matchLabels:
433433+ additionalProperties:
434434+ type: string
435435+ description: |-
436436+ matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels
437437+ map is equivalent to an element of matchExpressions, whose key field is "key", the
438438+ operator is "In", and the values array contains only "value". The requirements are ANDed.
439439+ type: object
440440+ type: object
441441+ x-kubernetes-map-type: atomic
442442+ namespaces:
443443+ description: |-
444444+ namespaces specifies a static list of namespace names that the term applies to.
445445+ The term is applied to the union of the namespaces listed in this field
446446+ and the ones selected by namespaceSelector.
447447+ null or empty namespaces list and null namespaceSelector means "this pod's namespace".
448448+ items:
449449+ type: string
450450+ type: array
451451+ x-kubernetes-list-type: atomic
452452+ topologyKey:
453453+ description: |-
454454+ This pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching
455455+ the labelSelector in the specified namespaces, where co-located is defined as running on a node
456456+ whose value of the label with key topologyKey matches that of any node on which any of the
457457+ selected pods is running.
458458+ Empty topologyKey is not allowed.
459459+ type: string
460460+ required:
461461+ - topologyKey
462462+ type: object
463463+ weight:
464464+ description: |-
465465+ weight associated with matching the corresponding podAffinityTerm,
466466+ in the range 1-100.
467467+ format: int32
468468+ type: integer
469469+ required:
470470+ - podAffinityTerm
471471+ - weight
472472+ type: object
473473+ type: array
474474+ x-kubernetes-list-type: atomic
475475+ requiredDuringSchedulingIgnoredDuringExecution:
476476+ description: |-
477477+ If the affinity requirements specified by this field are not met at
478478+ scheduling time, the pod will not be scheduled onto the node.
479479+ If the affinity requirements specified by this field cease to be met
480480+ at some point during pod execution (e.g. due to a pod label update), the
481481+ system may or may not try to eventually evict the pod from its node.
482482+ When there are multiple elements, the lists of nodes corresponding to each
483483+ podAffinityTerm are intersected, i.e. all terms must be satisfied.
484484+ items:
485485+ description: |-
486486+ Defines a set of pods (namely those matching the labelSelector
487487+ relative to the given namespace(s)) that this pod should be
488488+ co-located (affinity) or not co-located (anti-affinity) with,
489489+ where co-located is defined as running on a node whose value of
490490+ the label with key <topologyKey> matches that of any node on which
491491+ a pod of the set of pods is running
492492+ properties:
493493+ labelSelector:
494494+ description: |-
495495+ A label query over a set of resources, in this case pods.
496496+ If it's null, this PodAffinityTerm matches with no Pods.
497497+ properties:
498498+ matchExpressions:
499499+ description: matchExpressions is a list of label
500500+ selector requirements. The requirements are
501501+ ANDed.
502502+ items:
503503+ description: |-
504504+ A label selector requirement is a selector that contains values, a key, and an operator that
505505+ relates the key and values.
506506+ properties:
507507+ key:
508508+ description: key is the label key that
509509+ the selector applies to.
510510+ type: string
511511+ operator:
512512+ description: |-
513513+ operator represents a key's relationship to a set of values.
514514+ Valid operators are In, NotIn, Exists and DoesNotExist.
515515+ type: string
516516+ values:
517517+ description: |-
518518+ values is an array of string values. If the operator is In or NotIn,
519519+ the values array must be non-empty. If the operator is Exists or DoesNotExist,
520520+ the values array must be empty. This array is replaced during a strategic
521521+ merge patch.
522522+ items:
523523+ type: string
524524+ type: array
525525+ x-kubernetes-list-type: atomic
526526+ required:
527527+ - key
528528+ - operator
529529+ type: object
530530+ type: array
531531+ x-kubernetes-list-type: atomic
532532+ matchLabels:
533533+ additionalProperties:
534534+ type: string
535535+ description: |-
536536+ matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels
537537+ map is equivalent to an element of matchExpressions, whose key field is "key", the
538538+ operator is "In", and the values array contains only "value". The requirements are ANDed.
539539+ type: object
540540+ type: object
541541+ x-kubernetes-map-type: atomic
542542+ matchLabelKeys:
543543+ description: |-
544544+ MatchLabelKeys is a set of pod label keys to select which pods will
545545+ be taken into consideration. The keys are used to lookup values from the
546546+ incoming pod labels, those key-value labels are merged with `labelSelector` as `key in (value)`
547547+ to select the group of existing pods which pods will be taken into consideration
548548+ for the incoming pod's pod (anti) affinity. Keys that don't exist in the incoming
549549+ pod labels will be ignored. The default value is empty.
550550+ The same key is forbidden to exist in both matchLabelKeys and labelSelector.
551551+ Also, matchLabelKeys cannot be set when labelSelector isn't set.
552552+ items:
553553+ type: string
554554+ type: array
555555+ x-kubernetes-list-type: atomic
556556+ mismatchLabelKeys:
557557+ description: |-
558558+ MismatchLabelKeys is a set of pod label keys to select which pods will
559559+ be taken into consideration. The keys are used to lookup values from the
560560+ incoming pod labels, those key-value labels are merged with `labelSelector` as `key notin (value)`
561561+ to select the group of existing pods which pods will be taken into consideration
562562+ for the incoming pod's pod (anti) affinity. Keys that don't exist in the incoming
563563+ pod labels will be ignored. The default value is empty.
564564+ The same key is forbidden to exist in both mismatchLabelKeys and labelSelector.
565565+ Also, mismatchLabelKeys cannot be set when labelSelector isn't set.
566566+ items:
567567+ type: string
568568+ type: array
569569+ x-kubernetes-list-type: atomic
570570+ namespaceSelector:
571571+ description: |-
572572+ A label query over the set of namespaces that the term applies to.
573573+ The term is applied to the union of the namespaces selected by this field
574574+ and the ones listed in the namespaces field.
575575+ null selector and null or empty namespaces list means "this pod's namespace".
576576+ An empty selector ({}) matches all namespaces.
577577+ properties:
578578+ matchExpressions:
579579+ description: matchExpressions is a list of label
580580+ selector requirements. The requirements are
581581+ ANDed.
582582+ items:
583583+ description: |-
584584+ A label selector requirement is a selector that contains values, a key, and an operator that
585585+ relates the key and values.
586586+ properties:
587587+ key:
588588+ description: key is the label key that
589589+ the selector applies to.
590590+ type: string
591591+ operator:
592592+ description: |-
593593+ operator represents a key's relationship to a set of values.
594594+ Valid operators are In, NotIn, Exists and DoesNotExist.
595595+ type: string
596596+ values:
597597+ description: |-
598598+ values is an array of string values. If the operator is In or NotIn,
599599+ the values array must be non-empty. If the operator is Exists or DoesNotExist,
600600+ the values array must be empty. This array is replaced during a strategic
601601+ merge patch.
602602+ items:
603603+ type: string
604604+ type: array
605605+ x-kubernetes-list-type: atomic
606606+ required:
607607+ - key
608608+ - operator
609609+ type: object
610610+ type: array
611611+ x-kubernetes-list-type: atomic
612612+ matchLabels:
613613+ additionalProperties:
614614+ type: string
615615+ description: |-
616616+ matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels
617617+ map is equivalent to an element of matchExpressions, whose key field is "key", the
618618+ operator is "In", and the values array contains only "value". The requirements are ANDed.
619619+ type: object
620620+ type: object
621621+ x-kubernetes-map-type: atomic
622622+ namespaces:
623623+ description: |-
624624+ namespaces specifies a static list of namespace names that the term applies to.
625625+ The term is applied to the union of the namespaces listed in this field
626626+ and the ones selected by namespaceSelector.
627627+ null or empty namespaces list and null namespaceSelector means "this pod's namespace".
628628+ items:
629629+ type: string
630630+ type: array
631631+ x-kubernetes-list-type: atomic
632632+ topologyKey:
633633+ description: |-
634634+ This pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching
635635+ the labelSelector in the specified namespaces, where co-located is defined as running on a node
636636+ whose value of the label with key topologyKey matches that of any node on which any of the
637637+ selected pods is running.
638638+ Empty topologyKey is not allowed.
639639+ type: string
640640+ required:
641641+ - topologyKey
642642+ type: object
643643+ type: array
644644+ x-kubernetes-list-type: atomic
645645+ type: object
646646+ podAntiAffinity:
647647+ description: Describes pod anti-affinity scheduling rules
648648+ (e.g. avoid putting this pod in the same node, zone, etc.
649649+ as some other pod(s)).
650650+ properties:
651651+ preferredDuringSchedulingIgnoredDuringExecution:
652652+ description: |-
653653+ The scheduler will prefer to schedule pods to nodes that satisfy
654654+ the anti-affinity expressions specified by this field, but it may choose
655655+ a node that violates one or more of the expressions. The node that is
656656+ most preferred is the one with the greatest sum of weights, i.e.
657657+ for each node that meets all of the scheduling requirements (resource
658658+ request, requiredDuringScheduling anti-affinity expressions, etc.),
659659+ compute a sum by iterating through the elements of this field and adding
660660+ "weight" to the sum if the node has pods which matches the corresponding podAffinityTerm; the
661661+ node(s) with the highest sum are the most preferred.
662662+ items:
663663+ description: The weights of all of the matched WeightedPodAffinityTerm
664664+ fields are added per-node to find the most preferred
665665+ node(s)
666666+ properties:
667667+ podAffinityTerm:
668668+ description: Required. A pod affinity term, associated
669669+ with the corresponding weight.
670670+ properties:
671671+ labelSelector:
672672+ description: |-
673673+ A label query over a set of resources, in this case pods.
674674+ If it's null, this PodAffinityTerm matches with no Pods.
675675+ properties:
676676+ matchExpressions:
677677+ description: matchExpressions is a list
678678+ of label selector requirements. The requirements
679679+ are ANDed.
680680+ items:
681681+ description: |-
682682+ A label selector requirement is a selector that contains values, a key, and an operator that
683683+ relates the key and values.
684684+ properties:
685685+ key:
686686+ description: key is the label key
687687+ that the selector applies to.
688688+ type: string
689689+ operator:
690690+ description: |-
691691+ operator represents a key's relationship to a set of values.
692692+ Valid operators are In, NotIn, Exists and DoesNotExist.
693693+ type: string
694694+ values:
695695+ description: |-
696696+ values is an array of string values. If the operator is In or NotIn,
697697+ the values array must be non-empty. If the operator is Exists or DoesNotExist,
698698+ the values array must be empty. This array is replaced during a strategic
699699+ merge patch.
700700+ items:
701701+ type: string
702702+ type: array
703703+ x-kubernetes-list-type: atomic
704704+ required:
705705+ - key
706706+ - operator
707707+ type: object
708708+ type: array
709709+ x-kubernetes-list-type: atomic
710710+ matchLabels:
711711+ additionalProperties:
712712+ type: string
713713+ description: |-
714714+ matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels
715715+ map is equivalent to an element of matchExpressions, whose key field is "key", the
716716+ operator is "In", and the values array contains only "value". The requirements are ANDed.
717717+ type: object
718718+ type: object
719719+ x-kubernetes-map-type: atomic
720720+ matchLabelKeys:
721721+ description: |-
722722+ MatchLabelKeys is a set of pod label keys to select which pods will
723723+ be taken into consideration. The keys are used to lookup values from the
724724+ incoming pod labels, those key-value labels are merged with `labelSelector` as `key in (value)`
725725+ to select the group of existing pods which pods will be taken into consideration
726726+ for the incoming pod's pod (anti) affinity. Keys that don't exist in the incoming
727727+ pod labels will be ignored. The default value is empty.
728728+ The same key is forbidden to exist in both matchLabelKeys and labelSelector.
729729+ Also, matchLabelKeys cannot be set when labelSelector isn't set.
730730+ items:
731731+ type: string
732732+ type: array
733733+ x-kubernetes-list-type: atomic
734734+ mismatchLabelKeys:
735735+ description: |-
736736+ MismatchLabelKeys is a set of pod label keys to select which pods will
737737+ be taken into consideration. The keys are used to lookup values from the
738738+ incoming pod labels, those key-value labels are merged with `labelSelector` as `key notin (value)`
739739+ to select the group of existing pods which pods will be taken into consideration
740740+ for the incoming pod's pod (anti) affinity. Keys that don't exist in the incoming
741741+ pod labels will be ignored. The default value is empty.
742742+ The same key is forbidden to exist in both mismatchLabelKeys and labelSelector.
743743+ Also, mismatchLabelKeys cannot be set when labelSelector isn't set.
744744+ items:
745745+ type: string
746746+ type: array
747747+ x-kubernetes-list-type: atomic
748748+ namespaceSelector:
749749+ description: |-
750750+ A label query over the set of namespaces that the term applies to.
751751+ The term is applied to the union of the namespaces selected by this field
752752+ and the ones listed in the namespaces field.
753753+ null selector and null or empty namespaces list means "this pod's namespace".
754754+ An empty selector ({}) matches all namespaces.
755755+ properties:
756756+ matchExpressions:
757757+ description: matchExpressions is a list
758758+ of label selector requirements. The requirements
759759+ are ANDed.
760760+ items:
761761+ description: |-
762762+ A label selector requirement is a selector that contains values, a key, and an operator that
763763+ relates the key and values.
764764+ properties:
765765+ key:
766766+ description: key is the label key
767767+ that the selector applies to.
768768+ type: string
769769+ operator:
770770+ description: |-
771771+ operator represents a key's relationship to a set of values.
772772+ Valid operators are In, NotIn, Exists and DoesNotExist.
773773+ type: string
774774+ values:
775775+ description: |-
776776+ values is an array of string values. If the operator is In or NotIn,
777777+ the values array must be non-empty. If the operator is Exists or DoesNotExist,
778778+ the values array must be empty. This array is replaced during a strategic
779779+ merge patch.
780780+ items:
781781+ type: string
782782+ type: array
783783+ x-kubernetes-list-type: atomic
784784+ required:
785785+ - key
786786+ - operator
787787+ type: object
788788+ type: array
789789+ x-kubernetes-list-type: atomic
790790+ matchLabels:
791791+ additionalProperties:
792792+ type: string
793793+ description: |-
794794+ matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels
795795+ map is equivalent to an element of matchExpressions, whose key field is "key", the
796796+ operator is "In", and the values array contains only "value". The requirements are ANDed.
797797+ type: object
798798+ type: object
799799+ x-kubernetes-map-type: atomic
800800+ namespaces:
801801+ description: |-
802802+ namespaces specifies a static list of namespace names that the term applies to.
803803+ The term is applied to the union of the namespaces listed in this field
804804+ and the ones selected by namespaceSelector.
805805+ null or empty namespaces list and null namespaceSelector means "this pod's namespace".
806806+ items:
807807+ type: string
808808+ type: array
809809+ x-kubernetes-list-type: atomic
810810+ topologyKey:
811811+ description: |-
812812+ This pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching
813813+ the labelSelector in the specified namespaces, where co-located is defined as running on a node
814814+ whose value of the label with key topologyKey matches that of any node on which any of the
815815+ selected pods is running.
816816+ Empty topologyKey is not allowed.
817817+ type: string
818818+ required:
819819+ - topologyKey
820820+ type: object
821821+ weight:
822822+ description: |-
823823+ weight associated with matching the corresponding podAffinityTerm,
824824+ in the range 1-100.
825825+ format: int32
826826+ type: integer
827827+ required:
828828+ - podAffinityTerm
829829+ - weight
830830+ type: object
831831+ type: array
832832+ x-kubernetes-list-type: atomic
833833+ requiredDuringSchedulingIgnoredDuringExecution:
834834+ description: |-
835835+ If the anti-affinity requirements specified by this field are not met at
836836+ scheduling time, the pod will not be scheduled onto the node.
837837+ If the anti-affinity requirements specified by this field cease to be met
838838+ at some point during pod execution (e.g. due to a pod label update), the
839839+ system may or may not try to eventually evict the pod from its node.
840840+ When there are multiple elements, the lists of nodes corresponding to each
841841+ podAffinityTerm are intersected, i.e. all terms must be satisfied.
842842+ items:
843843+ description: |-
844844+ Defines a set of pods (namely those matching the labelSelector
845845+ relative to the given namespace(s)) that this pod should be
846846+ co-located (affinity) or not co-located (anti-affinity) with,
847847+ where co-located is defined as running on a node whose value of
848848+ the label with key <topologyKey> matches that of any node on which
849849+ a pod of the set of pods is running
850850+ properties:
851851+ labelSelector:
852852+ description: |-
853853+ A label query over a set of resources, in this case pods.
854854+ If it's null, this PodAffinityTerm matches with no Pods.
855855+ properties:
856856+ matchExpressions:
857857+ description: matchExpressions is a list of label
858858+ selector requirements. The requirements are
859859+ ANDed.
860860+ items:
861861+ description: |-
862862+ A label selector requirement is a selector that contains values, a key, and an operator that
863863+ relates the key and values.
864864+ properties:
865865+ key:
866866+ description: key is the label key that
867867+ the selector applies to.
868868+ type: string
869869+ operator:
870870+ description: |-
871871+ operator represents a key's relationship to a set of values.
872872+ Valid operators are In, NotIn, Exists and DoesNotExist.
873873+ type: string
874874+ values:
875875+ description: |-
876876+ values is an array of string values. If the operator is In or NotIn,
877877+ the values array must be non-empty. If the operator is Exists or DoesNotExist,
878878+ the values array must be empty. This array is replaced during a strategic
879879+ merge patch.
880880+ items:
881881+ type: string
882882+ type: array
883883+ x-kubernetes-list-type: atomic
884884+ required:
885885+ - key
886886+ - operator
887887+ type: object
888888+ type: array
889889+ x-kubernetes-list-type: atomic
890890+ matchLabels:
891891+ additionalProperties:
892892+ type: string
893893+ description: |-
894894+ matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels
895895+ map is equivalent to an element of matchExpressions, whose key field is "key", the
896896+ operator is "In", and the values array contains only "value". The requirements are ANDed.
897897+ type: object
898898+ type: object
899899+ x-kubernetes-map-type: atomic
900900+ matchLabelKeys:
901901+ description: |-
902902+ MatchLabelKeys is a set of pod label keys to select which pods will
903903+ be taken into consideration. The keys are used to lookup values from the
904904+ incoming pod labels, those key-value labels are merged with `labelSelector` as `key in (value)`
905905+ to select the group of existing pods which pods will be taken into consideration
906906+ for the incoming pod's pod (anti) affinity. Keys that don't exist in the incoming
907907+ pod labels will be ignored. The default value is empty.
908908+ The same key is forbidden to exist in both matchLabelKeys and labelSelector.
909909+ Also, matchLabelKeys cannot be set when labelSelector isn't set.
910910+ items:
911911+ type: string
912912+ type: array
913913+ x-kubernetes-list-type: atomic
914914+ mismatchLabelKeys:
915915+ description: |-
916916+ MismatchLabelKeys is a set of pod label keys to select which pods will
917917+ be taken into consideration. The keys are used to lookup values from the
918918+ incoming pod labels, those key-value labels are merged with `labelSelector` as `key notin (value)`
919919+ to select the group of existing pods which pods will be taken into consideration
920920+ for the incoming pod's pod (anti) affinity. Keys that don't exist in the incoming
921921+ pod labels will be ignored. The default value is empty.
922922+ The same key is forbidden to exist in both mismatchLabelKeys and labelSelector.
923923+ Also, mismatchLabelKeys cannot be set when labelSelector isn't set.
924924+ items:
925925+ type: string
926926+ type: array
927927+ x-kubernetes-list-type: atomic
928928+ namespaceSelector:
929929+ description: |-
930930+ A label query over the set of namespaces that the term applies to.
931931+ The term is applied to the union of the namespaces selected by this field
932932+ and the ones listed in the namespaces field.
933933+ null selector and null or empty namespaces list means "this pod's namespace".
934934+ An empty selector ({}) matches all namespaces.
935935+ properties:
936936+ matchExpressions:
937937+ description: matchExpressions is a list of label
938938+ selector requirements. The requirements are
939939+ ANDed.
940940+ items:
941941+ description: |-
942942+ A label selector requirement is a selector that contains values, a key, and an operator that
943943+ relates the key and values.
944944+ properties:
945945+ key:
946946+ description: key is the label key that
947947+ the selector applies to.
948948+ type: string
949949+ operator:
950950+ description: |-
951951+ operator represents a key's relationship to a set of values.
952952+ Valid operators are In, NotIn, Exists and DoesNotExist.
953953+ type: string
954954+ values:
955955+ description: |-
956956+ values is an array of string values. If the operator is In or NotIn,
957957+ the values array must be non-empty. If the operator is Exists or DoesNotExist,
958958+ the values array must be empty. This array is replaced during a strategic
959959+ merge patch.
960960+ items:
961961+ type: string
962962+ type: array
963963+ x-kubernetes-list-type: atomic
964964+ required:
965965+ - key
966966+ - operator
967967+ type: object
968968+ type: array
969969+ x-kubernetes-list-type: atomic
970970+ matchLabels:
971971+ additionalProperties:
972972+ type: string
973973+ description: |-
974974+ matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels
975975+ map is equivalent to an element of matchExpressions, whose key field is "key", the
976976+ operator is "In", and the values array contains only "value". The requirements are ANDed.
977977+ type: object
978978+ type: object
979979+ x-kubernetes-map-type: atomic
980980+ namespaces:
981981+ description: |-
982982+ namespaces specifies a static list of namespace names that the term applies to.
983983+ The term is applied to the union of the namespaces listed in this field
984984+ and the ones selected by namespaceSelector.
985985+ null or empty namespaces list and null namespaceSelector means "this pod's namespace".
986986+ items:
987987+ type: string
988988+ type: array
989989+ x-kubernetes-list-type: atomic
990990+ topologyKey:
991991+ description: |-
992992+ This pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching
993993+ the labelSelector in the specified namespaces, where co-located is defined as running on a node
994994+ whose value of the label with key topologyKey matches that of any node on which any of the
995995+ selected pods is running.
996996+ Empty topologyKey is not allowed.
997997+ type: string
998998+ required:
999999+ - topologyKey
10001000+ type: object
10011001+ type: array
10021002+ x-kubernetes-list-type: atomic
10031003+ type: object
10041004+ type: object
10051005+ nodeSelector:
10061006+ additionalProperties:
10071007+ type: string
10081008+ description: NodeSelector is a selector which must be true for
10091009+ the pod to fit on a node.
10101010+ type: object
10111011+ resources:
10121012+ description: Resources defines the compute resource requirements
10131013+ for spindle jobs.
10141014+ properties:
10151015+ claims:
10161016+ description: |-
10171017+ Claims lists the names of resources, defined in spec.resourceClaims,
10181018+ that are used by this container.
10191019+10201020+ This is an alpha field and requires enabling the
10211021+ DynamicResourceAllocation feature gate.
10221022+10231023+ This field is immutable. It can only be set for containers.
10241024+ items:
10251025+ description: ResourceClaim references one entry in PodSpec.ResourceClaims.
10261026+ properties:
10271027+ name:
10281028+ description: |-
10291029+ Name must match the name of one entry in pod.spec.resourceClaims of
10301030+ the Pod where this field is used. It makes that resource available
10311031+ inside a container.
10321032+ type: string
10331033+ request:
10341034+ description: |-
10351035+ Request is the name chosen for a request in the referenced claim.
10361036+ If empty, everything from the claim is made available, otherwise
10371037+ only the result of this request.
10381038+ type: string
10391039+ required:
10401040+ - name
10411041+ type: object
10421042+ type: array
10431043+ x-kubernetes-list-map-keys:
10441044+ - name
10451045+ x-kubernetes-list-type: map
10461046+ limits:
10471047+ additionalProperties:
10481048+ anyOf:
10491049+ - type: integer
10501050+ - type: string
10511051+ pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
10521052+ x-kubernetes-int-or-string: true
10531053+ description: |-
10541054+ Limits describes the maximum amount of compute resources allowed.
10551055+ More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
10561056+ type: object
10571057+ requests:
10581058+ additionalProperties:
10591059+ anyOf:
10601060+ - type: integer
10611061+ - type: string
10621062+ pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
10631063+ x-kubernetes-int-or-string: true
10641064+ description: |-
10651065+ Requests describes the minimum amount of compute resources required.
10661066+ If Requests is omitted for a container, it defaults to Limits if that is explicitly specified,
10671067+ otherwise to an implementation-defined value. Requests cannot exceed Limits.
10681068+ More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
10691069+ type: object
10701070+ type: object
10711071+ tolerations:
10721072+ description: Tolerations allows pods to schedule onto nodes with
10731073+ matching taints.
10741074+ items:
10751075+ description: |-
10761076+ The pod this Toleration is attached to tolerates any taint that matches
10771077+ the triple <key,value,effect> using the matching operator <operator>.
10781078+ properties:
10791079+ effect:
10801080+ description: |-
10811081+ Effect indicates the taint effect to match. Empty means match all taint effects.
10821082+ When specified, allowed values are NoSchedule, PreferNoSchedule and NoExecute.
10831083+ type: string
10841084+ key:
10851085+ description: |-
10861086+ Key is the taint key that the toleration applies to. Empty means match all taint keys.
10871087+ If the key is empty, operator must be Exists; this combination means to match all values and all keys.
10881088+ type: string
10891089+ operator:
10901090+ description: |-
10911091+ Operator represents a key's relationship to the value.
10921092+ Valid operators are Exists and Equal. Defaults to Equal.
10931093+ Exists is equivalent to wildcard for value, so that a pod can
10941094+ tolerate all taints of a particular category.
10951095+ type: string
10961096+ tolerationSeconds:
10971097+ description: |-
10981098+ TolerationSeconds represents the period of time the toleration (which must be
10991099+ of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default,
11001100+ it is not set, which means tolerate the taint forever (do not evict). Zero and
11011101+ negative values will be treated as 0 (evict immediately) by the system.
11021102+ format: int64
11031103+ type: integer
11041104+ value:
11051105+ description: |-
11061106+ Value is the taint value the toleration matches to.
11071107+ If the operator is Exists, the value should be empty, otherwise just a regular string.
11081108+ type: string
11091109+ type: object
11101110+ type: array
11111111+ type: object
11121112+ required:
11131113+ - knotAuthSecret
11141114+ - knotUrl
11151115+ type: object
11161116+ status:
11171117+ description: SpindleSetStatus defines the observed state of SpindleSet.
11181118+ properties:
11191119+ completedJobs:
11201120+ description: CompletedJobs is the total number of spindle jobs that
11211121+ have completed successfully.
11221122+ format: int32
11231123+ type: integer
11241124+ conditions:
11251125+ description: Conditions represent the latest available observations
11261126+ of the SpindleSet's state.
11271127+ items:
11281128+ description: Condition contains details for one aspect of the current
11291129+ state of this API Resource.
11301130+ properties:
11311131+ lastTransitionTime:
11321132+ description: |-
11331133+ lastTransitionTime is the last time the condition transitioned from one status to another.
11341134+ This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
11351135+ format: date-time
11361136+ type: string
11371137+ message:
11381138+ description: |-
11391139+ message is a human readable message indicating details about the transition.
11401140+ This may be an empty string.
11411141+ maxLength: 32768
11421142+ type: string
11431143+ observedGeneration:
11441144+ description: |-
11451145+ observedGeneration represents the .metadata.generation that the condition was set based upon.
11461146+ For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date
11471147+ with respect to the current state of the instance.
11481148+ format: int64
11491149+ minimum: 0
11501150+ type: integer
11511151+ reason:
11521152+ description: |-
11531153+ reason contains a programmatic identifier indicating the reason for the condition's last transition.
11541154+ Producers of specific condition types may define expected values and meanings for this field,
11551155+ and whether the values are considered a guaranteed API.
11561156+ The value should be a CamelCase string.
11571157+ This field may not be empty.
11581158+ maxLength: 1024
11591159+ minLength: 1
11601160+ pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
11611161+ type: string
11621162+ status:
11631163+ description: status of the condition, one of True, False, Unknown.
11641164+ enum:
11651165+ - "True"
11661166+ - "False"
11671167+ - Unknown
11681168+ type: string
11691169+ type:
11701170+ description: type of condition in CamelCase or in foo.example.com/CamelCase.
11711171+ maxLength: 316
11721172+ pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
11731173+ type: string
11741174+ required:
11751175+ - lastTransitionTime
11761176+ - message
11771177+ - reason
11781178+ - status
11791179+ - type
11801180+ type: object
11811181+ type: array
11821182+ x-kubernetes-list-map-keys:
11831183+ - type
11841184+ x-kubernetes-list-type: map
11851185+ failedJobs:
11861186+ description: FailedJobs is the total number of spindle jobs that have
11871187+ failed.
11881188+ format: int32
11891189+ type: integer
11901190+ lastEventTime:
11911191+ description: LastEventTime is the timestamp of the last pipeline event
11921192+ received from the knot.
11931193+ format: date-time
11941194+ type: string
11951195+ pendingJobs:
11961196+ description: PendingJobs is the number of spindle jobs currently pending.
11971197+ format: int32
11981198+ type: integer
11991199+ runningJobs:
12001200+ description: RunningJobs is the number of spindle jobs currently running.
12011201+ format: int32
12021202+ type: integer
12031203+ webSocketConnected:
12041204+ description: WebSocketConnected indicates whether the WebSocket connection
12051205+ to the knot is active.
12061206+ type: boolean
12071207+ type: object
12081208+ type: object
12091209+ served: true
12101210+ storage: true
12111211+ subresources:
12121212+ status: {}
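The status block above is the standard Kubernetes conditions pattern: a list-map keyed by `type`, served through the `status` subresource. As a minimal sketch (not part of this diff, and assuming the generated Go types use `[]metav1.Condition` for `status.conditions`), this is how the controller could maintain a hypothetical `Ready` condition with the apimachinery helper, which manages `lastTransitionTime` and in-place updates:

```go
package controller

import (
	"context"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	loomv1alpha1 "tangled.org/evan.jarrett.net/loom/api/v1alpha1"
)

// markKnotConnectivity is a hypothetical helper, not part of this change.
func (r *SpindleSetReconciler) markKnotConnectivity(ctx context.Context, ss *loomv1alpha1.SpindleSet, connected bool) error {
	cond := metav1.Condition{
		Type:               "Ready", // assumed condition type, for illustration only
		Status:             metav1.ConditionFalse,
		Reason:             "KnotDisconnected",
		Message:            "WebSocket connection to the knot is down",
		ObservedGeneration: ss.Generation,
	}
	if connected {
		cond.Status = metav1.ConditionTrue
		cond.Reason = "KnotConnected"
		cond.Message = "WebSocket connection to the knot is active"
	}
	// SetStatusCondition only bumps lastTransitionTime when the status value
	// actually changes, matching the schema's documented semantics.
	meta.SetStatusCondition(&ss.Status.Conditions, cond)
	// status is a subresource (see `subresources: status: {}` above), so it
	// must be written through the status client.
	return r.Status().Update(ctx, ss)
}
```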
+16
config/crd/kustomization.yaml
···11+# This kustomization.yaml is not intended to be run by itself,
22+# since it depends on service name and namespace that are out of this kustomize package.
33+# It should be run by config/default
44+resources:
55+- bases/loom.j5t.io_spindlesets.yaml
66+# +kubebuilder:scaffold:crdkustomizeresource
77+88+patches:
99+# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix.
1010+# patches here are for enabling the conversion webhook for each CRD
1111+# +kubebuilder:scaffold:crdkustomizewebhookpatch
1212+1313+# [WEBHOOK] To enable webhook, uncomment the following section
1414+# the following config is for teaching kustomize how to do kustomization for CRDs.
1515+#configurations:
1616+#- kustomizeconfig.yaml
+19
config/crd/kustomizeconfig.yaml
···11+# This file is for teaching kustomize how to substitute name and namespace reference in CRD
22+nameReference:
33+- kind: Service
44+ version: v1
55+ fieldSpecs:
66+ - kind: CustomResourceDefinition
77+ version: v1
88+ group: apiextensions.k8s.io
99+ path: spec/conversion/webhook/clientConfig/service/name
1010+1111+namespace:
1212+- kind: CustomResourceDefinition
1313+ version: v1
1414+ group: apiextensions.k8s.io
1515+ path: spec/conversion/webhook/clientConfig/service/namespace
1616+ create: false
1717+1818+varReference:
1919+- path: metadata/annotations
+30
config/default/cert_metrics_manager_patch.yaml
···11+# This patch adds the args, volume, and volumeMount that allow the manager to use the metrics-server certs.
22+33+# Add the volumeMount for the metrics-server certs
44+- op: add
55+ path: /spec/template/spec/containers/0/volumeMounts/-
66+ value:
77+ mountPath: /tmp/k8s-metrics-server/metrics-certs
88+ name: metrics-certs
99+ readOnly: true
1010+1111+# Add the --metrics-cert-path argument for the metrics server
1212+- op: add
1313+ path: /spec/template/spec/containers/0/args/-
1414+ value: --metrics-cert-path=/tmp/k8s-metrics-server/metrics-certs
1515+1616+# Add the metrics-server certs volume configuration
1717+- op: add
1818+ path: /spec/template/spec/volumes/-
1919+ value:
2020+ name: metrics-certs
2121+ secret:
2222+ secretName: metrics-server-cert
2323+ optional: false
2424+ items:
2525+ - key: ca.crt
2626+ path: ca.crt
2727+ - key: tls.crt
2828+ path: tls.crt
2929+ - key: tls.key
3030+ path: tls.key
+234
config/default/kustomization.yaml
···11+# Adds namespace to all resources.
22+namespace: loom-system
33+44+# Value of this field is prepended to the
55+# names of all resources, e.g. a deployment named
66+# "wordpress" becomes "alices-wordpress".
77+# Note that it should also match with the prefix (text before '-') of the namespace
88+# field above.
99+namePrefix: loom-
1010+1111+# Labels to add to all resources and selectors.
1212+#labels:
1313+#- includeSelectors: true
1414+# pairs:
1515+# someName: someValue
1616+1717+resources:
1818+- ../crd
1919+- ../rbac
2020+- ../manager
2121+# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in
2222+# crd/kustomization.yaml
2323+#- ../webhook
2424+# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER'. 'WEBHOOK' components are required.
2525+#- ../certmanager
2626+# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
2727+#- ../prometheus
2828+# [METRICS] Expose the controller manager metrics service.
2929+- metrics_service.yaml
3030+# [NETWORK POLICY] Protect the /metrics endpoint and Webhook Server with NetworkPolicy.
3131+# Only Pod(s) running a namespace labeled with 'metrics: enabled' will be able to gather the metrics.
3232+# Only CR(s) which requires webhooks and are applied on namespaces labeled with 'webhooks: enabled' will
3333+# be able to communicate with the Webhook Server.
3434+#- ../network-policy
3535+3636+# The following patches are enabled because this project exposes Metrics.
3737+patches:
3838+# [METRICS] The following patch will enable the metrics endpoint using HTTPS and the port :8443.
3939+# More info: https://book.kubebuilder.io/reference/metrics
4040+- path: manager_metrics_patch.yaml
4141+ target:
4242+ kind: Deployment
4343+4444+# Uncomment the patches line if you enable Metrics and CertManager
4545+# [METRICS-WITH-CERTS] To enable metrics protected with certManager, uncomment the following line.
4646+# This patch will protect the metrics with certManager self-signed certs.
4747+#- path: cert_metrics_manager_patch.yaml
4848+# target:
4949+# kind: Deployment
5050+5151+# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in
5252+# crd/kustomization.yaml
5353+#- path: manager_webhook_patch.yaml
5454+# target:
5555+# kind: Deployment
5656+5757+# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER' prefix.
5858+# Uncomment the following replacements to add the cert-manager CA injection annotations
5959+#replacements:
6060+# - source: # Uncomment the following block to enable certificates for metrics
6161+# kind: Service
6262+# version: v1
6363+# name: controller-manager-metrics-service
6464+# fieldPath: metadata.name
6565+# targets:
6666+# - select:
6767+# kind: Certificate
6868+# group: cert-manager.io
6969+# version: v1
7070+# name: metrics-certs
7171+# fieldPaths:
7272+# - spec.dnsNames.0
7373+# - spec.dnsNames.1
7474+# options:
7575+# delimiter: '.'
7676+# index: 0
7777+# create: true
7878+# - select: # Uncomment the following to set the Service name for TLS config in Prometheus ServiceMonitor
7979+# kind: ServiceMonitor
8080+# group: monitoring.coreos.com
8181+# version: v1
8282+# name: controller-manager-metrics-monitor
8383+# fieldPaths:
8484+# - spec.endpoints.0.tlsConfig.serverName
8585+# options:
8686+# delimiter: '.'
8787+# index: 0
8888+# create: true
8989+#
9090+# - source:
9191+# kind: Service
9292+# version: v1
9393+# name: controller-manager-metrics-service
9494+# fieldPath: metadata.namespace
9595+# targets:
9696+# - select:
9797+# kind: Certificate
9898+# group: cert-manager.io
9999+# version: v1
100100+# name: metrics-certs
101101+# fieldPaths:
102102+# - spec.dnsNames.0
103103+# - spec.dnsNames.1
104104+# options:
105105+# delimiter: '.'
106106+# index: 1
107107+# create: true
108108+# - select: # Uncomment the following to set the Service namespace for TLS in Prometheus ServiceMonitor
109109+# kind: ServiceMonitor
110110+# group: monitoring.coreos.com
111111+# version: v1
112112+# name: controller-manager-metrics-monitor
113113+# fieldPaths:
114114+# - spec.endpoints.0.tlsConfig.serverName
115115+# options:
116116+# delimiter: '.'
117117+# index: 1
118118+# create: true
119119+#
120120+# - source: # Uncomment the following block if you have any webhook
121121+# kind: Service
122122+# version: v1
123123+# name: webhook-service
124124+# fieldPath: .metadata.name # Name of the service
125125+# targets:
126126+# - select:
127127+# kind: Certificate
128128+# group: cert-manager.io
129129+# version: v1
130130+# name: serving-cert
131131+# fieldPaths:
132132+# - .spec.dnsNames.0
133133+# - .spec.dnsNames.1
134134+# options:
135135+# delimiter: '.'
136136+# index: 0
137137+# create: true
138138+# - source:
139139+# kind: Service
140140+# version: v1
141141+# name: webhook-service
142142+# fieldPath: .metadata.namespace # Namespace of the service
143143+# targets:
144144+# - select:
145145+# kind: Certificate
146146+# group: cert-manager.io
147147+# version: v1
148148+# name: serving-cert
149149+# fieldPaths:
150150+# - .spec.dnsNames.0
151151+# - .spec.dnsNames.1
152152+# options:
153153+# delimiter: '.'
154154+# index: 1
155155+# create: true
156156+#
157157+# - source: # Uncomment the following block if you have a ValidatingWebhook (--programmatic-validation)
158158+# kind: Certificate
159159+# group: cert-manager.io
160160+# version: v1
161161+# name: serving-cert # This name should match the one in certificate.yaml
162162+# fieldPath: .metadata.namespace # Namespace of the certificate CR
163163+# targets:
164164+# - select:
165165+# kind: ValidatingWebhookConfiguration
166166+# fieldPaths:
167167+# - .metadata.annotations.[cert-manager.io/inject-ca-from]
168168+# options:
169169+# delimiter: '/'
170170+# index: 0
171171+# create: true
172172+# - source:
173173+# kind: Certificate
174174+# group: cert-manager.io
175175+# version: v1
176176+# name: serving-cert
177177+# fieldPath: .metadata.name
178178+# targets:
179179+# - select:
180180+# kind: ValidatingWebhookConfiguration
181181+# fieldPaths:
182182+# - .metadata.annotations.[cert-manager.io/inject-ca-from]
183183+# options:
184184+# delimiter: '/'
185185+# index: 1
186186+# create: true
187187+#
188188+# - source: # Uncomment the following block if you have a DefaultingWebhook (--defaulting )
189189+# kind: Certificate
190190+# group: cert-manager.io
191191+# version: v1
192192+# name: serving-cert
193193+# fieldPath: .metadata.namespace # Namespace of the certificate CR
194194+# targets:
195195+# - select:
196196+# kind: MutatingWebhookConfiguration
197197+# fieldPaths:
198198+# - .metadata.annotations.[cert-manager.io/inject-ca-from]
199199+# options:
200200+# delimiter: '/'
201201+# index: 0
202202+# create: true
203203+# - source:
204204+# kind: Certificate
205205+# group: cert-manager.io
206206+# version: v1
207207+# name: serving-cert
208208+# fieldPath: .metadata.name
209209+# targets:
210210+# - select:
211211+# kind: MutatingWebhookConfiguration
212212+# fieldPaths:
213213+# - .metadata.annotations.[cert-manager.io/inject-ca-from]
214214+# options:
215215+# delimiter: '/'
216216+# index: 1
217217+# create: true
218218+#
219219+# - source: # Uncomment the following block if you have a ConversionWebhook (--conversion)
220220+# kind: Certificate
221221+# group: cert-manager.io
222222+# version: v1
223223+# name: serving-cert
224224+# fieldPath: .metadata.namespace # Namespace of the certificate CR
225225+# targets: # Do not remove or uncomment the following scaffold marker; required to generate code for target CRD.
226226+# +kubebuilder:scaffold:crdkustomizecainjectionns
227227+# - source:
228228+# kind: Certificate
229229+# group: cert-manager.io
230230+# version: v1
231231+# name: serving-cert
232232+# fieldPath: .metadata.name
233233+# targets: # Do not remove or uncomment the following scaffold marker; required to generate code for target CRD.
234234+# +kubebuilder:scaffold:crdkustomizecainjectionname
+4
config/default/manager_metrics_patch.yaml
···11+# This patch adds the args to allow exposing the metrics endpoint using HTTPS
22+- op: add
33+ path: /spec/template/spec/containers/0/args/0
44+ value: --metrics-bind-address=:8443
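For context, this patch only passes the flag; the flag itself is consumed in `cmd/main.go`, which is not shown in this diff. A minimal sketch of the standard kubebuilder wiring (assumed, since `main.go` isn't included here) that turns `--metrics-bind-address=:8443` into controller-runtime's secure metrics server:

```go
package main

import (
	"flag"

	ctrl "sigs.k8s.io/controller-runtime"
	metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
)

func main() {
	var metricsAddr string
	var secureMetrics bool
	flag.StringVar(&metricsAddr, "metrics-bind-address", "0",
		"The address the metrics endpoint binds to. Use :8443 for HTTPS or 0 to disable.")
	flag.BoolVar(&secureMetrics, "metrics-secure", true,
		"Serve the metrics endpoint securely via HTTPS.")
	flag.Parse()

	// The patched arg lands in metricsAddr; SecureServing makes
	// controller-runtime serve /metrics over TLS on that port.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Metrics: metricsserver.Options{
			BindAddress:   metricsAddr,
			SecureServing: secureMetrics,
		},
	})
	if err != nil {
		panic(err)
	}
	_ = mgr // register controllers and start the manager here
}
```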
+28
config/manifests/kustomization.yaml
···11+# These resources constitute the fully configured set of manifests
22+# used to generate the 'manifests/' directory in a bundle.
33+resources:
44+- bases/loom.clusterserviceversion.yaml
55+- ../default
66+- ../samples
77+- ../scorecard
88+99+# [WEBHOOK] To enable webhooks, uncomment all the sections with [WEBHOOK] prefix.
1010+# Do NOT uncomment sections with prefix [CERTMANAGER], as OLM does not support cert-manager.
1111+# These patches remove the unnecessary "cert" volume and its manager container volumeMount.
1212+#patches:
1313+#- target:
1414+# group: apps
1515+# version: v1
1616+# kind: Deployment
1717+# name: controller-manager
1818+# namespace: system
1919+# patch: |-
2020+# # Remove the manager container's "cert" volumeMount, since OLM will create and mount a set of certs.
2121+# # Update the indices in this path if adding or removing containers/volumeMounts in the manager's Deployment.
2222+# - op: remove
2323+# path: /spec/template/spec/containers/0/volumeMounts/0
2525+# # Remove the "cert" volume, since OLM will create and mount a set of certs.
2626+# # Update the indices in this path if adding or removing volumes in the manager's Deployment.
2727+# - op: remove
2828+# path: /spec/template/spec/volumes/0
+27
config/network-policy/allow-metrics-traffic.yaml
···11+# This NetworkPolicy allows ingress traffic
22+# from Pods running in namespaces labeled with 'metrics: enabled'. Only Pods in those
33+# namespaces are able to gather data from the metrics endpoint.
44+apiVersion: networking.k8s.io/v1
55+kind: NetworkPolicy
66+metadata:
77+ labels:
88+ app.kubernetes.io/name: loom
99+ app.kubernetes.io/managed-by: kustomize
1010+ name: allow-metrics-traffic
1111+ namespace: system
1212+spec:
1313+ podSelector:
1414+ matchLabels:
1515+ control-plane: controller-manager
1616+ app.kubernetes.io/name: loom
1717+ policyTypes:
1818+ - Ingress
1919+ ingress:
2020+ # This allows ingress traffic from any namespace with the label metrics: enabled
2121+ - from:
2222+ - namespaceSelector:
2323+ matchLabels:
2424+ metrics: enabled # Only from namespaces with this label
2525+ ports:
2626+ - port: 8443
2727+ protocol: TCP
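The policy only admits traffic from namespaces carrying the `metrics: enabled` label. As a minimal sketch (illustrative, not part of this diff), this is how a scraper's namespace could be granted access with controller-runtime's client; the helper name and the namespace passed in are assumptions:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// allowScraping labels a namespace so its Pods match the NetworkPolicy's
// namespaceSelector (metrics: enabled).
func allowScraping(ctx context.Context, c client.Client, namespace string) error {
	ns := &corev1.Namespace{}
	if err := c.Get(ctx, client.ObjectKey{Name: namespace}, ns); err != nil {
		return err
	}
	if ns.Labels == nil {
		ns.Labels = map[string]string{}
	}
	ns.Labels["metrics"] = "enabled"
	return c.Update(ctx, ns)
}
```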
+11
config/prometheus/kustomization.yaml
···11+resources:
22+- monitor.yaml
33+44+# [PROMETHEUS-WITH-CERTS] The following patch configures the ServiceMonitor in ../prometheus
55+# to securely reference certificates created and managed by cert-manager.
66+# Additionally, ensure that you uncomment the [METRICS-WITH-CERTS] patch under config/default/kustomization.yaml
77+# to mount the "metrics-server-cert" secret in the Manager Deployment.
88+#patches:
99+# - path: monitor_tls_patch.yaml
1010+# target:
1111+# kind: ServiceMonitor
+27
config/prometheus/monitor.yaml
···11+# Prometheus Monitor Service (Metrics)
22+apiVersion: monitoring.coreos.com/v1
33+kind: ServiceMonitor
44+metadata:
55+ labels:
66+ control-plane: controller-manager
77+ app.kubernetes.io/name: loom
88+ app.kubernetes.io/managed-by: kustomize
99+ name: controller-manager-metrics-monitor
1010+ namespace: system
1111+spec:
1212+ endpoints:
1313+ - path: /metrics
1414+ port: https # Ensure this is the name of the port that exposes HTTPS metrics
1515+ scheme: https
1616+ bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
1717+ tlsConfig:
1818+ # TODO(user): The option insecureSkipVerify: true is not recommended for production since it disables
1919+ # certificate verification, exposing the system to potential man-in-the-middle attacks.
2020+ # For production environments, it is recommended to use cert-manager for automatic TLS certificate management.
2121+ # To apply this configuration, enable cert-manager and use the patch located at config/prometheus/monitor_tls_patch.yaml,
2222+ # which securely references the certificate from the 'metrics-server-cert' secret.
2323+ insecureSkipVerify: true
2424+ selector:
2525+ matchLabels:
2626+ control-plane: controller-manager
2727+ app.kubernetes.io/name: loom
+19
config/prometheus/monitor_tls_patch.yaml
···11+# Patch for Prometheus ServiceMonitor to enable secure TLS configuration
22+# using certificates managed by cert-manager
33+- op: replace
44+ path: /spec/endpoints/0/tlsConfig
55+ value:
66+ # SERVICE_NAME and SERVICE_NAMESPACE will be substituted by kustomize
77+ serverName: SERVICE_NAME.SERVICE_NAMESPACE.svc
88+ insecureSkipVerify: false
99+ ca:
1010+ secret:
1111+ name: metrics-server-cert
1212+ key: ca.crt
1313+ cert:
1414+ secret:
1515+ name: metrics-server-cert
1616+ key: tls.crt
1717+ keySecret:
1818+ name: metrics-server-cert
1919+ key: tls.key
+28
config/rbac/kustomization.yaml
···11+resources:
22+# All RBAC will be applied under this service account in
33+# the deployment namespace. You may comment out this resource
44+# if your manager will use a service account that exists at
55+# runtime. Be sure to update RoleBinding and ClusterRoleBinding
66+# subjects if changing service account names.
77+- service_account.yaml
88+- role.yaml
99+- role_binding.yaml
1010+- leader_election_role.yaml
1111+- leader_election_role_binding.yaml
1212+# The following RBAC configurations are used to protect
1313+# the metrics endpoint with authn/authz. These configurations
1414+# ensure that only authorized users and service accounts
1515+# can access the metrics endpoint. Comment the following
1616+# permissions if you want to disable this protection.
1717+# More info: https://book.kubebuilder.io/reference/metrics.html
1818+- metrics_auth_role.yaml
1919+- metrics_auth_role_binding.yaml
2020+- metrics_reader_role.yaml
2121+# For each CRD, "Admin", "Editor" and "Viewer" roles are scaffolded by
2222+# default, aiding admins in cluster management. Those roles are
2323+# not used by loom itself. You can comment out the following lines
2424+# if you do not want those helpers to be installed with your project.
2525+- spindleset_admin_role.yaml
2626+- spindleset_editor_role.yaml
2727+- spindleset_viewer_role.yaml
2828+
+27
config/rbac/spindleset_admin_role.yaml
···11+# This rule is not used by the loom project itself.
22+# It is provided to allow the cluster admin to help manage permissions for users.
33+#
44+# Grants full permissions ('*') over the loom.j5t.io API group.
55+# This role is intended for users authorized to modify roles and bindings within the cluster,
66+# enabling them to delegate specific permissions to other users or groups as needed.
77+88+apiVersion: rbac.authorization.k8s.io/v1
99+kind: ClusterRole
1010+metadata:
1111+ labels:
1212+ app.kubernetes.io/name: loom
1313+ app.kubernetes.io/managed-by: kustomize
1414+ name: spindleset-admin-role
1515+rules:
1616+- apiGroups:
1717+ - loom.j5t.io
1818+ resources:
1919+ - spindlesets
2020+ verbs:
2121+ - '*'
2222+- apiGroups:
2323+ - loom.j5t.io
2424+ resources:
2525+ - spindlesets/status
2626+ verbs:
2727+ - get
+33
config/rbac/spindleset_editor_role.yaml
···11+# This rule is not used by the loom project itself.
22+# It is provided to allow the cluster admin to help manage permissions for users.
33+#
44+# Grants permissions to create, update, and delete resources within the loom.j5t.io API group.
55+# This role is intended for users who need to manage these resources
66+# but should not control RBAC or manage permissions for others.
77+88+apiVersion: rbac.authorization.k8s.io/v1
99+kind: ClusterRole
1010+metadata:
1111+ labels:
1212+ app.kubernetes.io/name: loom
1313+ app.kubernetes.io/managed-by: kustomize
1414+ name: spindleset-editor-role
1515+rules:
1616+- apiGroups:
1717+ - loom.j5t.io
1818+ resources:
1919+ - spindlesets
2020+ verbs:
2121+ - create
2222+ - delete
2323+ - get
2424+ - list
2525+ - patch
2626+ - update
2727+ - watch
2828+- apiGroups:
2929+ - loom.j5t.io
3030+ resources:
3131+ - spindlesets/status
3232+ verbs:
3333+ - get
+29
config/rbac/spindleset_viewer_role.yaml
···11+# This rule is not used by the loom project itself.
22+# It is provided to allow the cluster admin to help manage permissions for users.
33+#
44+# Grants read-only access to loom.j5t.io resources.
55+# This role is intended for users who need visibility into these resources
66+# without permissions to modify them. It is ideal for monitoring purposes and limited-access viewing.
77+88+apiVersion: rbac.authorization.k8s.io/v1
99+kind: ClusterRole
1010+metadata:
1111+ labels:
1212+ app.kubernetes.io/name: loom
1313+ app.kubernetes.io/managed-by: kustomize
1414+ name: spindleset-viewer-role
1515+rules:
1616+- apiGroups:
1717+ - loom.j5t.io
1818+ resources:
1919+ - spindlesets
2020+ verbs:
2121+ - get
2222+ - list
2323+ - watch
2424+- apiGroups:
2525+ - loom.j5t.io
2626+ resources:
2727+ - spindlesets/status
2828+ verbs:
2929+ - get
+4
config/samples/kustomization.yaml
···11+## Append samples of your project ##
22+resources:
33+- loom_v1alpha1_spindleset.yaml
44+# +kubebuilder:scaffold:manifestskustomizesamples
+35
config/samples/loom_v1alpha1_spindleset.yaml
···11+apiVersion: loom.j5t.io/v1alpha1
22+kind: SpindleSet
33+metadata:
44+ labels:
55+ app.kubernetes.io/name: loom
66+ app.kubernetes.io/managed-by: kustomize
77+ name: spindleset-sample
88+spec:
99+ # URL of the tangled.org knot to connect to
1010+ knotUrl: https://knot1.tangled.sh
1111+1212+ # Name of the Secret containing the auth token (key: token)
1313+ knotAuthSecret: spindle-auth
1414+1515+ # Maximum number of concurrent spindle jobs (default: 10)
1616+ maxConcurrentJobs: 5
1717+1818+ # Default template for spindle job pods
1919+ template:
2020+ resources:
2121+ requests:
2222+ cpu: "500m"
2323+ memory: "1Gi"
2424+ limits:
2525+ cpu: "2"
2626+ memory: "4Gi"
2727+2828+ # Optional: Node selector for pod placement
2929+ nodeSelector: {}
3030+3131+ # Optional: Tolerations for scheduling
3232+ tolerations: []
3333+3434+ # Optional: Pod affinity/anti-affinity rules
3535+ affinity: {}
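The sample presumes the `spindle-auth` Secret already exists with a `token` key. A minimal sketch, kept in Go to match the controller code, of creating both objects programmatically; the `SpindleSetSpec` field names (`KnotUrl`, `KnotAuthSecret`, `MaxConcurrentJobs`) are inferred from the YAML keys and CRD schema above and may not match the actual generated types:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"

	loomv1alpha1 "tangled.org/evan.jarrett.net/loom/api/v1alpha1"
)

func createSample(ctx context.Context, c client.Client, ns string) error {
	// The knotAuthSecret referenced by the sample; the "token" key name comes
	// from the comment in the sample manifest.
	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{Name: "spindle-auth", Namespace: ns},
		StringData: map[string]string{"token": "REPLACE_ME"},
	}
	if err := c.Create(ctx, secret); err != nil {
		return err
	}

	ss := &loomv1alpha1.SpindleSet{
		ObjectMeta: metav1.ObjectMeta{Name: "spindleset-sample", Namespace: ns},
		Spec: loomv1alpha1.SpindleSetSpec{ // field names are assumptions
			KnotUrl:           "https://knot1.tangled.sh",
			KnotAuthSecret:    "spindle-auth",
			MaxConcurrentJobs: 5,
		},
	}
	return c.Create(ctx, ss)
}
```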
···11+/*
22+Copyright 2025 Evan Jarrett.
33+44+Licensed under the Apache License, Version 2.0 (the "License");
55+you may not use this file except in compliance with the License.
66+You may obtain a copy of the License at
77+88+ http://www.apache.org/licenses/LICENSE-2.0
99+1010+Unless required by applicable law or agreed to in writing, software
1111+distributed under the License is distributed on an "AS IS" BASIS,
1212+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1313+See the License for the specific language governing permissions and
1414+limitations under the License.
1515+*/
+434
internal/controller/spindleset_controller.go
···11+/*
22+Copyright 2025 Evan Jarrett.
33+44+Licensed under the Apache License, Version 2.0 (the "License");
55+you may not use this file except in compliance with the License.
66+You may obtain a copy of the License at
77+88+ http://www.apache.org/licenses/LICENSE-2.0
99+1010+Unless required by applicable law or agreed to in writing, software
1111+distributed under the License is distributed on an "AS IS" BASIS,
1212+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1313+See the License for the specific language governing permissions and
1414+limitations under the License.
1515+*/
1616+1717+package controller
1818+1919+import (
2020+ "context"
2121+ "fmt"
2222+ "sync"
2323+2424+ "tangled.org/core/spindle"
2525+ "tangled.org/core/spindle/models"
2626+2727+ batchv1 "k8s.io/api/batch/v1"
2828+ corev1 "k8s.io/api/core/v1"
2929+ apierrors "k8s.io/apimachinery/pkg/api/errors"
3030+ metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
3131+ "k8s.io/apimachinery/pkg/runtime"
3232+ "k8s.io/apimachinery/pkg/util/intstr"
3333+ "k8s.io/client-go/rest"
3434+ ctrl "sigs.k8s.io/controller-runtime"
3535+ "sigs.k8s.io/controller-runtime/pkg/client"
3636+ "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
3737+ "sigs.k8s.io/controller-runtime/pkg/log"
3838+3939+ loomv1alpha1 "tangled.org/evan.jarrett.net/loom/api/v1alpha1"
4040+)
4141+4242+// SpindleSetReconciler reconciles a SpindleSet object
4343+type SpindleSetReconciler struct {
4444+ client.Client
4545+ Scheme *runtime.Scheme
4646+ Config *rest.Config
4747+ Spindle *spindle.Spindle
4848+4949+ // Track watched Jobs for status reporting
5050+ watchedJobs sync.Map // map[string]models.WorkflowId
5151+}
5252+5353+// +kubebuilder:rbac:groups=loom.j5t.io,resources=spindlesets,verbs=get;list;watch;create;update;patch;delete
5454+// +kubebuilder:rbac:groups=loom.j5t.io,resources=spindlesets/status,verbs=get;update;patch
5555+// +kubebuilder:rbac:groups=loom.j5t.io,resources=spindlesets/finalizers,verbs=update
5656+// +kubebuilder:rbac:groups=batch,resources=jobs,verbs=get;list;watch;create;update;patch;delete
5757+// +kubebuilder:rbac:groups=batch,resources=jobs/status,verbs=get
5858+// +kubebuilder:rbac:groups="",resources=pods,verbs=get;list;watch
5959+// +kubebuilder:rbac:groups="",resources=pods/log,verbs=get
6060+// +kubebuilder:rbac:groups="",resources=secrets,verbs=get;list;watch
6161+// +kubebuilder:rbac:groups="",resources=services,verbs=get;list;watch;create;update;patch;delete
6262+6363+// Reconcile is part of the main kubernetes reconciliation loop which aims to
6464+// move the current state of the cluster closer to the desired state.
6565+//
6666+// The Reconcile function handles:
6767+// - Fetching the SpindleSet resource
6868+// - Managing the WebSocket connection to the tangled.org knot
6969+// - Monitoring spindle Jobs and updating status
7070+// - Cleaning up completed Jobs
7171+//
7272+// For more details, check Reconcile and its Result here:
7373+// - https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.21.0/pkg/reconcile
7474+func (r *SpindleSetReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
7575+ logger := log.FromContext(ctx)
7676+7777+ // Fetch the SpindleSet instance
7878+ spindleSet := &loomv1alpha1.SpindleSet{}
7979+ if err := r.Get(ctx, req.NamespacedName, spindleSet); err != nil {
8080+ if apierrors.IsNotFound(err) {
8181+ // SpindleSet was deleted, nothing to do
8282+ logger.Info("SpindleSet resource not found, likely deleted")
8383+ return ctrl.Result{}, nil
8484+ }
8585+ logger.Error(err, "Failed to get SpindleSet")
8686+ return ctrl.Result{}, err
8787+ }
8888+8989+ logger.Info("Reconciling SpindleSet", "name", spindleSet.Name, "knotUrl", spindleSet.Spec.KnotUrl)
9090+9191+ // Add finalizer if not present
9292+ if !controllerutil.ContainsFinalizer(spindleSet, "loom.j5t.io/finalizer") {
9393+ controllerutil.AddFinalizer(spindleSet, "loom.j5t.io/finalizer")
9494+ if err := r.Update(ctx, spindleSet); err != nil {
9595+ return ctrl.Result{}, err
9696+ }
9797+ }
9898+9999+ // Handle deletion
100100+ if !spindleSet.DeletionTimestamp.IsZero() {
101101+ return r.handleDeletion(ctx, spindleSet)
102102+ }
103103+104104+ // Ensure spindle HTTP Service exists
105105+ if err := r.ensureSpindleService(ctx, spindleSet); err != nil {
106106+ logger.Error(err, "Failed to ensure spindle service")
107107+ // Don't return error - we'll retry on next reconcile
108108+ }
109109+110110+ // Monitor Job statuses and report to spindle DB
111111+ if err := r.monitorJobStatuses(ctx, spindleSet); err != nil {
112112+ logger.Error(err, "Failed to monitor job statuses")
113113+ // Don't return error - we'll retry on next reconcile
114114+ }
115115+116116+ // Update status based on current Jobs
117117+ if err := r.updateStatus(ctx, spindleSet); err != nil {
118118+ logger.Error(err, "Failed to update status")
119119+ return ctrl.Result{}, err
120120+ }
121121+122122+ // Requeue after 30 seconds to update status
123123+ return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
124124+}
125125+126126+// handleDeletion handles cleanup when SpindleSet is being deleted
127127+func (r *SpindleSetReconciler) handleDeletion(ctx context.Context, spindleSet *loomv1alpha1.SpindleSet) (ctrl.Result, error) {
128128+ logger := log.FromContext(ctx)
129129+ logger.Info("Handling SpindleSet deletion")
130130+131131+ // List all Jobs owned by this SpindleSet
132132+ jobList := &batchv1.JobList{}
133133+ if err := r.List(ctx, jobList, client.InNamespace(spindleSet.Namespace), client.MatchingLabels{
134134+ "loom.j5t.io/spindleset": spindleSet.Name,
135135+ }); err != nil {
136136+ logger.Error(err, "Failed to list Jobs for cleanup")
137137+ return ctrl.Result{}, err
138138+ }
139139+140140+ // Delete all Jobs
141141+ for _, job := range jobList.Items {
142142+ logger.Info("Deleting Job", "jobName", job.Name)
143143+ if err := r.Delete(ctx, &job); client.IgnoreNotFound(err) != nil {
144144+ logger.Error(err, "Failed to delete Job", "jobName", job.Name)
145145+ return ctrl.Result{}, err
146146+ }
147147+ }
148148+149149+ // Remove finalizer
150150+ controllerutil.RemoveFinalizer(spindleSet, "loom.j5t.io/finalizer")
151151+ if err := r.Update(ctx, spindleSet); err != nil {
152152+ return ctrl.Result{}, err
153153+ }
154154+155155+ logger.Info("SpindleSet cleanup complete")
156156+ return ctrl.Result{}, nil
157157+}
158158+159159+// updateStatus updates the SpindleSet status based on current Jobs
160160+func (r *SpindleSetReconciler) updateStatus(ctx context.Context, spindleSet *loomv1alpha1.SpindleSet) error {
161161+ logger := log.FromContext(ctx)
162162+163163+ // List all Jobs owned by this SpindleSet
164164+ jobList := &batchv1.JobList{}
165165+ if err := r.List(ctx, jobList, client.InNamespace(spindleSet.Namespace), client.MatchingLabels{
166166+ "loom.j5t.io/spindleset": spindleSet.Name,
167167+ }); err != nil {
168168+ return fmt.Errorf("failed to list Jobs: %w", err)
169169+ }
170170+171171+ // Count Jobs by status
172172+ var pending, running, completed, failed int32
173173+ for _, job := range jobList.Items {
174174+ if job.Status.Active > 0 {
175175+ running++
176176+ } else if job.Status.Succeeded > 0 {
177177+ completed++
178178+ } else if job.Status.Failed > 0 {
179179+ failed++
180180+ } else {
181181+ pending++
182182+ }
183183+ }
184184+185185+ // Update status
186186+ statusChanged := false
187187+ if spindleSet.Status.PendingJobs != pending {
188188+ spindleSet.Status.PendingJobs = pending
189189+ statusChanged = true
190190+ }
191191+ if spindleSet.Status.RunningJobs != running {
192192+ spindleSet.Status.RunningJobs = running
193193+ statusChanged = true
194194+ }
195195+ if spindleSet.Status.CompletedJobs != completed {
196196+ spindleSet.Status.CompletedJobs = completed
197197+ statusChanged = true
198198+ }
199199+ if spindleSet.Status.FailedJobs != failed {
200200+ spindleSet.Status.FailedJobs = failed
201201+ statusChanged = true
202202+ }
203203+204204+ // Update conditions
205205+ readyCondition := metav1.Condition{
206206+ Type: "Ready",
207207+ Status: metav1.ConditionTrue,
208208+ Reason: "SpindleSetReady",
209209+ Message: "SpindleSet is ready to process pipeline events",
210210+ LastTransitionTime: metav1.Now(),
211211+ ObservedGeneration: spindleSet.Generation,
212212+ }
213213+214214+ // Check if condition needs to be updated
215215+ existingCondition := findCondition(spindleSet.Status.Conditions, "Ready")
216216+ if existingCondition == nil || existingCondition.Status != readyCondition.Status {
217217+ setCondition(&spindleSet.Status.Conditions, readyCondition)
218218+ statusChanged = true
219219+ }
220220+221221+ // Only update if something changed
222222+ if statusChanged {
223223+ logger.Info("Updating SpindleSet status",
224224+ "pending", pending,
225225+ "running", running,
226226+ "completed", completed,
227227+ "failed", failed)
228228+229229+ if err := r.Status().Update(ctx, spindleSet); err != nil {
230230+ return fmt.Errorf("failed to update SpindleSet status: %w", err)
231231+ }
232232+ }
233233+234234+ return nil
235235+}
236236+237237+// findCondition finds a condition in the list by type
238238+func findCondition(conditions []metav1.Condition, conditionType string) *metav1.Condition {
239239+ for i := range conditions {
240240+ if conditions[i].Type == conditionType {
241241+ return &conditions[i]
242242+ }
243243+ }
244244+ return nil
245245+}
246246+247247+// setCondition adds or updates a condition in the list
248248+func setCondition(conditions *[]metav1.Condition, newCondition metav1.Condition) {
249249+ if *conditions == nil {
250250+ *conditions = []metav1.Condition{}
251251+ }
252252+253253+ for i := range *conditions {
254254+ if (*conditions)[i].Type == newCondition.Type {
255255+ (*conditions)[i] = newCondition
256256+ return
257257+ }
258258+ }
259259+260260+ *conditions = append(*conditions, newCondition)
261261+}
262262+263263+// ensureSpindleService ensures a Service exists to expose the spindle HTTP server
264264+func (r *SpindleSetReconciler) ensureSpindleService(ctx context.Context, spindleSet *loomv1alpha1.SpindleSet) error {
265265+ logger := log.FromContext(ctx)
266266+267267+ // Service name based on SpindleSet
268268+ serviceName := fmt.Sprintf("%s-spindle", spindleSet.Name)
269269+270270+ // Check if Service already exists
271271+ existingService := &corev1.Service{}
272272+ err := r.Get(ctx, client.ObjectKey{
273273+ Name: serviceName,
274274+ Namespace: spindleSet.Namespace,
275275+ }, existingService)
276276+277277+ if err == nil {
278278+ // Service already exists
279279+ logger.V(1).Info("Spindle service already exists", "service", serviceName)
280280+ return nil
281281+ }
282282+283283+ if !apierrors.IsNotFound(err) {
284284+ return fmt.Errorf("failed to get service: %w", err)
285285+ }
286286+287287+ // Create new Service
288288+ service := &corev1.Service{
289289+ ObjectMeta: metav1.ObjectMeta{
290290+ Name: serviceName,
291291+ Namespace: spindleSet.Namespace,
292292+ Labels: map[string]string{
293293+ "loom.j5t.io/component": "spindle-http",
294294+ "loom.j5t.io/spindleset": spindleSet.Name,
295295+ },
296296+ },
297297+ Spec: corev1.ServiceSpec{
298298+ Selector: map[string]string{
299299+ // Select the loom operator pod
300300+ // This assumes the operator deployment has this label
301301+ "control-plane": "controller-manager",
302302+ },
303303+ Ports: []corev1.ServicePort{
304304+ {
305305+ Name: "http",
306306+ Protocol: corev1.ProtocolTCP,
307307+ Port: 6555, // Default spindle HTTP port
308308+ TargetPort: intstr.FromInt(6555),
309309+ },
310310+ },
311311+ Type: corev1.ServiceTypeClusterIP,
312312+ },
313313+ }
314314+315315+ // Set SpindleSet as owner
316316+ if err := controllerutil.SetControllerReference(spindleSet, service, r.Scheme); err != nil {
317317+ return fmt.Errorf("failed to set controller reference: %w", err)
318318+ }
319319+320320+ logger.Info("Creating spindle HTTP service", "service", serviceName, "port", 6555)
321321+ if err := r.Create(ctx, service); err != nil {
322322+ return fmt.Errorf("failed to create service: %w", err)
323323+ }
324324+325325+ logger.Info("Spindle HTTP service created successfully", "service", serviceName)
326326+ return nil
327327+}
328328+329329+// monitorJobStatuses monitors Jobs and reports status changes to spindle DB
330330+func (r *SpindleSetReconciler) monitorJobStatuses(ctx context.Context, spindleSet *loomv1alpha1.SpindleSet) error {
331331+ logger := log.FromContext(ctx)
332332+333333+ // List all Jobs owned by this SpindleSet
334334+ jobList := &batchv1.JobList{}
335335+ if err := r.List(ctx, jobList, client.InNamespace(spindleSet.Namespace), client.MatchingLabels{
336336+ "loom.j5t.io/component": "spindle",
337337+ }); err != nil {
338338+ return fmt.Errorf("failed to list Jobs: %w", err)
339339+ }
340340+341341+ // Check each Job's status
342342+ for _, job := range jobList.Items {
343343+ jobKey := fmt.Sprintf("%s/%s", job.Namespace, job.Name)
344344+345345+ // Extract workflow ID from Job labels
346346+ pipelineID := job.Labels["loom.j5t.io/pipeline-id"]
347347+ workflowName := job.Labels["loom.j5t.io/workflow"]
348348+ knot := job.Annotations["loom.j5t.io/knot"]
349349+350350+ if pipelineID == "" || workflowName == "" {
351351+ continue
352352+ }
353353+354354+ wid := models.WorkflowId{
355355+ PipelineId: models.PipelineId{
356356+ Knot: knot,
357357+ Rkey: pipelineID,
358358+ },
359359+ Name: workflowName,
360360+ }
361361+362362+ // Check if this is a new Job or status changed
363363+ _, exists := r.watchedJobs.Load(jobKey)
364364+ newJob := !exists
365365+366366+ // Determine current status
367367+ var currentStatus string
368368+ if job.Status.Active > 0 {
369369+ currentStatus = "running"
370370+ } else if job.Status.Succeeded > 0 {
371371+ currentStatus = "succeeded"
372372+ } else if job.Status.Failed > 0 {
373373+ currentStatus = "failed"
374374+ } else {
375375+ currentStatus = "pending"
376376+ }
377377+378378+ // Report status changes to spindle DB
379379+ if newJob && currentStatus == "pending" {
380380+ logger.Info("Reporting pending status", "job", job.Name, "workflow", workflowName)
381381+ if err := r.Spindle.DB().StatusPending(wid, r.Spindle.Notifier()); err != nil {
382382+ logger.Error(err, "Failed to report pending status")
383383+ }
384384+ r.watchedJobs.Store(jobKey, wid)
385385+ }
386386+387387+ if currentStatus == "running" {
388388+ if newJob {
389389+ logger.Info("Reporting running status", "job", job.Name, "workflow", workflowName)
390390+ if err := r.Spindle.DB().StatusRunning(wid, r.Spindle.Notifier()); err != nil {
391391+ logger.Error(err, "Failed to report running status")
392392+ }
393393+ r.watchedJobs.Store(jobKey, wid)
394394+ }
395395+ }
396396+397397+ if currentStatus == "succeeded" {
398398+ logger.Info("Reporting success status", "job", job.Name, "workflow", workflowName)
399399+ if err := r.Spindle.DB().StatusSuccess(wid, r.Spindle.Notifier()); err != nil {
400400+ logger.Error(err, "Failed to report success status")
401401+ }
402402+ r.watchedJobs.Delete(jobKey)
403403+ }
404404+405405+ if currentStatus == "failed" {
406406+ logger.Info("Reporting failed status", "job", job.Name, "workflow", workflowName)
407407+ // Try to get error message from Job conditions
408408+ errorMsg := "Job failed"
409409+ for _, cond := range job.Status.Conditions {
410410+ if cond.Type == batchv1.JobFailed {
411411+ errorMsg = cond.Message
412412+ break
413413+ }
414414+ }
415415+ exitCode := int64(1)
416416+ if err := r.Spindle.DB().StatusFailed(wid, errorMsg, exitCode, r.Spindle.Notifier()); err != nil {
417417+ logger.Error(err, "Failed to report failed status")
418418+ }
419419+ r.watchedJobs.Delete(jobKey)
420420+ }
421421+ }
422422+423423+ return nil
424424+}
425425+426426+// SetupWithManager sets up the controller with the Manager.
427427+func (r *SpindleSetReconciler) SetupWithManager(mgr ctrl.Manager) error {
428428+ return ctrl.NewControllerManagedBy(mgr).
429429+ For(&loomv1alpha1.SpindleSet{}).
430430+ Owns(&batchv1.Job{}). // Watch Jobs owned by SpindleSet
431431+ Owns(&corev1.Service{}). // Watch Services owned by SpindleSet
432432+ Named("spindleset").
433433+ Complete(r)
434434+}
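For orientation, `monitorJobStatuses` above selects Jobs by the `loom.j5t.io/component: spindle` label and rebuilds the `WorkflowId` from two labels and one annotation. A sketch of the metadata a spindle Job is assumed to carry (values hypothetical):

```yaml
# Hypothetical metadata on a spindle Job, as the controller expects to find it.
metadata:
  labels:
    loom.j5t.io/component: spindle      # selects the Job for monitoring
    loom.j5t.io/pipeline-id: <rkey>     # becomes PipelineId.Rkey
    loom.j5t.io/workflow: build         # becomes WorkflowId.Name
  annotations:
    loom.j5t.io/knot: knot1.tangled.sh  # becomes PipelineId.Knot
```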
+84
internal/controller/spindleset_controller_test.go
···11+/*
22+Copyright 2025 Evan Jarrett.
33+44+Licensed under the Apache License, Version 2.0 (the "License");
55+you may not use this file except in compliance with the License.
66+You may obtain a copy of the License at
77+88+ http://www.apache.org/licenses/LICENSE-2.0
99+1010+Unless required by applicable law or agreed to in writing, software
1111+distributed under the License is distributed on an "AS IS" BASIS,
1212+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1313+See the License for the specific language governing permissions and
1414+limitations under the License.
1515+*/
1616+1717+package controller
1818+1919+import (
2020+ "context"
2121+2222+ . "github.com/onsi/ginkgo/v2"
2323+ . "github.com/onsi/gomega"
2424+ "k8s.io/apimachinery/pkg/api/errors"
2525+ "k8s.io/apimachinery/pkg/types"
2626+ "sigs.k8s.io/controller-runtime/pkg/reconcile"
2727+2828+ metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
2929+3030+ loomv1alpha1 "tangled.org/evan.jarrett.net/loom/api/v1alpha1"
3131+)
3232+3333+var _ = Describe("SpindleSet Controller", func() {
3434+ Context("When reconciling a resource", func() {
3535+ const resourceName = "test-resource"
3636+3737+ ctx := context.Background()
3838+3939+ typeNamespacedName := types.NamespacedName{
4040+ Name: resourceName,
4141+ Namespace: "default", // TODO(user): Modify as needed
4242+ }
4343+ spindleset := &loomv1alpha1.SpindleSet{}
4444+4545+ BeforeEach(func() {
4646+ By("creating the custom resource for the Kind SpindleSet")
4747+ err := k8sClient.Get(ctx, typeNamespacedName, spindleset)
4848+ if err != nil && errors.IsNotFound(err) {
4949+ resource := &loomv1alpha1.SpindleSet{
5050+ ObjectMeta: metav1.ObjectMeta{
5151+ Name: resourceName,
5252+ Namespace: "default",
5353+ },
5454+ // TODO(user): Specify other spec details if needed.
5555+ }
5656+ Expect(k8sClient.Create(ctx, resource)).To(Succeed())
5757+ }
5858+ })
5959+6060+ AfterEach(func() {
6161+ // TODO(user): Cleanup logic after each test, like removing the resource instance.
6262+ resource := &loomv1alpha1.SpindleSet{}
6363+ err := k8sClient.Get(ctx, typeNamespacedName, resource)
6464+ Expect(err).NotTo(HaveOccurred())
6565+6666+ By("Cleanup the specific resource instance SpindleSet")
6767+ Expect(k8sClient.Delete(ctx, resource)).To(Succeed())
6868+ })
6969+ It("should successfully reconcile the resource", func() {
7070+ By("Reconciling the created resource")
7171+ controllerReconciler := &SpindleSetReconciler{
7272+ Client: k8sClient,
7373+ Scheme: k8sClient.Scheme(),
7474+ }
7575+7676+ _, err := controllerReconciler.Reconcile(ctx, reconcile.Request{
7777+ NamespacedName: typeNamespacedName,
7878+ })
7979+ Expect(err).NotTo(HaveOccurred())
8080+ // TODO(user): Add more specific assertions depending on your controller's reconciliation logic.
8181+ // Example: If you expect a certain status condition after reconciliation, verify it here.
8282+ })
8383+ })
8484+})
+116
internal/controller/suite_test.go
···11+/*
22+Copyright 2025 Evan Jarrett.
33+44+Licensed under the Apache License, Version 2.0 (the "License");
55+you may not use this file except in compliance with the License.
66+You may obtain a copy of the License at
77+88+ http://www.apache.org/licenses/LICENSE-2.0
99+1010+Unless required by applicable law or agreed to in writing, software
1111+distributed under the License is distributed on an "AS IS" BASIS,
1212+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1313+See the License for the specific language governing permissions and
1414+limitations under the License.
1515+*/
1616+1717+package controller
1818+1919+import (
2020+ "context"
2121+ "os"
2222+ "path/filepath"
2323+ "testing"
2424+2525+ . "github.com/onsi/ginkgo/v2"
2626+ . "github.com/onsi/gomega"
2727+2828+ "k8s.io/client-go/kubernetes/scheme"
2929+ "k8s.io/client-go/rest"
3030+ "sigs.k8s.io/controller-runtime/pkg/client"
3131+ "sigs.k8s.io/controller-runtime/pkg/envtest"
3232+ logf "sigs.k8s.io/controller-runtime/pkg/log"
3333+ "sigs.k8s.io/controller-runtime/pkg/log/zap"
3434+3535+ loomv1alpha1 "tangled.org/evan.jarrett.net/loom/api/v1alpha1"
3636+ // +kubebuilder:scaffold:imports
3737+)
3838+3939+// These tests use Ginkgo (BDD-style Go testing framework). Refer to
4040+// http://onsi.github.io/ginkgo/ to learn more about Ginkgo.
4141+4242+var (
4343+ ctx context.Context
4444+ cancel context.CancelFunc
4545+ testEnv *envtest.Environment
4646+ cfg *rest.Config
4747+ k8sClient client.Client
4848+)
4949+5050+func TestControllers(t *testing.T) {
5151+ RegisterFailHandler(Fail)
5252+5353+ RunSpecs(t, "Controller Suite")
5454+}
5555+5656+var _ = BeforeSuite(func() {
5757+ logf.SetLogger(zap.New(zap.WriteTo(GinkgoWriter), zap.UseDevMode(true)))
5858+5959+ ctx, cancel = context.WithCancel(context.TODO())
6060+6161+ var err error
6262+ err = loomv1alpha1.AddToScheme(scheme.Scheme)
6363+ Expect(err).NotTo(HaveOccurred())
6464+6565+ // +kubebuilder:scaffold:scheme
6666+6767+ By("bootstrapping test environment")
6868+ testEnv = &envtest.Environment{
6969+ CRDDirectoryPaths: []string{filepath.Join("..", "..", "config", "crd", "bases")},
7070+ ErrorIfCRDPathMissing: true,
7171+ }
7272+7373+ // Retrieve the first found binary directory to allow running tests from IDEs
7474+ if getFirstFoundEnvTestBinaryDir() != "" {
7575+ testEnv.BinaryAssetsDirectory = getFirstFoundEnvTestBinaryDir()
7676+ }
7777+7878+ // cfg is defined in this file globally.
7979+ cfg, err = testEnv.Start()
8080+ Expect(err).NotTo(HaveOccurred())
8181+ Expect(cfg).NotTo(BeNil())
8282+8383+ k8sClient, err = client.New(cfg, client.Options{Scheme: scheme.Scheme})
8484+ Expect(err).NotTo(HaveOccurred())
8585+ Expect(k8sClient).NotTo(BeNil())
8686+})
8787+8888+var _ = AfterSuite(func() {
8989+ By("tearing down the test environment")
9090+ cancel()
9191+ err := testEnv.Stop()
9292+ Expect(err).NotTo(HaveOccurred())
9393+})
9494+9595+// getFirstFoundEnvTestBinaryDir locates the first binary in the specified path.
9696+// ENVTEST-based tests depend on specific binaries, usually located in paths set by
9797+// controller-runtime. When running tests directly (e.g., via an IDE) without using
9898+// Makefile targets, the 'BinaryAssetsDirectory' must be explicitly configured.
9999+//
100100+// This function streamlines the process by finding the required binaries, similar to
101101+// setting the 'KUBEBUILDER_ASSETS' environment variable. To ensure the binaries are
102102+// properly set up, run 'make setup-envtest' beforehand.
103103+func getFirstFoundEnvTestBinaryDir() string {
104104+ basePath := filepath.Join("..", "..", "bin", "k8s")
105105+ entries, err := os.ReadDir(basePath)
106106+ if err != nil {
107107+ logf.Log.Error(err, "Failed to read directory", "path", basePath)
108108+ return ""
109109+ }
110110+ for _, entry := range entries {
111111+ if entry.IsDir() {
112112+ return filepath.Join(basePath, entry.Name())
113113+ }
114114+ }
115115+ return ""
116116+}
+306
internal/engine/kubernetes_engine.go
···11+package engine
22+33+import (
44+ "context"
55+ "fmt"
66+ "time"
77+88+ "gopkg.in/yaml.v3"
99+ batchv1 "k8s.io/api/batch/v1"
1010+ metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
1111+ "sigs.k8s.io/controller-runtime/pkg/client"
1212+ "sigs.k8s.io/controller-runtime/pkg/log"
1313+1414+ "tangled.org/core/api/tangled"
1515+ "tangled.org/core/spindle/models"
1616+ "tangled.org/core/spindle/secrets"
1717+1818+ loomv1alpha1 "tangled.org/evan.jarrett.net/loom/api/v1alpha1"
1919+ "tangled.org/evan.jarrett.net/loom/pkg/jobbuilder"
2020+)
2121+2222+// KubernetesEngine implements the spindle Engine interface for Kubernetes Jobs.
2323+type KubernetesEngine struct {
2424+ client client.Client
2525+ namespace string
2626+ template loomv1alpha1.SpindleTemplate
2727+2828+ // Track created jobs for cleanup
2929+ jobs map[string]*batchv1.Job
3030+3131+ // Store current knot for Job annotations
3232+ currentKnot string
3333+}
3434+3535+// NewKubernetesEngine creates a new Kubernetes-based spindle engine.
3636+func NewKubernetesEngine(client client.Client, namespace string, template loomv1alpha1.SpindleTemplate) *KubernetesEngine {
3737+ return &KubernetesEngine{
3838+ client: client,
3939+ namespace: namespace,
4040+ template: template,
4141+ jobs: make(map[string]*batchv1.Job),
4242+ }
4343+}
4444+4545+// WorkflowSpec is the structure of the workflow YAML file.
4646+type WorkflowSpec struct {
4747+ Image string `yaml:"image"`
4848+ Architecture string `yaml:"architecture"`
4949+ Steps []WorkflowStepSpec `yaml:"steps"`
5050+ When []WorkflowWhenSpec `yaml:"when,omitempty"`
5151+ Environment map[string]string `yaml:"environment,omitempty"`
5252+ Dependencies *WorkflowDependenciesSpec `yaml:"dependencies,omitempty"`
5353+}
5454+5555+type WorkflowStepSpec struct {
5656+ Name string `yaml:"name"`
5757+ Command string `yaml:"command"`
5858+ Environment map[string]string `yaml:"environment,omitempty"`
5959+}
6060+6161+type WorkflowWhenSpec struct {
6262+ Event []string `yaml:"event,omitempty"`
6363+ Branch []string `yaml:"branch,omitempty"`
6464+}
6565+6666+type WorkflowDependenciesSpec struct {
6767+ Nixpkgs []string `yaml:"nixpkgs,omitempty"`
6868+}
6969+7070+// SimpleStep implements the models.Step interface.
7171+type SimpleStep struct {
7272+ StepName string
7373+ StepCommand string
7474+ StepKind models.StepKind
7575+}
7676+7777+func (s SimpleStep) Name() string {
7878+ return s.StepName
7979+}
8080+8181+func (s SimpleStep) Command() string {
8282+ return s.StepCommand
8383+}
8484+8585+func (s SimpleStep) Kind() models.StepKind {
8686+ return s.StepKind
8787+}
8888+8989+// InitWorkflow parses the workflow YAML and initializes a Workflow model.
9090+func (e *KubernetesEngine) InitWorkflow(twf tangled.Pipeline_Workflow, tpl tangled.Pipeline) (*models.Workflow, error) {
9191+ // Parse the Raw YAML
9292+ var spec WorkflowSpec
9393+ if err := yaml.Unmarshal([]byte(twf.Raw), &spec); err != nil {
9494+ return nil, fmt.Errorf("failed to parse workflow YAML: %w", err)
9595+ }
9696+9797+ // Validate required fields
9898+ if spec.Image == "" {
9999+ return nil, fmt.Errorf("workflow must specify an 'image' field")
100100+ }
101101+102102+ // Default architecture to amd64 if not specified
103103+ if spec.Architecture == "" {
104104+ spec.Architecture = "amd64"
105105+ }
106106+107107+ // Convert steps to models.Step interface
108108+ steps := make([]models.Step, 0, len(spec.Steps))
109109+ for _, stepSpec := range spec.Steps {
110110+ steps = append(steps, SimpleStep{
111111+ StepName: stepSpec.Name,
112112+ StepCommand: stepSpec.Command,
113113+ StepKind: models.StepKindUser,
114114+ })
115115+ }
116116+117117+ // Store the parsed spec in Data field for later use
118118+ workflowData := map[string]interface{}{
119119+ "spec": spec,
120120+ "triggerRepo": tpl.TriggerMetadata.Repo,
121121+ "cloneOpts": twf.Clone,
122122+ }
123123+124124+ workflow := &models.Workflow{
125125+ Steps: steps,
126126+ Name: twf.Name,
127127+ Data: workflowData,
128128+ }
129129+130130+ return workflow, nil
131131+}
132132+133133+// SetupWorkflow creates the Kubernetes Job for the workflow.
134134+func (e *KubernetesEngine) SetupWorkflow(ctx context.Context, wid models.WorkflowId, wf *models.Workflow) error {
135135+ logger := log.FromContext(ctx).WithValues("workflow", wid.Name, "pipeline", wid.PipelineId.Rkey)
136136+137137+ // Extract workflow data
138138+ workflowData, ok := wf.Data.(map[string]interface{})
139139+ if !ok {
140140+ return fmt.Errorf("invalid workflow data type")
141141+ }
142142+143143+ spec, ok := workflowData["spec"].(WorkflowSpec)
144144+ if !ok {
145145+ return fmt.Errorf("workflow spec not found in data")
146146+ }
147147+148148+ triggerRepo, ok := workflowData["triggerRepo"].(*tangled.Pipeline_TriggerRepo)
149149+ if !ok {
150150+ return fmt.Errorf("trigger repo not found in data")
151151+ }
152152+153153+ // Determine commit SHA based on trigger type
154154+ // For now, assume we're handling push events with NewSha
155155+ // TODO: Handle other trigger types (PR, manual)
156156+ commitSHA := "" // Need to extract from trigger metadata
157157+158158+ // Build repository clone URL
159159+ repoURL := fmt.Sprintf("https://%s/%s/%s", triggerRepo.Knot, triggerRepo.Did, triggerRepo.Repo)
160160+161161+ // Store knot for status reporting
162162+ e.currentKnot = triggerRepo.Knot
163163+164164+ // Convert workflow steps to jobbuilder format
165165+ jobSteps := make([]jobbuilder.WorkflowStep, 0, len(wf.Steps))
166166+ for _, step := range wf.Steps {
167167+ jobSteps = append(jobSteps, jobbuilder.WorkflowStep{
168168+ Name: step.Name(),
169169+ Command: step.Command(),
170170+ Env: nil, // TODO: Extract from step if available
171171+ })
172172+ }
173173+174174+ // Build Job configuration
175175+ jobConfig := jobbuilder.WorkflowConfig{
176176+ WorkflowName: wf.Name,
177177+ PipelineID: wid.PipelineId.Rkey, // Use rkey as pipeline ID
178178+ Image: spec.Image,
179179+ Architecture: spec.Architecture,
180180+ Steps: jobSteps,
181181+ RepoURL: repoURL,
182182+ CommitSHA: commitSHA,
183183+ Secrets: nil, // TODO: Handle secrets
184184+ Template: e.template,
185185+ Namespace: e.namespace,
186186+ Knot: e.currentKnot,
187187+ }
188188+189189+ // Create the Job
190190+ job, err := jobbuilder.BuildJob(jobConfig)
191191+ if err != nil {
192192+ return fmt.Errorf("failed to build job: %w", err)
193193+ }
194194+195195+ // Create the Job in Kubernetes
196196+ logger.Info("Creating Kubernetes Job", "jobName", job.Name)
197197+ if err := e.client.Create(ctx, job); err != nil {
198198+ return fmt.Errorf("failed to create Kubernetes Job: %w", err)
199199+ }
200200+201201+ // Track the job for cleanup
202202+ e.jobs[wid.String()] = job
203203+204204+ logger.Info("Kubernetes Job created successfully", "jobName", job.Name)
205205+ return nil
206206+}
207207+208208+// WorkflowTimeout returns the maximum time a workflow is allowed to run.
209209+func (e *KubernetesEngine) WorkflowTimeout() time.Duration {
210210+ // Default timeout: 1 hour
211211+ return 1 * time.Hour
212212+}
213213+214214+// DestroyWorkflow cleans up the Kubernetes Job after completion.
215215+func (e *KubernetesEngine) DestroyWorkflow(ctx context.Context, wid models.WorkflowId) error {
216216+ logger := log.FromContext(ctx).WithValues("workflow", wid.Name, "pipeline", wid.PipelineId.Rkey)
217217+218218+ job, exists := e.jobs[wid.String()]
219219+ if !exists {
220220+ logger.Info("No job found to destroy")
221221+ return nil
222222+ }
223223+224224+ // Delete the Job
225225+ // Note: Job has TTLSecondsAfterFinished set, so it will auto-cleanup
226226+ // This is just for explicit cleanup if needed
227227+ logger.Info("Cleaning up Kubernetes Job", "jobName", job.Name)
228228+229229+ // Delete with propagation policy to remove pods
230230+ deletePolicy := metav1.DeletePropagationForeground
231231+ deleteOptions := &client.DeleteOptions{
232232+ PropagationPolicy: &deletePolicy,
233233+ }
234234+235235+ if err := e.client.Delete(ctx, job, deleteOptions); err != nil {
236236+ // Ignore not found errors (job may have already been deleted by TTL)
237237+ if client.IgnoreNotFound(err) != nil {
238238+ return fmt.Errorf("failed to delete Kubernetes Job: %w", err)
239239+ }
240240+ }
241241+242242+ // Remove from tracking map
243243+ delete(e.jobs, wid.String())
244244+245245+ logger.Info("Kubernetes Job cleaned up successfully")
246246+ return nil
247247+}
248248+249249+// RunStep waits for the Job to complete; per-step log streaming is not implemented here.
250250+// For Kubernetes engine, all steps run in a single Job, but we need to wait for completion.
251251+func (e *KubernetesEngine) RunStep(ctx context.Context, wid models.WorkflowId, w *models.Workflow, idx int, secrets []secrets.UnlockedSecret, wfLogger *models.WorkflowLogger) error {
252252+ logger := log.FromContext(ctx).WithValues("workflow", wid.Name, "pipeline", wid.PipelineId.Rkey, "step", idx)
253253+254254+ // Only wait on the first step - the Job runs all steps together
255255+ if idx != 0 {
256256+ return nil
257257+ }
258258+259259+ job, exists := e.jobs[wid.String()]
260260+ if !exists {
261261+ return fmt.Errorf("no job found for workflow")
262262+ }
263263+264264+ logger.Info("Waiting for Kubernetes Job to complete", "jobName", job.Name)
265265+266266+ // Wait for the Job to complete (with timeout from WorkflowTimeout)
267267+ timeout := time.After(e.WorkflowTimeout())
268268+ ticker := time.NewTicker(5 * time.Second)
269269+ defer ticker.Stop()
270270+271271+ for {
272272+ select {
273273+ case <-ctx.Done():
274274+ return ctx.Err()
275275+ case <-timeout:
276276+ return fmt.Errorf("job timed out after %v", e.WorkflowTimeout())
277277+ case <-ticker.C:
278278+ // Check Job status
279279+ currentJob := &batchv1.Job{}
280280+ err := e.client.Get(ctx, client.ObjectKey{
281281+ Namespace: job.Namespace,
282282+ Name: job.Name,
283283+ }, currentJob)
284284+ if err != nil {
285285+ return fmt.Errorf("failed to get job status: %w", err)
286286+ }
287287+288288+ // Check if Job completed
289289+ if currentJob.Status.Succeeded > 0 {
290290+ logger.Info("Kubernetes Job completed successfully")
291291+ return nil
292292+ }
293293+294294+ // Check if Job failed
295295+ if currentJob.Status.Failed > 0 {
296296+ logger.Error(nil, "Kubernetes Job failed")
297297+ return fmt.Errorf("job failed")
298298+ }
299299+300300+ // Still running, continue waiting
301301+ }
302302+ }
303303+}
304304+305305+// Ensure KubernetesEngine implements the Engine interface
306306+var _ models.Engine = (*KubernetesEngine)(nil)
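For reference, a workflow file of the shape `InitWorkflow` parses — a sketch derived from the `WorkflowSpec` yaml tags above, with hypothetical values rather than an authoritative schema:

```yaml
image: golang:1.24-bookworm
architecture: arm64        # optional; InitWorkflow defaults this to amd64
when:
  - event: [push]
    branch: [main]
environment:
  CGO_ENABLED: "0"
steps:
  - name: run tests
    command: make test
    environment:
      GOFLAGS: -count=1
```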
+88
pkg/jobbuilder/affinity.go
···11+package jobbuilder
22+33+import (
44+ corev1 "k8s.io/api/core/v1"
55+)
66+77+// BuildArchitectureAffinity creates node affinity rules for the specified architecture.
88+// This ensures jobs are scheduled on nodes that match the requested architecture (amd64/arm64).
99+func BuildArchitectureAffinity(architecture string) *corev1.Affinity {
1010+ if architecture == "" {
1111+ // Default to amd64 if not specified
1212+ architecture = "amd64"
1313+ }
1414+1515+ return &corev1.Affinity{
1616+ NodeAffinity: &corev1.NodeAffinity{
1717+ RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
1818+ NodeSelectorTerms: []corev1.NodeSelectorTerm{
1919+ {
2020+ MatchExpressions: []corev1.NodeSelectorRequirement{
2121+ {
2222+ Key: "kubernetes.io/arch",
2323+ Operator: corev1.NodeSelectorOpIn,
2424+ Values: []string{architecture},
2525+ },
2626+ },
2727+ },
2828+ },
2929+ },
3030+ },
3131+ }
3232+}
3333+3434+// MergeAffinity merges architecture-based affinity with user-provided affinity.
3535+// Architecture affinity takes precedence as a hard requirement.
3636+func MergeAffinity(archAffinity, userAffinity *corev1.Affinity) *corev1.Affinity {
3737+ if userAffinity == nil {
3838+ return archAffinity
3939+ }
4040+4141+ merged := archAffinity.DeepCopy()
4242+4343+ // Merge pod affinity if user specified it
4444+ if userAffinity.PodAffinity != nil {
4545+ merged.PodAffinity = userAffinity.PodAffinity.DeepCopy()
4646+ }
4747+4848+ // Merge pod anti-affinity if user specified it
4949+ if userAffinity.PodAntiAffinity != nil {
5050+ merged.PodAntiAffinity = userAffinity.PodAntiAffinity.DeepCopy()
5151+ }
5252+5353+ // Merge node affinity - architecture requirement is preserved, user preferences are added
5454+ if userAffinity.NodeAffinity != nil {
5555+ if merged.NodeAffinity == nil {
5656+ merged.NodeAffinity = &corev1.NodeAffinity{}
5757+ }
5858+5959+ // Add user's preferred scheduling terms
6060+ if userAffinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution != nil {
6161+ merged.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution =
6262+ userAffinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution
6363+ }
6464+6565+ // Merge required node selector terms (AND operation)
6666+ if userAffinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution != nil {
6767+ for _, userTerm := range userAffinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms {
6868+ // Append user's match expressions to architecture term
6969+ if len(merged.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms) > 0 {
7070+ merged.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms[0].MatchExpressions =
7171+ append(
7272+ merged.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms[0].MatchExpressions,
7373+ userTerm.MatchExpressions...,
7474+ )
7575+ if len(userTerm.MatchFields) > 0 {
7676+ merged.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms[0].MatchFields =
7777+ append(
7878+ merged.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms[0].MatchFields,
7979+ userTerm.MatchFields...,
8080+ )
8181+ }
8282+ }
8383+ }
8484+ }
8585+ }
8686+8787+ return merged
8888+}
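A quick sketch of how the two helpers compose — hypothetical caller code, not part of the package:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"

	"tangled.org/evan.jarrett.net/loom/pkg/jobbuilder"
)

func main() {
	// Hard requirement: schedule only on arm64 nodes.
	arch := jobbuilder.BuildArchitectureAffinity("arm64")

	// Hypothetical user preference: favor nodes carrying a (made-up) spot label.
	user := &corev1.Affinity{
		NodeAffinity: &corev1.NodeAffinity{
			PreferredDuringSchedulingIgnoredDuringExecution: []corev1.PreferredSchedulingTerm{{
				Weight: 10,
				Preference: corev1.NodeSelectorTerm{
					MatchExpressions: []corev1.NodeSelectorRequirement{{
						Key:      "node.kubernetes.io/lifecycle",
						Operator: corev1.NodeSelectorOpIn,
						Values:   []string{"spot"},
					}},
				},
			}},
		},
	}

	// The arm64 term survives as a hard requirement; the preference is carried over.
	merged := jobbuilder.MergeAffinity(arch, user)
	fmt.Printf("%+v\n", merged.NodeAffinity)
}
```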
···11+package jobbuilder
22+33+import (
44+ "fmt"
55+ "strings"
66+)
77+88+// WorkflowStep represents a single step in a workflow.
99+type WorkflowStep struct {
1010+ Name string
1111+ Command string
1212+ Env map[string]string
1313+}
1414+1515+// BuildStepExecutionScript generates a bash script that executes all workflow steps sequentially.
1616+// The script implements GitHub Actions-compatible environment passing between steps.
1717+func BuildStepExecutionScript(steps []WorkflowStep) string {
1818+ var script strings.Builder
1919+2020+ // Script header with error handling
2121+ script.WriteString(`#!/bin/bash
2222+set -e
2323+set -o pipefail
2424+2525+`)
2626+2727+ // GitHub Actions-compatible environment setup
2828+ script.WriteString(`# GitHub Actions-compatible environment setup
2929+export GITHUB_ENV=/tmp/github/env
3030+export GITHUB_PATH=/tmp/github/path
3131+export GITHUB_OUTPUT=/tmp/github/output
3232+export GITHUB_STEP_SUMMARY=/tmp/github/step-summary.md
3333+3434+# Initialize files
3535+mkdir -p /tmp/github
3636+touch $GITHUB_ENV $GITHUB_PATH $GITHUB_OUTPUT $GITHUB_STEP_SUMMARY
3737+3838+# Helper functions for environment passing
3939+add_to_env() {
4040+ echo "$1=$2" >> $GITHUB_ENV
4141+ export "$1"="$2"
4242+}
4343+4444+add_to_path() {
4545+ echo "$1" >> $GITHUB_PATH
4646+ export PATH="$1:$PATH"
4747+}
4848+4949+set_output() {
5050+ echo "$1=$2" >> $GITHUB_OUTPUT
5151+}
5252+5353+# Load environment variables set by previous steps
5454+load_github_env() {
5555+ if [ -f "$GITHUB_ENV" ]; then
5656+ set -a
5757+ source "$GITHUB_ENV" 2>/dev/null || true
5858+ set +a
5959+ fi
6060+ if [ -f "$GITHUB_PATH" ]; then
6161+ while IFS= read -r line; do
6262+ if [ -n "$line" ]; then
6363+ export PATH="$line:$PATH"
6464+ fi
6565+ done < "$GITHUB_PATH"
6666+ fi
6767+}
6868+6969+# Timestamp helper
7070+timestamp() {
7171+ date -u +"%Y-%m-%dT%H:%M:%S.%3NZ"
7272+}
7373+7474+`)
7575+7676+ // Generate code for each step
7777+ for i, step := range steps {
7878+ script.WriteString(fmt.Sprintf(`
7979+##############################################################################
8080+# Step %d: %s
8181+##############################################################################
8282+echo ""
8383+echo "$(timestamp) ===== Starting Step %d: %s ====="
8484+echo ""
8585+8686+# Load environment from previous steps
8787+load_github_env
8888+8989+`, i+1, step.Name, i+1, step.Name))
9090+9191+ // Set step-specific environment variables
9292+ if len(step.Env) > 0 {
9393+ script.WriteString("# Step-specific environment variables\n")
9494+ for k, v := range step.Env {
9595+ // Escape single quotes in the value
9696+ escapedValue := strings.ReplaceAll(v, "'", "'\\''")
9797+ script.WriteString(fmt.Sprintf("export %s='%s'\n", k, escapedValue))
9898+ }
9999+ script.WriteString("\n")
100100+ }
101101+102102+ // Execute the step command
103103+ script.WriteString("# Execute step command\n")
104104+ script.WriteString("(\n")
105105+ script.WriteString(" cd /tangled/workspace\n")
106106+ script.WriteString(" " + step.Command + "\n")
107107+ script.WriteString(") || {\n")
108108+ script.WriteString(fmt.Sprintf(" echo \"$(timestamp) Step '%s' failed with exit code $?\"\n", step.Name))
109109+ script.WriteString(" exit 1\n")
110110+ script.WriteString("}\n\n")
111111+112112+ script.WriteString(fmt.Sprintf("echo \"$(timestamp) ===== Completed Step %d: %s =====\"\n", i+1, step.Name))
113113+ }
114114+115115+ // Script footer
116116+ script.WriteString(`
117117+echo ""
118118+echo "$(timestamp) ===== All steps completed successfully ====="
119119+exit 0
120120+`)
121121+122122+ return script.String()
123123+}
124124+125125+// EscapeShellString escapes a string for safe use in shell scripts.
126126+func EscapeShellString(s string) string {
127127+ // Replace single quotes with '\''
128128+ return strings.ReplaceAll(s, "'", "'\\''")
129129+}
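And the generator in use — a sketch with hypothetical steps, printing the script that becomes the Job container's entrypoint:

```go
package main

import (
	"fmt"

	"tangled.org/evan.jarrett.net/loom/pkg/jobbuilder"
)

func main() {
	steps := []jobbuilder.WorkflowStep{
		{Name: "deps", Command: "go mod download"},
		{Name: "test", Command: "make test", Env: map[string]string{"GOFLAGS": "-count=1"}},
	}
	// Each step runs in /tangled/workspace; anything a step writes to
	// $GITHUB_ENV or $GITHUB_PATH is reloaded before the next step runs.
	fmt.Println(jobbuilder.BuildStepExecutionScript(steps))
}
```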
+89
test/e2e/e2e_suite_test.go
···11+/*
22+Copyright 2025 Evan Jarrett.
33+44+Licensed under the Apache License, Version 2.0 (the "License");
55+you may not use this file except in compliance with the License.
66+You may obtain a copy of the License at
77+88+ http://www.apache.org/licenses/LICENSE-2.0
99+1010+Unless required by applicable law or agreed to in writing, software
1111+distributed under the License is distributed on an "AS IS" BASIS,
1212+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1313+See the License for the specific language governing permissions and
1414+limitations under the License.
1515+*/
1616+1717+package e2e
1818+1919+import (
2020+ "fmt"
2121+ "os"
2222+ "os/exec"
2323+ "testing"
2424+2525+ . "github.com/onsi/ginkgo/v2"
2626+ . "github.com/onsi/gomega"
2727+2828+ "tangled.org/evan.jarrett.net/loom/test/utils"
2929+)
3030+3131+var (
3232+ // Optional Environment Variables:
3333+ // - CERT_MANAGER_INSTALL_SKIP=true: Skips CertManager installation during test setup.
3434+ // These variables are useful if CertManager is already installed, avoiding
3535+ // re-installation and conflicts.
3636+ skipCertManagerInstall = os.Getenv("CERT_MANAGER_INSTALL_SKIP") == "true"
3737+ // isCertManagerAlreadyInstalled is set to true when CertManager CRDs are found on the cluster
3838+ isCertManagerAlreadyInstalled = false
3939+4040+ // projectImage is the name of the image which will be built and loaded
4141+ // with the source code changes to be tested.
4242+ projectImage = "example.com/loom:v0.0.1"
4343+)
4444+4545+// TestE2E runs the end-to-end (e2e) test suite for the project. These tests execute in an isolated,
4646+// temporary environment to validate project changes and are intended for use in CI jobs.
4747+// The default setup requires Kind, builds/loads the Manager Docker image locally, and installs
4848+// CertManager.
4949+func TestE2E(t *testing.T) {
5050+ RegisterFailHandler(Fail)
5151+ _, _ = fmt.Fprintf(GinkgoWriter, "Starting loom integration test suite\n")
5252+ RunSpecs(t, "e2e suite")
5353+}
5454+5555+var _ = BeforeSuite(func() {
5656+ By("building the manager(Operator) image")
5757+ cmd := exec.Command("make", "docker-build", fmt.Sprintf("IMG=%s", projectImage))
5858+ _, err := utils.Run(cmd)
5959+ ExpectWithOffset(1, err).NotTo(HaveOccurred(), "Failed to build the manager(Operator) image")
6060+6161+ // TODO(user): If you want to change the e2e test vendor from Kind, ensure the image is
6262+ // built and available before running the tests. Also, remove the following block.
6363+ By("loading the manager(Operator) image on Kind")
6464+ err = utils.LoadImageToKindClusterWithName(projectImage)
6565+ ExpectWithOffset(1, err).NotTo(HaveOccurred(), "Failed to load the manager(Operator) image into Kind")
6666+6767+ // The e2e tests are intended to run on a temporary cluster that is created and destroyed for testing.
6868+ // To prevent errors when tests run in environments with CertManager already installed,
6969+ // we check for its presence before execution.
7070+ // Setup CertManager before the suite if not skipped and if not already installed
7171+ if !skipCertManagerInstall {
7272+ By("checking if cert manager is installed already")
7373+ isCertManagerAlreadyInstalled = utils.IsCertManagerCRDsInstalled()
7474+ if !isCertManagerAlreadyInstalled {
7575+ _, _ = fmt.Fprintf(GinkgoWriter, "Installing CertManager...\n")
7676+ Expect(utils.InstallCertManager()).To(Succeed(), "Failed to install CertManager")
7777+ } else {
7878+ _, _ = fmt.Fprintf(GinkgoWriter, "WARNING: CertManager is already installed. Skipping installation...\n")
7979+ }
8080+ }
8181+})
8282+8383+var _ = AfterSuite(func() {
8484+ // Teardown CertManager after the suite if not skipped and if it was not already installed
8585+ if !skipCertManagerInstall && !isCertManagerAlreadyInstalled {
8686+ _, _ = fmt.Fprintf(GinkgoWriter, "Uninstalling CertManager...\n")
8787+ utils.UninstallCertManager()
8888+ }
8989+})
+329
test/e2e/e2e_test.go
···11+/*
22+Copyright 2025 Evan Jarrett.
33+44+Licensed under the Apache License, Version 2.0 (the "License");
55+you may not use this file except in compliance with the License.
66+You may obtain a copy of the License at
77+88+ http://www.apache.org/licenses/LICENSE-2.0
99+1010+Unless required by applicable law or agreed to in writing, software
1111+distributed under the License is distributed on an "AS IS" BASIS,
1212+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1313+See the License for the specific language governing permissions and
1414+limitations under the License.
1515+*/
1616+1717+package e2e
1818+1919+import (
2020+ "encoding/json"
2121+ "fmt"
2222+ "os"
2323+ "os/exec"
2424+ "path/filepath"
2525+ "time"
2626+2727+ . "github.com/onsi/ginkgo/v2"
2828+ . "github.com/onsi/gomega"
2929+3030+ "tangled.org/evan.jarrett.net/loom/test/utils"
3131+)
3232+3333+// namespace where the project is deployed in
3434+const namespace = "loom-system"
3535+3636+// serviceAccountName created for the project
3737+const serviceAccountName = "loom-controller-manager"
3838+3939+// metricsServiceName is the name of the metrics service of the project
4040+const metricsServiceName = "loom-controller-manager-metrics-service"
4141+4242+// metricsRoleBindingName is the name of the RBAC that will be created to allow get the metrics data
4343+const metricsRoleBindingName = "loom-metrics-binding"
4444+4545+var _ = Describe("Manager", Ordered, func() {
4646+ var controllerPodName string
4747+4848+ // Before running the tests, set up the environment by creating the namespace,
4949+ // enforcing the restricted security policy on the namespace, installing CRDs,
5050+ // and deploying the controller.
5151+ BeforeAll(func() {
5252+ By("creating manager namespace")
5353+ cmd := exec.Command("kubectl", "create", "ns", namespace)
5454+ _, err := utils.Run(cmd)
5555+ Expect(err).NotTo(HaveOccurred(), "Failed to create namespace")
5656+5757+ By("labeling the namespace to enforce the restricted security policy")
5858+ cmd = exec.Command("kubectl", "label", "--overwrite", "ns", namespace,
5959+ "pod-security.kubernetes.io/enforce=restricted")
6060+ _, err = utils.Run(cmd)
6161+ Expect(err).NotTo(HaveOccurred(), "Failed to label namespace with restricted policy")
6262+6363+ By("installing CRDs")
6464+ cmd = exec.Command("make", "install")
6565+ _, err = utils.Run(cmd)
6666+ Expect(err).NotTo(HaveOccurred(), "Failed to install CRDs")
6767+6868+ By("deploying the controller-manager")
6969+ cmd = exec.Command("make", "deploy", fmt.Sprintf("IMG=%s", projectImage))
7070+ _, err = utils.Run(cmd)
7171+ Expect(err).NotTo(HaveOccurred(), "Failed to deploy the controller-manager")
7272+ })
7373+7474+ // After all tests have been executed, clean up by undeploying the controller, uninstalling CRDs,
7575+ // and deleting the namespace.
7676+ AfterAll(func() {
7777+ By("cleaning up the curl pod for metrics")
7878+ cmd := exec.Command("kubectl", "delete", "pod", "curl-metrics", "-n", namespace)
7979+ _, _ = utils.Run(cmd)
8080+8181+ By("undeploying the controller-manager")
8282+ cmd = exec.Command("make", "undeploy")
8383+ _, _ = utils.Run(cmd)
8484+8585+ By("uninstalling CRDs")
8686+ cmd = exec.Command("make", "uninstall")
8787+ _, _ = utils.Run(cmd)
8888+8989+ By("removing manager namespace")
9090+ cmd = exec.Command("kubectl", "delete", "ns", namespace)
9191+ _, _ = utils.Run(cmd)
9292+ })
9393+9494+ // After each test, check for failures and collect logs, events,
9595+ // and pod descriptions for debugging.
9696+ AfterEach(func() {
9797+ specReport := CurrentSpecReport()
9898+ if specReport.Failed() {
9999+ By("Fetching controller manager pod logs")
100100+ cmd := exec.Command("kubectl", "logs", controllerPodName, "-n", namespace)
101101+ controllerLogs, err := utils.Run(cmd)
102102+ if err == nil {
103103+ _, _ = fmt.Fprintf(GinkgoWriter, "Controller logs:\n %s", controllerLogs)
104104+ } else {
105105+ _, _ = fmt.Fprintf(GinkgoWriter, "Failed to get Controller logs: %s", err)
106106+ }
107107+108108+ By("Fetching Kubernetes events")
109109+ cmd = exec.Command("kubectl", "get", "events", "-n", namespace, "--sort-by=.lastTimestamp")
110110+ eventsOutput, err := utils.Run(cmd)
111111+ if err == nil {
112112+ _, _ = fmt.Fprintf(GinkgoWriter, "Kubernetes events:\n%s", eventsOutput)
113113+ } else {
114114+ _, _ = fmt.Fprintf(GinkgoWriter, "Failed to get Kubernetes events: %s", err)
115115+ }
116116+117117+ By("Fetching curl-metrics logs")
118118+ cmd = exec.Command("kubectl", "logs", "curl-metrics", "-n", namespace)
119119+ metricsOutput, err := utils.Run(cmd)
120120+ if err == nil {
121121+ _, _ = fmt.Fprintf(GinkgoWriter, "Metrics logs:\n %s", metricsOutput)
122122+ } else {
123123+ _, _ = fmt.Fprintf(GinkgoWriter, "Failed to get curl-metrics logs: %s", err)
124124+ }
125125+126126+ By("Fetching controller manager pod description")
127127+ cmd = exec.Command("kubectl", "describe", "pod", controllerPodName, "-n", namespace)
128128+ podDescription, err := utils.Run(cmd)
129129+ if err == nil {
130130+ fmt.Println("Pod description:\n", podDescription)
131131+ } else {
132132+ fmt.Println("Failed to describe controller pod")
133133+ }
134134+ }
135135+ })
136136+137137+ SetDefaultEventuallyTimeout(2 * time.Minute)
138138+ SetDefaultEventuallyPollingInterval(time.Second)
139139+140140+ Context("Manager", func() {
141141+ It("should run successfully", func() {
142142+ By("validating that the controller-manager pod is running as expected")
143143+ verifyControllerUp := func(g Gomega) {
144144+ // Get the name of the controller-manager pod
145145+ cmd := exec.Command("kubectl", "get",
146146+ "pods", "-l", "control-plane=controller-manager",
147147+ "-o", "go-template={{ range .items }}"+
148148+ "{{ if not .metadata.deletionTimestamp }}"+
149149+ "{{ .metadata.name }}"+
150150+ "{{ \"\\n\" }}{{ end }}{{ end }}",
151151+ "-n", namespace,
152152+ )
153153+154154+ podOutput, err := utils.Run(cmd)
155155+ g.Expect(err).NotTo(HaveOccurred(), "Failed to retrieve controller-manager pod information")
156156+ podNames := utils.GetNonEmptyLines(podOutput)
157157+ g.Expect(podNames).To(HaveLen(1), "expected 1 controller pod running")
158158+ controllerPodName = podNames[0]
159159+ g.Expect(controllerPodName).To(ContainSubstring("controller-manager"))
160160+161161+ // Validate the pod's status
162162+ cmd = exec.Command("kubectl", "get",
163163+ "pods", controllerPodName, "-o", "jsonpath={.status.phase}",
164164+ "-n", namespace,
165165+ )
166166+ output, err := utils.Run(cmd)
167167+ g.Expect(err).NotTo(HaveOccurred())
168168+ g.Expect(output).To(Equal("Running"), "Incorrect controller-manager pod status")
169169+ }
170170+ Eventually(verifyControllerUp).Should(Succeed())
171171+ })
172172+173173+ It("should ensure the metrics endpoint is serving metrics", func() {
174174+ By("creating a ClusterRoleBinding for the service account to allow access to metrics")
175175+ cmd := exec.Command("kubectl", "create", "clusterrolebinding", metricsRoleBindingName,
176176+ "--clusterrole=loom-metrics-reader",
177177+ fmt.Sprintf("--serviceaccount=%s:%s", namespace, serviceAccountName),
178178+ )
179179+ _, err := utils.Run(cmd)
180180+ Expect(err).NotTo(HaveOccurred(), "Failed to create ClusterRoleBinding")
181181+182182+ By("validating that the metrics service is available")
183183+ cmd = exec.Command("kubectl", "get", "service", metricsServiceName, "-n", namespace)
184184+ _, err = utils.Run(cmd)
185185+ Expect(err).NotTo(HaveOccurred(), "Metrics service should exist")
186186+187187+ By("getting the service account token")
188188+ token, err := serviceAccountToken()
189189+ Expect(err).NotTo(HaveOccurred())
190190+ Expect(token).NotTo(BeEmpty())
191191+192192+ By("waiting for the metrics endpoint to be ready")
193193+ verifyMetricsEndpointReady := func(g Gomega) {
194194+ cmd := exec.Command("kubectl", "get", "endpoints", metricsServiceName, "-n", namespace)
195195+ output, err := utils.Run(cmd)
196196+ g.Expect(err).NotTo(HaveOccurred())
197197+ g.Expect(output).To(ContainSubstring("8443"), "Metrics endpoint is not ready")
198198+ }
199199+ Eventually(verifyMetricsEndpointReady).Should(Succeed())
200200+201201+ By("verifying that the controller manager is serving the metrics server")
202202+ verifyMetricsServerStarted := func(g Gomega) {
203203+ cmd := exec.Command("kubectl", "logs", controllerPodName, "-n", namespace)
204204+ output, err := utils.Run(cmd)
205205+ g.Expect(err).NotTo(HaveOccurred())
206206+ g.Expect(output).To(ContainSubstring("controller-runtime.metrics\tServing metrics server"),
207207+ "Metrics server not yet started")
208208+ }
209209+ Eventually(verifyMetricsServerStarted).Should(Succeed())
210210+211211+ By("creating the curl-metrics pod to access the metrics endpoint")
212212+ cmd = exec.Command("kubectl", "run", "curl-metrics", "--restart=Never",
213213+ "--namespace", namespace,
214214+ "--image=curlimages/curl:latest",
215215+ "--overrides",
216216+ fmt.Sprintf(`{
217217+ "spec": {
218218+ "containers": [{
219219+ "name": "curl",
220220+ "image": "curlimages/curl:latest",
221221+ "command": ["/bin/sh", "-c"],
222222+ "args": ["curl -v -k -H 'Authorization: Bearer %s' https://%s.%s.svc.cluster.local:8443/metrics"],
223223+ "securityContext": {
224224+ "allowPrivilegeEscalation": false,
225225+ "capabilities": {
226226+ "drop": ["ALL"]
227227+ },
228228+ "runAsNonRoot": true,
229229+ "runAsUser": 1000,
230230+ "seccompProfile": {
231231+ "type": "RuntimeDefault"
232232+ }
233233+ }
234234+ }],
235235+ "serviceAccount": "%s"
236236+ }
237237+ }`, token, metricsServiceName, namespace, serviceAccountName))
238238+ _, err = utils.Run(cmd)
239239+ Expect(err).NotTo(HaveOccurred(), "Failed to create curl-metrics pod")
240240+241241+ By("waiting for the curl-metrics pod to complete.")
242242+ verifyCurlUp := func(g Gomega) {
243243+ cmd := exec.Command("kubectl", "get", "pods", "curl-metrics",
244244+ "-o", "jsonpath={.status.phase}",
245245+ "-n", namespace)
246246+ output, err := utils.Run(cmd)
247247+ g.Expect(err).NotTo(HaveOccurred())
248248+ g.Expect(output).To(Equal("Succeeded"), "curl pod in wrong status")
249249+ }
250250+ Eventually(verifyCurlUp, 5*time.Minute).Should(Succeed())
251251+252252+ By("getting the metrics by checking curl-metrics logs")
253253+ metricsOutput := getMetricsOutput()
254254+ Expect(metricsOutput).To(ContainSubstring(
255255+ "controller_runtime_reconcile_total",
256256+ ))
257257+ })
258258+259259+ // +kubebuilder:scaffold:e2e-webhooks-checks
260260+261261+ // TODO: Customize the e2e test suite with scenarios specific to your project.
262262+ // Consider applying sample/CR(s) and check their status and/or verifying
263263+ // the reconciliation by using the metrics, i.e.:
264264+ // metricsOutput := getMetricsOutput()
265265+ // Expect(metricsOutput).To(ContainSubstring(
266266+ // fmt.Sprintf(`controller_runtime_reconcile_total{controller="%s",result="success"} 1`,
267267+ // strings.ToLower(<Kind>),
268268+ // ))
269269+ })
270270+})
271271+272272+// serviceAccountToken returns a token for the specified service account in the given namespace.
273273+// It uses the Kubernetes TokenRequest API to generate a token by directly sending a request
274274+// and parsing the resulting token from the API response.
275275+func serviceAccountToken() (string, error) {
276276+ const tokenRequestRawString = `{
277277+ "apiVersion": "authentication.k8s.io/v1",
278278+ "kind": "TokenRequest"
279279+ }`
280280+281281+ // Temporary file to store the token request
282282+ secretName := fmt.Sprintf("%s-token-request", serviceAccountName)
283283+ tokenRequestFile := filepath.Join("/tmp", secretName)
284284+ err := os.WriteFile(tokenRequestFile, []byte(tokenRequestRawString), os.FileMode(0o644))
285285+ if err != nil {
286286+ return "", err
287287+ }
288288+289289+ var out string
290290+ verifyTokenCreation := func(g Gomega) {
291291+ // Execute kubectl command to create the token
292292+ cmd := exec.Command("kubectl", "create", "--raw", fmt.Sprintf(
293293+ "/api/v1/namespaces/%s/serviceaccounts/%s/token",
294294+ namespace,
295295+ serviceAccountName,
296296+ ), "-f", tokenRequestFile)
297297+298298+ output, err := cmd.CombinedOutput()
299299+ g.Expect(err).NotTo(HaveOccurred())
300300+301301+ // Parse the JSON output to extract the token
302302+ var token tokenRequest
303303+ err = json.Unmarshal(output, &token)
304304+ g.Expect(err).NotTo(HaveOccurred())
305305+306306+ out = token.Status.Token
307307+ }
308308+ Eventually(verifyTokenCreation).Should(Succeed())
309309+310310+ return out, err
311311+}
312312+313313+// getMetricsOutput retrieves and returns the logs from the curl pod used to access the metrics endpoint.
314314+func getMetricsOutput() string {
315315+ By("getting the curl-metrics logs")
316316+ cmd := exec.Command("kubectl", "logs", "curl-metrics", "-n", namespace)
317317+ metricsOutput, err := utils.Run(cmd)
318318+ Expect(err).NotTo(HaveOccurred(), "Failed to retrieve logs from curl pod")
319319+ Expect(metricsOutput).To(ContainSubstring("< HTTP/1.1 200 OK"))
320320+ return metricsOutput
321321+}
322322+323323+// tokenRequest is a simplified representation of the Kubernetes TokenRequest API response,
324324+// containing only the token field that we need to extract.
325325+type tokenRequest struct {
326326+ Status struct {
327327+ Token string `json:"token"`
328328+ } `json:"status"`
329329+}
+254
test/utils/utils.go
···11+/*
22+Copyright 2025 Evan Jarrett.
33+44+Licensed under the Apache License, Version 2.0 (the "License");
55+you may not use this file except in compliance with the License.
66+You may obtain a copy of the License at
77+88+ http://www.apache.org/licenses/LICENSE-2.0
99+1010+Unless required by applicable law or agreed to in writing, software
1111+distributed under the License is distributed on an "AS IS" BASIS,
1212+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1313+See the License for the specific language governing permissions and
1414+limitations under the License.
1515+*/
1616+1717+package utils
1818+1919+import (
2020+ "bufio"
2121+ "bytes"
2222+ "fmt"
2323+ "os"
2424+ "os/exec"
2525+ "strings"
2626+2727+ . "github.com/onsi/ginkgo/v2" // nolint:revive,staticcheck
2828+)
2929+3030+const (
3131+ prometheusOperatorVersion = "v0.77.1"
3232+ prometheusOperatorURL = "https://github.com/prometheus-operator/prometheus-operator/" +
3333+ "releases/download/%s/bundle.yaml"
3434+3535+ certmanagerVersion = "v1.16.3"
3636+ certmanagerURLTmpl = "https://github.com/cert-manager/cert-manager/releases/download/%s/cert-manager.yaml"
3737+)
3838+3939+func warnError(err error) {
4040+ _, _ = fmt.Fprintf(GinkgoWriter, "warning: %v\n", err)
4141+}
3838+3939+// Run executes the provided command from the project root and returns its combined output.
4444+func Run(cmd *exec.Cmd) (string, error) {
4545+ dir, _ := GetProjectDir()
4646+ cmd.Dir = dir
4747+4848+ if err := os.Chdir(cmd.Dir); err != nil {
4949+ _, _ = fmt.Fprintf(GinkgoWriter, "warning: failed to chdir to %q: %v\n", cmd.Dir, err)
5050+ }
5151+5252+ cmd.Env = append(os.Environ(), "GO111MODULE=on")
5353+ command := strings.Join(cmd.Args, " ")
5454+ _, _ = fmt.Fprintf(GinkgoWriter, "running: %q\n", command)
5555+ output, err := cmd.CombinedOutput()
5656+ if err != nil {
5757+ return string(output), fmt.Errorf("%q failed with error %q: %w", command, string(output), err)
5858+ }
5959+6060+ return string(output), nil
6161+}
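// Example call site (illustrative), e.g. from a spec in test/e2e:
//
//    output, err := utils.Run(exec.Command("kubectl", "get", "pods", "-n", namespace))
//    Expect(err).NotTo(HaveOccurred())
//    _, _ = fmt.Fprintf(GinkgoWriter, "pods:\n%s", output)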
6262+6363+// InstallPrometheusOperator installs the Prometheus Operator, which is used to export the enabled metrics.
6464+func InstallPrometheusOperator() error {
6565+ url := fmt.Sprintf(prometheusOperatorURL, prometheusOperatorVersion)
6666+ cmd := exec.Command("kubectl", "create", "-f", url)
6767+ _, err := Run(cmd)
6868+ return err
6969+}
7070+7171+// UninstallPrometheusOperator uninstalls the Prometheus Operator.
7272+func UninstallPrometheusOperator() {
7373+ url := fmt.Sprintf(prometheusOperatorURL, prometheusOperatorVersion)
7474+ cmd := exec.Command("kubectl", "delete", "-f", url)
7575+ if _, err := Run(cmd); err != nil {
7676+ warnError(err)
7777+ }
7878+}
7979+8080+// IsPrometheusCRDsInstalled checks if any Prometheus CRDs are installed
8181+// by verifying the existence of key CRDs related to Prometheus.
8282+func IsPrometheusCRDsInstalled() bool {
8383+ // List of common Prometheus CRDs
8484+ prometheusCRDs := []string{
8585+ "prometheuses.monitoring.coreos.com",
8686+ "prometheusrules.monitoring.coreos.com",
8787+ "prometheusagents.monitoring.coreos.com",
8888+ }
8989+9090+ cmd := exec.Command("kubectl", "get", "crds", "-o", "custom-columns=NAME:.metadata.name")
9191+ output, err := Run(cmd)
9292+ if err != nil {
9393+ return false
9494+ }
9595+ crdList := GetNonEmptyLines(output)
9696+ for _, crd := range prometheusCRDs {
9797+ for _, line := range crdList {
9898+ if strings.Contains(line, crd) {
9999+ return true
100100+ }
101101+ }
102102+ }
103103+104104+ return false
105105+}
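// Typical call site (illustrative): skip the install when the CRDs are already
// present, e.g. in the suite's BeforeSuite:
//
//    if !utils.IsPrometheusCRDsInstalled() {
//        Expect(utils.InstallPrometheusOperator()).To(Succeed())
//    }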
106106+107107+// UninstallCertManager uninstalls the cert-manager bundle.
108108+func UninstallCertManager() {
109109+ url := fmt.Sprintf(certmanagerURLTmpl, certmanagerVersion)
110110+ cmd := exec.Command("kubectl", "delete", "-f", url)
111111+ if _, err := Run(cmd); err != nil {
112112+ warnError(err)
113113+ }
114114+}
115115+116116+// InstallCertManager installs the cert-manager bundle.
117117+func InstallCertManager() error {
118118+ url := fmt.Sprintf(certmanagerURLTmpl, certmanagerVersion)
119119+ cmd := exec.Command("kubectl", "apply", "-f", url)
120120+ if _, err := Run(cmd); err != nil {
121121+ return err
122122+ }
123123+ // Wait for cert-manager-webhook to be ready, which can take time if cert-manager
124124+ // was re-installed after uninstalling on a cluster.
125125+ cmd = exec.Command("kubectl", "wait", "deployment.apps/cert-manager-webhook",
126126+ "--for", "condition=Available",
127127+ "--namespace", "cert-manager",
128128+ "--timeout", "5m",
129129+ )
130130+131131+ _, err := Run(cmd)
132132+ return err
133133+}
134134+135135+// IsCertManagerCRDsInstalled checks if any Cert Manager CRDs are installed
136136+// by verifying the existence of key CRDs related to Cert Manager.
137137+func IsCertManagerCRDsInstalled() bool {
138138+ // List of common Cert Manager CRDs
139139+ certManagerCRDs := []string{
140140+ "certificates.cert-manager.io",
141141+ "issuers.cert-manager.io",
142142+ "clusterissuers.cert-manager.io",
143143+ "certificaterequests.cert-manager.io",
144144+ "orders.acme.cert-manager.io",
145145+ "challenges.acme.cert-manager.io",
146146+ }
147147+148148+ // Execute the kubectl command to get all CRDs
149149+ cmd := exec.Command("kubectl", "get", "crds")
150150+ output, err := Run(cmd)
151151+ if err != nil {
152152+ return false
153153+ }
154154+155155+ // Check if any of the Cert Manager CRDs are present
156156+ crdList := GetNonEmptyLines(output)
157157+ for _, crd := range certManagerCRDs {
158158+ for _, line := range crdList {
159159+ if strings.Contains(line, crd) {
160160+ return true
161161+ }
162162+ }
163163+ }
164164+165165+ return false
166166+}
167167+168168+// LoadImageToKindClusterWithName loads a local docker image into the kind cluster.
169169+func LoadImageToKindClusterWithName(name string) error {
170170+ cluster := "kind"
171171+ if v, ok := os.LookupEnv("KIND_CLUSTER"); ok {
172172+ cluster = v
173173+ }
174174+ kindOptions := []string{"load", "docker-image", name, "--name", cluster}
175175+ cmd := exec.Command("kind", kindOptions...)
176176+ _, err := Run(cmd)
177177+ return err
178178+}
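// Illustrative usage (the image tag is an example, not the project's real tag):
//
//    // Set KIND_CLUSTER to target a cluster other than the default "kind".
//    _ = os.Setenv("KIND_CLUSTER", "loom-e2e")
//    err := utils.LoadImageToKindClusterWithName("example.com/loom:v0.0.1")
//    Expect(err).NotTo(HaveOccurred())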
179179+180180+// GetNonEmptyLines splits the given command output into individual lines,
181181+// dropping any empty lines.
182182+func GetNonEmptyLines(output string) []string {
183183+ var res []string
184184+ elements := strings.Split(output, "\n")
185185+ for _, element := range elements {
186186+ if element != "" {
187187+ res = append(res, element)
188188+ }
189189+ }
190190+191191+ return res
192192+}
193193+194194+// GetProjectDir returns the project root directory.
195195+func GetProjectDir() (string, error) {
196196+ wd, err := os.Getwd()
197197+ if err != nil {
198198+ return wd, fmt.Errorf("failed to get current working directory: %w", err)
199199+ }
200200+ wd = strings.ReplaceAll(wd, "/test/e2e", "") // strip the test/e2e suffix so callers get the project root
201201+ return wd, nil
202202+}
203203+204204+// UncommentCode searches for target in the file and removes the comment prefix
205205+// from the target content. The target content may span multiple lines.
206206+func UncommentCode(filename, target, prefix string) error {
207207+ // false positive
208208+ // nolint:gosec
209209+ content, err := os.ReadFile(filename)
210210+ if err != nil {
211211+ return fmt.Errorf("failed to read file %q: %w", filename, err)
212212+ }
213213+ strContent := string(content)
214214+215215+ idx := strings.Index(strContent, target)
216216+ if idx < 0 {
217217+ return fmt.Errorf("unable to find the code %q to be uncommented", target)
218218+ }
219219+220220+ out := new(bytes.Buffer)
221221+ _, err = out.Write(content[:idx])
222222+ if err != nil {
223223+ return fmt.Errorf("failed to write to output: %w", err)
224224+ }
225225+226226+ scanner := bufio.NewScanner(bytes.NewBufferString(target))
227227+ if !scanner.Scan() {
228228+ return nil
229229+ }
230230+ for {
231231+ if _, err = out.WriteString(strings.TrimPrefix(scanner.Text(), prefix)); err != nil {
232232+ return fmt.Errorf("failed to write to output: %w", err)
233233+ }
234234+ // Avoid writing a newline in case the previous line was the last in target.
235235+ if !scanner.Scan() {
236236+ break
237237+ }
238238+ if _, err = out.WriteString("\n"); err != nil {
239239+ return fmt.Errorf("failed to write to output: %w", err)
240240+ }
241241+ }
242242+243243+ if _, err = out.Write(content[idx+len(target):]); err != nil {
244244+ return fmt.Errorf("failed to write to output: %w", err)
245245+ }
246246+247247+ // false positive
248248+ // nolint:gosec
249249+ if err = os.WriteFile(filename, out.Bytes(), 0644); err != nil {
250250+ return fmt.Errorf("failed to write file %q: %w", filename, err)
251251+ }
252252+253253+ return nil
254254+}
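// Illustrative usage, assuming a standard kubebuilder layout where the
// kustomization ships with the cert-manager patch commented out:
//
//    err := utils.UncommentCode(
//        "config/default/kustomization.yaml",
//        "#- ../certmanager", "#")
//    Expect(err).NotTo(HaveOccurred())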