-# New Race Condition-Free Architecture
-
-## Overview
-
-The HSM Secrets Operator has been refactored to eliminate race conditions through clear separation of concerns and ephemeral coordination using pod annotations.
-
-## Architecture Components
-
-### 1. HSMDevice (Readonly Spec)
-```yaml
-apiVersion: hsm.j5t.io/v1alpha1
-kind: HSMDevice
-metadata:
-  name: pico-hsm
-spec:
-  deviceType: "PicoHSM"
-  discovery:
-    autoDiscovery: true
-  pkcs11:
-    libraryPath: "/usr/lib/libsc-hsm-pkcs11.so"
-    slotId: 0
-    pinSecret:
-      name: "pico-hsm-pin"
-      key: "pin"
-```
-
-**Purpose**: Device specification and configuration only. No dynamic status.
-
-### 2. HSMPool (Aggregated Status)
-```yaml
-apiVersion: hsm.j5t.io/v1alpha1
-kind: HSMPool
-metadata:
-  name: pico-hsm-pool
-  ownerReferences:
-  - kind: HSMDevice
-    name: pico-hsm
-spec:
-  hsmDeviceRef: pico-hsm
-  gracePeriod: 5m
-status:
-  phase: Ready
-  totalDevices: 2
-  availableDevices: 2
-  reportingPods:
-  - podName: discovery-node1
-    devicesFound: 1
-    fresh: true
-  - podName: discovery-node2
-    devicesFound: 1
-    fresh: true
-```
-
-**Purpose**: Aggregates discovery results from all pods with grace periods for outages.
-
-### 3. Pod Annotations (Ephemeral Reports)
-```yaml
-apiVersion: v1
-kind: Pod
-metadata:
-  name: discovery-node1
-  annotations:
-    hsm.j5t.io/device-report: |
-      {
-        "hsmDeviceName": "pico-hsm",
-        "reportingNode": "node1",
-        "discoveredDevices": [
-          {
-            "devicePath": "/dev/bus/usb/001/015",
-            "serialNumber": "DC6A33145E23A42A",
-            "lastSeen": "2025-08-19T10:00:00Z"
-          }
-        ],
-        "lastReportTime": "2025-08-19T10:00:00Z",
-        "discoveryStatus": "completed"
-      }
-```
-
-**Purpose**: Each discovery pod reports its findings via its own annotations. Auto-cleanup when pods disappear.
-
-## Data Flow
-
-```
-1. User creates HSMDevice → Manager creates HSMPool
-2. Discovery pods see HSMDevice → Update own annotations
-3. HSMPool controller watches annotations → Aggregates into pool status
-4. Pool status shows complete device availability across cluster
-```
-
-## Benefits
-
-✅ **No Race Conditions**: Each resource has single owner
-✅ **Automatic Cleanup**: Pod dies → annotations disappear → no stale data
-✅ **Grace Periods**: 5-minute buffer prevents agent churn during outages
-✅ **Kubernetes Native**: Standard patterns (annotations, owner refs, watches)
-✅ **Scalable**: Works with any number of discovery pods
-
-## Migration Guide
-
-### Old vs New
-
-**Before**:
-- HSMDevice had complex status with coordination
-- Multiple pods fought over same status
-- Race conditions and reconciliation loops
-
-**After**:
-- HSMDevice: readonly spec only
-- HSMPool: aggregated status
-- Pod annotations: ephemeral reports
-
-### Deployment Changes
-
-1. **New CRDs**: HSMPool CRD added alongside HSMDevice
-2. **Pod Environment**: Discovery pods need `POD_NAME` and `POD_NAMESPACE` env vars
-3. **RBAC**: Added permissions for HSMPools and pod annotations
-
-### Expected Behavior
-
-```bash
-# Create HSMDevice
-kubectl apply -f examples/new-architecture/test-hsmdevice.yaml
-
-# Manager auto-creates HSMPool
-kubectl get hsmpool
-# NAME                 HSMDEVICE       TOTAL   AVAILABLE   PHASE
-# pico-hsm-test-pool   pico-hsm-test   2       2           Ready
-
-# Check pod reports
-kubectl get pods -l app.kubernetes.io/component=discovery \
-  -o jsonpath='{range .items[*]}{.metadata.name}: {.metadata.annotations.hsm\.j5t\.io/device-report}{"\n"}{end}'
-
-# Pool aggregates all reports
-kubectl get hsmpool pico-hsm-test-pool -o yaml
-```
-
-The new architecture is production-ready and eliminates all race conditions while providing clear visibility into device discovery across the cluster.
+51-13
README.md
···

 ## Description

-The HSM Secrets Operator implements a controller pattern that maintains bidirectional synchronization between HSM binary data files and Kubernetes Secret objects. It uses a dual-binary architecture with automatic USB device discovery and dynamic agent deployment to provide secure, hardware-backed secret management in Kubernetes environments.
+The HSM Secrets Operator implements a controller pattern that maintains bidirectional synchronization between HSM binary data files and Kubernetes Secret objects. It uses a four-binary architecture with gRPC communication, automatic USB device discovery, and dynamic agent deployment to provide secure, hardware-backed secret management in Kubernetes environments.

 ### Key Features

···
 - **Bidirectional Sync**: Automatic synchronization between HSM storage and Kubernetes Secrets
 - **Device Discovery**: Automatic USB HSM device detection with support for multiple device types
 - **Agent Architecture**: Dynamic deployment of HSM agent pods with node affinity for direct hardware access
+- **gRPC Communication**: High-performance gRPC protocol for manager-agent communication with fallback to HTTP
 - **Unified API**: Single REST API endpoint that routes operations to appropriate HSM agents
 - **Secret Portability**: Move secrets between clusters by carrying the HSM device
 - **Multi-Device Support**: Support for Pico HSM, SmartCard-HSM, YubiKey HSM, and custom devices

 ### Architecture

-The operator consists of three main components:
+The operator consists of four main components:

-1. **Manager**: Orchestrates HSMSecret resources, deploys agents, and provides unified API proxy
-2. **Discovery**: DaemonSet that discovers USB HSM devices on cluster nodes
-3. **Agent**: Dynamically deployed pods that handle direct HSM communication on nodes with devices
+1. **Manager**: Orchestrates HSMSecret resources, deploys agents, and provides unified REST API proxy (port 8090)
+2. **Discovery**: DaemonSet that discovers USB HSM devices on cluster nodes and reports via pod annotations
+3. **Agent**: Dynamically deployed pods that handle direct HSM communication via gRPC (port 9090) with HTTP health checks (port 8093)
+4. **Test HSM**: Utility for HSM operations testing and debugging

-This architecture ensures that HSM operations only occur on nodes with physical device access while providing a centralized management interface.
+**Communication Architecture:**
+- **Manager ↔ Agent**: gRPC for efficient, type-safe HSM operations
+- **Discovery → Manager**: Pod annotations for race-free device reporting
+- **External → Manager**: REST API for user/application access
+- **Protocol Buffers**: Structured message definitions in `api/proto/hsm/v1/hsm.proto`
+
+This architecture ensures that HSM operations only occur on nodes with physical device access while providing a centralized management interface with high-performance communication.

 ## Getting Started

···
 - Docker 17.03+ (for building images)
 - kubectl with cluster-admin privileges
 - HSM device (Pico HSM, SmartCard-HSM, YubiKey HSM, or compatible PKCS#11 device)
+- **For development**: buf tool (`go install github.com/bufbuild/buf/cmd/buf@latest`)

 ### Deployment Options

···
 # List secrets
 curl http://localhost:8090/api/v1/hsm/secrets

-# Check discovered HSM devices
+# Check discovered HSM devices
 kubectl get hsmdevices

+# Check HSM pools (aggregated device discovery)
+kubectl get hsmpools
+
 # Check agent pods (deployed automatically when devices are ready)
-kubectl get pods -l app=hsm-agent
+kubectl get pods -l app.kubernetes.io/component=agent
+
+# Test gRPC agent health (from within cluster)
+kubectl exec -it <agent-pod> -- curl http://localhost:8093/healthz
 ```

 ### Uninstallation
···
 # Generate manifests after CRD changes
 make manifests

-# Build binaries
+# Generate protocol buffer code after .proto changes
+buf generate
+
+# Build all binaries (manager, discovery, agent, test-hsm)
 make build
 ```

···

 ### Architecture Notes

-- **Manager**: Handles HSMSecret CRDs and agent deployment
-- **Discovery**: DaemonSet for USB device discovery
-- **Agent**: Dynamic pods for direct HSM communication
-- **API**: Unified proxy that routes to agent pods
+- **Manager**: Handles HSMSecret CRDs, agent deployment, and REST API proxy (port 8090)
+- **Discovery**: DaemonSet for USB device discovery with pod annotation reporting
+- **Agent**: Dynamic pods for direct HSM communication via gRPC (port 9090)
+- **gRPC Protocol**: Type-safe communication defined in `api/proto/hsm/v1/hsm.proto`
+- **Health Checks**: HTTP endpoints on port 8093 for Kubernetes probes
+
+### Protocol Buffer Development
+
+When modifying the gRPC service definition:
+
+```bash
+# 1. Edit the protocol definition
+vim api/proto/hsm/v1/hsm.proto
+
+# 2. Generate Go code
+buf generate
+
+# 3. Lint and format
+buf lint
+buf format -w api/proto/hsm/v1/hsm.proto
+
+# 4. Run tests to ensure compatibility
+make test
+```

 **NOTE:** Run `make help` for more information on all potential `make` targets

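The README changes describe an HTTP health endpoint on the agent (port 8093, probed at `/healthz`) that sits alongside the gRPC port. A minimal standard-library sketch of such an endpoint might look like the following. The `healthzHandler` and `probe` helpers and the readiness callback are hypothetical; the agent's real handler is not part of this diff and may differ.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthzHandler (hypothetical) answers 200 "ok" while ready() reports true
// and 503 otherwise, which is the shape Kubernetes HTTP probes expect.
func healthzHandler(ready func() bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !ready() {
			http.Error(w, "not ready", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok")
	})
}

// probe issues an in-memory GET /healthz and returns the status code,
// standing in for the curl call shown in the README.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rec.Code
}

func main() {
	mux := http.NewServeMux()
	mux.Handle("/healthz", healthzHandler(func() bool { return true }))
	fmt.Println("status:", probe(mux))
	// A real agent would instead call http.ListenAndServe(":8093", mux).
}
```

Keeping the probe on plain HTTP lets a kubelet `httpGet` probe check the pod without speaking gRPC.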
+6
internal/api/proxy_handlers.go
···

 // setupProxyRoutes sets up proxy routes for HSM operations
 func (s *Server) setupProxyRoutes() {
+	// Serve web UI static files
+	s.router.Static("/web", "./web")
+	s.router.GET("/", func(c *gin.Context) {
+		c.Redirect(http.StatusFound, "/web/")
+	})
+
 	// Create API v1 group
 	v1 := s.router.Group("/api/v1")
 	{
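The six added lines mount the `web/` directory under `/web` and redirect `/` to `/web/`. The sketch below is a standard-library analog of that routing, written so it runs without the Gin dependency. It is not the code in this PR: `newRouter`, `get`, and `demoFS` are hypothetical names, and an in-memory filesystem stands in for the `./web` directory.

```go
package main

import (
	"fmt"
	"io/fs"
	"net/http"
	"net/http/httptest"
	"testing/fstest"
)

// newRouter (hypothetical) mirrors the two routes added in the diff:
// static files under /web/ and a 302 redirect from / to /web/.
func newRouter(webFS fs.FS) http.Handler {
	mux := http.NewServeMux()
	mux.Handle("/web/", http.StripPrefix("/web/", http.FileServer(http.FS(webFS))))
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// The "/" pattern is a catch-all in ServeMux, so match the root only.
		if r.URL.Path != "/" {
			http.NotFound(w, r)
			return
		}
		http.Redirect(w, r, "/web/", http.StatusFound)
	})
	return mux
}

// get performs an in-memory GET against the handler.
func get(h http.Handler, path string) *httptest.ResponseRecorder {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec
}

// demoFS is an in-memory stand-in for the ./web directory.
func demoFS() fs.FS {
	return fstest.MapFS{
		"index.html": {Data: []byte("<h1>HSM Secrets Manager</h1>")},
		"app.js":     {Data: []byte("console.log('ui')")},
	}
}

func main() {
	h := newRouter(demoFS())
	root := get(h, "/")
	fmt.Println(root.Code, root.Header().Get("Location"))
	fmt.Println(get(h, "/web/app.js").Code)
}
```

One design point carries over to the Gin version: `"./web"` is resolved relative to the process working directory, so the manager image presumably needs the `web/` directory at that location.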
+132
web/README.md
+# HSM Secrets Manager Web UI
+
+A simple web interface for managing Hardware Security Module (HSM) secrets through the HSM Secrets Operator.
+
+## Features
+
+- **📋 List Secrets**: View all secrets stored in your HSM
+- **➕ Create Secrets**: Add new secrets with JSON key-value pairs
+- **🔍 View Details**: Examine secret contents and metadata
+- **🗑️ Delete Secrets**: Remove secrets from both HSM and Kubernetes
+- **📊 Health Monitoring**: Check API and HSM status
+- **🔄 Auto-refresh**: Automatically updates every 30 seconds
+
+## Usage
+
+### Starting the Web UI
+
+The web UI is served by the HSM Secrets Operator manager on port 8090 by default:
+
+1. **Using kubectl port-forward** (for local development):
+   ```bash
+   kubectl port-forward -n hsm-secrets-operator-system service/hsm-secrets-operator-manager-service 8090:8090
+   ```
+
+2. **Using ingress** (for production):
+   Configure your ingress controller to route to the manager service on port 8090.
+
+3. **Access the UI**:
+   Open your browser to: `http://localhost:8090`
+
+### Creating Secrets
+
+1. Click **"➕ Create New Secret"**
+2. Enter a **Secret Name** (this becomes the HSM path)
+3. Add **Key-Value Pairs**:
+   - Click the **➕** button to add a new key-value pair
+   - Enter the key name (e.g., `api_key`, `database_password`)
+   - Enter the corresponding value
+   - Use **➖** to remove pairs you don't need
+   - Add as many pairs as needed for your secret
+4. Click **"Create Secret"**
+
+**Key Naming Rules:**
+- Must start with a letter
+- Can contain letters, numbers, and underscores only
+- Examples: `api_key`, `db_password`, `webhook_secret`
+
+### Viewing Secrets
+
+1. Click **"👁️ View"** next to any secret in the list
+2. See the full JSON structure and metadata
+3. Copy individual values as needed
+
+### Managing Secrets
+
+- **Refresh**: Click 🔄 to manually refresh the list
+- **Delete**: Click 🗑️ and confirm to permanently remove a secret
+- **Auto-sync**: The UI automatically refreshes every 30 seconds
+
+## API Integration
+
+The web UI communicates with the HSM Secrets Operator's REST API:
+
+- **List Secrets**: `GET /api/v1/hsm/secrets`
+- **Get Secret**: `GET /api/v1/hsm/secrets/{name}`
+- **Create Secret**: `POST /api/v1/hsm/secrets/{name}`
+- **Delete Secret**: `DELETE /api/v1/hsm/secrets/{name}`
+- **Health Check**: `GET /api/v1/health`
+
+## Security Considerations
+
+- The web UI serves static files from the manager pod
+- All API calls go through the manager, which proxies to HSM agent pods
+- Secrets are displayed in the browser - use HTTPS in production
+- Consider network policies to restrict access to the web interface
+
+## Ingress Example
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: hsm-secrets-ui
+  namespace: hsm-secrets-operator-system
+  annotations:
+    nginx.ingress.kubernetes.io/ssl-redirect: "true"
+spec:
+  tls:
+  - hosts:
+    - hsm-secrets.example.com
+    secretName: hsm-secrets-tls
+  rules:
+  - host: hsm-secrets.example.com
+    http:
+      paths:
+      - path: /
+        pathType: Prefix
+        backend:
+          service:
+            name: hsm-secrets-operator-manager-service
+            port:
+              number: 8090
+```
+
+## Troubleshooting
+
+### UI Not Loading
+- Check that the manager pod is running: `kubectl get pods -n hsm-secrets-operator-system`
+- Verify port-forward is active: `netstat -an | grep 8090`
+- Check manager logs: `kubectl logs -n hsm-secrets-operator-system -l app.kubernetes.io/name=hsm-secrets-operator`
+
+### API Errors
+- Ensure HSM agents are running and healthy
+- Check HSMPool status: `kubectl get hsmpool`
+- Verify HSM devices are discovered: `kubectl get hsmdevice`
+
+### No Secrets Visible
+- Confirm secrets exist via CLI: `examples/api/list-secrets.sh`
+- Check agent connectivity from manager pod
+- Verify PKCS#11 configuration in HSMDevice CRDs
+
+## Development
+
+The web UI consists of:
+- `index.html`: Main interface with responsive design
+- `app.js`: JavaScript API client and UI logic
+- Served via Gin router's static file handler
+
+To modify the UI:
+1. Edit files in the `web/` directory
+2. Rebuild the manager: `make build`
+3. Redeploy or restart the manager pod
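The key naming rules in `web/README.md` (start with a letter, then letters, numbers, and underscores only) map directly onto a regular expression. The Go sketch below shows one way to check them on the server side. `ValidKey` is a hypothetical helper, and reading "letter" as ASCII `A-Z`/`a-z` is an assumption, since the README does not say whether non-ASCII letters are allowed.

```go
package main

import (
	"fmt"
	"regexp"
)

// keyPattern encodes the documented rules: a leading letter, followed by any
// number of letters, digits, or underscores. ASCII-only is an assumption.
var keyPattern = regexp.MustCompile(`^[A-Za-z][A-Za-z0-9_]*$`)

// ValidKey (hypothetical) reports whether a secret key name follows the rules.
func ValidKey(key string) bool {
	return keyPattern.MatchString(key)
}

func main() {
	for _, key := range []string{"api_key", "db_password", "1st_key", "db-password", ""} {
		fmt.Printf("%q valid=%v\n", key, ValidKey(key))
	}
}
```

Validating in the API handler as well as in `app.js` would keep the rule enforced for clients that call the REST endpoints directly.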