CLAUDE.md#
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
⚠️ Important Development Context#
Remote Kubernetes Environment: The Kubernetes cluster is running remotely, NOT on this local development system. Any local device checks (like ls /dev/tty* or local USB device detection) will NOT work and will not reflect the actual state of devices on the remote cluster nodes.
Sync Architecture: The operator implements unidirectional sync from HSM to Kubernetes Secrets only. HSM is the authoritative source of truth. K8s Secrets are read-only replicas that get updated when HSM data changes. There is no K8s → HSM sync functionality.
GitOps Deployment: This Kubernetes deployment is GitOps-based. You are not able to push new images to the Kubernetes cluster from this machine. Code changes require updating the deployment through the GitOps pipeline.
Code Modernization: Avoid keeping legacy code whenever possible. Replace and improve functions as necessary rather than maintaining backward compatibility with outdated patterns.
Project Overview#
A Kubernetes operator that bridges Hardware Security Module (HSM) data storage with Kubernetes Secrets, providing true secret portability through hardware-based security. The operator implements a controller pattern that synchronizes HSM binary data files to Kubernetes Secret objects using a unified binary architecture with gRPC communication, automatic USB device discovery, and dynamic agent deployment.
Architecture: Unified Binary with Mode-Based Operation#
The project uses a unified binary (cmd/hsm-operator/main.go) that operates in different modes, replacing the previous four-binary architecture. A separate test utility (cmd/test-hsm/main.go) provides HSM testing capabilities.
Core Components#
Unified Binary Modes:
-
Manager Mode (
--mode=manager):- Orchestrates HSMSecret resources and deploys agents
- Provides unified REST API proxy on port 8090
- Handles HSMPool aggregation from discovery pod annotations
- Controllers:
HSMSecretReconciler,HSMPoolReconciler,HSMPoolAgentReconciler,DiscoveryDaemonSetReconciler
-
Discovery Mode (
--mode=discovery):- DaemonSet that discovers USB HSM devices on cluster nodes
- Reports findings via pod annotations (race-free architecture)
- Native sysfs scanning with Talos Linux support
-
Agent Mode (
--mode=agent):- Dynamically deployed pods for direct HSM communication
- gRPC server on port 9090 with HTTP health checks on port 8093
- Real PKCS#11 client for production or MockClient for testing
- Manages pcscd daemon lifecycle internally via PCSCDManager (no shell scripts required)
Test Utility:
- Test HSM Binary (
cmd/test-hsm/main.go): Standalone HSM operations testing and debugging
Key Architectural Patterns#
Race-Free Coordination:
- HSMDevice CRDs contain readonly specifications only (no status field)
- Discovery pods report via their own pod annotations
- HSMPool CRDs are the source of truth for agent discovery and multi-device operations
- HSMPool aggregates all discovery reports from multiple nodes
- Owner references ensure automatic cleanup when resources are deleted
- 5-minute grace periods prevent agent churn during outages
Multi-Device Agent Architecture:
- HSMPool-based Agent Discovery: API and controllers query HSMPool to find all agent instances for a device type
- Multiple Agent Instances: Each physical device gets its own agent pod (e.g.,
hsm-agent-pico-hsm-0,hsm-agent-pico-hsm-1) - Multi-Agent Operations: API operations (list, write, delete) work across all agents when mirroring is enabled
- Automatic Synchronization: Multi-device operations handle conflict detection and resolution between devices
gRPC Communication Architecture:
- Protocol definition in
api/proto/hsm/v1/hsm.protowith 10 HSM operations - Manager ↔ Agent: gRPC for efficient, type-safe HSM operations
- Discovery → Manager: Pod annotations for race-free device reporting
- External → Manager: REST API proxy routes to ALL agents for multi-device operations
- Generated code:
api/proto/hsm/v1/hsm.pb.goandhsm_grpc.pb.go
PCSCD Lifecycle Management:
- PCSCDManager (
internal/agent/pcscd_manager.go): Go-native pcscd process management - Automatically started in agent mode when real PKCS#11 hardware is detected
- Foreground mode with stdout/stderr piping for integrated logging
- Graceful shutdown handling: SIGTERM → 5s timeout → SIGKILL fallback
- Socket-based readiness detection (
/var/run/pcscd/pcscd.comm) - Eliminates shell script dependency for maximum container security
Controller Hierarchy:
Manager Controllers:
├── HSMSecretReconciler - HSM to K8s Secret sync
├── HSMPoolReconciler - Aggregates discovery reports from pod annotations
├── HSMPoolAgentReconciler - Deploys agents dynamically when devices are ready
└── DiscoveryDaemonSetReconciler - Manages discovery DaemonSet lifecycle
Discovery Controllers:
└── HSMDeviceController - USB device discovery via sysfs scanning
Essential Development Commands#
Quality Checks (REQUIRED before committing)#
# Single command for all quality checks (format + vet + lint)
make quality
# Run tests
make test
# Run specific test packages
go test ./internal/controller -v -run TestHSMSecretController
go test ./internal/hsm -v
go test ./internal/discovery -v
Build and Run#
# Build unified binary
make build # Creates bin/hsm-operator
# Run in different modes
make run # Manager mode (default)
make run-agent # Agent mode (requires HSM device)
make run-discovery # Discovery mode
# Build test utility
go build -o bin/test-hsm cmd/test-hsm/main.go
# Test specific modes
./bin/hsm-operator --mode=manager --help
./bin/hsm-operator --mode=agent --device-name=pico-hsm --help
./bin/hsm-operator --mode=discovery --node-name=worker1 --help
Protocol Buffer Development#
# After modifying api/proto/hsm/v1/hsm.proto
buf generate # Generate Go code from .proto files
buf lint # Lint protobuf files
buf format -w # Format protobuf files
# Verify changes don't break existing code
make test
CRD Development#
# After modifying api/v1alpha1/*.go files
make manifests generate # Generate CRDs and DeepCopy methods
make helm-sync # Sync CRDs from config/ to helm/
make quality test # Verify changes
Docker & Deployment#
# Production image (FROM scratch base - ultra-minimal ~15MB)
make docker-build IMG=hsm-secrets-operator:latest
# Multi-arch build (supports amd64 and arm64)
docker buildx build --platform linux/amd64,linux/arm64 -t hsm-secrets-operator:latest .
# Deploy to cluster
make deploy IMG=hsm-secrets-operator:latest
# Generate installer bundle
make build-installer IMG=hsm-secrets-operator:latest
Container Architecture:
- Base Image: FROM scratch (no distro, no shell - maximum security)
- Size: ~15MB (vs ~30MB with distroless:debug)
- Security: Minimal attack surface - only essential binaries and libraries
- Dependencies: Auto-discovered via iterative ldd/strings analysis in builder stage
- Multi-arch: Supports x86_64 and arm64 via dynamic linker auto-detection
- Process Management: Direct binary execution with Go-managed pcscd lifecycle
- No Shell: All process management handled in Go code (internal/agent/pcscd_manager.go)
CRD Structure and Relationships#
HSMSecret CRD (Secret Management)#
apiVersion: hsm.j5t.io/v1alpha1
kind: HSMSecret
metadata:
name: my-secret # HSM path = metadata.name
spec:
autoSync: true # Automatic sync from HSM to K8s (default)
syncInterval: 30 # Sync interval in seconds
status:
syncStatus: "InSync" # InSync|OutOfSync|Error|Pending
hsmChecksum: "sha256:abc123..." # SHA256 checksum for change detection
secretChecksum: "sha256:def456..."
HSMDevice CRD (Device Specifications)#
apiVersion: hsm.j5t.io/v1alpha1
kind: HSMDevice
metadata:
name: my-pico-hsm
spec:
deviceType: "PicoHSM"
discovery:
usb:
vendorId: "20a0"
productId: "4230"
# OR autoDiscovery: true for well-known device types
pkcs11:
libraryPath: "/usr/lib/opensc-pkcs11.so" # Use OpenSC for Pico HSM
slotId: 0
pinSecret:
name: "hsm-pin"
key: "pin"
# NOTE: No status field - readonly specs only, discovery via pod annotations
HSMPool CRD (Device Aggregation)#
apiVersion: hsm.j5t.io/v1alpha1
kind: HSMPool
status:
phase: "Ready" # Pending|Aggregating|Ready|Partial|Error
totalDevices: 2
reportingPods: # Discovery pods currently reporting
- podName: "discovery-node1"
devicesFound: 1
fresh: true # Within grace period
aggregatedDevices: # All discovered devices across cluster
- devicePath: "/dev/bus/usb/001/015"
nodeName: "worker-1"
serialNumber: "DC6A33145E23A42A"
available: true
PKCS#11 Library Requirements#
⚠️ Critical: Use OpenSC Library for Pico HSM
The Pico HSM requires the OpenSC PKCS#11 library (/usr/lib/opensc-pkcs11.so) instead of the CardContact library for proper data object support.
Build Architecture:
- Production Build (
Dockerfile): Agent has CGO enabled with real PKCS#11Client - Conditional Compilation:
internal/hsm/pkcs11_client.govspkcs11_client_nocgo.gobased on build tags
Common Development Workflows#
Adding New gRPC Operation#
# 1. Add new RPC method to api/proto/hsm/v1/hsm.proto
# 2. Add request/response message types
# 3. Generate code
buf generate
# 4. Implement in internal/agent/grpc_server.go
# 5. Update client in internal/agent/grpc_client.go
# 6. Test changes
make test
Testing with Real HSM Device#
# Build test utility
go build -o bin/test-hsm cmd/test-hsm/main.go
# Test full cycle (write/read/verify/delete)
./bin/test-hsm --library=/usr/lib/opensc-pkcs11.so --pin=$PIN --op=test
# Test specific operations
./bin/test-hsm --library=/usr/lib/opensc-pkcs11.so --pin=$PIN --op=list
./bin/test-hsm --library=/usr/lib/opensc-pkcs11.so --pin=$PIN --op=write --path=test-secret
API Testing#
# Port forward to access REST API
kubectl port-forward -n hsm-secrets-operator-system svc/hsm-secrets-operator-api 8090:8090
# Test API endpoints
cd examples/api
./create-secret.sh my-test-secret
./list-secrets.sh
# Direct curl
curl http://localhost:8090/api/v1/hsm/secrets/my-test-secret | jq '.'
Monitoring and Troubleshooting#
Resource Status Monitoring#
# View resources with custom columns
kubectl get hsmsecret # Short name: hsmsec
kubectl get hsmdevice # Short name: hsmdev
kubectl get hsmpool
# Monitor sync status
kubectl get hsmsecret my-secret -o jsonpath='{.status.syncStatus}'
# Check discovered devices in HSMPool (source of truth for agents)
kubectl get hsmpool -o jsonpath='{.status.aggregatedDevices[*].devicePath}'
# Check HSMPool readiness for agent deployment
kubectl get hsmpool -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,DEVICES:.status.totalDevices
# View all agent pods for multi-device setup
kubectl get pods -l app.kubernetes.io/name=hsm-agent
# View discovery pod reports
kubectl get pods -l app.kubernetes.io/component=discovery \
-o jsonpath='{range .items[*]}{.metadata.name}: {.metadata.annotations.hsm\.j5t\.io/device-report}{"\n"}{end}'
Port Configuration#
- Manager API Server: Port 8090 (REST API)
- Manager Metrics: Port 8080 internal, 8443 via service
- Manager Health: Port 8081
- Agent gRPC: Port 9090 (HSM operations)
- Agent Health: Port 8093 (HTTP health checks)
Common Issues#
- API works, pkcs11-tool doesn't see objects: Use
--login --pinfor private objects CKR_DEVICE_REMOVEDerrors: Restart agent pod to reset PKCS#11 sessionCKR_TEMPLATE_INCONSISTENTerrors: Switch from CardContact to OpenSC library- Agent crash loop: Check library path and PIN secret configuration
- gRPC connection failed: Verify agent on port 9090, check service/endpoint configuration
- Proto generation issues: Install buf tool (
go install github.com/bufbuild/buf/cmd/buf@latest) - Metadata keys in K8s Secrets: If you see
_metadatakeys in Kubernetes Secrets, this indicates a bug in the PKCS#11 client filtering (should be excluded inReadSecret)
Manual HSM Access#
# Get agent pod
AGENT_POD=$(kubectl get pods -l app.kubernetes.io/name=hsm-agent -o jsonpath='{.items[0].metadata.name}')
# Get PIN from the HSM PIN secret
HSM_PIN=$(kubectl get secret hsm-pin -o jsonpath='{.data.pin}' | base64 -d)
# List all secrets (requires PIN authentication)
kubectl exec $AGENT_POD -- pkcs11-tool --module="/usr/lib/x86_64-linux-gnu/opensc-pkcs11.so" --login --pin="$HSM_PIN" --list-objects --type=data
# Read specific secret component
kubectl exec $AGENT_POD -- pkcs11-tool --module="/usr/lib/x86_64-linux-gnu/opensc-pkcs11.so" --login --pin="$HSM_PIN" --read-object --type=data --label="my-secret/api_key"
# HSM device info
kubectl exec $AGENT_POD -- pkcs11-tool --module="/usr/lib/x86_64-linux-gnu/opensc-pkcs11.so" -I
Secret Storage Structure:
- Each K8s Secret becomes multiple PKCS#11 data objects
- Object naming:
secret-name/key-name(e.g.,user-credentials/api_key) - Metadata stored separately with
/_metadatasuffix (filtered out from K8s Secrets) - Private objects require PIN authentication to access
Code Architecture Critical Points#
Controller Interaction Flow:
HSMSecretReconcilerreads from HSM via gRPC agentsHSMPoolReconcileraggregates device discovery reports from pod annotations (race-free)HSMPoolAgentReconcilerdeploys agents dynamically when devices are ready- Multi-device operations provide HSM synchronization capabilities
Agent Discovery Architecture:
- HSMPool as Source of Truth: API and controllers query HSMPool.Status.AggregatedDevices instead of individual HSMDevice resources
- Multi-Instance Agent Tracking: Agent manager tracks agents by keys like
pico-hsm-0,pico-hsm-1for multiple physical devices - Pool-Based Cleanup: Agent cleanup based on HSMSecret existence rather than device-specific references
- API Multi-Device Operations:
findAvailableAgent()queries HSMPools to find any available agent for a device typegetAllAvailableAgents()returns all agents across all pools for mirroring operations- Operations like delete/write with
mirror=truetarget ALL agents simultaneously
PKCS#11 Client Implementation:
- Production:
internal/hsm/pkcs11_client.gowith CGO - Testing:
internal/hsm/pkcs11_client_nocgo.goandmock_client.go - Critical: ReadSecret must filter out
metadataKeySuffix("/_metadata") objects - Build tags control which client is compiled
Protocol Buffer Workflow:
- Modify
api/proto/hsm/v1/hsm.proto - Run
buf generateto update Go code - Implement in
internal/agent/grpc_server.go - Update client calls in controller or agent code
This operator provides secure, hardware-backed secret management that integrates seamlessly with Kubernetes while maintaining the security benefits of HSM-based storage.
- do not try and deploy to the kubernetes cluster