A Kubernetes operator that bridges Hardware Security Module (HSM) data storage with Kubernetes Secrets, providing true secret portability th

update claude.md and fix issue with hsm-agent deployment

+147 -21
+129 -15
CLAUDE.md
···
 make manifests     # Generate WebhookConfiguration, ClusterRole and CustomResourceDefinition objects
 make generate      # Generate DeepCopy methods for CRD types
 make helm-sync     # Sync generated CRDs from config/ to helm/ after CRD changes
+
+# Protocol Buffer generation (for gRPC)
+buf generate       # Generate Go code from proto files (requires buf tool)
+buf lint           # Lint protobuf files
+buf format -w      # Format protobuf files
 ```

 ### Docker Images
···

 ### API Development and Testing
 ```bash
-# Test API endpoints locally
+# Test REST API endpoints locally (manager)
 cd examples/api && ./health-check.sh

 # Create test secrets via API
···
 ./list-secrets.sh
 ```

+### gRPC Development and Testing
+```bash
+# Generate protobuf code after modifying .proto files
+buf generate
+
+# Test gRPC agent connectivity
+# Note: Agent runs on port 9090 (gRPC) and 8093 (health)
+
+# Test agent health via HTTP (from within cluster)
+curl http://hsm-agent-pod:8093/healthz
+
+# Test gRPC connection programmatically
+# See internal/agent/grpc_integration_test.go for examples
+
+# Protocol buffer linting
+buf lint api/proto/hsm/v1/hsm.proto
+```
+
+### Protocol Buffer Development
+```bash
+# Install buf tool (required for proto generation)
+go install github.com/bufbuild/buf/cmd/buf@latest
+
+# Modify proto files
+# Edit api/proto/hsm/v1/hsm.proto
+
+# Regenerate Go code
+buf generate
+
+# Format proto files
+buf format -w api/proto/hsm/v1/hsm.proto
+
+# Validate proto files
+buf lint
+```
+
 ## Project Overview

 A Kubernetes operator that bridges Pico HSM binary data storage with Kubernetes Secrets, providing true secret portability through hardware-based storage. The operator implements a controller pattern that watches HSMSecret Custom Resource Definitions (CRDs) and maintains bidirectional synchronization between HSM binary data files and Kubernetes Secret objects.
···
    - Can fallback to **MockClient** for testing
    - Deployed close to HSM hardware (DaemonSet pattern)
    - Heavy image with full PKCS#11 library dependencies
-   - Serves HSM operations via API for manager requests
+   - **gRPC API**: Serves HSM operations via gRPC on port 9090 (default)
+   - **HTTP API**: Legacy HTTP support via `--use-grpc=false`
+   - **Health Checks**: HTTP health endpoints on port 8093

 3. **Discovery Binary** (`cmd/discovery/main.go`)
    - Handles **HSMDevice CRDs** (readonly specs) and USB device discovery
···
 HSM Storage ←→ HSMSecret CRD ←→ Kubernetes Secret
 USB Device ←→ HSMDevice CRD (readonly spec) ←→ Pod Annotations ←→ HSMPool CRD (aggregated status)

-Manager: HSMPath ←→ Agent API ←→ PKCS#11 Client ←→ K8s Secret (owner refs)
+Manager: HSMPath ←→ Agent gRPC ←→ PKCS#11 Client ←→ K8s Secret (owner refs)
          HSMDevice ←→ HSMPool (auto-created with owner refs)
          Pod Annotations ←→ HSMPool Status (aggregated discovery results)

 Discovery: /sys/bus/usb ←→ Pod Annotations (ephemeral reports)
-Agent: PKCS#11 Library ←→ HSM Device ←→ API Server
+Agent: PKCS#11 Library ←→ HSM Device ←→ gRPC Server (port 9090)
 ```

 **Key Benefits:**
···
 - ✅ **Grace Periods**: 5-minute buffer prevents agent churn during outages
 - ✅ **Kubernetes Native**: Standard patterns (annotations, owner refs, watches)

+### gRPC Communication Architecture
+
+The operator uses **Protocol Buffers (protobuf)** and **gRPC** for efficient, type-safe communication between manager and agent components:
+
+**Protocol Definition**: `api/proto/hsm/v1/hsm.proto`
+- **HSMAgent Service**: Complete gRPC service definition
+- **10 Operations**: GetInfo, ReadSecret, WriteSecret, WriteSecretWithMetadata, ReadMetadata, DeleteSecret, ListSecrets, GetChecksum, IsConnected, Health
+- **Type Safety**: Structured messages for HSMInfo, SecretData, SecretMetadata
+- **Error Handling**: gRPC status codes for proper error propagation
+
+**gRPC Server** (`internal/agent/grpc_server.go`):
+- **Port 9090**: Default gRPC service port
+- **Port 8093**: HTTP health checks (`/healthz`, `/readyz`)
+- **Interceptors**: Request logging and metrics collection
+- **Graceful Shutdown**: Context-based cancellation support
+
+**gRPC Client** (`internal/agent/grpc_client.go`):
+- **Connection Management**: Automatic keepalive and reconnection
+- **Timeouts**: Configurable request timeouts (default: 30s)
+- **Error Handling**: gRPC status code interpretation
+- **Interface Compatibility**: Implements `hsm.Client` interface
+
+**Protocol Buffer Generation**:
+```bash
+# Generate Go code from .proto files
+buf generate
+
+# Lint proto files
+buf lint
+
+# Format proto files
+buf format -w
+```
+
+**Generated Files**:
+- `api/proto/hsm/v1/hsm.pb.go` - Message types
+- `api/proto/hsm/v1/hsm_grpc.pb.go` - Service client/server code
+- `hsm/v1/hsm.pb.go` - Duplicate for backward compatibility
+
 ### Key Architectural Patterns

 1. **Status-Driven Reconciliation**: Controllers use comprehensive status fields to track state
···
 **Root Cause**: API server was configured to use port 8080, conflicting with metrics server.

 **Solution Applied**:
-- **API Server**: Restored to port 8090 (dedicated for REST API)
-- **Metrics Server**: Port 8080 internal, exposed as 8443 via service
-- **Health Probes**: Port 8081 (unchanged)
-- **Service Mapping**: Corrected service target ports to match actual server ports
+- **Manager API Server**: Port 8090 (dedicated for REST API)
+- **Manager Metrics Server**: Port 8080 internal, exposed as 8443 via service
+- **Manager Health Probes**: Port 8081 (unchanged)
+- **Agent gRPC Server**: Port 9090 (default for HSM operations)
+- **Agent Health Server**: Port 8093 (HTTP health checks)

 **Result**: Clean port separation with no conflicts.

···
 │   ├── discovery/main.go           # Discovery: HSMPool controller (removed from new arch)
 │   ├── agent/main.go               # Agent: Direct HSM communication
 │   └── test-hsm/main.go            # Test utility for HSM operations
-├── api/v1alpha1/                   # CRD definitions
-│   ├── hsmsecret_types.go          # HSMSecret CRD
-│   ├── hsmpool_types.go            # HSMPool CRD (race-free aggregation)
-│   └── hsmdevice_types.go          # HSMDevice CRD (readonly specs)
+├── api/                            # API definitions
+│   ├── proto/hsm/v1/               # Protocol buffer definitions
+│   │   ├── hsm.proto               # gRPC service definition
+│   │   ├── hsm.pb.go               # Generated protobuf messages
+│   │   └── hsm_grpc.pb.go          # Generated gRPC client/server
+│   └── v1alpha1/                   # CRD definitions
+│       ├── hsmsecret_types.go      # HSMSecret CRD
+│       ├── hsmpool_types.go        # HSMPool CRD (race-free aggregation)
+│       └── hsmdevice_types.go      # HSMDevice CRD (readonly specs)
 ├── internal/
 │   ├── controller/                 # Kubernetes controllers
 │   │   ├── hsmsecret_controller.go # Secret sync
···
 │   │   ├── pkcs11_client.go        # Production PKCS#11 client (CGO)
 │   │   └── pkcs11_client_nocgo.go  # Stub for testing builds
 │   ├── agent/                      # Agent deployment and communication
-│   │   ├── deployment.go           # Agent pod management
-│   │   └── client.go               # Agent API client
+│   │   ├── deployment.go           # Agent pod management
+│   │   ├── server.go               # Legacy HTTP server
+│   │   ├── grpc_server.go          # gRPC server implementation
+│   │   ├── grpc_client.go          # gRPC client implementation
+│   │   └── client.go               # Agent API client (legacy)
 │   ├── api/                        # REST API server
 │   │   ├── server.go               # HTTP server setup
 │   │   └── proxy_handlers.go       # API proxy to agents
···
 │   └── default/                    # Default deployment configuration
 ├── helm/                           # Helm chart
 │   └── hsm-secrets-operator/       # Complete Helm chart
+├── buf.yaml                        # Buf protobuf tool configuration
+├── buf.gen.yaml                    # Protobuf code generation config
+├── hsm/v1/                         # Legacy protobuf output (compatibility)
+│   ├── hsm.pb.go                   # Duplicate protobuf messages
+│   └── hsm_grpc.pb.go              # Duplicate gRPC client/server
 └── test/                           # Test suites
     ├── e2e/                        # End-to-end tests
     └── utils/                      # Test utilities
···
 - **controller-runtime**: Kubernetes controller framework
 - **PKCS#11 library**: For HSM communication (sc-hsm-embedded)
 - **OpenSC**: PKCS#11 middleware for smart cards/HSMs
+- **buf**: Protocol buffer compiler and linter
+- **protoc-gen-go**: Protocol buffer Go code generator
+- **protoc-gen-go-grpc**: gRPC Go code generator
+- **google.golang.org/grpc**: gRPC Go library

 ### HSM Integration
 - Use PKCS#11 interface for Pico HSM communication
···
 kubectl exec $AGENT_POD -- pkcs11-tool --module="/usr/lib/opensc-pkcs11.so" -I
 ```

+### Agent Configuration and Ports
+```bash
+# Agent runs with gRPC by default (port 9090)
+# Health checks via HTTP (port 8093)
+
+# To use legacy HTTP mode instead of gRPC:
+# agent --use-grpc=false --port=8090
+
+# Check agent configuration
+kubectl get deployment hsm-agent-* -o yaml | grep -A 10 containers:
+```
+
 ### Troubleshooting
 - **API works, pkcs11-tool doesn't see objects**: Use `--login --pin` for private objects
 - **`CKR_DEVICE_REMOVED` errors**: Restart agent pod to reset PKCS#11 session
 - **`CKR_TEMPLATE_INCONSISTENT` errors**: Switch from CardContact to OpenSC library
-- **Agent crash loop**: Check library path and PIN secret configuration
+- **Agent crash loop**: Check library path and PIN secret configuration
+- **gRPC connection failed**: Verify agent is running on port 9090, check service/endpoint configuration
+- **Proto generation issues**: Install buf tool and run `buf generate` after proto changes
+18 -6
internal/agent/deployment.go
···
 	return volumes
 }

-// agentNeedsUpdate checks if the agent deployment needs to be updated due to device path changes
+// agentNeedsUpdate checks if the agent deployment needs to be updated due to device path or image changes
 func (m *Manager) agentNeedsUpdate(ctx context.Context, deployment *appsv1.Deployment, hsmDevice *hsmv1alpha1.HSMDevice) (bool, error) {
+	// Check if container image needs updating
+	if len(deployment.Spec.Template.Spec.Containers) == 0 {
+		return false, fmt.Errorf("deployment has no containers")
+	}
+
+	container := deployment.Spec.Template.Spec.Containers[0]
+	currentImage := container.Image
+
+	// Check if image has changed (only if ImageResolver is available)
+	if m.ImageResolver != nil {
+		expectedImage := m.ImageResolver.GetImage(ctx, "AGENT_IMAGE")
+		if currentImage != expectedImage {
+			// Image has changed, need to update
+			return true, nil
+		}
+	}
+
 	// Get current HSMPool to check for updated device paths
 	poolName := hsmDevice.Name + "-pool"
 	pool := &hsmv1alpha1.HSMPool{}
···
 	}

 	// Extract current volume mounts from deployment
-	if len(deployment.Spec.Template.Spec.Containers) == 0 {
-		return false, fmt.Errorf("deployment has no containers")
-	}
-
-	container := deployment.Spec.Template.Spec.Containers[0]
 	currentDeviceMounts := make(map[string]string) // mount name -> device path

 	for _, mount := range container.VolumeMounts {