dancer/CLAUDE.md
I want to design an OCaml library that builds in support for modifying the program linked to it using Claude Code, and for restarting itself automatically with the fixes. The idea is for long-running services to consult Claude regularly (either on a fixed timetable, or urgently if something really unexpected happens) and improve their own functionality. Claude should be used to analyse patterns in the logs and decide whether to write code to handle a particular case. Claude should not be used directly in the application datapath itself; its role is to write code.

To make this work, the program needs to emit enough tracing data to be useful to Claude when it does an inspection, but not so much that it overwhelms the context window. Therefore, the first thing the library needs is a mechanism to intercept the program's logging output suitably. The OCaml "logs" library is a good thing to standardise on here. It's also fine to use the OCaml direct-style Eio library for all interactions.

Assume the code is running in a Linux environment with root-level access. There will also be a Zulip server available, with an API key that can be used to post messages and interact with the server.

This is an ambitious project, so before embarking on it, I need to think really carefully about the design and tradeoffs, including seeking clarification where necessary about what sorts of MCP servers or other supporting infrastructure will help make the library successful. I'm OK taking risks and trying unusual approaches. The library will be called "Dancer" after the Hunter S. Thompson quote "We're raising a generation of dancers, afraid to take one step out of line."

## Architecture Design (v1)

### Core Components
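The introduction asks for intercepted logs that are useful to Claude without overwhelming its context window. One cheap reduction is to collapse each run of identical consecutive messages into a single entry with a count. The sketch below uses only the OCaml standard library. The `grouped` type and the `group_consecutive` and `render` functions are hypothetical names for illustration, not an existing Dancer API. A real interceptor would receive messages from a Logs reporter rather than from a list.

```ocaml
(* Hypothetical grouped entry: one message plus how many times it
   appeared back to back. *)
type grouped = { message : string; count : int }

(* Collapse runs of identical consecutive messages, preserving order.
   Non-adjacent repeats stay separate, so the timeline is not distorted. *)
let group_consecutive (messages : string list) : grouped list =
  let rec go acc = function
    | [] -> List.rev acc
    | m :: rest ->
      (match acc with
       | g :: acc' when String.equal g.message m ->
         go ({ g with count = g.count + 1 } :: acc') rest
       | _ -> go ({ message = m; count = 1 } :: acc) rest)
  in
  go [] messages

(* Render a group the way it might appear in a prompt, e.g. "msg (x3)". *)
let render (g : grouped) : string =
  if g.count = 1 then g.message
  else Printf.sprintf "%s (x%d)" g.message g.count

let () =
  [ "db: connection refused"; "db: connection refused";
    "db: connection refused"; "request ok"; "db: connection refused" ]
  |> group_consecutive
  |> List.iter (fun g -> print_endline (render g))
```

Running it prints the three-message run as a single line, `db: connection refused (x3)`, followed by `request ok` and a lone `db: connection refused`. The later repeat stays separate so the order of events survives.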
1. **Log Interceptor & Buffer**
   - Hook into the OCaml Logs library at the reporter level
   - Maintain a persistent buffer on disk, perhaps in SQLite, for analysis
   - Group consecutive identical errors with a count
   - Tag logs with timestamp, module, and error type

2. **Pattern Detector**
   - Track log messages and their frequency
   - Use string matching to identify recurring patterns
   - Maintain a simple SQLite database of seen patterns
   - Trigger a Claude consultation when:
     - A new error pattern appears frequently (>10 times in 5 min)
     - The error rate spikes above baseline
     - A scheduled review is due (e.g., every 6 hours)

3. **Claude Consultation Manager**
   - Prepare context: recent logs + relevant source files
   - Ask Claude to:
     - Analyze the error pattern
     - Generate OCaml code to handle the case
     - Suggest where to integrate the fix
     - Test the fixes and trial a deployment
   - Store Claude's response and proposed changes

4. **Version Control Integration**
   - Each Claude fix creates a new git branch: `dancer/fix-<timestamp>-<error-hash>`
   - Use git worktrees for isolated changes:
     ```bash
     git worktree add -b dancer/fix-<id> ../dancer-fix-<id>
     ```
   - Apply Claude's changes in the worktree, and include a changelog entry in the commits
   - Compile and test in isolation
   - If successful, merge to main and restart the application
   - Provide a script that finds all the fix branches and updates a central changelog, ordered by time, for a human to review regularly

5. **Restart Orchestration**
   - The library includes a supervisor that manages the application process itself
   - Graceful shutdown: finish current requests, subject to a timeout
   - State persistence before restart (if needed)
   - Automatic rollback to the previous successful binary if the restart fails
   - Health check after restart
6. **Zulip Integration**
   - Post proposed changes for human review
   - Emergency stop command
   - Status updates on consultations
   - Performance metrics before/after changes

### Git Workflow Design

1. **Branch Strategy**
   ```
   main (production code)
   ├── dancer/fix-2024-01-15-1200-auth-error
   ├── dancer/fix-2024-01-15-1800-timeout-handler
   └── dancer/rollback-2024-01-15-1900 (if needed)
   ```

2. **Worktree Management**
   - Base directory: `/var/dancer/worktrees/`
   - Each fix gets its own worktree
   - Clean up old worktrees after a successful merge
   - Keep failed attempts for analysis

3. **Change Process**
   ```ocaml
   type fix_status =
     | Proposed
     | Testing
     | Approved
     | Deployed
     | Rolled_back

   type fix_record = {
     id: string;
     branch: string;
     worktree: string;
     error_pattern: string;
     claude_solution: string;
     test_results: string option;
     status: fix_status;
     created_at: float;
   }
   ```

### Simplified Log Management

1. **Log Format**
   ```ocaml
   type log_entry = {
     timestamp: float;
     level: Logs.level;
     source: string; (* module name *)
     message: string;
     error_type: string option;
     stack_trace: string option;
   }
   ```

2. **Context Preparation for Claude**
   - Last 500 lines of logs
   - Error frequency summary
   - Relevant source file (where the error originated)
   - Previous fix attempts for similar errors
   - System metrics (CPU, memory, request rate)

### Restart Safety Mechanisms

1. **Pre-Restart Checks**
   - Compile the modified code
   - Run unit tests if available
   - Type-check with `ocamlc -i` (or `dune build @check` in a dune project)
   - Verify no obvious issues (missing semicolons, etc.)
2. **Restart Process**
   ```bash
   # <id> of the dancer/fix-<id> branch to deploy
   FIX_ID="$1"

   # Save current version under a tag that can be restored later
   ROLLBACK_TAG="dancer-before-$(date +%s)"
   git tag "$ROLLBACK_TAG"

   # Merge fix; abort cleanly on conflicts
   git merge --no-ff "dancer/fix-${FIX_ID}" || { git merge --abort; exit 1; }

   # Rebuild; undo the merge if the build fails
   dune build || { git reset --hard "$ROLLBACK_TAG"; exit 1; }

   # Graceful restart (reload only works if the unit defines ExecReload=)
   systemctl reload dancer-service || systemctl restart dancer-service

   # Health check; on failure, restore, rebuild and restart the tagged version
   if ! ./health_check.sh; then
     git reset --hard "$ROLLBACK_TAG"
     dune build
     systemctl restart dancer-service
   fi
   ```

3. **Rollback Triggers**
   - Service fails to start
   - Health check fails after restart
   - Error rate increases by >50%
   - Memory usage spikes
   - Manual intervention via Zulip

### MCP Server Requirements (Simplified)

1. **Git Server**
   - Local git repository with remote backup
   - Web interface for viewing changes
   - Webhook support for CI integration

2. **Monitoring Server**
   - Simple metrics collection (Prometheus/Grafana)
   - Log aggregation (just file-based initially)
   - Alert routing to Zulip

3. **Claude API Gateway**
   - Rate limiting
   - Cost tracking
   - Request/response logging
   - Fallback to manual mode if the quota is exceeded

### Implementation Phases (Simplified)

**Phase 1: Core Infrastructure (Weeks 1-2)**
- Log interception and buffering
- Basic error pattern detection
- Git worktree management
- Manual Claude consultation

**Phase 2: Automation (Weeks 3-4)**
- Automatic Claude triggers
- Code generation and application
- Restart orchestration
- Basic safety checks

**Phase 3: Monitoring & Safety (Weeks 5-6)**
- Zulip integration
- Rollback mechanisms
- Performance tracking
- Cost management

### Example Usage Flow
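This flow starts when the Pattern Detector's trigger rule from Core Components fires, that is, when a new error pattern is seen more than 10 times in 5 minutes. The sketch below implements that rule as a per-pattern sliding window. It assumes the caller supplies timestamps as floats in seconds, which in practice might come from Eio's clock. The `trigger` type and the `create`/`record` functions are hypothetical. The sketch also keeps its state in memory only, whereas the design keeps patterns in SQLite.

```ocaml
(* Hypothetical per-pattern sliding-window trigger: it fires when a pattern
   has been seen more than [threshold] times within the last [window]
   seconds. The defaults mirror the ">10 times in 5 min" rule. *)
type trigger = {
  threshold : int;                          (* count must exceed this *)
  window : float;                           (* window length, seconds *)
  hits : (string, float Queue.t) Hashtbl.t; (* pattern -> timestamps *)
}

let create ?(threshold = 10) ?(window = 300.0) () =
  { threshold; window; hits = Hashtbl.create 16 }

(* Record one occurrence of [pattern] at time [now] (seconds, supplied by
   the caller) and report whether the trigger condition now holds. *)
let record t ~pattern ~now =
  let q =
    match Hashtbl.find_opt t.hits pattern with
    | Some q -> q
    | None ->
      let q = Queue.create () in
      Hashtbl.add t.hits pattern q;
      q
  in
  Queue.push now q;
  (* Evict timestamps that have fallen out of the window. *)
  while (not (Queue.is_empty q)) && now -. Queue.peek q > t.window do
    ignore (Queue.pop q)
  done;
  Queue.length q > t.threshold

let () =
  let t = create () in
  (* Twenty occurrences, six seconds apart: roughly two minutes in total. *)
  let fired = ref false in
  for i = 0 to 19 do
    if record t ~pattern:"db-conn" ~now:(float_of_int (i * 6)) then
      fired := true
  done;
  Printf.printf "triggered: %b\n" !fired
```

With the default thresholds this prints `triggered: true`, because the window fires on the eleventh occurrence. Keeping one queue per pattern makes each check proportional to the events in the window rather than to the whole log history.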
1. **Error Detection**
   ```ocaml
   (* Application code *)
   Logs.err (fun m -> m "Database connection failed: %s" error_msg);
   (* This error happens 20 times in 2 minutes *)
   ```

2. **Claude Consultation**
   ```
   Context: Database connection errors occurring frequently
   Pattern: "Database connection failed: Connection refused"

   Claude generates:
   - Exponential backoff retry logic
   - Connection pool management
   - Fallback to cached data
   ```

3. **Version Control**
   ```bash
   git worktree add -b dancer/fix-db-conn ../dancer-fix-db-conn
   cd ../dancer-fix-db-conn
   # Apply Claude's changes
   dune build
   # If successful, merge and restart
   ```

4. **Deployment**
   ```bash
   # Return to the main checkout: git refuses to check out main inside
   # the fix worktree while main is checked out elsewhere
   cd -
   git checkout main
   git merge --no-ff dancer/fix-db-conn
   systemctl restart dancer-service
   # Monitor for 5 minutes
   # If stable, clean up: git worktree remove ../dancer-fix-db-conn
   ```

### Data Structures

```ocaml
module Dancer = struct
  type config = {
    claude_api_key: string;
    zulip_api_key: string;
    zulip_stream: string;
    max_context_size: int;        (* chars to send to Claude *)
    consultation_cooldown: float; (* seconds between consultations *)
    error_threshold: int;         (* errors before triggering *)
    restart_timeout: float;       (* max seconds for restart *)
    worktree_base: string;        (* base directory for git worktrees *)
  }

  type consultation_request = {
    pattern: string;
    occurrences: int;
    timespan: float;
    recent_logs: string;
    source_context: string option;
  }

  type consultation_response = {
    analysis: string;
    proposed_fix: string;
    target_file: string;
    confidence: float;
  }
end
```

### Key Simplifications from Original Design

1. **No Dynamic Linking** - Just restart the process
2. **Simple Pattern Matching** - String comparison, no Bloom filters
3. **Basic Git Workflow** - Branches and worktrees, no complex versioning
4. **Minimal Infrastructure** - SQLite instead of complex databases
5. **Simple Rollback** - Git reset instead of sophisticated mechanisms
6. **Direct Process Restart** - Using systemd/supervisor instead of hot-reload
7. **File-Based Logs** - No complex log aggregation initially
8. **Manual Approval Option** - A human can review changes via Zulip before deployment

## Library Decomposition Plan

### Core Libraries

1. **dancer-logs** - Log interception and buffering
   - Hook into the OCaml Logs reporter
   - SQLite-backed circular buffer
   - Pattern normalization
   - Testable standalone

2. **dancer-patterns** - Pattern detection and tracking
   - Error pattern recognition
   - Frequency/acceleration tracking
   - Pattern database management
   - Trigger decision logic

3. **dancer-claude** - Claude CLI integration
   - Prompt construction
   - Response parsing
   - Context preparation
   - Token cost tracking

4. **dancer-git** - Git worktree management
   - Worktree creation/cleanup
   - Branch management
   - Safe merging operations
   - Rollback capabilities

5. **dancer-test** - Alcotest generation
   - Test template generation
   - Test execution in worktrees
   - Result parsing
   - Coverage tracking

6. **dancer-process** - Process management
   - Tmux orchestration
   - Service restart logic
   - Health checking
   - Graceful shutdown

7. **dancer-observe** - Observability
   - Metrics collection
   - SQLite time-series storage
   - Anomaly detection
   - Audit trail management
8. **dancer-spec** - Service specification
   - YAML spec parsing
   - Constraint validation
   - Fix validation against spec
   - Schema enforcement

9. **dancer-deploy** - Deployment pipeline
   - Staging environment setup
   - Promotion criteria evaluation
   - Production deployment
   - Rollback orchestration

10. **dancer-ui** - Human oversight interfaces
    - Web dashboard (Dream)
    - Terminal UI (Nottui)
    - WebSocket live updates
    - Audit log viewer

### Implementation Order

**Phase 1: Foundation** (Week 1)

1. `dancer-logs` - Need log data first
2. `dancer-patterns` - Pattern detection on logs
3. `dancer-observe` - Basic metrics/storage

**Phase 2: Claude Integration** (Week 2)

4. `dancer-claude` - Claude consultation
5. `dancer-spec` - Service constraints
6. `dancer-test` - Test generation

**Phase 3: Deployment** (Week 3)

7. `dancer-git` - Worktree management
8. `dancer-process` - Process control
9. `dancer-deploy` - Staging/production

**Phase 4: Oversight** (Week 4)

10. `dancer-ui` - Dashboard and monitoring
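Phase 1 starts with `dancer-logs` and `dancer-patterns`. The pattern normalization listed for `dancer-logs` is what makes the plain string comparison chosen under Key Simplifications workable, because messages that differ only in ids, ports or counts must map to the same key. The sketch below assumes a single rule: every run of ASCII digits becomes `<n>`. The `normalize` function is hypothetical. A real normaliser would likely also need to handle hex ids, UUIDs and quoted values.

```ocaml
(* Hypothetical normaliser: every run of ASCII digits becomes "<n>", so
   messages that differ only in ids, ports or counts map to the same
   pattern string and can be compared with String.equal. *)
let normalize (msg : string) : string =
  let buf = Buffer.create (String.length msg) in
  let in_digits = ref false in
  String.iter
    (fun c ->
      match c with
      | '0' .. '9' ->
        (* Emit the placeholder once per run of digits. *)
        if not !in_digits then Buffer.add_string buf "<n>";
        in_digits := true
      | other ->
        Buffer.add_char buf other;
        in_digits := false)
    msg;
  Buffer.contents buf

let () =
  let a = normalize "request 1812 failed after 3 retries"
  and b = normalize "request 77 failed after 5 retries" in
  print_endline a;
  Printf.printf "same pattern: %b\n" (String.equal a b)
```

Both messages normalize to `request <n> failed after <n> retries`, so the program prints that string followed by `same pattern: true`. A message such as `Database connection failed: Connection refused` contains no digits and passes through unchanged.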