Social cloud hosting

docs: add multi-backend isolation design

- Add DESIGN.md documenting key architectural decisions
- Update README with isolation modes diagram and "under construction" warning
- Update ROADMAP to reflect container backend as next priority

Three isolation levels:
- none: Direct Nix execution (dev mode)
- container: OCI containers with debian-slim + seccomp (any Linux VPS)
- firecracker: MicroVMs with full isolation (bare-metal + KVM)

Auto-detection picks the best available backend.

Co-authored-by: Claude <noreply@anthropic.com>

+398 -43
+321
DESIGN.md
# Design Decisions

This document captures key architectural decisions for at-rund.

---

## Isolation Backends

**Decision:** Support multiple isolation backends (none, container, firecracker) behind a common Executor interface.

**Date:** 2026-05-02

### Context

at-rund needs to execute untrusted code safely. The gold standard is VM-level isolation (Firecracker), but this requires:

- Linux host
- KVM support (`/dev/kvm`)
- Bare metal or nested virtualization

Most VPS providers (Linode, Contabo, standard DigitalOcean) don't support nested virtualization. Requiring bare-metal servers would severely limit adoption.

### Options Considered

| Option | Isolation | Requirements | Barrier to Entry |
|--------|-----------|--------------|------------------|
| Firecracker | VM-level (strongest) | Linux + KVM | High (bare metal only) |
| gVisor | Syscall interception | Linux only | Medium |
| Containers + seccomp | Namespace + syscall filtering | Linux + Docker/Podman | Low |
| Direct execution | None (permissions only) | Any OS + Nix | Lowest |

### Decision

Support all three levels behind a common interface:

```go
type Executor interface {
	Execute(req ExecuteRequest, mimeType string) (*ExecuteResponse, error)
	Stats() PoolStats
	Warm(count int) error
	Drain()
	Shutdown()
}
```

Implementations:

- `NixPool` — direct execution via Nix, no isolation
- `ContainerPool` — OCI containers with a debian-slim base + seccomp
- `FirecrackerPool` — Firecracker microVMs with virtio-fs

### Auto-Detection

When `isolation = "auto"` (the default):

```
1. Check that /dev/kvm exists and is accessible
   └─ Yes → FirecrackerPool
   └─ No  → continue

2. Check that docker/podman is available
   └─ Yes → ContainerPool
   └─ No  → continue

3. Fallback → NixPool (with a warning)
```

### Runtime Artifacts

Nix builds each runtime as multiple artifacts:

```
nix/runtimes/deno/
├── flake.nix
├── executor.ts        # Runtime-specific executor
└── outputs:
    ├── at-run-exec    # Direct execution wrapper
    ├── image.tar      # OCI image for containers
    └── rootfs.ext4    # Filesystem for Firecracker
```

The same Nix expression produces all three, ensuring identical runtime behavior.

### Security Model

| Backend | Kernel Shared | Escape Risk | Good For |
|---------|---------------|-------------|----------|
| none | Yes | High | Dev/testing, trusted code |
| container | Yes | Medium | Production, social trust |
| firecracker | No | Very low | High-security, untrusted code |

The at-rund trust model assumes operators choose who can run code on their infrastructure. This means:

- `container` isolation is reasonable for most deployments
- Operators handling truly untrusted code should use `firecracker` on bare metal
- `none` is for development only

### Consequences

**Positive:**
- Any Linux VPS can run at-rund in production (containers)
- Bare-metal operators get the strongest isolation (Firecracker)
- Developers can test on macOS/Windows (via Docker or direct execution)
- A common interface means bundles work identically everywhere

**Negative:**
- More code to maintain (three backends)
- Container isolation is weaker than VMs
- Security tradeoffs need to be documented clearly

---

## Runtime Executor Pattern

**Decision:** Each runtime defines its own `at-run-exec` that handles permission translation and execution.
**Date:** 2026-05-01

### Context

Different runtimes handle permissions differently:

- Deno: `--allow-net=host1,host2`
- Node: environment variables or a custom sandbox
- Python: no built-in sandboxing

Initially we tried handling this in Go code, but that meant:

- Hardcoded permission logic per runtime
- Changes required recompiling at-rund
- No way for operators to customize

### Decision

Move executor logic into the Nix runtime definition:

```
at-rund (Go)                        Runtime (Nix)
     │                                   │
     │ 1. Build ExecRequest JSON         │
     │                                   │
     └────── stdin ────────────────────▶ │ at-run-exec
                                         │  - Parse request
                                         │  - Translate permissions
                                         │  - Import bundle
                                         │  - Call endpoint
                                         │  - Return JSON response
     ◀────── stdout ──────────────────── ┘
```

The `at-run-exec` script is runtime-specific:

- Deno: TypeScript that builds `--allow-*` flags
- Node: could use vm2 or similar
- Python: could use seccomp or RestrictedPython

### ExecRequest Format

```json
{
  "codePath": "/path/to/bundle.js",
  "endpoint": "handleRequest",
  "args": { "input": "data" },
  "permissions": {
    "net": ["api.example.com", "*.cdn.com"],
    "read": ["/tmp/cache"],
    "write": ["/tmp/output"],
    "env": ["API_KEY", "DEBUG"]
  },
  "env": {
    "API_KEY": "decrypted-secret"
  },
  "timeout": 30
}
```

### ExecResponse Format

```json
{
  "success": true,
  "data": { "result": "value" },
  "response": {
    "status": 200,
    "headers": { "content-type": "application/json" },
    "body": "...",
    "isBase64": false
  },
  "error": "error message if success=false",
  "metrics": {
    "executionTimeMs": 142
  }
}
```

### Consequences

**Positive:**
- Runtime authors control permission translation
- Operators can customize or fork runtimes
- at-rund stays runtime-agnostic
- Easy to add new runtimes

**Negative:**
- More complexity in runtime definitions
- Permission bugs are per-runtime, not centralized

---

## Social Trust Model

**Decision:** Operators explicitly choose who can run code; there is no automatic discovery or federation.

**Date:** 2026-05-01

### Context

Traditional serverless (AWS Lambda, Cloudflare Workers) has a clear trust model: you trust the provider, period. Decentralized alternatives often propose automatic discovery and federation, which creates new trust problems.

### Decision

Trust is explicit and social:

1. **Operators choose bundles**: via allowlist, blocklist, or open mode
2. **Authors choose runners**: by manually configuring which runner(s) to use
3. **No automatic discovery**: runners don't announce themselves to a registry
4. **No federation**: each runner is independent

This mirrors how the AT Protocol itself works — you follow people you know, not everyone.
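As an illustration of this decision, an operator-side access check could be sketched as follows. This is a minimal model, not at-rund's actual code; the `AccessPolicy` type and its fields are hypothetical names chosen to mirror the `[access]` config table.

```go
package main

import "fmt"

// Mode mirrors the `mode` field of a hypothetical [access] config table.
type Mode string

const (
	Open      Mode = "open"
	Allowlist Mode = "allowlist"
	Blocklist Mode = "blocklist"
)

// AccessPolicy is an illustrative in-memory form of the access config.
type AccessPolicy struct {
	Mode Mode
	DIDs map[string]bool // allowlist or blocklist entries, keyed by DID
}

// Allowed reports whether a bundle author's DID may run code on this runner.
func (p AccessPolicy) Allowed(did string) bool {
	switch p.Mode {
	case Allowlist:
		return p.DIDs[did] // only listed DIDs pass
	case Blocklist:
		return !p.DIDs[did] // everyone except listed DIDs passes
	default: // "open"
		return true
	}
}

func main() {
	p := AccessPolicy{Mode: Allowlist, DIDs: map[string]bool{"did:plc:friend1": true}}
	fmt.Println(p.Allowed("did:plc:friend1"))  // true
	fmt.Println(p.Allowed("did:plc:stranger")) // false
}
```

The point of the sketch is that the trust decision is a plain local lookup: no network calls, no registry, just the operator's own list.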
### Access Control

```toml
[access]
mode = "allowlist" # or "blocklist" or "open"

# Only these DIDs can run bundles
allowlist = [
  "did:plc:friend1",
  "did:plc:friend2",
]
```

### Consequences

**Positive:**
- Simple mental model
- Operators have full control
- No Sybil attacks on "discovery"
- Matches the AT Protocol philosophy

**Negative:**
- Authors must manually find runners
- No marketplace/registry (for now)
- A fragmented ecosystem is possible

### Future Consideration

Optional announcement could be added later:

- Post runner capabilities to your PDS
- Others discover via the social graph
- Still no central registry

---

## Nix for Runtime Builds

**Decision:** Use Nix to define and build all runtime environments.

**Date:** 2026-05-01

### Context

Bundles must behave identically across:

- Development (macOS, direct execution)
- Container isolation (Linux VPS)
- VM isolation (bare metal)

Docker images alone don't solve this — you still need to define what goes in them, and dev machines often can't run Docker efficiently.

### Decision

Nix defines each runtime declaratively:

```nix
{
  packages = [ pkgs.deno pkgs.jq ];

  # Script that handles execution
  at-run-exec = writeShellScriptBin "at-run-exec" ''
    ...
  '';

  # OCI image for containers
  ociImage = dockerTools.buildImage { ... };

  # Rootfs for Firecracker
  rootfs = makeExt4 { ... };
}
```

One definition, multiple outputs, identical behavior.
### Consequences

**Positive:**
- Reproducible builds
- The same runtime from dev to prod
- Easy to customize and extend
- Hermetic (no "works on my machine")

**Negative:**
- Nix has a steep learning curve
- Build times can be long (the first time)
- An extra dependency for operators

---

## Future Decisions

Topics that need decisions as development continues:

- **Secrets management**: How are secrets encrypted and decrypted?
- **Bundle signing**: Should bundles be signed? By whom?
- **Rate limiting**: Built-in or middleware only?
- **Multi-tenancy**: Separate pools per DID?
- **Persistence**: Stateful bundles? How?
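The auto-detection order described in DESIGN.md (KVM first, then a container runtime, then direct execution) can be sketched in Go. The boolean parameters are stand-ins for the real probes (stat'ing `/dev/kvm`, looking up docker/podman on `PATH`); the function name is illustrative, not at-rund's actual API.

```go
package main

import "fmt"

// detectBackend mirrors the documented probe order:
//  1. KVM available           → firecracker
//  2. container runtime found → container
//  3. otherwise               → none (direct Nix execution)
func detectBackend(hasKVM, hasContainerRuntime bool) string {
	switch {
	case hasKVM:
		return "firecracker"
	case hasContainerRuntime:
		return "container"
	default:
		return "none" // real code would log a warning here
	}
}

func main() {
	fmt.Println(detectBackend(true, true))   // firecracker wins when KVM exists
	fmt.Println(detectBackend(false, true))  // container on a typical VPS
	fmt.Println(detectBackend(false, false)) // none: dev-mode fallback
}
```

Note that KVM takes priority even when a container runtime is also present, matching the "best available backend" rule.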
+40 -14
README.md
````diff
···
 # at-rund

+> **⚠️ HEAVILY UNDER CONSTRUCTION**
+>
+> This project is in early alpha. APIs will change, features are incomplete, and you will encounter bugs. See [ROADMAP.md](./ROADMAP.md) for current status.
+
 **Social cloud hosting for AT Protocol.**

 at-rund lets you host serverless bundles for the AT Protocol network. Your runner represents *you* — bundle authors trust your infrastructure because they trust you.
···

 ## Architecture

-at-rund has two execution modes:
-
-### Dev Mode (macOS, Linux without KVM)
-
-Uses Nix to run bundles directly. Fast iteration, same runtimes as production, but no isolation.
+at-rund supports multiple isolation backends to balance security vs. accessibility:

 ```
-Request → Nix shell → deno run bundle.js → Response
+┌─────────────────────────────────────────────────────────────────┐
+│                             at-rund                             │
+│                       Executor interface                        │
+├───────────────┬─────────────────────┬───────────────────────────┤
+│    NixPool    │    ContainerPool    │     FirecrackerPool       │
+│    (none)     │    (container)      │     (firecracker)         │
+├───────────────┼─────────────────────┼───────────────────────────┤
+│ Direct exec   │ Docker/Podman       │ Firecracker microVMs      │
+│ No isolation  │ Namespace + seccomp │ Full VM isolation         │
+│ Any OS        │ Any Linux VPS       │ Linux + KVM (bare metal)  │
+│ Dev/testing   │ Production          │ High-security prod        │
+└───────────────┴─────────────────────┴───────────────────────────┘
 ```

-### Prod Mode (Linux + KVM)
+### Isolation Modes

-Uses Firecracker microVMs for full isolation. Each bundle runs in its own VM with only the permissions it declared.
+Configure via `isolation` in config.toml:

-```
-Request → Firecracker VM → deno run bundle.js → Response
-
-  └─ virtio-fs mount (bundle code)
-  └─ vsock (host ↔ guest RPC)
+```toml
+# Auto-detect best available (default)
+isolation = "auto"
+
+# Or explicitly choose:
+isolation = "none"        # Direct Nix execution (dev mode)
+isolation = "container"   # OCI containers (debian-slim + seccomp)
+isolation = "firecracker" # Firecracker microVMs (requires KVM)
 ```

-Both modes use the same Nix-defined runtimes, so bundles behave identically.
+**Auto-detection logic:**
+
+1. `/dev/kvm` accessible → Firecracker
+2. Docker/Podman available → Container
+3. Fallback → Nix direct execution
+
+### Why Multiple Backends?
+
+- **Low barrier to entry**: containers work on any $5 VPS
+- **Strong isolation available**: Firecracker for those with bare metal
+- **Same bundle everywhere**: Nix ensures identical runtimes across all backends
+- **Operator choice**: match the isolation level to your threat model
+
+See [DESIGN.md](./DESIGN.md) for detailed architecture decisions.

 ## Custom Runtimes
````
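Since `isolation` in config.toml accepts a fixed set of values, a runner would presumably validate it at startup. A minimal sketch of such a check (the `validIsolation` helper is a hypothetical name, not part of at-rund; only the four documented mode strings are taken from the README):

```go
package main

import "fmt"

// validIsolation normalizes the `isolation` config value.
// An empty field falls back to "auto", the documented default;
// any other string must be one of the four known modes.
func validIsolation(mode string) (string, error) {
	switch mode {
	case "":
		return "auto", nil // default when the field is omitted
	case "auto", "none", "container", "firecracker":
		return mode, nil
	default:
		return "", fmt.Errorf("unknown isolation mode %q (want auto|none|container|firecracker)", mode)
	}
}

func main() {
	m, _ := validIsolation("")
	fmt.Println(m) // auto
	if _, err := validIsolation("gvisor"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Failing fast on a typo here is cheaper than silently falling back to a weaker backend.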
+37 -29
ROADMAP.md
````diff
···
 This document outlines the development roadmap for at-rund.

-## Current Status: Alpha
+## Current Status: Alpha (Under Heavy Construction)
+
+The core architecture is in place. Dev mode works on macOS/Linux with Nix. Production isolation is being implemented with multiple backends.

-The core architecture is in place. Dev mode works on macOS/Linux with Nix. Production mode (Firecracker) is scaffolded but not yet functional.
+**See [DESIGN.md](./DESIGN.md) for architectural decisions.**

 ---

-## Phase 1: Core Functionality
+## Phase 1: Core Functionality ✅

 **Goal:** A working end-to-end system where bundles can be fetched from a PDS and executed.

 ### ATProto Integration
-- [ ] DID resolution (did:plc, did:web)
-- [ ] PDS client for fetching bundle records
-- [ ] Bundle blob fetching and caching
-- [ ] Manifest parsing (permissions, runtime, limits)
+- [x] DID resolution (did:plc, did:web)
+- [x] PDS client for fetching bundle records
+- [x] Bundle blob fetching and caching
+- [x] Manifest parsing (permissions, runtime, limits)

 ### Bundle Execution
-- [ ] Wire up executor to HTTP routes
-- [ ] Permission enforcement (net, read, write, env)
+- [x] Wire up executor to HTTP routes
+- [x] Permission enforcement (net, read, write, env)
 - [ ] Resource limits (memory, CPU, timeout)
 - [ ] Secrets decryption and injection

-### Dev Mode Polish
-- [x] Nix-based execution
-- [x] Auto-detection of KVM availability
+### Dev Mode
+- [x] Nix-based execution (NixPool)
+- [x] Auto-detection of capabilities
+- [x] Runtime executor pattern (at-run-exec)
 - [ ] Hot reload for local development
 - [ ] Better error messages

 ---

-## Phase 2: Production Mode
+## Phase 2: Production Isolation

-**Goal:** Secure, isolated execution using Firecracker microVMs.
+**Goal:** Multiple isolation backends to balance security vs. accessibility.
+
+```
+isolation = "auto" | "none" | "container" | "firecracker"
+```

-### Firecracker Integration
-- [ ] VM lifecycle management (spawn, stop, reuse)
+### Container Backend (In Progress)
+- [ ] ContainerPool executor implementation
+- [ ] OCI image building via Nix (debian-slim base)
+- [ ] Docker/Podman runtime detection
+- [ ] seccomp profiles for syscall filtering
+- [ ] Network namespace isolation
+- [ ] Permission enforcement via container config
+
+### Firecracker Backend (Future)
+- [ ] FirecrackerPool executor implementation
 - [ ] Kernel + rootfs image building via Nix
+- [ ] VM lifecycle management (spawn, stop, reuse)
 - [ ] virtio-fs for bundle mounting
 - [ ] vsock for host ↔ guest communication
+- [ ] Guest agent (Go binary inside VMs)

-### Guest Agent
-- [ ] Go binary running inside VMs
-- [ ] Execute bundles with permission flags
-- [ ] Report metrics (memory, CPU, execution time)
-- [ ] Health checks
-
-### VM Pool
+### Shared Infrastructure
+- [ ] Auto-detection logic (KVM → container → none)
 - [ ] Pre-warming (configurable per runtime)
 - [ ] Idle timeout and reclamation
-- [ ] Max VM limits
+- [ ] Max instance limits
 - [ ] Graceful drain on shutdown
-
-### Network Proxy
-- [ ] Bundle network requests proxied through host
-- [ ] Permission enforcement (allowed hosts)
-- [ ] Request logging
+- [ ] Network proxy with permission enforcement

 ---
````
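The "Shared Infrastructure" items above (pre-warming, max instance limits, graceful drain) can be modeled as a small state machine, independent of which backend supplies the instances. The sketch below is a toy model under assumed semantics, not at-rund's implementation; the `pool` type and its fields are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// pool is a toy model of shared pool behavior: warm instances up to a
// configurable maximum, and stop admitting work once a drain begins.
type pool struct {
	warm     int  // currently pre-warmed instances
	max      int  // configured instance limit
	draining bool // set by Drain; new work is refused
}

var errDraining = errors.New("pool is draining")

// Warm pre-starts n instances, clamped to the configured maximum.
func (p *pool) Warm(n int) {
	p.warm += n
	if p.warm > p.max {
		p.warm = p.max
	}
}

// Acquire hands out a warm instance, or fails once a drain has begun.
func (p *pool) Acquire() error {
	if p.draining {
		return errDraining
	}
	if p.warm > 0 {
		p.warm-- // reuse a pre-warmed instance; cold start otherwise
	}
	return nil
}

// Drain stops admitting new work; existing instances finish naturally.
func (p *pool) Drain() { p.draining = true }

func main() {
	p := &pool{max: 2}
	p.Warm(5)                               // clamped to the limit
	fmt.Println(p.warm)                     // 2
	p.Drain()
	fmt.Println(p.Acquire() == errDraining) // true
}
```

The same clamping and drain rules would apply whether the instances are Nix processes, containers, or microVMs, which is what makes this layer "shared".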