# Design Decisions
This document captures key architectural decisions for at-rund.
## Isolation Backends
**Decision:** Support multiple isolation backends (`none`, `container`, `firecracker`) behind a common `Executor` interface.

**Date:** 2026-05-02
### Context
at-rund needs to execute untrusted code safely. The gold standard is VM-level isolation (Firecracker), but this requires:
- Linux host
- KVM support (`/dev/kvm`)
- Bare-metal or nested virtualization
Most VPS providers (Linode, Contabo, standard DigitalOcean) don't support nested virtualization. Requiring bare-metal servers would severely limit adoption.
### Options Considered
| Option | Isolation | Requirements | Barrier to Entry |
|---|---|---|---|
| Firecracker | VM-level (strongest) | Linux + KVM | High (bare-metal only) |
| gVisor | Syscall interception | Linux only | Medium |
| Containers + seccomp | Namespace + syscall filtering | Linux + Docker/Podman | Low |
| Direct execution | None (permissions only) | Any OS + Nix | Lowest |
### Decision
Support three isolation levels (direct, container, and microVM) behind a common interface:
```go
type Executor interface {
    Execute(req ExecuteRequest, mimeType string) (*ExecuteResponse, error)
    Stats() PoolStats
    Warm(count int) error
    Drain()
    Shutdown()
}
```
Implementations:
- `NixPool` — direct execution via Nix, no isolation
- `ContainerPool` — OCI containers with a debian-slim base plus seccomp
- `FirecrackerPool` — Firecracker microVMs with virtio-fs
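As an illustrative sketch (not at-rund's actual code), the stub below shows one way a backend can satisfy the `Executor` interface; the request, response, and stats types here are placeholders, since the real definitions are richer.

```go
package main

import "fmt"

// Placeholder types so the sketch compiles; the real at-rund
// definitions carry more fields.
type ExecuteRequest struct{ Endpoint string }
type ExecuteResponse struct{ Success bool }
type PoolStats struct{ Warm int }

// Executor is the common interface from the decision above.
type Executor interface {
	Execute(req ExecuteRequest, mimeType string) (*ExecuteResponse, error)
	Stats() PoolStats
	Warm(count int) error
	Drain()
	Shutdown()
}

// nixPool is a stub showing the shape of a backend; a real NixPool
// would spawn at-run-exec processes instead of returning canned data.
type nixPool struct{ warm int }

func (p *nixPool) Execute(req ExecuteRequest, mimeType string) (*ExecuteResponse, error) {
	return &ExecuteResponse{Success: true}, nil
}
func (p *nixPool) Stats() PoolStats     { return PoolStats{Warm: p.warm} }
func (p *nixPool) Warm(count int) error { p.warm = count; return nil }
func (p *nixPool) Drain()               {}
func (p *nixPool) Shutdown()            {}

func main() {
	var e Executor = &nixPool{}
	e.Warm(2)
	resp, _ := e.Execute(ExecuteRequest{Endpoint: "handleRequest"}, "application/javascript")
	fmt.Println(resp.Success, e.Stats().Warm)
}
```

Because all three pools implement the same interface, the dispatcher never needs to know which isolation level is in play.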
### Auto-Detection
When `isolation = "auto"` (the default), backends are probed in order:

```
1. Check /dev/kvm exists and is accessible
   └─ Yes → FirecrackerPool
   └─ No  → continue
2. Check docker/podman available
   └─ Yes → ContainerPool
   └─ No  → continue
3. Fallback → NixPool (with warning)
```
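The probe order above can be sketched in Go. This is a hypothetical illustration: the helper names and exact checks (opening `/dev/kvm`, looking for `docker`/`podman` on PATH) follow the steps listed, not at-rund's actual implementation.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// pickBackend encodes the detection order: KVM first, then a
// container runtime, then direct execution as the fallback.
func pickBackend(hasKVM, hasContainerRuntime bool) string {
	switch {
	case hasKVM:
		return "firecracker"
	case hasContainerRuntime:
		return "container"
	default:
		return "none"
	}
}

// kvmAvailable checks that /dev/kvm exists and is accessible.
func kvmAvailable() bool {
	f, err := os.OpenFile("/dev/kvm", os.O_RDWR, 0)
	if err != nil {
		return false
	}
	f.Close()
	return true
}

// containerRuntimeAvailable looks for docker or podman on PATH.
func containerRuntimeAvailable() bool {
	for _, rt := range []string{"docker", "podman"} {
		if _, err := exec.LookPath(rt); err == nil {
			return true
		}
	}
	return false
}

func main() {
	backend := pickBackend(kvmAvailable(), containerRuntimeAvailable())
	if backend == "none" {
		fmt.Println("warning: no isolation available, falling back to direct execution")
	}
	fmt.Println("selected backend:", backend)
}
```

Keeping `pickBackend` pure (probes passed in as booleans) makes the selection logic trivial to unit-test without a KVM host.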
### Runtime Artifacts
Nix builds each runtime as multiple artifacts:
```
nix/runtimes/deno/
├── flake.nix
├── executor.ts        # Runtime-specific executor
└── outputs:
    ├── at-run-exec    # Direct execution wrapper
    ├── image.tar      # OCI image for containers
    └── rootfs.ext4    # Filesystem for Firecracker
```
The same Nix expression produces all three, ensuring identical runtime behavior.
### Security Model
| Backend | Kernel Shared | Escape Risk | Good For |
|---|---|---|---|
| none | Yes | High | Dev/testing, trusted code |
| container | Yes | Medium | Production, social trust |
| firecracker | No | Very Low | High-security, untrusted code |
The at-rund trust model assumes operators choose who can run code on their infrastructure. This means:
- `container` isolation is reasonable for most deployments
- Operators handling truly untrusted code should use `firecracker` on bare metal
- `none` is for development only
### Consequences
Positive:
- Any Linux VPS can run at-rund in production (containers)
- Bare-metal operators get strongest isolation (Firecracker)
- Developers can test on macOS/Windows (via Docker or direct)
- Common interface means bundles work identically everywhere
Negative:
- More code to maintain (3 backends)
- Container isolation is weaker than VMs
- Need to document security tradeoffs clearly
## Runtime Executor Pattern
**Decision:** Each runtime defines its own `at-run-exec` that handles permission translation and execution.

**Date:** 2026-05-01
### Context
Different runtimes handle permissions differently:
- Deno: `--allow-net=host1,host2`
- Node: environment variables or a custom sandbox
- Python: No built-in sandboxing
Initially, we tried handling this in Go code, but this meant:
- Hardcoded permission logic per runtime
- Changes required recompiling at-rund
- No way for operators to customize
### Decision
Move executor logic into the Nix runtime definition:
```
at-rund (Go)                          Runtime (Nix)
     │                                     │
     │ 1. Build ExecRequest JSON           │
     │                                     │
     └────── stdin ───────────────────────▶│  at-run-exec
                                           │  - Parse request
                                           │  - Translate permissions
                                           │  - Import bundle
                                           │  - Call endpoint
                                           │  - Return JSON response
     ◀────── stdout ───────────────────────┘
```
The `at-run-exec` script is runtime-specific:
- Deno: TypeScript that builds `--allow-*` flags
- Node: could use vm2 or similar
- Python: could use seccomp or RestrictedPython
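The host side of this stdin/stdout protocol can be sketched in Go. This is an assumption-level illustration of the diagram above, not at-rund's actual code; `runExec` and its simplified error handling are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os/exec"
)

// runExec writes the request JSON to the executor's stdin and
// decodes the JSON reply from its stdout, mirroring the diagram
// above. Error handling is simplified for illustration.
func runExec(execPath string, req map[string]any) (map[string]any, error) {
	payload, err := json.Marshal(req)
	if err != nil {
		return nil, err
	}
	cmd := exec.Command(execPath)
	cmd.Stdin = bytes.NewReader(payload)
	var out bytes.Buffer
	cmd.Stdout = &out
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("executor failed: %w", err)
	}
	var resp map[string]any
	if err := json.Unmarshal(out.Bytes(), &resp); err != nil {
		return nil, err
	}
	return resp, nil
}

func main() {
	// Using cat as a stand-in executor that echoes the request back.
	resp, err := runExec("cat", map[string]any{"endpoint": "handleRequest"})
	if err != nil {
		panic(err)
	}
	fmt.Println(resp["endpoint"])
}
```

Because the contract is just JSON over pipes, any language that can read stdin and write stdout can implement a runtime executor.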
### ExecRequest Format
```json
{
  "codePath": "/path/to/bundle.js",
  "endpoint": "handleRequest",
  "args": { "input": "data" },
  "permissions": {
    "net": ["api.example.com", "*.cdn.com"],
    "read": ["/tmp/cache"],
    "write": ["/tmp/output"],
    "env": ["API_KEY", "DEBUG"]
  },
  "env": {
    "API_KEY": "decrypted-secret"
  },
  "timeout": 30
}
```
### ExecResponse Format
```json
{
  "success": true,
  "data": { "result": "value" },
  "response": {
    "status": 200,
    "headers": { "content-type": "application/json" },
    "body": "...",
    "isBase64": false
  },
  "error": "error message if success=false",
  "metrics": {
    "executionTimeMs": 142
  }
}
```
### Consequences
Positive:
- Runtime authors control permission translation
- Operators can customize/fork runtimes
- at-rund stays runtime-agnostic
- Easy to add new runtimes
Negative:
- More complexity in runtime definitions
- Permission bugs are per-runtime, not centralized
## Social Trust Model
**Decision:** Operators explicitly choose who can run code; there is no automatic discovery or federation.

**Date:** 2026-05-01
### Context
Traditional serverless (AWS Lambda, Cloudflare Workers) has a clear trust model: you trust the provider, period. Decentralized alternatives often propose automatic discovery and federation, which creates new trust problems.
### Decision
Trust is explicit and social:
- **Operators choose bundles**: via allowlist, blocklist, or open mode
- **Authors choose runners**: by manually configuring which runner(s) to use
- **No automatic discovery**: runners don't announce themselves to a registry
- **No federation**: each runner is independent
This mirrors how the AT Protocol itself works — you follow people you know, not everyone.
### Access Control
```toml
[access]
mode = "allowlist"  # or "blocklist" or "open"

# Only these DIDs can run bundles
allowlist = [
  "did:plc:friend1",
  "did:plc:friend2",
]
```
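A minimal sketch of how the three access modes could be enforced, assuming the config above has been decoded into a struct; `AccessConfig` and `Allowed` are illustrative names, not at-rund's actual types.

```go
package main

import "fmt"

// AccessConfig mirrors the [access] table: mode is one of
// "allowlist", "blocklist", or "open".
type AccessConfig struct {
	Mode      string
	Allowlist []string
	Blocklist []string
}

// Allowed reports whether the given DID may run bundles under cfg.
func (c AccessConfig) Allowed(did string) bool {
	switch c.Mode {
	case "open":
		return true
	case "allowlist":
		for _, d := range c.Allowlist {
			if d == did {
				return true
			}
		}
		return false
	case "blocklist":
		for _, d := range c.Blocklist {
			if d == did {
				return false
			}
		}
		return true
	default:
		// Unknown mode: fail closed.
		return false
	}
}

func main() {
	cfg := AccessConfig{Mode: "allowlist", Allowlist: []string{"did:plc:friend1"}}
	fmt.Println(cfg.Allowed("did:plc:friend1"), cfg.Allowed("did:plc:stranger"))
}
```

Failing closed on an unrecognized mode keeps a typo in the config from silently opening the runner to everyone.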
### Consequences
Positive:
- Simple mental model
- Operators have full control
- No Sybil attacks on "discovery"
- Matches AT Protocol philosophy
Negative:
- Authors must manually find runners
- No marketplace/registry (for now)
- Fragmented ecosystem possible
### Future Consideration
Optional announcement could be added later:
- Post runner capabilities to your PDS
- Others discover via social graph
- Still no central registry
## Nix for Runtime Builds
**Decision:** Use Nix to define and build all runtime environments.

**Date:** 2026-05-01
### Context
Bundles must behave identically across:
- Development (macOS, direct execution)
- Container isolation (Linux VPS)
- VM isolation (bare-metal)
Docker images alone don't solve this — you still need to define what goes in them, and dev machines often can't run Docker efficiently.
### Decision
Nix defines each runtime declaratively:
```nix
{
  packages = [ pkgs.deno pkgs.jq ];

  # Script that handles execution
  at-run-exec = writeShellScriptBin "at-run-exec" ''
    ...
  '';

  # OCI image for containers
  ociImage = dockerTools.buildImage { ... };

  # Rootfs for Firecracker
  rootfs = makeExt4 { ... };
}
```
One definition, multiple outputs, identical behavior.
### Consequences
Positive:
- Reproducible builds
- Same runtime dev → prod
- Easy to customize/extend
- Hermetic (no "works on my machine")
Negative:
- Nix has steep learning curve
- Build times can be long (first time)
- Extra dependency for operators
## Future Decisions
Topics that need decisions as development continues:
- **Secrets management**: How are secrets encrypted/decrypted?
- **Bundle signing**: Should bundles be signed? By whom?
- **Rate limiting**: Built-in or middleware only?
- **Multi-tenancy**: Separate pools per DID?
- **Persistence**: Stateful bundles? How?