# Design Decisions
This document captures key architectural decisions for at-rund.
## Isolation Backends
**Decision:** Support multiple isolation backends (`none`, `container`, `firecracker`) behind a common `Executor` interface.

**Date:** 2026-05-02
### Context
at-rund needs to execute untrusted code safely. The gold standard is VM-level isolation (Firecracker), but this requires:
- Linux host
- KVM support (`/dev/kvm`)
- Bare-metal or nested virtualization
Most VPS providers (Linode, Contabo, standard DigitalOcean) don't support nested virtualization. Requiring bare-metal servers would severely limit adoption.
### Options Considered
| Option | Isolation | Requirements | Barrier to Entry |
|---|---|---|---|
| Firecracker | VM-level (strongest) | Linux + KVM | High (bare-metal only) |
| gVisor | Syscall interception | Linux only | Medium |
| Containers + seccomp | Namespace + syscall filtering | Linux + Docker/Podman | Low |
| Direct execution | None (permissions only) | Any OS + Nix | Lowest |
### Decision
Support three isolation levels (direct, container, and microVM) behind a common interface:
```go
type Executor interface {
    Execute(req ExecuteRequest, mimeType string) (*ExecuteResponse, error)
    Stats() PoolStats
    Warm(count int) error
    Drain()
    Shutdown()
}
```
Implementations:
- `NixPool` — direct execution via Nix, no isolation
- `ContainerPool` — OCI containers with a debian-slim base plus seccomp
- `FirecrackerPool` — Firecracker microVMs with virtio-fs
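As an illustrative sketch (not at-rund's actual code), the stub below shows one way a backend can satisfy the `Executor` interface; the request, response, and stats types here are placeholders, since the real definitions are richer.

```go
package main

import "fmt"

// Placeholder types so the sketch compiles; the real at-rund
// definitions carry more fields.
type ExecuteRequest struct{ Endpoint string }
type ExecuteResponse struct{ Success bool }
type PoolStats struct{ Warm int }

// Executor is the common interface from the decision above.
type Executor interface {
	Execute(req ExecuteRequest, mimeType string) (*ExecuteResponse, error)
	Stats() PoolStats
	Warm(count int) error
	Drain()
	Shutdown()
}

// nixPool is a stub showing the shape of a backend; a real NixPool
// would spawn at-run-exec processes instead of returning canned data.
type nixPool struct{ warm int }

func (p *nixPool) Execute(req ExecuteRequest, mimeType string) (*ExecuteResponse, error) {
	return &ExecuteResponse{Success: true}, nil
}
func (p *nixPool) Stats() PoolStats     { return PoolStats{Warm: p.warm} }
func (p *nixPool) Warm(count int) error { p.warm = count; return nil }
func (p *nixPool) Drain()               {}
func (p *nixPool) Shutdown()            {}

func main() {
	var e Executor = &nixPool{}
	e.Warm(2)
	resp, _ := e.Execute(ExecuteRequest{Endpoint: "handleRequest"}, "application/javascript")
	fmt.Println(resp.Success, e.Stats().Warm)
}
```

Because all three pools implement the same interface, the dispatcher never needs to know which isolation level is in play.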
### Auto-Detection
When `isolation = "auto"` (the default), backends are probed in order:

```
1. Check /dev/kvm exists and is accessible
   └─ Yes → FirecrackerPool
   └─ No  → continue
2. Check docker/podman available
   └─ Yes → ContainerPool
   └─ No  → continue
3. Fallback → NixPool (with warning)
```
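The probe order above can be sketched in Go. This is a hypothetical illustration: the helper names and exact checks (opening `/dev/kvm`, looking for `docker`/`podman` on PATH) follow the steps listed, not at-rund's actual implementation.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// pickBackend encodes the detection order: KVM first, then a
// container runtime, then direct execution as the fallback.
func pickBackend(hasKVM, hasContainerRuntime bool) string {
	switch {
	case hasKVM:
		return "firecracker"
	case hasContainerRuntime:
		return "container"
	default:
		return "none"
	}
}

// kvmAvailable checks that /dev/kvm exists and is accessible.
func kvmAvailable() bool {
	f, err := os.OpenFile("/dev/kvm", os.O_RDWR, 0)
	if err != nil {
		return false
	}
	f.Close()
	return true
}

// containerRuntimeAvailable looks for docker or podman on PATH.
func containerRuntimeAvailable() bool {
	for _, rt := range []string{"docker", "podman"} {
		if _, err := exec.LookPath(rt); err == nil {
			return true
		}
	}
	return false
}

func main() {
	backend := pickBackend(kvmAvailable(), containerRuntimeAvailable())
	if backend == "none" {
		fmt.Println("warning: no isolation available, falling back to direct execution")
	}
	fmt.Println("selected backend:", backend)
}
```

Keeping `pickBackend` pure (probes passed in as booleans) makes the selection logic trivial to unit-test without a KVM host.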
### Runtime Artifacts
Nix builds each runtime as multiple artifacts:
```
nix/runtimes/deno/
├── flake.nix
├── executor.ts        # Runtime-specific executor
└── outputs:
    ├── at-run-exec    # Direct execution wrapper
    ├── image.tar      # OCI image for containers
    └── rootfs.ext4    # Filesystem for Firecracker
```
The same Nix expression produces all three, ensuring identical runtime behavior.
### Security Model
| Backend | Kernel Shared | Escape Risk | Good For |
|---|---|---|---|
| none | Yes | High | Dev/testing, trusted code |
| container | Yes | Medium | Production, social trust |
| firecracker | No | Very Low | High-security, untrusted code |
The at-rund trust model assumes operators choose who can run code on their infrastructure. This means:
- `container` isolation is reasonable for most deployments
- Operators handling truly untrusted code should use `firecracker` on bare metal
- `none` is for development only
### Consequences
Positive:
- Any Linux VPS can run at-rund in production (containers)
- Bare-metal operators get strongest isolation (Firecracker)
- Developers can test on macOS/Windows (via Docker or direct)
- Common interface means bundles work identically everywhere
Negative:
- More code to maintain (3 backends)
- Container isolation is weaker than VMs
- Need to document security tradeoffs clearly
## Runtime Executor Pattern
**Decision:** Each runtime defines its own `at-run-exec` that handles permission translation and execution.

**Date:** 2026-05-01
### Context
Different runtimes handle permissions differently:
- Deno: `--allow-net=host1,host2`
- Node: environment variables or a custom sandbox
- Python: No built-in sandboxing
Initially, we tried handling this in Go code, but this meant:
- Hardcoded permission logic per runtime
- Changes required recompiling at-rund
- No way for operators to customize
### Decision
Move executor logic into the Nix runtime definition:
```
at-rund (Go)                          Runtime (Nix)
     │                                     │
     │ 1. Build ExecRequest JSON           │
     │                                     │
     └────── stdin ───────────────────────▶│  at-run-exec
                                           │  - Parse request
                                           │  - Translate permissions
                                           │  - Import bundle
                                           │  - Call endpoint
                                           │  - Return JSON response
     ◀────── stdout ───────────────────────┘
```
The `at-run-exec` script is runtime-specific:
- Deno: TypeScript that builds `--allow-*` flags
- Node: could use vm2 or similar
- Python: could use seccomp or RestrictedPython
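The host side of this stdin/stdout protocol can be sketched in Go. This is an assumption-level illustration of the diagram above, not at-rund's actual code; `runExec` and its simplified error handling are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os/exec"
)

// runExec writes the request JSON to the executor's stdin and
// decodes the JSON reply from its stdout, mirroring the diagram
// above. Error handling is simplified for illustration.
func runExec(execPath string, req map[string]any) (map[string]any, error) {
	payload, err := json.Marshal(req)
	if err != nil {
		return nil, err
	}
	cmd := exec.Command(execPath)
	cmd.Stdin = bytes.NewReader(payload)
	var out bytes.Buffer
	cmd.Stdout = &out
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("executor failed: %w", err)
	}
	var resp map[string]any
	if err := json.Unmarshal(out.Bytes(), &resp); err != nil {
		return nil, err
	}
	return resp, nil
}

func main() {
	// Using cat as a stand-in executor that echoes the request back.
	resp, err := runExec("cat", map[string]any{"endpoint": "handleRequest"})
	if err != nil {
		panic(err)
	}
	fmt.Println(resp["endpoint"])
}
```

Because the contract is just JSON over pipes, any language that can read stdin and write stdout can implement a runtime executor.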
### ExecRequest Format
```json
{
  "codePath": "/path/to/bundle.js",
  "endpoint": "handleRequest",
  "args": { "input": "data" },
  "permissions": {
    "net": ["api.example.com", "*.cdn.com"],
    "read": ["/tmp/cache"],
    "write": ["/tmp/output"],
    "env": ["API_KEY", "DEBUG"]
  },
  "env": {
    "API_KEY": "decrypted-secret"
  },
  "timeout": 30
}
```
### ExecResponse Format
```json
{
  "success": true,
  "data": { "result": "value" },
  "response": {
    "status": 200,
    "headers": { "content-type": "application/json" },
    "body": "...",
    "isBase64": false
  },
  "error": "error message if success=false",
  "metrics": {
    "executionTimeMs": 142
  }
}
```
### Consequences
Positive:
- Runtime authors control permission translation
- Operators can customize/fork runtimes
- at-rund stays runtime-agnostic
- Easy to add new runtimes
Negative:
- More complexity in runtime definitions
- Permission bugs are per-runtime, not centralized
## Social Trust Model
**Decision:** Operators explicitly choose who can run code; there is no automatic discovery or federation.

**Date:** 2026-05-01
### Context
Traditional serverless (AWS Lambda, Cloudflare Workers) has a clear trust model: you trust the provider, period. Decentralized alternatives often propose automatic discovery and federation, which creates new trust problems.
### Decision
Trust is explicit and social:
- **Operators choose bundles**: via allowlist, blocklist, or open mode
- **Authors choose runners**: by manually configuring which runner(s) to use
- **No automatic discovery**: runners don't announce themselves to a registry
- **No federation**: each runner is independent
This mirrors how the AT Protocol itself works — you follow people you know, not everyone.
### Access Control
```toml
[access]
mode = "allowlist"  # or "blocklist" or "open"

# Only these DIDs can run bundles
allowlist = [
  "did:plc:friend1",
  "did:plc:friend2",
]
```
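A minimal sketch of how the three access modes could be enforced, assuming the config above has been decoded into a struct; `AccessConfig` and `Allowed` are illustrative names, not at-rund's actual types.

```go
package main

import "fmt"

// AccessConfig mirrors the [access] table: mode is one of
// "allowlist", "blocklist", or "open".
type AccessConfig struct {
	Mode      string
	Allowlist []string
	Blocklist []string
}

// Allowed reports whether the given DID may run bundles under cfg.
func (c AccessConfig) Allowed(did string) bool {
	switch c.Mode {
	case "open":
		return true
	case "allowlist":
		for _, d := range c.Allowlist {
			if d == did {
				return true
			}
		}
		return false
	case "blocklist":
		for _, d := range c.Blocklist {
			if d == did {
				return false
			}
		}
		return true
	default:
		// Unknown mode: fail closed.
		return false
	}
}

func main() {
	cfg := AccessConfig{Mode: "allowlist", Allowlist: []string{"did:plc:friend1"}}
	fmt.Println(cfg.Allowed("did:plc:friend1"), cfg.Allowed("did:plc:stranger"))
}
```

Failing closed on an unrecognized mode keeps a typo in the config from silently opening the runner to everyone.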
### Consequences
Positive:
- Simple mental model
- Operators have full control
- No Sybil attacks on "discovery"
- Matches AT Protocol philosophy
Negative:
- Authors must manually find runners
- No marketplace/registry (for now)
- Fragmented ecosystem possible
### Future Consideration
Optional announcement could be added later:
- Post runner capabilities to your PDS
- Others discover via social graph
- Still no central registry
## Nix for Runtime Builds
**Decision:** Use Nix to define and build all runtime environments.

**Date:** 2026-05-01
### Context
Bundles must behave identically across:
- Development (macOS, direct execution)
- Container isolation (Linux VPS)
- VM isolation (bare-metal)
Docker images alone don't solve this — you still need to define what goes in them, and dev machines often can't run Docker efficiently.
### Decision
Nix defines each runtime declaratively:
```nix
{
  packages = [ pkgs.deno pkgs.jq ];

  # Script that handles execution
  at-run-exec = writeShellScriptBin "at-run-exec" ''
    ...
  '';

  # OCI image for containers
  ociImage = dockerTools.buildImage { ... };

  # Rootfs for Firecracker
  rootfs = makeExt4 { ... };
}
```
One definition, multiple outputs, identical behavior.
### Consequences
Positive:
- Reproducible builds
- Same runtime dev → prod
- Easy to customize/extend
- Hermetic (no "works on my machine")
Negative:
- Nix has steep learning curve
- Build times can be long (first time)
- Extra dependency for operators
## Future Decisions
Topics that need decisions as development continues:
- **Secrets management**: How are secrets encrypted/decrypted?
- **Bundle signing**: Should bundles be signed? By whom?
- **Rate limiting**: Built-in or middleware only?
- **Multi-tenancy**: Separate pools per DID?
- **Persistence**: Stateful bundles? How?