Add comprehensive plan for making Darling fully capable of running Nix

Split into focused documents under plan/ to keep context manageable:

- 00-background: motivation, prior art, current state analysis
- 01-blockers: 7 identified blockers (lchflags, sandbox-exec, renameatx_np, etc.)
- 02-phase0: flake.nix, devShell with all tools for Zed, .envrc
- 03-phase1: core syscall fixes (setattrlist, renameatx_np, utimensat, etc.)
- 04-phase2: sandbox-exec stub, sandbox API fixes
- 05-phase3: automated Nix installation inside Darling
- 06-phase4: derivation building (trivial → stdenv → binary substitution)
- 07-phase5: nix-daemon, multi-user mode, Directory Services stubs
- 08-phase6: NixOS VM tests, syscall regression suite, GitHub Actions CI
- 09-phase7: Darling as a nix.buildMachines remote builder
- 10-phase8: stretch goals (aarch64-darwin, GUI testing, binary cache)
- 11-architecture: system diagram, key technical decisions, glossary

Inspired by nixie-dev/darling-nix and ersei's 'Nix All The Way Down' blog post.

+4504
+27
PLAN.md
# PLAN: Making Darling Fully Capable of Running Nix

> **Goal**: Enable Darling (macOS compatibility layer for Linux) to run the Nix
> package manager reliably, so that Linux machines can build, test, and
> cross-compile `x86_64-darwin` Nix derivations — analogous to how Wine enables
> building and testing Windows binaries on Linux.

The full plan has been split into focused documents to keep context manageable.
See the **[plan/](./plan/)** directory for all details.

## Quick Navigation

| Document | Description |
|---|---|
| [plan/README.md](./plan/README.md) | **Start here** — index, priority table, effort estimates |
| [plan/00-background.md](./plan/00-background.md) | Motivation, what works today, what doesn't |
| [plan/01-blockers.md](./plan/01-blockers.md) | Detailed analysis of each blocking issue |
| [plan/02-phase0-packaging.md](./plan/02-phase0-packaging.md) | `flake.nix`, devShell, `.envrc`, NixOS module |
| [plan/03-phase1-syscalls.md](./plan/03-phase1-syscalls.md) | `setattrlist`, `renameatx_np`, `utimensat`, etc. |
| [plan/04-phase2-sandbox.md](./plan/04-phase2-sandbox.md) | `sandbox-exec` passthrough, sandbox API stubs |
| [plan/05-phase3-nix-install.md](./plan/05-phase3-nix-install.md) | Automated installer, verification, wrappers |
| [plan/06-phase4-building.md](./plan/06-phase4-building.md) | Trivial derivations → stdenv → binary substitution |
| [plan/07-phase5-daemon.md](./plan/07-phase5-daemon.md) | Multi-user mode, Directory Services stubs, launchd |
| [plan/08-phase6-ci.md](./plan/08-phase6-ci.md) | NixOS VM tests, regression suite, GitHub Actions |
| [plan/09-phase7-remote-builder.md](./plan/09-phase7-remote-builder.md) | Darling as a `nix.buildMachines` target |
| [plan/10-phase8-stretch.md](./plan/10-phase8-stretch.md) | `aarch64-darwin`, GUI testing, Hydra builder |
| [plan/11-architecture.md](./plan/11-architecture.md) | System diagram, key technical decisions, glossary |
+104
plan/00-background.md
# Background & Motivation

## Why This Matters

The Nix ecosystem currently has no way to build or test `x86_64-darwin`
derivations without access to real Apple hardware (or a macOS VM that requires
macOS licensing). This is a serious limitation for:

- **Open-source CI**: Projects that need to verify their Darwin builds cannot do
  so on commodity Linux infrastructure.
- **Cross-compilation verification**: Even when cross-compiling *to* Darwin, the
  resulting binaries cannot be smoke-tested without macOS.
- **Nixpkgs maintenance**: Darwin breakage often goes unnoticed until a macOS
  user reports it.

[Darling](https://www.darlinghq.org/) is an open-source Darwin/macOS
translation layer for Linux — conceptually the same as Wine, but for macOS
instead of Windows. If Darling can run the Nix package manager and reliably
execute Nix-built Darwin binaries, we unlock the ability to build and test
`x86_64-darwin` packages on Linux.

## Prior Art

The [nixie-dev/darling-nix](https://github.com/nixie-dev/darling-nix) project
has already demonstrated that Darling can be packaged with Nix and integrated
with NixOS module tests. Their overlay builds Darling from source with
`clangStdenv` and provides a `darling` package plus an SDK output with `ld64`
and `ar`/`ranlib` from cctools-port.

A [blog post by ersei](https://ersei.net/en/blog/nix-all-the-way-down)
documented an end-to-end attempt at installing Nix inside Darling, identifying
concrete blockers along the way. Key findings from that effort:

- The Nix installer fails because `xmllint`, `diskutil info`, and `dseditgroup`
  are missing or unimplemented in Darling.
- The installer forces multi-user mode on Darwin; single-user mode requires
  manual patching.
- `lchflags` fails with `EINVAL` during `nix-env` profile installation because
  `setattrlist` is not implemented.
- Even after binary-patching `libnixstore.dylib` to skip the `lchflags` error
  check, `sandbox-exec` is missing so builds fail.
- Setting `_NIX_TEST_NO_SANDBOX=1` gets past that, but then `mv` crashes on
  unimplemented syscall 488 (`renameatx_np`) and `touch` segfaults.
- Various other programs (e.g. `fish`) crash with `Illegal instruction` due to
  incomplete syscall coverage.
- After extensive workarounds (replacing broken coreutils in the store, removing
  docs from home-manager, etc.), Nix + home-manager + neovim were eventually
  made to work, but the result was fragile and required many manual
  interventions.

This plan synthesizes those findings with our own code analysis of the Darling
source tree into an actionable roadmap.

---

## Current State of Affairs

### What Works

- Darling boots a macOS-like container with `darling shell`.
- Basic command-line utilities (`echo`, `ls`, `cp`, etc.) function.
- The Darling prefix (`~/.darling`) provides an overlayfs-backed macOS-like
  filesystem hierarchy.
- DMG/XIP images can be mounted and Xcode command-line tools can be installed.
- Simple C programs can be compiled and executed using Apple's toolchain.
- Darling can be built with Nix via the `nixie-dev/darling-nix` overlay and
  ships as part of upstream nixpkgs.
- The `darlingserver` provides userspace syscall translation (no kernel module
  required on modern builds).

### What Does Not Work (for Nix)

| Issue | Root Cause | Severity |
|---|---|---|
| `lchflags()` returns `EINVAL` | `setattrlist()` not implemented | **Blocker** |
| `/usr/bin/sandbox-exec` missing | Sandbox framework is stubbed | **Blocker** |
| `mv` crashes (`Unimplemented syscall 488`) | `renameatx_np` / `renameat2` not implemented | **Blocker** |
| `touch` segfaults | Likely missing `utimensat` or file-flag syscall | **Blocker** |
| Nix installer forces multi-user on Darwin | Installer script checks `uname` | High |
| `diskutil info` not implemented | `diskutil` is a shell script supporting only `eject` | Medium |
| `xmllint` missing | Not shipped in Darling | Medium |
| `dseditgroup` / Directory Services missing | User/group management unimplemented | Medium |
| `posix_spawn` + `POSIX_SPAWN_SETEXEC` → `ENOEXEC` | Incomplete `posix_spawn` attribute support | High |
| dyld cache load errors | Shared cache not generated for prefix | Medium |
| Sporadic segfaults in various programs | Incomplete syscall/ABI coverage | High |
| Darling reports macOS 10.15 (Catalina) | Newer Nix binaries target ≥ 11.0 | Medium |

### Relevant Source Locations in This Repo

| Area | Path | Notes |
|---|---|---|
| Sandbox stubs | `src/sandbox/sandbox.c` | All functions return "Not implemented" or 0 |
| Sandbox library | `src/libsandbox/` | `libsandbox.1.dylib` — thin shim |
| Syscall translation | `src/external/darlingserver/` | Submodule (empty until checked out) |
| libc wrappers | `src/external/libc/` | Darwin libc with BSD syscall wrappers |
| launchd | `src/launchd/` | Process management, uses `posix_spawn` |
| diskutil | `src/diskutil/diskutil` | Shell script, only supports `eject` verb |
| duct tape shims | `src/duct/src/` | Minimal stubs for `acl`, `dns_sd`, etc. |
| Build system | `CMakeLists.txt` | Top-level; deployment target is 11.0 |
| CI (current) | `.github/workflows/actions.yaml` | Debian-only, no Nix |

---

*Next: [Known Blockers →](./01-blockers.md)*
+241
plan/01-blockers.md
# Known Blockers

Detailed analysis of each issue that prevents Nix from running inside Darling,
with fix strategies and pointers into the codebase.

---

## B1: `lchflags` / `setattrlist` Failure

**Symptom**: Running `nix-env` to install a package fails with:
```
error: clearing flags of path '/nix/store/…-user-environment/bin': Invalid argument
```

**What happens**: Nix's store optimisation code (in `libnixstore`) calls
`lchflags(path, 0)` to clear `UF_IMMUTABLE` before garbage collection. The
relevant Nix source:

```c
#if __APPLE__
    if (lchflags(path.c_str(), 0)) {
        if (errno != ENOTSUP)
            throw SysError("clearing flags of path '%1%'", path);
    }
#endif
```

On macOS, `lchflags()` is emulated via `setattrlist(2)`. Darling does not
implement `setattrlist`, so the underlying syscall fails with `EINVAL`.

**Location in Darling**:
- Syscall translation: `src/external/darlingserver/` (submodule)
- libc wrapper: `src/external/libc/`

**Fix strategy**:
1. Implement `setattrlist` / `fsetattrlist` in darlingserver's BSD syscall
   handler. At minimum, handle `ATTR_CMN_FLAGS` (clearing `UF_IMMUTABLE`).
2. Return success (0) for attribute sets that have no Linux equivalent but are
   benign to ignore (e.g., Finder info, extended security).
3. Ensure `lchflags(path, 0)` returns 0 rather than `EINVAL`.
4. Long-term: implement a proper `UF_IMMUTABLE` ↔ `FS_IMMUTABLE_FL` mapping
   via `ioctl(FS_IOC_SETFLAGS)`.

**Workaround (from blog post)**: Binary-patch `libnixstore.dylib` — replace the
`je` (jump-if-equal) after the `lchflags` call with `jmp` (unconditional jump)
to skip the error path. This is fragile and version-specific.

**Effort**: Medium — needs darlingserver changes + libc verification.
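
The long-term flag mapping in step 4 could look roughly like the sketch below. It is not taken from darlingserver: `darwin_flags_to_linux` and the `DARWIN_*` names are hypothetical, the Darwin values are copied from the XNU `<sys/stat.h>` definitions, and dropping unmapped flags is an assumption that follows the "benign to ignore" rule in step 2.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/fs.h>   /* FS_IMMUTABLE_FL, FS_APPEND_FL, FS_NODUMP_FL */

/* Darwin st_flags bits, renamed so they cannot clash with host headers. */
#define DARWIN_UF_NODUMP    0x00000001u
#define DARWIN_UF_IMMUTABLE 0x00000002u
#define DARWIN_UF_APPEND    0x00000004u

/*
 * Hypothetical helper: convert the flags word supplied with ATTR_CMN_FLAGS
 * into Linux inode flags suitable for ioctl(FS_IOC_SETFLAGS).
 * Bits without a Linux counterpart are silently dropped (assumption).
 */
static uint32_t darwin_flags_to_linux(uint32_t darwin_flags)
{
    uint32_t linux_flags = 0;

    if (darwin_flags & DARWIN_UF_IMMUTABLE)
        linux_flags |= FS_IMMUTABLE_FL;
    if (darwin_flags & DARWIN_UF_APPEND)
        linux_flags |= FS_APPEND_FL;
    if (darwin_flags & DARWIN_UF_NODUMP)
        linux_flags |= FS_NODUMP_FL;

    return linux_flags;
}
```

With this shape, `lchflags(path, 0)` maps to an empty Linux flag set, so a handler can return 0 without issuing any `ioctl` when the file carries no flags. Setting `FS_IMMUTABLE_FL` on Linux requires `CAP_LINUX_IMMUTABLE`, so a real handler would also need a policy for unprivileged prefixes.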

---

## B2: Missing `sandbox-exec`

**Symptom**: `nix-build` of any derivation fails with:
```
error: executing '/bin/bash': Bad file descriptor
```

**What happens**: Nix on Darwin wraps every builder invocation with:
```
/usr/bin/sandbox-exec -f <profile> -D _GLOBAL_TMP_DIR=... <builder>
```

The binary `/usr/bin/sandbox-exec` does not exist in the Darling prefix.
`posix_spawn` is called with `sandbox-exec` as the executable, which returns
`ENOEXEC`. Nix reports this misleadingly as "Bad file descriptor".

The sandbox API in `src/sandbox/sandbox.c` is entirely stubbed:
```c
int sandbox_init(const char *profile, uint64_t flags, char **errorbuf)
{
    *errorbuf = strdup("Not implemented");
    return 0;
}
```

**Fix strategy** (two options, not mutually exclusive):

- **Option A — Stub `sandbox-exec` (MVP)**: Ship a `/usr/bin/sandbox-exec`
  shell script or small C program that:
  - Parses `-f <profile>` and `-D <key>=<value>` arguments (discards them).
  - `exec`s the remaining arguments as the builder command.
  - Darling already provides Linux-level isolation via namespaces and the
    darlingserver container, so skipping the macOS sandbox is safe.

- **Option B — Translate to Linux sandboxing (stretch)**: Parse Apple's Sandbox
  Profile Language (Scheme-based `.sb` files) and map rules to Linux
  equivalents (Landlock, seccomp-bpf, namespaces). Large effort, not needed
  for MVP.

**Workaround (from blog post)**: Set `_NIX_TEST_NO_SANDBOX=1` — this is an
internal Nix environment variable that bypasses sandbox-exec. Works but is not
meant for production use.

**Effort**: Small for Option A (a few hours), Large for Option B (weeks).
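
The argument handling for Option A is small enough to sketch. The helper below is hypothetical (`skip_sandbox_options` is not an existing Darling function). It assumes each option and its value arrive as separate `argv` entries, as in the invocation shown above, and it also skips `-n` and `-p`, which the `sandbox-exec` man page lists as taking one argument each.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper for a stub sandbox-exec: returns the index of the
 * first argv element that belongs to the wrapped command, or -1 when the
 * options are malformed or no command follows. Option values are discarded.
 */
static int skip_sandbox_options(int argc, char *const argv[])
{
    int i = 1;

    while (i < argc && argv[i][0] == '-' && argv[i][1] != '\0') {
        if (strcmp(argv[i], "--") == 0) {   /* explicit end of options */
            i++;
            break;
        }
        if (strcmp(argv[i], "-f") == 0 || strcmp(argv[i], "-n") == 0 ||
            strcmp(argv[i], "-p") == 0 || strcmp(argv[i], "-D") == 0) {
            if (i + 1 >= argc)
                return -1;                  /* option is missing its value */
            i += 2;
        } else {
            return -1;                      /* unknown option: fail loudly */
        }
    }
    return i < argc ? i : -1;
}
```

A stub's `main` would then call `execvp(argv[i], &argv[i])` with the returned index and exit non-zero if the helper returns -1. Failing on unknown options instead of guessing is a deliberate choice in this sketch, so that a changed Nix invocation shows up as an explicit error.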

---

## B3: Unimplemented Syscall 488 (`renameatx_np`)

**Symptom**: `mv` from Nix's coreutils crashes:
```
Unimplemented syscall (488)
```
This breaks derivation builds that need to move files (very common).

**What happens**: macOS syscall 488 is `renameatx_np`, which extends `rename`
with atomic swap and exclusive-create semantics. Modern Darwin coreutils
(fetched from the Nix binary cache) use this syscall. Darling's syscall table
does not have an entry for it.

**Fix strategy**: Implement `renameatx_np` by translating to Linux's
`renameat2(2)`. The flag mapping:

| macOS Flag | Value | Linux Equivalent |
|---|---|---|
| `RENAME_SWAP` | `0x00000002` | `RENAME_EXCHANGE` |
| `RENAME_EXCL` | `0x00000004` | `RENAME_NOREPLACE` |

When no flags are set, fall through to plain `renameat`.

**Location**: darlingserver syscall table (`src/external/darlingserver/`).

**Workaround (from blog post)**: Replace the Nix store's `mv` binary with
Darling's built-in `/bin/mv` that uses older syscalls. This is a "Nix crime"
(modifying store paths) and breaks reproducibility.

**Effort**: Small — straightforward syscall mapping, well-defined semantics.

---

## B4: `touch` / `utimensat` Crash

**Symptom**: Running `touch` from Nix's coreutils causes:
```
Segmentation fault: 11 (core dumped)
```
This breaks derivation builds (e.g., neovim's build script calls `touch`).

**What happens**: The Nix-provided `touch` (compiled for newer macOS) likely
uses `setattrlistat` or a `utimensat`-related path that is missing or buggy in
Darling. It may also be related to the `setattrlist` gap from B1 — `touch -t`
on macOS can go through `setattrlist` to set modification times.

**Fix strategy**:
1. Audit the `utimensat` / `futimens` translation in darlingserver.
2. Ensure `UTIME_NOW` and `UTIME_OMIT` sentinel values are correctly handled.
3. If the crash is in `setattrlistat`, fixing B1 may resolve this too.
4. Test with Nix's specific coreutils version.

**Workaround (from blog post)**: Same as B3 — replace the store's `touch` with
Darling's built-in version.

**Effort**: Medium — needs debugging to pinpoint exact crash location.

---

## B5: `posix_spawn` with `POSIX_SPAWN_SETEXEC` Returns `ENOEXEC`

**Symptom**: Even when `sandbox-exec` or other executables exist, `posix_spawn`
with the `POSIX_SPAWN_SETEXEC` flag (which makes it behave like `exec`) can
return `ENOEXEC` for certain binaries.

**What happens**: The `POSIX_SPAWN_SETEXEC` flag is used by Nix's sandbox setup
and by launchd (`src/launchd/src/core.c:4553`). If darlingserver doesn't fully
support this flag in its `posix_spawn` implementation, the caller gets
`ENOEXEC` and reports confusing errors.

**Fix strategy**: Verify `POSIX_SPAWN_SETEXEC` handling in darlingserver's
`posix_spawn` implementation. Ensure it correctly replaces the current process
image (like `execve`) rather than spawning a child.

**Effort**: Medium — requires darlingserver debugging.

---

## B6: macOS Version Mismatch

**Symptom**: Various subtle failures due to Darling reporting macOS 10.15
(Catalina) while Nix's pre-built Darwin binaries increasingly target macOS 11.0+
(Big Sur).

**What happens**: The Nix binary cache serves binaries built with
`-mmacosx-version-min=11.0` or higher. These binaries may use APIs or syscalls
that were introduced in Big Sur and aren't present in Darling's Catalina-era
libraries.

**Note**: The `CMakeLists.txt` already sets `CMAKE_OSX_DEPLOYMENT_TARGET` to
`11.0`, but the runtime environment (`sw_vers`, `SystemVersion.plist`) may still
report 10.15.

**Fix strategy**:
1. Update `sw_vers` / `SystemVersion.plist` in the Darling prefix to report 11.0.
2. Audit `__MAC_OS_X_VERSION_MIN_REQUIRED` availability guards in Darling's
   libc, libSystem, and frameworks.
3. Ensure there are no code paths gated on version checks that disable
   functionality we need.

**Effort**: Medium — version bumps can have cascading effects.

---

## B7: dyld Shared Cache

**Symptom**: Some binaries print on startup:
```
dyld: dyld cache load error: shared cache file open() failed
```

**What happens**: macOS ships a pre-linked shared cache
(`/System/Library/dyld/dyld_shared_cache_x86_64`) that contains all system
libraries. Darling may not generate this cache, forcing `dyld` to fall back to
loading individual `.dylib` files. This usually works but can cause errors with
binaries that assume the cache exists.

**Fix strategy**:
1. Determine if Darling generates a shared cache during prefix initialization.
2. If not, add a cache generation step or ensure the fallback path works
   reliably.
3. This is lower priority — most Nix binaries link against Nix-provided
   libraries, not system ones.

**Effort**: Medium-to-Large depending on root cause.

---

## Blocker Dependency Graph

```
B1 (setattrlist)  ──→ B4 (touch/utimensat)   may share root cause
B2 (sandbox-exec) ──→ B5 (posix_spawn)       related but independent
B3 (renameatx_np)                            standalone
B6 (version)                                 affects everything subtly
B7 (dyld cache)                              standalone, lower priority
```

**Recommended fix order**: B2 (quick win) → B3 (quick win) → B1 → B4 → B5 → B6 → B7

---

*[← Background](./00-background.md) | [Phase 0 — Packaging →](./02-phase0-packaging.md)*
+207
plan/02-phase0-packaging.md
# Phase 0 — Nix Packaging + DevShell

**Priority**: P0 · **Effort**: S (1–2 weeks) · **Depends on**: Nothing

This is the foundation phase. Before any Darling hacking begins, we need a
reproducible build, a developer shell with all required tools, and editor
integration so that contributors (human and AI) can be productive immediately.

---

## Tasks

### 0.1 — Add `flake.nix`

Create a `flake.nix` at the repo root that exposes:

- `packages.x86_64-linux.darling` — the main Darling binary + prefix
- `packages.x86_64-linux.darling-sdk` — macOS SDK + cctools (`ld64`, `ar`,
  `ranlib`) for cross-compilation

Use the [nixie-dev/darling-nix](https://github.com/nixie-dev/darling-nix)
packaging as a reference. Their `packages/darling/default.nix` demonstrates:

- Building with `clangStdenv`
- A `ccWrapperBypass` that detects `-target *darwin*` and calls the unwrapped
  compiler to avoid `cc-wrapper` interfering with Darwin cross-compilation
- Splitting the SDK into a separate output
- Post-fixup checks that ensure no `/nix/store` paths leak into the Darling
  root (which would break the prefix overlay)

Key decisions:

- Pin `nixpkgs` input to a recent stable release.
- Use `fetchFromGitHub` with `fetchSubmodules = true` to get all submodules
  (there are 100+ in `.gitmodules`).
- Strip large test directories from the source to stay under Hydra output limits
  (see the `postFetch` in the reference packaging).

### 0.2 — Add NixOS Module

Create `nixosModules.darling` that:

- Ensures the Darling binary is installed.
- Configures darlingserver's userspace-only mode (no kernel module required on
  modern kernels with `overlayfs` + user namespaces).
- Sets up `/etc/darling` configuration if needed.
- Optionally provides a `darling-prefix.service` systemd unit for persistent
  prefixes.

### 0.3 — Set Up Binary Cache

- Create a [Cachix](https://cachix.org/) cache (or equivalent) for CI-built
  artifacts.
- Add `nixConfig.extra-substituters` and `nixConfig.extra-trusted-public-keys`
  to `flake.nix` so users automatically use the cache.
- Document the cache setup in the repo README.

### 0.4 — Pin Submodules

The `darlingserver` submodule (at `src/external/darlingserver/`) is empty in a
shallow checkout. Ensure the flake's `fetchFromGitHub` with
`fetchSubmodules = true` captures it, so the build is fully reproducible from
a single source fetch.

Verify all 100+ submodules listed in `.gitmodules` are resolved. If any fail,
pin their commits explicitly.

### 0.5 — Add `devShell`

Add `devShells.x86_64-linux.default` to the flake. This shell must provide
every tool needed to build Darling, debug issues, and work comfortably in Zed.

**Build dependencies** (same as `nativeBuildInputs` for the Darling package):

- `clang` / `clangStdenv.cc`
- `cmake`
- `ninja`
- `pkg-config`
- `bison`
- `flex`
- `python3`
- `makeWrapper`

**Runtime & library dependencies** (same as `buildInputs`):

- `freetype`, `libjpeg`, `libpng`, `libtiff`, `giflib`
- `libX11`, `libXext`, `libXrandr`, `libXcursor`, `libxkbfile`
- `cairo`, `libglvnd`, `fontconfig`, `dbus`, `libGLU`
- `fuse`, `ffmpeg`, `pulseaudio`
- `libbsd`, `openssl`
- Linux headers (`stdenv.cc.libc.linuxHeaders`)

**Debugging & analysis tools**:

- `gdb` — for debugging crashes inside Darling / darlingserver
- `strace` — for tracing Linux syscalls made by darlingserver
- `rizin` — for binary analysis / patching (used in the blog post to patch
  `libnixstore.dylib`)
- `file` — for identifying binary types (Mach-O vs ELF)

**Code exploration**:

- `ripgrep` — fast grep across the large codebase
- `fd` — fast find
- `jq` — JSON processing (useful for Nix evaluation debugging)

**Nix tooling** (critical for Zed integration):

- `nil` or `nixd` — Nix language server, so Zed provides completions,
  diagnostics, and go-to-definition for `.nix` files
- `nixfmt-rfc-style` — Nix formatter

**C/C++ tooling** (critical for Zed integration):

- `clang-tools` — provides `clangd` for C/C++ language server support in Zed
- `bear` or `cmake`'s `CMAKE_EXPORT_COMPILE_COMMANDS` — for generating
  `compile_commands.json` so `clangd` understands the build

**Example structure**:

```nix
devShells.x86_64-linux.default = pkgs.mkShell.override { stdenv = pkgs.clangStdenv; } {
  packages = with pkgs; [
    # Build
    cmake ninja pkg-config bison flex python3 makeWrapper

    # Libraries (for cmake to find)
    freetype libjpeg libpng libtiff giflib
    libX11 libXext libXrandr libXcursor libxkbfile
    cairo libglvnd fontconfig dbus libGLU
    fuse ffmpeg pulseaudio
    libbsd openssl

    # Debug
    gdb strace rizin file

    # Code exploration
    ripgrep fd jq

    # Nix tooling (for Zed)
    nil nixfmt-rfc-style

    # C/C++ tooling (for Zed)
    clang-tools
  ];

  CMAKE_EXPORT_COMPILE_COMMANDS = "1";
};
```

### 0.6 — Add `.envrc`

Create a `.envrc` at the repo root:

```bash
use flake
```

This single line is all that's needed. When `direnv` is installed (as it
typically is on a NixOS or Nix-with-direnv setup), entering the project
directory will:

1. Evaluate the flake's `devShell`.
2. Export all environment variables (paths to tools, library paths, etc.).
3. Make tools available to the shell **and** to Zed (which reads direnv state).

**Why this matters for Zed**: Zed discovers language servers, formatters, and
other tools through the environment. Without `.envrc` + direnv, Zed won't find
`clangd`, `nil`, or `nixfmt` — meaning no LSP support, no inline errors, and
no formatting. With it, everything works automatically the moment you open the
project.

Make sure no `.gitignore` rule matches `.envrc` (it must be committed), and add
`.direnv/` to `.gitignore` (the cache directory should be ignored).

---

## Verification Checklist

After completing Phase 0, the following should all work:

- [ ] `nix build .#darling` produces a working Darling installation
- [ ] `nix build .#darling-sdk` produces the SDK with `ld64`, `ar`, `ranlib`
- [ ] `nix develop` drops into a shell with `cmake`, `clang`, `gdb`, `nil`, etc.
- [ ] `cd`-ing into the repo with direnv enabled loads the devShell automatically
- [ ] Opening the repo in Zed shows Nix LSP working (completions in `.nix` files)
- [ ] Opening a `.c` file in Zed shows `clangd` providing diagnostics
- [ ] `darling shell echo Hello` works from the built package
- [ ] `nix flake check` passes

---

## Notes

- The devShell is intentionally **large**. This is a complex C/C++/Objective-C
  project with 100+ submodules, and developers need the full toolkit available
  without hunting for dependencies.
- `CMAKE_EXPORT_COMPILE_COMMANDS=1` is set in the devShell so that any cmake
  configure run produces `compile_commands.json`, which `clangd` needs.
  Alternatively, contributors can run `bear -- cmake --build build/` to
  generate it.
- The `.envrc` should be committed to the repo (not gitignored) so every
  contributor gets the same experience. Only `.direnv/` (the cache) is ignored.

---

*[← Known Blockers](./01-blockers.md) | [Phase 1 — Syscall Fixes →](./03-phase1-syscalls.md)*
+333
plan/03-phase1-syscalls.md
# Phase 1 — Core Syscall & API Fixes

**Priority**: P0 · **Effort**: L (4–8 weeks) · **Depends on**: Phase 0

These are the minimum changes needed for Nix binaries to not crash on startup
and for basic Nix operations (eval, install, build) to function inside Darling.

All syscall work happens in the `darlingserver` submodule
(`src/external/darlingserver/`) and/or the libc wrappers in
`src/external/libc/`. Some fixes may also touch the XNU syscall shim layer in
`src/external/xnu/darling/src/libsystem_kernel/`.

---

## Tasks

### 1.1 — Implement `setattrlist` / `fsetattrlist` / `getattrlist`

**Resolves**: [Blocker B1](./01-blockers.md#b1-lchflags--setattrlist-failure),
partially [B4](./01-blockers.md#b4-touch--utimensat-crash)

**What to do**:

Add the `setattrlist(2)` and `fsetattrlist(2)` BSD syscalls to darlingserver's
syscall handler. These are the underlying calls that `lchflags`, `chflags`,
`utimes`, and other file-metadata functions use on macOS.

**Minimum viable implementation**:

| Attribute Group | Attribute | Action |
|---|---|---|
| `ATTR_CMN_FLAGS` | `UF_IMMUTABLE` | Map to `FS_IMMUTABLE_FL` via `ioctl(FS_IOC_SETFLAGS)`, or silently succeed when clearing (value = 0) |
| `ATTR_CMN_FLAGS` | All other flags | Return 0 (success), ignore silently |
| `ATTR_CMN_MODTIME` | modification time | Translate to `utimensat` on the Linux side |
| `ATTR_CMN_ACCTIME` | access time | Translate to `utimensat` on the Linux side |
| `ATTR_CMN_CRTIME` | creation time | Silently ignore (ext4/btrfs don't expose birth time for writing) |
| Everything else | — | Return `ENOTSUP` or 0 depending on whether ignoring is safe |

Also implement `getattrlist(2)` at minimum for `ATTR_CMN_FLAGS` so that
programs that read-then-modify flags don't crash.
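
For the two timestamp rows in the table above, the Linux side needs a `timespec` pair in `utimensat` order. The sketch below is one way to build it, assuming the handler has already unpacked the attribute buffer into separate values. `attr_times_to_utimens` and the `DARWIN_*` names are hypothetical, and the bit values are copied from the XNU `<sys/attr.h>` definitions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>   /* UTIME_OMIT */
#include <time.h>       /* struct timespec */

/* Darwin common-attribute bits, renamed to avoid clashing with host headers. */
#define DARWIN_ATTR_CMN_MODTIME 0x00000400u
#define DARWIN_ATTR_CMN_ACCTIME 0x00001000u

/*
 * Hypothetical helper: fill the two-element array expected by utimensat().
 * out[0] is the access time and out[1] the modification time. A timestamp
 * that was not requested in commonattr is marked UTIME_OMIT so the Linux
 * call leaves it untouched.
 */
static void attr_times_to_utimens(uint32_t commonattr,
                                  const struct timespec *modtime,
                                  const struct timespec *acctime,
                                  struct timespec out[2])
{
    out[0].tv_sec = 0;
    out[0].tv_nsec = UTIME_OMIT;
    out[1].tv_sec = 0;
    out[1].tv_nsec = UTIME_OMIT;

    if ((commonattr & DARWIN_ATTR_CMN_ACCTIME) && acctime != NULL)
        out[0] = *acctime;
    if ((commonattr & DARWIN_ATTR_CMN_MODTIME) && modtime != NULL)
        out[1] = *modtime;
}
```

The result can be handed to `utimensat(dirfd, path, out, flags)`, with `AT_SYMLINK_NOFOLLOW` in `flags` when the caller passed `FSOPT_NOFOLLOW`. When neither bit is set, both entries are `UTIME_OMIT` and the call can be skipped.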

**Files to modify**:

- `src/external/darlingserver/` — syscall dispatch table, new handler
- `src/external/xnu/darling/src/libsystem_kernel/` — ensure the Mach trap /
  BSD syscall number is wired through
- `src/external/libc/` — verify the userspace `setattrlist()` wrapper calls the
  correct syscall number

**Testing**: Write a small C program that calls `lchflags(path, 0)` and
`setattrlist()` with `ATTR_CMN_FLAGS`. Run inside `darling shell`. Must return 0.

---

### 1.2 — Fix `lchflags` Return Value

**Resolves**: [Blocker B1](./01-blockers.md#b1-lchflags--setattrlist-failure)

**What to do**:

Once `setattrlist` is implemented (1.1), verify that `lchflags(path, 0)` returns
0 (success). On macOS, `lchflags` is implemented roughly as:

```c
/* flags is an unsigned 32-bit word in Apple's <unistd.h> declaration. */
int lchflags(const char *path, uint32_t flags) {
    struct attrlist attrlist;
    memset(&attrlist, 0, sizeof(attrlist));
    attrlist.bitmapcount = ATTR_BIT_MAP_COUNT;
    attrlist.commonattr = ATTR_CMN_FLAGS;
    /* FSOPT_NOFOLLOW: operate on the symlink itself, not its target. */
    return setattrlist(path, &attrlist, &flags, sizeof(flags),
                       FSOPT_NOFOLLOW);
}
```

Trace through `src/external/libc/` to confirm this is what Darling's libc does,
and that the `FSOPT_NOFOLLOW` option is respected (i.e., the Linux side uses
`fstatat` / `utimensat` with `AT_SYMLINK_NOFOLLOW` rather than following
symlinks).

**Testing**: Same test program as 1.1. Additionally, copy the exact Nix
`nix-env` invocation from the blog post and verify it completes without the
"clearing flags" error.

---

### 1.3 — Implement `renameatx_np` (Syscall 488)

**Resolves**: [Blocker B3](./01-blockers.md#b3-unimplemented-syscall-488-renameatx_np)

**What to do**:

Add syscall 488 (`renameatx_np`) to darlingserver's BSD syscall table. Translate
directly to Linux's `renameat2(2)`.
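
As a rough illustration of that translation, the flag handling can be isolated in a pure helper. `translate_rename_flags` and the `DARWIN_*` names are hypothetical. The macOS values (`RENAME_SWAP` = `0x2`, `RENAME_EXCL` = `0x4`) are the ones given in the blocker analysis, and rejecting any other bit with `EINVAL` is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <linux/fs.h>   /* RENAME_NOREPLACE, RENAME_EXCHANGE */

/* macOS renameatx_np flag values, renamed to avoid clashing with host headers. */
#define DARWIN_RENAME_SWAP 0x00000002u
#define DARWIN_RENAME_EXCL 0x00000004u

/*
 * Hypothetical helper: translate renameatx_np flags into renameat2 flags.
 * Returns 0 on success or a negative errno value. Swap and exclusive-create
 * are mutually exclusive; unknown bits are rejected (assumption).
 */
static int translate_rename_flags(unsigned int darwin_flags,
                                  unsigned int *linux_flags)
{
    const unsigned int known = DARWIN_RENAME_SWAP | DARWIN_RENAME_EXCL;

    if (darwin_flags & ~known)
        return -EINVAL;
    if ((darwin_flags & known) == known)
        return -EINVAL;

    *linux_flags = 0;
    if (darwin_flags & DARWIN_RENAME_SWAP)
        *linux_flags |= RENAME_EXCHANGE;
    if (darwin_flags & DARWIN_RENAME_EXCL)
        *linux_flags |= RENAME_NOREPLACE;
    return 0;
}
```

A handler would pass the translated value to `renameat2(fromfd, from, tofd, to, linux_flags)`. With a result of 0 that call behaves like plain `renameat`, so no separate code path is needed for the no-flags case.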

**Signature**:

```c
int renameatx_np(int fromfd, const char *from, int tofd, const char *to,
                 unsigned int flags);
```

**Flag translation**:

| macOS Flag | macOS Value | Linux Equivalent | Linux Value |
|---|---|---|---|
| `RENAME_SWAP` | `0x00000002` | `RENAME_EXCHANGE` | `(1 << 1)` |
| `RENAME_EXCL` | `0x00000004` | `RENAME_NOREPLACE` | `(1 << 0)` |
| (none / 0) | `0x00000000` | (none — use plain `renameat`) | `0` |

**Edge cases**:

- If both `RENAME_SWAP` and `RENAME_EXCL` are set, return `EINVAL` (same as
  macOS behavior).
- If the underlying Linux filesystem doesn't support `renameat2` flags (e.g.,
  NFS), return `ENOTSUP`.

**Files to modify**:

- `src/external/darlingserver/` — add syscall 488 to the dispatch table
- `src/external/xnu/darling/src/libsystem_kernel/` — wire the BSD syscall number

**Testing**: Write a C program that:

1. Creates two files.
2. Calls `renameatx_np` with `RENAME_SWAP` to atomically swap them.
3. Calls `renameatx_np` with `RENAME_EXCL` to rename with exclusive semantics.
4. Verifies contents are correct after each operation.

Also verify that Nix's `mv` (from coreutils) no longer crashes.

---

### 1.4 — Audit and Fix `utimensat` / `futimens`

**Resolves**: [Blocker B4](./01-blockers.md#b4-touch--utimensat-crash)

**What to do**:

The Nix-provided `touch` (from coreutils, built for Darwin) segfaults. Debug
this to determine the exact failing call. Likely candidates:

1. `utimensat` with `UTIME_NOW` or `UTIME_OMIT` sentinel values not being
   translated correctly.
2. `setattrlistat` (a variant of `setattrlist` with `at`-style directory fd) not
   being implemented. If `touch` uses this path, it will crash since
   `setattrlist` is missing (see 1.1).
3. A NULL-pointer dereference in the syscall translation layer when handling
   edge cases.

**Debug approach**:

```bash
# On the Linux host, trace darlingserver's syscalls:
strace -f -p $(pidof darlingserver) -e trace=utimensat,openat,fstatat 2>&1 | head -100

# Inside darling shell, with xtrace:
DARLING_XTRACE=1 /nix/store/.../bin/touch /tmp/testfile
```

**Fix**: Ensure the `utimensat` handler in darlingserver:

- Translates the `UTIME_NOW` and `UTIME_OMIT` sentinels in `tv_nsec` rather
  than passing them through. macOS's `<sys/stat.h>` defines them as `-1` and
  `-2`, while Linux defines them as `((1l << 30) - 1l)` and
  `((1l << 30) - 2l)`.
- Handles `AT_FDCWD` correctly as the directory file descriptor.
- Does not dereference NULL `timespec` pointers (which means "set to current
  time" on both platforms).

**Testing**: `touch /tmp/testfile` inside darling shell with Nix's coreutils
must not segfault. Also test `touch -t 202301011200 /tmp/testfile` (explicit
timestamp).

---

### 1.5 — Implement or Stub `clonefile` / `fclonefileat` (Syscall 462)

**Resolves**: Potential build failures when Nix optimises store copies.

**What to do**:

Nix uses `clonefile` on APFS for copy-on-write file duplication (much faster
than `cp`). On Linux, the equivalents are `ioctl(FICLONE)` (for btrfs/XFS) or
`copy_file_range`.

**Implementation options** (in order of preference):

1. **Translate to `ioctl(FICLONE)`** if the underlying Linux filesystem supports
   it (btrfs, XFS). This preserves the CoW semantics.
2. **Translate to `copy_file_range`** as a fallback — not CoW but still
   efficient (kernel-side copy, no userspace buffering).
3. **Return `ENOTSUP`** — Nix will fall back to regular `read`/`write` copy.
   This is the simplest option and is perfectly functional, just slower.

For MVP, option 3 is fine. Nix handles `ENOTSUP` gracefully.

**Testing**: Call `clonefile("/tmp/src", "/tmp/dst", 0)` inside darling shell.
Verify it either succeeds or returns `ENOTSUP` (not a crash / unimplemented
syscall error).

---

### 1.6 — Implement `getentropy` / `CCRandomGenerateBytes`

**Resolves**: Potential crashes in crypto / hashing code used by Nix and its
dependencies.

**What to do**:

`getentropy(buf, len)` is a simple call to fill a buffer with random bytes. Map
it to Linux's `getrandom(buf, len, 0)`. This may already be implemented in
Darling — verify first.

`CCRandomGenerateBytes` is part of CommonCrypto and calls `getentropy` under the
hood. If `getentropy` works, this should work too.

**Verify**:

```c
#include <sys/random.h>
int main(void) {
    char buf[32];
    return getentropy(buf, sizeof(buf));
}
```

Compile with Apple's clang inside darling shell, run, and check return value.

---

### 1.7 — Triage Unimplemented Syscalls

**Resolves**: Reduces "Unimplemented syscall (N)" crashes across the board.

**What to do**:

1. Run a Nix install + trivial build inside darling shell with syscall tracing
   enabled.
2. Collect all "Unimplemented syscall" messages.
3. Map each syscall number to its name (using XNU headers / the macOS syscall
   table).
4. Categorize:
   - **Must fix**: Causes Nix to crash or fail.
   - **Should stub**: Called but return value isn't critical (e.g., `kdebug`
     tracing calls). Return 0 or `ENOTSUP`.
   - **Can ignore**: Informational, doesn't affect execution.
5. File an issue or task for each "must fix" syscall.
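
Steps 2 and 3 can be partly automated. The following is a minimal sketch, assuming log lines contain the literal text `Unimplemented syscall (N)` as quoted above. Both function names are hypothetical, and the name table holds only the two numbers this plan mentions; it would need to be extended from the XNU syscall table.

```c
#include <stdio.h>
#include <string.h>

/* Extract N from a log line containing "Unimplemented syscall (N)".
 * Returns -1 if the line carries no such message. */
int parse_unimplemented_syscall(const char *line)
{
    static const char marker[] = "Unimplemented syscall (";
    const char *p = strstr(line, marker);
    int nr;

    if (p == NULL)
        return -1;
    /* sizeof(marker) - 1 skips the marker text, excluding its NUL byte. */
    if (sscanf(p + sizeof(marker) - 1, "%d", &nr) != 1 || nr < 0)
        return -1;
    return nr;
}

/* Map a syscall number to a name. Only the numbers named in this plan are
 * listed; everything else is reported as "unknown". */
const char *syscall_name(int nr)
{
    switch (nr) {
    case 462: return "clonefile";
    case 488: return "renameatx_np";
    default:  return "unknown";
    }
}
```

Feeding each line of a captured build log through `parse_unimplemented_syscall` and counting the results gives the raw data for the triage table.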

**Output**: A table in this repo (e.g., `plan/syscall-triage.md`) tracking:

| Syscall # | Name | Caller | Impact | Status |
|---|---|---|---|---|
| 488 | `renameatx_np` | `mv` (coreutils) | Crash | Fixed (1.3) |
| 462 | `clonefile` | Nix store | Slow fallback | Stubbed (1.5) |
| ... | ... | ... | ... | ... |

---

### 1.8 — Update Emulated macOS Version

**Resolves**: [Blocker B6](./01-blockers.md#b6-macos-version-mismatch)

**What to do**:

Ensure Darling's runtime environment matches or exceeds the macOS version that
Nix's pre-built Darwin binaries target (currently 11.0 / Big Sur for most
Nixpkgs packages).

**Check current state**:

```bash
darling shell sw_vers
# Expected: ProductVersion: 10.15.x (Catalina)
# Desired:  ProductVersion: 11.0 or higher
```

**Steps**:

1. Update `SystemVersion.plist` in the Darling prefix (likely in
   `src/external/files/` or generated during prefix initialization).
2. Verify `CMakeLists.txt` already sets `CMAKE_OSX_DEPLOYMENT_TARGET 11.0`
   (it does — confirmed in our code analysis).
3. Audit `__MAC_OS_X_VERSION_MIN_REQUIRED` / `@available` guards in Darling's:
   - `src/external/libc/`
   - `src/external/corefoundation/`
   - `src/external/foundation/`
   - `src/external/libdispatch/`
4. Ensure no code paths are gated behind version checks that would disable
   functionality Nix needs (e.g., newer filesystem calls, newer POSIX APIs).
5. Test that Nix's pre-built `x86_64-darwin` binaries (from `cache.nixos.org`)
   launch without version-related `dyld` errors.

**Risks**: Bumping the version may expose new codepaths that call unimplemented
APIs. This is acceptable — better to surface those issues now than to paper
over them with an old version number.

---

## Recommended Implementation Order

```
1.3 (renameatx_np)     — quick win, unblocks mv

1.1 (setattrlist)      — biggest impact, unblocks lchflags + possibly touch

1.2 (lchflags verify)  — verification step after 1.1

1.4 (utimensat)        — may be resolved by 1.1, debug to confirm

1.5 (clonefile stub)   — quick, just return ENOTSUP

1.6 (getentropy)       — verify first, may already work

1.7 (triage)           — discovery task, informs remaining work

1.8 (version bump)     — do last, may surface new issues
```

---

## Verification Checklist

After completing Phase 1, the following should all work inside `darling shell`:

- [ ] `lchflags("/tmp/testfile", 0)` returns 0 (test program from 1.1)
- [ ] `mv /tmp/a /tmp/b` works with Nix's coreutils `mv` (no "Unimplemented syscall")
- [ ] `touch /tmp/testfile` works with Nix's coreutils `touch` (no segfault)
- [ ] A pre-built Nix binary from `cache.nixos.org` launches without dyld errors
- [ ] `sw_vers` reports macOS 11.0 or higher
- [ ] No "Unimplemented syscall" messages for syscalls in the critical path

---

*[← Phase 0 — Packaging](./02-phase0-packaging.md) | [Phase 2 — Sandbox →](./04-phase2-sandbox.md)*
+288
plan/04-phase2-sandbox.md
# Phase 2 — Sandbox & Build Isolation

**Priority**: P0 · **Effort**: S (1 week) · **Depends on**: Nothing (can be done in parallel with Phase 1)

Nix on Darwin relies heavily on `/usr/bin/sandbox-exec` for build isolation.
Every derivation builder is wrapped with it. Since Darling doesn't ship this
binary and the entire sandbox API is stubbed, this is a hard blocker for any
`nix-build` invocation.

The good news: this is one of the **quickest wins** in the entire plan. A simple
passthrough stub is sufficient because Darling already runs inside a Linux-level
container with namespace isolation.

---

## Context

When Nix builds a derivation on Darwin, it does roughly this:

```
posix_spawn(NULL, "/usr/bin/sandbox-exec",
    { attributes = POSIX_SPAWN_SETEXEC, file_actions = {} },
    {"sandbox-exec", "-f", "/tmp/nix-build-foo.drv-0/.sandbox.sb",
     "-D", "_GLOBAL_TMP_DIR=/tmp",
     "/bin/bash", "-e", "/nix/store/...-builder.sh"},
    {env...})
```

The sandbox profile (`.sb` file) is a Scheme-based DSL that restricts file
access, network access, and process operations. Example from Nix:

```scheme
(version 1)
(allow default)
; Disallow creating setuid/setgid binaries
(deny file-write-setugid)
```

Nix generates these profiles dynamically per-build. The `sandbox-exec` binary
reads the profile, applies the restrictions via macOS's `sandbox_init` API, then
`exec`s the builder.

Internally, Nix checks for `_NIX_TEST_NO_SANDBOX` to bypass this entirely, but
that's a testing escape hatch, not a supported configuration.

---

## Tasks

### 2.1 — Create `/usr/bin/sandbox-exec` Stub

Create a stub `sandbox-exec` binary that lives in the Darling prefix at
`/usr/bin/sandbox-exec`. This is the **MVP approach**.

**Behavior**:

1. Parse command-line arguments matching the real `sandbox-exec` interface:
   - `-f <profile-path>` — path to a `.sb` sandbox profile (ignored)
   - `-p <profile-string>` — inline sandbox profile (ignored)
   - `-D <key>=<value>` — parameter definitions for the profile (ignored)
   - `-n <name>` — predefined profile name (ignored)
   - Everything after the flags is the command to execute.
2. Ignore all sandbox-related arguments.
3. `exec` the remaining arguments (the builder command).

**Implementation options** (pick one):

#### Option A — Shell script (simplest)

```sh
#!/bin/sh
# sandbox-exec stub for Darling
# Ignores sandbox profiles and exec's the builder directly.
# Darling provides Linux-level isolation via namespaces/darlingserver.

while [ $# -gt 0 ]; do
  case "$1" in
    -f) shift 2 ;;   # skip -f <profile>
    -p) shift 2 ;;   # skip -p <profile-string>
    -n) shift 2 ;;   # skip -n <name>
    -D) shift 2 ;;   # skip -D key=value
    -D*) shift ;;    # skip -Dkey=value (no space)
    *) break ;;
  esac
done

exec "$@"
```

Pros: trivial, no compilation needed.
Cons: requires `/bin/sh` to be working (it is in Darling); slight overhead from
shell parse.

#### Option B — Small C program (more robust)

```c
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int i = 1;
    while (i < argc) {
        if ((strcmp(argv[i], "-f") == 0 ||
             strcmp(argv[i], "-p") == 0 ||
             strcmp(argv[i], "-n") == 0 ||
             strcmp(argv[i], "-D") == 0) && i + 1 < argc) {
            i += 2; /* skip flag + argument */
        } else if (strncmp(argv[i], "-D", 2) == 0) {
            i += 1; /* skip -Dkey=value */
        } else {
            break;
        }
    }

    if (i >= argc) {
        fprintf(stderr, "sandbox-exec: no command specified\n");
        return 1;
    }

    execvp(argv[i], &argv[i]);
    perror("sandbox-exec: exec");
    return 127;
}
```

Pros: no shell dependency, handles edge cases better, tiny binary.
Cons: needs to be compiled as a Mach-O binary and installed into the prefix.

**Recommendation**: Start with Option A (shell script) for speed. Replace with
Option B later if any issues arise.

**Installation**: The stub must be installed during Darling's build/prefix setup.
Add it to the CMake install step or to the prefix initialization script.

**Location in build system**: Create `src/sandbox-exec/` with the stub and a
`CMakeLists.txt` that installs it to `libexec/darling/usr/bin/sandbox-exec`.

---

### 2.2 — Fix Sandbox API Stubs

**Current state** (`src/sandbox/sandbox.c`):

```c
int sandbox_init(const char *profile, uint64_t flags, char **errorbuf)
{
    *errorbuf = strdup("Not implemented");
    return 0;
}
```

This is subtly wrong: it returns 0 (success) but also sets `*errorbuf` to an
error string. Callers that check `errorbuf != NULL` after a "successful" call
may be confused, or may leak memory expecting `errorbuf` to be NULL on success.

**Fix**: Set `*errorbuf = NULL` on success:

```c
int sandbox_init(const char *profile, uint64_t flags, char **errorbuf)
{
    if (errorbuf)
        *errorbuf = NULL;
    return 0;
}
```

Apply the same fix to:

- `sandbox_init_with_parameters`
- `sandbox_init_with_extensions`
- `sandbox_wakeup_daemon` (currently returns -1; change to return 0 if callers
  expect success, or leave as-is if it's genuinely optional)

**Files to modify**: `src/sandbox/sandbox.c`

**Also verify**: `src/libsandbox/src/sandbox.c` (the `libsandbox.1.dylib` shim)
doesn't have the same issue.

---

### 2.3 — Ensure `sandbox_check` Always Permits

**Current state** (`src/sandbox/sandbox.c`):

```c
int sandbox_check(pid_t pid, const char *operation,
                  enum sandbox_filter_type type, ...)
{
    return 0;
}
```

This is correct — returning 0 means "allowed". Verify that the `_by_audit_token`
variant behaves the same (it does, based on code analysis). No changes needed
unless testing reveals issues.

---

### 2.4 — (Stretch) Basic Sandbox Profile Language Parsing

> **This is NOT required for Nix support.** It's documented here for
> completeness and for future contributors who want proper sandbox parity.

Implement basic parsing of Apple's Sandbox Profile Language (`.sb` files) and
translate deny rules to Linux isolation mechanisms:

| macOS Sandbox Rule | Linux Equivalent |
|---|---|
| `(deny file-write*)` | Read-only bind mounts or Landlock `LANDLOCK_ACCESS_FS_WRITE_FILE` deny |
| `(deny file-read* (subpath "/private"))` | Landlock path-beneath rule |
| `(deny network*)` | Unshare network namespace (`CLONE_NEWNET`) |
| `(deny network-outbound)` | `iptables` / `nftables` OUTPUT DROP, or network namespace |
| `(deny process-exec)` | `seccomp-bpf` filter on `execve` |
| `(deny process-fork)` | `seccomp-bpf` filter on `clone` / `fork` |
| `(deny file-write-setugid)` | `seccomp-bpf` filter on `fchmod` with setuid/setgid bits |
| `(allow default)` | Baseline: allow everything, then layer on denies |

This would require:

1. A Scheme parser (or at minimum a purpose-built `.sb` parser — the language is
   a small subset of Scheme).
2. Translation logic mapping macOS sandbox operations to Linux syscall filters.
3. Integration with `sandbox-exec` to apply the translated policy before
   `exec`-ing the builder.

**Effort**: Weeks to months. Not recommended until after Phase 4 is working.

---

## Security Considerations

**Q: Is it safe to skip the macOS sandbox?**

Yes, for the Darling use case:

1. **Darling already provides isolation.** The `darlingserver` runs Darling
   processes inside a Linux container with namespace isolation (mount, PID, user
   namespaces via `overlayfs`). This is comparable to — and arguably stronger
   than — macOS's `sandbox-exec` for build isolation purposes.

2. **Nix's sandbox is defense-in-depth.** Nix's primary isolation comes from the
   build environment setup (clean `$PATH`, empty `$HOME`, controlled `$TMPDIR`).
   The macOS sandbox adds an extra layer but isn't the only protection.

3. **The Linux host can add its own sandboxing.** If stronger isolation is
   needed, the host can run Darling inside a systemd-nspawn container, a VM, or
   with additional seccomp profiles. This provides equivalent-or-better security
   to macOS's sandbox.

4. **No untrusted code.** In the Nix builder context, the code being executed is
   from derivations that the user has chosen to build. The sandbox prevents
   accidental side effects, not malicious code execution.

---

## Verification Checklist

After completing Phase 2, the following should all work inside `darling shell`:

- [ ] `/usr/bin/sandbox-exec` exists and is executable
- [ ] `sandbox-exec -f /dev/null -D _GLOBAL_TMP_DIR=/tmp /bin/echo hello` prints "hello"
- [ ] `sandbox-exec -p '(version 1)(allow default)' /bin/echo hello` prints "hello"
- [ ] `sandbox-exec` with no command argument prints an error and exits non-zero
- [ ] Calling `sandbox_init("no_network", 0, &err)` from C returns 0 with `err == NULL`
- [ ] Nix's builder invocation (`posix_spawn` → `sandbox-exec` → `/bin/bash`)
      no longer returns `ENOEXEC` / "Bad file descriptor"
- [ ] A trivial `nix-build` with `_NIX_TEST_NO_SANDBOX` **unset** proceeds past
      the sandbox-exec step (it may still fail later due to Phase 1 issues, but it
      must not fail at the sandbox stage)

---

## Implementation Order

```
2.1 (sandbox-exec stub)     — do first, biggest impact

2.2 (fix sandbox_init)      — quick follow-up, same files

2.3 (verify sandbox_check)  — no changes expected, just verify

2.4 (stretch: SBPL parse)   — defer until after Phase 4
```

---

*[← Phase 1 — Syscall Fixes](./03-phase1-syscalls.md) | [Phase 3 — Nix Installation →](./05-phase3-nix-install.md)*
+329
plan/05-phase3-nix-install.md
# Phase 3 — Nix Installation Inside Darling

**Priority**: P0 · **Effort**: M (2–3 weeks) · **Depends on**: Phase 1 (syscall fixes), Phase 2 (sandbox stub)

With the syscall fixes from Phase 1 and the `sandbox-exec` stub from Phase 2,
the Nix package manager should be installable inside a Darling prefix. This
phase covers automating that installation, verifying core Nix commands, and
providing convenient wrappers for host-side usage.

---

## Context

The official Nix installer for macOS (`nix-*-x86_64-darwin`) has several
assumptions that conflict with Darling's environment:

1. **Forces multi-user mode on Darwin.** The installer detects `uname -s` =
   `Darwin` and refuses single-user installation. Multi-user mode requires
   `dseditgroup`, `sysadminctl`, and a working `launchd` — none of which are
   fully functional in Darling yet.

2. **Requires `diskutil info /`** to check the root filesystem type (APFS vs
   HFS+). Darling's `diskutil` is a shell script that only supports `eject`.

3. **Requires `xmllint`** for parsing plists. Not shipped in Darling.

4. **Requires Directory Services** (`dseditgroup`, `dscl`) for creating the
   `nixbld` group and build users.

5. **Calls `lchflags`** during profile installation (fixed in Phase 1).

6. **Root user quirks.** Nix defaults `build-users-group = nixbld` when running
   as root. In single-user mode as root, this must be overridden to empty.

All of these are solvable with a patched installer script and pre-configured
`nix.conf`.

---

## Tasks

### 3.1 — Create Automated Nix-in-Darling Installer

Create a script at `scripts/install-nix-in-darling.sh` (run from the Linux
host) that automates the entire Nix installation inside a Darling prefix.

**Steps the script should perform**:

1. **Verify prerequisites**:
   - Darling is installed and `darling shell echo ok` works.
   - The prefix is initialized (`~/.darling` or `$DPREFIX` exists).
   - Phase 1 and Phase 2 fixes are in place (check for `/usr/bin/sandbox-exec`
     inside the prefix).

2. **Pre-configure Nix**:
   ```bash
   darling shell mkdir -p /etc/nix
   darling shell tee /etc/nix/nix.conf <<'EOF'
   # Single-user mode: no build users group
   build-users-group =
   # Disable macOS sandbox (we use the sandbox-exec stub)
   sandbox = false
   # Use the Nix binary cache
   substituters = https://cache.nixos.org
   trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY=
   EOF
   ```

3. **Download the Nix installer**:
   - Fetch the latest `nix-*-x86_64-darwin` installer tarball from
     `https://releases.nixos.org/nix/`.
   - Verify its signature / hash.
   - Extract it into a temporary directory inside the prefix.

4. **Patch the installer**:
   - Remove or bypass the `uname`-based multi-user enforcement.
   - Remove the `diskutil info` check.
   - Remove the `xmllint` dependency (or provide a stub).
   - Suppress the "installing as root is not supported" warning.
   - Force `--no-daemon` mode.

   The patching should be done with `sed` or a patch file applied to the
   extracted `install` script. Keep the patch minimal and well-documented so it
   can be updated when Nix releases new installer versions.

5. **Run the patched installer**:
   ```bash
   darling shell /tmp/nix-installer/install --no-daemon
   ```

6. **Post-install verification**:
   - Run each command from the verification checklist (see below).
   - Source the Nix profile: `. /Users/root/.nix-profile/etc/profile.d/nix.sh`
   - Print the installed Nix version.

7. **Clean up**:
   - Remove the temporary installer files.
   - Optionally run `nix-collect-garbage` to free space.

**Error handling**: The script should `set -euo pipefail` and provide clear
error messages at each step, indicating which phase/blocker is likely the cause
if something fails.

---

### 3.2 — Pre-Built Darling Prefix with Nix

Create a Nix derivation (`packages.x86_64-linux.darling-nix-prefix`) that
produces a Darling prefix tarball with Nix pre-installed. This lets users skip
the installation process entirely.

**Approach**:

1. Build Darling in a Nix sandbox.
2. Initialize a fresh prefix.
3. Run the installer script from 3.1 inside the prefix (this requires a
   working Darling at build time — may need to be done in a NixOS VM test
   context rather than a pure derivation, since Darling needs namespace
   capabilities).
4. Snapshot the prefix as a tarball.
5. Users restore with:
   ```bash
   mkdir -p ~/.darling
   tar xf /nix/store/...-darling-nix-prefix.tar -C ~/.darling
   ```

**Alternative**: If building inside a Nix sandbox is too complex (due to
namespace requirements), provide a script that generates the prefix on the
user's machine and document it as a one-time setup step.

---

### 3.3 — Verify Core Nix Commands

After installation, the following commands must work without errors inside
`darling shell`. Each one exercises a different subsystem:

| Command | What It Tests |
|---|---|
| `nix --version` | Binary loads, `dyld` resolves all libraries |
| `nix-env --version` | Same, plus `libnixstore` loads correctly |
| `nix-store --verify` | Store database access, file system operations |
| `nix-instantiate --eval -E '1 + 1'` | Nix evaluator, no build needed |
| `nix eval --expr '1 + 1'` | Flake-enabled CLI, evaluator |
| `nix-store --dump-db` | SQLite database access in `/nix/var/nix/db/` |
| `nix-env -qa hello` | Channel/registry querying, HTTP fetching |

**Known potential issues at this stage**:

- **SQLite**: Nix's store database uses SQLite. If Darling's `fcntl` locking
  (via `F_SETLK` / `F_GETLK`) is buggy, database operations will fail or hang.
  Add SQLite lock testing to the verification.

- **curl / TLS**: `nix-env -qa` and binary substitution need working HTTPS.
  Darling ships its own curl and SSL certificates. If they're outdated or the
  TLS handshake uses unimplemented syscalls, fetching will fail. Verify with:
  ```bash
  darling shell curl -sI https://cache.nixos.org/nix-cache-info
  ```

- **`/nix` path**: By default, Nix installs to `/nix`. Inside Darling, this is
  within the prefix overlay at `~/.darling/nix`. This is fine for isolated
  usage. For shared-store mode (Phase 7), we'll need to symlink this to the
  host's `/nix` via `/Volumes/SystemRoot/nix`.
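
The `fcntl` locking concern above can be probed without SQLite. The following is a minimal sketch using only POSIX calls, so the same source should build on the Linux host and inside `darling shell`. The function name `check_fcntl_locking` and the scratch path in the usage note are hypothetical. It checks one property only: a second process sees the first process's write lock through `F_GETLK`.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Take a whole-file write lock in this process, then fork a child that
 * probes the same file with F_GETLK. Returns 0 if the child sees the lock
 * as held by the parent, and a non-zero step code otherwise. */
int check_fcntl_locking(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return 1;

    struct flock lk = {0};
    lk.l_type = F_WRLCK;
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;                       /* 0 means "to end of file" */
    if (fcntl(fd, F_SETLK, &lk) != 0) {
        close(fd);
        return 2;
    }

    pid_t child = fork();
    if (child < 0) {
        close(fd);
        return 3;
    }
    if (child == 0) {
        /* Record locks are not inherited across fork, so the child owns
         * no lock and its probe should report the parent's. */
        int cfd = open(path, O_RDWR);
        if (cfd < 0)
            _exit(10);
        struct flock probe = {0};
        probe.l_type = F_WRLCK;
        probe.l_whence = SEEK_SET;
        if (fcntl(cfd, F_GETLK, &probe) != 0)
            _exit(11);
        _exit(probe.l_type == F_WRLCK && probe.l_pid == getppid() ? 0 : 12);
    }

    int status = 0;
    pid_t waited = waitpid(child, &status, 0);

    lk.l_type = F_UNLCK;                /* release the lock and clean up */
    fcntl(fd, F_SETLK, &lk);
    close(fd);
    unlink(path);

    if (waited < 0)
        return 4;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 5;
}
```

Calling `check_fcntl_locking("/tmp/darling-lock-test")` from a small `main` and returning its result gives a pass/fail exit code for the verification step. A non-zero result inside Darling that passes on the host points at the lock translation rather than at SQLite.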

---

### 3.4 — Host-Side Wrapper: `darling-nix`

Create a convenience wrapper script (installed as part of the Darling Nix
package) that runs Nix commands inside Darling from the Linux host:

```bash
#!/usr/bin/env bash
# darling-nix — run Nix commands inside a Darling prefix
set -euo pipefail

# Source Nix profile and run the command
exec darling shell bash -lc '
  . /Users/root/.nix-profile/etc/profile.d/nix.sh
  exec "$@"
' -- "$@"
```

**Usage examples**:

```bash
# Evaluate an expression
darling-nix nix-instantiate --eval -E '1 + 1'

# Build a trivial derivation
darling-nix nix-build --expr 'derivation { name = "test"; builder = "/bin/bash"; args = ["-c" "echo ok > $out"]; system = "x86_64-darwin"; }'

# Install a package
darling-nix nix-env -iA nixpkgs.hello

# Interactive Nix repl
darling-nix nix repl
```

**Install location**: `$out/bin/darling-nix` in the Darling Nix package.

**Enhancements for later**:

- Support `--prefix <path>` to use a non-default Darling prefix.
- Support `--store <path>` to configure the Nix store location.
- Capture and forward exit codes correctly.
- Handle signals (SIGINT, SIGTERM) and propagate them to the Darling process.

---

### 3.5 — Nix Channel / Registry Setup

After Nix is installed, set up a usable channel or flake registry so users can
immediately start building packages:

```bash
# Add the nixpkgs channel (for nix-env / nix-shell)
darling shell nix-channel --add https://nixos.org/channels/nixpkgs-unstable nixpkgs
darling shell nix-channel --update

# Or, for flakes:
darling shell nix registry add nixpkgs github:NixOS/nixpkgs/nixpkgs-unstable
```

This should be part of the installer script (3.1) as an optional post-install
step.

**Potential issue**: `nix-channel --update` downloads and unpacks a tarball,
which exercises `curl`, `xz`, `tar`, and filesystem operations. Any crash here
points to remaining syscall gaps from Phase 1.

---

## Shared Store Considerations

For Phase 3, Nix runs with its own store inside the Darling prefix
(`~/.darling/nix/store`). This is the simplest setup and avoids any
interaction with the host's Nix store.

For later phases (especially Phase 7 — Remote Builder), we'll want to share the
host's `/nix/store` with the Darling prefix. The mechanism:

```bash
# Inside the Darling prefix, /Volumes/SystemRoot is the host's /
# So /Volumes/SystemRoot/nix/store is the host's /nix/store

# Option A: Symlink
darling shell ln -sf /Volumes/SystemRoot/nix /nix

# Option B: Bind mount (if overlayfs allows it)
# Configured in darlingserver / prefix init
```

This is NOT part of Phase 3 — just documented here so the installation script
doesn't make assumptions that would conflict with shared-store mode later. In
particular:

- Don't hardcode paths that assume `/nix` is local to the prefix.
- Make the store location configurable in `nix.conf`.
- Ensure the installer doesn't fail if `/nix` is a symlink.

---

## Debugging Tips

If installation fails, here are the most useful debugging techniques:

**Trace the installer script**:
```bash
darling shell bash -x /tmp/nix-installer/install --no-daemon 2>&1 | tee install.log
```

**Trace Nix binary startup**:
```bash
# On the host, trace darlingserver while running a Nix command:
strace -f -p $(pidof darlingserver) -e trace=openat,stat,fstat,lstat,readlink 2>&1 | head -200 &
darling shell /nix/store/.../bin/nix --version
```

**Trace inside Darling with xtrace**:
```bash
DARLING_XTRACE=1 darling shell /nix/store/.../bin/nix-env --version 2>&1 | head -500
```

**Check for unimplemented syscalls**:
```bash
darling shell /nix/store/.../bin/nix --version 2>&1 | grep -i "unimplemented\|STUB\|not.implemented"
```

**Inspect the store database**:
```bash
darling shell sqlite3 /nix/var/nix/db/db.sqlite ".tables"
darling shell sqlite3 /nix/var/nix/db/db.sqlite "SELECT count(*) FROM ValidPaths;"
```

---

## Verification Checklist

After completing Phase 3, ALL of the following must pass:

- [ ] `scripts/install-nix-in-darling.sh` completes without errors
- [ ] `darling shell nix --version` prints the Nix version
- [ ] `darling shell nix-env --version` prints the Nix version
- [ ] `darling shell nix-store --verify` reports no errors
- [ ] `darling shell nix-instantiate --eval -E '1 + 1'` prints `2`
- [ ] `darling shell nix eval --expr '1 + 1'` prints `2`
- [ ] `darling shell curl -sI https://cache.nixos.org/nix-cache-info` returns HTTP 200
- [ ] `darling-nix nix --version` works from the Linux host
- [ ] `darling-nix nix-instantiate --eval -E 'builtins.currentSystem'` prints `"x86_64-darwin"`
- [ ] No "Unimplemented syscall" messages during any of the above
- [ ] No segfaults during any of the above

---

## Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| SQLite locking doesn't work | Medium | High — store operations fail | Test `fcntl` locking early; if broken, use `PRAGMA locking_mode=EXCLUSIVE` |
| curl/TLS fails | Medium | High — no binary substitution | Test HTTPS early; fall back to `--option substitute false` for offline mode |
| Nix installer changes break our patches | Medium | Medium — need to update patches | Pin a specific Nix version; provide a patch file rather than inline sed |
| `/nix` path conflicts with shared store | Low | Medium — need reconfiguration | Keep store location configurable from the start |
| Nix evaluator hits unimplemented syscalls | Low | Medium — eval works but slowly | Phase 1 triage (1.7) should catch these |

---

*[← Phase 2 — Sandbox](./04-phase2-sandbox.md) | [Phase 4 — Derivation Building →](./06-phase4-building.md)*
+452
plan/06-phase4-building.md
··· 1 + # Phase 4 — Derivation Building 2 + 3 + **Priority**: P1 · **Effort**: L (4–8 weeks) · **Depends on**: Phase 3 (Nix installation) 4 + 5 + This is the acid test: can Nix actually *build* derivations inside Darling? Phase 6 + 3 got Nix installed and evaluating; this phase gets it building real software. 7 + 8 + We progress from trivial derivations through to full stdenv builds and binary 9 + substitution from the official cache. 10 + 11 + --- 12 + 13 + ## Context 14 + 15 + A Nix derivation build on Darwin involves: 16 + 17 + 1. Nix creates a temporary build directory (`/tmp/nix-build-<name>.drv-N/`). 18 + 2. Nix generates a sandbox profile (`.sb` file) in that directory. 19 + 3. Nix calls `posix_spawn` to execute `/usr/bin/sandbox-exec -f <profile> <builder>`. 20 + 4. The builder (usually `/bin/bash`) runs inside the sandbox with a clean 21 + environment (`$PATH`, `$HOME`, `$TMPDIR` all controlled by Nix). 22 + 5. The builder script sources `$stdenv/setup` and runs the build phases 23 + (unpack, configure, build, install, fixup, etc.). 24 + 6. Build output is written to `$out` (a path in `/nix/store`). 25 + 7. Nix registers the output in the store database and makes it read-only. 26 + 27 + Each step exercises different parts of the Darling compatibility layer. This 28 + phase works through them incrementally. 29 + 30 + --- 31 + 32 + ## Tasks 33 + 34 + ### 4.1 — Build a Trivial Derivation 35 + 36 + The simplest possible derivation — no dependencies, no stdenv, just `/bin/bash` 37 + writing a file: 38 + 39 + ```nix 40 + derivation { 41 + name = "hello-darling"; 42 + builder = "/bin/bash"; 43 + args = [ "-c" "echo 'Hello from Darling!' 
> $out" ]; 44 + system = "x86_64-darwin"; 45 + } 46 + ``` 47 + 48 + **Build command**: 49 + 50 + ```bash 51 + darling-nix nix-build --expr 'derivation { name = "hello-darling"; builder = "/bin/bash"; args = [ "-c" "echo hello > $out" ]; system = "x86_64-darwin"; }' 52 + ``` 53 + 54 + **What this exercises**: 55 + 56 + - `posix_spawn` → `sandbox-exec` stub → `/bin/bash` (Phase 2 must be working) 57 + - File creation in `/nix/store` 58 + - `$out` environment variable propagation 59 + - Basic file I/O (`echo`, redirect) 60 + - Store path registration (SQLite write) 61 + - Setting store path read-only (`chmod`, possibly `lchflags`) 62 + 63 + **Expected failure modes**: 64 + 65 + | Failure | Likely Cause | Fix | 66 + |---|---|---| 67 + | "Bad file descriptor" / ENOEXEC | `sandbox-exec` stub not installed or not executable | Phase 2 — verify installation | 68 + | "clearing flags of path" | `lchflags` still failing | Phase 1.1 / 1.2 | 69 + | Sandbox profile write fails | `/tmp` not writable or path issue | Check prefix `/private/tmp` setup | 70 + | Builder hangs | `posix_spawn` with `POSIX_SPAWN_SETEXEC` broken | Phase 1 — B5 | 71 + | "build failure may have been caused by lack of free disk space" | Generic Nix error wrapping the real issue | Check build log in `/nix/var/log/nix/` | 72 + 73 + **Debugging**: 74 + 75 + ```bash 76 + # Verbose build with debug output 77 + darling-nix nix-build --expr '...' -vvvv --debug 2>&1 | tee build.log 78 + 79 + # Check if the builder can be invoked manually 80 + darling shell /usr/bin/sandbox-exec -f /dev/null -D _GLOBAL_TMP_DIR=/tmp /bin/bash -c 'echo ok' 81 + 82 + # Manually run the derivation's builder to isolate the failure 83 + darling shell nix-shell --pure --run 'echo $out' /nix/store/...-hello-darling.drv 84 + ``` 85 + 86 + --- 87 + 88 + ### 4.2 — Get `bash` Executing Reliably in Build Sandboxes 89 + 90 + Even after 4.1 works, there may be subtle issues with bash inside Nix's build 91 + environment. 
The build environment is intentionally spartan: 92 + 93 + - `$HOME=/homeless-shelter` (doesn't exist) 94 + - `$PATH=/path-not-set` (intentionally broken) 95 + - `$TMPDIR=/tmp/nix-build-<name>.drv-N/` 96 + - `$NIX_STORE=/nix/store` 97 + 98 + **Requirements for bash to function**: 99 + 100 + - `/dev/null` must exist and be readable/writable 101 + - `/dev/urandom` must exist (some builds need random data) 102 + - `/dev/zero` must exist 103 + - `$TMPDIR` must be writable 104 + - `posix_spawn` with `POSIX_SPAWN_SETEXEC` must work (acts like `exec`) 105 + - Signal handling must work (Nix sends `SIGTERM` to cancel builds) 106 + - `pipe` / `dup2` must work (for pipelines and shell redirections; Darwin has no `pipe2`) 107 + - `fcntl` with `F_GETFD` / `F_SETFD` must work (for the `FD_CLOEXEC` flag) 108 + 109 + **Verification**: 110 + 111 + ```bash 112 + # Inside darling shell, simulate a Nix build environment: 113 + env -i HOME=/homeless-shelter PATH=/path-not-set \ 114 + TMPDIR=/tmp/test-build NIX_STORE=/nix/store \ 115 + /bin/bash -c 'echo "PATH=$PATH"; echo "HOME=$HOME"; echo ok > /tmp/test-build/out' 116 + ``` 117 + 118 + **Check device nodes**: 119 + 120 + ```bash 121 + darling shell ls -la /dev/null /dev/urandom /dev/zero 122 + # These should exist. If not, they need to be created during prefix init 123 + # or symlinked from /Volumes/SystemRoot/dev/ 124 + ``` 125 + 126 + --- 127 + 128 + ### 4.3 — Build with Nix's `bash` (from the Binary Cache) 129 + 130 + The trivial derivation in 4.1 uses Darling's built-in `/bin/bash`. Real 131 + derivations use Nix's own bash from the store (e.g., 132 + `/nix/store/...-bash-5.2-p26/bin/bash`). This is a pre-built `x86_64-darwin` 133 + Mach-O binary fetched from `cache.nixos.org`. 134 + 135 + **Test**: 136 + 137 + ```nix 138 + let 139 + bash = builtins.fetchurl { 140 + url = "https://cache.nixos.org/nar/..."; # or use a pinned store path 141 + }; 142 + in derivation { 143 + name = "test-nix-bash"; 144 + builder = "${bash}/bin/bash"; 145 + args = [ "-c" "echo 'Using Nix bash!' 
> $out" ]; 146 + system = "x86_64-darwin"; 147 + } 148 + ``` 149 + 150 + Or more practically: 151 + 152 + ```bash 153 + # Force-fetch bash from the binary cache 154 + darling-nix nix-store -r /nix/store/...-bash-5.2-p26 155 + 156 + # Then build a derivation that uses it 157 + darling-nix nix-build --expr ' 158 + let pkgs = import <nixpkgs> { system = "x86_64-darwin"; }; 159 + in derivation { 160 + name = "test-nix-bash"; 161 + builder = "${pkgs.bash}/bin/bash"; 162 + args = [ "-c" "echo ok > \$out" ]; 163 + system = "x86_64-darwin"; 164 + } 165 + ' 166 + ``` 167 + 168 + **What this additionally exercises**: 169 + 170 + - `dyld` loading the Nix-built bash and all its dependencies (`libSystem`, 171 + `libc++`, etc.) — these are Mach-O binaries that must be translated by Darling 172 + - NAR unpacking (when fetching from the cache) 173 + - Symlink handling in `/nix/store` 174 + - `LC_RPATH` / `@rpath` resolution in Mach-O binaries 175 + 176 + **Expected failure modes**: 177 + 178 + | Failure | Likely Cause | Fix | 179 + |---|---|---| 180 + | `dyld: Symbol not found` | Nix bash built for macOS 11+ but Darling reports 10.15 | Phase 1.8 (version bump) | 181 + | `dyld: Library not loaded` | Missing `libSystem` or `libc++` dylib in Darling prefix | Verify Darling's system libraries cover the needed symbols | 182 + | `Illegal instruction: 4` | Binary uses CPU instruction Darling doesn't translate | Check if SSE/AVX instructions are involved; may need darlingserver fix | 183 + | Segfault during load | `dyld` cache issue or broken mmap translation | See [Blocker B7](./01-blockers.md#b7-dyld-shared-cache) | 184 + 185 + --- 186 + 187 + ### 4.4 — Handle Binary Substitution from `cache.nixos.org` 188 + 189 + Binary substitution (downloading pre-built packages instead of building them) is 190 + critical for practical use — building everything from source inside Darling would 191 + be extremely slow. 
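As a rough illustration of how a substituter locates a package, a binary cache is queried by the hash part of the store path: metadata for `/nix/store/<hash>-<name>` is served at `<cache>/<hash>.narinfo`. The sketch below only builds that URL. The helper name `narinfo_url` is hypothetical, and it expects a top-level store path rather than a file inside one.

```bash
#!/bin/sh
# Sketch: derive the .narinfo URL a binary cache serves for a store path.
# `narinfo_url` is a hypothetical helper, not part of Nix or Darling.
narinfo_url() {
    store_path=$1
    cache=${2:-https://cache.nixos.org}
    base=${store_path##*/}   # drop the /nix/store/ prefix
    hash=${base%%-*}         # hash part: everything before the first '-'
    printf '%s/%s.narinfo\n' "$cache" "$hash"
}
```

Feeding the result to `darling shell curl -sI "$(narinfo_url <store-path>)"` is one way to tell a TLS or network problem apart from a path that is simply not in the cache.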
192 + 193 + **What to test**: 194 + 195 + ```bash 196 + # Fetch a simple package from the cache 197 + darling-nix nix-store -r /nix/store/...-hello-2.12.1 198 + 199 + # Or build with substitution: 200 + darling-nix nix-build '<nixpkgs>' -A hello --system x86_64-darwin 201 + ``` 202 + 203 + **Substitution pipeline**: 204 + 205 + ``` 206 + nix-store --realise 207 + → curl HTTPS request to cache.nixos.org 208 + → download .narinfo (package metadata) 209 + → download .nar.xz (compressed archive) 210 + → xz decompress 211 + → NAR unpack to /nix/store/... 212 + → set permissions (chmod, chown) 213 + → register in SQLite database 214 + → clear flags (lchflags — Phase 1) 215 + ``` 216 + 217 + **Requirements**: 218 + 219 + - **HTTPS/TLS**: `curl` must successfully connect to `cache.nixos.org`. Test: 220 + ```bash 221 + darling shell curl -sI https://cache.nixos.org/nix-cache-info 222 + ``` 223 + 224 + - **xz decompression**: The `xz` binary from the store must work. If it uses 225 + unimplemented syscalls, we need the host-side `xz` or a fallback. 226 + 227 + - **NAR unpacking**: A NAR contains only regular files, directories, and symlinks, so restoring one needs `open`/`write`, `mkdir`, `symlink`, and `chmod` (for the executable bit). 228 + Nix then canonicalises ownership and timestamps, which also needs `lchown` and `utimes`/`lutimes`. All must work. 229 + 230 + - **Large file support**: Some NARs are hundreds of MB. Ensure `mmap`, `ftruncate`, 231 + and large `read`/`write` calls work correctly. 232 + 233 + - **Certificate verification**: Nix trusts a substituted path because the signature in its `.narinfo` matches a configured public key, not 234 + because of the TLS certificate. The HTTPS transfer itself still needs a working TLS stack and CA bundle. Darling ships 235 + OpenSSL certificates via `src/external/openssl_certificates/`; verify they're 236 + up to date. 237 + 238 + --- 239 + 240 + ### 4.5 — Build a Simple C Program with Darwin stdenv 241 + 242 + This is the first "real" build — compiling C code using Nixpkgs' Darwin stdenv, 243 + which pulls in clang, ld64, Apple SDK headers, and the full build machinery. 
244 + 245 + ```bash 246 + darling-nix nix-build '<nixpkgs>' -A hello --system x86_64-darwin 247 + ``` 248 + 249 + **What the Darwin stdenv does**: 250 + 251 + 1. Sources `$stdenv/setup` (a large bash script). 252 + 2. Unpacks the source tarball. 253 + 3. Runs `./configure` (or cmake, meson, etc.). 254 + 4. Compiles with `clang` targeting `x86_64-apple-darwin`. 255 + 5. Links with `ld64` (Apple's linker, from cctools-port). 256 + 6. Runs fixup phase: `install_name_tool`, `codesign`, `strip`, etc. 257 + 7. Produces a Mach-O executable or library in `$out`. 258 + 259 + **Key binaries that must work** (all from the Nix store, built for Darwin): 260 + 261 + | Binary | Role | Concern | 262 + |---|---|---| 263 + | `bash` | Builder shell | Covered in 4.2/4.3 | 264 + | `coreutils` (`mv`, `cp`, `touch`, `install`, `mkdir`) | Basic file operations | `mv` needs `renameatx_np` (Phase 1.3), `touch` needs `utimensat` (Phase 1.4) | 265 + | `clang` | C/C++/ObjC compiler | May use `posix_spawn` internally; large binary with many dylib deps | 266 + | `ld64` | Apple linker | Writes Mach-O output; may use `fcntl` advisory locks | 267 + | `ar` / `ranlib` | Archive tools | From cctools, should be straightforward | 268 + | `install_name_tool` | Fix dylib paths | Modifies Mach-O headers; needs working `mmap` + `ftruncate` | 269 + | `codesign_allocate` | Code signature space | May fail (no codesign in Darling); needs graceful fallback | 270 + | `strip` | Strip symbols | Modifies Mach-O binaries | 271 + | `xattr` | Extended attributes | `xattr -cr` is run during fixup; needs `removexattr` / `listxattr` | 272 + | `sed`, `grep`, `awk` | Text processing | Usually fine, but check for syscall issues | 273 + | `tar`, `gzip`, `xz` | Archive handling | `tar` may use `fchflags`; `xz` may use newer syscalls | 274 + 275 + **Coreutils crash workaround strategy**: 276 + 277 + If specific coreutils binaries from the Nix store crash due to unimplemented 278 + syscalls, there are two approaches: 279 + 280 + 
1. **Preferred**: Fix the syscall in darlingserver (Phase 1). 281 + 2. **Temporary**: Create wrapper scripts in the prefix that intercept the 282 + crashing commands and redirect to Darling's built-in versions: 283 + ```bash 284 + # In the Darling prefix: 285 + mkdir -p /usr/local/nix-compat/bin 286 + cat > /usr/local/nix-compat/bin/mv << 'EOF' 287 + #!/bin/sh 288 + exec /bin/mv "$@" 289 + EOF 290 + chmod +x /usr/local/nix-compat/bin/mv 291 + # Add /usr/local/nix-compat/bin early in $PATH for builds 292 + ``` 293 + This is a "Nix crime" if done inside the store, but putting it in `$PATH` 294 + via `nix.conf` or a build hook is acceptable as a temporary measure. 295 + 296 + --- 297 + 298 + ### 4.6 — Fix Remaining Coreutils / Build-Tool Crashes 299 + 300 + Based on the blog post and analysis, the following specific binaries are known 301 + to crash inside Darling when fetched from the Nix binary cache. Each needs 302 + either a syscall fix or a workaround. 303 + 304 + | Binary | Crash Symptom | Root Cause | Fix | 305 + |---|---|---|---| 306 + | `mv` | `Unimplemented syscall (488)` | `renameatx_np` missing | Phase 1.3 | 307 + | `touch` | `Segmentation fault: 11` | `utimensat` / `setattrlist` | Phase 1.4 / 1.1 | 308 + | `install` | `clearing flags` or crash | `fchflags` / `chflags` | Phase 1.1 | 309 + | `cp` | Possible crash | `clonefile` / `fclonefileat` | Phase 1.5 | 310 + | `tar` | `fchflags` warning or crash | `fchflags` on extracted files | Phase 1.1 | 311 + | `xattr` | `removexattr` failure | xattr syscalls incomplete | New task — implement `removexattr`, `listxattr`, `getxattr` | 312 + | `codesign_allocate` | Likely failure | Code signing not supported | Stub or skip in stdenv fixup phase | 313 + | `fish` | `Illegal instruction: 4` | Uses newer CPU/syscall features | Lower priority — not in the critical build path | 314 + 315 + **Approach**: Work through these in order of build-pipeline criticality. 
A build 316 + can't succeed if `mv` crashes, so that's fixed first (Phase 1.3). The codesign 317 + tools are less critical — if they fail, we can patch the stdenv fixup phase to 318 + skip codesigning inside Darling. 319 + 320 + **Extended attribute (xattr) handling**: 321 + 322 + The Darwin stdenv fixup phase runs `xattr -cr $out` to clear quarantine 323 + attributes. This requires: 324 + 325 + - `listxattr` — list all xattrs on a file 326 + - `removexattr` — remove a specific xattr 327 + - `getxattr` / `setxattr` — read/write xattr values 328 + 329 + On Linux, these have direct equivalents (`listxattr(2)`, `removexattr(2)`, 330 + etc.). Darlingserver needs to translate the macOS xattr syscalls to the Linux 331 + ones, mapping the `com.apple.*` namespace appropriately. 332 + 333 + If full xattr support is too complex, a minimal approach: 334 + 335 + - `listxattr` → return empty list (no xattrs) 336 + - `removexattr` → return success (nothing to remove) 337 + - `getxattr` → return `ENODATA` (no such xattr) 338 + 339 + This is safe because Darling files won't have real Apple quarantine attributes. 340 + 341 + --- 342 + 343 + ### 4.7 — Verify Build Output Correctness 344 + 345 + After a successful `nix-build`, verify the output is correct: 346 + 347 + ```bash 348 + # Build hello 349 + darling-nix nix-build '<nixpkgs>' -A hello --system x86_64-darwin 350 + 351 + # Check the output exists and is a Mach-O binary 352 + darling shell file /nix/store/...-hello-2.12.1/bin/hello 353 + 354 + # Run it 355 + darling shell /nix/store/...-hello-2.12.1/bin/hello 356 + # Expected: "Hello, world!" 
357 + 358 + # Verify the store path is valid 359 + darling-nix nix-store --verify-path /nix/store/...-hello-2.12.1 360 + 361 + # Check closure (all dependencies resolved) 362 + darling-nix nix-store -qR /nix/store/...-hello-2.12.1 363 + ``` 364 + 365 + **Important**: For input-addressed derivations the store path is computed from the 366 + inputs, so it is the same inside Darling and on real macOS regardless of what the build produces. 367 + The content (NAR) hash can still differ from a real-macOS build if the build is not 368 + bit-for-bit reproducible; that alone does not indicate a Darling bug. A failed build, or an output that does not run, 369 + does. 370 + 371 + --- 372 + 373 + ### 4.8 — Handle `codesign` in the Fixup Phase 374 + 375 + The Darwin stdenv's fixup phase attempts to ad-hoc codesign all Mach-O binaries. 376 + This calls `codesign_allocate` and/or `codesign` (or `sigtool` in recent 377 + Nixpkgs). Darling is unlikely to support code signing. 378 + 379 + **Options**: 380 + 381 + 1. **Stub `codesign`**: Provide a `/usr/bin/codesign` that does nothing and 382 + returns 0. Mach-O binaries will work fine inside Darling without signatures. 383 + 384 + 2. **Patch stdenv**: Override the Darwin stdenv to skip the signing fixup phase 385 + when running inside Darling. Detect this via an environment variable 386 + (e.g., `NIX_DARLING=1`). 387 + 388 + 3. **Use `sigtool`**: Recent Nixpkgs uses the open-source `sigtool` for ad-hoc 389 + signing, which may actually work since it only rewrites Mach-O bytes. 390 + Test before assuming it fails. 391 + 392 + **Recommendation**: Test option 3 first. If it fails, use option 1 (quickest). 393 + Option 2 is cleanest but requires Nixpkgs patching. 
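Option 1 could be as small as the following sketch. It writes a no-op `codesign` into a directory passed as an argument. The helper name `install_codesign_stub` and the choice of target directory are assumptions, and a real stub may also need to cover `codesign_allocate`.

```bash
#!/bin/sh
# Sketch of option 1: a codesign replacement that accepts any arguments,
# does nothing, and reports success.
# `install_codesign_stub` is a hypothetical helper name.
install_codesign_stub() {
    dest_dir=$1
    mkdir -p "$dest_dir" || return 1
    cat > "$dest_dir/codesign" <<'EOF'
#!/bin/sh
# No-op stub: signing is skipped because signatures are not needed in Darling.
exit 0
EOF
    chmod +x "$dest_dir/codesign"
}
```

Because the stub ignores its arguments, every invocation from the fixup phase succeeds. The trade-off is that genuine signing failures are hidden too, so it is only suitable as a stopgap.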
394 + 395 + --- 396 + 397 + ## Verification Checklist 398 + 399 + After completing Phase 4, ALL of the following must pass: 400 + 401 + - [ ] Trivial derivation (4.1) builds and produces correct output 402 + - [ ] Derivation using Nix's bash from the store (4.3) builds successfully 403 + - [ ] `nix-store -r` fetches packages from `cache.nixos.org` without errors 404 + - [ ] NAR unpacking works for at least 10 different packages 405 + - [ ] `nix-build '<nixpkgs>' -A hello --system x86_64-darwin` succeeds 406 + - [ ] The built `hello` binary runs and prints "Hello, world!" 407 + - [ ] `nix-store --verify-path` confirms the output is valid 408 + - [ ] No "Unimplemented syscall" messages during the build 409 + - [ ] No segfaults during the build 410 + - [ ] `nix-collect-garbage` runs without errors (exercises store deletion + `lchflags`) 411 + 412 + --- 413 + 414 + ## Risk Assessment 415 + 416 + | Risk | Likelihood | Impact | Mitigation | 417 + |---|---|---|---| 418 + | clang crashes inside Darling | Medium | Critical — can't compile anything | Test clang standalone first; may need specific dyld/ABI fixes | 419 + | ld64 produces bad Mach-O output | Low | Critical — binaries won't run | Compare output with real macOS build; use `otool -L` to verify | 420 + | Stdenv setup script uses unsupported shell features | Low | High — all builds fail | Test bash compatibility thoroughly in 4.2 | 421 + | Binary cache signatures fail verification | Low | High — no substitution | Check Nix's ed25519 verification code path; may need `libsodium` to work | 422 + | Build takes hours due to Darling overhead | High | Medium — usable but slow | Focus on binary substitution; only build what can't be fetched | 423 + | Race conditions from incomplete `fcntl` locking | Medium | Medium — intermittent failures | Test concurrent builds only in Phase 5; keep Phase 4 single-threaded | 424 + 425 + --- 426 + 427 + ## Performance Expectations 428 + 429 + Darling adds overhead to every syscall (Darwin → 
Linux translation). Expect: 430 + 431 + - **Evaluation**: ~2–5× slower than native Linux Nix evaluation. The Nix 432 + evaluator is CPU-bound, so the overhead is mostly from dyld and library 433 + translation, not syscall volume. 434 + 435 + - **Binary substitution**: ~1.5–2× slower. Network I/O dominates; the overhead 436 + is in NAR unpacking (filesystem syscalls). 437 + 438 + - **Compilation**: ~3–10× slower. Compilation is both CPU-intensive and makes 439 + many syscalls (file reads, process spawning). The `clang` → `ld64` pipeline 440 + inside Darling will be noticeably slower than on native macOS. 441 + 442 + - **Disk usage**: Each Darling prefix uses overlayfs, so the base system files 443 + are shared. The Nix store will be the main disk consumer. Plan for ~10–20 GB 444 + for a basic set of packages. 445 + 446 + These are rough estimates. Actual performance will depend heavily on the host 447 + hardware and which syscalls are hot. Profiling after Phase 4 is complete will 448 + identify optimisation opportunities. 449 + 450 + --- 451 + 452 + *[← Phase 3 — Nix Installation](./05-phase3-nix-install.md) | [Phase 5 — Nix Daemon →](./07-phase5-daemon.md)*
+377
plan/07-phase5-daemon.md
··· 1 + # Phase 5 — Nix Daemon & Multi-User Mode 2 + 3 + **Priority**: P2 · **Effort**: M (2–4 weeks) · **Depends on**: Phase 4 (derivation building) 4 + 5 + Single-user mode (Phases 0–4) is sufficient for development and testing, but a 6 + production-grade setup benefits from the Nix daemon for concurrent builds, 7 + proper garbage collection, and user isolation. This phase adds multi-user Nix 8 + support inside Darling. 9 + 10 + --- 11 + 12 + ## Context 13 + 14 + On a real macOS system, the Nix daemon (`nix-daemon`) runs as a LaunchDaemon 15 + managed by `launchd`. It: 16 + 17 + 1. Listens on a Unix domain socket (`/nix/var/nix/daemon-socket/socket`). 18 + 2. Accepts build requests from unprivileged users. 19 + 3. Spawns builds as dedicated `_nixbldN` users (members of the `nixbld` group). 20 + 4. Manages the Nix store exclusively — only the daemon writes to `/nix/store`. 21 + 5. Handles garbage collection, signing, and binary cache downloads. 22 + 23 + The multi-user Nix installer on macOS creates: 24 + 25 + - A `nixbld` group (GID 30000 by convention). 26 + - 32 build users `_nixbld1` through `_nixbld32` (UIDs 300–331). 27 + - A LaunchDaemon plist at `/Library/LaunchDaemons/org.nixos.nix-daemon.plist`. 28 + - Nix profile scripts in `/etc/profile.d/` and `/etc/bashrc.d/`. 29 + 30 + All of this relies on Directory Services (`dscl`, `dseditgroup`, `sysadminctl`) 31 + for user/group management and `launchd`/`launchctl` for service management. 32 + Darling has partial `launchd` support but no Directory Services implementation. 33 + 34 + --- 35 + 36 + ## Tasks 37 + 38 + ### 5.1 — Implement Directory Services Stubs 39 + 40 + The Nix installer uses these commands to create build users and groups: 41 + 42 + ```bash 43 + # Create the nixbld group 44 + dseditgroup -o create -q -i 30000 nixbld 45 + 46 + # Create build users 47 + sysadminctl -addUser _nixbld1 -UID 300 -GID 30000 -home /var/empty -shell /usr/bin/false 48 + # ... 
repeated for _nixbld2 through _nixbld32 49 + 50 + # Add users to the group 51 + dseditgroup -o edit -a _nixbld1 -t user nixbld 52 + ``` 53 + 54 + Darling does not implement these commands. We need thin wrappers that translate 55 + to Linux user/group management operating on the prefix's `/etc/passwd` and 56 + `/etc/group` files. 57 + 58 + #### `dseditgroup` stub 59 + 60 + Create `src/tools/dseditgroup` (or a shell script installed to 61 + `libexec/darling/usr/sbin/dseditgroup`) that handles: 62 + 63 + | Invocation | Translation | 64 + |---|---| 65 + | `dseditgroup -o create -q -i <GID> <name>` | `echo "<name>:x:<GID>:" >> /etc/group` (if not exists) | 66 + | `dseditgroup -o edit -a <user> -t user <group>` | Append `<user>` to the group's member list in `/etc/group` | 67 + | `dseditgroup -o delete <name>` | Remove the group from `/etc/group` | 68 + | `dseditgroup -o checkmember -m <user> <group>` | Check if user is in the group; exit 0 if yes, non-zero if no | 69 + 70 + Does not need to support the full `dseditgroup` interface — only what the Nix 71 + installer uses. 72 + 73 + #### `sysadminctl` stub 74 + 75 + Create a stub that handles: 76 + 77 + | Invocation | Translation | 78 + |---|---| 79 + | `sysadminctl -addUser <name> -UID <uid> -GID <gid> -home <dir> -shell <shell>` | `echo "<name>:x:<uid>:<gid>::<dir>:<shell>" >> /etc/passwd` | 80 + | `sysadminctl -deleteUser <name>` | Remove the user from `/etc/passwd` | 81 + 82 + #### `dscl` stub 83 + 84 + The Nix installer may also use `dscl` in some code paths: 85 + 86 + | Invocation | Translation | 87 + |---|---| 88 + | `dscl . -read /Groups/<name> PrimaryGroupID` | Parse `/etc/group` and print the GID | 89 + | `dscl . -read /Users/<name> UniqueID` | Parse `/etc/passwd` and print the UID | 90 + | `dscl . -list /Users` | List all usernames from `/etc/passwd` | 91 + | `dscl . 
-create /Users/<name> ...` | Append to `/etc/passwd` | 92 + 93 + **Implementation notes**: 94 + 95 + - These stubs modify files within the Darling prefix (`~/.darling/etc/passwd`, 96 + `~/.darling/etc/group`), not the host's files. This is safe. 97 + - Do NOT use `useradd`/`groupadd` (those operate on the host). Directly 98 + manipulate the prefix's files. 99 + - Add basic input validation (duplicate detection, numeric ranges). 100 + - Make them idempotent — running the installer twice should not create duplicate 101 + entries. 102 + 103 + **Testing**: 104 + 105 + ```bash 106 + # Inside darling shell: 107 + dseditgroup -o create -q -i 30000 nixbld 108 + grep nixbld /etc/group 109 + # Expected: nixbld:x:30000: 110 + 111 + sysadminctl -addUser _nixbld1 -UID 300 -GID 30000 -home /var/empty -shell /usr/bin/false 112 + grep _nixbld1 /etc/passwd 113 + # Expected: _nixbld1:x:300:30000::/var/empty:/usr/bin/false 114 + ``` 115 + 116 + --- 117 + 118 + ### 5.2 — Get `nix-daemon` Running 119 + 120 + Once build users exist, launch the Nix daemon inside Darling. 121 + 122 + **Step 1 — Manual launch (for testing)**: 123 + 124 + ```bash 125 + darling shell nix-daemon & 126 + ``` 127 + 128 + The daemon should: 129 + 130 + - Create the socket at `/nix/var/nix/daemon-socket/socket`. 131 + - Listen for connections. 132 + - Fork build processes as `_nixbldN` users (requires working `setuid`/`setgid` 133 + inside the Darling prefix). 
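The first expectation in the list above (the socket appears) can be checked with a small polling loop, sketched here. `wait_for_socket` is a hypothetical helper name and the default timeout is arbitrary.

```bash
#!/bin/sh
# Sketch: poll until a Unix domain socket exists or a timeout (in seconds)
# expires. Returns 0 if the socket is present, non-zero otherwise.
wait_for_socket() {
    sock=$1
    timeout=${2:-10}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        [ -S "$sock" ] && return 0
        sleep 1
        elapsed=$((elapsed + 1))
    done
    [ -S "$sock" ]   # final check, also covers a timeout of 0
}
```

Run inside the prefix after starting the daemon, for example `wait_for_socket /nix/var/nix/daemon-socket/socket 30`. That gives a scriptable pass/fail signal, which is easier to automate than watching the daemon output.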
134 + 135 + **Step 2 — Verify client connectivity**: 136 + 137 + ```bash 138 + # In another darling shell, as a non-root user: 139 + darling shell nix-store --gc --print-roots 140 + # Any store operation from a non-root user should go through the daemon socket 141 + ``` 142 + 143 + **Requirements for the daemon to function**: 144 + 145 + | Requirement | Status in Darling | Notes | 146 + |---|---|---| 147 + | Unix domain sockets | Likely works | Darling maps to Linux AF_UNIX sockets | 148 + | `setuid` / `setgid` | Needs verification | Daemon drops privileges to build users; must work within the namespace | 149 + | `fork` / `posix_spawn` | Partially works | Phase 1/B5 fixes needed for reliability | 150 + | `fcntl` advisory locking | Needs verification | Store database locking; critical for concurrent access | 151 + | `kill` / signal delivery | Likely works | Daemon sends SIGTERM to cancel builds | 152 + | `/var/empty` exists | May need creation | Home directory for build users | 153 + 154 + **Potential issues**: 155 + 156 + - **`setuid` within namespaces**: Darling uses user namespaces. `setuid` inside a 157 + user namespace works differently — the process can only switch to UIDs mapped 158 + in the namespace. The Darling prefix must have the `_nixbldN` UIDs mapped. 159 + This may require changes to darlingserver's namespace setup. 160 + 161 + - **Socket permissions**: The daemon socket must be readable/writable by all 162 + users who should be able to trigger builds. Check that `chmod 0660` on the 163 + socket works and that group membership is respected. 164 + 165 + - **Process isolation**: The daemon expects to be able to create per-build 166 + temporary directories under `/tmp` or `$TMPDIR`, owned by the build user. 167 + Verify that `chown` works for changing file ownership to build users. 
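The UID-mapping concern above can be checked from the Linux host. The kernel exposes each process's mapping in `/proc/<pid>/uid_map`, one range per line as `<first-id-inside> <first-id-outside> <count>`. The sketch below tests whether a UID falls inside any range. `uid_is_mapped` is a hypothetical helper, and which PID to inspect depends on how Darling sets up its namespace.

```bash
#!/bin/sh
# Sketch: check whether a UID (as seen inside the namespace) is covered by a
# Linux uid_map file. Returns 0 if mapped, 1 otherwise.
uid_is_mapped() {
    uid=$1
    map_file=${2:-/proc/self/uid_map}
    while read -r inside outside count; do
        [ -n "$count" ] || continue   # skip blank or malformed lines
        if [ "$uid" -ge "$inside" ] && [ "$uid" -lt $((inside + count)) ]; then
            return 0
        fi
    done < "$map_file"
    return 1
}
```

If `uid_is_mapped 300 /proc/<pid>/uid_map` fails for a process inside the prefix, `setuid` to `_nixbld1` cannot succeed no matter what `/etc/passwd` says.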
168 + 169 + **Debugging**: 170 + 171 + ```bash 172 + # Watch daemon logs: 173 + darling shell nix-daemon --debug 2>&1 | tee daemon.log 174 + 175 + # Test socket connectivity: 176 + darling shell ls -la /nix/var/nix/daemon-socket/socket 177 + 178 + # Trace daemon syscalls from the host: 179 + strace -f -p $(pgrep -f nix-daemon) -e trace=socket,bind,listen,accept,clone,setuid,setgid 2>&1 | head -200 180 + ``` 181 + 182 + --- 183 + 184 + ### 5.3 — LaunchDaemon Integration 185 + 186 + Make the Nix daemon manageable via `launchctl`, as it would be on real macOS. 187 + 188 + **Step 1 — Install the plist**: 189 + 190 + The Nix installer creates `/Library/LaunchDaemons/org.nixos.nix-daemon.plist`: 191 + 192 + ```xml 193 + <?xml version="1.0" encoding="UTF-8"?> 194 + <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" 195 + "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> 196 + <plist version="1.0"> 197 + <dict> 198 + <key>Label</key> 199 + <string>org.nixos.nix-daemon</string> 200 + <key>ProgramArguments</key> 201 + <array> 202 + <string>/nix/var/nix/profiles/default/bin/nix-daemon</string> 203 + </array> 204 + <key>KeepAlive</key> 205 + <true/> 206 + <key>RunAtLoad</key> 207 + <true/> 208 + <key>StandardErrorPath</key> 209 + <string>/var/log/nix-daemon.log</string> 210 + </dict> 211 + </plist> 212 + ``` 213 + 214 + **Step 2 — Load with launchctl**: 215 + 216 + ```bash 217 + darling shell launchctl load /Library/LaunchDaemons/org.nixos.nix-daemon.plist 218 + ``` 219 + 220 + **Step 3 — Verify**: 221 + 222 + ```bash 223 + darling shell launchctl list | grep nix 224 + # Expected: org.nixos.nix-daemon with a PID 225 + 226 + darling shell launchctl print system/org.nixos.nix-daemon 227 + # Expected: status information 228 + ``` 229 + 230 + **Known risks**: Darling's `launchd` implementation (`src/launchd/`) is 231 + functional for basic service management but may not support all plist keys. 232 + `KeepAlive` (automatic restart) is the most likely to have issues. 
If launchd 233 + integration is unreliable, fall back to manual daemon startup or a simple 234 + wrapper script. 235 + 236 + **Fallback — systemd integration on the host**: 237 + 238 + If launchd proves too unreliable, an alternative is to manage the daemon from 239 + the Linux host using systemd: 240 + 241 + ```ini 242 + # /etc/systemd/system/darling-nix-daemon.service 243 + [Unit] 244 + Description=Nix Daemon inside Darling 245 + After=network.target 246 + 247 + [Service] 248 + ExecStart=/usr/bin/darling shell /nix/var/nix/profiles/default/bin/nix-daemon 249 + Restart=on-failure 250 + Type=simple 251 + 252 + [Install] 253 + WantedBy=multi-user.target 254 + ``` 255 + 256 + This bypasses launchd entirely while still providing reliable daemon management. 257 + 258 + --- 259 + 260 + ### 5.4 — Test Concurrent Builds 261 + 262 + Multi-user mode enables parallel builds. Test that multiple derivations can build 263 + simultaneously without interference. 264 + 265 + **Test procedure**: 266 + 267 + ```bash 268 + # Start the daemon 269 + darling shell nix-daemon & 270 + 271 + # In parallel, build several independent packages: 272 + darling-nix nix-build '<nixpkgs>' -A hello --system x86_64-darwin & 273 + darling-nix nix-build '<nixpkgs>' -A which --system x86_64-darwin & 274 + darling-nix nix-build '<nixpkgs>' -A yes --system x86_64-darwin & 275 + wait 276 + ``` 277 + 278 + **What to watch for**: 279 + 280 + - **Store database locking**: SQLite must handle concurrent reads/writes via 281 + `fcntl` locking. If locking is broken, you'll see `database is locked` errors 282 + or silent corruption. 283 + 284 + - **Build user contention**: Each concurrent build should use a different 285 + `_nixbldN` user. Verify with `ps aux | grep nix-build` inside darling shell. 286 + 287 + - **`/tmp` isolation**: Each build gets its own `$TMPDIR`. Verify no cross- 288 + contamination between concurrent builds. 
289 + 290 + - **File descriptor exhaustion**: Darling's fd table is backed by Linux fds. Many 291 + concurrent builds can exhaust the per-process limit. Check `ulimit -n` inside 292 + darling shell and increase if needed. 293 + 294 + - **Deadlocks**: If `posix_spawn` or `fork` has race conditions in Darling's 295 + implementation, concurrent builds may deadlock. Monitor with `strace -f` and 296 + look for stuck processes. 297 + 298 + **Expected outcome**: All three builds complete (possibly via binary 299 + substitution) without errors. If building from source, expect it to be slow but 300 + correct. 301 + 302 + --- 303 + 304 + ### 5.5 — Nix Profile Scripts 305 + 306 + The multi-user installer sets up profile scripts so Nix is available to all 307 + users. Verify these work: 308 + 309 + ```bash 310 + # /etc/profile.d/nix.sh should be sourced on login 311 + darling shell bash -l -c 'which nix' 312 + # Expected: /nix/var/nix/profiles/default/bin/nix 313 + 314 + # Verify $NIX_PATH is set 315 + darling shell bash -l -c 'echo $NIX_PATH' 316 + 317 + # Verify the daemon socket is used (not direct store access) 318 + darling shell bash -l -c 'nix-store --gc --print-roots' 319 + # Should connect via /nix/var/nix/daemon-socket/socket 320 + ``` 321 + 322 + --- 323 + 324 + ## Upgrade Path: Single-User → Multi-User 325 + 326 + Users who completed Phase 3 (single-user installation) should be able to 327 + upgrade to multi-user mode. Document a migration procedure: 328 + 329 + 1. Stop any running Nix processes. 330 + 2. Run the Directory Services stubs to create build users (5.1). 331 + 3. Update `/etc/nix/nix.conf`: 332 + ```diff 333 + - build-users-group = 334 + + build-users-group = nixbld 335 + - sandbox = false 336 + + sandbox = true 337 + ``` 338 + 4. Start the daemon (5.2 or 5.3). 339 + 5. Verify with a store query such as `nix-store --gc --print-roots` (should connect to daemon). 
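Step 2 of the migration can be sketched as an idempotent script that edits the prefix's `passwd` and `group` files directly, using the entry formats from 5.1. The function names are hypothetical. The sketch does not fill in the group's member list, which the `dseditgroup -o edit -a` path would still have to handle.

```bash
#!/bin/sh
# Sketch: idempotently add the nixbld group and the _nixbldN build users.
# Safe to run twice: existing entries are detected and left alone.
ensure_group() {   # ensure_group <group-file> <name> <gid>
    grep -q "^$2:" "$1" 2>/dev/null || printf '%s:x:%s:\n' "$2" "$3" >> "$1"
}

ensure_user() {    # ensure_user <passwd-file> <name> <uid> <gid>
    grep -q "^$2:" "$1" 2>/dev/null ||
        printf '%s:x:%s:%s::/var/empty:/usr/bin/false\n' "$2" "$3" "$4" >> "$1"
}

create_build_users() {   # create_build_users <passwd-file> <group-file> [count]
    passwd_file=$1
    group_file=$2
    count=${3:-32}
    ensure_group "$group_file" nixbld 30000
    i=1
    while [ "$i" -le "$count" ]; do
        # _nixbld1 gets UID 300, _nixbld32 gets UID 331
        ensure_user "$passwd_file" "_nixbld$i" $((299 + i)) 30000
        i=$((i + 1))
    done
}
```

Taking the file paths as arguments keeps the script pointed at the prefix's copies (for example under `~/.darling/etc/`) and makes it easy to exercise against scratch files first.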
340 + 341 + The Nix store itself doesn't need migration — it's the same `/nix/store` 342 + regardless of single-user or multi-user mode. Only the access method changes 343 + (direct vs. via daemon). 344 + 345 + --- 346 + 347 + ## Verification Checklist 348 + 349 + After completing Phase 5, ALL of the following must pass: 350 + 351 + - [ ] `dseditgroup -o create -q -i 30000 nixbld` succeeds 352 + - [ ] `sysadminctl -addUser _nixbld1 -UID 300 -GID 30000 -home /var/empty -shell /usr/bin/false` succeeds 353 + - [ ] `/etc/group` and `/etc/passwd` inside the prefix contain the expected entries 354 + - [ ] `nix-daemon` starts without errors 355 + - [ ] `/nix/var/nix/daemon-socket/socket` exists after daemon start 356 + - [ ] `nix-store --gc --print-roots` (as non-root) succeeds by connecting to the daemon 357 + - [ ] A derivation build via the daemon completes successfully 358 + - [ ] The build runs as a `_nixbldN` user (not root) 359 + - [ ] `launchctl load` of the nix-daemon plist starts the service (or the systemd fallback works) 360 + - [ ] Two concurrent `nix-build` invocations complete without database errors 361 + - [ ] `nix-collect-garbage -d` works via the daemon 362 + 363 + --- 364 + 365 + ## Risk Assessment 366 + 367 + | Risk | Likelihood | Impact | Mitigation | 368 + |---|---|---|---| 369 + | `setuid` doesn't work in Darling's namespace | High | Critical — daemon can't use build users | Test early; may need darlingserver namespace mapping changes | 370 + | `fcntl` locking broken → database corruption | Medium | Critical — store becomes unusable | Test with `PRAGMA integrity_check` after concurrent builds | 371 + | launchd can't manage the daemon reliably | Medium | Medium — use systemd fallback | Have the systemd unit file ready as Plan B | 372 + | Build users can't write to `$TMPDIR` | Medium | High — all daemon builds fail | Verify `chown` and directory permissions for build user UIDs | 373 + | Socket permissions prevent non-root access | Low | Medium — only root can build | Check 
`chmod`/`chgrp` on the socket; may need a `nix-users` group | 374 + 375 + --- 376 + 377 + *[← Phase 4 — Derivation Building](./06-phase4-building.md) | [Phase 6 — CI & Testing →](./08-phase6-ci.md)*
+640
plan/08-phase6-ci.md
··· 1 + # Phase 6 — CI & Automated Testing 2 + 3 + **Priority**: P1 · **Effort**: M (2–3 weeks) · **Depends on**: Phase 3 (Nix installation) 4 + 5 + Automated testing is essential to prevent regressions as we add syscall 6 + implementations, sandbox support, and other compatibility fixes. This phase 7 + establishes a comprehensive CI pipeline that verifies Darling builds correctly 8 + and that Nix functions inside it. 9 + 10 + CI work can begin as soon as Phase 3 is working (Nix installs and evaluates 11 + inside Darling). Tests for later phases (daemon, remote builder) are added 12 + incrementally as those phases land. 13 + 14 + --- 15 + 16 + ## Context 17 + 18 + The current CI (`.github/workflows/actions.yaml`) only builds Debian packages. 19 + It does not: 20 + 21 + - Build Darling with Nix. 22 + - Run any functional tests. 23 + - Verify Nix compatibility. 24 + - Test inside a NixOS VM (which is needed for namespace/overlay support). 25 + 26 + We need to replace or supplement this with a Nix-native CI pipeline that runs 27 + real integration tests. 28 + 29 + --- 30 + 31 + ## Tasks 32 + 33 + ### 6.1 — NixOS VM Test: Nix-in-Darling 34 + 35 + Create a NixOS VM test at `tests/nix-in-darling.nix` that exercises the full 36 + Nix-inside-Darling pipeline end-to-end. 37 + 38 + **Test structure** (using `nixos/lib/testing-python.nix`): 39 + 40 + ```nix 41 + { pkgs, ... }: 42 + { 43 + name = "nix-in-darling"; 44 + 45 + nodes.machine = { config, pkgs, ... 
}: { 46 + # Import our NixOS module 47 + imports = [ ../nixosModules/darling ]; 48 + 49 + # Enable Darling 50 + programs.darling.enable = true; 51 + 52 + # Give the VM enough resources 53 + virtualisation.memorySize = 4096; 54 + virtualisation.diskSize = 20480; # 20 GB for Nix store 55 + virtualisation.cores = 4; 56 + }; 57 + 58 + testScript = '' 59 + machine.wait_for_unit("default.target") 60 + 61 + # Phase 0: Darling boots 62 + machine.succeed("darling shell echo 'Hello from Darling'") 63 + 64 + # Phase 2: sandbox-exec stub exists 65 + machine.succeed("darling shell test -x /usr/bin/sandbox-exec") 66 + machine.succeed("darling shell /usr/bin/sandbox-exec -f /dev/null /bin/echo ok") 67 + 68 + # Phase 3: Install Nix 69 + machine.succeed("scripts/install-nix-in-darling.sh") 70 + 71 + # Phase 3: Nix commands work 72 + machine.succeed("darling-nix nix --version") 73 + machine.succeed("darling-nix nix-instantiate --eval -E '1 + 1' | grep 2") 74 + machine.succeed("darling-nix nix eval --impure --expr 'builtins.currentSystem' | grep x86_64-darwin") 75 + 76 + # Phase 4: Trivial build (the shell single quote opens once, on the first fragment) 77 + machine.succeed( 78 + "darling-nix nix-build --expr '" 79 + + "derivation { name = \"test\"; builder = \"/bin/bash\"; " 80 + + "args = [\"-c\" \"echo ok > \\$out\"]; " 81 + + "system = \"x86_64-darwin\"; }'" 82 + ) 83 + 84 + # Phase 4: Verify output 85 + result = machine.succeed( 86 + "darling shell cat $(darling-nix nix-build --no-link --expr '" 87 + + "derivation { name = \"test\"; builder = \"/bin/bash\"; " 88 + + "args = [\"-c\" \"echo ok > \\$out\"]; " 89 + + "system = \"x86_64-darwin\"; }')" 90 + ) 91 + assert "ok" in result, f"Expected 'ok' in output, got: {result}" 92 + 93 + machine.log("All Nix-in-Darling tests passed!") 94 + ''; 95 + } 96 + ``` 97 + 98 + **Key considerations**: 99 + 100 + - The test runs in a NixOS VM, which provides the kernel namespace support 101 + Darling needs. This avoids requiring special privileges on the CI runner.
102 + - The VM needs ample disk space (Darling prefix + Nix store + build artifacts). 103 + - Timeout must be generous — Darling operations are slow, and the first Nix 104 + installation involves downloading and unpacking the installer. 105 + - The test should be structured so that early failures (Darling doesn't boot) 106 + produce clear error messages rather than cryptic timeouts. 107 + 108 + --- 109 + 110 + ### 6.2 — Wire Tests into `flake.nix` 111 + 112 + Add the NixOS VM test to the flake's `checks` output: 113 + 114 + ```nix 115 + checks.x86_64-linux = { 116 + # Build Darling itself 117 + darling-build = self.packages.x86_64-linux.darling; 118 + 119 + # NixOS integration test 120 + nix-in-darling = import ./tests/nix-in-darling.nix { 121 + inherit pkgs; 122 + }; 123 + }; 124 + ``` 125 + 126 + This allows running: 127 + 128 + ```bash 129 + # Run all checks 130 + nix flake check 131 + 132 + # Run just the integration test 133 + nix build .#checks.x86_64-linux.nix-in-darling 134 + ``` 135 + 136 + --- 137 + 138 + ### 6.3 — GitHub Actions Workflow 139 + 140 + Replace or supplement the existing `.github/workflows/actions.yaml` with a 141 + Nix-native workflow. 
142 + 143 + **Workflow file**: `.github/workflows/nix-ci.yaml` 144 + 145 + ```yaml 146 + name: Nix CI 147 + 148 + on: 149 + push: 150 + branches: [main] 151 + pull_request: 152 + 153 + jobs: 154 + build: 155 + runs-on: ubuntu-latest 156 + steps: 157 + - uses: actions/checkout@v4 158 + with: 159 + submodules: recursive 160 + 161 + - uses: cachix/install-nix-action@v27 162 + with: 163 + extra_nix_config: | 164 + experimental-features = nix-command flakes 165 + 166 + - uses: cachix/cachix-action@v15 167 + with: 168 + name: darling-nix # our Cachix cache 169 + authToken: '${{ secrets.CACHIX_AUTH_TOKEN }}' 170 + 171 + - name: Build Darling 172 + run: nix build .#darling -L 173 + 174 + - name: Build Darling SDK 175 + run: nix build .#darling-sdk -L 176 + 177 + test-syscalls: 178 + runs-on: ubuntu-latest 179 + needs: build 180 + steps: 181 + - uses: actions/checkout@v4 182 + with: 183 + submodules: recursive 184 + 185 + - uses: cachix/install-nix-action@v27 186 + with: 187 + extra_nix_config: | 188 + experimental-features = nix-command flakes 189 + 190 + - uses: cachix/cachix-action@v15 191 + with: 192 + name: darling-nix 193 + 194 + - name: Run syscall regression tests 195 + run: nix build .#checks.x86_64-linux.syscall-regression -L 196 + 197 + test-nix-integration: 198 + runs-on: ubuntu-latest 199 + needs: build 200 + steps: 201 + - uses: actions/checkout@v4 202 + with: 203 + submodules: recursive 204 + 205 + - uses: cachix/install-nix-action@v27 206 + with: 207 + extra_nix_config: | 208 + experimental-features = nix-command flakes 209 + system-features = nixos-test benchmark big-parallel kvm # setting this replaces the defaults, and NixOS VM tests also need nixos-test 210 + 211 + - uses: cachix/cachix-action@v15 212 + with: 213 + name: darling-nix 214 + 215 + - name: Run Nix-in-Darling integration test 216 + run: nix build .#checks.x86_64-linux.nix-in-darling -L 217 + timeout-minutes: 60 # generous timeout for VM test 218 + ``` 219 + 220 + **Notes**: 221 + 222 + - The integration test requires KVM for the NixOS VM. GitHub's `ubuntu-latest` 223 + runners have KVM available.
Verify with `system-features = kvm` in the Nix 224 + config. 225 + - The build job runs first and pushes artifacts to Cachix. Subsequent test jobs 226 + pull from the cache, avoiding redundant rebuilds. 227 + - The `timeout-minutes: 60` is important — Darling operations inside a VM inside 228 + CI can be very slow. Adjust as needed based on real-world timings. 229 + - `submodules: recursive` is required because Darling has 100+ submodules. This 230 + checkout step may itself take 5–10 minutes. 231 + 232 + **Alternative: use a self-hosted runner** if GitHub's runners are too slow or 233 + lack KVM. A dedicated NixOS machine with nested virtualisation enabled would 234 + provide the most reliable CI environment. 235 + 236 + --- 237 + 238 + ### 6.4 — Syscall Regression Test Suite 239 + 240 + Create a set of small C programs under `tests/syscalls/` that exercise every 241 + syscall we've fixed. These run inside `darling shell` and assert expected 242 + behavior. 243 + 244 + **Directory structure**: 245 + 246 + ``` 247 + tests/ 248 + ├── syscalls/ 249 + │ ├── test_lchflags.c 250 + │ ├── test_setattrlist.c 251 + │ ├── test_renameatx_np.c 252 + │ ├── test_utimensat.c 253 + │ ├── test_clonefile.c 254 + │ ├── test_getentropy.c 255 + │ ├── test_posix_spawn.c 256 + │ ├── test_xattr.c 257 + │ ├── test_fcntl_locking.c 258 + │ └── run_all.sh 259 + ├── sandbox/ 260 + │ ├── test_sandbox_exec.sh 261 + │ ├── test_sandbox_init.c 262 + │ └── run_all.sh 263 + └── nix/ 264 + ├── test_nix_eval.sh 265 + ├── test_nix_build_trivial.sh 266 + ├── test_nix_substitution.sh 267 + └── run_all.sh 268 + ``` 269 + 270 + **Example test — `test_lchflags.c`**: 271 + 272 + ```c 273 + #include <stdio.h> 274 + #include <stdlib.h> 275 + #include <sys/stat.h> 276 + #include <unistd.h> 277 + #include <fcntl.h> 278 + #include <errno.h> 279 + #include <string.h> 280 + 281 + #define ASSERT(cond, msg) do { \ 282 + if (!(cond)) { \ 283 + fprintf(stderr, "FAIL: %s (errno=%d: %s)\n", msg, errno, strerror(errno)); \ 
284 + exit(1); \ 285 + } \ 286 + } while (0) 287 + 288 + int main(void) { 289 + const char *path = "/tmp/test_lchflags_XXXXXX"; 290 + char tmppath[256]; 291 + strncpy(tmppath, path, sizeof(tmppath)); 292 + 293 + int fd = mkstemp(tmppath); 294 + ASSERT(fd >= 0, "mkstemp failed"); 295 + close(fd); 296 + 297 + /* Clear all flags — this is what Nix does */ 298 + int ret = lchflags(tmppath, 0); 299 + ASSERT(ret == 0, "lchflags(path, 0) should return 0"); 300 + 301 + unlink(tmppath); 302 + printf("PASS: test_lchflags\n"); 303 + return 0; 304 + } 305 + ``` 306 + 307 + **Example test — `test_renameatx_np.c`**: 308 + 309 + ```c 310 + #include <stdio.h> 311 + #include <stdlib.h> 312 + #include <fcntl.h> 313 + #include <unistd.h> 314 + #include <string.h> 315 + #include <errno.h> 316 + #include <sys/stat.h> 317 + 318 + /* macOS renameatx_np flags */ 319 + #ifndef RENAME_SWAP 320 + #define RENAME_SWAP 0x00000002 321 + #endif 322 + #ifndef RENAME_EXCL 323 + #define RENAME_EXCL 0x00000004 324 + #endif 325 + 326 + extern int renameatx_np(int fromfd, const char *from, 327 + int tofd, const char *to, unsigned int flags); 328 + 329 + #define ASSERT(cond, msg) do { \ 330 + if (!(cond)) { \ 331 + fprintf(stderr, "FAIL: %s (errno=%d: %s)\n", msg, errno, strerror(errno)); \ 332 + exit(1); \ 333 + } \ 334 + } while (0) 335 + 336 + static void write_file(const char *path, const char *content) { 337 + int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644); 338 + ASSERT(fd >= 0, "open for write failed"); 339 + write(fd, content, strlen(content)); 340 + close(fd); 341 + } 342 + 343 + static void read_file(const char *path, char *buf, size_t len) { 344 + int fd = open(path, O_RDONLY); 345 + ASSERT(fd >= 0, "open for read failed"); 346 + ssize_t n = read(fd, buf, len - 1); 347 + ASSERT(n >= 0, "read failed"); 348 + buf[n] = '\0'; 349 + close(fd); 350 + } 351 + 352 + int main(void) { 353 + const char *a = "/tmp/renameatx_a"; 354 + const char *b = "/tmp/renameatx_b"; 355 + char buf[64]; 356 + 
357 + /* Test RENAME_SWAP */ 358 + write_file(a, "AAA"); 359 + write_file(b, "BBB"); 360 + 361 + int ret = renameatx_np(AT_FDCWD, a, AT_FDCWD, b, RENAME_SWAP); 362 + ASSERT(ret == 0, "renameatx_np RENAME_SWAP failed"); 363 + 364 + read_file(a, buf, sizeof(buf)); 365 + ASSERT(strcmp(buf, "BBB") == 0, "after swap, a should contain BBB"); 366 + 367 + read_file(b, buf, sizeof(buf)); 368 + ASSERT(strcmp(buf, "AAA") == 0, "after swap, b should contain AAA"); 369 + 370 + /* Test RENAME_EXCL */ 371 + unlink(b); 372 + ret = renameatx_np(AT_FDCWD, a, AT_FDCWD, b, RENAME_EXCL); 373 + ASSERT(ret == 0, "renameatx_np RENAME_EXCL (target absent) should succeed"); 374 + 375 + write_file(a, "CCC"); 376 + ret = renameatx_np(AT_FDCWD, a, AT_FDCWD, b, RENAME_EXCL); 377 + ASSERT(ret != 0 && errno == EEXIST, 378 + "renameatx_np RENAME_EXCL (target exists) should fail with EEXIST"); 379 + 380 + unlink(a); 381 + unlink(b); 382 + printf("PASS: test_renameatx_np\n"); 383 + return 0; 384 + } 385 + ``` 386 + 387 + **Runner script — `tests/syscalls/run_all.sh`**: 388 + 389 + ```bash 390 + #!/bin/bash 391 + set -euo pipefail 392 + 393 + SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" 394 + PASS=0 395 + FAIL=0 396 + ERRORS="" 397 + 398 + for test_src in "$SCRIPT_DIR"/test_*.c; do 399 + test_name="$(basename "$test_src" .c)" 400 + test_bin="/tmp/$test_name" 401 + 402 + echo "--- $test_name ---" 403 + 404 + # Compile inside Darling using Apple's clang 405 + if ! 
cc -o "$test_bin" "$test_src" 2>&1; then 406 + echo "FAIL: $test_name (compilation failed)" 407 + FAIL=$((FAIL + 1)) 408 + ERRORS="$ERRORS\n $test_name: compilation failed" 409 + continue 410 + fi 411 + 412 + # Run 413 + if "$test_bin"; then 414 + PASS=$((PASS + 1)) 415 + else 416 + FAIL=$((FAIL + 1)) 417 + ERRORS="$ERRORS\n $test_name: test failed" 418 + fi 419 + 420 + rm -f "$test_bin" 421 + done 422 + 423 + echo "" 424 + echo "=== Results: $PASS passed, $FAIL failed ===" 425 + if [ $FAIL -gt 0 ]; then 426 + echo -e "Failures:$ERRORS" 427 + exit 1 428 + fi 429 + ``` 430 + 431 + **Integration with Nix**: Create a derivation that compiles and runs all tests 432 + inside a Darling prefix (this requires a NixOS VM test context since Darling 433 + needs namespace support): 434 + 435 + ```nix 436 + checks.x86_64-linux.syscall-regression = nixosTest { 437 + name = "darling-syscall-regression"; 438 + nodes.machine = { ... }: { 439 + imports = [ ../nixosModules/darling ]; 440 + programs.darling.enable = true; 441 + }; 442 + testScript = '' 443 + machine.wait_for_unit("default.target") 444 + machine.succeed("darling shell bash /path/to/tests/syscalls/run_all.sh") 445 + ''; 446 + }; 447 + ``` 448 + 449 + --- 450 + 451 + ### 6.5 — Nix Compatibility Test Matrix 452 + 453 + Create a test that attempts to build an expanding set of Nixpkgs packages inside 454 + Darling and tracks pass/fail rates over time. 
455 + 456 + **File**: `tests/nix/compatibility-matrix.sh` 457 + 458 + **Approach**: 459 + 460 + ```bash 461 + #!/bin/bash 462 + # Test building a set of packages inside Darling 463 + # Tracks pass/fail for each package 464 + 465 + PACKAGES=( 466 + # Tier 1: Must work (no native compilation, just fetch from cache) 467 + "hello" 468 + "which" 469 + "yes" 470 + 471 + # Tier 2: Should work (simple C programs) 472 + "tree" 473 + "jq" 474 + 475 + # Tier 3: Stretch (complex builds) 476 + "curl" 477 + "git" 478 + "python3" 479 + ) 480 + 481 + RESULTS_FILE="/tmp/compat-matrix-$(date +%Y%m%d).json" 482 + echo '{"results": [' > "$RESULTS_FILE" 483 + 484 + for pkg in "${PACKAGES[@]}"; do 485 + echo "--- Testing: $pkg ---" 486 + start_time=$(date +%s) 487 + 488 + if darling-nix nix-build '<nixpkgs>' -A "$pkg" --system x86_64-darwin --no-out-link 2>/tmp/build-$pkg.log; then 489 + status="pass" 490 + else 491 + status="fail" 492 + fi 493 + 494 + end_time=$(date +%s) 495 + duration=$((end_time - start_time)) 496 + 497 + echo " $pkg: $status (${duration}s)" 498 + echo " {\"package\": \"$pkg\", \"status\": \"$status\", \"duration\": $duration}," >> "$RESULTS_FILE" 499 + done 500 + 501 + # Close JSON (remove trailing comma hack) 502 + sed -i '$ s/,$//' "$RESULTS_FILE" 503 + echo ']}' >> "$RESULTS_FILE" 504 + 505 + echo "" 506 + echo "Results written to $RESULTS_FILE" 507 + 508 + # Summary 509 + pass_count=$(grep -c '"pass"' "$RESULTS_FILE" || true) 510 + fail_count=$(grep -c '"fail"' "$RESULTS_FILE" || true) 511 + total=${#PACKAGES[@]} 512 + echo "=== Compatibility: $pass_count/$total passed ($fail_count failed) ===" 513 + ``` 514 + 515 + **Tracking over time**: Store the JSON results as CI artifacts. A simple script 516 + can compare results between runs to detect regressions (a package that was 517 + passing now fails) or progress (a package that was failing now passes). 
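
The comparison step described above could be sketched in Python as follows. This is a minimal illustration, not part of the plan's scripts: the function names `load_statuses` and `compare_runs` are hypothetical, and the code assumes the JSON layout written by `compatibility-matrix.sh` (a top-level `results` list whose entries carry `package` and `status`).

```python
import json


def load_statuses(text):
    """Map package name -> status ("pass"/"fail") for one matrix run."""
    return {entry["package"]: entry["status"] for entry in json.loads(text)["results"]}


def compare_runs(old_text, new_text):
    """Return (regressions, progress) as sorted lists of package names.

    Only packages present in both runs are compared, so newly added
    packages are not reported as regressions.
    """
    old, new = load_statuses(old_text), load_statuses(new_text)
    common = old.keys() & new.keys()
    regressions = sorted(p for p in common if old[p] == "pass" and new[p] == "fail")
    progress = sorted(p for p in common if old[p] == "fail" and new[p] == "pass")
    return regressions, progress


# Illustrative data only; real input would be two CI artifacts.
previous = '{"results": [{"package": "hello", "status": "pass", "duration": 3}, {"package": "jq", "status": "fail", "duration": 9}]}'
current = '{"results": [{"package": "hello", "status": "fail", "duration": 2}, {"package": "jq", "status": "pass", "duration": 8}]}'

regressions, progress = compare_runs(previous, current)
print("regressions:", regressions)
print("progress:", progress)
```

A CI job could exit non-zero whenever `regressions` is non-empty, which turns the matrix into a gate rather than a report.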
518 + 519 + --- 520 + 521 + ### 6.6 — Darling Build Smoke Test 522 + 523 + A lighter-weight test that doesn't need a NixOS VM — just verifies Darling 524 + builds from source with Nix: 525 + 526 + ```nix 527 + checks.x86_64-linux.darling-build = self.packages.x86_64-linux.darling; 528 + ``` 529 + 530 + This runs as part of `nix flake check` and catches build regressions (missing 531 + dependencies, broken patches, compiler errors) without the overhead of a VM 532 + test. 533 + 534 + --- 535 + 536 + ### 6.7 — Test Darling SDK Cross-Compilation 537 + 538 + Verify that the SDK output can be used to cross-compile Darwin binaries from 539 + Linux (without running them inside Darling — just the compilation step): 540 + 541 + ```bash 542 + # Use the SDK's clang + ld64 to compile a Darwin binary on Linux 543 + $darling_sdk/bin/x86_64-apple-darwin-ld64 ... # or however the SDK exposes the tools 544 + ``` 545 + 546 + This tests the SDK packaging independently of the Darling runtime. 547 + 548 + --- 549 + 550 + ## Test Categories 551 + 552 + | Category | Runs In | Needs VM | Frequency | Phase Dependency | 553 + |---|---|---|---|---| 554 + | Build smoke test | Nix sandbox | No | Every PR | Phase 0 | 555 + | SDK cross-compile | Nix sandbox | No | Every PR | Phase 0 | 556 + | Syscall regression | Darling shell (in VM) | Yes | Every PR | Phase 1 | 557 + | Sandbox stub test | Darling shell (in VM) | Yes | Every PR | Phase 2 | 558 + | Nix installation | Darling shell (in VM) | Yes | Every PR | Phase 3 | 559 + | Trivial build | Darling shell (in VM) | Yes | Every PR | Phase 4 | 560 + | Compatibility matrix | Darling shell (in VM) | Yes | Nightly / weekly | Phase 4 | 561 + | Daemon & multi-user | Darling shell (in VM) | Yes | Every PR | Phase 5 | 562 + | Remote builder | NixOS VM with Nix daemon | Yes | Nightly / weekly | Phase 7 | 563 + 564 + --- 565 + 566 + ## CI Performance Considerations 567 + 568 + NixOS VM tests are slow. Strategies to keep CI times reasonable: 569 + 570 + 1. 
**Cachix**: Push all build artifacts to a binary cache. Subsequent runs skip 571 + rebuilding Darling (which takes 30+ minutes from scratch). 572 + 573 + 2. **Test parallelism**: Run the build smoke test and SDK test in parallel with 574 + the VM-based tests (they're independent). 575 + 576 + 3. **Incremental testing**: On PRs that only touch `plan/` or Markdown documentation, skip the 577 + expensive VM tests. Use path filters in the workflow (`**.md` matches Markdown files in any directory, whereas `*.md` matches only the repository root): 578 + ```yaml 579 + on: 580 + pull_request: 581 + paths-ignore: 582 + - 'plan/**' 583 + - '**.md' 584 + ``` 585 + 586 + 4. **Test VM snapshots**: If the NixOS testing framework supports it, take a 587 + snapshot after Darling initialization and restore from it for each test. This 588 + avoids re-bootstrapping Darling's prefix on every test run. 589 + 590 + 5. **Split VM tests**: Rather than one monolithic test, split into focused tests 591 + (syscalls, sandbox, Nix install, build). A failing focused test shows more quickly 592 + what broke. 593 + 594 + 6. **Timeout management**: Set aggressive but realistic timeouts per test step.
595 + A hanging test should fail fast rather than consume the full CI allocation: 596 + ```python 597 + # In the NixOS test script: 598 + machine.succeed("timeout 300 darling-nix nix-build ...") 599 + ``` 600 + 601 + --- 602 + 603 + ## Verification Checklist 604 + 605 + After completing Phase 6, ALL of the following should be true: 606 + 607 + - [ ] `nix flake check` passes (includes build smoke test) 608 + - [ ] `.github/workflows/nix-ci.yaml` exists and runs on PRs 609 + - [ ] Syscall regression tests exist for `lchflags`, `renameatx_np`, `utimensat` (at minimum) 610 + - [ ] Sandbox stub tests verify `sandbox-exec` passthrough works 611 + - [ ] NixOS VM test installs Nix inside Darling and evaluates an expression 612 + - [ ] NixOS VM test builds a trivial derivation inside Darling 613 + - [ ] CI results are visible on GitHub PR checks 614 + - [ ] Cachix cache is populated by CI and speeds up subsequent runs 615 + - [ ] Compatibility matrix script exists and produces JSON output 616 + - [ ] Adding a new syscall implementation has a clear path: implement → add test → CI verifies 617 + 618 + --- 619 + 620 + ## Maintenance 621 + 622 + - **Adding new tests**: When a new syscall is implemented (Phase 1), add a 623 + corresponding `test_<syscall>.c` to `tests/syscalls/`. The runner script 624 + picks it up automatically. 625 + 626 + - **Updating the compatibility matrix**: As more packages start working, add them 627 + to the `PACKAGES` array. The matrix should only grow, never shrink (removing a 628 + package hides regressions). 629 + 630 + - **Flaky tests**: If a test passes intermittently (likely due to Darling's 631 + incomplete implementation), mark it as `@flaky` in the test script and track 632 + it separately. Do not disable it — flaky tests are signals of real issues. 633 + 634 + - **CI costs**: NixOS VM tests are expensive. 
Monitor CI usage and adjust the 635 + trigger frequency (e.g., move the compatibility matrix to weekly if it's too 636 + costly to run on every PR). 637 + 638 + --- 639 + 640 + *[← Phase 5 — Nix Daemon](./07-phase5-daemon.md) | [Phase 7 — Remote Builder →](./09-phase7-remote-builder.md)*
+630
plan/09-phase7-remote-builder.md
··· 1 + # Phase 7 — Nixpkgs `x86_64-darwin` Remote Builder 2 + 3 + **Priority**: P2 · **Effort**: L (4–8 weeks) · **Depends on**: Phase 4 (derivation building), Phase 5 (Nix daemon) 4 + 5 + The ultimate goal of this project: use Darling as a **remote builder** so that a 6 + Linux host's Nix daemon can offload `x86_64-darwin` builds to a Darling 7 + instance — just as it would offload to a real macOS machine over SSH. 8 + 9 + This unlocks the ability for any NixOS machine to build and test Darwin packages 10 + without Apple hardware. 11 + 12 + --- 13 + 14 + ## Context 15 + 16 + Nix supports remote builds via two mechanisms: 17 + 18 + 1. **SSH-based remote builders** (`nix.buildMachines`): The local Nix daemon 19 + connects to a remote machine over SSH, copies the derivation closure, runs 20 + the build remotely, and copies the result back. The remote machine must run 21 + `nix-daemon` and accept SSH connections. 22 + 23 + 2. **Build hooks**: A custom `build-hook` program that Nix invokes when it 24 + encounters a derivation for a system it can't build locally. The hook decides 25 + where and how to build it. 26 + 27 + For Darling, the SSH approach is the most natural: run `sshd` inside Darling, 28 + configure the host's Nix daemon to treat it as a remote builder for 29 + `x86_64-darwin`, and let the standard Nix remote-build protocol handle the rest. 30 + 31 + An alternative is a custom build hook that calls `darling shell` directly, 32 + avoiding SSH overhead. Both approaches are covered below. 
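
To make the first mechanism concrete: outside NixOS, remote builders are usually listed one per line in a machines file. The sketch below formats such a line for the Darling builder. It is illustrative only: `machines_line` is a hypothetical helper, the field order (store URI, systems, SSH key, max jobs, speed factor, supported features, mandatory features) follows the Nix manual's description of the machines file, and whether a port is accepted inside the `ssh://` URI depends on the Nix version in use.

```python
def machines_line(uri, systems, ssh_key="-", max_jobs=1, speed_factor=1,
                  supported=(), mandatory=()):
    """Format one space-separated machines-file entry; "-" marks an omitted field."""
    fields = [
        uri,
        ",".join(systems),
        ssh_key,
        str(max_jobs),
        str(speed_factor),
        ",".join(supported) or "-",
        ",".join(mandatory) or "-",
    ]
    return " ".join(fields)


# Values taken from this plan (port 2222, key path, 4 jobs); adjust as needed.
print(machines_line("ssh://root@127.0.0.1:2222", ["x86_64-darwin"],
                    "/etc/nix/darling-builder-key", max_jobs=4, speed_factor=1))
```

On NixOS the same information is expressed declaratively through `nix.buildMachines`, as covered in task 7.2.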
33 + 34 + --- 35 + 36 + ## Architecture 37 + 38 + ``` 39 + ┌──────────────────────────────────────────────────────────┐ 40 + │ Linux Host (NixOS) │ 41 + │ │ 42 + │ User runs: nix build .#myPackage --system x86_64-darwin │ 43 + │ │ │ 44 + │ ▼ │ 45 + │ ┌──────────────────────────────────┐ │ 46 + │ │ Nix Daemon (Linux) │ │ 47 + │ │ system = x86_64-linux │ │ 48 + │ │ buildMachines includes: │ │ 49 + │ │ { hostName = "darling-vm"; │ │ 50 + │ │ systems = ["x86_64-darwin"];│ │ 51 + │ │ sshKey = "..."; } │ │ 52 + │ └──────────┬───────────────────────┘ │ 53 + │ │ SSH (or darling-exec) │ 54 + │ ▼ │ 55 + │ ┌──────────────────────────────────┐ │ 56 + │ │ Darling Container │ │ 57 + │ │ ┌────────────────────────────┐ │ │ 58 + │ │ │ sshd (port 2222) │ │ │ 59 + │ │ │ nix-daemon │ │ │ 60 + │ │ │ sandbox-exec stub │ │ │ 61 + │ │ │ /nix/store (shared) │──┼── bind mount ──┐ │ 62 + │ │ └────────────────────────────┘ │ │ │ 63 + │ └──────────────────────────────────┘ │ │ 64 + │ │ │ 65 + │ /nix/store ◄────────────────────────────────────────┘ │ 66 + │ │ 67 + └──────────────────────────────────────────────────────────┘ 68 + ``` 69 + 70 + The key insight is the **shared `/nix/store`**: by bind-mounting or symlinking 71 + the host's `/nix/store` into the Darling prefix, we avoid the expensive step of 72 + copying store paths back and forth over SSH. The SSH connection is still used for 73 + the build protocol (derivation transfer, build log streaming, result 74 + registration) but the actual store content is shared via the filesystem. 75 + 76 + --- 77 + 78 + ## Tasks 79 + 80 + ### 7.1 — Run `sshd` Inside Darling 81 + 82 + Set up an SSH server inside the Darling prefix so the host's Nix daemon can 83 + connect to it as a remote builder. 84 + 85 + **Steps**: 86 + 87 + 1. **Install sshd**: Darling ships OpenSSH (`src/external/openssh/`). Verify 88 + that `/usr/sbin/sshd` exists in the prefix and is functional. 89 + 90 + 2. 
**Generate host keys**: 91 + ```bash 92 + darling shell ssh-keygen -A 93 + ``` 94 + 95 + 3. **Configure sshd** (`/etc/ssh/sshd_config` inside the prefix): 96 + ``` 97 + Port 2222 98 + ListenAddress 127.0.0.1 99 + PermitRootLogin yes 100 + PubkeyAuthentication yes 101 + AuthorizedKeysFile .ssh/authorized_keys 102 + PasswordAuthentication no 103 + UsePAM no 104 + Subsystem sftp /usr/libexec/sftp-server 105 + ``` 106 + 107 + Using port 2222 avoids conflict with the host's sshd on port 22. 108 + 109 + 4. **Set up SSH keys**: Generate a keypair for the Nix daemon to use: 110 + ```bash 111 + ssh-keygen -t ed25519 -N "" -f /etc/nix/darling-builder-key 112 + darling shell mkdir -p /var/root/.ssh 113 + cat /etc/nix/darling-builder-key.pub | darling shell tee /var/root/.ssh/authorized_keys 114 + darling shell chmod 600 /var/root/.ssh/authorized_keys 115 + ``` 116 + 117 + 5. **Start sshd**: 118 + ```bash 119 + darling shell /usr/sbin/sshd -f /etc/ssh/sshd_config 120 + ``` 121 + 122 + 6. **Verify connectivity**: 123 + ```bash 124 + ssh -i /etc/nix/darling-builder-key -p 2222 root@127.0.0.1 echo ok 125 + # Expected: ok 126 + ``` 127 + 128 + **Potential issues**: 129 + 130 + - **Network stack**: Darling's network layer needs to support `bind()` on 131 + `127.0.0.1:2222` and `accept()` incoming connections. Since Darling maps to 132 + Linux sockets, this should work, but verify. 133 + 134 + - **PTY allocation**: Interactive SSH sessions use pseudo-terminals, which need working `/dev/ptmx` 135 + and `openpty()` in Darling. Non-interactive commands (which is what Nix uses) do not request 136 + a PTY, so this mainly affects interactive debugging sessions. 137 + 138 + - **`sshd` privilege separation**: OpenSSH's privilege separation uses `fork`, 139 + `setuid`, and `chroot`. If these don't work inside Darling, `UsePrivilegeSeparation no` only helps 140 + with older OpenSSH builds: the option was deprecated in OpenSSH 7.5, where privilege separation became mandatory. 141 + 142 + - **PAM**: Set `UsePAM no` since Darling doesn't implement PAM.
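
Before involving Nix at all, the network-stack concern above can be checked with a plain TCP probe. The following Python sketch is an assumption-laden illustration rather than part of the plan: `read_ssh_banner` is a hypothetical helper that connects to the port and returns the identification string an SSH server sends on connect (a line starting with `SSH-`), or `None` if nothing SSH-like answers.

```python
import socket


def read_ssh_banner(host, port, timeout=5.0):
    """Return the SSH identification line from host:port, or None on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(256)  # the banner is short and sent immediately
    except OSError:  # refused, unreachable, or timed out
        return None
    line = data.split(b"\n", 1)[0].strip().decode("ascii", errors="replace")
    return line if line.startswith("SSH-") else None


if __name__ == "__main__":
    # Port 2222 on loopback matches the sshd_config shown in this section.
    banner = read_ssh_banner("127.0.0.1", 2222)
    print(banner if banner else "no SSH server answered on 127.0.0.1:2222")
```

A `None` result points at `bind()`/`accept()` problems in Darling (or sshd not running), whereas a banner followed by a failing `ssh` login points at key, privilege-separation, or PTY issues instead.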
143 + 144 + --- 145 + 146 + ### 7.2 — Configure the Host as a Remote Build Client 147 + 148 + Add the Darling instance as a remote builder in the host's Nix configuration. 149 + 150 + **NixOS configuration**: 151 + 152 + ```nix 153 + nix.buildMachines = [{ 154 + hostName = "127.0.0.1"; 155 + port = 2222; # if your nixpkgs has no `port` option here, set the port through an SSH `Host` alias instead 156 + systems = [ "x86_64-darwin" ]; 157 + sshUser = "root"; 158 + sshKey = "/etc/nix/darling-builder-key"; 159 + maxJobs = 4; 160 + speedFactor = 1; # lower than native builders; adjust based on benchmarks 161 + supportedFeatures = [ ]; 162 + mandatoryFeatures = [ ]; 163 + }]; 164 + 165 + nix.distributedBuilds = true; 166 + 167 + # Do not add "x86_64-darwin" to nix.settings.extra-platforms: that setting marks the 168 + # platform as buildable locally. `systems` above already limits this builder to Darwin. 169 + ``` 170 + 171 + **Verification**: 172 + 173 + ```bash 174 + # Test that Nix can connect to the builder 175 + nix store ping --store ssh://root@127.0.0.1:2222 176 + 177 + # Test a remote build 178 + nix build --expr 'derivation { name = "test"; builder = "/bin/bash"; args = ["-c" "echo ok > $out"]; system = "x86_64-darwin"; }' -L 179 + 180 + # The build should be offloaded to the Darling instance 181 + ``` 182 + 183 + **Troubleshooting**: 184 + 185 + ```bash 186 + # Check that root (the user the Nix daemon runs as) can reach sshd 187 + sudo ssh -i /etc/nix/darling-builder-key -p 2222 root@127.0.0.1 nix --version 188 + 189 + # Check the Nix daemon logs for builder connection errors 190 + journalctl -u nix-daemon -f 191 + 192 + # Verify the Darling sshd is listening 193 + ss -tlnp | grep 2222 194 + ``` 195 + 196 + --- 197 + 198 + ### 7.3 — Shared `/nix/store` 199 + 200 + The naive remote-build setup copies store paths over SSH, which is extremely slow 201 + for large closures. Since the Darling instance runs on the same machine, we can 202 + share the store filesystem directly. 203 + 204 + **Mechanism**: Darling mounts the host's root filesystem at `/Volumes/SystemRoot` 205 + inside the prefix.
The host's `/nix/store` is therefore accessible at 206 + `/Volumes/SystemRoot/nix/store` from within Darling. 207 + 208 + **Setup**: 209 + 210 + ```bash 211 + # Inside the Darling prefix, symlink /nix to the host's /nix 212 + darling shell ln -sf /Volumes/SystemRoot/nix /nix 213 + ``` 214 + 215 + Or, if that conflicts with Darling's overlayfs: 216 + 217 + ```bash 218 + # Bind mount the host's /nix into the prefix 219 + # This may need to be done during prefix initialization in darlingserver 220 + mount --bind /nix ~/.darling/nix 221 + ``` 222 + 223 + **Benefits**: 224 + 225 + - **No copy overhead**: Store paths don't need to be transferred over SSH. The 226 + Nix daemon on both sides sees the same physical files. 227 + - **Shared garbage collection**: The host's GC manages the shared store, provided Darwin outputs are also registered in the host's database (see the store database caveat below). 228 + - **Instant result availability**: After a Darwin build completes, its output is 229 + immediately available on the host without copying. 230 + 231 + **Caveats**: 232 + 233 + - **Store database**: Nix's SQLite database (`/nix/var/nix/db/db.sqlite`) must 234 + not be shared between the host and Darling Nix daemons, because they're different 235 + Nix instances with potentially different database schemas. Each needs its own 236 + database. 237 + 238 + Solution: Configure the Darling Nix instance to use a different database 239 + location: 240 + ``` 241 + # In /etc/nix/nix.conf inside Darling (a sketch; verify against the Nix version in use): 242 + # keep store paths under /nix/store, but move the state directory (database, GC roots) 243 + store = local?state=/var/nix # Darling-local state, not shared 244 + ``` 245 + Or use a local overlay for `/nix/var` while sharing `/nix/store`. 246 + 247 + - **Concurrent writes**: If both the host and Darling write to `/nix/store` 248 + simultaneously, there's a risk of corruption. Mitigate by: 249 + - Making the Darling Nix daemon the exclusive writer for `x86_64-darwin` paths. 250 + - Relying on store path naming: a path is derived from a hash of the build inputs (or of the content, for 251 + content-addressed paths), so the host and Darling do not produce different outputs under the same name.
252 + - Using file-level locking (`fcntl`) which works across the shared mount. 253 + 254 + - **Permission mapping**: Darling's UID/GID namespace may differ from the host's. 255 + Ensure that files written by Darling's `_nixbldN` users are readable by the 256 + host's Nix daemon. This may require mapping UIDs or using a shared `nixbld` 257 + group. 258 + 259 + **Fallback**: If shared store proves too complex, fall back to SSH-based copying. 260 + It's slower but simpler and guaranteed correct. Use Nix's `--builders` flag with 261 + `ssh-ng://` protocol which has optimised store path transfer. 262 + 263 + --- 264 + 265 + ### 7.4 — Alternative: Custom Build Hook (No SSH) 266 + 267 + Instead of SSH, implement a custom Nix build hook that invokes `darling shell` 268 + directly. This avoids the SSH setup entirely and may have lower overhead. 269 + 270 + **How Nix build hooks work**: 271 + 272 + 1. Nix calls the `build-hook` program (configured in `nix.conf`) when a build 273 + can't be performed locally. 274 + 2. The hook reads the derivation path and system type from stdin. 275 + 3. The hook decides whether to accept the build. If yes, it outputs the builder 276 + machine specification. 277 + 4. Nix then proceeds to run the build on that machine. 278 + 279 + **Custom hook — `darling-build-hook`**: 280 + 281 + ```bash 282 + #!/usr/bin/env bash 283 + # darling-build-hook — Nix build hook that offloads x86_64-darwin builds to Darling 284 + 285 + set -euo pipefail 286 + 287 + # Read build request from Nix 288 + # Protocol: https://nixos.org/manual/nix/stable/advanced-topics/distributed-builds 289 + read -r drv_path system 290 + 291 + if [[ "$system" != "x86_64-darwin" ]]; then 292 + echo "# decline" # Not a Darwin build, let Nix handle it 293 + exit 0 294 + fi 295 + 296 + echo "# accept" 297 + echo "darling-builder x86_64-darwin /etc/nix/darling-builder-key 4 1" 298 + 299 + # Nix will now SSH to "darling-builder" (which must resolve, or use the 300 + # machines file). 
Alternatively, this hook could run the build directly: 301 + # 302 + # darling shell nix-store --realise "$drv_path" 303 + # echo "$drv_path" 304 + ``` 305 + 306 + **Note**: The build hook protocol is somewhat complex and version-dependent. The 307 + SSH approach (7.1/7.2) is more battle-tested and recommended for initial 308 + implementation. The custom hook is an optimisation for later. 309 + 310 + **Nix configuration for the hook**: 311 + 312 + ```nix 313 + nix.settings.build-hook = "/path/to/darling-build-hook"; 314 + ``` 315 + 316 + --- 317 + 318 + ### 7.5 — NixOS Module for the Darling Builder 319 + 320 + Wrap all the setup (sshd, keys, store sharing, `nix.buildMachines`) into a 321 + reusable NixOS module. 322 + 323 + **Module file**: `nixosModules/darling-builder.nix` 324 + 325 + ```nix 326 + { config, lib, pkgs, ... }: 327 + 328 + with lib; 329 + 330 + let 331 + cfg = config.services.darling-builder; 332 + in { 333 + options.services.darling-builder = { 334 + enable = mkEnableOption "Darling-based x86_64-darwin remote builder"; 335 + 336 + port = mkOption { 337 + type = types.port; 338 + default = 2222; 339 + description = "SSH port for the Darling builder"; 340 + }; 341 + 342 + maxJobs = mkOption { 343 + type = types.int; 344 + default = 4; 345 + description = "Maximum concurrent builds on the Darling builder"; 346 + }; 347 + 348 + speedFactor = mkOption { 349 + type = types.int; 350 + default = 1; 351 + description = "Speed factor (lower = deprioritised vs native builders)"; 352 + }; 353 + 354 + shareStore = mkOption { 355 + type = types.bool; 356 + default = true; 357 + description = "Share /nix/store between host and Darling (avoids copying)"; 358 + }; 359 + 360 + sshKeyPath = mkOption { 361 + type = types.str; 362 + default = "/etc/nix/darling-builder-key"; 363 + description = "Path to the SSH private key for connecting to the builder"; 364 + }; 365 + }; 366 + 367 + config = mkIf cfg.enable { 368 + # Ensure Darling is available 369 + 
    programs.darling.enable = true;

    # Generate SSH keys if they don't exist
    system.activationScripts.darling-builder-keys = ''
      if [ ! -f ${cfg.sshKeyPath} ]; then
        ${pkgs.openssh}/bin/ssh-keygen -t ed25519 -N "" -f ${cfg.sshKeyPath}
        chown root:root ${cfg.sshKeyPath}
        chmod 600 ${cfg.sshKeyPath}
      fi
    '';

    # nix.buildMachines has no port option, so route the non-standard port
    # through an SSH host alias instead.
    programs.ssh.extraConfig = ''
      Host darling-builder
        HostName 127.0.0.1
        Port ${toString cfg.port}
    '';

    # Set up the Darling prefix with sshd and Nix
    systemd.services.darling-builder = {
      description = "Darling x86_64-darwin Nix builder";
      wantedBy = [ "multi-user.target" ];
      after = [ "network.target" ];
      # The init script calls `darling` by name, so put it on the unit's PATH
      path = [ pkgs.darling ];

      serviceConfig = {
        Type = "simple";
        ExecStartPre = [
          # Check that the prefix has sshd and sandbox-exec, then install keys
          "${pkgs.writeShellScript "darling-builder-init" ''
            darling shell test -x /usr/sbin/sshd || exit 1
            darling shell test -x /usr/bin/sandbox-exec || exit 1

            # Set up SSH authorized keys
            darling shell mkdir -p /var/root/.ssh
            cat ${cfg.sshKeyPath}.pub | darling shell tee /var/root/.ssh/authorized_keys > /dev/null
            darling shell chmod 600 /var/root/.ssh/authorized_keys

            # Generate host keys if needed
            darling shell test -f /etc/ssh/ssh_host_ed25519_key || darling shell ssh-keygen -A

            ${optionalString cfg.shareStore ''
              # Symlink /nix to host's /nix via /Volumes/SystemRoot
              darling shell ln -sf /Volumes/SystemRoot/nix /nix 2>/dev/null || true
            ''}
          ''}"
        ];
        ExecStart = "${pkgs.darling}/bin/darling shell /usr/sbin/sshd -D -f /etc/ssh/sshd_config -p ${toString cfg.port}";
        Restart = "on-failure";
        RestartSec = 5;
      };
    };

    # Register as a Nix remote builder
    nix.buildMachines = [{
      hostName = "darling-builder";  # SSH alias for 127.0.0.1:${port}, defined above
      systems = [ "x86_64-darwin" ];
      sshUser = "root";
      sshKey = cfg.sshKeyPath;
      maxJobs = cfg.maxJobs;
      speedFactor = cfg.speedFactor;
      supportedFeatures = [ ];
mandatoryFeatures = [ ]; 425 + }]; 426 + 427 + nix.distributedBuilds = true; 428 + }; 429 + } 430 + ``` 431 + 432 + **Usage** (in a NixOS configuration): 433 + 434 + ```nix 435 + { 436 + imports = [ ./path/to/darling-nix/nixosModules/darling-builder.nix ]; 437 + 438 + services.darling-builder = { 439 + enable = true; 440 + maxJobs = 8; 441 + shareStore = true; 442 + }; 443 + } 444 + ``` 445 + 446 + After `nixos-rebuild switch`, the user can immediately build Darwin packages: 447 + 448 + ```bash 449 + nix build nixpkgs#hello --system x86_64-darwin 450 + ``` 451 + 452 + --- 453 + 454 + ### 7.6 — Test Top Nixpkgs Packages 455 + 456 + Once the builder is operational, systematically test building the most 457 + commonly-used `x86_64-darwin` packages from Nixpkgs. 458 + 459 + **Tier 1 — Must pass** (fetch from binary cache, minimal building): 460 + 461 + | Package | Why It Matters | 462 + |---|---| 463 + | `hello` | Simplest C program; validates full stdenv pipeline | 464 + | `which` | Trivial utility; shell script install | 465 + | `coreutils` | Foundation of every build; exercises many syscalls | 466 + | `bash` | Builder shell; must work perfectly | 467 + | `gnugrep` | Used in stdenv setup scripts | 468 + | `gnused` | Used in stdenv setup scripts | 469 + | `gawk` | Used in stdenv setup scripts | 470 + 471 + **Tier 2 — Should pass** (moderate complexity): 472 + 473 + | Package | Why It Matters | 474 + |---|---| 475 + | `curl` | Needed for fetching; exercises TLS + network | 476 + | `git` | Needed for `fetchgit` in derivations | 477 + | `python3` | Common build dependency; complex build | 478 + | `jq` | Used in many CI scripts | 479 + | `openssl` | Crypto library; exercises many low-level APIs | 480 + | `pkg-config` | Build tool; should be straightforward | 481 + | `cmake` | Build tool; complex but well-tested | 482 + 483 + **Tier 3 — Stretch** (complex, many dependencies): 484 + 485 + | Package | Why It Matters | 486 + |---|---| 487 + | `nodejs` | Large build; 
JavaScript ecosystem foundation | 488 + | `go` | Self-hosting compiler; stresses the runtime | 489 + | `rustc` | Very large build; exercises many syscalls | 490 + | `llvm` | Compiler infrastructure; tests C++ heavily | 491 + | `ghc` | Haskell compiler; extremely complex build | 492 + 493 + **Tracking**: Use the compatibility matrix from [Phase 6](./08-phase6-ci.md) 494 + (task 6.5) to track pass/fail rates. Run this as a nightly CI job and publish 495 + results to a dashboard or markdown file in the repo. 496 + 497 + **When something fails**: For each failure: 498 + 499 + 1. Capture the full build log. 500 + 2. Identify the first error (often buried under cascading failures). 501 + 3. Determine if it's a syscall issue (→ Phase 1), a sandbox issue (→ Phase 2), 502 + a coreutils issue (→ Phase 4.6), or a new category. 503 + 4. File an issue with the `[compat]` label. 504 + 5. Add it to `plan/syscall-triage.md` if it's a new syscall. 505 + 506 + --- 507 + 508 + ### 7.7 — Documentation and Templates 509 + 510 + Create user-facing documentation so others can set up their own Darling builders. 511 + 512 + **Deliverables**: 513 + 514 + 1. **NixOS wiki page**: Step-by-step guide for setting up a Darling-based Darwin 515 + builder on NixOS. Cover both the NixOS module approach and the manual setup. 516 + 517 + 2. **Flake template** (`templates/darling-builder`): 518 + ```bash 519 + nix flake init -t github:user/darling-nix#darling-builder 520 + ``` 521 + Generates a minimal `flake.nix` + NixOS configuration that sets up the 522 + builder. 523 + 524 + 3. **Troubleshooting guide**: Common issues and their solutions: 525 + - "Connection refused" → sshd not running or wrong port 526 + - "Permission denied" → SSH key mismatch 527 + - "Build failed with signal 11" → unimplemented syscall → file an issue 528 + - "Store path not valid" → shared store database mismatch 529 + - "builder for '...' failed with exit code 1" → check the build log 530 + 531 + 4. 
**Performance tuning guide**: Tips for getting the best performance:
   - Use binary substitution aggressively (`substituters` in `nix.conf`)
   - Set `max-jobs` based on available CPU cores
   - Use `--cores N` to limit per-build parallelism
   - Enable store sharing to avoid copy overhead
   - Put the Nix store on fast storage (SSD/NVMe)

---

## Security Considerations

Running sshd inside Darling on `127.0.0.1:2222` is relatively safe, provided the
following properties actually hold in the deployed configuration:

- **Loopback only**: The SSH server only listens on localhost. It's not reachable
  from the network.
- **Key-based auth only**: Password authentication is disabled. Only the specific
  key generated for the builder can connect.
- **Contained environment**: The Darling prefix is isolated from the host via
  namespaces. Even if an attacker gains access to the Darling sshd, they're
  inside a container with limited host access. The isolation is weaker than in
  a typical container, though, because the host filesystem is visible inside
  the prefix at `/Volumes/SystemRoot`.
- **Shared store risk**: If `/nix/store` is shared, a compromised builder could
  write malicious store paths. Mitigate by:
  - Only sharing the store read-only from the host side.
  - Using Nix's content-addressing to verify outputs.
  - Running the Darling builder with minimal host capabilities.

For production use, consider running the Darling builder inside an additional
isolation layer (systemd-nspawn, a VM, or a dedicated user namespace) as defense
in depth against container escapes.
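
The loopback-only and key-only properties listed above depend on how sshd inside
the prefix is configured; the module does not enforce them on its own. The
following is a minimal sketch of a hardened configuration, not a tested setup.
The helper name `render_sshd_config` is hypothetical, and it assumes that
Darling's sshd reads `/etc/ssh/sshd_config` and accepts these standard OpenSSH
directives.

```shell
#!/usr/bin/env bash
# Sketch: print a hardened sshd_config for the Darling builder.
# render_sshd_config is a hypothetical helper, not part of Darling or Nix.
set -euo pipefail

render_sshd_config() {
  local port="${1:-2222}"   # default matches the module's default port
  cat <<EOF
Port ${port}
ListenAddress 127.0.0.1
PasswordAuthentication no
PubkeyAuthentication yes
PermitRootLogin prohibit-password
EOF
}

render_sshd_config 2222
```

The output could be piped into the prefix, for example with
`darling shell tee /etc/ssh/sshd_config`, before the service starts. A
`ListenAddress` without a port keeps the `-p` flag used in `ExecStart`
effective.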
560 + 561 + --- 562 + 563 + ## Performance Expectations 564 + 565 + With store sharing enabled: 566 + 567 + | Operation | Expected Overhead vs Native macOS | 568 + |---|---| 569 + | Binary substitution | ~1.2–1.5× (NAR unpack syscall overhead) | 570 + | Nix evaluation | ~2–5× (CPU-bound, translation overhead) | 571 + | C compilation (clang) | ~3–8× (many syscalls, process spawning) | 572 + | Linking (ld64) | ~2–4× (I/O bound, moderate syscall count) | 573 + | Full `hello` build | ~3–5× (mostly substitution + simple compile) | 574 + | Full `python3` build | ~5–10× (complex build, many phases) | 575 + 576 + Without store sharing (SSH copy): 577 + 578 + - Add ~30 seconds per 100 MB of closure for each copy direction. 579 + - A typical stdenv closure is ~500 MB, so expect ~2.5 minutes overhead per build 580 + just for copying. 581 + 582 + **Recommendation**: Always enable store sharing for local Darling builders. SSH 583 + copy mode is only useful for remote machines running Darling (future work). 584 + 585 + --- 586 + 587 + ## Verification Checklist 588 + 589 + After completing Phase 7, ALL of the following must pass: 590 + 591 + - [ ] `sshd` runs inside Darling and accepts SSH connections from the host 592 + - [ ] `ssh -p 2222 root@127.0.0.1 nix --version` returns a Nix version string 593 + - [ ] Host's `nix.buildMachines` includes the Darling builder 594 + - [ ] `nix build --expr '...' 
--system x86_64-darwin` offloads to the Darling builder 595 + - [ ] Build log is streamed back to the host in real time 596 + - [ ] Build output is available in the host's `/nix/store` after completion 597 + - [ ] `/nix/store` is shared (no SSH copy overhead) when `shareStore = true` 598 + - [ ] `nix build nixpkgs#hello --system x86_64-darwin` succeeds (Tier 1 package) 599 + - [ ] At least 5/7 Tier 1 packages build successfully 600 + - [ ] At least 3/7 Tier 2 packages build successfully 601 + - [ ] The NixOS module (`services.darling-builder`) works end-to-end 602 + - [ ] Documentation exists for manual and module-based setup 603 + 604 + --- 605 + 606 + ## What This Enables 607 + 608 + Once Phase 7 is working, any NixOS user can: 609 + 610 + ```nix 611 + # flake.nix 612 + { 613 + outputs = { self, nixpkgs }: { 614 + packages.x86_64-darwin.myApp = nixpkgs.legacyPackages.x86_64-darwin.callPackage ./. {}; 615 + }; 616 + } 617 + ``` 618 + 619 + ```bash 620 + # Build a Darwin package on a Linux machine 621 + nix build .#packages.x86_64-darwin.myApp 622 + ``` 623 + 624 + This is the same workflow they'd use with a real macOS remote builder, but 625 + without needing Apple hardware. The Darling builder is transparent to the user — 626 + they don't need to know or care that it's running inside a compatibility layer. 627 + 628 + --- 629 + 630 + *[← Phase 6 — CI & Testing](./08-phase6-ci.md) | [Phase 8 — Stretch Goals →](./10-phase8-stretch.md)*
+412
plan/10-phase8-stretch.md
··· 1 + # Phase 8 — Long-Term / Stretch Goals 2 + 3 + **Priority**: P3 · **Effort**: XL (months–years) · **Depends on**: Phase 7 (remote builder) 4 + 5 + These are aspirational items that would make the Darling+Nix story truly 6 + compelling but are not required for basic functionality. Each is a significant 7 + project in its own right. They're documented here to provide direction for 8 + future contributors and to ensure the earlier phases don't make architectural 9 + decisions that would preclude these goals. 10 + 11 + --- 12 + 13 + ## 8.1 — `aarch64-darwin` Support 14 + 15 + **What**: Build and test Apple Silicon (`aarch64-darwin`) packages on Linux. 16 + 17 + **Why it matters**: Apple has fully transitioned to ARM. The majority of macOS 18 + users now run Apple Silicon. Nixpkgs' `aarch64-darwin` support is growing 19 + rapidly, but CI coverage is limited by hardware availability. 20 + 21 + **Current state**: Darling only supports `x86_64`. The entire codebase — 22 + darlingserver's syscall translation, the Mach-O loader, dyld, and all the Darwin 23 + libraries — is x86_64-only. 24 + 25 + **Approach options**: 26 + 27 + | Option | Complexity | Performance | Notes | 28 + |---|---|---|---| 29 + | QEMU user-mode emulation | Medium | Slow (~10–50×) | Translate AArch64 instructions to x86_64; `qemu-aarch64` already exists but doesn't handle Mach-O | 30 + | Full AArch64 Darling port | Very High | Near-native on aarch64-linux | Requires porting all of darlingserver, dyld, and libSystem to AArch64 | 31 + | Rosetta-like translation | Extremely High | Fast (~1.5–3×) | AOT binary translation from AArch64 Mach-O to x86_64 ELF; research-grade effort | 32 + | FEX-Emu integration | High | Moderate (~3–8×) | FEX-Emu handles x86_64→AArch64 translation; combine with Darling for Mach-O→ELF on AArch64 Linux hosts | 33 + 34 + **Recommended path**: Start with QEMU user-mode for correctness testing (not 35 + performance). 
A Darling-aware QEMU wrapper that loads Mach-O binaries and 36 + translates Darwin syscalls via darlingserver, with AArch64 instruction emulation 37 + handled by QEMU. 38 + 39 + Long-term, a native AArch64 port of Darling is the right answer if the project 40 + gains enough contributors. 41 + 42 + **Prerequisites**: 43 + - All Phase 1–7 work must be solid on x86_64 first. 44 + - Darlingserver's architecture must be cleanly separated from x86_64 specifics 45 + (register mapping, calling conventions, instruction patching). 46 + - The Mach-O loader must handle `arm64` and `arm64e` slices. 47 + 48 + **Effort**: 6–18 months for QEMU approach; years for native port. 49 + 50 + --- 51 + 52 + ## 8.2 — GUI Application Testing 53 + 54 + **What**: Run macOS GUI applications inside Darling on Linux with enough fidelity 55 + for automated screenshot-based testing. 56 + 57 + **Why it matters**: Many Nixpkgs Darwin packages include GUI components (e.g., 58 + Emacs with Cocoa frontend, various `.app` bundles). Currently there's no way to 59 + test these on Linux. 60 + 61 + **Current state**: Darling has partial Cocoa/AppKit support via 62 + [Cocotron](https://github.com/darlinghq/darling-cocotron), which translates 63 + Cocoa drawing calls to X11. Basic windows can be created but most applications 64 + crash or render incorrectly. 65 + 66 + **Approach**: 67 + 68 + 1. **Headless rendering**: Run Darling with a virtual X11 server (`Xvfb`) or 69 + Wayland compositor (`wlheadless`). Cocotron renders to the virtual display. 70 + 71 + 2. **Screenshot capture**: After launching an app, capture the framebuffer and 72 + compare against reference screenshots using image comparison tools (e.g., 73 + `perceptualdiff`, `pixelmatch`). 74 + 75 + 3. **Accessibility-based testing**: If Darling implements enough of the 76 + Accessibility framework, use it for UI testing without screenshots (more 77 + robust to rendering differences). 
78 + 79 + **Example test workflow**: 80 + 81 + ```bash 82 + # Start Xvfb 83 + Xvfb :99 -screen 0 1920x1080x24 & 84 + export DISPLAY=:99 85 + 86 + # Launch a Cocoa app inside Darling 87 + darling shell open -a /Applications/TextEdit.app & 88 + 89 + # Wait for window to appear 90 + sleep 5 91 + 92 + # Capture screenshot 93 + import -window root /tmp/screenshot.png 94 + 95 + # Compare against reference 96 + perceptualdiff /tmp/screenshot.png tests/references/textedit.png 97 + ``` 98 + 99 + **Blockers**: 100 + - Cocotron's X11 backend needs significant work for modern Cocoa APIs. 101 + - Core Animation, Metal, and modern AppKit features are unimplemented. 102 + - Font rendering differences between macOS (Core Text) and Linux (FreeType) will 103 + cause pixel-level mismatches — need fuzzy comparison. 104 + 105 + **Effort**: 3–12 months for basic "does the window open and look roughly right" 106 + testing. Much longer for full GUI fidelity. 107 + 108 + --- 109 + 110 + ## 8.3 — Nix Flake Integration Library 111 + 112 + **What**: A Nix library function (`buildDarwinWithDarling`) that lets any flake 113 + build Darwin packages using Darling, without the user needing to set up a remote 114 + builder. 115 + 116 + **Why it matters**: The remote builder approach (Phase 7) requires system-level 117 + NixOS configuration. A flake-level library would make Darwin-on-Linux accessible 118 + to any Nix user, even those not running NixOS. 119 + 120 + **Design**: 121 + 122 + ```nix 123 + # In any project's flake.nix: 124 + { 125 + inputs = { 126 + nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable"; 127 + darling-nix.url = "github:user/darling-nix"; 128 + }; 129 + 130 + outputs = { self, nixpkgs, darling-nix }: { 131 + packages.x86_64-linux.hello-darwin = 132 + darling-nix.lib.buildDarwinWithDarling { 133 + inherit nixpkgs; 134 + # Standard mkDerivation arguments: 135 + pname = "hello"; 136 + version = "2.12.1"; 137 + src = ./. 
; 138 + buildInputs = [ ]; 139 + # Darling-specific options: 140 + darlingPrefix = "~/.darling"; # optional 141 + shareStore = true; # optional 142 + }; 143 + }; 144 + } 145 + ``` 146 + 147 + **Implementation sketch**: 148 + 149 + The `buildDarwinWithDarling` function would: 150 + 151 + 1. Build the derivation specification (`.drv` file) using Nixpkgs' Darwin stdenv. 152 + 2. Wrap the build invocation in a `darling shell` call. 153 + 3. Handle store path management (shared or copied). 154 + 4. Return the output path as a normal Nix derivation result. 155 + 156 + **Challenges**: 157 + 158 + - This requires Darling to be runnable inside a Nix sandbox (needs namespace 159 + capabilities). May need `__noChroot = true` or a fixed-output derivation 160 + wrapper. 161 + - Must handle the bootstrap problem: the Darling binary itself needs to be built 162 + for Linux before it can be used to build Darwin packages. 163 + - Nix's build sandbox on Linux may conflict with Darling's namespace usage. 164 + 165 + **Alternative**: Instead of embedding Darling in the build, provide a flake that 166 + sets up the remote builder and let users `nix build --system x86_64-darwin` as 167 + usual. This is simpler and avoids the sandbox-within-sandbox issues. 168 + 169 + **Effort**: 2–4 months for the library; ongoing maintenance as Nixpkgs evolves. 170 + 171 + --- 172 + 173 + ## 8.4 — Upstream Contributions 174 + 175 + **What**: Push all syscall fixes, sandbox stubs, and compatibility improvements 176 + back to the [upstream Darling project](https://github.com/darlinghq/darling). 177 + 178 + **Why it matters**: Maintaining a fork is expensive. Upstream contributions 179 + benefit the entire Darling community and reduce our maintenance burden. 180 + 181 + **Strategy**: 182 + 183 + 1. **Keep changes modular**: Each syscall fix should be a self-contained commit 184 + with a clear description and test case. This makes upstream review easier. 185 + 186 + 2. 
**Separate Nix-specific changes**: Things like the `sandbox-exec` stub, 187 + Directory Services stubs, and the NixOS module should be kept in our fork/ 188 + overlay. They're useful for the Nix use case but may not align with 189 + upstream's goals. 190 + 191 + 3. **Coordinate with upstream**: Open issues/discussions on the Darling GitHub 192 + before submitting large changes. The Darling team may have opinions on 193 + implementation approaches (e.g., they may prefer a different `setattrlist` 194 + implementation than what we propose). 195 + 196 + 4. **Contribute tests**: Upstream Darling has minimal testing. Contributing our 197 + syscall regression tests (Phase 6.4) would be valuable even without the 198 + corresponding fixes. 199 + 200 + **Candidates for upstreaming**: 201 + 202 + | Change | Upstream Value | Nix-Specific? | 203 + |---|---|---| 204 + | `setattrlist` / `fsetattrlist` implementation | High — many programs need this | No | 205 + | `renameatx_np` (syscall 488) implementation | High — modern coreutils need this | No | 206 + | `utimensat` fixes | High — affects `touch` and many tools | No | 207 + | `clonefile` stub (returns `ENOTSUP`) | Medium — graceful degradation | No | 208 + | `getentropy` mapping to `getrandom` | Medium — security-related programs need this | No | 209 + | macOS version bump (10.15 → 11.0) | High — unblocks modern software | No | 210 + | `sandbox-exec` stub | Medium — useful but opinionated | Somewhat | 211 + | `sandbox_init` errorbuf fix | Low — cosmetic | No | 212 + | Directory Services stubs | Low — very Nix-specific | Yes | 213 + | NixOS module | None — Nix ecosystem only | Yes | 214 + 215 + **Effort**: Ongoing; each upstream PR takes 1–4 weeks including review cycles. 216 + 217 + --- 218 + 219 + ## 8.5 — macOS SDK Management 220 + 221 + **What**: Automate downloading, unpacking, and managing macOS SDKs inside the 222 + Darling prefix via Nix derivations. 

**Why it matters**: Building Darwin software requires Apple's SDK headers and
frameworks. Currently, users must manually download Xcode or the Command Line
Tools and install them. This is a friction point and a licensing grey area.

**Approach**:

1. **Use Nixpkgs' existing SDK infrastructure**: Nixpkgs already packages macOS
   SDKs (e.g., `apple-sdk_15`, `apple-sdk_14`). These are available as Nix
   derivations and can be installed into the Darling prefix.

2. **Automatic SDK installation**: The Darling builder setup (Phase 7 NixOS
   module) should automatically install the appropriate SDK into the prefix,
   via a proposed `sdk` option that the module from task 7.5 does not define
   yet:
   ```nix
   services.darling-builder.sdk = pkgs.apple-sdk_15;
   ```

3. **SDK version selection**: Allow users to choose which SDK version to use.
   Different Nixpkgs branches may require different SDK versions.

**Licensing considerations**:

- Apple's Xcode license allows use on Apple hardware. Using Apple's SDK headers
  on Linux (via Darling) is a legal grey area.
- Nixpkgs' SDK packages contain only headers and `.tbd` stub files (not actual
  binaries), which may be covered by fair use for interoperability purposes.
- Darling itself ships significant Apple-derived open-source code under APSL.
- **Recommendation**: Document the licensing situation clearly. Do not distribute
  Apple proprietary binaries. Use open-source headers where possible and let
  users supply their own SDK if needed.

**Effort**: 2–4 weeks for the Nix integration; legal review is separate.

---

## 8.6 — Binary Cache for `x86_64-darwin`

**What**: Run a Darling-based build farm (Hydra, Garnix, or custom) that
continuously builds `x86_64-darwin` packages and populates a public binary
cache.
263 + 264 + **Why it matters**: If Darling can reliably build Darwin packages, we can provide 265 + a community binary cache that eliminates the need for Apple hardware for most 266 + users. Even partial coverage (the top 1000 most-used packages) would be 267 + enormously valuable. 268 + 269 + **Architecture**: 270 + 271 + ``` 272 + ┌──────────────────────────────────────────────┐ 273 + │ Hydra / Build Coordinator (NixOS) │ 274 + │ jobset: nixpkgs x86_64-darwin │ 275 + │ │ 276 + │ ┌────────────────────────────────────────┐ │ 277 + │ │ Builder 1: NixOS + Darling │ │ 278 + │ │ services.darling-builder.enable │ │ 279 + │ │ maxJobs = 8 │ │ 280 + │ └────────────────────────────────────────┘ │ 281 + │ ┌────────────────────────────────────────┐ │ 282 + │ │ Builder 2: NixOS + Darling │ │ 283 + │ │ ... │ │ 284 + │ └────────────────────────────────────────┘ │ 285 + │ │ 286 + │ → pushes NARs to: darling-cache.example.org │ 287 + └──────────────────────────────────────────────┘ 288 + ``` 289 + 290 + **Users add the cache**: 291 + 292 + ```nix 293 + nix.settings = { 294 + substituters = [ "https://darling-cache.example.org" ]; 295 + trusted-public-keys = [ "darling-cache.example.org-1:AAAA..." ]; 296 + }; 297 + ``` 298 + 299 + **Challenges**: 300 + 301 + - **Reproducibility**: Builds inside Darling may not produce bit-for-bit 302 + identical outputs to builds on real macOS. This means the cache serves 303 + "Darling-built" packages that might differ from the official `cache.nixos.org` 304 + Darwin packages. Users need to understand this. 305 + 306 + - **Coverage**: Not all packages will build successfully inside Darling. The 307 + cache must gracefully handle partial coverage — users fall back to building 308 + locally (or on real macOS) for packages that aren't cached. 309 + 310 + - **Maintenance**: A build farm requires ongoing infrastructure maintenance, 311 + monitoring, and storage management. 312 + 313 + - **Trust**: Users must trust the cache operator. 
Use Nix's content-addressing 314 + and signing to provide integrity guarantees. 315 + 316 + **Effort**: 1–3 months to set up the infrastructure; ongoing maintenance. 317 + 318 + --- 319 + 320 + ## 8.7 — Build Reproducibility Verification 321 + 322 + **What**: Ensure that derivation outputs built inside Darling are as close to 323 + bit-for-bit identical as possible to those built on real macOS. 324 + 325 + **Why it matters**: If Darling-built packages differ from real macOS-built 326 + packages, it undermines the value of the compatibility layer. Ideally, a package 327 + built inside Darling should be indistinguishable from one built on real macOS. 328 + 329 + **Approach**: 330 + 331 + 1. **Identify sources of non-determinism**: 332 + - Timestamps embedded in binaries (Mach-O headers, `__DATA` segments). 333 + - Hostname / username embedded in build artifacts. 334 + - Random data (UUIDs, build IDs) that differ between builds. 335 + - Filesystem ordering differences (`readdir` order). 336 + - Floating-point rounding differences (unlikely but possible if Darling's FPU 337 + emulation differs). 338 + 339 + 2. **Compare build outputs**: 340 + ```bash 341 + # Build on real macOS 342 + real_output=$(nix-build '<nixpkgs>' -A hello --system x86_64-darwin) 343 + 344 + # Build inside Darling 345 + darling_output=$(darling-nix nix-build '<nixpkgs>' -A hello --system x86_64-darwin) 346 + 347 + # Compare 348 + diffoscope "$real_output" "$darling_output" 349 + ``` 350 + 351 + 3. **Fix divergences**: For each difference, determine if it's a Darling bug 352 + (fix it) or inherent non-determinism (document it). 353 + 354 + 4. **Content-addressed derivations**: Nix's experimental content-addressed (CA) 355 + derivation mode hashes outputs by content rather than by input. This means 356 + two builds that produce identical content (regardless of where they were 357 + built) share the same store path. 
Push for CA derivation support to make 358 + Darling-built and macOS-built outputs interchangeable. 359 + 360 + **Effort**: Ongoing; this is a continuous improvement process rather than a 361 + one-time task. 362 + 363 + --- 364 + 365 + ## 8.8 — Container / VM Image Distribution 366 + 367 + **What**: Distribute pre-configured Darling+Nix environments as OCI container 368 + images or VM images for easy adoption. 369 + 370 + **Why it matters**: Not everyone uses NixOS. A Docker/Podman image or a QEMU VM 371 + image with Darling+Nix pre-installed would make Darwin-on-Linux accessible to 372 + the broader developer community. 373 + 374 + **Deliverables**: 375 + 376 + 1. **OCI image** (`ghcr.io/user/darling-nix:latest`): 377 + ```dockerfile 378 + FROM nixos/nix:latest 379 + RUN nix build github:user/darling-nix#darling 380 + RUN /path/to/scripts/install-nix-in-darling.sh 381 + ENTRYPOINT ["darling-nix"] 382 + ``` 383 + Requires: Docker-in-Docker or privileged mode for namespaces. 384 + 385 + 2. **NixOS VM image**: A QEMU qcow2 image built with `nixos-generators` that 386 + includes the `darling-builder` NixOS module pre-configured. 387 + 388 + 3. **GitHub Codespaces / Gitpod integration**: A `.devcontainer.json` that sets 389 + up a development environment with Darling+Nix for cloud-based development. 390 + 391 + **Effort**: 2–4 weeks per distribution format. 392 + 393 + --- 394 + 395 + ## Summary: Stretch Goal Prioritization 396 + 397 + If resources allow work beyond Phase 7, prioritize in this order: 398 + 399 + 1. **8.4 — Upstream contributions**: Lowest effort, highest community value. 400 + 2. **8.5 — SDK management**: Directly improves usability of Phases 4–7. 401 + 3. **8.7 — Reproducibility**: Builds confidence in Darling-built packages. 402 + 4. **8.6 — Binary cache**: High value but requires infrastructure commitment. 403 + 5. **8.3 — Flake library**: Nice developer experience but requires solving hard 404 + sandbox-in-sandbox problems. 405 + 6. 
**8.8 — Container images**: Broadens the audience beyond NixOS users. 406 + 7. **8.2 — GUI testing**: Niche but uniquely valuable; depends on Cocotron 407 + maturity. 408 + 8. **8.1 — `aarch64-darwin`**: Most impactful long-term, but the most work. 409 + 410 + --- 411 + 412 + *[← Phase 7 — Remote Builder](./09-phase7-remote-builder.md) | [Architecture →](./11-architecture.md)*
+403
plan/11-architecture.md
··· 1 + # Architecture & Key Technical Decisions 2 + 3 + This document describes the high-level system architecture and records the 4 + rationale behind major technical decisions. It serves as a reference for 5 + contributors who need to understand *why* things are designed the way they are, 6 + not just *what* to build. 7 + 8 + --- 9 + 10 + ## System Architecture 11 + 12 + ### Overview 13 + 14 + ``` 15 + ┌──────────────────────────────────────────────────────────────────────┐ 16 + │ Linux Host (NixOS) │ 17 + │ │ 18 + │ ┌────────────────────────────────────────────────────────────────┐ │ 19 + │ │ Host Nix Daemon │ │ 20 + │ │ ┌──────────────────────────────────────────────────────────┐ │ │ 21 + │ │ │ nix.buildMachines = [{ │ │ │ 22 + │ │ │ hostName = "127.0.0.1"; port = 2222; │ │ │ 23 + │ │ │ systems = ["x86_64-darwin"]; │ │ │ 24 + │ │ │ }] │ │ │ 25 + │ │ └────────────────────┬─────────────────────────────────────┘ │ │ 26 + │ └───────────────────────┼────────────────────────────────────────┘ │ 27 + │ │ SSH / darling-exec │ 28 + │ ┌───────────────────────▼────────────────────────────────────────┐ │ 29 + │ │ Darling Container (overlayfs prefix at ~/.darling) │ │ 30 + │ │ │ │ 31 + │ │ ┌──────────────────────────────────────────────────────────┐ │ │ 32 + │ │ │ darlingserver │ │ │ 33 + │ │ │ • Translates Darwin/XNU syscalls → Linux syscalls │ │ │ 34 + │ │ │ • Manages Mach-O loading via mldr + dyld │ │ │ 35 + │ │ │ • Provides namespace isolation (mount, PID, user) │ │ │ 36 + │ │ └──────────────────────────────────────────────────────────┘ │ │ 37 + │ │ │ │ 38 + │ │ ┌─────────────────┐ ┌──────────────────────────────────┐ │ │ 39 + │ │ │ Darwin Userland │ │ Nix (Darwin build) │ │ │ 40 + │ │ │ • dyld │ │ • nix / nix-daemon │ │ │ 41 + │ │ │ • libSystem │ │ • nix-build / nix-store │ │ │ 42 + │ │ │ • libc │ │ • sandbox-exec stub │ │ │ 43 + │ │ │ • CoreFoundation│ │ • curl, bash, coreutils │ │ │ 44 + │ │ │ • libdispatch │ │ • clang, ld64 (from stdenv) │ │ │ 45 + │ │ │ • Obj-C 
runtime │ │ • Darwin stdenv build machinery │ │ │ 46 + │ │ └─────────────────┘ └──────────────────────────────────┘ │ │ 47 + │ │ │ │ 48 + │ │ /nix/store ──symlink──▶ /Volumes/SystemRoot/nix/store │ │ 49 + │ │ /dev, /proc ──mount──▶ host kernel interfaces │ │ 50 + │ └────────────────────────────────────────────────────────────────┘ │ 51 + │ │ 52 + │ /nix/store (shared filesystem — single source of truth) │ 53 + │ │ 54 + └──────────────────────────────────────────────────────────────────────┘ 55 + ``` 56 + 57 + ### Component Responsibilities 58 + 59 + | Component | Role | Location | 60 + |---|---|---| 61 + | **darlingserver** | Userspace syscall translator. Intercepts Mach/BSD traps from Darwin binaries and translates them to Linux equivalents. Manages the container namespace. | `src/external/darlingserver/` (submodule) | 62 + | **mldr** | Mach-O loader. Loads Darwin Mach-O executables on Linux, sets up the process image, and hands off to `dyld`. | `src/startup/mldr/` | 63 + | **dyld** | Apple's dynamic linker. Resolves `@rpath`, `@loader_path`, loads `.dylib` dependencies. Runs inside the translated environment. | `src/external/dyld/` | 64 + | **libSystem / libc** | Darwin's standard C library. Provides POSIX wrappers (`lchflags`, `setattrlist`, `posix_spawn`, etc.) that ultimately invoke darlingserver-translated syscalls. | `src/external/libc/`, `src/external/libsystem/` | 65 + | **sandbox-exec stub** | Passthrough shim replacing Apple's `sandbox-exec`. Ignores sandbox profiles and directly `exec`s the builder command. | `src/sandbox-exec/` (to be created, Phase 2) | 66 + | **Nix (Darwin)** | The Nix package manager compiled for `x86_64-darwin`, fetched from the official binary cache. Runs inside Darling as a Darwin process. | `/nix/store/...-nix-*/` inside the prefix | 67 + | **Host Nix Daemon** | The Linux-native Nix daemon that orchestrates builds. Offloads `x86_64-darwin` builds to the Darling instance via SSH or a custom build hook.
| Standard NixOS `nix-daemon.service` | 68 + | **Darling prefix** | An overlayfs-backed directory (`~/.darling`) that provides a macOS-like filesystem hierarchy. System files are read-only from the Darling installation; user/build files are writable in the upper layer. | `~/.darling/` (runtime) | 69 + 70 + ### Data Flow: Building a Darwin Derivation 71 + 72 + ``` 73 + 1. User: nix build .#myPkg --system x86_64-darwin 74 + 75 + 2. Host Nix Daemon: Identifies x86_64-darwin → selects Darling builder 76 + 77 + 3. SSH transport: Connects to sshd inside Darling (port 2222) 78 + 79 + 4. Darling nix-daemon: Receives build request 80 + 81 + 5. Nix build setup: Creates /tmp/nix-build-myPkg.drv-0/ 82 + Writes .sandbox.sb profile 83 + 84 + 6. sandbox-exec stub: Ignores profile, exec's /bin/bash 85 + 86 + 7. bash builder: Sources $stdenv/setup 87 + Runs unpack → configure → build → install → fixup 88 + 89 + 8. Syscall translation: Every Darwin syscall (open, stat, mmap, posix_spawn, 90 + lchflags, renameatx_np, ...) goes through darlingserver 91 + and becomes the Linux equivalent 92 + 93 + 9. Build output: Written to /nix/store/...-myPkg 94 + 95 + 10. Store registration: nix-daemon registers the path in SQLite 96 + 97 + 11. Shared store: Output is immediately visible to the host 98 + (shared /nix/store via bind mount / symlink) 99 + 100 + 12. Host Nix Daemon: Marks the build as complete, returns result to user 101 + ``` 102 + 103 + --- 104 + 105 + ## Key Technical Decisions 106 + 107 + ### Decision 1: Syscall Implementation Depth 108 + 109 + **Decision**: Implement syscalls to the minimum depth required for Nix, not for 110 + general macOS compatibility. 111 + 112 + **Rationale**: Full macOS API coverage is a multi-year effort (and the upstream 113 + Darling project's ongoing goal). We should be surgical about what we implement. 114 + For example: 115 + 116 + - `setattrlist` only needs to handle `ATTR_CMN_FLAGS` for clearing 117 + `UF_IMMUTABLE`. 
We don't need full Finder-info, resource-fork, or ACL 118 + support through this API. 119 + - `clonefile` can return `ENOTSUP` — Nix gracefully falls back to regular copy. 120 + - `sandbox_init` can return success with a NULL error buffer — Darling's 121 + namespace isolation is already sufficient. 122 + 123 + **Trade-off**: Some non-Nix Darwin programs may still fail. That's acceptable — 124 + this project's scope is Nix support, not universal macOS compatibility. 125 + 126 + **How this affects contributors**: When implementing a syscall, always check 127 + what the *caller* actually needs. Read the Nix source (or whatever Nix-ecosystem 128 + program is calling it) and implement only what's required to make that caller 129 + succeed. Document the scope of the implementation in code comments. 130 + 131 + --- 132 + 133 + ### Decision 2: Sandbox Strategy 134 + 135 + **Decision**: Start with a `sandbox-exec` stub that passes through to `exec`. 136 + Do NOT attempt to implement Apple's Sandbox Profile Language initially. 137 + 138 + **Rationale**: Nix's sandbox on Darwin is defense-in-depth. The macOS sandbox 139 + (`sandbox-exec` + `.sb` profiles) restricts file access, network access, and 140 + process operations during builds. Inside Darling, we already have: 141 + 142 + 1. **Linux namespace isolation**: The Darling container uses mount namespaces 143 + (overlayfs), PID namespaces, and optionally network namespaces. This provides 144 + equivalent-or-stronger isolation to macOS's sandbox for build purposes. 145 + 146 + 2. **Nix's own isolation**: Nix controls `$PATH`, `$HOME`, `$TMPDIR`, and other 147 + environment variables. The build environment is intentionally spartan. The 148 + macOS sandbox adds a second layer, but its absence doesn't fundamentally 149 + compromise build isolation. 150 + 151 + 3. **No untrusted code**: In the Nix builder context, the code being executed 152 + comes from derivations the user has chosen to build. 
The sandbox prevents 153 + accidental side effects (e.g., a build script writing to `/usr`), 154 + not malicious code execution. 155 + 156 + **When to revisit**: If Darling is ever used to run arbitrary untrusted macOS 157 + software (not just Nix builds), proper sandbox support becomes important. See 158 + [Phase 2, task 2.4](./04-phase2-sandbox.md#24--stretch-basic-sandbox-profile-language-parsing) 159 + for the stretch-goal design. 160 + 161 + --- 162 + 163 + ### Decision 3: Single-User vs. Multi-User Nix 164 + 165 + **Decision**: Target single-user mode first (Phase 3), add multi-user later 166 + (Phase 5). 167 + 168 + **Rationale**: Single-user mode has far fewer moving parts: 169 + 170 + | Aspect | Single-User | Multi-User | 171 + |---|---|---| 172 + | Daemon required | No | Yes | 173 + | Build users required | No | Yes (30+ users) | 174 + | Directory Services required | No | Yes (`dseditgroup`, `sysadminctl`) | 175 + | `launchd` integration | No | Yes | 176 + | `setuid` / privilege separation | No | Yes | 177 + | Concurrent builds from multiple users | No | Yes | 178 + | Suitable for development/testing | Yes | Yes | 179 + | Suitable for production builders | Maybe | Yes | 180 + 181 + Single-user mode is sufficient for the MVP (Phases 0–3) and for derivation building in Phase 4. It lets us validate 182 + that Nix works inside Darling without solving the much harder problems of user 183 + management and privilege separation inside a namespace-based container. 184 + 185 + --- 186 + 187 + ### Decision 4: Shared vs. Separate Nix Store 188 + 189 + **Decision**: Share the host's `/nix/store` with the Darling prefix via the 190 + existing `/Volumes/SystemRoot` mount. 191 + 192 + **Rationale**: 193 + 194 + - **Avoids duplicating store contents.** A typical Nix closure for building 195 + Darwin packages is 500 MB–2 GB. Duplicating this inside the Darling prefix 196 + wastes disk and slows down builds (SSH copy overhead).
197 + 198 + - **Host Nix daemon can manage garbage collection.** With a shared store, there's 199 + a single store to collect. Without sharing, the Darling-side store accumulates 200 + garbage that's invisible to the host's `nix-collect-garbage`. 201 + 202 + - **Darwin build outputs are immediately available on the host.** No need to 203 + copy results back after a build completes — the output is already in the 204 + shared `/nix/store`. 205 + 206 + **Implementation**: 207 + 208 + ``` 209 + # Inside the Darling prefix: 210 + /nix → /Volumes/SystemRoot/nix (symlink) 211 + → /nix/store (host's store, shared) 212 + → /nix/var (Darling-local state, NOT shared) 213 + ``` 214 + 215 + The store content (`/nix/store`) is shared, but the state 216 + (`/nix/var/nix/db/db.sqlite`, `/nix/var/nix/daemon-socket/`, etc.) is 217 + Darling-local. This prevents database conflicts between the host and Darling 218 + Nix instances. It also means that GC roots and path registrations, which live under `/nix/var`, are not shared: the host's garbage collector does not know about paths that only the Darling-side database has registered, so they must also be registered on the host before a host-side GC runs. 219 + 220 + **Caveat**: Darling's overlayfs may interfere with writes to the shared store. 221 + If so, use a direct bind mount (`mount --bind /nix/store ~/.darling/nix/store`) 222 + during prefix initialization, bypassing the overlayfs upper layer for the store 223 + directory. Test this during Phase 3. 224 + 225 + **Fallback**: If shared store causes issues (permission mismatches, locking 226 + conflicts, overlayfs quirks), fall back to a fully separate store inside the 227 + Darling prefix. This is simpler but slower (requires SSH-based store path 228 + transfer for the remote builder in Phase 7). 229 + 230 + --- 231 + 232 + ### Decision 5: macOS Version Target 233 + 234 + **Decision**: Target macOS 11.0 (Big Sur) as the emulated version. 235 + 236 + **Rationale**: 237 + 238 + - Darling's `CMakeLists.txt` already sets `CMAKE_OSX_DEPLOYMENT_TARGET` to 11.0. 239 + - Nixpkgs' Darwin stdenv targets macOS 11.0+ for `x86_64-darwin` builds.
240 + - The official Nix binary cache (`cache.nixos.org`) serves binaries built with 241 + `-mmacosx-version-min=11.0` or higher. 242 + - macOS 10.15 (Catalina, which Darling currently reports at runtime) is past 243 + end-of-life and increasingly unsupported by modern software. 244 + 245 + **What this means**: 246 + 247 + - `sw_vers` inside Darling should report `ProductVersion: 11.0`. 248 + - `__MAC_OS_X_VERSION_MIN_REQUIRED` should be `110000` (Big Sur). 249 + - Any `@available(macOS 11.0, *)` checks in Darling's libraries should evaluate 250 + to true. 251 + - APIs introduced in Big Sur (e.g., `os_log` improvements, certain 252 + `posix_spawn` attributes) should be available or gracefully stubbed. 253 + 254 + **Risk**: Bumping the version may expose new code paths in Darling's libraries 255 + that call unimplemented APIs. This is acceptable — it surfaces real issues rather 256 + than papering over them with an artificially old version number. 257 + 258 + --- 259 + 260 + ### Decision 6: CI Strategy 261 + 262 + **Decision**: Use NixOS VM tests as the primary CI mechanism, with lighter-weight 263 + build-only tests for fast feedback. 264 + 265 + **Rationale**: Darling requires Linux namespace support (user namespaces, 266 + overlayfs, mount namespaces) that isn't available inside a standard container or 267 + Nix build sandbox. NixOS VM tests provide a full Linux kernel, which guarantees 268 + the necessary capabilities. 269 + 270 + **Trade-off**: VM tests are slow (5–30 minutes). We mitigate this with: 271 + 272 + 1. A fast "build smoke test" that just builds Darling (no VM, runs in Nix 273 + sandbox). Catches compilation regressions in ~10 minutes. 274 + 2. Binary caching (Cachix) so that the Darling build itself is rarely rebuilt 275 + from scratch in CI. 276 + 3. Parallelised test jobs — the build test and VM test run concurrently. 277 + 4. Path-based CI triggers — documentation-only changes skip the VM test. 
278 + 279 + See [Phase 6](./08-phase6-ci.md) for full CI design. 280 + 281 + --- 282 + 283 + ### Decision 7: SSH vs. Custom Build Hook for Remote Builds 284 + 285 + **Decision**: Use SSH-based remote builds as the primary mechanism. A custom 286 + build hook is a secondary optimisation. 287 + 288 + **Rationale**: 289 + 290 + | Aspect | SSH Remote Builder | Custom Build Hook | 291 + |---|---|---| 292 + | Protocol maturity | Battle-tested, standard Nix feature | Custom, must handle edge cases | 293 + | Setup complexity | Moderate (sshd + keys) | Low (single script) | 294 + | Store transfer | Built-in (SSH or shared mount) | Must be implemented | 295 + | Build log streaming | Built-in | Must be implemented | 296 + | Error handling | Built-in | Must be implemented | 297 + | Nix version coupling | Low (protocol is stable) | High (hook interface can change) | 298 + 299 + SSH remote builds are the standard way to offload Nix builds to another machine. 300 + Even though Darling runs on the same host, treating it as a "remote" builder via 301 + SSH reuses all of Nix's existing remote-build infrastructure — derivation 302 + closure transfer, build log streaming, result retrieval, and error handling. 303 + 304 + The custom build hook (calling `darling shell` directly) avoids SSH overhead and 305 + is simpler to set up, but it requires reimplementing protocol details that SSH 306 + remote builds handle automatically. It's better suited as an optimisation after 307 + the SSH approach is proven. 308 + 309 + See [Phase 7](./09-phase7-remote-builder.md) for both approaches. 310 + 311 + --- 312 + 313 + ## Subsystem Map 314 + 315 + A quick reference for where to find things in the Darling source tree: 316 + 317 + ``` 318 + darling-nix/ 319 + ├── plan/ # This planning documentation 320 + ├── src/ 321 + │ ├── sandbox/ # sandbox_init, sandbox_check, etc. 
(stubs) 322 + │ │ └── sandbox.c # ← Fix errorbuf handling (Phase 2.2) 323 + │ ├── libsandbox/ # libsandbox.1.dylib shim 324 + │ ├── diskutil/ # diskutil shell script (eject only) 325 + │ ├── duct/src/ # Minimal shims (acl, dns_sd, os_log) 326 + │ ├── launchd/ # launchd + launchctl implementation 327 + │ │ ├── src/core.c # posix_spawn usage for job management 328 + │ │ └── support/launchctl.c # launchctl CLI, uses lchflags 329 + │ ├── external/ 330 + │ │ ├── darlingserver/ # ← Main syscall translation (submodule) 331 + │ │ ├── libc/ # Darwin libc (lchflags, setattrlist wrappers) 332 + │ │ ├── xnu/ # XNU kernel headers + libsystem_kernel 333 + │ │ │ └── darling/src/libsystem_kernel/ # BSD syscall stubs 334 + │ │ ├── dyld/ # Apple's dynamic linker 335 + │ │ ├── libsystem/ # libSystem umbrella library 336 + │ │ ├── corefoundation/ # CoreFoundation framework 337 + │ │ ├── foundation/ # Foundation framework (NSFileManager, etc.) 338 + │ │ ├── libdispatch/ # Grand Central Dispatch 339 + │ │ ├── objc4/runtime/ # Objective-C runtime 340 + │ │ ├── openssh/ # OpenSSH (sshd for remote builder) 341 + │ │ ├── bash/ # Darling's built-in bash 342 + │ │ ├── cctools-port/ # ld64, ar, ranlib (Apple linker tools) 343 + │ │ ├── swift/ # Swift runtime libraries 344 + │ │ └── ... # ~100 more submodules 345 + │ ├── native/ # Linux-native wrappers (wraps ELF libs for Darwin use) 346 + │ ├── frameworks/ # macOS frameworks (AppKit, CoreGraphics, etc.) 347 + │ └── private-frameworks/ # Private frameworks (Bom, etc.) 348 + ├── CMakeLists.txt # Top-level build configuration 349 + ├── .gitmodules # Submodule definitions (~100 entries) 350 + ├── .github/workflows/ # CI (currently Debian-only) 351 + └── tools/ # Build/install utilities 352 + ``` 353 + 354 + ### Where Syscall Changes Go 355 + 356 + ``` 357 + User code (e.g. 
Nix) calls lchflags() 358 + 359 + 360 + src/external/libc/ ← Darwin libc wrapper: translates to setattrlist() 361 + 362 + 363 + src/external/xnu/darling/src/libsystem_kernel/ ← BSD syscall stub: 364 + packages args into a trap 365 + 366 + 367 + src/external/darlingserver/ ← Handles the trap on the Linux side: 368 + translates setattrlist → ioctl/utimensat/etc. 369 + 370 + 371 + Linux kernel ← Actual filesystem operation 372 + ``` 373 + 374 + Understanding this call chain is essential for debugging. If `lchflags` fails: 375 + 376 + 1. Is the libc wrapper calling the right syscall number? → Check `src/external/libc/` 377 + 2. Is the syscall number wired in the kernel trap table? → Check `src/external/xnu/.../libsystem_kernel/` 378 + 3. Is darlingserver handling it? → Check `src/external/darlingserver/` 379 + 4. Is the Linux translation correct? → `strace` on the darlingserver process 380 + 381 + --- 382 + 383 + ## Glossary 384 + 385 + | Term | Meaning | 386 + |---|---| 387 + | **Darling prefix** (DPREFIX) | The overlayfs-backed directory (`~/.darling`) that provides the macOS filesystem hierarchy. Analogous to Wine's WINEPREFIX. | 388 + | **darlingserver** | The userspace process that handles Darwin syscall translation. Replaces the earlier LKM (Linux Kernel Module) approach. | 389 + | **mldr** | Mach-O loader — the ELF-side binary that loads a Mach-O executable and sets up the Darling execution environment. | 390 + | **dyld** | Apple's dynamic linker. Handles `@rpath`, `@loader_path`, and `.dylib` loading within the Darwin process. | 391 + | **Mach-O** | The executable format used by macOS (analogous to ELF on Linux). | 392 + | **libSystem** | macOS's umbrella system library (analogous to `libc.so` on Linux but includes more). Contains libc, libm, libpthread, etc. | 393 + | **stdenv** | Nix's standard build environment. The Darwin stdenv provides clang, ld64, Apple SDK headers, and shell scripts for the build phases. 
| 394 + | **NAR** | Nix Archive — Nix's serialisation format for store paths. Used for binary substitution (downloading pre-built packages). | 395 + | **Binary substitution** | Downloading pre-built packages from a binary cache instead of building from source. Critical for performance inside Darling. | 396 + | **sandbox-exec** | macOS's command-line sandbox tool. Applies a Sandbox Profile (`.sb` file) before executing a command. Nix uses this for build isolation on Darwin. | 397 + | **SBPL** | Sandbox Profile Language — the Scheme-based DSL used in `.sb` files to define sandbox rules. | 398 + | **cctools** | Apple's binary tools suite (`ld64`, `ar`, `ranlib`, `otool`, `install_name_tool`). Darling uses the `cctools-port` fork that builds on Linux. | 399 + | **overlayfs** | Linux filesystem that layers a writable upper directory over a read-only lower directory. Darling uses this for prefixes so the base system is shared and user changes are isolated. | 400 + 401 + --- 402 + 403 + *[← Phase 8 — Stretch Goals](./10-phase8-stretch.md) | [Back to Plan Index](./README.md)*
+61
plan/README.md
··· 1 + # PLAN: Making Darling Fully Capable of Running Nix 2 + 3 + > **Goal**: Enable Darling (macOS compatibility layer for Linux) to run the Nix 4 + > package manager reliably, so that Linux machines can build, test, and 5 + > cross-compile `x86_64-darwin` Nix derivations — analogous to how Wine enables 6 + > building and testing Windows binaries on Linux. 7 + 8 + ## Plan Documents 9 + 10 + | Document | Description | 11 + |---|---| 12 + | [Background & Current State](./00-background.md) | Motivation, what works today, what doesn't | 13 + | [Known Blockers](./01-blockers.md) | Detailed analysis of each blocking issue with fix strategies | 14 + | [Phase 0 — Nix Packaging + DevShell](./02-phase0-packaging.md) | `flake.nix`, devShell, `.envrc`, NixOS module | 15 + | [Phase 1 — Core Syscall Fixes](./03-phase1-syscalls.md) | `setattrlist`, `renameatx_np`, `utimensat`, etc. | 16 + | [Phase 2 — Sandbox Stub](./04-phase2-sandbox.md) | `sandbox-exec` passthrough, sandbox API stubs | 17 + | [Phase 3 — Nix Installation](./05-phase3-nix-install.md) | Automated installer, verification, wrappers | 18 + | [Phase 4 — Derivation Building](./06-phase4-building.md) | Trivial derivations → stdenv → binary substitution | 19 + | [Phase 5 — Nix Daemon](./07-phase5-daemon.md) | Multi-user mode, Directory Services stubs, launchd | 20 + | [Phase 6 — CI & Testing](./08-phase6-ci.md) | NixOS VM tests, regression suite, GitHub Actions | 21 + | [Phase 7 — Remote Builder](./09-phase7-remote-builder.md) | Darling as a `nix.buildMachines` target | 22 + | [Phase 8 — Stretch Goals](./10-phase8-stretch.md) | `aarch64-darwin`, GUI testing, Hydra builder | 23 + | [Architecture](./11-architecture.md) | System diagram, key technical decisions | 24 + 25 + ## Priority & Effort Estimates 26 + 27 + | Phase | Priority | Effort | Depends On | 28 + |-------|----------|--------|------------| 29 + | Phase 0 — Nix packaging + devShell | P0 | S (1–2 weeks) | — | 30 + | Phase 1 — Syscall fixes | P0 | L (4–8 weeks) 
| Phase 0 | 31 + | Phase 2 — Sandbox stub | P0 | S (1 week) | — | 32 + | Phase 3 — Nix installation | P0 | M (2–3 weeks) | Phases 1, 2 | 33 + | Phase 4 — Derivation building | P1 | L (4–8 weeks) | Phase 3 | 34 + | Phase 5 — Nix daemon | P2 | M (2–4 weeks) | Phase 4 | 35 + | Phase 6 — CI/testing | P1 | M (2–3 weeks) | Phase 3 | 36 + | Phase 7 — Remote builder | P2 | L (4–8 weeks) | Phases 4, 5 | 37 + | Phase 8 — Stretch goals | P3 | XL (months) | Phase 7 | 38 + 39 + **Estimated time to MVP** (Phases 0–3): ~8–14 weeks of focused effort. 40 + 41 + **Estimated time to usable Darwin builder** (through Phase 7): ~6–12 months. 42 + 43 + ## How to Contribute 44 + 45 + 1. **Pick a task** from any phase document (earlier phases first). 46 + 2. **Check upstream** [Darling issues](https://github.com/darlinghq/darling/issues) for existing work. 47 + 3. **Write a minimal reproducer** — a small C program or shell command that demonstrates the bug inside `darling shell`. 48 + 4. **Fix it** in the appropriate subsystem (`darlingserver` for syscalls, `src/external/libc` for wrappers, `src/sandbox` for sandbox, etc.). 49 + 5. **Add a test** to the regression suite (see [Phase 6](./08-phase6-ci.md)). 50 + 6. **Submit a PR** to this repo, and consider upstreaming to `darlinghq/darling`. 
51 + 52 + ## References 53 + 54 + - [Darling Project](https://www.darlinghq.org/) — upstream macOS compatibility layer 55 + - [Darling GitHub](https://github.com/darlinghq/darling) — upstream source 56 + - [nixie-dev/darling-nix](https://github.com/nixie-dev/darling-nix) — Nix overlay for Darling 57 + - [Nix All The Way Down](https://ersei.net/en/blog/nix-all-the-way-down) — blog post documenting Nix-in-Darling attempt 58 + - [Nix Darwin sandbox source](https://github.com/NixOS/nix/blob/master/src/libstore/platform/darwin.cc) — Nix's `sandbox-exec` invocation 59 + - [Apple `setattrlist` docs](https://developer.apple.com/documentation/kernel/1387673-setattrlist) 60 + - [Apple `renameatx_np` docs](https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/renameatx_np.2.html) 61 + - [Darling Docs — Build Instructions](https://docs.darlinghq.org/build-instructions.html)