atproto relay implementation in zig zlay.waow.tech

add fiber GPF issue writeup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

+65
scripts/fiber_gpf_issue.md
# Io.Evented + ReleaseSafe: GPF in fiber.zig contextSwitch on x86_64-linux

`Io.Evented` crashes with a general protection fault under ReleaseSafe on x86_64-linux. Debug, ReleaseFast, and ReleaseSmall all pass. Tested on unpatched zig, with no modifications to the stdlib.

## Reproduction

```zig
const std = @import("std");
const Io = std.Io;

var evented: Io.Evented = undefined;

var debug_threaded_io: Io.Threaded = undefined;
pub const std_options_debug_threaded_io: ?*Io.Threaded = &debug_threaded_io;

fn fiberReturn(_: Io) void {}

pub fn main() !void {
    const allocator = std.heap.c_allocator;
    debug_threaded_io = Io.Threaded.init(allocator, .{});
    try Io.Evented.init(&evented, allocator, .{});
    const io = evented.io();

    var f = try io.concurrent(fiberReturn, .{io});
    io.sleep(Io.Duration.fromMilliseconds(10), .awake) catch {};
    f.cancel(io);
}
```

```
$ zig build-exe -OReleaseSafe repro.zig -lc && ./repro
General protection exception (no address available)
lib/std/Io/fiber.zig:30:20: 0x1079589 in contextSwitch
lib/std/Io/Uring.zig:1142:12: 0x109cc86 in mainIdle
```

| Mode | Result |
|------|--------|
| Debug | pass |
| ReleaseFast | pass |
| ReleaseSmall | pass |
| ReleaseSafe | GPF |

## Environment

- zig 0.16.0-dev.3059+42e33db9d, unmodified
- x86_64-linux, kernel 6.8.0-101-generic (Debian bookworm, glibc)

## What we've observed

The crash is at the inline asm in `fiber.zig` contextSwitch (x86_64 path, line 244). This gets inlined into `Uring.idle`.

Comparing the disassembly of `Uring.idle` between ReleaseFast and ReleaseSafe, the difference we see is in how `%rsi` is set up before the inline asm block. The asm uses `"{rsi}" (s)` as an input constraint:

**ReleaseFast**: there's a `lea -0x80(%rbp),%rsi` that loads the SwitchMessage address into `%rsi` before the asm block.

**ReleaseSafe**: the corresponding `lea` appears to be absent. `%rsi` seems to hold a stale value left over from a prior function call (`Thread.current`; `%rsi` is caller-saved, so the callee is free to clobber it). The ReleaseSafe prologue includes `__zig_probe_stack` and a stack canary load (`fs:0x28`) that aren't present in ReleaseFast.

We're not certain this is the full picture; we may be misreading the disassembly or missing something about how the register constraint interacts with the surrounding code. But the crash is consistent and the repro is deterministic.

## Context

We're building an AT Protocol relay ([zlay](https://tangled.sh/zzstoatzz.io/zlay)) that runs ~2,800 concurrent fibers on `Io.Evented`. We've been working around this by building with ReleaseFast, but recently a separate bug (a websocket off-by-one) manifested as a silent SIGSEGV under ReleaseFast for weeks. ReleaseSafe would have caught it immediately as a bounds-check panic with a stack trace.

We understand Evented is experimental. Happy to provide more information or test patches.
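
## Re-running the mode matrix

For anyone re-checking the per-mode results from the Reproduction section, a loop like the one below may be convenient. It is only a sketch: it assumes the repro is saved as `repro.zig` in the current directory and that `zig` is on `PATH`. The `ZIG` variable, the `run_mode` helper, and the `repro-<mode>` output names are our own choices for illustration, not anything zig prescribes.

```shell
#!/bin/sh
# Build and run repro.zig in each optimization mode, printing one table row per mode.
# Assumption: repro.zig is in the current directory; override the compiler with ZIG=/path/to/zig.
ZIG="${ZIG:-zig}"

# run_mode is a hypothetical helper: build with -O<mode>, run the binary, report the outcome.
run_mode() {
    mode="$1"
    if ! "$ZIG" build-exe "-O$mode" repro.zig -lc "-femit-bin=repro-$mode" >/dev/null 2>&1; then
        echo "| $mode | build failed |"
        return
    fi
    if "./repro-$mode" >/dev/null 2>&1; then
        echo "| $mode | pass |"
    else
        # $? here is the exit status of the binary that was just run.
        echo "| $mode | fail (exit $?) |"
    fi
}

for mode in Debug ReleaseFast ReleaseSmall ReleaseSafe; do
    run_mode "$mode"
done
```

A non-zero exit only says that the binary failed; the crash message and stack trace still have to be read from a direct run of `./repro-ReleaseSafe`.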