zlay: zig 0.16 patches · by zzstoatzz.io

moving zlay to an evented backend
what we patched and why

the core problem

zig 0.16 introduced a new unified Io abstraction. it has two backends: Threaded (one OS thread per concurrent task) and Evented (lightweight fibers on top of Linux's io_uring). the Evented backend is what takes zlay from ~2,800 threads (one per PDS) down to ~35 threads. that's the whole reason to use it.

but when the zig stdlib shipped, the Evented backend had all TCP networking stubbed out. six functions (listen, accept, connect, send, read, write) all just returned error.NetworkDown. the Evented runtime could schedule fibers just fine, but couldn't do any network I/O through them. for a relay that's nothing but network I/O, that's a non-starter.

what the patch does

we implemented those six functions using io_uring opcodes. the idea is straightforward: instead of blocking a thread on a syscall, you submit a submission queue entry (SQE) to the kernel's io_uring ring and yield the fiber. when the kernel completes the operation, it posts a completion queue entry (CQE) that wakes the fiber back up.
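
the submit / yield / wake cycle can be modeled in a few lines. this is a toy sketch in Python, not zlay's Zig code and not real io_uring: ToyRing, fiber, and run_event_loop are hypothetical names, generators stand in for fibers, and the "kernel" completes every operation instantly. it only shows the control flow: one thread, several fibers, each suspended while its operation is in flight.

```python
from collections import deque


class ToyRing:
    """Stand-in for an io_uring instance: one submission queue, one completion queue."""

    def __init__(self):
        self.sq = deque()  # pending SQEs: (user_data, op, arg)
        self.cq = deque()  # posted CQEs: (user_data, result)

    def submit(self, user_data, op, arg):
        self.sq.append((user_data, op, arg))

    def kernel_step(self):
        # Pretend to be the kernel: finish each queued operation and post a CQE.
        while self.sq:
            user_data, op, arg = self.sq.popleft()
            self.cq.append((user_data, op(arg)))


def fiber(name, log):
    # Each `yield` means "submit this operation and suspend until its CQE arrives".
    length = yield (len, name)
    log.append((name, "len", length))
    upper = yield (str.upper, name)
    log.append((name, "upper", upper))


def run_event_loop(names):
    ring, log, fibers = ToyRing(), [], {}
    for fid, name in enumerate(names):
        gen = fiber(name, log)
        fibers[fid] = gen
        op, arg = next(gen)          # run the fiber up to its first submission
        ring.submit(fid, op, arg)    # user_data identifies which fiber to wake
    while fibers:
        ring.kernel_step()
        while ring.cq:
            fid, result = ring.cq.popleft()
            try:
                op, arg = fibers[fid].send(result)  # wake the fiber with the result
                ring.submit(fid, op, arg)
            except StopIteration:
                del fibers[fid]                     # fiber ran to completion
    return log
```

running it with two names shows the two fibers interleaving on a single thread: both "len" results are logged before either "upper" result, because each fiber gives up the thread while its operation is pending.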

- listen: creates a socket, sets SO_REUSEADDR/SO_REUSEPORT, binds, and listens. bind and listen are done as regular syscalls (not io_uring) because the io_uring opcodes for those need kernel 6.11+ and production runs kernel 6.1. bind/listen return immediately anyway, so there's no I/O wait to hide.
- accept: submits IORING_OP_ACCEPT. the fiber sleeps until a client connects. the kernel hands back the new socket fd.
- connect: creates a socket, submits IORING_OP_CONNECT. the fiber sleeps until the TCP handshake completes.
- read: submits IORING_OP_READ (single buffer) or IORING_OP_READV (scatter-gather). the fiber sleeps until data arrives.
- write: assembles an iovec array from the buffer pieces zig's writer API provides, submits IORING_OP_SENDMSG. the fiber sleeps until the kernel accepts the data.
- send: same as write but for the message-oriented send API. iterates over the messages, one SENDMSG per message.
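
for readers who haven't used these syscalls, here is a blocking-socket sketch of two of the steps above: the listen sequence (socket, SO_REUSEADDR/SO_REUSEPORT, bind, listen) and a write that hands several buffer pieces to the kernel in one sendmsg call. it's Python against the ordinary socket API, not io_uring and not the patch itself; listen_reuse, write_pieces, and demo are hypothetical names, and SO_REUSEPORT is assumed to be available (it is on Linux).

```python
import socket


def listen_reuse(host, port, backlog=128):
    """socket + SO_REUSEADDR/SO_REUSEPORT + bind + listen, all as plain syscalls."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


def write_pieces(sock, pieces):
    """Pass several buffers to the kernel in one sendmsg call (an iovec array)."""
    return sock.sendmsg(pieces)  # returns the number of bytes the kernel accepted


def demo():
    listener = listen_reuse("127.0.0.1", 0)   # port 0: let the kernel pick one
    client = socket.create_connection(listener.getsockname())
    conn, _peer = listener.accept()
    sent = write_pieces(client, [b"hello, ", b"relay", b"\n"])
    client.close()                            # so the reader sees EOF after the data
    received = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            break
        received += chunk
    conn.close()
    listener.close()
    return sent, received
```

the receiver sees one contiguous byte stream; the piece boundaries only exist on the sending side, which is why the writer's buffer pieces never need to be copied into a single buffer first.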

each one handles the errors its operation can return (ECONNREFUSED, ETIMEDOUT, ECONNRESET, etc.) and retries on EINTR/ECANCELED (which io_uring can produce when canceling in-flight operations).
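
a rough sketch of that completion handling, in Python for brevity. io_uring reports failure as a negative errno in the CQE's result field, so the logic is "non-negative means success, EINTR/ECANCELED means resubmit, anything else maps to an error". the error names, the NetError type, the complete helper, and the retry cap are all assumptions for illustration; the patch's actual mappings aren't shown here.

```python
import errno

# Results that mean "try again" rather than "fail".
RETRYABLE = {errno.EINTR, errno.ECANCELED}

# Hypothetical errno -> error-name mapping; the real patch's names may differ.
ERROR_NAMES = {
    errno.ECONNREFUSED: "ConnectionRefused",
    errno.ETIMEDOUT: "Timeout",
    errno.ECONNRESET: "ConnectionResetByPeer",
}


class NetError(Exception):
    pass


def complete(submit_and_wait, max_retries=8):
    """Run one operation to completion.

    `submit_and_wait` stands in for "submit the SQE, yield, return the CQE result".
    The retry cap is an assumption added to keep the sketch from looping forever.
    """
    for _ in range(max_retries):
        res = submit_and_wait()
        if res >= 0:
            return res                  # success: fd, byte count, etc.
        err = -res                      # failure: result is a negated errno
        if err in RETRYABLE:
            continue                    # resubmit the same operation
        raise NetError(ERROR_NAMES.get(err, "Unexpected"))
    raise NetError("TooManyRetries")
```

for example, a sequence of results -EINTR, -ECANCELED, 42 returns 42 after two resubmissions, while -ECONNREFUSED raises immediately.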

why we didn't upstream it

the zig team knows these are missing (tracked as issue #31723). our implementations are pragmatic: sync bind/listen because our kernel is too old, and specific error mappings that might not match what upstream wants. we'd need to align with whatever API design they settle on. for now the patch works and is pinned to our exact zig version (0.16.0-dev.3059).

the other workarounds (not patches)

there are five more issues we work around in application code rather than in the stdlib patch:

1. DNS: netLookup isn't implemented in the Evented backend either. instead of patching it, subscribers do DNS through the Threaded backend (pool_io) and then hand the resulting socket to Evented for the actual data transfer.
2. ReleaseSafe GPF: under ReleaseSafe, the optimizer appears to break the fiber context switch and we get a general protection fault (we haven't pinned down how). we just build with ReleaseFast. repro_evented.zig reproduces it.
3. debug_io threading: std.debug.print assumes single-threaded by default. we override it with a Threaded io instance so concurrent log output doesn't get corrupted.
4. pg.Pool Io.Event panic: the connection pool uses Io.Event.reset(), which assumes only one waiter. with 16 worker threads, that panics. we forked pg.zig to use a futex counter instead.
5. cross-Io mutex: the biggest architectural lesson. an Io.Mutex from one backend can't be used on the other. this caused crashes 1, 6, and 8, plus crash 9, which is what today's fix addresses. the fix is always the same: keep Threaded work on plain threads, keep Evented work on fibers, never cross the streams. the host_ops queue we just built is the latest instance of this pattern.
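
the "never cross the streams" rule from item 5 amounts to message passing: the two sides share only a handoff queue, never each other's locks. a minimal Python sketch of that shape follows. it's a guess at the pattern, not the real host_ops code: threaded_worker, run_handoff, and the fake "resolved:" result are hypothetical, Python's thread-safe queue stands in for whatever primitive is safe from both sides, and the evented side here blocks on the result queue where a real fiber would yield.

```python
import queue
import threading


def threaded_worker(ops, done):
    """Threaded side: blocking work stays here, on a plain OS thread."""
    while True:
        item = ops.get()
        if item is None:            # sentinel: shut down
            break
        fiber_id, host = item
        # Stand-in for a blocking call (e.g. a DNS lookup on the Threaded backend).
        done.put((fiber_id, f"resolved:{host}"))


def run_handoff(hosts):
    """Evented side: post requests, collect completions, touch no threaded locks."""
    ops, done = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=threaded_worker, args=(ops, done))
    worker.start()
    for fiber_id, host in enumerate(hosts):
        ops.put((fiber_id, host))   # hand the request across the boundary
    results = {}
    while len(results) < len(hosts):
        fiber_id, value = done.get()  # simplification: a real fiber would yield here
        results[fiber_id] = value
    ops.put(None)
    worker.join()
    return results
```

each request carries the id of the fiber that issued it, so the completion can be routed back without either side holding a lock that belongs to the other backend.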