bench: fair 1-direction stream+channel, expand channel fan-out
Add noopStreamGen and noopConsume moroutines and matching benches that
pit 1-direction stream (worker→parent) against 1-direction channel
(parent→worker) on equal footing: each is one cross-thread hop per
item.
Findings (on this branch, HEAD):
  stream (worker generates → parent consumes)    ~746K items/s
  channel (parent generates → worker consumes)   ~652K items/s
  stream round-trip (parent → worker → parent)   ~464K items/s
vs unoptimized (bb41d5d):
  stream 1-dir:   ~66K items/s (HEAD is ~11.3x faster)
  channel 1-dir:  ~67K items/s (HEAD is ~9.7x faster)
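For reference, the multipliers above are plain ratios of the rounded
throughput figures. A minimal sketch of the arithmetic (the variable
names are illustrative only and are not part of the bench code):

```typescript
// Throughputs in items/s, copied from the rounded figures in this message.
const optimized = { stream: 746_000, channel: 652_000 };
const unoptimized = { stream: 66_000, channel: 67_000 };

// Speedup as "after / before", rounded to one decimal place.
const ratio = (after: number, before: number): number =>
  Math.round((after / before) * 10) / 10;

const streamSpeedup = ratio(optimized.stream, unoptimized.stream);    // 11.3
const channelSpeedup = ratio(optimized.channel, unoptimized.channel); // 9.7

console.log({ streamSpeedup, channelSpeedup });
```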
Two takeaways:
- The earlier "channel beats stream" observation was an artifact of
comparing channel's 1-hop case to stream's 2-hop (pass-through)
noopStream-with-input case. When measured 1-direction-vs-1-direction,
stream and channel are tied pre-optimization.
- Post-optimization, stream is ~14% faster than channel (~746K vs
~652K items/s). That gap is the win from atomics-based backpressure
over message+adaptive-yield. Applying atomics to the Distributor
would likely close it.
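To illustrate what "atomics-based backpressure" means in general, here
is a minimal sketch of a single-producer/single-consumer ring over a
SharedArrayBuffer. It is not the stream implementation in this repo:
SharedRing, tryPush and tryPop are hypothetical names, and the sketch
assumes the counters never overflow Int32. The point is that "full" is
detected by reading two shared counters instead of exchanging messages.

```typescript
// Hypothetical SPSC ring buffer; an illustration, not the project's code.
const HEAD = 0; // count of items consumed so far (written by the consumer)
const TAIL = 1; // count of items produced so far (written by the producer)

class SharedRing {
  readonly capacity: number;
  private readonly ctrl: Int32Array;  // [HEAD, TAIL]
  private readonly slots: Int32Array; // payload slots

  constructor(capacity: number, sab?: SharedArrayBuffer) {
    this.capacity = capacity;
    // Layout: two Int32 counters (8 bytes) followed by `capacity` Int32 slots.
    const buf = sab ?? new SharedArrayBuffer((2 + capacity) * 4);
    this.ctrl = new Int32Array(buf, 0, 2);
    this.slots = new Int32Array(buf, 8, capacity);
  }

  // Returns false when the ring is full: that is the backpressure signal.
  tryPush(value: number): boolean {
    const tail = Atomics.load(this.ctrl, TAIL);
    const head = Atomics.load(this.ctrl, HEAD);
    if (tail - head >= this.capacity) return false;
    Atomics.store(this.slots, tail % this.capacity, value);
    Atomics.store(this.ctrl, TAIL, tail + 1); // publish the item
    Atomics.notify(this.ctrl, TAIL, 1);       // wake a consumer waiting on TAIL
    return true;
  }

  // Returns undefined when the ring is empty.
  tryPop(): number | undefined {
    const head = Atomics.load(this.ctrl, HEAD);
    const tail = Atomics.load(this.ctrl, TAIL);
    if (head === tail) return undefined;
    const value = Atomics.load(this.slots, head % this.capacity);
    Atomics.store(this.ctrl, HEAD, head + 1); // free the slot
    Atomics.notify(this.ctrl, HEAD, 1);       // wake a producer waiting on HEAD
    return value;
  }
}

// Single-threaded demo of the full / retry behaviour.
const ring = new SharedRing(2);
const accepted = [ring.tryPush(1), ring.tryPush(2), ring.tryPush(3)];
const first = ring.tryPop();
const retried = ring.tryPush(3);
console.log(accepted, first, retried); // [ true, true, false ] 1 true
```

In a real cross-thread setup the producer would block or yield when
tryPush returns false, for example with Atomics.wait on the HEAD
counter inside a worker, rather than waiting for an acknowledgement
message from the other side.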
Also keeps channelFanout at varying worker counts as a separate
section. It shows the Distributor's single-threaded producer bottleneck:
throughput peaks at ~2 consumers and regresses past 4.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>