bench: add dispatch perf tool (per-task, pipelined, parallel, batch, stream)
Measures pure round-trip dispatch overhead with a noop moroutine —
the task does essentially no work, so timings reflect the cost of
postMessage + event loop re-entry per task.
Shapes measured:
- Ping-pong latency (strict await-each, 1 worker)
- Pipelined throughput (1 worker, variable in-flight window)
- Parallel throughput (N workers, Promise.all)
- Batch arg vs per-task vs streaming (1 worker, 100K items)
The batch-vs-stream comparison is particularly informative: batching
100K items into a single task arg is ~34x faster than per-task
dispatch, while the current streaming implementation is ~8x slower
than per-task because of the per-item setImmediate yield used to
keep pause/resume backpressure responsive.
Used together with docs/atomics-bench/ (local-only research notes)
to evaluate Piscina's atomics technique — the conclusion lives on the
atomics-dispatch branch; this bench stays useful regardless.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>