add pipeline contention metrics + zero-consumer fast path
instrumentation for the ~1,300 host CPU cliff:
- relay_persist_order_spins_total: spin iterations on the ordering lock
- relay_broadcast_queue_full_total: spin iterations on full broadcast queue
- relay_broadcast_queue_depth_hwm: high-water mark of queue depth
- relay_broadcast_no_consumers_total: frames that skipped SharedFrame alloc
zero-consumer fast path: when no consumers are connected, broadcast()
returns after history.push() without allocating SharedFrame or taking
consumers_mutex. saves one heap alloc + one mutex per frame.
also includes the cursor coalesce fix (CursorMap) and slot reuse
(free list with unregister on subscriber exit) from previous commits.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>