optimize CBOR encoder: skip-sort, batched writes, stack CID

Three targeted optimizations based on benchmark profiling:
- Skip the map key sort allocation when keys are already in DAG-CBOR
  order. Decoded data always has sorted keys, so the decode-re-encode
  verification path is now allocation-free for maps (-12%).
- Batch writeArgument output into a single writeAll call per argument
  instead of 2-3 separate writer dispatches (-8% encode).
- Build CID bytes in a 72-byte stack buffer and dupe the result,
  replacing the dynamically growing Writer.Allocating for the
  fixed-size output.
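
For reviewers, the first two changes can be sketched roughly as follows. This is an illustrative Rust sketch, not the code in this commit: the names dag_cbor_cmp, keys_already_sorted, encode_head and write_head are hypothetical, and the only facts assumed are the DAG-CBOR key order (shorter keys first, then bytewise) and the standard CBOR head layout (major type in the top 3 bits, big-endian argument).

```rust
use std::cmp::Ordering;
use std::io::Write;

/// DAG-CBOR map key order: shorter keys first, ties broken bytewise.
fn dag_cbor_cmp(a: &[u8], b: &[u8]) -> Ordering {
    a.len().cmp(&b.len()).then_with(|| a.cmp(b))
}

/// True when `keys` is already strictly ascending in DAG-CBOR order, so a
/// caller could skip allocating and sorting a scratch copy of the keys.
/// Duplicate keys count as "not sorted" and fall through to the slow path.
fn keys_already_sorted(keys: &[&[u8]]) -> bool {
    keys.windows(2)
        .all(|w| dag_cbor_cmp(w[0], w[1]) == Ordering::Less)
}

/// Builds a CBOR head (major type plus argument) in a 9-byte stack buffer.
/// Returns the buffer and the number of valid bytes in it.
fn encode_head(major: u8, value: u64) -> ([u8; 9], usize) {
    debug_assert!(major < 8, "CBOR major types are 0..=7");
    let mut buf = [0u8; 9];
    let top = major << 5;
    let len = if value < 24 {
        // Small values are stored directly in the initial byte.
        buf[0] = top | value as u8;
        1
    } else if value <= 0xff {
        buf[0] = top | 24;
        buf[1] = value as u8;
        2
    } else if value <= 0xffff {
        buf[0] = top | 25;
        buf[1..3].copy_from_slice(&(value as u16).to_be_bytes());
        3
    } else if value <= 0xffff_ffff {
        buf[0] = top | 26;
        buf[1..5].copy_from_slice(&(value as u32).to_be_bytes());
        5
    } else {
        buf[0] = top | 27;
        buf[1..9].copy_from_slice(&value.to_be_bytes());
        9
    };
    (buf, len)
}

/// Emits the whole head with one write_all call instead of writing the
/// initial byte and the argument bytes separately.
fn write_head<W: Write>(w: &mut W, major: u8, value: u64) -> std::io::Result<()> {
    let (buf, len) = encode_head(major, value);
    w.write_all(&buf[..len])
}
```

The sorted check is a single linear pass with no allocation, and the head is assembled on the stack so the writer is dispatched once per argument.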

Also adds diagnostic benchmarks (SHA-256 isolation, UTF-8 cost, 10x
scaling, stack CID) to support data-driven optimization decisions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>