Implement encoding sniffing: BOM, HTTP charset, meta prescan

Add encoding detection per the WHATWG Encoding Standard and the HTML spec:
- BOM sniffing for UTF-8, UTF-16LE, UTF-16BE
- HTTP Content-Type charset extraction (quoted/unquoted values)
- HTML meta prescan: scan first 1024 bytes for <meta charset> and
  <meta http-equiv="Content-Type" content="...;charset=...">
- Priority: BOM > HTTP header > meta prescan > default (Windows-1252)
- UTF-16 from HTTP/meta overridden to UTF-8 per spec
- EncodingSource enum for tracking detection method confidence
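
The priority order above can be illustrated with a minimal Python sketch. It
is not the code in this commit: the names sniff, charset_from, and fix_utf16
are hypothetical, the regex-based meta scan is a simplification of the spec's
byte-by-byte prescan, and labels are only lower-cased rather than resolved
through the Encoding Standard's label table.

```python
import re
from enum import Enum
from typing import Optional, Tuple


class EncodingSource(Enum):
    """How the encoding was determined (hypothetical mirror of the enum)."""
    BOM = "bom"
    HTTP_HEADER = "http-header"
    META_PRESCAN = "meta-prescan"
    DEFAULT = "default"


# Byte order marks checked first; they win over every other signal.
BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
)

# charset=value, where value is double-quoted, single-quoted, or bare.
CHARSET_RE = re.compile(
    r"charset\s*=\s*(?:\"([^\"]*)\"|'([^']*)'|([^\s;\"'>]+))", re.IGNORECASE
)
# Simplification: match whole <meta ...> tags instead of the spec's prescan.
META_RE = re.compile(rb"<meta\b[^>]*>", re.IGNORECASE)


def charset_from(text: str) -> Optional[str]:
    """Extract a charset label from a Content-Type value or a meta tag."""
    m = CHARSET_RE.search(text)
    if not m:
        return None
    value = next(g for g in m.groups() if g is not None).strip()
    return value or None


def fix_utf16(label: str) -> str:
    """UTF-16 declared via HTTP or meta is overridden to UTF-8."""
    label = label.lower()
    return "utf-8" if label in ("utf-16", "utf-16le", "utf-16be") else label


def sniff(data: bytes, content_type: Optional[str] = None) -> Tuple[str, EncodingSource]:
    """Apply the priority BOM > HTTP header > meta prescan > default."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, EncodingSource.BOM
    if content_type:
        label = charset_from(content_type)
        if label:
            return fix_utf16(label), EncodingSource.HTTP_HEADER
    for tag in META_RE.findall(data[:1024]):  # only the first 1024 bytes
        label = charset_from(tag.decode("latin-1"))
        if label:
            return fix_utf16(label), EncodingSource.META_PRESCAN
    return "windows-1252", EncodingSource.DEFAULT
```

For example, sniff(b"<p>hi</p>") falls through every check and returns the
windows-1252 default, while a UTF-8 BOM wins even when the HTTP header names
a different charset.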

39 new tests covering all detection methods and priority ordering.

Implements issue 3mhkt62ryxo2l

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
276dfeb1
550d8f66