add extractor module for platform-agnostic content extraction

- new extractor.zig with extractDocument() and detectPlatform()
- uses textContent field for standard.site records (no block parsing needed)
- falls back to leaflet block parsing for pub.leaflet.* records
- detects platform from content.$type prefix (leaflet, pckt, offprint)
- ExtractedDocument owns allocated memory, has deinit() method
- explicit error sets (Allocator.Error) on all internal functions
- uses @tagName for Platform.name()
- StaticStringMap for plaintext block types
- indexer now stores platform and source_collection
- removed inline extraction code from tap.zig (~100 lines)
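
Rough sketch of the prefix-based detection and @tagName naming described above. This is illustrative only and not copied from extractor.zig: the `unknown` variant, the prefix table, and the pckt/offprint prefixes are assumptions; only the pub.leaflet. prefix is named in this message.

```zig
const std = @import("std");

// Illustrative stand-in for the Platform enum in extractor.zig.
// The `unknown` variant and the pckt/offprint prefixes are assumptions.
pub const Platform = enum {
    leaflet,
    pckt,
    offprint,
    unknown,

    const Prefix = struct { prefix: []const u8, platform: Platform };

    // Checked in order; only "pub.leaflet." comes from the commit message.
    const prefixes = [_]Prefix{
        .{ .prefix = "pub.leaflet.", .platform = .leaflet },
        .{ .prefix = "blog.pckt.", .platform = .pckt }, // hypothetical prefix
        .{ .prefix = "app.offprint.", .platform = .offprint }, // hypothetical prefix
    };

    /// Classify a content.$type string by its prefix.
    pub fn fromContentType(content_type: []const u8) Platform {
        for (prefixes) |entry| {
            if (std.mem.startsWith(u8, content_type, entry.prefix)) return entry.platform;
        }
        return .unknown;
    }

    /// The enum tag doubles as the display name, so no string table is needed.
    pub fn name(self: Platform) []const u8 {
        return @tagName(self);
    }
};
```

With this shape, supporting another platform would mean adding one enum tag and one prefix entry; name() needs no change.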

add test infrastructure:
- zig build test step in build.zig
- tests for buildFtsQuery (8 cases)
- tests for Platform.fromContentType and Platform.name

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>