talent(sense): tighten entity extraction guardrails

Backfill spot-check on 2025 gap-zone segments surfaced five recurring
edge cases the prompt rules implied but didn't explicitly enforce.
Hardening before a planned ~67k-segment historical backfill:

- Person: strengthen first-name-only skip; explicitly ban generic
  speaker labels ("Speaker 1", "Colleague") from entities (they belong
  in the speakers array when meeting_detected=true)
- Project: exclude generic git/file identifiers ("main", "dev",
  "staging") and one-word lowercase tokens
- speakers ↔ meeting_detected consistency: an empty speakers array with
  meeting_detected=true is invalid; use generic labels rather than
  leaving the array empty
- New rule 7: skip placeholder names with speaker-uncertainty markers
  ("Museum something", "whatever-thing-it's-called")
- New rule 8: summary-mentioned named entities must appear in entities

No structural or schema changes. This only adds explicit guardrails for
rules the model was already violating under the prior wording.
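
For spot-checking backfill output, the mechanically checkable rules above
could be approximated by a small post-hoc validator. This is only a sketch:
the per-entity keys "type" and "name" and the helper check_segment are
assumptions made for illustration, not the actual schema or tooling. Rules 7
and 8 need judgment about wording and are left to the prompt.

```python
import re

# Generic speaker labels taken from the examples in this message.
GENERIC_SPEAKER = re.compile(r"^(speaker \d+|colleague)$", re.IGNORECASE)
# Generic git/file identifiers taken from the examples in this message.
GENERIC_PROJECTS = {"main", "dev", "staging"}


def check_segment(seg: dict) -> list[str]:
    """Return guardrail violations for one extracted segment.

    Assumes (hypothetically) that each entity is a dict with "type"
    and "name" keys; adjust to the real schema.
    """
    problems = []
    for ent in seg.get("entities", []):
        name = str(ent.get("name", "")).strip()
        kind = ent.get("type")
        if not name:
            continue
        if kind == "person":
            if GENERIC_SPEAKER.match(name):
                problems.append(f"generic speaker label as entity: {name!r}")
            elif len(name.split()) < 2:
                problems.append(f"first-name-only person: {name!r}")
        elif kind == "project":
            if name.lower() in GENERIC_PROJECTS:
                problems.append(f"generic identifier as project: {name!r}")
            elif len(name.split()) == 1 and name.islower():
                problems.append(f"one-word lowercase project: {name!r}")
    # speakers <-> meeting_detected consistency
    if seg.get("meeting_detected") and not seg.get("speakers"):
        problems.append("meeting_detected=true with empty speakers")
    return problems
```

An empty return value means the segment passed these checks; it does not
mean the segment satisfies every prompt rule.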