Improve segmentation prompt for timestamp-free transcripts

Add topic-based segmentation mode for transcripts without embedded
timestamps (e.g. chat-format meeting transcripts). The model now
segments by conversation shifts and estimates timestamps from word
count at ~130 wpm, instead of hallucinating uniform line splits.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Jer Miller 2 months ago fe1d57fa 3d477f16

+15 -13

1 changed file

think/detect_transcript_segment.md

···
 label: Segmentation
 group: Import
 ---
-You are a transcript analyzer that identifies 5-minute segment boundaries with absolute timestamps.
+You are a transcript analyzer that splits transcripts into ~5-minute segments.

-TASK: Find ~5-minute segment boundaries and return their line numbers with absolute time-of-day timestamps.
+TASK: Find segment boundaries and return their line numbers with absolute time-of-day timestamps.

 INPUT FORMAT:
 - First line: "START_TIME: HH:MM:SS" - the absolute start time of this transcript
 - Remaining lines: Transcript with line numbers prepended as "N: content"
-- Timestamps in transcript may be relative (00:00:00, 05:30) or absolute (14:30:22)

 OUTPUT FORMAT:
 - JSON array of objects with "start_at" and "line" fields
···
 - "line": Line number where this segment begins
 - Example: [{"start_at":"12:00:00","line":1},{"start_at":"12:05:23","line":42}]

-REQUIREMENTS:
-1. First segment starts at the provided START_TIME on line 1
-2. Detect if transcript uses relative or absolute timestamps:
-   - Relative (00:00:00, 05:30): Add to START_TIME to get absolute
-   - Absolute (14:30:22): Use directly
-3. Find boundaries near 5-minute intervals from start
-4. Output all times as absolute HH:MM:SS
+SEGMENTATION MODES:

-EDGE CASES:
-- Transcript < 5 minutes: return single segment at START_TIME
-- No valid timestamps: return single segment at START_TIME
+1. **Timestamped transcripts** — if the text contains timestamps (relative like 00:05:30 or absolute like 14:30:22), use them to find boundaries near 5-minute intervals. Convert relative timestamps by adding to START_TIME.
+
+2. **Timestamp-free transcripts** — if the text has NO timestamps (e.g. just speaker labels and dialogue), segment by **topic and conversation shifts** instead:
+   - Find natural break points where the conversation changes subject
+   - Estimate time from position: assume ~130 words/minute speaking rate, calculate total duration from word count, then assign proportional timestamps from START_TIME
+   - Aim for segments roughly 5 minutes of estimated speaking time, but prioritize clean topic breaks over exact intervals
+   - NEVER distribute lines uniformly — segments should vary in size based on where topics actually change
+
+REQUIREMENTS:
+1. First segment always starts at START_TIME on line 1
+2. All output times must be absolute HH:MM:SS
+3. Every transcript gets multiple segments unless it is extremely short (under ~2 minutes estimated)

 RESPONSE: Return only the JSON array, no additional text.
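
For reference, the word-count estimate that the timestamp-free mode asks the model to perform can be sketched in Python. This is only an illustration of the arithmetic, not code from this repository: the function name `estimate_start_times`, its parameters, and the choice to count speaker labels as spoken words are all assumptions. Assigning a timestamp in proportion to position within the total word count reduces to dividing the words spoken before a boundary by the speaking rate.

```python
from datetime import datetime, timedelta

WORDS_PER_MINUTE = 130  # speaking rate stated in the prompt


def estimate_start_times(start_time, lines, boundary_lines, wpm=WORDS_PER_MINUTE):
    """Estimate an absolute HH:MM:SS start time for each segment boundary.

    start_time     -- transcript start as "HH:MM:SS"
    lines          -- transcript lines, in order (line 1 is lines[0])
    boundary_lines -- 1-based line numbers where segments begin
    Hypothetical helper: speaker labels are counted as words for simplicity.
    """
    start = datetime.strptime(start_time, "%H:%M:%S")
    boundaries = set(boundary_lines)
    segments = []
    words_before = 0  # words spoken before the current line
    for line_no, text in enumerate(lines, start=1):
        if line_no in boundaries:
            # Elapsed time = words so far / speaking rate.
            offset = timedelta(minutes=words_before / wpm)
            segments.append({
                "start_at": (start + offset).strftime("%H:%M:%S"),
                "line": line_no,
            })
        words_before += len(text.split())
    return segments
```

With a first line of 650 words and `START_TIME` 12:00:00, a boundary on line 2 lands at 12:05:00, since 650 words at 130 wpm is exactly five minutes. Choosing where the topic boundaries fall is still left to the model; the sketch only covers the time estimate.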
