personal memory agent

storytellers: write onto activity record; add commitments/closures/decisions

Consolidate the output of the three storytelling talents (conversation/work/event)
onto the activity record instead of a separate facets/*/spans/*.jsonl sink.
Each talent now emits a story (body + topics + confidence) plus structured
commitments, closures, and decisions; the new talent/story.py hook merges
these onto the activity record via a dedicated writer in think/activities.py
(merge_story_fields), atomically under one file lock with a single edits[]
entry per merge.

- delete talent/spans.py, think/spans.py, and their tests
- drop the facets/*/spans/*.jsonl formatter entry
- storytellers run at priority 20 so they serialize after participation (10),
relying on existing priority-group dispatch rather than new locking
- format_activities renders story body + topics so FTS coverage moves onto
the activity record formatter
- owner/counterparty fuzzy-resolve to *_entity_id via find_matching_entity;
originals preserved, null on miss
- closures with non-vocab resolution dropped per-entry
- hook returns "" to suppress the generator JSON artifact

Unblocks the Ledger consumer surface without any back-compat shims or
historical backfill.
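For reviewers, a minimal sketch of what an activity record looks like after one story merge. Field names match this change; the record id, entity ids, and content values are hypothetical, invented purely for illustration:

```python
# Hypothetical activity record after one story merge (illustrative values only).
# Field names come from this change; "act-0001" and "ent-mina" are made up.
record = {
    "id": "act-0001",
    "story": {
        "body": "Discussed the Q3 deck and agreed on timing.",
        "topics": ["q3 deck", "timing"],
        "confidence": 0.8,
    },
    "commitments": [
        {
            "owner": "Mina",                    # original string preserved
            "action": "send the revised deck",
            "counterparty": "Ravi",
            "when": "Friday morning",
            "context": "Committed during the call.",
            "owner_entity_id": "ent-mina",      # fuzzy-resolved match
            "counterparty_entity_id": None,     # null on fuzzy-match miss
        }
    ],
    "closures": [],
    "decisions": [],
    "edits": [
        # one edits[] entry appended per merge; fields replace wholesale on rerun
        {
            "actor": "story",
            "fields": ["story", "commitments", "closures", "decisions"],
        }
    ],
}
```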

+1027 -736
+265
docs/design/story-talent-refactor.md
···
# Story-Talent Refactor

This refactor replaces the old storyteller span-row write path with activity-record
story merges. It is a clean break:

- storytellers stop writing `facets/*/spans/*.jsonl`
- story content lives on the activity record itself
- `think/activities.py` remains the only writer for activity records
- priority ordering, not extra locking, serializes participation before story

## 1. New writer: `merge_story_fields`

Location:

- add `merge_story_fields(...)` in `think/activities.py`
- place it next to `update_record_fields()` and `update_activity_record()`
- this is the only activity-record write added by the refactor, so L2 stays
  satisfied inside the domain owner

Signature and docstring:

```python
def merge_story_fields(
    facet: str,
    day: str,
    record_id: str,
    *,
    story: dict,
    commitments: list[dict],
    closures: list[dict],
    decisions: list[dict],
    actor: str,
    note: str | None = None,
) -> bool:
    """Replace story-derived fields on an activity record and append one edit."""
```

Behavior and return semantics:

- use one `locked_modify(...)` call only, following the same pattern as
  `update_activity_record()` and `_set_activity_hidden_state()`
  (`think/activities.py:762-790`, `1046-1089`, `1092-1133`)
- inside the callback, find the record with the same `record.get("id") == record_id`
  match used by `update_activity_record()`
- if found:
  - normalize the current record
  - replace `story`, `commitments`, `closures`, and `decisions` wholesale
  - call `append_edit(...)` exactly once with
    `fields=["story", "commitments", "closures", "decisions"]`
  - pass through `actor`
  - pass through `note`
  - return `True`
- if the day file is missing or the record is absent:
  - log a warning
  - return `False`
  - do not raise

Why this shape:

- `update_activity_record()` is intentionally narrow and only allows
  `title`, `description`, and `details` (`think/activities.py:1046-1062`)
- the CLI mirrors that exact scope
  (`apps/activities/call.py:503-532`,
  `apps/activities/talent/activities/SKILL.md:116-130`)
- `_set_activity_hidden_state()` already establishes the specialized-writer
  pattern in this module (`think/activities.py:1092-1133`)
- `update_record_fields()` stays the generic no-edit helper used by participation
  (`think/activities.py:979-1011`, `talent/participation.py:99-105`)

## 2. `story.py` `post_process` flow

The new hook lives at `talent/story.py` and always returns `""` so the JSON
generator artifact is suppressed.

Dispatcher context shape, confirmed:

- `run_activity_prompts()` sends `facet`, `day`, `span`, `activity`, and
  `output_path` in the activity request (`think/thinking.py:2064-2083`)
- `prepare_config()` merges those request keys into the full talent config and
  always carries `name` (`think/talents.py:438-520`)
- `_run_post_hooks()` passes the full prepared config dict directly to the hook
  (`think/talents.py:712-734`)

The hook can rely on:

- `context["name"]`
- `context["facet"]`
- `context["day"]`
- `context["activity"]`
- `context["span"]`
- `context["output_path"]`

Execution order:

1. Parse `result` with `json.loads(result.strip())`.
   On failure: log and return `""`.
2. Require a top-level `dict`.
   Otherwise: log and return `""`.
3. Validate required top-level fields.
   - `body`: `str`, non-empty after strip
   - `topics`: `list[str]`, may be empty
   - `confidence`: numeric in `0.0..1.0`
   - `commitments`: `list`
   - `closures`: `list`
   - `decisions`: `list`
   Any missing field, wrong type, or out-of-range `confidence` logs and returns `""`.
4. Validate required context.
   - `context["activity"]` must be a `dict`
   - `context["activity"]["id"]` must exist
   - `context["facet"]` and `context["day"]` must exist
   Missing context logs and returns `""`.
5. Load entities once with:
   `load_entities(facet=context["facet"], day=context["day"])`.
6. Validate `commitments` entry by entry.
   - each entry must be a `dict`
   - required keys: `owner`, `action`, `counterparty`, `when`, `context`
   - each required value must be a `str`
   - invalid entries are skipped with a per-entry log
7. Validate `closures` entry by entry.
   - each entry must be a `dict`
   - required keys: `owner`, `action`, `counterparty`, `resolution`, `context`
   - each required value must be a `str`
   - `resolution` must be one of:
     `sent`, `done`, `signed`, `dropped`, `deferred`
   - invalid entries are skipped with a per-entry log
8. Validate `decisions` entry by entry.
   - each entry must be a `dict`
   - required keys: `owner`, `action`, `context`
   - each required value must be a `str`
   - invalid entries are skipped with a per-entry log
9. Resolve entity ids for every valid entry with
   `find_matching_entity(name, entities, fuzzy_threshold=90)`.
   - commitments: add `owner_entity_id` and `counterparty_entity_id`
   - closures: add `owner_entity_id` and `counterparty_entity_id`
   - decisions: add `owner_entity_id`
   - unmatched values become `None`
   - preserve the original `owner` and `counterparty` strings
10. Build:
    `story = {"body": body, "topics": topics, "confidence": confidence}`.
11. Extract:
    - `record_id = context["activity"]["id"]`
    - `facet = context["facet"]`
    - `day = context["day"]`
12. Call:
    `merge_story_fields(facet, day, record_id, story=..., commitments=..., closures=..., decisions=..., actor="story", note=None)`.
    If it returns `False`: log a warning and return `""`.
13. Return `""`.
    This is required because `_execute_with_tools()` only writes the output file
    when `result` is truthy (`think/talents.py:837-846`); returning `None` would
    fall back to the original JSON result (`think/talents.py:726-734`).

Intentional differences from `talent/spans.py`:

- no spans-file write
- no topic dedupe/normalization
- no confidence clamping
- no fence-stripping carryover unless explicitly added during implementation

## 3. Activity-record formatter extension

Target:

- extend `think/activities.py::format_activities()`
- current order is: title, activity, facet, day, time, level, description,
  details, participation, hidden (`think/activities.py:1271-1307`)

Chosen insertion point:

- add the story block after participation and before hidden

Behavior:

- if `record.get("story")` is not a `dict`, do nothing
- if `story["body"]` is a non-empty string, render it as prose rather than
  `- Story: ...`
- if `story["topics"]` is a non-empty list of strings, render one line as
  `Topics: a, b, c`
- if `body` is missing/empty, skip the prose block
- if `topics` is missing, non-list, or empty, skip the topics line
- keep all other formatter output unchanged
- keep the existing activity formatter registration; no new registry entry is
  needed because activities are already mapped to `format_activities()`
  (`think/formatters.py:143-144`)

Why this insertion point is best:

- description/details remain raw activity metadata
- participation remains the structured who-was-involved summary
- story reads naturally after those structured fields
- hidden stays last because it is record state, not content

## 4. Storyteller prompt changes

Common frontmatter changes for all three storyteller talents:

- `priority: 10` -> `priority: 20`
- `hook: {"post": "spans"}` -> `hook: {"post": "story"}`
- keep `schedule: "activity"`
- keep `output: "json"`
- keep existing activity filters per talent

Common schema changes for all three:

- require exactly:
  `body`, `topics`, `confidence`, `commitments`, `closures`, `decisions`
- all six fields are required on every response
- `topics` may be `[]`
- `commitments`, `closures`, and `decisions` may be `[]`
- add the explicit instruction:
  `Return [] if you do not observe a clear commitment / closure / decision. Better to omit than invent.`
- state the controlled closure `resolution` vocabulary exactly:
  `sent`, `done`, `signed`, `dropped`, `deferred`

`talent/conversation.md`

- keep the meeting/call/messaging/email narrative focus
- expand the schema block to the six-field JSON shape
- inline examples:
  - commitment: send a follow-up, draft, or deck by a date
  - closure: an open item was `sent` or `done`
  - decision: the group chose a direction, owner, or timing
- keep the current guidance that brief quotes are allowed when they sharpen a
  decision, commitment, or disagreement

`talent/work.md`

- keep the coding/browsing/reading progress focus
- expand the schema block to the six-field JSON shape
- inline examples:
  - commitment: ship a patch, benchmark, or send results
  - closure: a task was `done` or a review was `sent`
  - decision: a code-path, retry strategy, or API choice was made
- keep the instruction to emphasize actual work performed over UI description

`talent/event.md`

- keep the appointment/event/travel/errand outcome focus
- expand the schema block to the six-field JSON shape
- inline examples:
  - commitment: a travel or logistics follow-up
  - closure: a form was `signed`, a reservation was `done`, or a task was
    `deferred`
  - decision: a route, plan, or next-step choice was made
- keep the guidance to prefer what actually happened over generic event labels

## 5. Test matrix

| test name | file | pins |
| --- | --- | --- |
| `test_story_hook_parses_and_writes` | `tests/test_story_hook.py` | Valid JSON writes `story`, `commitments`, `closures`, `decisions` onto the activity record and appends one edit with actor `story`. |
| `test_story_hook_empty_arrays` | `tests/test_story_hook.py` | Empty `commitments`/`closures`/`decisions` still persist alongside the story payload. |
| `test_story_hook_bad_resolution_skipped` | `tests/test_story_hook.py` | Invalid closure `resolution` is dropped while valid sibling closures survive. |
| `test_story_hook_missing_required_field_skipped` | `tests/test_story_hook.py` | Missing required per-entry fields skip only the bad item. |
| `test_story_hook_resolves_entities` | `tests/test_story_hook.py` | `owner`/`counterparty` resolve to `*_entity_id` with `fuzzy_threshold=90`; misses become `None`. |
| `test_story_hook_idempotent_rerun` | `tests/test_story_hook.py` | Second run replaces story/list fields wholesale and appends one more edit entry. |
| `test_story_hook_missing_record_logs_and_returns` | `tests/test_story_hook.py` | `merge_story_fields()` returns `False`, hook logs warning, nothing raises. |
| `test_story_hook_no_json_file_written` | `tests/test_story_hook.py` | Returning `""` suppresses the storyteller JSON artifact. |
| `test_format_activities_renders_story` | `tests/test_activities.py` | Story prose and `Topics:` line appear when present and disappear cleanly when absent. |
| `test_no_spans_formatter_registered` | `tests/test_formatters.py` | `"facets/*/spans/*.jsonl"` is removed from `FORMATTERS`; spans paths no longer resolve to a formatter. |
| `test_no_spans_writes` | `tests/test_formatters.py` | Search-style assertion that no `format_spans` or `spans/` write targets remain in `think/`, `talent/`, or `apps/`. |

Existing test templates to reuse:

- `tests/test_activity_record_merge.py` for temp-journal setup, activity seeding,
  hook execution, and record reload assertions
- `tests/test_schedule_hook.py` for per-entry skip behavior and entity-resolution
  patterns

## 6. Files touched / deleted

Create:

- `docs/design/story-talent-refactor.md`
- `talent/story.py`
- `tests/test_story_hook.py`

Modify:

- `think/activities.py`
- `think/formatters.py`
- `talent/conversation.md`
- `talent/work.md`
- `talent/event.md`
- `tests/test_activities.py`
- `tests/test_formatters.py`
- `tests/baselines/api/stats/stats.json`
- `talent/journal/references/captures.md`

Delete:

- `talent/spans.py`
- `think/spans.py`
- `tests/test_spans_hook.py`
- `tests/test_spans_formatter.py`

Intentionally untouched:

- `think/thinking.py` because priority-group serialization already does the job
- `apps/activities/call.py` because no CLI surface change is needed

## 7. Risks / gotchas

- Preserve the hook-return behavior exactly: `""`, not `None`.
  `None` would fall back to the original JSON result and write a generator file
  (`think/talents.py:726-734`, `837-846`).
- Preserve the missing-record behavior of `update_record_fields()`:
  no raise, but the story path must log the failure like participation does
  (`think/activities.py:1007-1011`, `talent/participation.py:104-105`).
- `format_activities()` is already registered for
  `"facets/*/activities/*.jsonl"` (`think/formatters.py:144`).
  Do not add a new formatter entry for story data.
- Layer hygiene L2 stays strict:
  only `think/activities.py` writes the activity record.
  `talent/story.py` imports and calls the new writer; it does not perform raw
  file I/O.
- Story merges serialize after participation via priority ordering, not new locks.
  Keep participation at `10`, storytellers at `20`, and rely on the existing
  group ordering/drain in `run_activity_prompts()`
  (`think/thinking.py:1925-1928`, `2150-2183`).
+14 -6
talent/conversation.md
```diff
···
 {
   "type": "generate",
   "title": "Conversation Story",
-  "description": "Writes a structured narrative span row for meeting, call, messaging, and email activities.",
+  "description": "Generates a conversation story, topics, and structured commitments, closures, and decisions to merge onto the activity record.",
   "color": "#00796b",
   "schedule": "activity",
   "activities": ["meeting", "call", "messaging", "email"],
-  "priority": 10,
+  "priority": 20,
   "output": "json",
-  "hook": {"post": "spans"},
+  "hook": {"post": "story"},
   "load": {
     "transcripts": true,
     "percepts": true,
···
 Participation and entity extraction already happened upstream. Reuse that context;
 do not re-extract people or entities into new structures.

-Return exactly these three fields:
+Return exactly this six-field JSON object:
 - `body`: string narrative prose covering what was discussed, what moved, and any commitments.
-- `topics`: array of 3-8 short string tags.
+- `topics`: array of short string tags; use `[]` when there are no durable topics worth preserving.
 - `confidence`: float from 0.0 to 1.0.
+- `commitments`: array of objects with required string fields `owner`, `action`, `counterparty`, `when`, `context`.
+  Example: `{"owner":"Mina","action":"send the revised deck","counterparty":"Ravi","when":"Friday morning","context":"Mina committed to send the deck before the investor follow-up."}`
+- `closures`: array of objects with required string fields `owner`, `action`, `counterparty`, `resolution`, `context`. `resolution` must be one of `sent`, `done`, `signed`, `dropped`, `deferred`.
+  Example: `{"owner":"Ravi","action":"intro email","counterparty":"Mina","resolution":"sent","context":"Ravi confirmed the intro email already went out during the call."}`
+- `decisions`: array of objects with required string fields `owner`, `action`, `context`.
+  Example: `{"owner":"Team","action":"schedule the launch review for next Tuesday","context":"The group agreed to move the review to Tuesday after checking calendars."}`
+
+Return `[]` if you do not observe a clear commitment / closure / decision. Better to omit than invent.

 Body requirements:
 - Write one tight paragraph in chronological order.
···
 - If the activity mixes channels, unify them into one narrative rather than
   listing separate threads.

-Output a single JSON object with only `body`, `topics`, and `confidence`.
+Output a single JSON object with all six required fields: `body`, `topics`, `confidence`, `commitments`, `closures`, and `decisions`.
```
+14 -6
talent/event.md
```diff
···
 {
   "type": "generate",
   "title": "Event Story",
-  "description": "Writes a structured narrative span row for appointment, event, travel, errand, celebration, deadline, and reminder activities.",
+  "description": "Generates an event story, topics, and structured commitments, closures, and decisions to merge onto the activity record.",
   "color": "#ff7043",
   "schedule": "activity",
   "activities": ["appointment", "event", "travel", "errand", "celebration", "deadline", "reminder"],
-  "priority": 10,
+  "priority": 20,
   "output": "json",
-  "hook": {"post": "spans"},
+  "hook": {"post": "story"},
   "load": {
     "transcripts": true,
     "percepts": true,
···
 upstream. Use that context; do not re-extract people or entities into new
 structures.

-Return exactly these three fields:
+Return exactly this six-field JSON object:
 - `body`: string narrative prose describing what happened and any outcome.
-- `topics`: array of 3-8 short string tags.
+- `topics`: array of short string tags; use `[]` when there are no durable topics worth preserving.
 - `confidence`: float from 0.0 to 1.0.
+- `commitments`: array of objects with required string fields `owner`, `action`, `counterparty`, `when`, `context`.
+  Example: `{"owner":"Jordan","action":"send the updated itinerary","counterparty":"Taylor","when":"tonight","context":"Jordan said the revised travel plan would be sent after the delay was confirmed."}`
+- `closures`: array of objects with required string fields `owner`, `action`, `counterparty`, `resolution`, `context`. `resolution` must be one of `sent`, `done`, `signed`, `dropped`, `deferred`.
+  Example: `{"owner":"Jordan","action":"hotel confirmation","counterparty":"Taylor","resolution":"signed","context":"Jordan completed and signed the hotel check-in form during the event."}`
+- `decisions`: array of objects with required string fields `owner`, `action`, `context`.
+  Example: `{"owner":"Travel group","action":"take the shuttle instead of renting a car","context":"After the delay, the group agreed the shuttle was the fastest remaining option."}`
+
+Return `[]` if you do not observe a clear commitment / closure / decision. Better to omit than invent.

 Body requirements:
 - Write one tight paragraph in chronological order.
···
 - Prefer what actually occurred over generic labels from the activity type.
 - If evidence is thin, keep the narrative modest and confidence honest.

-Output a single JSON object with only `body`, `topics`, and `confidence`.
+Output a single JSON object with all six required fields: `body`, `topics`, `confidence`, `commitments`, `closures`, and `decisions`.
```
+1 -1
talent/journal/references/captures.md
```diff
···
 - System outputs: `talents/{agent}.md` (e.g., `talents/briefing.md`, `talents/default.md`)
 - App outputs: `talents/_{app}_{agent}.md` (e.g., `talents/_entities_observer.md`)
 - JSON output: `talents/{agent}.json` when metadata specifies `"output": "json"`
-- Story span rows: `facets/{facet}/spans/{day}.jsonl`
+- Story fields (`story`, `commitments`, `closures`, `decisions`) live on the activity record in `facets/{facet}/activities/{day}.jsonl`

 Each generator type has a corresponding template file (`{name}.md`) that defines how the AI synthesizes extracts into narrative form.
```
-170
talent/spans.py
```python
# SPDX-License-Identifier: AGPL-3.0-only
# Copyright (c) 2026 sol pbc

"""Post-hook for structured storytelling span rows."""

from __future__ import annotations

import json
import logging
import math
import re
from pathlib import Path
from typing import Any

from think.activities import locked_modify
from think.utils import get_journal, segment_parse

logger = logging.getLogger(__name__)


def _strip_code_fences(result: str) -> str:
    stripped = result.strip()
    stripped = re.sub(r"^```(?:json)?\s*", "", stripped)
    return re.sub(r"\s*```$", "", stripped)


def _normalize_topics(value: Any) -> list[str] | None:
    if not isinstance(value, list):
        logger.warning("spans hook: missing topics list")
        return None

    topics: list[str] = []
    seen: set[str] = set()
    for item in value:
        if not isinstance(item, str):
            logger.warning("spans hook: invalid topics list")
            return None
        topic = item.strip().lower()
        if not topic or topic in seen:
            continue
        seen.add(topic)
        topics.append(topic)
        if len(topics) >= 10:
            break

    if not topics:
        logger.warning("spans hook: empty topics after normalization")
        return None

    return topics


def _normalize_confidence(value: Any) -> float | None:
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        logger.warning("spans hook: invalid confidence value")
        return None

    confidence = float(value)
    if math.isnan(confidence):
        logger.warning("spans hook: invalid confidence value")
        return None

    clamped = min(1.0, max(0.0, confidence))
    if clamped != confidence:
        logger.warning(
            "spans hook: clamped confidence %.3f to %.3f", confidence, clamped
        )
    return clamped


def _activity_time_bounds(segments: Any) -> tuple[str, str] | None:
    if not isinstance(segments, list) or not segments:
        logger.warning("spans hook: missing activity segments")
        return None

    start_time, _ = segment_parse(str(segments[0]))
    _, end_time = segment_parse(str(segments[-1]))
    if start_time is None or end_time is None:
        logger.warning("spans hook: invalid activity segments")
        return None

    return start_time.strftime("%H:%M:%S"), end_time.strftime("%H:%M:%S")


def _spans_path(facet: str, day: str) -> Path:
    return Path(get_journal()) / "facets" / facet / "spans" / f"{day}.jsonl"


def post_process(result: str, context: dict) -> str:
    """Parse model JSON and persist a single storytelling span row."""
    try:
        try:
            data = json.loads(_strip_code_fences(result))
        except (json.JSONDecodeError, ValueError) as exc:
            logger.warning("spans hook: failed to parse JSON: %s", exc)
            return ""

        if not isinstance(data, dict):
            logger.warning("spans hook: expected top-level object")
            return ""

        body = data.get("body")
        if not isinstance(body, str) or not body.strip():
            logger.warning("spans hook: missing body")
            return ""
        normalized_body = body.strip()

        topics = _normalize_topics(data.get("topics"))
        if topics is None:
            return ""

        confidence = _normalize_confidence(data.get("confidence"))
        if confidence is None:
            return ""

        activity = context.get("activity")
        if not isinstance(activity, dict):
            logger.warning("spans hook: missing activity context")
            return ""

        facet = str(context.get("facet") or "").strip()
        day = str(context.get("day") or "").strip()
        if not facet or not day:
            logger.warning("spans hook: missing facet/day context")
            return ""

        span_id = str(activity.get("id") or "").strip()
        activity_type = str(activity.get("activity") or "").strip()
        talent = str(context.get("name") or "").strip()
        if not span_id or not activity_type or not talent:
            logger.warning("spans hook: missing span metadata")
            return ""

        bounds = _activity_time_bounds(activity.get("segments"))
        if bounds is None:
            return ""
        start, end = bounds

        row = {
            "span_id": span_id,
            "talent": talent,
            "facet": facet,
            "day": day,
            "activity_type": activity_type,
            "start": start,
            "end": end,
            "body": normalized_body,
            "topics": topics,
            "confidence": confidence,
        }

        def modify_fn(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
            updated: list[dict[str, Any]] = []
            replaced = False
            for record in records:
                if record.get("span_id") == span_id and record.get("talent") == talent:
                    if not replaced:
                        updated.append(dict(row))
                        replaced = True
                    continue
                updated.append(record)
            if not replaced:
                updated.append(dict(row))
            return updated

        locked_modify(_spans_path(facet, day), modify_fn, create_if_missing=True)
    except Exception as exc:
        logger.warning("spans hook: failed to persist row: %s", exc)

    return ""
```
+191
talent/story.py
```python
# SPDX-License-Identifier: AGPL-3.0-only
# Copyright (c) 2026 sol pbc

"""Hook for merging storyteller outputs onto activity records."""

from __future__ import annotations

import json
import logging
from typing import Any

from think.activities import merge_story_fields
from think.entities.loading import load_entities
from think.entities.matching import find_matching_entity

logger = logging.getLogger(__name__)

ALLOWED_RESOLUTIONS = frozenset({"sent", "done", "signed", "dropped", "deferred"})


def _resolve_entity_id(name: str, entities: list[dict[str, Any]]) -> str | None:
    match = find_matching_entity(name, entities, fuzzy_threshold=90)
    return match.get("id") if match else None


def _validate_fields(
    entry: dict[str, Any], required_fields: tuple[str, ...]
) -> dict[str, str] | None:
    normalized: dict[str, str] = {}
    for field in required_fields:
        value = entry.get(field)
        if not isinstance(value, str):
            return None
        normalized[field] = value
    return normalized


def post_process(result: str, context: dict) -> str:
    """Validate storyteller JSON and merge it onto an activity record."""
    try:
        data = json.loads(result.strip())
    except (json.JSONDecodeError, ValueError) as exc:
        logger.error("story hook: failed to parse JSON: %s", exc)
        return ""

    if not isinstance(data, dict):
        logger.warning("story hook: expected top-level object")
        return ""

    body = data.get("body")
    topics = data.get("topics")
    confidence = data.get("confidence")
    commitments = data.get("commitments")
    closures = data.get("closures")
    decisions = data.get("decisions")

    if not isinstance(body, str) or not body.strip():
        logger.warning("story hook: missing body")
        return ""
    if not isinstance(topics, list) or any(
        not isinstance(topic, str) for topic in topics
    ):
        logger.warning("story hook: invalid topics")
        return ""
    if (
        isinstance(confidence, bool)
        or not isinstance(confidence, (int, float))
        or not 0.0 <= float(confidence) <= 1.0
    ):
        logger.warning("story hook: invalid confidence")
        return ""
    if not isinstance(commitments, list):
        logger.warning("story hook: missing commitments list")
        return ""
    if not isinstance(closures, list):
        logger.warning("story hook: missing closures list")
        return ""
    if not isinstance(decisions, list):
        logger.warning("story hook: missing decisions list")
        return ""

    activity = context.get("activity")
    if not isinstance(activity, dict):
        logger.warning("story hook: missing activity context")
        return ""

    record_id = activity.get("id")
    if not isinstance(record_id, str) or not record_id:
        logger.warning("story hook: missing activity record id")
        return ""

    facet = context.get("facet")
    day = context.get("day")
    if not isinstance(facet, str) or not facet or not isinstance(day, str) or not day:
        logger.warning("story hook: missing facet/day context")
        return ""

    entities = load_entities(facet=facet, day=day)

    resolved_commitments: list[dict[str, Any]] = []
    for index, entry in enumerate(commitments):
        if not isinstance(entry, dict):
            logger.warning(
                "story hook: skipping commitment[%d]: expected object", index
            )
            continue
        normalized = _validate_fields(
            entry, ("owner", "action", "counterparty", "when", "context")
        )
        if normalized is None:
            logger.warning(
                "story hook: skipping commitment[%d]: missing required string field",
                index,
            )
            continue
        resolved_commitment = dict(normalized)
        resolved_commitment["owner_entity_id"] = _resolve_entity_id(
            normalized["owner"], entities
        )
        resolved_commitment["counterparty_entity_id"] = _resolve_entity_id(
            normalized["counterparty"], entities
        )
        resolved_commitments.append(resolved_commitment)

    resolved_closures: list[dict[str, Any]] = []
    for index, entry in enumerate(closures):
        if not isinstance(entry, dict):
            logger.warning("story hook: skipping closure[%d]: expected object", index)
            continue
        normalized = _validate_fields(
            entry, ("owner", "action", "counterparty", "resolution", "context")
        )
        if normalized is None:
            logger.warning(
                "story hook: skipping closure[%d]: missing required string field",
                index,
            )
            continue
        if normalized["resolution"] not in ALLOWED_RESOLUTIONS:
            logger.warning(
                "story hook: skipping closure[%d]: invalid resolution '%s'",
                index,
                normalized["resolution"],
            )
            continue
        resolved_closure = dict(normalized)
        resolved_closure["owner_entity_id"] = _resolve_entity_id(
            normalized["owner"], entities
        )
        resolved_closure["counterparty_entity_id"] = _resolve_entity_id(
            normalized["counterparty"], entities
        )
        resolved_closures.append(resolved_closure)

    resolved_decisions: list[dict[str, Any]] = []
    for index, entry in enumerate(decisions):
        if not isinstance(entry, dict):
            logger.warning("story hook: skipping decision[%d]: expected object", index)
            continue
        normalized = _validate_fields(entry, ("owner", "action", "context"))
        if normalized is None:
            logger.warning(
                "story hook: skipping decision[%d]: missing required string field",
                index,
            )
            continue
        resolved_decision = dict(normalized)
        resolved_decision["owner_entity_id"] = _resolve_entity_id(
            normalized["owner"], entities
        )
        resolved_decisions.append(resolved_decision)

    story = {
        "body": body.strip(),
        "topics": list(topics),
        "confidence": float(confidence),
    }

    # A False return means the day file or record was missing; per the design
    # doc this logs a warning and never raises.
    if not merge_story_fields(
        facet,
        day,
        record_id,
        story=story,
        commitments=resolved_commitments,
        closures=resolved_closures,
        decisions=resolved_decisions,
        actor="story",
        note=None,
    ):
        logger.warning("story hook: merge failed for record %s", record_id)

    return ""
```
+14 -6
talent/work.md
··· 1 1 { 2 2 "type": "generate", 3 3 "title": "Work Story", 4 - "description": "Writes a structured narrative span row for coding, browsing, and reading activities.", 4 + "description": "Generates a work story, topics, and structured commitments, closures, and decisions to merge onto the activity record.", 5 5 "color": "#6d4c41", 6 6 "schedule": "activity", 7 7 "activities": ["coding", "browsing", "reading"], 8 - "priority": 10, 8 + "priority": 20, 9 9 "output": "json", 10 - "hook": {"post": "spans"}, 10 + "hook": {"post": "story"}, 11 11 "load": { 12 12 "transcripts": true, 13 13 "percepts": true, ··· 29 29 the activity. Participation and entity extraction already happened upstream. 30 30 Use that context; do not re-extract people or entities into new structures. 31 31 32 - Return exactly these three fields: 32 + Return exactly this six-field JSON object: 33 33 - `body`: string narrative prose about the work performed and what changed. 34 - - `topics`: array of 3-8 short string tags. 34 + - `topics`: array of short string tags; use `[]` when there are no durable topics worth preserving. 35 35 - `confidence`: float from 0.0 to 1.0. 36 + - `commitments`: array of objects with required string fields `owner`, `action`, `counterparty`, `when`, `context`. 37 + Example: `{"owner":"Avery","action":"post the benchmark results","counterparty":"Priya","when":"after lunch","context":"Avery said the new retry benchmark would be shared once the run completed."}` 38 + - `closures`: array of objects with required string fields `owner`, `action`, `counterparty`, `resolution`, `context`. `resolution` must be one of `sent`, `done`, `signed`, `dropped`, `deferred`. 39 + Example: `{"owner":"Avery","action":"follow-up PR","counterparty":"Priya","resolution":"done","context":"Avery noted the cleanup PR was merged during this work block."}` 40 + - `decisions`: array of objects with required string fields `owner`, `action`, `context`. 
41 + Example: `{"owner":"Avery","action":"switch the retry path to queue-backed backoff","context":"The work session concluded that queue-backed backoff was simpler than the timer-based branch."}` 42 + 43 + Return `[]` if you do not observe a clear commitment / closure / decision. Better to omit than invent. 36 44 37 45 Body requirements: 38 46 - Write one tight paragraph in chronological order. ··· 41 49 - If evidence is partial, describe the most defensible story and keep the 42 50 confidence honest. 43 51 44 - Output a single JSON object with only `body`, `topics`, and `confidence`. 52 + Output a single JSON object with all six required fields: `body`, `topics`, `confidence`, `commitments`, `closures`, and `decisions`.
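Taken together, a well-formed talent response under the six-field contract above might look like the following. Every value is invented for illustration; only the field names and types come from the prompt:

```python
import json

# Illustrative six-field payload matching the contract in talent/work.md;
# all names, actions, and context strings are made up for the example.
payload = {
    "body": "Implemented the retry path and verified the failing case.",
    "topics": ["retry logic", "testing"],
    "confidence": 0.82,
    "commitments": [
        {
            "owner": "Avery",
            "action": "post the benchmark results",
            "counterparty": "Priya",
            "when": "after lunch",
            "context": "Avery said the results would be shared once the run completed.",
        }
    ],
    "closures": [],   # nothing observably closed in this work block
    "decisions": [],  # better to omit than invent
}

# The hook rejects payloads that stray from exactly these six fields.
required = {"body", "topics", "confidence", "commitments", "closures", "decisions"}
assert set(payload) == required
print(json.dumps(payload, indent=2))
```

Note the empty arrays: per the prompt, `[]` is the expected answer when no clear commitment, closure, or decision was observed.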
+3 -3
tests/baselines/api/sol/talents-day.json
··· 85 85 "conversation": { 86 86 "app": null, 87 87 "color": "#00796b", 88 - "description": "Writes a structured narrative span row for meeting, call, messaging, and email activities.", 88 + "description": "Generates a conversation story, topics, and structured commitments, closures, and decisions to merge onto the activity record.", 89 89 "multi_facet": false, 90 90 "output_format": "json", 91 91 "schedule": "activity", ··· 184 184 "event": { 185 185 "app": null, 186 186 "color": "#ff7043", 187 - "description": "Writes a structured narrative span row for appointment, event, travel, errand, celebration, deadline, and reminder activities.", 187 + "description": "Generates an event story, topics, and structured commitments, closures, and decisions to merge onto the activity record.", 188 188 "multi_facet": false, 189 189 "output_format": "json", 190 190 "schedule": "activity", ··· 459 459 "work": { 460 460 "app": null, 461 461 "color": "#6d4c41", 462 - "description": "Writes a structured narrative span row for coding, browsing, and reading activities.", 462 + "description": "Generates a work story, topics, and structured commitments, closures, and decisions to merge onto the activity record.", 463 463 "multi_facet": false, 464 464 "output_format": "json", 465 465 "schedule": "activity",
+9 -9
tests/baselines/api/stats/stats.json
··· 8 8 "email" 9 9 ], 10 10 "color": "#00796b", 11 - "description": "Writes a structured narrative span row for meeting, call, messaging, and email activities.", 11 + "description": "Generates a conversation story, topics, and structured commitments, closures, and decisions to merge onto the activity record.", 12 12 "hook": { 13 - "post": "spans" 13 + "post": "story" 14 14 }, 15 15 "load": { 16 16 "percepts": true, ··· 20 20 "mtime": 0, 21 21 "output": "json", 22 22 "path": "<PROJECT>/talent/conversation.md", 23 - "priority": 10, 23 + "priority": 20, 24 24 "schedule": "activity", 25 25 "source": "system", 26 26 "title": "Conversation Story", ··· 130 130 "reminder" 131 131 ], 132 132 "color": "#ff7043", 133 - "description": "Writes a structured narrative span row for appointment, event, travel, errand, celebration, deadline, and reminder activities.", 133 + "description": "Generates an event story, topics, and structured commitments, closures, and decisions to merge onto the activity record.", 134 134 "hook": { 135 - "post": "spans" 135 + "post": "story" 136 136 }, 137 137 "load": { 138 138 "percepts": true, ··· 142 142 "mtime": 0, 143 143 "output": "json", 144 144 "path": "<PROJECT>/talent/event.md", 145 - "priority": 10, 145 + "priority": 20, 146 146 "schedule": "activity", 147 147 "source": "system", 148 148 "title": "Event Story", ··· 287 287 "reading" 288 288 ], 289 289 "color": "#6d4c41", 290 - "description": "Writes a structured narrative span row for coding, browsing, and reading activities.", 290 + "description": "Generates a work story, topics, and structured commitments, closures, and decisions to merge onto the activity record.", 291 291 "hook": { 292 - "post": "spans" 292 + "post": "story" 293 293 }, 294 294 "load": { 295 295 "percepts": true, ··· 299 299 "mtime": 0, 300 300 "output": "json", 301 301 "path": "<PROJECT>/talent/work.md", 302 - "priority": 10, 302 + "priority": 20, 303 303 "schedule": "activity", 304 304 "source": "system", 305 305 
"title": "Work Story",
+38
tests/test_activities.py
··· 781 781 note="bad field", 782 782 ) 783 783 784 + def test_format_activities_renders_story(self): 785 + from think.activities import format_activities 786 + 787 + chunks, _meta = format_activities( 788 + [ 789 + { 790 + "id": "meeting_090000_300", 791 + "activity": "meeting", 792 + "description": "Team sync", 793 + "segments": ["090000_300"], 794 + "created_at": 1, 795 + "participation": [{"name": "Mina"}], 796 + "story": { 797 + "body": "Aligned on the launch plan and assigned owners.", 798 + "topics": ["launch", "owners"], 799 + "confidence": 0.9, 800 + }, 801 + }, 802 + { 803 + "id": "coding_100000_300", 804 + "activity": "coding", 805 + "description": "Implementation block", 806 + "segments": ["100000_300"], 807 + "created_at": 2, 808 + }, 809 + ] 810 + ) 811 + 812 + assert ( 813 + "Aligned on the launch plan and assigned owners." in chunks[0]["markdown"] 814 + ) 815 + assert "Topics: launch, owners" in chunks[0]["markdown"] 816 + assert "Topics:" not in chunks[1]["markdown"] 817 + assert ( 818 + "Aligned on the launch plan and assigned owners." 819 + not in chunks[1]["markdown"] 820 + ) 821 + 784 822 def test_hidden_records_filtered_by_default(self, monkeypatch): 785 823 from think.activities import ( 786 824 append_activity_record,
+35
tests/test_formatters.py
··· 79 79 formatter = get_formatter("random/path/unknown.jsonl") 80 80 assert formatter is None 81 81 82 + def test_no_spans_formatter_registered(self): 83 + """Spans JSONL is no longer registered after the story refactor.""" 84 + from think.formatters import FORMATTERS, get_formatter 85 + 86 + assert "facets/*/spans/*.jsonl" not in FORMATTERS 87 + assert get_formatter("facets/work/spans/20260418.jsonl") is None 88 + 89 + def test_no_spans_writes(self): 90 + """No spans formatter or spans JSONL write targets remain in source dirs.""" 91 + repo_root = Path(__file__).resolve().parent.parent 92 + patterns = [ 93 + "format_spans", 94 + "talent.spans", 95 + "think.spans", 96 + ' / "spans" / ', 97 + "facets/*/spans", 98 + "spans/{day}.jsonl", 99 + ] 100 + 101 + hits: list[str] = [] 102 + for directory in ("think", "talent", "apps"): 103 + for path in (repo_root / directory).rglob("*"): 104 + if not path.is_file() or path.suffix not in {".py", ".md"}: 105 + continue 106 + for lineno, line in enumerate( 107 + path.read_text(encoding="utf-8").splitlines(), start=1 108 + ): 109 + for pattern in patterns: 110 + if pattern in line: 111 + hits.append( 112 + f"{path.relative_to(repo_root)}:{lineno}:{pattern}" 113 + ) 114 + 115 + assert hits == [] 116 + 82 117 83 118 class TestLoadJsonl: 84 119 """Tests for JSONL loading utility."""
-108
tests/test_spans_formatter.py
··· 1 - # SPDX-License-Identifier: AGPL-3.0-only 2 - # Copyright (c) 2026 sol pbc 3 - 4 - """Tests for spans JSONL formatting.""" 5 - 6 - from __future__ import annotations 7 - 8 - from datetime import datetime 9 - from pathlib import Path 10 - 11 - 12 - def test_format_spans_builds_chunks_and_metadata(): 13 - from think.spans import format_spans 14 - 15 - entries = [ 16 - { 17 - "span_id": "meeting_090000_300", 18 - "talent": "conversation", 19 - "facet": "work", 20 - "day": "20260101", 21 - "activity_type": "meeting", 22 - "start": "09:00:00", 23 - "end": "09:15:00", 24 - "body": "Aligned on the launch plan and confirmed owners.", 25 - "topics": ["launch", "owners", "planning"], 26 - "confidence": 0.93, 27 - }, 28 - { 29 - "span_id": "coding_130000_300", 30 - "talent": "work", 31 - "facet": "work", 32 - "day": "20260101", 33 - "activity_type": "coding", 34 - "start": "13:00:00", 35 - "end": "13:10:00", 36 - "body": "Implemented the migration and updated the tests.", 37 - "topics": ["migration", "tests", "backend"], 38 - "confidence": 0.81, 39 - }, 40 - ] 41 - 42 - file_path = Path("/tmp/journal/facets/work/spans/20260101.jsonl") 43 - chunks, meta = format_spans(entries, {"file_path": file_path}) 44 - 45 - assert len(chunks) == 2 46 - assert meta["header"] == "# Spans for 'work' facet on 2026-01-01" 47 - assert meta["indexer"] == {"agent": "span"} 48 - 49 - first = chunks[0] 50 - expected_ts = int(datetime.strptime("20260101", "%Y%m%d").timestamp() * 1000) 51 - expected_ts += 9 * 3600 * 1000 52 - assert first["timestamp"] == expected_ts 53 - assert first["source"] == entries[0] 54 - assert "### Meeting: meeting_090000_300" in first["markdown"] 55 - assert "**Time:** 09:00:00-09:15:00" in first["markdown"] 56 - assert "**Activity Type:** meeting" in first["markdown"] 57 - assert "**Topics:** launch, owners, planning" in first["markdown"] 58 - assert "**Confidence:** 0.93" in first["markdown"] 59 - assert "**Talent:** conversation" in first["markdown"] 60 - assert 
"Aligned on the launch plan" in first["markdown"] 61 - 62 - 63 - def test_format_spans_skips_invalid_rows_and_reports_error(): 64 - from think.spans import format_spans 65 - 66 - entries = [ 67 - { 68 - "span_id": "valid_1", 69 - "talent": "work", 70 - "facet": "work", 71 - "day": "20260101", 72 - "activity_type": "coding", 73 - "start": "08:00:00", 74 - "end": "08:05:00", 75 - "body": "Valid row.", 76 - "topics": ["alpha", "beta", "gamma"], 77 - "confidence": 0.5, 78 - }, 79 - { 80 - "span_id": "invalid_1", 81 - "talent": "work", 82 - "facet": "work", 83 - "day": "20260101", 84 - "activity_type": "coding", 85 - "start": "08:05:00", 86 - "end": "08:10:00", 87 - "topics": ["alpha", "beta", "gamma"], 88 - "confidence": 0.5, 89 - }, 90 - ] 91 - 92 - chunks, meta = format_spans( 93 - entries, {"file_path": Path("/tmp/journal/facets/work/spans/20260101.jsonl")} 94 - ) 95 - 96 - assert len(chunks) == 1 97 - assert "Skipped 1 entries missing required fields" in meta["error"] 98 - assert "20260101.jsonl" in meta["error"] 99 - assert meta["indexer"] == {"agent": "span"} 100 - 101 - 102 - def test_get_formatter_returns_spans_formatter(): 103 - from think.formatters import get_formatter 104 - 105 - formatter = get_formatter("facets/foo/spans/20260101.jsonl") 106 - 107 - assert formatter is not None 108 - assert formatter.__name__ == "format_spans"
-297
tests/test_spans_hook.py
··· 1 - # SPDX-License-Identifier: AGPL-3.0-only 2 - # Copyright (c) 2026 sol pbc 3 - 4 - """Tests for the storytelling spans post-hook.""" 5 - 6 - from __future__ import annotations 7 - 8 - import json 9 - from pathlib import Path 10 - 11 - from talent.spans import post_process 12 - 13 - 14 - def _activity( 15 - *, 16 - activity_id: str = "coding_100000_300", 17 - activity_type: str = "coding", 18 - segments: list[str] | None = None, 19 - ) -> dict: 20 - return { 21 - "id": activity_id, 22 - "activity": activity_type, 23 - "segments": segments or ["100000_300", "100500_300"], 24 - } 25 - 26 - 27 - def _context( 28 - *, 29 - name: str = "work", 30 - facet: str = "work", 31 - day: str = "20260418", 32 - activity: dict | None = None, 33 - ) -> dict: 34 - return { 35 - "name": name, 36 - "facet": facet, 37 - "day": day, 38 - "activity": activity or _activity(), 39 - } 40 - 41 - 42 - def _rows(tmp_path: Path, *, facet: str = "work", day: str = "20260418") -> list[dict]: 43 - path = tmp_path / "facets" / facet / "spans" / f"{day}.jsonl" 44 - if not path.exists(): 45 - return [] 46 - return [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()] 47 - 48 - 49 - def test_post_process_writes_all_fields_and_renders_coding_span(monkeypatch, tmp_path): 50 - from think.spans import format_spans 51 - 52 - monkeypatch.setenv("_SOLSTONE_JOURNAL_OVERRIDE", str(tmp_path)) 53 - 54 - result = json.dumps( 55 - { 56 - "body": "Implemented the retry path and verified the failing case.", 57 - "topics": ["Retry Logic", "Testing", "retry logic", " Testing "], 58 - "confidence": 0.82, 59 - } 60 - ) 61 - 62 - returned = post_process(result, _context()) 63 - 64 - assert returned == "" 65 - 66 - rows = _rows(tmp_path) 67 - assert len(rows) == 1 68 - assert rows[0] == { 69 - "span_id": "coding_100000_300", 70 - "talent": "work", 71 - "facet": "work", 72 - "day": "20260418", 73 - "activity_type": "coding", 74 - "start": "10:00:00", 75 - "end": "10:10:00", 76 - "body": 
"Implemented the retry path and verified the failing case.", 77 - "topics": ["retry logic", "testing"], 78 - "confidence": 0.82, 79 - } 80 - 81 - file_path = tmp_path / "facets" / "work" / "spans" / "20260418.jsonl" 82 - chunks, meta = format_spans(rows, {"file_path": file_path}) 83 - assert len(chunks) == 1 84 - assert chunks[0]["source"] == rows[0] 85 - assert "### Coding: coding_100000_300" in chunks[0]["markdown"] 86 - assert meta["indexer"] == {"agent": "span"} 87 - 88 - 89 - def test_post_process_writes_single_conversation_row_for_meeting(monkeypatch, tmp_path): 90 - monkeypatch.setenv("_SOLSTONE_JOURNAL_OVERRIDE", str(tmp_path)) 91 - 92 - result = json.dumps( 93 - { 94 - "body": 'Aligned on next steps and confirmed "ship it Friday".', 95 - "topics": ["planning", "alignment", "delivery"], 96 - "confidence": 0.91, 97 - } 98 - ) 99 - ctx = _context( 100 - name="conversation", 101 - activity=_activity( 102 - activity_id="meeting_090000_300", 103 - activity_type="meeting", 104 - segments=["090000_300", "091500_300"], 105 - ), 106 - ) 107 - 108 - returned = post_process(result, ctx) 109 - 110 - assert returned == "" 111 - rows = _rows(tmp_path) 112 - assert len(rows) == 1 113 - assert rows[0]["talent"] == "conversation" 114 - assert rows[0]["activity_type"] == "meeting" 115 - assert rows[0]["start"] == "09:00:00" 116 - assert rows[0]["end"] == "09:20:00" 117 - assert not ( 118 - tmp_path 119 - / "facets" 120 - / "work" 121 - / "activities" 122 - / "20260418" 123 - / "meeting_090000_300" 124 - / "conversation.json" 125 - ).exists() 126 - 127 - 128 - def test_post_process_clamps_confidence_and_logs(monkeypatch, tmp_path, caplog): 129 - monkeypatch.setenv("_SOLSTONE_JOURNAL_OVERRIDE", str(tmp_path)) 130 - 131 - returned = post_process( 132 - json.dumps( 133 - { 134 - "body": "Shipped the fix.", 135 - "topics": ["release", "shipping", "qa"], 136 - "confidence": 1.4, 137 - } 138 - ), 139 - _context(), 140 - ) 141 - 142 - assert returned == "" 143 - assert 
_rows(tmp_path)[0]["confidence"] == 1.0 144 - assert "clamped confidence" in caplog.text 145 - 146 - 147 - def test_post_process_rejects_bad_confidence(monkeypatch, tmp_path, caplog): 148 - monkeypatch.setenv("_SOLSTONE_JOURNAL_OVERRIDE", str(tmp_path)) 149 - 150 - returned = post_process( 151 - json.dumps( 152 - { 153 - "body": "Investigated the issue.", 154 - "topics": ["debugging", "logs", "triage"], 155 - "confidence": "high", 156 - } 157 - ), 158 - _context(), 159 - ) 160 - 161 - assert returned == "" 162 - assert _rows(tmp_path) == [] 163 - assert "invalid confidence" in caplog.text 164 - 165 - caplog.clear() 166 - returned = post_process( 167 - json.dumps( 168 - { 169 - "body": "Investigated the issue.", 170 - "topics": ["debugging", "logs", "triage"], 171 - "confidence": float("nan"), 172 - } 173 - ), 174 - _context(), 175 - ) 176 - 177 - assert returned == "" 178 - assert _rows(tmp_path) == [] 179 - assert "invalid confidence" in caplog.text 180 - 181 - 182 - def test_post_process_rejects_missing_or_empty_topics(monkeypatch, tmp_path, caplog): 183 - monkeypatch.setenv("_SOLSTONE_JOURNAL_OVERRIDE", str(tmp_path)) 184 - 185 - missing_topics = json.dumps({"body": "Worked through the task.", "confidence": 0.7}) 186 - returned = post_process(missing_topics, _context()) 187 - assert returned == "" 188 - assert _rows(tmp_path) == [] 189 - assert "missing topics" in caplog.text 190 - 191 - caplog.clear() 192 - empty_topics = json.dumps( 193 - {"body": "Worked through the task.", "topics": [" ", "\t"], "confidence": 0.7} 194 - ) 195 - returned = post_process(empty_topics, _context()) 196 - assert returned == "" 197 - assert _rows(tmp_path) == [] 198 - assert "empty topics" in caplog.text 199 - 200 - caplog.clear() 201 - invalid_topics = json.dumps( 202 - { 203 - "body": "Worked through the task.", 204 - "topics": ["valid", 7, "other"], 205 - "confidence": 0.7, 206 - } 207 - ) 208 - returned = post_process(invalid_topics, _context()) 209 - assert returned == "" 210 
- assert _rows(tmp_path) == [] 211 - assert "invalid topics" in caplog.text 212 - 213 - 214 - def test_post_process_replaces_existing_row_by_span_and_talent(monkeypatch, tmp_path): 215 - monkeypatch.setenv("_SOLSTONE_JOURNAL_OVERRIDE", str(tmp_path)) 216 - ctx = _context() 217 - 218 - first = json.dumps( 219 - {"body": "First pass.", "topics": ["alpha", "beta", "gamma"], "confidence": 0.5} 220 - ) 221 - second = json.dumps( 222 - { 223 - "body": "Second pass.", 224 - "topics": ["delta", "epsilon", "zeta"], 225 - "confidence": 0.9, 226 - } 227 - ) 228 - 229 - assert post_process(first, ctx) == "" 230 - assert post_process(second, ctx) == "" 231 - 232 - rows = _rows(tmp_path) 233 - assert len(rows) == 1 234 - assert rows[0]["body"] == "Second pass." 235 - assert rows[0]["topics"] == ["delta", "epsilon", "zeta"] 236 - assert rows[0]["confidence"] == 0.9 237 - 238 - 239 - def test_post_process_appends_distinct_talent_rows(monkeypatch, tmp_path): 240 - monkeypatch.setenv("_SOLSTONE_JOURNAL_OVERRIDE", str(tmp_path)) 241 - activity = _activity(activity_id="event_130000_300", activity_type="event") 242 - 243 - event_ctx = _context(name="event", activity=activity) 244 - conversation_ctx = _context(name="conversation", activity=activity) 245 - 246 - assert ( 247 - post_process( 248 - json.dumps( 249 - { 250 - "body": "Wrapped the event.", 251 - "topics": ["planning", "venue", "timeline"], 252 - "confidence": 0.66, 253 - } 254 - ), 255 - event_ctx, 256 - ) 257 - == "" 258 - ) 259 - assert ( 260 - post_process( 261 - json.dumps( 262 - { 263 - "body": "Captured the side conversation.", 264 - "topics": ["alignment", "follow-up", "owners"], 265 - "confidence": 0.72, 266 - } 267 - ), 268 - conversation_ctx, 269 - ) 270 - == "" 271 - ) 272 - 273 - rows = _rows(tmp_path) 274 - assert len(rows) == 2 275 - assert {(row["span_id"], row["talent"]) for row in rows} == { 276 - ("event_130000_300", "event"), 277 - ("event_130000_300", "conversation"), 278 - } 279 - 280 - 281 - def 
test_post_process_handles_parse_failures_and_fenced_json( 282 - monkeypatch, tmp_path, caplog 283 - ): 284 - monkeypatch.setenv("_SOLSTONE_JOURNAL_OVERRIDE", str(tmp_path)) 285 - 286 - assert post_process("{not-json", _context()) == "" 287 - assert _rows(tmp_path) == [] 288 - assert "failed to parse JSON" in caplog.text 289 - 290 - caplog.clear() 291 - fenced = """```json 292 - {"body":"Recovered.","topics":["alpha","beta","gamma"],"confidence":0.6} 293 - ```""" 294 - assert post_process(fenced, _context()) == "" 295 - rows = _rows(tmp_path) 296 - assert len(rows) == 1 297 - assert rows[0]["body"] == "Recovered."
+376
tests/test_story_hook.py
··· 1 + # SPDX-License-Identifier: AGPL-3.0-only 2 + # Copyright (c) 2026 sol pbc 3 + 4 + import json 5 + from pathlib import Path 6 + 7 + 8 + def _write_detected_entities(tmp_path, facet: str, day: str, rows: list[dict]) -> None: 9 + path = tmp_path / "facets" / facet / "entities" / f"{day}.jsonl" 10 + path.parent.mkdir(parents=True, exist_ok=True) 11 + path.write_text( 12 + "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows), 13 + encoding="utf-8", 14 + ) 15 + 16 + 17 + def _activity_record(record_id: str = "meeting_090000_300") -> dict: 18 + return { 19 + "id": record_id, 20 + "activity": "meeting", 21 + "description": "Team sync", 22 + "segments": ["090000_300"], 23 + "created_at": 1, 24 + } 25 + 26 + 27 + def _context( 28 + tmp_path: Path, 29 + *, 30 + facet: str = "work", 31 + day: str = "20260418", 32 + record_id: str = "meeting_090000_300", 33 + ) -> dict: 34 + return { 35 + "facet": facet, 36 + "day": day, 37 + "activity": {"id": record_id}, 38 + "output_path": str( 39 + tmp_path / "facets" / facet / "activities" / day / record_id / "story.json" 40 + ), 41 + } 42 + 43 + 44 + def _valid_result(**overrides) -> str: 45 + payload = { 46 + "body": "Aligned on launch work and assigned the follow-up.", 47 + "topics": ["launch", "follow-up"], 48 + "confidence": 0.82, 49 + "commitments": [ 50 + { 51 + "owner": "Mina", 52 + "action": "send the revised deck", 53 + "counterparty": "Ravi", 54 + "when": "Friday morning", 55 + "context": "Mina committed to send the deck before the next investor call.", 56 + } 57 + ], 58 + "closures": [ 59 + { 60 + "owner": "Ravi", 61 + "action": "intro email", 62 + "counterparty": "Mina", 63 + "resolution": "sent", 64 + "context": "Ravi confirmed the intro email already went out.", 65 + } 66 + ], 67 + "decisions": [ 68 + { 69 + "owner": "Team", 70 + "action": "move the launch review to Tuesday", 71 + "context": "The group aligned on Tuesday after checking calendars.", 72 + } 73 + ], 74 + } 75 + payload.update(overrides) 
76 + return json.dumps(payload) 77 + 78 + 79 + def _load_record(facet: str, day: str): 80 + from think.activities import load_activity_records 81 + 82 + return load_activity_records(facet, day, include_hidden=True)[0] 83 + 84 + 85 + def test_story_hook_parses_and_writes(tmp_path, monkeypatch): 86 + from talent.story import post_process 87 + from think.activities import append_activity_record 88 + 89 + monkeypatch.setenv("_SOLSTONE_JOURNAL_OVERRIDE", str(tmp_path)) 90 + 91 + append_activity_record("work", "20260418", _activity_record()) 92 + 93 + returned = post_process( 94 + _valid_result(body=" Wrapped the launch prep and assigned follow-up. "), 95 + _context(tmp_path), 96 + ) 97 + 98 + record = _load_record("work", "20260418") 99 + assert returned == "" 100 + assert record["story"] == { 101 + "body": "Wrapped the launch prep and assigned follow-up.", 102 + "topics": ["launch", "follow-up"], 103 + "confidence": 0.82, 104 + } 105 + assert record["commitments"][0]["owner"] == "Mina" 106 + assert record["closures"][0]["resolution"] == "sent" 107 + assert record["decisions"][0]["owner"] == "Team" 108 + assert record["edits"][-1]["actor"] == "story" 109 + assert record["edits"][-1]["fields"] == [ 110 + "story", 111 + "commitments", 112 + "closures", 113 + "decisions", 114 + ] 115 + 116 + 117 + def test_story_hook_empty_arrays(tmp_path, monkeypatch): 118 + from talent.story import post_process 119 + from think.activities import append_activity_record 120 + 121 + monkeypatch.setenv("_SOLSTONE_JOURNAL_OVERRIDE", str(tmp_path)) 122 + append_activity_record("work", "20260418", _activity_record()) 123 + 124 + post_process( 125 + _valid_result(commitments=[], closures=[], decisions=[]), 126 + _context(tmp_path), 127 + ) 128 + 129 + record = _load_record("work", "20260418") 130 + assert ( 131 + record["story"]["body"] == "Aligned on launch work and assigned the follow-up." 
132 + ) 133 + assert record["commitments"] == [] 134 + assert record["closures"] == [] 135 + assert record["decisions"] == [] 136 + 137 + 138 + def test_story_hook_bad_resolution_skipped(tmp_path, monkeypatch, caplog): 139 + from talent.story import post_process 140 + from think.activities import append_activity_record 141 + 142 + monkeypatch.setenv("_SOLSTONE_JOURNAL_OVERRIDE", str(tmp_path)) 143 + append_activity_record("work", "20260418", _activity_record()) 144 + 145 + post_process( 146 + _valid_result( 147 + closures=[ 148 + { 149 + "owner": "Ravi", 150 + "action": "intro email", 151 + "counterparty": "Mina", 152 + "resolution": "sent", 153 + "context": "The intro email went out.", 154 + }, 155 + { 156 + "owner": "Ravi", 157 + "action": "budget request", 158 + "counterparty": "Finance", 159 + "resolution": "approved", 160 + "context": "This resolution is invalid for the schema.", 161 + }, 162 + ] 163 + ), 164 + _context(tmp_path), 165 + ) 166 + 167 + record = _load_record("work", "20260418") 168 + assert [closure["action"] for closure in record["closures"]] == ["intro email"] 169 + assert "invalid resolution 'approved'" in caplog.text 170 + 171 + 172 + def test_story_hook_missing_required_field_skipped(tmp_path, monkeypatch, caplog): 173 + from talent.story import post_process 174 + from think.activities import append_activity_record 175 + 176 + monkeypatch.setenv("_SOLSTONE_JOURNAL_OVERRIDE", str(tmp_path)) 177 + append_activity_record("work", "20260418", _activity_record()) 178 + 179 + post_process( 180 + _valid_result( 181 + commitments=[ 182 + { 183 + "owner": "Mina", 184 + "action": "send the revised deck", 185 + "counterparty": "Ravi", 186 + "when": "Friday morning", 187 + "context": "Valid commitment.", 188 + }, 189 + { 190 + "owner": "Mina", 191 + "action": "book the room", 192 + "when": "tomorrow", 193 + "context": "Missing counterparty should skip.", 194 + }, 195 + ], 196 + closures=[ 197 + { 198 + "owner": "Ravi", 199 + "action": "intro email", 200 
+ "counterparty": "Mina", 201 + "resolution": "sent", 202 + "context": "Valid closure.", 203 + }, 204 + { 205 + "action": "parking pass", 206 + "counterparty": "Travel desk", 207 + "resolution": "done", 208 + "context": "Missing owner should skip.", 209 + }, 210 + ], 211 + decisions=[ 212 + { 213 + "owner": "Team", 214 + "action": "move the launch review to Tuesday", 215 + "context": "Valid decision.", 216 + }, 217 + { 218 + "owner": "Team", 219 + "context": "Missing action should skip.", 220 + }, 221 + ], 222 + ), 223 + _context(tmp_path), 224 + ) 225 + 226 + record = _load_record("work", "20260418") 227 + assert len(record["commitments"]) == 1 228 + assert len(record["closures"]) == 1 229 + assert len(record["decisions"]) == 1 230 + assert "missing required string field" in caplog.text 231 + 232 + 233 + def test_story_hook_resolves_entities(tmp_path, monkeypatch): 234 + from talent.story import post_process 235 + from think.activities import append_activity_record 236 + 237 + monkeypatch.setenv("_SOLSTONE_JOURNAL_OVERRIDE", str(tmp_path)) 238 + _write_detected_entities( 239 + tmp_path, 240 + "work", 241 + "20260418", 242 + [ 243 + {"id": "mina_lee", "type": "Person", "name": "Mina Lee", "aka": ["Mina"]}, 244 + {"id": "ravi_shah", "type": "Person", "name": "Ravi Shah", "aka": ["Ravi"]}, 245 + ], 246 + ) 247 + append_activity_record("work", "20260418", _activity_record()) 248 + 249 + post_process( 250 + _valid_result( 251 + commitments=[ 252 + { 253 + "owner": "Mina", 254 + "action": "send the revised deck", 255 + "counterparty": "Ravi", 256 + "when": "Friday morning", 257 + "context": "Valid commitment.", 258 + }, 259 + { 260 + "owner": "Unknown Owner", 261 + "action": "draft the note", 262 + "counterparty": "Unknown Counterparty", 263 + "when": "later", 264 + "context": "Unmatched names should stay null.", 265 + }, 266 + ], 267 + closures=[ 268 + { 269 + "owner": "Ravi", 270 + "action": "intro email", 271 + "counterparty": "Mina", 272 + "resolution": "sent", 273 
+ "context": "Valid closure.", 274 + } 275 + ], 276 + decisions=[ 277 + { 278 + "owner": "Mina Lee", 279 + "action": "move the launch review to Tuesday", 280 + "context": "Valid decision.", 281 + } 282 + ], 283 + ), 284 + _context(tmp_path), 285 + ) 286 + 287 + record = _load_record("work", "20260418") 288 + assert record["commitments"][0]["owner_entity_id"] == "mina_lee" 289 + assert record["commitments"][0]["counterparty_entity_id"] == "ravi_shah" 290 + assert record["commitments"][1]["owner_entity_id"] is None 291 + assert record["commitments"][1]["counterparty_entity_id"] is None 292 + assert record["closures"][0]["owner_entity_id"] == "ravi_shah" 293 + assert record["closures"][0]["counterparty_entity_id"] == "mina_lee" 294 + assert record["decisions"][0]["owner_entity_id"] == "mina_lee" 295 + assert record["commitments"][0]["owner"] == "Mina" 296 + assert record["closures"][0]["counterparty"] == "Mina" 297 + 298 + 299 + def test_story_hook_idempotent_rerun(tmp_path, monkeypatch): 300 + from talent.story import post_process 301 + from think.activities import append_activity_record 302 + 303 + monkeypatch.setenv("_SOLSTONE_JOURNAL_OVERRIDE", str(tmp_path)) 304 + append_activity_record("work", "20260418", _activity_record()) 305 + 306 + post_process(_valid_result(), _context(tmp_path)) 307 + first = _load_record("work", "20260418") 308 + assert len(first["edits"]) == 1 309 + 310 + post_process( 311 + _valid_result( 312 + body="Second pass with a clearer summary.", 313 + topics=["handoff"], 314 + commitments=[], 315 + closures=[], 316 + decisions=[ 317 + { 318 + "owner": "Lead", 319 + "action": "ship the patch on Wednesday", 320 + "context": "The second pass reached a more specific plan.", 321 + } 322 + ], 323 + ), 324 + _context(tmp_path), 325 + ) 326 + 327 + second = _load_record("work", "20260418") 328 + assert second["story"] == { 329 + "body": "Second pass with a clearer summary.", 330 + "topics": ["handoff"], 331 + "confidence": 0.82, 332 + } 333 + assert 
second["commitments"] == [] 334 + assert second["closures"] == [] 335 + assert second["decisions"] == [ 336 + { 337 + "owner": "Lead", 338 + "action": "ship the patch on Wednesday", 339 + "context": "The second pass reached a more specific plan.", 340 + "owner_entity_id": None, 341 + } 342 + ] 343 + assert len(second["edits"]) == 2 344 + 345 + 346 + def test_story_hook_missing_record_logs_and_returns(tmp_path, monkeypatch, caplog): 347 + from talent.story import post_process 348 + 349 + monkeypatch.setenv("_SOLSTONE_JOURNAL_OVERRIDE", str(tmp_path)) 350 + 351 + returned = post_process(_valid_result(), _context(tmp_path)) 352 + 353 + assert returned == "" 354 + assert "activity record not found" in caplog.text 355 + 356 + 357 + def test_story_hook_no_json_file_written(tmp_path, monkeypatch): 358 + from talent.story import post_process 359 + from think.activities import append_activity_record 360 + 361 + monkeypatch.setenv("_SOLSTONE_JOURNAL_OVERRIDE", str(tmp_path)) 362 + append_activity_record("work", "20260418", _activity_record()) 363 + 364 + output_path = ( 365 + tmp_path 366 + / "facets" 367 + / "work" 368 + / "activities" 369 + / "20260418" 370 + / "meeting_090000_300" 371 + / "story.json" 372 + ) 373 + returned = post_process(_valid_result(), _context(tmp_path)) 374 + 375 + assert returned == "" 376 + assert not output_path.exists()
+67 -1
think/activities.py
··· 803 803 804 804 805 805 def append_edit( 806 - record: dict[str, Any], *, actor: str, fields: list[str], note: str 806 + record: dict[str, Any], *, actor: str, fields: list[str], note: str | None 807 807 ) -> dict[str, Any]: 808 808 """Append an edit entry to an activity record and return the record.""" 809 809 normalized = _normalize_activity_record(record) ··· 1089 1089 return updated_record 1090 1090 1091 1091 1092 + def merge_story_fields( 1093 + facet: str, 1094 + day: str, 1095 + record_id: str, 1096 + *, 1097 + story: dict, 1098 + commitments: list[dict], 1099 + closures: list[dict], 1100 + decisions: list[dict], 1101 + actor: str, 1102 + note: str | None = None, 1103 + ) -> bool: 1104 + """Replace story-derived fields on an activity record and append one edit.""" 1105 + updated = False 1106 + path = _get_records_path(facet, day) 1107 + 1108 + def modify_fn(records: list[dict[str, Any]]) -> list[dict[str, Any]]: 1109 + nonlocal updated 1110 + new_records: list[dict[str, Any]] = [] 1111 + for record in records: 1112 + if record.get("id") == record_id: 1113 + merged = _normalize_activity_record(record) 1114 + merged["story"] = dict(story) 1115 + merged["commitments"] = [dict(entry) for entry in commitments] 1116 + merged["closures"] = [dict(entry) for entry in closures] 1117 + merged["decisions"] = [dict(entry) for entry in decisions] 1118 + merged = append_edit( 1119 + merged, 1120 + actor=actor, 1121 + fields=["story", "commitments", "closures", "decisions"], 1122 + note=note, 1123 + ) 1124 + new_records.append(merged) 1125 + updated = True 1126 + else: 1127 + new_records.append(record) 1128 + return new_records 1129 + 1130 + try: 1131 + locked_modify(path, modify_fn, create_if_missing=False) 1132 + except FileNotFoundError: 1133 + logger.warning("story hook: activity record not found: %s", record_id) 1134 + return False 1135 + 1136 + if not updated: 1137 + logger.warning("story hook: activity record not found: %s", record_id) 1138 + return updated 1139 
+ 1140 + 1092 1141 def _set_activity_hidden_state( 1093 1142 facet: str, 1094 1143 day: str, ··· 1301 1350 participants = _format_participation(record) 1302 1351 if participants: 1303 1352 lines.append(f"- Participation: {participants}") 1353 + 1354 + story = record.get("story") 1355 + if isinstance(story, dict): 1356 + body = story.get("body") 1357 + if isinstance(body, str) and body.strip(): 1358 + lines.append("") 1359 + lines.append(body.strip()) 1360 + 1361 + topics = story.get("topics") 1362 + if isinstance(topics, list): 1363 + topic_values = [ 1364 + topic.strip() 1365 + for topic in topics 1366 + if isinstance(topic, str) and topic.strip() 1367 + ] 1368 + if topic_values: 1369 + lines.append(f"Topics: {', '.join(topic_values)}") 1304 1370 1305 1371 if record.get("hidden", False): 1306 1372 lines.append("- Hidden: yes")
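As a minimal sketch of the merge semantics the `think/activities.py` hunk above adds (in-memory only; `locked_modify`, `_normalize_activity_record`, and `append_edit` are stood in by plain dict handling, and `merge_story_fields_sketch` is a hypothetical name for illustration):

```python
def merge_story_fields_sketch(records, record_id, *, story, commitments,
                              closures, decisions, actor, note=None):
    """Replace story-derived fields on the matching record wholesale,
    appending exactly one edits[] entry per merge (simplified sketch)."""
    updated = False
    for record in records:
        if record.get("id") == record_id:
            # Wholesale replacement, never per-field patching.
            record["story"] = dict(story)
            record["commitments"] = [dict(entry) for entry in commitments]
            record["closures"] = [dict(entry) for entry in closures]
            record["decisions"] = [dict(entry) for entry in decisions]
            # One edit entry covering all four fields.
            record.setdefault("edits", []).append({
                "actor": actor,
                "fields": ["story", "commitments", "closures", "decisions"],
                "note": note,
            })
            updated = True
    return updated


records = [{"id": "meeting_090000_300"}]
ok = merge_story_fields_sketch(
    records,
    "meeting_090000_300",
    story={"body": "Summary.", "topics": ["handoff"], "confidence": 0.82},
    commitments=[],
    closures=[],
    decisions=[],
    actor="storyteller",
)
```

Re-running the merge replaces the same four fields and appends a second `edits[]` entry, which is the idempotent-rerun behavior `test_story_hook_idempotent_rerun` asserts.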
-1
think/formatters.py
··· 140 140 False, # Indexed via _index_entity_search_chunks (enriched with relationship data) 141 141 ), 142 142 "facets/*/events/*.jsonl": ("think.event_formatter", "format_events", True), 143 - "facets/*/spans/*.jsonl": ("think.spans", "format_spans", True), 144 143 "facets/*/activities/*.jsonl": ("think.activities", "format_activities", True), 145 144 "facets/*/todos/*.jsonl": ("apps.todos.todo", "format_todos", True), 146 145 "facets/*/logs/*.jsonl": ("think.facets", "format_logs", True),
-128
think/spans.py
··· 1 - # SPDX-License-Identifier: AGPL-3.0-only 2 - # Copyright (c) 2026 sol pbc 3 - 4 - """Formatting helpers for storytelling spans JSONL files.""" 5 - 6 - from __future__ import annotations 7 - 8 - import logging 9 - import re 10 - from datetime import datetime 11 - from pathlib import Path 12 - from typing import Any 13 - 14 - 15 - def _extract_spans_path_context(file_path: str | Path | None) -> tuple[str, str | None]: 16 - facet_name = "unknown" 17 - day_str: str | None = None 18 - 19 - if not file_path: 20 - return facet_name, day_str 21 - 22 - path = Path(file_path) 23 - path_str = str(path) 24 - facet_match = re.search(r"facets/([^/]+)/spans", path_str) 25 - if facet_match: 26 - facet_name = facet_match.group(1) 27 - 28 - if path.stem.isdigit() and len(path.stem) == 8: 29 - day_str = path.stem 30 - 31 - return facet_name, day_str 32 - 33 - 34 - def _start_seconds(start: str) -> int | None: 35 - try: 36 - parts = start.split(":") 37 - hours = int(parts[0]) 38 - minutes = int(parts[1]) if len(parts) > 1 else 0 39 - seconds = int(parts[2]) if len(parts) > 2 else 0 40 - except (IndexError, ValueError, TypeError): 41 - return None 42 - return hours * 3600 + minutes * 60 + seconds 43 - 44 - 45 - def format_spans( 46 - entries: list[dict], 47 - context: dict | None = None, 48 - ) -> tuple[list[dict], dict]: 49 - """Format storytelling span JSONL rows into markdown chunks.""" 50 - ctx = context or {} 51 - file_path = ctx.get("file_path") 52 - meta: dict[str, Any] = {"indexer": {"agent": "span"}} 53 - chunks: list[dict[str, Any]] = [] 54 - skipped_count = 0 55 - 56 - facet_name, day_str = _extract_spans_path_context(file_path) 57 - 58 - base_ts = 0 59 - if day_str: 60 - try: 61 - dt = datetime.strptime(day_str, "%Y%m%d") 62 - base_ts = int(dt.timestamp() * 1000) 63 - except ValueError: 64 - pass 65 - 66 - if day_str: 67 - formatted_day = f"{day_str[:4]}-{day_str[4:6]}-{day_str[6:8]}" 68 - meta["header"] = f"# Spans for '{facet_name}' facet on {formatted_day}" 69 - 
else: 70 - meta["header"] = f"# Spans for '{facet_name}' facet" 71 - 72 - for entry in entries: 73 - body = str(entry.get("body") or "").strip() 74 - start = str(entry.get("start") or "").strip() 75 - talent = str(entry.get("talent") or "").strip() 76 - activity_type = str(entry.get("activity_type") or "").strip() 77 - span_id = str(entry.get("span_id") or "").strip() 78 - 79 - if not all((body, start, talent, activity_type, span_id)): 80 - skipped_count += 1 81 - continue 82 - 83 - ts = base_ts 84 - start_seconds = _start_seconds(start) 85 - if start_seconds is not None and base_ts: 86 - ts = base_ts + start_seconds * 1000 87 - 88 - end = str(entry.get("end") or "").strip() 89 - time_display = start if not end else f"{start}-{end}" 90 - topics = entry.get("topics", []) 91 - topics_display = ( 92 - ", ".join(str(topic).strip() for topic in topics if str(topic).strip()) 93 - if isinstance(topics, list) 94 - else "" 95 - ) 96 - 97 - confidence = entry.get("confidence") 98 - if isinstance(confidence, (int, float)): 99 - confidence_display = f"{float(confidence):.2f}" 100 - else: 101 - confidence_display = str(confidence or "") 102 - 103 - lines = [f"### {activity_type.capitalize()}: {span_id}\n", ""] 104 - lines.append(f"**Time:** {time_display}") 105 - lines.append(f"**Activity Type:** {activity_type}") 106 - lines.append(f"**Topics:** {topics_display}") 107 - lines.append(f"**Confidence:** {confidence_display}") 108 - lines.append(f"**Talent:** {talent}") 109 - lines.append("") 110 - lines.append(body) 111 - lines.append("") 112 - 113 - chunks.append( 114 - { 115 - "timestamp": ts, 116 - "markdown": "\n".join(lines), 117 - "source": entry, 118 - } 119 - ) 120 - 121 - if skipped_count > 0: 122 - error_msg = f"Skipped {skipped_count} entries missing required fields" 123 - if file_path: 124 - error_msg += f" in {file_path}" 125 - meta["error"] = error_msg 126 - logging.info(error_msg) 127 - 128 - return chunks, meta
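With `format_spans` deleted, story content reaches FTS through `format_activities`. A standalone sketch of just the story-rendering step from the `format_activities` hunk above (the surrounding record handling is omitted; `render_story_lines` is a hypothetical name for illustration):

```python
def render_story_lines(record):
    """Render story body and topics the way the format_activities diff adds them."""
    lines = []
    story = record.get("story")
    if isinstance(story, dict):
        body = story.get("body")
        if isinstance(body, str) and body.strip():
            lines.append("")
            lines.append(body.strip())
        topics = story.get("topics")
        if isinstance(topics, list):
            # Drop non-string and blank topics, matching the diff's filtering.
            values = [t.strip() for t in topics if isinstance(t, str) and t.strip()]
            if values:
                lines.append(f"Topics: {', '.join(values)}")
    return lines


record = {"story": {"body": "Second pass.", "topics": ["handoff", ""]}}
# render_story_lines(record) → ["", "Second pass.", "Topics: handoff"]
```

A record with no `story` dict, or a blank body, contributes no extra lines, so pre-refactor records render unchanged.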