personal memory agent

feat(write-path): guard megameetings and unknown facets at event write time

Add a facet-registry gate in think/hooks.py that consults get_facets() once per write call and skips any event whose facet is unknown or non-canonical, logging a WARNING for each skipped event.
Add a megameeting guard in talent/occurrence.py that drops meeting events with more than 25 participants and logs a WARNING for each dropped event.
Tighten the meetings.md and timeline.md prompts so orgs, projects, tools, and topics are not lifted into structured participants.
Document the existing muted-facet silent-disable behavior in docs/JOURNAL.md for agents such as entity_observer.
Add 8 focused tests in tests/test_output_hooks.py and prepend the worktree root in tests/conftest.py so pytest exercises local code.
These are defensive guards against the pollution patterns surfaced in the CPO journal-data-quality investigation.

Co-Authored-By: OpenAI Codex <codex@openai.com>

+321 -10
+1
docs/JOURNAL.md
···
  - `color` – hex color code for the facet card background in the web UI
  - `emoji` – emoji icon displayed in the top-left of the facet card
  - `muted` – boolean flag to mute/hide the facet from views (default: false)
+ - Muted facets are filtered out by `get_enabled_facets()`, so agents that iterate enabled facets, such as `entity_observer`, skip them silently.

  ### Facet Entities

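The documented behavior can be illustrated with a minimal sketch. The registry contents and the `filter_enabled` helper below are hypothetical stand-ins for `get_enabled_facets()`; the real implementation is not part of this diff, so treat the semantics (a missing `muted` key means not muted) as an assumption taken from the documented default.

```python
# Hypothetical in-memory registry; the real one is loaded from the journal.
FACETS = {
    "capulet": {"title": "Capulet", "muted": False},
    "montague": {"title": "Montague", "muted": True},
    "verona": {"title": "Verona"},  # no "muted" key: assumed to default to False
}


def filter_enabled(facets: dict[str, dict]) -> dict[str, dict]:
    """Stand-in for get_enabled_facets(): keep facets that are not muted."""
    return {
        name: meta for name, meta in facets.items() if not meta.get("muted", False)
    }


def observe_entities(facets: dict[str, dict]) -> list[str]:
    """Stand-in for an agent such as entity_observer: visits enabled facets only."""
    return sorted(filter_enabled(facets))


visited = observe_entities(FACETS)
print(visited)
```

Nothing is logged for the muted facet in this sketch, which is the "silent" part of the behavior the new documentation line calls out.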
+4 -4
talent/meetings.md
···
  Prioritize the audio transcript as the primary source of truth:

  1. **Participants**
-    - Analyze the audio transcript for names of individuals speaking or referred to by name.
-    - Use screen activity to supplement: meeting software participant lists, chat names, etc.
-    - Consolidate names that overlap due to transcription errors.
-    - Include $name as a default participant.
+    - Use the meeting software participant panel (Zoom/Meet/Teams UI participant list) as the primary source of truth.
+    - If no reliable participant panel is visible, fall back to people who are visibly presenting or audibly speaking in the meeting.
+    - Exclude organizations, companies, products, projects, tools, topics, podcast guests, and people only mentioned in conversation but not present.
+    - Consolidate transcription variants of the same person and include $name as the default participant when unknown.

  2. **Topics Discussed**
     - Synthesize the conversation into a concise summary of key subjects.
+13
talent/occurrence.py
···
          logging.error("Extraction did not return array")
          return None

+     filtered_events = []
+     for event in events:
+         if event.get("type") == "meeting" and len(event.get("participants", [])) > 25:
+             logging.warning(
+                 "Dropping megameeting occurrence: title=%r agent=%s participants=%d",
+                 event.get("title", ""),
+                 name,
+                 len(event.get("participants", [])),
+             )
+             continue
+         filtered_events.append(event)
+     events = filtered_events
+
      # Write to facet JSONL files
      source_output = compute_output_source(context)
      output_name = get_output_name(name)
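The guard above can be exercised in isolation. The following is a standalone sketch, not the code in `talent/occurrence.py`: the function name `drop_megameetings` and the constant `MAX_MEETING_PARTICIPANTS` are hypothetical, and the sketch additionally treats a `participants` value of `None` as an empty list, which the diff does not do.

```python
import logging

# Threshold taken from the diff; the constant name is hypothetical.
MAX_MEETING_PARTICIPANTS = 25


def drop_megameetings(events: list[dict], agent: str) -> list[dict]:
    """Return events minus meetings that exceed the participant threshold."""
    kept = []
    for event in events:
        # Assumption beyond the diff: a None participants value counts as empty.
        count = len(event.get("participants") or [])
        if event.get("type") == "meeting" and count > MAX_MEETING_PARTICIPANTS:
            logging.warning(
                "Dropping megameeting occurrence: title=%r agent=%s participants=%d",
                event.get("title", ""),
                agent,
                count,
            )
            continue
        kept.append(event)
    return kept


def people(n: int) -> list[str]:
    """Build n placeholder participant names."""
    return [f"Person {i}" for i in range(n)]


events = [
    {"type": "meeting", "title": "All Hands", "participants": people(26)},
    {"type": "meeting", "title": "Planning", "participants": people(25)},
    {"type": "message", "title": "Inbox review", "participants": people(100)},
]
kept_titles = [e["title"] for e in drop_megameetings(events, agent="meetings")]
print(kept_titles)
```

The comparison is strictly greater-than, so a meeting with exactly 25 participants is kept, and non-meeting events are never filtered by participant count. These are the same three cases the new tests cover.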
+3 -3
talent/timeline.md
···
  - **Primary Activity**: What was being actively worked on
  - **Tools & Applications**: All software/websites being used
  - **Files & Documents**: Specific files opened, edited, or referenced
- - **People**: Anyone interacted with via meeting, chat, or email
+ - **Collaborator Context**: Mention collaborators in prose only when materially relevant to the activity; do not emit standalone lists of names
  - **Content Details**: Topics discussed, code written, problems solved
  - **Parallel Activities**: Background meetings, music, notifications
  - **Physical Context**: Any mentions of location, movement, breaks

  ### Level of Detail
- - Include specific project names, file names, and people's names
+ - Include specific project names and file names; mention collaborators in prose only when materially relevant, never as standalone name lists
  - Describe the work or activity (what code was written, what was discussed)
  - Note transitions between activities, even small ones
  - Capture the substance of meetings and conversations
···

  5. **Rich Context**
     - Include enough detail that someone could understand what was accomplished
-    - Note tools, files, people, and resources involved
+    - Note tools, files, collaborators, and resources involved in prose when materially relevant; do not emit standalone lists of names
     - Capture the substance of work, not just categories

  Remember: The goal is to create a detailed historical record of the day that captures not just what activities occurred, but the rich detail of how the work actually unfolded. This timeline should serve as a comprehensive reference that could help reconstruct any part of the day's work.
+4
tests/conftest.py
···
  import numpy as np
  import pytest

+ ROOT = Path(__file__).resolve().parents[1]
+ if str(ROOT) not in sys.path:
+     sys.path.insert(0, str(ROOT))
+
  from tests._baseline_harness import copytree_tracked
  from think.entities.journal import clear_journal_entity_cache
  from think.entities.loading import clear_entity_loading_cache
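The `sys.path` prepend pattern can be sketched on its own. The helper name `prepend_to_sys_path` and the demo path are hypothetical; only the membership check followed by `sys.path.insert(0, ...)` comes from the diff.

```python
import sys
from pathlib import Path


def prepend_to_sys_path(root: Path) -> bool:
    """Put root at the front of sys.path unless it is already listed.

    Returns True when an entry was inserted, False when it was already present.
    """
    entry = str(root)
    if entry in sys.path:
        return False
    sys.path.insert(0, entry)
    return True


# Placeholder path for illustration; conftest.py derives the real root
# from Path(__file__).resolve().parents[1].
demo_root = Path("/tmp/hypothetical-worktree-root")
first = prepend_to_sys_path(demo_root)
second = prepend_to_sys_path(demo_root)
print(first, second, sys.path[0])
```

One consequence of the membership check is that an entry already present further down `sys.path` is left where it is rather than moved to the front, so the pattern guarantees precedence only when the root was not listed before.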
+283
tests/test_output_hooks.py
···
  import importlib
  import io
  import json
+ import logging
  import os
  from pathlib import Path

+ import talent.occurrence as occurrence
  from tests.conftest import copytree_tracked
  from think.agents import _apply_template_vars
+ from think.hooks import write_events_jsonl
  from think.talent import load_post_hook, load_pre_hook
  from think.utils import day_path

···
      finish_events = [e for e in events if e["event"] == "finish"]
      assert len(finish_events) == 1
      assert finish_events[0]["result"] == MOCK_RESULT["text"]
+
+
+ def test_occurrence_post_process_drops_meeting_with_26_participants(
+     monkeypatch, caplog
+ ):
+     """Test megameeting occurrences are dropped before writing."""
+     captured = {}
+     participants = [f"Person {i}" for i in range(26)]
+
+     def mock_generate(**kwargs):
+         return json.dumps(
+             [
+                 {
+                     "type": "meeting",
+                     "title": "All Hands",
+                     "summary": "Large meeting",
+                     "work": True,
+                     "participants": participants,
+                     "facet": "capulet",
+                     "details": "",
+                 }
+             ]
+         )
+
+     def mock_write_events_jsonl(**kwargs):
+         captured.update(kwargs)
+         return []
+
+     monkeypatch.setattr(occurrence, "generate", mock_generate)
+     monkeypatch.setattr(
+         occurrence, "compute_output_source", lambda context: "source.md"
+     )
+     monkeypatch.setattr(occurrence, "write_events_jsonl", mock_write_events_jsonl)
+     caplog.set_level(logging.WARNING)
+
+     result = occurrence.post_process(
+         "x" * 60,
+         {
+             "name": "meetings",
+             "day": "20240101",
+             "meta": {},
+             "output_path": "ignored",
+         },
+     )
+
+     assert result is None
+     assert captured["events"] == []
+     assert "Dropping megameeting occurrence" in caplog.text
+     assert "All Hands" in caplog.text
+     assert "meetings" in caplog.text
+     assert "26" in caplog.text
+
+
+ def test_occurrence_post_process_keeps_meeting_with_25_participants(
+     monkeypatch, caplog
+ ):
+     """Test meetings at the participant threshold are preserved."""
+     captured = {}
+     event = {
+         "type": "meeting",
+         "title": "Planning",
+         "summary": "Planning meeting",
+         "work": True,
+         "participants": [f"Person {i}" for i in range(25)],
+         "facet": "capulet",
+         "details": "",
+     }
+
+     monkeypatch.setattr(occurrence, "generate", lambda **kwargs: json.dumps([event]))
+     monkeypatch.setattr(
+         occurrence, "compute_output_source", lambda context: "source.md"
+     )
+     monkeypatch.setattr(
+         occurrence,
+         "write_events_jsonl",
+         lambda **kwargs: captured.update(kwargs) or [],
+     )
+     caplog.set_level(logging.WARNING)
+
+     result = occurrence.post_process(
+         "x" * 60,
+         {
+             "name": "meetings",
+             "day": "20240101",
+             "meta": {},
+             "output_path": "ignored",
+         },
+     )
+
+     assert result is None
+     assert captured["events"] == [event]
+     assert "Dropping megameeting occurrence" not in caplog.text
+
+
+ def test_occurrence_post_process_keeps_non_meeting_with_large_participants_list(
+     monkeypatch, caplog
+ ):
+     """Test non-meeting events are not filtered by participant count."""
+     captured = {}
+     event = {
+         "type": "message",
+         "title": "Inbox review",
+         "summary": "Reviewed messages",
+         "work": True,
+         "participants": [f"Person {i}" for i in range(100)],
+         "facet": "capulet",
+         "details": "",
+     }
+
+     monkeypatch.setattr(occurrence, "generate", lambda **kwargs: json.dumps([event]))
+     monkeypatch.setattr(
+         occurrence, "compute_output_source", lambda context: "source.md"
+     )
+     monkeypatch.setattr(
+         occurrence,
+         "write_events_jsonl",
+         lambda **kwargs: captured.update(kwargs) or [],
+     )
+     caplog.set_level(logging.WARNING)
+
+     result = occurrence.post_process(
+         "x" * 60,
+         {
+             "name": "timeline",
+             "day": "20240101",
+             "meta": {},
+             "output_path": "ignored",
+         },
+     )
+
+     assert result is None
+     assert captured["events"] == [event]
+     assert "Dropping megameeting occurrence" not in caplog.text
+
+
+ def test_write_events_jsonl_skips_trailing_comma_facet(journal_copy, caplog):
+     """Test invalid trailing punctuation facets are rejected."""
+     caplog.set_level(logging.WARNING)
+
+     written = write_events_jsonl(
+         events=[
+             {
+                 "type": "message",
+                 "title": "Chat",
+                 "summary": "Sent a chat",
+                 "work": True,
+                 "participants": [],
+                 "facet": "kognova,",
+                 "details": "",
+             }
+         ],
+         agent="timeline",
+         occurred=True,
+         source_output="20240101/agents/timeline.md",
+         capture_day="20240101",
+     )
+
+     assert written == []
+     assert "Skipping event with unknown facet" in caplog.text
+     assert "kognova," in caplog.text
+     assert "timeline" in caplog.text
+     assert "20240101/agents/timeline.md" in caplog.text
+     assert not (journal_copy / "facets" / "kognova," / "events").exists()
+
+
+ def test_write_events_jsonl_skips_unknown_person_facet(journal_copy, caplog):
+     """Test unknown person-like facet names are rejected."""
+     caplog.set_level(logging.WARNING)
+
+     written = write_events_jsonl(
+         events=[
+             {
+                 "type": "message",
+                 "title": "Chat",
+                 "summary": "Sent a chat",
+                 "work": True,
+                 "participants": [],
+                 "facet": "Person",
+                 "details": "",
+             }
+         ],
+         agent="timeline",
+         occurred=True,
+         source_output="20240101/agents/timeline.md",
+         capture_day="20240101",
+     )
+
+     assert written == []
+     assert "Skipping event with unknown facet" in caplog.text
+     assert "Person" in caplog.text
+     assert not (journal_copy / "facets" / "Person" / "events").exists()
+
+
+ def test_write_events_jsonl_skips_mixed_case_known_facet(journal_copy, caplog):
+     """Test mixed-case facet values are rejected when not exact registry matches."""
+     caplog.set_level(logging.WARNING)
+
+     written = write_events_jsonl(
+         events=[
+             {
+                 "type": "message",
+                 "title": "Chat",
+                 "summary": "Sent a chat",
+                 "work": True,
+                 "participants": [],
+                 "facet": "Capulet",
+                 "details": "",
+             }
+         ],
+         agent="timeline",
+         occurred=True,
+         source_output="20240101/agents/timeline.md",
+         capture_day="20240101",
+     )
+
+     assert written == []
+     assert "Skipping event with unknown facet" in caplog.text
+     assert "Capulet" in caplog.text
+     assert not (journal_copy / "facets" / "Capulet" / "events").exists()
+
+
+ def test_write_events_jsonl_writes_valid_registry_facet(journal_copy):
+     """Test valid registry facets are written normally."""
+     written = write_events_jsonl(
+         events=[
+             {
+                 "type": "message",
+                 "title": "Chat",
+                 "summary": "Sent a chat",
+                 "work": True,
+                 "participants": [],
+                 "facet": "capulet",
+                 "details": "",
+             }
+         ],
+         agent="timeline",
+         occurred=True,
+         source_output="20240101/agents/timeline.md",
+         capture_day="20240101",
+     )
+
+     jsonl_path = journal_copy / "facets" / "capulet" / "events" / "20240101.jsonl"
+
+     assert written == [jsonl_path]
+     rows = [
+         json.loads(line)
+         for line in jsonl_path.read_text(encoding="utf-8").splitlines()
+         if line
+     ]
+     assert len(rows) == 1
+     assert rows[0]["facet"] == "capulet"
+     assert rows[0]["agent"] == "timeline"
+     assert rows[0]["source"] == "20240101/agents/timeline.md"
+
+
+ def test_write_events_jsonl_skips_empty_facet(journal_copy, caplog):
+     """Test missing facets are skipped and logged."""
+     caplog.set_level(logging.WARNING)
+
+     written = write_events_jsonl(
+         events=[
+             {
+                 "type": "message",
+                 "title": "Chat",
+                 "summary": "Sent a chat",
+                 "work": True,
+                 "participants": [],
+                 "details": "",
+             }
+         ],
+         agent="timeline",
+         occurred=True,
+         source_output="20240101/agents/timeline.md",
+         capture_day="20240101",
+     )
+
+     assert written == []
+     assert "Skipping event with unknown facet" in caplog.text
+     assert "timeline" in caplog.text
+     assert not (journal_copy / "facets" / "" / "events").exists()


  # =============================================================================
+13 -3
think/hooks.py
···
  import os
  from pathlib import Path

+ from think.facets import get_facets
+
  # Minimum content length for meaningful event extraction
  MIN_EXTRACTION_CHARS = 50

···
      from think.utils import get_journal

      journal = get_journal()
+     known_facets = set(get_facets().keys())

      # Group events by (facet, event_day)
      grouped: dict[tuple[str, str], list[dict]] = {}

      for event in events:
-         facet = event.get("facet", "")
-         if not facet:
-             continue  # Skip events without facet
+         raw_facet = event.get("facet", "")
+         facet = raw_facet.strip().lower()
+         if facet not in known_facets or raw_facet != facet:
+             logging.warning(
+                 "Skipping event with unknown facet: facet=%r agent=%s source=%s",
+                 raw_facet,
+                 agent,
+                 source_output,
+             )
+             continue

          # Determine the event day
          if occurred:
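The acceptance rule in the gate above can be checked against a few sample values. This is a sketch under stated assumptions: the predicate name `is_canonical_known_facet` and the sample registry are hypothetical, and the `isinstance` check for non-string values is an addition that the diff itself does not contain.

```python
def is_canonical_known_facet(raw_facet: object, known_facets: set[str]) -> bool:
    """Accept a facet only if it is registered and already in canonical form."""
    if not isinstance(raw_facet, str):
        # Assumption beyond the diff: non-string values (e.g. None) are rejected.
        return False
    normalized = raw_facet.strip().lower()
    # Same two conditions as the gate: known after normalizing, and unchanged by it.
    return normalized in known_facets and raw_facet == normalized


known = {"capulet", "montague"}  # hypothetical registry keys
samples = ["capulet", "Capulet", "kognova,", " capulet", "Person", "", None]
accepted = [s for s in samples if is_canonical_known_facet(s, known)]
print(accepted)
```

Because the raw value must equal its normalized form, normalization never rescues a near-miss such as `"Capulet"` or `" capulet"`; those are rejected along with unknown names. If the registry keys are themselves lowercase and trimmed, the rule reduces to exact membership in the registry.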