-personal data pipeline that digests my github and [tangled.org](https://tangled.org) activity, scores items by importance, and generates an LLM-curated briefing. self-hosted on a single hetzner VM (k3s) running prefect OSS.
+personal data pipeline and intelligence layer. digests github, [tangled.org](https://tangled.org), and bluesky activity, scores items, generates LLM-curated briefings, and maintains phi's long-term memory. self-hosted on a single hetzner VM (k3s) running prefect OSS.

 [hub](https://hub.waow.tech) · [grafana](https://prefect-metrics.waow.tech/d/executive-overview/executive-overview?orgId=1&from=now-6h&to=now&timezone=browser)

 ```
 github API ──┐
-             ├──► ingest ──► raw_github_issues ──┐
-tangled PDS ─┘   (hourly)    raw_tangled_items ──┤
-                                                 ▼
-                                          transform (dbt)
-                                           [on ingest ✓]
-                                                 │
-                                                 ▼
-                                         hub_action_items
-                                             (top 200)
-                                                 │
-                                  ┌──────────────┼──────────┐
-                                  ▼              ▼          ▼
-                                brief       /api/cards   hub UI
-                          [on transform ✓]
-                                  │
-                                  ▼
-                            briefing.json
+             ├──► ingest ──► raw_github_issues ──┐
+tangled PDS ─┤   (hourly)    raw_tangled_items  ─┤
+bluesky PDS ─┤       raw_likes + raw_liked_posts │
+phi (tpuf)  ─┘             raw_phi_observations ─┘
+                                                 │
+                          transform (dbt) ◄──────┘
+                           [on ingest ✓]
+                                 │
+                 ┌───────────────┼───────────────┐
+                 ▼               ▼               ▼
+               brief          compact         hub UI
+         [on transform ✓] [on transform ✓]
+                 │               │
+                 ▼               ▼
+           briefing.json    TurboPuffer
+                           (phi-users-*)
+
+        morning ──► TurboPuffer + semble
+    (daily 8am CT)
+
+  rebuild-atlas ──► Cloudflare Pages
+   (every 6h)
 ```

 see [docs/hub.md](docs/hub.md) for the full pipeline breakdown.
+65-22
docs/hub.md
 # hub

-action item dashboard at [hub.waow.tech](https://hub.waow.tech). aggregates issues and PRs from github and [tangled.org](https://tangled.org), scores them by importance, and generates a daily briefing with an LLM.
+action item dashboard and intelligence pipeline at [hub.waow.tech](https://hub.waow.tech). aggregates issues and PRs from github and [tangled.org](https://tangled.org), scores them by importance, generates a daily briefing with an LLM, and maintains phi's long-term memory.

 ## data sources

-a single `ingest` flow runs hourly on cron and fetches both data sources concurrently, then writes to DuckDB sequentially (same process = no single-writer lock contention). downstream flows (transform, brief) are event-driven via deployment triggers — they only run when upstream completes.
+a single `ingest` flow runs hourly on cron and fetches all data sources concurrently, then writes to DuckDB sequentially (same process = no single-writer lock contention). downstream flows (transform, brief, compact) are event-driven via deployment triggers — they only run when upstream completes.
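
a minimal sketch of the "fetch concurrently, write sequentially" shape described above. the fetchers and `write_rows` are hypothetical stand-ins (the real flow talks to the github API, the PDS, and DuckDB); only the table names come from this document.

```python
import asyncio


# hypothetical fetchers: placeholders for the real network calls
async def fetch_github() -> list[dict]:
    await asyncio.sleep(0)  # simulate I/O
    return [{"source": "github", "id": 1}]


async def fetch_tangled() -> list[dict]:
    await asyncio.sleep(0)  # simulate I/O
    return [{"source": "tangled", "id": 2}]


def write_rows(store: dict[str, list[dict]], table: str, rows: list[dict]) -> None:
    # stand-in for a DuckDB insert; called for one table at a time
    store.setdefault(table, []).extend(rows)


async def ingest(store: dict[str, list[dict]]) -> dict[str, list[dict]]:
    # fetch phase: both sources are awaited concurrently
    github, tangled = await asyncio.gather(fetch_github(), fetch_tangled())
    # write phase: one writer, one table after another, in the same process
    write_rows(store, "raw_github_issues", github)
    write_rows(store, "raw_tangled_items", tangled)
    return store


store = asyncio.run(ingest({}))
```

keeping every write in one process, after all fetches have returned, means only one writer ever touches the database file.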

 **github** — fetches notifications (issues + PRs) and open items authored by `zzstoatzz` via the search API. each issue is cached by repo+number for 24h. persists to `raw_github_issues`.

 **tangled.org** — fetches issues, PRs, and comments from the PDS (`pds.zzstoatzz.io`) via AT Protocol's `com.atproto.repo.listRecords`. no auth needed — records are public. targets repos: zat, zlay, plyr.fm, at-me, pollz, typeahead. persists to `raw_tangled_items`.

+**bluesky likes** — fetches nate's recent likes from the PDS via `app.bsky.feed.like` records, then batch-resolves post content via `app.bsky.feed.getPosts` (25 per call). persists to `raw_likes` and `raw_liked_posts`.
+
+**phi memory** — reads phi's TurboPuffer namespaces (`phi-users-*`) to snapshot observations and interactions into DuckDB for dbt processing. persists to `raw_phi_observations` and `raw_phi_interactions`.
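
the "25 per call" batching for liked posts could look roughly like this. `fake_get_posts` is a hypothetical stand-in for the `app.bsky.feed.getPosts` request, and the example URIs are made up; only the batch size comes from the text above.

```python
from collections.abc import Callable, Iterator

BATCH_SIZE = 25  # per-call limit quoted above


def batched(uris: list[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    # yield consecutive slices of at most `size` URIs
    for start in range(0, len(uris), size):
        yield uris[start:start + size]


def resolve_posts(uris: list[str], get_posts: Callable[[list[str]], list[dict]]) -> list[dict]:
    # one call per batch, results concatenated in order
    posts: list[dict] = []
    for chunk in batched(uris):
        posts.extend(get_posts(chunk))
    return posts


def fake_get_posts(chunk: list[str]) -> list[dict]:
    # hypothetical stand-in for the real getPosts request
    return [{"uri": uri} for uri in chunk]


uris = [f"at://did:example:alice/app.bsky.feed.post/{i}" for i in range(60)]
sizes = [len(chunk) for chunk in batched(uris)]  # [25, 25, 10]
posts = resolve_posts(uris, fake_get_posts)
```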
+
 ## pipeline

 ```
 github API ──┐
-             ├──► ingest ──► raw_github_issues ──┐
-tangled PDS ─┘   (hourly)    raw_tangled_items ──┤
-                                                 ▼
-                                          transform (dbt)
-                                           [on ingest ✓]
-                                                 │
-                                                 ▼
-                                         hub_action_items
-                                          (mart, top 200)
-                                                 │
-                                  ┌──────────────┼──────────────┐
-                                  ▼              ▼              ▼
-                                brief     /api/cards.json +page.svelte
-                          [on transform ✓]                (SSR loader)
-                                  │
-                                  ▼
-                            briefing.json ──► /api/briefing.json
+             ├──► ingest ──► raw_github_issues ──┐
+tangled PDS ─┤   (hourly)    raw_tangled_items  ─┤
+bluesky PDS ─┤       raw_likes + raw_liked_posts │
+phi (tpuf)  ─┘             raw_phi_observations ─┘
+                                                 │
+                          transform (dbt) ◄──────┘
+                           [on ingest ✓]
+                                 │
+                 ┌───────────────┼───────────────┐
+                 ▼               ▼               ▼
+               brief          compact         hub UI
+         [on transform ✓] [on transform ✓]  /api/cards.json
+                 │               │           +page.svelte
+                 ▼               ▼
+           briefing.json    TurboPuffer
+           /api/briefing   (phi-users-*)
+```
+
+two additional flows run independently:
+
+```
+morning (daily 8am CT) ──► TurboPuffer (tag maintenance) + semble (curation)
+rebuild-atlas (every 6h) ──► Cloudflare Pages (leaflet-search atlas)
 ```

 ## flows

 | deployment | trigger | what it does |
 |---|---|---|
-| `diagnostics` | cron `*/5 * * * *` | prints system info — canary for worker health |
-| `ingest` | cron `0 * * * *` | fetches github notifications + authored items and tangled.org items concurrently, persists both to DuckDB sequentially |
-| `transform` | on `ingest` completion | dbt build: staging → scoring → mart. concurrency limit 1. runs under python 3.13 (dbt-core compat) |
+| `diagnostics` | cron `*/5 * * * *` (inactive) | prints system info — canary for worker health |
+| `ingest` | cron `0 * * * *` | fetches github, tangled.org, bluesky likes, and phi memory concurrently, resolves liked post content, persists all to DuckDB sequentially |
+| `transform` | on `ingest` completion | dbt build: staging → enrichment → mart. concurrency limit 1. runs under python 3.13 (dbt-core compat) |
 | `brief` | on `transform` completion | loads top 200 scored items, sends to claude haiku 4.5 via pydantic-ai, writes `briefing.json`. cached by items content hash (skips LLM when data unchanged) |
+| `compact` | on `transform` completion | synthesizes per-user relationship summaries from phi's observations + interactions. extracts new observations from liked posts (LLM). writes summaries to TurboPuffer (`phi-users-*`). cached by observations content hash |
+| `morning` | cron `0 13 * * *` (8am CT) | tag maintenance (dedup, merge, relationship discovery) + agentic semble curation (promotes observations to public cosmik cards). runs 1h before phi's daily reflection |
+| `rebuild-atlas` | cron `0 */6 * * *` | rebuilds the leaflet-search 2D semantic map (UMAP + HDBSCAN on TurboPuffer embeddings), deploys to Cloudflare Pages |
 | `cleanup` | cron `0 2 * * 0` | deletes old terminal flow runs (completed, failed, cancelled, crashed) older than 30 days |

 all flows run in the `kubernetes-pool` work pool. code is pulled at runtime via `git clone` from tangled.sh (github fallback). deps install via `uv run --with 'my-prefect-server @ git+...'`. deployments are registered by CI on every push to main.
···
 |---|---|---|
 | `stg_github_issues` | view | dedup `raw_github_issues` by (repo, number), keep most recent fetch |
 | `stg_tangled_items` | view | dedup `raw_tangled_items` by `at_uri`, exclude comments, keep most recent |
+| `stg_likes` | view | dedup `raw_likes` by `subject_uri` |
+| `stg_liked_posts` | view | dedup `raw_liked_posts` by `subject_uri`, join with like timestamps |
+| `stg_phi_observations` | view | dedup `raw_phi_observations` by observation id |
+| `stg_phi_interactions` | view | dedup `raw_phi_interactions` by interaction id |
 | `int_github_issues_scored` | table | scoring: recency (30-day decay) x engagement (log scale) x label multiplier (bug=1.5) x contributor weight |
 | `int_tangled_items_scored` | table | scoring: recency (30-day decay) x 0.5 (no engagement data) x contributor weight |
+| `int_phi_user_profiles` | table | aggregates per-user observation + interaction counts for compact flow |
 | `hub_action_items` | mart | union of both scored tables, ordered by `importance_score` desc, limit 200 |
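
the staging views above all follow a "dedup by key, keep the most recent fetch" pattern. the real models are dbt SQL; this python sketch only illustrates the rule, and the `fetched_at` column name is an assumption.

```python
def dedup_latest(rows: list[dict], key_fields: tuple[str, ...], ts_field: str = "fetched_at") -> list[dict]:
    # keep one row per key: the one with the largest timestamp
    latest: dict[tuple, dict] = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in latest or row[ts_field] > latest[key][ts_field]:
            latest[key] = row
    return list(latest.values())


# ISO-8601 strings in the same format compare correctly as plain strings
raw = [
    {"repo": "zat", "number": 7, "title": "old title", "fetched_at": "2025-01-01T10:00:00"},
    {"repo": "zat", "number": 7, "title": "new title", "fetched_at": "2025-01-01T11:00:00"},
    {"repo": "zlay", "number": 7, "title": "other repo", "fetched_at": "2025-01-01T09:00:00"},
]
staged = dedup_latest(raw, key_fields=("repo", "number"))
```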

 contributor weights come from the `known_contributors` seed (zzstoatzz + zzstoatzz.io at 2.0x).
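
a rough sketch of how the scoring factors multiply together. the exact decay and engagement formulas are not given in this document, so the exponential decay and `1 + log1p(comments)` below are assumptions; the 30-day window, the bug multiplier, the 0.5 factor, and the 2.0x weights are from the tables above.

```python
import math

# from the known_contributors seed described above
WEIGHTS = {"zzstoatzz": 2.0, "zzstoatzz.io": 2.0}


def recency(age_days: float, window_days: float = 30.0) -> float:
    # assumption: exponential decay over a 30-day window
    return math.exp(-age_days / window_days)


def github_score(age_days: float, comments: int, labels: list[str], author: str) -> float:
    engagement = 1.0 + math.log1p(comments)  # assumption: log-scaled comment count
    label_multiplier = 1.5 if "bug" in labels else 1.0
    return recency(age_days) * engagement * label_multiplier * WEIGHTS.get(author, 1.0)


def tangled_score(age_days: float, author: str) -> float:
    # no engagement data for tangled items, so a flat 0.5 replaces that factor
    return recency(age_days) * 0.5 * WEIGHTS.get(author, 1.0)


fresh_bug = github_score(age_days=0, comments=0, labels=["bug"], author="zzstoatzz")
fresh_tangled = tangled_score(age_days=0, author="zzstoatzz.io")
```

under these assumptions a brand-new bug authored by a known contributor scores 1.5 x 2.0 = 3.0, and a brand-new tangled item from the same author scores 0.5 x 2.0 = 1.0.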
···
 5. writes a structured `Briefing` (headline, 4 themed sections with accent colors, icons, priority) to `briefing.json`

 briefing model is defined in `packages/mps/src/mps/briefing.py`.
+
+## compact
+
+the `compact` flow fires in parallel with `brief` when `transform` completes. it maintains phi's long-term memory in TurboPuffer.
+
+**relationship summaries** — for each user phi has interacted with (from `int_phi_user_profiles`), loads their observations and recent interactions from DuckDB, resolves their bluesky profile, and sends to claude haiku to synthesize a dense relationship summary. writes to `phi-users-{handle}` namespaces as `kind="summary"` records. cached by observations content hash — skips the LLM when data unchanged.
+
+**liked post observations** — loads recently resolved liked posts from DuckDB (`raw_liked_posts`), groups by author, queries TurboPuffer for existing knowledge about each author, searches pub-search for their publications, then extracts 1-3 atomic observations per author via LLM. uses ADD/UPDATE/NOOP reconciliation to avoid duplicating what phi already knows. writes to `phi-users-{author_handle}` namespaces as `kind="observation"` records.
+
+the result: observations from likes are indistinguishable from observations phi creates during conversations — same schema, same namespaces, same reconciliation logic.
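
the "cached by content hash, skip the LLM when data is unchanged" behavior mentioned for `brief` and `compact` can be sketched like this. the in-memory dict cache, the key format, and `fake_llm` are hypothetical; the actual flows may rely on prefect's caching instead.

```python
import hashlib
import json
from collections.abc import Callable


def content_hash(items: list[dict]) -> str:
    # canonical JSON so dict key order does not change the hash (list order still does)
    payload = json.dumps(items, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def summarize_if_changed(
    items: list[dict], cache: dict, key: str, call_llm: Callable[[list[dict]], str]
) -> tuple[str, bool]:
    # returns (summary, llm_was_called)
    digest = content_hash(items)
    cached = cache.get(key)
    if cached is not None and cached["hash"] == digest:
        return cached["value"], False
    value = call_llm(items)
    cache[key] = {"hash": digest, "value": value}
    return value, True


calls = []


def fake_llm(items: list[dict]) -> str:
    # hypothetical stand-in for the model call
    calls.append(len(items))
    return f"summary of {len(items)} observations"


cache: dict = {}
observations = [{"id": 1, "text": "works on atproto tooling"}]
first = summarize_if_changed(observations, cache, "phi-users-example", fake_llm)
second = summarize_if_changed(observations, cache, "phi-users-example", fake_llm)
third = summarize_if_changed(observations + [{"id": 2, "text": "likes zig"}], cache, "phi-users-example", fake_llm)
```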
+
+## morning
+
+the `morning` flow runs daily at 8am CT (1h before phi's reflection). it has two halves:
+
+**tag maintenance (phases 1-3)** — mechanical operations on TurboPuffer:
+1. collect all tags across all `phi-users-*` namespaces
+2. embed tags and identify near-duplicates via LLM (e.g., "atproto" / "at protocol" / "AT Protocol")
+3. apply merges: rewrite tags in TurboPuffer, discover inter-tag relationships, store in `phi-tag-relationships` namespace
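
the "apply merges" step above amounts to rewriting each record's tags through a merge map. in this sketch the map is hard-coded and the function name is hypothetical; in the flow the map would come from the LLM's near-duplicate decisions, and the rewrite would go back to TurboPuffer.

```python
def apply_merges(tags: list[str], merges: dict[str, str]) -> list[str]:
    # map each tag to its canonical form, dropping duplicates but keeping first-seen order
    seen: set[str] = set()
    rewritten: list[str] = []
    for tag in tags:
        canonical = merges.get(tag, tag)
        if canonical not in seen:
            seen.add(canonical)
            rewritten.append(canonical)
    return rewritten


# near-duplicate example taken from the list above
merges = {"at protocol": "atproto", "AT Protocol": "atproto"}
merged = apply_merges(["atproto", "at protocol", "zig", "AT Protocol"], merges)
```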
+
+**agentic curation (phase 4)** — assembles phi's recent observations, episodic knowledge, existing cosmik cards, and tag relationships into a context bundle. sends to an LLM that decides what (if anything) deserves promotion to semble as a public cosmik card. executes the plan: creates `network.cosmik.card` records on phi's PDS, which semble's firehose subscriber auto-indexes for semantic search.
+
+## atlas
+
+the `rebuild-atlas` flow runs every 6h. it clones leaflet-search, runs the build-atlas script (PCA → UMAP → HDBSCAN on TurboPuffer document embeddings), produces `atlas.json`, and deploys the static site to Cloudflare Pages via wrangler.

 ## frontend