hub#
action item dashboard and intelligence pipeline at hub.waow.tech. aggregates issues and PRs from github and tangled.org, scores them by importance, generates a daily briefing with an LLM, and maintains phi's long-term memory.
data sources#
a single ingest flow runs hourly on cron. it fetches all data sources concurrently, then writes to DuckDB sequentially; because every write comes from the same process, there is no contention for DuckDB's single-writer lock. downstream flows (transform, brief, compact) are event-driven via deployment triggers: each runs only when its upstream flow completes.
github — fetches notifications (issues + PRs) and open items authored by zzstoatzz via the search API. each issue is cached by repo+number for 24h. persists to raw_github_issues.
tangled.org — fetches issues, PRs, and comments from the PDS (pds.zzstoatzz.io) via AT Protocol's com.atproto.repo.listRecords. no auth needed — records are public. targets repos: zat, zlay, plyr.fm, at-me, pollz, typeahead. persists to raw_tangled_items.
bluesky likes — fetches nate's recent likes from the PDS via app.bsky.feed.like records, then batch-resolves post content via app.bsky.feed.getPosts (25 per call). persists to raw_likes and raw_liked_posts.
phi memory — reads phi's TurboPuffer namespaces (phi-users-*) to snapshot observations and interactions into DuckDB for dbt processing. persists to raw_phi_observations and raw_phi_interactions.
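the fetch-concurrently, write-sequentially shape described above can be sketched roughly as follows. this is a minimal illustration, not the project's code: the fetcher functions, the write callback, and the chunked helper are hypothetical stand-ins, and only two of the four sources are shown.

```python
import asyncio

# hypothetical stand-ins for the real fetchers; each returns rows for one raw table
async def fetch_github() -> list[dict]:
    await asyncio.sleep(0)  # placeholder for network I/O
    return [{"repo": "example/repo", "number": 1}]

async def fetch_tangled() -> list[dict]:
    await asyncio.sleep(0)  # placeholder for network I/O
    return [{"at_uri": "at://example/issue/1"}]

def chunked(items: list[str], size: int = 25) -> list[list[str]]:
    """Split a list into batches, e.g. post URIs resolved 25 per call."""
    return [items[i:i + size] for i in range(0, len(items), size)]

async def ingest(write) -> None:
    # all sources are fetched concurrently...
    github, tangled = await asyncio.gather(fetch_github(), fetch_tangled())
    # ...then written one table at a time from this single process
    write("raw_github_issues", github)
    write("raw_tangled_items", tangled)

# usage: record what would be written instead of touching a database
written: list[tuple[str, int]] = []
asyncio.run(ingest(lambda table, rows: written.append((table, len(rows)))))
```

keeping the writes in one place after the gather means only one writer ever holds the database at a time, which is the property the ingest flow relies on.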
pipeline#
data sources
─────────────────────────────────────────────
github API        ──┐
tangled PDS       ──┤
bluesky likes     ──┼──► ingest (hourly) ──► DuckDB
phi memory (tpuf) ──┘
                                               │
                                               ▼
                                        transform (dbt)
                                         [on ingest ✓]
                                               │
                     ┌─────────────────────────┼──────────┐
                     ▼                         ▼          ▼
                   brief                    compact    hub UI
             [on transform ✓]          [on transform ✓]
                     │                         │
                     ▼                         ▼
               briefing.json              TurboPuffer
               /api/briefing             (phi-users-*)

standalone flows
─────────────────────────────────────────────
morning (daily 8am CT)   ──► TurboPuffer + semble
rebuild-atlas (every 6h) ──► Cloudflare Pages
flows#
| deployment | trigger | what it does |
|---|---|---|
| diagnostics | cron */5 * * * * (inactive) | prints system info (canary for worker health) |
| ingest | cron 0 * * * * | fetches github, tangled.org, bluesky likes, and phi memory concurrently, resolves liked post content, persists all to DuckDB sequentially |
| transform | on ingest completion | dbt build: staging → enrichment → mart. concurrency limit 1. runs under python 3.13 (dbt-core compat) |
| brief | on transform completion | loads top 200 scored items, sends to claude haiku 4.5 via pydantic-ai, writes briefing.json. cached by items content hash (skips LLM when data unchanged) |
| compact | on transform completion | synthesizes per-user relationship summaries from phi's observations + interactions. extracts new observations from liked posts (LLM). writes summaries to TurboPuffer (phi-users-*). cached by observations content hash |
| morning | cron 0 13 * * * (8am CT) | tag maintenance (dedup, merge, relationship discovery) + agentic semble curation (promotes observations to public cosmik cards). runs 1h before phi's daily reflection |
| rebuild-atlas | cron 0 */6 * * * | rebuilds the leaflet-search 2D semantic map (UMAP + HDBSCAN on TurboPuffer embeddings), deploys to Cloudflare Pages |
| cleanup | cron 0 2 * * 0 | deletes old terminal flow runs (completed, failed, cancelled, crashed) older than 30 days |
all flows run in the kubernetes-pool work pool. code is pulled at runtime via git clone from tangled.sh (github fallback). deps install via uv run --with 'my-prefect-server @ git+...'. deployments are registered by CI on every push to main.
dbt layer#
project lives in analytics/. DuckDB database at /var/lib/prefect-analytics/analytics.duckdb.
| model | type | description |
|---|---|---|
| stg_github_issues | view | dedup raw_github_issues by (repo, number), keep most recent fetch |
| stg_tangled_items | view | dedup raw_tangled_items by at_uri, exclude comments, keep most recent |
| stg_likes | view | dedup raw_likes by subject_uri |
| stg_liked_posts | view | dedup raw_liked_posts by subject_uri, join with like timestamps |
| stg_phi_observations | view | dedup raw_phi_observations by observation id |
| stg_phi_interactions | view | dedup raw_phi_interactions by interaction id |
| int_github_issues_scored | table | scoring: recency (30-day decay) x engagement (log scale) x label multiplier (bug=1.5) x contributor weight |
| int_tangled_items_scored | table | scoring: recency (30-day decay) x 0.5 (no engagement data) x contributor weight |
| int_phi_user_profiles | table | aggregates per-user observation + interaction counts for compact flow |
| hub_action_items | mart | union of both scored tables, ordered by importance_score desc, limit 200 |
contributor weights come from the known_contributors seed (zzstoatzz + zzstoatzz.io at 2.0x).
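to make the scoring shape concrete, here is a rough sketch of the two formulas as products of their factors. the real models are dbt SQL; the exact decay and engagement functions are not given here, so the exponential decay with a 30-day time constant and the 1 + log1p engagement term below are assumptions, as are all function and parameter names.

```python
import math

def recency(age_days: float, scale_days: float = 30.0) -> float:
    # assumption: exponential decay with a 30-day time constant
    return math.exp(-age_days / scale_days)

def engagement(reactions: int, comments: int) -> float:
    # assumption: 1 + log1p so an item with no engagement keeps a factor of 1
    return 1.0 + math.log1p(reactions + comments)

def github_score(age_days: float, reactions: int, comments: int,
                 labels: list[str], author: str, weights: dict[str, float]) -> float:
    label_multiplier = 1.5 if "bug" in labels else 1.0
    contributor_weight = weights.get(author, 1.0)
    return recency(age_days) * engagement(reactions, comments) * label_multiplier * contributor_weight

def tangled_score(age_days: float, author: str, weights: dict[str, float]) -> float:
    # no engagement data for tangled items, so a flat 0.5 takes its place
    return recency(age_days) * 0.5 * weights.get(author, 1.0)

# contributor weights as described by the known_contributors seed
WEIGHTS = {"zzstoatzz": 2.0, "zzstoatzz.io": 2.0}
fresh_bug = github_score(0, 0, 0, ["bug"], "zzstoatzz", WEIGHTS)
fresh_tangled = tangled_score(0, "someone-else", WEIGHTS)
```

because every factor multiplies, a known contributor's fresh bug report outranks an otherwise identical item from an unknown author by the full 2.0x weight.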
curation#
the brief flow fires automatically when transform completes (via deployment trigger). it:
- snapshots DuckDB to /tmp (bypass exclusive flock)
- loads top 200 items from hub_action_items
- checks cache: the generate_briefing task uses a ByItemsContent cache policy that hashes the items text + system prompt. if the data hasn't changed since the last run, the cached briefing is returned without calling the LLM (4h expiration)
- on cache miss, sends items to claude haiku 4.5 with a system prompt that groups by actionability
- writes a structured Briefing (headline, 4 themed sections with accent colors, icons, priority) to briefing.json
briefing model is defined in packages/mps/src/mps/briefing.py.
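the content-hash caching described above can be illustrated without any framework. this is a library-free sketch under stated assumptions, not the ByItemsContent policy itself: the in-memory dict, the cache_key helper, the call_llm callback, and the now parameter are all hypothetical, and the function only borrows the generate_briefing name for readability.

```python
import hashlib
import time

CACHE_TTL_SECONDS = 4 * 60 * 60  # 4h expiration, as stated above
_cache: dict[str, tuple[float, str]] = {}  # key -> (stored_at, briefing)

def cache_key(items_text: str, system_prompt: str) -> str:
    """Hash the prompt and the items together into one hex digest."""
    h = hashlib.sha256()
    h.update(system_prompt.encode("utf-8"))
    h.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") hash differently
    h.update(items_text.encode("utf-8"))
    return h.hexdigest()

def generate_briefing(items_text: str, system_prompt: str, call_llm, now=None) -> str:
    now = time.time() if now is None else now
    key = cache_key(items_text, system_prompt)
    hit = _cache.get(key)
    if hit is not None and now - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]  # unchanged data and still fresh: skip the LLM
    result = call_llm(items_text, system_prompt)
    _cache[key] = (now, result)
    return result

# usage with a fake LLM that counts how often it is called
calls: list[str] = []
def fake_llm(items_text: str, system_prompt: str) -> str:
    calls.append(items_text)
    return f"briefing for {len(items_text)} chars"

first = generate_briefing("item a", "group by actionability", fake_llm, now=0)
second = generate_briefing("item a", "group by actionability", fake_llm, now=100)
```

hashing the system prompt along with the items means a prompt change invalidates the cache even when the data is identical.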
compact#
the compact flow fires in parallel with brief when transform completes. it maintains phi's long-term memory in TurboPuffer.
relationship summaries — for each user phi has interacted with (from int_phi_user_profiles), loads their observations and recent interactions from DuckDB, resolves their bluesky profile, and sends to claude haiku to synthesize a dense relationship summary. writes to phi-users-{handle} namespaces as kind="summary" records. cached by observations content hash — skips the LLM when data unchanged.
liked post observations — loads recently resolved liked posts from DuckDB (raw_liked_posts), groups by author, queries TurboPuffer for existing knowledge about each author, searches pub-search for their publications, then extracts 1-3 atomic observations per author via LLM. uses ADD/UPDATE/NOOP reconciliation to avoid duplicating what phi already knows. writes to phi-users-{author_handle} namespaces as kind="observation" records.
the result: observations from likes are indistinguishable from observations phi creates during conversations — same schema, same namespaces, same reconciliation logic.
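a small sketch of what applying ADD/UPDATE/NOOP decisions might look like once they have been produced. how the decisions are generated (the LLM step) is out of scope here; the Decision type, the obs-N id scheme, and the plain dict standing in for a phi-users-* namespace are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    op: str                          # "ADD", "UPDATE", or "NOOP"
    text: str = ""                   # observation text for ADD / UPDATE
    target_id: Optional[str] = None  # existing record to overwrite on UPDATE

def _new_id(store: dict[str, str]) -> str:
    # hypothetical id scheme: first unused obs-N
    n = len(store) + 1
    while f"obs-{n}" in store:
        n += 1
    return f"obs-{n}"

def apply_decisions(existing: dict[str, str], decisions: list[Decision]) -> dict[str, str]:
    """Return a new store with the decisions applied; the input is not mutated."""
    store = dict(existing)
    for d in decisions:
        if d.op == "ADD":
            store[_new_id(store)] = d.text
        elif d.op == "UPDATE":
            if d.target_id not in store:
                raise KeyError(f"cannot update unknown record {d.target_id!r}")
            store[d.target_id] = d.text
        elif d.op != "NOOP":  # NOOP leaves already-known observations untouched
            raise ValueError(f"unknown op {d.op!r}")
    return store

# usage with made-up observation text
existing = {"obs-1": "posts about topic A"}
decisions = [
    Decision("NOOP"),
    Decision("UPDATE", "posts about topics A and B", "obs-1"),
    Decision("ADD", "recently started writing about topic C"),
]
result = apply_decisions(existing, decisions)
```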
morning#
the morning flow runs daily at 8am CT (1h before phi's reflection). it has two halves:
tag maintenance (phases 1-3) — mechanical operations on TurboPuffer:
- collect all tags across all phi-users-* namespaces
- embed tags and identify near-duplicates via LLM (e.g., "atproto" / "at protocol" / "AT Protocol")
- apply merges: rewrite tags in TurboPuffer, discover inter-tag relationships, store in phi-tag-relationships namespace
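the merge step can be pictured as rewriting each record's tags through a mapping from duplicate to canonical tag. this sketch covers only that rewrite; the merges dict and the rewrite_tags helper are hypothetical, and finding the duplicates (embedding + LLM) and the TurboPuffer writes are not shown.

```python
def rewrite_tags(tags: list[str], merges: dict[str, str]) -> list[str]:
    """Map each tag to its canonical form, dropping repeats but keeping order."""
    seen: set[str] = set()
    out: list[str] = []
    for tag in tags:
        canonical = merges.get(tag, tag)  # tags without a merge stay as they are
        if canonical not in seen:
            seen.add(canonical)
            out.append(canonical)
    return out

# usage: the near-duplicate example from the list above, merged into one tag
merges = {"at protocol": "atproto", "AT Protocol": "atproto"}
rewritten = rewrite_tags(["at protocol", "atproto", "AT Protocol", "duckdb"], merges)
```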
agentic curation (phase 4) — assembles phi's recent observations, episodic knowledge, existing cosmik cards, and tag relationships into a context bundle. sends to an LLM that decides what (if anything) deserves promotion to semble as a public cosmik card. executes the plan: creates network.cosmik.card records on phi's PDS, which semble's firehose subscriber auto-indexes for semantic search.
atlas#
the rebuild-atlas flow runs every 6h. it clones leaflet-search, runs the build-atlas script (PCA → UMAP → HDBSCAN on TurboPuffer document embeddings), produces atlas.json, and deploys the static site to Cloudflare Pages via wrangler.
frontend#
SvelteKit app in web/. bun runtime, node adapter, port 3000.
routes#
| route | description |
|---|---|
| / | SSR page: loads stats, cards, and briefing in parallel |
| /api/cards.json | JSON array of scored action items from hub_action_items |
| /api/briefing.json | curated briefing object from briefing.json on disk |
| /api/stats.json | aggregate counts from raw_github_issues (tracked, open, with_reactions, repos) |
the frontend reads DuckDB through a snapshot copy (/tmp/hub_analytics_snapshot.duckdb) that refreshes when the source file's mtime changes. all queries are read-only.
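the refresh-on-mtime-change idea can be sketched as below. the frontend itself is SvelteKit, so this Python version only illustrates the check; the SnapshotReader class, its copies counter, and the throwaway file names are hypothetical.

```python
import os
import shutil
import tempfile

class SnapshotReader:
    """Copies `source` to `snapshot` whenever the source's mtime changes."""

    def __init__(self, source: str, snapshot: str) -> None:
        self.source = source
        self.snapshot = snapshot
        self._seen_mtime_ns = None
        self.copies = 0  # counts refreshes, for illustration only

    def path(self) -> str:
        mtime_ns = os.stat(self.source).st_mtime_ns
        if mtime_ns != self._seen_mtime_ns or not os.path.exists(self.snapshot):
            shutil.copy2(self.source, self.snapshot)  # refresh the snapshot copy
            self._seen_mtime_ns = mtime_ns
            self.copies += 1
        return self.snapshot  # callers open this path read-only

# usage with throwaway files standing in for the real database
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "analytics.duckdb")
with open(source, "w") as f:
    f.write("v1")
reader = SnapshotReader(source, os.path.join(workdir, "snapshot.duckdb"))
reader.path()  # first call copies
reader.path()  # unchanged mtime: no copy
```

reading from the copy keeps queries away from the file the pipeline writes to, at the cost of serving data that is at most one refresh behind.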
deployment#
just web # build + push + deploy
this runs: docker build (bun image) → push to atcr.io/zzstoatzz.io/hub:latest → apply k8s manifests → rolling restart. the hub pod mounts the analytics PVC at /analytics and reads DUCKDB_PATH=/analytics/analytics.duckdb.