# hub
action item dashboard at hub.waow.tech. aggregates issues and PRs from github and tangled.org, scores them by importance, and generates a daily briefing with an LLM.
## data sources
a single ingest flow runs hourly on cron and fetches both data sources concurrently, then writes to DuckDB sequentially (same process = no single-writer lock contention). downstream flows (transform, brief) are event-driven via deployment triggers — they only run when upstream completes.
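a minimal sketch of that shape (the fetchers and column lists here are hypothetical stand-ins, not the repo's actual code):

```python
import asyncio
from typing import Any

import duckdb


async def fetch_github_items() -> list[dict[str, Any]]:
    # stand-in: the real flow hits the github notifications + search APIs
    return []


async def fetch_tangled_items() -> list[dict[str, Any]]:
    # stand-in: the real flow pages com.atproto.repo.listRecords (see below)
    return []


async def ingest() -> None:
    # the two fetches are independent network calls, so run them concurrently
    github_items, tangled_items = await asyncio.gather(
        fetch_github_items(), fetch_tangled_items()
    )
    # write sequentially from the same process: DuckDB allows one writer per
    # database file, so a single connection doing both inserts never contends
    con = duckdb.connect("/var/lib/prefect-analytics/analytics.duckdb")
    try:
        for item in github_items:
            con.execute(  # illustrative columns only
                "insert into raw_github_issues values (?, ?, ?)",
                [item["repo"], item["number"], item["payload"]],
            )
        for item in tangled_items:
            con.execute(
                "insert into raw_tangled_items values (?, ?)",
                [item["at_uri"], item["payload"]],
            )
    finally:
        con.close()
```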
- github — fetches notifications (issues + PRs) and open items authored by zzstoatzz via the search API. each issue is cached by repo+number for 24h. persists to `raw_github_issues`.
- tangled.org — fetches issues, PRs, and comments from the PDS (pds.zzstoatzz.io) via AT Protocol's `com.atproto.repo.listRecords`. no auth needed — records are public. targets repos: zat, zlay, plyr.fm, at-me, pollz, typeahead. persists to `raw_tangled_items`.
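since the records are public, ingest can page the xrpc endpoint directly. a sketch with `httpx`; the collection NSID in the usage comment is an assumption, not confirmed from the repo:

```python
from typing import Any

import httpx

PDS = "https://pds.zzstoatzz.io"


async def list_records(repo: str, collection: str) -> list[dict[str, Any]]:
    """page through com.atproto.repo.listRecords (public reads, no auth)."""
    records: list[dict[str, Any]] = []
    cursor: str | None = None
    async with httpx.AsyncClient() as client:
        while True:
            params: dict[str, Any] = {"repo": repo, "collection": collection, "limit": 100}
            if cursor:
                params["cursor"] = cursor
            resp = await client.get(f"{PDS}/xrpc/com.atproto.repo.listRecords", params=params)
            resp.raise_for_status()
            data = resp.json()
            records.extend(data["records"])
            cursor = data.get("cursor")
            if not cursor:
                return records


# usage (the NSID is a guess at tangled's lexicon name):
# items = await list_records("zzstoatzz.io", "sh.tangled.repo.issue")
```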
## pipeline
```
github API ──┐
             ├──► ingest ──► raw_github_issues ──┐
tangled PDS ─┘    (hourly)   raw_tangled_items ──┤
                                                 ▼
                                          transform (dbt)
                                           [on ingest ✓]
                                                 │
                                                 ▼
                                          hub_action_items
                                          (mart, top 200)
                                                 │
                                  ┌──────────────┼──────────────┐
                                  ▼              ▼              ▼
                                brief     /api/cards.json  +page.svelte
                         [on transform ✓]                  (SSR loader)
                                  │
                                  ▼
                           briefing.json ──► /api/briefing.json
```
## flows
| deployment | trigger | what it does |
|---|---|---|
| diagnostics | cron `*/5 * * * *` | prints system info — canary for worker health |
| ingest | cron `0 * * * *` | fetches github notifications + authored items and tangled.org items concurrently, persists both to DuckDB sequentially |
| transform | on ingest completion | dbt build: staging → scoring → mart. concurrency limit 1. runs under python 3.13 (dbt-core compat) |
| brief | on transform completion | loads top 200 scored items, sends to claude haiku 4.5 via pydantic-ai, writes `briefing.json`. cached by items content hash (skips the LLM when data is unchanged) |
| cleanup | cron `0 2 * * 0` | deletes terminal flow runs (completed, failed, cancelled, crashed) older than 30 days |
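the event-driven rows use prefect deployment triggers. roughly, with placeholder source and entrypoint values:

```python
from prefect import flow
from prefect.events import DeploymentEventTrigger

if __name__ == "__main__":
    flow.from_source(
        source="https://tangled.sh/...",            # placeholder: real repo URL elided
        entrypoint="flows/transform.py:transform",  # placeholder entrypoint
    ).deploy(
        name="transform",
        work_pool_name="kubernetes-pool",
        concurrency_limit=1,  # per the table: transform runs one at a time
        triggers=[
            DeploymentEventTrigger(
                # fire when a flow run whose flow is named "ingest" completes
                expect={"prefect.flow-run.Completed"},
                match_related={
                    "prefect.resource.role": "flow",
                    "prefect.resource.name": "ingest",
                },
            )
        ],
    )
```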
all flows run in the `kubernetes-pool` work pool. code is pulled at runtime via `git clone` from tangled.sh (github fallback). deps install via `uv run --with 'my-prefect-server @ git+...'`. deployments are registered by CI on every push to `main`.
## dbt layer

project lives in `analytics/`. DuckDB database at `/var/lib/prefect-analytics/analytics.duckdb`.
| model | type | description |
|---|---|---|
| `stg_github_issues` | view | dedup `raw_github_issues` by (repo, number), keep most recent fetch |
| `stg_tangled_items` | view | dedup `raw_tangled_items` by `at_uri`, exclude comments, keep most recent |
| `int_github_issues_scored` | table | scoring: recency (30-day decay) × engagement (log scale) × label multiplier (bug = 1.5) × contributor weight |
| `int_tangled_items_scored` | table | scoring: recency (30-day decay) × 0.5 (no engagement data) × contributor weight |
| `hub_action_items` | mart | union of both scored tables, ordered by `importance_score` desc, limit 200 |
contributor weights come from the `known_contributors` seed (zzstoatzz + zzstoatzz.io at 2.0x).
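in python terms the score is a straight product; the decay curve below is an assumption (the exact formula lives in the dbt models):

```python
import math


def github_engagement(reactions: int, comments: int) -> float:
    # "log scale" engagement, per int_github_issues_scored
    return math.log1p(reactions + comments)


TANGLED_ENGAGEMENT = 0.5  # no engagement data in the PDS records


def importance_score(
    age_days: float,
    engagement: float,
    label_multiplier: float = 1.0,    # 1.5 when labeled "bug"
    contributor_weight: float = 1.0,  # 2.0 for known_contributors
) -> float:
    # assumption: exponential decay on a 30-day scale; the dbt SQL may
    # implement the 30-day decay differently (e.g. linear falloff)
    recency = math.exp(-age_days / 30)
    return recency * engagement * label_multiplier * contributor_weight
```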
## curation
the brief flow fires automatically when transform completes (via deployment trigger). it:
- snapshots DuckDB to `/tmp` (bypass exclusive flock)
- loads top 200 items from `hub_action_items`
- checks cache — the `generate_briefing` task uses a `ByItemsContent` cache policy that hashes the items text + system prompt. if the data hasn't changed since the last run, the cached briefing is returned without calling the LLM (4h expiration; see the sketch after this list)
- on cache miss, sends items to claude haiku 4.5 with a system prompt that groups by actionability
- writes a structured `Briefing` (headline, 4 themed sections with accent colors, icons, priority) to `briefing.json`
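a minimal sketch of such a policy against prefect 3's `CachePolicy` base class; the input names are guesses at the real task's signature:

```python
import hashlib
from dataclasses import dataclass
from datetime import timedelta
from typing import Any

from prefect import task
from prefect.cache_policies import CachePolicy
from prefect.context import TaskRunContext


@dataclass
class ByItemsContent(CachePolicy):
    def compute_key(
        self,
        task_ctx: TaskRunContext,
        inputs: dict[str, Any],
        flow_parameters: dict[str, Any],
        **kwargs: Any,
    ) -> str | None:
        # same data in -> same key -> cached briefing out, no LLM call
        material = inputs["items_text"] + inputs["system_prompt"]  # guessed names
        return hashlib.sha256(material.encode()).hexdigest()


@task(cache_policy=ByItemsContent(), cache_expiration=timedelta(hours=4))
def generate_briefing(items_text: str, system_prompt: str) -> dict[str, Any]:
    ...  # on a cache miss, call claude haiku 4.5 via pydantic-ai
```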
the briefing model is defined in `packages/mps/src/mps/briefing.py`.
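from the description above, its shape is roughly this (field names are illustrative, not the file's exact schema):

```python
from pydantic import BaseModel


class BriefingSection(BaseModel):
    title: str
    accent_color: str  # drives the card styling on the frontend
    icon: str
    priority: int
    summary: str


class Briefing(BaseModel):
    headline: str
    sections: list[BriefingSection]  # four themed sections
```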
## frontend

SvelteKit app in `web/`. bun runtime, node adapter, port 3000.
### routes
| route | description |
|---|---|
| `/` | SSR page — loads stats, cards, and briefing in parallel |
| `/api/cards.json` | JSON array of scored action items from `hub_action_items` |
| `/api/briefing.json` | curated briefing object from `briefing.json` on disk |
| `/api/stats.json` | aggregate counts from `raw_github_issues` (tracked, open, with_reactions, repos) |
the frontend reads DuckDB through a snapshot copy (`/tmp/hub_analytics_snapshot.duckdb`) that refreshes when the source file's mtime changes. all queries are read-only.
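the real loader is typescript in `web/`, but the refresh check is small; the same pattern, sketched in python:

```python
import shutil
from pathlib import Path

SOURCE = Path("/analytics/analytics.duckdb")  # DUCKDB_PATH in the pod
SNAPSHOT = Path("/tmp/hub_analytics_snapshot.duckdb")


def snapshot_path() -> Path:
    # copy only when the source has advanced; readers open the snapshot
    # read-only, so they never touch the writer's exclusive lock
    if not SNAPSHOT.exists() or SOURCE.stat().st_mtime > SNAPSHOT.stat().st_mtime:
        shutil.copy2(SOURCE, SNAPSHOT)
    return SNAPSHOT
```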
## deployment

```
just web    # build + push + deploy
```

this runs: docker build (bun image) → push to `atcr.io/zzstoatzz.io/hub:latest` → apply k8s manifests → rolling restart. the hub pod mounts the analytics PVC at `/analytics` and reads `DUCKDB_PATH=/analytics/analytics.duckdb`.