add hub architecture doc and link from README · zzstoatzz.io/my-prefect-server@30b6562

+99

2 changed files

README.md

···

[executive dashboard](https://prefect-metrics.waow.tech/d/executive-overview/executive-overview?orgId=1&from=now-6h&to=now&timezone=browser) · [hub](https://hub.waow.tech)

see [docs/hub.md](docs/hub.md) for how the data pipeline and hub frontend work.

<details>
<summary>deployment</summary>

+97

docs/hub.md

# hub

action item dashboard at [hub.waow.tech](https://hub.waow.tech). aggregates issues and PRs from github and [tangled.org](https://tangled.org), scores them by importance, and generates a daily briefing with an LLM.

## data sources

two ingestion flows run hourly at :00, writing to separate DuckDB tables:

**gh-notifications** (`flows/gh_notifications.py`) — fetches github notifications (issues + PRs) and open items authored by `zzstoatzz` via the search API. each issue is cached by repo+number for 24h. persists to `raw_github_issues`.

**tangled-items** (`flows/tangled_items.py`) — fetches issues, PRs, and comments from the tangled.org PDS (`pds.zzstoatzz.io`) via AT Protocol's `com.atproto.repo.listRecords`. no auth needed — records are public. targets repos: zat, zlay, plyr.fm, at-me, pollz, typeahead. persists to `raw_tangled_items`.

## pipeline

```
github API  ──► gh-notifications ──► raw_github_issues ──┐
                (hourly :00)                             │
                                                         ▼
                                              ┌─── enrich (dbt) ───┐
                                              │    (hourly :05)    │
                                              └────────────────────┘
                                                         │
tangled PDS ──► tangled-items ───► raw_tangled_items ────┘
                (hourly :00)                             │
                                                         ▼
                                                 hub_action_items
                                                  (mart, top 200)
                                                         │
                                          ┌──────────────┼──────────────┐
                                          ▼              ▼              ▼
                                       curate     /api/cards.json +page.svelte
                                    (hourly :10)                  (SSR loader)
                                          │
                                          ▼
                                    briefing.json ──► /api/briefing.json
```

## flows

| deployment | schedule | what it does |
|---|---|---|
| `diagnostics` | `*/5 * * * *` | prints system info — canary for worker health |
| `gh-notifications` | `0 * * * *` | github notifications + authored open issues/PRs → `raw_github_issues` |
| `tangled-items` | `0 * * * *` | tangled.org issues/PRs/comments → `raw_tangled_items` |
| `enrich` | `5 * * * *` | dbt build: staging → enrichment → mart. concurrency limit 1. runs under python 3.13 (dbt-core compat) |
| `curate` | `10 * * * *` | loads top 200 scored items, sends to claude haiku 4.5 via pydantic-ai, writes `briefing.json` |
| `cleanup` | `0 2 * * 0` | deletes old terminal flow runs (completed, failed, cancelled, crashed) older than 30 days |

all flows run in the `kubernetes-pool` work pool. code is pulled at runtime via `git clone` from tangled.sh (github fallback). deps install via `uv run --with 'my-prefect-server @ git+...'`. deployments are registered by CI on every push to main.

## dbt layer

project lives in `analytics/`. DuckDB database at `/var/lib/prefect-analytics/analytics.duckdb`.

| model | type | description |
|---|---|---|
| `stg_github_issues` | view | dedup `raw_github_issues` by (repo, number), keep most recent fetch |
| `stg_tangled_items` | view | dedup `raw_tangled_items` by `at_uri`, exclude comments, keep most recent |
| `int_github_issues_scored` | table | scoring: recency (30-day decay) x engagement (log scale) x label multiplier (bug=1.5) x contributor weight |
| `int_tangled_items_scored` | table | scoring: recency (30-day decay) x 0.5 (no engagement data) x contributor weight |
| `hub_action_items` | mart | union of both scored tables, ordered by `importance_score` desc, limit 200 |

contributor weights come from the `known_contributors` seed (zzstoatzz + zzstoatzz.io at 2.0x).

## curation

the `curate` flow runs at :10 each hour after `enrich` refreshes the mart. it:

1. snapshots DuckDB to `/tmp` (bypass exclusive flock)
2. loads top 200 items from `hub_action_items`
3. sends them to claude haiku 4.5 with a system prompt that groups by actionability
4. writes a structured `Briefing` (headline, 2-5 themed sections with accent colors, icons, priority) to `briefing.json`

briefing model is defined in `packages/mps/src/mps/briefing.py`.

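the importance scoring listed for the two `int_*_scored` models in the dbt layer table can be illustrated with a rough Python sketch. the real logic lives in the dbt SQL under `analytics/` and is not shown in this doc, so the exponential decay shape, the `1 + log1p(...)` engagement term, and every function and constant name below are assumptions. only the 30-day decay, `bug=1.5`, the `0.5` tangled factor, and the 2.0x contributor weight come from the text.

```python
import math
from datetime import datetime, timedelta, timezone

# Values stated in the doc; the names themselves are hypothetical.
DECAY_DAYS = 30.0                       # "30-day decay"
LABEL_MULTIPLIERS = {"bug": 1.5}        # only bug=1.5 is documented
KNOWN_CONTRIBUTORS = {"zzstoatzz": 2.0, "zzstoatzz.io": 2.0}
TANGLED_ENGAGEMENT = 0.5                # flat factor: no engagement data


def recency(updated_at, now):
    """Assumed shape: exponential decay with a 30-day time constant."""
    age_days = max((now - updated_at).total_seconds() / 86400.0, 0.0)
    return math.exp(-age_days / DECAY_DAYS)


def github_score(updated_at, now, comments, reactions, labels, author):
    """recency x engagement (log scale) x label multiplier x contributor weight."""
    # Assumed log-scale form; the +1 keeps items with no engagement above zero.
    engagement = 1.0 + math.log1p(comments + reactions)
    label = max((LABEL_MULTIPLIERS.get(name, 1.0) for name in labels), default=1.0)
    weight = KNOWN_CONTRIBUTORS.get(author, 1.0)
    return recency(updated_at, now) * engagement * label * weight


def tangled_score(updated_at, now, author):
    """recency x 0.5 x contributor weight (no engagement term)."""
    weight = KNOWN_CONTRIBUTORS.get(author, 1.0)
    return recency(updated_at, now) * TANGLED_ENGAGEMENT * weight
```

under these assumptions, a fresh github bug opened by a known contributor with no engagement scores 1 x 1 x 1.5 x 2.0, and the same item a month later is scaled down by the decay term alone.
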
## frontend

SvelteKit app in `web/`. bun runtime, node adapter, port 3000.

### routes

| route | description |
|---|---|
| `/` | SSR page — loads stats, cards, and briefing in parallel |
| `/api/cards.json` | JSON array of scored action items from `hub_action_items` |
| `/api/briefing.json` | curated briefing object from `briefing.json` on disk |
| `/api/stats.json` | aggregate counts from `raw_github_issues` (tracked, open, with_reactions, repos) |

the frontend reads DuckDB through a snapshot copy (`/tmp/hub_analytics_snapshot.duckdb`) that refreshes when the source file's mtime changes. all queries are read-only.

### deployment

```bash
just web    # build + push + deploy
```

this runs: docker build (bun image) → push to `atcr.io/zzstoatzz.io/hub:latest` → apply k8s manifests → rolling restart. the hub pod mounts the analytics PVC at `/analytics` and reads `DUCKDB_PATH=/analytics/analytics.duckdb`.
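
both the `curate` flow and the frontend read DuckDB through a snapshot copy rather than the live file. a minimal sketch of that pattern, assuming a plain file copy that is redone whenever the source mtime changes. the frontend itself is SvelteKit, so this Python version only illustrates the idea. `SnapshotReader` and its methods are hypothetical names, not code from the repo.

```python
import os
import shutil
from pathlib import Path


class SnapshotReader:
    """Keeps a private copy of a database file, re-copied when the source changes."""

    def __init__(self, source, snapshot):
        self.source = Path(source)
        self.snapshot = Path(snapshot)
        self._seen_mtime_ns = None  # source mtime at the time of the last copy

    def path(self):
        """Return the snapshot path, refreshing it first if the source mtime moved."""
        mtime_ns = self.source.stat().st_mtime_ns
        if mtime_ns != self._seen_mtime_ns or not self.snapshot.exists():
            # Copy under a temporary name, then rename, so a concurrent reader
            # never opens a half-written snapshot.
            tmp = self.snapshot.with_name(self.snapshot.name + ".tmp")
            shutil.copy2(self.source, tmp)
            os.replace(tmp, self.snapshot)
            self._seen_mtime_ns = mtime_ns
        return self.snapshot
```

a caller could then open the returned path read-only, for example with `duckdb.connect(str(reader.path()), read_only=True)`, which leaves the writer's lock on the source file untouched.
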
