<!-- markdownlint-disable MD033 -->

# Personal Activity Index

A CLI that ingests content from Substack, Bluesky, Leaflet, and BearBlog into SQLite, with an optional Cloudflare Worker + D1 deployment path.

## Features

- Fetch posts from multiple sources:
  - **Substack** via RSS feeds
  - **Bluesky** via AT Protocol
  - **Leaflet** publications via RSS feeds
  - **BearBlog** publications via RSS feeds
- Local SQLite storage with full-text search
- Flexible filtering and querying via `pai list` / `pai export`
- Self-hostable HTTP API (`pai serve` exposes `/api/feed`, `/api/item/{id}`, and `/status`)
- Cloudflare Worker deployment path (D1) for serverless setups

## Quick Start

```bash
# Install
cargo install --path cli

# Initialize config (creates ~/.config/pai/config.toml)
pai init

# Edit config with your sources
$EDITOR ~/.config/pai/config.toml

# Sync content
pai sync

# List items
pai list -n 10

# Check database
pai db-check

# Install the manpage so `man pai` works
pai man --install

# Generate manpage to a file
pai man -o pai.1
```

<details>
<summary>For server mode, run the built-in HTTP server against your SQLite database:</summary>

<br>

```bash
pai serve -d /var/lib/pai/pai.db -a 127.0.0.1:8080
```

Endpoints:

- `GET /api/feed` – list newest items (supports `source_kind`, `source_id`, `limit`, `since`, `q`)
- `GET /api/item/{id}` – fetch a single item
- `GET /status` – health/status summary (total items, counts per source)
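
As a rough illustration of how a client might combine the `/api/feed` query parameters listed above, here is a minimal Rust sketch. The helper name `feed_url` is hypothetical and not part of `pai`, and it assumes the values are already URL-safe (no percent-encoding is applied).

```rust
/// Hypothetical client-side helper (not part of `pai`): builds a `/api/feed`
/// URL from the filters the endpoint supports.
/// Assumption: keys and values are already URL-safe, so nothing is escaped.
pub fn feed_url(base: &str, params: &[(&str, &str)]) -> String {
    let mut url = format!("{}/api/feed", base.trim_end_matches('/'));
    for (i, (key, value)) in params.iter().enumerate() {
        // The first parameter starts the query string, the rest are chained.
        url.push(if i == 0 { '?' } else { '&' });
        url.push_str(key);
        url.push('=');
        url.push_str(value);
    }
    url
}
```

Calling `feed_url("http://127.0.0.1:8080", &[("source_kind", "bluesky"), ("limit", "5")])` yields a URL that can be requested from the server started above.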

For reverse-proxy examples (nginx, Caddy, Docker), see [DEPLOYMENT.md](./DEPLOYMENT.md).

</details>

## Configuration

Configuration is loaded from `$XDG_CONFIG_HOME/pai/config.toml` or `$HOME/.config/pai/config.toml`.
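
The lookup can be sketched as follows. This is an illustration only: the function name `config_path` is hypothetical, and it assumes `$XDG_CONFIG_HOME` takes precedence when set and non-empty, with `$HOME/.config` as the fallback. The environment values are passed in as arguments so the sketch stays easy to test.

```rust
use std::path::PathBuf;

/// Hypothetical sketch of the config lookup described above.
/// Assumption: a non-empty XDG_CONFIG_HOME wins; otherwise fall back to HOME.
pub fn config_path(xdg_config_home: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    let base = match xdg_config_home {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        // Returns None when HOME is not available either.
        _ => PathBuf::from(home?).join(".config"),
    };
    Some(base.join("pai").join("config.toml"))
}
```

In a real program the two arguments would come from `std::env::var("XDG_CONFIG_HOME").ok()` and `std::env::var("HOME").ok()`.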

See [config.example.toml](./config.example.toml) for a complete example with all available options.

## Documentation

- CLI synopsis: `pai -h`, `pai <command> -h`, or `pai man` for the generated `pai(1)` page.
- `pai man --install [--install-dir DIR]` copies `pai.1` into a MANPATH directory (defaults to `~/.local/share/man/man1`) so `man pai` works like any other UNIX tool.
- Database schema and config reference: [config.example.toml](./config.example.toml).
- Deployment topologies: [DEPLOYMENT.md](./DEPLOYMENT.md).

## Architecture

The project is organized as a Cargo workspace:

```sh
.
├── core    # Shared types, fetchers, and the storage trait
├── cli     # CLI binary (POSIX-compliant)
└── worker  # Cloudflare Worker deployment using workers-rs
```

<details>
<summary><strong>Source Implementations</strong></summary>

### Substack (RSS)

The Substack fetcher uses standard RSS 2.0 feeds available at `{base_url}/feed`.

**Implementation:**

- Fetches RSS feed using `feed-rs` parser
- Maps RSS `<item>` elements to standardized `Item` struct
- Uses GUID as item ID, falls back to link if GUID is missing
- Normalizes `pubDate` to ISO 8601 format
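
The `pubDate` normalization step can be illustrated with a standard-library-only sketch. The function name `pub_date_to_iso8601` is hypothetical, and the actual fetcher may rely on `feed-rs` for date parsing instead. This version only handles numeric offsets such as `+0000`, keeps the original offset rather than converting to UTC, and does not validate the time field.

```rust
/// Hypothetical sketch: converts an RSS `pubDate` such as
/// "Mon, 01 Jan 2024 12:00:00 +0000" into an ISO 8601 string.
/// Assumption: only numeric offsets (+HHMM / -HHMM) are supported.
pub fn pub_date_to_iso8601(pub_date: &str) -> Option<String> {
    const MONTHS: [&str; 12] = [
        "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    ];
    // Drop the optional leading weekday ("Mon,").
    let rest = pub_date.split_once(',').map_or(pub_date, |(_, tail)| tail);
    let mut parts = rest.split_whitespace();
    let day: u32 = parts.next()?.parse().ok()?;
    let month_name = parts.next()?;
    let month = MONTHS.iter().position(|m| *m == month_name)? + 1;
    let year: u32 = parts.next()?.parse().ok()?;
    let time = parts.next()?; // "HH:MM:SS", passed through unchanged
    let offset = parts.next()?; // e.g. "+0000"
    let signed = offset.starts_with('+') || offset.starts_with('-');
    if !signed || offset.len() != 5 || !offset.is_ascii() {
        return None;
    }
    // "+0000" becomes "+00:00" in the ISO 8601 extended format.
    Some(format!(
        "{:04}-{:02}-{:02}T{}{}:{}",
        year, month, day, time, &offset[..3], &offset[3..]
    ))
}
```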

**Key mappings:**

- `id` = RSS GUID or link
- `source_kind` = `substack`
- `source_id` = Domain extracted from base_url
- `title` = RSS title
- `summary` = RSS description
- `url` = RSS link
- `content_html` = RSS content (if available)
- `published_at` = RSS pubDate (normalized to ISO 8601)

**Example RSS structure:**

```xml
<item>
  <title>Post Title</title>
  <link>https://example.substack.com/p/post-slug</link>
  <guid>https://example.substack.com/p/post-slug</guid>
  <pubDate>Mon, 01 Jan 2024 12:00:00 +0000</pubDate>
  <description>Post summary or excerpt</description>
</item>
```
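
The Substack key mappings above can be sketched roughly as follows. `RssEntry` is a hypothetical stand-in for the parsed feed entry (the real fetcher works with `feed-rs` types), `Item` here carries only a subset of the fields, and `domain_of` and `substack_item` are illustrative names rather than the project's actual functions.

```rust
/// Hypothetical stand-in for a parsed RSS `<item>`.
pub struct RssEntry {
    pub guid: Option<String>,
    pub link: String,
    pub title: String,
    pub description: Option<String>,
}

/// Subset of the standardized item; field names follow the key mappings.
pub struct Item {
    pub id: String,
    pub source_kind: String,
    pub source_id: String,
    pub title: String,
    pub summary: Option<String>,
    pub url: String,
}

/// Extracts the host part of a base URL (scheme and path are dropped).
pub fn domain_of(base_url: &str) -> &str {
    let without_scheme = base_url.split_once("://").map_or(base_url, |(_, rest)| rest);
    without_scheme.split('/').next().unwrap_or(without_scheme)
}

/// Maps one entry to an `Item`, using the GUID as ID with the link as fallback.
pub fn substack_item(entry: &RssEntry, base_url: &str) -> Item {
    Item {
        id: entry.guid.clone().unwrap_or_else(|| entry.link.clone()),
        source_kind: "substack".to_string(),
        source_id: domain_of(base_url).to_string(),
        title: entry.title.clone(),
        summary: entry.description.clone(),
        url: entry.link.clone(),
    }
}
```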

### AT Protocol Integration (Bluesky)

#### Overview

Bluesky is built on the AT Protocol (Authenticated Transfer Protocol), a decentralized social networking protocol.

**Key Concepts:**

- **DID (Decentralized Identifier)**: Unique identifier for users (e.g., `did:plc:xyz123`)
- **Handle**: Human-readable identifier (e.g., `user.bsky.social`)
- **AT URI**: Resource identifier (e.g., `at://did:plc:xyz/app.bsky.feed.post/abc123`)
- **Lexicon**: Schema definition language for records and API methods
- **XRPC**: HTTP API wrapper for AT Protocol methods
- **PDS (Personal Data Server)**: Server that stores user data

#### Implementation

Bluesky uses standard `app.bsky.feed.post` records and provides a public API for fetching posts.

**Endpoint:** `GET https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed`

**Parameters:**

- `actor` - User handle or DID
- `limit` - Number of posts to fetch (default: 50)
- `cursor` - Pagination cursor (optional)
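
A request URL for this endpoint could be assembled as in the sketch below. The function name `author_feed_url` is hypothetical. It assumes `actor` and `cursor` are already URL-safe, and it clamps `limit` to the 1 to 100 range the endpoint accepts.

```rust
const ENDPOINT: &str = "https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed";

/// Hypothetical helper: builds the getAuthorFeed request URL.
/// Assumption: `actor` and `cursor` need no percent-encoding.
pub fn author_feed_url(actor: &str, limit: u32, cursor: Option<&str>) -> String {
    // Keep the requested page size inside the range the API accepts.
    let limit = limit.clamp(1, 100);
    let mut url = format!("{}?actor={}&limit={}", ENDPOINT, actor, limit);
    // The cursor is only sent when continuing from a previous page.
    if let Some(cursor) = cursor {
        url.push_str("&cursor=");
        url.push_str(cursor);
    }
    url
}
```

To page through a feed, the `cursor` value returned by one response would be passed to the next call until the response no longer includes one.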

**Implementation:**

- Fetches author feed using `app.bsky.feed.getAuthorFeed`
- Filters out reposts and quotes (only includes original posts)
- Converts AT URIs to canonical Bluesky URLs
- Truncates long post text to create titles

**Key mappings:**

- `id` = AT URI (e.g., `at://did:plc:xyz/app.bsky.feed.post/abc123`)
- `source_kind` = `bluesky`
- `source_id` = User handle
- `title` = Truncated post text (first 100 chars)
- `summary` = Full post text
- `url` = Canonical URL (`https://bsky.app/profile/{handle}/post/{post_id}`)
- `author` = Post author handle
- `published_at` = Post `createdAt` timestamp
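
The URL conversion and title truncation from the mappings above might look like the following sketch. The function names are hypothetical, and two details are assumptions: the post ID is taken to be the last segment of the AT URI, and the title is cut at 100 characters without appending an ellipsis.

```rust
/// Returns the record key (last path segment) of a post AT URI,
/// or `None` if the URI does not point at an `app.bsky.feed.post` record.
pub fn post_rkey(at_uri: &str) -> Option<&str> {
    let rest = at_uri.strip_prefix("at://")?;
    let mut segments = rest.split('/');
    let _authority = segments.next()?; // DID or handle
    let collection = segments.next()?;
    let rkey = segments.next()?;
    (collection == "app.bsky.feed.post" && !rkey.is_empty()).then_some(rkey)
}

/// Builds the canonical `bsky.app` URL for a post.
pub fn canonical_url(handle: &str, at_uri: &str) -> Option<String> {
    let rkey = post_rkey(at_uri)?;
    Some(format!("https://bsky.app/profile/{}/post/{}", handle, rkey))
}

/// Takes the first `max_chars` characters, counting characters rather than
/// bytes so multi-byte text is never split mid-character.
pub fn title_from_text(text: &str, max_chars: usize) -> String {
    text.chars().take(max_chars).collect()
}
```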

**Filtering reposts:**
Feed entries with a `reason` field (set by the API on reposts) are excluded so that only original content is kept.
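
A minimal sketch of this filter is shown below. `FeedEntry` is a hypothetical, simplified view of one feed entry (the real response is JSON with a nested `post` object), and `original_posts` is an illustrative name.

```rust
/// Hypothetical, simplified view of one entry in the author feed.
pub struct FeedEntry {
    pub uri: String,
    /// Present only when the API attached a `reason` to the entry.
    pub reason: Option<String>,
}

/// Keeps only the entries that carry no `reason` field.
pub fn original_posts(entries: Vec<FeedEntry>) -> Vec<FeedEntry> {
    entries
        .into_iter()
        .filter(|entry| entry.reason.is_none())
        .collect()
}
```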

### Leaflet (RSS)

#### Overview

Leaflet publications provide RSS feeds at `{base_url}/rss`, making them straightforward to fetch using standard RSS parsing.

**Note:** While Leaflet is built on AT Protocol and uses custom `pub.leaflet.post` records, we use RSS feeds for simplicity and reliability. Leaflet's RSS implementation provides all necessary metadata without requiring AT Protocol PDS queries.

**Implementation:**

- Fetches RSS feed using `feed-rs` parser
- Maps RSS `<item>` elements to standardized `Item` struct
- Supports multiple publications via config array
- Uses entry ID from feed, falls back to link if missing
- Normalizes publication dates to ISO 8601 format

**Key mappings:**

- `id` = RSS entry ID or link
- `source_kind` = `leaflet`
- `source_id` = Publication ID from config (e.g., `desertthunder`, `stormlightlabs`)
- `title` = RSS entry title
- `summary` = RSS entry summary/description
- `url` = RSS entry link
- `content_html` = RSS content body (if available)
- `author` = RSS entry author
- `published_at` = RSS published date or updated date (normalized to ISO 8601)

**Configuration:**

Leaflet supports multiple publications through array configuration:

```toml
[[sources.leaflet]]
enabled = true
id = "desertthunder"
base_url = "https://desertthunder.leaflet.pub"

[[sources.leaflet]]
enabled = true
id = "stormlightlabs"
base_url = "https://stormlightlabs.leaflet.pub"
```
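
To show how such an array could drive the fetcher, the sketch below turns config entries into `(id, feed URL)` pairs using the `{base_url}/rss` convention. `LeafletSource` and `feed_urls` are hypothetical names that mirror the TOML keys, and skipping entries with `enabled = false` is an assumption about what the flag means.

```rust
/// Hypothetical mirror of one `[[sources.leaflet]]` table.
pub struct LeafletSource {
    pub enabled: bool,
    pub id: String,
    pub base_url: String,
}

/// Returns `(source_id, feed_url)` pairs for every enabled publication.
pub fn feed_urls(sources: &[LeafletSource]) -> Vec<(String, String)> {
    sources
        .iter()
        .filter(|source| source.enabled) // assumption: disabled entries are skipped
        .map(|source| {
            // Tolerate a trailing slash in `base_url` before appending `/rss`.
            let feed = format!("{}/rss", source.base_url.trim_end_matches('/'));
            (source.id.clone(), feed)
        })
        .collect()
}
```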

**Example RSS structure:**

```xml
<item>
  <title>Dev Log: 2025-11-22</title>
  <link>https://desertthunder.leaflet.pub/3m6a7fuk7u22p</link>
  <guid>https://desertthunder.leaflet.pub/3m6a7fuk7u22p</guid>
  <pubDate>Sat, 22 Nov 2025 16:22:54 +0000</pubDate>
  <description>Post summary or excerpt</description>
</item>
```

### BearBlog (RSS)

#### Overview

BearBlog is a minimalist blogging platform that provides RSS feeds at `{slug}.bearblog.dev/feed/`, making them straightforward to fetch using standard RSS parsing.

**Implementation:**

- Fetches RSS feed using `feed-rs` parser
- Maps RSS `<item>` elements to standardized `Item` struct
- Supports multiple blogs via config array
- Uses entry ID from feed, falls back to link if missing
- Normalizes publication dates to ISO 8601 format

**Key mappings:**

- `id` = RSS entry ID or link
- `source_kind` = `bearblog`
- `source_id` = Blog ID from config (e.g., `desertthunder`)
- `title` = RSS entry title
- `summary` = RSS entry summary/description
- `url` = RSS entry link
- `content_html` = RSS content body (if available)
- `author` = RSS entry author
- `published_at` = RSS published date or updated date (normalized to ISO 8601)
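
The "published date or updated date" rule in the last mapping can be sketched as a simple fallback. `EntryDates` and `published_at` are hypothetical names, and preferring the published date when both are present is an assumption based on the order in which the mapping lists them.

```rust
/// Hypothetical view of the two date fields a feed entry may carry.
pub struct EntryDates {
    pub published: Option<String>,
    pub updated: Option<String>,
}

/// Prefers the published date and falls back to the updated date.
/// Returns `None` when the entry has neither.
pub fn published_at(dates: &EntryDates) -> Option<&str> {
    dates.published.as_deref().or(dates.updated.as_deref())
}
```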

**Configuration:**

BearBlog supports multiple blogs through array configuration:

```toml
[[sources.bearblog]]
enabled = true
id = "desertthunder"
base_url = "https://desertthunder.bearblog.dev"

[[sources.bearblog]]
enabled = true
id = "another-blog"
base_url = "https://another-blog.bearblog.dev"
```

**Example RSS structure:**

```xml
<item>
  <title>My Blog Post</title>
  <link>https://desertthunder.bearblog.dev/my-blog-post</link>
  <guid>https://desertthunder.bearblog.dev/my-blog-post</guid>
  <pubDate>Sat, 22 Nov 2025 16:22:54 +0000</pubDate>
  <description>Post summary or excerpt</description>
</item>
```

</details>

## References

- [AT Protocol Documentation](https://atproto.com)
- [Lexicon Guide](https://atproto.com/guides/lexicon) - Schema definition language
- [XRPC Specification](https://atproto.com/specs/xrpc) - HTTP API wrapper
- [Bluesky API Documentation](https://docs.bsky.app/)
- [Leaflet](https://tangled.org/leaflet.pub/leaflet) - Leaflet source code
- [Leaflet Manual](https://about.leaflet.pub/) - User-facing documentation

## License

See [LICENSE](./LICENSE)