···11+# Code Review: Herald RSS-to-Email Project
22+33+---
44+55+# Herald - Issue Groups & Priorities
66+77+## **P0 - Critical (Security & Data Integrity)** ✅ COMPLETE
88+99+- ✅ #14: No Rate Limiting (SSH, SCP, web, email)
1010+- ✅ #15: Token Generation (verify crypto/rand usage)
1111+1212+## **P1 - High (Performance & Reliability)**
1313+1414+- #8: N+1 Query Problem (batch operations)
1515+- #26: No Cleanup of Old seen_items (6 month cleanup job)
1616+- #23: Missing Graceful Shutdown for Scheduler (panic recovery)
1717+1818+## **P2 - Medium (Code Quality & UX)**
1919+2020+### Group A: Input Validation
2121+2222+- #16: No SMTP Auth Validation
2323+- #27: No Feed Validation on Upload
2424+- #37: No Cron Validation at Upload
2525+- ✅ #36: No Max File Size on SCP Upload
2626+2727+### Group B: Performance Tuning
2828+2929+- #9: No Prepared Statements
3030+- #10: Inefficient Sorting in Handlers
3131+- #11: No HTTP Caching Headers
3232+3333+## **P3 - Low (Nice to Have)**
3434+3535+### Group C: Observability
3636+3737+- #24: No Metrics/Observability
3838+- #35: HTTP Server Doesn't Log Requests
3939+- #22: Inconsistent Logging Levels
4040+4141+### Group D: Architecture & Scalability
4242+4343+- #30: Scheduler Interval is Fixed
4444+- #31: No Pagination on Feed Endpoints
4545+4646+### Group E: Code Hygiene
4747+4848+- #3: Context Timeout Duplication
4949+- #19: Magic Numbers
5050+- #18: Error Wrapping Inconsistency
5151+- #21: Unused Context Parameter
5252+- #33-34: Minor Code Cleanup
5353+5454+### Group F: Documentation
5555+5656+- #28: Inconsistent Command Help
5757+- #29: Config Example Doesn't Match Defaults
5858+5959+### Group G: Testing
6060+6161+- #20: No Tests
6262+6363+---
6464+6565+## **Critical Issues - COMPLETED** ✅
6666+6767+### 1. **Database Connection Pool Not Configured** ✅
6868+6969+**Fixed:** Added WAL mode, busy timeout, and connection pool limits in `store/db.go:16-18`
7070+7171+### 2. **Code Duplication in Scheduler** ✅
7272+7373+**Fixed:** Refactored into shared `collectNewItems` and `sendDigestAndMarkSeen` helper methods
7474+7575+### 3. **No Context Timeout on HTTP Requests** 🔄
7676+7777+**Location:** `scheduler/fetch.go:41-43`
7878+7979+While you set a 30s timeout on the context, the HTTP client also has a separate 30s timeout. If the context times out first, the HTTP client won't respect it immediately.
8080+8181+**Fix:** Use context-aware client without separate timeout.
8282+8383+### 4. **Missing Index on Configs Active Status** ✅
8484+8585+**Fixed:** Added partial index in `store/db.go:93`
8686+8787+### 5. **Race Condition in seen_items** ✅
8888+8989+**Fixed:** Using transactions to mark items seen before email send
9090+9191+### 6. **Unbounded Memory Growth in Feed Fetching** ✅
9292+9393+**Fixed:** Added semaphore limiting to 10 concurrent fetches in `scheduler/fetch.go`
9494+9595+### 7. **Silent Failure on Email Send** ✅
9696+9797+**Fixed:** Items only marked seen after successful email via transaction commit
9898+9999+### 14. **No Rate Limiting** ✅
100100+101101+**Fixed:** Added comprehensive rate limiting using `golang.org/x/time/rate`:
102102+- Created reusable `ratelimit.Limiter` with token bucket algorithm (`ratelimit/limiter.go`)
103103+- Web handler middleware: 10 req/sec, burst of 20 per IP (`web/server.go:65-77`)
104104+- SSH authentication: 5 req/sec, burst of 10 per fingerprint (`ssh/server.go:96-101`)
105105+- SCP uploads: 5 req/sec, burst of 10 per user (`ssh/scp.go:107-110`)
106106+- Email sending: 1 per minute per user (`scheduler/scheduler.go:207-210`)
107107+- Added 1MB max file size limit for SCP uploads (`ssh/scp.go:112-115`)
108108+- Rate limiter automatically cleans up inactive entries every 5 minutes
109109+110110+### 15. **Token Generation Not Cryptographically Secure** ✅
111111+112112+**Already secure:** Confirmed using `crypto/rand` in `store/unsubscribe.go:14`
113113+114114+---
115115+116116+## **Performance Issues**
117117+118118+### 8. **N+1 Query Problem** 🐌
119119+120120+**Location:** Multiple locations
121121+122122+- `web/handlers.go:99-103` - Gets feeds for each config in a loop
123123+- `scheduler/scheduler.go` - Checks each item individually for seen status
124124+125125+**Fix:** Batch operations:
126126+127127+```go
128128+// Instead of checking each item individually
129129+seenGuids, err := s.store.GetSeenGUIDs(ctx, feedID, itemGUIDs)
130130+```
131131+132132+### 9. **No Prepared Statements** 📝
133133+134134+**Location:** All store methods
135135+136136+Every query uses `QueryContext`/`ExecContext` with raw SQL strings. These are reparsed on every call.
137137+138138+**Fix:** Use prepared statements for frequently called queries (IsItemSeen, MarkItemSeen, etc.).
139139+140140+### 10. **Inefficient Sorting in Handlers** 🔢
141141+142142+**Location:** `web/handlers.go:231-235` and `326-330`
143143+144144+You sort items by parsing time strings in a comparison function. This parses the same timestamps multiple times.
145145+146146+**Fix:** Parse once, sort by parsed time, or use database ORDER BY.
147147+148148+### 11. **No HTTP Caching Headers** 🌐
149149+150150+**Location:** `web/handlers.go` - all feed handlers
151151+152152+RSS/JSON feeds don't set `Cache-Control`, `ETag`, or `Last-Modified` headers. Every request fetches from DB.
153153+154154+**Fix:** Add caching headers:
155155+156156+```go
157157+w.Header().Set("Cache-Control", "public, max-age=300")
158158+w.Header().Set("ETag", fmt.Sprintf(`"%s-%d"`, fingerprint, cfg.LastRun.Time.Unix()))
159159+```
160160+161161+### 12. **Database Migration Runs on Every Connection** 🔄
162162+163163+**Location:** `store/db.go:26-28`
164164+165165+Migration runs inside `Open()`, which happens once at startup. However, `Migrate()` is also exposed and called separately in `main.go:160`. The schema execution uses `CREATE TABLE IF NOT EXISTS` which is fine, but it's still unnecessary work.
166166+167167+---
168168+169169+## **Security Issues**
170170+171171+### 13. **Missing Input Validation on Email Addresses** ✅
172172+173173+**Already implemented:** Using `net/mail.ParseAddress()` in `config/validate.go:24-26`
174174+175175+### 16. **No SMTP Auth Validation** 🔒
176176+177177+**Location:** `email/send.go:102-105`
178178+179179+SMTP auth is optional (`if m.cfg.User != "" && m.cfg.Pass != ""`). Many SMTP servers require auth, and this silently continues without it.
180180+181181+**Fix:** Validate SMTP config at startup.
182182+183183+### 17. **SQL Injection Potential in UPSERT** 💉
184184+185185+**Location:** `store/items.go:29-33`
186186+187187+While using parameterized queries (good!), the `ON CONFLICT` clause should explicitly name the conflict target for clarity and safety:
188188+189189+```sql
190190+ON CONFLICT(feed_id, guid) DO UPDATE SET ...
191191+```
192192+193193+(Actually you already do this correctly, but worth noting for other queries)
194194+195195+---
196196+197197+## **Code Quality Issues**
198198+199199+### 18. **Error Wrapping Inconsistency** 🎁
200200+201201+Some functions use `fmt.Errorf("verb: %w", err)`, others use `fmt.Errorf("verb %w", err)` (no colon). Inconsistent style makes logs harder to parse.
202202+203203+### 19. **Magic Numbers** 🎩
204204+205205+**Location:** Multiple
206206+207207+- `scheduler/scheduler.go:84` - hardcoded 3 months
208208+- `scheduler/scheduler.go:148-150` - hardcoded 5 items threshold
209209+- `web/handlers.go:238` and `332` - hardcoded 100 items limit
210210+- `scheduler/fetch.go:41` - hardcoded 30s timeout
211211+212212+**Fix:** Extract to constants or config.
213213+214214+### 20. **No Tests** 🧪
215215+216216+**Location:** Entire codebase
217217+218218+Zero test coverage. Critical business logic (cron parsing, config parsing, email rendering) is untested.
219219+220220+### 21. **Unused Context Parameter** 🗑️
221221+222222+**Location:** `store/db.go:109-111`
223223+224224+```go
225225+func (db *DB) Migrate(ctx context.Context) error {
226226+ return db.migrate() // ctx is ignored
227227+}
228228+```
229229+230230+Either remove the context parameter or pass it to a context-aware migrate function.
231231+232232+### 22. **Inconsistent Logging Levels** 📝
233233+234234+Some errors are `logger.Error`, some are `logger.Warn`. For example, feed fetch errors are `Warn` (line 89 of scheduler.go) but other errors are `Error`. Establish consistent criteria.
235235+236236+### 23. **Missing Graceful Shutdown for Scheduler** 🛑
237237+238238+**Location:** `main.go:194-197`
239239+240240+The scheduler runs in a goroutine with errgroup, but `Start()` only returns on context cancellation. If scheduler panics, errgroup won't capture it.
241241+242242+**Fix:** Add defer recover in scheduler or use errgroup.Go properly.
243243+244244+---
245245+246246+## **Missing Features**
247247+248248+### 24. **No Metrics/Observability** 📈
249249+250250+No Prometheus metrics, no health check endpoint, no structured logging for monitoring. For a long-running service, this is critical.
251251+252252+### 25. **No Email Validation on Successful Send** ✅
253253+254254+You log `"email sent"` but don't verify SMTP actually accepted it (some SMTP servers queue and fail later).
255255+256256+### 26. **No Cleanup of Old seen_items** 🧹
257257+258258+The `seen_items` table will grow indefinitely. With the 3-month filter, items older than 3 months can be safely deleted.
259259+260260+**Fix:** Add periodic cleanup job:
261261+262262+```go
263263+DELETE FROM seen_items WHERE seen_at < datetime('now', '-6 months')
264264+```
265265+266266+### 27. **No Feed Validation on Upload** 🔍
267267+268268+When a user uploads a config with feed URLs, you don't validate the URLs are actually RSS/Atom feeds. First run will fail.
269269+270270+**Fix:** Optionally fetch and validate feeds on upload (with short timeout).
271271+272272+---
273273+274274+## **Documentation Issues**
275275+276276+### 28. **Inconsistent Command Help** 📖
277277+278278+`ssh/server.go:160-165` shows welcome message with command list, but the actual commands are in `ssh/commands.go` (not reviewed in detail). These could drift out of sync.
279279+280280+### 29. **Config Example Doesn't Match Defaults** ⚙️
281281+282282+`main.go:88` shows `inline: false` as default in comment, but `config/parse.go:27` sets default to `false`, and `README.md:89` says default is `true`.
283283+284284+**Fix:** Align all documentation with actual code defaults.
285285+286286+---
287287+288288+## **Architectural Concerns**
289289+290290+### 30. **Scheduler Interval is Fixed** ⏲️
291291+292292+**Location:** `main.go:172`
293293+294294+60-second interval is hardcoded. This doesn't scale well—if you have thousands of users, checking every 60 seconds is wasteful. Consider event-driven scheduling with a priority queue.
295295+296296+### 31. **No Pagination on Feed Endpoints** 📄
297297+298298+**Location:** `web/handlers.go:238` and `332`
299299+300300+Hardcoded limit of 100 items. Users can't access older items.
301301+302302+### 32. **No Transaction for Config Update** ✅
303303+304304+**Fixed:** Config upload now uses transactions in `ssh/scp.go:134-161`
305305+306306+---
307307+308308+## **Minor Issues**
309309+310310+33. **Unused `getCommitHash()` function** - `main.go:127-140` - function defined but only used in one place, could be inlined
311311+34. **Inconsistent fingerprint shortening** - Sometimes 12 chars, sometimes 7 chars
312312+35. **HTTP server doesn't log requests** - No request logging middleware
313313+36. ✅ **No max file size on SCP upload** - Fixed: 1MB limit in `ssh/scp.go:112-115`
314314+37. **No validation on cron expressions at upload time** - Invalid cron is only caught on first run
315315+316316+---
317317+318318+## **Positive Notes** ✅
319319+320320+- Good use of Context for cancellation
321321+- Proper use of foreign keys and CASCADE
322322+- Clean separation of concerns (store/scheduler/ssh/web)
323323+- Good use of Charm libraries
324324+- ETag/Last-Modified support for feed fetching
325325+- Unsubscribe functionality implemented
326326+- SQL injection protection with parameterized queries
327327+- Config file validation before accepting uploads
328328+329329+---
330330+331331+## **Priority Fixes**
332332+333333+### **High Priority (Fix ASAP):**
334334+335335+1. ✅ Database connection pool configuration (#1)
336336+2. ✅ Race condition in seen_items (#5)
337337+3. ✅ Silent failure on email send (#7)
338338+4. ✅ No rate limiting (#14)
339339+5. ✅ No transaction for config updates (#32)
340340+6. ✅ Token generation security (#15)
341341+7. ✅ Max file size on SCP upload (#36)
342342+343343+### **Medium Priority:**
344344+345345+6. ✅ Code duplication in scheduler (#2)
346346+7. N+1 query problems (#8)
347347+8. ✅ Unbounded feed fetching concurrency (#6)
348348+9. ✅ Missing input validation (#13)
349349+10. No cleanup of old data (#26)
350350+351351+### **Low Priority (Technical Debt):**
352352+353353+11. Add tests (#20)
354354+12. Add metrics (#24)
355355+13. Extract magic numbers (#19)
356356+14. Add HTTP caching (#11)