Add baanfetch PRD

Product requirements document for evolving baansearch into a full job
listing aggregator. Covers architecture (Bun + Hono + Drizzle + React +
shadcn/ui), ATS platform integrations (Greenhouse, Lever, Ashby), Google
discovery flow, data model, API routes, and milestone planning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Ravi Both 3e802823

+432
docs/plans/2026-04-02-baanfetch-prd.md
# baanfetch — Product Requirements Document

## 1. Overview & Problem Statement

baanfetch is the evolution of baansearch from a job board link aggregator into a full job listing aggregator. Instead of generating search URLs for users to click through manually, baanfetch discovers companies via Google search, fetches job listings directly from ATS platform APIs, and presents them in a filterable, searchable interface.

**Problem:** Job seekers using aggregators like LinkedIn and Indeed are competing with hundreds of applicants on the same visible listings. The best opportunities are often posted directly on company ATS boards (Greenhouse, Lever, Ashby, etc.) by lesser-known companies that don't get the same traffic. Manually checking these boards is impractical.

**Solution:** Use Google search to discover company job boards across ATS platforms, pull structured listing data directly from their APIs, and present a unified, filterable view — effectively becoming a personal aggregator that surfaces jobs closer to the source.

**Key differentiators from baansearch:**

- Fetches actual job listing data instead of generating links
- Discovers lesser-known companies via search rather than relying on curated lists
- ATS-direct sources only — no middleman aggregators
- Deduplication, bookmarking, and salary extraction built in

---

## 2. Tech Stack & Architecture

### Stack

- **Runtime:** Bun
- **Server:** Hono (serves API + static frontend)
- **Database:** bun:sqlite + Drizzle ORM
- **Frontend:** React + shadcn/ui + Tailwind CSS
- **Build:** Vite (React frontend), Bun (server)

### Architecture

```
+--------------------------------------+
|            React Frontend            |
|     (shadcn/ui, filtering, etc.)     |
+--------------------------------------+
|           Hono API Server            |
| /api/search /api/jobs /api/bookmarks |
+----------+---------------------------+
| Discovery|   Job Source Adapters     |
| (Google) |  +----------+---------+   |
|          |  |Greenhouse|  Lever  |   |
|          |  |  Ashby   |   ...   |   |
|          |  +----------+---------+   |
+----------+---------------------------+
|         SQLite (via Drizzle)         |
| jobs | companies | bookmarks | cache |
+--------------------------------------+
```

### Request Flow

1. User enters filters (role, tech, location, etc.) and hits search
2. Hono server sends Google search queries to discover ATS board URLs (e.g., `site:boards.greenhouse.io "software engineer" "python"`)
3. Server extracts company slugs from URLs, deduplicates against known companies
4. Server calls ATS APIs for each discovered company, extracts job listings
5. Listings are normalized, deduplicated, stored in SQLite
6. Filtered results returned to frontend

### JobSource Interface

```typescript
interface JobSource {
  id: string; // 'greenhouse', 'lever', etc.
  name: string;
  discoverCompanies(query: string): Promise<CompanySlug[]>;
  fetchJobs(company: CompanySlug): Promise<RawJob[]>;
  normalizeJob(raw: RawJob): NormalizedJob;
}
```

Adding a new ATS = implementing one adapter. Discovery, fetching, and normalization are per-source.

---

## 3. Data Model

### Tables

```
jobs
  id                  text PK (hash of source + sourceId)
  sourceId            text (original ID from ATS)
  source              text ('greenhouse' | 'lever' | 'ashby' | ...)
  companySlug         text
  companyName         text
  title               text
  location            text
  secondaryLocations  JSON array, nullable
  remote              boolean, nullable
  workplaceType       text, nullable ('remote' | 'hybrid' | 'onsite')
  department          text, nullable
  team                text, nullable
  employmentType      text, nullable ('FullTime', 'Contract', etc.)
  experience          text, nullable (inferred from title/description)
  salaryMin           integer, nullable
  salaryMax           integer, nullable
  salaryCurrency      text, nullable
  descriptionHtml     text
  descriptionPlain    text
  url                 text (link to original posting)
  applyUrl            text, nullable
  postedAt            integer (unix timestamp)
  discoveredAt        integer (unix timestamp)
  tags                JSON array (extracted tech keywords)

companies
  slug            text (PK composite with source)
  source          text
  name            text
  discoveredAt    integer
  lastFetchedAt   integer

bookmarks
  id          text PK
  jobId       text FK -> jobs
  notes       text, nullable
  createdAt   integer

searches
  id            text PK
  query         text (serialized filters)
  createdAt     integer
  resultCount   integer
```

### ATS API Field Mapping

#### Greenhouse (`boards-api.greenhouse.io/v1/boards/{company}/jobs?content=true`)

| API Field | Maps To |
|---|---|
| `id` | sourceId |
| `title` | title |
| `location.name` | location |
| `content` (HTML) | descriptionHtml |
| `departments[].name` | department |
| `offices[].name` | (secondary location info) |
| `absolute_url` | url |
| `first_published` | postedAt |
| `updated_at` | (used for freshness) |
| `metadata[]` | salary (when present), tags |
| `company_name` | companyName |

#### Lever (`api.lever.co/v0/postings/{company}`)

| API Field | Maps To |
|---|---|
| `id` | sourceId |
| `text` | title |
| `categories.location` | location |
| `categories.allLocations` | secondaryLocations |
| `categories.department` | department |
| `categories.team` | team |
| `categories.commitment` | employmentType |
| `workplaceType` | workplaceType |
| `country` | (location enrichment) |
| `description` (HTML) | descriptionHtml |
| `descriptionPlain` | descriptionPlain |
| `additional` | salary extraction source |
| `hostedUrl` | url |
| `applyUrl` | applyUrl |
| `createdAt` (ms) | postedAt |
| `lists[]` | (structured description sections) |

#### Ashby (`api.ashbyhq.com/posting-api/job-board/{company}`)

| API Field | Maps To |
|---|---|
| `id` | sourceId |
| `title` | title |
| `location` | location |
| `secondaryLocations[]` | secondaryLocations |
| `department` | department |
| `team` | team |
| `employmentType` | employmentType |
| `isRemote` | remote |
| `workplaceType` | workplaceType |
| `descriptionHtml` | descriptionHtml |
| `descriptionPlain` | descriptionPlain |
| `jobUrl` | url |
| `applyUrl` | applyUrl |
| `publishedAt` | postedAt |

### Extraction Strategies

**Salary extraction:**

1. Greenhouse: check `metadata` array for salary-related fields first
2. Lever: parse `additional` and `additionalPlain` fields
3. All sources: regex against description text for patterns like `$120k-$150k`, `$120,000 - $150,000`, `EUR 80,000`, `GBP 60k-80k`
4. Store as min/max/currency when found, null when not

**Remote inference:**

1. Ashby: use `isRemote` + `workplaceType` directly
2. Lever: use `workplaceType` + check `categories.location` for "Remote"
3. Greenhouse: check `location.name` for "Remote"/"Anywhere", check `metadata`

**Experience inference:**

- Pattern match title: "Senior", "Sr.", "Lead", "Staff", "Principal", "Junior", "Jr.", "Entry"
- Fallback: scan description for years-of-experience patterns ("5+ years", "3-5 years")

**Deduplication:**

- Primary: `source + sourceId` guarantees uniqueness per ATS
- Cross-source: flag potential duplicates via fuzzy match on `companyName + title + location`

---

## 4. User Interface

### Layout

Single-page app with a persistent search/filter sidebar and a main results area.

### Search Panel (left sidebar or top bar)

- Role/title input (free text)
- Technologies (multi-select, reusing baansearch's existing list)
- Location (free text)
- Remote preference (remote / hybrid / onsite / any)
- Experience level (entry / mid / senior / director / any)
- Salary minimum (optional number input)
- ATS sources (checkboxes — which platforms to search)
- Search button + loading state

### Results Area

- Results table/list using shadcn DataTable
- Columns: Title, Company, Location, Remote, Department, Salary, Posted, Source
- Sortable by any column
- Clickable rows expand to show full description inline, or open a detail panel
- Result count and search time
- Pagination or infinite scroll
- Bookmark icon on each listing (toggle)

### Job Detail View (expanded row or side panel)

- Full description (rendered HTML)
- Apply button (links to original applyUrl)
- Bookmark + notes
- Tags (extracted technologies)
- "View original" link

### Bookmarks View

- Separate tab/page showing all bookmarked jobs
- Same filtering/sorting as main results
- User-added notes per bookmark
### Additional UI Elements

- Dark/light theme toggle (carry over from baansearch)
- Loading skeletons during search
- Empty states for no results / first visit

---

## 5. Google Discovery Flow

### How Search-to-Discovery Works

1. User submits filters (e.g., role: "Software Engineer", tech: "Python", location: "Remote")
2. Server constructs Google search queries per ATS platform:
   - `site:boards.greenhouse.io "software engineer" "python" "remote"`
   - `site:jobs.lever.co "software engineer" "python" "remote"`
   - `site:jobs.ashbyhq.com "software engineer" "python" "remote"`
3. Server fetches Google search results, extracts URLs
4. URLs are parsed to extract company slugs:
   - `boards.greenhouse.io/acmecorp` -> slug: `acmecorp`
   - `jobs.lever.co/acmecorp` -> slug: `acmecorp`
   - `jobs.ashbyhq.com/acmecorp` -> slug: `acmecorp`
5. New slugs are saved to `companies` table
6. Server calls ATS APIs for each discovered company
7. Jobs are normalized, deduplicated, stored, and returned

### Google Search Method

- Use Google Custom Search JSON API (100 free queries/day, $5/1000 after)
- Alternatively, SerpAPI or similar if rate limits are a concern
- Each user search generates ~3 Google queries (one per ATS platform)
- ~33 user searches/day on the free tier

### Caching Layer

- Company slugs are cached permanently — once discovered, no need to re-discover
- Job listings cached with a TTL (e.g., 1 hour) — re-fetch from ATS API if stale
- Searches cached — same filters within TTL return cached results instantly

---
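The slug extraction in step 4 of the discovery flow can be sketched as a small URL parser. This is a sketch only: the hostnames and path shapes come from the examples in §5, while the `DiscoveredCompany` type and function name are illustrative, not part of any existing codebase.

```typescript
// Sketch: turning Google-result URLs into (source, slug) pairs.
// Hostnames match the site: queries above; all three ATS hosts put
// the company slug in the first path segment.
interface DiscoveredCompany {
  source: "greenhouse" | "lever" | "ashby";
  slug: string;
}

const ATS_HOSTS: Record<string, DiscoveredCompany["source"]> = {
  "boards.greenhouse.io": "greenhouse",
  "jobs.lever.co": "lever",
  "jobs.ashbyhq.com": "ashby",
};

function extractCompanySlug(rawUrl: string): DiscoveredCompany | null {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return null; // not a parseable URL at all
  }
  const source = ATS_HOSTS[url.hostname];
  if (!source) return null; // not a recognized ATS board host
  // e.g. jobs.lever.co/acmecorp/<posting-id> -> "acmecorp"
  const [slug] = url.pathname.split("/").filter(Boolean);
  return slug ? { source, slug: slug.toLowerCase() } : null;
}
```

Deduplicating against the `companies` table (step 5) then reduces to a lookup on the `(source, slug)` pair.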
## 6. API Routes

```
POST /api/search
  Body: { role, technologies, location, remote, experience, sources }
  Triggers discovery + fetch, returns filtered job listings
  Returns: { jobs: NormalizedJob[], totalCount, searchId }

GET /api/jobs
  Query: ?source=&company=&remote=&salaryMin=&sort=&page=&limit=
  Returns cached jobs with filtering/pagination
  Returns: { jobs: NormalizedJob[], totalCount, page, totalPages }

GET /api/jobs/:id
  Returns full job detail
  Returns: NormalizedJob (with full description)

POST /api/bookmarks
  Body: { jobId, notes? }
  Bookmark a job

DELETE /api/bookmarks/:id
  Remove bookmark

PATCH /api/bookmarks/:id
  Body: { notes }
  Update bookmark notes

GET /api/bookmarks
  Query: ?sort=&page=&limit=
  List all bookmarked jobs

GET /api/searches
  List recent searches for quick re-run

GET /api/health
  Server status + DB stats (job count, company count, last fetch)
```

### Job Response Shape

```typescript
interface JobResponse {
  id: string;
  title: string;
  companyName: string;
  companySlug: string;
  source: string;
  location: string;
  remote: boolean | null;
  workplaceType: string | null;
  department: string | null;
  experience: string | null;
  salaryMin: number | null;
  salaryMax: number | null;
  salaryCurrency: string | null;
  url: string;
  applyUrl: string | null;
  postedAt: string; // ISO 8601
  discoveredAt: string;
  tags: string[];
  isBookmarked: boolean;
  // descriptionHtml only in detail endpoint
}
```

---
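The `GET /api/jobs` query contract in §6 can be pinned down with a small parser that applies defaults and clamps pagination before the query hits Drizzle. A sketch under stated assumptions: the defaults (page 1, limit 50 capped at 100, sort by `postedAt`) are illustrative choices the PRD does not specify.

```typescript
// Sketch: normalizing GET /api/jobs query params into a typed filter.
// Defaults and the limit cap are assumptions, not PRD requirements.
interface JobsFilter {
  source?: string;
  company?: string;
  remote?: boolean; // tri-state: true / false / not filtered
  salaryMin?: number;
  sort: string;
  page: number;
  limit: number;
}

function parseJobsQuery(params: URLSearchParams): JobsFilter {
  // Parse an integer param, treating absent or malformed values as unset.
  const int = (key: string): number | undefined => {
    const raw = params.get(key);
    const n = raw === null ? NaN : Number.parseInt(raw, 10);
    return Number.isNaN(n) ? undefined : n;
  };
  return {
    source: params.get("source") ?? undefined,
    company: params.get("company") ?? undefined,
    remote: params.has("remote") ? params.get("remote") === "true" : undefined,
    salaryMin: int("salaryMin"),
    sort: params.get("sort") ?? "postedAt",
    page: Math.max(1, int("page") ?? 1),
    limit: Math.min(100, Math.max(1, int("limit") ?? 50)),
  };
}
```

In a Hono handler this would run against `c.req.query()` output before building the SQL filter.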
## 7. Scope & Milestones

### MVP (Milestone 1)

- Bun + Hono server serving React frontend
- SQLite + Drizzle schema for jobs, companies, bookmarks, searches
- `JobSource` interface + Greenhouse adapter
- Google Custom Search integration for company discovery
- Search UI with filters -> results table
- Job detail view with full description
- Basic salary extraction from description text

### Milestone 2

- Lever + Ashby adapters
- Bookmarking with notes
- Sorting by any column
- Deduplication (cross-source flagging)
- Dark/light theme
- Recent searches for quick re-run

### Milestone 3 (Tier 2 Sources)

- SmartRecruiters adapter
- Recruitee adapter
- BambooHR adapter
- Improved salary extraction (Greenhouse metadata, Lever additional field)
- Experience level inference

### Milestone 4 (Tier 3 + Polish)

- Workday adapter (reverse-engineered JSON endpoints)
- Wellfound adapter (GraphQL)
- Pagination / infinite scroll
- Search result caching with TTL
- Performance tuning for large result sets

### Out of Scope (for now)

- Application tracking (applied/rejected/interviewing)
- Email/notification alerts
- User accounts / multi-user
- Mobile app
- Background scheduled crawling

---

## ATS Platform Tiers

### Tier 1 — Launch (Public JSON APIs)

| Platform | API Endpoint | Notes |
|---|---|---|
| Greenhouse | `boards-api.greenhouse.io/v1/boards/{company}/jobs` | Add `?content=true` for descriptions |
| Lever | `api.lever.co/v0/postings/{company}` | Returns bare JSON array |
| Ashby | `api.ashbyhq.com/posting-api/job-board/{company}` | Has `isRemote` field |

### Tier 2 — Fast Follow (Public JSON APIs)

| Platform | API Endpoint | Notes |
|---|---|---|
| SmartRecruiters | `api.smartrecruiters.com/v1/companies/{id}/postings` | Paginated |
| Recruitee | `{company}.recruitee.com/api/offers/` | EU startups |
| BambooHR | `{company}.bamboohr.com/careers/list` | Needs `Accept: application/json` header |

### Tier 3 — High Value, More Work

| Platform | Method | Notes |
|---|---|---|
| Workday | Reverse-engineered POST to `{company}.myworkday.com/wday/cxs/.../jobs` | Enterprise tech (Netflix, Salesforce) |
| Wellfound | GraphQL at `wellfound.com/graphql` | Startup coverage |
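As a sanity check on the Tier 1 column of this table, here is a sketch of the Greenhouse `normalizeJob` step from the `JobSource` interface in §2, using only the fields listed in the §3 mapping table. The `RawGreenhouseJob` shape, the trimmed `NormalizedJob`, and the sample payload in the usage note are fabricated for illustration; the real response shape should be verified against Greenhouse's Job Board API documentation.

```typescript
import { createHash } from "node:crypto";

// Sketch of Greenhouse normalization per the §3 field-mapping table.
// RawGreenhouseJob lists only the fields the mapping uses; the real
// API response carries more.
interface RawGreenhouseJob {
  id: number;
  title: string;
  location: { name: string };
  content: string; // HTML description
  departments: { name: string }[];
  absolute_url: string;
  first_published: string;
  company_name: string;
}

// Subset of the jobs-table columns relevant to this adapter.
interface NormalizedJob {
  id: string; // hash of source + sourceId, per the jobs table PK
  sourceId: string;
  source: string;
  companyName: string;
  title: string;
  location: string;
  department: string | null;
  descriptionHtml: string;
  url: string;
  postedAt: number; // unix timestamp
}

function normalizeGreenhouseJob(raw: RawGreenhouseJob): NormalizedJob {
  const sourceId = String(raw.id);
  return {
    id: createHash("sha256").update(`greenhouse:${sourceId}`).digest("hex"),
    sourceId,
    source: "greenhouse",
    companyName: raw.company_name,
    title: raw.title,
    location: raw.location.name,
    department: raw.departments[0]?.name ?? null,
    descriptionHtml: raw.content,
    url: raw.absolute_url,
    postedAt: Math.floor(new Date(raw.first_published).getTime() / 1000),
  };
}
```

Hashing `source + sourceId` for the primary key keeps IDs stable across re-fetches, which is what the deduplication strategy in §3 relies on.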