AppView in a box as a Vite plugin thing — hatk.dev


feat: multi-database support with SQLite and DuckDB adapters

Refactor hatk's monolithic DuckDB data layer into a hexagonal architecture
supporting both DuckDB and SQLite via DatabasePort/SearchPort interfaces.

- Add database/ports.ts with DatabasePort, BulkInserter, SearchPort interfaces
- Add database/dialect.ts with SqlDialect configs for DuckDB and SQLite
- Add DuckDB adapter (database/adapters/duckdb.ts) preserving read/write queues
- Add SQLite adapter (database/adapters/sqlite.ts) with $1→? param translation
- Add DuckDB SearchPort using PRAGMA FTS and SQLite SearchPort using FTS5
- Add adapter factory with dynamic imports for tree-shaking
- Refactor db.ts to use DatabasePort instead of direct DuckDB API calls
- Make schema generation dialect-aware (type maps, timestamps, JSON)
- Make FTS index building dialect-aware (string_agg, json_extract)
- Add databaseEngine config option ('duckdb' | 'sqlite')
- Add --sqlite flag to hatk new scaffolding
- Add db/schema.sql auto-generation on startup
- Fix hatk schema command to work with both engines
- Update reset command to clean up SQLite WAL files
- Update all import paths to new database/ directory
- Add database/index.ts barrel export

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

+2917 -532
+1217
docs/plans/2026-03-13-multi-database-implementation-plan.md
# Multi-Database Support Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Refactor hatk's data layer into a hexagonal architecture supporting DuckDB and SQLite (and future PostgreSQL) via a `DatabasePort` interface.

**Architecture:** Extract a low-level `DatabasePort` interface for SQL execution, transactions, and bulk inserts. Keep all business logic (record ops, queries, repo tracking) in a shared `database/db.ts`. Each engine gets a thin adapter. FTS uses an optional `SearchPort` with a `LIKE` fallback.

**Tech Stack:** TypeScript, DuckDB (`@duckdb/node-api`), SQLite (`better-sqlite3`), Node.js 25+

---

## Overview of Current State

**`src/db.ts` (1556 lines)** — monolithic data access layer tightly coupled to DuckDB. Module-level state: `instance`, `con` (write connection), `readCon` (read connection). Uses `writeQueue`/`readQueue` promise chains for serialization. Exports ~45 functions consumed by 13 other files.

**`src/schema.ts` (468 lines)** — generates `TableSchema` objects and DuckDB-specific DDL from AT Protocol lexicons. Uses the `duckdbType` field name throughout. Consumed by `db.ts`, `fts.ts`, `main.ts`, `cli.ts`, `test.ts`, `views.ts`, `xrpc.ts`, `indexer.ts`, `server.ts`, `seed.ts`.

**`src/fts.ts` (801 lines)** — builds DuckDB FTS shadow tables using `PRAGMA create_fts_index`. Consumed by `db.ts`, `indexer.ts`, `main.ts`.

**`src/oauth/db.ts` (244 lines)** — OAuth DDL and CRUD. Uses `querySQL`/`runSQL` from `db.ts`. Already uses `$1`-style params and portable SQL.

**`src/config.ts`** — `HatkConfig.database` is currently the file path string. Needs a new `databaseEngine` field.

### Files that import from db.ts (all need path updates)

- `server.ts`, `indexer.ts`, `main.ts` (`cli.ts` is not in this list, but it uses schema.ts)
- `backfill.ts`, `feeds.ts`, `xrpc.ts`, `opengraph.ts`
- `hydrate.ts`, `labels.ts`, `hooks.ts`, `setup.ts`, `test.ts`, `seed.ts`
- `fts.ts`, `oauth/db.ts`, `oauth/server.ts`

### Files that import from schema.ts

- `db.ts`, `fts.ts`, `main.ts`, `cli.ts`, `test.ts`
- `views.ts`, `xrpc.ts`, `indexer.ts`, `server.ts`, `seed.ts`

### Files that import from fts.ts

- `db.ts`, `indexer.ts`, `main.ts`

---

## Task 1: Create `src/database/ports.ts` — Port Interfaces

**Files:**
- Create: `packages/hatk/src/database/ports.ts`

**Step 1: Write the port interfaces file**
```typescript
// packages/hatk/src/database/ports.ts

export type Dialect = 'duckdb' | 'sqlite' | 'postgres'

export interface DatabasePort {
  /** Dialect identifier for SQL generation differences */
  dialect: Dialect

  /** Open a database connection. path is a file path or ':memory:' */
  open(path: string): Promise<void>

  /** Close all connections and release resources */
  close(): void

  /** Execute a read query, return rows as plain objects */
  query<T = Record<string, unknown>>(sql: string, params?: unknown[]): Promise<T[]>

  /** Execute a write statement (INSERT, UPDATE, DELETE, DDL) */
  execute(sql: string, params?: unknown[]): Promise<void>

  /** Execute multiple statements in sequence (for DDL batches) */
  executeMultiple(sql: string): Promise<void>

  /** Begin a transaction */
  beginTransaction(): Promise<void>

  /** Commit the current transaction */
  commit(): Promise<void>

  /** Roll back the current transaction */
  rollback(): Promise<void>

  /** Create a bulk inserter for high-throughput writes */
  createBulkInserter(table: string, columns: string[]): Promise<BulkInserter>
}

export interface BulkInserter {
  /** Append a single row of values */
  append(values: unknown[]): void

  /** Flush buffered rows to the database */
  flush(): Promise<void>

  /** Close the inserter and release resources */
  close(): Promise<void>
}

export interface SearchPort {
  /** Build/rebuild an FTS index for a table */
  buildIndex(
    shadowTable: string,
    sourceQuery: string,
    searchColumns: string[],
  ): Promise<void>

  /** Search a table, returning URIs with scores */
  search(
    shadowTable: string,
    query: string,
    searchColumns: string[],
    limit: number,
    offset: number,
  ): Promise<Array<{ uri: string; score: number }>>
}
```

**Step 2: Verify the file compiles**

Run: `cd /Users/chadmiller/code/hatk && npx tsc --noEmit packages/hatk/src/database/ports.ts`
Expected: No errors

**Step 3: Commit**

```bash
git add packages/hatk/src/database/ports.ts
git commit -m "feat: add DatabasePort, BulkInserter, and SearchPort interfaces"
```

---

## Task 2: Create `src/database/dialect.ts` — SQL Dialect Configs

**Files:**
- Create: `packages/hatk/src/database/dialect.ts`

**Context:** This replaces the hardcoded DuckDB types in `schema.ts:54-74` (the `mapType` function). The `duckdbType` field on `ColumnDef` will be renamed to `sqlType` in a later task.

**Step 1: Write the dialect configuration file**
```typescript
// packages/hatk/src/database/dialect.ts

import type { Dialect } from './ports.ts'

export interface SqlDialect {
  /** Map from lexicon type key to SQL column type */
  typeMap: Record<string, string>

  /** Timestamp type name */
  timestampType: string

  /** JSON type name */
  jsonType: string

  /** Parameter placeholder for index (1-based). DuckDB/Postgres: $1, SQLite: ? */
  param(index: number): string

  /** Whether the engine supports native bulk appenders (DuckDB) vs batched INSERT */
  supportsAppender: boolean

  /** SQL for upsert — 'INSERT OR REPLACE' (DuckDB/SQLite) vs 'ON CONFLICT DO UPDATE' */
  upsertPrefix: string

  /** Extract a string value from a JSON column. Returns a SQL expression. */
  jsonExtractString(column: string, path: string): string

  /** Aggregate strings from a JSON array. Returns a SQL expression. */
  jsonArrayStringAgg(column: string, jsonPath: string): string

  /** Information schema query to list user tables */
  listTablesQuery: string

  /** CHECKPOINT or equivalent (for WAL compaction). null if not needed. */
  checkpointSQL: string | null

  /** Current timestamp expression */
  currentTimestamp: string
}

export const DUCKDB_DIALECT: SqlDialect = {
  typeMap: {
    text: 'TEXT',
    integer: 'INTEGER',
    bigint: 'BIGINT',
    boolean: 'BOOLEAN',
    blob: 'BLOB',
    timestamp: 'TIMESTAMP',
    timestamptz: 'TIMESTAMPTZ',
    json: 'JSON',
  },
  timestampType: 'TIMESTAMP',
  jsonType: 'JSON',
  param: (i: number) => `$${i}`,
  supportsAppender: true,
  upsertPrefix: 'INSERT OR REPLACE INTO',
  jsonExtractString: (col, path) => `json_extract_string(${col}, '${path}')`,
  jsonArrayStringAgg: (col, path) => `list_string_agg(json_extract_string(${col}, '${path}'))`,
  listTablesQuery: `SELECT table_name FROM information_schema.tables WHERE table_schema = 'main' AND table_name NOT LIKE '\\_%' ESCAPE '\\'`,
  checkpointSQL: 'CHECKPOINT',
  currentTimestamp: 'CURRENT_TIMESTAMP',
}

export const SQLITE_DIALECT: SqlDialect = {
  typeMap: {
    text: 'TEXT',
    integer: 'INTEGER',
    bigint: 'INTEGER',
    boolean: 'INTEGER',
    blob: 'BLOB',
    timestamp: 'TEXT',
    timestamptz: 'TEXT',
    json: 'TEXT',
  },
  timestampType: 'TEXT',
  jsonType: 'TEXT',
  param: (_i: number) => '?',
  supportsAppender: false,
  upsertPrefix: 'INSERT OR REPLACE INTO',
  jsonExtractString: (col, path) => `json_extract(${col}, '${path}')`,
  jsonArrayStringAgg: (col, path) => {
    // SQLite doesn't have list_string_agg — use json_each + group_concat
    return `(SELECT group_concat(je.value, ' ') FROM json_each(${col}, '${path}') je)`
  },
  listTablesQuery: `SELECT name AS table_name FROM sqlite_master WHERE type='table' AND name NOT LIKE '\\_%' ESCAPE '\\'`,
  checkpointSQL: null,
  currentTimestamp: "datetime('now')",
}

export function getDialect(dialect: Dialect): SqlDialect {
  switch (dialect) {
    case 'duckdb': return DUCKDB_DIALECT
    case 'sqlite': return SQLITE_DIALECT
    case 'postgres': throw new Error('PostgreSQL adapter not yet implemented')
  }
}
```

**Step 2: Verify the file compiles**

Run: `cd /Users/chadmiller/code/hatk && npx tsc --noEmit packages/hatk/src/database/dialect.ts`
Expected: No errors

**Step 3: Commit**

```bash
git add packages/hatk/src/database/dialect.ts
git commit -m "feat: add SQL dialect configs for DuckDB and SQLite"
```
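To make the mechanism concrete, here is a minimal sketch (not one of the plan's files) of how shared query-building code would consume these configs — the `repo_status` table and `did` column are illustrative names:

```typescript
import { getDialect } from './dialect.ts'

// The same logical query rendered per engine:
//   DuckDB:  SELECT * FROM repo_status WHERE did = $1
//   SQLite:  SELECT * FROM repo_status WHERE did = ?
function statusByDidSQL(engine: 'duckdb' | 'sqlite'): string {
  const d = getDialect(engine)
  return `SELECT * FROM repo_status WHERE did = ${d.param(1)}`
}
```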
---

## Task 3: Create DuckDB Adapter — `src/database/adapters/duckdb.ts`

**Files:**
- Create: `packages/hatk/src/database/adapters/duckdb.ts`

**Context:** Extract from `src/db.ts` lines 1-100 (the DuckDB instance, connections, `bindParams`, `enqueue`). The adapter wraps `@duckdb/node-api` and implements `DatabasePort`. It keeps the read/write connection separation and queuing.

**Step 1: Write the DuckDB adapter**

```typescript
// packages/hatk/src/database/adapters/duckdb.ts

import { DuckDBInstance } from '@duckdb/node-api'
import type { DatabasePort, BulkInserter, Dialect } from '../ports.ts'

export class DuckDBAdapter implements DatabasePort {
  dialect: Dialect = 'duckdb'

  private instance!: DuckDBInstance
  private writeCon!: Awaited<ReturnType<DuckDBInstance['connect']>>
  private readCon!: Awaited<ReturnType<DuckDBInstance['connect']>>
  private writeQueue = Promise.resolve()
  private readQueue = Promise.resolve()

  async open(path: string): Promise<void> {
    this.instance = await DuckDBInstance.create(path)
    this.writeCon = await this.instance.connect()
    this.readCon = await this.instance.connect()
  }

  close(): void {
    try { this.readCon?.closeSync() } catch {}
    try { this.writeCon?.closeSync() } catch {}
    try { this.instance?.closeSync() } catch {}
  }

  async query<T = Record<string, unknown>>(sql: string, params: unknown[] = []): Promise<T[]> {
    return this.enqueue('read', async () => {
      if (params.length === 0) {
        const reader = await this.readCon.runAndReadAll(sql)
        return this.rowsToObjects(reader) as T[]
      }
      const prepared = await this.readCon.prepare(sql)
      this.bindParams(prepared, params)
      const reader = await prepared.runAndReadAll()
      return this.rowsToObjects(reader) as T[]
    })
  }

  async execute(sql: string, params: unknown[] = []): Promise<void> {
    return this.enqueue('write', async () => {
      if (params.length === 0) {
        await this.writeCon.run(sql)
        return
      }
      const prepared = await this.writeCon.prepare(sql)
      this.bindParams(prepared, params)
      await prepared.run()
    })
  }

  async executeMultiple(sql: string): Promise<void> {
    return this.enqueue('write', async () => {
      await this.writeCon.run(sql)
    })
  }

  async beginTransaction(): Promise<void> {
    return this.enqueue('write', async () => {
      await this.writeCon.run('BEGIN TRANSACTION')
    })
  }

  async commit(): Promise<void> {
    return this.enqueue('write', async () => {
      await this.writeCon.run('COMMIT')
    })
  }

  async rollback(): Promise<void> {
    return this.enqueue('write', async () => {
      await this.writeCon.run('ROLLBACK')
    })
  }

  async createBulkInserter(table: string, columns: string[]): Promise<BulkInserter> {
    // DuckDB appender uses the write connection's table appender
    // Note: the DuckDB appender doesn't take column names — it appends in table column order
    const appender = await this.writeCon.createAppender('main', table.replace(/"/g, ''))
    return {
      append(values: unknown[]) {
        appender.appendRow(...values)
      },
      async flush() {
        appender.flush()
      },
      async close() {
        appender.close()
      },
    }
  }

  // --- Internal helpers ---

  private enqueue<T>(queue: 'read' | 'write', fn: () => Promise<T>): Promise<T> {
    if (queue === 'write') {
      const p = this.writeQueue.then(fn)
      this.writeQueue = p.then(() => {}, () => {})
      return p
    } else {
      const p = this.readQueue.then(fn)
      this.readQueue = p.then(() => {}, () => {})
      return p
    }
  }

  private bindParams(prepared: any, params: unknown[]): void {
    for (let i = 0; i < params.length; i++) {
      const idx = i + 1
      const value = params[i]
      if (value === null || value === undefined) {
        prepared.bindNull(idx)
      } else if (typeof value === 'string') {
        prepared.bindVarchar(idx, value)
      } else if (typeof value === 'number') {
        if (Number.isInteger(value)) {
          prepared.bindInteger(idx, value)
        } else {
          prepared.bindDouble(idx, value)
        }
      } else if (typeof value === 'boolean') {
        prepared.bindBoolean(idx, value)
      } else if (value instanceof Uint8Array) {
        prepared.bindBlob(idx, value)
      } else {
        prepared.bindVarchar(idx, JSON.stringify(value))
      }
    }
  }

  private rowsToObjects(reader: any): Record<string, unknown>[] {
    const columns = reader.columnNames()
    const rows = reader.getRows()
    return rows.map((row: any[]) => {
      const obj: Record<string, unknown> = {}
      for (let i = 0; i < columns.length; i++) {
        obj[columns[i]] = row[i]
      }
      return obj
    })
  }
}
```
**Important notes for implementer:**
- The `bindParams` method is copied from `src/db.ts:46-80`. Check the original for any additional type bindings (BigInt, Date, etc.) that may have been added since this plan was written.
- The `rowsToObjects` method replaces the inline conversion in `src/db.ts`. Check `querySQL` (line ~1241) and `queryRecords` (line ~846) for how results are currently converted — the DuckDB reader API returns column-based data that needs flattening.
- `createBulkInserter` wraps DuckDB's `createAppender`. The current code at `db.ts:572-845` (`bulkInsertRecords`) manually manages appenders — this adapter provides the low-level appender, and the shared `db.ts` will manage the staging/batch logic.

**Step 2: Verify the file compiles**

Run: `cd /Users/chadmiller/code/hatk && npx tsc --noEmit packages/hatk/src/database/adapters/duckdb.ts`
Expected: No errors (may need to check exact DuckDB API method names)

**Step 3: Commit**

```bash
git add packages/hatk/src/database/adapters/duckdb.ts
git commit -m "feat: add DuckDB adapter implementing DatabasePort"
```

---

## Task 4: Create SQLite Adapter — `src/database/adapters/sqlite.ts`

**Files:**
- Create: `packages/hatk/src/database/adapters/sqlite.ts`

**Context:** New adapter using `better-sqlite3`. SQLite is synchronous, so we wrap calls in promises. Uses WAL mode for concurrent reads. The `BulkInserter` buffers rows and flushes them inside a transaction.

**Step 1: Add the `better-sqlite3` dependency**

Run: `cd /Users/chadmiller/code/hatk/packages/hatk && npm install better-sqlite3 && npm install -D @types/better-sqlite3`

**Step 2: Write the SQLite adapter**
```typescript
// packages/hatk/src/database/adapters/sqlite.ts

import Database from 'better-sqlite3'
import type { DatabasePort, BulkInserter, Dialect } from '../ports.ts'

export class SQLiteAdapter implements DatabasePort {
  dialect: Dialect = 'sqlite'

  private db!: Database.Database

  async open(path: string): Promise<void> {
    // better-sqlite3 accepts ':memory:' directly, same as a file path
    this.db = new Database(path)
    this.db.pragma('journal_mode = WAL')
    this.db.pragma('synchronous = NORMAL')
    this.db.pragma('foreign_keys = ON')
  }

  close(): void {
    try { this.db?.close() } catch {}
  }

  async query<T = Record<string, unknown>>(sql: string, params: unknown[] = []): Promise<T[]> {
    const stmt = this.db.prepare(sql)
    return stmt.all(...params) as T[]
  }

  async execute(sql: string, params: unknown[] = []): Promise<void> {
    const stmt = this.db.prepare(sql)
    stmt.run(...params)
  }

  async executeMultiple(sql: string): Promise<void> {
    this.db.exec(sql)
  }

  async beginTransaction(): Promise<void> {
    this.db.exec('BEGIN')
  }

  async commit(): Promise<void> {
    this.db.exec('COMMIT')
  }

  async rollback(): Promise<void> {
    this.db.exec('ROLLBACK')
  }

  async createBulkInserter(table: string, columns: string[]): Promise<BulkInserter> {
    const placeholders = columns.map(() => '?').join(', ')
    const sql = `INSERT INTO ${table} (${columns.join(', ')}) VALUES (${placeholders})`
    const stmt = this.db.prepare(sql)
    const buffer: unknown[][] = []
    const BATCH_SIZE = 500

    // Run all buffered rows inside a single transaction, then clear the buffer
    const flushBuffer = () => {
      if (buffer.length === 0) return
      const tx = this.db.transaction(() => {
        for (const row of buffer) {
          stmt.run(...row)
        }
      })
      tx()
      buffer.length = 0
    }

    return {
      append(values: unknown[]) {
        buffer.push(values)
        if (buffer.length >= BATCH_SIZE) flushBuffer()
      },
      async flush() {
        flushBuffer()
      },
      async close() {
        // flush any remaining buffered rows
        flushBuffer()
      },
    }
  }
}
```

**Step 3: Verify the file compiles**

Run: `cd /Users/chadmiller/code/hatk && npx tsc --noEmit packages/hatk/src/database/adapters/sqlite.ts`
Expected: No errors

**Step 4: Commit**

```bash
git add packages/hatk/src/database/adapters/sqlite.ts
git commit -m "feat: add SQLite adapter implementing DatabasePort"
```

---

## Task 5: Rename `duckdbType` to `sqlType` in `schema.ts`

**Files:**
- Modify: `packages/hatk/src/schema.ts` — rename the `duckdbType` field to `sqlType` everywhere
- Modify: `packages/hatk/src/db.ts` — update all references to `duckdbType`
- Modify: `packages/hatk/src/fts.ts` — update all references to `duckdbType`

**Context:** The `ColumnDef` interface at `schema.ts:4-10` has `duckdbType: string`. This needs to become `sqlType: string` since it will hold dialect-specific types. This is a mechanical rename.

**Step 1: Rename in schema.ts**

In `packages/hatk/src/schema.ts`:
- Line 8: `duckdbType: string` → `sqlType: string`
- All occurrences of `duckdbType` in the file (in `mapType` return values, `generateTableSchema`, `generateCreateTableSQL`, etc.) → `sqlType`
- Rename `TypeMapping.duckdbType` (line 50) → `TypeMapping.sqlType`
- Update all `mapType` return values: `{ duckdbType: 'TEXT', ... }` → `{ sqlType: 'TEXT', ... }`
**Step 2: Rename in db.ts**

In `packages/hatk/src/db.ts`:
- Search for all `duckdbType` references and replace with `sqlType`
- These appear in `insertRecord`, `bulkInsertRecords`, `queryRecords`, `reshapeRow`, and a few other functions where the column type is checked

**Step 3: Rename in fts.ts**

In `packages/hatk/src/fts.ts`:
- Search for all `duckdbType` references and replace with `sqlType`
- These appear in `buildFtsIndex` where it checks column types

**Step 4: Verify everything compiles**

Run: `cd /Users/chadmiller/code/hatk && npx tsc --noEmit`
Expected: No errors

**Step 5: Run tests**

Run: `cd /Users/chadmiller/code/hatk && npm test`
Expected: All tests pass

**Step 6: Commit**

```bash
git add packages/hatk/src/schema.ts packages/hatk/src/db.ts packages/hatk/src/fts.ts
git commit -m "refactor: rename duckdbType to sqlType for dialect neutrality"
```

---

## Task 6: Make `schema.ts` Dialect-Aware

**Files:**
- Modify: `packages/hatk/src/schema.ts`

**Context:** The `mapType` function at `schema.ts:54-74` returns hardcoded DuckDB type names. Refactor it to accept a dialect and use the dialect's type map. The function is called from `generateTableSchema` (line 255) and `resolveUnionBranch` (line 239).

**Step 1: Update `mapType` to accept a type map**

The current `mapType` returns strings like `'TEXT'`, `'INTEGER'`, `'BOOLEAN'`, `'BLOB'`, `'JSON'`, `'TIMESTAMP'`. Replace these with lookups from a dialect parameter:

```typescript
import type { SqlDialect } from './database/dialect.ts'

function mapType(prop: any, dialect: SqlDialect): TypeMapping {
  if (prop.type === 'string') {
    if (prop.format === 'datetime') return { sqlType: dialect.typeMap.timestamp, isRef: false }
    if (prop.format === 'at-uri') return { sqlType: dialect.typeMap.text, isRef: true }
    return { sqlType: dialect.typeMap.text, isRef: false }
  }
  if (prop.type === 'integer') return { sqlType: dialect.typeMap.integer, isRef: false }
  if (prop.type === 'boolean') return { sqlType: dialect.typeMap.boolean, isRef: false }
  if (prop.type === 'bytes') return { sqlType: dialect.typeMap.blob, isRef: false }
  if (prop.type === 'cid-link') return { sqlType: dialect.typeMap.text, isRef: false }
  if (prop.type === 'array') return { sqlType: dialect.jsonType, isRef: false }
  if (prop.type === 'blob') return { sqlType: dialect.jsonType, isRef: false }
  if (prop.type === 'union') return { sqlType: dialect.jsonType, isRef: false }
  if (prop.type === 'unknown') return { sqlType: dialect.jsonType, isRef: false }
  if (prop.type === 'object') return { sqlType: dialect.jsonType, isRef: false }
  if (prop.type === 'ref') {
    if (prop.ref === 'com.atproto.repo.strongRef') return { sqlType: 'STRONG_REF', isRef: true }
    return { sqlType: dialect.jsonType, isRef: false }
  }
  return { sqlType: dialect.typeMap.text, isRef: false }
}
```

**Step 2: Thread dialect through `generateTableSchema`, `resolveUnionBranch`, `generateCreateTableSQL`, `buildSchemas`**

All of these functions need a `dialect: SqlDialect` parameter added.
Thread it through the call chain:

- `buildSchemas(lexicons, collections, dialect)` → passes to `generateTableSchema`
- `generateTableSchema(nsid, lexicon, lexicons, dialect)` → passes to `mapType` and `resolveUnionBranch`
- `resolveUnionBranch(..., dialect)` → passes to `mapType`
- `generateCreateTableSQL(schema, dialect)` → uses `dialect.timestampType` for the `indexed_at` column and system columns

Update `generateCreateTableSQL` to use `dialect.timestampType` for the `indexed_at TIMESTAMP` column (line 369):

```typescript
// Before:
'  indexed_at TIMESTAMP NOT NULL',
// After:
`  indexed_at ${dialect.timestampType} NOT NULL`,
```

And `TIMESTAMP DEFAULT CURRENT_TIMESTAMP` in the OAuth DDL becomes dialect-aware too.

**Step 3: Update callers of `buildSchemas`**

- `main.ts:70` — `buildSchemas(lexicons, collections)` → `buildSchemas(lexicons, collections, dialect)` (the dialect comes from the adapter)
- `cli.ts` — same pattern

**Step 4: Verify everything compiles**

Run: `cd /Users/chadmiller/code/hatk && npx tsc --noEmit`
Expected: No errors

**Step 5: Run tests**

Run: `cd /Users/chadmiller/code/hatk && npm test`
Expected: All tests pass

**Step 6: Commit**

```bash
git add packages/hatk/src/schema.ts packages/hatk/src/main.ts packages/hatk/src/cli.ts
git commit -m "refactor: make schema generation dialect-aware via SqlDialect"
```
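As a quick sanity check (illustrative, not a required step), the same lexicon property should now map differently per engine:

```typescript
// A hypothetical datetime property from a lexicon:
const prop = { type: 'string', format: 'datetime' }

mapType(prop, DUCKDB_DIALECT) // → { sqlType: 'TIMESTAMP', isRef: false }
mapType(prop, SQLITE_DIALECT) // → { sqlType: 'TEXT', isRef: false }
```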
---

## Task 7: Refactor `db.ts` to Use `DatabasePort`

**Files:**
- Modify: `packages/hatk/src/db.ts` (will become `packages/hatk/src/database/db.ts`)

**Context:** This is the largest task. The current `db.ts` has module-level DuckDB state. Replace it with a module-level `DatabasePort` reference. All functions that call DuckDB directly (`con.run()`, `readCon.prepare()`, etc.) switch to `port.query()` / `port.execute()`.

**Step 1: Replace module-level DuckDB state with a port reference**

Remove:
```typescript
import { DuckDBInstance } from '@duckdb/node-api'
let instance: DuckDBInstance
let con: ...
let readCon: ...
let writeQueue = Promise.resolve()
let readQueue = Promise.resolve()
function enqueue(...) { ... }
function bindParams(...) { ... }
```

Replace with:
```typescript
import type { DatabasePort, BulkInserter } from './ports.ts'
import { getDialect, type SqlDialect } from './dialect.ts'

let port: DatabasePort
let dialect: SqlDialect

export function getDatabasePort(): DatabasePort { return port }
export function getSqlDialect(): SqlDialect { return dialect }
```

**Step 2: Refactor `initDatabase`**

The current `initDatabase` (line 121) creates a DuckDB instance and runs DDL. Replace with:

```typescript
export async function initDatabase(
  adapter: DatabasePort,
  dbPath: string,
  tableSchemas: TableSchema[],
  ddlStatements: string[],
): Promise<void> {
  port = adapter
  dialect = getDialect(adapter.dialect)

  await port.open(dbPath)

  // Run system table DDL
  await port.executeMultiple(SYSTEM_DDL)

  // Run collection DDL
  for (const ddl of ddlStatements) {
    await port.executeMultiple(ddl)
  }

  // Store schemas in memory
  for (const s of tableSchemas) {
    schemas.set(s.collection, s)
  }
}
```

**Step 3: Refactor `querySQL` and `runSQL`**

The current implementations use the read/write connections directly. Replace with port calls:

```typescript
export async function querySQL(sql: string, params: any[] = []): Promise<any[]> {
  return port.query(sql, params)
}

export async function runSQL(sql: string, ...params: any[]): Promise<void> {
  return port.execute(sql, params)
}
```

**Step 4: Refactor `closeDatabase`**

```typescript
export function closeDatabase(): void {
  port?.close()
}
```

**Step 5: Refactor all other functions**

Every function that currently uses `enqueue('write', ...)` or `enqueue('read', ...)` with raw DuckDB calls needs to switch to `port.query()` or `port.execute()`. Key functions:

- `getCursor`, `setCursor` — simple query/execute, straightforward
- `setRepoStatus`, `getRepoStatus`, etc. — simple query/execute
- `insertRecord` — uses `con.prepare()` and `bindParams`. Replace with `port.execute(sql, params)`
- `deleteRecord` — similar
- `bulkInsertRecords` — currently manages DuckDB appenders directly. Replace appender creation with `port.createBulkInserter()`. The staging-table logic needs adjustment based on `dialect.supportsAppender`.
- `queryRecords` — builds SQL and reads results. Replace query execution with `port.query()`. The result row conversion (DuckDB reader → objects) moves into the adapter.
- `searchRecords` — builds an FTS query (DuckDB-specific `match_bm25`). This needs to dispatch to the SearchPort or the LIKE fallback.
- `runBatch` — iterates operations, executes each. Use `port.execute()`.

**Step 6: Handle SQL dialect differences in query building**

Places where SQL differs by dialect (found in the current `db.ts`):

1. **Parameter placeholders:** The current code uses `$1`, `$2`, etc. For SQLite, these become `?`. Use `dialect.param(i)` when building parameterized queries (see the sketch after this list).

2. **`INSERT OR REPLACE`:** Used in `setRepoStatus`, `setCursor`, `storeServerKey`, etc. This syntax works in both DuckDB and SQLite. No change needed.

3. **`CURRENT_TIMESTAMP`:** Used in several places. Works in both. No change needed.

4. **`string_agg`:** Used in `searchRecords`. SQLite uses `group_concat`. Parameterize.

5. **`information_schema.tables`:** Used in `getSchemaDump` and `main.ts` orphan detection. Use `dialect.listTablesQuery`.

6. **`json_extract_string` / `list_string_agg`:** DuckDB-specific JSON functions. Use the dialect helpers.
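A sketch of item 1, assuming a hypothetical helper inside `db.ts` (the actual refactor may build these strings inline):

```typescript
// Build "INSERT INTO t (a, b) VALUES ($1, $2)" on DuckDB
// and   "INSERT INTO t (a, b) VALUES (?, ?)"   on SQLite.
function insertSQL(dialect: SqlDialect, table: string, cols: string[]): string {
  const placeholders = cols.map((_, i) => dialect.param(i + 1)).join(', ')
  return `INSERT INTO ${table} (${cols.join(', ')}) VALUES (${placeholders})`
}
```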
**Step 7: Move the file to `packages/hatk/src/database/db.ts`**

Move the refactored `db.ts` to its new location. Update its imports of `schema.ts` and `fts.ts` to be relative within `database/`.

**Step 8: Verify everything compiles**

Run: `cd /Users/chadmiller/code/hatk && npx tsc --noEmit`
Expected: Compilation errors from files still importing the old path — these are fixed in Task 10.

**Step 9: Commit**

```bash
git add packages/hatk/src/database/db.ts
git commit -m "refactor: rewrite db.ts to use DatabasePort instead of direct DuckDB calls"
```

---

## Task 8: Move `schema.ts` and `fts.ts` into `database/`

**Files:**
- Move: `packages/hatk/src/schema.ts` → `packages/hatk/src/database/schema.ts`
- Move: `packages/hatk/src/fts.ts` → `packages/hatk/src/database/fts.ts`

**Step 1: Move schema.ts**

Move the file. Update its internal imports if any (currently it has no imports from other hatk files).

**Step 2: Move fts.ts**

Move the file. Its imports of `'./db.ts'` and `'./schema.ts'` stay the same, since all three files now live in `database/`.

**Step 3: Refactor `fts.ts` for SearchPort**

The current `fts.ts` uses the DuckDB-specific `PRAGMA create_fts_index` and shadow tables. Refactor:

- Keep `stripStopWords`, `getSearchColumns`, `getLastRebuiltAt`, `ftsTableName` as-is (utility functions)
- `buildFtsIndex` should check whether a `SearchPort` is available. If yes, delegate. If no, skip FTS (the `LIKE` fallback lives in `searchRecords` in `db.ts`; see the sketch below)
- `rebuildAllIndexes` stays as the orchestrator

```typescript
import type { SearchPort } from './ports.ts'

let searchPort: SearchPort | null = null

export function setSearchPort(port: SearchPort | null): void {
  searchPort = port
}

export function hasSearchPort(): boolean {
  return searchPort !== null
}

export async function buildFtsIndex(collection: string): Promise<void> {
  if (!searchPort) return // No FTS support for this adapter

  // ... existing shadow table query building logic ...
  // Instead of running the PRAGMA directly, call:
  await searchPort.buildIndex(safeName, sourceQuery, searchColNames)
}
```
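The `LIKE` fallback itself is not spelled out in this plan. A minimal sketch of what `searchRecords` might do when `hasSearchPort()` is false — `port` and `dialect` are the module-level references from Task 7, and the table/column names are illustrative:

```typescript
// Fallback search: OR together a LIKE condition per searchable column.
async function searchWithLike(
  table: string,
  searchColumns: string[],
  term: string,
  limit: number,
): Promise<Array<{ uri: string }>> {
  const conditions = searchColumns
    .map((col, i) => `${col} LIKE ${dialect.param(i + 1)}`)
    .join(' OR ')
  const params = searchColumns.map(() => `%${term}%`)
  return port.query(`SELECT uri FROM ${table} WHERE ${conditions} LIMIT ${limit}`, params)
}
```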
**Step 4: Commit**

```bash
git add packages/hatk/src/database/schema.ts packages/hatk/src/database/fts.ts
git rm packages/hatk/src/schema.ts packages/hatk/src/fts.ts
git commit -m "refactor: move schema.ts and fts.ts into database/ directory"
```

---

## Task 9: Create DuckDB SearchPort — `src/database/adapters/duckdb-search.ts`

**Files:**
- Create: `packages/hatk/src/database/adapters/duckdb-search.ts`

**Context:** Extract the DuckDB FTS PRAGMA calls from `fts.ts` into a `SearchPort` implementation.

**Step 1: Write the DuckDB search adapter**

```typescript
// packages/hatk/src/database/adapters/duckdb-search.ts

import type { DatabasePort, SearchPort } from '../ports.ts'

export class DuckDBSearchPort implements SearchPort {
  constructor(private port: DatabasePort) {}

  async buildIndex(
    shadowTable: string,
    sourceQuery: string,
    searchColumns: string[],
  ): Promise<void> {
    // Create the shadow table
    await this.port.execute(`CREATE OR REPLACE TABLE ${shadowTable} AS ${sourceQuery}`)

    // Drop any existing index
    try {
      await this.port.execute(`PRAGMA drop_fts_index('${shadowTable}')`)
    } catch {}

    // Build the FTS index
    const colList = searchColumns.map((c) => `'${c}'`).join(', ')
    await this.port.execute(
      `PRAGMA create_fts_index('${shadowTable}', 'uri', ${colList}, stemmer='porter', stopwords='english', strip_accents=1, lower=1, overwrite=1)`
    )
  }

  async search(
    shadowTable: string,
    query: string,
    searchColumns: string[],
    limit: number,
    offset: number,
  ): Promise<Array<{ uri: string; score: number }>> {
    const colList = searchColumns.map((c) => `'${c}'`).join(', ')
    const sql = `SELECT uri, fts_main_${shadowTable}.match_bm25(uri, $1, fields := ${colList}) AS score
      FROM ${shadowTable}
      WHERE score IS NOT NULL
      ORDER BY score DESC
      LIMIT $2 OFFSET $3`
    return this.port.query(sql, [query, limit, offset])
  }
}
```

**Step 2: Commit**

```bash
git add packages/hatk/src/database/adapters/duckdb-search.ts
git commit -m "feat: add DuckDB SearchPort for FTS via PRAGMA"
```

---

## Task 10: Update All Import Paths

**Files to modify (every file that imports from `db.ts`, `schema.ts`, or `fts.ts`):**

- `packages/hatk/src/server.ts` — `'./db.ts'` → `'./database/db.ts'`, `'./schema.ts'` → `'./database/schema.ts'`
- `packages/hatk/src/indexer.ts` — `'./db.ts'` → `'./database/db.ts'`, `'./schema.ts'` → `'./database/schema.ts'`, `'./fts.ts'` → `'./database/fts.ts'`
- `packages/hatk/src/main.ts` — `'./db.ts'` → `'./database/db.ts'`, `'./schema.ts'` → `'./database/schema.ts'`, `'./fts.ts'` → `'./database/fts.ts'`
- `packages/hatk/src/cli.ts` — `'./schema.ts'` → `'./database/schema.ts'`
- `packages/hatk/src/backfill.ts` — `'./db.ts'` → `'./database/db.ts'`
- `packages/hatk/src/feeds.ts` — `'./db.ts'` → `'./database/db.ts'`
- `packages/hatk/src/xrpc.ts` — `'./db.ts'` → `'./database/db.ts'`, `'./schema.ts'` → `'./database/schema.ts'`
- `packages/hatk/src/opengraph.ts` — `'./db.ts'` → `'./database/db.ts'`
- `packages/hatk/src/hydrate.ts` — `'./db.ts'` → `'./database/db.ts'`
- `packages/hatk/src/labels.ts` — `'./db.ts'` → `'./database/db.ts'`
- `packages/hatk/src/hooks.ts` — `'./db.ts'` → `'./database/db.ts'`
- `packages/hatk/src/setup.ts` — `'./db.ts'` → `'./database/db.ts'`
- `packages/hatk/src/test.ts` — `'./db.ts'` → `'./database/db.ts'`, `'./schema.ts'` → `'./database/schema.ts'`
- `packages/hatk/src/seed.ts` — `'./schema.ts'` → `'./database/schema.ts'`
- `packages/hatk/src/views.ts` — `'./schema.ts'` → `'./database/schema.ts'`
- `packages/hatk/src/oauth/db.ts` — `'../db.ts'` → `'../database/db.ts'`
- `packages/hatk/src/oauth/server.ts` — `'../db.ts'` → `'../database/db.ts'`

**Step 1: Update all imports**

Mechanical find-and-replace in each file listed above. Also delete the old `src/db.ts`, `src/schema.ts`, and `src/fts.ts` files if not already done.

**Step 2: Verify everything compiles**

Run: `cd /Users/chadmiller/code/hatk && npx tsc --noEmit`
Expected: No errors

**Step 3: Run tests**

Run: `cd /Users/chadmiller/code/hatk && npm test`
Expected: All tests pass

**Step 4: Commit**

```bash
git add -A
git commit -m "refactor: update all import paths to database/ directory"
```

---

## Task 11: Add `databaseEngine` Config Option and Adapter Factory

**Files:**
- Modify: `packages/hatk/src/config.ts` — add the `databaseEngine` field
- Create: `packages/hatk/src/database/adapter-factory.ts` — dynamic import and instantiation
- Modify: `packages/hatk/src/main.ts` — use the adapter factory

**Step 1: Add the config field**

In `packages/hatk/src/config.ts`, add to the `HatkConfig` interface (lines 40-51):
```typescript
databaseEngine: 'duckdb' | 'sqlite' // which database adapter to use
```

In `loadConfig` (line 69), add the default:
```typescript
databaseEngine: (env.DATABASE_ENGINE as any) || parsed.databaseEngine || 'duckdb',
```

Update `HatkConfigInput` to make it optional (it defaults to `'duckdb'`).

**Step 2: Create the adapter factory**

```typescript
// packages/hatk/src/database/adapter-factory.ts

import type { DatabasePort, SearchPort } from './ports.ts'

export async function createAdapter(engine: 'duckdb' | 'sqlite'): Promise<{
  adapter: DatabasePort
  searchPort: SearchPort | null
}> {
  switch (engine) {
    case 'duckdb': {
      const { DuckDBAdapter } = await import('./adapters/duckdb.ts')
      const { DuckDBSearchPort } = await import('./adapters/duckdb-search.ts')
      const adapter = new DuckDBAdapter()
      const searchPort = new DuckDBSearchPort(adapter)
      return { adapter, searchPort }
    }
    case 'sqlite': {
      const { SQLiteAdapter } = await import('./adapters/sqlite.ts')
      return { adapter: new SQLiteAdapter(), searchPort: null }
    }
    default:
      throw new Error(`Unsupported database engine: ${engine}`)
  }
}
```

**Step 3: Update `main.ts` startup**

Replace the current direct DuckDB initialization with:

```typescript
import { createAdapter } from './database/adapter-factory.ts'
import { setSearchPort } from './database/fts.ts'

// ... after config loaded ...
const { adapter, searchPort } = await createAdapter(config.databaseEngine)
setSearchPort(searchPort)

if (config.database !== ':memory:') {
  mkdirSync(dirname(config.database), { recursive: true })
}
await initDatabase(adapter, config.database, schemas, ddlStatements)
log(`[main] Database initialized (${config.databaseEngine}, ${config.database === ':memory:' ? 'in-memory' : config.database})`)
```

**Step 4: Update `test.ts` startup**

The test utility at `packages/hatk/src/test.ts` also calls `initDatabase`. Update it similarly to create a DuckDB adapter (or make the adapter configurable for tests).
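A sketch of what the `test.ts` change could look like — `initTestDatabase`, its parameters, and the default are assumptions, not existing code:

```typescript
import { createAdapter } from './database/adapter-factory.ts'
import { setSearchPort } from './database/fts.ts'
import { initDatabase } from './database/db.ts'
import type { TableSchema } from './database/schema.ts'

export async function initTestDatabase(
  tableSchemas: TableSchema[],
  ddlStatements: string[],
  engine: 'duckdb' | 'sqlite' = 'duckdb', // default preserves current behavior
): Promise<void> {
  const { adapter, searchPort } = await createAdapter(engine)
  setSearchPort(searchPort)
  // Both adapters accept ':memory:' for a throwaway test database
  await initDatabase(adapter, ':memory:', tableSchemas, ddlStatements)
}
```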
**Step 5: Verify everything compiles**

Run: `cd /Users/chadmiller/code/hatk && npx tsc --noEmit`
Expected: No errors

**Step 6: Run tests**

Run: `cd /Users/chadmiller/code/hatk && npm test`
Expected: All tests pass

**Step 7: Commit**

```bash
git add packages/hatk/src/config.ts packages/hatk/src/database/adapter-factory.ts packages/hatk/src/main.ts packages/hatk/src/test.ts
git commit -m "feat: add databaseEngine config and adapter factory for DuckDB/SQLite selection"
```

---

## Task 12: Create `database/index.ts` Re-Export

**Files:**
- Create: `packages/hatk/src/database/index.ts`

**Context:** Provide a clean public API for the database module. External consumers (user-written feeds, xrpc handlers) may import from hatk — provide a barrel export.

**Step 1: Write the index file**

```typescript
// packages/hatk/src/database/index.ts

export type { DatabasePort, BulkInserter, SearchPort, Dialect } from './ports.ts'
export type { SqlDialect } from './dialect.ts'
export { getDialect, DUCKDB_DIALECT, SQLITE_DIALECT } from './dialect.ts'
export { createAdapter } from './adapter-factory.ts'

// Re-export commonly used functions from db.ts
export {
  initDatabase,
  closeDatabase,
  querySQL,
  runSQL,
  insertRecord,
  deleteRecord,
  queryRecords,
  searchRecords,
  getRecordByUri,
  getCursor,
  setCursor,
  bulkInsertRecords,
  packCursor,
  unpackCursor,
} from './db.ts'

// Re-export schema utilities
export {
  type TableSchema,
  type ColumnDef,
  type ChildTableSchema,
  loadLexicons,
  discoverCollections,
  buildSchemas,
  generateTableSchema,
  generateCreateTableSQL,
  toSnakeCase,
  getLexicon,
  getLexiconArray,
  getAllLexicons,
  storeLexicons,
} from './schema.ts'
```

**Step 2: Commit**

```bash
git add packages/hatk/src/database/index.ts
git commit -m "feat: add database module barrel export"
```

---

## Task 13: End-to-End Verification

**Step 1: Full type check**

Run: `cd /Users/chadmiller/code/hatk && npx tsc --noEmit`
Expected: No errors

**Step 2: Run all tests**

Run: `cd /Users/chadmiller/code/hatk && npm test`
Expected: All tests pass

**Step 3: Build**

Run: `cd /Users/chadmiller/code/hatk && npm run build`
Expected: Clean build

**Step 4: Manual smoke test with DuckDB**

Run: `cd /Users/chadmiller/code/hatk && node packages/hatk/dist/cli.js dev`
Expected: The server starts, connects to DuckDB, and runs normally

**Step 5: Manual smoke test with SQLite**

Create a test config with `databaseEngine: 'sqlite'` and `database: 'test.db'` (a sketch follows). Run and verify tables are created in SQLite format.
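A minimal config sketch for that smoke test, assuming `hatk.config.ts` default-exports a plain object (the project's other required fields are omitted):

```typescript
// hatk.config.ts (SQLite smoke test)
export default {
  databaseEngine: 'sqlite',
  database: 'test.db',
  // ...the project's other existing config fields...
}
```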
**Step 6: Commit any fixes**

If any issues were found, fix and commit.

---

## Task 14: Update `hatk new` Scaffolding

**Files:**
- Modify: `packages/hatk/src/cli.ts` — add a database engine prompt to `hatk new`

**Context:** The `hatk new` command scaffolds a new project. Add a prompt asking which database engine to use, and set the default in the generated `hatk.config.ts`.

**Step 1: Find the `hatk new` command implementation**

Look in `cli.ts` for the `new` command handler. Add a selection prompt for the database engine.

**Step 2: Update the generated `hatk.config.ts` template**

The generated config should include:
```typescript
databaseEngine: 'duckdb', // or 'sqlite' based on selection
```

**Step 3: Update the generated `package.json` dependencies**

If the user selects SQLite, include `better-sqlite3` instead of `@duckdb/node-api` in the generated `package.json`.

**Step 4: Verify scaffolding works**

Run: `cd /tmp && hatk new test-project` (select each database option)
Expected: Project scaffolded with the correct config and dependencies

**Step 5: Commit**

```bash
git add packages/hatk/src/cli.ts
git commit -m "feat: add database engine selection to hatk new scaffolding"
```

---

## Summary

| Task | Description | Key files |
|------|-------------|-----------|
| 1 | Port interfaces | `database/ports.ts` |
| 2 | Dialect configs | `database/dialect.ts` |
| 3 | DuckDB adapter | `database/adapters/duckdb.ts` |
| 4 | SQLite adapter | `database/adapters/sqlite.ts` |
| 5 | Rename duckdbType→sqlType | `schema.ts`, `db.ts`, `fts.ts` |
| 6 | Dialect-aware schema gen | `schema.ts` |
| 7 | Refactor db.ts to use port | `database/db.ts` |
| 8 | Move schema+fts to database/ | `database/schema.ts`, `database/fts.ts` |
| 9 | DuckDB SearchPort | `database/adapters/duckdb-search.ts` |
| 10 | Update all import paths | 17 files |
| 11 | Config + adapter factory | `config.ts`, `database/adapter-factory.ts`, `main.ts` |
| 12 | Barrel export | `database/index.ts` |
| 13 | End-to-end verification | — |
| 14 | Update hatk new scaffolding | `cli.ts` |
+156
docs/plans/2026-03-13-multi-database-support-design.md
# Multi-Database Support via Hexagonal Architecture

## Motivation

Support DuckDB, SQLite, and future PostgreSQL to give users a choice at project creation time and remove adoption barriers for users who can't or won't install DuckDB.

Each hatk project commits to one database engine — no runtime switching.

## Configuration

Users set `database: 'duckdb' | 'sqlite'` in `hatk.config.ts`. At startup, hatk dynamically imports the matching adapter. Users only need the driver for their chosen database installed.

## Architecture

### Ports

Two interfaces define the hexagonal boundary:

**DatabasePort** — low-level SQL execution:

```typescript
interface DatabasePort {
  open(path: string): Promise<void>
  close(): Promise<void>

  query<T>(sql: string, params?: unknown[]): Promise<T[]>
  execute(sql: string, params?: unknown[]): Promise<void>

  beginTransaction(): Promise<void>
  commit(): Promise<void>
  rollback(): Promise<void>

  createBulkInserter(table: string, columns: string[]): Promise<BulkInserter>

  dialect: Dialect
}

interface BulkInserter {
  append(values: unknown[]): void
  flush(): Promise<void>
  close(): Promise<void>
}

type Dialect = 'duckdb' | 'sqlite' | 'postgres'
```

**SearchPort** — optional FTS capability:

```typescript
interface SearchPort {
  createIndex(table: string, columns: string[]): Promise<void>
  search(table: string, query: string, opts: SearchOpts): Promise<SearchResult[]>
}
```

Adapters declare whether they implement `SearchPort`. When unavailable, hatk falls back to `LIKE` matching.

### Dialect-Aware SQL Generation

A `SqlDialect` helper provides per-engine variations so the shared layer avoids scattered conditionals:

```typescript
interface SqlDialect {
  typeMap: Record<string, string>
  param(index: number): string // $1 vs ?
  supportsAppender: boolean
  returningClause: boolean
  upsertSyntax: 'on_conflict' | 'insert_or_replace'
  jsonExtract(column: string, path: string): string
}
```

Type mappings used by `schema.ts`:

| Lexicon type | DuckDB | SQLite | Postgres |
|--------------|--------|--------|----------|
| `string` | `TEXT` | `TEXT` | `TEXT` |
| `integer` | `BIGINT` | `INTEGER` | `BIGINT` |
| `boolean` | `BOOLEAN` | `INTEGER` | `BOOLEAN` |
| `bytes` | `BLOB` | `BLOB` | `BYTEA` |
| `datetime` | `TIMESTAMPTZ` | `TEXT` | `TIMESTAMPTZ` |

SQLite stores booleans as integers and datetimes as text. The shared layer handles conversion at the binding/reading boundary.
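What that boundary conversion might look like — a sketch, not part of this design's interfaces; the helper name is illustrative:

```typescript
// Convert JS values to SQLite-storable ones at bind time.
function toSqliteValue(v: unknown): unknown {
  if (typeof v === 'boolean') return v ? 1 : 0 // booleans as integers
  if (v instanceof Date) return v.toISOString() // datetimes as text
  return v
}
```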
### Adapters

**DuckDBAdapter** (~200-300 lines)
- Wraps `@duckdb/node-api`
- `BulkInserter` maps to DuckDB's native appender
- Implements `SearchPort` using DuckDB's FTS extension
- Read/write connection separation

**SQLiteAdapter** (~200-300 lines)
- Wraps `better-sqlite3`
- `BulkInserter` batches rows into multi-row `INSERT` within a transaction
- No `SearchPort` — falls back to `LIKE`
- WAL mode for concurrent reads

**PostgresAdapter** (future, ~200-300 lines)
- Wraps `pg` (node-postgres)
- `BulkInserter` uses `COPY FROM`
- Implements `SearchPort` using `tsvector`/`tsquery`
- Connection pooling

### Adapter Loading

```typescript
async function createAdapter(config: HatkConfig): Promise<DatabasePort> {
  switch (config.database) {
    case 'duckdb': {
      const { DuckDBAdapter } = await import('./adapters/duckdb.js')
      return new DuckDBAdapter()
    }
    case 'sqlite': {
      const { SQLiteAdapter } = await import('./adapters/sqlite.js')
      return new SQLiteAdapter()
    }
  }
}
```

### OAuth

OAuth operations (sessions, tokens, keys, DPoP) go through the same `DatabasePort`. No separate database or port is needed — the queries are simple CRUD.

## File Structure

All database code moves to `src/database/`:

```
src/database/
  ports.ts      # DatabasePort, BulkInserter, SearchPort interfaces
  dialect.ts    # SqlDialect interface + per-engine dialect configs
  db.ts         # Shared data access layer (refactored from current db.ts)
  schema.ts     # DDL generation (refactored from current schema.ts)
  fts.ts        # FTS dispatcher with LIKE fallback
  adapters/
    duckdb.ts   # DuckDB adapter + SearchPort
    sqlite.ts   # SQLite adapter
```

The rest of the codebase (`server.ts`, `indexer.ts`, `main.ts`, etc.) imports from `database/db.ts` instead of `db.ts` — same API surface, different path.

## Implementation

Done as a single pass, not phased:

1. Create `src/database/` with `ports.ts` and `dialect.ts`
2. Extract DuckDB-specific code from the current `db.ts` into `adapters/duckdb.ts`
3. Refactor `db.ts` into `database/db.ts`, calling through `DatabasePort`
4. Refactor `schema.ts` into `database/schema.ts`, using `SqlDialect.typeMap`
5. Extract DuckDB FTS from `fts.ts` into the adapter's `SearchPort`, add the `LIKE` fallback
6. Implement `SQLiteAdapter` in `adapters/sqlite.ts`
7. Add the `database` config option and dynamic adapter loading at startup
8. Update all imports across the codebase
9. Update `hatk new` scaffolding to include the database choice
+447 -6
package-lock.json
(lockfile diff, abridged) The changes register the `@hatk/hatk` workspace link (replacing the previous `node_modules/hatk` entry), add `better-sqlite3` 12.6.2 as a runtime dependency along with its prebuilt-binary install chain — `bindings`, `file-uri-to-path`, `prebuild-install` 7.1.3, `node-abi`, `napi-build-utils`, `bl`, `buffer`, `base64-js`, `ieee754`, `inherits`, `once`, `end-of-stream`, `chownr`, `decompress-response`, `mimic-response`, `deep-extend`, `expand-template`, `fs-constants`, `github-from-package`, `ini`, `minimist`, `mkdirp-classic` — and add `@types/better-sqlite3` 7.6.13 as a dev dependency. The `better-sqlite3` entry pins `engines.node` to `20.x || 22.x || 23.x || 24.x || 25.x`. (Diff truncated.)
"deprecated": "No longer maintained. Please contact the author of the relevant native addon; alternatives are available.", 6753 + "license": "MIT", 6754 + "dependencies": { 6755 + "detect-libc": "^2.0.0", 6756 + "expand-template": "^2.0.3", 6757 + "github-from-package": "0.0.0", 6758 + "minimist": "^1.2.3", 6759 + "mkdirp-classic": "^0.5.3", 6760 + "napi-build-utils": "^2.0.0", 6761 + "node-abi": "^3.3.0", 6762 + "pump": "^3.0.0", 6763 + "rc": "^1.2.7", 6764 + "simple-get": "^4.0.0", 6765 + "tar-fs": "^2.0.0", 6766 + "tunnel-agent": "^0.6.0" 6767 + }, 6768 + "bin": { 6769 + "prebuild-install": "bin.js" 6770 + }, 6771 + "engines": { 6772 + "node": ">=10" 6773 + } 6774 + }, 6508 6775 "node_modules/prismjs": { 6509 6776 "version": "1.30.0", 6510 6777 "resolved": "https://registry.npmjs.org/prismjs/-/prismjs-1.30.0.tgz", ··· 6537 6804 "url": "https://github.com/sponsors/wooorm" 6538 6805 } 6539 6806 }, 6807 + "node_modules/pump": { 6808 + "version": "3.0.4", 6809 + "resolved": "https://registry.npmjs.org/pump/-/pump-3.0.4.tgz", 6810 + "integrity": "sha512-VS7sjc6KR7e1ukRFhQSY5LM2uBWAUPiOPa/A3mkKmiMwSmRFUITt0xuj+/lesgnCv+dPIEYlkzrcyXgquIHMcA==", 6811 + "license": "MIT", 6812 + "dependencies": { 6813 + "end-of-stream": "^1.1.0", 6814 + "once": "^1.3.1" 6815 + } 6816 + }, 6540 6817 "node_modules/radix3": { 6541 6818 "version": "1.1.2", 6542 6819 "resolved": "https://registry.npmjs.org/radix3/-/radix3-1.1.2.tgz", 6543 6820 "integrity": "sha512-b484I/7b8rDEdSDKckSSBA8knMpcdsXudlE/LNL639wFoHKwLbEkQFZHWEYwDC0wa0FKUcCY+GAF73Z7wxNVFA==", 6544 6821 "license": "MIT" 6822 + }, 6823 + "node_modules/rc": { 6824 + "version": "1.2.8", 6825 + "resolved": "https://registry.npmjs.org/rc/-/rc-1.2.8.tgz", 6826 + "integrity": "sha512-y3bGgqKj3QBdxLbLkomlohkvsA8gdAiUQlSBJnBhfn+BPxg4bc62d8TcBW15wavDfgexCgccckhcZvywyQYPOw==", 6827 + "license": "(BSD-2-Clause OR MIT OR Apache-2.0)", 6828 + "dependencies": { 6829 + "deep-extend": "^0.6.0", 6830 + "ini": "~1.3.0", 6831 + "minimist": "^1.2.0", 6832 + "strip-json-comments": "~2.0.1" 6833 + }, 6834 + "bin": { 6835 + "rc": "cli.js" 6836 + } 6837 + }, 6838 + "node_modules/readable-stream": { 6839 + "version": "3.6.2", 6840 + "resolved": "https://registry.npmjs.org/readable-stream/-/readable-stream-3.6.2.tgz", 6841 + "integrity": "sha512-9u/sniCrY3D5WdsERHzHE4G2YCXqoG5FTHUiCC4SIbr6XcLZBY05ya9EKjYek9O5xOAwjGq+1JdGBAS7Q9ScoA==", 6842 + "license": "MIT", 6843 + "dependencies": { 6844 + "inherits": "^2.0.3", 6845 + "string_decoder": "^1.1.1", 6846 + "util-deprecate": "^1.0.1" 6847 + }, 6848 + "engines": { 6849 + "node": ">= 6" 6850 + } 6545 6851 }, 6546 6852 "node_modules/readdirp": { 6547 6853 "version": "5.0.0", ··· 6962 7268 "fsevents": "~2.3.2" 6963 7269 } 6964 7270 }, 7271 + "node_modules/safe-buffer": { 7272 + "version": "5.2.1", 7273 + "resolved": "https://registry.npmjs.org/safe-buffer/-/safe-buffer-5.2.1.tgz", 7274 + "integrity": "sha512-rp3So07KcdmmKbGvgaNxQSJr7bGVSVk5S9Eq1F+ppbRo70+YeaDxkw5Dd8NPN+GD6bjnYm2VuPuCXmpuYvmCXQ==", 7275 + "funding": [ 7276 + { 7277 + "type": "github", 7278 + "url": "https://github.com/sponsors/feross" 7279 + }, 7280 + { 7281 + "type": "patreon", 7282 + "url": "https://www.patreon.com/feross" 7283 + }, 7284 + { 7285 + "type": "consulting", 7286 + "url": "https://feross.org/support" 7287 + } 7288 + ], 7289 + "license": "MIT" 7290 + }, 6965 7291 "node_modules/satori": { 6966 7292 "version": "0.19.3", 6967 7293 "resolved": "https://registry.npmjs.org/satori/-/satori-0.19.3.tgz", ··· 7071 7397 "integrity": 
"sha512-ybx0WO1/8bSBLEWXZvEd7gMW3Sn3JFlW3TvX1nREbDLRNQNaeNN8WK0meBwPdAaOI7TtRRRJn/Es1zhrrCHu7g==", 7072 7398 "license": "ISC" 7073 7399 }, 7400 + "node_modules/simple-concat": { 7401 + "version": "1.0.1", 7402 + "resolved": "https://registry.npmjs.org/simple-concat/-/simple-concat-1.0.1.tgz", 7403 + "integrity": "sha512-cSFtAPtRhljv69IK0hTVZQ+OfE9nePi/rtJmw5UjHeVyVroEqJXP1sFztKUy1qU+xvz3u/sfYJLa947b7nAN2Q==", 7404 + "funding": [ 7405 + { 7406 + "type": "github", 7407 + "url": "https://github.com/sponsors/feross" 7408 + }, 7409 + { 7410 + "type": "patreon", 7411 + "url": "https://www.patreon.com/feross" 7412 + }, 7413 + { 7414 + "type": "consulting", 7415 + "url": "https://feross.org/support" 7416 + } 7417 + ], 7418 + "license": "MIT" 7419 + }, 7420 + "node_modules/simple-get": { 7421 + "version": "4.0.1", 7422 + "resolved": "https://registry.npmjs.org/simple-get/-/simple-get-4.0.1.tgz", 7423 + "integrity": "sha512-brv7p5WgH0jmQJr1ZDDfKDOSeWWg+OVypG99A/5vYGPqJ6pxiaHLy8nxtFjBA7oMa01ebA9gfh1uMCFqOuXxvA==", 7424 + "funding": [ 7425 + { 7426 + "type": "github", 7427 + "url": "https://github.com/sponsors/feross" 7428 + }, 7429 + { 7430 + "type": "patreon", 7431 + "url": "https://www.patreon.com/feross" 7432 + }, 7433 + { 7434 + "type": "consulting", 7435 + "url": "https://feross.org/support" 7436 + } 7437 + ], 7438 + "license": "MIT", 7439 + "dependencies": { 7440 + "decompress-response": "^6.0.0", 7441 + "once": "^1.3.1", 7442 + "simple-concat": "^1.0.0" 7443 + } 7444 + }, 7074 7445 "node_modules/sisteransi": { 7075 7446 "version": "1.0.5", 7076 7447 "resolved": "https://registry.npmjs.org/sisteransi/-/sisteransi-1.0.5.tgz", ··· 7169 7540 "integrity": "sha512-TlnjJ1C0QrmxRNrON00JvaFFlNh5TTG00APw23j74ET7gkQpTASi6/L2fuiav8pzK715HXtUeClpBTw2NPSn6w==", 7170 7541 "license": "MIT" 7171 7542 }, 7543 + "node_modules/string_decoder": { 7544 + "version": "1.3.0", 7545 + "resolved": "https://registry.npmjs.org/string_decoder/-/string_decoder-1.3.0.tgz", 7546 + "integrity": "sha512-hkRX8U1WjJFd8LsDJ2yQ/wWWxaopEsABU1XfkM8A+j0+85JAGppt16cr1Whg6KIbb4okU6Mql6BOj+uup/wKeA==", 7547 + "license": "MIT", 7548 + "dependencies": { 7549 + "safe-buffer": "~5.2.0" 7550 + } 7551 + }, 7172 7552 "node_modules/string-width": { 7173 7553 "version": "7.2.0", 7174 7554 "resolved": "https://registry.npmjs.org/string-width/-/string-width-7.2.0.tgz", ··· 7221 7601 "url": "https://github.com/chalk/strip-ansi?sponsor=1" 7222 7602 } 7223 7603 }, 7604 + "node_modules/strip-json-comments": { 7605 + "version": "2.0.1", 7606 + "resolved": "https://registry.npmjs.org/strip-json-comments/-/strip-json-comments-2.0.1.tgz", 7607 + "integrity": "sha512-4gB8na07fecVVkOI6Rs4e7T6NOTki5EmL7TUduTs6bu3EdnSycntVJ4re8kgZA+wx9IueI2Y11bfbgwtzuE0KQ==", 7608 + "license": "MIT", 7609 + "engines": { 7610 + "node": ">=0.10.0" 7611 + } 7612 + }, 7224 7613 "node_modules/style-to-js": { 7225 7614 "version": "1.1.21", 7226 7615 "resolved": "https://registry.npmjs.org/style-to-js/-/style-to-js-1.1.21.tgz", ··· 7262 7651 "funding": { 7263 7652 "type": "opencollective", 7264 7653 "url": "https://opencollective.com/svgo" 7654 + } 7655 + }, 7656 + "node_modules/tar-fs": { 7657 + "version": "2.1.4", 7658 + "resolved": "https://registry.npmjs.org/tar-fs/-/tar-fs-2.1.4.tgz", 7659 + "integrity": "sha512-mDAjwmZdh7LTT6pNleZ05Yt65HC3E+NiQzl672vQG38jIrehtJk/J3mNwIg+vShQPcLF/LV7CMnDW6vjj6sfYQ==", 7660 + "license": "MIT", 7661 + "dependencies": { 7662 + "chownr": "^1.1.1", 7663 + "mkdirp-classic": "^0.5.2", 7664 + "pump": "^3.0.0", 7665 + "tar-stream": "^2.1.4" 7666 + } 7667 
+ }, 7668 + "node_modules/tar-stream": { 7669 + "version": "2.2.0", 7670 + "resolved": "https://registry.npmjs.org/tar-stream/-/tar-stream-2.2.0.tgz", 7671 + "integrity": "sha512-ujeqbceABgwMZxEJnk2HDY2DlnUZ+9oEcb1KzTVfYHio0UE6dG71n60d8D2I4qNvleWrrXpmjpt7vZeF1LnMZQ==", 7672 + "license": "MIT", 7673 + "dependencies": { 7674 + "bl": "^4.0.3", 7675 + "end-of-stream": "^1.4.1", 7676 + "fs-constants": "^1.0.0", 7677 + "inherits": "^2.0.3", 7678 + "readable-stream": "^3.1.1" 7679 + }, 7680 + "engines": { 7681 + "node": ">=6" 7265 7682 } 7266 7683 }, 7267 7684 "node_modules/tiny-inflate": { ··· 7366 7783 "integrity": "sha512-oJFu94HQb+KVduSUQL7wnpmqnfmLsOA/nAh6b6EH0wCEoK0/mPeXU6c3wKDV83MkOuHPRHtSXKKU99IBazS/2w==", 7367 7784 "license": "0BSD", 7368 7785 "optional": true 7786 + }, 7787 + "node_modules/tunnel-agent": { 7788 + "version": "0.6.0", 7789 + "resolved": "https://registry.npmjs.org/tunnel-agent/-/tunnel-agent-0.6.0.tgz", 7790 + "integrity": "sha512-McnNiV1l8RYeY8tBgEpuodCC1mLUdbSN+CYBL7kJsJNInOP8UjDDEwdk6Mw60vdLLrr5NHKZhMAOSrR2NZuQ+w==", 7791 + "license": "Apache-2.0", 7792 + "dependencies": { 7793 + "safe-buffer": "^5.0.1" 7794 + }, 7795 + "engines": { 7796 + "node": "*" 7797 + } 7369 7798 }, 7370 7799 "node_modules/type-fest": { 7371 7800 "version": "4.41.0", ··· 8430 8859 "url": "https://github.com/chalk/wrap-ansi?sponsor=1" 8431 8860 } 8432 8861 }, 8862 + "node_modules/wrappy": { 8863 + "version": "1.0.2", 8864 + "resolved": "https://registry.npmjs.org/wrappy/-/wrappy-1.0.2.tgz", 8865 + "integrity": "sha512-l4Sp/DRseor9wL6EvV2+TuQn63dMkPjZ/sp9XkghTEbV9KlPS1xUsZ3u7/IQO4wxtcFB4bgpQPRcR3QCvezPcQ==", 8866 + "license": "ISC" 8867 + }, 8433 8868 "node_modules/xxhash-wasm": { 8434 8869 "version": "1.1.0", 8435 8870 "resolved": "https://registry.npmjs.org/xxhash-wasm/-/xxhash-wasm-1.1.0.tgz", ··· 8564 8999 } 8565 9000 }, 8566 9001 "packages/hatk": { 9002 + "name": "@hatk/hatk", 9003 + "version": "0.0.1-alpha.22", 9004 + "license": "MIT", 8567 9005 "dependencies": { 8568 9006 "@bigmoves/lexicon": "^0.2.1", 8569 9007 "@duckdb/node-api": "^1.4.4-r.1", 8570 9008 "@hatk/oauth-client": "*", 8571 9009 "@resvg/resvg-js": "^2.6.2", 9010 + "better-sqlite3": "^12.6.2", 8572 9011 "satori": "^0.19.2", 8573 9012 "vitest": "^4", 8574 9013 "yaml": "^2.7.0" 8575 9014 }, 8576 9015 "bin": { 8577 - "hatk": "src/cli.ts" 9016 + "hatk": "dist/cli.js" 8578 9017 }, 8579 9018 "devDependencies": { 8580 9019 "@playwright/test": "^1.58.2", 9020 + "@types/better-sqlite3": "^7.6.13", 8581 9021 "@types/react": "^19.2.14", 8582 9022 "vite": "^6" 8583 9023 } 8584 9024 }, 8585 9025 "packages/oauth-client": { 8586 9026 "name": "@hatk/oauth-client", 8587 - "version": "0.1.0" 9027 + "version": "0.0.1-alpha.0", 9028 + "license": "MIT" 8588 9029 } 8589 9030 } 8590 9031 }
+2
packages/hatk/package.json
··· 35 35 "@duckdb/node-api": "^1.4.4-r.1", 36 36 "@hatk/oauth-client": "*", 37 37 "@resvg/resvg-js": "^2.6.2", 38 + "better-sqlite3": "^12.6.2", 38 39 "satori": "^0.19.2", 39 40 "vitest": "^4", 40 41 "yaml": "^2.7.0" 41 42 }, 42 43 "devDependencies": { 43 44 "@playwright/test": "^1.58.2", 45 + "@types/better-sqlite3": "^7.6.13", 44 46 "@types/react": "^19.2.14", 45 47 "vite": "^6" 46 48 }
+2 -2
packages/hatk/src/backfill.ts
··· 12 12 runSQL, 13 13 getSchema, 14 14 bulkInsertRecords, 15 - } from './db.ts' 16 - import type { BulkRecord } from './db.ts' 15 + } from './database/db.ts' 16 + import type { BulkRecord } from './database/db.ts' 17 17 import { emit, timer } from './logger.ts' 18 18 import type { BackfillConfig } from './config.ts' 19 19
+21 -43
packages/hatk/src/cli.ts
··· 2 2 import { mkdirSync, writeFileSync, existsSync, unlinkSync, readdirSync, readFileSync } from 'node:fs' 3 3 import { resolve, join, dirname } from 'node:path' 4 4 import { execSync, spawn } from 'node:child_process' 5 - import { loadLexicons, discoverCollections, buildSchemas } from './schema.ts' 5 + import { loadLexicons, discoverCollections, buildSchemas } from './database/schema.ts' 6 6 import { loadConfig } from './config.ts' 7 7 8 8 const args = process.argv.slice(2) ··· 333 333 if (command === 'new') { 334 334 const name = args[1] 335 335 if (!name) { 336 - console.error('Usage: hatk new <name> [--svelte] [--template <template-name>]') 336 + console.error('Usage: hatk new <name> [--svelte] [--sqlite] [--template <template-name>]') 337 337 process.exit(1) 338 338 } 339 339 ··· 374 374 } 375 375 376 376 const withSvelte = args.includes('--svelte') 377 + const withSqlite = args.includes('--sqlite') 378 + const dbEngine = withSqlite ? 'sqlite' : 'duckdb' 377 379 mkdirSync(dir) 378 380 const subs = [ 379 381 'lexicons', ··· 405 407 relay: 'ws://localhost:2583', 406 408 plc: 'http://localhost:2582', 407 409 port: 3000, 410 + databaseEngine: '${dbEngine}', 408 411 database: 'data/hatk.db', 409 412 admins: [], 410 413 backfill: { ··· 1033 1036 ) 1034 1037 1035 1038 const pkgDeps: Record<string, string> = { '@hatk/oauth-client': '*', hatk: '*' } 1039 + if (withSqlite) { 1040 + pkgDeps['better-sqlite3'] = '^12' 1041 + } 1036 1042 const pkgDevDeps: Record<string, string> = { 1037 1043 '@playwright/test': '^1', 1038 1044 oxfmt: '^0.35.0', ··· 1838 1844 const config = await loadConfig(resolve('hatk.config.ts')) 1839 1845 1840 1846 if (config.database !== ':memory:') { 1841 - for (const suffix of ['', '.wal']) { 1847 - for (const suffix of ['', '.wal', '-shm', '-wal']) { 1842 1848 const file = config.database + suffix 1843 1849 if (existsSync(file)) { 1844 1850 unlinkSync(file) ··· 1998 2004 execSync('npx hatk generate types', { stdio: 'inherit', cwd: process.cwd() }) 1999 2005 } else if (command === 'schema') { 2000 2006 const config = await loadConfig(resolve('hatk.config.ts')) 2001 - if (config.database === ':memory:') { 2002 - console.error('No database file configured (database is :memory:)') 2003 - process.exit(1) 2004 - } 2005 2007 2006 - // Init DB from lexicons if it doesn't exist yet 2007 - if (!existsSync(config.database)) { 2008 - const configDir = resolve('.') 2009 - const lexicons = loadLexicons(resolve(configDir, 'lexicons')) 2010 - const collections = config.collections.length > 0 ? config.collections : discoverCollections(lexicons) 2011 - if (collections.length === 0) { 2012 - console.error('No record collections found. Add record lexicons to the lexicons/ directory.') 2013 - process.exit(1) 2014 - } 2008 + const { initDatabase, getSchemaDump } = await import('./database/db.ts') 2009 + const { createAdapter } = await import('./database/adapter-factory.ts') 2010 + const { getDialect } = await import('./database/dialect.ts') 2011 + const configDir2 = resolve('.') 2012 + const lexicons2 = loadLexicons(resolve(configDir2, 'lexicons')) 2013 + const collections2 = config.collections.length > 0 ?
config.collections : discoverCollections(lexicons2) 2014 + const { schemas: schemas2, ddlStatements: ddl2 } = buildSchemas(lexicons2, collections2, getDialect(config.databaseEngine)) 2015 + 2016 + if (config.database !== ':memory:') { 2015 2017 mkdirSync(dirname(config.database), { recursive: true }) 2016 - const { initDatabase } = await import('./db.ts') 2017 - const { schemas, ddlStatements } = buildSchemas(lexicons, collections) 2018 - await initDatabase(config.database, schemas, ddlStatements) 2019 2018 } 2020 - 2021 - const { DuckDBInstance } = await import('@duckdb/node-api') 2022 - const instance = await DuckDBInstance.create(config.database) 2023 - const con = await instance.connect() 2024 - 2025 - const tables = (await ( 2026 - await con.runAndReadAll( 2027 - `SELECT table_name FROM information_schema.tables WHERE table_schema = 'main' ORDER BY table_name`, 2028 - ) 2029 - ).getRowObjects()) as { table_name: string }[] 2030 - 2031 - for (const { table_name } of tables) { 2032 - console.log(`"${table_name}"`) 2033 - const cols = (await ( 2034 - await con.runAndReadAll( 2035 - `SELECT column_name, data_type, is_nullable FROM information_schema.columns WHERE table_name = '${table_name}' ORDER BY ordinal_position`, 2036 - ) 2037 - ).getRowObjects()) as { column_name: string; data_type: string; is_nullable: string }[] 2019 + const { adapter: adapter2 } = await createAdapter(config.databaseEngine) 2020 + await initDatabase(adapter2, config.database, schemas2, ddl2) 2038 2021 2039 - for (const col of cols) { 2040 - const nullable = col.is_nullable === 'YES' ? '' : ' NOT NULL' 2041 - console.log(` ${col.column_name.padEnd(20)} ${col.data_type}${nullable}`) 2042 - } 2043 - console.log() 2044 - } 2022 + console.log(await getSchemaDump()) 2045 2023 } else if (command === 'start') { 2046 2024 const mainPath = resolve(import.meta.dirname!, 'main.js') 2047 2025 await spawnForward('npx', ['tsx', mainPath, 'hatk.config.ts'])
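For reference, the `--sqlite` scaffold above ends up writing a config along these lines. This is a sketch reassembled from the template fragment visible in the diff; the export shape and the elided `backfill` block are assumptions, not confirmed by this commit:

```typescript
// hatk.config.ts as generated by `hatk new myapp --sqlite` (sketch)
export default {
  relay: 'ws://localhost:2583',
  plc: 'http://localhost:2582',
  port: 3000,
  databaseEngine: 'sqlite', // '${dbEngine}' interpolates to 'sqlite' under --sqlite
  database: 'data/hatk.db',
  admins: [],
  // backfill: { ... } (elided in the diff above)
}
```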
+3 -1
packages/hatk/src/config.ts
··· 41 41 relay: string 42 42 plc: string // PLC directory URL for DID resolution 43 43 port: number 44 - database: string // DuckDB file path (replaces :memory:) 44 + databaseEngine: 'duckdb' | 'sqlite' // which database adapter to use 45 + database: string // database file path (replaces :memory:) 45 46 publicDir: string | null // static file directory (null to disable) 46 47 collections: string[] // optional — auto-derived from lexicons if empty 47 48 backfill: BackfillConfig ··· 95 96 relay: env.RELAY || parsed.relay || 'ws://localhost:2583', 96 97 plc: env.DID_PLC_URL || parsed.plc || 'https://plc.directory', 97 98 port: parseInt(env.PORT || '') || parsed.port || 3000, 99 + databaseEngine: ((env.DATABASE_ENGINE || parsed.databaseEngine || 'duckdb') as HatkConfig['databaseEngine']), 98 100 database: database ? resolve(configDir, database) : ':memory:', 99 101 publicDir: parsed.publicDir === null ? null : resolve(configDir, parsed.publicDir || './public'), 100 102 collections: parsed.collections || [],
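Note the precedence in the loader: the environment wins over the config file, which wins over the `'duckdb'` default. A quick sketch, assuming the `env` object in the diff is backed by `process.env`:

```typescript
// Forcing the engine at runtime without touching hatk.config.ts.
// Assumes the loader reads DATABASE_ENGINE from process.env, per the diff above.
import { resolve } from 'node:path'
import { loadConfig } from './config.ts'

process.env.DATABASE_ENGINE = 'sqlite'
const config = await loadConfig(resolve('hatk.config.ts'))
console.log(config.databaseEngine) // 'sqlite', regardless of the file's setting
```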
+25
packages/hatk/src/database/adapter-factory.ts
··· 1 + import type { DatabasePort, SearchPort } from './ports.ts' 2 + 3 + export async function createAdapter(engine: 'duckdb' | 'sqlite'): Promise<{ 4 + adapter: DatabasePort 5 + searchPort: SearchPort | null 6 + }> { 7 + switch (engine) { 8 + case 'duckdb': { 9 + const { DuckDBAdapter } = await import('./adapters/duckdb.ts') 10 + const { DuckDBSearchPort } = await import('./adapters/duckdb-search.ts') 11 + const adapter = new DuckDBAdapter() 12 + const searchPort = new DuckDBSearchPort(adapter) 13 + return { adapter, searchPort } 14 + } 15 + case 'sqlite': { 16 + const { SQLiteAdapter } = await import('./adapters/sqlite.ts') 17 + const { SQLiteSearchPort } = await import('./adapters/sqlite-search.ts') 18 + const adapter = new SQLiteAdapter() 19 + const searchPort = new SQLiteSearchPort(adapter) 20 + return { adapter, searchPort } 21 + } 22 + default: 23 + throw new Error(`Unsupported database engine: ${engine}`) 24 + } 25 + }
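A wiring sketch showing how the factory is meant to be consumed; this mirrors the `hatk schema` code path earlier in the commit rather than introducing anything new:

```typescript
// Boot the shared data layer against whichever engine the config names.
import { resolve } from 'node:path'
import { loadConfig } from './config.ts'
import { loadLexicons, discoverCollections, buildSchemas } from './database/schema.ts'
import { getDialect } from './database/dialect.ts'
import { createAdapter } from './database/adapter-factory.ts'
import { initDatabase } from './database/db.ts'

const config = await loadConfig(resolve('hatk.config.ts'))
const lexicons = loadLexicons(resolve('lexicons'))
const collections = config.collections.length > 0 ? config.collections : discoverCollections(lexicons)
const { schemas, ddlStatements } = buildSchemas(lexicons, collections, getDialect(config.databaseEngine))

// Only the chosen adapter module is loaded, thanks to the dynamic imports above.
const { adapter, searchPort } = await createAdapter(config.databaseEngine)
await initDatabase(adapter, config.database, schemas, ddlStatements)
```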
+43
packages/hatk/src/database/adapters/duckdb-search.ts
··· 1 + import type { SearchPort } from '../ports.ts' 2 + import type { DatabasePort } from '../ports.ts' 3 + 4 + export class DuckDBSearchPort implements SearchPort { 5 + constructor(private port: DatabasePort) {} 6 + 7 + async buildIndex( 8 + shadowTable: string, 9 + sourceQuery: string, 10 + searchColumns: string[], 11 + ): Promise<void> { 12 + // Create shadow table 13 + await this.port.execute(`CREATE OR REPLACE TABLE ${shadowTable} AS ${sourceQuery}`, []) 14 + 15 + // Drop existing index 16 + try { 17 + await this.port.execute(`PRAGMA drop_fts_index('${shadowTable}')`, []) 18 + } catch {} 19 + 20 + // Build FTS index 21 + const colList = searchColumns.map((c) => `'${c}'`).join(', ') 22 + await this.port.execute( 23 + `PRAGMA create_fts_index('${shadowTable}', 'uri', ${colList}, stemmer='porter', stopwords='english', strip_accents=1, lower=1, overwrite=1)`, 24 + [], 25 + ) 26 + } 27 + 28 + async search( 29 + shadowTable: string, 30 + query: string, 31 + searchColumns: string[], 32 + limit: number, 33 + offset: number, 34 + ): Promise<Array<{ uri: string; score: number }>> { 35 + const ftsSchema = `fts_main_${shadowTable}` 36 + const sql = `SELECT uri, ${ftsSchema}.match_bm25(uri, $1) AS score 37 + FROM ${shadowTable} 38 + WHERE score IS NOT NULL 39 + ORDER BY score DESC 40 + LIMIT $2 OFFSET $3` 41 + return this.port.query(sql, [query, limit, offset]) 42 + } 43 + }
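A usage sketch for the port; the shadow-table name, source query, and columns below are hypothetical, and only the method signatures come from this commit:

```typescript
// Build a shadow table from an arbitrary SELECT, then rank matches with BM25.
const search = new DuckDBSearchPort(adapter)
await search.buildIndex(
  '_fts_example_post',                             // hypothetical shadow table
  `SELECT uri, text, title FROM app_example_post`, // hypothetical source query
  ['text', 'title'],
)
const hits = await search.search('_fts_example_post', 'hello world', ['text', 'title'], 25, 0)
// hits: Array<{ uri: string; score: number }>, best match first
```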
+151
packages/hatk/src/database/adapters/duckdb.ts
··· 1 + import { DuckDBInstance } from '@duckdb/node-api' 2 + import type { DatabasePort, BulkInserter, Dialect } from '../ports.ts' 3 + 4 + export class DuckDBAdapter implements DatabasePort { 5 + dialect: Dialect = 'duckdb' 6 + 7 + private instance!: DuckDBInstance 8 + private writeCon!: Awaited<ReturnType<DuckDBInstance['connect']>> 9 + private readCon!: Awaited<ReturnType<DuckDBInstance['connect']>> 10 + private writeQueue = Promise.resolve() 11 + private readQueue = Promise.resolve() 12 + 13 + async open(path: string): Promise<void> { 14 + this.instance = await DuckDBInstance.create(path === ':memory:' ? undefined : path) 15 + this.writeCon = await this.instance.connect() 16 + this.readCon = await this.instance.connect() 17 + } 18 + 19 + close(): void { 20 + try { this.readCon?.closeSync() } catch {} 21 + try { this.writeCon?.closeSync() } catch {} 22 + try { this.instance?.closeSync() } catch {} 23 + } 24 + 25 + async query<T = Record<string, unknown>>(sql: string, params: unknown[] = []): Promise<T[]> { 26 + return this.enqueue('read', async () => { 27 + if (params.length === 0) { 28 + const reader = await this.readCon.runAndReadAll(sql) 29 + return reader.getRowObjects() as T[] 30 + } 31 + const prepared = await this.readCon.prepare(sql) 32 + this.bindParams(prepared, params) 33 + const reader = await prepared.runAndReadAll() 34 + return reader.getRowObjects() as T[] 35 + }) 36 + } 37 + 38 + async execute(sql: string, params: unknown[] = []): Promise<void> { 39 + return this.enqueue('write', async () => { 40 + if (params.length === 0) { 41 + await this.writeCon.run(sql) 42 + return 43 + } 44 + const prepared = await this.writeCon.prepare(sql) 45 + this.bindParams(prepared, params) 46 + await prepared.run() 47 + }) 48 + } 49 + 50 + async executeMultiple(sql: string): Promise<void> { 51 + return this.enqueue('write', async () => { 52 + for (const statement of sql.split(';').filter((s) => s.trim())) { 53 + await this.writeCon.run(statement) 54 + } 55 + }) 56 + } 57 + 58 + async beginTransaction(): Promise<void> { 59 + return this.enqueue('write', async () => { 60 + await this.writeCon.run('BEGIN TRANSACTION') 61 + }) 62 + } 63 + 64 + async commit(): Promise<void> { 65 + return this.enqueue('write', async () => { 66 + await this.writeCon.run('COMMIT') 67 + }) 68 + } 69 + 70 + async rollback(): Promise<void> { 71 + return this.enqueue('write', async () => { 72 + await this.writeCon.run('ROLLBACK') 73 + }) 74 + } 75 + 76 + async createBulkInserter(table: string, columns: string[]): Promise<BulkInserter> { 77 + const appender = await this.writeCon.createAppender(table.replace(/"/g, '')) 78 + return { 79 + append(values: unknown[]) { 80 + for (const value of values) { 81 + if (value === null || value === undefined) { 82 + appender.appendNull() 83 + } else if (typeof value === 'string') { 84 + appender.appendVarchar(value) 85 + } else if (typeof value === 'number') { 86 + if (Number.isInteger(value)) { 87 + appender.appendInteger(value) 88 + } else { 89 + appender.appendDouble(value) 90 + } 91 + } else if (typeof value === 'boolean') { 92 + appender.appendBoolean(value) 93 + } else if (typeof value === 'bigint') { 94 + appender.appendBigInt(value) 95 + } else if (value instanceof Uint8Array) { 96 + appender.appendBlob(value) 97 + } else { 98 + appender.appendVarchar(String(value)) 99 + } 100 + } 101 + appender.endRow() 102 + }, 103 + async flush() { 104 + appender.flushSync() 105 + }, 106 + async close() { 107 + appender.flushSync() 108 + appender.closeSync() 109 + }, 110 + } 111 + } 112 + 
113 + /** Enqueue a read or write operation for serialization */ 114 + private enqueue<T>(queue: 'read' | 'write', fn: () => Promise<T>): Promise<T> { 115 + if (queue === 'write') { 116 + const p = this.writeQueue.then(fn) 117 + this.writeQueue = p.then(() => {}, () => {}) 118 + return p 119 + } else { 120 + const p = this.readQueue.then(fn) 121 + this.readQueue = p.then(() => {}, () => {}) 122 + return p 123 + } 124 + } 125 + 126 + private bindParams(prepared: any, params: unknown[]): void { 127 + for (let i = 0; i < params.length; i++) { 128 + const idx = i + 1 129 + const value = params[i] 130 + if (value === null || value === undefined) { 131 + prepared.bindNull(idx) 132 + } else if (typeof value === 'string') { 133 + prepared.bindVarchar(idx, value) 134 + } else if (typeof value === 'number') { 135 + if (Number.isInteger(value)) { 136 + prepared.bindInteger(idx, value) 137 + } else { 138 + prepared.bindDouble(idx, value) 139 + } 140 + } else if (typeof value === 'boolean') { 141 + prepared.bindBoolean(idx, value) 142 + } else if (typeof value === 'bigint') { 143 + prepared.bindBigInt(idx, value) 144 + } else if (value instanceof Uint8Array) { 145 + prepared.bindBlob(idx, value) 146 + } else { 147 + prepared.bindVarchar(idx, String(value)) 148 + } 149 + } 150 + } 151 + }
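The `enqueue` helper carries over the original serialization strategy: each operation chains onto a stored promise, and the stored chain masks rejections so one failed statement cannot poison later ones, while the promise handed back to the caller still rejects. The pattern in isolation:

```typescript
// Standalone sketch of the promise-chain queue used by the adapter above.
let queue: Promise<void> = Promise.resolve()

function enqueue<T>(fn: () => Promise<T>): Promise<T> {
  const p = queue.then(fn)
  // Keep the chain alive whether fn resolved or rejected.
  queue = p.then(() => {}, () => {})
  return p
}

// Operations complete in submission order, even when started concurrently.
const a = enqueue(async () => 'first')
const b = enqueue(async () => 'second')
console.log(await a, await b) // first second
```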
+59
packages/hatk/src/database/adapters/sqlite-search.ts
··· 1 + import type { SearchPort } from '../ports.ts' 2 + import type { DatabasePort } from '../ports.ts' 3 + 4 + /** 5 + * SQLite FTS5-based search port. 6 + * 7 + * Uses SQLite's built-in FTS5 virtual tables for full-text search with BM25 ranking. 8 + * The FTS5 virtual table is created as <shadowTable>_fts over a plain data table built from the source query. 9 + */ 10 + export class SQLiteSearchPort implements SearchPort { 11 + constructor(private port: DatabasePort) {} 12 + 13 + async buildIndex( 14 + shadowTable: string, 15 + sourceQuery: string, 16 + searchColumns: string[], 17 + ): Promise<void> { 18 + // Drop existing FTS table and data table 19 + await this.port.execute(`DROP TABLE IF EXISTS ${shadowTable}_fts`, []) 20 + await this.port.execute(`DROP TABLE IF EXISTS ${shadowTable}`, []) 21 + 22 + // Create the data table from the source query 23 + await this.port.execute(`CREATE TABLE ${shadowTable} AS ${sourceQuery}`, []) 24 + 25 + // Create the FTS5 virtual table over the search columns 26 + const colList = searchColumns.join(', ') 27 + await this.port.execute( 28 + `CREATE VIRTUAL TABLE ${shadowTable}_fts USING fts5(uri UNINDEXED, ${colList}, tokenize='porter unicode61 remove_diacritics 2')`, 29 + [], 30 + ) 31 + 32 + // Populate FTS table from the data table 33 + const selectCols = ['uri', ...searchColumns].map((c) => `COALESCE(CAST(${c} AS TEXT), '')`) 34 + await this.port.execute( 35 + `INSERT INTO ${shadowTable}_fts (uri, ${colList}) SELECT ${selectCols.join(', ')} FROM ${shadowTable}`, 36 + [], 37 + ) 38 + } 39 + 40 + async search( 41 + shadowTable: string, 42 + query: string, 43 + _searchColumns: string[], 44 + limit: number, 45 + offset: number, 46 + ): Promise<Array<{ uri: string; score: number }>> { 47 + // Strip FTS5 operator characters from the query 48 + const escaped = query.replace(/['"*(){}[\]^~\\:]/g, ' ').trim() 49 + if (!escaped) return [] 50 + 51 + // Use FTS5 MATCH with bm25() ranking (lower = better match, negate for DESC) 52 + const sql = `SELECT uri, -bm25(${shadowTable}_fts) AS score 53 + FROM ${shadowTable}_fts 54 + WHERE ${shadowTable}_fts MATCH $1 55 + ORDER BY score DESC 56 + LIMIT $2 OFFSET $3` 57 + return this.port.query(sql, [escaped, limit, offset]) 58 + } 59 + }
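Worth noting: `search` strips FTS5 operator punctuation rather than quoting it, so user input cannot inject `MATCH` syntax. A few illustrative cases of the regex above:

```typescript
// Same expression as in search() above, shown standalone.
const sanitize = (q: string) => q.replace(/['"*(){}[\]^~\\:]/g, ' ').trim()

console.log(sanitize('hello world')) // 'hello world'
console.log(sanitize('"quo*ted"'))   // 'quo ted'
console.log(sanitize('*"(){}'))      // '' (search() then short-circuits to [])
```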
+111
packages/hatk/src/database/adapters/sqlite.ts
··· 1 + import Database from 'better-sqlite3' 2 + import type { DatabasePort, BulkInserter, Dialect } from '../ports.ts' 3 + 4 + /** 5 + * Translate DuckDB-style `$1, $2` placeholders to SQLite `?` placeholders. 6 + * Handles repeated references to the same `$N` by duplicating the param value. 7 + * Returns the translated SQL and expanded params array. 8 + */ 9 + function translateParams(sql: string, params: unknown[]): { sql: string; params: unknown[] } { 10 + if (params.length === 0) return { sql, params } 11 + 12 + const expandedParams: unknown[] = [] 13 + const translated = sql.replace(/\$(\d+)/g, (_match, numStr) => { 14 + const idx = parseInt(numStr) - 1 // $1 → index 0 15 + expandedParams.push(params[idx]) 16 + return '?' 17 + }) 18 + 19 + return { sql: translated, params: expandedParams } 20 + } 21 + 22 + export class SQLiteAdapter implements DatabasePort { 23 + dialect: Dialect = 'sqlite' 24 + 25 + private db!: Database.Database 26 + 27 + async open(path: string): Promise<void> { 28 + this.db = new Database(path === ':memory:' ? ':memory:' : path) 29 + this.db.pragma('journal_mode = WAL') 30 + this.db.pragma('synchronous = NORMAL') 31 + this.db.pragma('foreign_keys = ON') 32 + } 33 + 34 + close(): void { 35 + try { this.db?.close() } catch {} 36 + } 37 + 38 + async query<T = Record<string, unknown>>(sql: string, params: unknown[] = []): Promise<T[]> { 39 + const t = translateParams(sql, params) 40 + const stmt = this.db.prepare(t.sql) 41 + return stmt.all(...t.params) as T[] 42 + } 43 + 44 + async execute(sql: string, params: unknown[] = []): Promise<void> { 45 + const t = translateParams(sql, params) 46 + const stmt = this.db.prepare(t.sql) 47 + stmt.run(...t.params) 48 + } 49 + 50 + async executeMultiple(sql: string): Promise<void> { 51 + this.db.exec(sql) 52 + } 53 + 54 + async beginTransaction(): Promise<void> { 55 + this.db.exec('BEGIN') 56 + } 57 + 58 + async commit(): Promise<void> { 59 + this.db.exec('COMMIT') 60 + } 61 + 62 + async rollback(): Promise<void> { 63 + this.db.exec('ROLLBACK') 64 + } 65 + 66 + async createBulkInserter(table: string, columns: string[]): Promise<BulkInserter> { 67 + const placeholders = columns.map(() => '?').join(', ') 68 + const sql = `INSERT INTO ${table} (${columns.join(', ')}) VALUES (${placeholders})` 69 + const stmt = this.db.prepare(sql) 70 + const buffer: unknown[][] = [] 71 + const BATCH_SIZE = 500 72 + 73 + const self = this 74 + return { 75 + append(values: unknown[]) { 76 + buffer.push(values) 77 + if (buffer.length >= BATCH_SIZE) { 78 + const tx = self.db.transaction(() => { 79 + for (const row of buffer) { 80 + stmt.run(...row) 81 + } 82 + }) 83 + tx() 84 + buffer.length = 0 85 + } 86 + }, 87 + async flush() { 88 + if (buffer.length > 0) { 89 + const tx = self.db.transaction(() => { 90 + for (const row of buffer) { 91 + stmt.run(...row) 92 + } 93 + }) 94 + tx() 95 + buffer.length = 0 96 + } 97 + }, 98 + async close() { 99 + if (buffer.length > 0) { 100 + const tx = self.db.transaction(() => { 101 + for (const row of buffer) { 102 + stmt.run(...row) 103 + } 104 + }) 105 + tx() 106 + buffer.length = 0 107 + } 108 + }, 109 + } 110 + } 111 + }
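A worked example of the `$N` to `?` translation; note how a placeholder used twice, as in the `listRepos` search filter later in this commit, gets its value duplicated:

```typescript
// translateParams expands repeated $N references into positional ?s.
const { sql, params } = translateParams(
  'SELECT did FROM _repos WHERE did LIKE $1 OR handle LIKE $1 LIMIT $2',
  ['%bob%', 50],
)
// sql:    'SELECT did FROM _repos WHERE did LIKE ? OR handle LIKE ? LIMIT ?'
// params: ['%bob%', '%bob%', 50]
```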
+127
packages/hatk/src/database/dialect.ts
··· 1 + import type { Dialect } from './ports.ts' 2 + 3 + export interface SqlDialect { 4 + /** Map from lexicon type key to SQL column type */ 5 + typeMap: Record<string, string> 6 + 7 + /** Timestamp type name */ 8 + timestampType: string 9 + 10 + /** JSON type name */ 11 + jsonType: string 12 + 13 + /** Parameter placeholder for index (1-based). DuckDB/Postgres: $1; SQLite: ? */ 14 + param(index: number): string 15 + 16 + /** Whether the engine supports native bulk appenders (DuckDB) vs batched INSERT */ 17 + supportsAppender: boolean 18 + 19 + /** SQL for upsert — 'INSERT OR REPLACE' (DuckDB/SQLite) vs 'ON CONFLICT DO UPDATE' */ 20 + upsertPrefix: string 21 + 22 + /** Extract a string value from a JSON column. Returns SQL expression. */ 23 + jsonExtractString(column: string, path: string): string 24 + 25 + /** Aggregate strings from a JSON array. Returns SQL expression. */ 26 + jsonArrayStringAgg(column: string, jsonPath: string): string 27 + 28 + /** Information schema query to list user tables */ 29 + listTablesQuery: string 30 + 31 + /** CHECKPOINT or equivalent (for WAL compaction). null if not needed. */ 32 + checkpointSQL: string | null 33 + 34 + /** Current timestamp expression */ 35 + currentTimestamp: string 36 + 37 + /** ILIKE or equivalent for case-insensitive matching */ 38 + ilike: string 39 + 40 + /** Cast expression for safe timestamp parsing. DuckDB: TRY_CAST(x AS TIMESTAMP), SQLite: x */ 41 + tryCastTimestamp(expr: string): string 42 + 43 + /** COUNT(*)::INTEGER or equivalent */ 44 + countAsInteger: string 45 + 46 + /** GREATEST(...) or MAX(...) for multi-arg max */ 47 + greatest(exprs: string[]): string 48 + 49 + /** jaro_winkler_similarity or null if unsupported */ 50 + jaroWinklerSimilarity: string | null 51 + 52 + /** string_agg or group_concat */ 53 + stringAgg(column: string, separator: string): string 54 + 55 + /** CREATE SEQUENCE support */ 56 + supportsSequences: boolean 57 + } 58 + 59 + export const DUCKDB_DIALECT: SqlDialect = { 60 + typeMap: { 61 + text: 'TEXT', 62 + integer: 'INTEGER', 63 + bigint: 'BIGINT', 64 + boolean: 'BOOLEAN', 65 + blob: 'BLOB', 66 + timestamp: 'TIMESTAMP', 67 + timestamptz: 'TIMESTAMPTZ', 68 + json: 'JSON', 69 + }, 70 + timestampType: 'TIMESTAMP', 71 + jsonType: 'JSON', 72 + param: (i: number) => `$${i}`, 73 + supportsAppender: true, 74 + upsertPrefix: 'INSERT OR REPLACE INTO', 75 + jsonExtractString: (col, path) => `json_extract_string(${col}, '${path}')`, 76 + jsonArrayStringAgg: (col, path) => `list_string_agg(json_extract_string(${col}, '${path}'))`, 77 + listTablesQuery: `SELECT table_name FROM information_schema.tables WHERE table_schema = 'main' AND table_name NOT LIKE '\\_%' ESCAPE '\\'`, 78 + checkpointSQL: 'CHECKPOINT', 79 + currentTimestamp: 'CURRENT_TIMESTAMP', 80 + ilike: 'ILIKE', 81 + tryCastTimestamp: (expr) => `TRY_CAST(${expr} AS TIMESTAMP)`, 82 + countAsInteger: 'COUNT(*)::INTEGER', 83 + greatest: (exprs) => `GREATEST(${exprs.join(', ')})`, 84 + jaroWinklerSimilarity: 'jaro_winkler_similarity', 85 + stringAgg: (col, sep) => `string_agg(${col}, ${sep})`, 86 + supportsSequences: true, 87 + } 88 + 89 + export const SQLITE_DIALECT: SqlDialect = { 90 + typeMap: { 91 + text: 'TEXT', 92 + integer: 'INTEGER', 93 + bigint: 'INTEGER', 94 + boolean: 'INTEGER', 95 + blob: 'BLOB', 96 + timestamp: 'TEXT', 97 + timestamptz: 'TEXT', 98 + json: 'TEXT', 99 + }, 100 + timestampType: 'TEXT', 101 + jsonType: 'TEXT', 102 + param: (_i: number) => '?', 103 + supportsAppender: false, 104 + upsertPrefix: 'INSERT OR REPLACE INTO', 105 +
jsonExtractString: (col, path) => `json_extract(${col}, '${path}')`, 106 + jsonArrayStringAgg: (col, path) => { 107 + return `(SELECT group_concat(je.value, ' ') FROM json_each(${col}, '${path}') je)` 108 + }, 109 + listTablesQuery: `SELECT name AS table_name FROM sqlite_master WHERE type='table' AND name NOT LIKE '\\_%' ESCAPE '\\' AND name NOT LIKE 'sqlite\\_%' ESCAPE '\\'`, 110 + checkpointSQL: null, 111 + currentTimestamp: 'CURRENT_TIMESTAMP', 112 + ilike: 'LIKE', 113 + tryCastTimestamp: (expr) => expr, 114 + countAsInteger: 'CAST(COUNT(*) AS INTEGER)', 115 + greatest: (exprs) => `MAX(${exprs.join(', ')})`, 116 + jaroWinklerSimilarity: null, 117 + stringAgg: (col, sep) => `group_concat(${col}, ${sep})`, 118 + supportsSequences: false, 119 + } 120 + 121 + export function getDialect(dialect: Dialect): SqlDialect { 122 + switch (dialect) { 123 + case 'duckdb': return DUCKDB_DIALECT 124 + case 'sqlite': return SQLITE_DIALECT 125 + case 'postgres': throw new Error('PostgreSQL adapter not yet implemented') 126 + } 127 + }
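The payoff of the dialect table is that call sites can stay engine-neutral. A small sketch rendering one logical query through both configs above (`posts` and `created_at` are hypothetical names):

```typescript
import { DUCKDB_DIALECT, SQLITE_DIALECT, type SqlDialect } from './dialect.ts'

function countNewerThan(d: SqlDialect, table: string): string {
  return `SELECT ${d.countAsInteger} AS n FROM ${table} WHERE ${d.tryCastTimestamp('created_at')} >= ${d.param(1)}`
}

countNewerThan(DUCKDB_DIALECT, 'posts')
// SELECT COUNT(*)::INTEGER AS n FROM posts WHERE TRY_CAST(created_at AS TIMESTAMP) >= $1
countNewerThan(SQLITE_DIALECT, 'posts')
// SELECT CAST(COUNT(*) AS INTEGER) AS n FROM posts WHERE created_at >= ?
```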
+39
packages/hatk/src/database/index.ts
··· 1 + export type { DatabasePort, BulkInserter, SearchPort, Dialect } from './ports.ts' 2 + export type { SqlDialect } from './dialect.ts' 3 + export { getDialect, DUCKDB_DIALECT, SQLITE_DIALECT } from './dialect.ts' 4 + export { createAdapter } from './adapter-factory.ts' 5 + 6 + // Re-export commonly used functions from db.ts 7 + export { 8 + initDatabase, 9 + closeDatabase, 10 + querySQL, 11 + runSQL, 12 + insertRecord, 13 + deleteRecord, 14 + queryRecords, 15 + searchRecords, 16 + getRecordByUri, 17 + getCursor, 18 + setCursor, 19 + bulkInsertRecords, 20 + packCursor, 21 + unpackCursor, 22 + } from './db.ts' 23 + 24 + // Re-export schema utilities 25 + export { 26 + type TableSchema, 27 + type ColumnDef, 28 + type ChildTableSchema, 29 + loadLexicons, 30 + discoverCollections, 31 + buildSchemas, 32 + generateTableSchema, 33 + generateCreateTableSQL, 34 + toSnakeCase, 35 + getLexicon, 36 + getLexiconArray, 37 + getAllLexicons, 38 + storeLexicons, 39 + } from './schema.ts'
+62
packages/hatk/src/database/ports.ts
··· 1 + export type Dialect = 'duckdb' | 'sqlite' | 'postgres' 2 + 3 + export interface DatabasePort { 4 + /** Dialect identifier for SQL generation differences */ 5 + dialect: Dialect 6 + 7 + /** Open a database connection. path is file path or ':memory:' */ 8 + open(path: string): Promise<void> 9 + 10 + /** Close all connections and release resources */ 11 + close(): void 12 + 13 + /** Execute a read query, return rows as plain objects */ 14 + query<T = Record<string, unknown>>(sql: string, params?: unknown[]): Promise<T[]> 15 + 16 + /** Execute a write statement (INSERT, UPDATE, DELETE, DDL) */ 17 + execute(sql: string, params?: unknown[]): Promise<void> 18 + 19 + /** Execute multiple statements in sequence (for DDL batches) */ 20 + executeMultiple(sql: string): Promise<void> 21 + 22 + /** Begin a transaction */ 23 + beginTransaction(): Promise<void> 24 + 25 + /** Commit the current transaction */ 26 + commit(): Promise<void> 27 + 28 + /** Rollback the current transaction */ 29 + rollback(): Promise<void> 30 + 31 + /** Create a bulk inserter for high-throughput writes */ 32 + createBulkInserter(table: string, columns: string[]): Promise<BulkInserter> 33 + } 34 + 35 + export interface BulkInserter { 36 + /** Append a single row of values */ 37 + append(values: unknown[]): void 38 + 39 + /** Flush buffered rows to the database */ 40 + flush(): Promise<void> 41 + 42 + /** Close the inserter and release resources */ 43 + close(): Promise<void> 44 + } 45 + 46 + export interface SearchPort { 47 + /** Build/rebuild an FTS index for a table */ 48 + buildIndex( 49 + shadowTable: string, 50 + sourceQuery: string, 51 + searchColumns: string[], 52 + ): Promise<void> 53 + 54 + /** Search a table, returning URIs with scores */ 55 + search( 56 + shadowTable: string, 57 + query: string, 58 + searchColumns: string[], 59 + limit: number, 60 + offset: number, 61 + ): Promise<Array<{ uri: string; score: number }>> 62 + }
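Since `DatabasePort` is the only seam the business logic sees, a test double is cheap. A minimal stub, not part of this commit; the recording behavior is purely illustrative:

```typescript
import type { DatabasePort, BulkInserter, Dialect } from './ports.ts'

// Records every statement instead of executing it; handy for asserting
// what SQL the shared db.ts layer emits.
class RecordingPort implements DatabasePort {
  dialect: Dialect = 'sqlite'
  log: Array<{ sql: string; params: unknown[] }> = []

  async open(_path: string): Promise<void> {}
  close(): void {}
  async query<T = Record<string, unknown>>(sql: string, params: unknown[] = []): Promise<T[]> {
    this.log.push({ sql, params })
    return []
  }
  async execute(sql: string, params: unknown[] = []): Promise<void> {
    this.log.push({ sql, params })
  }
  async executeMultiple(sql: string): Promise<void> {
    this.log.push({ sql, params: [] })
  }
  async beginTransaction(): Promise<void> {}
  async commit(): Promise<void> {}
  async rollback(): Promise<void> {}
  async createBulkInserter(_table: string, _columns: string[]): Promise<BulkInserter> {
    return { append: () => {}, flush: async () => {}, close: async () => {} }
  }
}
```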
+307 -368
packages/hatk/src/db.ts packages/hatk/src/database/db.ts
··· 1 - import { DuckDBInstance } from '@duckdb/node-api' 2 1 import { type TableSchema, toSnakeCase } from './schema.ts' 3 - import type { Row } from './lex-types.ts' 4 - import { getSearchColumns, stripStopWords } from './fts.ts' 5 - import { emit, timer } from './logger.ts' 6 - import { OAUTH_DDL } from './oauth/db.ts' 2 + import type { Row } from '../lex-types.ts' 3 + import { getSearchColumns, stripStopWords, hasSearchPort, getSearchPort } from './fts.ts' 4 + import { emit, timer } from '../logger.ts' 5 + import { OAUTH_DDL } from '../oauth/db.ts' 6 + import type { DatabasePort } from './ports.ts' 7 + import { getDialect, type SqlDialect } from './dialect.ts' 7 8 8 - let instance: DuckDBInstance 9 - let con: Awaited<ReturnType<DuckDBInstance['connect']>> 10 - let readCon: Awaited<ReturnType<DuckDBInstance['connect']>> 9 + let port: DatabasePort 10 + let dialect: SqlDialect 11 11 const schemas = new Map<string, TableSchema>() 12 12 13 + export function getDatabasePort(): DatabasePort { return port } 14 + export function getSqlDialect(): SqlDialect { return dialect } 15 + 13 16 export function closeDatabase(): void { 14 - try { 15 - readCon?.closeSync() 16 - } catch {} 17 - try { 18 - con?.closeSync() 19 - } catch {} 20 - try { 21 - instance?.closeSync() 22 - } catch {} 23 - } 24 - 25 - let writeQueue = Promise.resolve() 26 - let readQueue = Promise.resolve() 27 - 28 - function enqueue<T>(queue: 'read' | 'write', fn: () => Promise<T>): Promise<T> { 29 - if (queue === 'write') { 30 - const p = writeQueue.then(fn) 31 - writeQueue = p.then( 32 - () => {}, 33 - () => {}, 34 - ) 35 - return p 36 - } else { 37 - const p = readQueue.then(fn) 38 - readQueue = p.then( 39 - () => {}, 40 - () => {}, 41 - ) 42 - return p 43 - } 44 - } 45 - 46 - function bindParams(prepared: any, params: any[]): void { 47 - for (let i = 0; i < params.length; i++) { 48 - const idx = i + 1 49 - const value = params[i] 50 - if (value === null || value === undefined) { 51 - prepared.bindNull(idx) 52 - } else if (typeof value === 'string') { 53 - prepared.bindVarchar(idx, value) 54 - } else if (typeof value === 'number') { 55 - if (Number.isInteger(value)) { 56 - prepared.bindInteger(idx, value) 57 - } else { 58 - prepared.bindDouble(idx, value) 59 - } 60 - } else if (typeof value === 'boolean') { 61 - prepared.bindBoolean(idx, value) 62 - } else if (typeof value === 'bigint') { 63 - prepared.bindBigInt(idx, value) 64 - } else if (value instanceof Uint8Array) { 65 - prepared.bindBlob(idx, value) 66 - } else { 67 - prepared.bindVarchar(idx, String(value)) 68 - } 69 - } 70 - } 71 - 72 - async function runDirect(sql: string, ...params: any[]): Promise<void> { 73 - if (params.length === 0) { 74 - await con.run(sql) 75 - return 76 - } 77 - const prepared = await con.prepare(sql) 78 - bindParams(prepared, params) 79 - await prepared.run() 17 + port?.close() 80 18 } 81 19 82 20 async function run(sql: string, ...params: any[]): Promise<void> { 83 - return enqueue('write', () => runDirect(sql, ...params)) 21 + return port.execute(sql, params) 84 22 } 85 23 86 24 export async function runBatch(operations: Array<{ sql: string; params: any[] }>): Promise<void> { 87 - return enqueue('write', async () => { 88 - await con.run('BEGIN TRANSACTION') 25 + await port.beginTransaction() 26 + try { 89 27 for (const op of operations) { 90 28 try { 91 - if (op.params.length === 0) { 92 - await con.run(op.sql) 93 - } else { 94 - const prepared = await con.prepare(op.sql) 95 - bindParams(prepared, op.params) 96 - await prepared.run() 97 - } 29 + 
await port.execute(op.sql, op.params) 98 30 } catch { 99 31 // Skip bad records, continue with rest of batch 100 32 } 101 33 } 102 - await con.run('COMMIT') 103 - }) 104 - } 105 - 106 - async function allDirect(sql: string, ...params: any[]): Promise<any[]> { 107 - if (params.length === 0) { 108 - const reader = await readCon.runAndReadAll(sql) 109 - return reader.getRowObjects() 34 + await port.commit() 35 + } catch { 36 + await port.rollback() 110 37 } 111 - const prepared = await readCon.prepare(sql) 112 - bindParams(prepared, params) 113 - const reader = await prepared.runAndReadAll() 114 - return reader.getRowObjects() 115 38 } 116 39 117 40 async function all(sql: string, ...params: any[]): Promise<any[]> { 118 - return enqueue('read', () => allDirect(sql, ...params)) 41 + return port.query(sql, params) 119 42 } 120 43 121 44 export async function initDatabase( 45 + adapter: DatabasePort, 122 46 dbPath: string, 123 47 tableSchemas: TableSchema[], 124 48 ddlStatements: string[], 125 49 ): Promise<void> { 126 - instance = await DuckDBInstance.create(dbPath === ':memory:' ? undefined : dbPath) 127 - con = await instance.connect() 128 - readCon = await instance.connect() 50 + port = adapter 51 + dialect = getDialect(adapter.dialect) 52 + 53 + await port.open(dbPath) 129 54 130 55 for (const schema of tableSchemas) { 131 56 schemas.set(schema.collection, schema) 132 57 } 133 58 134 59 for (const ddl of ddlStatements) { 135 - for (const statement of ddl.split(';').filter((s) => s.trim())) { 136 - await run(statement) 137 - } 60 + await port.executeMultiple(ddl) 138 61 } 139 62 140 63 // Internal tables for backfill state ··· 142 65 did TEXT PRIMARY KEY, 143 66 status TEXT NOT NULL DEFAULT 'pending', 144 67 handle TEXT, 145 - backfilled_at TIMESTAMP, 68 + backfilled_at ${dialect.timestampType}, 146 69 rev TEXT, 147 70 retry_count INTEGER NOT NULL DEFAULT 0, 148 71 retry_after INTEGER NOT NULL DEFAULT 0 ··· 161 84 )`) 162 85 163 86 // Labels table (atproto-compatible) 164 - await run(`CREATE SEQUENCE IF NOT EXISTS _labels_seq START 1`) 165 - await run(`CREATE TABLE IF NOT EXISTS _labels ( 166 - id INTEGER PRIMARY KEY DEFAULT nextval('_labels_seq'), 167 - src TEXT NOT NULL, 168 - uri TEXT NOT NULL, 169 - val TEXT NOT NULL, 170 - neg BOOLEAN DEFAULT FALSE, 171 - cts TIMESTAMP NOT NULL, 172 - exp TIMESTAMP 173 - )`) 87 + if (dialect.supportsSequences) { 88 + await run(`CREATE SEQUENCE IF NOT EXISTS _labels_seq START 1`) 89 + await run(`CREATE TABLE IF NOT EXISTS _labels ( 90 + id INTEGER PRIMARY KEY DEFAULT nextval('_labels_seq'), 91 + src TEXT NOT NULL, 92 + uri TEXT NOT NULL, 93 + val TEXT NOT NULL, 94 + neg ${dialect.typeMap.boolean} DEFAULT FALSE, 95 + cts ${dialect.timestampType} NOT NULL, 96 + exp ${dialect.timestampType} 97 + )`) 98 + } else { 99 + await run(`CREATE TABLE IF NOT EXISTS _labels ( 100 + id INTEGER PRIMARY KEY AUTOINCREMENT, 101 + src TEXT NOT NULL, 102 + uri TEXT NOT NULL, 103 + val TEXT NOT NULL, 104 + neg INTEGER DEFAULT 0, 105 + cts TEXT NOT NULL, 106 + exp TEXT 107 + )`) 108 + } 174 109 await run(`CREATE INDEX IF NOT EXISTS idx_labels_uri ON _labels(uri)`) 175 110 await run(`CREATE INDEX IF NOT EXISTS idx_labels_src ON _labels(src)`) 176 111 ··· 178 113 await run(`CREATE TABLE IF NOT EXISTS _preferences ( 179 114 did TEXT NOT NULL, 180 115 key TEXT NOT NULL, 181 - value JSON NOT NULL, 182 - updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, 116 + value ${dialect.jsonType} NOT NULL, 117 + updated_at ${dialect.timestampType} DEFAULT ${dialect.currentTimestamp}, 183 118 
PRIMARY KEY (did, key) 184 119 )`) 185 120 186 121 // OAuth tables 187 - for (const statement of OAUTH_DDL.split(';').filter((s) => s.trim())) { 188 - await run(statement) 189 - } 122 + await port.executeMultiple(OAUTH_DDL) 190 123 } 191 124 192 125 export async function getCursor(key: string): Promise<string | null> { ··· 300 233 params.push(status) 301 234 } 302 235 if (q) { 303 - conditions.push(`(did ILIKE $${paramIdx} OR handle ILIKE $${paramIdx})`) 236 + conditions.push(`(did ${dialect.ilike} $${paramIdx} OR handle ${dialect.ilike} $${paramIdx})`) 304 237 params.push(`%${q}%`) 305 238 paramIdx++ 306 239 } 307 240 308 241 const where = conditions.length ? ' WHERE ' + conditions.join(' AND ') : '' 309 242 310 - const countRows = await all(`SELECT COUNT(*)::INTEGER as total FROM _repos${where}`, ...params) 243 + const countRows = await all(`SELECT ${dialect.countAsInteger} as total FROM _repos${where}`, ...params) 311 244 const total = Number(countRows[0]?.total || 0) 312 245 313 246 const rows = await all( 314 - `SELECT did, handle, status, backfilled_at, rev FROM _repos${where} ORDER BY backfilled_at DESC NULLS LAST, did LIMIT $${paramIdx++} OFFSET $${paramIdx++}`, 247 + `SELECT did, handle, status, backfilled_at, rev FROM _repos${where} ORDER BY CASE WHEN backfilled_at IS NULL THEN 1 ELSE 0 END, backfilled_at DESC, did LIMIT $${paramIdx++} OFFSET $${paramIdx++}`, 315 248 ...params, 316 249 limit, 317 250 offset, ··· 323 256 export async function getCollectionCounts(): Promise<Record<string, number>> { 324 257 const counts: Record<string, number> = {} 325 258 for (const [collection, schema] of schemas) { 326 - const rows = await all(`SELECT COUNT(*)::INTEGER as count FROM ${schema.tableName}`) 259 + const rows = await all(`SELECT ${dialect.countAsInteger} as count FROM ${schema.tableName}`) 327 260 counts[collection] = Number(rows[0]?.count || 0) 328 261 } 329 262 return counts 330 263 } 331 264 332 265 export async function getSchemaDump(): Promise<string> { 333 - const rows = await all(`SELECT sql FROM duckdb_tables() ORDER BY table_name`) 334 - return rows.map((r: any) => r.sql + ';').join('\n\n') 266 + let rows: any[] 267 + if (dialect.supportsSequences) { 268 + // DuckDB: use duckdb_tables() for full DDL 269 + rows = await all(`SELECT sql FROM duckdb_tables() ORDER BY table_name`) 270 + } else { 271 + // SQLite: use sqlite_master, skip FTS shadow/internal tables 272 + rows = await all( 273 + `SELECT sql FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%' AND name NOT LIKE '_fts_%' AND sql IS NOT NULL ORDER BY name`, 274 + ) 275 + } 276 + // Normalize indentation 277 + return rows 278 + .map((r: any) => { 279 + let sql = (r.sql as string).trim() 280 + // Split into lines and re-indent consistently 281 + const lines = sql.split('\n').map((l) => l.trim()) 282 + sql = lines 283 + .map((line, i) => { 284 + if (i === 0) return line // CREATE TABLE line 285 + if (line.startsWith(')')) return ')' // closing paren at top level 286 + return ' ' + line // indent columns 287 + }) 288 + .join('\n') 289 + return sql + ';' 290 + }) 291 + .join('\n\n') 335 292 } 336 293 337 294 export function buildInsertOp( ··· 362 319 363 320 if (rawValue === undefined || rawValue === null) { 364 321 values.push(null) 365 - } else if (col.duckdbType === 'JSON') { 322 + } else if (col.sqlType === 'JSON') { 366 323 values.push(JSON.stringify(rawValue)) 367 324 } else { 368 325 values.push(rawValue) ··· 405 362 const raw = item[col.originalName] 406 363 if (raw === undefined || raw === null) { 407 
364 values.push(null) 408 - } else if (col.duckdbType === 'JSON') { 365 + } else if (col.sqlType === 'JSON') { 409 366 values.push(JSON.stringify(raw)) 410 367 } else { 411 368 values.push(raw) ··· 447 404 const raw = item[col.originalName] 448 405 if (raw === undefined || raw === null) { 449 406 values.push(null) 450 - } else if (col.duckdbType === 'JSON') { 407 + } else if (col.sqlType === 'JSON') { 451 408 values.push(JSON.stringify(raw)) 452 409 } else { 453 410 values.push(raw) ··· 471 428 const raw = branchData[col.originalName] 472 429 if (raw === undefined || raw === null) { 473 430 values.push(null) 474 - } else if (col.duckdbType === 'JSON') { 431 + } else if (col.sqlType === 'JSON') { 475 432 values.push(JSON.stringify(raw)) 476 433 } else { 477 434 values.push(raw) ··· 592 549 'cid TEXT', 593 550 'did TEXT', 594 551 'indexed_at TEXT', 595 - ...schema.columns.map((c) => `${c.name} ${c.duckdbType === 'TIMESTAMP' ? 'TEXT' : c.duckdbType}`), 552 + ...schema.columns.map((c) => { 553 + const t = c.sqlType 554 + // Use TEXT for timestamp columns in staging (will cast on merge) 555 + return `${c.name} ${t === 'TIMESTAMP' || t === 'TIMESTAMPTZ' ? 'TEXT' : t}` 556 + }), 596 557 ] 597 558 598 - // Create staging table + appender + merge all in one write queue slot 599 - await enqueue('write', async () => { 600 - await con.run(`DROP TABLE IF EXISTS ${stagingTable}`) 601 - await con.run(`CREATE TABLE ${stagingTable} (${colDefs.join(', ')})`) 559 + await port.execute(`DROP TABLE IF EXISTS ${stagingTable}`, []) 560 + await port.execute(`CREATE TABLE ${stagingTable} (${colDefs.join(', ')})`, []) 561 + 562 + const inserter = await port.createBulkInserter(stagingTable, allCols) 563 + const now = new Date().toISOString() 564 + 565 + for (const rec of recs) { 566 + try { 567 + const values: unknown[] = [rec.uri, rec.cid, rec.did, now] 568 + 569 + for (const col of schema.columns) { 570 + values.push(resolveColumnValue(col, rec.record)) 571 + } 572 + inserter.append(values) 573 + inserted++ 574 + } catch { 575 + // Skip bad records 576 + } 577 + } 578 + 579 + await inserter.close() 580 + 581 + // Merge into target, filtering rows that would violate NOT NULL 582 + const selectCols = allCols.map((name) => { 583 + const col = schema.columns.find((c) => c.name === name) 584 + if (name === 'indexed_at' || (col && (col.sqlType === 'TIMESTAMP' || col.sqlType === 'TIMESTAMPTZ'))) { 585 + return `${dialect.tryCastTimestamp(name)} AS ${name}` 586 + } 587 + return name 588 + }) 589 + const notNullChecks: string[] = ['uri IS NOT NULL', 'did IS NOT NULL'] 590 + for (const col of schema.columns) { 591 + if (col.notNull) { 592 + if (col.sqlType === 'TIMESTAMP' || col.sqlType === 'TIMESTAMPTZ') { 593 + notNullChecks.push(`${dialect.tryCastTimestamp(col.name)} IS NOT NULL`) 594 + } else { 595 + notNullChecks.push(`${col.name} IS NOT NULL`) 596 + } 597 + } 598 + } 599 + const whereClause = notNullChecks.length ? 
` WHERE ${notNullChecks.join(' AND ')}` : '' 600 + await port.execute( 601 + `INSERT OR REPLACE INTO ${schema.tableName} (${allCols.join(', ')}) SELECT ${selectCols.join(', ')} FROM ${stagingTable}${whereClause}`, 602 + [], 603 + ) 604 + await port.execute(`DROP TABLE ${stagingTable}`, []) 602 605 603 - const appender = await con.createAppender(stagingTable) 604 - const now = new Date().toISOString() 606 + // Populate child tables 607 + for (const child of schema.children) { 608 + const childStagingTable = `_staging_${collection.replace(/\./g, '_')}__${child.fieldName}` 609 + const childColDefs = [ 610 + 'parent_uri TEXT', 611 + 'parent_did TEXT', 612 + ...child.columns.map((c) => { 613 + const t = c.sqlType 614 + return `${c.name} ${t === 'TIMESTAMP' || t === 'TIMESTAMPTZ' ? 'TEXT' : t}` 615 + }), 616 + ] 617 + const childAllCols = ['parent_uri', 'parent_did', ...child.columns.map((c) => c.name)] 605 618 606 - for (const rec of recs) { 607 - try { 608 - appender.appendVarchar(rec.uri) 609 - appender.appendVarchar(rec.cid) 610 - appender.appendVarchar(rec.did) 611 - appender.appendVarchar(now) 619 + await port.execute(`DROP TABLE IF EXISTS ${childStagingTable}`, []) 620 + await port.execute(`CREATE TABLE ${childStagingTable} (${childColDefs.join(', ')})`, []) 612 621 613 - for (const col of schema.columns) { 614 - let rawValue = rec.record[col.originalName] 615 - if (rawValue && typeof rawValue === 'object' && col.name.endsWith('_uri') && col.isRef) { 616 - rawValue = rawValue.uri 617 - } else if (col.originalName.endsWith('__cid') && rec.record[col.originalName.replace('__cid', '')]) { 618 - rawValue = rec.record[col.originalName.replace('__cid', '')].cid 619 - } 622 + const childInserter = await port.createBulkInserter(childStagingTable, childAllCols) 623 + 624 + for (const rec of recs) { 625 + const items = rec.record[child.fieldName] 626 + if (!Array.isArray(items)) continue 620 627 621 - if (rawValue === undefined || rawValue === null) { 622 - appender.appendNull() 623 - } else if (col.duckdbType === 'JSON') { 624 - appender.appendVarchar(JSON.stringify(rawValue)) 625 - } else if (col.duckdbType === 'INTEGER') { 626 - appender.appendInteger(typeof rawValue === 'number' ? 
rawValue : parseInt(rawValue)) 627 - } else if (col.duckdbType === 'BOOLEAN') { 628 - appender.appendBoolean(!!rawValue) 629 - } else { 630 - appender.appendVarchar(String(rawValue)) 628 + for (const item of items) { 629 + try { 630 + const values: unknown[] = [rec.uri, rec.did] 631 + for (const col of child.columns) { 632 + values.push(resolveRawColumnValue(col, item)) 631 633 } 634 + childInserter.append(values) 635 + } catch { 636 + // Skip bad items 632 637 } 633 - appender.endRow() 634 - inserted++ 635 - } catch { 636 - // Skip bad records 637 638 } 638 639 } 639 640 640 - appender.flushSync() 641 - appender.closeSync() 641 + await childInserter.close() 642 642 643 - // Merge into target with TRY_CAST for TIMESTAMP columns, filtering rows that would violate NOT NULL 644 - const selectCols = allCols.map((name) => { 645 - const col = schema.columns.find((c) => c.name === name) 646 - if (name === 'indexed_at' || (col && col.duckdbType === 'TIMESTAMP')) { 647 - return `TRY_CAST(${name} AS TIMESTAMP) AS ${name}` 643 + // Delete existing child rows for these URIs, then merge staging 644 + const uriPlaceholders = recs.map((_, i) => `$${i + 1}`).join(',') 645 + await port.execute( 646 + `DELETE FROM ${child.tableName} WHERE parent_uri IN (${uriPlaceholders})`, 647 + recs.map((r) => r.uri), 648 + ) 649 + 650 + const childSelectCols = childAllCols.map((name) => { 651 + const col = child.columns.find((c) => c.name === name) 652 + if (col && (col.sqlType === 'TIMESTAMP' || col.sqlType === 'TIMESTAMPTZ')) { 653 + return `${dialect.tryCastTimestamp(name)} AS ${name}` 648 654 } 649 655 return name 650 656 }) 651 - // Build WHERE clause to exclude rows missing NOT NULL fields 652 - const notNullChecks: string[] = ['uri IS NOT NULL', 'did IS NOT NULL'] 653 - for (const col of schema.columns) { 654 - if (col.notNull) { 655 - if (col.duckdbType === 'TIMESTAMP') { 656 - notNullChecks.push(`TRY_CAST(${col.name} AS TIMESTAMP) IS NOT NULL`) 657 - } else { 658 - notNullChecks.push(`${col.name} IS NOT NULL`) 659 - } 660 - } 661 - } 662 - const whereClause = notNullChecks.length ? ` WHERE ${notNullChecks.join(' AND ')}` : '' 663 - await con.run( 664 - `INSERT OR REPLACE INTO ${schema.tableName} (${allCols.join(', ')}) SELECT ${selectCols.join(', ')} FROM ${stagingTable}${whereClause}`, 657 + await port.execute( 658 + `INSERT INTO ${child.tableName} (${childAllCols.join(', ')}) SELECT ${childSelectCols.join(', ')} FROM ${childStagingTable} WHERE parent_uri IS NOT NULL`, 659 + [], 665 660 ) 666 - await con.run(`DROP TABLE ${stagingTable}`) 661 + await port.execute(`DROP TABLE ${childStagingTable}`, []) 662 + } 667 663 668 - // Populate child tables 669 - for (const child of schema.children) { 670 - const childStagingTable = `_staging_${collection.replace(/\./g, '_')}__${child.fieldName}` 671 - const childColDefs = [ 664 + // Populate union branch tables 665 + for (const union of schema.unions) { 666 + for (const branch of union.branches) { 667 + const branchStagingTable = `_staging_${collection.replace(/\./g, '_')}__${toSnakeCase(union.fieldName)}_${branch.branchName}` 668 + const branchColDefs = [ 672 669 'parent_uri TEXT', 673 670 'parent_did TEXT', 674 - ...child.columns.map((c) => `${c.name} ${c.duckdbType === 'TIMESTAMP' ? 'TEXT' : c.duckdbType}`), 671 + ...branch.columns.map((c) => { 672 + const t = c.sqlType 673 + return `${c.name} ${t === 'TIMESTAMP' || t === 'TIMESTAMPTZ' ? 
'TEXT' : t}` 674 + }), 675 675 ] 676 - const childAllCols = ['parent_uri', 'parent_did', ...child.columns.map((c) => c.name)] 676 + const branchAllCols = ['parent_uri', 'parent_did', ...branch.columns.map((c) => c.name)] 677 677 678 - await con.run(`DROP TABLE IF EXISTS ${childStagingTable}`) 679 - await con.run(`CREATE TABLE ${childStagingTable} (${childColDefs.join(', ')})`) 678 + await port.execute(`DROP TABLE IF EXISTS ${branchStagingTable}`, []) 679 + await port.execute(`CREATE TABLE ${branchStagingTable} (${branchColDefs.join(', ')})`, []) 680 680 681 - const childAppender = await con.createAppender(childStagingTable) 681 + const branchInserter = await port.createBulkInserter(branchStagingTable, branchAllCols) 682 682 683 683 for (const rec of recs) { 684 - const items = rec.record[child.fieldName] 685 - if (!Array.isArray(items)) continue 684 + const unionValue = rec.record[union.fieldName] 685 + if (!unionValue || typeof unionValue !== 'object') continue 686 + if (unionValue.$type !== branch.type) continue 686 687 687 - for (const item of items) { 688 + if (branch.isArray && branch.arrayField) { 689 + const items = resolveBranchData(unionValue, branch)[branch.arrayField] 690 + if (!Array.isArray(items)) continue 691 + for (const item of items) { 692 + try { 693 + const values: unknown[] = [rec.uri, rec.did] 694 + for (const col of branch.columns) { 695 + values.push(resolveRawColumnValue(col, item)) 696 + } 697 + branchInserter.append(values) 698 + } catch { 699 + // Skip bad items 700 + } 701 + } 702 + } else { 688 703 try { 689 - childAppender.appendVarchar(rec.uri) 690 - childAppender.appendVarchar(rec.did) 691 - 692 - for (const col of child.columns) { 693 - const rawValue = item[col.originalName] 694 - if (rawValue === undefined || rawValue === null) { 695 - childAppender.appendNull() 696 - } else if (col.duckdbType === 'JSON') { 697 - childAppender.appendVarchar(JSON.stringify(rawValue)) 698 - } else if (col.duckdbType === 'INTEGER') { 699 - childAppender.appendInteger(typeof rawValue === 'number' ? 
rawValue : parseInt(rawValue)) 700 - } else if (col.duckdbType === 'BOOLEAN') { 701 - childAppender.appendBoolean(!!rawValue) 702 - } else { 703 - childAppender.appendVarchar(String(rawValue)) 704 - } 704 + const branchData = resolveBranchData(unionValue, branch) 705 + const values: unknown[] = [rec.uri, rec.did] 706 + for (const col of branch.columns) { 707 + values.push(resolveRawColumnValue(col, branchData)) 705 708 } 706 - childAppender.endRow() 709 + branchInserter.append(values) 707 710 } catch { 708 - // Skip bad items 711 + // Skip bad records 709 712 } 710 713 } 711 714 } 712 715 713 - childAppender.flushSync() 714 - childAppender.closeSync() 716 + await branchInserter.close() 715 717 716 - // Delete existing child rows for these URIs, then merge staging 718 + // Delete existing branch rows for these URIs, then merge staging 717 719 const uriPlaceholders = recs.map((_, i) => `$${i + 1}`).join(',') 718 - const delStmt = await con.prepare(`DELETE FROM ${child.tableName} WHERE parent_uri IN (${uriPlaceholders})`) 719 - bindParams( 720 - delStmt, 720 + await port.execute( 721 + `DELETE FROM ${branch.tableName} WHERE parent_uri IN (${uriPlaceholders})`, 721 722 recs.map((r) => r.uri), 722 723 ) 723 - await delStmt.run() 724 724 725 - const childSelectCols = childAllCols.map((name) => { 726 - const col = child.columns.find((c) => c.name === name) 727 - if (col && col.duckdbType === 'TIMESTAMP') return `TRY_CAST(${name} AS TIMESTAMP) AS ${name}` 725 + const branchSelectCols = branchAllCols.map((name) => { 726 + const col = branch.columns.find((c) => c.name === name) 727 + if (col && (col.sqlType === 'TIMESTAMP' || col.sqlType === 'TIMESTAMPTZ')) { 728 + return `${dialect.tryCastTimestamp(name)} AS ${name}` 729 + } 728 730 return name 729 731 }) 730 - await con.run( 731 - `INSERT INTO ${child.tableName} (${childAllCols.join(', ')}) SELECT ${childSelectCols.join(', ')} FROM ${childStagingTable} WHERE parent_uri IS NOT NULL`, 732 + await port.execute( 733 + `INSERT INTO ${branch.tableName} (${branchAllCols.join(', ')}) SELECT ${branchSelectCols.join(', ')} FROM ${branchStagingTable} WHERE parent_uri IS NOT NULL`, 734 + [], 732 735 ) 733 - await con.run(`DROP TABLE ${childStagingTable}`) 736 + await port.execute(`DROP TABLE ${branchStagingTable}`, []) 734 737 } 738 + } 739 + } 735 740 736 - // Populate union branch tables 737 - for (const union of schema.unions) { 738 - for (const branch of union.branches) { 739 - const branchStagingTable = `_staging_${collection.replace(/\./g, '_')}__${toSnakeCase(union.fieldName)}_${branch.branchName}` 740 - const branchColDefs = [ 741 - 'parent_uri TEXT', 742 - 'parent_did TEXT', 743 - ...branch.columns.map((c) => `${c.name} ${c.duckdbType === 'TIMESTAMP' ? 
'TEXT' : c.duckdbType}`), 744 - ] 745 - const branchAllCols = ['parent_uri', 'parent_did', ...branch.columns.map((c) => c.name)] 741 + return inserted 742 + } 746 743 747 - await con.run(`DROP TABLE IF EXISTS ${branchStagingTable}`) 748 - await con.run(`CREATE TABLE ${branchStagingTable} (${branchColDefs.join(', ')})`) 744 + /** Extract a column value from a record, handling strongRef expansion and type coercion for bulk insert */ 745 + function resolveColumnValue(col: { name: string; originalName: string; sqlType: string; isRef: boolean }, record: Record<string, any>): unknown { 746 + let rawValue = record[col.originalName] 747 + if (rawValue && typeof rawValue === 'object' && col.name.endsWith('_uri') && col.isRef) { 748 + rawValue = rawValue.uri 749 + } else if (col.originalName.endsWith('__cid') && record[col.originalName.replace('__cid', '')]) { 750 + rawValue = record[col.originalName.replace('__cid', '')].cid 751 + } 752 + return coerceValue(col.sqlType, rawValue) 753 + } 749 754 750 - const branchAppender = await con.createAppender(branchStagingTable) 755 + /** Extract a raw column value from a data object and coerce for bulk insert */ 756 + function resolveRawColumnValue(col: { originalName: string; sqlType: string }, data: Record<string, any>): unknown { 757 + return coerceValue(col.sqlType, data[col.originalName]) 758 + } 751 759 752 - for (const rec of recs) { 753 - const unionValue = rec.record[union.fieldName] 754 - if (!unionValue || typeof unionValue !== 'object') continue 755 - if (unionValue.$type !== branch.type) continue 756 - 757 - if (branch.isArray && branch.arrayField) { 758 - const items = resolveBranchData(unionValue, branch)[branch.arrayField] 759 - if (!Array.isArray(items)) continue 760 - for (const item of items) { 761 - try { 762 - branchAppender.appendVarchar(rec.uri) 763 - branchAppender.appendVarchar(rec.did) 764 - for (const col of branch.columns) { 765 - const rawValue = item[col.originalName] 766 - if (rawValue === undefined || rawValue === null) { 767 - branchAppender.appendNull() 768 - } else if (col.duckdbType === 'JSON') { 769 - branchAppender.appendVarchar(JSON.stringify(rawValue)) 770 - } else if (col.duckdbType === 'INTEGER') { 771 - branchAppender.appendInteger(typeof rawValue === 'number' ? rawValue : parseInt(rawValue)) 772 - } else if (col.duckdbType === 'BOOLEAN') { 773 - branchAppender.appendBoolean(!!rawValue) 774 - } else { 775 - branchAppender.appendVarchar(String(rawValue)) 776 - } 777 - } 778 - branchAppender.endRow() 779 - } catch { 780 - // Skip bad items 781 - } 782 - } 783 - } else { 784 - try { 785 - const branchData = resolveBranchData(unionValue, branch) 786 - branchAppender.appendVarchar(rec.uri) 787 - branchAppender.appendVarchar(rec.did) 788 - for (const col of branch.columns) { 789 - const rawValue = branchData[col.originalName] 790 - if (rawValue === undefined || rawValue === null) { 791 - branchAppender.appendNull() 792 - } else if (col.duckdbType === 'JSON') { 793 - branchAppender.appendVarchar(JSON.stringify(rawValue)) 794 - } else if (col.duckdbType === 'INTEGER') { 795 - branchAppender.appendInteger(typeof rawValue === 'number' ? 
rawValue : parseInt(rawValue)) 796 - } else if (col.duckdbType === 'BOOLEAN') { 797 - branchAppender.appendBoolean(!!rawValue) 798 - } else { 799 - branchAppender.appendVarchar(String(rawValue)) 800 - } 801 - } 802 - branchAppender.endRow() 803 - } catch { 804 - // Skip bad records 805 - } 806 - } 807 - } 808 - 809 - branchAppender.flushSync() 810 - branchAppender.closeSync() 811 - 812 - // Delete existing branch rows for these URIs, then merge staging 813 - const uriPlaceholders = recs.map((_, i) => `$${i + 1}`).join(',') 814 - const delStmt = await con.prepare(`DELETE FROM ${branch.tableName} WHERE parent_uri IN (${uriPlaceholders})`) 815 - bindParams( 816 - delStmt, 817 - recs.map((r) => r.uri), 818 - ) 819 - await delStmt.run() 820 - 821 - const branchSelectCols = branchAllCols.map((name) => { 822 - const col = branch.columns.find((c) => c.name === name) 823 - if (col && col.duckdbType === 'TIMESTAMP') return `TRY_CAST(${name} AS TIMESTAMP) AS ${name}` 824 - return name 825 - }) 826 - await con.run( 827 - `INSERT INTO ${branch.tableName} (${branchAllCols.join(', ')}) SELECT ${branchSelectCols.join(', ')} FROM ${branchStagingTable} WHERE parent_uri IS NOT NULL`, 828 - ) 829 - await con.run(`DROP TABLE ${branchStagingTable}`) 830 - } 831 - } 832 - }) 760 + /** Coerce a value to the appropriate type for insertion */ 761 + function coerceValue(sqlType: string, rawValue: any): unknown { 762 + if (rawValue === undefined || rawValue === null) return null 763 + // Objects and arrays always need JSON stringification regardless of sqlType 764 + // (on SQLite, JSON columns map to TEXT but still need stringification) 765 + if (typeof rawValue === 'object' && !(rawValue instanceof Uint8Array)) { 766 + return JSON.stringify(rawValue) 767 + } 768 + if (sqlType === 'JSON' || sqlType === 'TEXT') { 769 + return String(rawValue) 770 + } 771 + if (sqlType === 'INTEGER' || sqlType === 'BIGINT') { 772 + return typeof rawValue === 'number' ? rawValue : parseInt(rawValue) 833 773 } 834 - 835 - return inserted 774 + if (sqlType === 'BOOLEAN') return !!rawValue 775 + return String(rawValue) 836 776 } 837 777 838 778 interface QueryOpts { ··· 1030 970 1031 971 const elapsed = timer() 1032 972 const { limit = 20, cursor, fuzzy = true } = opts 1033 - const textCols = schema.columns.filter((c) => c.duckdbType === 'TEXT') 973 + const textCols = schema.columns.filter((c) => c.sqlType === 'TEXT') 1034 974 1035 975 // Also check if FTS has indexed any columns (including derived JSON columns) 1036 976 const ftsSearchCols = getSearchColumns(collection) ··· 1045 985 const phaseErrors: string[] = [] 1046 986 const phasesUsed: string[] = [] 1047 987 1048 - // Phase 1: BM25 ranked search on FTS shadow table 988 + // Phase 1: BM25 ranked search via SearchPort 1049 989 let bm25Results: any[] = [] 1050 - try { 1051 - let paramIdx = 1 1052 - 990 + const sp = getSearchPort() 991 + if (sp) try { 1053 992 const ftsQuery = stripStopWords(query) 1054 - const isMultiWord = ftsQuery.split(/\s+/).length > 1 1055 - const conjunctiveFlag = isMultiWord ? 
', conjunctive := 1' : '' 1056 - let sql = `SELECT m.*, ${ftsSchema}.match_bm25(s.uri, $${paramIdx++}${conjunctiveFlag}) AS score 1057 - FROM ${safeName} s 1058 - JOIN ${schema.tableName} m ON m.uri = s.uri 1059 - LEFT JOIN _repos r ON m.did = r.did 1060 - WHERE score IS NOT NULL 1061 - AND (r.status IS NULL OR r.status != 'takendown')` 993 + const ftsSearchColNames = getSearchColumns(collection) 1062 994 1063 - const params: any[] = [ftsQuery] 1064 - 1065 - if (cursor) { 1066 - const parsed = unpackCursor(cursor) 1067 - if (parsed) { 1068 - const pScore1 = `$${paramIdx++}` 1069 - const pScore2 = `$${paramIdx++}` 1070 - const pCid = `$${paramIdx++}` 1071 - sql += ` AND (score > ${pScore1} OR (score = ${pScore2} AND m.cid < ${pCid}))` 1072 - params.push(parsed.primary, parsed.primary, parsed.cid) 1073 - } 1074 - } 995 + // Get ranked URIs from the search port 996 + const hits = await sp.search(safeName, ftsQuery, ftsSearchColNames, limit + 1, 0) 997 + if (hits.length > 0) { 998 + const uriList = hits.map((h) => h.uri) 999 + const scoreMap = new Map(hits.map((h) => [h.uri, h.score])) 1075 1000 1076 - sql += ` ORDER BY score, m.cid DESC LIMIT $${paramIdx++}` 1077 - params.push(limit + 1) 1001 + // Fetch full records for matched URIs 1002 + const placeholders = uriList.map((_, i) => `$${i + 1}`).join(', ') 1003 + const rows = await all( 1004 + `SELECT m.* FROM ${schema.tableName} m 1005 + LEFT JOIN _repos r ON m.did = r.did 1006 + WHERE m.uri IN (${placeholders}) 1007 + AND (r.status IS NULL OR r.status != 'takendown')`, 1008 + ...uriList, 1009 + ) 1078 1010 1079 - bm25Results = await all(sql, ...params) 1011 + // Re-attach scores and sort 1012 + bm25Results = rows 1013 + .map((r: any) => ({ ...r, score: scoreMap.get(r.uri) ?? 0 })) 1014 + .sort((a: any, b: any) => b.score - a.score) 1015 + } 1080 1016 phasesUsed.push('bm25') 1081 1017 } catch (err: any) { 1082 1018 phaseErrors.push(`bm25: ${err.message}`) ··· 1095 1031 const ilikeConds: string[] = [] 1096 1032 const params: any[] = [] 1097 1033 1098 - // TEXT columns — direct ILIKE 1034 + // TEXT columns — direct ILIKE/LIKE 1099 1035 for (const c of textCols) { 1100 - ilikeConds.push(`t.${c.name} ILIKE $${paramIdx++}`) 1036 + ilikeConds.push(`t.${c.name} ${dialect.ilike} $${paramIdx++}`) 1101 1037 params.push(searchParam) 1102 1038 } 1103 1039 1104 - // JSON columns — cast to text then ILIKE 1105 - const jsonCols = schema.columns.filter((c) => c.duckdbType === 'JSON') 1040 + // JSON columns — cast to text then ILIKE/LIKE 1041 + const jsonCols = schema.columns.filter((c) => c.sqlType === 'JSON' || c.sqlType === 'TEXT') 1106 1042 for (const c of jsonCols) { 1107 - ilikeConds.push(`CAST(t.${c.name} AS TEXT) ILIKE $${paramIdx++}`) 1043 + if (textCols.some((tc) => tc.name === c.name)) continue // skip already-added TEXT cols 1044 + ilikeConds.push(`CAST(t.${c.name} AS TEXT) ${dialect.ilike} $${paramIdx++}`) 1108 1045 params.push(searchParam) 1109 1046 } 1110 1047 1111 1048 // Handle from _repos table 1112 - ilikeConds.push(`r.handle ILIKE $${paramIdx++}`) 1049 + ilikeConds.push(`r.handle ${dialect.ilike} $${paramIdx++}`) 1113 1050 params.push(searchParam) 1114 1051 1115 1052 if (ilikeConds.length > 0) { ··· 1148 1085 const remaining = limit - bm25Results.length 1149 1086 const searchParam = `%${query}%` 1150 1087 let paramIdx = 1 1151 - const ilikeParts = textCols.map((c) => `t.${c.name} ILIKE $${paramIdx++}`) 1152 - ilikeParts.push(`r.handle ILIKE $${paramIdx++}`) 1088 + const ilikeParts = textCols.map((c) => `t.${c.name} ${dialect.ilike} 
$${paramIdx++}`) 1089 + ilikeParts.push(`r.handle ${dialect.ilike} $${paramIdx++}`) 1153 1090 const ilikeConds = ilikeParts.join(' OR ') 1154 1091 const params: any[] = [...textCols.map(() => searchParam), searchParam] 1155 1092 ··· 1176 1113 } 1177 1114 1178 1115 // Phase 4: Fuzzy fallback for typo tolerance (if still under limit) 1116 + // Only available on dialects with jaro_winkler_similarity (DuckDB) 1179 1117 let fuzzyCount = 0 1180 - if (fuzzy && bm25Results.length < limit) { 1118 + if (fuzzy && dialect.jaroWinklerSimilarity && bm25Results.length < limit) { 1181 1119 const remaining = limit - bm25Results.length 1120 + const jwFn = dialect.jaroWinklerSimilarity 1182 1121 const simExprs = [ 1183 - ...textCols.map((c) => `jaro_winkler_similarity(lower(t.${c.name}), lower($1))`), 1184 - `jaro_winkler_similarity(lower(r.handle), lower($1))`, 1122 + ...textCols.map((c) => `${jwFn}(lower(t.${c.name}), lower($1))`), 1123 + `${jwFn}(lower(r.handle), lower($1))`, 1185 1124 ] 1186 1125 // Include child table TEXT columns via correlated subquery 1187 1126 for (const child of schema.children) { 1188 1127 for (const col of child.columns) { 1189 - if (col.duckdbType === 'TEXT') { 1128 + if (col.sqlType === 'TEXT') { 1190 1129 simExprs.push( 1191 - `COALESCE((SELECT MAX(jaro_winkler_similarity(lower(c.${col.name}), lower($1))) FROM ${child.tableName} c WHERE c.parent_uri = t.uri), 0)`, 1130 + `COALESCE((SELECT MAX(${jwFn}(lower(c.${col.name}), lower($1))) FROM ${child.tableName} c WHERE c.parent_uri = t.uri), 0)`, 1192 1131 ) 1193 1132 } 1194 1133 } 1195 1134 } 1196 - const greatestExpr = `GREATEST(${simExprs.join(', ')})` 1135 + const greatestExpr = dialect.greatest(simExprs) 1197 1136 const fuzzySQL = `SELECT t.*, ${greatestExpr} AS fuzzy_score 1198 1137 FROM ${schema.tableName} t LEFT JOIN _repos r ON t.did = r.did 1199 1138 WHERE ${greatestExpr} >= 0.8 ··· 1383 1322 if (schema) { 1384 1323 for (const col of schema.columns) { 1385 1324 nameMap.set(col.name, col.originalName) 1386 - if (col.duckdbType === 'JSON') jsonCols.add(col.name) 1325 + if (col.sqlType === 'JSON') jsonCols.add(col.name) 1387 1326 } 1388 1327 } 1389 1328 ··· 1496 1435 1497 1436 export async function searchAccounts(query: string, limit: number = 20): Promise<any[]> { 1498 1437 return all( 1499 - `SELECT did, handle, status FROM _repos WHERE did ILIKE $1 OR handle ILIKE $1 ORDER BY handle LIMIT $2`, 1438 + `SELECT did, handle, status FROM _repos WHERE did ${dialect.ilike} $1 OR handle ${dialect.ilike} $1 ORDER BY handle LIMIT $2`, 1500 1439 `%${query}%`, 1501 1440 limit, 1502 1441 )
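The db.ts changes above lean on a handful of `SqlDialect` members (`ilike`, `countAsInteger`, `supportsSequences`, `tryCastTimestamp`, `greatest`, `jaroWinklerSimilarity`) whose shape is only implied by the call sites. A minimal sketch of that subset, inferred from usage; the actual `database/dialect.ts` may name or structure things differently:

```typescript
// Sketch of the SqlDialect subset exercised by db.ts above. Inferred from
// call sites; anything beyond those usages is an assumption.
export interface SqlDialectSketch {
  ilike: string                         // 'ILIKE' on DuckDB, 'LIKE' on SQLite
  countAsInteger: string                // 'COUNT(*)::INTEGER' vs plain 'COUNT(*)'
  supportsSequences: boolean            // gates DuckDB-only introspection paths
  jaroWinklerSimilarity: string | null  // fuzzy-match function, null if unavailable
  tryCastTimestamp(col: string): string // cast that yields NULL instead of erroring
  greatest(exprs: string[]): string     // row-wise max across expressions
}

export const duckdbSketch: SqlDialectSketch = {
  ilike: 'ILIKE',
  countAsInteger: 'COUNT(*)::INTEGER',
  supportsSequences: true,
  jaroWinklerSimilarity: 'jaro_winkler_similarity',
  tryCastTimestamp: (col) => `TRY_CAST(${col} AS TIMESTAMP)`,
  greatest: (exprs) => `GREATEST(${exprs.join(', ')})`,
}

export const sqliteSketch: SqlDialectSketch = {
  ilike: 'LIKE', // SQLite LIKE is case-insensitive for ASCII by default
  countAsInteger: 'COUNT(*)',
  supportsSequences: false,
  jaroWinklerSimilarity: null, // no built-in; the fuzzy phase is skipped
  tryCastTimestamp: (col) => `CASE WHEN datetime(${col}) IS NOT NULL THEN ${col} END`,
  greatest: (exprs) => `MAX(${exprs.join(', ')})`, // SQLite scalar MAX is variadic
}
```

Keeping these as data plus a couple of string builders keeps db.ts free of `if (engine === ...)` branches; the `ORDER BY CASE WHEN backfilled_at IS NULL THEN 1 ELSE 0 END` rewrite above is the same idea applied inline to stay portable where `NULLS LAST` isn't available.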
+1 -1
packages/hatk/src/feeds.ts
··· 1 1 import { resolve } from 'node:path' 2 2 import { readdirSync } from 'node:fs' 3 3 import { log } from './logger.ts' 4 - import { querySQL, packCursor, unpackCursor, isTakendownDid, filterTakendownDids } from './db.ts' 4 + import { querySQL, packCursor, unpackCursor, isTakendownDid, filterTakendownDids } from './database/db.ts' 5 5 import { resolveRecords, buildHydrateContext } from './hydrate.ts' 6 6 import type { HydrateContext, Row } from './hydrate.ts' 7 7 import type { Checked } from './lex-types.ts'
+39 -30
packages/hatk/src/fts.ts → packages/hatk/src/database/fts.ts
··· 1 - import { getSchema, runSQL } from './db.ts' 1 + import { getSchema, runSQL, getSqlDialect } from './db.ts' 2 2 import { getLexicon } from './schema.ts' 3 - import { emit, timer } from './logger.ts' 3 + import { emit, timer } from '../logger.ts' 4 + import type { SearchPort } from './ports.ts' 4 5 5 6 interface SearchColumn { 6 7 expr: string // SQL expression for the shadow table SELECT ··· 21 22 * Given a JSON column and its lexicon property definition, produce 22 23 * search column expressions that extract searchable text. 23 24 */ 24 - function jsonSearchColumns(colName: string, prop: any, lexicon: any): SearchColumn[] { 25 + function jsonSearchColumns(colName: string, prop: any, lexicon: any, dialect: import('./dialect.ts').SqlDialect): SearchColumn[] { 25 26 const columns: SearchColumn[] = [] 26 27 // Strip table qualifier (e.g. "t.artists" → "artists") for use in aliases 27 28 const aliasBase = colName.includes('.') ? colName.split('.').pop()! : colName ··· 34 35 if (itemDef.type === 'string') { 35 36 // array of strings — join into one text column 36 37 columns.push({ 37 - expr: `list_string_agg(json_extract_string(${colName}, '$[*]'))`, 38 + expr: dialect.jsonArrayStringAgg(colName, '$[*]'), 38 39 alias: `${aliasBase}_text`, 39 40 }) 40 41 } else if (itemDef.type === 'object' && itemDef.properties) { ··· 42 43 for (const [field, fieldProp] of Object.entries(itemDef.properties as Record<string, any>)) { 43 44 if (fieldProp.type === 'string') { 44 45 columns.push({ 45 - expr: `list_string_agg(json_extract_string(${colName}, '$[*].${field}'))`, 46 + expr: dialect.jsonArrayStringAgg(colName, `$[*].${field}`), 46 47 alias: `${aliasBase}_${field}`, 47 48 }) 48 49 } ··· 53 54 for (const [field, fieldProp] of Object.entries(prop.properties as Record<string, any>)) { 54 55 if ((fieldProp as any).type === 'string') { 55 56 columns.push({ 56 - expr: `json_extract_string(${colName}, '$.${field}')`, 57 + expr: dialect.jsonExtractString(colName, `$.${field}`), 57 58 alias: `${aliasBase}_${field}`, 58 59 }) 59 60 } ··· 64 65 return columns 65 66 } 66 67 68 + let searchPort: SearchPort | null = null 69 + 70 + export function setSearchPort(port: SearchPort | null): void { 71 + searchPort = port 72 + } 73 + 74 + export function hasSearchPort(): boolean { 75 + return searchPort !== null 76 + } 77 + 78 + export function getSearchPort(): SearchPort | null { 79 + return searchPort 80 + } 81 + 67 82 // Tracks when each collection's FTS index was last rebuilt 68 83 const lastRebuiltAt = new Map<string, string>() 69 84 ··· 92 107 * using Porter stemmer with English stopwords. 
93 108 */ 94 109 export async function buildFtsIndex(collection: string): Promise<void> { 110 + if (!searchPort) return // No FTS support for this adapter 111 + 95 112 const schema = getSchema(collection) 96 113 if (!schema) throw new Error(`Unknown collection: ${collection}`) 97 114 ··· 99 116 const record = lexicon?.defs?.main?.record 100 117 101 118 // Build column list for shadow table 119 + const dialect = getSqlDialect() 102 120 const selectExprs: string[] = ['t.uri', 't.cid', 't.did', 't.indexed_at'] 103 121 const searchColNames: string[] = [] 104 122 105 123 for (const col of schema.columns) { 106 - if (col.duckdbType === 'TEXT') { 124 + if (col.sqlType === 'TEXT') { 107 125 selectExprs.push(`t.${col.name}`) 108 126 searchColNames.push(col.name) 109 - } else if (col.duckdbType === 'JSON' && record?.properties) { 127 + } else if ((col.sqlType === 'JSON' || col.sqlType === 'TEXT') && record?.properties) { 110 128 const prop = record.properties[col.originalName] 111 129 if (prop?.type === 'blob') continue // skip blobs 112 130 if (prop && lexicon) { 113 - const derived = jsonSearchColumns(`t.${col.name}`, prop, lexicon) 131 + const derived = jsonSearchColumns(`t.${col.name}`, prop, lexicon, dialect) 114 132 if (derived.length > 0) { 115 133 for (const d of derived) { 116 134 selectExprs.push(`${d.expr} AS ${d.alias}`) ··· 128 146 // Include searchable text from child tables (decomposed array fields) 129 147 for (const child of schema.children) { 130 148 for (const col of child.columns) { 131 - if (col.duckdbType === 'TEXT') { 149 + if (col.sqlType === 'TEXT') { 132 150 const alias = `${child.fieldName}_${col.name}` 151 + const agg = dialect.stringAgg(`c.${col.name}`, "' '") 133 152 selectExprs.push( 134 - `(SELECT string_agg(c.${col.name}, ' ') FROM ${child.tableName} c WHERE c.parent_uri = t.uri) AS ${alias}`, 153 + `(SELECT ${agg} FROM ${child.tableName} c WHERE c.parent_uri = t.uri) AS ${alias}`, 135 154 ) 136 155 searchColNames.push(alias) 137 156 } ··· 142 161 for (const union of schema.unions) { 143 162 for (const branch of union.branches) { 144 163 for (const col of branch.columns) { 145 - if (col.duckdbType === 'TEXT') { 164 + if (col.sqlType === 'TEXT') { 146 165 const alias = `${union.fieldName}_${branch.branchName}_${col.name}` 166 + const agg = dialect.stringAgg(`c.${col.name}`, "' '") 147 167 selectExprs.push( 148 - `(SELECT string_agg(c.${col.name}, ' ') FROM ${branch.tableName} c WHERE c.parent_uri = t.uri) AS ${alias}`, 168 + `(SELECT ${agg} FROM ${branch.tableName} c WHERE c.parent_uri = t.uri) AS ${alias}`, 149 169 ) 150 170 searchColNames.push(alias) 151 171 } ··· 162 182 } 163 183 164 184 const safeName = ftsTableName(collection) 185 + const sourceQuery = `SELECT ${selectExprs.join(', ')} FROM ${schema.tableName} t LEFT JOIN _repos r ON t.did = r.did` 165 186 166 - // Build shadow table with derived text columns, joining _repos for handle 167 - await runSQL( 168 - `CREATE OR REPLACE TABLE ${safeName} AS SELECT ${selectExprs.join(', ')} FROM ${schema.tableName} t LEFT JOIN _repos r ON t.did = r.did`, 169 - ) 170 - 171 - // Drop existing index (ignore error if none exists) 172 - try { 173 - await runSQL(`PRAGMA drop_fts_index('${safeName}')`) 174 - } catch {} 175 - 176 - // Build FTS index over all search columns 177 - const colList = searchColNames.map((c) => `'${c}'`).join(', ') 178 - await runSQL( 179 - `PRAGMA create_fts_index('${safeName}', 'uri', ${colList}, stemmer='porter', stopwords='english', strip_accents=1, lower=1, overwrite=1)`, 180 - ) 187 + await 
searchPort.buildIndex(safeName, sourceQuery, searchColNames) 181 188 182 189 searchColumnCache.set(collection, searchColNames) 183 190 lastRebuiltAt.set(collection, new Date().toISOString()) ··· 785 792 } 786 793 } 787 794 788 - // Compact WAL to free DuckDB memory after heavy FTS operations 795 + // Compact WAL to free memory after heavy FTS operations (DuckDB only) 789 796 try { 790 - await runSQL('CHECKPOINT') 797 + const { getSqlDialect } = await import('./db.ts') 798 + const d = getSqlDialect() 799 + if (d.checkpointSQL) await runSQL(d.checkpointSQL) 791 800 } catch {} 792 801 793 802 emit('fts', 'rebuild', {
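From the call sites above, the `SearchPort` contract reduces to two operations: build an index over a materialized source query, and return ranked `{ uri, score }` hits. A sketch consistent with those usages; the real interface in `database/ports.ts` may carry more:

```typescript
// SearchPort as implied by buildFtsIndex() here and the BM25 phase in db.ts.
// Method names and shapes are taken from the call sites; the rest is assumed.
export interface SearchHit {
  uri: string
  score: number // BM25 score (DuckDB FTS) or a rank the adapter normalizes
}

export interface SearchPortSketch {
  /** Materialize sourceQuery into `table` and index searchCols for ranked search */
  buildIndex(table: string, sourceQuery: string, searchCols: string[]): Promise<void>
  /** Ranked hits for `query` against a previously built index */
  search(table: string, query: string, searchCols: string[], limit: number, offset: number): Promise<SearchHit[]>
}
```

When `setSearchPort(null)` is in effect, `buildFtsIndex` returns early and search degrades to the `ILIKE`/`LIKE` phases in db.ts, so an adapter without FTS still answers queries, just without BM25 ranking.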
+1 -1
packages/hatk/src/hooks.ts
··· 22 22 import { existsSync } from 'node:fs' 23 23 import { resolve } from 'node:path' 24 24 import { log } from './logger.ts' 25 - import { setRepoStatus } from './db.ts' 25 + import { setRepoStatus } from './database/db.ts' 26 26 import { triggerAutoBackfill } from './indexer.ts' 27 27 28 28 /** Context passed to the on-login hook after a successful OAuth login. */
+1 -1
packages/hatk/src/hydrate.ts
··· 6 6 reshapeRow, 7 7 queryLabelsForUris, 8 8 filterTakendownDids, 9 - } from './db.ts' 9 + } from './database/db.ts' 10 10 import { blobUrl } from './xrpc.ts' 11 11 import type { Row } from './lex-types.ts' 12 12
+3 -3
packages/hatk/src/indexer.ts
··· 1 1 import { cborDecode } from './cbor.ts' 2 2 import { parseCarFrame } from './car.ts' 3 - import { insertRecord, deleteRecord, setCursor, setRepoStatus, getRepoRetryInfo, listAllRepoStatuses } from './db.ts' 3 + import { insertRecord, deleteRecord, setCursor, setRepoStatus, getRepoRetryInfo, listAllRepoStatuses } from './database/db.ts' 4 4 import { backfillRepo } from './backfill.ts' 5 - import { rebuildAllIndexes } from './fts.ts' 5 + import { rebuildAllIndexes } from './database/fts.ts' 6 6 import { log, emit, timer } from './logger.ts' 7 7 import { runLabelRules } from './labels.ts' 8 - import { getLexiconArray } from './schema.ts' 8 + import { getLexiconArray } from './database/schema.ts' 9 9 import { validateRecord } from '@bigmoves/lexicon' 10 10 11 11 /** A record pending insertion, buffered to enable batched writes. */
+2 -2
packages/hatk/src/labels.ts
··· 29 29 import { resolve } from 'node:path' 30 30 import { readdirSync } from 'node:fs' 31 31 import type { LabelDefinition } from './config.ts' 32 - import { querySQL, runSQL, insertLabels, getSchema } from './db.ts' 32 + import { querySQL, runSQL, insertLabels, getSchema } from './database/db.ts' 33 33 import { log, emit } from './logger.ts' 34 34 35 35 /** Context passed to label rule evaluate() functions */ ··· 163 163 for (const col of schema.columns) { 164 164 let v = row[col.name] 165 165 if (v === null || v === undefined) continue 166 - if (col.duckdbType === 'JSON' && typeof v === 'string') { 166 + if (col.sqlType === 'JSON' && typeof v === 'string') { 167 167 try { 168 168 v = JSON.parse(v) 169 169 } catch {}
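The `JSON.parse` guard above is the read-side half of a round trip: `coerceValue` in `database/db.ts` stringifies objects on write, and SQLite hands them back as plain `TEXT`. A hypothetical illustration (the record field is made up):

```typescript
// Write side: coerceValue stringifies objects regardless of declared type,
// since SQLite's "JSON" columns are TEXT. Read side: parse-if-string.
const record = { tags: ['sqlite', 'duckdb'] } // hypothetical record field
const stored = JSON.stringify(record.tags)    // what lands in the column
const fetched: unknown = stored               // what a SELECT returns on SQLite
const tags = typeof fetched === 'string' ? JSON.parse(fetched) : fetched
// DuckDB may also surface JSON columns as strings, so the typeof check is
// cheap insurance on both engines.
```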
+27 -11
packages/hatk/src/main.ts
··· 1 1 #!/usr/bin/env node 2 - import { mkdirSync } from 'node:fs' 2 + import { mkdirSync, writeFileSync } from 'node:fs' 3 3 import { dirname, resolve } from 'node:path' 4 4 import { log } from './logger.ts' 5 5 import { loadConfig } from './config.ts' ··· 8 8 storeLexicons, 9 9 discoverCollections, 10 10 buildSchemas, 11 - } from './schema.ts' 11 + } from './database/schema.ts' 12 12 import { discoverViews } from './views.ts' 13 - import { initDatabase, getCursor, querySQL } from './db.ts' 13 + import { initDatabase, getCursor, querySQL, getSqlDialect, getSchemaDump } from './database/db.ts' 14 + import { createAdapter } from './database/adapter-factory.ts' 15 + import { getDialect } from './database/dialect.ts' 16 + import { setSearchPort } from './database/fts.ts' 14 17 import { initFeeds, listFeeds } from './feeds.ts' 15 18 import { initXrpc, listXrpc, configureRelay } from './xrpc.ts' 16 19 import { initOpengraph } from './opengraph.ts' 17 20 import { initLabels, getLabelDefinitions } from './labels.ts' 18 21 import { startIndexer } from './indexer.ts' 19 - import { rebuildAllIndexes } from './fts.ts' 22 + import { rebuildAllIndexes } from './database/fts.ts' 20 23 import { startServer } from './server.ts' 21 24 import { validateLexicons } from '@bigmoves/lexicon' 22 25 import { relayHttpUrl } from './config.ts' ··· 67 70 discoverViews() 68 71 await loadOnLoginHook(resolve(configDir, 'hooks')) 69 72 70 - const { schemas, ddlStatements } = buildSchemas(lexicons, collections) 73 + const engineDialect = getDialect(config.databaseEngine) 74 + const { schemas, ddlStatements } = buildSchemas(lexicons, collections, engineDialect) 71 75 for (const s of schemas) { 72 76 if (s.columns.length === 0) { 73 77 log(`[main] No lexicon found for ${s.collection}, using generic JSON storage`) ··· 76 80 } 77 81 } 78 82 79 - // 3. Ensure data directory exists and initialize DuckDB 83 + // 3. Ensure data directory exists and initialize database 80 84 if (config.database !== ':memory:') { 81 85 mkdirSync(dirname(config.database), { recursive: true }) 82 86 } 83 - await initDatabase(config.database, schemas, ddlStatements) 87 + const { adapter, searchPort } = await createAdapter(config.databaseEngine) 88 + setSearchPort(searchPort) 89 + await initDatabase(adapter, config.database, schemas, ddlStatements) 84 90 logMemory('after-db-init') 85 - log(`[main] DuckDB initialized (${config.database === ':memory:' ? 'in-memory' : config.database})`) 91 + log(`[main] Database initialized (${config.databaseEngine}, ${config.database === ':memory:' ? 'in-memory' : config.database})`) 92 + 93 + // Write db/schema.sql 94 + try { 95 + const schemaDir = resolve(configDir, 'db') 96 + mkdirSync(schemaDir, { recursive: true }) 97 + const schemaDump = await getSchemaDump() 98 + writeFileSync( 99 + resolve(schemaDir, 'schema.sql'), 100 + `-- This file is auto-generated by hatk on startup. Do not edit.\n-- Database engine: ${config.databaseEngine}\n\n${schemaDump}\n`, 101 + ) 102 + log(`[main] Schema written to db/schema.sql`) 103 + } catch {} 86 104 87 105 // 3b. 
Run setup hooks (after DB init, before server) 88 106 await initSetup(resolve(configDir, 'setup')) 89 107 90 108 // Detect orphaned tables 91 109 try { 92 - const existingTables = await querySQL( 93 - `SELECT table_name FROM information_schema.tables WHERE table_schema = 'main' AND table_name NOT LIKE '\\_%' ESCAPE '\\'`, 94 - ) 110 + const existingTables = await querySQL(getSqlDialect().listTablesQuery) 95 111 for (const row of existingTables) { 96 112 const tableName = row.table_name 97 113 const isChildTable = collections.some((c) => tableName.startsWith(c + '__'))
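Condensed, the new boot sequence in main.ts is: resolve the dialect, build dialect-aware schemas, create the adapter, register its search port, then hand everything to `initDatabase`. A sketch assuming `lexicons` and `collections` are already loaded, as main.ts does in its earlier steps:

```typescript
import { createAdapter } from './database/adapter-factory.ts'
import { getDialect } from './database/dialect.ts'
import { setSearchPort } from './database/fts.ts'
import { initDatabase } from './database/db.ts'
import { buildSchemas } from './database/schema.ts'

async function boot(
  engine: 'duckdb' | 'sqlite',
  dbPath: string,
  lexicons: Map<string, any>,
  collections: string[],
): Promise<void> {
  const dialect = getDialect(engine)
  const { schemas, ddlStatements } = buildSchemas(lexicons, collections, dialect)
  // Adapter factory uses dynamic imports, so the unused engine stays out of the bundle
  const { adapter, searchPort } = await createAdapter(engine)
  setSearchPort(searchPort) // may be null for adapters without FTS
  await initDatabase(adapter, dbPath, schemas, ddlStatements)
}
```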
+1 -1
packages/hatk/src/oauth/db.ts
··· 1 1 // packages/hatk/src/oauth/db.ts 2 2 3 - import { querySQL, runSQL } from '../db.ts' 3 + import { querySQL, runSQL } from '../database/db.ts' 4 4 5 5 // --- DDL --- 6 6
+3 -3
packages/hatk/src/oauth/server.ts
··· 32 32 revokeRefreshToken, 33 33 } from './db.ts' 34 34 import { emit } from '../logger.ts' 35 - import { querySQL } from '../db.ts' 35 + import { querySQL } from '../database/db.ts' 36 36 import { fireOnLoginHook } from '../hooks.ts' 37 37 38 38 const SERVER_KEY_KID = 'appview-oauth-key' ··· 328 328 iss: string | null, 329 329 ): Promise<{ requestUri: string; clientRedirectUri: string; clientState: string | null }> { 330 330 // Find the matching OAuth request by pds_state (unique per PAR) 331 - const { querySQL } = await import('../db.ts') 331 + const { querySQL } = await import('../database/db.ts') 332 332 let request: any = null 333 333 334 334 if (state) { ··· 437 437 438 438 // Update the request with the DID (in case it wasn't set during PAR) 439 439 if (!request.did && did) { 440 - const { runSQL } = await import('../db.ts') 440 + const { runSQL } = await import('../database/db.ts') 441 441 await runSQL('UPDATE _oauth_requests SET did = $1 WHERE request_uri = $2', did, request.request_uri) 442 442 } 443 443
+1 -1
packages/hatk/src/opengraph.ts
··· 15 15 lookupByFieldBatch, 16 16 countByFieldBatch, 17 17 queryLabelsForUris, 18 - } from './db.ts' 18 + } from './database/db.ts' 19 19 import { resolveRecords } from './hydrate.ts' 20 20 import { blobUrl } from './xrpc.ts' 21 21 import type { XrpcContext } from './xrpc.ts'
+48 -44
packages/hatk/src/schema.ts → packages/hatk/src/database/schema.ts
packages/hatk/src/schema.ts → packages/hatk/src/database/schema.ts
··· 1 1 import { readFileSync, readdirSync, statSync } from 'node:fs' 2 2 import { join } from 'node:path' 3 + import type { SqlDialect } from './dialect.ts' 4 + import { DUCKDB_DIALECT } from './dialect.ts' 3 5 4 6 export interface ColumnDef { 5 7 name: string // snake_case column name 6 8 originalName: string // camelCase lexicon field name 7 - duckdbType: string // DuckDB type 9 + sqlType: string // DuckDB type 8 10 notNull: boolean 9 11 isRef: boolean // true if this column holds an AT URI referencing another record 10 12 } ··· 45 47 return str.replace(/([A-Z])/g, '_$1').toLowerCase() 46 48 } 47 49 48 - // Map lexicon property type to DuckDB type 50 + // Map lexicon property type to SQL type using dialect config 49 51 interface TypeMapping { 50 - duckdbType: string 52 + sqlType: string 51 53 isRef: boolean 52 54 } 53 55 54 - function mapType(prop: any): TypeMapping { 56 + function mapType(prop: any, dialect: SqlDialect): TypeMapping { 55 57 if (prop.type === 'string') { 56 - if (prop.format === 'datetime') return { duckdbType: 'TIMESTAMP', isRef: false } 57 - if (prop.format === 'at-uri') return { duckdbType: 'TEXT', isRef: true } 58 - return { duckdbType: 'TEXT', isRef: false } 58 + if (prop.format === 'datetime') return { sqlType: dialect.typeMap.timestamp, isRef: false } 59 + if (prop.format === 'at-uri') return { sqlType: dialect.typeMap.text, isRef: true } 60 + return { sqlType: dialect.typeMap.text, isRef: false } 59 61 } 60 - if (prop.type === 'integer') return { duckdbType: 'INTEGER', isRef: false } 61 - if (prop.type === 'boolean') return { duckdbType: 'BOOLEAN', isRef: false } 62 - if (prop.type === 'bytes') return { duckdbType: 'BLOB', isRef: false } 63 - if (prop.type === 'cid-link') return { duckdbType: 'TEXT', isRef: false } 64 - if (prop.type === 'array') return { duckdbType: 'JSON', isRef: false } 65 - if (prop.type === 'blob') return { duckdbType: 'JSON', isRef: false } 66 - if (prop.type === 'union') return { duckdbType: 'JSON', isRef: false } 67 - if (prop.type === 'unknown') return { duckdbType: 'JSON', isRef: false } 68 - if (prop.type === 'object') return { duckdbType: 'JSON', isRef: false } 62 + if (prop.type === 'integer') return { sqlType: dialect.typeMap.integer, isRef: false } 63 + if (prop.type === 'boolean') return { sqlType: dialect.typeMap.boolean, isRef: false } 64 + if (prop.type === 'bytes') return { sqlType: dialect.typeMap.blob, isRef: false } 65 + if (prop.type === 'cid-link') return { sqlType: dialect.typeMap.text, isRef: false } 66 + if (prop.type === 'array') return { sqlType: dialect.jsonType, isRef: false } 67 + if (prop.type === 'blob') return { sqlType: dialect.jsonType, isRef: false } 68 + if (prop.type === 'union') return { sqlType: dialect.jsonType, isRef: false } 69 + if (prop.type === 'unknown') return { sqlType: dialect.jsonType, isRef: false } 70 + if (prop.type === 'object') return { sqlType: dialect.jsonType, isRef: false } 69 71 if (prop.type === 'ref') { 70 72 // strongRef contains { uri, cid } — handled specially in generateTableSchema 71 - if (prop.ref === 'com.atproto.repo.strongRef') return { duckdbType: 'STRONG_REF', isRef: true } 72 - return { duckdbType: 'JSON', isRef: false } 73 + if (prop.ref === 'com.atproto.repo.strongRef') return { sqlType: 'STRONG_REF', isRef: true } 74 + return { sqlType: dialect.jsonType, isRef: false } 73 75 } 74 - return { duckdbType: 'TEXT', isRef: false } 76 + return { sqlType: dialect.typeMap.text, isRef: false } 75 77 } 76 78 77 79 // Recursively find all .json files in a directory ··· 173 175 
collection: string, 174 176 fieldName: string, 175 177 defs: Record<string, any>, 176 - lexicons?: Map<string, any>, 178 + lexicons: Map<string, any> | undefined, 179 + dialect: SqlDialect, 177 180 ): UnionBranchSchema | null { 178 181 let branchDef: any = null 179 182 let branchName: string ··· 236 239 237 240 const columns: ColumnDef[] = [] 238 241 for (const [propName, prop] of Object.entries(propSource)) { 239 - const { duckdbType, isRef } = mapType(prop as any) 242 + const { sqlType, isRef } = mapType(prop as any, dialect) 240 243 // Skip STRONG_REF expansion in branch tables — treat as JSON 241 - const finalType = duckdbType === 'STRONG_REF' ? 'JSON' : duckdbType 244 + const finalType = sqlType === 'STRONG_REF' ? dialect.jsonType : sqlType 242 245 columns.push({ 243 246 name: toSnakeCase(propName), 244 247 originalName: propName, 245 - duckdbType: finalType, 248 + sqlType: finalType, 246 249 notNull: branchRequired.has(propName), 247 250 isRef: finalType !== 'JSON' && isRef, 248 251 }) ··· 252 255 } 253 256 254 257 // Generate a TableSchema from a lexicon record definition 255 - export function generateTableSchema(nsid: string, lexicon: any, lexicons?: Map<string, any>): TableSchema { 258 + export function generateTableSchema(nsid: string, lexicon: any, lexicons?: Map<string, any>, dialect: SqlDialect = DUCKDB_DIALECT): TableSchema { 256 259 const mainDef = lexicon.defs?.main 257 260 if (!mainDef || mainDef.type !== 'record') { 258 261 throw new Error(`Lexicon ${nsid} does not define a record type`) ··· 275 278 if (p.type === 'union' && p.refs) { 276 279 const branches: UnionBranchSchema[] = [] 277 280 for (const ref of p.refs) { 278 - const branch = resolveUnionBranch(ref, nsid, fieldName, lexicon.defs, lexicons) 281 + const branch = resolveUnionBranch(ref, nsid, fieldName, lexicon.defs, lexicons, dialect) 279 282 if (branch) branches.push(branch) 280 283 } 281 284 if (branches.length > 0) { ··· 285 288 columns.push({ 286 289 name: toSnakeCase(fieldName), 287 290 originalName: fieldName, 288 - duckdbType: 'JSON', 291 + sqlType: dialect.jsonType, 289 292 notNull: required.has(fieldName), 290 293 isRef: false, 291 294 }) ··· 299 302 const childColumns: ColumnDef[] = [] 300 303 const itemRequired = new Set(p.items?.required || lexicon.defs?.[p.items?.ref?.slice(1)]?.required || []) 301 304 for (const [itemField, itemProp] of Object.entries(itemProps)) { 302 - const { duckdbType, isRef } = mapType(itemProp as any) 305 + const { sqlType, isRef } = mapType(itemProp as any, dialect) 303 306 childColumns.push({ 304 307 name: toSnakeCase(itemField), 305 308 originalName: itemField, 306 - duckdbType, 309 + sqlType, 307 310 notNull: itemRequired.has(itemField), 308 311 isRef, 309 312 }) ··· 319 322 } 320 323 } 321 324 322 - const { duckdbType, isRef } = mapType(p) 325 + const { sqlType, isRef } = mapType(p, dialect) 323 326 324 - if (duckdbType === 'STRONG_REF') { 327 + if (sqlType === 'STRONG_REF') { 325 328 // Expand strongRef into two columns: {name}_uri and {name}_cid 326 329 columns.push({ 327 330 name: toSnakeCase(fieldName) + '_uri', 328 331 originalName: fieldName, 329 - duckdbType: 'TEXT', 332 + sqlType: dialect.typeMap.text, 330 333 notNull: required.has(fieldName), 331 334 isRef: true, 332 335 }) 333 336 columns.push({ 334 337 name: toSnakeCase(fieldName) + '_cid', 335 338 originalName: fieldName + '__cid', 336 - duckdbType: 'TEXT', 339 + sqlType: dialect.typeMap.text, 337 340 notNull: required.has(fieldName), 338 341 isRef: false, 339 342 }) ··· 341 344 columns.push({ 342 345 name: 
toSnakeCase(fieldName), 343 346 originalName: fieldName, 344 - duckdbType, 347 + sqlType, 345 348 notNull: required.has(fieldName), 346 349 isRef, 347 350 }) ··· 361 364 } 362 365 363 366 // Generate CREATE TABLE SQL from a TableSchema 364 - export function generateCreateTableSQL(schema: TableSchema): string { 367 + export function generateCreateTableSQL(schema: TableSchema, dialect: SqlDialect = DUCKDB_DIALECT): string { 365 368 const lines: string[] = [ 366 369 ' uri TEXT PRIMARY KEY', 367 370 ' cid TEXT', 368 371 ' did TEXT NOT NULL', 369 - ' indexed_at TIMESTAMP NOT NULL', 372 + ` indexed_at ${dialect.timestampType} NOT NULL`, 370 373 ] 371 374 372 375 for (const col of schema.columns) { 373 376 const nullable = col.notNull ? ' NOT NULL' : '' 374 - lines.push(` ${col.name} ${col.duckdbType}${nullable}`) 377 + lines.push(` ${col.name} ${col.sqlType}${nullable}`) 375 378 } 376 379 377 380 const createTable = `CREATE TABLE IF NOT EXISTS ${schema.tableName} (\n${lines.join(',\n')}\n);` ··· 393 396 const childLines: string[] = [' parent_uri TEXT NOT NULL', ' parent_did TEXT NOT NULL'] 394 397 for (const col of child.columns) { 395 398 const nullable = col.notNull ? ' NOT NULL' : '' 396 - childLines.push(` ${col.name} ${col.duckdbType}${nullable}`) 399 + childLines.push(` ${col.name} ${col.sqlType}${nullable}`) 397 400 } 398 401 childDDL.push(`CREATE TABLE IF NOT EXISTS ${child.tableName} (\n${childLines.join(',\n')}\n);`) 399 402 ··· 402 405 childDDL.push(`CREATE INDEX IF NOT EXISTS idx_${childPrefix}_did ON ${child.tableName}(parent_did);`) 403 406 404 407 for (const col of child.columns) { 405 - if (col.duckdbType === 'JSON' || col.duckdbType === 'BLOB') continue 408 + if (col.sqlType === 'JSON' || col.sqlType === 'BLOB') continue 406 409 childDDL.push(`CREATE INDEX IF NOT EXISTS idx_${childPrefix}_${col.name} ON ${child.tableName}(${col.name});`) 407 410 } 408 411 } ··· 413 416 const branchLines: string[] = [' parent_uri TEXT NOT NULL', ' parent_did TEXT NOT NULL'] 414 417 for (const col of branch.columns) { 415 418 const nullable = col.notNull ? 
' NOT NULL' : '' 416 - branchLines.push(` ${col.name} ${col.duckdbType}${nullable}`) 419 + branchLines.push(` ${col.name} ${col.sqlType}${nullable}`) 417 420 } 418 421 childDDL.push(`CREATE TABLE IF NOT EXISTS ${branch.tableName} (\n${branchLines.join(',\n')}\n);`) 419 422 ··· 422 425 childDDL.push(`CREATE INDEX IF NOT EXISTS idx_${branchPrefix}_did ON ${branch.tableName}(parent_did);`) 423 426 424 427 for (const col of branch.columns) { 425 - if (col.duckdbType === 'JSON' || col.duckdbType === 'BLOB') continue 428 + if (col.sqlType === 'JSON' || col.sqlType === 'BLOB') continue 426 429 childDDL.push(`CREATE INDEX IF NOT EXISTS idx_${branchPrefix}_${col.name} ON ${branch.tableName}(${col.name});`) 427 430 } 428 431 } ··· 438 441 export function buildSchemas( 439 442 lexicons: Map<string, any>, 440 443 collections: string[], 444 + dialect: SqlDialect = DUCKDB_DIALECT, 441 445 ): { schemas: TableSchema[]; ddlStatements: string[] } { 442 446 const schemas: TableSchema[] = [] 443 447 const ddlStatements: string[] = [] ··· 449 453 uri TEXT PRIMARY KEY, 450 454 cid TEXT, 451 455 did TEXT NOT NULL, 452 - indexed_at TIMESTAMP NOT NULL, 453 - data JSON 456 + indexed_at ${dialect.timestampType} NOT NULL, 457 + data ${dialect.jsonType} 454 458 ); 455 459 CREATE INDEX IF NOT EXISTS idx_${nsid.replace(/\./g, '_')}_indexed ON "${nsid}"(indexed_at DESC); 456 460 CREATE INDEX IF NOT EXISTS idx_${nsid.replace(/\./g, '_')}_author ON "${nsid}"(did);` ··· 459 463 continue 460 464 } 461 465 462 - const schema = generateTableSchema(nsid, lexicon, lexicons) 466 + const schema = generateTableSchema(nsid, lexicon, lexicons, dialect) 463 467 schemas.push(schema) 464 - ddlStatements.push(generateCreateTableSQL(schema)) 468 + ddlStatements.push(generateCreateTableSQL(schema, dialect)) 465 469 } 466 470 467 471 return { schemas, ddlStatements }
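The `typeMap`/`jsonType` indirection above is easiest to see as two configs side by side. These values are illustrative, inferred from how the diff uses them (e.g. SQLite "JSON" columns being plain `TEXT`, per the `coerceValue` comment in db.ts); the real tables live in `database/dialect.ts`:

```typescript
// Illustrative type maps; actual values may differ in database/dialect.ts.
const duckdbTypes = {
  typeMap: { text: 'TEXT', integer: 'INTEGER', boolean: 'BOOLEAN', blob: 'BLOB', timestamp: 'TIMESTAMP' },
  jsonType: 'JSON',
  timestampType: 'TIMESTAMP',
}

const sqliteTypes = {
  typeMap: { text: 'TEXT', integer: 'INTEGER', boolean: 'INTEGER', blob: 'BLOB', timestamp: 'TEXT' },
  jsonType: 'TEXT',      // SQLite's json_* functions operate on TEXT
  timestampType: 'TEXT', // ISO-8601 strings compare correctly, so ORDER BY works
}
```

This is also why the bulk-insert staging tables in db.ts declare timestamp columns as `TEXT` and cast on merge: on DuckDB the cast validates and converts, while on SQLite the value simply stays a string.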
+1 -1
packages/hatk/src/seed.ts
··· 23 23 * ) 24 24 * ``` 25 25 */ 26 - import { loadLexicons } from './schema.ts' 26 + import { loadLexicons } from './database/schema.ts' 27 27 import { validateRecord } from '@bigmoves/lexicon' 28 28 import { resolve } from 'node:path' 29 29 import { readFileSync } from 'node:fs'
+4 -4
packages/hatk/src/server.ts
··· 24 24 getSchemaDump, 25 25 getPreferences, 26 26 putPreference, 27 - } from './db.ts' 27 + } from './database/db.ts' 28 28 import { executeFeed, listFeeds } from './feeds.ts' 29 29 import { executeXrpc, InvalidRequestError } from './xrpc.ts' 30 - import { getLexiconArray } from './schema.ts' 30 + import { getLexiconArray } from './database/schema.ts' 31 31 import { validateRecord } from '@bigmoves/lexicon' 32 32 import { resolveRecords } from './hydrate.ts' 33 33 import { handleOpengraphRequest, buildOgMeta } from './opengraph.ts' ··· 264 264 columns: schema?.columns.map((col) => ({ 265 265 name: col.name, 266 266 originalName: col.originalName, 267 - type: col.duckdbType, 267 + type: col.sqlType, 268 268 required: col.notNull, 269 269 })), 270 270 } ··· 604 604 // GET /admin/schema — full DuckDB DDL dump + lexicons 605 605 if (url.pathname === '/admin/schema') { 606 606 if (!requireAdmin(viewer, res)) return 607 - const { getAllLexicons } = await import('./schema.ts') 607 + const { getAllLexicons } = await import('./database/schema.ts') 608 608 const ddl = await getSchemaDump() 609 609 jsonResponse(res, { ddl, lexicons: getAllLexicons() }) 610 610 return
+1 -1
packages/hatk/src/setup.ts
··· 25 25 import { resolve, relative } from 'node:path' 26 26 import { readdirSync, statSync } from 'node:fs' 27 27 import { log } from './logger.ts' 28 - import { querySQL, runSQL } from './db.ts' 28 + import { querySQL, runSQL } from './database/db.ts' 29 29 30 30 /** Context passed to each setup script's handler function. */ 31 31 export interface SetupContext {
+9 -5
packages/hatk/src/test.ts
··· 9 9 discoverCollections, 10 10 generateTableSchema, 11 11 generateCreateTableSQL, 12 - } from './schema.ts' 13 - import { initDatabase, querySQL, runSQL, insertRecord, closeDatabase } from './db.ts' 12 + } from './database/schema.ts' 13 + import { initDatabase, querySQL, runSQL, insertRecord, closeDatabase } from './database/db.ts' 14 + import { createAdapter } from './database/adapter-factory.ts' 15 + import { setSearchPort } from './database/fts.ts' 14 16 import { initFeeds, executeFeed, listFeeds, createPaginate } from './feeds.ts' 15 17 import { initXrpc, executeXrpc, listXrpc, configureRelay } from './xrpc.ts' 16 18 import { initOpengraph } from './opengraph.ts' ··· 18 20 import { discoverViews } from './views.ts' 19 21 import { loadOnLoginHook } from './hooks.ts' 20 22 import { validateLexicons } from '@bigmoves/lexicon' 21 - import { packCursor, unpackCursor, isTakendownDid, filterTakendownDids } from './db.ts' 23 + import { packCursor, unpackCursor, isTakendownDid, filterTakendownDids } from './database/db.ts' 22 24 import { seed as createSeedHelpers, type SeedOpts } from './seed.ts' 23 25 import type { FeedContext } from './feeds.ts' 24 26 ··· 99 101 ddlStatements.push(generateCreateTableSQL(schema)) 100 102 } 101 103 102 - // In-memory DuckDB 103 - await initDatabase(':memory:', schemas, ddlStatements) 104 + // In-memory database 105 + const { adapter, searchPort } = await createAdapter('duckdb') 106 + setSearchPort(searchPort) 107 + await initDatabase(adapter, ':memory:', schemas, ddlStatements) 104 108 105 109 // Discover views + hooks 106 110 discoverViews()
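Note that the test harness pins `createAdapter('duckdb')` and calls `generateCreateTableSQL(schema)` with its default DuckDB dialect. A SQLite-backed test would need to thread the dialect through explicitly; hypothetical usage, not part of this commit:

```typescript
// generateCreateTableSQL defaults to the DuckDB dialect, so a SQLite-backed
// test must pass the dialect explicitly or it silently emits DuckDB DDL.
import { generateTableSchema, generateCreateTableSQL } from './database/schema.ts'
import { getDialect } from './database/dialect.ts'

// nsid, lexicon, lexicons as loaded by the harness above
const dialect = getDialect('sqlite')
const schema = generateTableSchema(nsid, lexicon, lexicons, dialect)
const ddl = generateCreateTableSQL(schema, dialect) // omitting dialect → DuckDB types
```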
+1 -1
packages/hatk/src/views.ts
··· 4 4 // 2. Defs views: defined in a defs lexicon, associated by naming convention (e.g., profileView) 5 5 6 6 import { log } from './logger.ts' 7 - import { getAllLexicons, getLexicon } from './schema.ts' 7 + import { getAllLexicons, getLexicon } from './database/schema.ts' 8 8 9 9 // --- Types --- 10 10
+2 -2
packages/hatk/src/xrpc.ts
··· 33 33 lookupByFieldBatch, 34 34 countByFieldBatch, 35 35 queryLabelsForUris, 36 - } from './db.ts' 36 + } from './database/db.ts' 37 37 import { resolveRecords } from './hydrate.ts' 38 - import { getLexicon } from './schema.ts' 38 + import { getLexicon } from './database/schema.ts' 39 39 import type { Row, FlatRow } from './lex-types.ts' 40 40 41 41 export type { Row, FlatRow }