# Database Scripts

This directory contains database schema and utility scripts for the ATlast migration.

## Files

- **`init-db.sql`** - PostgreSQL schema definition (tables, indexes, functions)
- **`seed-test-data.sql`** - Test data for local development
- **`generate-encryption-key.ts`** - OAuth key generation utility
- **`keygen.js`** - Legacy key generation script

## Database Setup

### Local Development (Docker)

1. **Start PostgreSQL container:**
   ```bash
   cd docker
   docker compose up -d database
   ```

2. **Initialize schema:**
   ```bash
   docker compose exec database psql -U atlast -d atlast -f /docker-entrypoint-initdb.d/init.sql
   ```

   Or pipe the file in from the host (`-T` disables TTY allocation so the redirect works; the path is resolved relative to your current directory):
   ```bash
   docker compose exec -T database psql -U atlast -d atlast < scripts/init-db.sql
   ```

3. **Seed test data (optional):**
   ```bash
   docker compose exec -T database psql -U atlast -d atlast < scripts/seed-test-data.sql
   ```

4. **Verify setup:**
   ```bash
   cd packages/api
   DATABASE_URL=postgresql://atlast:password@localhost:5432/atlast pnpm run test:db
   ```
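
### Production Setup

The compose file mounts `init-db.sql` at `/docker-entrypoint-initdb.d/init.sql`, so the schema is applied automatically when the container first starts. With the official `postgres` image, scripts in that directory run only when the data directory is empty; an existing volume is not re-initialized.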

## Schema Overview

### Transient Tables (Session Data)
- `oauth_states` - OAuth flow state storage
- `oauth_sessions` - OAuth session data
- `user_sessions` - User authentication sessions
- `notification_queue` - Pending notifications (Phase 2)

**Note:** Transient data is cleaned up daily via the `cleanup_transient_data()` function, which covers `oauth_states`, `user_sessions`, and `notification_queue`; it does not delete from `oauth_sessions`.

### Persistent Tables (User Data)
- `user_uploads` - Upload history and metadata
- `source_accounts` - Usernames from source platforms (Instagram, TikTok, etc.)
- `user_source_follows` - Links users to their source account follows
- `atproto_matches` - Matched AT Protocol accounts
- `user_match_status` - User interaction with matches (viewed, followed, etc.)
- `partner_api_keys` - API keys for partner integrations (Phase 2)
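
As a rough illustration of how these tables relate, the query below walks from one upload to its matched AT Protocol accounts. It is a sketch based on the columns defined in `init-db.sql`; the upload ID `test-upload-1` comes from the seed data and is only an example value.

```sql
-- Matches found for a single upload, best score first.
SELECT sa.source_platform,
       sa.original_username,
       am.atproto_handle,
       am.match_score
FROM user_source_follows usf
JOIN source_accounts sa ON sa.id = usf.source_account_id
JOIN atproto_matches am ON am.source_account_id = sa.id
WHERE usf.upload_id = 'test-upload-1'  -- example value from seed-test-data.sql
ORDER BY am.match_score DESC, am.atproto_handle;
```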

## Key Features

### Fuzzy Matching
The schema includes the `pg_trgm` extension for fuzzy username matching. This enables:
- Similarity-based searches (`%` operator)
- Trigram GIN indexes for fast fuzzy lookups
- The fuzzy matching that the Phase 2 Tap server depends on
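
A minimal sketch of a fuzzy lookup against `source_accounts`, assuming the trigram index from `init-db.sql` is in place. The search term `jondoe` and the result limit are illustrative only.

```sql
-- The % operator keeps rows whose similarity to the search term exceeds
-- pg_trgm.similarity_threshold (0.3 unless changed) and can use the GIN index.
SELECT original_username,
       normalized_username,
       similarity(normalized_username, 'jondoe') AS score
FROM source_accounts
WHERE normalized_username % 'jondoe'  -- illustrative search term
ORDER BY score DESC
LIMIT 10;
```

Raising the threshold (for example with `SET pg_trgm.similarity_threshold = 0.5;`) trades recall for fewer false positives.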

### Indexes
All tables are indexed for common query patterns:
- Foreign key indexes for joins
- Partial indexes for filtered queries (e.g., unnotified matches)
- A GIN trigram index on `source_accounts.normalized_username` for fuzzy text matching
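
For example, a query shaped like the one below is the kind that the partial index `idx_user_match_status_notified` is meant to serve. Whether the planner actually picks that index depends on table size and statistics, so treat the `EXPLAIN` as a way to check rather than a guarantee. The DID is the test value from the seed data.

```sql
-- Unnotified matches for one user; the WHERE clause matches the index predicate.
EXPLAIN
SELECT match_id
FROM user_match_status
WHERE user_did = 'did:plc:test'
  AND notified = FALSE;
```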

### Cleanup Function
The `cleanup_transient_data()` function removes:
- OAuth states created more than 1 hour ago
- User sessions whose `expires_at` has passed
- Notification records created more than 7 days ago with status `sent`, or more than 30 days ago with status `failed`

In production, a BullMQ worker calls it once a day.
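
To preview what a cleanup run would remove without deleting anything, the same conditions can be turned into counts. This is a sketch that mirrors the `WHERE` clauses in `init-db.sql`; the column aliases are made up for readability.

```sql
-- Count rows that currently meet each cleanup condition.
SELECT
    (SELECT COUNT(*) FROM oauth_states
      WHERE created_at < NOW() - INTERVAL '1 hour') AS stale_oauth_states,
    (SELECT COUNT(*) FROM user_sessions
      WHERE expires_at < NOW()) AS expired_sessions,
    (SELECT COUNT(*) FROM notification_queue
      WHERE (status = 'sent'   AND created_at < NOW() - INTERVAL '7 days')
         OR (status = 'failed' AND created_at < NOW() - INTERVAL '30 days')) AS old_notifications;
```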

## Testing

### Test Connection
```bash
cd packages/api
DATABASE_URL=postgresql://atlast:password@localhost:5432/atlast pnpm run test:db
```

This script verifies the following, then prints record counts:
- Database connectivity
- Required extensions are installed
- All tables exist
- Indexes are created
- Fuzzy matching works
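
The `test:db` script itself is not shown here. As a hand-rolled approximation of its table check, the query below lists any expected table that is missing from the `public` schema; the table list is copied from the Schema Overview, and an empty result means all ten exist.

```sql
-- Expected tables that are missing from the public schema.
SELECT expected.table_name
FROM (VALUES
        ('oauth_states'), ('oauth_sessions'), ('user_sessions'),
        ('notification_queue'), ('user_uploads'), ('source_accounts'),
        ('user_source_follows'), ('atproto_matches'),
        ('user_match_status'), ('partner_api_keys')
     ) AS expected(table_name)
LEFT JOIN information_schema.tables t
       ON t.table_schema = 'public'
      AND t.table_name = expected.table_name
WHERE t.table_name IS NULL;
```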

### Manual Testing
```bash
# Connect to database
docker compose exec database psql -U atlast

-- The remaining lines are entered at the psql prompt

-- List tables
\dt

-- List indexes
\di

-- Check extensions
SELECT * FROM pg_extension WHERE extname IN ('uuid-ossp', 'pg_trgm');

-- Test fuzzy matching
SELECT similarity('johndoe', 'john_doe');

-- Run cleanup function
SELECT cleanup_transient_data();
```

## Migration Notes

### Phase 1 (Current)
- No periodic checking features
- Notification queue exists but is not used until Phase 2
- Partner API keys table exists but is not used until Phase 2

### Phase 2 (Future)
- Tap server will use fuzzy matching to detect new accounts
- Notification system will use the `notification_queue` table
- Partner integrations will use the `partner_api_keys` table

## Troubleshooting

### Extensions Not Found
If a required extension is reported missing, create both manually:
```sql
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "pg_trgm";
```

### Permission Issues
Ensure the database user has the necessary permissions:
```sql
GRANT ALL PRIVILEGES ON DATABASE atlast TO atlast;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO atlast;
```
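
Several tables use `SERIAL` keys, which are backed by sequences, and a table-level grant does not cover those. If inserts fail with a sequence permission error, a grant along these lines may also be needed (a sketch that assumes the same `atlast` role and `public` schema as the statements above):

```sql
-- Allow the application role to draw new IDs from SERIAL-backed sequences.
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO atlast;
```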

### Connection Refused
Check that:
- PostgreSQL is running: `docker compose ps database`
- Port is exposed: `docker compose port database 5432`
- `DATABASE_URL` is correct: `postgresql://atlast:password@localhost:5432/atlast`
+151
scripts/init-db.sql
-- ATlast Database Schema
-- Migration Plan v2.0 - Phase 1
-- Self-hosted PostgreSQL schema with fuzzy matching support

-- Enable extensions
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "pg_trgm"; -- For fuzzy matching

-- OAuth state storage (transient)
CREATE TABLE oauth_states (
    state TEXT PRIMARY KEY,
    data JSONB NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_oauth_states_created ON oauth_states(created_at);

-- OAuth sessions (transient)
CREATE TABLE oauth_sessions (
    did TEXT PRIMARY KEY,
    session_data JSONB NOT NULL,
    updated_at TIMESTAMP DEFAULT NOW()
);

-- User sessions (transient)
CREATE TABLE user_sessions (
    session_id TEXT PRIMARY KEY,
    did TEXT NOT NULL,
    fingerprint TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP NOT NULL
);
CREATE INDEX idx_user_sessions_did ON user_sessions(did);
CREATE INDEX idx_user_sessions_expires ON user_sessions(expires_at);

-- User uploads (persistent)
CREATE TABLE user_uploads (
    upload_id TEXT PRIMARY KEY,
    user_did TEXT NOT NULL,
    source_platform TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    total_users INTEGER DEFAULT 0,
    matched_users INTEGER DEFAULT 0,
    unmatched_users INTEGER DEFAULT 0
);
CREATE INDEX idx_user_uploads_user_did ON user_uploads(user_did);
-- Note: check_frequency and last_checked removed - no periodic checking in Phase 1

-- Source accounts (persistent)
CREATE TABLE source_accounts (
    id SERIAL PRIMARY KEY,
    source_platform TEXT NOT NULL,
    original_username TEXT NOT NULL,
    normalized_username TEXT NOT NULL,
    date_on_source TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(source_platform, normalized_username)
);
CREATE INDEX idx_source_accounts_normalized ON source_accounts
    USING gin(normalized_username gin_trgm_ops); -- Fuzzy matching!
CREATE INDEX idx_source_accounts_platform ON source_accounts(source_platform);

-- User-source follows (join table)
CREATE TABLE user_source_follows (
    user_did TEXT NOT NULL,
    upload_id TEXT NOT NULL REFERENCES user_uploads(upload_id) ON DELETE CASCADE,
    source_account_id INTEGER NOT NULL REFERENCES source_accounts(id),
    found_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (upload_id, source_account_id)
);
CREATE INDEX idx_user_source_follows_user ON user_source_follows(user_did);
CREATE INDEX idx_user_source_follows_source ON user_source_follows(source_account_id);

-- AT Protocol matches (persistent)
CREATE TABLE atproto_matches (
    id SERIAL PRIMARY KEY,
    source_account_id INTEGER NOT NULL REFERENCES source_accounts(id),
    atproto_did TEXT NOT NULL,
    atproto_handle TEXT NOT NULL,
    display_name TEXT,
    match_score INTEGER NOT NULL,
    post_count INTEGER,
    follower_count INTEGER,
    follow_status JSONB DEFAULT '{}',
    found_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(source_account_id, atproto_did)
);
CREATE INDEX idx_atproto_matches_source ON atproto_matches(source_account_id);
CREATE INDEX idx_atproto_matches_did ON atproto_matches(atproto_did);
CREATE INDEX idx_atproto_matches_score ON atproto_matches(match_score DESC);

-- User match status (persistent)
CREATE TABLE user_match_status (
    user_did TEXT NOT NULL,
    match_id INTEGER NOT NULL REFERENCES atproto_matches(id),
    viewed BOOLEAN DEFAULT FALSE,
    dismissed BOOLEAN DEFAULT FALSE,
    followed BOOLEAN DEFAULT FALSE,
    notified BOOLEAN DEFAULT FALSE,
    updated_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (user_did, match_id)
);
CREATE INDEX idx_user_match_status_user ON user_match_status(user_did);
CREATE INDEX idx_user_match_status_notified ON user_match_status(user_did, notified)
    WHERE notified = FALSE;

-- Notification queue (transient - for Phase 2)
CREATE TABLE notification_queue (
    id SERIAL PRIMARY KEY,
    user_did TEXT NOT NULL,
    match_id INTEGER NOT NULL REFERENCES atproto_matches(id),
    notification_type TEXT NOT NULL, -- 'in_app', 'bluesky_dm', 'partner_api'
    status TEXT DEFAULT 'pending', -- 'pending', 'sent', 'failed'
    attempts INTEGER DEFAULT 0,
    last_attempt TIMESTAMP,
    error_message TEXT, -- Store error details for debugging
    created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_notification_queue_status ON notification_queue(status)
    WHERE status = 'pending';
CREATE INDEX idx_notification_queue_user ON notification_queue(user_did);

-- Partner API keys (for Phase 2)
CREATE TABLE partner_api_keys (
    id SERIAL PRIMARY KEY,
    partner_name TEXT NOT NULL, -- 'skylight', 'spark', etc.
    api_key_hash TEXT NOT NULL UNIQUE, -- SHA-256 hashed API key
    created_at TIMESTAMP DEFAULT NOW(),
    last_used TIMESTAMP,
    is_active BOOLEAN DEFAULT TRUE
);
CREATE INDEX idx_partner_api_keys_hash ON partner_api_keys(api_key_hash)
    WHERE is_active = TRUE;

-- Cleanup function for old transient data
CREATE OR REPLACE FUNCTION cleanup_transient_data() RETURNS void AS $$
BEGIN
    -- Clean expired OAuth states (1 hour)
    DELETE FROM oauth_states WHERE created_at < NOW() - INTERVAL '1 hour';

    -- Clean expired sessions
    DELETE FROM user_sessions WHERE expires_at < NOW();

    -- Clean old sent notifications (7 days)
    DELETE FROM notification_queue
    WHERE status = 'sent' AND created_at < NOW() - INTERVAL '7 days';

    -- Clean old failed notifications (30 days)
    DELETE FROM notification_queue
    WHERE status = 'failed' AND created_at < NOW() - INTERVAL '30 days';
END;
$$ LANGUAGE plpgsql;
+96
scripts/seed-test-data.sql
-- Test Data Seeding Script
-- Use this for local development and testing

-- Clean existing test data (optional)
-- Note: the source accounts and matches seeded below use real platform names
-- ('instagram', 'tiktok', 'twitter'), so the 'test' platform filters here leave
-- them in place; re-runs rely on the ON CONFLICT DO NOTHING clauses instead.
DELETE FROM user_match_status WHERE user_did LIKE 'did:plc:test%';
DELETE FROM atproto_matches WHERE source_account_id IN (SELECT id FROM source_accounts WHERE source_platform = 'test');
DELETE FROM user_source_follows WHERE user_did LIKE 'did:plc:test%';
DELETE FROM source_accounts WHERE source_platform = 'test';
DELETE FROM user_uploads WHERE upload_id LIKE 'test-%';
DELETE FROM user_sessions WHERE session_id LIKE 'test-%';

-- Test user session
INSERT INTO user_sessions (session_id, did, fingerprint, expires_at)
VALUES (
    'test-session-123',
    'did:plc:test',
    'test-fingerprint',
    NOW() + INTERVAL '7 days'
);

-- Test upload
INSERT INTO user_uploads (upload_id, user_did, source_platform, total_users, matched_users, unmatched_users)
VALUES (
    'test-upload-1',
    'did:plc:test',
    'instagram',
    10,
    5,
    5
);

-- Test source accounts
INSERT INTO source_accounts (source_platform, original_username, normalized_username)
VALUES
    ('instagram', 'test_user', 'testuser'),
    ('instagram', 'john.doe', 'johndoe'),
    ('instagram', 'jane_smith', 'janesmith'),
    ('tiktok', '@cool_person', 'coolperson'),
    ('twitter', 'example_account', 'exampleaccount')
ON CONFLICT (source_platform, normalized_username) DO NOTHING;

-- Link source accounts to upload
INSERT INTO user_source_follows (user_did, upload_id, source_account_id)
SELECT
    'did:plc:test',
    'test-upload-1',
    id
FROM source_accounts
WHERE source_platform IN ('instagram', 'tiktok', 'twitter')
    AND normalized_username IN ('testuser', 'johndoe', 'janesmith', 'coolperson', 'exampleaccount')
ON CONFLICT DO NOTHING;

-- Test AT Protocol matches
INSERT INTO atproto_matches (
    source_account_id,
    atproto_did,
    atproto_handle,
    display_name,
    match_score,
    post_count,
    follower_count,
    follow_status
)
SELECT
    sa.id,
    'did:plc:matched-' || sa.id,
    sa.normalized_username || '.bsky.social',
    INITCAP(REPLACE(sa.normalized_username, '_', ' ')),
    100,
    42,
    128,
    '{}'::jsonb
FROM source_accounts sa
WHERE sa.source_platform IN ('instagram', 'tiktok')
    AND sa.normalized_username IN ('testuser', 'johndoe')
ON CONFLICT (source_account_id, atproto_did) DO NOTHING;

-- Test user match status
INSERT INTO user_match_status (user_did, match_id, viewed, dismissed, followed, notified)
SELECT
    'did:plc:test',
    am.id,
    false,
    false,
    false,
    false
FROM atproto_matches am
WHERE am.atproto_did LIKE 'did:plc:matched-%'
ON CONFLICT DO NOTHING;

-- Display summary
SELECT 'Test data seeded successfully!' as message;
SELECT COUNT(*) as session_count FROM user_sessions WHERE session_id LIKE 'test-%';
SELECT COUNT(*) as upload_count FROM user_uploads WHERE upload_id LIKE 'test-%';
SELECT COUNT(*) as source_account_count FROM source_accounts WHERE source_platform IN ('test', 'instagram', 'tiktok', 'twitter');
SELECT COUNT(*) as match_count FROM atproto_matches WHERE atproto_did LIKE 'did:plc:matched-%';