Initial commit.

TKTK cc90c24a 8120db89

+885
+1
.cursor/rules/use-bun-instead-of-node-vite-npm-pnpm.mdc
+../../CLAUDE.md
+140
README.md
+# circle-filter
+
+Detect and remove Twitter Circle tweets from your Twitter/X archive export.
+
+## The Problem
+
+Twitter Circle was a feature (May 2022 - October 2023) that let users share tweets with a limited audience. When you download your Twitter archive, **Circle tweets are included but not marked as such** - there's no field indicating a tweet was Circle-only.
+
+This is a privacy issue: if you share or upload your archive, your Circle tweets become public.
+
+## The Solution
+
+This tool uses Twitter's public syndication API to detect Circle tweets:
+
+1. **Public tweets** return `__typename: "Tweet"` with full data
+2. **Circle tweets** return `__typename: "TweetTombstone"` with "This Post is unavailable"
+
+The syndication API (`cdn.syndication.twimg.com/tweet-result`) is the same endpoint used for embedded tweets on websites - no API key or authentication required.
+
+## Usage
+
+### Step 1: Detect Circle Tweets
+
+```bash
+cd circle-filter
+bun run src/detect.ts --archive ../data
+```
+
+This will:
+- Load all tweets from the Circle era (May 2022 - Nov 2023)
+- Skip retweets (can't be Circle tweets)
+- Check each tweet against the syndication API
+- Save progress every 50 tweets (resumable if interrupted)
+
+**Runtime:** ~2-3 hours for a typical archive (32k tweets to check at ~4 req/s)
+
+**Output:**
+- `output/circle-tweets.json` - Array of Circle tweet IDs
+- `output/detection-log.json` - Full log of every API check
+- `output/progress.json` - Resume checkpoint
+
+### Step 2: Generate Cleaned Archive
+
+```bash
+bun run src/clean.ts --archive ../data --circles ./output/circle-tweets.json
+```
+
+This generates cleaned `tweets*.js` files in `output/` with Circle tweets removed.
+
+**Output:**
+- `output/tweets.js` - Cleaned tweets (same format as original)
+- `output/tweets-part1.js`
+- `output/tweets-part2.js`
+- `output/clean-summary.json` - Summary of removal
+
+### For Future Archives
+
+Once you have `circle-tweets.json`, reuse it for new archive exports:
+
+```bash
+bun run src/clean.ts --archive /path/to/new/archive/data --circles ./output/circle-tweets.json
+```
+
+No need to re-run detection - your Circle tweet IDs won't change.
+
+## How Detection Works
+
+```
+For each tweet in Circle era (May 2022 - Nov 2023):
+  1. Skip if retweet (can't be Circle)
+  2. Skip if in deleted-tweets.js (deleted, not Circle)
+  3. Query: cdn.syndication.twimg.com/tweet-result?id={id}&token={token}
+  4. If response.__typename === "TweetTombstone" → Circle tweet
+  5. Save to circle-tweets.json
+```
+
+The token is calculated as `(id / 1e15) * π` in base-36. Tweet IDs exceed `Number.MAX_SAFE_INTEGER`, so we use BigInt with hi/lo split to preserve precision:
+```typescript
+const bigId = BigInt(id)
+const hi = Number(bigId / 1_000_000_000_000_000n)
+const lo = Number(bigId % 1_000_000_000_000_000n) / 1e15
+token = ((hi + lo) * Math.PI).toString(36).replace(/(0+|\.)/g, "")
+```
+
+## Rate Limiting
+
+- Base delay: 250ms between requests (~4 req/s)
+- On HTTP 429: Exponential backoff (1s, 2s, 4s, 8s...)
+- Max retries per tweet: 5
+- Request timeout: 30s (prevents hung connections)
+- Progress saved every 50 tweets for resume after interruption
+- Transient errors (429, 5xx, network) are retried on resume
+
+## Verification
+
+The `detection-log.json` file contains every API call made:
+
+```json
+{
+  "startedAt": "2025-12-07T...",
+  "completedAt": "2025-12-07T...",
+  "totalCandidates": 32150,
+  "circleCount": 847,
+  "results": {
+    "1718307594148651356": {
+      "id": "1718307594148651356",
+      "typename": "TweetTombstone",
+      "tombstoneText": "This Post is unavailable. Learn more",
+      "retries": 0,
+      "timestamp": "2025-12-07T..."
+    }
+  }
+}
+```
+
+## File Structure
+
+```
+circle-filter/
+├── src/
+│   ├── types.ts        # TypeScript interfaces
+│   ├── syndication.ts  # API client with retry logic
+│   ├── utils.ts        # Archive loading helpers
+│   ├── detect.ts       # Main detection script
+│   └── clean.ts        # Archive cleaner script
+├── output/             # Generated files go here
+├── package.json
+├── tsconfig.json
+└── README.md
+```
+
+## Credits
+
+Detection method based on research into the Twitter syndication API. See also:
+- [community-archive circle-mitigation scripts](https://github.com/TheExGenesis/community-archive/tree/main/scripts/circle-mitigation)
+- [twittxr - Twitter Syndication API wrapper](https://github.com/Owen3H/twittxr)
+
+## License
+
+MIT
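The token formula from the README's "How Detection Works" section can be tried on its own. The sketch below mirrors that formula and also shows why converting the ID with a plain `Number(id)` loses digits. `tokenFor` and `sampleId` are illustrative names (the project's own function is `getToken` in `src/syndication.ts`), and the sample ID is the one shown in the README's Verification example.

```typescript
// Sketch of the syndication token formula quoted in the README.
// `tokenFor` is an illustrative name, not the project's export.
function tokenFor(id: string): string {
  const bigId = BigInt(id);
  const scale = BigInt("1000000000000000"); // 1e15 as a BigInt
  const hi = Number(bigId / scale); // integer part of id / 1e15
  const lo = Number(bigId % scale) / 1e15; // fractional part of id / 1e15
  // Base-36 encode, then drop runs of zeros and the radix point.
  return ((hi + lo) * Math.PI).toString(36).replace(/(0+|\.)/g, "");
}

const sampleId = "1718307594148651356";

// A 19-digit ID does not survive a round trip through a double.
const survivesNumber = String(Number(sampleId)) === sampleId;
console.log(survivesNumber); // false

const sampleToken = tokenFor(sampleId);
console.log(sampleToken);
```

Because the regular expression removes every `0` and the `.`, the resulting token consists only of the characters `1-9` and `a-z`.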
+26
bun.lock
+{
+  "lockfileVersion": 1,
+  "configVersion": 1,
+  "workspaces": {
+    "": {
+      "name": "circle-filter",
+      "devDependencies": {
+        "@types/bun": "latest",
+      },
+      "peerDependencies": {
+        "typescript": "^5",
+      },
+    },
+  },
+  "packages": {
+    "@types/bun": ["@types/bun@1.3.4", "", { "dependencies": { "bun-types": "1.3.4" } }, "sha512-EEPTKXHP+zKGPkhRLv+HI0UEX8/o+65hqARxLy8Ov5rIxMBPNTjeZww00CIihrIQGEQBYg+0roO5qOnS/7boGA=="],
+
+    "@types/node": ["@types/node@24.10.1", "", { "dependencies": { "undici-types": "~7.16.0" } }, "sha512-GNWcUTRBgIRJD5zj+Tq0fKOJ5XZajIiBroOF0yvj2bSU1WvNdYS/dn9UxwsujGW4JX06dnHyjV2y9rRaybH0iQ=="],
+
+    "bun-types": ["bun-types@1.3.4", "", { "dependencies": { "@types/node": "*" } }, "sha512-5ua817+BZPZOlNaRgGBpZJOSAQ9RQ17pkwPD0yR7CfJg+r8DgIILByFifDTa+IPDDxzf5VNhtNlcKqFzDgJvlQ=="],
+
+    "typescript": ["typescript@5.9.3", "", { "bin": { "tsc": "bin/tsc", "tsserver": "bin/tsserver" } }, "sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw=="],
+
+    "undici-types": ["undici-types@7.16.0", "", {}, "sha512-Zz+aZWSj8LE6zoxD+xrjh4VfkIG8Ya6LvYkZqtUQGJPZjYl53ypCaUwWqo7eI0x66KBGeRo+mlBEkMSeSZ38Nw=="],
+  }
+}
+1
index.ts
+console.log("Hello via Bun!");
+12
package.json
+{
+  "name": "circle-filter",
+  "module": "index.ts",
+  "type": "module",
+  "private": true,
+  "devDependencies": {
+    "@types/bun": "latest"
+  },
+  "peerDependencies": {
+    "typescript": "^5"
+  }
+}
+148
src/clean.ts
+import { existsSync, readFileSync, writeFileSync, mkdirSync, readdirSync } from "fs"
+import { parseArgs } from "util"
+import type { Tweet } from "./types"
+
+// Dynamically discover tweet files (tweets.js, tweets-part1.js, tweets-part2.js, etc.)
+function discoverTweetFiles(archivePath: string): string[] {
+  const allFiles = readdirSync(archivePath)
+  return allFiles
+    .filter((f) => f === "tweets.js" || f.match(/^tweets-part\d+\.js$/))
+    .sort()
+}
+
+function parseCliArgs() {
+  const { values } = parseArgs({
+    args: Bun.argv.slice(2),
+    options: {
+      archive: { type: "string", short: "a" },
+      circles: { type: "string", short: "c" },
+      output: { type: "string", short: "o", default: "./output" },
+    },
+    strict: true,
+  })
+
+  if (!values.archive || !values.circles) {
+    console.error("Usage: bun run src/clean.ts --archive <path> --circles <circle-tweets.json>")
+    console.error("Example: bun run src/clean.ts --archive ../data --circles ./output/circle-tweets.json")
+    process.exit(1)
+  }
+
+  return {
+    archivePath: values.archive,
+    circlesPath: values.circles,
+    outputPath: values.output!,
+  }
+}
+
+function loadCircleIds(path: string): Set<string> {
+  if (!existsSync(path)) {
+    console.error(`Circle tweets file not found: ${path}`)
+    console.error("Run detect.ts first to generate this file.")
+    process.exit(1)
+  }
+  const ids = JSON.parse(readFileSync(path, "utf8")) as string[]
+  return new Set(ids)
+}
+
+function loadTweetsFile(archivePath: string, fileName: string): { tweets: Tweet[]; varName: string } | null {
+  const path = `${archivePath}/${fileName}`
+  if (!existsSync(path)) return null
+
+  const content = readFileSync(path, "utf8")
+  const match = content.match(/^(window\.YTD\.\w+\.part\d+) = /)
+  if (!match) throw new Error(`Invalid JS file format: ${path}`)
+
+  const varName = match[1]
+  const json = content.slice(match[0].length)
+  const tweets = JSON.parse(json) as Tweet[]
+
+  return { tweets, varName }
+}
+
+function writeTweetsFile(outputPath: string, fileName: string, varName: string, tweets: Tweet[]) {
+  const content = `${varName} = ${JSON.stringify(tweets, null, 2)}`
+  writeFileSync(`${outputPath}/${fileName}`, content)
+}
+
+function main() {
+  const { archivePath, circlesPath, outputPath } = parseCliArgs()
+
+  console.log("Circle Tweet Cleaner")
+  console.log("====================")
+  console.log(`Archive: ${archivePath}`)
+  console.log(`Circle tweets: ${circlesPath}`)
+  console.log(`Output: ${outputPath}`)
+  console.log()
+
+  // Ensure output directory exists
+  if (!existsSync(outputPath)) {
+    mkdirSync(outputPath, { recursive: true })
+  }
+
+  // Load circle tweet IDs
+  const circleIds = loadCircleIds(circlesPath)
+  console.log(`Loaded ${circleIds.size} circle tweet IDs to filter`)
+  console.log()
+
+  let totalOriginal = 0
+  let totalRemoved = 0
+  let totalCleaned = 0
+
+  // Discover and process each tweets file dynamically
+  const tweetFiles = discoverTweetFiles(archivePath)
+  console.log(`Found ${tweetFiles.length} tweet file(s): ${tweetFiles.join(", ")}`)
+  console.log()
+
+  for (const file of tweetFiles) {
+    const data = loadTweetsFile(archivePath, file)
+    if (!data) {
+      console.log(`[SKIP] ${file} (not found)`)
+      continue
+    }
+
+    const { tweets, varName } = data
+    const originalCount = tweets.length
+
+    // Filter out circle tweets
+    const cleaned = tweets.filter((item) => !circleIds.has(item.tweet.id_str))
+    const removedCount = originalCount - cleaned.length
+
+    // Write cleaned file
+    writeTweetsFile(outputPath, file, varName, cleaned)
+
+    console.log(`[OK] ${file}: ${originalCount} -> ${cleaned.length} (removed ${removedCount})`)
+
+    totalOriginal += originalCount
+    totalRemoved += removedCount
+    totalCleaned += cleaned.length
+  }
+
+  console.log()
+  console.log("=== SUMMARY ===")
+  console.log(`Original tweets: ${totalOriginal}`)
+  console.log(`Circle tweets removed: ${totalRemoved}`)
+  console.log(`Cleaned tweets: ${totalCleaned}`)
+  console.log()
+  console.log(`Cleaned files saved to: ${outputPath}/`)
+  console.log()
+
+  // Verification
+  if (totalRemoved !== circleIds.size) {
+    console.log(`[WARN] Removed ${totalRemoved} tweets but had ${circleIds.size} circle IDs`)
+    console.log(" Some circle tweets may be in other files or already deleted")
+  }
+
+  // Write summary
+  const summary = {
+    timestamp: new Date().toISOString(),
+    archivePath,
+    circlesPath,
+    totalOriginal,
+    totalRemoved,
+    totalCleaned,
+    circleIdsCount: circleIds.size,
+  }
+  writeFileSync(`${outputPath}/clean-summary.json`, JSON.stringify(summary, null, 2))
+}
+
+main()
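The core of `src/clean.ts` is a parse, filter, and re-serialize round trip on the archive's `window.YTD.<name>.partN = [...]` wrapper. The following is a minimal in-memory sketch of that round trip with no file I/O. `stripCircleTweets`, `ArchiveTweet`, and the two-tweet sample string are hypothetical and exist only for illustration.

```typescript
// Hypothetical minimal tweet shape; the real archive entries carry many more fields.
interface ArchiveTweet {
  tweet: { id_str: string };
}

// Remove tweets whose ID is in `circleIds`, keeping the archive's JS wrapper intact.
function stripCircleTweets(fileContent: string, circleIds: Set<string>): string {
  // Archive files start with an assignment such as `window.YTD.tweets.part0 = `.
  const match = fileContent.match(/^(window\.YTD\.\w+\.part\d+) = /);
  if (!match) throw new Error("Unexpected archive file format");

  const varName = match[1] ?? "";
  const tweets = JSON.parse(fileContent.slice(match[0].length)) as ArchiveTweet[];
  const kept = tweets.filter((item) => !circleIds.has(item.tweet.id_str));

  return `${varName} = ${JSON.stringify(kept, null, 2)}`;
}

// Illustrative input: two tweets, of which ID "2" is treated as a Circle tweet.
const sampleFile =
  'window.YTD.tweets.part0 = [{"tweet":{"id_str":"1"}},{"tweet":{"id_str":"2"}}]';
const cleanedFile = stripCircleTweets(sampleFile, new Set(["2"]));
console.log(cleanedFile);
```

Keeping the original variable name in the output is what lets the cleaned file stand in for the original inside an archive folder.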
+275
src/detect.ts
+import { existsSync, mkdirSync, writeFileSync } from "fs";
+import { parseArgs } from "util";
+import { checkTweet } from "./syndication";
+import {
+  loadTweets,
+  loadDeletedTweetIds,
+  loadProgress,
+  saveProgress,
+  isInDateRange,
+  isRetweet,
+  formatDuration,
+  formatProgress,
+} from "./utils";
+import type { Progress, SyndicationResult, DetectionStats } from "./types";
+
+// Circle tweet date range (conservative)
+const CIRCLE_START = new Date("2022-05-02");
+const CIRCLE_END = new Date("2023-11-15");
+
+const CONCURRENCY = 10; // Concurrent API requests
+
+function parseCliArgs() {
+  const { values } = parseArgs({
+    args: Bun.argv.slice(2),
+    options: {
+      archive: { type: "string", short: "a" },
+      output: { type: "string", short: "o", default: "./output" },
+    },
+    strict: true,
+  });
+
+  if (!values.archive) {
+    console.error(
+      "Usage: bun run src/detect.ts --archive <path-to-data-folder>",
+    );
+    console.error("Example: bun run src/detect.ts --archive ../data");
+    process.exit(1);
+  }
+
+  return {
+    archivePath: values.archive,
+    outputPath: values.output!,
+  };
+}
+
+async function main() {
+  const { archivePath, outputPath } = parseCliArgs();
+  liveOutputPath = outputPath; // For SIGINT handler
+
+  console.log("Circle Tweet Detector");
+  console.log("=====================");
+  console.log(`Archive: ${archivePath}`);
+  console.log(`Output: ${outputPath}`);
+  console.log(
+    `Date range: ${CIRCLE_START.toISOString().split("T")[0]} to ${CIRCLE_END.toISOString().split("T")[0]}`,
+  );
+  console.log();
+
+  // Ensure output directory exists
+  if (!existsSync(outputPath)) {
+    mkdirSync(outputPath, { recursive: true });
+  }
+
+  // Load deleted tweet IDs (to exclude from Circle detection)
+  console.log("Loading deleted tweets...");
+  const deletedIds = loadDeletedTweetIds(archivePath);
+  console.log(`Found ${deletedIds.size} deleted tweets`);
+
+  // Load or resume progress
+  let progress = loadProgress(outputPath);
+  let isResuming = false;
+
+  if (progress) {
+    isResuming = true;
+    console.log(`\nResuming from previous run (started ${progress.startedAt})`);
+    console.log(`Already checked: ${Object.keys(progress.checked).length}`);
+    console.log(`Circle tweets found so far: ${progress.circleIds.length}`);
+  } else {
+    // Load tweets and build candidate list
+    console.log("\nLoading tweets...");
+    const tweets = loadTweets(archivePath);
+    console.log(`Loaded ${tweets.length} tweets`);
+
+    // Filter to Circle era candidates
+    console.log("\nFiltering to Circle era candidates...");
+    const candidates: string[] = [];
+
+    for (const item of tweets) {
+      const t = item.tweet;
+      const text = t.full_text || "";
+
+      // Skip retweets (can't be Circle tweets)
+      if (isRetweet(text)) continue;
+
+      // Check date range
+      if (!isInDateRange(t.created_at, CIRCLE_START, CIRCLE_END)) continue;
+
+      candidates.push(t.id_str);
+    }
+
+    console.log(`Found ${candidates.length} candidates to check`);
+
+    progress = {
+      checked: {},
+      circleIds: [],
+      candidates,
+      startedAt: new Date().toISOString(),
+      lastUpdated: new Date().toISOString(),
+    };
+  }
+
+  liveProgress = progress; // For SIGINT handler
+
+  // Filter out already checked candidates (but retry errors from transient failures)
+  const remaining = progress.candidates.filter((id) => {
+    const result = progress!.checked[id];
+    return !result || result.typename === "error";
+  });
+  console.log(`\nRemaining to check: ${remaining.length}`);
+
+  if (remaining.length === 0) {
+    console.log("All candidates already checked!");
+    writeFinalResults(outputPath, progress);
+    return;
+  }
+
+  // Estimate time (rough: ~50ms per request with concurrency)
+  const estimatedMs = (remaining.length / CONCURRENCY) * 50;
+  console.log(`Estimated time: ${formatDuration(estimatedMs)} (${CONCURRENCY} concurrent)`);
+  console.log();
+
+  // Stats tracking (exclude errors from checked count since they'll be retried)
+  const errorCount = Object.values(progress.checked).filter(
+    (r) => r.typename === "error",
+  ).length;
+  const stats: DetectionStats = {
+    total: progress.candidates.length,
+    checked: Object.keys(progress.checked).length - errorCount,
+    public: 0,
+    circle: progress.circleIds.length,
+    errors: 0,
+    skipped: 0,
+  };
+
+  // Recount from existing progress (errors will be retried, don't count them)
+  for (const result of Object.values(progress.checked)) {
+    if (result.typename === "Tweet") stats.public++;
+    // errors not counted here since they'll be retried
+  }
+
+  const startTime = Date.now();
+
+  // First pass: handle deleted tweets synchronously (no API needed)
+  const toCheck: string[] = [];
+  for (const id of remaining) {
+    if (deletedIds.has(id)) {
+      stats.skipped++;
+      stats.checked++;
+      progress.checked[id] = {
+        id,
+        typename: "Tweet", // Treat as public (just deleted)
+        retries: 0,
+        timestamp: new Date().toISOString(),
+      };
+    } else {
+      toCheck.push(id);
+    }
+  }
+
+  // Process in concurrent batches
+  for (let i = 0; i < toCheck.length; i += CONCURRENCY) {
+    const batch = toCheck.slice(i, i + CONCURRENCY);
+
+    const results = await Promise.all(batch.map((id) => checkTweet(id)));
+
+    // Process results
+    for (let j = 0; j < batch.length; j++) {
+      const id = batch[j];
+      const result = results[j];
+      progress.checked[id] = result;
+      stats.checked++;
+
+      if (result.typename === "TweetTombstone") {
+        progress.circleIds.push(id);
+        stats.circle++;
+        console.log(`[CIRCLE] ${id}`);
+      } else if (result.typename === "Tweet") {
+        stats.public++;
+      } else {
+        stats.errors++;
+        console.log(`[ERROR] ${id}: ${result.error}`);
+      }
+    }
+
+    // Save after each batch
+    saveProgress(outputPath, progress);
+
+    // Progress update every 10 batches (~100 tweets)
+    const processed = Math.min(i + CONCURRENCY, toCheck.length);
+    if (processed % 100 < CONCURRENCY) {
+      const elapsed = Date.now() - startTime;
+      const rate = processed / (elapsed / 1000);
+      const eta = (toCheck.length - processed) / rate;
+      console.log(
+        `Progress: ${formatProgress(stats.checked, stats.total)} | ` +
+          `Circle: ${stats.circle} | ` +
+          `Rate: ${rate.toFixed(1)}/s | ` +
+          `ETA: ${formatDuration(eta * 1000)}`,
+      );
+    }
+  }
+
+  // Final save
+  saveProgress(outputPath, progress);
+  writeFinalResults(outputPath, progress);
+
+  // Summary
+  const elapsed = Date.now() - startTime;
+  console.log();
+  console.log("=== DETECTION COMPLETE ===");
+  console.log(`Total candidates: ${stats.total}`);
+  console.log(`Checked: ${stats.checked}`);
+  console.log(`Public tweets: ${stats.public}`);
+  console.log(`Circle tweets: ${stats.circle}`);
+  console.log(`Errors: ${stats.errors}`);
+  console.log(`Skipped (deleted): ${stats.skipped}`);
+  console.log(`Time: ${formatDuration(elapsed)}`);
+  console.log();
+  console.log(`Results saved to:`);
+  console.log(` ${outputPath}/circle-tweets.json`);
+  console.log(` ${outputPath}/detection-log.json`);
+}
+
+function writeFinalResults(outputPath: string, progress: Progress) {
+  // Write circle tweet IDs
+  writeFileSync(
+    `${outputPath}/circle-tweets.json`,
+    JSON.stringify(progress.circleIds, null, 2),
+  );
+
+  // Write full detection log
+  const log = {
+    startedAt: progress.startedAt,
+    completedAt: new Date().toISOString(),
+    totalCandidates: progress.candidates.length,
+    circleCount: progress.circleIds.length,
+    results: progress.checked,
+  };
+  writeFileSync(
+    `${outputPath}/detection-log.json`,
+    JSON.stringify(log, null, 2),
+  );
+}
+
+// Live state for SIGINT handler
+let liveProgress: Progress | null = null;
+let liveOutputPath: string | null = null;
+
+// Handle Ctrl+C gracefully - save live in-memory progress, not reloaded from disk
+process.on("SIGINT", () => {
+  console.log("\n\nInterrupted! Saving progress...");
+  if (liveProgress && liveOutputPath) {
+    liveProgress.lastUpdated = new Date().toISOString();
+    saveProgress(liveOutputPath, liveProgress);
+    console.log(
+      `Progress saved (${Object.keys(liveProgress.checked).length} checked). Run again to resume.`,
+    );
+  }
+  process.exit(0);
+});
+
+main().catch((e) => {
+  console.error("Fatal error:", e);
+  process.exit(1);
+});
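Two pieces of `src/detect.ts` carry most of its resume logic: selecting the candidates that still need a check (unchecked IDs plus earlier errors) and slicing them into batches of `CONCURRENCY`. The sketch below isolates both as pure functions. `remainingCandidates`, `toBatches`, `CheckResult`, and the single-letter IDs are illustrative names and data, not part of the commit.

```typescript
// Reduced stand-in for SyndicationResult; only the field the filter reads.
interface CheckResult {
  typename: "Tweet" | "TweetTombstone" | "error";
}

// IDs that were never checked, or whose last check ended in an error, are retried.
function remainingCandidates(
  candidates: string[],
  checked: Record<string, CheckResult>,
): string[] {
  return candidates.filter((id) => {
    const result = checked[id];
    return !result || result.typename === "error";
  });
}

// Split a list into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Illustrative state: "a" is done, "b" failed earlier, "c" to "e" are new.
const checkedSoFar: Record<string, CheckResult> = {
  a: { typename: "Tweet" },
  b: { typename: "error" },
};
const stillToCheck = remainingCandidates(["a", "b", "c", "d", "e"], checkedSoFar);
const batchesToRun = toBatches(stillToCheck, 3);

console.log(stillToCheck); // b, c, d, e
console.log(batchesToRun.length); // 2
```

In the script each batch is then passed to `Promise.all`, so the batch size bounds the number of requests in flight at once.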
+111
src/syndication.ts
+import type { SyndicationResult, ApiResponseType } from "./types"
+
+const TWEET_URL = "https://cdn.syndication.twimg.com/tweet-result"
+const BASE_DELAY = 250 // ms between requests
+const MAX_RETRIES = 5
+const FETCH_TIMEOUT_MS = 30000 // 30s timeout per request
+
+function getToken(id: string): string {
+  // Use BigInt to preserve precision for tweet IDs > 2^53 (Number.MAX_SAFE_INTEGER)
+  // Original formula: (id / 1e15) * PI
+  // Split hi/lo to avoid BigInt truncation losing lower digits
+  const bigId = BigInt(id)
+  const hi = Number(bigId / 1_000_000_000_000_000n)
+  const lo = Number(bigId % 1_000_000_000_000_000n) / 1e15
+  return ((hi + lo) * Math.PI).toString(36).replace(/(0+|\.)/g, "")
+}
+
+function sleep(ms: number): Promise<void> {
+  return new Promise((resolve) => setTimeout(resolve, ms))
+}
+
+export async function checkTweet(id: string): Promise<SyndicationResult> {
+  const token = getToken(id)
+  const url = `${TWEET_URL}?id=${id}&token=${token}`
+
+  let lastError: string | undefined
+  let retries = 0
+
+  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
+    const controller = new AbortController()
+    const timeout = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS)
+    try {
+      const resp = await fetch(url, { signal: controller.signal })
+
+      // Rate limited - exponential backoff
+      if (resp.status === 429) {
+        const backoff = Math.pow(2, attempt) * 1000
+        console.log(` Rate limited, waiting ${backoff / 1000}s...`)
+        await sleep(backoff)
+        retries++
+        continue
+      }
+
+      // Other HTTP errors
+      if (!resp.ok) {
+        lastError = `HTTP ${resp.status}`
+        retries++
+        await sleep(1000 * attempt)
+        continue
+      }
+
+      const data = await resp.json() as { __typename?: string; tombstone?: { text?: { text?: string } } }
+
+      const typename: ApiResponseType =
+        data.__typename === "Tweet" ? "Tweet" :
+        data.__typename === "TweetTombstone" ? "TweetTombstone" :
+        "error"
+
+      return {
+        id,
+        typename,
+        tombstoneText: data.tombstone?.text?.text,
+        retries,
+        timestamp: new Date().toISOString()
+      }
+    } catch (e) {
+      lastError = e instanceof Error ? e.message : String(e)
+      retries++
+      await sleep(1000 * attempt)
+    } finally {
+      clearTimeout(timeout)
+    }
+  }
+
+  return {
+    id,
+    typename: "error",
+    error: lastError || "Max retries exceeded",
+    retries,
+    timestamp: new Date().toISOString()
+  }
+}
+
+export interface CheckOptions {
+  onProgress?: (checked: number, total: number, result: SyndicationResult) => void
+  delayMs?: number
+}
+
+export async function checkTweets(
+  ids: string[],
+  options: CheckOptions = {}
+): Promise<SyndicationResult[]> {
+  const { onProgress, delayMs = BASE_DELAY } = options
+  const results: SyndicationResult[] = []
+
+  for (let i = 0; i < ids.length; i++) {
+    const result = await checkTweet(ids[i])
+    results.push(result)
+
+    if (onProgress) {
+      onProgress(i + 1, ids.length, result)
+    }
+
+    // Delay between requests (skip on last)
+    if (i < ids.length - 1) {
+      await sleep(delayMs)
+    }
+  }
+
+  return results
+}
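Two decisions inside `checkTweet` can be isolated as pure functions, which makes them easy to reason about without touching the network: the wait time after an HTTP 429 and the mapping from `__typename` to a result kind. The sketch below restates both. `rateLimitBackoffMs` and `classifyResponse` are illustrative names, not exports of `src/syndication.ts`.

```typescript
type ApiResponseType = "Tweet" | "TweetTombstone" | "error";

// Wait time after an HTTP 429 for a zero-based attempt number:
// attempt 0 waits 1s, attempt 1 waits 2s, and so on, doubling each time.
function rateLimitBackoffMs(attempt: number): number {
  return Math.pow(2, attempt) * 1000;
}

// Map the `__typename` of a parsed response body onto the three result kinds.
// Anything other than the two known values is treated as an error.
function classifyResponse(data: { __typename?: string }): ApiResponseType {
  if (data.__typename === "Tweet") return "Tweet";
  if (data.__typename === "TweetTombstone") return "TweetTombstone";
  return "error";
}

const backoffSchedule = [0, 1, 2, 3].map((attempt) => rateLimitBackoffMs(attempt));
console.log(backoffSchedule); // 1000, 2000, 4000, 8000

const tombstoneKind = classifyResponse({ __typename: "TweetTombstone" });
const unknownKind = classifyResponse({});
console.log(tombstoneKind, unknownKind); // TweetTombstone error
```

With `MAX_RETRIES = 5` the loop runs for attempts 0 through 5, so under this formula the longest single wait after a 429 would be 32 seconds.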
+53
src/types.ts
+export interface TweetHeader {
+  tweet: {
+    tweet_id: string
+    user_id: string
+    created_at: string
+  }
+}
+
+export interface Tweet {
+  tweet: {
+    id_str: string
+    created_at: string
+    full_text: string
+    in_reply_to_status_id_str?: string
+    in_reply_to_user_id_str?: string
+    [key: string]: unknown
+  }
+}
+
+export interface DeletedTweet {
+  tweet: {
+    tweet_id?: string
+    id_str?: string
+  }
+}
+
+export type ApiResponseType = "Tweet" | "TweetTombstone" | "error"
+
+export interface SyndicationResult {
+  id: string
+  typename: ApiResponseType
+  tombstoneText?: string
+  error?: string
+  retries: number
+  timestamp: string
+}
+
+export interface Progress {
+  checked: Record<string, SyndicationResult>
+  circleIds: string[]
+  candidates: string[]
+  startedAt: string
+  lastUpdated: string
+}
+
+export interface DetectionStats {
+  total: number
+  checked: number
+  public: number
+  circle: number
+  errors: number
+  skipped: number
+}
+89
src/utils.ts
+import { readFileSync, writeFileSync, existsSync, readdirSync } from "fs"
+import type { TweetHeader, Tweet, DeletedTweet, Progress } from "./types"
+
+export function loadJsData<T>(filePath: string): T[] {
+  const content = readFileSync(filePath, "utf8")
+  const match = content.match(/^window\.YTD\.\w+\.part\d+ = /)
+  if (!match) throw new Error(`Invalid JS file format: ${filePath}`)
+  const json = content.slice(match[0].length)
+  return JSON.parse(json)
+}
+
+export function loadTweetHeaders(archivePath: string): TweetHeader[] {
+  return loadJsData<TweetHeader>(`${archivePath}/tweet-headers.js`)
+}
+
+export function loadTweets(archivePath: string): Tweet[] {
+  // Dynamically find all tweet files (tweets.js, tweets-part1.js, tweets-part2.js, etc.)
+  const allFiles = readdirSync(archivePath)
+  const tweetFiles = allFiles
+    .filter((f) => f === "tweets.js" || f.match(/^tweets-part\d+\.js$/))
+    .sort() // Ensure consistent ordering
+
+  const tweets: Tweet[] = []
+  for (const file of tweetFiles) {
+    tweets.push(...loadJsData<Tweet>(`${archivePath}/${file}`))
+  }
+
+  return tweets
+}
+
+export function loadDeletedTweetIds(archivePath: string): Set<string> {
+  // Dynamically find all deleted-tweets files (deleted-tweets.js, deleted-tweets-part1.js, etc.)
+  const allFiles = readdirSync(archivePath)
+  const deletedFiles = allFiles
+    .filter((f) => f === "deleted-tweets.js" || f.match(/^deleted-tweets-part\d+\.js$/))
+    .sort()
+
+  if (deletedFiles.length === 0) return new Set()
+
+  const allDeleted: DeletedTweet[] = []
+  for (const file of deletedFiles) {
+    allDeleted.push(...loadJsData<DeletedTweet>(`${archivePath}/${file}`))
+  }
+
+  return new Set(allDeleted.map((d) => d.tweet.tweet_id || d.tweet.id_str || ""))
+}
+
+export function loadProgress(outputPath: string): Progress | null {
+  const path = `${outputPath}/progress.json`
+  if (!existsSync(path)) return null
+  return JSON.parse(readFileSync(path, "utf8"))
+}
+
+export function saveProgress(outputPath: string, progress: Progress): void {
+  progress.lastUpdated = new Date().toISOString()
+  writeFileSync(`${outputPath}/progress.json`, JSON.stringify(progress, null, 2))
+}
+
+export function parseDate(dateStr: string): Date {
+  // Twitter date format: "Sun Oct 29 23:42:00 +0000 2023"
+  return new Date(dateStr)
+}
+
+export function isInDateRange(dateStr: string, start: Date, end: Date): boolean {
+  const date = parseDate(dateStr)
+  return date >= start && date <= end
+}
+
+export function isRetweet(text: string): boolean {
+  return text.startsWith("RT @")
+}
+
+export function formatDuration(ms: number): string {
+  const seconds = Math.floor(ms / 1000)
+  const minutes = Math.floor(seconds / 60)
+  const hours = Math.floor(minutes / 60)
+
+  if (hours > 0) {
+    return `${hours}h ${minutes % 60}m`
+  } else if (minutes > 0) {
+    return `${minutes}m ${seconds % 60}s`
+  }
+  return `${seconds}s`
+}
+
+export function formatProgress(current: number, total: number): string {
+  const pct = ((current / total) * 100).toFixed(1)
+  return `${current}/${total} (${pct}%)`
+}
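`parseDate` in `src/utils.ts` hands the archive's `created_at` string straight to `new Date(...)`, which relies on the runtime accepting that non-ISO format. As an alternative sketch, the parser below reads the fields explicitly, assuming every value follows the `"Sun Oct 29 23:42:00 +0000 2023"` layout noted in the source comment. `parseTwitterDate` and `inCircleEra` are hypothetical names and are not part of the commit.

```typescript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Explicit parser for "Sun Oct 29 23:42:00 +0000 2023" (assumed fixed layout).
function parseTwitterDate(dateStr: string): Date {
  const m = /^\w{3} (\w{3}) (\d{1,2}) (\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2}) (\d{4})$/.exec(dateStr);
  if (!m) throw new Error(`Unrecognized date: ${dateStr}`);

  const month = MONTHS.indexOf(m[1] ?? "");
  if (month < 0) throw new Error(`Unrecognized month in: ${dateStr}`);

  // Interpret the clock fields as UTC, then subtract the stated offset.
  const asUtc = Date.UTC(Number(m[9]), month, Number(m[2]), Number(m[3]), Number(m[4]), Number(m[5]));
  const offsetMinutes = (m[6] === "-" ? -1 : 1) * (Number(m[7]) * 60 + Number(m[8]));
  return new Date(asUtc - offsetMinutes * 60000);
}

// Same inclusive range check as isInDateRange, using the bounds from detect.ts.
function inCircleEra(dateStr: string): boolean {
  const start = new Date("2022-05-02");
  const end = new Date("2023-11-15");
  const t = parseTwitterDate(dateStr).getTime();
  return t >= start.getTime() && t <= end.getTime();
}

const parsedIso = parseTwitterDate("Sun Oct 29 23:42:00 +0000 2023").toISOString();
console.log(parsedIso); // 2023-10-29T23:42:00.000Z
console.log(inCircleEra("Sun Oct 29 23:42:00 +0000 2023")); // true
console.log(inCircleEra("Wed Nov 15 12:00:00 +0000 2023")); // false
```

The last line is `false` because a date-only string such as `"2023-11-15"` is parsed as midnight UTC, so the inclusive upper bound covers only the first instant of that day.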
+29
tsconfig.json
+{
+  "compilerOptions": {
+    // Environment setup & latest features
+    "lib": ["ESNext"],
+    "target": "ESNext",
+    "module": "Preserve",
+    "moduleDetection": "force",
+    "jsx": "react-jsx",
+    "allowJs": true,
+
+    // Bundler mode
+    "moduleResolution": "bundler",
+    "allowImportingTsExtensions": true,
+    "verbatimModuleSyntax": true,
+    "noEmit": true,
+
+    // Best practices
+    "strict": true,
+    "skipLibCheck": true,
+    "noFallthroughCasesInSwitch": true,
+    "noUncheckedIndexedAccess": true,
+    "noImplicitOverride": true,
+
+    // Some stricter flags (disabled by default)
+    "noUnusedLocals": false,
+    "noUnusedParameters": false,
+    "noPropertyAccessFromIndexSignature": false
+  }
+}