Initial commit: EPUB reader skill for Claude Code

Add a skill that enables reading EPUB ebook files with:
- Metadata extraction (title, author, publisher, etc.)
- Table of contents listing
- Chapter-by-chapter reading
- Full book extraction as markdown
- Text search with context

Built with TypeScript using jszip, xml2js, turndown, and commander.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

TKTK 07e812cf

+1346
+16
.gitignore
```diff
+ # Dependencies
+ node_modules/
+
+ # Build output - keep dist/ tracked since it's needed for the skill to work
+ # dist/
+
+ # OS files
+ .DS_Store
+ Thumbs.db
+
+ # Editor files
+ *.swp
+ *.swo
+ *~
+ .idea/
+ .vscode/
```
+70
AGENTS.md
```diff
+ # EPUB Reader Skill for Claude Code
+
+ A Claude Code skill that enables efficient reading of EPUB ebook files.
+
+ ## Capabilities
+
+ - **Metadata extraction** - title, author, publisher, date, language
+ - **Table of contents** - view chapter structure
+ - **Chapter reading** - read specific chapters by number
+ - **Full extraction** - extract entire book as markdown
+ - **Search** - find text with surrounding context
+
+ ## Directory Structure
+
+ ```
+ ~/.claude/skills/epub/
+ ├── SKILL.md                    # Skill definition (triggers on EPUB-related requests)
+ ├── AGENTS.md                   # This documentation
+ ├── CLAUDE.md -> AGENTS.md      # Symlink
+ └── scripts/epub-reader/
+     ├── package.json
+     ├── tsconfig.json
+     ├── src/index.ts            # TypeScript source
+     └── dist/                   # Compiled JavaScript
+ ```
+
+ ## Technology Stack
+
+ - **TypeScript** - main implementation language
+ - **jszip** - extract EPUB contents (EPUBs are ZIP archives)
+ - **xml2js** - parse OPF/NCX metadata files
+ - **turndown** - convert HTML content to Markdown
+ - **commander** - CLI argument parsing
+
+ ## CLI Commands
+
+ ```bash
+ # View metadata
+ node ~/.claude/skills/epub/scripts/epub-reader/dist/index.js metadata "<file.epub>"
+
+ # List table of contents
+ node ~/.claude/skills/epub/scripts/epub-reader/dist/index.js toc "<file.epub>"
+
+ # Read specific chapter (1-indexed)
+ node ~/.claude/skills/epub/scripts/epub-reader/dist/index.js chapter "<file.epub>" <number>
+
+ # Extract entire book
+ node ~/.claude/skills/epub/scripts/epub-reader/dist/index.js full "<file.epub>"
+
+ # Search for text
+ node ~/.claude/skills/epub/scripts/epub-reader/dist/index.js search "<file.epub>" "<query>"
+ ```
+
+ ## How the Skill Works
+
+ 1. **SKILL.md** defines when Claude should use this skill (any EPUB-related request)
+ 2. Claude automatically invokes the appropriate CLI command based on user intent
+ 3. Output is clean Markdown suitable for reading and analysis
+
+ ## Rebuilding
+
+ If you need to modify and rebuild:
+
+ ```bash
+ cd ~/.claude/skills/epub/scripts/epub-reader
+ npm install
+ npm run build
+ ```
+
+ Restart Claude Code after any changes to SKILL.md for them to take effect.
```
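AGENTS.md lists xml2js as the parser for the OPF/NCX files. As a rough illustration of what that involves, the sketch below shows how the OPF path might be read from an already-parsed `META-INF/container.xml`. It assumes xml2js's default output shape (child elements as arrays, attributes under `$`). The `ParsedContainer` type, the `getRootfilePath` helper, and the sample object are hypothetical and not part of the commit.

```typescript
// Shape xml2js produces with its default options: child elements become
// arrays and attributes are collected under the "$" key.
interface ParsedContainer {
  container?: {
    rootfiles?: Array<{
      rootfile?: Array<{ $?: { "full-path"?: string } }>;
    }>;
  };
}

// Hypothetical helper: read the OPF path from the parsed container,
// failing with a readable message instead of a TypeError when a level is missing.
function getRootfilePath(parsed: ParsedContainer): string {
  const fullPath =
    parsed.container?.rootfiles?.[0]?.rootfile?.[0]?.$?.["full-path"];
  if (!fullPath) {
    throw new Error("Invalid EPUB: container.xml has no rootfile full-path");
  }
  return fullPath;
}

// Hand-written sample mirroring a parsed META-INF/container.xml.
const sample: ParsedContainer = {
  container: {
    rootfiles: [{ rootfile: [{ $: { "full-path": "OEBPS/content.opf" } }] }],
  },
};

console.log(getRootfilePath(sample)); // OEBPS/content.opf
```

The optional chaining is a design choice for this sketch: a malformed container yields one clear error rather than a failure deep inside a property chain.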
+1
CLAUDE.md
```diff
+ AGENTS.md
```
+88
SKILL.md
```diff
+ ---
+ name: epub
+ description: Read and extract content from EPUB ebook files. Use this skill when the user wants to read an EPUB file, extract text from an ebook, view EPUB metadata (title, author), list chapters or table of contents, search within EPUB content, or analyze ebook content.
+ ---
+
+ # EPUB Reader Skill
+
+ Read EPUB ebook files and extract content as clean Markdown.
+
+ ## Instructions
+
+ Use the epub-reader CLI tool to interact with EPUB files. The tool is located at:
+ `~/.claude/skills/epub/scripts/epub-reader/dist/index.js`
+
+ ### Available Commands
+
+ #### 1. View Metadata
+ Get book information (title, author, publisher, date, description).
+
+ ```bash
+ node ~/.claude/skills/epub/scripts/epub-reader/dist/index.js metadata "<path-to-epub>"
+ ```
+
+ #### 2. List Table of Contents
+ View all chapters and their structure.
+
+ ```bash
+ node ~/.claude/skills/epub/scripts/epub-reader/dist/index.js toc "<path-to-epub>"
+ ```
+
+ #### 3. Read Specific Chapter
+ Read a single chapter by number (1-indexed).
+
+ ```bash
+ node ~/.claude/skills/epub/scripts/epub-reader/dist/index.js chapter "<path-to-epub>" <chapter-number>
+ ```
+
+ #### 4. Read Entire Book
+ Extract the complete book as Markdown.
+
+ ```bash
+ node ~/.claude/skills/epub/scripts/epub-reader/dist/index.js full "<path-to-epub>"
+ ```
+
+ #### 5. Search Text
+ Find text occurrences with surrounding context.
+
+ ```bash
+ node ~/.claude/skills/epub/scripts/epub-reader/dist/index.js search "<path-to-epub>" "<search-query>"
+ ```
+
+ ## Recommended Workflow
+
+ 1. **Start with metadata** to understand what book you're working with
+ 2. **View the TOC** to see available chapters and structure
+ 3. **Read specific chapters** for targeted analysis, or use **full** for complete extraction
+ 4. **Use search** to find specific topics, quotes, or references
+
+ ## Output Format
+
+ All output is clean Markdown:
+ - Headings preserved as `#`, `##`, etc.
+ - Lists, links, and emphasis converted properly
+ - Excessive whitespace cleaned up
+ - Chapter separators included for full extraction
+
+ ## Examples
+
+ ```bash
+ # What book is this?
+ node ~/.claude/skills/epub/scripts/epub-reader/dist/index.js metadata "/path/to/book.epub"
+
+ # Show me the chapters
+ node ~/.claude/skills/epub/scripts/epub-reader/dist/index.js toc "/path/to/book.epub"
+
+ # Read chapter 3
+ node ~/.claude/skills/epub/scripts/epub-reader/dist/index.js chapter "/path/to/book.epub" 3
+
+ # Find all mentions of "democracy"
+ node ~/.claude/skills/epub/scripts/epub-reader/dist/index.js search "/path/to/book.epub" "democracy"
+ ```
+
+ ## Notes
+
+ - Chapter numbers are 1-indexed (first chapter is 1, not 0)
+ - Paths with spaces must be quoted
+ - Large books may produce substantial output with the `full` command
+ - Search results show up to 5 matches per chapter with context
```
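The note about search returning up to 5 matches per chapter with context can be illustrated with a small standalone sketch. It follows the same regex approach as `searchContent` in this commit, but `findSnippets` and its `limit` and `context` parameters are hypothetical names introduced here; the commit hard-codes 5 matches and 50 characters of context.

```typescript
// Escape regex metacharacters so the query is matched literally.
function escapeRegex(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return up to `limit` snippets, each holding the match
// plus at most `context` characters on either side.
function findSnippets(
  text: string,
  query: string,
  limit = 5,
  context = 50
): string[] {
  const pattern = new RegExp(
    `.{0,${context}}${escapeRegex(query)}.{0,${context}}`,
    "gi"
  );
  const matches = text.match(pattern) ?? [];
  return matches.slice(0, limit).map((m) => `...${m.trim()}...`);
}

// Ten one-line paragraphs each contain the query; only five snippets come back.
const snippets = findSnippets("x\n".repeat(10), "x");
console.log(snippets.length); // 5
```

Because `.` does not match a newline without the `s` flag, the context never crosses a line break, so snippets from short lines can be much shorter than 50 characters per side.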
+2
scripts/epub-reader/dist/index.d.ts
```diff
+ #!/usr/bin/env node
+ export {};
```
+383
scripts/epub-reader/dist/index.js
```diff
+ #!/usr/bin/env node
+ import { program } from "commander";
+ import * as fs from "fs";
+ import * as path from "path";
+ import JSZip from "jszip";
+ import { parseStringPromise } from "xml2js";
+ import TurndownService from "turndown";
+ const turndown = new TurndownService({
+     headingStyle: "atx",
+     codeBlockStyle: "fenced",
+     emDelimiter: "*",
+ });
+ // Improve turndown to handle more elements
+ turndown.addRule("preserveLineBreaks", {
+     filter: "br",
+     replacement: () => "\n",
+ });
+ async function loadEpub(filePath) {
+     const absolutePath = path.resolve(filePath);
+     if (!fs.existsSync(absolutePath)) {
+         throw new Error(`File not found: ${absolutePath}`);
+     }
+     const data = fs.readFileSync(absolutePath);
+     const zip = await JSZip.loadAsync(data);
+     // Find container.xml
+     const containerXml = await zip.file("META-INF/container.xml")?.async("text");
+     if (!containerXml) {
+         throw new Error("Invalid EPUB: Missing META-INF/container.xml");
+     }
+     const container = await parseStringPromise(containerXml);
+     const rootfilePath = container.container.rootfiles[0].rootfile[0].$["full-path"];
+     // Get content base path
+     const contentBasePath = path.dirname(rootfilePath);
+     // Parse OPF file
+     const opfContent = await zip.file(rootfilePath)?.async("text");
+     if (!opfContent) {
+         throw new Error(`Invalid EPUB: Missing OPF file at ${rootfilePath}`);
+     }
+     const opf = await parseStringPromise(opfContent);
+     const pkg = opf.package;
+     // Extract metadata
+     const metadata = extractMetadata(pkg.metadata[0]);
+     // Build manifest map
+     const manifest = new Map();
+     for (const item of pkg.manifest[0].item) {
+         manifest.set(item.$.id, {
+             id: item.$.id,
+             href: item.$.href,
+             mediaType: item.$["media-type"],
+         });
+     }
+     // Extract spine
+     const spine = pkg.spine[0].itemref.map((item) => ({
+         idref: item.$.idref,
+         linear: item.$.linear,
+     }));
+     // Try to extract TOC
+     const toc = await extractToc(zip, manifest, contentBasePath, pkg);
+     return {
+         metadata,
+         manifest,
+         spine,
+         toc,
+         contentBasePath,
+         zip,
+     };
+ }
+ function extractMetadata(metadataNode) {
+     const metadata = {};
+     // Helper to get text content from various formats
+     const getText = (node) => {
+         if (!node)
+             return undefined;
+         if (Array.isArray(node)) {
+             const first = node[0];
+             if (typeof first === "string")
+                 return first;
+             if (typeof first === "object" && first !== null && "_" in first)
+                 return first._;
+             if (typeof first === "object" && first !== null)
+                 return JSON.stringify(first);
+         }
+         if (typeof node === "string")
+             return node;
+         return undefined;
+     };
+     // DC metadata (Dublin Core)
+     const dc = (key) => getText(metadataNode[`dc:${key}`]) || getText(metadataNode[key]);
+     metadata.title = dc("title");
+     metadata.creator = dc("creator");
+     metadata.author = metadata.creator;
+     metadata.language = dc("language");
+     metadata.publisher = dc("publisher");
+     metadata.date = dc("date");
+     metadata.description = dc("description");
+     metadata.identifier = dc("identifier");
+     // Handle multiple subjects
+     const subjects = metadataNode["dc:subject"] || metadataNode["subject"];
+     if (Array.isArray(subjects)) {
+         metadata.subject = subjects.map((s) => typeof s === "string" ? s : s._ || String(s));
+     }
+     return metadata;
+ }
+ async function extractToc(zip, manifest, basePath, pkg) {
+     const toc = [];
+     // Try EPUB 3 nav document first
+     for (const [, item] of manifest) {
+         if (item.mediaType === "application/xhtml+xml") {
+             const fullPath = basePath === "." ? item.href : `${basePath}/${item.href}`;
+             const content = await zip.file(fullPath)?.async("text");
+             if (content && content.includes('epub:type="toc"')) {
+                 const navToc = await parseNavToc(content);
+                 if (navToc.length > 0)
+                     return navToc;
+             }
+         }
+     }
+     // Try NCX file (EPUB 2)
+     const spine = pkg.spine;
+     const tocId = spine?.[0]?.$?.toc;
+     if (tocId && manifest.has(tocId)) {
+         const ncxItem = manifest.get(tocId);
+         const ncxPath = basePath === "." ? ncxItem.href : `${basePath}/${ncxItem.href}`;
+         const ncxContent = await zip.file(ncxPath)?.async("text");
+         if (ncxContent) {
+             return await parseNcxToc(ncxContent);
+         }
+     }
+     // Fallback: look for any .ncx file
+     for (const [, item] of manifest) {
+         if (item.href.endsWith(".ncx")) {
+             const ncxPath = basePath === "." ? item.href : `${basePath}/${item.href}`;
+             const ncxContent = await zip.file(ncxPath)?.async("text");
+             if (ncxContent) {
+                 return await parseNcxToc(ncxContent);
+             }
+         }
+     }
+     return toc;
+ }
+ async function parseNavToc(navContent) {
+     const toc = [];
+     // Simple regex-based parsing for nav document
+     const tocMatch = navContent.match(/<nav[^>]*epub:type="toc"[^>]*>([\s\S]*?)<\/nav>/i);
+     if (!tocMatch)
+         return toc;
+     const linkRegex = /<a[^>]*href="([^"]*)"[^>]*>([^<]*)<\/a>/gi;
+     let match;
+     while ((match = linkRegex.exec(tocMatch[1])) !== null) {
+         toc.push({
+             label: match[2].trim(),
+             href: match[1],
+         });
+     }
+     return toc;
+ }
+ async function parseNcxToc(ncxContent) {
+     const ncx = await parseStringPromise(ncxContent);
+     const navMap = ncx.ncx?.navMap?.[0]?.navPoint;
+     if (!navMap)
+         return [];
+     return parseNavPoints(navMap);
+ }
+ function parseNavPoints(navPoints) {
+     return navPoints.map((point) => {
+         const item = {
+             label: point.navLabel?.[0]?.text?.[0] || "Untitled",
+             href: point.content?.[0]?.$?.src || "",
+         };
+         if (point.navPoint && Array.isArray(point.navPoint)) {
+             item.children = parseNavPoints(point.navPoint);
+         }
+         return item;
+     });
+ }
+ async function getChapterContent(epub, index) {
+     if (index < 0 || index >= epub.spine.length) {
+         throw new Error(`Chapter index ${index + 1} out of range. Book has ${epub.spine.length} chapters.`);
+     }
+     const spineItem = epub.spine[index];
+     const manifestItem = epub.manifest.get(spineItem.idref);
+     if (!manifestItem) {
+         throw new Error(`Could not find manifest item for spine entry: ${spineItem.idref}`);
+     }
+     const fullPath = epub.contentBasePath === "."
+         ? manifestItem.href
+         : `${epub.contentBasePath}/${manifestItem.href}`;
+     const content = await epub.zip.file(fullPath)?.async("text");
+     if (!content) {
+         throw new Error(`Could not read content file: ${fullPath}`);
+     }
+     // Extract title from content if possible
+     const titleMatch = content.match(/<title>([^<]*)<\/title>/i);
+     const h1Match = content.match(/<h1[^>]*>([^<]*)<\/h1>/i);
+     const title = epub.toc[index]?.label ||
+         h1Match?.[1] ||
+         titleMatch?.[1] ||
+         `Chapter ${index + 1}`;
+     // Convert HTML to Markdown
+     const markdown = htmlToMarkdown(content);
+     return { title, content: markdown };
+ }
+ function htmlToMarkdown(html) {
+     // Extract body content if present
+     const bodyMatch = html.match(/<body[^>]*>([\s\S]*)<\/body>/i);
+     const content = bodyMatch ? bodyMatch[1] : html;
+     // Convert to markdown
+     let markdown = turndown.turndown(content);
+     // Clean up excessive whitespace
+     markdown = markdown.replace(/\n{3,}/g, "\n\n");
+     markdown = markdown.trim();
+     return markdown;
+ }
+ async function searchContent(epub, query) {
+     const results = [];
+     const searchRegex = new RegExp(`.{0,50}${escapeRegex(query)}.{0,50}`, "gi");
+     for (let i = 0; i < epub.spine.length; i++) {
+         const { title, content } = await getChapterContent(epub, i);
+         const matches = content.match(searchRegex);
+         if (matches && matches.length > 0) {
+             results.push({
+                 chapter: i + 1,
+                 title,
+                 matches: matches.slice(0, 5).map((m) => `...${m.trim()}...`),
+             });
+         }
+     }
+     return results;
+ }
+ function escapeRegex(string) {
+     return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
+ }
+ function formatToc(toc, indent = 0) {
+     let output = "";
+     toc.forEach((item, index) => {
+         const prefix = " ".repeat(indent);
+         output += `${prefix}${indent === 0 ? index + 1 + "." : "-"} ${item.label}\n`;
+         if (item.children) {
+             output += formatToc(item.children, indent + 1);
+         }
+     });
+     return output;
+ }
+ // CLI Commands
+ program
+     .name("epub-reader")
+     .description("CLI tool for reading EPUB files and extracting content as Markdown")
+     .version("1.0.0");
+ program
+     .command("metadata")
+     .description("Display EPUB metadata (title, author, etc.)")
+     .argument("<file>", "Path to EPUB file")
+     .action(async (file) => {
+     try {
+         const epub = await loadEpub(file);
+         const m = epub.metadata;
+         console.log("# EPUB Metadata\n");
+         if (m.title)
+             console.log(`**Title:** ${m.title}`);
+         if (m.author)
+             console.log(`**Author:** ${m.author}`);
+         if (m.publisher)
+             console.log(`**Publisher:** ${m.publisher}`);
+         if (m.date)
+             console.log(`**Date:** ${m.date}`);
+         if (m.language)
+             console.log(`**Language:** ${m.language}`);
+         if (m.identifier)
+             console.log(`**Identifier:** ${m.identifier}`);
+         if (m.subject && m.subject.length > 0) {
+             console.log(`**Subjects:** ${m.subject.join(", ")}`);
+         }
+         if (m.description) {
+             console.log(`\n## Description\n\n${m.description}`);
+         }
+         console.log(`\n**Total Chapters:** ${epub.spine.length}`);
+     }
+     catch (error) {
+         console.error(`Error: ${error instanceof Error ? error.message : String(error)}`);
+         process.exit(1);
+     }
+ });
+ program
+     .command("toc")
+     .description("Display table of contents")
+     .argument("<file>", "Path to EPUB file")
+     .action(async (file) => {
+     try {
+         const epub = await loadEpub(file);
+         console.log("# Table of Contents\n");
+         if (epub.toc.length > 0) {
+             console.log(formatToc(epub.toc));
+         }
+         else {
+             // Fallback to spine-based listing
+             console.log("(No structured TOC found, listing spine items)\n");
+             for (let i = 0; i < epub.spine.length; i++) {
+                 const item = epub.manifest.get(epub.spine[i].idref);
+                 console.log(`${i + 1}. ${item?.href || `Chapter ${i + 1}`}`);
+             }
+         }
+     }
+     catch (error) {
+         console.error(`Error: ${error instanceof Error ? error.message : String(error)}`);
+         process.exit(1);
+     }
+ });
+ program
+     .command("chapter")
+     .description("Read a specific chapter (1-indexed)")
+     .argument("<file>", "Path to EPUB file")
+     .argument("<number>", "Chapter number (starting from 1)")
+     .action(async (file, number) => {
+     try {
+         const chapterNum = parseInt(number, 10);
+         if (isNaN(chapterNum) || chapterNum < 1) {
+             throw new Error("Chapter number must be a positive integer");
+         }
+         const epub = await loadEpub(file);
+         const { title, content } = await getChapterContent(epub, chapterNum - 1);
+         console.log(`# ${title}\n`);
+         console.log(content);
+     }
+     catch (error) {
+         console.error(`Error: ${error instanceof Error ? error.message : String(error)}`);
+         process.exit(1);
+     }
+ });
+ program
+     .command("full")
+     .description("Extract entire book as Markdown")
+     .argument("<file>", "Path to EPUB file")
+     .action(async (file) => {
+     try {
+         const epub = await loadEpub(file);
+         const m = epub.metadata;
+         // Print metadata header
+         console.log(`# ${m.title || "Untitled"}\n`);
+         if (m.author)
+             console.log(`*By ${m.author}*\n`);
+         console.log("---\n");
+         // Print each chapter
+         for (let i = 0; i < epub.spine.length; i++) {
+             const { title, content } = await getChapterContent(epub, i);
+             console.log(`## ${title}\n`);
+             console.log(content);
+             console.log("\n---\n");
+         }
+     }
+     catch (error) {
+         console.error(`Error: ${error instanceof Error ? error.message : String(error)}`);
+         process.exit(1);
+     }
+ });
+ program
+     .command("search")
+     .description("Search for text in the book")
+     .argument("<file>", "Path to EPUB file")
+     .argument("<query>", "Text to search for")
+     .action(async (file, query) => {
+     try {
+         const epub = await loadEpub(file);
+         const results = await searchContent(epub, query);
+         if (results.length === 0) {
+             console.log(`No matches found for "${query}"`);
+             return;
+         }
+         console.log(`# Search Results for "${query}"\n`);
+         console.log(`Found matches in ${results.length} chapter(s):\n`);
+         for (const result of results) {
+             console.log(`## Chapter ${result.chapter}: ${result.title}\n`);
+             for (const match of result.matches) {
+                 console.log(`- ${match}`);
+             }
+             console.log();
+         }
+     }
+     catch (error) {
+         console.error(`Error: ${error instanceof Error ? error.message : String(error)}`);
+         process.exit(1);
+     }
+ });
+ program.parse();
```
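`getChapterContent` takes its title from `epub.toc[index]?.label`, which pairs the Nth top-level TOC entry with the Nth spine item. In many EPUBs the spine also contains a cover or title page that the TOC omits, so the two lists can drift apart. Below is a minimal sketch of looking the label up by href instead. It assumes TOC hrefs and manifest hrefs are relative to the same directory, which is not guaranteed. `stripFragment` and `findTocLabel` are hypothetical names, not part of this commit.

```typescript
interface TocItem {
  label: string;
  href: string;
  children?: TocItem[];
}

// Strip a "#fragment" suffix so "ch1.xhtml#sec2" compares equal to "ch1.xhtml".
function stripFragment(href: string): string {
  const hash = href.indexOf("#");
  return hash === -1 ? href : href.slice(0, hash);
}

// Hypothetical helper: depth-first search for the first TOC entry that points
// at the given spine file, including nested entries.
function findTocLabel(toc: TocItem[], spineHref: string): string | undefined {
  for (const item of toc) {
    if (stripFragment(item.href) === spineHref) {
      return item.label;
    }
    if (item.children) {
      const nested = findTocLabel(item.children, spineHref);
      if (nested !== undefined) {
        return nested;
      }
    }
  }
  return undefined;
}

// Sample TOC with one nested entry; "cover.xhtml" is deliberately absent.
const sampleToc: TocItem[] = [
  {
    label: "Part I",
    href: "part1.xhtml",
    children: [{ label: "Chapter 1", href: "ch1.xhtml#start" }],
  },
];

console.log(findTocLabel(sampleToc, "ch1.xhtml")); // Chapter 1
```

A caller could fall back to the `<h1>` or `<title>` match when this returns `undefined`, in the same order the existing code already uses.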
+223
scripts/epub-reader/package-lock.json
··· 1 + { 2 + "name": "epub-reader", 3 + "version": "1.0.0", 4 + "lockfileVersion": 3, 5 + "requires": true, 6 + "packages": { 7 + "": { 8 + "name": "epub-reader", 9 + "version": "1.0.0", 10 + "dependencies": { 11 + "commander": "^12.1.0", 12 + "jszip": "^3.10.1", 13 + "turndown": "^7.2.0", 14 + "xml2js": "^0.6.2" 15 + }, 16 + "devDependencies": { 17 + "@types/node": "^22.9.0", 18 + "@types/turndown": "^5.0.5", 19 + "@types/xml2js": "^0.4.14", 20 + "typescript": "^5.6.3" 21 + } 22 + }, 23 + "node_modules/@mixmark-io/domino": { 24 + "version": "2.2.0", 25 + "resolved": "https://registry.npmjs.org/@mixmark-io/domino/-/domino-2.2.0.tgz", 26 + "integrity": "sha512-Y28PR25bHXUg88kCV7nivXrP2Nj2RueZ3/l/jdx6J9f8J4nsEGcgX0Qe6lt7Pa+J79+kPiJU3LguR6O/6zrLOw==", 27 + "license": "BSD-2-Clause" 28 + }, 29 + "node_modules/@types/node": { 30 + "version": "22.19.1", 31 + "resolved": "https://registry.npmjs.org/@types/node/-/node-22.19.1.tgz", 32 + "integrity": "sha512-LCCV0HdSZZZb34qifBsyWlUmok6W7ouER+oQIGBScS8EsZsQbrtFTUrDX4hOl+CS6p7cnNC4td+qrSVGSCTUfQ==", 33 + "dev": true, 34 + "license": "MIT", 35 + "dependencies": { 36 + "undici-types": "~6.21.0" 37 + } 38 + }, 39 + "node_modules/@types/turndown": { 40 + "version": "5.0.6", 41 + "resolved": "https://registry.npmjs.org/@types/turndown/-/turndown-5.0.6.tgz", 42 + "integrity": "sha512-ru00MoyeeouE5BX4gRL+6m/BsDfbRayOskWqUvh7CLGW+UXxHQItqALa38kKnOiZPqJrtzJUgAC2+F0rL1S4Pg==", 43 + "dev": true, 44 + "license": "MIT" 45 + }, 46 + "node_modules/@types/xml2js": { 47 + "version": "0.4.14", 48 + "resolved": "https://registry.npmjs.org/@types/xml2js/-/xml2js-0.4.14.tgz", 49 + "integrity": "sha512-4YnrRemBShWRO2QjvUin8ESA41rH+9nQGLUGZV/1IDhi3SL9OhdpNC/MrulTWuptXKwhx/aDxE7toV0f/ypIXQ==", 50 + "dev": true, 51 + "license": "MIT", 52 + "dependencies": { 53 + "@types/node": "*" 54 + } 55 + }, 56 + "node_modules/commander": { 57 + "version": "12.1.0", 58 + "resolved": "https://registry.npmjs.org/commander/-/commander-12.1.0.tgz", 59 + "integrity": 
"sha512-Vw8qHK3bZM9y/P10u3Vib8o/DdkvA2OtPtZvD871QKjy74Wj1WSKFILMPRPSdUSx5RFK1arlJzEtA4PkFgnbuA==", 60 + "license": "MIT", 61 + "engines": { 62 + "node": ">=18" 63 + } 64 + }, 65 + "node_modules/core-util-is": { 66 + "version": "1.0.3", 67 + "resolved": "https://registry.npmjs.org/core-util-is/-/core-util-is-1.0.3.tgz", 68 + "integrity": "sha512-ZQBvi1DcpJ4GDqanjucZ2Hj3wEO5pZDS89BWbkcrvdxksJorwUDDZamX9ldFkp9aw2lmBDLgkObEA4DWNJ9FYQ==", 69 + "license": "MIT" 70 + }, 71 + "node_modules/immediate": { 72 + "version": "3.0.6", 73 + "resolved": "https://registry.npmjs.org/immediate/-/immediate-3.0.6.tgz", 74 + "integrity": "sha512-XXOFtyqDjNDAQxVfYxuF7g9Il/IbWmmlQg2MYKOH8ExIT1qg6xc4zyS3HaEEATgs1btfzxq15ciUiY7gjSXRGQ==", 75 + "license": "MIT" 76 + }, 77 + "node_modules/inherits": { 78 + "version": "2.0.4", 79 + "resolved": "https://registry.npmjs.org/inherits/-/inherits-2.0.4.tgz", 80 + "integrity": "sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ==", 81 + "license": "ISC" 82 + }, 83 + "node_modules/isarray": { 84 + "version": "1.0.0", 85 + "resolved": "https://registry.npmjs.org/isarray/-/isarray-1.0.0.tgz", 86 + "integrity": "sha512-VLghIWNM6ELQzo7zwmcg0NmTVyWKYjvIeM83yjp0wRDTmUnrM678fQbcKBo6n2CJEF0szoG//ytg+TKla89ALQ==", 87 + "license": "MIT" 88 + }, 89 + "node_modules/jszip": { 90 + "version": "3.10.1", 91 + "resolved": "https://registry.npmjs.org/jszip/-/jszip-3.10.1.tgz", 92 + "integrity": "sha512-xXDvecyTpGLrqFrvkrUSoxxfJI5AH7U8zxxtVclpsUtMCq4JQ290LY8AW5c7Ggnr/Y/oK+bQMbqK2qmtk3pN4g==", 93 + "license": "(MIT OR GPL-3.0-or-later)", 94 + "dependencies": { 95 + "lie": "~3.3.0", 96 + "pako": "~1.0.2", 97 + "readable-stream": "~2.3.6", 98 + "setimmediate": "^1.0.5" 99 + } 100 + }, 101 + "node_modules/lie": { 102 + "version": "3.3.0", 103 + "resolved": "https://registry.npmjs.org/lie/-/lie-3.3.0.tgz", 104 + "integrity": "sha512-UaiMJzeWRlEujzAuw5LokY1L5ecNQYZKfmyZ9L7wDHb/p5etKaxXhohBcrw0EYby+G/NA52vRSN4N39dxHAIwQ==", 105 + 
"license": "MIT", 106 + "dependencies": { 107 + "immediate": "~3.0.5" 108 + } 109 + }, 110 + "node_modules/pako": { 111 + "version": "1.0.11", 112 + "resolved": "https://registry.npmjs.org/pako/-/pako-1.0.11.tgz", 113 + "integrity": "sha512-4hLB8Py4zZce5s4yd9XzopqwVv/yGNhV1Bl8NTmCq1763HeK2+EwVTv+leGeL13Dnh2wfbqowVPXCIO0z4taYw==", 114 + "license": "(MIT AND Zlib)" 115 + }, 116 + "node_modules/process-nextick-args": { 117 + "version": "2.0.1", 118 + "resolved": "https://registry.npmjs.org/process-nextick-args/-/process-nextick-args-2.0.1.tgz", 119 + "integrity": "sha512-3ouUOpQhtgrbOa17J7+uxOTpITYWaGP7/AhoR3+A+/1e9skrzelGi/dXzEYyvbxubEF6Wn2ypscTKiKJFFn1ag==", 120 + "license": "MIT" 121 + }, 122 + "node_modules/readable-stream": { 123 + "version": "2.3.8", 124 + "resolved": "https://registry.npmjs.org/readable-stream/-/readable-stream-2.3.8.tgz", 125 + "integrity": "sha512-8p0AUk4XODgIewSi0l8Epjs+EVnWiK7NoDIEGU0HhE7+ZyY8D1IMY7odu5lRrFXGg71L15KG8QrPmum45RTtdA==", 126 + "license": "MIT", 127 + "dependencies": { 128 + "core-util-is": "~1.0.0", 129 + "inherits": "~2.0.3", 130 + "isarray": "~1.0.0", 131 + "process-nextick-args": "~2.0.0", 132 + "safe-buffer": "~5.1.1", 133 + "string_decoder": "~1.1.1", 134 + "util-deprecate": "~1.0.1" 135 + } 136 + }, 137 + "node_modules/safe-buffer": { 138 + "version": "5.1.2", 139 + "resolved": "https://registry.npmjs.org/safe-buffer/-/safe-buffer-5.1.2.tgz", 140 + "integrity": "sha512-Gd2UZBJDkXlY7GbJxfsE8/nvKkUEU1G38c1siN6QP6a9PT9MmHB8GnpscSmMJSoF8LOIrt8ud/wPtojys4G6+g==", 141 + "license": "MIT" 142 + }, 143 + "node_modules/sax": { 144 + "version": "1.4.3", 145 + "resolved": "https://registry.npmjs.org/sax/-/sax-1.4.3.tgz", 146 + "integrity": "sha512-yqYn1JhPczigF94DMS+shiDMjDowYO6y9+wB/4WgO0Y19jWYk0lQ4tuG5KI7kj4FTp1wxPj5IFfcrz/s1c3jjQ==", 147 + "license": "BlueOak-1.0.0" 148 + }, 149 + "node_modules/setimmediate": { 150 + "version": "1.0.5", 151 + "resolved": "https://registry.npmjs.org/setimmediate/-/setimmediate-1.0.5.tgz", 152 + 
"integrity": "sha512-MATJdZp8sLqDl/68LfQmbP8zKPLQNV6BIZoIgrscFDQ+RsvK/BxeDQOgyxKKoh0y/8h3BqVFnCqQ/gd+reiIXA==", 153 + "license": "MIT" 154 + }, 155 + "node_modules/string_decoder": { 156 + "version": "1.1.1", 157 + "resolved": "https://registry.npmjs.org/string_decoder/-/string_decoder-1.1.1.tgz", 158 + "integrity": "sha512-n/ShnvDi6FHbbVfviro+WojiFzv+s8MPMHBczVePfUpDJLwoLT0ht1l4YwBCbi8pJAveEEdnkHyPyTP/mzRfwg==", 159 + "license": "MIT", 160 + "dependencies": { 161 + "safe-buffer": "~5.1.0" 162 + } 163 + }, 164 + "node_modules/turndown": { 165 + "version": "7.2.2", 166 + "resolved": "https://registry.npmjs.org/turndown/-/turndown-7.2.2.tgz", 167 + "integrity": "sha512-1F7db8BiExOKxjSMU2b7if62D/XOyQyZbPKq/nUwopfgnHlqXHqQ0lvfUTeUIr1lZJzOPFn43dODyMSIfvWRKQ==", 168 + "license": "MIT", 169 + "dependencies": { 170 + "@mixmark-io/domino": "^2.2.0" 171 + } 172 + }, 173 + "node_modules/typescript": { 174 + "version": "5.9.3", 175 + "resolved": "https://registry.npmjs.org/typescript/-/typescript-5.9.3.tgz", 176 + "integrity": "sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw==", 177 + "dev": true, 178 + "license": "Apache-2.0", 179 + "bin": { 180 + "tsc": "bin/tsc", 181 + "tsserver": "bin/tsserver" 182 + }, 183 + "engines": { 184 + "node": ">=14.17" 185 + } 186 + }, 187 + "node_modules/undici-types": { 188 + "version": "6.21.0", 189 + "resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.21.0.tgz", 190 + "integrity": "sha512-iwDZqg0QAGrg9Rav5H4n0M64c3mkR59cJ6wQp+7C4nI0gsmExaedaYLNO44eT4AtBBwjbTiGPMlt2Md0T9H9JQ==", 191 + "dev": true, 192 + "license": "MIT" 193 + }, 194 + "node_modules/util-deprecate": { 195 + "version": "1.0.2", 196 + "resolved": "https://registry.npmjs.org/util-deprecate/-/util-deprecate-1.0.2.tgz", 197 + "integrity": "sha512-EPD5q1uXyFxJpCrLnCc1nHnq3gOa6DZBocAIiI2TaSCA7VCJ1UJDMagCzIkXNsUYfD1daK//LTEQ8xiIbrHtcw==", 198 + "license": "MIT" 199 + }, 200 + "node_modules/xml2js": { 201 + "version": 
"0.6.2", 202 + "resolved": "https://registry.npmjs.org/xml2js/-/xml2js-0.6.2.tgz", 203 + "integrity": "sha512-T4rieHaC1EXcES0Kxxj4JWgaUQHDk+qwHcYOCFHfiwKz7tOVPLq7Hjq9dM1WCMhylqMEfP7hMcOIChvotiZegA==", 204 + "license": "MIT", 205 + "dependencies": { 206 + "sax": ">=0.6.0", 207 + "xmlbuilder": "~11.0.0" 208 + }, 209 + "engines": { 210 + "node": ">=4.0.0" 211 + } 212 + }, 213 + "node_modules/xmlbuilder": { 214 + "version": "11.0.1", 215 + "resolved": "https://registry.npmjs.org/xmlbuilder/-/xmlbuilder-11.0.1.tgz", 216 + "integrity": "sha512-fDlsI/kFEx7gLvbecc0/ohLG50fugQp8ryHzMTuW9vSa1GJ0XYWKnhsUx7oie3G98+r56aTQIUB4kht42R3JvA==", 217 + "license": "MIT", 218 + "engines": { 219 + "node": ">=4.0" 220 + } 221 + } 222 + } 223 + }
+23
scripts/epub-reader/package.json
```diff
+ {
+   "name": "epub-reader",
+   "version": "1.0.0",
+   "description": "CLI tool for reading EPUB files and extracting content as Markdown",
+   "main": "dist/index.js",
+   "type": "module",
+   "scripts": {
+     "build": "tsc",
+     "start": "node dist/index.js"
+   },
+   "dependencies": {
+     "commander": "^12.1.0",
+     "turndown": "^7.2.0",
+     "jszip": "^3.10.1",
+     "xml2js": "^0.6.2"
+   },
+   "devDependencies": {
+     "@types/node": "^22.9.0",
+     "@types/turndown": "^5.0.5",
+     "@types/xml2js": "^0.4.14",
+     "typescript": "^5.6.3"
+   }
+ }
```
+524
scripts/epub-reader/src/index.ts
··· 1 + #!/usr/bin/env node 2 + 3 + import { program } from "commander"; 4 + import * as fs from "fs"; 5 + import * as path from "path"; 6 + import JSZip from "jszip"; 7 + import { parseStringPromise } from "xml2js"; 8 + import TurndownService from "turndown"; 9 + 10 + interface EpubMetadata { 11 + title?: string; 12 + creator?: string; 13 + author?: string; 14 + language?: string; 15 + publisher?: string; 16 + date?: string; 17 + description?: string; 18 + subject?: string[]; 19 + identifier?: string; 20 + } 21 + 22 + interface ManifestItem { 23 + id: string; 24 + href: string; 25 + mediaType: string; 26 + } 27 + 28 + interface SpineItem { 29 + idref: string; 30 + linear?: string; 31 + } 32 + 33 + interface TocItem { 34 + label: string; 35 + href: string; 36 + children?: TocItem[]; 37 + } 38 + 39 + interface ParsedEpub { 40 + metadata: EpubMetadata; 41 + manifest: Map<string, ManifestItem>; 42 + spine: SpineItem[]; 43 + toc: TocItem[]; 44 + contentBasePath: string; 45 + zip: JSZip; 46 + } 47 + 48 + const turndown = new TurndownService({ 49 + headingStyle: "atx", 50 + codeBlockStyle: "fenced", 51 + emDelimiter: "*", 52 + }); 53 + 54 + // Improve turndown to handle more elements 55 + turndown.addRule("preserveLineBreaks", { 56 + filter: "br", 57 + replacement: () => "\n", 58 + }); 59 + 60 + async function loadEpub(filePath: string): Promise<ParsedEpub> { 61 + const absolutePath = path.resolve(filePath); 62 + 63 + if (!fs.existsSync(absolutePath)) { 64 + throw new Error(`File not found: ${absolutePath}`); 65 + } 66 + 67 + const data = fs.readFileSync(absolutePath); 68 + const zip = await JSZip.loadAsync(data); 69 + 70 + // Find container.xml 71 + const containerXml = await zip.file("META-INF/container.xml")?.async("text"); 72 + if (!containerXml) { 73 + throw new Error("Invalid EPUB: Missing META-INF/container.xml"); 74 + } 75 + 76 + const container = await parseStringPromise(containerXml); 77 + const rootfilePath = 78 + 
container.container.rootfiles[0].rootfile[0].$["full-path"]; 79 + 80 + // Get content base path 81 + const contentBasePath = path.dirname(rootfilePath); 82 + 83 + // Parse OPF file 84 + const opfContent = await zip.file(rootfilePath)?.async("text"); 85 + if (!opfContent) { 86 + throw new Error(`Invalid EPUB: Missing OPF file at ${rootfilePath}`); 87 + } 88 + 89 + const opf = await parseStringPromise(opfContent); 90 + const pkg = opf.package; 91 + 92 + // Extract metadata 93 + const metadata = extractMetadata(pkg.metadata[0]); 94 + 95 + // Build manifest map 96 + const manifest = new Map<string, ManifestItem>(); 97 + for (const item of pkg.manifest[0].item) { 98 + manifest.set(item.$.id, { 99 + id: item.$.id, 100 + href: item.$.href, 101 + mediaType: item.$["media-type"], 102 + }); 103 + } 104 + 105 + // Extract spine 106 + const spine: SpineItem[] = pkg.spine[0].itemref.map( 107 + (item: { $: { idref: string; linear?: string } }) => ({ 108 + idref: item.$.idref, 109 + linear: item.$.linear, 110 + }) 111 + ); 112 + 113 + // Try to extract TOC 114 + const toc = await extractToc(zip, manifest, contentBasePath, pkg); 115 + 116 + return { 117 + metadata, 118 + manifest, 119 + spine, 120 + toc, 121 + contentBasePath, 122 + zip, 123 + }; 124 + } 125 + 126 + function extractMetadata(metadataNode: Record<string, unknown>): EpubMetadata { 127 + const metadata: EpubMetadata = {}; 128 + 129 + // Helper to get text content from various formats 130 + const getText = (node: unknown): string | undefined => { 131 + if (!node) return undefined; 132 + if (Array.isArray(node)) { 133 + const first = node[0]; 134 + if (typeof first === "string") return first; 135 + if (typeof first === "object" && first !== null && "_" in first) 136 + return (first as { _: string })._; 137 + if (typeof first === "object" && first !== null) 138 + return JSON.stringify(first); 139 + } 140 + if (typeof node === "string") return node; 141 + return undefined; 142 + }; 143 + 144 + // DC metadata (Dublin Core) 
+  const dc = (key: string) =>
+    getText(metadataNode[`dc:${key}`]) || getText(metadataNode[key]);
+
+  metadata.title = dc("title");
+  metadata.creator = dc("creator");
+  metadata.author = metadata.creator;
+  metadata.language = dc("language");
+  metadata.publisher = dc("publisher");
+  metadata.date = dc("date");
+  metadata.description = dc("description");
+  metadata.identifier = dc("identifier");
+
+  // Handle multiple subjects
+  const subjects = metadataNode["dc:subject"] || metadataNode["subject"];
+  if (Array.isArray(subjects)) {
+    metadata.subject = subjects.map((s) =>
+      typeof s === "string" ? s : (s as { _?: string })._ || String(s)
+    );
+  }
+
+  return metadata;
+}
+
+async function extractToc(
+  zip: JSZip,
+  manifest: Map<string, ManifestItem>,
+  basePath: string,
+  pkg: Record<string, unknown>
+): Promise<TocItem[]> {
+  const toc: TocItem[] = [];
+
+  // Try EPUB 3 nav document first
+  for (const [, item] of manifest) {
+    if (item.mediaType === "application/xhtml+xml") {
+      const fullPath =
+        basePath === "." ? item.href : `${basePath}/${item.href}`;
+      const content = await zip.file(fullPath)?.async("text");
+      if (content && content.includes('epub:type="toc"')) {
+        const navToc = await parseNavToc(content);
+        if (navToc.length > 0) return navToc;
+      }
+    }
+  }
+
+  // Try NCX file (EPUB 2)
+  const spine = pkg.spine as { $?: { toc?: string } }[] | undefined;
+  const tocId = spine?.[0]?.$?.toc;
+  if (tocId && manifest.has(tocId)) {
+    const ncxItem = manifest.get(tocId)!;
+    const ncxPath =
+      basePath === "." ? ncxItem.href : `${basePath}/${ncxItem.href}`;
+    const ncxContent = await zip.file(ncxPath)?.async("text");
+    if (ncxContent) {
+      return await parseNcxToc(ncxContent);
+    }
+  }
+
+  // Fallback: look for any .ncx file
+  for (const [, item] of manifest) {
+    if (item.href.endsWith(".ncx")) {
+      const ncxPath =
+        basePath === "." ? item.href : `${basePath}/${item.href}`;
+      const ncxContent = await zip.file(ncxPath)?.async("text");
+      if (ncxContent) {
+        return await parseNcxToc(ncxContent);
+      }
+    }
+  }
+
+  return toc;
+}
+
+async function parseNavToc(navContent: string): Promise<TocItem[]> {
+  const toc: TocItem[] = [];
+
+  // Simple regex-based parsing for nav document
+  const tocMatch = navContent.match(
+    /<nav[^>]*epub:type="toc"[^>]*>([\s\S]*?)<\/nav>/i
+  );
+  if (!tocMatch) return toc;
+
+  // Capture the whole anchor body so that labels wrapped in inline markup
+  // (for example a span inside the link) are not skipped; the tags are
+  // stripped from the label below.
+  const linkRegex = /<a[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi;
+  let match;
+
+  while ((match = linkRegex.exec(tocMatch[1])) !== null) {
+    toc.push({
+      label: match[2].replace(/<[^>]+>/g, "").trim(),
+      href: match[1],
+    });
+  }
+
+  return toc;
+}
+
+async function parseNcxToc(ncxContent: string): Promise<TocItem[]> {
+  const ncx = await parseStringPromise(ncxContent);
+  const navMap = ncx.ncx?.navMap?.[0]?.navPoint;
+
+  if (!navMap) return [];
+
+  return parseNavPoints(navMap);
+}
+
+function parseNavPoints(
+  navPoints: Array<{
+    navLabel?: Array<{ text?: string[] }>;
+    content?: Array<{ $?: { src?: string } }>;
+    navPoint?: unknown[];
+  }>
+): TocItem[] {
+  return navPoints.map((point) => {
+    const item: TocItem = {
+      label: point.navLabel?.[0]?.text?.[0] || "Untitled",
+      href: point.content?.[0]?.$?.src || "",
+    };
+
+    if (point.navPoint && Array.isArray(point.navPoint)) {
+      item.children = parseNavPoints(
+        point.navPoint as Array<{
+          navLabel?: Array<{ text?: string[] }>;
+          content?: Array<{ $?: { src?: string } }>;
+          navPoint?: unknown[];
+        }>
+      );
+    }
+
+    return item;
+  });
+}
+
+async function getChapterContent(
+  epub: ParsedEpub,
+  index: number
+): Promise<{ title: string; content: string }> {
+  if (index < 0 || index >= epub.spine.length) {
+    throw new Error(
+      `Chapter index ${index + 1} out of range. Book has ${epub.spine.length} chapters.`
+    );
+  }
+
+  const spineItem = epub.spine[index];
+  const manifestItem = epub.manifest.get(spineItem.idref);
+
+  if (!manifestItem) {
+    throw new Error(`Could not find manifest item for spine entry: ${spineItem.idref}`);
+  }
+
+  const fullPath =
+    epub.contentBasePath === "."
+      ? manifestItem.href
+      : `${epub.contentBasePath}/${manifestItem.href}`;
+
+  const content = await epub.zip.file(fullPath)?.async("text");
+  if (!content) {
+    throw new Error(`Could not read content file: ${fullPath}`);
+  }
+
+  // Extract title from content if possible
+  const titleMatch = content.match(/<title>([^<]*)<\/title>/i);
+  const h1Match = content.match(/<h1[^>]*>([^<]*)<\/h1>/i);
+  const title =
+    epub.toc[index]?.label ||
+    h1Match?.[1] ||
+    titleMatch?.[1] ||
+    `Chapter ${index + 1}`;
+
+  // Convert HTML to Markdown
+  const markdown = htmlToMarkdown(content);
+
+  return { title, content: markdown };
+}
+
+function htmlToMarkdown(html: string): string {
+  // Extract body content if present
+  const bodyMatch = html.match(/<body[^>]*>([\s\S]*)<\/body>/i);
+  const content = bodyMatch ? bodyMatch[1] : html;
+
+  // Convert to markdown
+  let markdown = turndown.turndown(content);
+
+  // Clean up excessive whitespace
+  markdown = markdown.replace(/\n{3,}/g, "\n\n");
+  markdown = markdown.trim();
+
+  return markdown;
+}
+
+async function searchContent(
+  epub: ParsedEpub,
+  query: string
+): Promise<Array<{ chapter: number; title: string; matches: string[] }>> {
+  const results: Array<{ chapter: number; title: string; matches: string[] }> =
+    [];
+  // Up to 50 characters of same-line context on each side of the query
+  const searchRegex = new RegExp(`.{0,50}${escapeRegex(query)}.{0,50}`, "gi");
+
+  for (let i = 0; i < epub.spine.length; i++) {
+    const { title, content } = await getChapterContent(epub, i);
+    const matches = content.match(searchRegex);
+
+    if (matches && matches.length > 0) {
+      results.push({
+        chapter: i + 1,
+        title,
+        // Report at most five matches per chapter
+        matches: matches.slice(0, 5).map((m) => `...${m.trim()}...`),
+      });
+    }
+  }
+
+  return results;
+}
+
+function escapeRegex(string: string): string {
+  return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
+}
+
+function formatToc(toc: TocItem[], indent = 0): string {
+  let output = "";
+  toc.forEach((item, index) => {
+    const prefix = "  ".repeat(indent);
+    output += `${prefix}${indent === 0 ? index + 1 + "." : "-"} ${item.label}\n`;
+    if (item.children) {
+      output += formatToc(item.children, indent + 1);
+    }
+  });
+  return output;
+}
+
+// CLI Commands
+program
+  .name("epub-reader")
+  .description("CLI tool for reading EPUB files and extracting content as Markdown")
+  .version("1.0.0");
+
+program
+  .command("metadata")
+  .description("Display EPUB metadata (title, author, etc.)")
+  .argument("<file>", "Path to EPUB file")
+  .action(async (file: string) => {
+    try {
+      const epub = await loadEpub(file);
+      const m = epub.metadata;
+
+      console.log("# EPUB Metadata\n");
+      if (m.title) console.log(`**Title:** ${m.title}`);
+      if (m.author) console.log(`**Author:** ${m.author}`);
+      if (m.publisher) console.log(`**Publisher:** ${m.publisher}`);
+      if (m.date) console.log(`**Date:** ${m.date}`);
+      if (m.language) console.log(`**Language:** ${m.language}`);
+      if (m.identifier) console.log(`**Identifier:** ${m.identifier}`);
+      if (m.subject && m.subject.length > 0) {
+        console.log(`**Subjects:** ${m.subject.join(", ")}`);
+      }
+      if (m.description) {
+        console.log(`\n## Description\n\n${m.description}`);
+      }
+      console.log(`\n**Total Chapters:** ${epub.spine.length}`);
+    } catch (error) {
+      console.error(
+        `Error: ${error instanceof Error ? error.message : String(error)}`
+      );
+      process.exit(1);
+    }
+  });
+
+program
+  .command("toc")
+  .description("Display table of contents")
+  .argument("<file>", "Path to EPUB file")
+  .action(async (file: string) => {
+    try {
+      const epub = await loadEpub(file);
+
+      console.log("# Table of Contents\n");
+
+      if (epub.toc.length > 0) {
+        console.log(formatToc(epub.toc));
+      } else {
+        // Fallback to spine-based listing
+        console.log("(No structured TOC found, listing spine items)\n");
+        for (let i = 0; i < epub.spine.length; i++) {
+          const item = epub.manifest.get(epub.spine[i].idref);
+          console.log(`${i + 1}. ${item?.href || `Chapter ${i + 1}`}`);
+        }
+      }
+    } catch (error) {
+      console.error(
+        `Error: ${error instanceof Error ? error.message : String(error)}`
+      );
+      process.exit(1);
+    }
+  });
+
+program
+  .command("chapter")
+  .description("Read a specific chapter (1-indexed)")
+  .argument("<file>", "Path to EPUB file")
+  .argument("<number>", "Chapter number (starting from 1)")
+  .action(async (file: string, number: string) => {
+    try {
+      const chapterNum = parseInt(number, 10);
+      if (isNaN(chapterNum) || chapterNum < 1) {
+        throw new Error("Chapter number must be a positive integer");
+      }
+
+      const epub = await loadEpub(file);
+      const { title, content } = await getChapterContent(epub, chapterNum - 1);
+
+      console.log(`# ${title}\n`);
+      console.log(content);
+    } catch (error) {
+      console.error(
+        `Error: ${error instanceof Error ? error.message : String(error)}`
+      );
+      process.exit(1);
+    }
+  });
+
+program
+  .command("full")
+  .description("Extract entire book as Markdown")
+  .argument("<file>", "Path to EPUB file")
+  .action(async (file: string) => {
+    try {
+      const epub = await loadEpub(file);
+      const m = epub.metadata;
+
+      // Print metadata header
+      console.log(`# ${m.title || "Untitled"}\n`);
+      if (m.author) console.log(`*By ${m.author}*\n`);
+      console.log("---\n");
+
+      // Print each chapter
+      for (let i = 0; i < epub.spine.length; i++) {
+        const { title, content } = await getChapterContent(epub, i);
+        console.log(`## ${title}\n`);
+        console.log(content);
+        console.log("\n---\n");
+      }
+    } catch (error) {
+      console.error(
+        `Error: ${error instanceof Error ? error.message : String(error)}`
+      );
+      process.exit(1);
+    }
+  });
+
+program
+  .command("search")
+  .description("Search for text in the book")
+  .argument("<file>", "Path to EPUB file")
+  .argument("<query>", "Text to search for")
+  .action(async (file: string, query: string) => {
+    try {
+      const epub = await loadEpub(file);
+      const results = await searchContent(epub, query);
+
+      if (results.length === 0) {
+        console.log(`No matches found for "${query}"`);
+        return;
+      }
+
+      console.log(`# Search Results for "${query}"\n`);
+      console.log(`Found matches in ${results.length} chapter(s):\n`);
+
+      for (const result of results) {
+        console.log(`## Chapter ${result.chapter}: ${result.title}\n`);
+        for (const match of result.matches) {
+          console.log(`- ${match}`);
+        }
+        console.log();
+      }
+    } catch (error) {
+      console.error(
+        `Error: ${error instanceof Error ? error.message : String(error)}`
+      );
+      process.exit(1);
+    }
+  });
+
+program.parse();
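Review note on `getChapterContent`: the title lookup `epub.toc[index]?.label` indexes the TOC by spine position, which only lines up when the TOC is flat and lists exactly one entry per spine item in the same order. One possible alternative, sketched below, is to look the label up by file name instead. This is a minimal sketch, not the committed implementation: `TocEntry`, `normalizeHref`, `buildTocLabelIndex`, and `tocLabelForSpineHref` are hypothetical names, and matching on the bare file name assumes file names are unique within the book.

```typescript
// Hypothetical shape mirroring the TocItem objects built in index.ts.
interface TocEntry {
  label: string;
  href: string;
  children?: TocEntry[];
}

// Drop the "#fragment" and any leading directories, so that
// "text/ch1.xhtml#start" and "ch1.xhtml" compare equal.
// Assumption: file names are unique within the book.
function normalizeHref(href: string): string {
  const withoutFragment = href.split("#")[0];
  const segments = withoutFragment.split("/");
  return segments[segments.length - 1];
}

// Walk the TOC depth-first (parents before children) and remember the
// first label that points at each file.
function buildTocLabelIndex(toc: TocEntry[]): Map<string, string> {
  const labels = new Map<string, string>();
  const visit = (items: TocEntry[]): void => {
    for (const item of items) {
      const key = normalizeHref(item.href);
      if (key && !labels.has(key)) labels.set(key, item.label);
      if (item.children) visit(item.children);
    }
  };
  visit(toc);
  return labels;
}

// Return the TOC label for a spine file, or undefined when the TOC
// never references that file.
function tocLabelForSpineHref(
  labels: Map<string, string>,
  spineHref: string
): string | undefined {
  return labels.get(normalizeHref(spineHref));
}
```

If something like this were adopted, `getChapterContent` could try `tocLabelForSpineHref(labels, manifestItem.href)` first and keep the existing `<h1>`, `<title>`, and `Chapter N` fallbacks for files the TOC does not mention.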
+16
scripts/epub-reader/tsconfig.json
+{
+  "compilerOptions": {
+    "target": "ES2022",
+    "module": "ESNext",
+    "moduleResolution": "node",
+    "esModuleInterop": true,
+    "strict": true,
+    "outDir": "./dist",
+    "rootDir": "./src",
+    "declaration": true,
+    "skipLibCheck": true,
+    "resolveJsonModule": true
+  },
+  "include": ["src/**/*"],
+  "exclude": ["node_modules", "dist"]
+}