···11+# smol-epub
22+33+Minimal `no_std` EPUB parser with streaming decompression, HTML stripping,
44+CSS resolution, and optional 1-bit image decoders.
55+66+Designed for memory-constrained embedded targets (≥ 140 KB heap), but works
77+anywhere `alloc` is available.
88+99+## Features
1010+1111+| Module | Purpose |
1212+|--------|---------|
1313+| `zip` | ZIP central-directory parser, streaming DEFLATE extraction |
1414+| `xml` | Minimal XML tag / attribute scanner (EPUB metadata) |
1515+| `css` | CSS property parser for EPUB stylesheets |
1616+| `epub` | EPUB structure: `container.xml` → OPF → spine / metadata / TOC |
1717+| `html_strip` | Single-pass, streaming HTML-to-styled-text converter |
1818+| `cache` | Chapter decompress-and-strip pipeline with cache metadata |
1919+| `png` | PNG decoder → 1-bit Floyd–Steinberg dithered bitmap *(feature `images`)* |
2020+| `jpeg` | JPEG decoder → 1-bit Floyd–Steinberg dithered bitmap *(feature `images`)* |
2121+2222+## Feature flags
2323+2424+| Flag | Default | Description |
2525+|------|---------|-------------|
2626+| `images` | ✓ | Enable `png` and `jpeg` image decoders |
2727+2828+## Quick start
2929+3030+```rust
3131+use smol_epub::zip::{self, ZipIndex};
3232+use smol_epub::epub::{self, EpubMeta, EpubSpine, EpubToc};
3333+3434+// 1. Build ZIP index from the EPUB file's central directory
3535+let mut zip = ZipIndex::new();
3636+let (cd_offset, cd_size) = ZipIndex::parse_eocd(&tail_buf, file_size)?;
3737+// ... read the central directory bytes into `cd_buf` ...
3838+zip.parse_central_directory(&cd_buf)?;
3939+4040+// 2. Parse EPUB structure
4141+let idx = zip.find("META-INF/container.xml").unwrap();
4242+let container = zip::extract_entry(
4343+    zip.entry(idx), zip.entry(idx).local_offset,
4444+    |off, buf| read_fn(off, buf),
4545+)?;
4646+let mut opf_path = [0u8; epub::OPF_PATH_CAP];
4747+let opf_len = epub::parse_container(&container, &mut opf_path)?;
4848+4949+// 3. Extract metadata and reading-order spine
5050+let mut meta = EpubMeta::new();
5151+let mut spine = EpubSpine::new();
5252+epub::parse_opf(&opf_data, opf_dir, &zip, &mut meta, &mut spine)?;
5353+println!("{} by {}", meta.title_str(), meta.author_str());
5454+5555+// 4. Optionally parse the table of contents
5656+let mut toc = EpubToc::new();
5757+if let Some(src) = epub::find_toc_source(&opf_data, opf_dir, &zip) {
5858+ epub::parse_toc(src, &toc_data, toc_dir, &spine, &zip, &mut toc);
5959+}
6060+6161+// 5. Stream-decompress + HTML-strip a chapter
6262+let bytes_written = smol_epub::cache::stream_strip_entry(
6363+ &entry, local_offset,
6464+ |off, buf| read_fn(off, buf), // read closure
6565+ |chunk| { output.extend(chunk); Ok(()) }, // output closure
6666+)?;
6767+```
6868+6969+## Streaming I/O model
7070+7171+All functions that read from an external byte source accept a generic
7272+closure:
7373+7474+```rust
7575+FnMut(offset: u32, buf: &mut [u8]) -> Result<usize, E>
7676+```
7777+7878+This works with SD cards, flash memory, `std::fs::File`, in-memory buffers,
7979+or any other random-access byte store — the crate never assumes a specific
8080+storage backend.
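As a minimal sketch of the closure shape described above, here is a reader backed by an in-memory slice. The helper name `slice_reader`, the choice of `&'static str` for `E`, and the decision to treat an offset past the end as an error are assumptions made for illustration; none of them come from the crate.

```rust
/// Build a read closure over an in-memory byte slice.
/// `E` is fixed to `&'static str` here purely for illustration.
fn slice_reader(data: &[u8]) -> impl FnMut(u32, &mut [u8]) -> Result<usize, &'static str> + '_ {
    move |offset: u32, buf: &mut [u8]| {
        let start = offset as usize;
        if start > data.len() {
            // Assumed policy: reading beyond the end is an error.
            return Err("read past end of data");
        }
        // Short reads are allowed: copy as many bytes as are available.
        let n = buf.len().min(data.len() - start);
        buf[..n].copy_from_slice(&data[start..start + n]);
        Ok(n)
    }
}
```

A closure built this way can be handed to any function that takes the `read_fn` parameter, and the same shape can wrap a file or block device by seeking to `offset` before reading.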
8181+8282+## Image decoders
8383+8484+The `png` and `jpeg` modules decode images to 1-bit monochrome bitmaps
8585+using Floyd–Steinberg dithering, ideal for e-ink displays. Three decoder
8686+variants are provided for each format:
8787+8888+| Function | Input |
8989+|----------|-------|
9090+| `decode_{png,jpeg}_fit` | In-memory `&[u8]` buffer |
9191+| `decode_{png,jpeg}_streaming` | Stored (uncompressed) ZIP entry via read closure |
9292+| `decode_{png,jpeg}_deflate_streaming` | DEFLATE-compressed ZIP entry via read closure |
9393+9494+All variants accept `max_w` / `max_h` parameters and integer-downscale
9595+the image to fit.
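The README does not say how the downscale factor is chosen. One plausible reading of "integer-downscale to fit" is the smallest integer divisor that brings both dimensions inside the bounding box, sketched below. The helpers `fit_scale` and `scaled_size` are hypothetical and may differ from what the decoders actually do.

```rust
/// Ceiling division for non-zero `d`.
fn ceil_div(n: u32, d: u32) -> u32 {
    (n + d - 1) / d
}

/// Smallest integer divisor `k >= 1` such that an image shrunk by `k`
/// on both axes (rounding up) fits inside `max_w` x `max_h`.
fn fit_scale(w: u32, h: u32, max_w: u32, max_h: u32) -> u32 {
    // `max(1)` guards against a zero-sized bounding box.
    let kx = ceil_div(w, max_w.max(1));
    let ky = ceil_div(h, max_h.max(1));
    kx.max(ky).max(1)
}

/// Output dimensions after dividing both axes by `k`.
fn scaled_size(w: u32, h: u32, k: u32) -> (u32, u32) {
    (ceil_div(w, k), ceil_div(h, k))
}
```

Under this rule an image that already fits is left at scale 1, and a 1200 x 1600 image targeted at a 480 x 800 panel is divided by 3.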
9696+9797+## Memory budget
9898+9999+Typical peak heap usage on an embedded target:
100100+101101+| Operation | Peak heap |
102102+|-----------|-----------|
103103+| ZIP index parse | ~5 KB |
104104+| Chapter stream-strip (DEFLATE) | ~51 KB |
105105+| PNG streaming decode | ~90 KB |
106106+| JPEG streaming decode | ~30 KB |
107107+| JPEG DEFLATE streaming decode | ~79 KB |
108108+109109+Stack usage is kept low throughout; large structs like `DecompressorOxide`
110110+(~11 KB) are always heap-allocated via `Box`.
111111+112112+## License
113113+114114+Licensed under either of
115115+116116+- [MIT license](http://opensource.org/licenses/MIT)
117117+- [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0)
118118+119119+at your option.
+33-9
smol-epub/src/cache.rs
···11-// EPUB chapter cache: streaming decompress + HTML strip to SD.
22-// No persistent heap; ~51KB temp per chapter. Cache dir: _XXXXXXX/
33-// with META.BIN + CHnnn.TXT files.
11+//! EPUB chapter cache: streaming decompress + HTML strip pipeline.
22+//!
33+//! No persistent heap; ≈ 51 KB temporary per chapter.
44+//! Cache directory layout uses 8.3-safe names: `_XXXXXXX/` with
55+//! `META.BIN` + `CHnnn.TXT` files.
4657use alloc::boxed::Box;
68use alloc::vec::Vec;
···1214const CACHE_VERSION: u8 = 1;
1315const META_HEADER: usize = 16;
14161717+/// Maximum number of chapters that can be tracked in a single cache.
1518pub const MAX_CACHE_CHAPTERS: usize = 256;
1919+/// Maximum byte size of a `META.BIN` file (header + one `u32` per chapter).
1620pub const META_MAX_SIZE: usize = META_HEADER + 4 * MAX_CACHE_CHAPTERS;
17211822const WINDOW_SIZE: usize = 32768; // DEFLATE sliding window
···2024const STRIP_BUF_SIZE: usize = 4096; // strip output accumulator
2125const FLUSH_THRESHOLD: usize = STRIP_BUF_SIZE - 128;
22262727+/// Compute the FNV-1a hash of `data`.
2328#[inline]
2429pub fn fnv1a(data: &[u8]) -> u32 {
2530 let mut h: u32 = 0x811c_9dc5;
···3035 h
3136}
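The diff hides the loop body of `fnv1a`, so for reference here is a sketch of the standard 32-bit FNV-1a algorithm. The offset basis matches the visible source; the prime `0x0100_0193` is the standard FNV-1a constant and is assumed, not confirmed, to be what the hidden lines use. The name `fnv1a_32` is chosen to avoid implying this is the crate's function.

```rust
/// Standard 32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime.
fn fnv1a_32(data: &[u8]) -> u32 {
    let mut h: u32 = 0x811c_9dc5; // FNV offset basis
    for &b in data {
        h ^= b as u32;
        h = h.wrapping_mul(0x0100_0193); // FNV prime
    }
    h
}
```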
32373333-// 8.3 cache dir name: '_' + 7 hex digits of lower 28 bits of hash
3838+/// Generate an 8.3-safe cache directory name from a hash.
3939+///
4040+/// Format: `_` followed by 7 uppercase hex digits of the lower 28 bits.
3441pub fn dir_name_for_hash(name_hash: u32) -> [u8; 8] {
3542 let h = name_hash & 0x0FFF_FFFF;
3643 let mut buf = [0u8; 8];
···4653 buf
4754}
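The formatting step is elided in this hunk. A sketch consistent with the doc comment (`_` plus 7 uppercase hex digits of the lower 28 bits) could look like the following. The function name `dir_name` and the most-significant-nibble-first digit order are assumptions.

```rust
/// Format the lower 28 bits of `hash` as `_` followed by 7 uppercase hex digits.
fn dir_name(hash: u32) -> [u8; 8] {
    const HEX: &[u8; 16] = b"0123456789ABCDEF";
    let h = hash & 0x0FFF_FFFF;
    let mut buf = [b'_'; 8];
    for i in 0..7 {
        // Most significant nibble first: i = 0 takes bits 27..24.
        let shift = 4 * (6 - i);
        buf[1 + i] = HEX[((h >> shift) & 0xF) as usize];
    }
    buf
}
```

Masking to 28 bits is what keeps the name at exactly eight characters, which is the limit for the base part of a FAT 8.3 name.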
48555656+/// Interpret an 8-byte directory name buffer as a UTF-8 `&str`.
4957#[inline]
5058pub fn dir_name_str(buf: &[u8; 8]) -> &str {
5159 core::str::from_utf8(buf).unwrap_or("_0000000")
5260}
53615454-// 8.3 chapter filename: CH000.TXT to CH255.TXT
6262+/// Generate an 8.3-safe chapter filename: `CH000.TXT` through `CH255.TXT`.
5563pub fn chapter_file_name(idx: u16) -> [u8; 9] {
5664 debug_assert!(idx < 1000, "chapter index out of 3-digit range");
5765 let mut n = *b"CH000.TXT";
···6169 n
6270}
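The digit-filling lines are hidden by the diff. A sketch of how a three-digit, zero-padded index can be patched into the `CH000.TXT` template without allocation is shown below. The name `chapter_name` is hypothetical and the real body may differ.

```rust
/// Patch a zero-padded three-digit index into the `CH000.TXT` template.
fn chapter_name(idx: u16) -> [u8; 9] {
    let mut n = *b"CH000.TXT";
    n[2] = b'0' + ((idx / 100) % 10) as u8; // hundreds
    n[3] = b'0' + ((idx / 10) % 10) as u8; // tens
    n[4] = b'0' + (idx % 10) as u8; // ones
    n
}
```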
63717272+/// Interpret a 9-byte chapter filename buffer as a UTF-8 `&str`.
6473#[inline]
6574pub fn chapter_file_str(buf: &[u8; 9]) -> &str {
6675 core::str::from_utf8(buf).unwrap_or("CH000.TXT")
6776}
68777878+/// Filename used for the cache metadata file.
6979pub const META_FILE: &str = "META.BIN";
70807171-// encode cache metadata into buf; return bytes written
8181+/// Encode cache metadata into `buf`; returns the number of bytes written.
8282+///
8383+/// The metadata header stores a magic value, version, the EPUB file size,
8484+/// a name hash, and a `u32` size for each cached chapter.
7285pub fn encode_cache_meta(
7386 epub_size: u32,
7487 name_hash: u32,
···100113 total
101114}
102115103103-// parse and validate META.BIN; write chapter sizes into out slice
116116+/// Parse and validate a `META.BIN` blob.
117117+///
118118+/// On success, writes individual chapter sizes into `chapter_sizes_out`
119119+/// and returns the number of chapters. Returns an error if the magic,
120120+/// version, EPUB size, name hash, or chapter count do not match.
104121pub fn parse_cache_meta(
105122 data: &[u8],
106123 epub_size: u32,
···154171 Ok(count)
155172}
156173157157-// stream-decompress ZIP entry, strip HTML, emit plain-text chunks; ~47KB temp
174174+/// Stream-decompress a ZIP entry, strip HTML, and emit plain-text chunks.
175175+///
176176+/// `read_fn(offset, buf)` reads raw bytes from the underlying store.
177177+/// `output_fn(chunk)` receives stripped plain-text output incrementally.
178178+///
179179+/// Returns the total number of bytes written through `output_fn`.
180180+/// Peak temporary memory ≈ 51 KB (decompressor + sliding window + read
181181+/// and strip buffers).
158182pub fn stream_strip_entry<E>(
159183 entry: &ZipEntry,
160184 local_offset: u32,
···289313 Ok(_) => {
290314 comp_left = 0;
291315 }
292292- Err(_) => return Err("cache: SD read failed during deflate"),
316316+ Err(_) => return Err("cache: read failed during deflate"),
293317 }
294318 }
295319
+65-14
smol-epub/src/css.rs
···11-// Minimal CSS parser for EPUB stylesheets.
22-// Selectors: tag, .class, tag.class, grouped. Combinators reduced to
33-// rightmost simple selector. @-rules and pseudo-classes skipped.
44-// Rule table stack-allocated: MAX_CSS_RULES x ~16B = 2KB.
11+//! Minimal CSS parser for EPUB stylesheets.
22+//!
33+//! Selectors: tag, `.class`, `tag.class`, grouped. Combinators are
44+//! reduced to the rightmost simple selector. `@`-rules and
55+//! pseudo-classes are skipped.
66+//!
77+//! Rule table is stack-allocated: `MAX_CSS_RULES` × ~16 B = 2 KB.
5899+/// Maximum number of CSS rules the parser will store.
610pub const MAX_CSS_RULES: usize = 128;
71188-// property flag bits (which fields in StyleProps are explicitly set)
1212+// ── property flag bits (which fields in StyleProps are explicitly set) ──
9131414+/// Flag: `font-weight` is explicitly set.
1015pub const PROP_FONT_WEIGHT: u16 = 1 << 0;
1616+/// Flag: `font-style` is explicitly set.
1117pub const PROP_FONT_STYLE: u16 = 1 << 1;
1818+/// Flag: `text-align` is explicitly set.
1219pub const PROP_TEXT_ALIGN: u16 = 1 << 2;
2020+/// Flag: `text-indent` is explicitly set.
1321pub const PROP_TEXT_INDENT: u16 = 1 << 3;
2222+/// Flag: `margin-left` is explicitly set.
1423pub const PROP_MARGIN_LEFT: u16 = 1 << 4;
2424+/// Flag: `margin-right` is explicitly set.
1525pub const PROP_MARGIN_RIGHT: u16 = 1 << 5;
2626+/// Flag: `margin-top` is explicitly set.
1627pub const PROP_MARGIN_TOP: u16 = 1 << 6;
2828+/// Flag: `margin-bottom` is explicitly set.
1729pub const PROP_MARGIN_BOTTOM: u16 = 1 << 7;
3030+/// Flag: `display` is explicitly set.
1831pub const PROP_DISPLAY: u16 = 1 << 8;
3232+/// Flag: `text-decoration` is explicitly set.
1933pub const PROP_TEXT_DECORATION: u16 = 1 << 9;
20342121-// property value constants
3535+// ── property value constants ────────────────────────────────────────
22362323-// font-weight
3737+/// `font-weight: normal`.
2438pub const FW_NORMAL: u8 = 0;
3939+/// `font-weight: bold`.
2540pub const FW_BOLD: u8 = 1;
26412727-// font-style
4242+/// `font-style: normal`.
2843pub const FS_NORMAL: u8 = 0;
4444+/// `font-style: italic`.
2945pub const FS_ITALIC: u8 = 1;
30463131-// text-align
4747+/// `text-align: left`.
3248pub const TA_LEFT: u8 = 0;
4949+/// `text-align: center`.
3350pub const TA_CENTER: u8 = 1;
5151+/// `text-align: right`.
3452pub const TA_RIGHT: u8 = 2;
5353+/// `text-align: justify`.
3554pub const TA_JUSTIFY: u8 = 3;
36553737-// display
5656+/// `display` not explicitly set (inherit / default).
3857pub const DISP_DEFAULT: u8 = 0;
5858+/// `display: none`.
3959pub const DISP_NONE: u8 = 1;
6060+/// `display: block`.
4061pub const DISP_BLOCK: u8 = 2;
6262+/// `display: inline`.
4163pub const DISP_INLINE: u8 = 3;
42644343-// text-decoration (bitmask)
6565+/// `text-decoration: none`.
4466pub const TD_NONE: u8 = 0;
6767+/// `text-decoration: underline`.
4568pub const TD_UNDERLINE: u8 = 1;
6969+/// `text-decoration: line-through`.
4670pub const TD_LINE_THROUGH: u8 = 2;
47714848-// resolved CSS properties; `set` tracks which are explicitly specified.
4949-// Lengths in quarter-em units (i8): 1em = 4, 0.5em = 2, 2em = 8.
5050-7272+/// Resolved CSS properties for a single element.
7373+///
7474+/// The `set` bitmask tracks which fields have been explicitly specified
7575+/// by a stylesheet rule. Lengths are stored in **quarter-em** units
7676+/// (`i8`): 1 em = 4, 0.5 em = 2, 2 em = 8.
5177#[derive(Clone, Copy)]
5278pub struct StyleProps {
7979+ /// Bitmask of `PROP_*` flags indicating which fields are set.
5380 pub set: u16,
8181+ /// `font-weight` — see [`FW_NORMAL`], [`FW_BOLD`].
5482 pub font_weight: u8,
8383+ /// `font-style` — see [`FS_NORMAL`], [`FS_ITALIC`].
5584 pub font_style: u8,
8585+ /// `text-align` — see [`TA_LEFT`], [`TA_CENTER`], etc.
5686 pub text_align: u8,
8787+ /// `text-indent` in quarter-em units.
5788 pub text_indent: i8,
8989+ /// `margin-left` in quarter-em units.
5890 pub margin_left: i8,
9191+ /// `margin-right` in quarter-em units.
5992 pub margin_right: i8,
9393+ /// `margin-top` in quarter-em units.
6094 pub margin_top: i8,
9595+ /// `margin-bottom` in quarter-em units.
6196 pub margin_bottom: i8,
9797+ /// `display` — see [`DISP_DEFAULT`], [`DISP_NONE`], etc.
6298 pub display: u8,
9999+ /// `text-decoration` bitmask — see [`TD_NONE`], [`TD_UNDERLINE`], etc.
63100 pub text_decoration: u8,
64101}
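To make the quarter-em encoding concrete, the sketch below converts a length expressed in hundredths of an em into the `i8` representation described in the doc comment. The helper `quarter_em_from_centi_em`, its fixed-point input, and the round-to-nearest and saturating behaviour are all assumptions; the crate's own CSS length parsing is not shown in this diff.

```rust
/// Convert hundredths of an em to quarter-em units, rounding to nearest
/// (half away from zero) and saturating at the `i8` range.
fn quarter_em_from_centi_em(centi_em: i32) -> i8 {
    // Widen to i64 so the multiply and rounding offset cannot overflow.
    let half: i64 = if centi_em >= 0 { 50 } else { -50 };
    let q = (centi_em as i64 * 4 + half) / 100;
    q.clamp(i8::MIN as i64, i8::MAX as i64) as i8
}
```

With this encoding an `i8` covers roughly -32 em to +31.75 em in 0.25 em steps, which is ample for indents and margins in book stylesheets.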
6510266103impl StyleProps {
104104+ /// A `StyleProps` with no fields set and all values at their defaults.
67105 pub const EMPTY: Self = Self {
68106 set: 0,
69107 font_weight: FW_NORMAL,
···102140 }
103141104142 #[inline]
143143+ /// Returns `true` if `font-weight` is set to bold.
105144 pub fn is_bold(&self) -> bool {
106145 self.set & PROP_FONT_WEIGHT != 0 && self.font_weight == FW_BOLD
107146 }
108147109148 #[inline]
149149+ /// Returns `true` if `font-style` is set to italic.
110150 pub fn is_italic(&self) -> bool {
111151 self.set & PROP_FONT_STYLE != 0 && self.font_style == FS_ITALIC
112152 }
113153114154 #[inline]
155155+ /// Returns `true` if `display` is set to `none`.
115156 pub fn is_hidden(&self) -> bool {
116157 self.set & PROP_DISPLAY != 0 && self.display == DISP_NONE
117158 }
···156197}
157198158199// parsed CSS rule table, stack-allocated (~2KB)
200200+/// Parsed CSS rule table (stack-allocated, up to [`MAX_CSS_RULES`] entries).
159201pub struct CssRules {
160202 rules: [CssRule; MAX_CSS_RULES],
161203 count: usize,
···168210}
169211170212impl CssRules {
213213+ /// Create an empty rule table.
171214 pub const fn new() -> Self {
172215 Self {
173216 rules: [CssRule::EMPTY; MAX_CSS_RULES],
···175218 }
176219 }
177220221221+ /// Remove all parsed rules.
178222 pub fn clear(&mut self) {
179223 self.count = 0;
180224 }
181225182226 #[inline]
227227+ /// Number of rules currently stored.
183228 pub fn len(&self) -> usize {
184229 self.count
185230 }
186231187232 #[inline]
233233+ /// Returns `true` if no rules have been parsed.
188234 pub fn is_empty(&self) -> bool {
189235 self.count == 0
190236 }
191237192238 // parse stylesheet; may be called multiple times to accumulate rules
239239+ /// Parse a CSS stylesheet and append rules to the table.
193240 pub fn parse(&mut self, css: &[u8]) {
194241 let mut pos: usize = 0;
195242···241288 }
242289243290 // resolve effective style for tag + class; merged by specificity
291291+ /// Resolve the effective style for an element given its tag and class names.
244292 pub fn resolve(&self, tag_name: &[u8], class_name: &[u8]) -> StyleProps {
245293 let tid = tag_id(tag_name);
246294 let chash = if class_name.is_empty() {
···262310 }
263311264312 // resolve by pre-computed tag ID and class hash
313313+ /// Resolve the effective style using precomputed tag-id and class-hash.
265314 pub fn resolve_by_id(&self, tid: u8, chash: u16) -> StyleProps {
266315 let mut result = StyleProps::EMPTY;
267316 let mut best = [0u8; 16];
···540589// tag ID mapping: lowercase tag name -> compact u8 for selector matching.
541590// 0 = unknown/any; known tags get stable IDs.
542591592592+/// Map an HTML tag name to a compact numeric id used by [`CssRules::resolve_by_id`].
543593pub fn tag_id(name: &[u8]) -> u8 {
544594 match name {
545595 b"p" => 1,
···590640// class hash: FNV-1a folded to 16 bits.
591641// 0 reserved for "no class constraint"; hash of 0 is mapped to 1.
592642643643+/// Compute a 16-bit hash of a CSS class name for [`CssRules::resolve_by_id`].
593644pub fn class_hash(name: &[u8]) -> u16 {
594645 let mut h: u32 = 0x811c_9dc5;
595646 for &b in name {
+74-17
smol-epub/src/epub.rs
···11-// EPUB structure parser: container.xml -> OPF -> spine + metadata.
22-// container.xml gives the OPF path; the OPF gives metadata, a
33-// manifest (id->href), and a spine (ordered idrefs). Spine idrefs
44-// are resolved through the manifest to ZIP entry indices.
11+//! EPUB structure parser: `container.xml` → OPF → spine + metadata.
22+//!
33+//! `container.xml` gives the OPF path; the OPF gives metadata, a
44+//! manifest (`id` → `href`), and a spine (ordered `idref`s). Spine
55+//! references are resolved through the manifest to ZIP entry indices.
5667use alloc::vec::Vec;
7889use crate::xml;
910use crate::zip::ZipIndex;
10111212+/// Maximum byte length of an EPUB title.
1113pub const TITLE_CAP: usize = 96;
1414+/// Maximum byte length of an EPUB author name.
1215pub const AUTHOR_CAP: usize = 64;
1616+/// Maximum number of spine entries (reading-order items).
1317pub const MAX_SPINE: usize = 256;
1818+/// Maximum byte length of the OPF file path inside the ZIP.
1419pub const OPF_PATH_CAP: usize = 256;
15202121+/// EPUB book metadata (title and author), stored inline with fixed-size buffers.
1622pub struct EpubMeta {
2323+ /// Raw UTF-8 bytes of the title (up to [`TITLE_CAP`] bytes).
1724 pub title: [u8; TITLE_CAP],
2525+ /// Number of valid bytes in [`title`](Self::title).
1826 pub title_len: u8,
2727+ /// Raw UTF-8 bytes of the author name (up to [`AUTHOR_CAP`] bytes).
1928 pub author: [u8; AUTHOR_CAP],
2929+ /// Number of valid bytes in [`author`](Self::author).
2030 pub author_len: u8,
2131}
2232···2737}
28382939impl EpubMeta {
4040+ /// Create a new, empty `EpubMeta`.
3041 pub const fn new() -> Self {
3142 Self {
3243 title: [0u8; TITLE_CAP],
···3647 }
3748 }
38495050+ /// Return the title as a `&str`, or `""` if it is not valid UTF-8.
3951 pub fn title_str(&self) -> &str {
4052 core::str::from_utf8(&self.title[..self.title_len as usize]).unwrap_or("")
4153 }
42545555+ /// Return the author as a `&str`, or `""` if it is not valid UTF-8.
4356 pub fn author_str(&self) -> &str {
4457 core::str::from_utf8(&self.author[..self.author_len as usize]).unwrap_or("")
4558 }
···5770 }
5871}
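The fixed-buffer-plus-length pattern used by `EpubMeta` can be sketched in isolation as follows. The `Title` type and its `set` method are hypothetical. In particular, truncating at a UTF-8 character boundary is an assumption about how an over-long title might be stored so that the `&str` accessor does not fall back to `""`.

```rust
const TITLE_CAP: usize = 96;

/// Inline string storage: a fixed byte buffer plus a length.
struct Title {
    bytes: [u8; TITLE_CAP],
    len: u8,
}

impl Title {
    fn new() -> Self {
        Self { bytes: [0; TITLE_CAP], len: 0 }
    }

    /// Copy `s` in, truncating at a UTF-8 character boundary so the
    /// stored prefix always decodes cleanly.
    fn set(&mut self, s: &str) {
        let mut n = s.len().min(TITLE_CAP);
        while !s.is_char_boundary(n) {
            n -= 1;
        }
        self.bytes[..n].copy_from_slice(&s.as_bytes()[..n]);
        self.len = n as u8;
    }

    /// Same fallback as the accessors in the diff: `""` on invalid UTF-8.
    fn as_str(&self) -> &str {
        core::str::from_utf8(&self.bytes[..self.len as usize]).unwrap_or("")
    }
}
```

Because `TITLE_CAP` is below 256, a `u8` length is sufficient, which keeps the struct free of padding-heavy `usize` fields.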
59727373+/// The EPUB reading-order spine: an ordered list of ZIP entry indices.
6074pub struct EpubSpine {
7575+ /// ZIP entry indices in reading order.
6176 pub items: [u16; MAX_SPINE],
7777+ /// Number of valid entries in [`items`](Self::items).
6278 pub count: u16,
6379}
6480···6985}
70867187impl EpubSpine {
8888+ /// Create a new, empty spine.
7289 pub const fn new() -> Self {
7390 Self {
7491 items: [0u16; MAX_SPINE],
···7794 }
78957996 #[inline]
9797+ /// Number of items in the spine.
8098 pub fn len(&self) -> usize {
8199 self.count as usize
82100 }
8310184102 #[inline]
103103+ /// Returns `true` if the spine contains no items.
85104 pub fn is_empty(&self) -> bool {
86105 self.count == 0
87106 }
88107}
891089090-// table of contents
109109+// ── table of contents ───────────────────────────────────────────────
91110111111+/// Maximum number of entries in the table of contents.
92112pub const MAX_TOC: usize = 128;
113113+/// Maximum byte length of a single TOC entry title.
93114pub const TOC_TITLE_CAP: usize = 48;
94115116116+/// A single entry in the EPUB table of contents.
95117#[derive(Clone, Copy)]
96118pub struct TocEntry {
119119+ /// Raw UTF-8 bytes of the entry title.
97120 pub title: [u8; TOC_TITLE_CAP],
121121+ /// Number of valid bytes in [`title`](Self::title).
98122 pub title_len: u8,
9999- // index into EpubSpine::items; 0xFFFF = unresolved
123123+ /// Index into [`EpubSpine::items`]; `0xFFFF` means unresolved.
100124 pub spine_idx: u16,
101125}
102126103127impl TocEntry {
128128+ /// An empty, unresolved TOC entry.
104129 pub const EMPTY: Self = Self {
105130 title: [0u8; TOC_TITLE_CAP],
106131 title_len: 0,
107132 spine_idx: 0xFFFF,
108133 };
109134135135+ /// Return the entry title as a `&str`, or `""` if not valid UTF-8.
110136 pub fn title_str(&self) -> &str {
111137 core::str::from_utf8(&self.title[..self.title_len as usize]).unwrap_or("")
112138 }
113139}
114140141141+/// EPUB table of contents (flat list of [`TocEntry`] items).
115142pub struct EpubToc {
143143+ /// TOC entries in document order.
116144 pub entries: [TocEntry; MAX_TOC],
145145+ /// Number of valid entries.
117146 pub count: u16,
118147}
119148···124153}
125154126155impl EpubToc {
156156+ /// Create a new, empty table of contents.
127157 pub const fn new() -> Self {
128158 Self {
129159 entries: [TocEntry::EMPTY; MAX_TOC],
···131161 }
132162 }
133163164164+ /// Remove all entries.
134165 pub fn clear(&mut self) {
135166 self.count = 0;
136167 }
137168138169 #[inline]
170170+ /// Number of entries in the TOC.
139171 pub fn len(&self) -> usize {
140172 self.count as usize
141173 }
142174143175 #[inline]
176176+ /// Returns `true` if the TOC contains no entries.
144177 pub fn is_empty(&self) -> bool {
145178 self.count == 0
146179 }
···159192 }
160193}
161194162162-// where the TOC data lives inside the EPUB ZIP
195195+/// Identifies where the table-of-contents data lives inside the EPUB ZIP.
163196#[derive(Clone, Copy, Debug)]
164197pub enum TocSource {
165165- Ncx(usize), // EPUB 2
166166- Nav(usize), // EPUB 3
198198+ /// EPUB 2 NCX document (ZIP entry index).
199199+ Ncx(usize),
200200+ /// EPUB 3 Navigation Document (ZIP entry index).
201201+ Nav(usize),
167202}
168203169204impl TocSource {
205205+ /// Return the ZIP entry index regardless of variant.
170206 pub fn zip_index(&self) -> usize {
171207 match *self {
172208 TocSource::Ncx(i) | TocSource::Nav(i) => i,
···175211}
176212177213// parse container.xml to find the OPF path; write into out
214214+/// Parse `META-INF/container.xml` and extract the OPF file path.
215215+///
216216+/// Writes the path into `out` and returns its byte length.
178217pub fn parse_container(data: &[u8], out: &mut [u8; OPF_PATH_CAP]) -> Result<usize, &'static str> {
179218 let mut found_len: Option<usize> = None;
180219···192231 found_len.ok_or("epub: no rootfile full-path in container.xml")
193232}
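The body of `parse_container` is elided here and relies on the crate's `xml` scanner. As a rough, standalone illustration of what it has to find, the sketch below does a naive byte search for the first `full-path="..."` attribute. The helper `find_full_path` is hypothetical. It ignores single-quoted attributes, whitespace around `=`, and multiple `<rootfile>` elements, so it is not a substitute for the real parser.

```rust
/// Find the first `full-path="..."` attribute value in `data`.
fn find_full_path(data: &[u8]) -> Option<&[u8]> {
    const KEY: &[u8] = b"full-path=\"";
    // Locate the attribute name, then skip past the opening quote.
    let start = data.windows(KEY.len()).position(|w| w == KEY)? + KEY.len();
    // The value runs up to the next double quote.
    let len = data[start..].iter().position(|&b| b == b'"')?;
    Some(&data[start..start + len])
}
```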
194233195195-// parse OPF: extract metadata and build the reading-order spine as ZIP entry indices.
196196-// Two-pass, zero heap: phase 1 collects idref byte offsets (MAX_SPINE*4 = 1KB stack);
197197-// phase 2 resolves each idref to a manifest href and then a ZIP index.
234234+/// Parse an OPF document: extract metadata and build the reading-order spine.
235235+///
236236+/// Two-pass, zero heap: phase 1 collects `idref` byte offsets
237237+/// (`MAX_SPINE` × 4 = 1 KB stack); phase 2 resolves each `idref`
238238+/// through the manifest to a ZIP entry index.
198239pub fn parse_opf(
199240 opf: &[u8],
200241 opf_dir: &str,
···284325}
285326286327// locate TOC in ZIP: EPUB 3 nav first, EPUB 2 NCX fallback
328328+/// Search the OPF manifest for a table-of-contents source.
329329+///
330330+/// Tries, in order: EPUB 3 `<item properties="nav">`, EPUB 2
331331+/// `<spine toc="id">`, and a media-type fallback for NCX files.
287332pub fn find_toc_source(opf: &[u8], opf_dir: &str, zip: &ZipIndex) -> Option<TocSource> {
288333 let mut path_buf = [0u8; 512];
289334···400445 None
401446}
402447403403-// dispatch TOC parse by format (NCX vs nav)
448448+/// Parse a TOC document (NCX or Navigation Document) into `toc`.
449449+///
450450+/// Dispatches to [`parse_ncx_toc`] or [`parse_nav_toc`] based on
451451+/// the [`TocSource`] variant.
404452pub fn parse_toc(
405453 source: TocSource,
406454 data: &[u8],
···415463 }
416464}
417465418418-// parse EPUB 2 NCX into flat TOC entries (nested navPoints flattened)
466466+/// Parse an EPUB 2 NCX document into flat TOC entries.
467467+///
468468+/// Nested `<navPoint>` elements are flattened into a linear list.
419469pub fn parse_ncx_toc(
420470 ncx: &[u8],
421471 ncx_dir: &str,
···497547 }
498548}
499549500500-// parse EPUB 3 nav document; extract <a> entries, flatten nested <ol>
550550+/// Parse an EPUB 3 Navigation Document into flat TOC entries.
551551+///
552552+/// Extracts `<a>` elements from the `<nav epub:type="toc">` region
553553+/// and flattens nested `<ol>` lists.
501554pub fn parse_nav_toc(
502555 nav: &[u8],
503556 nav_dir: &str,
···787840 if start >= end { &[] } else { &data[start..end] }
788841}
789842790790-// -- path helpers --
843843+// ── path helpers ────────────────────────────────────────────────────
791844845845+/// Resolve a relative `href` against `base_dir`, writing the result
846846+/// into `out`. Returns the number of bytes written.
847847+///
848848+/// Handles `../` segments, leading `./`, and absolute paths.
792849pub fn resolve_path(base_dir: &str, href: &str, out: &mut [u8; 512]) -> usize {
793850 let href = href.split('#').next().unwrap_or(href);
794851···893950 }
894951}
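The resolution rules listed in the doc comment can be illustrated with an allocating sketch. Unlike the crate's `resolve_path`, which writes into a fixed 512-byte buffer, this version uses `Vec` and `String` for brevity. The function name `resolve` and the treatment of a leading `/` as "relative to the ZIP root" are assumptions.

```rust
/// Resolve `href` against `base_dir`, normalising `.` and `..` segments.
fn resolve(base_dir: &str, href: &str) -> String {
    // Drop any `#fragment` suffix, as the visible first line of the source does.
    let href = href.split('#').next().unwrap_or(href);
    let mut parts: Vec<&str> = Vec::new();
    let rel = match href.strip_prefix('/') {
        // Assumed: an absolute href ignores `base_dir` entirely.
        Some(abs) => abs,
        None => {
            parts.extend(base_dir.split('/').filter(|s| !s.is_empty()));
            href
        }
    };
    for seg in rel.split('/') {
        match seg {
            "" | "." => {}
            ".." => {
                parts.pop();
            }
            s => parts.push(s),
        }
    }
    parts.join("/")
}
```

Extra `..` segments that would climb above the root are silently dropped here, which keeps the result a valid ZIP entry name.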
895952896896-// check if filename looks like an EPUB (.epub or .epu for FAT 8.3 truncation)
953953+/// Check if a filename looks like an EPUB (`.epub` or `.epu` for FAT 8.3 truncation).
897954pub fn is_epub_filename(name: &str) -> bool {
898955 let b = name.as_bytes();
899956
+40-12
smol-epub/src/html_strip.rs
···11-// Single-pass HTML to styled-text converter for EPUB XHTML.
22-// HtmlStripStream: streaming feed/finish; emits 2-byte [MARKER, tag] style codes.
33-// strip_html_inplace(): in-place variant for container.xml/OPF/TOC.
44-// Marker: [0x01, tag]. Inline: B/b I/i. Block: H/h Q/q S(hr).
11+//! Single-pass HTML to styled-text converter for EPUB XHTML.
22+//!
33+//! [`HtmlStripStream`]: streaming `feed`/`finish` interface; emits 2-byte
44+//! `[MARKER, tag]` style codes inline with plain text.
55+//!
66+//! [`strip_html_inplace`]: in-place variant for `container.xml` / OPF / TOC.
77+//!
88+//! Marker encoding: `[0x01, tag]`. Inline: `B`/`b` `I`/`i`.
99+//! Block: `H`/`h` `Q`/`q` `S` (hr). Image: `P` (path follows).
510611use alloc::vec::Vec;
71288-pub const MARKER: u8 = 0x01; // escape byte for style markers
1313+/// Escape byte that introduces a 2-byte style marker in the output stream.
1414+pub const MARKER: u8 = 0x01;
9151616+/// Style tag: bold **on** (`[MARKER, BOLD_ON]`).
1017pub const BOLD_ON: u8 = b'B';
1818+/// Style tag: bold **off** (`[MARKER, BOLD_OFF]`).
1119pub const BOLD_OFF: u8 = b'b';
2020+/// Style tag: italic **on** (`[MARKER, ITALIC_ON]`).
1221pub const ITALIC_ON: u8 = b'I';
2222+/// Style tag: italic **off** (`[MARKER, ITALIC_OFF]`).
1323pub const ITALIC_OFF: u8 = b'i';
2424+/// Style tag: heading **on** (`[MARKER, HEADING_ON]`).
1425pub const HEADING_ON: u8 = b'H';
2626+/// Style tag: heading **off** (`[MARKER, HEADING_OFF]`).
1527pub const HEADING_OFF: u8 = b'h';
2828+/// Style tag: block-quote **on** (`[MARKER, QUOTE_ON]`).
1629pub const QUOTE_ON: u8 = b'Q';
3030+/// Style tag: block-quote **off** (`[MARKER, QUOTE_OFF]`).
1731pub const QUOTE_OFF: u8 = b'q';
18321919-// Standalone
3333+/// Style tag: thematic break / horizontal rule (`[MARKER, BREAK]`).
2034pub const BREAK: u8 = b'S';
2121-pub const IMG_REF: u8 = b'P'; // image ref: [MARKER, IMG_REF, len, path...]
3535+/// Style tag: inline image reference (`[MARKER, IMG_REF, len, path…]`).
3636+pub const IMG_REF: u8 = b'P';
22373838+/// Returns `true` if `b` is the [`MARKER`] escape byte.
2339#[inline]
2440pub const fn is_marker(b: u8) -> bool {
2541 b == MARKER
···6480 }
6581}
66826767-// stateful streaming HTML-to-styled-text converter; ~80 bytes of state
8383+/// Stateful, streaming HTML-to-styled-text converter (~80 bytes of state).
8484+///
8585+/// Feed chunks of EPUB XHTML via [`feed`](Self::feed), then call
8686+/// [`finish`](Self::finish) to flush any trailing state. The output is
8787+/// plain text interspersed with 2-byte `[MARKER, tag]` style codes.
6888pub struct HtmlStripStream {
6989 phase: Phase,
7090···116136}
117137118138impl HtmlStripStream {
139139+ /// Create a new stream in its initial state.
119140 pub const fn new() -> Self {
120141 Self {
121142 phase: Phase::Text,
···147168 }
148169 }
149170150150- // process a chunk of HTML; returns (consumed, written); call again if input not fully consumed
171171+ /// Process a chunk of HTML input.
172172+ ///
173173+ /// Returns `(consumed, written)`. If `consumed < input.len()`, call
174174+ /// again with the remaining input (the output buffer was full).
151175 pub fn feed(&mut self, input: &[u8], output: &mut [u8]) -> (usize, usize) {
152176 let ilen = input.len();
153177 let olen = output.len();
···572596 }
573597 }
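The `(consumed, written)` contract implies a driver loop on the caller's side. The sketch below shows that loop against a stand-in type, `ToyStrip`, which has the same `feed` / `finish` shape but only drops bytes between `<` and `>`. Both `ToyStrip` and `strip_all` are hypothetical. The real `HtmlStripStream` also handles entities and style markers and appends a terminal newline in `finish`, none of which is modelled here.

```rust
/// Stand-in with the same `feed` / `finish` shape as the real stream.
struct ToyStrip {
    in_tag: bool,
}

impl ToyStrip {
    fn feed(&mut self, input: &[u8], output: &mut [u8]) -> (usize, usize) {
        let (mut consumed, mut written) = (0, 0);
        for &b in input {
            match b {
                b'<' => self.in_tag = true,
                b'>' if self.in_tag => self.in_tag = false,
                _ if self.in_tag => {}
                _ => {
                    if written == output.len() {
                        break; // output full: report partial consumption
                    }
                    output[written] = b;
                    written += 1;
                }
            }
            consumed += 1;
        }
        (consumed, written)
    }

    /// The toy keeps no pending output, so there is nothing to flush.
    fn finish(&mut self, _output: &mut [u8]) -> usize {
        0
    }
}

/// Drive the stream until every input byte has been consumed.
fn strip_all(input: &[u8], out_cap: usize) -> Vec<u8> {
    assert!(out_cap > 0, "a zero-sized output buffer can never make progress");
    let mut stream = ToyStrip { in_tag: false };
    let mut scratch = vec![0u8; out_cap];
    let mut result = Vec::new();
    let mut pos = 0;
    while pos < input.len() {
        let (consumed, written) = stream.feed(&input[pos..], &mut scratch);
        result.extend_from_slice(&scratch[..written]);
        pos += consumed;
    }
    let tail = stream.finish(&mut scratch);
    result.extend_from_slice(&scratch[..tail]);
    result
}
```

Because parser state lives in the stream, the loop gives the same result however small the scratch buffer is, which is what makes a fixed 4 KB strip buffer workable.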
574598575575- // flush pending state; append terminal newline if content was produced; return bytes written
599599+ /// Flush any pending state and append a terminal newline if content
600600+ /// was produced. Returns the number of bytes written to `output`.
576601 pub fn finish(&mut self, output: &mut [u8]) -> usize {
577602 let mut op: usize = 0;
578603···756781 }
757782}
758783759759-// in-place HTML stripper: operates on a complete buffer, produces plain text
760760-// without style markers. write cursor never passes read cursor (w <= r always).
784784+/// Strip HTML tags from a complete buffer **in place**, producing plain text
785785+/// without style markers.
786786+///
787787+/// The write cursor never passes the read cursor, so no extra allocation
788788+/// is needed.
761789pub fn strip_html_inplace(buf: &mut Vec<u8>) {
762790 let len = buf.len();
763791 if len == 0 {
+60-16
smol-epub/src/jpeg.rs
···11-// Baseline JPEG decoder for e-ink display.
22-// Streams MCU-row-by-row via ChunkReader (4KB chunks from SD); peak RAM ~30KB.
33-// Luminance (Y) only; chrominance Huffman-decoded to advance bitstream, discarded.
44-// Progressive JPEG (SOF2) partially supported: first scan only (DC + low-freq AC).
55-// Full progressive not feasible: ~1.5MB coefficient buffer exceeds ESP32-C3 heap.
66-// Output: png::DecodedImage, packed 1-bit MSB-first, Floyd-Steinberg dithered.
11+//! Minimal baseline JPEG decoder producing 1-bit Floyd–Steinberg dithered bitmaps.
22+//!
33+//! Streams MCU-row-by-row via 4 KB chunked reads; peak RAM ≈ 30 KB.
44+//! Luminance (Y) channel only — chrominance is Huffman-decoded to
55+//! advance the bitstream, then discarded.
66+//!
77+//! Progressive JPEG (SOF2) is partially supported: first scan only
88+//! (DC + low-frequency AC).
99+//!
1010+//! Output is packed 1-bit MSB-first, row-major — see [`DecodedImage`](crate::DecodedImage).
711812extern crate alloc;
913···1115use alloc::vec;
1216use alloc::vec::Vec;
13171414-use crate::png::DecodedImage;
1818+use crate::DecodedImage;
15191620// JPEG marker bytes
1721···3438// header bytes to read for marker parsing; large APP/EXIF segments skipped by length
3539const HEADER_READ: usize = 32768;
36403737-// chunk size for streaming SD reads during MCU decode
4141+// chunk size for streaming reads during MCU decode
3842const CHUNK_SIZE: usize = 4096;
39434044// DEFLATE sliding-window size for streaming ZIP decompression
···175179// reads from SD via closure, buffering 4KB chunks
176180struct ChunkReader<F> {
177181 read_fn: F,
178178- offset: u32, // absolute SD offset of next byte to fetch
182182+ offset: u32, // absolute offset of next byte to fetch
179183 end: u32, // end-of-data offset (exclusive)
180184 buf: [u8; CHUNK_SIZE],
181185 pos: usize,
···234238// peak heap: ~47KB (11KB decompressor + 32KB window + 4KB read buf).
235239struct DeflateReader<F> {
236240 read_fn: F,
237237- file_pos: u32, // absolute SD offset of next compressed byte
241241+ file_pos: u32, // absolute offset of next compressed byte
238242 comp_left: usize, // compressed bytes remaining in ZIP entry
239243 rbuf: Vec<u8>, // compressed-data read buffer
240244 in_avail: usize, // valid bytes in rbuf
···515519// public API
516520517521// decode a baseline JPEG from an in-memory buffer
522522+/// Decode a JPEG from an in-memory buffer to a 1-bit dithered bitmap.
523523+///
524524+/// The image is integer-downscaled so the result fits within
525525+/// `max_w` × `max_h` pixels.
518526pub fn decode_jpeg_fit(data: &[u8], max_w: u16, max_h: u16) -> Result<DecodedImage, &'static str> {
519527 let st = parse_markers(data)?;
520528···524532 decode_baseline(&st, BitReader::new(reader), max_w, max_h)
525533}
526534527527-// decode a JPEG by streaming 4KB chunks from SD.
528528-// read_fn(abs_offset, buf) -> Ok(bytes_read). progressive = first scan only.
529529-pub fn decode_jpeg_sd<F>(
535535+/// Decode a JPEG from a **stored** (uncompressed) ZIP entry by streaming
536536+/// 4 KB chunks through `read_fn`.
537537+///
538538+/// `read_fn(offset, buf)` reads bytes at the given absolute offset and
539539+/// returns the number of bytes actually read. Progressive JPEGs are
540540+/// decoded using the first scan only.
541541+pub fn decode_jpeg_streaming<F>(
530542 mut read_fn: F,
531543 data_offset: u32,
532544 data_size: u32,
···559571 decode_baseline(&st, BitReader::new(reader), max_w, max_h)
560572}
561573562562-// decode a DEFLATE-compressed JPEG from SD, streaming both decompression and MCU decode.
563563-// peak heap: ~79KB (47KB deflate reader + 32KB header buf + ~30KB decode bufs).
564564-pub fn decode_jpeg_deflate_sd<F>(
574574+/// Backward-compatible alias for [`decode_jpeg_streaming`].
575575+pub fn decode_jpeg_sd<F>(
576576+ read_fn: F,
577577+ data_offset: u32,
578578+ data_size: u32,
579579+ max_w: u16,
580580+ max_h: u16,
581581+) -> Result<DecodedImage, &'static str>
582582+where
583583+ F: FnMut(u32, &mut [u8]) -> Result<usize, &'static str>,
584584+{
585585+ decode_jpeg_streaming(read_fn, data_offset, data_size, max_w, max_h)
586586+}
587587+588588+/// Decode a JPEG from a **DEFLATE-compressed** ZIP entry by streaming
589589+/// reads through `read_fn`.
590590+///
591591+/// ZIP decompression and MCU decode are streamed in lockstep,
592592+/// so the full entry is never held in memory. Peak heap ≈ 79 KB.
593593+pub fn decode_jpeg_deflate_streaming<F>(
565594 read_fn: F,
566595 data_offset: u32,
567596 comp_size: u32,
···614643 drop(hdr);
615644616645 decode_baseline(&st, BitReader::new(deflate), max_w, max_h)
646646+}
647647+648648+/// Backward-compatible alias for [`decode_jpeg_deflate_streaming`].
649649+pub fn decode_jpeg_deflate_sd<F>(
650650+ read_fn: F,
651651+ data_offset: u32,
652652+ comp_size: u32,
653653+ uncomp_size: u32,
654654+ max_w: u16,
655655+ max_h: u16,
656656+) -> Result<DecodedImage, &'static str>
657657+where
658658+ F: FnMut(u32, &mut [u8]) -> Result<usize, &'static str>,
659659+{
660660+ decode_jpeg_deflate_streaming(read_fn, data_offset, comp_size, uncomp_size, max_w, max_h)
617661}
618662619663// baseline decode core (generic over byte source)
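The streaming decoders above take a `read_fn(offset, buf)` closure with the bound `FnMut(u32, &mut [u8]) -> Result<usize, &'static str>`. As a minimal sketch of what a caller might pass, assuming the data already sits in memory, the helper below wraps a byte slice. `slice_reader` is a hypothetical name and not part of smol-epub; only the closure signature is taken from the diff.

```rust
/// Hypothetical helper (not part of smol-epub): build a closure with the
/// `read_fn` shape used by the streaming decoders, backed by a byte slice.
fn slice_reader<'a>(
    data: &'a [u8],
) -> impl FnMut(u32, &mut [u8]) -> Result<usize, &'static str> + 'a {
    move |offset: u32, buf: &mut [u8]| {
        let start = offset as usize;
        if start > data.len() {
            return Err("read past end of data");
        }
        // Copy as many bytes as are available; a short read is not an error,
        // and a read exactly at the end returns Ok(0).
        let n = buf.len().min(data.len() - start);
        buf[..n].copy_from_slice(&data[start..start + n]);
        Ok(n)
    }
}
```

A closure over an SD card or flash driver would have the same shape, with the copy replaced by a device read at `offset`.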
+116-9
smol-epub/src/lib.rs
···11-// smol-epub: minimal no_std EPUB parser with streaming image decoders.
22-// zip: ZIP central directory parser, streaming DEFLATE extraction
33-// xml: minimal XML tag/attribute scanner for EPUB metadata
44-// css: minimal CSS parser for EPUB stylesheet resolution
55-// epub: EPUB structure parser (container.xml, OPF spine, TOC)
66-// html_strip: single-pass HTML to styled-text converter (streaming)
77-// cache: EPUB chapter cache: streaming decompress + strip
88-// png: PNG decoder, 1-bit Floyd-Steinberg dithered bitmap
99-// jpeg: JPEG decoder, 1-bit Floyd-Steinberg dithered bitmap
11+//! # smol-epub
22+//!
33+//! Minimal `no_std` EPUB parser with streaming decompression, HTML
44+//! stripping, CSS resolution, and optional 1-bit image decoders.
55+//!
66+//! Designed for memory-constrained embedded targets (≥ 140 KB heap),
77+//! but works anywhere `alloc` is available.
88+//!
99+//! ## Modules
1010+//!
1111+//! | Module | Purpose |
1212+//! |--------|---------|
1313+//! | [`zip`] | ZIP central-directory parser, streaming DEFLATE extraction |
1414+//! | [`xml`] | Minimal XML tag / attribute scanner (EPUB metadata) |
1515+//! | [`css`] | CSS property parser for EPUB stylesheets |
1616+//! | [`epub`] | EPUB structure: `container.xml` → OPF → spine / metadata / TOC |
1717+//! | [`html_strip`] | Single-pass, streaming HTML-to-styled-text converter |
1818+//! | [`cache`] | Chapter decompress-and-strip pipeline with cache metadata |
1919+//! | [`png`] | PNG decoder → 1-bit Floyd–Steinberg dithered bitmap *(feature `images`)* |
2020+//! | [`jpeg`] | JPEG decoder → 1-bit Floyd–Steinberg dithered bitmap *(feature `images`)* |
2121+//!
2222+//! ## Feature flags
2323+//!
2424+//! | Flag | Default | Description |
2525+//! |------|---------|-------------|
2626+//! | `images` | ✓ | Enable [`png`] and [`jpeg`] image decoders |
2727+//!
2828+//! ## Streaming I/O model
2929+//!
3030+//! Functions that read from an external byte source accept a generic
3131+//! closure with signature:
3232+//!
3333+//! ```text
3434+//! FnMut(offset: u32, buf: &mut [u8]) -> Result<usize, E>
3535+//! ```
3636+//!
3737+//! This works with SD cards, flash, `std::fs::File`, in-memory
3838+//! buffers, or any other random-access byte store.
3939+//!
4040+//! ## Quick start
4141+//!
4242+//! ```rust,ignore
4343+//! use smol_epub::zip::ZipIndex;
4444+//! use smol_epub::epub::{self, EpubMeta, EpubSpine, EpubToc};
4545+//!
4646+//! // 1. Build ZIP index from the file's central directory
4747+//! let mut zip = ZipIndex::new();
4848+//! // ... parse_eocd, read CD, parse_central_directory ...
4949+//!
5050+//! // 2. Parse EPUB structure
5151+//! let container = smol_epub::zip::extract_entry(/* ... */)?;
5252+//! let mut opf_path = [0u8; epub::OPF_PATH_CAP];
5353+//! let opf_len = epub::parse_container(&container, &mut opf_path)?;
5454+//!
5555+//! // 3. Extract metadata and reading-order spine
5656+//! let mut meta = EpubMeta::new();
5757+//! let mut spine = EpubSpine::new();
5858+//! epub::parse_opf(&opf_data, opf_dir, &zip, &mut meta, &mut spine)?;
5959+//!
6060+//! // 4. Optionally parse the table of contents
6161+//! let mut toc = EpubToc::new();
6262+//! if let Some(src) = epub::find_toc_source(&opf_data, opf_dir, &zip) {
6363+//! epub::parse_toc(src, &toc_data, toc_dir, &spine, &zip, &mut toc);
6464+//! }
6565+//!
6666+//! // 5. Stream-decompress + HTML-strip chapters via cache module
6767+//! let bytes_written = smol_epub::cache::stream_strip_entry(
6868+//! &entry, local_offset, read_fn, output_fn,
6969+//! )?;
7070+//! ```
10711172#![no_std]
7373+#![warn(missing_docs)]
12741375extern crate alloc;
7676+7777+use alloc::vec::Vec;
7878+7979+// ── public modules ──────────────────────────────────────────────────
14801581pub mod cache;
1682pub mod css;
···2389pub mod jpeg;
2490#[cfg(feature = "images")]
2591pub mod png;
9292+9393+// ── shared types ────────────────────────────────────────────────────
9494+9595+/// A decoded 1-bit monochrome image, packed MSB-first, row-major.
9696+///
9797+/// A **set** bit (1) represents black (ink); a **clear** bit (0) represents
9898+/// white (paper). This convention matches most e-ink controllers directly.
9999+///
100100+/// Produced by the [`png`] and [`jpeg`] decoders when the `images`
101101+/// feature is enabled.
102102+///
103103+/// # Layout
104104+///
105105+/// ```text
106106+/// stride = ceil(width / 8) bytes per row
107107+/// data.len() == stride * height
108108+/// ```
109109+///
110110+/// Pixel (x, y) is bit `(7 - x % 8)` of byte `data[y * stride + x / 8]`.
111111+#[derive(Clone)]
112112+pub struct DecodedImage {
113113+ /// Image width in pixels.
114114+ pub width: u16,
115115+ /// Image height in pixels.
116116+ pub height: u16,
117117+ /// Packed 1-bit pixel data, `stride * height` bytes.
118118+ pub data: Vec<u8>,
119119+ /// Bytes per row (`ceil(width / 8)`).
120120+ pub stride: usize,
121121+}
122122+123123+impl core::fmt::Debug for DecodedImage {
124124+ fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
125125+ f.debug_struct("DecodedImage")
126126+ .field("width", &self.width)
127127+ .field("height", &self.height)
128128+ .field("stride", &self.stride)
129129+ .field("data_len", &self.data.len())
130130+ .finish()
131131+ }
132132+}
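The layout documented on `DecodedImage` (MSB-first, `stride` bytes per row, set bit = black) can be exercised with a small accessor. This is a sketch only: `Bitmap` is a stand-in that mirrors the four public fields so the example is self-contained, and `pixel` is a hypothetical method that the crate does not provide in this diff.

```rust
/// Stand-in mirroring the public fields of `DecodedImage`.
struct Bitmap {
    width: u16,
    height: u16,
    data: Vec<u8>,
    stride: usize,
}

impl Bitmap {
    /// Hypothetical accessor: `Some(true)` for black (set bit),
    /// `Some(false)` for white, `None` when (x, y) is out of bounds.
    fn pixel(&self, x: u16, y: u16) -> Option<bool> {
        if x >= self.width || y >= self.height {
            return None;
        }
        // Pixel (x, y) is bit (7 - x % 8) of byte data[y * stride + x / 8].
        let byte = self.data[y as usize * self.stride + x as usize / 8];
        Some((byte >> (7 - x % 8)) & 1 == 1)
    }
}
```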
+80-49
smol-epub/src/png.rs
···11-// Minimal PNG decoder for monochrome e-ink display.
22-// Decodes to 1-bit Floyd-Steinberg dithered bitmap; streams row-by-row
33-// through miniz_oxide; peak RAM ~90KB (32KB dict + 11KB decomp + bitmap).
44-// Colour types: 0=greyscale, 2=RGB, 3=palette, 4=grey+alpha, 6=RGBA.
55-// Interlaced (Adam7) rejected; rare in EPUB and doubles code complexity.
66-// Output packed 1-bit MSB-first, row-major; pass to StripBuffer::blit_1bpp.
11+//! Minimal PNG decoder producing 1-bit Floyd–Steinberg dithered bitmaps.
22+//!
33+//! Streams row-by-row through `miniz_oxide`; peak RAM ≈ 90 KB
44+//! (32 KB dictionary + 11 KB decompressor + output bitmap).
55+//!
66+//! Supported colour types: greyscale, RGB, palette, grey+alpha, RGBA.
77+//! Interlaced (Adam7) images are rejected: they are rare in EPUB
88+//! content, and supporting them would double code complexity.
99+//!
1010+//! Output is packed 1-bit MSB-first, row-major — see [`DecodedImage`](crate::DecodedImage).
711812extern crate alloc;
9131014use alloc::boxed::Box;
1115use alloc::vec;
1216use alloc::vec::Vec;
1717+1818+use crate::DecodedImage;
13191420// PNG constants
1521···3743// miniz_oxide LZ dictionary size; must be a power of two >= 32768
3844const DICT_SIZE: usize = 32_768;
39454040-// public types
4141-4242-// decoded 1-bit image, packed MSB-first, row-major.
4343-// set bit = black (ink); clear bit = white (paper).
4444-pub struct DecodedImage {
4545- pub width: u16,
4646- pub height: u16,
4747- pub data: Vec<u8>, // stride * height bytes
4848- pub stride: usize, // bytes per row: ceil(width / 8)
4949-}
5050-5151-// backward-compatible alias
4646+/// Backward-compatible alias for [`DecodedImage`](crate::DecodedImage).
5247pub type PngImage = DecodedImage;
53485454-// decode a PNG buffer to a 1-bit dithered bitmap;
5555-// images wider or taller than max_w/max_h are nearest-neighbour down-scaled
5656-pub fn decode_png(data: &[u8]) -> Result<DecodedImage, &'static str> {
5757- decode_png_fit(data, 800, 480)
5858-}
5959-6060-// decode, scaling down by integer factor so result fits inside max_w x max_h
4949+/// Decode a PNG from an in-memory buffer to a 1-bit dithered bitmap.
5050+///
5151+/// The image is integer-downscaled so the result fits within
5252+/// `max_w` × `max_h` pixels.
6153pub fn decode_png_fit(data: &[u8], max_w: u16, max_h: u16) -> Result<DecodedImage, &'static str> {
6254 let header = parse_ihdr(data)?;
6355 let idat = collect_idat(data)?;
···223215 })
224216}
225217226226-// streaming PNG decoders: decode PNG images from ZIP entries without
227227-// extracting to a contiguous buffer; IDAT fed directly into zlib row-by-row
218218+// ── streaming PNG decoders ──────────────────────────────────────────
219219+// Decode PNG images from ZIP entries without extracting to a contiguous
220220+// buffer; IDAT data is fed directly into zlib row-by-row.
228221229229-// chunk size for streaming SD reads
230230-const SD_READ_BUF: usize = 4096;
222222+/// Read-chunk size used by the streaming decoders (bytes).
223223+const STREAMING_READ_BUF: usize = 4096;
231224232232-// DEFLATE sliding-window for outer ZIP decompression
225225+/// DEFLATE sliding-window for outer ZIP decompression (bytes).
233226const ZIP_DEFLATE_WINDOW: usize = 32_768;
234227235228// sequential byte source for streaming PNG decoder
···247240 }
248241}
249242250250-// reads sequentially from a STORED ZIP entry on SD
251251-struct SdSource<F> {
243243+// reads sequentially from a STORED ZIP entry via a user-supplied closure
244244+struct StoredSource<F> {
252245 read_fn: F,
253246 offset: u32,
254247 end: u32,
255248}
256249257257-impl<F: FnMut(u32, &mut [u8]) -> Result<usize, &'static str>> ReadExact for SdSource<F> {
250250+impl<F: FnMut(u32, &mut [u8]) -> Result<usize, &'static str>> ReadExact for StoredSource<F> {
258251 fn read_exact(&mut self, buf: &mut [u8]) -> Result<(), &'static str> {
259252 let mut done = 0usize;
260253 while done < buf.len() {
···283276 }
284277}
285278286286-// reads sequentially from a DEFLATE-compressed ZIP entry on SD
279279+// reads sequentially from a DEFLATE-compressed ZIP entry via a user-supplied closure
287280struct DeflateSource<F> {
288281 read_fn: F,
289282 file_pos: u32,
···316309 window.resize(ZIP_DEFLATE_WINDOW, 0);
317310318311 let mut rbuf = Vec::new();
319319- rbuf.try_reserve_exact(SD_READ_BUF)
312312+ rbuf.try_reserve_exact(STREAMING_READ_BUF)
320313 .map_err(|_| "png: OOM for DEFLATE read buffer")?;
321321- rbuf.resize(SD_READ_BUF, 0);
314314+ rbuf.resize(STREAMING_READ_BUF, 0);
322315323316 Ok(Self {
324317 read_fn,
···343336 return Ok(());
344337 }
345338346346- if self.in_avail < SD_READ_BUF && self.comp_left > 0 {
347347- let space = SD_READ_BUF - self.in_avail;
339339+ if self.in_avail < STREAMING_READ_BUF && self.comp_left > 0 {
340340+ let space = STREAMING_READ_BUF - self.in_avail;
348341 let want = space.min(self.comp_left);
349342 match (self.read_fn)(
350343 self.file_pos,
···358351 Ok(_) => {
359352 self.comp_left = 0;
360353 }
361361- Err(_) => return Err("png: SD read failed during DEFLATE"),
354354+ Err(_) => return Err("png: read failed during DEFLATE"),
362355 }
363356 }
364357···422415}
424417// decode a PNG from a STORED ZIP entry by streaming through read_fn
425425-pub fn decode_png_sd<F>(
418418+/// Decode a PNG from a **stored** (uncompressed) ZIP entry by streaming
419419+/// reads through `read_fn`.
420420+///
421421+/// `read_fn(offset, buf)` reads bytes at the given absolute offset and
422422+/// returns the number of bytes actually read.
423423+pub fn decode_png_streaming<F>(
426424 read_fn: F,
427425 data_offset: u32,
428426 data_size: u32,
···432430where
433431 F: FnMut(u32, &mut [u8]) -> Result<usize, &'static str>,
434432{
435435- let mut src = SdSource {
433433+ let mut src = StoredSource {
436434 read_fn,
437435 offset: data_offset,
438436 end: data_offset + data_size,
···440438 decode_png_from(&mut src, max_w, max_h)
441439}
442440443443-// decode a PNG from a DEFLATE-compressed ZIP entry by streaming
444444-pub fn decode_png_deflate_sd<F>(
441441+/// Backward-compatible alias for [`decode_png_streaming`].
442442+pub fn decode_png_sd<F>(
443443+ read_fn: F,
444444+ data_offset: u32,
445445+ data_size: u32,
446446+ max_w: u16,
447447+ max_h: u16,
448448+) -> Result<DecodedImage, &'static str>
449449+where
450450+ F: FnMut(u32, &mut [u8]) -> Result<usize, &'static str>,
451451+{
452452+ decode_png_streaming(read_fn, data_offset, data_size, max_w, max_h)
453453+}
454454+455455+/// Decode a PNG from a **DEFLATE-compressed** ZIP entry by streaming
456456+/// reads through `read_fn`.
457457+///
458458+/// ZIP decompression and PNG IDAT inflation are streamed in
459459+/// lockstep, so the full entry is never held in memory.
460460+pub fn decode_png_deflate_streaming<F>(
445461 read_fn: F,
446462 data_offset: u32,
447463 comp_size: u32,
···455471 decode_png_from(&mut src, max_w, max_h)
456472}
457473458458-// core streaming PNG decoder; generic over byte source.
459459-// reads chunks sequentially, feeds IDAT into zlib row-by-row; never holds full PNG in RAM.
474474+/// Backward-compatible alias for [`decode_png_deflate_streaming`].
475475+pub fn decode_png_deflate_sd<F>(
476476+ read_fn: F,
477477+ data_offset: u32,
478478+ comp_size: u32,
479479+ max_w: u16,
480480+ max_h: u16,
481481+) -> Result<DecodedImage, &'static str>
482482+where
483483+ F: FnMut(u32, &mut [u8]) -> Result<usize, &'static str>,
484484+{
485485+ decode_png_deflate_streaming(read_fn, data_offset, comp_size, max_w, max_h)
486486+}
487487+488488+/// Core streaming PNG decoder; generic over byte source.
489489+/// Reads chunks sequentially, feeds IDAT into zlib row-by-row;
490490+/// never holds the full PNG in RAM.
460491fn decode_png_from<R: ReadExact>(
461492 src: &mut R,
462493 max_w: u16,
···591622 let mut out_y: usize = 0;
592623593624 // feed IDAT chunks into zlib row-by-row
594594- let mut idat_buf = [0u8; SD_READ_BUF];
625625+ let mut idat_buf = [0u8; STREAMING_READ_BUF];
595626 let mut in_avail: usize = 0;
596627 let mut idat_chunk_left = first_idat_len;
597628 let mut more_idat = true;
598629599630 loop {
600631 // top up input buffer from the IDAT stream
601601- while in_avail < SD_READ_BUF {
632632+ while in_avail < STREAMING_READ_BUF {
602633 if idat_chunk_left > 0 {
603603- let space = SD_READ_BUF - in_avail;
634634+ let space = STREAMING_READ_BUF - in_avail;
604635 let want = idat_chunk_left.min(space);
605636 src.read_exact(&mut idat_buf[in_avail..in_avail + want])?;
606637 in_avail += want;
···685716 if !has_more && in_avail == 0 {
686717 return Err("png: truncated IDAT stream");
687718 }
688688- if consumed == 0 && produced == 0 && in_avail >= SD_READ_BUF {
719719+ if consumed == 0 && produced == 0 && in_avail >= STREAMING_READ_BUF {
689720 return Err("png: IDAT decompression stuck");
690721 }
691722 }
+13-5
smol-epub/src/xml.rs
···11-// Minimal XML tag/attribute scanner for EPUB metadata.
22-// Not a general parser; handles container.xml and OPF only.
33-// Single-pass, forward-only, namespace-aware, lenient.
11+//! Minimal XML tag/attribute scanner for EPUB metadata.
22+//!
33+//! Not a general-purpose XML parser — handles `container.xml` and OPF
44+//! documents only. Single-pass, forward-only, namespace-aware, lenient.
4566+/// Extract the value of an attribute from a raw XML opening-tag byte slice.
77+///
88+/// `tag_bytes` should start at the tag name (after `<`) and end before `>`.
99+/// Returns `None` if the attribute is not found.
510pub fn get_attr<'a>(tag_bytes: &'a [u8], attr_name: &[u8]) -> Option<&'a [u8]> {
611 let mut pos = 0;
712 let len = tag_bytes.len();
···6974 None
7075}
71767272-// text of first element matching tag_name (namespace-aware)
7777+/// Return the text content of the first element whose local name matches
7878+/// `tag_name` (namespace-aware: `dc:title` matches `title`).
7379pub fn tag_text<'a>(data: &'a [u8], tag_name: &[u8]) -> Option<&'a [u8]> {
7480 let mut pos = 0;
7581···121127 None
122128}
123129124124-// invoke cb for every opening tag matching tag_name (namespace-aware)
130130+/// Invoke `cb` for every opening tag whose local name matches `tag_name`
131131+/// (namespace-aware). The callback receives the tag body bytes (from the
132132+/// tag name up to but not including `>`).
125133pub fn for_each_tag<'a>(data: &'a [u8], tag_name: &[u8], mut cb: impl FnMut(&'a [u8])) {
126134 let mut pos = 0;
127135
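The doc comments on `tag_text` and `for_each_tag` say matching is namespace-aware, so `dc:title` matches `title`. The matching code is elided from this diff; one plausible way to get that behaviour is to compare only the part after the last colon. `local_name` and `matches_local` below are hypothetical helpers, shown only to illustrate the idea.

```rust
/// Hypothetical helper: strip a namespace prefix such as `dc:` from a
/// tag name, returning the local name.
fn local_name(tag: &[u8]) -> &[u8] {
    match tag.iter().rposition(|&b| b == b':') {
        Some(i) => &tag[i + 1..],
        None => tag,
    }
}

/// Hypothetical helper: true if `tag` (with or without a prefix) has the
/// given local name.
fn matches_local(tag: &[u8], wanted: &[u8]) -> bool {
    local_name(tag) == wanted
}
```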
+40-7
smol-epub/src/zip.rs
···11-// ZIP central directory parser and streaming entry extraction.
22-// ZipIndex: 256 entries inline (~5KB); names heap-allocated during parse.
33-// DEFLATE in 4KB chunks; try_reserve throughout for graceful OOM.
11+//! ZIP central-directory parser and streaming entry extraction.
22+//!
33+//! [`ZipIndex`] holds up to 256 entries inline (~5 KB); entry names are
44+//! heap-allocated during parse. DEFLATE decompression streams in 4 KB
55+//! chunks; `try_reserve` is used throughout for graceful OOM handling.
4657use alloc::boxed::Box;
68use alloc::vec;
···1214const CD_SIG: u32 = 0x0201_4b50;
1315const LOCAL_SIG: u32 = 0x0403_4b50;
14161717+/// ZIP compression method: stored (no compression).
1518pub const METHOD_STORED: u16 = 0;
1919+/// ZIP compression method: DEFLATE.
1620pub const METHOD_DEFLATE: u16 = 8;
17211822#[inline]
···2529 u32::from_le_bytes([d[o], d[o + 1], d[o + 2], d[o + 3]])
2630}
27313232+/// A single entry in the ZIP central directory.
2833#[derive(Clone, Copy)]
2934pub struct ZipEntry {
3535+ /// Byte offset into the name pool where this entry's name starts.
3036 pub name_start: u16,
3737+ /// Length of the entry name in bytes.
3138 pub name_len: u16,
3939+ /// Byte offset of the local file header in the ZIP file.
3240 pub local_offset: u32,
4141+ /// Compressed size in bytes.
3342 pub comp_size: u32,
4343+ /// Uncompressed size in bytes.
3444 pub uncomp_size: u32,
4545+ /// Compression method ([`METHOD_STORED`] or [`METHOD_DEFLATE`]).
3546 pub method: u16,
3647}
3748···4657 };
4758}
48596060+/// Maximum number of entries the [`ZipIndex`] can hold.
4961pub const MAX_ENTRIES: usize = 256;
50626363+/// In-memory index of a ZIP archive's central directory.
6464+///
6565+/// Holds up to [`MAX_ENTRIES`] entries inline (~5 KB); entry names are
6666+/// stored in a single heap-allocated byte pool.
5167pub struct ZipIndex {
5268 entries: [ZipEntry; MAX_ENTRIES],
5369 count: u16,
···6177}
62786379impl ZipIndex {
8080+ /// Create a new, empty index.
6481 pub const fn new() -> Self {
6582 Self {
6683 entries: [ZipEntry::EMPTY; MAX_ENTRIES],
···6986 }
7087 }
71888989+ /// Remove all entries and free the name pool.
7290 pub fn clear(&mut self) {
7391 self.count = 0;
7492 self.names = Vec::new();
7593 }
76947777- // parse EOCD from file tail; return (cd_offset, cd_size)
9595+ /// Parse the End-of-Central-Directory record from the last bytes of a
9696+ /// ZIP file. Returns `(cd_offset, cd_size)`.
9797+ ///
9898+ /// `tail` should hold the final bytes of the file: at least 22 (an EOCD
9999+ /// with no comment) and at most 65557 (22 plus a 65535-byte comment).
78100 pub fn parse_eocd(tail: &[u8], file_size: u32) -> Result<(u32, u32), &'static str> {
79101 if tail.len() < 22 {
80102 return Err("zip: tail too short for EOCD");
···101123 Ok((cd_offset, cd_size))
102124 }
103125104104- // parse central directory into entry index
126126+ /// Parse a central-directory blob into this index, replacing any
127127+ /// previously stored entries.
105128 pub fn parse_central_directory(&mut self, cd: &[u8]) -> Result<(), &'static str> {
106129 self.count = 0;
107130 self.names.clear();
···158181 Ok(())
159182 }
160183184184+ /// Number of entries in the index.
161185 #[inline]
162186 pub fn count(&self) -> usize {
163187 self.count as usize
164188 }
165189190190+ /// Return a reference to the entry at `idx`. Panics if out of range.
166191 #[inline]
167192 pub fn entry(&self, idx: usize) -> &ZipEntry {
168193 assert!(idx < self.count as usize);
169194 &self.entries[idx]
170195 }
171196197197+ /// Return the filename of the entry at `idx` as a `&str`.
172198 pub fn entry_name(&self, idx: usize) -> &str {
173199 let e = self.entry(idx);
174200 let start = e.name_start as usize;
···176202 core::str::from_utf8(&self.names[start..end]).unwrap_or("")
177203 }
178204205205+ /// Find an entry by exact (case-sensitive) name. Returns its index.
179206 pub fn find(&self, name: &str) -> Option<usize> {
180207 let name_bytes = name.as_bytes();
181208 for i in 0..self.count as usize {
···189216 None
190217 }
191218219219+ /// Find an entry by case-insensitive ASCII name. Returns its index.
192220 pub fn find_icase(&self, name: &str) -> Option<usize> {
193221 let target = name.as_bytes();
194222 for i in 0..self.count as usize {
···203231 None
204232 }
205233206206- // bytes past local file header to entry data
234234+ /// Given the first 30+ bytes of a local file header, return the number
235235+ /// of bytes to skip past the header to reach the entry's data.
207236 pub fn local_header_data_skip(header: &[u8]) -> Result<u32, &'static str> {
208237 if header.len() < 30 {
209238 return Err("zip: local header too short");
···217246 }
218247}
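For readers unfamiliar with the local file header, the skip that `local_header_data_skip` returns follows from the ZIP format: the fixed header is 30 bytes, with the filename length at byte 26 and the extra-field length at byte 28 (both little-endian `u16`), and the entry data starts after those two variable-length fields. The sketch below is a standalone illustration under that layout; `data_skip` is a hypothetical name, and the signature check is an assumption, since the body of the real function is elided here.

```rust
/// Local file header signature, "PK\x03\x04" read as a little-endian u32.
const LOCAL_SIG: u32 = 0x0403_4b50;

/// Illustrative sketch (not the crate's code): number of bytes from the
/// start of a local file header to the start of the entry data.
fn data_skip(header: &[u8]) -> Result<u32, &'static str> {
    if header.len() < 30 {
        return Err("local header too short");
    }
    let sig = u32::from_le_bytes([header[0], header[1], header[2], header[3]]);
    if sig != LOCAL_SIG {
        return Err("bad local header signature");
    }
    let name_len = u16::from_le_bytes([header[26], header[27]]) as u32;
    let extra_len = u16::from_le_bytes([header[28], header[29]]) as u32;
    // Fixed 30-byte header, then the filename, then the extra field.
    Ok(30 + name_len + extra_len)
}
```

The lengths must come from the local header rather than the central directory, because the two copies of the extra field may differ in size.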
219248220220-// entry extraction
249249+// ── entry extraction ────────────────────────────────────────────────
221250251251+/// Extract a complete ZIP entry into a heap-allocated `Vec<u8>`.
252252+///
253253+/// Supports both stored and DEFLATE-compressed entries. The `read_fn`
254254+/// closure reads bytes at a given absolute offset.
222255pub fn extract_entry<E, F>(
223256 entry: &ZipEntry,
224257 local_offset: u32,