A fork of https://github.com/crosspoint-reader/crosspoint-reader

feat: Basic table support (#980)

I've been reading "Children of Time" over the last few days, and that book,
annoyingly, has some tabular content.
This content is relevant to the story, so I needed a really basic way
to at least be able to read those tables.


This commit simply renders the contents of each table cell as a separate
paragraph, with a small header describing the cell's position in the table.
For me, it's better than nothing.
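
As a rough illustration of the resulting output, here is a minimal standalone sketch. `flattenTable` is a hypothetical helper, not part of CrossPoint; it only assumes the 1-based `Tab Row N, Cell M:` header format that this commit emits.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of CrossPoint): flattens a table, given as
// rows of cell texts, into one paragraph per cell with a position header.
std::vector<std::string> flattenTable(const std::vector<std::vector<std::string>>& rows) {
  std::vector<std::string> paragraphs;
  for (std::size_t r = 0; r < rows.size(); ++r) {
    for (std::size_t c = 0; c < rows[r].size(); ++c) {
      // Row and cell numbers are 1-based, like the headers in this commit.
      paragraphs.push_back("Tab Row " + std::to_string(r + 1) + ", Cell " + std::to_string(c + 1) + ": " +
                           rows[r][c]);
    }
  }
  return paragraphs;
}
```

A two-column table with a header row and one data row would come out as four paragraphs, read left to right and top to bottom.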

## Summary

* **What is the goal of this PR?**

Implements really basic table support

* **What changes are included?**

* Minimal changes to ChapterHtmlSlimParser
* A demo book in test/epubs
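
The parser change boils down to three counters (`tableDepth`, `tableRowIndex`, `tableColIndex`) that are updated on tag open and close, with nested tables skipped entirely. The sketch below is a simplified stand-in for that bookkeeping, not the actual `ChapterHtmlSlimParser` code: `TableTracker`, `onStartTag`, `onEndTag` and `acceptsText` are hypothetical names, and word buffering and styling are left out.

```cpp
#include <string>

// Hypothetical stand-in for the table state this PR adds to the parser.
struct TableTracker {
  int tableDepth = 0;
  int tableRowIndex = 0;
  int tableColIndex = 0;

  // Call on an opening tag. Returns the cell header to emit, or "" if none.
  std::string onStartTag(const std::string& name) {
    if (name == "table") {
      if (tableDepth == 0) {  // outermost table: reset the position
        tableRowIndex = 0;
        tableColIndex = 0;
      }
      tableDepth += 1;  // a nested table only increases the depth
      return "";
    }
    if (tableDepth != 1) return "";  // outside any table, or inside a nested one
    if (name == "tr") {
      tableRowIndex += 1;
      tableColIndex = 0;
    } else if (name == "td" || name == "th") {
      tableColIndex += 1;
      return "Tab Row " + std::to_string(tableRowIndex) + ", Cell " + std::to_string(tableColIndex) + ":";
    }
    return "";
  }

  // Call on a closing tag.
  void onEndTag(const std::string& name) {
    if (name == "table" && tableDepth > 0) tableDepth -= 1;
  }

  // Text inside a nested table (depth > 1) is dropped.
  bool acceptsText() const { return tableDepth <= 1; }
};
```

Counting depth instead of using a boolean flag is what lets the closing `</table>` of a nested table be told apart from the closing tag of the outer one.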

## Additional Context

Here are some screenshots of the demo book I provide with this PR.


![PXL_20260218_211446510](https://github.com/user-attachments/assets/49ef81b8-2fa0-4f0d-bb6f-4ef885be6772)


![PXL_20260218_211456379](https://github.com/user-attachments/assets/e7c82b35-b4a9-4a7d-9ec5-2b4bc2ff3514)

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? _**PARTIALLY**_
_Little bit of guidance on what to touch, parts of the impl, rest
manually._

Authored by Maik Allgöwer and committed by GitHub
103fac2e 6527f43c

+95 -11
+92 -11
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp
···
   return matches(name, HEADER_TAGS, NUM_HEADER_TAGS) || matches(name, BLOCK_TAGS, NUM_BLOCK_TAGS);
 }

+bool isTableStructuralTag(const char* name) {
+  return strcmp(name, "table") == 0 || strcmp(name, "tr") == 0 || strcmp(name, "td") == 0 || strcmp(name, "th") == 0;
+}
+
 // Update effective bold/italic/underline based on block style and inline style stack
 void ChapterHtmlSlimParser::updateEffectiveInlineStyle() {
   // Start with block-level styles
···
   centeredBlockStyle.textAlignDefined = true;
   centeredBlockStyle.alignment = CssTextAlign::Center;

-  // Special handling for tables - show placeholder text instead of dropping silently
+  // Special handling for tables/cells: flatten into per-cell paragraphs with a prefixed header.
   if (strcmp(name, "table") == 0) {
-    // Add placeholder text
-    self->startNewTextBlock(centeredBlockStyle);
+    // skip nested tables
+    if (self->tableDepth > 0) {
+      self->tableDepth += 1;
+      return;
+    }

-    self->italicUntilDepth = min(self->italicUntilDepth, self->depth);
-    // Advance depth before processing character data (like you would for an element with text)
+    if (self->partWordBufferIndex > 0) {
+      self->flushPartWordBuffer();
+    }
+    self->tableDepth += 1;
+    self->tableRowIndex = 0;
+    self->tableColIndex = 0;
+    self->depth += 1;
+    return;
+  }
+
+  if (self->tableDepth == 1 && strcmp(name, "tr") == 0) {
+    self->tableRowIndex += 1;
+    self->tableColIndex = 0;
     self->depth += 1;
-    self->characterData(userData, "[Table omitted]", strlen("[Table omitted]"));
+    return;
+  }
+
+  if (self->tableDepth == 1 && (strcmp(name, "td") == 0 || strcmp(name, "th") == 0)) {
+    if (self->partWordBufferIndex > 0) {
+      self->flushPartWordBuffer();
+    }
+    self->tableColIndex += 1;
+
+    auto tableCellBlockStyle = BlockStyle();
+    tableCellBlockStyle.textAlignDefined = true;
+    const auto align = (self->paragraphAlignment == static_cast<uint8_t>(CssTextAlign::None))
+                           ? CssTextAlign::Justify
+                           : static_cast<CssTextAlign>(self->paragraphAlignment);
+    tableCellBlockStyle.alignment = align;
+    self->startNewTextBlock(tableCellBlockStyle);
+
+    const std::string headerText =
+        "Tab Row " + std::to_string(self->tableRowIndex) + ", Cell " + std::to_string(self->tableColIndex) + ":";
+    StyleStackEntry headerStyle;
+    headerStyle.depth = self->depth;
+    headerStyle.hasBold = true;
+    headerStyle.bold = false;
+    headerStyle.hasItalic = true;
+    headerStyle.italic = true;
+    headerStyle.hasUnderline = true;
+    headerStyle.underline = false;
+    self->inlineStyleStack.push_back(headerStyle);
+    self->updateEffectiveInlineStyle();
+    self->characterData(userData, headerText.c_str(), static_cast<int>(headerText.length()));
+    if (self->partWordBufferIndex > 0) {
+      self->flushPartWordBuffer();
+    }
+    self->nextWordContinues = false;
+    self->inlineStyleStack.pop_back();
+    self->updateEffectiveInlineStyle();

-    // Skip table contents (skip until parent as we pre-advanced depth above)
-    self->skipUntilDepth = self->depth - 1;
+    self->depth += 1;
     return;
   }

···
 void XMLCALL ChapterHtmlSlimParser::characterData(void* userData, const XML_Char* s, const int len) {
   auto* self = static_cast<ChapterHtmlSlimParser*>(userData);

+  // Skip content of nested table
+  if (self->tableDepth > 1) {
+    return;
+  }
+
   // Middle of skip
   if (self->skipUntilDepth < self->depth) {
     return;
···

   const bool styleWillChange = willPopStyleStack || willClearBold || willClearItalic || willClearUnderline;
   const bool headerOrBlockTag = isHeaderOrBlock(name);
+  const bool tableStructuralTag = isTableStructuralTag(name);
+
+  if (self->tableDepth > 1 && strcmp(name, "table") == 0) {
+    // get rid of all text inside the nested table
+    self->partWordBufferIndex = 0;
+    self->tableDepth -= 1;
+    LOG_DBG("EHP", "nested table detected, get rid of its content");
+    return;
+  }

   // Flush buffer with current style BEFORE any style changes
   if (self->partWordBufferIndex > 0) {
     // Flush if style will change OR if we're closing a block/structural element
-    const bool isInlineTag = !headerOrBlockTag && strcmp(name, "table") != 0 &&
-                             !matches(name, IMAGE_TAGS, NUM_IMAGE_TAGS) && self->depth != 1;
+    const bool isInlineTag =
+        !headerOrBlockTag && !tableStructuralTag && !matches(name, IMAGE_TAGS, NUM_IMAGE_TAGS) && self->depth != 1;
     const bool shouldFlush = styleWillChange || headerOrBlockTag || matches(name, BOLD_TAGS, NUM_BOLD_TAGS) ||
                              matches(name, ITALIC_TAGS, NUM_ITALIC_TAGS) ||
-                             matches(name, UNDERLINE_TAGS, NUM_UNDERLINE_TAGS) || strcmp(name, "table") == 0 ||
+                             matches(name, UNDERLINE_TAGS, NUM_UNDERLINE_TAGS) || tableStructuralTag ||
                              matches(name, IMAGE_TAGS, NUM_IMAGE_TAGS) || self->depth == 1;

     if (shouldFlush) {
···
   // Leaving skip
   if (self->skipUntilDepth == self->depth) {
     self->skipUntilDepth = INT_MAX;
+  }
+
+  if (self->tableDepth == 1 && (strcmp(name, "td") == 0 || strcmp(name, "th") == 0)) {
+    self->nextWordContinues = false;
+  }
+
+  if (self->tableDepth == 1 && (strcmp(name, "tr") == 0)) {
+    self->nextWordContinues = false;
+  }
+
+  if (self->tableDepth == 1 && strcmp(name, "table") == 0) {
+    self->tableDepth -= 1;
+    self->tableRowIndex = 0;
+    self->tableColIndex = 0;
+    self->nextWordContinues = false;
   }

   // Leaving bold tag
+3
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h
···
   bool effectiveBold = false;
   bool effectiveItalic = false;
   bool effectiveUnderline = false;
+  int tableDepth = 0;
+  int tableRowIndex = 0;
+  int tableColIndex = 0;

   void updateEffectiveInlineStyle();
   void startNewTextBlock(const BlockStyle& blockStyle);
test/epubs/test_tables.epub

This is a binary file and will not be displayed.