Skip to content

Conversation

@graycreate
Copy link
Member

Summary

  • Add comprehensive HTML tag support for RichView component
  • Implement table rendering (converts HTML tables to Markdown table format)
  • Add strikethrough support (del, s, strike~~text~~)
  • Add underline support (u, ins_text_)
  • Add superscript/subscript support (sup, sub)
  • Add mark/highlight support (mark==text==)
  • Add definition list support (dl, dt, dd)
  • Add semantic element support (abbr, cite, kbd, samp, var, small)
  • Add figure elements (figure, figcaption)
  • Add document structure elements (address, time, details, summary)
  • Add container elements (article, section, nav, aside, header, footer, main, caption)

This fixes the issue where V2EX posts containing tables (like https://www.v2ex.com/t/1176026) were not rendering correctly.

Test plan

  • Added unit tests for all new HTML tag conversions
  • Added tests for table conversion with thead/tbody
  • Added tests for strikethrough, underline, mark rendering
  • Added performance test for complex tables
  • Build succeeded
  • All existing tests pass

🤖 Generated with Claude Code

Add support for previously unsupported HTML tags:
- Table support (table, thead, tbody, tfoot, tr, th, td)
- Strikethrough (del, s, strike)
- Underline (u, ins)
- Superscript/subscript (sup, sub)
- Mark/highlight (mark)
- Definition lists (dl, dt, dd)
- Semantic elements (abbr, cite, kbd, samp, var, small)
- Figure elements (figure, figcaption)
- Document structure (address, time, details, summary)
- Container elements (article, section, nav, aside, header, footer, main, caption)

Also added corresponding tests for all new HTML tags and
strikethrough/highlight rendering in MarkdownRenderer.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings December 1, 2025 02:23
Copilot finished reviewing on behalf of graycreate December 1, 2025 02:27
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds comprehensive HTML tag support to the RichView component, enabling proper rendering of V2EX posts that contain tables and various semantic HTML elements. The implementation converts HTML to markdown format which is then rendered with appropriate styling.

Key changes:

  • Implements table conversion from HTML to markdown format with support for thead/tbody
  • Adds support for text formatting tags (strikethrough, underline, superscript/subscript, highlight)
  • Adds semantic HTML element support (abbreviations, citations, keyboard input, etc.)
  • Includes extensive unit tests for HTML to markdown conversion
  • Adds performance tests for complex table rendering

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
V2er/Sources/RichView/Converters/HTMLToMarkdownConverter.swift Adds conversion logic for tables, strikethrough, underline, superscript/subscript, mark/highlight, definition lists, and semantic elements to markdown format
V2er/Sources/RichView/Renderers/MarkdownRenderer.swift Implements markdown table rendering with visual styling, adds strikethrough and highlight rendering support
V2erTests/RichView/HTMLToMarkdownConverterTests.swift Comprehensive test coverage for all new HTML tag conversions including tables, formatting tags, semantic elements, and performance tests

Comment on lines 193 to 196
// Underline - no standard markdown, render as emphasized text
case "u", "ins":
let content = try convertElement(childElement)
result += "_\(content)_"
Copy link

Copilot AI Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The underline HTML tags (<u> and <ins>) are converted to markdown format _text_, but the MarkdownRenderer does not have logic to render this format with actual underline styling. The underscore format will be interpreted as italic (similar to *text*), not underline.

Consider either:

  1. Adding underline rendering support in renderInlineMarkdown() method to handle _text_ with underlineStyle = .single
  2. Using a different markdown format for underline that doesn't conflict with italic (e.g., <u>text</u> or custom markers)
Suggested change
// Underline - no standard markdown, render as emphasized text
case "u", "ins":
let content = try convertElement(childElement)
result += "_\(content)_"
// Underline - no standard markdown, preserve as HTML <u> for custom renderer
case "u", "ins":
let content = try convertElement(childElement)
result += "<u>\(content)</u>"

Copilot uses AI. Check for mistakes.
Comment on lines 198 to 205
// Superscript/subscript - render with markers
case "sup":
let content = try convertElement(childElement)
result += "^\(content)"

case "sub":
let content = try convertElement(childElement)
result += "~\(content)"
Copy link

Copilot AI Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superscript and subscript HTML tags are converted to markdown format (^text and ~text), but the MarkdownRenderer does not have logic to render these formats. These will be rendered as plain text without any special styling.

Additionally, the subscript marker ~ could conflict with regular tildes in text (e.g., "~5 minutes ago"). Consider:

  1. Adding rendering support in renderInlineMarkdown() for ^ and ~ markers
  2. Using a more distinctive marker that's less likely to appear in regular text (e.g., ^{text} or HTML-style <sup>text</sup>)
Suggested change
// Superscript/subscript - render with markers
case "sup":
let content = try convertElement(childElement)
result += "^\(content)"
case "sub":
let content = try convertElement(childElement)
result += "~\(content)"
// Superscript/subscript - render as HTML tags to preserve formatting
case "sup":
let content = try convertElement(childElement)
result += "<sup>\(content)</sup>"
case "sub":
let content = try convertElement(childElement)
result += "<sub>\(content)</sub>"

Copilot uses AI. Check for mistakes.
Comment on lines 387 to 388
// Skip separator row (| --- | --- |)
if line.contains("---") {
Copy link

Copilot AI Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table separator detection using line.contains("---") could incorrectly skip rows that legitimately contain three consecutive hyphens in their cell content (e.g., a cell with "2020---2025" or "pre---post").

Consider a more precise check that verifies the line matches the separator pattern more strictly:

if line.range(of: #"^\|\s*---+\s*(\|\s*---+\s*)*\|$"#, options: .regularExpression) != nil {

This ensures the line consists only of pipes and hyphens in the expected separator format.

Suggested change
// Skip separator row (| --- | --- |)
if line.contains("---") {
// Skip separator row (| --- | --- | or with colons for alignment)
if line.range(of: #"^\|\s*(:?-+:?)\s*(\|\s*(:?-+:?)\s*)*\|$"#, options: .regularExpression) != nil {

Copilot uses AI. Check for mistakes.
let tagName = cell.tagName().lowercased()
if tagName == "th" || tagName == "td" {
let content = try convertElement(cell)
.replacingOccurrences(of: "\n", with: " ")
Copy link

Copilot AI Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Table cells containing pipe characters (|) are not escaped, which will break the markdown table structure. For example, a cell containing "option A | option B" would be split into multiple columns.

Consider escaping pipe characters in cell content before building the table:

let content = try convertElement(cell)
    .replacingOccurrences(of: "\n", with: " ")
    .replacingOccurrences(of: "|", with: "\\|")  // Escape pipes
    .trimmingCharacters(in: .whitespaces)
Suggested change
.replacingOccurrences(of: "\n", with: " ")
.replacingOccurrences(of: "\n", with: " ")
.replacingOccurrences(of: "|", with: "\\|") // Escape pipes for Markdown tables

Copilot uses AI. Check for mistakes.
Comment on lines 328 to 354
var headerRowCount = 0

// Get all rows from thead and tbody
let allRows = try element.select("tr")

for row in allRows {
var cells: [String] = []
let thCells = try row.select("th")
let isHeaderRow = row.parent()?.tagName().lowercased() == "thead"
|| !thCells.isEmpty()

// Get th and td cells
for cell in row.children() {
let tagName = cell.tagName().lowercased()
if tagName == "th" || tagName == "td" {
let content = try convertElement(cell)
.replacingOccurrences(of: "\n", with: " ")
.trimmingCharacters(in: .whitespaces)
cells.append(content)
}
}

if !cells.isEmpty {
rows.append(cells)
if isHeaderRow && headerRowCount == 0 {
headerRowCount = 1
}
Copy link

Copilot AI Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The headerRowCount variable is incremented when a header row is detected but is never used to influence the table conversion logic. The separator is unconditionally added after the first row (line 378) regardless of whether a header was detected.

Consider either:

  1. Removing this unused variable and the isHeaderRow logic if it's not needed
  2. Using it to control separator placement (though markdown tables always require a separator after the first row, so the current behavior may be intentional)

Copilot uses AI. Check for mistakes.
@github-actions
Copy link

github-actions bot commented Dec 1, 2025

Code Coverage Report ❌

Current coverage: 17.7%

- Preserve <u> HTML tags for underline instead of converting to _text_
  to avoid conflict with italic styling
- Preserve <sup>/<sub> HTML tags for superscript/subscript instead of
  ^/~ markers to avoid conflicts with regular text
- Use regex pattern for table separator detection to avoid false
  positives when cell content contains "---"
- Escape pipe characters in table cells to prevent markdown table
  structure breakage
- Remove unused headerRowCount variable and related code
- Update tests to reflect new behavior
- Add test for pipe escaping in table cells

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions github-actions bot added size/XL and removed size/XL labels Dec 1, 2025
@github-actions
Copy link

github-actions bot commented Dec 1, 2025

Code Coverage Report ❌

Current coverage: 25.62%

@graycreate graycreate merged commit 75f5d90 into main Dec 1, 2025
6 checks passed
@graycreate graycreate deleted the feature/richview-html-tags-support branch December 1, 2025 10:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants