fix(pptx): round-trip preserve UnknownElement on save #39

Merged: karthikmudunuri merged 1 commit into main from karthikmudunuri/pptx-roundtrip-preserve on May 12, 2026

@karthikmudunuri
Member

Summary

Stops silent data loss on save when a deck contains charts, SmartArt, group shapes, OLE, math, or any other OOXML Slidewise couldn't model on import.

The bug

`deckToPptx.addElement`'s `case "unknown": return;` made the serializer ignore every `UnknownElement` — even though the importer was already preserving the raw OOXML on those elements (`UnknownElement.ooxmlXml`). Every parse → save cycle silently destroyed those parts of the deck.

The fix

  • Parse side: `parsePptx` stashes the original archive bytes on the returned `Deck` (non-enumerable) and the source slide path on each `Slide`. No public schema change — internal contract used only by the parse/serialize boundary.
  • Serialize side: after pptxgenjs writes the zip, a new post-process step (`preserveUnknowns`) finds every slide carrying `UnknownElement` payloads, injects each `ooxmlXml` fragment into the generated `<p:spTree>`, and copies the rels + media those fragments referenced from the source archive. rIds get renumbered to avoid clashes with what pptxgenjs already wrote, and preserved media targets get a unique `slidewise_preserved_N_` prefix so they don't collide with media pptxgenjs allocated.
  • API: `parsePptx` now also accepts `Uint8Array` (in addition to `Blob` and `ArrayBuffer`) so the server-side `serialize → arrayBuffer → parsePptx` loop works without an extra allocation.
  • Test: new `roundtrip.test.ts` case builds a synthetic deck with a SmartArt-style `<dgm:relIds r:dm="rId7">` fragment, runs the full parse → serialize → re-parse loop, and asserts the fragment lands in the new `spTree` plus the referenced PNG is copied to the output with a fresh rId.
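
The parse-side bullets above (a non-enumerable stash on the `Deck`, plus accepting `Uint8Array` input) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the property name `__sourceBytes` and the helpers `attachSourceBytes`, `readSourceBytes`, and `toUint8Array` are hypothetical, and `Blob` input is left out because reading it is asynchronous.

```typescript
// Hypothetical property name; the real internal key used by Slidewise may differ.
const SOURCE_BYTES = "__sourceBytes";

/** Attach the original archive bytes without exposing them to JSON.stringify or Object.keys. */
function attachSourceBytes<T extends object>(deck: T, bytes: Uint8Array): T {
  Object.defineProperty(deck, SOURCE_BYTES, {
    value: bytes,
    enumerable: false, // hidden from JSON dumps, for...in, and object spread
    writable: false,
    configurable: false,
  });
  return deck;
}

/** Read the stash back at the serialize boundary; undefined if the deck was not parsed from a file. */
function readSourceBytes(deck: object): Uint8Array | undefined {
  const value: unknown = Object.getOwnPropertyDescriptor(deck, SOURCE_BYTES)?.value;
  return value instanceof Uint8Array ? value : undefined;
}

/** Normalize input: a Uint8Array passes through, an ArrayBuffer is wrapped in a view (no byte copy). */
function toUint8Array(input: ArrayBuffer | Uint8Array): Uint8Array {
  return input instanceof Uint8Array ? input : new Uint8Array(input);
}
```

One consequence of this design worth keeping in mind: object spread and JSON round trips copy only enumerable properties, so a deck cloned that way would lose the stash and fall back to a plain save.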

Verified end-to-end

Round-tripped the Dickinson Sample deck: its bar chart on slide 4 (a `<p:graphicFrame>` wrapping `<c:chart r:id="rId2">`) survives — the graphicFrame XML is injected, `chart1.xml` is copied to `ppt/charts/slidewise_preserved_0_chart1.xml`, and the slide's rels now expose `rId3 → slidewise_preserved_0_chart1.xml`. Re-parsing the output yields the same 1 `UnknownElement` as the original.
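
The renumbering seen in the Dickinson example (`rId2` in the source becoming `rId3` in the output, with the chart part copied under a `slidewise_preserved_0_` prefix) can be sketched as below. This is an illustration under assumptions, not the actual `preserveUnknowns` implementation: the `Relationship` shape, the helper names, the "take the highest existing rId and count up" strategy, and the meaning of the index `N` in the prefix are all hypothetical.

```typescript
interface Relationship {
  id: string;     // e.g. "rId2"
  type: string;   // relationship type URI
  target: string; // e.g. "../charts/chart1.xml"
}

/** Highest numeric suffix among existing rIds (0 if there are none). */
function maxRId(existingIds: readonly string[]): number {
  let max = 0;
  for (const id of existingIds) {
    const m = /^rId(\d+)$/.exec(id);
    if (m) max = Math.max(max, Number(m[1]));
  }
  return max;
}

/** Prefix only the file name, keeping the directory part of the target. */
function prefixTarget(target: string, index: number): string {
  const slash = target.lastIndexOf("/");
  return `${target.slice(0, slash + 1)}slidewise_preserved_${index}_${target.slice(slash + 1)}`;
}

/** Rewrite r:* attributes in a fragment to fresh rIds and return the rels to add to the slide. */
function remapFragment(
  fragmentXml: string,
  sourceRels: readonly Relationship[],
  existingIds: readonly string[],
  fragmentIndex: number,
): { xml: string; rels: Relationship[] } {
  let next = maxRId(existingIds) + 1;
  const idMap = new Map<string, string>();
  const rels: Relationship[] = [];
  // A single replace pass, so a freshly assigned id is never rewritten a second time.
  const xml = fragmentXml.replace(
    /(\br:[A-Za-z]+=")(rId\d+)(")/g,
    (match: string, pre: string, oldId: string, post: string) => {
      const src = sourceRels.find((r) => r.id === oldId);
      if (!src) return match; // dangling reference: leave untouched
      let newId = idMap.get(oldId);
      if (newId === undefined) {
        newId = `rId${next++}`;
        idMap.set(oldId, newId);
        rels.push({ id: newId, type: src.type, target: prefixTarget(src.target, fragmentIndex) });
      }
      return pre + newId + post;
    },
  );
  return { xml, rels };
}
```

Assuming the generated slide already uses `rId1` and `rId2`, calling `remapFragment('<c:chart r:id="rId2"/>', sourceRels, ["rId1", "rId2"], 0)` rewrites the attribute to `rId3` and yields one relationship targeting `../charts/slidewise_preserved_0_chart1.xml`.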

What this doesn't try to do

  • No EMF/WMF rendering: those are still dropped on parse with a diagnostic. The follow-up planned in #38 ("docs: plan — PPTX round 2 (table styles, charts, EMF fallbacks)") covers them.
  • No native types for chart/SmartArt/group — the data is preserved verbatim and re-emitted on save, but Slidewise's editor still treats them as opaque `UnknownElement` blocks. That's enough to stop data loss; first-class editability is a separate piece.
  • No `[Content_Types].xml` patching: pptxgenjs already declares common image content types, and we don't introduce new ones beyond what the source already had. If a deck ships an unusual content type referenced only from an `UnknownElement`, we may need to merge `[Content_Types].xml` too; flag it as a follow-up if you hit it.
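
To spot the content-type gap described in the last bullet, one option is to compare the extensions of preserved parts against the `<Default Extension="…">` entries in the generated `[Content_Types].xml`. The sketch below is only a diagnostic aid under assumptions: both helper names are hypothetical, it uses a regex rather than a real XML parser, and it ignores `<Override PartName="…">` entries, which can also declare a part's type.

```typescript
/** Extensions declared via <Default Extension="..."> in [Content_Types].xml, lowercased. */
function declaredExtensions(contentTypesXml: string): Set<string> {
  const out = new Set<string>();
  const re = /<Default\b[^>]*\bExtension="([^"]+)"/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(contentTypesXml)) !== null) {
    const ext = m[1];
    if (ext) out.add(ext.toLowerCase());
  }
  return out;
}

/** Extensions used by preserved parts that have no Default declaration, sorted alphabetically. */
function undeclaredExtensions(
  contentTypesXml: string,
  preservedPaths: readonly string[],
): string[] {
  const declared = declaredExtensions(contentTypesXml);
  const missing = new Set<string>();
  for (const path of preservedPaths) {
    const dot = path.lastIndexOf(".");
    if (dot === -1) continue; // no extension to check
    const ext = path.slice(dot + 1).toLowerCase();
    if (!declared.has(ext)) missing.add(ext);
  }
  return Array.from(missing).sort();
}
```

A non-empty result would be a hint that the deck needs the `[Content_Types].xml` merge mentioned above.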

Test plan

  • `pnpm -F @textcortex/slidewise test` — 31/31 pass (one new round-trip case).
  • `pnpm -F @textcortex/slidewise build:lib` — clean.
  • Round-tripped the Dickinson sample deck: unknown count and chart fragment survive intact.
  • Double round-tripped the Dickinson sample to confirm idempotence (no exponential blow-up of preserved entries).
  • Smoke-test in the website editor: open a deck with a chart, save, reopen.
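
The double round-trip check in the test plan could be made mechanical along these lines, assuming the archive entry names are available after each pass. The helper names are hypothetical, and comparing counts is just one possible reading of "no blow-up of preserved entries"; it is not taken from the test suite.

```typescript
// Matches entries whose file name starts with the preserved-media prefix described in this PR.
const PRESERVED = /(?:^|\/)slidewise_preserved_\d+_/;

/** Archive entries that were copied across by the preservation step. */
function preservedEntries(paths: readonly string[]): string[] {
  return paths.filter((p) => PRESERVED.test(p));
}

/** True when a second round trip did not change the number of preserved entries. */
function isStableAcrossRoundTrips(
  firstPass: readonly string[],
  secondPass: readonly string[],
): boolean {
  return preservedEntries(firstPass).length === preservedEntries(secondPass).length;
}
```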

Slidewise's importer wraps OOXML it can't model — charts, SmartArt,
group shapes, OLE, math, complex tables — into an UnknownElement
carrying the raw XML (ooxmlXml). The serializer's case "unknown":
return; meant every parse → save cycle silently destroyed those
parts of the deck.

- parsePptx now keeps the original archive bytes on the Deck and the
  source slide path on each Slide (non-enumerable so they don't
  pollute JSON dumps).
- serializeDeck post-processes the zip pptxgenjs writes: for every
  slide whose Deck-side counterpart carries UnknownElement payloads,
  inject each ooxmlXml fragment into the generated <p:spTree>, copy
  the rels + media those fragments referenced from the source
  archive, renumber rIds to avoid clashes with what pptxgenjs already
  wrote, and uniquely prefix preserved media targets so they don't
  collide with media pptxgenjs allocated.
- parsePptx also accepts Uint8Array directly so the server-side
  serialize → arrayBuffer → parsePptx loop works without an extra
  allocation.
- New regression test exercises the full parse → serialize →
  re-parse loop on a synthetic SmartArt-style fragment, asserting
  that the fragment lands in the new spTree and the referenced PNG
  is copied to the output with a fresh rId.

Verified end-to-end on the Dickinson sample (its bar chart on slide 4
is preserved as a graphicFrame → c:chart fragment plus chart1.xml
copied across as ppt/charts/slidewise_preserved_0_chart1.xml).

@karthikmudunuri merged commit e78ce35 into main on May 12, 2026
1 check passed