-
-
Notifications
You must be signed in to change notification settings - Fork 3.5k
feat(server): parse ydoc to markdown #12812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: canary
Are you sure you want to change the base?
Conversation
WalkthroughA new feature was added to support retrieving a document's complete markdown content via a backend API. This includes a new endpoint, utility functions for parsing document snapshots to markdown, test cases (unit and end-to-end), and snapshot files to verify output correctness. No existing exported entities were changed. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Controller
participant DocReader
participant Database
participant RPCService
Client->>Controller: GET /workspaces/:wsId/docs/:docId/markdown
Controller->>DocReader: getDocMarkdown(wsId, docId)
alt RPC DocReader
DocReader->>RPCService: fetch markdown (wsId, docId)
alt Success
RPCService-->>DocReader: {title, markdown}
DocReader-->>Controller: {title, markdown}
else Failure (network, etc.)
DocReader->>Database: getDocMarkdown(wsId, docId)
Database-->>DocReader: {title, markdown} or null
DocReader-->>Controller: {title, markdown} or null
end
else Database DocReader
DocReader->>Database: getDocMarkdown(wsId, docId)
Database-->>DocReader: {title, markdown} or null
DocReader-->>Controller: {title, markdown} or null
end
alt Found
Controller-->>Client: 200 OK, JSON {title, markdown}
else Not found
Controller-->>Client: 404 Not Found, error JSON
end
Assessment against linked issues
Assessment against linked issues: Out-of-scope changesNo out-of-scope changes found. Possibly related PRs
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI ⛔ Files ignored due to path filters (4)
📒 Files selected for processing (12)
✅ Files skipped from review due to trivial changes (3)
🚧 Files skipped from review as they are similar to previous changes (9)
⏰ Context from checks skipped due to timeout of 90000ms (56)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
03e15d8
to
dd21429
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## canary #12812 +/- ##
==========================================
+ Coverage 55.59% 55.62% +0.02%
==========================================
Files 2658 2658
Lines 125982 126073 +91
Branches 19961 19968 +7
==========================================
+ Hits 70042 70124 +82
- Misses 53601 53610 +9
Partials 2339 2339
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
016adc0
to
a9e4dd0
Compare
a9e4dd0
to
e4d26eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (10)
packages/backend/server/src/core/utils/blocksuite.ts (2)
11-14
: Avoid ambiguous aliasing – keep original helper name to aid readability
parsePageDoc
is re-exported under the local aliasparseDocToMarkdown
.
Because the file later introduces another public function namedparseDocToMarkdownFromDocSnapshot
, this aliasing makes it harder to reason about which symbol is doing the heavy lifting. Keeping the original import name (or choosing a more explicit alias such asparsePageDocToMarkdown
) markedly reduces mental overhead.-import { - parsePageDoc as parseDocToMarkdown, +import { + parsePageDoc as parsePageDocToMarkdown,and adjust references accordingly.
201-226
: URL helper uses singular “workspace” – check for broken links
buildDocUrl
returns/workspace/${workspaceId}/${docId}
(singular workspace).
Elsewhere in the backend the canonical route is/workspaces/:workspaceId/...
(plural). Rendering markdown with wrong links will yield 404s in the UI.- buildDocUrl: (docId: string) => { - return `/workspace/${workspaceId}/${docId}`; + buildDocUrl: (docId: string) => { + return `/workspaces/${workspaceId}/${docId}`; },Please confirm the expected URL schema throughout the client before applying.
packages/backend/server/src/core/utils/__tests__/blocksute.spec.ts (1)
93-101
: Test does not assert non-null result – potential false positivesThe snapshot will happily record
null
, so the test would still pass even if markdown parsing failed. Add an explicit truthy check to guard against silent regressions.- const result = parseDocToMarkdownFromDocSnapshot( + const result = parseDocToMarkdownFromDocSnapshot( workspace.id, docSnapshot.id, docSnapshot.blob ); + t.truthy(result); t.snapshot(result);packages/backend/server/src/core/doc/__tests__/reader-from-database.spec.ts (1)
267-274
: Strengthen assertion before snapshotSame issue as in the util‐test: if
getDocMarkdown
returnsnull
, the snapshot will still pass. Add a sanity check.- const result = await docReader.getDocMarkdown(workspace.id, docSnapshot.id); + const result = await docReader.getDocMarkdown(workspace.id, docSnapshot.id); + t.truthy(result); t.snapshot(result);packages/backend/server/src/core/doc-service/controller.ts (1)
45-57
: Consider exposing content type or headers for markdown endpoint
getDocMarkdown
returns a JSON object containing title & markdown, yet consumers might reasonably expect rawtext/markdown
. Clarify the contract (e.g./markdown
→ JSON vs/markdown/raw
→ plain MD) or at least document it. Returning the markdown directly withres.type('text/markdown')
could avoid an extra unwrap on the client side.packages/backend/server/src/core/doc/__tests__/reader-from-rpc.spec.ts (1)
379-407
: Large snapshot duplication inflates repo & review noise.The exact same ~120-line markdown blob is now stored in three snapshot files. Consider:
- Extracting it to a fixture (
fixtures/affine-doc.md
) and snapshotting only a hash orfs.readFileSync
result.- Using an AVA macro to share a single snapshot across suites.
That keeps repo size down and avoids triple updates whenever the markdown changes.
packages/backend/server/src/__tests__/e2e/doc-service/controller.spec.ts (2)
19-25
: Token binding could be tightened to the workspace for better security.
crypto.sign(docSnapshot.id)
signs only the document id, so any user who learns a valid token for one doc can try it against other workspaces.
Consider signing both identifiers (e.g.workspaceId:docId
) or embedding a nonce.This is an optional hardening step but cheap to adopt now.
28-42
: Minor: you’re asserting the fullContent-Type
string; matching only the MIME (/application\/json/
) avoids failures if charset changes.packages/backend/server/src/core/doc/reader.ts (2)
37-41
: Interface added, but consider future-proof naming
DocMarkdown
currently contains onlytitle
andmarkdown
.
If we expect front-matter (tags, createdAt, etc.) later, a more genericDocMarkdownContent
(or similar) would leave room for extension without a breaking rename.
329-354
: RPC fallback path duplicates logging logicThe catch-block here is an almost verbatim copy of the one in
getDoc
/getDocDiff
.
Refactor to a private helper to reduce duplication and guarantee consistent logging semantics.- } catch (e) { - if (e instanceof UserFriendlyError) { - throw e; - } - const err = e as Error; - ... - return await super.getDocMarkdown(workspaceId, docId); - } + } catch (e) { + return await this.handleRpcError( + e, + `doc markdown ${url}`, + () => super.getDocMarkdown(workspaceId, docId) + ); + }(Helper would encapsulate the existing logic.)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
packages/backend/server/src/__tests__/e2e/doc-service/__snapshots__/controller.spec.ts.snap
is excluded by!**/*.snap
packages/backend/server/src/core/doc/__tests__/__snapshots__/reader-from-database.spec.ts.snap
is excluded by!**/*.snap
packages/backend/server/src/core/doc/__tests__/__snapshots__/reader-from-rpc.spec.ts.snap
is excluded by!**/*.snap
packages/backend/server/src/core/utils/__tests__/__snapshots__/blocksute.spec.ts.snap
is excluded by!**/*.snap
📒 Files selected for processing (12)
packages/backend/server/src/__tests__/e2e/doc-service/__snapshots__/controller.spec.ts.md
(1 hunks)packages/backend/server/src/__tests__/e2e/doc-service/controller.spec.ts
(1 hunks)packages/backend/server/src/core/doc-service/__tests__/controller.spec.ts
(4 hunks)packages/backend/server/src/core/doc-service/controller.ts
(1 hunks)packages/backend/server/src/core/doc/__tests__/__snapshots__/reader-from-database.spec.ts.md
(1 hunks)packages/backend/server/src/core/doc/__tests__/__snapshots__/reader-from-rpc.spec.ts.md
(1 hunks)packages/backend/server/src/core/doc/__tests__/reader-from-database.spec.ts
(1 hunks)packages/backend/server/src/core/doc/__tests__/reader-from-rpc.spec.ts
(3 hunks)packages/backend/server/src/core/doc/reader.ts
(5 hunks)packages/backend/server/src/core/utils/__tests__/__snapshots__/blocksute.spec.ts.md
(1 hunks)packages/backend/server/src/core/utils/__tests__/blocksute.spec.ts
(2 hunks)packages/backend/server/src/core/utils/blocksuite.ts
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (55)
- GitHub Check: Unit Test (3)
- GitHub Check: Unit Test (4)
- GitHub Check: Unit Test (1)
- GitHub Check: Unit Test (2)
- GitHub Check: Unit Test (5)
- GitHub Check: Native Unit Test
- GitHub Check: test-build-mobile-app / build-android-web
- GitHub Check: test-build-mobile-app / build-ios-web
- GitHub Check: fuzzing
- GitHub Check: E2E Test (1)
- GitHub Check: y-octo binding test on x86_64-pc-windows-msvc
- GitHub Check: y-octo binding test on x86_64-apple-darwin
- GitHub Check: Run native tests
- GitHub Check: y-octo binding test on aarch64-pc-windows-msvc
- GitHub Check: E2E BlockSuite Test (6)
- GitHub Check: E2E BlockSuite Test (7)
- GitHub Check: E2E BlockSuite Test (2)
- GitHub Check: E2E BlockSuite Test (8)
- GitHub Check: E2E BlockSuite Test (9)
- GitHub Check: E2E BlockSuite Test (10)
- GitHub Check: E2E BlockSuite Test (5)
- GitHub Check: E2E BlockSuite Test (4)
- GitHub Check: E2E BlockSuite Test (1)
- GitHub Check: E2E BlockSuite Test (3)
- GitHub Check: E2E Test (10)
- GitHub Check: Build Server native
- GitHub Check: E2E Test (6)
- GitHub Check: E2E Test (4)
- GitHub Check: E2E Test (8)
- GitHub Check: E2E Test (5)
- GitHub Check: E2E Test (7)
- GitHub Check: E2E Test (2)
- GitHub Check: E2E Test (3)
- GitHub Check: E2E Test (9)
- GitHub Check: E2E BlockSuite Cross Browser Test (1, chromium)
- GitHub Check: E2E Mobile Test (5)
- GitHub Check: E2E Mobile Test (4)
- GitHub Check: loom thread test
- GitHub Check: E2E Mobile Test (3)
- GitHub Check: Build AFFiNE native (aarch64-pc-windows-msvc)
- GitHub Check: E2E Mobile Test (2)
- GitHub Check: E2E Mobile Test (1)
- GitHub Check: Build AFFiNE native (x86_64-pc-windows-msvc)
- GitHub Check: E2E BlockSuite Cross Browser Test (2, webkit)
- GitHub Check: E2E BlockSuite Cross Browser Test (1, webkit)
- GitHub Check: Build @affine/electron renderer
- GitHub Check: E2E BlockSuite Cross Browser Test (2, firefox)
- GitHub Check: E2E BlockSuite Cross Browser Test (1, firefox)
- GitHub Check: Analyze (typescript, blocksuite)
- GitHub Check: E2E BlockSuite Cross Browser Test (2, chromium)
- GitHub Check: Analyze (javascript, affine)
- GitHub Check: Analyze (typescript, affine)
- GitHub Check: Analyze (javascript, blocksuite)
- GitHub Check: Typecheck
- GitHub Check: Lint
🔇 Additional comments (9)
packages/backend/server/src/core/utils/blocksuite.ts (1)
211-220
: Future-proof for possible async return valueIf
@affine/reader/dist
.parsePageDocToMarkdown (currently synchronous) ever becomes async, this wrapper will silently return a pending Promise.
Safest is toawait
and mark this functionasync
now – zero runtime cost today, avoids a class of bugs later.-export function parseDocToMarkdownFromDocSnapshot( +export async function parseDocToMarkdownFromDocSnapshot( ... -) { ... - const parsed = parseDocToMarkdown({ + const parsed = await parsePageDocToMarkdown({packages/backend/server/src/__tests__/e2e/doc-service/__snapshots__/controller.spec.ts.md (1)
1-119
: Snapshot file – nothing actionable.packages/backend/server/src/core/doc/__tests__/__snapshots__/reader-from-database.spec.ts.md (1)
1-107
: Snapshot file – nothing actionable.packages/backend/server/src/core/doc/__tests__/__snapshots__/reader-from-rpc.spec.ts.md (1)
1-107
: Snapshot file – nothing actionable.packages/backend/server/src/core/doc-service/__tests__/controller.spec.ts (4)
178-193
: Good addition: explicitly asserting JSONContent-Type
.The extra
Content-Type
expectations tighten the contract of the endpoint and protect against accidental regressions (e.g. serving plain text).
Looks correct and stable.
210-215
: Consistent header assertion retainedSame positive remark as above – thanks for keeping the tests symmetric between
full=true
and default branches.
258-278
: 👍 New happy-path test for/markdown
endpointThe new test covers:
- endpoint routing
- DB reader wiring through the mock
- correct headers + body
Nothing missing here.
280-297
: 404 branch well-coveredNegative path mirrors other endpoints; test logic & assertions LGTM.
packages/backend/server/src/core/utils/__tests__/__snapshots__/blocksute.spec.ts.md (1)
1370-1470
: Approve new snapshot for markdown parsing
The freshly added snapshot forcan parse doc to markdown from doc snapshot
accurately captures the expectedmarkdown
string andtitle
output. The headings, blank lines, lists, links, and table formatting conform to the feature’s requirements and ensure reliable regression testing.
const module = await createModule({ | ||
imports: [DocStorageModule], | ||
}); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Avoid shadowing Node’s built-in module
identifier & heavy top-level await
.
- Declaring a variable literally named
module
hides Node’s built-inmodule
(always defined in CJS, sometimes poly-filled by tooling). That invites “cannot read property of undefined”–style head-scratchers when helpers assume the global still exists. - Kicking off
createModule
with top-levelawait
means expensive I/O runs as soon as the file is required, even when AVA filters tests. Moving it intotest.before()
keeps the test runner fast and predictable.
-const module = await createModule({
- imports: [DocStorageModule],
-});
+// Initialise lazily inside a hook, and use a less confusing name.
+let testModule: Awaited<ReturnType<typeof createModule>>;
+
+test.before(async () => {
+ testModule = await createModule({ imports: [DocStorageModule] });
+});
(You’ll need to update subsequent references from module
→ testModule
.)
🤖 Prompt for AI Agents
In packages/backend/server/src/core/doc/__tests__/reader-from-rpc.spec.ts around
lines 22 to 25, rename the variable `module` to `testModule` to avoid shadowing
Node's built-in `module` identifier. Also, move the `await createModule({
imports: [DocStorageModule] })` call from the top-level to inside a
`test.before()` hook to prevent expensive I/O from running immediately on file
load. Update all subsequent references from `module` to `testModule`
accordingly.
abstract getDocMarkdown( | ||
workspaceId: string, | ||
docId: string | ||
): Promise<DocMarkdown | null>; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Abstract method lacks caching strategy
getDocMarkdown
is introduced without a cache analogue to getDocContent
.
Rendering markdown from a YDoc snapshot is CPU-heavy and repeatedly hitting the DB on every request is avoidable.
+ // TODO: consider `getDocMarkdownWithoutCache` + 7-day cache, symmetric with doc content
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
abstract getDocMarkdown( | |
workspaceId: string, | |
docId: string | |
): Promise<DocMarkdown | null>; | |
// TODO: consider `getDocMarkdownWithoutCache` + 7-day cache, symmetric with doc content | |
abstract getDocMarkdown( | |
workspaceId: string, | |
docId: string | |
): Promise<DocMarkdown | null>; |
🤖 Prompt for AI Agents
In packages/backend/server/src/core/doc/reader.ts around lines 68 to 72, the
abstract method getDocMarkdown lacks a caching mechanism similar to
getDocContent. To fix this, introduce a caching strategy for getDocMarkdown to
store and retrieve rendered markdown results, reducing CPU load and database
hits. Implement a cache layer that checks for existing markdown before rendering
from the YDoc snapshot and fetching from the database.
async getDocMarkdown( | ||
workspaceId: string, | ||
docId: string | ||
): Promise<DocMarkdown | null> { | ||
const doc = await this.workspace.getDoc(workspaceId, docId); | ||
if (!doc) { | ||
return null; | ||
} | ||
return parseDocToMarkdownFromDocSnapshot(workspaceId, docId, doc.bin); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Database implementation: one missing null-guard
parseDocToMarkdownFromDocSnapshot
can theoretically return null
/undefined
(e.g. corrupted snapshot).
If that happens we’ll violate the declared return type and propagate an undefined value.
- return parseDocToMarkdownFromDocSnapshot(workspaceId, docId, doc.bin);
+ const markdown = parseDocToMarkdownFromDocSnapshot(workspaceId, docId, doc.bin);
+ return markdown ?? null;
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
async getDocMarkdown( | |
workspaceId: string, | |
docId: string | |
): Promise<DocMarkdown | null> { | |
const doc = await this.workspace.getDoc(workspaceId, docId); | |
if (!doc) { | |
return null; | |
} | |
return parseDocToMarkdownFromDocSnapshot(workspaceId, docId, doc.bin); | |
} | |
async getDocMarkdown( | |
workspaceId: string, | |
docId: string | |
): Promise<DocMarkdown | null> { | |
const doc = await this.workspace.getDoc(workspaceId, docId); | |
if (!doc) { | |
return null; | |
} | |
const markdown = parseDocToMarkdownFromDocSnapshot(workspaceId, docId, doc.bin); | |
return markdown ?? null; | |
} |
🤖 Prompt for AI Agents
In packages/backend/server/src/core/doc/reader.ts around lines 185 to 194, the
function getDocMarkdown calls parseDocToMarkdownFromDocSnapshot which may return
null or undefined, violating the declared return type Promise<DocMarkdown |
null>. To fix this, add a null check after calling
parseDocToMarkdownFromDocSnapshot and explicitly return null if the result is
null or undefined, ensuring the function always returns a value matching its
declared type.
e4d26eb
to
541a2cc
Compare
close AI-190
Summary by CodeRabbit
New Features
Tests
Documentation
PR Dependency Tree
This tree was auto-generated by Charcoal