Co-locate llms.txt per-page markdown as {route}/index.md #55
Merged
Move the per-page stripped-markdown copies from the flat /_llms/{path}.md
namespace to a copy co-located beside each page: {route}/index.md (root at
/index.md), mirroring the page's index.html. An agent reaches a page's
markdown by appending "index.md" to its URL, and the static build writes
the markdown into the same output folder as the page — no separate _llms
tree to discover or special-case.
- LlmsTxtOptions: drop OutputDirectory (no separate dir to configure).
- LlmsTxtService: BuildCoLocatedMarkdownPath/Url replace the OutputDirectory
scheme; internal-link rewriter and front-door/subtree links follow.
- LlmsArtifactContentService: claim /index.md (root, ExactClaim) plus
**/index.md (SuffixClaim) instead of the /_llms/**.md prefix claim.
- DocSite App.razor: robots-only hint names the new convention.
- Update integration assertions, example config, and docs/example prose.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012QXe2oVrrTojA3XynJzcsn
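The path scheme this commit describes can be sketched in a few lines. This is an illustrative JavaScript rendering, not the project's C# implementation: `coLocatedMarkdownUrl` and `claimsMarkdownPath` are hypothetical names that stand in for BuildCoLocatedMarkdownPath/Url and the ExactClaim + SuffixClaim pair, and the slash normalisation is an assumption.

```javascript
// Hypothetical helper: maps a page route to its co-located markdown URL.
// "/" becomes "/index.md"; "/guide/install/" becomes "/guide/install/index.md".
function coLocatedMarkdownUrl(route) {
  // Assumed normalisation: ignore leading and trailing slashes on the route.
  const trimmed = route.replace(/^\/+|\/+$/g, "");
  return trimmed === "" ? "/index.md" : `/${trimmed}/index.md`;
}

// Hypothetical claim check: an exact claim on the root copy plus a suffix
// claim on every nested copy, in place of the old /_llms/ prefix claim.
function claimsMarkdownPath(path) {
  const exactRootClaim = path === "/index.md";
  const suffixClaim = path.endsWith("/index.md");
  return exactRootClaim || suffixClaim;
}
```

Under these assumptions an agent needs no lookup table: the markdown URL is derived from the page URL alone, and a path under the old /_llms/ tree is no longer claimed.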
…rkdown">
Emit a per-page alternate link in the page <head> pointing at the page's
co-located {route}/index.md, so content-negotiating agents discover the
token-cheap markdown variant (Claude Code's WebFetch sends
Accept: text/markdown and keys on exactly this).
The href is pure string math on the page's own canonical route — no
llms-service lookup, so it never re-enters the self-fetching projection
(the constraint the old App.razor comment cited for omitting it). It is
gated on llms generation being enabled and the page actually having a
sidecar: the link lives in the content catch-all (DocSite Pages.razor) and
the post page (BlogSite Blog.razor), which routed components like the API
reference never reach, and `llms: false` / locale-fallback pages are gated
out. App.razor's robots-only body hint stays generic since it renders for
every route, including sidecar-less ones.
- DocSite Pages.razor / BlogSite Blog.razor: emit the alternate link.
- App.razor: comment now reflects head-link-present + generic body hint.
- Integration test: a content page advertises the link and the advertised
URL resolves to text/markdown.
- Docs: blog post prose matches the implemented behaviour.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012QXe2oVrrTojA3XynJzcsn
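A minimal sketch of the "pure string math" behind the alternate link, written in JavaScript for illustration rather than taken from the Razor components. The function name and the `llmsEnabled` / `hasSidecar` flags are hypothetical stand-ins for the gating described above, and the route is assumed to be URL-safe already.

```javascript
// Hypothetical helper: builds the head link from the page's own route only,
// with no service lookup. Returns "" when the page is gated out.
function alternateMarkdownLink(canonicalRoute, { llmsEnabled, hasSidecar }) {
  // Gate: no link when generation is off or the page has no sidecar
  // (for example `llms: false` or a locale-fallback page).
  if (!llmsEnabled || !hasSidecar) return "";
  const trimmed = canonicalRoute.replace(/^\/+|\/+$/g, "");
  const href = trimmed === "" ? "/index.md" : `/${trimmed}/index.md`;
  return `<link rel="alternate" type="text/markdown" href="${href}">`;
}
```

Because the href depends only on the route, computing it cannot trigger another render of the site projection.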
The docs landing page at "/" is a marketing splash (Razor Index.razor); it
has no markdown form, so an agent asking for the home got a 404 on /index.md
and 140KB+ of Tailwind/SVG HTML at "/". Now "/" advertises, and /index.md
serves, a purpose-built orientation: what Pennington is, how to read the
site as markdown, the Diátaxis map, and a quickstart.
Served via MapGet(...).WithLlmsTxtEntry(...), the documented pattern for
custom markdown, so it never becomes a routed content page and leaves the
marketing splash intact (an index.llms.md would instead be rendered as HTML
at "/" by the page resolver, hijacking the landing page). The artifact
router claims /index.md but falls through to the endpoint when no generated
sidecar exists. Index.razor advertises it with
<link rel="alternate" type="text/markdown">.
- AgentHomeMarkdown.cs: the authored body + llms.txt entry title/description.
- Program.cs: the /index.md endpoint.
- Index.razor: the alternate link.
- Integration test: "/" is marketing HTML and advertises /index.md; /index.md
serves the authored markdown, not the converted splash.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012QXe2oVrrTojA3XynJzcsn
A Diátaxis tutorial under tutorials/beyond-basics that walks the reader
through the agent-readable Markdown features: discovering the per-page `.md`
copies and `/llms.txt` that AddDocSite wires automatically, seeing how agents
find them (the `<link rel="alternate">` tag and the index), branding the
front door with `llms-header.txt`, holding a page back with `llms: false`,
and giving a Razor landing page a hand-written Markdown twin via MapGet +
WithLlmsTxtEntry.
Matches the existing tutorials' structure and voice (intro + prerequisites,
numbered sections with <Steps>/<Checkpoint>, summary). Every checkpoint is a
`curl` the reader can run, so each step produces a visible result.
Stays accurate: the Markdown is reached at the explicit `.md` URL and
advertised via the alternate link + llms.txt. Pennington does not negotiate
on the Accept header, so the tutorial never claims it does.
Verified end-to-end against the docs content through the integration
fixture: the page renders, all xrefs resolve, nested code fences survive,
and the page serves its own co-located Markdown copy.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012QXe2oVrrTojA3XynJzcsn
LlmsOnly routes (*.llms.md) are documented as "agent-only content with no
HTML page": emitted to llms.txt and its sidecar markdown, excluded from nav,
sitemap, search, and the static build. But PageResolver, the single entry
point for HTTP page serving, didn't check the source type: it parsed and
rendered LlmsOnlySource items like any markdown page, so requesting one
returned HTML and leaked agent-only content to humans (and, in the docs, let
an index.llms.md shadow the marketing landing page).
Decline LlmsOnlySource matches in PageResolver.ResolveAsync so the request
404s. Using `continue` rather than returning lets a real HTML page from
another service at the same slug still win. llms.txt is unaffected: the site
projection renders llms-only items in-process and never depends on HTTP
serving them.
- PageResolver: skip LlmsOnlySource matches.
- Unit tests: llms-only resolves to null; a real page at the same slug still
wins.
- Integration test: the docs' migrating-via-ai.llms.md 404s as HTML while its
/index.md markdown copy still serves.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012QXe2oVrrTojA3XynJzcsn
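The `continue`-versus-return distinction is easiest to see in a loop. The sketch below is a JavaScript illustration under assumed shapes, not PageResolver itself: `resolvePage`, the `find(slug)` method, and the `sourceType` field are all hypothetical.

```javascript
// Hypothetical resolver: asks each content service for a match and declines
// llms-only sources, so the caller can turn a null result into a 404.
function resolvePage(services, slug) {
  for (const service of services) {
    const match = service.find(slug);
    if (!match) continue;
    // Decline and keep looking. Returning null here instead would hide a
    // real HTML page that a later service owns at the same slug.
    if (match.sourceType === "llms-only") continue;
    return match;
  }
  return null; // nothing servable as HTML
}
```

With an llms-only source listed first and a regular page second at the same slug, the loop returns the regular page; with only the llms-only source, it returns null.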
🛰️ Docs preview: https://pr-55.pennington-dev.pages.dev
Rebuilt on every push to this PR; torn down when it closes.
With llms-only routes no longer served as HTML, the docs home no longer
needs the MapGet workaround that existed only to avoid index.llms.md
hijacking the marketing landing page. Move the machine-readable home into a
plain content file: Content/index.llms.md produces the /index.md sidecar
through the normal pipeline (front-matter header, content hash, token
estimate), and the landing page at "/" stays the marketing splash.
- Add Content/index.llms.md; delete AgentHomeMarkdown.cs and the /index.md
MapGet + WithLlmsTxtEntry wiring (and the now-unused using).
- Index.razor keeps the <link rel="alternate" type="text/markdown"> pointing
at /index.md.
- Update the markdown-for-agents tutorial's "Razor landing twin" section and
summary to teach the index.llms.md approach the project now uses.
- Home integration test asserts the content-pipeline output (front-matter
header + converted body, marketing splash absent).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012QXe2oVrrTojA3XynJzcsn
48a729f to 7331daf
…oudflare
Add a Cloudflare Pages advanced-mode _worker.js that serves a page's co-located {route}/index.md when the client sends `Accept: text/markdown` (e.g. Claude Code WebFetch), falling back to HTML otherwise. Pages with `llms: false` (no twin) fall through to HTML. Page responses gain `Vary: Accept` so caches keep the two representations separate.
The build wipes output/, so the worker source lives at docs/cloudflare/_worker.js and the deploy workflow copies it into output/_worker.js after minify, before `pages deploy`.
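The negotiation described above could look roughly like the following. This is a sketch under stated assumptions, not the contents of docs/cloudflare/_worker.js: the function names are hypothetical, the Accept check ignores q-values, any last path segment containing a dot is treated as a static asset, and a missing twin is assumed to come back with a non-2xx status.

```javascript
// True when the Accept header lists text/markdown (simplified: no q-value
// ranking against other listed types).
function wantsMarkdown(acceptHeader) {
  return (acceptHeader || "")
    .split(",")
    .map((part) => part.split(";")[0].trim().toLowerCase())
    .includes("text/markdown");
}

// Maps a page path to its co-located twin, or null for asset-like paths.
// Heuristic (assumption): a dot in the last segment means "a file".
function markdownTwinPath(pathname) {
  const lastSegment = pathname.slice(pathname.lastIndexOf("/") + 1);
  if (lastSegment.includes(".")) return null; // e.g. /app.css, /guide/index.md
  return pathname.endsWith("/") ? `${pathname}index.md` : `${pathname}/index.md`;
}

// Copies the response so its headers are mutable, then appends Vary: Accept.
function withVaryAccept(response) {
  const copy = new Response(response.body, response);
  copy.headers.append("Vary", "Accept");
  return copy;
}

// `assets` is anything with a fetch(request) method; in Cloudflare Pages
// advanced mode that role is played by env.ASSETS.
async function handleRequest(request, assets) {
  const url = new URL(request.url);
  const twin = markdownTwinPath(url.pathname);
  const isRead = request.method === "GET" || request.method === "HEAD";
  if (twin === null || !isRead) return assets.fetch(request); // not a page read

  if (wantsMarkdown(request.headers.get("Accept"))) {
    const markdown = await assets.fetch(new Request(new URL(twin, url), request));
    if (markdown.ok) return withVaryAccept(markdown);
    // No twin (for example an `llms: false` page): fall through to HTML.
  }
  return withVaryAccept(await assets.fetch(request));
}

// In _worker.js this would be wired up roughly as:
//   export default { fetch: (request, env) => handleRequest(request, env.ASSETS) };
```

If the host answers unknown paths with a 200 fallback page instead of a 404, the `markdown.ok` check alone is not enough and a stricter test (such as inspecting the returned Content-Type) would be needed.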