Co-locate llms.txt per-page markdown as {route}/index.md #55

Merged
phil-scott-78 merged 7 commits into main from claude/dotnet-docs-cache-hosting-ywaso0 on Jun 24, 2026

@phil-scott-78

Contributor

Move the per-page stripped-markdown copies from the flat /_llms/{path}.md
namespace to a copy co-located beside each page: {route}/index.md (root at
/index.md), mirroring the page's index.html. An agent reaches a page's
markdown by appending "index.md" to its URL, and the static build writes
the markdown into the same output folder as the page — no separate _llms
tree to discover or special-case.

  • LlmsTxtOptions: drop OutputDirectory (no separate dir to configure).
  • LlmsTxtService: BuildCoLocatedMarkdownPath/Url replace the OutputDirectory
    scheme; internal-link rewriter and front-door/subtree links follow.
  • LlmsArtifactContentService: claim /index.md (root, ExactClaim) plus
    **/index.md (SuffixClaim) instead of the /_llms/**.md prefix claim.
  • DocSite App.razor: robots-only hint names the new convention.
  • Update integration assertions, example config, and docs/example prose.
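
The route-to-markdown mapping and the two claims listed above can be sketched as follows. This is a minimal JavaScript illustration, not the project's C# code: `coLocatedMarkdownUrl` and `claimsPath` are hypothetical names, and the slash normalization is an assumption about how routes are shaped.

```javascript
// Hypothetical sketch: map a page route to its co-located markdown URL.
// Assumes routes are site-relative paths such as "/" or "/guides/intro/".
function coLocatedMarkdownUrl(route) {
  // Strip leading and trailing slashes so "/guides/intro/" and "guides/intro" agree.
  const trimmed = route.replace(/^\/+|\/+$/g, "");
  // The root page maps to /index.md; every other page maps to {route}/index.md.
  return trimmed === "" ? "/index.md" : `/${trimmed}/index.md`;
}

// Hypothetical sketch of the two claims: an exact claim on /index.md (root)
// plus a suffix claim on **/index.md. The exact check is redundant with the
// suffix check here; it is kept only to mirror the two claims.
function claimsPath(path) {
  return path === "/index.md" || path.endsWith("/index.md");
}
```

Under these assumptions, `coLocatedMarkdownUrl("/guides/intro/")` yields `/guides/intro/index.md`, and a path in the old flat namespace such as `/_llms/guides/intro.md` is no longer claimed.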

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012QXe2oVrrTojA3XynJzcsn

claude added 5 commits June 23, 2026 21:51
…rkdown">

Emit a per-page alternate link in the page <head> pointing at the page's
co-located {route}/index.md, so content-negotiating agents discover the
token-cheap markdown variant (Claude Code's WebFetch sends
Accept: text/markdown and keys on exactly this).

The href is pure string math on the page's own canonical route — no
llms-service lookup, so it never re-enters the self-fetching projection
(the constraint the old App.razor comment cited for omitting it). It is
gated on llms generation being enabled and the page actually having a
sidecar: the link lives in the content catch-all (DocSite Pages.razor) and
the post page (BlogSite Blog.razor), which routed components like the API
reference never reach, and `llms: false` / locale-fallback pages are gated
out. App.razor's robots-only body hint stays generic since it renders for
every route, including sidecar-less ones.

- DocSite Pages.razor / BlogSite Blog.razor: emit the alternate link.
- App.razor: comment now reflects head-link-present + generic body hint.
- Integration test: a content page advertises the link and the advertised
  URL resolves to text/markdown.
- Docs: blog post prose matches the implemented behaviour.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012QXe2oVrrTojA3XynJzcsn

The docs landing page at "/" is a marketing splash (Razor Index.razor); it
has no markdown form, so an agent asking for the home got a 404 on /index.md
and 140KB+ of Tailwind/SVG HTML at "/". Now "/" advertises, and /index.md
serves, a purpose-built orientation: what Pennington is, how to read the site
as markdown, the Diátaxis map, and a quickstart.

Served via MapGet(...).WithLlmsTxtEntry(...) — the documented pattern for
custom markdown — so it never becomes a routed content page and leaves the
marketing splash intact (an index.llms.md would instead be rendered as HTML at
"/" by the page resolver, hijacking the landing page). The artifact router
claims /index.md but falls through to the endpoint when no generated sidecar
exists. Index.razor advertises it with <link rel="alternate" type="text/markdown">.

- AgentHomeMarkdown.cs: the authored body + llms.txt entry title/description.
- Program.cs: the /index.md endpoint.
- Index.razor: the alternate link.
- Integration test: "/" is marketing HTML and advertises /index.md; /index.md
  serves the authored markdown, not the converted splash.
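
The fall-through described above (the artifact router claims /index.md but defers to the endpoint when no generated sidecar exists) can be sketched like this. The function name, the two `Map` arguments, and the returned shape are all hypothetical; the real router is C# and may be structured differently.

```javascript
// Hypothetical sketch: a generated sidecar wins; otherwise fall through to
// a mapped endpoint for the same path; otherwise there is nothing to serve.
function serveIndexMarkdown(path, generatedSidecars, endpoints) {
  if (generatedSidecars.has(path)) {
    return { source: "sidecar", body: generatedSidecars.get(path) };
  }
  const endpoint = endpoints.get(path);
  return endpoint ? { source: "endpoint", body: endpoint() } : null;
}
```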

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012QXe2oVrrTojA3XynJzcsn

A Diátaxis tutorial under tutorials/beyond-basics that walks the reader
through the agent-readable Markdown features: discovering the per-page
`.md` copies and `/llms.txt` that AddDocSite wires automatically, seeing
how agents find them (the `<link rel="alternate">` tag and the index),
branding the front door with `llms-header.txt`, holding a page back with
`llms: false`, and giving a Razor landing page a hand-written Markdown twin
via MapGet + WithLlmsTxtEntry.

Matches the existing tutorials' structure and voice (intro + prerequisites,
numbered sections with <Steps>/<Checkpoint>, summary). Every checkpoint is a
`curl` the reader can run, so each step produces a visible result. Stays
accurate: the Markdown is reached at the explicit `.md` URL and advertised
via the alternate link + llms.txt — Pennington does not negotiate on the
Accept header, so the tutorial never claims it does.

Verified end-to-end against the docs content through the integration
fixture: the page renders, all xrefs resolve, nested code fences survive,
and the page serves its own co-located Markdown copy.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012QXe2oVrrTojA3XynJzcsn

LlmsOnly routes (*.llms.md) are documented as "agent-only content with no
HTML page" — emitted to llms.txt and its sidecar markdown, excluded from
nav, sitemap, search, and the static build. But PageResolver, the single
entry point for HTTP page serving, didn't check the source type: it parsed
and rendered LlmsOnlySource items like any markdown page, so requesting one
returned HTML and leaked agent-only content to humans (and, in the docs,
let an index.llms.md shadow the marketing landing page).

Decline LlmsOnlySource matches in PageResolver.ResolveAsync so the request
404s. Using `continue` rather than returning lets a real HTML page from
another service at the same slug still win. llms.txt is unaffected: the site
projection renders llms-only items in-process and never depends on HTTP
serving them.

- PageResolver: skip LlmsOnlySource matches.
- Unit tests: llms-only resolves to null; a real page at the same slug still
  wins.
- Integration test: the docs' migrating-via-ai.llms.md 404s as HTML while its
  /index.md markdown copy still serves.
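
The reason for `continue` over an early return is easiest to see in a loop. The following JavaScript sketch is illustrative only: the real PageResolver.ResolveAsync is C#, and the `find` method, the `sourceType` field, and the `"LlmsOnly"` string are hypothetical stand-ins.

```javascript
// Hypothetical sketch of the resolve loop. Each service exposes find(slug),
// returning a match object or null.
function resolvePage(services, slug) {
  for (const service of services) {
    const match = service.find(slug);
    if (match === null) continue;
    // Decline llms-only matches but keep looking: another service may own
    // a real HTML page at the same slug.
    if (match.sourceType === "LlmsOnly") continue;
    return match;
  }
  return null; // the caller answers with a 404
}
```

Returning `null` at the llms-only match instead would end the search early and hide any real page registered by a later service.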

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012QXe2oVrrTojA3XynJzcsn

@github-actions


🛰️ Docs preview: https://pr-55.pennington-dev.pages.dev

Rebuilt on every push to this PR; torn down when it closes.

With llms-only routes no longer served as HTML, the docs home no longer needs
the MapGet workaround that existed only to avoid index.llms.md hijacking the
marketing landing page. Move the machine-readable home into a plain content
file: Content/index.llms.md produces the /index.md sidecar through the normal
pipeline (front-matter header, content hash, token estimate), and the landing
page at "/" stays the marketing splash.

- Add Content/index.llms.md; delete AgentHomeMarkdown.cs and the /index.md
  MapGet + WithLlmsTxtEntry wiring (and the now-unused using).
- Index.razor keeps the <link rel="alternate" type="text/markdown"> pointing
  at /index.md.
- Update the markdown-for-agents tutorial's "Razor landing twin" section and
  summary to teach the index.llms.md approach the project now uses.
- Home integration test asserts the content-pipeline output (front-matter
  header + converted body, marketing splash absent).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012QXe2oVrrTojA3XynJzcsn

phil-scott-78 force-pushed the claude/dotnet-docs-cache-hosting-ywaso0 branch from 48a729f to 7331daf on June 24, 2026 13:56
…oudflare

Add a Cloudflare Pages advanced-mode _worker.js that serves a page's co-located {route}/index.md when the client sends `Accept: text/markdown` (e.g. Claude Code WebFetch), falling back to HTML otherwise. Pages with `llms: false` (no twin) fall through to HTML. Page responses gain `Vary: Accept` so caches keep the two representations separate.

The build wipes output/, so the worker source lives at docs/cloudflare/_worker.js and the deploy workflow copies it into output/_worker.js after minify, before `pages deploy`.
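
The negotiation described above could be sketched roughly as below. This is not the contents of docs/cloudflare/_worker.js: the helper names are hypothetical, `assets` stands in for the static-asset fetcher (`env.ASSETS` in a Pages advanced-mode worker), q-values in the Accept header are ignored, and a missing twin is assumed to produce a non-2xx response. For brevity the sketch adds `Vary: Accept` to every response it returns, whereas the commit applies it to page responses.

```javascript
// Hypothetical helper: does the Accept header list text/markdown?
function wantsMarkdown(acceptHeader) {
  if (!acceptHeader) return false;
  return acceptHeader
    .split(",")
    .some((part) => part.split(";")[0].trim().toLowerCase() === "text/markdown");
}

// Hypothetical helper: the co-located twin for a page URL path.
function markdownTwinPath(pathname) {
  const base = pathname.endsWith("/") ? pathname : `${pathname}/`;
  return `${base}index.md`;
}

// Copy a response and append Vary: Accept so caches keep the HTML and
// markdown representations apart.
function withVary(response) {
  const headers = new Headers(response.headers);
  headers.append("Vary", "Accept");
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}

// Sketch of the request handler: try the markdown twin first when asked,
// otherwise (or when there is no twin) serve the normal asset.
async function handle(request, assets) {
  const url = new URL(request.url);
  if (wantsMarkdown(request.headers.get("Accept"))) {
    const twinUrl = new URL(markdownTwinPath(url.pathname), url);
    const twin = await assets.fetch(new Request(twinUrl));
    if (twin.ok) return withVary(twin);
  }
  return withVary(await assets.fetch(request));
}
```

In an actual advanced-mode worker this would be wired up through the module's default export, for example a `fetch(request, env)` method that calls `handle(request, env.ASSETS)`.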
phil-scott-78 merged commit c4be6f3 into main on Jun 24, 2026
4 checks passed
phil-scott-78 deleted the claude/dotnet-docs-cache-hosting-ywaso0 branch on June 24, 2026 14:25