feat(langchain): emit source-url/source-document parts from citations #15757

Merged
lgrammel merged 2 commits into vercel:main from dflynn15:df/langchain-source-parts
Jun 5, 2026

Conversation


@dflynn15 dflynn15 commented Jun 2, 2026

Summary

The @ai-sdk/langchain adapter previously dropped LangChain citation annotations entirely. Models that return citations through LangChain (web search, RAG, etc.) produced no source information in a useChat UI, even though the same models surface source-url parts when used through native AI SDK providers.

This change maps LangChain citation annotations to spec-compliant source-url / source-document UI message stream parts.

  • LangChain Core standardizes citations as { type: 'citation', url?, title?, source?, citedText?, startIndex?, endIndex? } entries in the annotations array of a text content block. The adapter's text extractors only read .text, so these were silently discarded.
  • Citations with a url become source-url parts (keyed by url); url-less citations with a title/source become source-document parts. Url-less citations with no human-readable label are skipped rather than emitted as a placeholder.
  • Wired into all three stream paths: direct model streams (processModelChunk), streamEvents (on_chat_model_stream), and LangGraph (messages + values).
  • Citation metadata that the source parts can't represent natively (citedText, startIndex, endIndex, source) is preserved under providerMetadata.langchain, so the projection is lossless.
  • Sources are deduped by sourceId. Url-less citations are keyed by content (not position) so that differing citation subsets/orderings between the messages and values events don't cause id collisions.
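
The mapping described in the bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the adapter's actual code: the function names `extractCitations` and `toSourcePart`, the `doc:` id prefix, and the `text/plain` media type are all hypothetical, and the part shapes are simplified.

```typescript
// Sketch only: names, the `doc:` id prefix, and the media type are assumptions.
type Citation = {
  type: 'citation';
  url?: string;
  title?: string;
  source?: string;
  citedText?: string;
  startIndex?: number;
  endIndex?: number;
};

type LangchainMetadata = { langchain: Record<string, string | number> };

type SourcePart =
  | { type: 'source-url'; sourceId: string; url: string; title?: string; providerMetadata?: LangchainMetadata }
  | { type: 'source-document'; sourceId: string; mediaType: string; title: string; providerMetadata?: LangchainMetadata };

// Collect citation annotations from text content blocks.
// Plain-string content carries no annotations, so it yields nothing.
function extractCitations(content: unknown): Citation[] {
  if (!Array.isArray(content)) return [];
  const citations: Citation[] = [];
  for (const block of content) {
    if (block?.type !== 'text' || !Array.isArray(block.annotations)) continue;
    for (const annotation of block.annotations) {
      if (annotation?.type === 'citation') citations.push(annotation as Citation);
    }
  }
  return citations;
}

function toSourcePart(citation: Citation): SourcePart | undefined {
  // Fields the source parts cannot carry natively are kept under providerMetadata.langchain.
  const extra: Record<string, string | number> = {};
  if (citation.citedText !== undefined) extra.citedText = citation.citedText;
  if (citation.startIndex !== undefined) extra.startIndex = citation.startIndex;
  if (citation.endIndex !== undefined) extra.endIndex = citation.endIndex;
  if (citation.source !== undefined) extra.source = citation.source;
  // Omit providerMetadata entirely when there is nothing extra to preserve.
  const metadata = Object.keys(extra).length > 0 ? { providerMetadata: { langchain: extra } } : {};

  if (citation.url) {
    const title = citation.title ? { title: citation.title } : {};
    return { type: 'source-url', sourceId: citation.url, url: citation.url, ...title, ...metadata };
  }
  const label = citation.title ?? citation.source;
  if (!label) return undefined; // no human-readable label: skip, no placeholder
  return { type: 'source-document', sourceId: `doc:${label}`, mediaType: 'text/plain', title: label, ...metadata };
}

const content = [
  {
    type: 'text',
    text: 'Answer with citations.',
    annotations: [
      { type: 'citation', url: 'https://example.com', title: 'Example', citedText: 'Answer' },
      { type: 'citation', title: 'Handbook' },
      { type: 'citation', citedText: 'orphan' },
    ],
  },
];
const parts = extractCitations(content).map(toSourcePart);
console.log(parts);
```

In this sketch the first citation becomes a source-url part keyed by its url, the second becomes a source-document part, and the third is skipped because it has neither a url nor a label.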

Test plan

  • Unit tests for extractCitationsFromContentBlocks (array content, serialized kwargs messages, empty annotations, plain-string content).
  • Unit tests for emitSourceChunks: source-url, source-document, providerMetadata round-trip + omission, dedupe across calls, url-less id collision regression, and skipping label-less citations.
  • Integration tests across the model, streamEvents, and LangGraph (messages + values dedupe) paths.
  • pnpm --filter @ai-sdk/langchain test (node + edge): 208 passing.
  • tsc --noEmit clean for the package.
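
The dedupe and url-less id collision cases in the test plan can be illustrated with a small sketch. It assumes a shared set of seen ids per stream and a content-derived key; `sourceIdFor`, `newSourceIds`, and the `doc:` prefix are hypothetical names chosen for illustration, not the adapter's internals.

```typescript
// Sketch only: helper names and the `doc:` prefix are assumptions.
type CitationLike = { url?: string; title?: string; source?: string };

// Url-backed citations are keyed by url; url-less ones by their label content.
// Returns undefined when there is nothing human-readable to key on.
function sourceIdFor(citation: CitationLike): string | undefined {
  if (citation.url) return citation.url;
  const label = citation.title ?? citation.source;
  return label ? `doc:${label}` : undefined;
}

// Return only ids not yet emitted; `seen` is shared across all events of one stream.
function newSourceIds(citations: CitationLike[], seen: Set<string>): string[] {
  const emitted: string[] = [];
  for (const citation of citations) {
    const id = sourceIdFor(citation);
    if (id === undefined || seen.has(id)) continue;
    seen.add(id);
    emitted.push(id);
  }
  return emitted;
}

const seen = new Set<string>();
// A `messages` event surfaces two url-less citations...
const fromMessages = newSourceIds([{ title: 'Handbook' }, { title: 'FAQ' }], seen);
// ...and a later `values` event surfaces a different subset in a different order.
const fromValues = newSourceIds([{ title: 'FAQ' }, { title: 'Changelog' }], seen);
console.log(fromMessages); // ['doc:Handbook', 'doc:FAQ']
console.log(fromValues); // ['doc:Changelog']
```

With position-based ids (say `doc-0`, `doc-1`), the first citation of each event would share `doc-0` even though one is "Handbook" and the other is "FAQ", so "FAQ" would be wrongly suppressed or mislabeled. Keying by content avoids that.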

Notes

  • patch changeset included.
  • No public API changes; this is additive behavior in the adapter.

Made with Cursor

LangChain citation annotations on text content blocks (e.g. from web
search or RAG) were previously dropped instead of being surfaced to the
UI. They are now mapped to spec-compliant AI SDK source-url /
source-document UI message stream parts across the model, streamEvents,
and LangGraph (messages + values) paths.

Citation metadata that the source parts cannot represent natively
(citedText, startIndex, endIndex, source) is preserved under
providerMetadata.langchain so the projection is lossless. Sources are
deduped by sourceId; url-less citations are keyed by content rather than
position to avoid collisions when the messages and values events surface
differing citation subsets.

Co-authored-by: Cursor <cursoragent@cursor.com>
@dflynn15 dflynn15 marked this pull request as ready for review June 3, 2026 16:46

@christian-bromann christian-bromann left a comment


LGTM 👍

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lgrammel commented Jun 5, 2026

@dflynn15 please sign the commits, see https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits (otherwise we cannot merge)


dflynn15 commented Jun 5, 2026

Thanks for the flag, all fixed up, apologies for that @lgrammel!

@lgrammel lgrammel merged commit c231e42 into vercel:main Jun 5, 2026
48 of 49 checks passed
@lgrammel lgrammel added the backport label Jun 5, 2026
github-actions Bot added a commit that referenced this pull request Jun 5, 2026
@github-actions github-actions Bot removed the backport label Jun 5, 2026

github-actions Bot commented Jun 5, 2026

⚠️ Backport to release-v6.0 created but has conflicts: #15848
