Change contexts to use ids rather than text #990

mskarlin · 2025-07-03T21:24:21Z

Gives each context a deterministically generated ID, which is then used for references.
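
As a rough illustration of what "deterministically generated" could mean here, the sketch below hashes a context's contents into a short ID. The helper name `make_context_id`, the choice of hashed fields, and the use of SHA-256 are all assumptions for illustration, not the code in this PR; only the `pqac-` plus eight hex characters shape is taken from the prompt example later in this thread.

```python
import hashlib


def make_context_id(question: str, text_name: str, summary: str) -> str:
    """Hypothetical helper: derive a stable ID from a context's contents.

    The hashed fields and the hash function are assumptions, not paper-qa's
    actual implementation.
    """
    # Join with a separator unlikely to appear in the fields, so that
    # ("ab", "c") and ("a", "bc") do not hash to the same payload.
    payload = "\x1f".join((question, text_name, summary)).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    # Keep a short prefix of the digest so the ID stays easy to read and match.
    return f"pqac-{digest[:8]}"
```

Because the question is part of the hashed payload in this sketch, two summaries of the same text produced for different questions would get different IDs.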

This removes the need for complex citation-string formatting, because the IDs are simple and easy to identify with a regular expression. The answer that uses context IDs rather than text names is stored in a new Session attribute, raw_answer, which can be used to find the exact context corresponding to each key. answer and formatted_answer are left as they were before, but rather than relying on the LLM to populate the citation strings, we deterministically substitute the text name for each citation ID, after de-duping.
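
The extract-then-substitute flow described above can be sketched as follows. This is a minimal sketch under stated assumptions: the regular expression, the function names `extract_citation_ids` and `substitute_names`, and the rule of de-duping within each parenthetical group are hypothetical and are not taken from the PR's diff.

```python
import re

# Assumed ID shape: "pqac-" followed by eight lowercase hex characters.
CITATION_ID_PATTERN = re.compile(r"\bpqac-[0-9a-f]{8}\b")
# A parenthetical group with no nested parentheses.
PAREN_GROUP = re.compile(r"\(([^()]*)\)")


def extract_citation_ids(raw_answer: str) -> list[str]:
    """Return the unique citation IDs in order of first appearance."""
    return list(dict.fromkeys(CITATION_ID_PATTERN.findall(raw_answer)))


def substitute_names(raw_answer: str, id_to_name: dict[str, str]) -> str:
    """Swap each parenthetical group of IDs for its de-duplicated text names."""

    def replace_group(match: re.Match) -> str:
        ids = CITATION_ID_PATTERN.findall(match.group(1))
        if not ids:
            # Not a citation group, leave it untouched.
            return match.group(0)
        # dict.fromkeys de-dupes while preserving order; unknown IDs are kept.
        names = dict.fromkeys(id_to_name.get(cid, cid) for cid in ids)
        return "(" + ", ".join(names) + ")"

    return PAREN_GROUP.sub(replace_group, raw_answer)
```

For example, if two cited contexts come from the same text, the formatted answer would show that text name once per citation group, while the raw answer still holds both IDs.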

Before this change, if gather_evidence was run twice with different questions, we had no way to map a citation to the correct summary.

This PR also fixes a latent bug I found in the tests: when using aadd, auto-generated dockeys could not be overridden with the metadata-based dockey. It only surfaced because I needed to regenerate the cassettes for a few tests.

Copilot

Pull Request Overview

This PR refactors context citations to use deterministic IDs instead of text names, adds a raw_answer field to store unformatted LLM output, and updates answer formatting to substitute IDs with context names and build a reference list.

Add an auto-populated Context.id field and helper to extract citation IDs.
Change aquery and format_answers to work with IDs, storing both raw_answer and formatted_answer.
Update prompts, settings, and tests to use and validate new citation IDs.

Reviewed Changes

Copilot reviewed 6 out of 9 changed files in this pull request and generated 3 comments.

File	Description
tests/test_paperqa.py	Rename test param, add assertions for citation IDs in answers
paperqa/utils.py	Add `get_citation_keys`, deprecate old `get_citenames`
paperqa/types.py	Introduce `Context.id` with a pre-validator for auto-generation
paperqa/settings.py	Update example citation to new ID format
paperqa/prompts.py	Adjust citation examples and prompt text for ID usage
paperqa/docs.py	Replace context-name references with IDs; implement `format_answers`

Comments suppressed due to low confidence (2)

paperqa/utils.py:159

[nitpick] Instead of a comment, consider marking get_citenames as deprecated (for example, emitting a warning or using a @deprecated decorator) to clearly signal to maintainers that it should no longer be used.

# no longer used, but kept for backwards compatibility
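
One way to act on this nitpick is a small decorator that emits a `DeprecationWarning` on each call. The sketch below is an assumption-laden illustration: the `deprecated` decorator is hand-rolled for this example, and the body of `get_citenames` is a placeholder, not the real function in paperqa/utils.py.

```python
import functools
import warnings


def deprecated(reason: str):
    """Hypothetical decorator that warns whenever the wrapped function is called."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {reason}",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not the wrapper
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated("citations now use context IDs")
def get_citenames(text: str) -> set[str]:
    # Placeholder body for illustration only; the real logic is not shown here.
    return set(text.split())
```

The function keeps working for existing callers, and the warning shows up in test runs that enable deprecation warnings.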

paperqa/types.py:135

The decorator @model_validator is used but not imported from pydantic, which will cause a NameError. Please add from pydantic import model_validator at the top of this file.

    @model_validator(mode="before")

jamesbraza · 2025-07-03T22:49:17Z

paperqa/prompts.py

-    "- Example2024Example et al. (2024) \n"
-    "- Example's work (pages 17–19) \n"  # noqa: RUF001
-    "- (pages 17–19) \n"  # noqa: RUF001
+    "- (pqac-d79ef6fa and pqac-0f650d59) \n"


Can you review in the README.md or docs/ if there's anything we need to update?

yea i'll give it a once over -- for the end user, they shouldn't notice this change using our standard objects.

change contexts to use ids rather than text

9afdc22

mskarlin requested review from whitead, jamesbraza and Copilot July 3, 2025 21:24

Merge branch 'main' into use-context-ids

6982628

dosubot bot added the size:L, bug, and enhancement labels Jul 3, 2025

Copilot AI reviewed Jul 3, 2025

mskarlin and others added 3 commits July 3, 2025 14:36

fix typo and support init=False for id attr in Context

4970549

Merge branch 'use-context-ids' of https://github.com/Future-House/pap…

4201e4b

…er-qa into use-context-ids

Merge branch 'main' into use-context-ids

5a0a237

jamesbraza reviewed Jul 3, 2025

whitead reviewed Jul 3, 2025

jamesbraza reviewed Jul 3, 2025

mskarlin and others added 6 commits July 3, 2025 15:55

address style comments

40cc56a

Merge branch 'use-context-ids' of https://github.com/Future-House/pap…

77f22af

…er-qa into use-context-ids

Merge branch 'main' into use-context-ids

96a8984

replace example context key in readme

380b651

Merge branch 'use-context-ids' of https://github.com/Future-House/pap…

0f24f68

…er-qa into use-context-ids

rename to citation_ids

aa594cb
