[Bug]: Automatic Scan not including existing tags in AI system prompt #600

Closed
@DarkPhyber-hg

Description

🔍 Bug Summary

An automatic scan sends [object Object] placeholders in the tag list, whereas a manual scan sends the proper list of existing tags.

📖 Description

I noticed that manual scans consistently gave me different results than the "scan now" button on the same documents. I compared the API requests going to the LLM and found the difference: the existing paperless-ngx tags are not transmitted correctly when hitting "scan now".

🔄 Steps to Reproduce

I did a fresh install of paperless-ngx and paperless-ai. Analyze a document manually, then analyze a document with the "scan now" button, and compare the API calls. I am running a local LLM, which makes inspecting the API calls fairly easy.

✅ Expected Behavior

The tag list should be transmitted during both an automated scan and a manual scan.

❌ Actual Behavior

The tag list is not transmitted correctly to the AI API during an automated scan: every existing tag appears as [object Object] instead of its name.

🏷️ Paperless-AI Version

3.0.6

📜 Docker Logs

# Relevant portion of the API request transmitted during a manual scan

  "messages": [
    {
      "role": "system",
      "content": "\n        Prexisting tags: Health/Prescription, inbox, Taxes/2024, Taxes/2025, Taxes/HSA\n\n\n        Prexisiting correspondent: \n\n\n        \n\n\n  Return the result EXCLUSIVELY as a JSON object. The Tags, Title and Document_Type MUST be in the language that is used in the document.:\n  IMPORTANT: The custom_fields are optional and can be left out if not needed, only try to fill out the values if you find a matching information in the document.\n  Do not change the value of field_name, only fill out the values. If the field is about money only add the number without currency and always use a . for decimal places.\n  {\n    \"title\": \"xxxxx\",\n    \"correspondent\": \"xxxxxxxx\",\n    \"tags\": [\"Tag1\", \"Tag2\", \"Tag3\", \"Tag4\"],\n    \"document_type\": \"Invoice/Contract/...\",\n    \"document_date\": \"YYYY-MM-DD\",\n    \"language\": \"en/de/es/...\",\n    \"custom_fields\":     {\n      \"0\": {\n        \"field_name\": \"Total\",\n        \"value\": \"Fill in the value based on your analysis\"\n      }\n    }\n  }"
    },



# Relevant portion of the API request transmitted when using the "scan now" button

  "messages": [
    {
      "role": "system",
      "content": "\n        Prexisting tags: [object Object], [object Object], [object Object], [object Object], [object Object], [object Object]\n\n\n        Prexisiting correspondent: CVS Pharmacy #8617\n\n\n        \n\n\n  Return the result EXCLUSIVELY as a JSON object. The Tags, Title and Document_Type MUST be in the language that is used in the document.:\n  IMPORTANT: The custom_fields are optional and can be left out if not needed, only try to fill out the values if you find a matching information in the document.\n  Do not change the value of field_name, only fill out the values. If the field is about money only add the number without currency and always use a . for decimal places.\n  {\n    \"title\": \"xxxxx\",\n    \"correspondent\": \"xxxxxxxx\",\n    \"tags\": [\"Tag1\", \"Tag2\", \"Tag3\", \"Tag4\"],\n    \"document_type\": \"Invoice/Contract/...\",\n    \"document_date\": \"YYYY-MM-DD\",\n    \"language\": \"en/de/es/...\",\n    \"custom_fields\":     {\n      \"0\": {\n        \"field_name\": \"Total\",\n        \"value\": \"Fill in the value based on your analysis\"\n      }\n    }\n  }"
    },
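The `[object Object]` entries in the second request are what JavaScript produces when plain objects, rather than strings, are joined into text. The following is a minimal sketch of that symptom, not the actual paperless-ai code: the tag shapes (`id`, `name`) and variable names are assumptions made for illustration.

```javascript
// Hypothetical tag shapes; the real objects in paperless-ai may carry other fields.
const tagNames = ["Health/Prescription", "inbox", "Taxes/2024"];
const tagObjects = [
  { id: 1, name: "Health/Prescription" },
  { id: 2, name: "inbox" },
  { id: 3, name: "Taxes/2024" },
];

// Joining strings gives a readable list, as in the manual scan request.
const manualLine = `Prexisting tags: ${tagNames.join(", ")}`;

// Joining plain objects converts each element with Object.prototype.toString,
// which yields "[object Object]" for every tag.
const autoLine = `Prexisting tags: ${tagObjects.join(", ")}`;

console.log(manualLine);
console.log(autoLine);
```

If this is what happens, the automatic scan path would be passing tag objects where the manual path passes tag names.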

📜 Paperless-ngx Logs

🖼️ Screenshots of your settings page

No response

🖥️ Desktop Environment

macOS

💻 OS Version

macos 16

🌐 Browser

None

🔢 Browser Version

No response

🌐 Mobile Browser

No response

📝 Additional Information

  • I have checked existing issues and this is not a duplicate
  • I have tried debugging this issue on my own
  • I can provide a fix and submit a PR
  • I am sure that this problem is affecting everyone, not only me
  • I have provided all required information above

📌 Extra Notes

I'm not a programmer, but I can find my way through code a little. I found some debug statements commented out in customService.js, lines 166-177, and uncommented them to see if I got any extra info. The manual scan and the "scan now" button seem to fetch or process the existing tags differently: the manual scan logs the tag names, while the scan button logs [object Object] for every tag.

Manual Scan
paperless-ai  | [DEBUG] System prompt: 
paperless-ai  |         Prexisting tags: ai-processed, inbox, Medication, Metoprolol Tartrate, Pharmacy, Prescription, Refill, Refill Information
paperless-ai  |         Prexisiting correspondent: CVS Pharmacy #8617
paperless-ai  |         
paperless-ai  |   Return the result EXCLUSIVELY as a JSON object. The Tags, Title and Document_Type MUST be in the language that is used in the document.:
paperless-ai  |   IMPORTANT: The custom_fields are optional and can be left out if not needed, only try to fill out the values if you find a matching information in the document.
paperless-ai  |   Do not change the value of field_name, only fill out the values. If the field is about money only add the number without currency and always use a . for decimal places.
paperless-ai  |   {
paperless-ai  |     "title": "xxxxx",
paperless-ai  |     "correspondent": "xxxxxxxx",
paperless-ai  |     "tags": ["Tag1", "Tag2", "Tag3", "Tag4"],
paperless-ai  |     "document_type": "Invoice/Contract/...",
paperless-ai  |     "document_date": "YYYY-MM-DD",
paperless-ai  |     "language": "en/de/es/...",
paperless-ai  |     "custom_fields":     {
paperless-ai  |       "0": {
paperless-ai  |         "field_name": "Total",
paperless-ai  |         "value": "Fill in the value based on your analysis"
paperless-ai  |       }
paperless-ai  |     }
paperless-ai  |   }
paperless-ai  | [DEBUG] Prompt tags: 
paperless-ai  | [DEBUG] Model: mlx-community/QwQ-32B-bf16
paperless-ai  | [DEBUG] Custom fields: "custom_fields":     {
paperless-ai  |       "0": {
paperless-ai  |         "field_name": "Total",
paperless-ai  |         "value": "Fill in the value based on your analysis"
paperless-ai  |       }
paperless-ai  |     }
paperless-ai  | [DEBUG] Existing tags: ai-processed, inbox, Medication, Metoprolol Tartrate, Pharmacy, Prescription, Refill, Refill Information
paperless-ai  | [DEBUG] Existing correspondents: CVS Pharmacy #8617
paperless-ai  | [DEBUG] Custom prompt: null
paperless-ai  | [DEBUG] External API data: null
paperless-ai  | ######################################################################

Scan Button

paperless-ai  | [DEBUG] System prompt: 
paperless-ai  |         Prexisting tags: [object Object], [object Object], [object Object], [object Object], [object Object], [object Object], [object Object], [object Object]
paperless-ai  |         Prexisiting correspondent: CVS Pharmacy #8617
paperless-ai  |         
paperless-ai  |   Return the result EXCLUSIVELY as a JSON object. The Tags, Title and Document_Type MUST be in the language that is used in the document.:
paperless-ai  |   IMPORTANT: The custom_fields are optional and can be left out if not needed, only try to fill out the values if you find a matching information in the document.
paperless-ai  |   Do not change the value of field_name, only fill out the values. If the field is about money only add the number without currency and always use a . for decimal places.
paperless-ai  |   {
paperless-ai  |     "title": "xxxxx",
paperless-ai  |     "correspondent": "xxxxxxxx",
paperless-ai  |     "tags": ["Tag1", "Tag2", "Tag3", "Tag4"],
paperless-ai  |     "document_type": "Invoice/Contract/...",
paperless-ai  |     "document_date": "YYYY-MM-DD",
paperless-ai  |     "language": "en/de/es/...",
paperless-ai  |     "custom_fields":     {
paperless-ai  |       "0": {
paperless-ai  |         "field_name": "Total",
paperless-ai  |         "value": "Fill in the value based on your analysis"
paperless-ai  |       }
paperless-ai  |     }
paperless-ai  |   }
paperless-ai  | [DEBUG] Prompt tags: 
paperless-ai  | [DEBUG] Model: mlx-community/QwQ-32B-bf16
paperless-ai  | [DEBUG] Custom fields: "custom_fields":     {
paperless-ai  |       "0": {
paperless-ai  |         "field_name": "Total",
paperless-ai  |         "value": "Fill in the value based on your analysis"
paperless-ai  |       }
paperless-ai  |     }
paperless-ai  | [DEBUG] Existing tags: [object Object], [object Object], [object Object], [object Object], [object Object], [object Object], [object Object], [object Object]
paperless-ai  | [DEBUG] Existing correspondents: CVS Pharmacy #8617
paperless-ai  | [DEBUG] Custom prompt: null
paperless-ai  | [DEBUG] External API data: null
paperless-ai  | ######################################################################
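One possible way to make the prompt robust against both input shapes is to normalize the tag list before it is interpolated into the system prompt. The helper below is a hedged sketch only: `formatTagList` is a hypothetical name, and the assumption that tag objects expose a `name` property is not confirmed by the logs above.

```javascript
// Hypothetical helper, not part of paperless-ai.
// Accepts an array of tag names, tag objects with a `name` property
// (assumed shape), or a mix of both, and returns a comma-separated string.
function formatTagList(tags) {
  if (!Array.isArray(tags)) return "";
  return tags
    // Keep strings as-is; for anything else, try to read its `name`.
    .map((tag) => (typeof tag === "string" ? tag : tag && tag.name))
    // Drop entries that did not resolve to a non-empty string.
    .filter((name) => typeof name === "string" && name.length > 0)
    .join(", ");
}

const fromManualScan = formatTagList(["ai-processed", "inbox", "Pharmacy"]);
const fromScanButton = formatTagList([
  { id: 1, name: "ai-processed" },
  { id: 2, name: "inbox" },
  { id: 3, name: "Pharmacy" },
]);

console.log(fromManualScan);
console.log(fromScanButton);
```

With this kind of normalization both paths would yield the same line in the prompt, regardless of whether the caller hands over names or objects. The actual fix may instead belong in whichever code path builds the tag list for the automatic scan.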

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
