v0.15.3: When uploading documents to the knowledge base through the document API create_by_file, documents may become available without being indexed #15161

Open
5 tasks done
ccooder opened this issue Mar 7, 2025 · 3 comments
Labels
🐞 bug Something isn't working pendding Need more information to proceed

Comments

ccooder commented Mar 7, 2025

Self Checks

  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.15.3

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Upload documents with the document API create_by_file.
At first it works, but after some time the situation described in the title appears.

✔️ Expected Behavior

The uploaded file is indexed and then becomes available.

❌ Actual Behavior

Although the status is available, the document has no segments, because it is marked available immediately after the API is called.

@dosubot dosubot bot added the 🐞 bug Something isn't working label Mar 7, 2025

dosubot bot commented Mar 7, 2025

I found two similar unsolved discussions that might be relevant to your issue:

  1. Knowledge base file upload text segmentation and cleaning error [1].
  2. Unable to automatically perform embedding processing after uploading documents using the API? [2].

Regarding your issue, there are known limitations with the create_by_file API in version 0.15.3 that might be related to your problem. The API supports only single file uploads, and files must be 15MB or smaller. Additionally, the self-hosted Docker version does not support PPTX files, even though the official API does [3].

The Dify platform determines when a document is fully indexed by checking the document's indexing status. You can verify the indexing status using the following API call:

curl --location --request GET 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{batch}/indexing-status' \
--header 'Authorization: Bearer {api_key}'

This will provide the current status of the document indexing process. The possible indexing_status values include waiting, parsing, cleaning, splitting, indexing, paused, error, and completed [4][5].

If a document is marked as available but not indexed, it might be due to an error in the indexing process. Specific error messages such as document_indexing, unsupported_file_type, or file_too_large can indicate issues [4][6].

You might want to check the indexing status and error messages to diagnose the issue further.
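
As a rough illustration of that check, the Python sketch below classifies one entry of an indexing-status response using the status values listed above. The function name classify_indexing and the field names error and total_segments are assumptions made for this example, not a confirmed description of the Dify payload, so adjust them to whatever your instance actually returns.

```python
from typing import Any, Mapping

# Status values quoted in the comment above that mean "still working".
IN_PROGRESS = {"waiting", "parsing", "cleaning", "splitting", "indexing"}


def classify_indexing(entry: Mapping[str, Any]) -> str:
    """Classify one entry of an indexing-status response.

    Assumption: each entry carries `indexing_status`, `error`, and
    `total_segments` keys. These names are hypothetical here.
    """
    status = entry.get("indexing_status")
    if status == "error" or entry.get("error"):
        return "failed"
    if status == "paused":
        return "paused"
    if status in IN_PROGRESS:
        return "in_progress"
    if status == "completed":
        # The symptom reported in this issue: marked done, nothing segmented.
        if not entry.get("total_segments"):
            return "completed_without_segments"
        return "indexed"
    return "unknown"
```

A result of completed_without_segments would match the behavior reported here: the document is shown as available although no segmentation took place.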

@crazywoola crazywoola added the pendding Need more information to proceed label Mar 7, 2025

ccooder commented Mar 7, 2025

Does a self-hosted service also have limitations?

@langgenius langgenius deleted a comment from dosubot bot Mar 11, 2025

crazywoola (Member) commented Mar 11, 2025

> Does a self-hosted service also have limitations?

No, there are some bugs in this process.
