IndexError: list index out of range #89

Open
OliverOffing opened this issue Apr 29, 2023 · 4 comments

@OliverOffing

I'm trying to add 250 documents but I'm hitting this error. It seems to show up when I try to add more than 100 documents. Is that a hard limit I'm hitting, or could there be something else going on, like one of the files being faulty?

Traceback (most recent call last):
  File "/app/main.py", line 24, in <module>
    docs.add(d)
  File "/usr/local/lib/python3.10/site-packages/paperqa/docs.py", line 111, in add
    citation = self.cite_chain.run(texts[0])
IndexError: list index out of range

OliverOffing commented Apr 29, 2023

Actually, the problem was due to a document being too small.

Contents of the file that caused the problem:

# FAQs

Relevant source line:

texts, _ = read_doc(path, "", "", chunk_chars=chunk_chars)

We should do one of these:

  • Improve the error message so that when a file can't be processed, the specific file name is shown to the user, or
  • ignore the file outright and issue a warning saying that the file is being skipped
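Until something like that exists in the library, both options can be approximated on the caller's side. This is only a rough sketch, not paperqa code: `add_all` and its `skip_failures` flag are hypothetical names, and `add` stands in for whatever callable adds one document (for example `docs.add` from the traceback above).

```python
import warnings
from typing import Callable, Iterable, List


def add_all(
    add: Callable[[str], object],
    paths: Iterable[str],
    skip_failures: bool = True,
) -> List[str]:
    """Call add(path) for each path and name the offending file on failure.

    Hypothetical helper, not part of paperqa. Returns the skipped paths.
    """
    skipped = []
    for path in paths:
        try:
            add(path)
        except Exception as exc:
            if not skip_failures:
                # Option 1: re-raise with the file name in the message.
                raise RuntimeError(f"Could not process {path!r}: {exc}") from exc
            # Option 2: warn that the file is being ignored and move on.
            warnings.warn(f"Ignoring {path!r}: {exc}")
            skipped.append(path)
    return skipped
```

With that in place, `add_all(docs.add, my_docs)` would warn and continue, while `add_all(docs.add, my_docs, skip_failures=False)` would stop at the first bad file and say which one it was.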


amittos commented May 5, 2023

Actually, I have the same issue, only my document is definitely not too small. I believe it's too large.

Is there an intuitive way to handle this error?

@OliverOffing

I'm doing this until we find a fix:

    for d in my_docs:
        try:
            docs.add(d)
        except Exception as e:
            print('Error adding %s: %s' % (d, e))


whitead commented Jun 14, 2023

@OliverOffing's solution is the preferred solution. You add them, get an exception if it fails, and you decide how to deal with the exception. You can skip and continue or try to figure out what is wrong yourself. The best we have is checks to see if it looks like a document.
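For the "figure out what is wrong yourself" route, it can help to keep the full traceback per file rather than just the message, since a too-small and a too-large document may fail in different places. A minimal sketch, assuming `add` is any callable that adds one document (such as `docs.add`); `collect_failures` is a made-up name for illustration, not a paperqa function.

```python
import traceback
from typing import Callable, Dict, Iterable


def collect_failures(
    add: Callable[[str], object],
    paths: Iterable[str],
) -> Dict[str, str]:
    """Try add(path) for every path; return {path: traceback text} for failures.

    Hypothetical helper, not part of paperqa.
    """
    failures = {}
    for path in paths:
        try:
            add(path)
        except Exception:
            # Keep the whole traceback so the failing line can be inspected later.
            failures[path] = traceback.format_exc()
    return failures
```

Something like `failures = collect_failures(docs.add, my_docs)` then leaves a dictionary that can be printed or logged after the run, one entry per file that could not be added.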

The specific error above, the index error for very short documents, has been fixed.
