ValueError: text/html content not found in email #290

Closed
peterchanws opened this issue May 18, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@peterchanws

Describe the bug and how to reproduce it
I put ~4000 .eml files in the source_documents folder, ran ingest.py, and got:

  File "/Users/pchan3/Desktop/privateGPT/ingest.py", line 78, in main
    documents = load_documents(source_directory)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pchan3/Desktop/privateGPT/ingest.py", line 65, in load_documents
    return [load_single_document(file_path) for file_path in all_files]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pchan3/Desktop/privateGPT/ingest.py", line 65, in <listcomp>
    return [load_single_document(file_path) for file_path in all_files]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pchan3/Desktop/privateGPT/ingest.py", line 53, in load_single_document
    return loader.load()[0]
           ^^^^^^^^^^^^^
  File "/Users/pchan3/miniconda3/lib/python3.11/site-packages/langchain/document_loaders/unstructured.py", line 70, in load
    elements = self._get_elements()
               ^^^^^^^^^^^^^^^^^^^^
  File "/Users/pchan3/miniconda3/lib/python3.11/site-packages/langchain/document_loaders/email.py", line 24, in _get_elements
    return partition_email(filename=self.file_path, **self.unstructured_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pchan3/miniconda3/lib/python3.11/site-packages/unstructured/partition/email.py", line 249, in partition_email
    raise ValueError(f"{content_source} content not found in email")
ValueError: text/html content not found in email

Environment (please complete the following information):

  • OS / hardware: macOS 12.6.5 / Intel Xeon E5
  • Python version: 3.11.3
peterchanws added the bug label May 18, 2023
@pseudotensor

pseudotensor commented May 18, 2023

We solved that already in h2oGPT. See https://github.com/h2oai/h2ogpt

i.e. https://github.com/h2oai/h2ogpt/blob/main/gpt_langchain.py#L364-L375

Basically, the email loader defaults to text/html content and does not auto-detect the content type. The other option is text/plain.
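A minimal sketch of that kind of fallback, not the h2oGPT code itself: try text/html first, and retry with text/plain only when the loader reports the missing-content error from the traceback above. The helper name `partition_with_fallback` is hypothetical, and it assumes the partition function accepts a `content_source` keyword, as the `partition_email` error message suggests.

```python
from typing import Any, Callable, Optional, Sequence


def partition_with_fallback(
    partition: Callable[..., Any],
    filename: str,
    content_sources: Sequence[str] = ("text/html", "text/plain"),
) -> Any:
    """Try each content source in order; return the first successful result.

    `partition` is any callable taking `filename` and `content_source`
    keyword arguments (assumed here to match unstructured's partition_email).
    """
    if not content_sources:
        raise ValueError("content_sources must not be empty")

    last_error: Optional[ValueError] = None
    for source in content_sources:
        try:
            return partition(filename=filename, content_source=source)
        except ValueError as err:
            # Only swallow the specific "content not found" error;
            # any other ValueError is a real problem and is re-raised.
            if "content not found in email" not in str(err):
                raise
            last_error = err

    # Every content source was missing from the email.
    assert last_error is not None
    raise last_error
```

With unstructured installed, this could be called as `partition_with_fallback(partition_email, "message.eml")`. Passing the partition function in as an argument keeps the fallback logic independent of the library, so it can be exercised with a stand-in function.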

@peterchanws
Author

How can I add it to privateGPT?

@pseudotensor

How can I add it to privateGPT?

You can make a PR that adds the same kind of code I shared above. I linked to the code itself.

@peterchanws
Author

Thanks. I am not a coder. I will find out how to make a PR to add your code.

@pseudotensor

Then ask one of the devs here to do it.

@peterchanws
Author

Done. #294
Thanks again.

imartinez added a commit that referenced this issue May 19, 2023