Error while indexing PDF files #140

Closed
markustobler opened this issue Dec 28, 2022 · 3 comments

@markustobler

I got the following errors in my cronjob:
Syntax Error: Marked object is wrong type (boolean)
Syntax Error: Marked object is wrong type (boolean)
Syntax Error: Marked object is wrong type (boolean)
Syntax Error: Invalid object stream
Syntax Error: Invalid object stream
Syntax Error: Invalid object stream
Syntax Error: Marked object is wrong type (boolean)
Syntax Error: Marked object is wrong type (boolean)
Syntax Error: Marked object is wrong type (boolean)
Syntax Error: Marked object is wrong type (boolean)
Syntax Error: Marked object is wrong type (boolean)

These errors come from pdftotext, which apparently has a problem reading the PDF files. At the very least, the error handling could be improved so that the ke_search log shows which files are problematic.

An example PDF file that causes an error is attached to this issue.

nvs-seminarprogramm-2023-1.pdf

@christianbltr
Member

This file does not cause problems in my test environment. I tested it with pdftotext version 20.09.0 and 0.86.1.

Which version of pdftotext are you using?
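One way to answer this from the machine in question is to check which pdftotext binary is found on the PATH and what it reports as its version. The following is a minimal sketch, not part of ke_search; the helper name `tool_info` is hypothetical, and it assumes that pdftotext accepts `-v` to print its version information.

```python
import shutil
import subprocess


def tool_info(name="pdftotext", version_flag="-v"):
    """Return (resolved_path, version_output) for a command-line tool.

    Returns None if the tool cannot be found on the current PATH.
    """
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, version_flag],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some tools print version information to stderr instead of stdout,
    # so both streams are combined here.
    return path, (result.stdout + result.stderr).strip()
```

Calling `print(tool_info())` once from an interactive shell and once from the same environment the scheduled job runs in would show whether both resolve to the same binary and version.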

@markustobler
Author

markustobler commented Jan 6, 2023

I am using "tpwd/ke_search": "^4.5". The error occurs only in the automatic cron job. If I start the scheduler task manually, there is no error. I have no idea how to debug this. Somebody in the TYPO3 Slack channel told me that this error could be a problem with pdftotext, but it could also be something else.

christianbltr added a commit that referenced this issue Jan 6, 2023
Errors from pdftotext and pdfinfo are
now logged to the ke_search error log
in order to make it easier to find the
problematic files.
@christianbltr
Member

Errors from pdftotext and pdfinfo will now be logged to the ke_search error log. That should at least make it easier to find the problematic files.
The patch is in the current master and will be in version 4.6.0.
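For installations that cannot upgrade yet, the same idea can be approximated outside of TYPO3. The sketch below is only an illustration and is not the ke_search patch: it runs the extractor once per file and records whatever the tool writes to stderr together with the file name. The function name `find_problem_files` and the `command` parameter are hypothetical; the sketch assumes that passing `-` as the output file makes pdftotext write the extracted text to stdout.

```python
import subprocess


def find_problem_files(paths, command=("pdftotext",)):
    """Run the extractor on each file and collect error output per file.

    Returns a dict mapping each path to its stderr text (or exit code)
    for every file that produced stderr output or a non-zero exit code.
    """
    problems = {}
    for path in paths:
        # "-" as the output file asks the tool to write the text to stdout,
        # so no temporary text files are created.
        result = subprocess.run(
            [*command, str(path), "-"],
            capture_output=True,
            text=True,
            errors="replace",
            check=False,
        )
        message = result.stderr.strip()
        # A tool may report syntax errors on stderr and still exit with 0,
        # so both signals are checked.
        if message or result.returncode != 0:
            problems[str(path)] = message or f"exit code {result.returncode}"
    return problems
```

Passing the list of indexed PDF paths to `find_problem_files` and printing the resulting dict gives a per-file view of messages such as "Syntax Error: Invalid object stream", which the raw cron output does not provide.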
