This repository has been archived by the owner on Feb 19, 2021. It is now read-only.

Merge pull request #421 from ddddavidmartin/clarify_forgiving_ocr_handling

Clarify forgiving ocr handling
danielquinn committed Oct 8, 2018
2 parents 8dc355a + 818780a commit bd95804
Showing 2 changed files with 9 additions and 1 deletion.
5 changes: 5 additions & 0 deletions paperless.conf.example
@@ -188,6 +188,11 @@ PAPERLESS_DEBUG="false"
#PAPERLESS_CONSUMER_LOOP_TIME=10


+# By default Paperless stops consuming a document if no language can be detected.
+# Set to true to consume documents even if the language detection fails.
+#PAPERLESS_FORGIVING_OCR="false"
+
+
###############################################################################
#### Interface ####
###############################################################################
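
As an illustration of how a setting like this could be read, here is a minimal sketch. The helper name `get_boolean` and the accepted spellings ("true", "yes", "1") are assumptions made for this example and are not taken from the Paperless source, which may parse the value differently.

```python
import os


def get_boolean(name, default="false", environ=None):
    """Return True if the named variable holds a truthy string.

    Hypothetical helper: the accepted spellings below are an assumption,
    not the documented behaviour of Paperless.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name, default)
    return value.strip().lower() in ("true", "yes", "1")


# A commented-out or unset entry falls back to the default, "false".
FORGIVING_OCR = get_boolean("PAPERLESS_FORGIVING_OCR")
```

Under these assumptions, leaving the line commented out in the config file keeps the strict default, and uncommenting it with the value "true" enables forgiving behaviour.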
5 changes: 4 additions & 1 deletion src/paperless_tesseract/parsers.py
@@ -153,7 +153,10 @@ def _get_ocr(self, imgs):
)
raw_text = self._assemble_ocr_sections(imgs, middle, raw_text)
return raw_text
-raise OCRError("Language detection failed")
+error_msg = ("Language detection failed. Set "
+             "PAPERLESS_FORGIVING_OCR in config file to continue "
+             "anyway.")
+raise OCRError(error_msg)

if ISO639[guessed_language] == self.DEFAULT_OCR_LANGUAGE:
raw_text = self._assemble_ocr_sections(imgs, middle, raw_text)
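
The control flow this hunk touches can be sketched in isolation. The example below is a simplified stand-in, not the parser's actual code: the `OCRError` class, the `KNOWN_LANGUAGES` table, the function name `check_language`, and the exact failure condition are all assumptions made for illustration. Only the error message is taken from the diff.

```python
class OCRError(Exception):
    """Stand-in for the parser's OCR exception."""


# Hypothetical subset of a language table, for illustration only.
KNOWN_LANGUAGES = {"en": "eng", "de": "deu"}


def check_language(guessed_language, forgiving_ocr):
    """Map a guessed language to an OCR language code.

    Returns None when detection failed but forgiving mode is enabled,
    so the caller can keep whatever text OCR already produced.
    """
    if not guessed_language or guessed_language not in KNOWN_LANGUAGES:
        if forgiving_ocr:
            return None
        # Message from the commit: it names the setting that unblocks the user.
        raise OCRError(
            "Language detection failed. Set "
            "PAPERLESS_FORGIVING_OCR in config file to continue "
            "anyway."
        )
    return KNOWN_LANGUAGES[guessed_language]
```

The point of the change is visible in the strict branch: the exception now tells the user which setting to enable instead of only reporting the failure.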