Description of the bug
pymupdf/__init__.py in ?(tessdata)
17818 # Unix-like systems:
17819 cp = subprocess.run("whereis tesseract-ocr", shell=1, capture_output=1, check=0, text=True)
17820 response = cp.stdout.strip().split()
17821 if cp.returncode or len(response) != 2: # if not 2 tokens: no tesseract-ocr
> 17822 raise RuntimeError("No tessdata specified and Tesseract is not installed")
17823
17824 # search tessdata in folder structure
17825 dirname = response[1] # contains tesseract-ocr installation folder
RuntimeError: No tessdata specified and Tesseract is not installed
How to reproduce the bug
PyMuPDF installation command:
uv add pymupdf
Issue:
for page in doc:
    textPage = page.get_textpage_ocr()
    print(textPage.extract_text())
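Running the script above raises the RuntimeError shown in the traceback.
On macOS, Tesseract is installed with brew install tesseract, which provides a binary named tesseract. There is no tesseract-ocr package or binary, so the whereis tesseract-ocr lookup returns nothing and PyMuPDF concludes that Tesseract is not installed.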
Tesseract Installation Proof:
tesseract: /opt/homebrew/bin/tesseract
tesseract-ocr:
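Possible workaround, as a minimal sketch rather than a fix for PyMuPDF itself: locate the tessdata folder from the tesseract binary and pass it explicitly. The helper name find_tessdata is hypothetical, and the share/tessdata location next to the bin folder is an assumption about the Homebrew layout that may not hold on every setup.

```python
import shutil
from pathlib import Path


def find_tessdata(binary_name="tesseract", which=shutil.which):
    """Return a tessdata folder near the tesseract binary, or None.

    Hypothetical helper: assumes the language data lives in
    <prefix>/share/tessdata, where <prefix>/bin holds the binary.
    """
    exe = which(binary_name)
    if exe is None:
        return None  # no such binary on PATH

    exe_path = Path(exe)
    # Try the path as found on PATH first, then its symlink-resolved
    # location (Homebrew links binaries into <prefix>/bin).
    for candidate_exe in (exe_path, exe_path.resolve()):
        prefix = candidate_exe.parent.parent
        candidate = prefix / "share" / "tessdata"
        if candidate.is_dir():
            return str(candidate)
    return None
```

If the helper returns a folder, it can be handed to the OCR call from the script above, for example page.get_textpage_ocr(tessdata=find_tessdata()). Setting the TESSDATA_PREFIX environment variable to that folder before running the script should have the same effect, since the message says the error is only raised when no tessdata is specified.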
PyMuPDF version
1.26.1
Operating system
macOS
Python version
3.12