Text Extraction: pdf --> txt #14

Open
grahamsack opened this issue Feb 28, 2014 · 8 comments

Comments

@grahamsack
Contributor

There are a few existing Python packages for this:

  • pypdf
  • slate
  • pdfminer
@grahamsack grahamsack self-assigned this Feb 28, 2014
@jonahsmith
Contributor

FYI, I can't get Slate to work either. I might be missing something, but here is the error I'm getting:

  File "slateTest.py", line 1, in <module>
    import slate
  File "/Library/Python/2.7/site-packages/slate/__init__.py", line 48, in <module>
    from slate import PDF
  File "/Library/Python/2.7/site-packages/slate/slate.py", line 3, in <module>
    from pdfminer.pdfparser import PDFParser, PDFDocument
ImportError: cannot import name PDFDocument

Looks like there's a problem calling something in pdfminer? Graham, is this the issue you were having yesterday?

@grahamsack
Contributor Author

Yes, same issue. I'm using pdfminer from the command line for now.


@astreylabs astreylabs assigned astreylabs and unassigned grahamsack Mar 16, 2014
@aburkh

aburkh commented Jun 11, 2014

The problem is that slate tries to import PDFDocument from pdfminer.pdfparser.
The correct module is pdfminer.pdfdocument.
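
A minimal sketch of how calling code could tolerate both module layouts, assuming (as the comments in this thread suggest) that some pdfminer releases expose PDFDocument in pdfminer.pdfdocument and others in pdfminer.pdfparser. The helper name import_first is hypothetical and is not part of slate or pdfminer.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that has it.

    Hypothetical helper for illustration; not part of slate or pdfminer.
    """
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            # Module (or its parent package) is not installed; try the next one.
            continue
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError("cannot import name %s from any of %r" % (name, module_paths))
```

With pdfminer installed, this could be called as import_first("PDFDocument", ["pdfminer.pdfdocument", "pdfminer.pdfparser"]). It only resolves the import location; the way PDFDocument is constructed may also differ between releases, so this is not a complete fix for slate.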

@daryltucker

I still see this issue.

I was able to work around it by running sudo pip install --upgrade --ignore-installed slate==0.3 pdfminer==20110515, which installs compatible versions.
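
If the error persists after pinning, it may help to confirm which versions the interpreter actually sees. The following is a rough sketch, assuming setuptools' pkg_resources is available; the names PINNED, installed_version and find_mismatches are made up for this example.

```python
# Versions reported as compatible in this thread.
PINNED = {"slate": "0.3", "pdfminer": "20110515"}


def installed_version(package):
    """Return the installed version string of `package`, or None if unknown."""
    try:
        import pkg_resources  # ships with setuptools
    except ImportError:
        return None
    try:
        return pkg_resources.get_distribution(package).version
    except pkg_resources.DistributionNotFound:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from the pin to what was found."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = found
    return mismatches
```

Calling find_mismatches(PINNED) returns an empty dict when both pins are in effect; a value of None means the package was not found in the current environment.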

The slate devs are aware.

@KurtOstergaard

I tried the slate==0.3 and pdfminer==20110515 line and I still get an error.
Any other workarounds?

@tobiasmcnulty

Works with slate==0.3 and pdfminer==20110515 for me.

@tobiasmcnulty

If you're inside a virtualenv, make sure not to use sudo; otherwise pip installs into the system Python instead of the virtualenv.
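
One way to check this from Python, sketched under the assumption that the environment was created either with the virtualenv tool (older releases set sys.real_prefix) or with the standard-library venv module (sys.base_prefix differs from sys.prefix). The function names are hypothetical.

```python
import sys


def in_virtualenv():
    """Best-effort check for whether this interpreter runs inside a virtual environment."""
    # Older virtualenv releases record the original prefix as sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv (and newer virtualenv) leave sys.base_prefix pointing at the base install.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def pip_command(packages):
    """Build the install command from this thread, adding sudo only outside a virtualenv."""
    cmd = ["pip", "install", "--upgrade", "--ignore-installed"] + list(packages)
    if not in_virtualenv():
        cmd = ["sudo"] + cmd
    return cmd
```

For example, pip_command(["slate==0.3", "pdfminer==20110515"]) yields the command as a list of arguments, with sudo omitted when a virtualenv is active.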

@arderyp

arderyp commented Mar 10, 2016

@tobiasmcnulty's suggestion works for me too. Thanks!
