Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

command-line-interface #27

Merged
merged 4 commits into from
Apr 20, 2020
Merged

command-line-interface #27

merged 4 commits into from
Apr 20, 2020

Conversation

naustica
Copy link
Member

@naustica naustica commented Apr 18, 2020

  • Command-line interface
  • Unit tests for cli
  • Functions to download and view PDFs

@naustica naustica requested a review from bganglia April 18, 2020 18:10
@naustica naustica changed the base branch from master to develop April 18, 2020 18:11
@bganglia bganglia marked this pull request as ready for review April 18, 2020 19:15
Copy link
Collaborator

@bganglia bganglia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me, but the open command does not work on my system (Ubuntu). xdg-open does work though.

Apparently you can find out the operating system from within python, but I don't know if that is useful.
https://stackoverflow.com/questions/1854/what-os-am-i-running-on/58071295#58071295

I am not sure how to do this in Windows though.

It might also be good to use '.' as a default for the path argument.

@naustica
Copy link
Member Author

thanks, I will fix that. You mean the open command on line 301, right?

@naustica
Copy link
Member Author

Also, is there an alternative way to express urllib.request.urlopen(pdf_link) with the requests library?

@bganglia
Copy link
Collaborator

Yes, I guess you could put the response text into a BytesIO object. Something like

from io import BytesIO
response = requests.get("http://google.com")
BytesIO(bytearray(response.text,encoding="utf-8"))

@naustica naustica marked this pull request as draft April 19, 2020 08:22
@naustica
Copy link
Member Author

@bganglia Thanks. Did you test the BytesIO variant with your pdfminer?

@bganglia
Copy link
Collaborator

@naustica Not yet. Let me try that now.

@bganglia
Copy link
Collaborator

bganglia commented Apr 19, 2020

@naustica It looks like it works the same as urllib. For example this one works:

>>> minecart.Document(io.BytesIO(bytearray(requests.get("https://www.jns-journal.com/article/S0022-510X(20)30168-4/pdf").text,encoding="utf-8")))
WARNING:root:Cannot locate objid=215
<minecart.miner.Document object at 0x7f1a71833110>

But now that I look at it more closely, sometimes both urllib and requests fail because a link like https://doi.org/10.1016/j.jns.2020.116832 is just some HTML that redirects the browser to a PDF. So I will look for a more robust way of downloading the PDF. Selenium would be overkill, so there must be some simpler way

@naustica
Copy link
Member Author

@bganglia That looks interesting. Let me hear when you find out something new.

@naustica naustica marked this pull request as ready for review April 20, 2020 15:31
@naustica
Copy link
Member Author

@bganglia Can you review my code again? I would then merge the code into the develop branch.

@bganglia
Copy link
Collaborator

@naustica I am taking a look at it right now.

If no PDF is available, get_pdf_link returns None, so view_pdf should check for that. However, if you want, we could just make this an issue and address it later.

Everything else works great on Linux. I can try it on Windows too.

@naustica
Copy link
Member Author

@bganglia I would tackle the issue when Im writing the remaining tests for the unpywall class if thats ok for you.

@bganglia
Copy link
Collaborator

@naustica Ok, sounds good. I would say that it looks great to merge then

@bganglia bganglia merged commit 0472326 into develop Apr 20, 2020
@naustica naustica deleted the command-line-interface branch April 21, 2020 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants