command-line-interface #27

naustica · 2020-04-18T18:10:25Z

Command-line interface
Unit tests for cli
Functions to download and view PDFs

bganglia

It looks good to me, but the open command does not work on my system (Ubuntu). xdg-open does work though.

Apparently you can find out the operating system from within Python, but I don't know if that is useful.
https://stackoverflow.com/questions/1854/what-os-am-i-running-on/58071295#58071295

I am not sure how to do this in Windows though.
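A minimal sketch of the OS check discussed above, assuming the goal is to open a file with the system's default application. The helper names `open_command` and `open_file` are hypothetical and not part of this PR; `platform.system()`, `subprocess.run` and the Windows-only `os.startfile` are standard library calls.

```python
import os
import platform
import subprocess


def open_command(path, system=None):
    """Return the argv list that opens `path` with the default application.

    Returns None on Windows, where os.startfile is used instead of a command.
    `system` defaults to platform.system() and exists so the choice can be tested.
    """
    system = system or platform.system()
    if system == "Windows":
        return None
    if system == "Darwin":  # macOS
        return ["open", path]
    return ["xdg-open", path]  # Linux and most other Unix desktops


def open_file(path):
    """Open `path` with whatever the current platform provides."""
    command = open_command(path)
    if command is None:
        os.startfile(path)  # only exists on Windows
    else:
        subprocess.run(command, check=True)
```

Keeping the platform decision in a pure function makes it possible to unit test the Linux, macOS and Windows branches without actually launching a viewer.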

It might also be good to use '.' as a default for the path argument.
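One way the default could look, assuming the CLI is built with `argparse` (the thread does not say which parser is used). The program name, argument names and help texts below are placeholders, not the ones from this PR.

```python
import argparse


def build_parser():
    """Build a small parser whose path argument falls back to the current directory."""
    parser = argparse.ArgumentParser(prog="example-cli")
    parser.add_argument("doi", help="DOI of the article to download")
    parser.add_argument(
        "-p", "--path",
        default=".",  # current working directory when the flag is omitted
        help="directory to save the PDF in (default: current directory)",
    )
    return parser


# Parsing without --path yields the default.
args = build_parser().parse_args(["10.1016/j.jns.2020.116832"])
print(args.path)
```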

naustica · 2020-04-18T19:47:20Z

Thanks, I will fix that. You mean the open command on line 301, right?

naustica · 2020-04-18T19:49:14Z

Also, is there an alternative way to express urllib.request.urlopen(pdf_link) with the requests library?

bganglia · 2020-04-18T20:21:06Z

Yes, I guess you could put the response body into a BytesIO object. Something like

from io import BytesIO
import requests
response = requests.get("http://google.com")
# response.content holds the raw bytes; response.text would decode the body as text first
BytesIO(response.content)

naustica · 2020-04-19T16:30:17Z

@bganglia Thanks. Did you test the BytesIO variant with your pdfminer?

bganglia · 2020-04-19T17:09:31Z

@naustica Not yet. Let me try that now.

bganglia · 2020-04-19T17:50:57Z

@naustica It looks like it works the same as urllib. For example this one works:

>>> minecart.Document(io.BytesIO(bytearray(requests.get("https://www.jns-journal.com/article/S0022-510X(20)30168-4/pdf").text,encoding="utf-8")))
WARNING:root:Cannot locate objid=215
<minecart.miner.Document object at 0x7f1a71833110>

But now that I look at it more closely, sometimes both urllib and requests fail because a link like https://doi.org/10.1016/j.jns.2020.116832 is just some HTML that redirects the browser to a PDF. So I will look for a more robust way of downloading the PDF. Selenium would be overkill, so there must be some simpler way.
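A rough sketch of how the HTML case could at least be detected before the bytes reach the PDF parser. It relies on PDF files starting with the marker `%PDF-` near the beginning of the file. The helper names `looks_like_pdf` and `to_pdf_stream` are hypothetical; with requests, the `body` argument would be `response.content`.

```python
from io import BytesIO

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(body):
    """Return True if the downloaded bytes carry a PDF header near the start."""
    # Some files have a little junk before the header, so scan the first 1024 bytes.
    return PDF_MAGIC in body[:1024]


def to_pdf_stream(body):
    """Wrap the bytes in a BytesIO, or raise if they are not a PDF (e.g. an HTML page)."""
    if not looks_like_pdf(body):
        raise ValueError("response body is not a PDF")
    return BytesIO(body)
```

This only recognises the failure; following the redirect inside the HTML page to the actual PDF is a separate problem.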

naustica · 2020-04-19T22:22:38Z

@bganglia That looks interesting. Let me know when you find out something new.

naustica · 2020-04-20T15:34:57Z

@bganglia Can you review my code again? I would then merge the code into the develop branch.

bganglia · 2020-04-20T17:52:43Z

@naustica I am taking a look at it right now.

If no PDF is available, get_pdf_link returns None, so view_pdf should check for that. However, if you want, we could just make this an issue and address it later.
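The check could look roughly like this. It is a sketch with stand-in callables rather than the code from this PR: `get_pdf_link` and `open_url` are passed in as parameters so that the example is self-contained, and the return value and message are assumptions.

```python
def view_pdf(doi, get_pdf_link, open_url):
    """Open the PDF for `doi` if a link exists; return whether anything was opened.

    `get_pdf_link` is expected to return a URL string or None,
    `open_url` is whatever function displays the PDF.
    """
    link = get_pdf_link(doi)
    if link is None:
        # No open-access PDF is known for this DOI, so there is nothing to show.
        print(f"No PDF available for {doi}")
        return False
    open_url(link)
    return True


# Usage with simple stand-ins for the real functions:
opened = []
view_pdf("10.1000/example", lambda doi: None, opened.append)
view_pdf("10.1000/example", lambda doi: "https://example.org/a.pdf", opened.append)
```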

Everything else works great on Linux. I can try it on Windows too.

naustica · 2020-04-20T19:03:04Z

@bganglia I would tackle the issue when I'm writing the remaining tests for the unpywall class, if that's OK with you.

bganglia · 2020-04-20T19:29:05Z

@naustica OK, sounds good. I would say that it looks great to merge then.

bug fixes, add function to view and download pdfs, clean code

4c43419

naustica requested a review from bganglia April 18, 2020 18:10

naustica changed the base branch from master to develop April 18, 2020 18:11

bganglia marked this pull request as ready for review April 18, 2020 19:15

bganglia reviewed Apr 18, 2020

naustica marked this pull request as draft April 19, 2020 08:22

fix bugs, rewrites cli

dd5a581

bganglia mentioned this pull request Apr 19, 2020

Support link to HTML to PDF #29

Open

naustica added 2 commits April 20, 2020 17:24

add tests for cli

f548918

replace urlib with requests

666a9c3

naustica marked this pull request as ready for review April 20, 2020 15:31

bganglia merged commit 0472326 into develop Apr 20, 2020

naustica deleted the command-line-interface branch April 21, 2020 19:56
