Saving a PDF document #116

Closed
remil19 opened this issue Aug 9, 2013 · 13 comments
@remil19

remil19 commented Aug 9, 2013

I tried to add a PDF that had opened in my browser via pdf.js to poche, but poche saves it as plain text with no title. It would be nice if it saved a title and opened the PDF in a tab when you click on it.

@nicosomb
Member

Hello @remil19!
Sorry for the late answer.

I don't know whether this feature should be implemented in wallabag (our new name).
Does anyone have an idea?

@tcitworld
Member

In fact, with a PDF file, no information about the title is transmitted over HTTP. You can test this with any online PDF document using http://web-sniffer.net/.
So the only information we could get would be the filename, and filenames are often not meaningful to the user. But if Wallabag v2 allows changing the title, something could be done.
However, detection would be needed (using the Content-Type: application/pdf HTTP header) to tell Wallabag "this isn't a regular page".

Of course, we could also use an external library to read the metadata.
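
For illustration, the header check and the filename fallback discussed in this comment could look roughly like the Python sketch below. This is not wallabag code: the function names `is_pdf_content_type` and `title_from_url`, the "Untitled PDF" placeholder, and the choice to replace underscores and hyphens with spaces are all assumptions made for the example.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def is_pdf_content_type(header_value):
    """Return True if a Content-Type header value names application/pdf.

    Parameters such as "; charset=binary" are ignored, and the comparison
    is case-insensitive.
    """
    if not header_value:
        return False
    media_type = header_value.split(";")[0].strip().lower()
    return media_type == "application/pdf"


def title_from_url(url):
    """Derive a fallback title from the last path segment of the URL.

    Hypothetical helper: the cleanup rules here are an assumption,
    not something specified in the discussion.
    """
    path = unquote(urlsplit(url).path)
    stem = PurePosixPath(path).stem  # filename without its extension
    title = stem.replace("_", " ").replace("-", " ").strip()
    return title or "Untitled PDF"
```

As the comment notes, a filename-derived title may still mean little to the user, so it would only make sense as a default that can be edited afterwards.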

@nicosomb
Member

I just added the "Plugin" label to this issue.

@remil19
Author

remil19 commented Feb 21, 2014

Sorry to answer so late, but the first line of a PDF usually begins with %PDF, so I guess we could look for this string when an entry is created. I think it would be useful to implement this directly in Wallabag: many long and interesting documents are published as PDF, and it may not require many modifications. Maybe just a boolean in the database or a type attribute, and you could also just use this viewer: https://github.com/mozilla/pdf.js.
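
A minimal sketch of that byte check, in Python for illustration only (it is not taken from wallabag). The name `looks_like_pdf` is hypothetical, and the 1024-byte search window is an assumption of this sketch, chosen because some files carry a few stray bytes before the marker.

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(first_bytes, search_window=1024):
    """Return True if the PDF header marker appears near the start of the data.

    `first_bytes` is the beginning of the downloaded body. The window size
    is an assumption of this sketch, not a value from the discussion.
    """
    return PDF_MAGIC in first_bytes[:search_window]


# The result could feed the boolean or type attribute suggested above.
print(looks_like_pdf(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"))
print(looks_like_pdf(b"<!DOCTYPE html><html>"))
```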

@mariroz
Contributor

mariroz commented Feb 22, 2014

Hi @remil19, yes, of course, PDFs should be handled when an entry is imported. Not by parsing the file itself, though, but one step earlier: by checking the document's HTTP headers (see related issue #444 about plain text handling). I hope we will implement this sooner or later. Anyway, I will try :).

@nicosomb nicosomb modified the milestones: 1.7.0, 2.0 Feb 22, 2014
@nicosomb nicosomb added Feature and removed Plugin labels Feb 22, 2014
@tcitworld
Member

Yes, detection by parsing the first bytes of a file isn't as easy to do as detection via HTTP headers. However, I don't know whether all servers serve PDFs properly.
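
One way to reconcile the two approaches, given the doubt about servers, is to trust the header when it is explicit and fall back to the first bytes only when the header is missing or generic. The Python sketch below is purely illustrative: `detect_pdf` is a hypothetical name, and treating `application/octet-stream` as the only "generic" type is an assumption of this example.

```python
# Media types treated as uninformative (an assumption of this sketch).
AMBIGUOUS_TYPES = {"", "application/octet-stream"}


def detect_pdf(headers, first_bytes):
    """Decide whether a response is a PDF.

    `headers` is a plain dict of response headers; `first_bytes` is the
    start of the response body.
    """
    media_type = ""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == "content-type":
            media_type = value.split(";")[0].strip().lower()
            break

    if media_type == "application/pdf":
        return True
    if media_type in AMBIGUOUS_TYPES:
        # The header did not help, so fall back to the file signature.
        return first_bytes.lstrip().startswith(b"%PDF-")
    # An explicit non-PDF type (for example text/html) is taken at its word.
    return False
```

With this ordering the cheap header check handles well-configured servers, and the byte check is consulted only when the header gives no useful answer.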

@tcitworld tcitworld modified the milestones: 1.8.0, 1.7.0 Apr 24, 2014
@tcitworld tcitworld modified the milestones: 2.0, 1.8.0 Jun 7, 2014
@tcitworld
Member

Going for v2.x.

@nicosomb nicosomb removed the Question label Jul 30, 2014
@nicosomb
Member

Assigned to Tender discussion #4.

@j0k3r
Member

j0k3r commented Sep 15, 2015

For now, instead of storing the pdf itself, we provide a text version of it: j0k3r/graby#16

@nicosomb
Member

nicosomb commented Apr 8, 2016

Done by @j0k3r in graby 👍

@nicosomb nicosomb closed this as completed Apr 8, 2016
@mdimura

mdimura commented Apr 16, 2016

I tried saving a PDF URL with wallabag v2.0.1, but I get the error "wallabag can't retrieve contents for this article. Please report this issue to us." It would be great if wallabag downloaded the original PDF and stored it locally for future reading.

@nicosomb
Member

Can you open a new issue for that please?

@nicosomb nicosomb removed this from the 2.1.0 milestone Sep 14, 2016
@toobluescientist

I tried saving a PDF URL with wallabag v2.0.1, but I get the error "wallabag can't retrieve contents for this article. Please report this issue to us." It would be great if wallabag downloaded the original PDF and stored it locally for future reading.

Hey, I have the same problem. I still cannot save a PDF from a PDF URL to Wallabag. When it does save, the text is sometimes unreadable. How is it for you now?
