Try translating PDF URLs based on URL #70

dstillman · 2018-12-23T07:48:54Z

Related to #38, but a few translators are able to function based on the URL, even when it's a PDF page. We should try to support those cases, before either trying PDF recognition (from #38) or failing (if PDF recognition isn't enabled). This includes DOIs in the URL as well as certain sites where we recognize PDF URLs (since people sometimes click "Save to Zotero" when viewing a PDF without going back to the article page). I can try to find an example of such a translator if necessary.
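For the DOI case, a URL-only check could look roughly like the following. This is a minimal Python sketch, not translation-server or Zotero translator code. The function name `doi_from_url` is hypothetical, and the regular expression is only a common approximation of the `10.<registrant>/<suffix>` DOI shape.

```python
import re
from typing import Optional
from urllib.parse import unquote, urlsplit

# Approximate DOI shape: "10." + 4-9 digit registrant code + "/" + suffix.
# The suffix stops at whitespace, "?", "#" or "&" (a simplification: real
# DOI suffixes may legally contain some of these characters).
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s?#&]+")


def doi_from_url(url: str) -> Optional[str]:
    """Return the first DOI-like string in a URL's path or query, else None."""
    parts = urlsplit(url)
    # Decode percent-escapes (e.g. "%2F" -> "/") so encoded DOIs are found too;
    # the space keeps the path and query from running together.
    text = unquote(parts.path) + " " + unquote(parts.query)
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    doi = match.group(0)
    # Some PDF links append ".pdf" directly to the DOI.
    if doi.lower().endswith(".pdf"):
        doi = doi[: -len(".pdf")]
    # Trim trailing punctuation that is unlikely to belong to the DOI.
    return doi.rstrip(".,;")


print(doi_from_url("https://example.org/doi/pdf/10.1000/xyz123"))  # 10.1000/xyz123
print(doi_from_url("https://example.org/view?doi=10.1000%2Fabc.pdf"))  # 10.1000/abc
```

A URL without a DOI-shaped substring returns `None`, so the caller can move on to site-specific rules or PDF recognition.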

This might be a little tricky, because we may need to provide a fake empty document to run detect on, but we won't want to fall back to generic webpage saving.

mrtcode · 2019-01-07T22:37:17Z

Can we wait for #59, or do we want this for the current translation-server version?

So I think not only PDF but all URLs should be tried.

For example, this doesn't work because it returns a JSON content type.

If the response has an HTML or XML content type, it already goes through the translation architecture. Otherwise:
1. Create an empty document.
2. Run a separate translation on it.
3. If successful, return the translated metadata.
4. Otherwise, if it's a PDF, upload and process it.
5. If it's not a PDF, return an invalid content type error.

And we don't want to translate URLs that return an HTTP error code?
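The proposed flow, including the HTTP error check asked about above, can be sketched as follows. This is a hypothetical Python outline rather than translation-server's actual code (which is JavaScript). `handle_response`, `Response`, `TranslationError`, the three callables, and the set of HTML/XML MIME types are all illustrative assumptions. `translate_by_url` is assumed to try only URL-based translators against the empty document and never the generic webpage fallback.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-in for an empty page to run URL-based detection on.
EMPTY_DOCUMENT = "<html><head></head><body></body></html>"

# MIME types assumed to go through the existing document translation path.
DOCUMENT_TYPES = {"text/html", "application/xhtml+xml", "text/xml", "application/xml"}


@dataclass
class Response:
    status: int
    content_type: str  # raw Content-Type header value
    body: bytes


class TranslationError(Exception):
    pass


def handle_response(
    url: str,
    response: Response,
    translate_document: Callable[[str, bytes], dict],
    translate_by_url: Callable[[str, str], Optional[dict]],
    process_pdf: Callable[[bytes], dict],
) -> dict:
    # Don't translate pages that came back with an HTTP error code.
    if response.status >= 400:
        raise TranslationError(f"HTTP error {response.status}")

    mime = response.content_type.split(";")[0].strip().lower()
    if mime in DOCUMENT_TYPES:
        # HTML/XML already goes through the normal translation architecture.
        return translate_document(url, response.body)

    # Otherwise run a separate, URL-only translation against an empty document.
    metadata = translate_by_url(url, EMPTY_DOCUMENT)
    if metadata is not None:
        return metadata

    if mime == "application/pdf":
        # Fall back to uploading and processing the PDF itself.
        return process_pdf(response.body)

    raise TranslationError(f"Invalid content type: {mime or 'unknown'}")
```

Passing the translators in as callables keeps the dispatch order (error check, document types, URL translation, PDF, error) testable on its own.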

dstillman · 2019-01-08T04:29:22Z

This can wait for #59 if that's easier.

> And we don't want to translate URLs that return an HTTP error code?

I think that's right.

mrtcode · 2019-01-09T21:11:43Z

I already implemented a fix that does what this issue describes, but it's based on #59, so it will need to wait. It also depends on zotero/translators#1799, because the current DOI translator can't extract a DOI from a URL. For now it's better to just do #72.

phiresky · 2019-08-20T09:35:44Z

@mrtcode This probably isn't the right place to ask, but why can Zotero Connector get the actual citation from something like https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf when Translation Server can't? I'm also having a hard time figuring out how Zotero Connector does that at all...

mrtcode · 2019-08-20T10:03:46Z

@phiresky Zotero Connector uses the 'Neural Information Processing Systems' translator, which slices off the '.pdf' extension and extracts metadata from the paper's web page at https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks instead. Technically, translation-server should be capable of doing the same. I think we have to fix that. Good observation.
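The URL rewrite that the Neural Information Processing Systems translator relies on could be approximated server-side with a small rule table. This is a hypothetical Python sketch, not the translator's code. `SITE_RULES` and `landing_page_for_pdf` are made-up names, and the single NIPS pattern only covers links shaped like the example above.

```python
import re
from typing import Optional

# Hypothetical per-site rules: (pattern matching a PDF URL, template for the landing page).
SITE_RULES = [
    # NIPS proceedings: .../paper/<id>-<slug>.pdf -> .../paper/<id>-<slug>
    (re.compile(r"^(https?://papers\.nips\.cc/paper/[^/?#]+)\.pdf$", re.IGNORECASE), r"\1"),
]


def landing_page_for_pdf(url: str) -> Optional[str]:
    """Map a known PDF URL to its article page, or return None if no rule applies."""
    for pattern, template in SITE_RULES:
        match = pattern.match(url)
        if match:
            return match.expand(template)
    return None
```

A server could then fetch the returned landing page and run it through the normal HTML translation path. Supporting another site would mean appending one more `(pattern, template)` pair; PDFs with no such mapping would still need PDF recognition.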

phiresky · 2019-08-20T10:27:23Z

Thanks. Here are some more examples that work fine via Zotero but not via Translation Server:

https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44321.pdf
http://sci-hub.tw/https://ieeexplore.ieee.org/abstract/document/8666636 (I guess this is a different problem?)
http://img.cs.uec.ac.jp/pub/conf17/171024ege_0.pdf

Are those the same issue?

My motivation here by the way is that I'm writing papers in markdown and I wrote a tool to transparently convert URLs to citations without having to use a reference manager: https://github.com/phiresky/pandoc-url2cite

mvolz · 2019-12-18T09:55:01Z

This has been brought up again on the email list: https://groups.google.com/forum/#!msg/zotero-dev/9AmwvQqBCBY/H57ukdE9AgAJ

tg-z mentioned this issue Dec 15, 2019

Automatic citation extraction from URLs - phiresky's blog tg-z/web-clip#726
