BioRxiv: Use attachment title that conveys that file is a preprint #3137
I think there may be a case for a dedicated bioRxiv translator, if only to take advantage of its JSON API, which has lower overhead when saving multiple items at once. But then again, the current generic translator (HighWire 2.0, which in turn calls Embedded Metadata, "EM") already works; only the title of the attached PDF file is unsatisfactory. One way to improve the baseline is to add a predicate on the itemType here: translators/Embedded Metadata.js, line 551 in 5f7e0bd.
If itemType is "preprint", we would use a title string that reflects this. That could be a reasonable default, though it could be foiled by a preprint hosting service that puts a link to an external, non-preprint VoR (version of record) PDF in its metadata whenever it can find one. That is conceivable, but I'm not aware of any real-world instances. In any case, if we want a title string with "bioRxiv" in it (in the manner of arXiv), perhaps a new bioRxiv translator is warranted?
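A minimal sketch of the proposed itemType predicate, written as a standalone function rather than as the actual code at line 551. The names `pdfAttachmentTitle` and `makePdfAttachment`, the attachment field names, and the assumed default title "Full Text PDF" are illustrative assumptions, not the real Embedded Metadata.js code.

```javascript
// Hypothetical helper (not the actual Embedded Metadata.js code):
// choose the PDF attachment title from the detected item type.
function pdfAttachmentTitle(itemType) {
	// Assumed default title for non-preprint items.
	const DEFAULT_TITLE = "Full Text PDF";
	return itemType === "preprint" ? "Preprint PDF" : DEFAULT_TITLE;
}

// Hypothetical usage: build an attachment descriptor (field names assumed).
function makePdfAttachment(itemType, pdfURL) {
	return {
		title: pdfAttachmentTitle(itemType),
		url: pdfURL,
		mimeType: "application/pdf",
	};
}
```

Keeping the predicate in one small function means the default stays unchanged for every other item type; only items already typed as "preprint" get the new title.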
Ah, I didn't realize we didn't have a dedicated translator here.
This is probably reasonable. We don't actually need "bioRxiv" in the name; "Preprint PDF" is fine. (That said, if we could tell a "Submitted Version" from an "Accepted Version" using the site metadata, that would be an argument in favor of a dedicated translator, since we use those terms for automatically fetched OA PDFs.)
Or we could say "Preprint PDF" only when the file is hosted on the same domain? Is that the case for all the main preprint servers?
Check OSF-based servers. I think they might host the files on an OSF domain even if that's not the preprint server's domain. Otherwise, I believe so.
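A rough sketch of the same-domain idea, assuming a strict hostname comparison between the landing page and the PDF link. The function names and the "Full Text PDF" fallback are hypothetical. Note that if OSF-based servers do serve files from an OSF domain, as suspected above, this check would fall back to the default title for them; an allowlist of known file hosts would be one way to cover that case.

```javascript
// Hypothetical sketch: label the PDF as a preprint only when it is served
// from the same host as the landing page.
function isSameHost(pageURL, pdfURL) {
	try {
		// Resolve a possibly relative PDF link against the page URL first.
		return new URL(pdfURL, pageURL).hostname === new URL(pageURL).hostname;
	}
	catch (e) {
		// An unparseable URL is treated as "not same host".
		return false;
	}
}

function preprintAwareTitle(itemType, pageURL, pdfURL) {
	if (itemType === "preprint" && isSameHost(pageURL, pdfURL)) {
		return "Preprint PDF";
	}
	return "Full Text PDF"; // assumed default title
}
```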
Another feature requiring a dedicated translator is automatic download of supplementary files (which will require page scraping; I don't think the API tells us anything about supplements, but I'll verify).
As for the hypothetical issue (a preprint-hosting page whose metadata links to a non-preprint PDF), I no longer think OSF sites would be a problem. A preprint service is much more likely to link the preprint to any external VoR by a permalink or identifier, as arXiv and OSF do with DOI links, than to link to a specific file format. TL;DR: it's too hypothetical.
In fact, the HighWire 2.0 translator has this (lines 294 to 295 in 8e5c648):
So the HighWire 2.0 translator's handling of bioRxiv/medRxiv is something of a hack. However, even with the hack, the EM sub-translator still can't see the type as This problem is general: there's currently no good way for the EM translator to detect the preprint type from HW metadata, and any fix in EM would be special-case code (i.e. domain- or URL-based allowlisting) that would be better placed in the HW translator anyway. It's made worse because EM treats the HW-derived type as highly accurate. So in EM, I adjusted the priority of type determination with respect to the HW-derived type: when the preprint type can already be identified by other means, the HW type no longer overrides it. This makes it possible in the HW translator to explicitly pass the
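The priority change described above can be sketched as a small resolution function. This is a hypothetical illustration, not the actual EM code: `resolveItemType`, its two parameters (`hwType` for the HighWire-derived type, `otherType` for a type detected by other means), and the `null` fallback are all assumptions.

```javascript
// Hypothetical sketch of the adjusted priority: a preprint type detected by
// other means is not overridden by the HighWire-derived type.
function resolveItemType(hwType, otherType) {
	if (otherType === "preprint") {
		return "preprint";
	}
	// Otherwise keep trusting the HW-derived type, as before;
	// null stands in for "no type determined" (assumption).
	return hwType || otherType || null;
}
```

Only the preprint case changes precedence here; for every other combination the HW-derived type still wins, which matches the high trust EM already places in it.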
https://forums.zotero.org/discussion/comment/443647/#Comment_443647