BioRxiv: Use attachment title that conveys that file is a preprint #3137

dstillman · 2023-09-16T02:56:59Z

https://forums.zotero.org/discussion/comment/443647/#Comment_443647

zoe-translates · 2023-09-16T09:31:58Z

I think there may be a case for a dedicated bioRxiv translator, if only to take advantage of its JSON API, which has lower overhead for bulk saving (multiple items). But then again, the current generic translator (HighWire 2.0, which in turn uses Embedded Metadata, or EM) already works; it's only the title of the attached PDF file that's unsatisfactory.

One thing we could do to improve the baseline is to add a predicate on the itemType here:

translators/Embedded Metadata.js, line 551 in 5f7e0bd:

    newItem.attachments.push({ title: "Full Text PDF", url: pdfURL, mimeType: "application/pdf" });

If itemType is "preprint", we get a title string reflecting this. This could be a reasonable default, but it could also be foiled by a preprint hosting service that puts a link to an external, non-preprint VoR (version of record) PDF in its metadata whenever it can find one. That's perhaps conceivable, but I'm not aware of any real-world instances of this.
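A minimal sketch of that predicate, assuming the attachment is built as in the line 551 excerpt above. The helper names `pdfAttachmentTitle` and `addPDFAttachment` are hypothetical (not existing EM code), and the "Preprint PDF" string is illustrative rather than a settled choice:

```javascript
// Hypothetical helper, not part of the Embedded Metadata translator:
// choose the PDF attachment title from the item type.
function pdfAttachmentTitle(itemType) {
  return itemType === "preprint" ? "Preprint PDF" : "Full Text PDF";
}

// Hypothetical wrapper showing how the attachments.push() call could use it.
function addPDFAttachment(newItem, pdfURL) {
  newItem.attachments.push({
    title: pdfAttachmentTitle(newItem.itemType),
    url: pdfURL,
    mimeType: "application/pdf",
  });
}
```

This keeps the existing "Full Text PDF" title for every non-preprint type, so only items already typed as preprints would change.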

In any case, if we want a title string with "bioRxiv" in it (in the manner of arXiv), perhaps a new bioRxiv translator is warranted?

dstillman · 2023-09-16T09:42:56Z

Ah, I didn't realize we didn't have a dedicated translator here.

> If itemType is "preprint", we get a title string reflecting this.

This is probably reasonable. We don't actually need "bioRxiv" in the name; "Preprint PDF" is fine. (That said, if we could tell a "Submitted Version" from an "Accepted Version" using the site metadata, that would be an argument in favor of a dedicated translator, since we use those terms for automatically fetched OA PDFs.)

dstillman · 2023-09-16T09:45:17Z

Or we could say "Preprint PDF" only when the file is hosted on the same domain as the page. Is that the case for all the main preprint servers?
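A sketch of the same-domain variant, assuming "same domain" means an exact hostname match after resolving relative links with the standard WHATWG `URL` class. The helper names `isSameHost` and `pdfTitleForHost` are hypothetical. A strict match like this falls back to the generic title for files served from a separate host, which is exactly the case a split file domain would trigger:

```javascript
// Hypothetical helper: does the PDF live on the same host as the page?
// "Same domain" is read strictly: www.example.org and files.example.org differ.
function isSameHost(pageURL, pdfURL) {
  try {
    // Resolve pdfURL against the page so relative links are handled.
    const pdfHost = new URL(pdfURL, pageURL).hostname;
    return pdfHost === new URL(pageURL).hostname;
  } catch (e) {
    // Unparseable URL: don't claim the file is a preprint.
    return false;
  }
}

// Hypothetical: use "Preprint PDF" only for preprint files on the page's host.
function pdfTitleForHost(itemType, pageURL, pdfURL) {
  if (itemType === "preprint" && isSameHost(pageURL, pdfURL)) {
    return "Preprint PDF";
  }
  return "Full Text PDF";
}
```

Comparing registrable domains instead of full hostnames would be more lenient, but would need a public-suffix list, which this sketch avoids.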

adam3smith · 2023-09-16T10:22:39Z

Check OSF-based servers. I think they might host the files on an OSF domain even when that's not the preprint server's domain. Otherwise, I believe yes.

zoe-translates · 2023-09-16T14:00:22Z

Another feature that would require a dedicated translator is automatic download of supplementary files (which will require page scraping; I don't think the API tells us anything about supplements, but I'll verify).

zoe-translates · 2023-09-21T10:47:33Z

As for the hypothetical issue of a preprint-hosting page whose metadata links to a non-preprint PDF, I no longer think OSF sites would be a problem. It seems much more likely that a preprint service would link the preprint to any external VoR by a permalink or identifier (as arXiv and OSF do with a DOI link) rather than link to a specific file format. TL;DR: it's too hypothetical.

zoe-translates · 2023-09-21T15:40:10Z

In fact, the HighWire 2.0 translator already has this:

translators/HighWire 2.0.js, lines 294 to 295 in 8e5c648:

    if (item.publicationTitle.endsWith('Rxiv')) {
        item.itemType = preprintType;

So the HighWire 2.0 translator's handling of bioRxiv/medRxiv is something of a hack. However, even with the hack, the EM sub-translator still doesn't see the type as preprint internally, so it can't take advantage of automatic naming based on the preprint type.

The problem is general: there's currently no good way for the EM translator to detect the preprint type from HW metadata, and any fix in EM would be special-case code (i.e., domain- or URL-based allowlisting) that belongs in the HW translator anyway. It's made worse by EM treating the HW-derived type as highly accurate.

So in EM, I adjusted the priority of type determination with respect to the HW-derived type: when we can already identify the preprint type by other means, HW isn't allowed to override it.

This makes it possible for the HW translator to pass the itemType property to EM explicitly (as exports.itemType). This is what I did in #3146.
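A rough sketch of the precedence described above. The function and parameter names are hypothetical, not the actual Embedded Metadata code, and the sketch assumes that a type passed in explicitly by a calling translator (e.g. via exports.itemType) wins over everything else:

```javascript
// Hypothetical sketch, not the real Embedded Metadata logic.
//   explicitType: type passed in by a calling translator (assumed: exports.itemType)
//   otherType:    type inferred from non-HighWire metadata
//   hwType:       type inferred from HighWire (citation_*) tags
//   fallback:     type to use when nothing else is known
function resolveItemType({ explicitType, otherType, hwType, fallback }) {
  // An explicitly passed type always wins (assumption).
  if (explicitType) return explicitType;
  // A preprint type found by other means must not be overridden by HW.
  if (otherType === "preprint") return "preprint";
  // Otherwise the HW-derived type is still trusted over other guesses.
  return hwType || otherType || fallback;
}
```

With this ordering, the `endsWith('Rxiv')` check in HighWire 2.0 could hand "preprint" to EM up front instead of patching the item afterward, so EM's preprint-aware naming would apply.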

zoe-translates self-assigned this Sep 16, 2023

zoe-translates linked a pull request Sep 21, 2023 that will close this issue

Embedded Metadata and HighWire fixes for preprint type (#3137) #3146 (Draft)
