Embedded Metadata and HighWire fixes for preprint type (#3137) #3146

zoe-translates · 2023-09-21T15:39:32Z

- In Embedded Metadata (EM), the HighWire (HW)-determined item type is usually preferred. However, HW is not known to distinguish preprints from (published) articles; in fact, the HW translator handles preprints manually for special cases (bioRxiv/medRxiv). Therefore, in EM, if a non-HW source has already determined the type to be "preprint", don't let HW override that type. In particular, this keeps exports.itemType respected (e.g. when set by a translator that calls EM).
- In the EM translator, if we have determined the type to be "preprint", use "Preprint PDF" as the PDF attachment name, rather than "Full Text PDF".
- In the HW2.0 translator, make the bioRxiv/medRxiv special-case code a bit easier to maintain by
  1. making an explicit "isBioMedRxiv()" function,
  2. avoiding duplicated conditionals testing bioRxiv/medRxiv,
  3. explicitly passing the detected itemType to the EM translator.
- In HW2.0, for bioRxiv/medRxiv, delete the "pages" field, which is almost always an artifact arising from malformed HW metadata. This prevents it from going into the Extra field.
- Make the HW2.0 scrape code async.
- Update the HW2.0 test cases.
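
For readers skimming the description, the rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `resolveItemType()`, `pdfAttachmentTitle()` and `cleanUpBioMedRxivItem()` are hypothetical names, and the hostname test inside `isBioMedRxiv()` is a guess at how such a check could look.

```javascript
// Hypothetical helpers illustrating the rules in the description.
// None of this is the actual translator code.

// hwType: item type derived from HighWire metadata (may be undefined).
// otherType: item type from any non-HW source, e.g. exports.itemType.
function resolveItemType(hwType, otherType) {
    // A non-HW "preprint" wins, because HW does not distinguish preprints.
    if (otherType === "preprint") {
        return "preprint";
    }
    // Otherwise the HW-determined type is preferred when present.
    return hwType || otherType;
}

// Attachment title depends on the resolved item type.
function pdfAttachmentTitle(itemType) {
    return itemType === "preprint" ? "Preprint PDF" : "Full Text PDF";
}

// Assumption: bioRxiv/medRxiv pages are recognized by hostname.
function isBioMedRxiv(url) {
    let hostname;
    try {
        hostname = new URL(url).hostname;
    }
    catch (e) {
        return false;
    }
    return /(^|\.)(biorxiv|medrxiv)\.org$/.test(hostname);
}

// Drop the "pages" field for bioRxiv/medRxiv items only.
function cleanUpBioMedRxivItem(item, url) {
    if (isBioMedRxiv(url)) {
        delete item.pages;
    }
    return item;
}
```

Keeping the site check in a single function means the special-casing lives in one place instead of being repeated at each conditional.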

In addition, the failure of HW2.0 to detect "multiple" is addressed by improving the fallback-to-multiple logic and fixing the argument list of the call to getSearchResults().
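
As a rough illustration of the fallback-to-multiple pattern, here is a minimal sketch. It assumes the common translator convention in which `getSearchResults()` takes a `checkOnly` flag and returns `false` when nothing is found. The signature below is hypothetical and takes plain link records rather than a DOM document so that it runs standalone; `detectType()` is likewise an illustrative name, not a function in HW2.0.

```javascript
// Hypothetical stand-in for a translator's getSearchResults().
// links: array of { href, title } records (assumed input shape).
// checkOnly: when true, return true as soon as one usable result is seen.
function getSearchResults(links, checkOnly) {
    const items = {};
    let found = false;
    for (const { href, title } of links) {
        if (!href || !title) continue;
        if (checkOnly) return true;
        found = true;
        items[href] = title;
    }
    return found ? items : false;
}

// Fall back to "multiple" only when no single-item type was detected.
function detectType(singleItemType, links) {
    if (singleItemType) {
        return singleItemType;
    }
    if (getSearchResults(links, true)) {
        return "multiple";
    }
    return false;
}
```

With a positional flag like `checkOnly`, a call that passes its arguments in the wrong order can quietly change the result, which is why the argument list matters here.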

Fixes #3137


zoe-translates · 2023-09-26T10:40:08Z

I need to give this a bit more thought. I tried to convert this to a draft, but the browser/GitHub isn't letting me do so.

AbeJellinek · 2023-09-26T16:16:00Z

> if we have determined the type to be "preprint", use "Preprint PDF" as the PDF attachment name, rather than "Full Text PDF".

If we do this, we should update arXiv, Preprints.org, etc., as well.

zoe-translates · 2023-09-27T08:20:27Z

Well, I guess "full text" (or "full-text" as an adjective) simply means that: it's a text in its entirety, as opposed to an abstract, a summary, or an abridgement. So we may even say "full-text preprint" which means a preprint as a whole (for instance, "medRxiv launches full-text HTML of preprints online"). The NASA ADS lists both the VoR and the corresponding arXiv preprint PDF files under "full text sources" e.g. on the upper right of this page: 2010ApJ...725.2324B

So on second thought, I think the distinction between "Preprint PDF" and "Full Text PDF", while useful, isn't that clear-cut. It's just that "Full Text PDF" is less specific when the file is indeed a preprint.

zoe-translates added 2 commits September 21, 2023 23:08

HighWire 2.0: fix "multiple" fallback logic; fix call to getSearchRes…

35f44b3

…ults()

zoe-translates mentioned this pull request Sep 21, 2023

BioRxiv: Use attachment title that conveys that file is a preprint #3137

Open

dstillman marked this pull request as draft September 26, 2023 11:37

zoe-translates mentioned this pull request Oct 28, 2023

Janeway publishing platform #3175

Open
