
Import NCBI/Medline abstracts where newlines start with all caps letters #2224

Closed
adam3smith opened this issue Aug 8, 2020 · 2 comments · Fixed by #2225

Comments

@adam3smith
Collaborator

Reported here: https://forums.zotero.org/discussion/83719/some-abstracts-from-pubmed-truncated#latest

Sample data here:
https://gist.github.com/michi-zuri/5750a395438492c6c32e30d667fa4174

@michi-zuri
Contributor

Interestingly, the title, whose second line starts with SAMD9, was not affected, while the abstract was truncated at this term. But when that name in the title is changed to AU - MD9, even the title is silently truncated. Here is a minimal example where the title is truncated as well:

```
PMID- 000000000000
OWN - NLM
STAT- In-Process
LR  - 20200715
TI  - Outcomes of Hematopoietic Cell Transplantation in Patients with Germline
      AU - MD9/SAMD9L Mutations.
PG  - 2186-2196
LID - S1083-8791(19)30439-2 [pii]
LID - 10.1016/j.bbmt.2019.07.007 [doi]
AB  - Germline mutations in SAMD9 and SAMD9L genes cause MIRAGE... A patient with
      SAMD9L-associated MDS died of diffuse alveolar hemorrhage. 
FAU - Ahmed, Ibrahim A
AU  - Ahmed IA
```
AU  - Ahmed IA

@michi-zuri
Contributor

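The `-` in the tag-line regex

`else if (line.search(/^[A-Z0-9]+\s*-/) != -1) {`

together with the missing consideration of each line's leading spaces, which are all stripped beforehand here

`line = line.replace(/^\s+/, "");`

explains why SAMD9L-associated triggered the processing of the tag up to that point in the abstract, but not in the title. Once the indentation is gone, a continuation line starting with `SAMD9L-` looks exactly like a new tag line, so the abstract ends there. In the title, `SAMD9` is followed by `/` rather than `-`, so the regex does not match.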

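Another minimal example, this time only the title gets truncated (the continuation line `O-some day!` matches the tag pattern, while `SAMD9L was caught` contains no hyphen and does not):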

```
PMID- 000000000000
OWN - NLM
STAT- In-Process
LR  - 20200715
TI  - Mickey Mouse had an 
      O-some day!
AB  - Mickey Mouse had a quiet day until something happened and
      SAMD9L was caught in the end. 
FAU - Mouse, Mickey
AU  - Mouse M 
```
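To make the mechanism concrete, here is a small sketch that runs the quoted tag-line regex over continuation lines from the examples in this thread, once after stripping indentation (the old order of operations) and once on the raw line. Only the regex and the stripping call come from the translator code quoted in this issue; the helper names are hypothetical.

```javascript
// Tag-line pattern as quoted in this issue from MEDLINEnbib.js.
const TAG_LINE = /^[A-Z0-9]+\s*-/;

// Hypothetical helper: old order of operations, strip first, then test.
function looksLikeTagAfterStripping(line) {
  const stripped = line.replace(/^\s+/, "");
  return stripped.search(TAG_LINE) != -1;
}

// Hypothetical helper: test the raw line with its indentation intact.
function looksLikeTagOnRawLine(line) {
  return line.search(TAG_LINE) != -1;
}

// Continuation lines from the examples (six-space indent),
// plus one real tag line for comparison.
const lines = [
  "      SAMD9/SAMD9L Mutations.",    // original title: "/" follows SAMD9
  "      AU - MD9/SAMD9L Mutations.", // modified title
  "      SAMD9L-associated MDS died of diffuse alveolar hemorrhage.",
  "      O-some day!",
  "      SAMD9L was caught in the end.",
  "AB  - Mickey Mouse had a quiet day until something happened and",
];

for (const line of lines) {
  console.log(
    looksLikeTagAfterStripping(line), // old behaviour
    looksLikeTagOnRawLine(line),      // indentation-aware check
    JSON.stringify(line.trim())
  );
}
```

After stripping, the `AU - MD9`, `SAMD9L-associated` and `O-some` continuation lines are all misread as tag lines; on the raw lines, only the genuine `AB  - ` line matches.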

michi-zuri added a commit to michi-zuri/translators that referenced this issue Aug 8, 2020
Fixes zotero#2224 
Whitespace gets cropped at the right moment, not too early. This way tag lines can easily be distinguished from new lines of continued tags.
adam3smith added a commit that referenced this issue Aug 9, 2020
Update MEDLINEnbib.js
Fixes #2224
Whitespace gets cropped at the right moment, not too early. This way tag lines can easily be distinguished from new lines of continued tags.
MylesFTOP pushed a commit to MylesFTOP/translators that referenced this issue Aug 23, 2020
Fixes zotero#2224 
Whitespace gets cropped at the right moment, not too early. This way tag lines can easily be distinguished from new lines of continued tags.
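The commit message describes the fix as trimming whitespace later, so that tag lines can be told apart from continuation lines. A minimal sketch of that idea for a single record, assuming (as in every sample in this thread) that tag lines start in column 1 and continuation lines are indented. `parseMedline` is a hypothetical name, and this is not the code merged in #2225.

```javascript
// Hypothetical sketch: decide whether a line starts a new tag *before*
// trimming it. Captures the tag name and the rest of the line.
const TAG_LINE = /^([A-Z0-9]+)\s*-\s?(.*)$/;

function parseMedline(text) {
  const fields = [];  // [{ tag, value }] in document order
  let current = null; // field currently being filled

  for (const rawLine of text.split(/\r?\n/)) {
    if (rawLine.trim() === "") continue; // skip blank lines

    // Indented continuation lines cannot match, because the regex
    // requires a tag character at column 1 of the untrimmed line.
    const m = rawLine.match(TAG_LINE);
    if (m) {
      current = { tag: m[1], value: m[2].trim() };
      fields.push(current);
    } else if (current) {
      // Only now is the whitespace trimmed, and the text is appended
      // to the value of the preceding tag.
      current.value += " " + rawLine.trim();
    }
  }
  return fields;
}

// Usage with the second minimal example from this thread:
const sample = [
  "TI  - Mickey Mouse had an ",
  "      O-some day!",
  "AB  - Mickey Mouse had a quiet day until something happened and",
  "      SAMD9L was caught in the end. ",
  "AU  - Mouse M ",
].join("\n");

for (const { tag, value } of parseMedline(sample)) {
  console.log(tag, "=", value);
}
// TI = Mickey Mouse had an O-some day!
// AB = Mickey Mouse had a quiet day until something happened and SAMD9L was caught in the end.
// AU = Mouse M
```

This keeps the original tag regex essentially unchanged; the only real change is the order of matching and trimming.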