XMLSyntaxError #55

Closed
deakkon opened this issue Feb 28, 2018 · 4 comments

deakkon commented Feb 28, 2018

In [41]: pp.parse_medline_xml('/home/docClass/files/pubmed/pubmed18n1040.xml.gz')
Error: it was not able to read a path, a file-like object, or a string as an XML
File "", line 1
XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1

Source: ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/pubmed18n1040.xml.gz

@daniel-acuna
Collaborator

I don't think parse_medline_xml parses .gz files. You need to uncompress it first.
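
A minimal sketch of uncompressing the file first, using only the Python standard library. The helper name `gunzip` and the output path are hypothetical, chosen for illustration; they are not part of the parser library.

```python
import gzip
import shutil


def gunzip(src_path, dst_path):
    """Decompress the gzip file at src_path into dst_path and return dst_path."""
    # Stream the data in chunks so large PubMed files are not loaded into memory.
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```

The returned path could then be passed to `pp.parse_medline_xml` in place of the `.gz` path.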

@deakkon
Author

deakkon commented Feb 28, 2018

Hi,

Are you sure? For example, pp.parse_medline_xml('pubmed18n0364.xml.gz') (source: ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/pubmed18n0364.xml.gz) returns a list of dicts.

@daniel-acuna
Collaborator

Can you try uncompressing it first? The file works for me.

@deakkon
Author

deakkon commented Feb 28, 2018

Sorry, my mistake! The issue was that the file was not properly downloaded (I'm performing a batch download and no error was printed out).

I redownloaded it manually and it now works directly from the .gz path, without uncompressing first.
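
For anyone hitting the same error during a batch download, a rough sketch of a check that could be run on each file before parsing. It uses only the standard library. The function name `is_complete_gzip` is made up for illustration, and treating an empty payload as a failed download is an assumption on my part.

```python
import gzip
import zlib


def is_complete_gzip(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses fully and is non-empty."""
    total = 0
    try:
        with gzip.open(path, "rb") as f:
            # Reading to the end forces the CRC and length trailer to be checked.
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except (OSError, EOFError, zlib.error):
        # OSError: missing file or not gzip data; EOFError: truncated stream;
        # zlib.error: corrupted compressed data.
        return False
    # Assumption: a file that decompresses to nothing is a failed download.
    return total > 0
```

Files for which this returns False could be queued for another download attempt instead of being passed to `pp.parse_medline_xml`.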

Best,
J.
