-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Can't read mzML files written by OpenChrom because of missing namespace declaration #17
Comments
Hi Ethan, thanks for the bug report - it's good to know that this is happening. Namespaces are definitely one of those things I don't understand about XML very well so I'm not super surprised that we ran into an issue with it at some point. It looks like the main issue is that I wrote RaMS with the expectation that the default namespaces (d1/d2) are the only ones used, but the OpenChrom mzMLs only have the xsi namespace:
The good news is that the fix is relatively simple: we just need to drop the namespaces from the xpath expressions:
I've implemented this in the namespace_free branch and it works perfectly on the demo file you shared, but is then broken for mzMLs from other places. The bad news is that this is super pervasive - I've basically hard-coded these namespaces everywhere in the code and it won't be easy to carefully strip them out and create a new switch that handles both types. I think the real solution is more along the lines of the second strategy you propos and will probably be for me to learn more about XML namespaces and do the proper handling when the file is initially read in, but it may be a little while before I can set aside time for that. In the meantime, I hope the namespace_free branch will work for your purposes. You can install it with
|
Thanks for looking into this! It took me stupidly long to figure this out, but I think I actually found a pretty easy solution. It turns out you can just set the namespace as an attribute of the xml. I think a function like this will fix the problem (to be called right after reading in the xml with
I could probably put together a pull request later this afternoon if you like |
Hi Ethan, just wanted to let you know I haven't forgotten about this! I'm a bit swamped with other tasks right now and have had to de-prioritize package development but I'm hoping to get v1.4 our by the end of this year or early 2024 and this would be the perfect thing to include at that point. Your PR looks great and I'll probably just implement it as-is, I just don't have the time to go through the updates and versioning process at the moment. |
Sounds good. Thanks for the update and no worries about the delay -- I am quite familiar with being swamped. |
Completed with #24 |
Hi,
The mzML files written by OpenChrom are missing a namespace declaration that is required for RaMS to parse the files correctly. It throws an error:
Error in xml_find_first.xml_node(xml_data, paste0("//d1:", node_to_check)) : xmlXPathEval: evaluation failed".
If I manually add the namespace declaration to the file, it parses fine.I'm not sure what the best fix would be here. I'm not too familiar with mzML so I don't know if these namespace declarations are "required" for the mzML files to be valid. In any case I am wondering if there is a way to add a check in RaMS for the namespace declaration and maybe modify the xpaths accordingly so that RaMS can read the OpenChrom files without modification. Alternatively, I guess it would be possible to add the namespace declaration to the XML after importing it to R. I'm not sure which way would be a better approach here?
I'm attaching an example file that was produced by OpenChrom (with the .txt extension because GitHub wouldn't let me upload mzML).
alkanes 0.25 mg-ml_2-10-2021.txt
Thanks!
Ethan
The text was updated successfully, but these errors were encountered: