WIP: handle namespaced XML #73
I believe you could probably use `read_namespaced_event`, ignoring the resolved namespace?
Could you please also rebase?
Alternatively, if you don't want to use quick_xml namespaces resolution (which has some overhead), do you mind quickly benchmarking this alternative:
    fn compare_name(node_name: &[u8], search_name: &[u8]) -> bool {
        node_name.ends_with(search_name)
            && (node_name.len() == search_name.len()
                // indexing is safe here: we are sure node_name.len() > search_name.len()
                || node_name[node_name.len() - search_name.len() - 1] == b':')
    }
We could indeed probably amend quick-xml and provide this fn in |
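For reference, a minimal sketch of how the suggested `compare_name` behaves on a few tag names. The sample names (`x:si`, `xsi`) are illustrative assumptions, not taken from the PR, and the sketch uses only the standard library rather than quick-xml:

```rust
// Matches `search_name` against `node_name`, accepting either an exact
// match or a match preceded by a namespace prefix separator (`:`).
fn compare_name(node_name: &[u8], search_name: &[u8]) -> bool {
    node_name.ends_with(search_name)
        && (node_name.len() == search_name.len()
            // Reached only when node_name is strictly longer than search_name,
            // so the index below cannot underflow.
            || node_name[node_name.len() - search_name.len() - 1] == b':')
}

fn main() {
    // Hypothetical sample names chosen for illustration.
    for name in [&b"si"[..], &b"x:si"[..], &b"xsi"[..]] {
        println!(
            "{} matches si: {}",
            String::from_utf8_lossy(name),
            compare_name(name, b"si")
        );
    }
}
```

The colon check is what keeps an unrelated tag that merely ends with the same bytes (here `xsi`) from matching.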
I've rebased. Perhaps I'm misunderstanding the
still yields names with the namespace:
I did a simple (repeated runs w/ The
But either way does seem like it should end up in |
Oh really! I'll check on quick-xml, but it might be a regression then.
Just opened an issue on quick-xml. I am fine merging this PR; I'll update quick-xml later, unless you want to make that change in quick-xml directly. I cannot do it at the moment.
After reviewing |
Should we close this PR and wait for the one on quick-xml to be merged first?
I have a rebase of this PR which works with the |
Just published quick-xml 0.7.0
PR updated to use |
Sorry to bother you again, but I am not sure you need `read_namespaced_event` if you are dealing with namespaces manually via the `local_name` function anyway.
src/xlsx.rs
Outdated
match xml.read_event(&mut buf) {
Ok(Event::Start(ref e)) if e.name() == b"si" => {
if let Some(s) = read_string(&mut xml, b"si")? {
match xml.read_namespaced_event(&mut buf).map(|t| t.1) {
Can't you use `read_event` directly since the `local_name` is on `BytesStart`?
We're not the only ones:
EDIT: I've just done it in #74, can you rebase?
32b966b to 7d13ce7
b285cc0 to a294fbf
(cherry picked from commit 9f029bc)
tests both namespaced xml content and inline v. shared richtext strings (cherry picked from commit 3cbe855)
* introduces `LocalName` trait
* impl `LocalName` for `Bytes{Start,End}`, which returns the name with any prefix removed
* use `local_name()` to handle namespacing
(cherry picked from commit 50176dd)
(cherry picked from commit 258ad64)
(cherry picked from commit a294fbf)
You're right; thanks!
Thanks a lot for your patience!
Just published version 0.7.0
This continues the pattern for parsing namespaced xlsx files introduced in PR tafia#73, and thereby allows parsing of namespaced .xlsx files with definedName values. `xml.read_text` requires the fully qualified (prefixed) name. Since that isn't always equal to the local name, the pattern is to use `e.local_name()` for comparisons against the raw value of interest, and then to pass `e.name()` instead of that raw value to `xml.read_text`.
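The pattern described above can be illustrated with a small standard-library sketch. Note that `local_name` and `read_text` here are hypothetical stand-ins written for this example, operating on plain strings; they are not the quick-xml functions of the same name, and the sample input is an assumption:

```rust
// Hypothetical helper: strip an optional `prefix:` from a qualified tag name.
fn local_name(name: &str) -> &str {
    match name.rfind(':') {
        Some(i) => &name[i + 1..],
        None => name,
    }
}

// Toy stand-in for a reader's `read_text`: returns everything up to the
// closing tag for `full_name`, or None if that exact closing tag is absent.
fn read_text<'a>(input: &'a str, full_name: &str) -> Option<&'a str> {
    let closing = format!("</{}>", full_name);
    input.find(&closing).map(|i| &input[..i])
}

fn main() {
    // Assumed sample: the reader has just seen `<x:definedName>`.
    let full_name = "x:definedName";
    let rest = "Sheet1!$A$1</x:definedName>";

    // Compare on the local name, so prefixed and unprefixed files both match.
    if local_name(full_name) == "definedName" {
        // Passing the full name finds the closing tag.
        println!("{:?}", read_text(rest, full_name));
        // Passing the raw local name does not, because the closing tag is prefixed.
        println!("{:?}", read_text(rest, "definedName"));
    }
}
```

The second call shows why the raw value cannot be reused for the text read: the closing tag in a namespaced file carries the prefix too.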
When putting together a test case for #72, I created a .xlsx using Open XML SDK to be able to manually create both shared string and inline string data. It turns out that the XML it generates uses namespaced tags (an example can be seen here, where, e.g., `<x:sheets>` is used instead of `<sheets>`). A sample .xlsx showing this behavior was added (`richtext-namespaced.xlsx`).
This PR introduces a trait `LocalName`, which is used to strip namespace prefixes from tags for `Bytes{Start,End}`. However, such a method probably makes more sense added to `quick-xml`? (Or possibly this functionality already exists there and I missed it.) So consider this PR a WIP; I'm happy to retool it to work w/ an updated `quick-xml` or another approach.
I also tried to make `Event::Eof` handling consistent by making it an error and explicitly checking for various closing tags.
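As a rough illustration of the `Event::Eof` point, here is a minimal sketch of treating end-of-file as an error while waiting for a specific closing tag. The `Event` enum, `local_name`, and `read_string` below are simplified, hypothetical stand-ins defined for this example; they are not the quick-xml or calamine types:

```rust
// Simplified stand-in for a reader's event type (an assumption for this sketch).
#[derive(Debug)]
enum Event {
    Start(&'static str),
    Text(&'static str),
    End(&'static str),
    Eof,
}

// Hypothetical helper: the tag name with any `prefix:` removed.
fn local_name(name: &str) -> &str {
    name.rsplit(':').next().unwrap_or(name)
}

// Collect text until the closing tag whose local name is `closing`.
// Hitting Eof first is reported as an error instead of silently succeeding.
fn read_string(events: &[Event], closing: &str) -> Result<String, String> {
    let mut out = String::new();
    for ev in events {
        match ev {
            Event::Text(t) => out.push_str(t),
            Event::End(name) if local_name(name) == closing => return Ok(out),
            Event::Eof => return Err(format!("unexpected end of file before </{}>", closing)),
            _ => {}
        }
    }
    Err(format!("event stream ended before </{}>", closing))
}

fn main() {
    let complete = [Event::Start("x:si"), Event::Text("hello"), Event::End("x:si"), Event::Eof];
    let truncated = [Event::Start("x:si"), Event::Text("hel"), Event::Eof];
    println!("{:?}", read_string(&complete, "si"));
    println!("{:?}", read_string(&truncated, "si"));
}
```

Checking explicitly for the expected closing tag means a truncated file surfaces as an error at the point where the element was left open.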