Sometimes pugixml fails to find xpath values #57

MarkZ1 · 2015-09-23T14:24:11Z

I have the following code to extract data from an XML document using XPATH:

pugi::xml_document doc;

size_t length = strlen(message_text);
pugi::xml_parse_result result = doc.load_buffer_inplace(message_text, length);

pugi::xpath_node_set node_set = doc.select_nodes(xpath);
pugi::xpath_node_set::const_iterator it;
for (it = node_set.begin(); it != node_set.end(); ++it)
{
    pugi::xpath_node node = *it;
    strncpy(value, node.node().text().get(), size);
    value[size] = '\0';

...

Sometimes this works fine and sometimes the contents of 'value' is an empty string even if
the document contains data in the element. How can I troubleshoot this? I have not used
pugixml before and I am on a very tight deadline.

zeux · 2015-09-23T14:31:29Z

You can locate the node whose text comes back empty using node.node().offset_debug(); it returns the byte offset of the node from the beginning of the file.
You can also check node.node().child_value() (it should return the same text, but it is worth verifying).
You're using load_buffer_inplace; make sure message_text is not being modified after load_buffer_inplace or reused - load_buffer_inplace is a destructive operation. Check if load_buffer works instead.
If you attach the document and/or XPath query you're using I may have more ideas. If you don't want to make them public feel free to send them to me by e-mail: arseny.kapoulkine@gmail.com

MarkZ1 · 2015-09-23T15:07:27Z

Thanks for such a quick response. Using 'load_buffer' instead seems to have fixed the problem. However, I am certain that the buffer is not modified until after all the pugixml methods have been called.

zeux · 2015-09-23T15:20:56Z

Can you provide a more complete version of the code (including details such as where the value is defined, how message_text is allocated & destroyed etc.)?

MarkZ1 · 2015-09-23T15:27:37Z

Both value & message_text are local variables of the caller to the parser function, both char[]. The prototype for the parser function is:

extern "C" {
int iso20022_get_field(const char *xpath,
                       char *value, char *message_text,
                       int size);
}

Before I switched to load_buffer_inplace, message_text was declared as "const char*".

zeux · 2015-09-23T15:48:51Z

The only way I can imagine the code you posted malfunctioning is if [value, value + size] and [message_text, message_text + strlen(message_text)) overlap. If they do then strncpy will fill part of message_text with zeroes, which will essentially make some nodes in the document lose their names & contents.
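
One way to rule the overlap in or out is to assert on it at the top of the parser function. The sketch below is only an illustration: ranges_overlap is a hypothetical helper, and it treats both ranges as half-open byte ranges, so the caller would pass size + 1 for value (the posted code writes value[size]) and strlen(message_text) + 1 for message_text.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper: returns true if the byte ranges [a, a + a_len) and
// [b, b + b_len) share at least one byte.
bool ranges_overlap(const char* a, std::size_t a_len,
                    const char* b, std::size_t b_len)
{
    if (a_len == 0 || b_len == 0) return false;

    // std::less gives a well-defined total order on pointers even when they
    // point into unrelated arrays, unlike the built-in operator<.
    std::less<const char*> before;
    return before(a, b + b_len) && before(b, a + a_len);
}
```

A call such as assert(!ranges_overlap(value, size + 1, message_text, strlen(message_text) + 1)) before parsing would catch the case described above in a debug build.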

If the above is true it's likely that the node you find will be very unusual - this:

 printf("%d <%s>\n", node.node().type(), node.node().name());

Would print:

2 <>

Also note that the code value[size] = 0 is suspicious - from the declaration of the function it seems that value is a buffer with size bytes, not size+1?
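
To make the buffer contract explicit, a bounded copy can take the real capacity of the destination and always terminate inside it. This is a minimal sketch, assuming the caller passes the full number of bytes available in the destination; copy_bounded is a hypothetical name, not something from the original code.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies `src` into `dst`, which holds exactly `capacity`
// bytes, and always null-terminates within those bytes.
// Returns false if the text had to be truncated (or capacity is 0).
bool copy_bounded(char* dst, std::size_t capacity, const char* src)
{
    if (capacity == 0) return false;

    std::size_t len = std::strlen(src);
    std::size_t n = len < capacity - 1 ? len : capacity - 1;

    std::memcpy(dst, src, n);
    dst[n] = '\0'; // n <= capacity - 1, so this write stays in bounds
    return n == len;
}
```

Unlike strncpy followed by value[size] = '\0', this never writes past the stated capacity and does not zero-fill the rest of the destination.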

MarkZ1 · 2015-09-23T16:12:28Z

There is definitely no overlap. And value is defined as [size+1]

zeux · 2015-10-09T04:18:39Z

I don't really know at this point how to make progress on this. I still suspect an application bug but I can't be sure.

Can you provide the source for iso20022_get_field function, the source for the function that calls it, an example XPath expression and an example XML where you could see the issue? (I understand that the issue is intermittent - this does not matter). Feel free to e-mail them to arseny.kapoulkine at gmail dot com if you don't want to attach them to the issue.

zeux · 2015-10-24T16:30:12Z

Wait, this is obvious. I'm not sure how I missed this.

I was assuming that the code checks the XML parse result, but I don't see that in the code. If you call iso20022_get_field twice on the same buffer and use load_buffer_inplace, the first call destructively modifies the buffer (in-place parsing writes null terminators into it). The second call's strlen then sees a buffer that likely ends abruptly inside the first tag, e.g. "<foo", in which case parsing fails and the XPath query does not find anything.

MarkZ1 · 2015-10-26T07:26:30Z

That must be it.

manishpatodi · 2021-10-01T09:26:54Z

Hello, is there any API that I can use to find a node (anywhere in the XML) without XPath? (I know the node name.)

zeux closed this as completed Oct 24, 2015

zeux added the invalid label Nov 10, 2015
