
Sometimes pugixml fails to find xpath values #57

Closed
MarkZ1 opened this issue Sep 23, 2015 · 10 comments

MarkZ1 commented Sep 23, 2015

I have the following code to extract data from an XML document using XPath:

```cpp
pugi::xml_document doc;

size_t length = strlen(message_text);
pugi::xml_parse_result result = doc.load_buffer_inplace(message_text, length);

pugi::xpath_node_set node_set = doc.select_nodes(xpath);
pugi::xpath_node_set::const_iterator it;
for (it = node_set.begin(); it != node_set.end(); ++it)
{
    pugi::xpath_node node = *it;
    strncpy(value, node.node().text().get(), size);
    value[size] = '\0';

...
```

Sometimes this works fine, and sometimes 'value' ends up as an empty string even though the element in the document contains data. How can I troubleshoot this? I have not used pugixml before and I am on a very tight deadline.

zeux (Owner) commented Sep 23, 2015

  1. You can locate the node whose text comes back empty with node.node().offset_debug(), which returns the byte offset of the node from the beginning of the file.
  2. You can check node.node().child_value() (it should return the same value, but it's worth ruling out).
  3. You're using load_buffer_inplace, which is a destructive operation: make sure message_text is not modified or reused after the load_buffer_inplace call. Check whether load_buffer works instead.
  4. If you attach the document and/or the XPath query you're using, I may have more ideas. If you don't want to make them public, feel free to send them to me by e-mail: arseny.kapoulkine@gmail.com

MarkZ1 (Author) commented Sep 23, 2015

Thanks for such a quick response. Using 'load_buffer' instead seems to have fixed the problem. However, I am certain that the buffer is not modified until after all the pugixml methods have been called.

zeux (Owner) commented Sep 23, 2015

Can you provide a more complete version of the code, including details such as where value is defined and how message_text is allocated and destroyed?

MarkZ1 (Author) commented Sep 23, 2015

Both value and message_text are local char[] arrays in the function that calls the parser function. The prototype for the parser function is:

```cpp
extern "C" {
int iso20022_get_field(const char *xpath,
                       char *value, char *message_text,
                       int size);
}
```

Before I switched to load_buffer_inplace, message_text was "const char*".

zeux (Owner) commented Sep 23, 2015

The only way I can imagine the code you posted malfunctioning is if [value, value + size] and [message_text, message_text + strlen(message_text)) overlap. If they do, strncpy will fill part of message_text with zeroes, which will essentially make some nodes in the document lose their names and contents.

If that is the case, the node you find will probably look very unusual. This:

```cpp
printf("%d <%s>\n", node.node().type(), node.node().name());
```

would print the following (type 2 is pugi::node_element, so this is an element with an empty name):

```
2 <>
```
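One way to confirm or rule out the overlap theory is to test for it explicitly at the top of iso20022_get_field. A minimal sketch, assuming nothing beyond the standard library; `ranges_overlap` is a hypothetical helper (not part of pugixml or the original code), and it treats both ranges as half-open [start, start + length).

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper: returns true if the byte ranges [a, a + a_len) and
// [b, b + b_len) share at least one byte. std::less provides a total order
// even for pointers into different arrays, where the built-in < is unspecified.
bool ranges_overlap(const char* a, std::size_t a_len,
                    const char* b, std::size_t b_len)
{
    if (a_len == 0 || b_len == 0)
        return false;  // an empty range cannot overlap anything
    std::less<const char*> lt;
    // Two half-open ranges overlap iff each one starts before the other ends.
    return lt(a, b + b_len) && lt(b, a + a_len);
}
```

For the strncpy call in the original code, which can write value[0] through value[size], the check would be `ranges_overlap(value, size + 1, message_text, strlen(message_text))`.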

Also note that value[size] = '\0' in your code is suspicious: from the declaration of the function, it looks like value is a buffer of size bytes, not size + 1. Is that right?
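If value really holds only size bytes, its last valid index is size - 1, so value[size] = '\0' writes one byte past the end. A sketch of a bounded copy that treats its size argument as the full capacity of the destination, including the terminator; `copy_field` is a hypothetical name, not a pugixml function.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy src into dst, where dst holds exactly dst_size
// bytes. At most dst_size - 1 characters are copied and the result is always
// null-terminated, so nothing is ever written to dst[dst_size].
void copy_field(char* dst, std::size_t dst_size, const char* src)
{
    if (dst_size == 0)
        return;  // no room even for the terminator
    std::size_t n = std::strlen(src);
    if (n > dst_size - 1)
        n = dst_size - 1;  // truncate so the terminator still fits
    std::memcpy(dst, src, n);
    dst[n] = '\0';
}
```

Called as `copy_field(value, size, node.node().text().get())`, this never writes beyond value[size - 1]; if value is declared with size + 1 bytes instead, passing size + 1 gives the same limit as the original strncpy plus explicit terminator.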

MarkZ1 (Author) commented Sep 23, 2015

There is definitely no overlap, and value is declared with size + 1 bytes.

zeux (Owner) commented Oct 9, 2015

I don't really know at this point how to make progress on this. I still suspect an application bug, but I can't be sure.

Can you provide the source for the iso20022_get_field function, the source for the function that calls it, an example XPath expression, and an example XML document where you could see the issue? (I understand that the issue is intermittent; this does not matter.) Feel free to e-mail them to arseny.kapoulkine at gmail dot com if you don't want to attach them to the issue.

zeux (Owner) commented Oct 24, 2015

Wait, this is obvious. I'm not sure how I missed this.

I was assuming that the code checks the XML parse result, but I don't see that in the code. If you call iso20022_get_field twice on the same buffer while using load_buffer_inplace, the first call destructively modifies the buffer, and the second call gets a buffer in which the first tag likely ends abruptly (e.g. "<foo"). Parsing then fails, and the XPath query will not find anything.
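The failure mode described above can be reproduced without pugixml using a toy in-place parser. This is a sketch under loud assumptions: `ToyResult`, `toy_parse_inplace` and `toy_parse_copy` are hypothetical, only accept input shaped like `<name>text</name>`, and merely mimic the one property that matters here, namely that an in-place parser writes '\0' terminators into the caller's buffer. `toy_parse_copy` plays the role of load_buffer, which parses a private copy and leaves the caller's buffer intact.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Toy model only (NOT pugixml): result of parsing "<name>text</name>".
struct ToyResult {
    bool ok = false;
    std::string name;
    std::string text;
};

// Destructive parse: terminates the name and the text in place, the way an
// in-place XML parser avoids copying strings.
ToyResult toy_parse_inplace(char* buf, std::size_t len)
{
    ToyResult r;
    if (len == 0 || buf[0] != '<')
        return r;
    char* gt = static_cast<char*>(std::memchr(buf, '>', len));
    if (gt == nullptr)
        return r;            // start tag never closes, e.g. "<foo": parse error
    *gt = '\0';              // destructive: terminate the element name
    r.name = buf + 1;
    char* text = gt + 1;
    std::size_t rest = len - static_cast<std::size_t>(text - buf);
    char* lt = static_cast<char*>(std::memchr(text, '<', rest));
    if (lt == nullptr)
        return r;            // no end tag
    *lt = '\0';              // destructive: terminate the text
    r.text = text;
    r.ok = true;
    return r;
}

// Copying parse: works on a private copy, so the caller's buffer survives
// and can be parsed again.
ToyResult toy_parse_copy(const char* buf, std::size_t len)
{
    std::vector<char> copy(buf, buf + len);
    return toy_parse_inplace(copy.data(), copy.size());
}
```

Running toy_parse_inplace twice on `char msg[] = "<foo>bar</foo>"` succeeds the first time and fails the second, because strlen(msg) is now 4 and the buffer reads "<foo". In the real function, testing the pugi::xml_parse_result returned by the load call (it converts to bool) would surface this immediately instead of silently yielding an empty node set.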

zeux closed this as completed Oct 24, 2015

MarkZ1 (Author) commented Oct 26, 2015

That must be it.

zeux added the invalid label Nov 10, 2015

manishpatodi commented Oct 1, 2021

Hello, is there an API I can use to find a node anywhere in the XML without XPath? (I know the node name.)
