htmlDoc.DocumentNode.SelectSingleNode("someXpathValue") returns random null objects #203

Open
Futuresmo opened this issue May 28, 2018 · 6 comments
@Futuresmo

Initially this function worked fine and returned the expected nodes. Recently it started returning null. I expect neither an empty collection nor null from this function, since the HTML document loads without issues. The XPath value seems to be fine as well, because the function occasionally still returns the expected values.

@JonathanMagnan JonathanMagnan self-assigned this May 28, 2018
@JonathanMagnan
Member

Hello @Futuresmo ,

Can you give me an example? Our code has no random behavior: for a given input it either always works or never does, so we suspect the server sometimes returns different content (usually caused by bot detection).

Best Regards,

Jonathan


@Futuresmo
Author

htmlDoc.DocumentNode.SelectSingleNode("//div[h2]");
initially returned a "div" node. For the past couple of weeks I have been getting null instead.

I make at most 5-10 calls per day, so I am not sure why this would be treated as a bot. Is there a workaround?

@JonathanMagnan
Member

Hello @Futuresmo ,

Do you have the link as well?

If that always happens, perhaps they simply modified the HTML. If it happens only from time to time, there is not much we can do: the library most likely behaves consistently, and the HTML it receives is simply not the same.

Best Regards,

Jonathan

@Futuresmo
Author

https://forexlive.com/orders/!/fx-option-expiries-for-the-1400-gmt-cut-28-march-2018-20180328

I just tested it successfully, and a minute later got a null value.

@JonathanMagnan
Member

Hello @Futuresmo ,

If you look at the current HTML, you will find that it's almost empty, since they detected that the request was not really coming from a browser but from a script/robot.

Playing with the UserAgent might help, but I'm not aware of a UserAgent that works with this site.

web.UserAgent = "Mozilla/5.0";

Unfortunately, I don't believe we will be able to help you further with this issue.

Best Regards,

Jonathan

@Futuresmo
Author

Not sure if this is relevant, but the HtmlDocument returned from the URL above has many more child nodes in the happy scenario. In the null scenario, the nodes referenced in the XPath are missing, which is why the node comes back as null. Changing the user agent as described above does not seem to make any difference.
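
For anyone who wants to reproduce the difference without hitting the site, here is a minimal sketch. It assumes the two inline HTML strings are reasonable stand-ins for the full and the stripped responses (they are made up for illustration, not captured from forexlive.com), and `HtmlProbe` / `HasMatch` are hypothetical names, not part of Html Agility Pack. It only shows that `SelectSingleNode` returns null when the XPath matches nothing in the HTML that was actually loaded.

```csharp
using System;
using HtmlAgilityPack;

public static class HtmlProbe
{
    // Hypothetical helper: true when the XPath matches at least one node.
    public static bool HasMatch(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // SelectSingleNode returns null when nothing matches the XPath,
        // so a null here reflects the HTML content, not a parsing failure.
        return doc.DocumentNode.SelectSingleNode(xpath) != null;
    }

    public static void Main()
    {
        // Assumed stand-ins for the two kinds of response described above.
        string fullPage = "<html><body><div><h2>Orders</h2><p>text</p></div></body></html>";
        string strippedPage = "<html><body></body></html>";

        Console.WriteLine(HasMatch(fullPage, "//div[h2]"));     // node is present
        Console.WriteLine(HasMatch(strippedPage, "//div[h2]")); // node is missing
    }
}
```

In real code the same null check would go right after the `SelectSingleNode("//div[h2]")` call, and logging `htmlDoc.DocumentNode.OuterHtml` in the null case should make it visible which kind of page the server sent.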
