Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HtmlNodeNavigator copy constructor is inefficient #324

Closed
Schnutzel opened this issue Sep 3, 2019 · 2 comments
Closed

HtmlNodeNavigator copy constructor is inefficient #324

Schnutzel opened this issue Sep 3, 2019 · 2 comments
Assignees

Comments

@Schnutzel
Copy link

Description

HtmlNodeNavigator assigns new instances to _doc and _nameTable, as such:

private readonly HtmlDocument _doc = new HtmlDocument();
private readonly HtmlNameTable _nametable = new HtmlNameTable();

However, some of the constructors (including the copy constructor) immediately assign new values to these fields. This causes unnecessary creation and destruction of these objects, which can cause slowness when used repeatedly, for example HtmlNodeNavigator.Clone() is indirectly called by HtmlNode.SelectNodes.

Suggested fix

Move the assignments from the field declaration line to the empty constructor:

internal HtmlNodeNavigator()
{
    _doc = new HtmlDocument();
    _nametable = new HtmlNameTable();
    Reset();
}

and call this constructor from all other constructors except the ones that initialize _doc and _nametable themselves.

How to reproduce

I added the following unit test:

[Test]
public void SelectEventAttributesTest()
{
    String xpath = "//* [@onkeypress or @onkeydown or @onkeyup or @onclick or @ondblclick or @onmousedown or @onmouseup or @onmouseover or @onmousemove or @onmouseout or @onmouseenter or @onmouseleave or @onmousewheel or @oncontextmenu or @onabort or @onbeforeunload or @onerror or @onload or @onmove or @onresize or @onscroll or @onstop or @onunload or @onreset or @onsubmit or @onblur or @onchange or @onfocus or @onfocusin or @onfocusout or @oninput or @onbeforeactivate or @onactivate or @onbefordeactivate or @ondeactivate or @onbounce or @onfinish or @onstart or @onbeforecopy or @onbeforecut or @onbeforeeditfocus or @onbeforepaste or @onbeforeupdate or @oncopy or @oncut or @ondrag or @ondragdrop or @ondragend or @ondragenter or @ondragleave or @ondragover or @ondragstart or @ondrop or @onlosecapture or @onpaste or @onselect or @onselectstart or @oncontrolselect or @onmovestart or @onmoveend or @onafterupdate or @oncellchange or @ondataavailable or @ondatasetchanged or @ondatasetcomplete or @onerrorupdate or @onrowenter or @onrowexit or @onrowsdelete or @onrowsinserted or @onafterprint or @onbeforeprint or @onfilterchange or @onhelp or @onpropertychange or @onreadystatechange]";
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(@"<html><body><div id='foo' onclick='bar'><span> some</span> text</div></body></html>");
    for (int i = 0; i < 100000; i++)
    {
        doc.DocumentNode.SelectNodes(xpath).ToList();
    }
}

Runtime with and without the fix:

  • Without the fix: 8 seconds.
  • With the fix: 5 seconds.

Further technical details

  • HAP version: 1.11.1.0
  • NET version (net472, netcore, etc.): 4.7.2
@JonathanMagnan JonathanMagnan self-assigned this Sep 3, 2019
@JonathanMagnan
Copy link
Member

JonathanMagnan commented Sep 3, 2019

Hello @Schnutzel ,

Thank you for reporting,

We will look at your suggested fix.

Best Regards,

Jonathan


Performance Libraries
context.BulkInsert(list, options => options.BatchSize = 1000);
Entity Framework ExtensionsEntity Framework ClassicBulk OperationsDapper Plus

Runtime Evaluation
Eval.Execute("x + y", new {x = 1, y = 2}); // return 3
C# Eval FunctionSQL Eval Function

@JonathanMagnan
Copy link
Member

Sorry for the long delay,

We are finally able to work again on HAP (we had lot of internal stuff to complete first).

The v1.11.13 has been released with your code.

Thank again for your pull request.

Best Regards,

Jonathan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants