Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed patch for U4-10756 - Guid TypedContent performance #2398

Merged
merged 5 commits into from
Jan 24, 2018

Conversation

leekelleher
Copy link
Member

@leekelleher leekelleher commented Jan 18, 2018

This is a proposal for a temporary workaround to issue U4-10756.

It uses a local static ConcurrentDictionary to store a lookup of Guid/int values.

  • If the Guid isn't in the lookup, then the traditional XPath is used, which would add the resulting node ID (int) to the lookup.
  • If the lookup contains the Guid, then the returned int value will be used to perform a more efficient retrieval.

http://issues.umbraco.org/issue/U4-10854


Note: I have also removed the UseLegacyXmlSchema check, see commit da11acd for notes.

@nul800sebastiaan
Copy link
Member

Makes a lot of sense for the short term, we'll review more closely next week, thanks!

@leekelleher
Copy link
Member Author

@nul800sebastiaan It was one of those where I was trying to figure it out. So no expectation (from me) in this being the accepted solution. Definitely a discussion point. 👍

@nul800sebastiaan
Copy link
Member

Yep, no worries, it's a bit late to put it in the next patch, but can make it into 7.7.10 - we do have a larger solution already in mind but this will help for now!

Created http://issues.umbraco.org/issue/U4-10854 for this one, cheers!

@nul800sebastiaan
Copy link
Member

I edited your link above to go to the new issue.

@ HQ when merging, please use http://issues.umbraco.org/issue/U4-10854

@Shazwazza
Copy link
Contributor

I wonder what the xpath or xpath navigator performance would be to go look at every Guid in the content cache and cache it all up-front when first requested as opposed to one by one?
Of course we can then do the one by one lookup after the fact if it isn't in cache (i.e. when a new item is published) but it might not be much slower to cache all of them on first request

}
}

private static ConcurrentDictionary<Guid, int> _guidToIntLoopkup;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not initialize it straight away instead of incurring the cost of a null check every time TypedDocumentById is called? An empty dictionary costs so little plus the check is not thread safe.

private static readonly ConcurrentDictionary<Guid, int> GuidToIntLoopkup = new ConcurrentDictionary<Guid, int>();

Copy link
Member Author

@leekelleher leekelleher Jan 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I was trying to initialising it once TypedDocumentById(Guid) is called - as opposed to whenever PublishedContentQuery is instantiated. e.g. if you don't use that method, then it wouldn't be created?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Init cost is very little compared to the null check hit. As soon as the class is referenced anywhere the reference will be created.


// When we have the node, we add the GUID/INT value to the lookup
if (doc != null)
_guidToIntLoopkup.TryAdd(id, doc.Id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we know that the key/value pair here are correct. There's no need for TryAdd since you are not checking the result anyway. _guidToIntLoopkup[id] = doc.Id should be sufficient.

@leekelleher
Copy link
Member Author

@Shazwazza - Populating a lookup upfront is fine. Performance wise, I guess it's how many published nodes the website has? But then the lookup would need to be accessible from an app-startup event, so would need better API design, (than what I offer in this PR).

In @AndyButland's workaround solution - it does a database hit to get all the Guid/Int IDs: UdiToIdCache.cs#L27.

Alternatively, in my comment on @hartvig's other PR (#2367 (comment)), I mention a IndexingXPathNavigator class - which would add XSLT's key() feature. This would effectively be doing the same thing, building up an index, accessible by key.

I guess it comes down to how much time/effort is expected for this workaround?

leekelleher added a commit to leekelleher/umbraco-cms that referenced this pull request Jan 19, 2018
@Shazwazza
Copy link
Contributor

@leekelleher Just to clarify what i mean, you've said

Populating a lookup upfront is fine. Performance wise, I guess it's how many published nodes the website has? But then the lookup would need to be accessible from an app-startup event, so would need better API design, (than what I offer in this PR).

I'm not talking about populating the dictionary on startup. I'm talking about populating the dictionary on first access - just like you are doing in this PR but instead of looking up the single entry, on the very first hit we could lookup all GUIDs -> INTs that exist in the xml doc. I'm saying this could be an easy win if the performance of looking up all GUIDs in the document using xpath isn't much more overhead than looking up a single one.

making sense?

@leekelleher
Copy link
Member Author

@Shazwazza - ok gotcha. I'll take a look 👍

This is a proposal for a temporary workaround to issue U4-10756.

It uses a local static ConcurrentDictionary to store a lookup of Guid/int values.
If the Guid isn't in the lookup, then the traditional XPath is used, which would add the resulting node ID (int) to the lookup.
If the lookup contains the Guid, then the returned int value will be used to perform a more efficient retrieval.

<http://issues.umbraco.org/issue/U4-10756>
The enhancement in PR umbraco#2367 removed the `"@isdoc"` check,
which was the main difference between the legacy and current XML schema.

Reducing the XPath to `"//*[@key=$guid]"` will perform the same for both types of schema.
_(and saves on a couple of allocations)_
as per @Shazwazza's comment: umbraco#2398 (comment)

Tested against 12,000 nodes (in the XML cache), profiling showed it took 55ms.
@leekelleher
Copy link
Member Author

leekelleher commented Jan 24, 2018

@Shazwazza - I've added in code to eagerly populate the Guid lookup, (see 0780979). I tested against 12,000 nodes, it took 55ms, (using MiniProfiler). I'm not sure what our level of acceptance would be here?

@zpqrtbnk
Copy link
Contributor

Sounds good enough for me as it is now. IndexingXPathNavigator may be interesting, but we create a new XmlDocument (and thus a navigator?) each time we publish... all in all I think we can just merge and already gain a lot.

@zpqrtbnk zpqrtbnk merged commit 41c96e6 into umbraco:dev-v7 Jan 24, 2018
@leekelleher
Copy link
Member Author

Cool, thanks @zpqrtbnk!
Yeah, the IndexingXPathNavigator did look heavy - I'd expect it would involve a lot of work.

@leekelleher leekelleher deleted the issue/U4-10756 branch January 24, 2018 14:35
if (_guidToIntLoopkup.Count == 0)
{
// TODO: Remove the debug profile logger
using (ApplicationContext.Current.ProfilingLogger.DebugDuration<PublishedContentQuery>("Populate GUID/INT lookup"))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zpqrtbnk @Shazwazza Do you see any issue with using DebugDuration here?
I'm ever so slightly concerned about any performance bottlenecks here.

@zpqrtbnk
Copy link
Contributor

DebugDuration is annoying, sure, and when not debugging we should somehow not try to trace the duration at all. That's something I have on my list for v8 but could be done in 7 I guess. But let's not cripple this PR now. The impact is probably quite low and happens only once.

@MarcStoecker
Copy link
Contributor

I was reading the discussion you had here and just wanted to say "thank you" for putting so much effort into Umbraco. #h5yr 👍

Copy link
Contributor

@bokmadsen bokmadsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m no concurrency expert but does this not introduce race conditions where multiple threads populate the cache on first access at the same time? The same with returning the it afterwards: wouldn’t it be better to use GetOrAdd with the func overload to remove that specific race condition?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants