Proposed patch for U4-10756 - Guid TypedContent performance #2398

leekelleher · 2018-01-18T11:48:39Z

This is a proposal for a temporary workaround to issue U4-10756.

It uses a local static ConcurrentDictionary to store a lookup of Guid/int values.

If the Guid isn't in the lookup, then the traditional XPath is used, which would add the resulting node ID (int) to the lookup.
If the lookup contains the Guid, then the returned int value will be used to perform a more efficient retrieval.

http://issues.umbraco.org/issue/U4-10854

Note: I have also removed the UseLegacyXmlSchema check, see commit da11acd for notes.

nul800sebastiaan · 2018-01-18T16:19:59Z

Makes a lot of sense for the short term, we'll review more closely next week, thanks!

leekelleher · 2018-01-18T16:21:43Z

@nul800sebastiaan It was one of those where I was trying to figure it out. So no expectation (from me) in this being the accepted solution. Definitely a discussion point. 👍

nul800sebastiaan · 2018-01-18T16:24:05Z

Yep, no worries, it's a bit late to put it in the next patch, but can make it into 7.7.10 - we do have a larger solution already in mind but this will help for now!

Created http://issues.umbraco.org/issue/U4-10854 for this one, cheers!

nul800sebastiaan · 2018-01-18T16:25:23Z

I edited your link above to go to the new issue.

@ HQ when merging, please use http://issues.umbraco.org/issue/U4-10854

Shazwazza · 2018-01-18T19:17:24Z

I wonder what the xpath or xpath navigator performance would be to go look at every Guid in the content cache and cache it all up-front when first requested as opposed to one by one?
Of course we can then do the one by one lookup after the fact if it isn't in cache (i.e. when a new item is published) but it might not be much slower to cache all of them on first request

JimBobSquarePants · 2018-01-19T02:02:49Z

src/Umbraco.Web/PublishedContentQuery.cs

-        }
+        }
+
+        private static ConcurrentDictionary<Guid, int> _guidToIntLoopkup;


Why not initialize it straight away instead of incurring the cost of a null check every time TypedDocumentById is called? An empty dictionary costs so little plus the check is not thread safe.

private static readonly ConcurrentDictionary<Guid, int> GuidToIntLoopkup = new ConcurrentDictionary<Guid, int>();

I guess I was trying to initialising it once TypedDocumentById(Guid) is called - as opposed to whenever PublishedContentQuery is instantiated. e.g. if you don't use that method, then it wouldn't be created?

Init cost is very little compared to the null check hit. As soon as the class is referenced anywhere the reference will be created.

JimBobSquarePants · 2018-01-19T02:11:37Z

src/Umbraco.Web/PublishedContentQuery.cs

+
+                // When we have the node, we add the GUID/INT value to the lookup
+                if (doc != null)
+                    _guidToIntLoopkup.TryAdd(id, doc.Id);


Since we know that the key/value pair here are correct. There's no need for TryAdd since you are not checking the result anyway. _guidToIntLoopkup[id] = doc.Id should be sufficient.

leekelleher · 2018-01-19T09:53:51Z

@Shazwazza - Populating a lookup upfront is fine. Performance wise, I guess it's how many published nodes the website has? But then the lookup would need to be accessible from an app-startup event, so would need better API design, (than what I offer in this PR).

In @AndyButland's workaround solution - it does a database hit to get all the Guid/Int IDs: UdiToIdCache.cs#L27.

Alternatively, in my comment on @hartvig's other PR (#2367 (comment)), I mention a IndexingXPathNavigator class - which would add XSLT's key() feature. This would effectively be doing the same thing, building up an index, accessible by key.

I guess it comes down to how much time/effort is expected for this workaround?

@JimBobSquarePants

as per @JimBobSquarePants's comment: umbraco#2398 (comment)

Shazwazza · 2018-01-23T18:55:06Z

@leekelleher Just to clarify what i mean, you've said

Populating a lookup upfront is fine. Performance wise, I guess it's how many published nodes the website has? But then the lookup would need to be accessible from an app-startup event, so would need better API design, (than what I offer in this PR).

I'm not talking about populating the dictionary on startup. I'm talking about populating the dictionary on first access - just like you are doing in this PR but instead of looking up the single entry, on the very first hit we could lookup all GUIDs -> INTs that exist in the xml doc. I'm saying this could be an easy win if the performance of looking up all GUIDs in the document using xpath isn't much more overhead than looking up a single one.

making sense?

leekelleher · 2018-01-24T09:56:09Z

@Shazwazza - ok gotcha. I'll take a look 👍

This is a proposal for a temporary workaround to issue U4-10756. It uses a local static ConcurrentDictionary to store a lookup of Guid/int values. If the Guid isn't in the lookup, then the traditional XPath is used, which would add the resulting node ID (int) to the lookup. If the lookup contains the Guid, then the returned int value will be used to perform a more efficient retrieval. <http://issues.umbraco.org/issue/U4-10756>

@key

The enhancement in PR umbraco#2367 removed the `"@isdoc"` check, which was the main difference between the legacy and current XML schema. Reducing the XPath to `"//*[@key=$guid]"` will perform the same for both types of schema. _(and saves on a couple of allocations)_

@JimBobSquarePants

as per @JimBobSquarePants's comment: umbraco#2398 (comment)

@JimBobSquarePants

as per @JimBobSquarePants's comment: umbraco#2398 (comment)

@Shazwazza

as per @Shazwazza's comment: umbraco#2398 (comment) Tested against 12,000 nodes (in the XML cache), profiling showed it took 55ms.

leekelleher · 2018-01-24T11:58:23Z

@Shazwazza - I've added in code to eagerly populate the Guid lookup, (see 0780979). I tested against 12,000 nodes, it took 55ms, (using MiniProfiler). I'm not sure what our level of acceptance would be here?

zpqrtbnk · 2018-01-24T14:27:31Z

Sounds good enough for me as it is now. IndexingXPathNavigator may be interesting, but we create a new XmlDocument (and thus a navigator?) each time we publish... all in all I think we can just merge and already gain a lot.

leekelleher · 2018-01-24T14:35:06Z

Cool, thanks @zpqrtbnk!
Yeah, the IndexingXPathNavigator did look heavy - I'd expect it would involve a lot of work.

leekelleher · 2018-01-24T14:40:06Z

src/Umbraco.Web/PublishedContentQuery.cs

+            if (_guidToIntLoopkup.Count == 0)
+            {
+                // TODO: Remove the debug profile logger
+                using (ApplicationContext.Current.ProfilingLogger.DebugDuration<PublishedContentQuery>("Populate GUID/INT lookup"))


@zpqrtbnk @Shazwazza Do you see any issue with using DebugDuration here?
I'm ever so slightly concerned about any performance bottlenecks here.

zpqrtbnk · 2018-01-24T14:46:01Z

DebugDuration is annoying, sure, and when not debugging we should somehow not try to trace the duration at all. That's something I have on my list for v8 but could be done in 7 I guess. But let's not cripple this PR now. The impact is probably quite low and happens only once.

MarcStoecker · 2018-01-24T18:53:04Z

I was reading the discussion you had here and just wanted to say "thank you" for putting so much effort into Umbraco. #h5yr 👍

bokmadsen

I’m no concurrency expert but does this not introduce race conditions where multiple threads populate the cache on first access at the same time? The same with returning the it afterwards: wouldn’t it be better to use GetOrAdd with the func overload to remove that specific race condition?

JimBobSquarePants reviewed Jan 19, 2018

View reviewed changes

leekelleher force-pushed the issue/U4-10756 branch from da11acd to 9277ae6 Compare January 19, 2018 11:13

leekelleher added a commit to leekelleher/umbraco-cms that referenced this pull request Jan 19, 2018

Amended the lookup dictionary setter,

9277ae6

as per @JimBobSquarePants's comment: umbraco#2398 (comment)

leekelleher added 5 commits January 24, 2018 10:29

Amended the lookup dictionary setter,

6841b2a

as per @JimBobSquarePants's comment: umbraco#2398 (comment)

Initialized the ConcurrentDictionary statically

a42e568

as per @JimBobSquarePants's comment: umbraco#2398 (comment)

Eagerly populating the Guid lookup

0780979

as per @Shazwazza's comment: umbraco#2398 (comment) Tested against 12,000 nodes (in the XML cache), profiling showed it took 55ms.

leekelleher force-pushed the issue/U4-10756 branch from 9277ae6 to 0780979 Compare January 24, 2018 11:56

zpqrtbnk merged commit 41c96e6 into umbraco:dev-v7 Jan 24, 2018

leekelleher deleted the issue/U4-10756 branch January 24, 2018 14:35

leekelleher commented Jan 24, 2018

View reviewed changes

bokmadsen reviewed Jan 25, 2018

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposed patch for U4-10756 - Guid TypedContent performance #2398

Proposed patch for U4-10756 - Guid TypedContent performance #2398

leekelleher commented Jan 18, 2018 •

edited by nul800sebastiaan

Loading

nul800sebastiaan commented Jan 18, 2018

leekelleher commented Jan 18, 2018

nul800sebastiaan commented Jan 18, 2018

nul800sebastiaan commented Jan 18, 2018

Shazwazza commented Jan 18, 2018

JimBobSquarePants Jan 19, 2018

leekelleher Jan 19, 2018 •

edited

Loading

JimBobSquarePants Jan 21, 2018

JimBobSquarePants Jan 19, 2018

leekelleher commented Jan 19, 2018

Shazwazza commented Jan 23, 2018

leekelleher commented Jan 24, 2018

leekelleher commented Jan 24, 2018 •

edited

Loading

zpqrtbnk commented Jan 24, 2018

leekelleher commented Jan 24, 2018

leekelleher Jan 24, 2018

zpqrtbnk commented Jan 24, 2018

MarcStoecker commented Jan 24, 2018

bokmadsen left a comment

Proposed patch for U4-10756 - Guid TypedContent performance #2398

Proposed patch for U4-10756 - Guid TypedContent performance #2398

Conversation

leekelleher commented Jan 18, 2018 • edited by nul800sebastiaan Loading

nul800sebastiaan commented Jan 18, 2018

leekelleher commented Jan 18, 2018

nul800sebastiaan commented Jan 18, 2018

nul800sebastiaan commented Jan 18, 2018

Shazwazza commented Jan 18, 2018

JimBobSquarePants Jan 19, 2018

Choose a reason for hiding this comment

leekelleher Jan 19, 2018 • edited Loading

Choose a reason for hiding this comment

JimBobSquarePants Jan 21, 2018

Choose a reason for hiding this comment

JimBobSquarePants Jan 19, 2018

Choose a reason for hiding this comment

leekelleher commented Jan 19, 2018

Shazwazza commented Jan 23, 2018

leekelleher commented Jan 24, 2018

leekelleher commented Jan 24, 2018 • edited Loading

zpqrtbnk commented Jan 24, 2018

leekelleher commented Jan 24, 2018

leekelleher Jan 24, 2018

Choose a reason for hiding this comment

zpqrtbnk commented Jan 24, 2018

MarcStoecker commented Jan 24, 2018

bokmadsen left a comment

Choose a reason for hiding this comment

leekelleher commented Jan 18, 2018 •

edited by nul800sebastiaan

Loading

leekelleher Jan 19, 2018 •

edited

Loading

leekelleher commented Jan 24, 2018 •

edited

Loading