Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please limit index rebuilds to standard indexes #211

Closed
jemayn opened this issue Mar 5, 2024 · 5 comments
Closed

Please limit index rebuilds to standard indexes #211

jemayn opened this issue Mar 5, 2024 · 5 comments
Labels
release/13.1.0 state/sprint-candidate We're trying to get this in a sprint at HQ in the next few weeks type/feature

Comments

@jemayn
Copy link

jemayn commented Mar 5, 2024

I've found that on a partial restore the environment being restored to triggers this code:
https://github.com/umbraco/Umbraco-CMS/blob/contrib/src/Umbraco.Infrastructure/Suspendable.cs#L98

It may be the same for standard restores and transfers, I've only tested it for partial restores.

The problem we are facing is that we have a custom index which pulls external data and indexes it. This operation runs once a night and takes about 20 mins.

However the deploy partial transfer code causes Umbraco to rebuild all indexes, and this includes our custom index, causing downtime in our search while the index is built.

It would be great if you could instead of calling a method to rebuild all indexes limit it to rebuild the built-in Umbraco indexes so our custom index can be skipped.

@AndyButland AndyButland added type/feature state/sprint-candidate We're trying to get this in a sprint at HQ in the next few weeks labels Mar 8, 2024
@AndyButland
Copy link

We've added some configuration to allow control over this now @jemayn, that you'll find in the upcoming 13.1 release (due out on Tuesday). You can read about the configuration details in this pending PR for the documentation.

Once published, it will be available here.

@jemayn
Copy link
Author

jemayn commented Mar 19, 2024

Hey @AndyButland

Thanks for the swift action! I just have a few questions about the implications of this:

  1. Why was it nessecary to do a full rebuild of all indexes on these operations?
  2. If we disable it, will the externalindex not have the updates from the, for example, partial restore?
    I would have assumed that as it adds the changes and publishes them the regular index events that run on cacherefreshing would catch the changes and reindex the relevant nodes?

In short these new options seems a bit nuclear, and I want to make sure I understand their implications before applying them..
Surely there is a reason this is not the default behaviour, but I can't figure out why 🙂

@ronaldbarendse
Copy link

Hi @jemayn, let me try to answer your questions:

  1. By default background tasks are suspended by Deploy to avoid deadlocks/timeouts and improve performance during deployments. This is fine for scheduled publishing, as that runs each minute and will catch up on any missed publish dates after the suspension is resumed again. However, when the Examine indexers are suspended, all updates are discarded and the CMS only tracks whether an update (to any index) was tried and if so, triggers a rebuild of all indexes to ensure they are up-to-date again. So quite frankly, it's not necessary and will actually hurt performance if you have large indexes (or additional ones like in your case).
  2. By disabling the Examine suspension, any changes should indeed be (re)indexed via the normal cache refresher notifications that get triggered after the deployment operation completes. Because Deploy saves and publishes multiple content items in a single scope/transaction (and sometimes needs to save the same item multiple times), processing all these individual updates might take more time than a complete index rebuild and lead to more index fragmentation. However, starting with CMS v12, these updates are batched together and (for some types also) deduplicated (see PR Support publishing multiple notifications, filter on handler type and add IDistributedCacheNotificationHandler<TNotification> (optimizes cache refreshers) Umbraco-CMS#14332), so these downsides should be greatly reduced now.

We've opted for making this configurable in Deploy, because it doesn't require additional changes in the CMS (e.g. to track which indexes require a rebuild due to discarded updates), gives users more control over the deployment operations and potentially allows a smooth transition to removing some of these suspensions by default in the future...

Setting all suspensions to None should make Deploy behave similarly as doing e.g. a 'Publish with descendants' in the CMS, as that won't suspend anything and still require (re)indexing potentially a lot of content nodes in a single operation. I do recommend testing both the performance and whether the indexes are correctly updated after applying this setting though (and please report your findings here) 😄

@AndyButland
Copy link

Just to add one point to what @ronaldbarendse says... if you just want to resolve the issue with your Examine index, but stick as close as possible to Deploy's default processing and just omit the suspension and resumption of Examine indexing on partial restores, you can use the following config:

  "Suspensions": {
    "PartialRestore": "ScheduledPublishing, DocumentCache"
  }

@jemayn
Copy link
Author

jemayn commented Mar 21, 2024

Hey Ronald and Andy,

Thank you so much for the great explanation! Will do some testing with the new settings as soon as I get some time back on the affected project. Will make sure to respond back if I encounter any problems, but it sounds like a good way around my issue 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release/13.1.0 state/sprint-candidate We're trying to get this in a sprint at HQ in the next few weeks type/feature
Projects
None yet
Development

No branches or pull requests

3 participants