Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExamineIndexRebuilder throws errors sometimes #13804

Open
Zeegaan opened this issue Feb 8, 2023 · 8 comments
Open

ExamineIndexRebuilder throws errors sometimes #13804

Zeegaan opened this issue Feb 8, 2023 · 8 comments
Labels

Comments

@Zeegaan
Copy link
Member

Zeegaan commented Feb 8, 2023

Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)

10+

Bug summary

This happens because we try to use scope in background threads, so the scope can sometimes not be the ambient scope when we try and dispose it.

Specifics

Line 93 in ExamineIndexRebuilder.cs

_backgroundTaskQueue.QueueBackgroundWorkItem(
                cancellationToken =>
                {
                    // This is a fire/forget task spawned by the background thread queue (which means we
                    // don't need to worry about ExecutionContext flowing).
                    Task.Run(() => RebuildIndexes(onlyEmptyIndexes, delay.Value, cancellationToken));

                    // immediately return so the queue isn't waiting.
                    return Task.CompletedTask;
                });

This ends up using ContentIndexPopulator which in turn uses the IContentService to GetPagedDescendants which creates a scope on line 789.

Steps to reproduce

This was discovered during hardcore testing, but should be able to reproduce to force index rebuilding, by saving lots of content simultaneously.

Expected result / actual result

To not get errors when the examine index is rebuilding.


This item has been added to our backlog AB#26699

@devantler
Copy link
Contributor

@Zeegaan Added the stack error from which we located the issue.

[10:20:02 ERR] Index populating failed for populator Umbraco.Cms.Infrastructure.Examine.PublishedContentIndexPopulator
System.InvalidOperationException: The Scope 86df2890-8624-4b7c-bf60-4ec2bf27b415 being disposed is not the Ambient Scope 01a42a5d-b7aa-4084-a4b5-2d930a76ff2e. This typically indicates that a child Scope was not disposed, or flowed to a child thread that was not awaited, or concurrent threads are accessing the same Scope (Ambient context) which is not supported. If using Task.Run (or similar) as a fire and forget tasks or to run threads in parallel you must suppress execution context flow with ExecutionContext.SuppressFlow() and ExecutionContext.RestoreFlow().
   at Umbraco.Cms.Infrastructure.Scoping.Scope.Dispose()
   at Umbraco.Cms.Infrastructure.Examine.ContentValueSetBuilder.GetValueSets(IContent[] content)
   at Umbraco.Cms.Infrastructure.Examine.ContentIndexPopulator.IndexPublishedContent(Int32 contentParentId, Int32 pageIndex, Int32 pageSize, IReadOnlyList`1 indexes)
   at Umbraco.Cms.Infrastructure.Examine.ContentIndexPopulator.PopulateIndexes(IReadOnlyList`1 indexes)
   at Umbraco.Cms.Infrastructure.Examine.IndexPopulator.Populate(IIndex[] indexes)
   at Umbraco.Cms.Infrastructure.Examine.ExamineIndexRebuilder.RebuildIndexes(Boolean onlyEmptyIndexes, TimeSpan delay, CancellationToken cancellationToken)
[10:20:02 ERR] Error occurred executing workItem.
System.InvalidOperationException: The Scope f3b51e5b-c1c9-418e-979f-a4856b7733f0 being disposed is not the Ambient Scope 01a42a5d-b7aa-4084-a4b5-2d930a76ff2e. This typically indicates that a child Scope was not disposed, or flowed to a child thread that was not awaited, or concurrent threads are accessing the same Scope (Ambient context) which is not supported. If using Task.Run (or similar) as a fire and forget tasks or to run threads in parallel you must suppress execution context flow with ExecutionContext.SuppressFlow() and ExecutionContext.RestoreFlow().
   at Umbraco.Cms.Infrastructure.Scoping.Scope.Dispose()
   at Umbraco.Cms.Infrastructure.Examine.ExamineUmbracoIndexingHandler.DeferedReIndexForContent.<>c__DisplayClass6_0.<Execute>b__0(CancellationToken cancellationToken)
   at Umbraco.Cms.Infrastructure.HostedServices.QueuedHostedService.BackgroundProcessing(CancellationToken stoppingToken)

@elit0451 elit0451 added state/sprint-candidate We're trying to get this in a sprint at HQ in the next few weeks state/reproduced area/backend affected/v10 affected/v11 labels Feb 14, 2023
@matteus-hemstrom-nltg
Copy link

We are experiencing locking issues and suspect that this might be one of the causes. When the exception above is thrown, it appears to prevent the scope from being disposed properly. As a result, this leaves a read lock on -333. All following attempts to save and publish fails to acquire write lock for -333 (so this may be related to #14195). When this happens, we need to stop the faulting Umbraco instance so that the lock is released.

Note our setup: one back office instance and multiple API/subscriber instances. The issue seems to occur through the following steps:

  1. Have back office running
  2. Start an API/subscriber instance
  3. On the subscriber instance: "Queuing background job for RebuildIndexes."
  4. Save & Publish documents (while RebuildIndexes occur) using the back office
  5. On the subscriber instance (sometimes): "The Scope X being disposed is not the Ambient Scope Y"
  6. Save & Publish fails on back office instance until subscriber is stopped

Do you think this might be the issue? And do you have any suggestions/workarounds? Is it possible to turn off Examine indexing on subscriber instances?

@bergmania
Copy link
Member

bergmania commented Dec 5, 2023

Hi @matteus-hemstrom-nltg..

Thanks for reporting.. We are still searching for a potential fix, but we have a hard time reproducing.

Would you be able to test using the custom code, that basically just wraps the execution in ExecutionContext.SuppressFlow()) and ensures a ask.CompletedTask is returned.

The following code should replace the native ExamineIndexRebuilder with this custom one.

using Examine;
using Umbraco.Cms.Core;
using Umbraco.Cms.Core.Composing;
using Umbraco.Cms.Core.Runtime;
using Umbraco.Cms.Core.Services;
using Umbraco.Cms.Infrastructure.Examine;
using Umbraco.Cms.Infrastructure.HostedServices;

namespace WhatEver;

public class TestCodeComposer : IComposer
{
    public void Compose(IUmbracoBuilder builder)
    {
        builder.Services.AddUnique<IIndexRebuilder, MyExamineIndexRebuilder>();
        builder.Services.AddUnique<ExamineIndexRebuilder, MyExamineIndexRebuilder>();
    }
}

public class MyExamineIndexRebuilder : ExamineIndexRebuilder
{
    private readonly IBackgroundTaskQueue _backgroundTaskQueue;
    private readonly IExamineManager _examineManager;
    private readonly ILogger<ExamineIndexRebuilder> _logger;
    private readonly IMainDom _mainDom;
    private readonly IEnumerable<IIndexPopulator> _populators;
    private readonly object _rebuildLocker = new();
    private readonly IRuntimeState _runtimeState;

    /// <summary>
    ///     Initializes a new instance of the <see cref="ExamineIndexRebuilder" /> class.
    /// </summary>
    public MyExamineIndexRebuilder(
        IMainDom mainDom,
        IRuntimeState runtimeState,
        ILogger<ExamineIndexRebuilder> logger,
        IExamineManager examineManager,
        IEnumerable<IIndexPopulator> populators,
        IBackgroundTaskQueue backgroundTaskQueue)
        : base(mainDom, runtimeState, logger, examineManager, populators, backgroundTaskQueue)
    {
        _mainDom = mainDom;
        _runtimeState = runtimeState;
        _logger = logger;
        _examineManager = examineManager;
        _populators = populators;
        _backgroundTaskQueue = backgroundTaskQueue;
    }

    public bool CanRebuild(string indexName)
    {
        if (!_examineManager.TryGetIndex(indexName, out IIndex index))
        {
            throw new InvalidOperationException("No index found by name " + indexName);
        }

        return _populators.Any(x => x.IsRegistered(index));
    }

    public override void RebuildIndex(string indexName, TimeSpan? delay = null, bool useBackgroundThread = true)
    {
        if (delay == null)
        {
            delay = TimeSpan.Zero;
        }

        if (!CanRun())
        {
            return;
        }

        if (useBackgroundThread)
        {
            _logger.LogInformation("Starting async background thread for rebuilding index {indexName}.", indexName);

            _backgroundTaskQueue.QueueBackgroundWorkItem(
                cancellationToken =>
                {
                    // Do not flow AsyncLocal to the child thread
                    using (ExecutionContext.SuppressFlow())
                    {
                        Task.Run(() => RebuildIndex(indexName, delay.Value, cancellationToken));

                        // immediately return so the queue isn't waiting.
                        return Task.CompletedTask;
                    }
                });
        }
        else
        {
            RebuildIndex(indexName, delay.Value, CancellationToken.None);
        }
    }

    public override void RebuildIndexes(bool onlyEmptyIndexes, TimeSpan? delay = null, bool useBackgroundThread = true)
    {
        if (delay == null)
        {
            delay = TimeSpan.Zero;
        }

        if (!CanRun())
        {
            return;
        }

        if (useBackgroundThread)
        {
            _logger.LogDebug($"Queuing background job for {nameof(RebuildIndexes)}.");

            _backgroundTaskQueue.QueueBackgroundWorkItem(
                cancellationToken =>
                {
                    // Do not flow AsyncLocal to the child thread
                    using (ExecutionContext.SuppressFlow())
                    {
                        Task.Run(() => RebuildIndexes(onlyEmptyIndexes, delay.Value, cancellationToken));

                        // immediately return so the queue isn't waiting.
                        return Task.CompletedTask;
                    }
                });
        }
        else
        {
            RebuildIndexes(onlyEmptyIndexes, delay.Value, CancellationToken.None);
        }
    }

    private bool CanRun() => _mainDom.IsMainDom && _runtimeState.Level == RuntimeLevel.Run;

    private void RebuildIndex(string indexName, TimeSpan delay, CancellationToken cancellationToken)
    {
        if (delay > TimeSpan.Zero)
        {
            Thread.Sleep(delay);
        }

        try
        {
            if (!Monitor.TryEnter(_rebuildLocker))
            {
                _logger.LogWarning(
                    "Call was made to RebuildIndexes but the task runner for rebuilding is already running");
            }
            else
            {
                if (!_examineManager.TryGetIndex(indexName, out IIndex index))
                {
                    throw new InvalidOperationException($"No index found with name {indexName}");
                }

                index.CreateIndex(); // clear the index
                foreach (IIndexPopulator populator in _populators)
                {
                    if (cancellationToken.IsCancellationRequested)
                    {
                        return;
                    }

                    populator.Populate(index);
                }
            }
        }
        finally
        {
            if (Monitor.IsEntered(_rebuildLocker))
            {
                Monitor.Exit(_rebuildLocker);
            }
        }
    }

    private void RebuildIndexes(bool onlyEmptyIndexes, TimeSpan delay, CancellationToken cancellationToken)
    {
        if (delay > TimeSpan.Zero)
        {
            Thread.Sleep(delay);
        }

        try
        {
            if (!Monitor.TryEnter(_rebuildLocker))
            {
                _logger.LogWarning(
                    $"Call was made to {nameof(RebuildIndexes)} but the task runner for rebuilding is already running");
            }
            else
            {
                // If an index exists but it has zero docs we'll consider it empty and rebuild
                IIndex[] indexes = (onlyEmptyIndexes
                    ? _examineManager.Indexes.Where(x =>
                        !x.IndexExists() || (x is IIndexStats stats && stats.GetDocumentCount() == 0))
                    : _examineManager.Indexes).ToArray();

                if (indexes.Length == 0)
                {
                    return;
                }

                foreach (IIndex index in indexes)
                {
                    index.CreateIndex(); // clear the index
                }

                // run each populator over the indexes
                foreach (IIndexPopulator populator in _populators)
                {
                    if (cancellationToken.IsCancellationRequested)
                    {
                        return;
                    }

                    try
                    {
                        populator.Populate(indexes);
                    }
                    catch (Exception e)
                    {
                        _logger.LogError(e, "Index populating failed for populator {Populator}", populator.GetType());
                    }
                }
            }
        }
        finally
        {
            if (Monitor.IsEntered(_rebuildLocker))
            {
                Monitor.Exit(_rebuildLocker);
            }
        }
    }
}

@matteus-hemstrom-nltg
Copy link

Thanks! We'll give the examine indexer above a try.

I tried a build earlier with something similar, however I just removed the Task.Run so that the indexing executed synchronously in the background thread. That worked well, but I was only running it in a test environment.

@bergmania
Copy link
Member

Thanks.. At least that sounds like this area is the smoking gun 🙈

@matteus-hemstrom-nltg
Copy link

@bergmania

We are continuing to monitor the issue, but are very hopeful that the patch will solve it.

We've been running the patch in production for almost four days now. We have not had any permanent locks, so the patch seems very promising. However, I should reiterate what you pointed out earlier: this issue is hard to reproduce consistently. As such, it's possible that we've been lucky in not encountering the bug.

I should also mention that temporary lock failures (on -333) still occur. I believe these should be possible to mitigate by increasing DistributedLockingWriteLockDefaultTimeout, correct?

@bergmania
Copy link
Member

Correct.. The fact that -333 happens is kind of expected doing long running operations. But definitely no locks should be hanging.

@jerpenol
Copy link

Any updates on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants