Examine indexes not written to temp folder in Azure #15783

Open
mortenbock opened this issue Feb 27, 2024 · 20 comments

@mortenbock
Contributor

Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)

12.0.1

Bug summary

We're deploying to a scaled out publisher/subscriber setup on Azure, and we're seeing issues with corrupted Examine indexes.

Specifics

We're setting this value for the publisher role, which has a single instance:

Umbraco:CMS:Examine:LuceneDirectoryFactory=SyncedTempFileSystemDirectoryFactory
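
For reference, the same setting expressed in appsettings.json might look like the sketch below. It assumes the usual .NET configuration convention where colon-separated keys map to nested JSON sections; the issue does not say which configuration source is actually used on the publisher.

```json
{
  "Umbraco": {
    "CMS": {
      "Examine": {
        "LuceneDirectoryFactory": "SyncedTempFileSystemDirectoryFactory"
      }
    }
  }
}
```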

Looking at the LuceneIndexFolder value for the index (see screenshot), it looks like it should be in c:\local\temp\examineindexes\6ea291505a94cafbad209a90085b57bf\externalindex. However, when navigating to c:\local\temp through Kudu, there are no Examine-related folders there.

The indexes DO however get created in the C:\home\site\wwwroot\umbraco\Data\TEMP\ExamineIndexes folder, and we are seeing exceptions like this when editors are working in the backoffice:

invalid deletion count: 3 vs docCount=1 (resource: BufferedChecksumIndexInput(SimpleFSIndexInput(path="C:\home\site\wwwroot\umbraco\Data\TEMP\ExamineIndexes\ExternalIndex\segments_14j")))

Steps to reproduce

[Screenshot]

Expected result / actual result

Should Umbraco not be writing indexes to the c:\local\temp folder, and then syncing back to the other folder?


Hi there @mortenbock!

Firstly, a big thank you for raising this issue. Every piece of feedback we receive helps us to make Umbraco better.

We really appreciate your patience while we wait for our team to have a look at this but we wanted to let you know that we see this and share with you the plan for what comes next.

  • We'll assess whether this issue relates to something that has already been fixed in a later version of the release that it has been raised for.
  • If it's a bug, is it related to a release that we are actively supporting or is it related to a release that's in the end-of-life or security-only phase?
  • We'll replicate the issue to ensure that the problem is as described.
  • We'll decide whether the behavior is an issue or if the behavior is intended.

We wish we could work with everyone directly and assess your issue immediately but we're in the fortunate position of having lots of contributions to work with and only a few humans who are able to do it. We are making progress though and in the meantime, we will keep you in the loop and let you know when we have any questions.

Thanks, from your friendly Umbraco GitHub bot 🤖 🙂

@bielu
Contributor

bielu commented Feb 27, 2024

@mortenbock it is not possible to see the index in the temp folder through Kudu: https://github.com/projectkudu/kudu/wiki/Understanding-the-Azure-App-Service-file-system
There are ways to disable the isolation, but they may or may not work in Azure, depending on Azure's mood :)

@mortenbock
Contributor Author

@bielu Ok, I guess it makes sense that they are not showing in the Kudu portal. I don't understand why I'm seeing exceptions for writing to the C:\home\site\wwwroot\umbraco\Data\TEMP\ExamineIndexes index folder though.

@bielu
Contributor

bielu commented Feb 27, 2024

Is this exception on the subscriber or the publisher?
Subscribers should use the temp directory configuration, while the publisher should use the synced temp one :)

Just noticed you set it up this way on the publisher, so I'm not sure then.

@mortenbock
Contributor Author

What is the reason for using the SyncedTempFileSystemDirectoryFactory on the publisher instance? If the indexes from the local temp folder are synced to the wwwroot temp folder, then when are they consumed from the wwwroot temp folder?

Is it just a way to reduce startup time if the azure instance is destroyed and the service starts on a new instance?

@bielu
Contributor

bielu commented Feb 28, 2024

@mortenbock I don't remember exactly, but I think it is related to stability, e.g. in case the app restarts while writing.

@nul800sebastiaan
Member

Just wanted to give some background info.

I don't understand why I'm seeing exceptions for writing to the C:\home\site\wwwroot\umbraco\Data\TEMP\ExamineIndexes index folder though.

Synced is in the name, and is defined as:

The index will operate on a local index created in the processes %temp% location and will replicate back to main storage in

And why this is needed: https://docs.umbraco.com/umbraco-cms/fundamentals/setup/server-setup/load-balancing/file-system-replication#examine-directory-factory-options

This setting is needed because Lucene has issues when working from a remote file share so the files need to be read/accessed locally. Any time the index is updated, this setting will ensure that both the locally created indexes and the normal indexes are written to. This will ensure that when the app is restarted or the local environment temp files are cleared out that the index files can be restored from the centrally stored index files.

@mortenbock
Contributor Author

@nul800sebastiaan thank you for the update.

This should mean, that if I'm ok with rebuilding indexes on restarts, then I could just not use the Sync version of the directory factory on my publisher instance?

We're currently using deployment slots, so in any case, what is in the main storage would be outdated when we deploy, and would need rebuilding anyway.

@bielu
Contributor

bielu commented Feb 29, 2024

@mortenbock you shouldn't use deployment slots on the publisher instance; it is basically load balancing.

@mortenbock
Contributor Author

@bielu We're moving away from it, but it's where we are now. We stop the staging slot when not deploying, so there will only be two instances for a short time, which would also be the case when instances are moved around internally by Azure.

@bielu
Contributor

bielu commented Feb 29, 2024

@mortenbock which is enough to cause issues with Examine (I have sadly experienced it)

@mortenbock
Contributor Author

@bielu Yes. The current strategy is to delete any existing indexes when deploying, and rebuild them.

@nul800sebastiaan
Member

This should mean, that if I'm ok with rebuilding indexes on restarts, then I could just not use the Sync version of the directory factory on my publisher instance?

I am actually not expert enough to give a good answer to that, I think I'd be afraid that some of my instances would be marked as the subscriber and not generate indexes at all. I'm sure that can be fixed by forcing it to get the publisher role though.

@mortenbock
Contributor Author

mortenbock commented Feb 29, 2024

@nul800sebastiaan We have explicit role selection configured, so the subscribers should never be able to become publishers.

Subscribers already use the non sync temp directory factory, as per the recommended load balancing docs, so they should be fine.
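
A sketch of what the subscriber-side setting could look like in appsettings.json, assuming the same configuration key as on the publisher and the TempFileSystemDirectoryFactory value that is named later in this thread. The exact configuration source used on the subscribers is not stated here.

```json
{
  "Umbraco": {
    "CMS": {
      "Examine": {
        "LuceneDirectoryFactory": "TempFileSystemDirectoryFactory"
      }
    }
  }
}
```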

This is Azure, so the subscribers are on a separate app service, and do not share the file system with the publisher.

@kevinsteffer

I may have some discoveries that could lead you in the right direction regarding not being able to see the Examine index files and folders on your frontend instances (subscribers).

When you log in to Kudu, the Kudu app itself doesn't share the same file system for C:\local\temp as the Web App does.
That's why you can log in to Kudu and not see any ExamineIndexes at the location of the %TEMP% environment variable, which right now is C:\local\temp. I wrote an ApiController endpoint that sends me the location of %TEMP% on the frontend instances and lists its contents, and through that endpoint it returns C:\local\Temp\ExamineIndexes and all the index folders and files.

Reference: https://learn.microsoft.com/en-us/azure/app-service/operating-system-functionality#file-access-across-multiple-instances and you can read more about file systems in Web Apps here https://github.com/projectkudu/kudu/wiki/Understanding-the-Azure-App-Service-file-system

For the Examine indexes, with load-balanced subscribers and a single-instance BackOffice app, I managed to get it working fine with SyncedTempFileSystemDirectoryFactory on the BackOffice app and TempFileSystemDirectoryFactory on the frontend app (subscribers).
I run with explicit server roles through some custom code that is controlled by an appsetting configuration.

But since I've disabled the BackOffice and its endpoints on the frontend app, I had no view of when the indexes were being rebuilt. That's what my ApiController endpoint can tell me.

@p-m-j
Contributor

p-m-j commented Apr 19, 2024

I think I'd be afraid that some of my instances would be marked as the subscriber and not generate indexes at all.

@nul800sebastiaan if the Azure load balancing guidance is being followed for frontend servers (i.e. use TempFileSystemDirectoryFactory), each instance will always have its own copy of the Lucene index, and those indexes will live at %TEMP%/ExamineIndexes/{Some Unique Id}, which is not shared between instances.

This setup is required as the default path /home/site/wwwroot/umbraco/Data/TEMP/ExamineIndexes IS shared between all instances of an app service, and Lucene locking doesn't play nice on that shared file system.

On the admin server, which according to the guidance should be single instance only, it is safe to sync from /home/site/wwwroot as no other process should be able to write there.

This should mean, that if I'm ok with rebuilding indexes on restarts, then I could just not use the Sync version of the directory factory on my publisher instance?

@mortenbock Yes it's a performance optimization to load the synced copy rather than rebuilding the index from scratch, if it's OK to rebuild for frontend instances it is OK for the backoffice instance also.

you shouldn't use deployment slots on the publisher instance; it is basically load balancing

@bielu In theory this should be completely fine to do, as the new slot should wait to acquire main dom status from the old slot before any Lucene-related stuff takes place.

It's probably pragmatic advice if you have a standalone admin server that is only used for backoffice traffic, but it's perfectly reasonable to send public/anon traffic to the admin server as well as any additional subscriber servers, in which case, if you still want zero-downtime deployments, you have to do something, and a deployment slot sounds like a reasonable approach.

@bielu
Contributor

bielu commented Apr 19, 2024

@bielu In theory this should be completely fine to do, as the new slot should wait to acquire main dom status from the old slot before any Lucene-related stuff takes place.
It's probably pragmatic advice if you have a standalone admin server that is only used for backoffice traffic, but it's perfectly reasonable to send public/anon traffic to the admin server as well as any additional subscriber servers, in which case, if you still want zero-downtime deployments, you have to do something, and a deployment slot sounds like a reasonable approach.

My experience tells the opposite :) But yes, in theory it would be fine, and this fixes it:
#15571
So in theory it should all be working now ;p

@Shazwazza
Contributor

Shazwazza commented Apr 26, 2024

Just wanted to give some background info.

I don't understand why I'm seeing exceptions for writing to the C:\home\site\wwwroot\umbraco\Data\TEMP\ExamineIndexes index folder though.

Synced is in the name, and is defined as:

The index will operate on a local index created in the processes %temp% location and will replicate back to main storage in

And why this is needed: https://docs.umbraco.com/umbraco-cms/fundamentals/setup/server-setup/load-balancing/file-system-replication#examine-directory-factory-options

This setting is needed because Lucene has issues when working from a remote file share so the files need to be read/accessed locally. Any time the index is updated, this setting will ensure that both the locally created indexes and the normal indexes are written to. This will ensure that when the app is restarted or the local environment temp files are cleared out that the index files can be restored from the centrally stored index files.

All of this is discussed in depth in my CG talk a couple years ago https://youtu.be/qXKGVjTlEOk?si=uq7UQ9J5Ka4lTp-j

The 'Synced' directory is there to avoid the performance implications of rebuilding indexes on startup when a site is moved to another worker in Azure. That is the only reason it exists, and it can only be used on your Primary node. If you aren't load balancing and don't care about this performance hit, then change it to Temp (local only storage). If you think that there isn't any overhead, think about this: If you scale out to +5 nodes in a load balancing setup, that means that 5x nodes will be performing index rebuilds around the same time, this means that your DB is going to get pummeled by queries to build all of those new indexes. The performance hit isn't the index building - it is the DB queries and this can lead to DB locks and lead to the dreaded SQL Lock Timeout issue in the back office. Plus, if search is critical to your front-end, then for a while after your site has started up, there won't be any index, which means there won't be any search until the background processing is done. Many of these reasons are why ExamineX was created.

@mortenbock
Contributor Author

If you think that there isn't any overhead, think about this: If you scale out to +5 nodes in a load balancing setup, that means that 5x nodes will be performing index rebuilds around the same time, this means that your DB is going to get pummeled by queries to build all of those new indexes.

The official guidance is to use the TempEnv file system for replicas, so the scale out issue will be there whether or not we use the Synced provider on the primary server.

PS: The youtube link sent me to an ad, but I did manage to find the talk :D

@Shazwazza
Contributor

Yep that's absolutely right, scaling out will always have that issue, or if your replicas are ever moved between workers. I'll check the link (dang youtube!)
