Examine indexes not written to temp folder in Azure #15783
Comments
Hi there @mortenbock! Firstly, a big thank you for raising this issue. Every piece of feedback we receive helps us to make Umbraco better. We really appreciate your patience while we wait for our team to have a look at this but we wanted to let you know that we see this and share with you the plan for what comes next.
We wish we could work with everyone directly and assess your issue immediately but we're in the fortunate position of having lots of contributions to work with and only a few humans who are able to do it. We are making progress though and in the meantime, we will keep you in the loop and let you know when we have any questions. Thanks, from your friendly Umbraco GitHub bot 🤖 🙂 |
@mortenbock it is not possible to see the index in the temp folder through Kudu. https://github.com/projectkudu/kudu/wiki/Understanding-the-Azure-App-Service-file-system
@bielu Ok, I guess it makes sense that they are not showing in the Kudu portal. I don't understand why I'm seeing exceptions for writing to the |
What is the reason for using the Is it just a way to reduce startup time if the azure instance is destroyed and the service starts on a new instance? |
@mortenbock I don't remember, but I think it is related to stability, e.g. in case the app restarts while writing.
Just wanted to give some background info.
And why this is needed: https://docs.umbraco.com/umbraco-cms/fundamentals/setup/server-setup/load-balancing/file-system-replication#examine-directory-factory-options
@nul800sebastiaan thank you for the update. This should mean, that if I'm ok with rebuilding indexes on restarts, then I could just not use the Sync version of the directory factory on my publisher instance? We're currently using deployment slots, so in any case, what is in the main storage would be outdated when we deploy, and would need rebuilding anyway. |
@mortenbock you shouldn't use deployment slots on the publisher instance. It is basically load balancing.
@bielu We're moving away from it, but it's where we are now. We stop the staging slot when not deploying, so there will only be two instances for a short time, which would also be the case when instances are moved around internally by azure. |
@mortenbock which is enough to cause issues with Examine (sadly, I have experienced it).
@bielu Yes. Current strategy is to delete any existing indexes when deploying, and rebuilding them. |
I am actually not expert enough to give a good answer to that, I think I'd be afraid that some of my instances would be marked as the subscriber and not generate indexes at all. I'm sure that can be fixed by forcing it to get the publisher role though. |
@nul800sebastiaan We have explicit role selection configured, so the subscribers should never be able to become publishers. Subscribers already use the non sync temp directory factory, as per the recommended load balancing docs, so they should be fine. This is azure, so the subscribers are on a separate app service, and do not share the file system with the publisher. |
I may have some discoveries that could lead you in the right direction regarding not being able to see the Examine index files and folders on your frontend instances (Subscribers). When you log in to Kudu, the Kudu app itself doesn't share the same file system or C:\local\temp as the Web App does. Reference: https://learn.microsoft.com/en-us/azure/app-service/operating-system-functionality#file-access-across-multiple-instances and you can read more about file systems in Web Apps here: https://github.com/projectkudu/kudu/wiki/Understanding-the-Azure-App-Service-file-system As for Examine indexes in a load balanced setup with multiple Subscribers and a single-instance BackOffice app, I managed to have it working fine with SyncedTempFileSystemDirectoryFactory on the BackOffice app and TempFileSystemDirectoryFactory on the frontend app (Subscribers). But since I've disabled the BackOffice and its endpoint on the frontend app, I had no visual of the indexes being rebuilt and when. But that's what my ApiController endpoint can tell me.
@nul800sebastiaan if the azure load balancing guidance is being followed for frontend servers (i.e. use This setup is required as the default path /home/site/wwwroot/umbraco/Data/TEMP/ExamineIndexes IS shared between all instances of an app service and lucene locking doesn't play nice on that shared file system. On the admin server which according to the guidance should be single instance only it is safe to sync from /home/site/wwwroot as no other process should be able to write there.
@mortenbock Yes it's a performance optimization to load the synced copy rather than rebuilding the index from scratch, if it's OK to rebuild for frontend instances it is OK for the backoffice instance also.
@bielu In theory this should be completely fine to do, as the new slot should wait to acquire main dom status from the old slot before any Lucene related stuff takes place. It's probably pragmatic advice if you have a standalone admin server that is only used for backoffice traffic, but it's perfectly reasonable to send public/anon traffic to the admin server as well as any additional subscriber servers, in which case if you still want zero downtime deployments you have to do something, and a deployment slot sounds like a reasonable approach.
My experience tells me the opposite :) But yeah, in theory it would be fine. This fixes it:
All of this is discussed in depth in my CG talk a couple of years ago: https://youtu.be/qXKGVjTlEOk?si=uq7UQ9J5Ka4lTp-j The 'Synced' directory is there to avoid the performance implications of rebuilding indexes on startup when a site is moved to another worker in Azure. That is the only reason it exists, and it can only be used on your Primary node. If you aren't load balancing and don't care about this performance hit, then change it to Temp (local only storage). If you think that there isn't any overhead, think about this: if you scale out to +5 nodes in a load balancing setup, that means that 5 nodes will be performing index rebuilds around the same time, which means that your DB is going to get pummeled by queries to build all of those new indexes. The performance hit isn't the index building, it is the DB queries, and this can lead to DB locks and to the dreaded SQL Lock Timeout issue in the back office. Plus, if search is critical to your front-end, then for a while after your site has started up there won't be any index, which means there won't be any search until the background processing is done. Many of these reasons are why ExamineX was created.
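For reference, a minimal sketch of what "change it to Temp" could look like if the setting lives in appsettings.json rather than in Azure app settings. The key path mirrors the Umbraco:CMS:Examine:LuceneDirectoryFactory setting quoted in the issue, and TempFileSystemDirectoryFactory is the factory name mentioned earlier in this thread. Treat it as an illustration under those assumptions, not verified configuration for any specific Umbraco version:

```json
{
  "Umbraco": {
    "CMS": {
      "Examine": {
        "LuceneDirectoryFactory": "TempFileSystemDirectoryFactory"
      }
    }
  }
}
```

With this value the index would live only in local temp storage, so it gets rebuilt whenever the app starts on a fresh worker, which is the trade-off described above.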
The official guidance is to use the TempEnv file system for replicas, so the scale out issue will be there whether or not we use the Synced provider on the primary server. PS: The youtube link sent me to an ad, but I did manage to find the talk :D |
Yep that's absolutely right, scaling out will always have that issue, or if your replicas are ever moved between workers. I'll check the link (dang youtube!) |
Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)
12.0.1
Bug summary
We're deploying to a scaled out publisher/subscriber setup on Azure, and we're seeing issues with corrupted Examine indexes.
Specifics
We're setting this value for the publisher role, which has a single instance:
Umbraco:CMS:Examine:LuceneDirectoryFactory=SyncedTempFileSystemDirectoryFactory
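For readers who keep settings in appsettings.json, the colon-separated key above should correspond to nested JSON sections along these lines. This is a sketch that assumes standard ASP.NET Core configuration key mapping; the issue itself only shows the flat key form:

```json
{
  "Umbraco": {
    "CMS": {
      "Examine": {
        "LuceneDirectoryFactory": "SyncedTempFileSystemDirectoryFactory"
      }
    }
  }
}
```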
Looking at the LuceneIndexFolder value for the index (see screenshot), it looks like it should be in c:\local\temp\examineindexes\6ea291505a94cafbad209a90085b57bf\externalindex, however, when navigating to c:\local\temp through Kudu, there are no examine related folders there.
The indexes DO however get created in the C:\home\site\wwwroot\umbraco\Data\TEMP\ExamineIndexes folder, and we are seeing exceptions like this when editors are working in the backoffice:
Steps to reproduce
Expected result / actual result
Should Umbraco not be writing indexes to the c:\local\temp folder, and then syncing back to the other folder?