Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core): Optimize search index update queries #2808

Merged
merged 9 commits into from
May 8, 2024

Conversation

carathorys
Copy link
Contributor

@carathorys carathorys commented Apr 24, 2024

Description

The current procedure of the Search Index update is loading all the product variants, all their products, collections, facetValues, and facet relations.
This data would be huge by definition, but these data usually have translations as well.
In my example, it was loading 40 million rows for ~4900 products on 4 channels, and 4 languages.
Of course, I was trying to reindex only 1 channel with all the products, but after a while, and 16GB of RAM, the node process exited with a memory allocation error message.

What I've changed is the following: On any search index update I'm

  • loading only those relations, and/or translations, which are really necessary (turned off the eager relation loading)
  • this means that we're loading only those translations for a channel update, which are defined for the given channel

Breaking changes

  • I'm not aware of any

Checklist

📌 Always:

  • I have set a clear title
  • My PR is small and contains a single feature
  • I have checked my own PR

👍 Most of the time:

  • I have added or updated test cases
  • I have updated the README if needed

Copy link

netlify bot commented Apr 24, 2024

Deploy Preview for effervescent-donut-4977b2 canceled.

Name Link
🔨 Latest commit a364c57
🔍 Latest deploy log https://app.netlify.com/sites/effervescent-donut-4977b2/deploys/6634c799692f310008173ccf

@carathorys carathorys force-pushed the master branch 3 times, most recently from 5c57fde to 0dd8ec3 Compare April 25, 2024 03:00
Signed-off-by: carathorys <gallayb@gmail.com>
@michaelbromley
Copy link
Member

This approach is good. The reason I think this has not come up until now is that I suspect most projects working with very large data sets are using the ElasticsearchPlugin or some other search integration which is more memory-efficient when building the index.

@carathorys
Copy link
Contributor Author

@michaelbromley The main issue I think is with typeorm:
When you try to load a one-to-many relation for a product (like translations), you'll receive as many rows as many translations you have.
I also noticed that the count() results in a good number of products, but since we had many translations, many facetValue, and collection relations for all the products, instead of the 4900 rows I've ended up with a query which results that 20-40-80 million rows result.
Once it was fetched, TypeORM merged those objects (or at least it tried it).
We also noticed that there is some kind of error there, so we used a local patch to reduce the batch size to 100, and it worked, when we had only one channel.
Now we're trying to introduce more channels, with more languages, and that's why we faced with this issue.

@carathorys
Copy link
Contributor Author

@michaelbromley I think I've fixed the E2E tests, but I've seen some strange behaviors for the translations, I hope I fixed it, and didn't made any mistakes:
Now, since we have an availableLanguageCode on the channel, only those translations will be added to any channel index, which are available for that particular channel.
Also, I made the 'fallback' language to come from the this.configService.defaultLanguageCode, but it might still fail to re-index, or update the search index, if we don't have a proper setup, eg.:
Default language would be en, the available channel language would be de, and we have translation only for en_GB, the system will fail, because I'm not loading the en_GB translations for the channel, where it is not allowed to use.
I hope this will work for everyone! :)

@carathorys
Copy link
Contributor Author

I've updated the E2E tests where it was necessary, and picked the remaining checkboxes (I'm not sure where to update the readme, if it is needed at all)

@carathorys carathorys marked this pull request as ready for review May 2, 2024 13:48
@michaelbromley
Copy link
Member

Hi!
Thanks for your effort here 👍
Looks like there is a conflict in package-lock.json - could you sort this out so that the CI can run?

@carathorys
Copy link
Contributor Author

Done!
Accidentally I've added, and commited @vitest/ui package (it was easier to track the E2E test results), now I've removed, and updated the package-lock.json from the current master.

@michaelbromley michaelbromley merged commit e83dfc6 into vendure-ecommerce:master May 8, 2024
16 checks passed
@michaelbromley
Copy link
Member

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants