fix(core): Optimize search index update queries #2808

carathorys · 2024-04-24T15:59:23Z

Description

The current Search Index update procedure loads all product variants together with all of their products, collections, facetValues, and facet relations.
This data set is large to begin with, and these entities usually have translations as well, which multiplies it further.
In my case, it was loading 40 million rows for ~4900 products across 4 channels and 4 languages.
I was trying to reindex only 1 channel with all the products, but after a while, and 16GB of RAM, the node process exited with a memory allocation error.

What I've changed is the following. On any search index update, I'm:

loading only those relations and/or translations that are really necessary (eager relation loading is turned off)
this means that for a channel update we load only the translations that are defined for the given channel

Breaking changes

I'm not aware of any

Checklist

📌 Always:

I have set a clear title
My PR is small and contains a single feature
I have checked my own PR

👍 Most of the time:

I have added or updated test cases
I have updated the README if needed

netlify · 2024-04-24T15:59:41Z

✅ Deploy Preview for effervescent-donut-4977b2 canceled.

Name	Link
🔨 Latest commit	`a364c57`
🔍 Latest deploy log	https://app.netlify.com/sites/effervescent-donut-4977b2/deploys/6634c799692f310008173ccf

Signed-off-by: carathorys <gallayb@gmail.com>

michaelbromley · 2024-04-25T12:18:19Z

This approach is good. The reason I think this has not come up until now is that I suspect most projects working with very large data sets are using the ElasticsearchPlugin or some other search integration which is more memory-efficient when building the index.

carathorys · 2024-04-25T14:45:18Z

@michaelbromley I think the main issue is with TypeORM:
When you load a one-to-many relation for a product (like translations), you receive one row per related translation.
I also noticed that count() returns the correct number of products, but since every product had many translations, facetValue, and collection relations, instead of 4900 rows I ended up with a query returning 20, 40, or even 80 million rows.
Once the rows were fetched, TypeORM merged them back into objects (or at least tried to).
We had noticed earlier that something was wrong there, so we used a local patch to reduce the batch size to 100, which worked while we had only one channel.
Now we're trying to introduce more channels with more languages, and that's why we ran into this issue.
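
To illustrate the row multiplication described in this comment: when several one-to-many relations are joined in a single query, the number of result rows per product is roughly the product of the relation sizes, whereas loading each relation separately only adds them up. The sketch below is a rough estimate under assumed numbers; the per-product counts and the helper names `joinedRowCount` and `separateRowCount` are hypothetical, not measurements or code from this PR.

```typescript
// Hypothetical relation counts for a single product. These numbers are
// illustrative assumptions, not figures taken from the real data set.
interface RelationCounts {
    translations: number;
    facetValues: number;
    collections: number;
}

function sizes(counts: RelationCounts): number[] {
    return [counts.translations, counts.facetValues, counts.collections];
}

// Rows returned for one product when all one-to-many relations are joined in
// the same SELECT: every relation multiplies the row count. An empty relation
// still yields one row, as it would with a LEFT JOIN.
function joinedRowCount(counts: RelationCounts): number {
    return sizes(counts).reduce((rows, n) => rows * Math.max(n, 1), 1);
}

// Rows fetched when the product and each relation are loaded by separate
// queries: one row for the product plus one row per related record.
function separateRowCount(counts: RelationCounts): number {
    return 1 + sizes(counts).reduce((rows, n) => rows + n, 0);
}

const perProduct: RelationCounts = { translations: 4, facetValues: 20, collections: 10 };
const productCount = 4900;

console.log(joinedRowCount(perProduct) * productCount);   // 3920000
console.log(separateRowCount(perProduct) * productCount); // 171500
```

With nested translations on the related entities themselves (facet values and collections are translatable too), the joined figure grows by further factors, which is consistent with the tens of millions of rows reported above.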

carathorys · 2024-05-02T13:44:38Z

@michaelbromley I think I've fixed the E2E tests, but I saw some strange behavior with the translations. I hope I fixed it and didn't make any mistakes:
Now, since we have an availableLanguageCode on the channel, only the translations that are available for that particular channel are added to its index.
Also, I made the 'fallback' language come from this.configService.defaultLanguageCode, but a re-index or search index update might still fail if the setup is not consistent, e.g.:
If the default language is en, the channel's available language is de, and a translation exists only for en_GB, the system will fail, because I'm not loading the en_GB translations for a channel where that language is not allowed.
I hope this will work for everyone! :)
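
The language selection described in this comment can be sketched roughly as follows. This is a simplified illustration, not the code from the PR: the `Translation` shape and the helpers `languagesToLoad` and `translationsForChannel` are hypothetical, and the real entities and query logic are more involved.

```typescript
// Hypothetical, simplified shapes; the real entities carry more fields.
type LanguageCode = string;

interface Translation {
    languageCode: LanguageCode;
    name: string;
}

// Languages whose translations are loaded for a channel: the channel's
// available languages plus the globally configured default as a fallback.
function languagesToLoad(available: LanguageCode[], defaultLanguage: LanguageCode): LanguageCode[] {
    return available.indexOf(defaultLanguage) === -1
        ? [...available, defaultLanguage]
        : [...available];
}

// Keep only the translations written in one of the languages to load.
function translationsForChannel(
    all: Translation[],
    available: LanguageCode[],
    defaultLanguage: LanguageCode,
): Translation[] {
    const wanted = languagesToLoad(available, defaultLanguage);
    return all.filter(t => wanted.indexOf(t.languageCode) !== -1);
}

// The inconsistent setup from the comment: default language `en`, channel
// language `de`, and a translation that exists only for `en_GB`.
const loaded = translationsForChannel([{ languageCode: 'en_GB', name: 'Laptop' }], ['de'], 'en');
console.log(loaded.length); // 0, so there is no translation left to index
```

In this sketch the `en_GB` translation is filtered out because it matches neither the channel language nor the default, which corresponds to the failure case described above.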

carathorys · 2024-05-02T13:46:34Z

I've updated the E2E tests where necessary and ticked the remaining checkboxes (I'm not sure where to update the README, or whether it is needed at all).

michaelbromley · 2024-05-03T06:32:53Z

Hi!
Thanks for your effort here 👍
Looks like there is a conflict in package-lock.json - could you sort this out so that the CI can run?

carathorys · 2024-05-03T08:08:38Z

Done!
I had accidentally added and committed the @vitest/ui package (it made the E2E test results easier to track); I've now removed it and updated package-lock.json from the current master.

michaelbromley · 2024-05-08T06:28:06Z

Thank you!

carathorys force-pushed the master branch 3 times, most recently from 5c57fde to 0dd8ec3 Compare April 25, 2024 03:00

fix(core): Optimize search index update queries

98f25ea

Signed-off-by: carathorys <gallayb@gmail.com>

carathorys force-pushed the master branch from 0dd8ec3 to 98f25ea Compare April 25, 2024 03:02

fix(core): Filter collections for the current channel

b12d4d4

carathorys added 3 commits April 26, 2024 08:45

fix(core): Load default language translations

b7582f0

fix(core): Try to fit into the E2E tests

ffd85e1

fix(core): Fix codes to fit with E2E tests

a5d6e47

carathorys marked this pull request as ready for review May 2, 2024 13:48

carathorys added 2 commits May 3, 2024 10:04

chore: Remove extra @vitest/ui package added by a mistake

6d7fa90

chore: Merge remote-tracking branch 'vendure/master'

fa50c37

carathorys added 2 commits May 3, 2024 12:23

fix(core): Optimize loading of translations upon an update

0363641

fix(core): Fix queries for mariadb & mysql

a364c57

michaelbromley merged commit e83dfc6 into vendure-ecommerce:master May 8, 2024
16 checks passed
