Facet and Stat Refactor #94

Merged
mdavis95 merged 25 commits into main from new_facets
Mar 16, 2023
Conversation

mdavis95
Contributor

@mdavis95 mdavis95 commented Mar 9, 2023

Closes #94

  • Improve facet and facet stat performance

    • Some tests show 30-40x improvement in large indexes with many facets
  • Store facets in a new binary doc values field that allows high performance when the index has many facet fields that are not used in every query

    • Fall back to old facet handling in existing indexes (but at a high performance penalty)
    • The format is [dimOrdinal dimOrdinalLength ordinalVal0 .. ordinalValN]*, which allows the ordinal values of facet dimensions that are not requested to be skipped without converting them from bytes to integers
  • Unify count request, stat, and stat facet logic into an aggregation handler

    • OOP cleanup of facet and stat logic
    • Change facet indexing and searching to not use the Lucene facet config helper class
  • Use koloboke hash maps for high performance faceting

  • Bump the cache size to 32 million for facet indexing (it was unlimited in Lucene 9.4 and defaults to 4k in Lucene 9.5)

  • Avoid unnecessary conversion of Mongo documents to and from bytes by using a document container class

  • Optimize ResultHelper by replacing the string split with an indexOf/substring combination

    • Add more test cases for Result Helper
  • Bump the JUnit version from 5.5.2 to 5.9.2

  • Default zuliarestore to skip associated files, with an option to skip, skip existing, or overwrite

  • remove mongo database between test runs to ensure test stability

  • make zuliadump use createDirectories instead of createDirectory to create recursively
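
To illustrate the ResultHelper item above, here is a minimal sketch of walking a dotted field path with indexOf/substring instead of String.split, which avoids allocating an array of path segments on every lookup. The class name `DottedPathSketch`, the method `getValue`, and the nested-map document shape are all hypothetical. This is not the project's ResultHelper code, only one way the technique might look.

```java
import java.util.Map;

class DottedPathSketch {

    // Hypothetical helper: resolve a dotted path such as "a.b.c" against nested maps
    // without calling String.split (no intermediate String[] is created).
    static Object getValue(Map<String, Object> doc, String path) {
        Object current = doc;
        int start = 0;
        while (current instanceof Map) {
            int dot = path.indexOf('.', start);
            // Take the segment between the previous dot and the next one (or the end of the path).
            String key = (dot < 0) ? path.substring(start) : path.substring(start, dot);
            current = ((Map<?, ?>) current).get(key);
            if (dot < 0) {
                return current; // last segment reached
            }
            start = dot + 1;
        }
        return null; // the path continues but the current value is not a map
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("title", Map.of("main", Map.of("text", "hello")));
        System.out.println(getValue(doc, "title.main.text"));    // hello
        System.out.println(getValue(doc, "title.missing.text")); // null
    }
}
```

The loop does one indexOf scan and one substring per path segment, so a lookup that ends early never touches the remaining segments.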

…n early testing 30x faster

switch to bytebuffer based faceting approach
refactor packages inside of search
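
A rough sketch of how the ByteBuffer-based format from the description, [dimOrdinal dimOrdinalLength ordinalVal0 .. ordinalValN]*, lets a reader skip dimensions that were not requested. It assumes fixed-width 4-byte integers and that dimOrdinalLength is the number of ordinals that follow. The PR does not state the actual encoding, and the class and method names here are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class FacetBlockSketch {

    // Encode one document's facets as [dimOrdinal, length, ordinal0..ordinalN]*.
    // Each input row holds the dimension ordinal first, then its facet ordinals.
    // Assumption: every value is a fixed-width 4-byte int.
    static byte[] encode(int[][] dims) {
        int ints = 0;
        for (int[] d : dims) {
            ints += d.length + 1; // dimension ordinal + length + (d.length - 1) ordinals
        }
        ByteBuffer buf = ByteBuffer.allocate(ints * Integer.BYTES);
        for (int[] d : dims) {
            buf.putInt(d[0]);
            buf.putInt(d.length - 1);
            for (int i = 1; i < d.length; i++) {
                buf.putInt(d[i]);
            }
        }
        return buf.array();
    }

    // Collect ordinals only for the requested dimensions. For any other dimension the
    // buffer position is advanced past its ordinals without decoding them.
    static List<Integer> readRequested(byte[] bytes, Set<Integer> requestedDims) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        List<Integer> ordinals = new ArrayList<>();
        while (buf.hasRemaining()) {
            int dim = buf.getInt();
            int length = buf.getInt();
            if (requestedDims.contains(dim)) {
                for (int i = 0; i < length; i++) {
                    ordinals.add(buf.getInt());
                }
            } else {
                buf.position(buf.position() + length * Integer.BYTES); // skip the raw bytes
            }
        }
        return ordinals;
    }

    public static void main(String[] args) {
        byte[] doc = encode(new int[][] { {0, 11, 12}, {1, 21}, {2, 31, 32, 33} });
        System.out.println(readRequested(doc, Set.of(2))); // [31, 32, 33]
    }
}
```

Because each block carries its own length, a query that requests one dimension in an index with many facet fields decodes only two integers per skipped dimension.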
@mdavis95 mdavis95 merged commit 2d4a82b into main Mar 16, 2023
@mdavis95 mdavis95 deleted the new_facets branch March 16, 2023 14:28