Skip to content

A little optimization about BKDReader #14717

Open
@kkewwei

Description

@kkewwei

Description

1.Reduce the match times for BKD node whose relation=CELL_CROSSES_QUERY.

When performing range query, if the Relation is CELL_CROSSES_QUERY, we need to traverse each value in this leaf node and matches. A segment query typically involves two leaf nodes with Relation set to CELL_CROSSES_QUERY, leading to 512×2 matches.

In reality, the 512 values on each BKD leaf node are stored in a compressed format, such as "30 consecutive 1, 30 consecutive 2, 30 consecutive 3.....". Notably, matching for 30 consecutive 1 only requires 1 times, instead of 30. With this optimization, the BKD tree only needs a few matching rather than 512 per leaf.

For a shard with 25 segments, the original approach would involve 25×512×2 = 25,600 matching operations. Through the above optimization, however, a lot of reduction in matching can be achieved.

visitor.visit(scratchIterator, scratchPackedValue);

......
scratchIterator.reset(i, length);
 if (visitor.compare(scratchPackedValue, scratchPackedValue) == PointValues.Relation.CELL_INSIDE_QUERY) {
          visitor.visit(new IntsRef(scratchIterator.docIDs, i, length));
}
......

2.When calling PointValues.intersect(), if there is no matching node left, we should exit promptly.

For example, in the following case, only leaf node 7 matches. In reality, we would visit nodes from 1 to 13.

  • When visiting leaf node 7: relation = CELL_CROSSES_QUERY
  • When visiting leaf node 8: relation = CELL_OUTSIDE_QUERY

Once the relation changes to CELL_OUTSIDE_QUERY in node 8, we should exit immediately instead of continuing to visit nodes 9 to 13.
Image

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions