[Derived Source] Add support for deriving source field in FieldMapper #17073

Closed
@rayshrey

Is your feature request related to a problem? Please describe

docValues are a columnar storage format that Lucene uses to store field values in a way that supports efficient aggregation, sorting, and similar operations.
Stored fields, on the other hand, hold the actual values of fields as they were inserted into the index.

Currently, the _source field stores the original document as a stored field. We could skip storing the _source field in cases where the values are already stored in docValues and retrieve the field values from docValues instead. This would reduce storage cost significantly. Depending on the nature of the query, we can skip or fetch some or all of the fields from docValues to serve search queries.

Describe the solution you'd like

Add a new parameter to MappedFieldType that indicates whether deriving source is supported for the field, something like the following:

private final boolean derivedSourceSupported;

public boolean isDerivedSourceSupported() {
    return derivedSourceSupported;
}
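
As a rough illustration of how such a flag could be consumed, here is a minimal, self-contained sketch. `SketchFieldType` is a hypothetical stand-in for MappedFieldType, and `canDeriveFullSource` is an assumed helper that is not part of this proposal; it encodes the assumption that _source can be skipped entirely only when every mapped field supports derivation.

```java
import java.util.List;

public class DerivedSourceFlagSketch {

    /** Hypothetical stand-in for MappedFieldType; not the real OpenSearch class. */
    public static class SketchFieldType {
        private final String name;
        private final boolean derivedSourceSupported;

        public SketchFieldType(String name, boolean derivedSourceSupported) {
            this.name = name;
            this.derivedSourceSupported = derivedSourceSupported;
        }

        public String name() {
            return name;
        }

        public boolean isDerivedSourceSupported() {
            return derivedSourceSupported;
        }
    }

    /** Assumption: the whole source is derivable only if every field type supports it. */
    public static boolean canDeriveFullSource(List<SketchFieldType> fieldTypes) {
        return fieldTypes.stream().allMatch(SketchFieldType::isDerivedSourceSupported);
    }

    public static void main(String[] args) {
        List<SketchFieldType> types = List.of(
            new SketchFieldType("status", true),
            new SketchFieldType("message", false));
        // One field does not support derivation, so the check fails for this mapping.
        System.out.println(canDeriveFullSource(types));
    }
}
```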

Add a new method in FieldMapper with the following signature

public void buildDerivedSource(XContentBuilder builder, LeafReader leafReader, int docId) throws IOException {
    // Implement this method in the respective mappers
}

We will implement this method in every subclass for which deriving source is supported.
Usage: the Search or Get path will call this method, passing the leafReader (to read the docValues from) and the docId (the document whose docValues are needed). It will also pass an XContentBuilder, to which each mapper adds its field after deriving it. The method will be called for every FieldMapper we have, and the source will finally be set from the resulting builder object.
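
The call pattern described above could look roughly like the sketch below. It does not use the real OpenSearch or Lucene types: `DocValuesLookup` is a hypothetical stand-in for LeafReader, a plain `Map` stands in for XContentBuilder, and `SketchFieldMapper` and `deriveSource` are illustrative names. The proposed method declares `throws IOException`; the stand-ins omit it to keep the example short.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DerivedSourceAssemblySketch {

    /** Hypothetical stand-in for LeafReader: returns a field's doc value for a doc, or null. */
    public interface DocValuesLookup {
        Object valueFor(String field, int docId);
    }

    /** Hypothetical stand-in for FieldMapper. */
    public static class SketchFieldMapper {
        private final String name;
        private final boolean derivedSourceSupported;

        public SketchFieldMapper(String name, boolean derivedSourceSupported) {
            this.name = name;
            this.derivedSourceSupported = derivedSourceSupported;
        }

        public boolean isDerivedSourceSupported() {
            return derivedSourceSupported;
        }

        /** Mirrors the shape of buildDerivedSource: read the value and add it to the builder. */
        public void buildDerivedSource(Map<String, Object> builder, DocValuesLookup reader, int docId) {
            Object value = reader.valueFor(name, docId);
            if (value != null) {
                builder.put(name, value);
            }
        }
    }

    /** Calls every supporting mapper for one document and returns the assembled source. */
    public static Map<String, Object> deriveSource(
            List<SketchFieldMapper> mappers, DocValuesLookup reader, int docId) {
        Map<String, Object> builder = new LinkedHashMap<>();
        for (SketchFieldMapper mapper : mappers) {
            if (mapper.isDerivedSourceSupported()) {
                mapper.buildDerivedSource(builder, reader, docId);
            }
        }
        return builder;
    }

    public static void main(String[] args) {
        // In-memory columns keyed by field name, then by doc id, imitating docValues.
        Map<String, Map<Integer, Object>> columns = new HashMap<>();
        columns.put("status", new HashMap<>());
        columns.get("status").put(0, 200L);
        columns.put("host", new HashMap<>());
        columns.get("host").put(0, "web-1");

        DocValuesLookup reader =
            (field, docId) -> columns.getOrDefault(field, Collections.emptyMap()).get(docId);

        List<SketchFieldMapper> mappers = List.of(
            new SketchFieldMapper("status", true),
            new SketchFieldMapper("host", true));

        System.out.println(deriveSource(mappers, reader, 0));
    }
}
```

In this sketch, a mapper that does not support derivation is simply skipped, so its field is absent from the result; how the real Search or Get path should treat such fields is not specified in this issue.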

Related component

Indexing:Performance

Describe alternatives you've considered

N/A

Additional context

#9568 (comment)
