FR: expose visit api #711

PrathamSoni · 2024-03-23T21:16:46Z

Expose the visit api as part of the sdk.

Ideal interaction: supply a YQL query to filter, with optional limit/offset. The method returns a (sync or async) generator over the documents. The generator stops (StopIteration, or StopAsyncIteration for the async variant) when either the limit is hit or the documents are exhausted.

olaughter · 2024-05-09T11:09:55Z

Hi @PrathamSoni, I also needed this functionality. As a workaround, I've implemented it myself for now, using the delete_all_docs method as a reference, along with the visit command in the CLI and the documentation for the document/v1 endpoint.

Here is a simplified code snippet if you are interested in taking this approach (note that it doesn't include any error handling). For your ideal interaction, you could change the value of the selection parameter to add filtering:

from typing import Any, Dict, Generator, Optional

from vespa.application import Vespa, VespaSync


def visit(
    vespa: Vespa,
    cluster: str,
    schema: str,
    namespace: Optional[str] = None,
    chunk_size: int = 5,
) -> Generator[Dict[str, Any], None, None]:
    # Default the namespace to the schema name when none is given
    if not namespace:
        namespace = schema

    end_point = "{}/document/v1/{}/{}/docid/".format(
        vespa.end_point,
        namespace,
        schema,
    )

    params = {
        "cluster": cluster,
        "selection": "true",  # Selection expression; "true" matches every document
        "wantedDocumentCount": chunk_size,
        "slices": 1,  # No slices because we'll use continuation tokens for pagination
        "sliceId": 0,
    }

    with VespaSync(vespa) as sync_app:
        while True:
            response = sync_app.http_session.get(end_point, params=params)
            result = response.json()  # Parsed JSON body (a dict), not a VespaResponse
            yield result
            # Keep paging until the response carries no continuation token
            if "continuation" in result:
                params["continuation"] = result["continuation"]
            else:
                break

vespa_instance_url = ...
vespa = Vespa(url=vespa_instance_url)

for chunk in visit(
    vespa=vespa,
    cluster="test_content",
    schema="test",
    chunk_size=5,
):
    print(chunk)
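
To get closer to the limit/offset interaction requested above, the chunks can be flattened into individual documents on the client side. This is only a rough sketch: iter_documents is a hypothetical helper (not part of pyvespa), it assumes each chunk carries its hits under a "documents" key as the document/v1 visit responses do, and the chunks below are made-up stand-ins so the snippet runs without a Vespa instance.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, Optional


def iter_documents(
    chunks: Iterable[Dict[str, Any]],
    limit: Optional[int] = None,
    offset: int = 0,
) -> Iterator[Dict[str, Any]]:
    """Flatten visit chunks into single documents, applying offset/limit client-side."""
    # Lazily walk every document of every chunk, in order
    docs = (doc for chunk in chunks for doc in chunk.get("documents", []))
    # islice stops pulling from the underlying generator once the limit is reached
    stop = None if limit is None else offset + limit
    return islice(docs, offset, stop)


# Made-up chunks shaped like document/v1 visit responses
fake_chunks = [
    {
        "documents": [{"id": "id:test:test::1"}, {"id": "id:test:test::2"}],
        "continuation": "AAAA",
    },
    {"documents": [{"id": "id:test:test::3"}]},
]

first_two = list(iter_documents(fake_chunks, limit=2))
```

With the visit function above, this would be called as iter_documents(visit(vespa=vespa, cluster="test_content", schema="test"), limit=10). Because both layers are lazy, no further chunks are requested once the limit is reached. Server-side filtering would still go through the selection parameter, for example a document selection expression such as test.year > 2000, where year is an assumed field name.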

@thomasht86, are you interested in seeing similar functionality get added to pyvespa? I can look into evolving this into an actual contribution (with tests, etc) if so

thomasht86 · 2024-05-10T07:03:31Z

@olaughter This looks very good, and a contribution for this would be very welcome! 🙏
I think we would want to create a new VespaDocv1Response class in vespa.io for the responses.

Installing dev-dependencies (pip install -e .[dev]) will ensure proper formatting with ruff and pre-commit.

Please let me know if I can speed up your effort in any way! 🚀

thomasht86 · 2024-05-28T06:02:34Z

Closed in #776

kkraune added this to the later milestone Apr 17, 2024

kkraune assigned thomasht86 Apr 17, 2024

olaughter mentioned this issue May 16, 2024

Expose visit api #776

Merged

thomasht86 closed this as completed May 28, 2024
