FR: expose visit api #711

Closed
PrathamSoni opened this issue Mar 23, 2024 · 3 comments

Comments

@PrathamSoni

Expose the visit API as part of the SDK.

Ideal interaction: supply a YQL query to filter, with an optional limit/offset. The method returns a (sync/async) generator over the documents. It raises a stop error when either the limit is hit or the generator is exhausted.
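
A minimal sketch of the limit/offset part of this request, assuming some generator of documents already exists. The name `paginate` and the fake documents are hypothetical and are not part of pyvespa. Note that the offset here is applied client-side, so skipped documents would still be fetched from the source generator.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, Optional


def paginate(
    documents: Iterable[Dict[str, Any]],
    limit: Optional[int] = None,
    offset: int = 0,
) -> Iterator[Dict[str, Any]]:
    """Skip `offset` documents, then yield at most `limit` of them."""
    stop = None if limit is None else offset + limit
    # islice is lazy: nothing past `stop` is ever pulled from the source.
    return islice(documents, offset, stop)


# Stand-in for a real visit generator: ten fake documents (hypothetical ids).
fake_documents = ({"id": f"id:test:test::{i}"} for i in range(10))
page = list(paginate(fake_documents, limit=3, offset=2))
print([doc["id"] for doc in page])
# prints ['id:test:test::2', 'id:test:test::3', 'id:test:test::4']
```

Because the result is an ordinary iterator, a `for` loop over it simply ends when the limit is reached or the source runs out; `StopIteration` is only visible when calling `next()` directly.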

@kkraune added this to the later milestone Apr 17, 2024
@olaughter
Contributor

olaughter commented May 9, 2024

Hi @PrathamSoni I also had a need for this functionality. As a workaround, I've implemented it myself for now using the delete_all_docs method as reference, along with the visit command in the cli, and the documentation for the document/v1 endpoint.

Here is a simplified code snippet if you are interested in taking this approach (note that it doesn't include any error handling). For your ideal interaction, you could change the value of the selection parameter to add filtering:

from typing import Any, Dict, Generator, Optional

from vespa.application import Vespa, VespaSync


def visit(
    vespa: Vespa,
    cluster: str,
    schema: str,
    namespace: Optional[str] = None,
    chunk_size: int = 5,
) -> Generator[Dict[str, Any], None, None]:
    """Yield one parsed JSON response (a chunk of documents) per request."""
    # Fall back to the schema name when no namespace is given
    if not namespace:
        namespace = schema

    end_point = "{}/document/v1/{}/{}/docid/".format(
        vespa.end_point,
        namespace,
        schema,
    )

    params = {
        "cluster": cluster,
        "selection": "true",  # Selection expression; "true" matches every document
        "wantedDocumentCount": chunk_size,
        "slices": 1,  # No slices because we'll use continuation tokens for pagination
        "sliceId": 0,
    }

    with VespaSync(vespa) as sync_app:
        while True:
            response = sync_app.http_session.get(end_point, params=params)
            result = response.json()
            yield result
            # A continuation token is present while more documents remain
            if "continuation" in result:
                params["continuation"] = result["continuation"]
            else:
                break

vespa_instance_url = ...
vespa = Vespa(url=vespa_instance_url)

for chunk in visit(
    vespa=vespa,
    cluster="test_content",
    schema="test",
    chunk_size=5,
):
    print(chunk)
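
As a rough sketch of the filtering idea mentioned above, the `selection` parameter could be built from a document selection expression, and the yielded chunks could be flattened into individual documents. The helper names `build_visit_params` and `iter_documents`, the `year` field, and the sample chunks are all hypothetical; the chunk shape (a `documents` list plus an optional `continuation` token) is an assumption based on the document/v1 visit response.

```python
from typing import Any, Dict, Iterable, Iterator


def build_visit_params(
    cluster: str,
    selection: str = "true",
    chunk_size: int = 5,
) -> Dict[str, Any]:
    """Build query parameters for a visit request (hypothetical helper)."""
    return {
        "cluster": cluster,
        "selection": selection,  # "true" selects every document
        "wantedDocumentCount": chunk_size,
    }


def iter_documents(chunks: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Flatten visit chunks into a stream of individual documents."""
    for chunk in chunks:
        # Assumed response shape: documents live under the "documents" key.
        yield from chunk.get("documents", [])


# Hypothetical filter on a `year` field of the `test` document type.
params = build_visit_params("test_content", selection="test.year > 2000")

# Fake chunks standing in for what a visit generator might yield.
fake_chunks = [
    {"documents": [{"id": "id:test:test::1"}, {"id": "id:test:test::2"}], "continuation": "AAAA"},
    {"documents": [{"id": "id:test:test::3"}]},
]
for doc in iter_documents(fake_chunks):
    print(doc["id"])
```

Keeping the flattening separate from the HTTP loop means the same helper would work for either a sync or an async-collected list of chunks.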

@thomasht86, are you interested in seeing similar functionality added to pyvespa? If so, I can look into evolving this into an actual contribution (with tests, etc.).

@thomasht86
Collaborator

@olaughter This looks very good, and a contribution for this would be very welcome! 🙏
I think we would want to create a new VespaDocv1Response class in vespa.io for the responses.

Installing dev-dependencies (pip install -e .[dev]) will ensure proper formatting with ruff and pre-commit.

Please let me know if I can speed up your effort in any way! 🚀
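
To illustrate the kind of response class suggested above, here is a purely hypothetical sketch. The name `VisitChunk` and its properties are assumptions made for illustration; this is not the class that exists in (or was later added to) `vespa.io`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class VisitChunk:
    """Hypothetical wrapper around one page of a document/v1 visit response."""

    json: Dict[str, Any]
    status_code: int

    @property
    def documents(self) -> List[Dict[str, Any]]:
        # Assumed response shape: documents live under the "documents" key.
        return self.json.get("documents", [])

    @property
    def continuation(self) -> Optional[str]:
        # None means there is nothing more to visit.
        return self.json.get("continuation")

    @property
    def is_successful(self) -> bool:
        return self.status_code == 200


chunk = VisitChunk(
    json={"documents": [{"id": "id:test:test::1"}], "continuation": "AAAA"},
    status_code=200,
)
print(len(chunk.documents), chunk.continuation, chunk.is_successful)
```

A typed wrapper like this would let callers check the status and the continuation token without reaching into the raw JSON.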

@thomasht86
Collaborator

Closed in #776
