In [1]:
# this is a hidden cell. It will not show on the documentation HTML.
import os
from vespa.package import VespaDocker
from vespa.gallery import QuestionAnswering

app_package = QuestionAnswering()

disk_folder = os.path.join(os.getenv("WORK_DIR"), "sample_application")
vespa_docker = VespaDocker(
    port=8081, 
    disk_folder=disk_folder # requires absolute path
)
app = vespa_docker.deploy(application_package=app_package)

Waiting for configuration server.
Waiting for configuration server.
Waiting for configuration server.
Waiting for configuration server.
Waiting for application status.
Waiting for application status.
Finished deployment.


# Exchange data with applications

> Feed, get, update and delete operations

We will use the [question answering (QA) app](https://pyvespa.readthedocs.io/en/latest/use_cases/qa/semantic-retrieval-for-question-answering-applications.html) to demonstrate ways to feed data to an application. We start by downloading sample data.

In [2]:
import json, requests

sentence_data = json.loads(
    requests.get("https://data.vespa.oath.cloud/blog/qa/sample_sentence_data_100.json").text
)
list(sentence_data[0].keys())

['text', 'dataset', 'questions', 'context_id', 'sentence_embedding']

We assume that `app` holds a [Vespa](../../reference-api.rst#vespa.application.Vespa) connection instance to the desired Vespa application.

## Feed data

We can either feed a batch of data for convenience or feed individual data points for increased control.

### Batch

We need to prepare the data as a list of dicts, each with an `id` key holding the unique id of the data point and a `fields` key holding a dict with the data fields.

In [3]:
batch_feed = [
    {
        "id": idx, 
        "fields": sentence
    }
    for idx, sentence in enumerate(sentence_data)
]
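Before feeding, it can help to check that every item has the expected shape. The helper below is a minimal sketch, not part of pyvespa: `validate_batch` and the two sample records are hypothetical names and data used only for illustration.

```python
def validate_batch(batch):
    """Raise ValueError if an item lacks `id`/`fields` or if an id is repeated.

    Hypothetical helper for illustration; it is not a pyvespa function.
    """
    seen = set()
    for position, item in enumerate(batch):
        missing = {"id", "fields"} - set(item)
        if missing:
            raise ValueError(f"item {position} is missing keys: {sorted(missing)}")
        if not isinstance(item["fields"], dict):
            raise ValueError(f"item {position} has a non-dict `fields` value")
        if item["id"] in seen:
            raise ValueError(f"duplicate id {item['id']!r} at item {position}")
        seen.add(item["id"])
    return len(batch)


# Illustrative records only; the real sentence_data items carry more fields.
sample = [{"text": "first sentence"}, {"text": "second sentence"}]
sample_batch = [{"id": idx, "fields": s} for idx, s in enumerate(sample)]
checked = validate_batch(sample_batch)
```

With the real data, the same check would be `validate_batch(batch_feed)`.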

We then feed the batch to the desired schema using the [feed_batch](../../reference-api.rst#feed-batch) method.

In [4]:
response = app.feed_batch(schema="sentence", batch=batch_feed)
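After a batch operation we may want a quick summary of how many operations succeeded. The sketch below assumes that `feed_batch` returns a list with one response object per data point and that each object exposes a `status_code` attribute; neither is confirmed on this page, so stand-in objects are used here. `summarize_status` is a hypothetical helper.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_status(responses):
    """Count responses per HTTP status code (assumes a `status_code` attribute)."""
    return Counter(r.status_code for r in responses)


# Stand-in objects for illustration; with a live application, pass the
# list returned by feed_batch instead.
fake_responses = [
    SimpleNamespace(status_code=200),
    SimpleNamespace(status_code=200),
    SimpleNamespace(status_code=400),
]
summary = summarize_status(fake_responses)

# Positions of the operations that did not return 200.
failed = [i for i, r in enumerate(fake_responses) if r.status_code != 200]
```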

### Individual data points

#### Synchronous

Synchronously feeding individual data points is similar to batch feeding, except that you have more control when looping through your dataset.

In [5]:
response = []
for idx, sentence in enumerate(sentence_data):
    response.append(
        app.feed_data_point(schema="sentence", data_id=idx, fields=sentence)
    )

#### Asynchronous

`app.asyncio()` returns a `VespaAsync` instance that provides async operations such as `feed_data_point`. Using the `async with` context manager ensures that the connections required for async feeding are opened and closed properly.

In [6]:
# `idx` and `sentence` keep the last values assigned by the loop in the previous cell.
async with app.asyncio() as async_app:
    response = await async_app.feed_data_point(
        schema="sentence",
        data_id=idx,
        fields=sentence,
    )

We can then use asyncio constructs like `create_task` and `wait` to create different types of asynchronous flows like the one below.

In [7]:
from asyncio import create_task, wait, ALL_COMPLETED

async with app.asyncio() as async_app:
    feed = []
    for idx, sentence in enumerate(sentence_data):
        feed.append(
            create_task(
                async_app.feed_data_point(
                    schema="sentence",
                    data_id=idx,
                    fields=sentence,
                )
            )
        )
    await wait(feed, return_when=ALL_COMPLETED)
    response = [x.result() for x in feed]

<div class="alert alert-info">

**Note**: The code above runs as-is in a Jupyter Notebook because the notebook already has an async event loop running in the background. In an environment without a running event loop, such as a plain Python script, you must wrap the code in a coroutine and run it on an event loop you create yourself (for example, with `asyncio.run`), as with any asyncio code.
</div>
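As a minimal sketch of running this kind of flow outside a notebook, the script below wraps the task creation in a coroutine and starts it with `asyncio.run`. To stay runnable without a Vespa application, it uses `feed_one`, a hypothetical stand-in for `async_app.feed_data_point` that only simulates I/O; `feed_all` is likewise an illustrative name.

```python
import asyncio


async def feed_one(data_id, fields):
    """Hypothetical stand-in for async_app.feed_data_point; only simulates I/O."""
    await asyncio.sleep(0)
    return {"id": data_id, "n_fields": len(fields)}


async def feed_all(data):
    """Schedule one task per data point and wait for all of them to finish."""
    tasks = [
        asyncio.create_task(feed_one(idx, fields))
        for idx, fields in enumerate(data)
    ]
    await asyncio.wait(tasks, return_when=asyncio.ALL_COMPLETED)
    return [task.result() for task in tasks]


# asyncio.run creates an event loop, runs the coroutine, and closes the loop.
results = asyncio.run(feed_all([{"text": "a"}, {"text": "b"}]))
```

With a live application, the body of `feed_all` would open `async with app.asyncio() as async_app:` and call `async_app.feed_data_point` in place of `feed_one`. Inside a notebook, use `await feed_all(...)` instead, since `asyncio.run` cannot be called from a running event loop.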

## Get data

As with feeding, we can get a batch of data for convenience or get individual data points for increased control.

### Batch

We need to prepare the data as a list of dicts, each with an `id` key holding the unique id of a data point. We then get the batch from the desired schema using the [get_batch](../../reference-api.rst#get-batch) method.

In [8]:
batch = [{"id": idx} for idx in range(len(sentence_data))]
response = app.get_batch(schema="sentence", batch=batch)
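To work with the retrieved documents, we usually want the stored fields rather than the raw responses. The sketch below assumes each response exposes a `status_code` attribute and a `json` attribute holding a dict with a `fields` key; these are assumptions about the response objects, so stand-in objects are used here. `extract_fields` is a hypothetical helper.

```python
from types import SimpleNamespace


def extract_fields(responses):
    """Return the `fields` dict of every response that came back with status 200.

    Assumes `status_code` and `json` attributes on each response.
    """
    return [r.json.get("fields", {}) for r in responses if r.status_code == 200]


# Stand-in objects for illustration; with a live application, pass the
# list returned by get_batch instead.
fake_responses = [
    SimpleNamespace(status_code=200, json={"fields": {"text": "first sentence"}}),
    SimpleNamespace(status_code=404, json={}),
]
documents = extract_fields(fake_responses)
```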

### Individual data points

We can get individual data points synchronously or asynchronously.

#### Synchronous

In [9]:
response = app.get_data(schema="sentence", data_id=0)

#### Asynchronous

In [10]:
async with app.asyncio() as async_app:
    response = await async_app.get_data(schema="sentence", data_id=0)

<div class="alert alert-info">

**Note**: The code above runs as-is in a Jupyter Notebook because the notebook already has an async event loop running in the background. In an environment without a running event loop, such as a plain Python script, you must wrap the code in a coroutine and run it on an event loop you create yourself (for example, with `asyncio.run`), as with any asyncio code.
</div>

## Update data

As with feeding, we can update a batch of data for convenience or update individual data points for increased control.

### Batch

We need to prepare the data as a list of dicts, each with an `id` key holding the unique id of the data point, a `fields` key holding a dict with the fields to be updated, and an optional `create` key holding a boolean that indicates whether the data point should be created if it does not exist (defaults to `False`).

In [11]:
batch_update = [
    {
        "id": idx,           # data_id
        "fields": sentence,  # fields to be updated
        "create": True       # Optional. Create the data point if it does not exist; defaults to False.
    }
    for idx, sentence in enumerate(sentence_data)
]
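Since `fields` holds only the fields to be updated, an update batch does not need to repeat the whole data point. The sketch below builds a batch that touches a single field; `build_update_batch` is a hypothetical helper and the value `"relabeled"` is made up for illustration.

```python
def build_update_batch(ids, changes, create=False):
    """Build one update item per id, each carrying only the changed fields.

    Hypothetical helper for illustration; it is not a pyvespa function.
    """
    # dict(changes) gives every item its own copy of the changes.
    return [{"id": i, "fields": dict(changes), "create": create} for i in ids]


# Update only the `dataset` field of the first three data points.
partial_update = build_update_batch(ids=range(3), changes={"dataset": "relabeled"})
```

The resulting list has the same shape as `batch_update` above, so it could be passed as the `batch` argument in the same way.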

We then update the batch on the desired schema using the [update_batch](../../reference-api.rst#update-batch) method.

In [12]:
response = app.update_batch(schema="sentence", batch=batch_update)

### Individual data points

We can update individual data points synchronously or asynchronously.

#### Synchronous

In [13]:
response = app.update_data(schema="sentence", data_id=0, fields=sentence_data[0], create=True)

#### Asynchronous

In [14]:
async with app.asyncio() as async_app:
    response = await async_app.update_data(
        schema="sentence", data_id=0, fields=sentence_data[0], create=True
    )

<div class="alert alert-info">

**Note**: The code above runs as-is in a Jupyter Notebook because the notebook already has an async event loop running in the background. In an environment without a running event loop, such as a plain Python script, you must wrap the code in a coroutine and run it on an event loop you create yourself (for example, with `asyncio.run`), as with any asyncio code.
</div>

## Delete data

As with feeding, we can delete a batch of data for convenience or delete individual data points for increased control.

### Batch

We need to prepare the data as a list of dicts, each with an `id` key holding the unique id of a data point. We then delete the batch from the desired schema using the [delete_batch](../../reference-api.rst#delete-batch) method.

In [15]:
batch = [{"id": idx} for idx in range(len(sentence_data))]
response = app.delete_batch(schema="sentence", batch=batch)

### Individual data points

We can delete individual data points synchronously or asynchronously.

#### Synchronous

In [16]:
response = app.delete_data(schema="sentence", data_id=0)

#### Asynchronous

In [17]:
async with app.asyncio() as async_app:
    response = await async_app.delete_data(schema="sentence", data_id=0)

<div class="alert alert-info">

**Note**: The code above runs as-is in a Jupyter Notebook because the notebook already has an async event loop running in the background. In an environment without a running event loop, such as a plain Python script, you must wrap the code in a coroutine and run it on an event loop you create yourself (for example, with `asyncio.run`), as with any asyncio code.
</div>

In [18]:
# this is a hidden cell. It will not show on the documentation HTML.
from shutil import rmtree

rmtree(disk_folder, ignore_errors=True)
vespa_docker.container.stop()
vespa_docker.container.remove()