# Feed data

> Synchronous and asynchronous feeding 


In [2]:
# this is a hidden cell. It will not show on the documentation HTML.
import os
from vespa.package import VespaDocker
from vespa.gallery import QuestionAnswering

app_package = QuestionAnswering()

disk_folder = os.path.join(os.getenv("WORK_DIR"), "sample_application")
vespa_docker = VespaDocker(
    port=8081, 
    container_memory="8G", 
    disk_folder=disk_folder # requires absolute path
)
app = vespa_docker.deploy(application_package=app_package)

Waiting for configuration server.
Waiting for configuration server.
Waiting for configuration server.
Waiting for configuration server.
Waiting for configuration server.
Waiting for configuration server.
Waiting for configuration server.
Waiting for configuration server.
Waiting for application status.
Waiting for application status.
Finished deployment.


We will use the [question answering (QA) app](https://pyvespa.readthedocs.io/en/latest/use_cases/qa/semantic-retrieval-for-question-answering-applications.html) to demonstrate ways to feed data to an application. We start by downloading sample data.

In [3]:
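import requests

data_response = requests.get("https://data.vespa.oath.cloud/blog/qa/sample_sentence_data.json")
data_response.raise_for_status()  # fail early if the download did not succeed
sentence_data = data_response.json()  # parse the JSON body into a list of dicts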
list(sentence_data[0].keys())

['text', 'dataset', 'questions', 'context_id', 'sentence_embedding']

We can feed a batch of data for convenience or feed individual data points for finer control, and each option is available in synchronous and asynchronous form. Because feeding is I/O bound, the asynchronous methods are expected to be faster in most cases.
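To illustrate why concurrency helps with I/O-bound work, the sketch below replaces the network call with `asyncio.sleep`. The `fake_feed` coroutine and its 10 ms latency are hypothetical stand-ins, not part of pyvespa, so the timings only show the general effect.

```python
import asyncio
import time


async def fake_feed(doc_id: int, latency: float = 0.01) -> int:
    """Hypothetical stand-in for one feed request: waits like a network call would."""
    await asyncio.sleep(latency)
    return doc_id


async def feed_sequentially(n: int) -> list:
    # Each request starts only after the previous one has finished.
    return [await fake_feed(i) for i in range(n)]


async def feed_concurrently(n: int) -> list:
    # All requests are in flight at once; gather returns results in input order.
    return list(await asyncio.gather(*(fake_feed(i) for i in range(n))))


def timed(coro):
    """Run a coroutine in a fresh event loop and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


sequential_result, sequential_seconds = timed(feed_sequentially(20))
concurrent_result, concurrent_seconds = timed(feed_concurrently(20))
print(f"sequential: {sequential_seconds:.3f}s, concurrent: {concurrent_seconds:.3f}s")
```

This runs as a plain script. Inside a Jupyter Notebook an event loop is already running, so `asyncio.run` would raise an error there and the coroutines should be awaited directly instead.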

We assume that `app` holds a [Vespa](../../reference-api.rst#vespa.application.Vespa) connection instance to the desired Vespa application.

## Batch feeding

We need to prepare the data as a list of dicts. Each dict has an `id` key holding the unique id of the data point and a `fields` key holding a dict with the data fields.

In [4]:
batch_feed = [
    {
        "id": idx, 
        "fields": sentence
    }
    for idx, sentence in enumerate(sentence_data)
]
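As a small illustration of the expected shape, the sketch below builds a batch from two made-up records and checks it before feeding. The records and the `check_batch` helper are hypothetical and only encode the rules stated above (an `id` key, a `fields` dict, unique ids); they are not part of pyvespa.

```python
# Hypothetical records standing in for entries of `sentence_data`.
sample_records = [
    {"text": "first sentence", "context_id": 0},
    {"text": "second sentence", "context_id": 0},
]

# Same construction as above: the position in the list becomes the id.
sample_batch = [
    {"id": idx, "fields": record}
    for idx, record in enumerate(sample_records)
]


def check_batch(batch):
    """Raise ValueError if an entry lacks `id` or `fields`, or if an id repeats."""
    seen_ids = set()
    for position, entry in enumerate(batch):
        if "id" not in entry or "fields" not in entry:
            raise ValueError(f"entry {position} needs both 'id' and 'fields'")
        if not isinstance(entry["fields"], dict):
            raise ValueError(f"entry {position}: 'fields' must be a dict")
        if entry["id"] in seen_ids:
            raise ValueError(f"duplicate id {entry['id']!r} at entry {position}")
        seen_ids.add(entry["id"])


check_batch(sample_batch)  # passes silently for a well-formed batch
```

Running a check like this before feeding surfaces malformed entries locally rather than as failed requests.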

### Synchronous

In [5]:
response = app.feed_batch(schema="sentence", batch=batch_feed)

### Asynchronous

In [6]:
response = await app.feed_batch(schema="sentence", batch=batch_feed, asynchronous=True)

<div class="alert alert-info">

**Note**: The **await** keyword is required when batch feeding asynchronously from a Jupyter Notebook because Jupyter already has an async event loop running in the background. You can omit the **await** keyword in an environment with no running event loop, and pyvespa will take care of the rest.

</div>

## Feed individual data points

### Synchronous

Synchronously feeding individual data points is similar to batch feeding, except that looping through the dataset yourself gives you more control.

In [7]:
response = []
for idx, sentence in enumerate(sentence_data):
    response.append(
        app.feed_data_point(schema="sentence", data_id=idx, fields=sentence)
    )

### Asynchronous

`app.asyncio()` returns a `VespaAsync` instance that provides async operations such as `feed_data_point`. Using the `async with` context manager ensures that the connections required for async feeding are opened and closed properly.

In [8]:
from asyncio import create_task, wait, ALL_COMPLETED

async with app.asyncio() as async_app:
    feed = []
    for idx, sentence in enumerate(sentence_data):
        feed.append(
            create_task(
                async_app.feed_data_point(
                    schema="sentence",
                    data_id=idx,
                    fields=sentence,
                )
            )
        )
    await wait(feed, return_when=ALL_COMPLETED)
    response = [x.result() for x in feed]

As the example above shows, asyncio constructs like `create_task` and `wait` can be combined to build different kinds of asynchronous flows.
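One such flow limits how many requests are in flight at once, which can help when feeding a large dataset. The sketch below uses `asyncio.Semaphore` for this. `fake_feed_data_point` is a hypothetical stand-in that sleeps instead of calling Vespa, and the `max_in_flight` value is an arbitrary choice for illustration.

```python
import asyncio


async def fake_feed_data_point(data_id: int) -> dict:
    """Hypothetical stand-in for a feed call; sleeps instead of doing network I/O."""
    await asyncio.sleep(0.001)
    return {"id": data_id}


async def feed_with_limit(data_ids, max_in_flight: int = 5):
    semaphore = asyncio.Semaphore(max_in_flight)
    in_flight = 0  # requests currently running
    peak = 0       # highest concurrency observed, kept only for inspection

    async def feed_one(data_id):
        nonlocal in_flight, peak
        async with semaphore:  # waits here while max_in_flight requests are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_feed_data_point(data_id)
            finally:
                in_flight -= 1

    # gather schedules every coroutine and returns results in input order.
    results = await asyncio.gather(*(feed_one(i) for i in data_ids))
    return results, peak


results, peak = asyncio.run(feed_with_limit(range(20), max_in_flight=5))
print(len(results), "fed, peak concurrency:", peak)
```

With a real application, the stand-in would be replaced by the `async_app.feed_data_point(...)` call shown earlier, inside the `async with app.asyncio()` block.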

<div class="alert alert-info">

**Note**: The code above runs as-is in a Jupyter Notebook because Jupyter already has an async event loop running in the background. In an environment without a running event loop, you must create one yourself, as with any asyncio code.
</div>
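
Outside a notebook, a common way to create the event loop is to wrap the feeding logic in a coroutine and pass it to `asyncio.run`. The sketch below shows that pattern only. `fake_feed_data_point` is a hypothetical stand-in that mirrors the keyword arguments used above and performs no network I/O.

```python
import asyncio


async def fake_feed_data_point(schema: str, data_id: int, fields: dict) -> dict:
    """Hypothetical stand-in for `async_app.feed_data_point`; no request is sent."""
    await asyncio.sleep(0)
    return {"schema": schema, "id": data_id}


async def main(records):
    # With a real application, this body would sit inside
    # `async with app.asyncio() as async_app:` and call async_app.feed_data_point.
    tasks = [
        asyncio.create_task(
            fake_feed_data_point(schema="sentence", data_id=idx, fields=record)
        )
        for idx, record in enumerate(records)
    ]
    await asyncio.wait(tasks, return_when=asyncio.ALL_COMPLETED)
    return [task.result() for task in tasks]


# asyncio.run creates an event loop, runs main() to completion, and closes the loop.
responses = asyncio.run(main([{"text": "a"}, {"text": "b"}, {"text": "c"}]))
print(responses)
```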

In [9]:
# this is a hidden cell. It will not show on the documentation HTML.
from shutil import rmtree

rmtree(disk_folder, ignore_errors=True)
vespa_docker.container.stop()
vespa_docker.container.remove()