# Indexify Invoice Parsing Template

In this notebook, we demonstrate how to use Indexify to extract and index data. As a concrete example, we index a set of invoices.

Specifically, we show you how to
1. spin up a local Indexify instance
2. create a new data-repository for a collection of data, in this case invoices
3. bind existing extractors to our data-repository
4. insert invoices
5. search and retrieve invoices

<!-- I feel like the name "repository" is a bit too generic for data repository, it could be confused with extractor repository -->

### (1) Spin up a local indexify instance

First, let's spin up a local Indexify instance:

```
docker-compose up
```

That's it! This runs the Indexify server alongside the extractors that we want for invoice parsing.
In this case, we will be using the [Advanced Invoice Extractor](https://github.com/tensorlakeai/indexify-extractors/tree/david/advanced-invoice-extractor/advanced-invoice-extractor) from the Indexify hub, as well as the [MiniLM-6 Embedding Extractor](https://github.com/tensorlakeai/indexify-extractors/blob/david/advanced-invoice-extractor/embedding-extractors/minilm-l6/README.md), which produces text embeddings that can be used for search.

If you have used Kubernetes, Docker, Kafka, or memcached before, Indexify works in a similar way: it runs in the background as a binary.

Now let us quickly validate that indexify is up and running.

In [26]:
# As a temporary dev solution, let's install sdk-py from the top-level directory
# !(cd ../../sdk-py/ && pip install -e . )
# !(cd ../../sdk-py/ && poetry install -e . )
from indexify import IndexifyClient

In [27]:
# Load the client
client = IndexifyClient()

# List the repositories that already exist on the server
client.repositories()

[Repository(name=default), Repository(name=invoices)]

We don't receive any errors when connecting to the default Indexify server with the client, so Indexify appears to be up and running!
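If the client call above fails, it can help to first check whether anything is listening on the server's port at all. The following is a minimal sketch using only the Python standard library; the helper name `is_port_open` is hypothetical, and port `8950` is taken from the coordinator address used later in this notebook, so adjust it if your setup differs.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Assumption: 8950 is the port from the coordinator address in this notebook.
    print("Indexify reachable:", is_port_open("localhost", 8950))
```

An open port only shows that some process is listening there; the `client.repositories()` call remains the actual check that the server responds.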

### (2) Create a data-repository for invoices

You can store data in different repositories.
This is similar to databases in Postgres, or collections in MongoDB.
Repositories let us keep our datasets organized.
In our case, we create a new repository for invoices, as all our invoices should go into the same "collection".

In [28]:
client.create_repository("invoices")

Repository(name=invoices)

In [29]:
invoice_data_repository = client.get_repository("invoices")
invoice_data_repository

Repository(name=invoices)
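When a notebook is re-run, the repository may already exist (as in the listing from step 1). A small helper can make the creation step safe to repeat. This is only a sketch: `ensure_repository` is a hypothetical helper, it relies solely on the `repositories()`, `create_repository()`, and `get_repository()` calls used above, and it assumes that each returned repository object exposes a `name` attribute, as its printed form suggests. The stub classes exist only so the example runs without a server.

```python
def ensure_repository(client, name):
    """Create repository `name` only if it is missing, then return it."""
    # Assumption: repository objects expose a `.name` attribute.
    existing = {repo.name for repo in client.repositories()}
    if name not in existing:
        client.create_repository(name)
    return client.get_repository(name)


class StubRepository:
    """Stand-in for a repository object, for demonstration only."""

    def __init__(self, name):
        self.name = name


class StubClient:
    """In-memory stand-in for IndexifyClient, for demonstration only."""

    def __init__(self):
        self._repos = {"default": StubRepository("default")}
        self.create_calls = 0

    def repositories(self):
        return list(self._repos.values())

    def create_repository(self, name):
        self.create_calls += 1
        self._repos[name] = StubRepository(name)
        return self._repos[name]

    def get_repository(self, name):
        return self._repos[name]


if __name__ == "__main__":
    stub = StubClient()
    ensure_repository(stub, "invoices")
    ensure_repository(stub, "invoices")  # second call does not create again
    print(stub.create_calls)
```

With a real client, the call would simply be `ensure_repository(client, "invoices")`.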

### (3) Bind existing extractors to our data-repository

You can import existing extractors from the [indexify-hub](https://github.com/tensorlakeai/indexify-extractors), or write your own extractor using `indexify extractor new <extractor name>`. In this example, we will use the example [invoice parser](https://github.com/tensorlakeai/indexify-extractors/tree/main/invoice-extractor), which parses some basic data from each invoice.
We already started it through the local `docker-compose.yaml`. Alternatively, if you have any other extractors, you can run them in a Docker container or Kubernetes deployment of your choice, for example with `docker run yenicelik/simple-invoice-parser extractor start --coordinator-addr http://localhost:8950`. Here, `http://localhost:8950` is the address where our local Indexify server is running and listening for new extractors.

Let's quickly validate that our `simple-invoice-parser` extractor is running and found by our indexify server.

In [33]:
# Let's bind the advanced invoice parser,
# and the MiniLM-6 parser as well to this repository,
# so work can begin in the background

# First, let's see what extractors we have available
client.extractors()

[Extractor(name=yenicelik/simple-invoice-parser, description=Parses an invoice using the to-be/donut-base-finetuned-invoices huggingface model)]
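Rather than reading the list by eye, the same check can be expressed in code. The sketch below assumes that the objects returned by `client.extractors()` expose a `name` attribute, as their printed form suggests; `has_extractor` is a hypothetical helper, and `SimpleNamespace` objects stand in for real extractor objects so the example runs without a server.

```python
from types import SimpleNamespace


def has_extractor(extractors, name):
    """Return True if any extractor in `extractors` has the given name."""
    # Assumption: extractor objects expose a `.name` attribute.
    return any(getattr(extractor, "name", None) == name for extractor in extractors)


if __name__ == "__main__":
    # Stand-in for the list returned by client.extractors()
    available = [SimpleNamespace(name="yenicelik/simple-invoice-parser")]
    print(has_extractor(available, "yenicelik/simple-invoice-parser"))
```

With a real client, this would be called as `has_extractor(client.extractors(), "yenicelik/simple-invoice-parser")`.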

Perfect! Let us also bind this extractor to the repository, so that all files in the repository are processed by this extractor. 

In [None]:
# TODO: Bind the extractor
# TODO: What is the outputs here?
invoice_data_repository.bind_extractor(name="yenicelik/simple-invoice-parser", outputs="")

### (4) Insert Invoices

We can now insert invoices. For simplicity, we attached a couple of invoices in the `data/` directory of this example folder.

In [None]:
# TODO: Load documents, and then add them here
invoice_data_repository.add_documents()
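Until the loading step above is filled in, the first half of it (finding the invoice files) can be sketched with the standard library alone. The helper `list_invoice_files` and the set of file suffixes are assumptions made for illustration. How the files are then passed to `add_documents` depends on the SDK and is not shown here.

```python
from pathlib import Path

# Assumption: invoices are stored as PDFs or common image formats.
INVOICE_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}


def list_invoice_files(directory="data"):
    """Return a sorted list of invoice file paths found directly in `directory`."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in INVOICE_SUFFIXES
    )


if __name__ == "__main__":
    for path in list_invoice_files("data"):
        print(path)
```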

Next, let us load the invoice

Create a repository where we will be inputting all our files

Add an extractor to the repository. This extractor is already running in the background as a docker image. 

### (5) Search for Invoices

In [None]:


indexify_extractor_sdk.