## **Setup**

<div class="align-center">
  <a href="https://getindexify.ai/"><img src="https://getindexify.ai/Indexify_Logo_Wordmark.svg" width="145"></a>
  <a href="https://discord.com/invite/kF8UZACA7r"><img src="https://raw.githubusercontent.com/rishiraj/random/main/Discord%20button.png" width="145"></a><br>
  Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/tensorlakeai/indexify">GitHub</a></i> ⭐
</div>

In [109]:
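# Install the Python packages imported in the cells below
%pip install --upgrade --quiet indexify indexify-extractor-sdk langchain-community openai wikipedia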

# Download Indexify Server
!curl https://getindexify.ai | sh

# Download Extractors
!indexify-extractor download hub://text/chunking
!indexify-extractor download hub://embedding/minilm-l6

Note: you may need to restart the kernel to use updated packages.


After installing the necessary libraries and downloading the server and the extractors, restart the runtime. Then run the Indexify server together with the extractors.

Open two terminals and run the following commands:

```bash
# Terminal 1
./indexify server -d

# Terminal 2
indexify-extractor join-server
```
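Before creating the client, it can help to confirm that the server is actually accepting connections. The following is a minimal sketch, not part of Indexify: `is_port_open` is a hypothetical helper, and the host and port (`localhost:8900`) are an assumption about the server's default address, so adjust them if your setup differs.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Assumption: the Indexify server listens on localhost:8900.
    print("Indexify server reachable:", is_port_open("localhost", 8900))
```

This only checks that something is listening on the port; it does not verify that the extractors have joined the server.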

## **Creating Extraction Graph**

In [110]:
from indexify import IndexifyClient, ExtractionGraph
client = IndexifyClient()

In [111]:
extraction_graph_spec = """
name: 'sportsknowledgebase'
extraction_policies:
   - extractor: 'tensorlake/chunk-extractor'
     name: 'chunker'
     input_params:
        chunk_size: 1000
        overlap: 100
   - extractor: 'tensorlake/minilm-l6'
     name: 'wikiembedding'
     content_source: 'chunker'
"""

extraction_graph = ExtractionGraph.from_yaml(extraction_graph_spec)
client.create_extraction_graph(extraction_graph)
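To build intuition for the `chunk_size: 1000` and `overlap: 100` parameters above, here is a rough sketch of character-based windowing with overlap. It is an illustration only: `chunk_text` is a hypothetical function, and the actual `tensorlake/chunk-extractor` may split text differently (for example, on separators rather than at fixed character offsets).

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into windows of chunk_size characters.

    Each window starts (chunk_size - overlap) characters after the previous
    one, so consecutive windows share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With these settings, a 2,500-character document yields three chunks starting at offsets 0, 900, and 1800. The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.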

## **Indexify Retriever for RAG**

In [112]:
from langchain_community.document_loaders import WikipediaLoader

docs = WikipediaLoader(query="Kevin Durant", load_max_docs=1).load()
for doc in docs:
    content_id = client.add_documents("sportsknowledgebase", doc.page_content)
    client.wait_for_extraction(content_id)

In [84]:
client.indexes()

[{'name': 'sportsknowledgebase.wikiembedding.embedding',
  'embedding_schema': {'dim': 384, 'distance': 'cosine'}}]
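The schema above says the index stores 384-dimensional vectors and compares them with cosine distance. As a sketch of what that ranking means, the snippet below computes cosine similarity in plain Python and does a brute-force top-k lookup. The function names are hypothetical, and this is not how Indexify or its vector store implements search; it only illustrates the metric.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; cosine distance is 1 minus this."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention chosen here for zero vectors.
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int = 3
) -> List[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        vectors.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [vector_id for vector_id, _ in ranked[:k]]
```

Because cosine similarity ignores vector length, two passages that point in the same direction in embedding space rank as equally close regardless of their magnitudes.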

In [93]:
def get_context(question: str, index: str, top_k=3):
    # Pass the caller's top_k through instead of hard-coding 3
    results = client.search_index(name=index, query=question, top_k=top_k)
    context = ""
    for result in results:
        context = context + f"content id: {result['content_id']} \n\n passage: {result['text']}\n"
    return context
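The formatting step in `get_context` can be tried without a running server. The sketch below assumes, based on the keys used above, that each search result is a dict with `content_id` and `text` entries. `format_context` is a hypothetical helper and the sample results are made-up placeholders, not real search output.

```python
from typing import Dict, List


def format_context(results: List[Dict[str, str]]) -> str:
    """Join retrieved passages into one context string, one block per passage."""
    blocks = [
        f"content id: {result['content_id']}\n\npassage: {result['text']}\n"
        for result in results
    ]
    return "".join(blocks)


# Hypothetical stand-in for the list returned by client.search_index(...).
sample_results = [
    {"content_id": "abc123", "text": "First passage."},
    {"content_id": "def456", "text": "Second passage."},
]
sample_context = format_context(sample_results)
print(sample_context)
```

Building the blocks in a list and joining them once avoids repeated string concatenation, which matters little for three passages but scales better for larger `top_k` values.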

In [101]:
question = "When and where did Kevin Durant win his championships?"
context = get_context(question, "sportsknowledgebase.wikiembedding.embedding")

In [102]:
prompt = f"Answer the question based on the context.\n question: {question} \n context: {context}"

In [100]:
from openai import OpenAI
client_openai = OpenAI()

In [104]:
chat_completion = client_openai.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
    model="gpt-3.5-turbo",
)


In [108]:
print(chat_completion.choices[0].message.content)

Kevin Durant won his championships with the Golden State Warriors in 2017 and 2018.
