In [1]:
%pip install -q -U indexify indexify-extractor-sdk

# Download Indexify Server
!curl https://getindexify.ai | sh

# Download Extractors
!indexify-extractor download hub://audio/asrdiarization
!indexify-extractor download hub://text/chunking
!indexify-extractor download hub://embedding/arctic

Note: you may need to restart the kernel to use updated packages.


After installing the libraries and downloading the server and the extractors, restart the runtime. Then start the Indexify server together with the extractors.

Open two terminals and run the following commands:

```bash
# Terminal 1
./indexify server -d

# Terminal 2
indexify-extractor join-server
```
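
Before creating a client, it can help to confirm that the server from Terminal 1 is accepting connections. The following is a minimal sketch using only the Python standard library. The helper name `is_port_open` is hypothetical, and the default of `localhost:8900` is an assumption about the server's listening address; adjust it if your setup differs.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8900, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumed default address; change host/port to match your server.
print("Indexify server reachable:", is_port_open())
```

If this prints `False`, check that the server process in Terminal 1 is still running before continuing.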

#### Create a Client, Define Extraction Graph & Ingest Contents

In [1]:
from indexify import IndexifyClient
client = IndexifyClient()

In [2]:
from indexify import ExtractionGraph

extraction_graph_spec = """
name: 'asrrag'
extraction_policies:
   - extractor: 'tensorlake/asrdiarization'
     name: 'sttextractor'
     input_params:
        batch_size: 24
   - extractor: 'tensorlake/chunk-extractor'
     name: 'chunker'
     input_params:
        chunk_size: 1000
        overlap: 100
     content_source: 'sttextractor'
   - extractor: 'tensorlake/arctic'
     name: 'embedder'
     content_source: 'chunker'
"""

extraction_graph = ExtractionGraph.from_yaml(extraction_graph_spec)
client.create_extraction_graph(extraction_graph)
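
The search step later in this notebook queries an index named `asrrag.embedder.embedding`, which appears to combine the graph name, the policy name, and the output name. As a small sketch under that assumption (the pattern is inferred from this notebook, and `embedding_index_name` is a hypothetical helper, not part of the Indexify API), the name can be built like this:

```python
def embedding_index_name(graph: str, policy: str, output: str = "embedding") -> str:
    """Join graph, policy, and output names with dots (pattern inferred from this notebook)."""
    return f"{graph}.{policy}.{output}"


print(embedding_index_name("asrrag", "embedder"))  # asrrag.embedder.embedding
```

Building the name from its parts keeps the search call consistent if you rename the graph or the embedding policy in the YAML spec.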

In [3]:
content_id = client.upload_file("asrrag", "interview.mp3")
print(content_id)
client.wait_for_extraction(content_id)

26c06462ef9ce19b


## Performing RAG with OpenAI

In [4]:
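def get_context(question: str, index: str, top_k: int = 1) -> str:
    """Search the index and join the top_k matching passages into one context string."""
    results = client.search_index(name=index, query=question, top_k=top_k)
    context = ""
    for result in results:
        # Each result carries the source content id and the matched passage text
        context = context + f"content id: {result['content_id']} \n\n passage: {result['text']}\n"
    return context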

In [5]:
question = "What does the guy have to say about his familiarity with the fashion world?"
context = get_context(question, "asrrag.embedder.embedding")
context

"content id: 6423bc4f19ad03cd \n\n passage: [{'speaker': 'SPEAKER_00', 'timestamp': (0.0, 3.84), 'text': ' It was glaringly hot, not a cloud in the sky nor a breath of wind.'}]\n"
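
The passage in the output above is a stringified Python list of diarized segments, each with `speaker`, `timestamp`, and `text` keys. A minimal sketch of turning such a string into a readable transcript before it goes into a prompt, assuming every passage has exactly this shape (the helper `passage_to_transcript` is hypothetical):

```python
import ast


def passage_to_transcript(passage: str) -> str:
    """Parse a stringified list of diarized segments into 'SPEAKER [start-end]: text' lines."""
    # literal_eval safely parses Python literals (lists, dicts, tuples) without executing code
    segments = ast.literal_eval(passage)
    lines = []
    for seg in segments:
        start, end = seg["timestamp"]
        lines.append(f"{seg['speaker']} [{start:.2f}-{end:.2f}]: {seg['text'].strip()}")
    return "\n".join(lines)


sample = "[{'speaker': 'SPEAKER_00', 'timestamp': (0.0, 3.84), 'text': ' It was glaringly hot, not a cloud in the sky nor a breath of wind.'}]"
print(passage_to_transcript(sample))
# SPEAKER_00 [0.00-3.84]: It was glaringly hot, not a cloud in the sky nor a breath of wind.
```

`ast.literal_eval` is used here instead of `eval` because the passage text comes from extracted content and should not be executed as code.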

In [6]:
def create_prompt(question, context):
    return f"Answer the question, based on the context.\n question: {question} \n context: {context}"

prompt = create_prompt(question, context)

In [14]:
from openai import OpenAI
client_openai = OpenAI()

In [15]:
chat_completion = client_openai.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
    model="gpt-3.5-turbo",
)
print(chat_completion.choices[0].message.content)

Based on the context provided, the essay on various organ systems like the nervous system and the digestive system would include information about the endocrine system, the immune system, the nervous system, the circulatory system, and the digestive system. The nervous system is described as a network of cells that transmit signals between different parts of the body, enabling communication and response to stimuli. The digestive system is highlighted as a series of organs that work together to break down food into nutrients for growth, repair, and energy production.
