# Run here only if you are on colab
---

These are the necessary steps for running ollama from colab that I copied form [this notebook](https://colab.research.google.com/github/5aharsh/collama/blob/main/Ollama_Setup.ipynb#scrollTo=zyGk-87qnbWE)

For the list of available models check [models being offerred by Ollama](https://ollama.com/library).

## Before you proceed
---

Since by default the runtime type of Colab instance is CPU based, in order to use LLM models make sure to change your runtime type to T4 GPU (or better if you're a paid Colab user). This can be done by going to **Runtime > Change runtime type**.

While running your script be mindful of the resources you're using. This can be tracked at **Runtime > View resources**.

## Running the notebook
---

After configuring the runtime just run it with **Runtime > Run all**. And you can start tinkering around. This example uses [Llama 3.2](https://ollama.com/library/llama3.2) to generate a response from a prompted question using [LangChain Ollama Integration](https://python.langchain.com/docs/integrations/chat/ollama/).

## Installing Dependencies
---

1. `pciutils` is required by Ollama to detect the GPU type.
2. Installation of Ollama in the runtime instance will be taken care by `curl -fsSL https://ollama.com/install.sh | sh`

In [None]:
!sudo apt update
!sudo apt install -y pciutils
!curl -fsSL https://ollama.com/install.sh | sh

## Running Ollama
---

In order to use Ollama it needs to run as a service in background parallel to your scripts. Becasue Jupyter Notebooks is built to run code blocks in sequence this make it difficult to run two blocks at the same time. As a workaround we will create a service using subprocess in Python so it doesn't block any cell from running.

Service can be started by command `ollama serve`.

`time.sleep(5)` adds some delay to get the Ollama service up before downloading the model.

In [None]:
import threading
import subprocess
import time

def run_ollama_serve():
  subprocess.Popen(["ollama", "serve"])

thread = threading.Thread(target=run_ollama_serve)
thread.start()
time.sleep(5)

You can now import ollama and go along with the rest of the notebook

# Exercise: Developing a RAG pipeline, then making it better, then making it better...

## What is RAG anyways?

Rag is somewhat talked about nowadays, and one of the most popular use cases for generative text models

<img src="assets/what_is_rag.jpg" alt="what_is_rag" width="70%"/>

RAG is coined in [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401). It refers to supporting internal knowledge of pretrained language models with dynamic text data. We can practically define it as including the knowledge sources in the input along with the question.

Typical RAG process consists of a user query, a knowledge base, a search step that finds relevant documents to the user query from the knowledge base, a generative model that takes those as inputs and generates an answer to the query

This is basically what we are about to do
https://medium.com/@drjulija/what-is-retrieval-augmented-generation-rag-938e4f6e03d1

![basic rag pipeline](https://miro.medium.com/v2/resize:fit:800/format:webp/1*FhMJ8OE_PoeOyeAavYjzlw.png)

## The toolset

I willl start by introducing you a very nice tool called [ollama](https://ollama.com/). It is [a user-friendly and powerful software for running LLMs locally.](https://geshan.com.np/blog/2025/02/what-is-ollama/#:~:text=Ollama%20is%20a%20user%2Dfriendly,customizable%20with%20a%20model%20file.)  
Ollama manages efficient utilization of your current hardware for accelerated model inference. It is tricky to run big models on limited computation power. There are a lot of trick you can utilize for enabling that, ollama uses those tricks for you.  

check out the [docs](https://github.com/ollama/ollama/tree/main/docs)

I encourage you to read the [blogs](https://ollama.com/blog) on the official page and other guides to get an idea about its inner workings. Or simply peak at the code itself it is [open source](https://github.com/ollama/ollama)

Needless to say we will be using [ollama python library](https://github.com/ollama/ollama-python) in this week's challenge

Go ahead and install it and get it up and running before you import the ollama library (of course if you are using colab you don't need the ollama desktop app)

## Recap: Orthogonalization

You may have watched [this lecture](https://www.youtube.com/watch?v=UEtvV1D6B3s) before. It talks about the concept called "orthogonalization". It refers to system design and machine learning in general, being aware of what to tune in order to achieve what effect. It is simpler in, for example, supervised machine learning. However, it requires a different approach when we are dealing with tools and systems that are developed by other people and not for your exact spesific usecase (happens a lot, like almost all the time)

I will introduce our system, and I will identify the moving parts of our system, the parts that can be modified or replaced. Then I will ask you to **play around with these moving parts and try to get an idea about what causes what**.

To do that, we need to have an objective. Our objective is of course to answer questions accurately. I have put together an evaluation set to measure our objective. It is a bunch of *Multiple Choice* questions about a specific knowledge base.

Why multiple choice? Because it is very easy to measure, no partial points, no other business concerns, just true or false. It is usually a lot harder to measure the goodness of a RAG system in real life.

## Evaluation set

The question set that we will evaluate our system consists of 50 questions about a specific topic. I chose this topic because if I used something common and well known, it might already exist in the pretraining data of the generative models. To measure that bias, **we will also need to measure the baseline score without any rag for each llm in our experiments** 

In [1]:
import pandas as pd
# If your experiments take too long, you can use the mini evaluation set instead
# evalset = pd.read_csv("evaluation_set_mini.csv", sep="|") # you can even make it minier for your convenience
# In my hardware and pipeline 50 questions took around 10-15 minutes
evalset = pd.read_csv("evaluation_set.csv", sep="|")

In [2]:
pd.set_option('display.max_colwidth', None)

In [3]:
evalset.sample(10)

Unnamed: 0,question,A,B,C,D,answer
31,Dizinin yapımcısı kimdir?,A) Gülse Birsel,B) Sinan Çetin,C) Şenay Gürler,D) Jale Atabey Özberk,B
45,Volkan'ın şöhret olma tutkusu hangi sezonda yoğunluk kazanmıştır?,A) İkinci,B) Üçüncü,C) Dördüncü,D) Beşinci,A
21,Dizinin beşinci sezonunda yeni katılan karakterlerden biri olan Şahika'yı hangi oyuncu canlandırmıştır?,A) Hasibe Eren,B) Binnur Kaya,C) Şenay Gürler,D) Hümeyra,B
35,"Dizinin altıncı sezonunda hangi ünlü karakter, iki karakteri aynı anda canlandırmıştır?",A) Ata Demirer,B) Engin Günaydın,C) Binnur Kaya,D) Levent Üzümcü,A
15,"What is the total number of seasons for ""Avrupa Yakası""?",A) Four,B) Five,C) Six,D) Seven,C
41,"Dizinin beşinci sezonunda, hangi karakter yılbaşı bölümünden sonra performansından memnun olunmadığı için ayrılmıştır?",A) Cesur,B) Gülenay,C) Kubilay,D) Makbule,A
36,Dizinin beşinci sezonunda hangi karakter Burhan ile birlikte kaçmıştır?,A) İffet,B) Şahika,C) Selin,D) Makbule,D
4,"How many total episodes does the ""Avrupa Yakası"" series have?",A) 132,B) 150,C) 190,D) 200,C
10,"Who was the producer of ""Avrupa Yakası""?",A) Gülse Birsel,B) Sinan Çetin,C) Şenay Gürler,D) Jale Atabey Özberk,B
34,"Dizinin hangi sezonunda Volkan, askerden dönerek diziye geri katılmıştır?",A) Üçüncü,B) Dördüncü,C) Beşinci,D) Altıncı,D


Are the questions familiar to you? Do you know some of the answers?

## Knowledge Base

To be able to answer the questions about *that* topic, we need to build a knowledge base.

To achieve that, I wrote a script to scrape information from vikipedi (or wikipedia). Let's collect our knowledge intensive documents from the internet.

In [4]:
!python wiki_crawler.py -h

usage: wiki_crawler.py [-h] -c CATEGORY -o OUTPUT [-l {en,tr}] [--pdf]

Wikipedia Category Crawler

options:
  -h, --help            show this help message and exit
  -c CATEGORY, --category CATEGORY
                        Wikipedia category to crawl (e.g.,
                        "Search_algorithms", "Alacakaranlık_filmleri")
  -o OUTPUT, --output OUTPUT
                        Output directory path within current working directory
  -l {en,tr}, --language {en,tr}
                        Wikipedia language (en or tr), used for configuring
                        urls
  --pdf                 Download articles as PDF instead of text

Example usage:
        python wiki_crawler.py -c "Search_algorithms" -o search-algorithms -l en
        python wiki_crawler.py --category "Avrupa_Yakası" --output avrupa-yakasi --pdf --language tr


The script crawls a given wikipedia *Category*. I encourage you to go and check how the script works. I hope it can be useful for your personal RAG projects :)

The *Category* we are interested in, as you may have noticed, is Avrupa Yakası.

The cell below downloads the main text body from every page in "Avrupa_Yakası" category as txt files and also pdf files.

In [6]:
!python wiki_crawler.py --category "Avrupa_Yakası" --output avrupa-yakasi --language tr
!python wiki_crawler.py --category "Avrupa_Yakası" --output avrupa-yakasi --pdf --language tr

100%|█████████████████████████████████████████████| 9/9 [00:02<00:00,  3.37it/s]
100%|█████████████████████████████████████████████| 9/9 [00:27<00:00,  3.04s/it]


Go ahead and take a look at the files generated. You may notice the pdf versions contain more information than the txt files. Meaning we can change the scope of our knowledge base by parsing the pdfs or txts

## Indexing

Before we can do the actual searching on our newly created knowledge base, we need to create an index. An Index is basically a searchable space. The nature of the indexing process depends on the nature of the search. In our search, we want to input a query and want to get the relevant contexts to the desired answer.

The relevant contexts will be the inputs to an LLM. To reduce the irrelevant data we input to the LLM, which can result in noise in the generation, we need to divide our documents into smaller parts.

### Chunking

Chunking is the process of splitting text into meaningful *chunks.* The quality of the chunks have a significant impact on the quality of the search. For that, there are a lot of developed chunking strategies that are useful in various usecases. You can also come up with yours for this exercise. **This is the first moving part of our system**

- TokenChunker: Splits text into fixed-size token chunks.
- WordChunker: Splits text into chunks based on words.
- SentenceChunker: Splits text into chunks based on sentences.
- RecursiveChunker: Splits text hierarchically using customizable rules to create semantically meaningful chunks.
- SemanticChunker: Splits text into chunks based on semantic similarity.

If we are going to use an embedding model for the retrieval process, it is a good idea to enforce the maximum token size of chunks with that model's tokenizer. Otherwise we can create chunks that are too big for our embedding model.

In [4]:
from sentence_splitter import SentenceSplitter
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
# For the later embedding part, I will use the nomic-embed-text model for demonstration
# that model uses bert-base-uncased tokenizer according to the model card https://huggingface.co/nomic-ai/nomic-embed-text-v1#transformers


None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.


Below I implemented a simple recursive chunking algorithm from scratch for demonstration. You are free to use whatever method from whatever library to chunk your knowledge base for your experiments. Even go ahead and try the code generating models to see which one writes better chunking code for our use case :)

In [5]:
levels = ["paragraph", "sentence", "word", "token"]
sentence_splitter = SentenceSplitter(language="en")

def _split(text, split_level):
    
    if split_level == "paragraph":
        return text.split("\n")
    elif split_level == "sentence":
        return sentence_splitter.split(text)
    elif split_level == "word":
        return text.split()
    elif split_level == "token":
        return tokenizer.tokenize(text)

def _merge(current_chunk, new_text, merge_level):
    if merge_level == "paragraph":
        join_char = "\n"
    elif merge_level == "sentence":
        join_char = " "
    elif merge_level == "word":
        join_char = " "
    elif merge_level == "token":
        join_char = ""
        
    if not current_chunk:
        return new_text
        
    return current_chunk + join_char + new_text


def recursive_chunker(text, max_token_length, level=0):

    if not text.strip() or len(tokenizer.tokenize(text)) <= max_token_length:
        # if the input text is already smaller than the token limit, we return it directly as a one element list
        return [text]

    # if we are here, the input text is larger than the max_token_length
    list_of_texts = _split(text, levels[level])
    
    # if we can split the large text with the current level, loop inside the splitted text to see if each element is small enough
    if len(list_of_texts) > 1:
        chunks = []
        for text in list_of_texts:
            # Process each piece recursively so that the initial list_of_texts can only contain texts smaller than max_token_length
            sub_chunks = recursive_chunker(text, max_token_length, level + 1)
            chunks.extend(sub_chunks)
    else:
        # If the large text cannot be splitted with the current level, we move to the next level
        return recursive_chunker(text, max_token_length, level + 1)

    
    ### Merging phase

    final_chunks = []
    current_chunk_size = 0
    current_chunk = ""
    for element in chunks:
        element_size = len(tokenizer.tokenize(element))
        
        # can i merge the current element with the current chunk?
        if current_chunk_size + element_size <= max_token_length: #yes
            current_chunk = _merge(current_chunk, element, levels[level])
            current_chunk_size += element_size

        else: 
            # if the current chunk and candidate element exceeds the limit,
            # we add the current chunk to final chunk list, and flush the current chunk and current chunk size, with the misfit element
            final_chunks.append(current_chunk.strip())
            current_chunk_size = element_size
            current_chunk = element

    if current_chunk: # lastly we add the dangling chunk after the loop has ended
        final_chunks.append(current_chunk.strip())
        
    return final_chunks

Let's demostrate what the function does to a tiny little text. **It is clear that the chunking algorithm and the maximum token size would have an observable impact on our search results**

In [6]:
def print_chunks(chunks):
    for i, chunk in enumerate(chunks):
        print(f"{i+1}:\t{chunk}")
        print()

text = """This is a long piece of text that needs to be split into chunks.
The text contains multiple sentences and paragraphs. Each chunk should contain no more than a set number of tokens."""

chunks = recursive_chunker(text, max_token_length=10)
print("with maximum of 10 tokens:")
print_chunks(chunks)
print("="*32)

chunks = recursive_chunker(text, max_token_length=20)
print("with maximum of 20 tokens:")
print_chunks(chunks)
print("="*32)

chunks = recursive_chunker(text, max_token_length=32)
print("with maximum of 32 tokens:")
print_chunks(chunks)
print("="*32)

chunks = recursive_chunker(text, max_token_length=64)
print("with maximum of 64 tokens:")
print_chunks(chunks)
print("="*32)

chunks = recursive_chunker(text, max_token_length=256)
print("with maximum of 256 tokens:")
print_chunks(chunks)

with maximum of 10 tokens:
1:	This is a long piece of text that needs to

2:	be split into chunks.

3:	The text contains multiple sentences and paragraphs.

4:	Each chunk should contain no more than a set number

5:	of tokens.

with maximum of 20 tokens:
1:	This is a long piece of text that needs to be split into chunks.

2:	The text contains multiple sentences and paragraphs.

3:	Each chunk should contain no more than a set number of tokens.

with maximum of 32 tokens:
1:	This is a long piece of text that needs to be split into chunks.

2:	The text contains multiple sentences and paragraphs. Each chunk should contain no more than a set number of tokens.

with maximum of 64 tokens:
1:	This is a long piece of text that needs to be split into chunks.
The text contains multiple sentences and paragraphs. Each chunk should contain no more than a set number of tokens.

with maximum of 256 tokens:
1:	This is a long piece of text that needs to be split into chunks.
The text contains multiple s

Using that function, this is what the chunks of an entire document will look like:

In [7]:
with open("avrupa-yakasi/Avrupa Yakası (dizi) - Vikipedi.txt") as f:
    text = f.read()

chunks = recursive_chunker(text, max_token_length=256)
print_chunks(chunks[:5])

Token indices sequence length is longer than the specified maximum sequence length for this model (8179 > 512). Running this sequence through the model will result in indexing errors


1:	Avrupa Yakası, Plato Film imzalı, 2004-2009 yılları  arasında atv'de yayımlanan durum komedisi türündeki Türk televizyon dizisidir.[5] Dizide, Nişantaşı'nda yaşayan Sütçüoğlu ailesi ile Avrupa Yakası dergisi çalışanlarının ve onların yakınlarının komik öyküsü anlatılmaktadır.[6] Toplam 6 sezondan oluşan dizi, 24 Haziran 2009 tarihinde yayımlanan 190. bölümü ile final yaparak sona erdi.

2:	Oyuncu kadrosu sezondan sezona farklılık gösterse de; Gülse Birsel, Gazanfer Özcan, Levent Üzümcü, Şenay Gürler, Hale Caneroğlu, Yavuz Seçkin, Veysel Diker ve Yıldırım Öcek tüm sezonlarda yer alan oyunculardır. Üçüncü sezonda Engin Günaydın,[7] dördüncü sezonda ise Peker Açıkalın, Tolga Çevik, Hasibe Eren ve Sarp Apak oyuncu kadrosuna katılmıştır.[8]

3:	İlk sezon, 11 Şubat 2004 tarihinde yayımlanmaya başlamış ve 19. bölümüyle 23 Haziran 2004 tarihinde sona ermiştir. 1. sezon toplam 19 bölümden oluşmaktadır. Aslı ve Cem'in aşkı, Volkan'ın müzik sevdası, Fatoş'un erkeklerle olan ilişkisi gibi konul

and so on...

Are these chunks *good?* Are they *bad?* In terms of what? This is exactly why we are running experiments with the moving parts. We let the metrics speak

___

Let's get all the chunks for our knowledge base

In [7]:
from pathlib import Path
def create_folder_chunks(path, chunking_function, max_token_length=256):
    """
    Process text files in a folder and split their contents into chunks based on token length.

    Args:
        path (str or Path): Directory path containing text files to process
        max_token_length (int, optional): Maximum number of tokens per chunk. Defaults to 256.
        chunking_function (callable): Function that splits text into chunks based on token count.
            Must accept two parameters: text (str) and max_tokens (int)
            Should return a list of text chunks

    Returns:
        list[str]: List of text chunks from all processed files
    
    Notes:
        - Processes all .txt files found in the specified path
        - Each file's contents are processed independently
        - The chunking_function determines the actual splitting behavior
        - Returns a flat list containing chunks from all processed files
    """
    chunks = []
    for file in Path(path).glob("*.txt"):
        print(file)
        with open(file) as f:
            text = f.read()
        chunks.extend(chunking_function(text, max_token_length))
    return chunks

In [9]:
# The function takes as input the chunking function that takes a string and max_token_length
all_the_chunks = create_folder_chunks(path="avrupa-yakasi", chunking_function=recursive_chunker, max_token_length=256)

avrupa-yakasi/Avrupa Yakası (dizi) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (3. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası bölümleri listesi - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (4. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (1. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (6. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası oyuncu listesi - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (5. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (2. sezon) - Vikipedi.txt


In [10]:
len(all_the_chunks) # take a look at how many chunks we would have with different max token sizes

73

### Embedding

We will create an index with the embeddings of the chunks we have chunked just now. **This is the second moving part.** We can use different models from huggingface or ollama. I will demonstrate one from ollama

#### Ollama embedding inference

Ollama have a limited selection of embedding models but they are worth trying

In [8]:
import ollama
embedding_model = "bge-m3"

In [10]:
%%time
# pulling the model for the first time might take a little while
ollama.pull(embedding_model)

CPU times: user 3.44 ms, sys: 1.49 ms, total: 4.93 ms
Wall time: 748 ms


ProgressResponse(status='success', completed=None, total=None, digest=None)

In [12]:
%%time
res = ollama.embed(embedding_model, input=['Merhaba Dünya!','Hello World!'])

CPU times: user 2.91 ms, sys: 1.55 ms, total: 4.46 ms
Wall time: 2.52 s


In [14]:
%%time
all_the_embeddings = ollama.embed(embedding_model, input=all_the_chunks)

CPU times: user 20.2 ms, sys: 3.77 ms, total: 24 ms
Wall time: 23.9 s


In [15]:
len(all_the_embeddings.embeddings)

73

### Finally the actual indexing

I am using a very simple faiss code that I stole directly from [here](https://cheatsheet.md/vector-database/faiss-python-api#:~:text=Sample%20Code%20for%20Basic%20FAISS%20Setup%20in%20Python)

Why am i not using a fancy vectordb? Because we don't need one for this simple of a case. Right now everything is happening on the memory.

In [11]:
import faiss # while installing faiss, pay attention to whether you will run it on gpu or cpu
# https://github.com/facebookresearch/faiss?tab=readme-ov-file#installing

def create_index(embeddings):
    # Initialize a FAISS index
    dimension = len(embeddings[0])  # dimension of each vector
    index = faiss.IndexFlatL2(dimension)
     
    index.add(embeddings)
    return index

In [16]:
import numpy as np

index = create_index(np.array(all_the_embeddings.embeddings))

In [17]:
index

<faiss.swigfaiss_avx2.IndexFlatL2; proxy of <Swig Object of type 'faiss::IndexFlatL2 *' at 0x1154005a0> >

## Search

To query our index, we need to get vectors for the question. Below, index.search() calculates vector distances between the query vectors and chunk embeddings, then returns the top_k closest chunks.

**This is the third moving part** of our system. We can perform a lot of tricks with the search algorithm to improve our final results.

In [14]:
def search(query_vector, index, top_k):

    distances, indices = index.search(query_vector, top_k)
    
    return indices, distances

In [27]:
example_queries = ['Tacettin karakterini kim canlandırmaktadır?', "Bu dizinin adı ne?"]
query_vectors = ollama.embed(embedding_model, input=example_queries).embeddings

indices, distances = search(np.array(query_vectors), index, top_k=3)

In [28]:
indices.shape # queries, top_k

(2, 3)

In [29]:
for i, results in enumerate(indices):
    print(f"********* Retrieved contexts for {i+1}th question:")
    for j, chunk_id in enumerate(results):
        print(f"#### context: {j+1}")
        print(all_the_chunks[chunk_id])
        print()
    print()

********* Retrieved contexes for 1th question:
#### context: 1
Sütçüoğlu Muhallebicisi'nde daha çok görünen Sertaç rolünü Yavuz Seçkin, Tacettin rolünü Veysel Diker ve İzzet rolünü ise Timur Acar oynadı. Dizinin son bölümlerine doğru diziye konuk oyuncu olarak dahil olan fakat daha sonraki bölümlerde programın oyuncu kadrosunda yer alan Ömür Arpacı, Dursun rolü ile Avrupa Yakası'nda yer aldı.[22]


Kapıcı ailesinindeki karakterleri ise Bihter Özdemir, Ececan Gümeci, Celal Belgil ve Şensel Uykal canlandırdı. 5. sezonda diziye katılan Gürgen Öz, kapıcı ailesinin akrabası olan Cesur'u oynadı fakat dizinin ilerleyen bölümlerinde diziden ayrıldı.

#### context: 2
Gazanfer Özcan'ın vefat etmesinden sonra diziye Özcan'ın eşi Gönül Ülkü ile Müşfik Kenter diziye Hasibe Eren'in canlandırdığı Makbule'nin anne ve babası rolleriyle katıldı.[12]


Diziye adını veren Avrupa Yakası dergisinde çalışanlar diziye katkı sağlayan tamamlayıcı oyunculardır. Derginin editörü Fatoş 'u Şenay Gürler, modacısı Ya

The chunks will be the input to our generation model

## Generation

Finally we got to the LLM part. Let's see what is so great about them.

**This is the final moving part.** We can tweak with generation model, generation algorithm, generation parameters, system prompts and so on to improve scores.

In [13]:
# pick any one you like from https://ollama.com/search
generation_model = "gemma2:2b"

In [14]:
%%time
ollama.pull(generation_model)

CPU times: user 2.78 ms, sys: 1.46 ms, total: 4.25 ms
Wall time: 594 ms


ProgressResponse(status='success', completed=None, total=None, digest=None)

We can display the models we have pulled so far like this:

In [15]:
ollama.list()

ListResponse(models=[Model(model='gemma2:2b', modified_at=datetime.datetime(2025, 3, 27, 18, 33, 17, 97538, tzinfo=TzInfo(+02:00)), digest='8ccf136fdd5298f3ffe2d69862750ea7fb56555fa4d5b18c04e3fa4d82ee09d7', size=1629518495, details=ModelDetails(parent_model='', format='gguf', family='gemma2', families=['gemma2'], parameter_size='2.6B', quantization_level='Q4_0')), Model(model='bge-m3:latest', modified_at=datetime.datetime(2025, 3, 27, 18, 32, 18, 161976, tzinfo=TzInfo(+02:00)), digest='7907646426070047a77226ac3e684fbbe8410524f7b4a74d02837e43f2146bab', size=1157672605, details=ModelDetails(parent_model='', format='gguf', family='bert', families=['bert'], parameter_size='566.70M', quantization_level='F16')), Model(model='deepseek-r1:7b-qwen-distill-q4_K_M', modified_at=datetime.datetime(2025, 3, 27, 2, 15, 38, 242072, tzinfo=TzInfo(+02:00)), digest='0a8c266910232fd3291e71e5ba1e058cc5af9d411192cf88b6d30e92b6e73163', size=4683075271, details=ModelDetails(parent_model='', format='gguf', fam

This is just a regular general purpose LLM running on our hardware

In [154]:
%%time
generation = ollama.generate(generation_model, prompt='Beşi beş kuruştan beş yumurta kaç kuruş yapar?')

CPU times: user 2.45 ms, sys: 687 μs, total: 3.13 ms
Wall time: 5.33 s


In [155]:
print(generation.response)

Bu soru, aritmetik işlemle çözülebilir! 

* **Her yumurta, 1 kuruş değerinde.**
* **Beş yumurtanın toplam değeri, 5 * 1 = 5 kuruştur.**


**Cevap:** Beşi beş kuruştan beş yumurta **5 kuruş** yapar. 😊 



not bad actually

Here is a function for streaming LLM response for artistic effect.

In [29]:
def generate_answer(model, prompt, generation_kwargs):
    for chunk in ollama.generate(generation_model, prompt=prompt, stream=True, options=generation_kwargs):
      print(chunk["response"], end='', flush=True)

 gibi

 görünü

yor

!

  

😊

 





Be

ş

 kur

uş

luk

 bir

 değer

li

 mal

ze

meden

 oluşan

 bir

 şey

de

,

 bir

 yum

urta

 veya

 yum

ur

tal

ara

 ver

ile

cek

 bir

 işlem

 olduğu

 için

 bu

 sor

uyu

 cevap

lamak

 zor

.

 






**

Bil

me

celer

:**






An

cak

,

 bu

 sor

unun

 ne

 kadar

 basit

 ol

masına

 rağmen

,

 cev

ab

ını

 bul

mak

 için

 biraz

 düşün

mek

 gerekiyor

:






*

 Be

şi

 beş

 kur

uş

tan

 beş

 yum

urta

 kaç

 kur

uş

 yap

ar

?




    




Bu

 gibi

 bil

me

cel

erin

 amac

ı

,

 kelime

leri

 ve

 if

adel

eri

 kullan

arak

 mant

ıklı

 bir

 şekilde

 çöz

üm

 bul

mam

ızı

 sağ

lamak

.

 





**

Baş

arı

lar

!

**

  

🙌

 




In [30]:
generate_answer(generation_model, "Retrieval Augmented Generation nedir? çok kısaca açıkla", None)

Retrieval-Augmented Generation (RAG),  **bilgi tabanlı sistemlerin yapay zekâ tarafından üretilip, o bilgileri içeren özetler oluşturması için bir yöntemdir.** 

* **Geri Call:** Öncelikle sistem bilgisini **dahili bilgi bankasına** yani "retrieval" ile bağlar. Bu bilgi bankası, büyük miktarda bilgiyi tutar ve bu bilgileri kullanarak sorgulamalarını destekleyebilir.
* **Yapay Zeka Üretimi:** Sonra o bilgi bankasına dayalı olarak yapay zeka algoritması tarafından bir özet oluşturulur.

**Sonuç:** RAG, hem bilgi birikimini hem de yapay zekânın yaratıcılığını birleştirerek daha akıllı ve doğru cevaplar üretebilir. 


Örneğin, bir müşteri talepte bulunan ürün hakkında bilgiyi bulmak için RAG kullanabilirsiniz: 

1. **Talep:** "Bir otomobil satın almak istiyorum, ancak en az 200 km mesafeyle hareket edebilen bir model istiyor"
2. **Geri Call:** Sistem bilgisini bu talep üzerine otomatik olarak araştırarak, otomobil modellerinin özelliklerini ve mesafelerini karşılayan öneriler sunar.
3. **Y

## Putting it all together

In [17]:
def _form_prompt(question, contexts): # this is arbitrary, you will optimize this
    contexts_input = "\n"
    for i, context in enumerate(contexts):
        contexts_input += f"Context{i+1}: {context}\n"

    prompt = """You are a state of the art question answering model. 
Given a query and context, you will generate an accurate and complete answer.
You will generate your answer in turkish.
The contexts and question are as follows:
Question: {question}
Contexts: {contexts}
""".format(question=question, contexts=contexts_input)

    return prompt

def get_RAG_answer(query, embedding_model, generation_model, chunks, index, params):
    query_vector = ollama.embed(embedding_model, query).embeddings
    chunk_indices, _ = search(np.array(query_vector), index, params["top_k"])
    contexts = np.array(chunks)[chunk_indices[0]]
    prompt = _form_prompt(query, contexts)

    generate_answer(generation_model, prompt, params["generation_kwargs"])
    

In [38]:
params = {
    "top_k": 3,
    "generation_kwargs": {
        "top_k": None,
        "top_p": 0.8,
        "temperature": 0.7,
    }
}

get_RAG_answer("Fatoş'un dergideki görevi nedir?", embedding_model, generation_model, all_the_chunks, index, params)


Fatoş, Avrupa Yakası dergisinin editörü olarak görev yapıyor. 


___

We have done it! We have done 1 RAG, now it is time to make it better and better

# Benchmarking

First step on making something better is to measure its goodness. We will use our evaluation set to benchmark our experments

In [18]:
evalset

Unnamed: 0,question,A,B,C,D,answer
0,"When did the first season of ""Avrupa Yakası"" air?",A) 2002-2003,B) 2003-2004,C) 2004-2005,D) 2005-2006,C
1,"How many episodes does the second season of ""Avrupa Yakası"" contain?",A) 19,B) 32,C) 39,D) 28,C
2,Which season introduced the character Şahika?,A) Third Season,B) Fourth Season,C) Fifth Season,D) Sixth Season,C
3,"Which character did Binnur Kaya play in the sixth season of ""Avrupa Yakası""?",A) İffet Sütçüoğlu,B) Dilber Hala,C) Aslı,D) Makbule,B
4,"How many total episodes does the ""Avrupa Yakası"" series have?",A) 132,B) 150,C) 190,D) 200,C
5,"Who was the director for the majority of the ""Avrupa Yakası"" series?",A) Sinan Çetin,B) Gülse Birsel,C) Jale Atabey Özberk,D) Hakan Algül,C
6,Which actor played the character of Cem throughout the series?,A) Ata Demirer,B) Levent Üzümcü,C) Engin Günaydın,D) Yıldırım Öcek,B
7,"In which season did Engin Günaydın join the cast of ""Avrupa Yakası""?",A) First,B) Second,C) Third,D) Fourth,C
8,"What was the main setting of ""Avrupa Yakası""?",A) An office building,B) A suburban neighborhood,C) A family residence,D) A magazine office,C
9,Which character tried to marry Aslı in the first season?,A) Cem,B) Volkan,C) Tacettin,D) Sertaç,C


## Running Experiments

In [19]:
from tqdm import tqdm
tqdm.pandas()

In [21]:
Path("experiments").mkdir(parents=True, exist_ok=True)

I am combining the question text with the choices to create the string that will be passed to the prompt

In [20]:
def _form_question(row):
    return f"""{row["question"]}
{row["A"]}
{row["B"]}
{row["C"]}
{row["D"]}
"""    

In [21]:
evalset["full_question_text"] = evalset.progress_apply(_form_question, axis=1)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
100%|████████████████████████████████████████| 50/50 [00:00<00:00, 13880.15it/s]


In [22]:
evalset.sample(5)

Unnamed: 0,question,A,B,C,D,answer,full_question_text
40,"Dizinin üçüncü sezonunda, Sütçüoğlu ailesinin hangi üyesi dergi ofisinde idare müdürü olarak işe başlamıştır?",A) Volkan,B) Aslı,C) Burhan,D) Tacettin,C,"Dizinin üçüncü sezonunda, Sütçüoğlu ailesinin hangi üyesi dergi ofisinde idare müdürü olarak işe başlamıştır? \n A) Volkan \n B) Aslı \n C) Burhan \n D) Tacettin \n"
42,Altıncı sezonun ilk üç bölümünde hangi karakterin repliği kaldırılarak yerine Ata Demirer'in Zeki Müren tonlamasında replik getirilmiştir?,A) Vural Çelik,B) Engin Günaydın,C) Yıldırım Öcek,D) Sarp Apak,A,Altıncı sezonun ilk üç bölümünde hangi karakterin repliği kaldırılarak yerine Ata Demirer'in Zeki Müren tonlamasında replik getirilmiştir? \n A) Vural Çelik \n B) Engin Günaydın \n C) Yıldırım Öcek \n D) Sarp Apak \n
30,"İlk sezonda, Aslı'nın annesi İffet tarafından evlendirilmeye çalışılan karakterin adı nedir?",A) Cem,B) Volkan,C) Tacettin,D) Sertaç,C,"İlk sezonda, Aslı'nın annesi İffet tarafından evlendirilmeye çalışılan karakterin adı nedir? \n A) Cem \n B) Volkan \n C) Tacettin \n D) Sertaç \n"
18,What role did Rutkay Aziz portray in the series?,A) Cem's father,B) Aslı's father,C) Volkan's father,D) None of the above,A,What role did Rutkay Aziz portray in the series? \n A) Cem's father \n B) Aslı's father \n C) Volkan's father \n D) None of the above \n
15,"What is the total number of seasons for ""Avrupa Yakası""?",A) Four,B) Five,C) Six,D) Seven,C,"What is the total number of seasons for ""Avrupa Yakası""? \n A) Four \n B) Five \n C) Six \n D) Seven \n"


## This is your playground

You don't need to write every part from scratch, you can just change what you want to experiment on, you can copy my example codes for the rest

In [195]:
from typing import List

def my_chunking(text: str, max_token_length: int) -> List[str]:
    """
    Here write a chunking function that takes a text and returns a list of chunks
    Feel free to use any available libraries
    such as langchain: https://github.com/langchain-ai/langchain, chonkie: https://github.com/chonkie-ai/chonkie
    or any one you like,
    
    If you want, make your favorite llm generate your code, even compare different LLMs chunking code
    
    """
    return chunks


def my_embedding(embedding_model, text: List[str]) -> List[List[float]]:
    """
    Here implement your vectorization function that takes a list of texts and return vectors that represent those texts
    Feel free to use whatever transformer or a good old term-document matrix or tf-idf or bm25

    """
    return vectors

def my_search(query_vector: List[List[float]], top_k: int, chunks: List[str], index) -> List[List[str]]:
    """
    Here write your search function that takes a vectorized query (vectorized by 'my_embedding' function)
    and returns a list of chunks for each query

    You can use simple vector distance, you can use a reranker, you can use query expansion or whatever tricks intrigue you
    """
    return retrieved_chunks


def my_generate(prompts: List[str], generation_config):
    """
    Here you will use your generation model, you can use whichever model you are curious about. I suggest checking some from https://ollama.com/search
    """
    return generation


def my_answer(queries: List[str], retrieved_chunks: List[List[str]], prompt_template: str, generation_config, generation_function):
    """
    This function encapsulates preparing your input prompt, generating an output, and exracting the choice (A,B,C,D) from the output.
    maybe do some 'prompt engineering' maybe research some ways for optimizing prompts automatically
    """
    def _my_form_prompt(prompt_template: str, queries: List[str], retrieved_chunks: List[List[str]]): # get your prompt ready for inputting
        pass
    def _my_parse_answer(llm_outputs): # extract what the model choose among A,B,C,D from the generated output
        pass

    prompts = _my_form_prompt(prompt_template, queries, retrieved_chunks) # apply your prompt template
    outputs = generation_function(prompts, generation_config)
    answers = _my_parse_answer(outputs)
    return outputs, answers

Our experiment config template will look like this. We will only use this for tracking and creating reports. If it is inconvenient, you can track your experiments in any other way

In [173]:
experiment_config = { # by "my" i mean "your" 
    "experiment_id": None, #my_experiment_id,
    "chunking_function": None, #my_chunking,
    "max_token_length": None, #my_chunk_size,
    "embedding_function": None, #my_embedding,
    "embedding_model": None, #my_embedding_model_name,
    "search_function": None, #my_search,
    "top_k": None, # my_top_k
    "search_metadata": {
        "anything": "relevant",
        "for_the": "experiment"
    },
    "answer_function": None, #my_answer,
    "generation_function": None, #my_generate,
    "generation_model": None, #my_generation_model_name,
    "prompt_template": None, #my_prompt_template,
    "generation_params": None #my_generation_params
}

## This is the running and reporting part

We will be appending our experiment result to this dataframe

In [312]:
experiment_results = evalset[["full_question_text", "answer"]]
experiment_results.to_csv("experiments/experiments_results.csv", sep="|")

In [313]:
experiment_leaderboard = pd.DataFrame(columns=list(experiment_config.keys()) + ["rights", "wrongs", "score"])
experiment_leaderboard.to_csv("experiments/experiment_leaderboard.csv", sep="|")

I am writing the trial function so that it can work with batches but you can also just use loops inside your functions

In [23]:
def run_trial(experiment_config):

    # get your chunking function
    get_chunks = experiment_config["chunking_function"]
    # create chunks for our knowledge base
    max_token_length = experiment_config["max_token_length"]
    chunks = create_folder_chunks(path="avrupa-yakasi", chunking_function=get_chunks, max_token_length=max_token_length)
    print("chunks are ready")
    
    # get your embedding function
    get_embeddings = experiment_config["embedding_function"]
    # get your embedding model
    embedding_model = experiment_config["embedding_model"]
    chunk_embeddings = get_embeddings(embedding_model, chunks, "chunks")
    print("chunk embeddings are ready")
    # create an index
    index = create_index(embeddings=np.array(chunk_embeddings))
    print("search index is ready")
    
    # get eval questions
    eval_questions = evalset["question"].values.tolist()
    # get question embeddings
    eval_question_embeddings = get_embeddings(embedding_model, eval_questions, "questions")
    # get your search function
    search = experiment_config["search_function"]
    top_k_contexts = search(eval_question_embeddings, experiment_config["top_k"], chunks, index)
    print("retrieval is completed")
    
    # lets get generating
    generate = experiment_config["generation_function"]
    get_answers = experiment_config["answer_function"]
    prompt_template = experiment_config["prompt_template"]
    generation_params = experiment_config["generation_params"]
    outputs, answers = get_answers(queries=eval_questions, retrieved_chunks=top_k_contexts, prompt_template=prompt_template, generation_config=generation_params, generation_function=generate)
    print("The answers are ready!!!")
    
    exp_id = experiment_config["experiment_id"]
    
    return {f"{exp_id}_outputs": outputs, f"{exp_id}_answers": answers}


def evaluate_trial(experiment_records, experiment_id):
    rights = experiment_records[experiment_records[f"{experiment_id}_answers"].str.lower().str.strip() == experiment_records["answer"].str.lower().str.strip()].index
    wrongs = experiment_records[experiment_records[f"{experiment_id}_answers"].str.lower().str.strip() != experiment_records["answer"].str.lower().str.strip()].index
    score = len(rights) / (len(rights) + len(wrongs))
    print("the results are in")
    evals = {"rights": rights, "wrongs": wrongs, "score": score}
    print(evals)
    return evals

def record_trial(experiment_result, experiment_config):
    experiment_records = pd.read_csv("experiments/experiments_results.csv", sep="|", index_col=0)
    experiment_records = pd.concat([experiment_records, pd.DataFrame(experiment_result)], axis=1)
    experiment_records.to_csv("experiments/experiments_results.csv", sep="|")

    evals = evaluate_trial(experiment_records, experiment_config["experiment_id"])
    experiment_leaderboard = pd.read_csv("experiments/experiment_leaderboard.csv", sep="|", index_col=0)
    new_row = {**experiment_config, **evals}
    experiment_leaderboard = pd.concat([experiment_leaderboard, pd.DataFrame([new_row])], ignore_index=True, axis=0)
    experiment_leaderboard.to_csv("experiments/experiment_leaderboard.csv", sep="|")

After configuring your run just run the cell below and everything will be run and recorded

In [128]:
%%time
results = run_trial(experiment_config)
record_trial(results, experiment_config)

## My example entry

In [199]:
from typing import List
levels = ["paragraph", "sentence", "word", "token"]
sentence_splitter = SentenceSplitter(language="en")
def my_chunking(text: str, max_token_length: int, level=0) -> List[str]:

    
    def _split(text, split_level):
        
        if split_level == "paragraph":
            return text.split("\n")
        elif split_level == "sentence":
            return sentence_splitter.split(text)
        elif split_level == "word":
            return text.split()
        elif split_level == "token":
            return tokenizer.tokenize(text)
    
    def _merge(current_chunk, new_text, merge_level):
        if merge_level == "paragraph":
            join_char = "\n"
        elif merge_level == "sentence":
            join_char = " "
        elif merge_level == "word":
            join_char = " "
        elif merge_level == "token":
            join_char = ""
            
        if not current_chunk:
            return new_text
            
        return current_chunk + join_char + new_text
    
    
    if not text.strip() or len(tokenizer.tokenize(text)) <= max_token_length:
        # if the input text is already smaller than the token limit, we return it directly as a one element list
        return [text]

    # if we are here, the input text is larger than the max_token_length
    list_of_texts = _split(text, levels[level])
    
    # if we can split the large text with the current level, loop inside the splitted text to see if each element is small enough
    if len(list_of_texts) > 1:
        chunks = []
        for text in list_of_texts:
            # Process each piece recursively so that the initial list_of_texts can only contain texts smaller than max_token_length
            sub_chunks = my_chunking(text, max_token_length, level + 1)
            chunks.extend(sub_chunks)
    else:
        # If the large text cannot be splitted with the current level, we move to the next level
        return my_chunking(text, max_token_length, level + 1)

    
    ### Merging phase

    final_chunks = []
    current_chunk_size = 0
    current_chunk = ""
    for element in chunks:
        element_size = len(tokenizer.tokenize(element))
        
        # can i merge the current element with the current chunk?
        if current_chunk_size + element_size <= max_token_length: #yes
            current_chunk = _merge(current_chunk, element, levels[level])
            current_chunk_size += element_size

        else: 
            # if the current chunk and candidate element exceeds the limit,
            # we add the current chunk to final chunk list, and flush the current chunk and current chunk size, with the misfit element
            final_chunks.append(current_chunk.strip())
            current_chunk_size = element_size
            current_chunk = element

    if current_chunk: # lastly we add the dangling chunk after the loop has ended
        final_chunks.append(current_chunk.strip())
    else:
        print("how can this happen???")
        
    return final_chunks


def my_embedding(embedding_model, text: List[str]) -> List[List[float]]:

    return ollama.embed(embedding_model, input=text).embeddings


def my_search(query_vector: List[List[float]], top_k: int, chunks: List[str], index) -> List[List[str]]:

    distances, indices = index.search(np.array(query_vector), top_k)
    
    return np.array(chunks)[indices]


def my_generate(prompts: List[str], generation_config):
    generations = []
    for prompt in tqdm(prompts):
        answer = ollama.generate(generation_model, prompt=prompt, options=generation_config).response
        generations.append(answer)
    return generations


def my_answer(queries: List[str], retrieved_chunks: List[List[str]], prompt_template: str, generation_config, generation_function):
    """
    This function encapsulates preparing your input prompt, generating an output, and exracting the choice (A,B,C,D) from the output.
    """
    def _my_form_prompt(prompt_template: str, queries: List[str], retrieved_chunks: List[List[str]]): # get your prompt ready for inputting
        prompts = []
        for query, top_k_chunks in tqdm(zip(queries, retrieved_chunks)):
            contexts_input = "\n"
            for i, context in enumerate(top_k_chunks):
                contexts_input += f"Context{i+1}: {context}\n"
    
            prompt = prompt_template.format(question=query, contexts=contexts_input)
            prompts.append(prompt)
        return prompts
    
    def _my_parse_answer(llm_outputs: List[str]): # extract what the model choose among A,B,C,D from the generated output
        outputs = []
        for output in tqdm(llm_outputs):
            outputs.append(output.split("My Final Answer:")[1].strip())
        return outputs

    prompts = _my_form_prompt(prompt_template, queries, retrieved_chunks) # apply your prompt template
    outputs = generation_function(prompts, generation_config)
    answers = _my_parse_answer(outputs)
    return outputs, answers

In [207]:
my_prompt_template = """You are a state of the art question answering model. 
You will be given a multiple choice question. And relevant contexts that will help you to select the correct choice.
Select you choice by making absolutely sure the last thing you say is your choice. In this format: My Final Answer: <the letter>
The contexts and question are as follows:
Question: {question}
Contexts: {contexts}
"""


my_experiment_config = {
    "experiment_id": "example_entry",
    "chunking_function": my_chunking,
    "max_token_length": 256,
    "embedding_function": my_embedding,
    "embedding_model": "bge-m3",
    "search_function": my_search,
    "top_k": 5,
    "search_metadata": {
        "reranker": None,
        "augmentation": None
    },
    "answer_function": my_answer,
    "generation_function": my_generate,
    "generation_model": "gemma2:2b",
    "prompt_template": my_prompt_template,
    "generation_params": {
        "temperature": 0.7
    }
}

In [None]:
%%time
results = run_trial(my_experiment_config)
record_trial(results, my_experiment_config)

## wo retrieval trial

In [39]:
from typing import List

def fake_chunking(text: str, max_token_length: int, level=0) -> List[str]:
        
    return ["not real"]


def fake_embedding(embedding_model, text: List[str]) -> List[List[float]]:

    return [[1,2,3]]


def fake_search(query_vector: List[List[float]], top_k: int, chunks: List[str], index) -> List[List[str]]:

    return ["not real"]


def generic_generate(prompts: List[str], generation_config):
    generations = []
    for prompt in tqdm(prompts):
        answer = ollama.generate(generation_model, prompt=prompt, options=generation_config).response
        generations.append(answer)
    return generations


def answer_without_context(queries: List[str], retrieved_chunks: List[List[str]], prompt_template: str, generation_config, generation_function):
    """
    This function encapsulates preparing your input prompt, generating an output, and exracting the choice (A,B,C,D) from the output.
    """
    def _my_form_prompt(prompt_template: str, queries: List[str], retrieved_chunks: List[List[str]]): # get your prompt ready for inputting
        prompts = []
        for query in tqdm(zip(queries)):
    
            prompt = prompt_template.format(question=query)
            prompts.append(prompt)
        return prompts
    
    def _my_parse_answer(llm_outputs: List[str]): # extract what the model choose among A,B,C,D from the generated output
        outputs = []
        for output in tqdm(llm_outputs):
            try:
                outputs.append(output.split("My Final Answer:")[1].strip())
            except IndexError:
                outputs.append("E") # e stands for error xd
        return outputs

    prompts = _my_form_prompt(prompt_template, queries, retrieved_chunks) # apply your prompt template
    outputs = generation_function(prompts, generation_config)
    answers = _my_parse_answer(outputs)
    return outputs, answers

In [40]:
prompt_template_wo_txt = """You are an expert on Turkish Tv shows from before 2010. You have incredible knowledge especially about Avrupa Yakası by Gülse Birsel.
You can easily answer any question about Avrupa Yakası.
You will be given a multiple choice question. And you are asked to select the correct choice.
Give your answers by making absolutely sure the last thing you say is your choice in this format: My Final Answer: <the letter>
Here is your first question:
Question: {question}

"""


my_experiment_config = {
    "experiment_id": "no_retriever",
    "chunking_function": fake_chunking,
    "max_token_length": 256,
    "embedding_function": fake_embedding,
    "embedding_model": "none",
    "search_function": fake_search,
    "top_k": 0,
    "search_metadata": {
        "reranker": None,
        "augmentation": None
    },
    "answer_function": answer_without_context,
    "generation_function": my_generate,
    "generation_model": "gemma2:2b",
    "prompt_template": prompt_template_wo_txt,
    "generation_params": {
        "temperature": 0.7
    }
}

In [41]:
%%time
results = run_trial(my_experiment_config)

avrupa-yakasi/Avrupa Yakası (dizi) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (3. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası bölümleri listesi - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (4. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (1. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (6. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası oyuncu listesi - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (5. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (2. sezon) - Vikipedi.txt
chunks are ready
chunk embeddings are ready
search index is ready
retrieval is completed


50it [00:00, 70516.21it/s]
100%|███████████████████████████████████████████| 50/50 [03:27<00:00,  4.15s/it]
100%|███████████████████████████████████████| 50/50 [00:00<00:00, 367921.40it/s]

The answers are ready!!!
CPU times: user 151 ms, sys: 28.1 ms, total: 179 ms
Wall time: 3min 27s





In [42]:
results

{'no_retriever_outputs': ['Ah, Avrupa Yakası! A true masterpiece. \n\nThe first season of "Avrupa Yakası" aired in **2005**.  \n\nMy Final Answer: b \n',
  'The second season of Avrupa Yakası has 10 episodes. \n\nMy Final Answer: A  \n',
  'Şahika was introduced in season **2** of Avrupa Yakası. \n\nMy Final Answer: b \n',
  'Binnur Kaya played the character \'Ayşe\' in the sixth season of "Avrupa Yakası". \n\nMy Final Answer: b \n',
  'That\'s a great start! I\'m ready for some Avrupa Yakası trivia.  Let\'s get those questions flowing. \n\n**Question:** (\'How many total episodes does the "Avrupa Yakası" series have? \') \n\nMy Final Answer: <b> 103 </b> \n',
  'The director who helmed many episodes of Avrupa Yakası is **Gülşah Şahin.** \n\nMy Final Answer: b \n',
  'The actor who played the character of Cem throughout the series "Avrupa Yakası" is  **Cengiz Bozkurt**. \n\nMy Final Answer: C\n',
  'Engin Günaydın joined the cast of "Avrupa Yakası" in **Season 5**.  \n\nMy Final Answer

In [43]:
record_trial(results, my_experiment_config)

the results are in
{'rights': Index([3, 10, 17, 20, 23, 27, 30, 39], dtype='int64'), 'wrongs': Index([ 0,  1,  2,  4,  5,  6,  7,  8,  9, 11, 12, 13, 14, 15, 16, 18, 19, 21,
       22, 24, 25, 26, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43,
       44, 45, 46, 47, 48, 49],
      dtype='int64'), 'score': 0.16}


## tf-df as retriever

In [26]:
from typing import List
levels = ["paragraph", "sentence", "word", "token"]
sentence_splitter = SentenceSplitter(language="en")
def my_chunking(text: str, max_token_length: int, level=0) -> List[str]:

    
    def _split(text, split_level):
        
        if split_level == "paragraph":
            return text.split("\n")
        elif split_level == "sentence":
            return sentence_splitter.split(text)
        elif split_level == "word":
            return text.split()
        elif split_level == "token":
            return tokenizer.tokenize(text)
    
    def _merge(current_chunk, new_text, merge_level):
        if merge_level == "paragraph":
            join_char = "\n"
        elif merge_level == "sentence":
            join_char = " "
        elif merge_level == "word":
            join_char = " "
        elif merge_level == "token":
            join_char = ""
            
        if not current_chunk:
            return new_text
            
        return current_chunk + join_char + new_text
    
    
    if not text.strip() or len(tokenizer.tokenize(text)) <= max_token_length:
        # if the input text is already smaller than the token limit, we return it directly as a one element list
        return [text]

    # if we are here, the input text is larger than the max_token_length
    list_of_texts = _split(text, levels[level])
    
    # if we can split the large text with the current level, loop inside the splitted text to see if each element is small enough
    if len(list_of_texts) > 1:
        chunks = []
        for text in list_of_texts:
            # Process each piece recursively so that the initial list_of_texts can only contain texts smaller than max_token_length
            sub_chunks = my_chunking(text, max_token_length, level + 1)
            chunks.extend(sub_chunks)
    else:
        # If the large text cannot be splitted with the current level, we move to the next level
        return my_chunking(text, max_token_length, level + 1)

    
    ### Merging phase

    final_chunks = []
    current_chunk_size = 0
    current_chunk = ""
    for element in chunks:
        element_size = len(tokenizer.tokenize(element))
        
        # can i merge the current element with the current chunk?
        if current_chunk_size + element_size <= max_token_length: #yes
            current_chunk = _merge(current_chunk, element, levels[level])
            current_chunk_size += element_size

        else: 
            # if the current chunk and candidate element exceeds the limit,
            # we add the current chunk to final chunk list, and flush the current chunk and current chunk size, with the misfit element
            final_chunks.append(current_chunk.strip())
            current_chunk_size = element_size
            current_chunk = element

    if current_chunk: # lastly we add the dangling chunk after the loop has ended
        final_chunks.append(current_chunk.strip())
    else:
        print("how can this happen???")
        
    return final_chunks


vectorizer = None
context_vectors = []
query_vectors = []
def tfidf_embedding(embedding_model, text: List[str], input_type=None) -> List[List[float]]:
    
    from sklearn.feature_extraction.text import TfidfVectorizer
    import numpy as np

    global vectorizer, context_vectors, query_vectors
    
    if input_type == "chunks":
        vectorizer = TfidfVectorizer()
        # Fit and transform documents
        context_vectors = vectorizer.fit_transform(text)
        return context_vectors.toarray()
        
    elif input_type == "questions":
        vectorizer# Transform new document using existing vocabulary
        query_vectors = vectorizer.transform(text)
        return query_vectors.toarray()



def my_search(query_vector: List[List[float]], top_k: int, chunks: List[str], index) -> List[List[str]]:

    distances, indices = index.search(np.array(query_vector), top_k)
    
    return np.array(chunks)[indices]


def my_generate(prompts: List[str], generation_config):
    generations = []
    for prompt in tqdm(prompts):
        answer = ollama.generate(generation_model, prompt=prompt, options=generation_config).response
        generations.append(answer)
    return generations


def my_answer(queries: List[str], retrieved_chunks: List[List[str]], prompt_template: str, generation_config, generation_function):
    """
    This function encapsulates preparing your input prompt, generating an output, and exracting the choice (A,B,C,D) from the output.
    """
    def _my_form_prompt(prompt_template: str, queries: List[str], retrieved_chunks: List[List[str]]): # get your prompt ready for inputting
        prompts = []
        for query, top_k_chunks in tqdm(zip(queries, retrieved_chunks)):
            contexts_input = "\n"
            for i, context in enumerate(top_k_chunks):
                contexts_input += f"Context{i+1}: {context}\n"
    
            prompt = prompt_template.format(question=query, contexts=contexts_input)
            prompts.append(prompt)
        return prompts
    
    def _my_parse_answer(llm_outputs: List[str]): # extract what the model choose among A,B,C,D from the generated output
        outputs = []
        for output in tqdm(llm_outputs):
            try:
                outputs.append(output.split("My Final Answer:")[1].strip())
            except IndexError:
                outputs.append("E") # e stands for error xd
            
        return outputs

    prompts = _my_form_prompt(prompt_template, queries, retrieved_chunks) # apply your prompt template
    outputs = generation_function(prompts, generation_config)
    answers = _my_parse_answer(outputs)
    return outputs, answers

In [28]:
my_prompt_template = """You are a state of the art question answering model. 
You will be given a multiple choice question. And relevant contexts that will help you to select the correct choice.
Select you choice by making absolutely sure the last thing you say is your choice. In this format: My Final Answer: <the letter>
The contexts and question are as follows:
Question: {question}
Contexts: {contexts}
"""


my_experiment_config = {
    "experiment_id": "tfidf_as_retriever_k10",
    "chunking_function": my_chunking,
    "max_token_length": 256,
    "embedding_function": tfidf_embedding,
    "embedding_model": "tfidf",
    "search_function": my_search,
    "top_k": 10,
    "search_metadata": {
        "reranker": None,
        "augmentation": None
    },
    "answer_function": my_answer,
    "generation_function": my_generate,
    "generation_model": "gemma2:2b",
    "prompt_template": my_prompt_template,
    "generation_params": {
        "temperature": 0.7
    }
}

In [29]:
%%time
results = run_trial(my_experiment_config)

avrupa-yakasi/Avrupa Yakası (dizi) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (3. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası bölümleri listesi - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (4. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (1. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (6. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası oyuncu listesi - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (5. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (2. sezon) - Vikipedi.txt
chunks are ready
chunk embeddings are ready
search index is ready
retrieval is completed


50it [00:00, 14144.14it/s]
100%|███████████████████████████████████████████| 50/50 [29:03<00:00, 34.88s/it]
100%|███████████████████████████████████████| 50/50 [00:00<00:00, 188762.56it/s]

The answers are ready!!!
CPU times: user 2.41 s, sys: 407 ms, total: 2.82 s
Wall time: 29min 5s





In [30]:
record_trial(results, my_experiment_config)

the results are in
{'rights': Index([1, 3, 4, 5, 13, 17, 26, 31, 34, 38, 39, 43, 47, 48], dtype='int64'), 'wrongs': Index([ 0,  2,  6,  7,  8,  9, 10, 11, 12, 14, 15, 16, 18, 19, 20, 21, 22, 23,
       24, 25, 27, 28, 29, 30, 32, 33, 35, 36, 37, 40, 41, 42, 44, 45, 46, 49],
      dtype='int64'), 'score': 0.28}


## bigger llm

In [24]:
from typing import List
levels = ["paragraph", "sentence", "word", "token"]
sentence_splitter = SentenceSplitter(language="en")
def my_chunking(text: str, max_token_length: int, level=0) -> List[str]:

    
    def _split(text, split_level):
        
        if split_level == "paragraph":
            return text.split("\n")
        elif split_level == "sentence":
            return sentence_splitter.split(text)
        elif split_level == "word":
            return text.split()
        elif split_level == "token":
            return tokenizer.tokenize(text)
    
    def _merge(current_chunk, new_text, merge_level):
        if merge_level == "paragraph":
            join_char = "\n"
        elif merge_level == "sentence":
            join_char = " "
        elif merge_level == "word":
            join_char = " "
        elif merge_level == "token":
            join_char = ""
            
        if not current_chunk:
            return new_text
            
        return current_chunk + join_char + new_text
    
    
    if not text.strip() or len(tokenizer.tokenize(text)) <= max_token_length:
        # if the input text is already smaller than the token limit, we return it directly as a one element list
        return [text]

    # if we are here, the input text is larger than the max_token_length
    list_of_texts = _split(text, levels[level])
    
    # if we can split the large text with the current level, loop inside the splitted text to see if each element is small enough
    if len(list_of_texts) > 1:
        chunks = []
        for text in list_of_texts:
            # Process each piece recursively so that the initial list_of_texts can only contain texts smaller than max_token_length
            sub_chunks = my_chunking(text, max_token_length, level + 1)
            chunks.extend(sub_chunks)
    else:
        # If the large text cannot be splitted with the current level, we move to the next level
        return my_chunking(text, max_token_length, level + 1)

    
    ### Merging phase

    final_chunks = []
    current_chunk_size = 0
    current_chunk = ""
    for element in chunks:
        element_size = len(tokenizer.tokenize(element))
        
        # can i merge the current element with the current chunk?
        if current_chunk_size + element_size <= max_token_length: #yes
            current_chunk = _merge(current_chunk, element, levels[level])
            current_chunk_size += element_size

        else: 
            # if the current chunk and candidate element exceeds the limit,
            # we add the current chunk to final chunk list, and flush the current chunk and current chunk size, with the misfit element
            final_chunks.append(current_chunk.strip())
            current_chunk_size = element_size
            current_chunk = element

    if current_chunk: # lastly we add the dangling chunk after the loop has ended
        final_chunks.append(current_chunk.strip())
    else:
        print("how can this happen???")
        
    return final_chunks


def my_embedding(embedding_model, text: List[str], input_type=None) -> List[List[float]]:

    return ollama.embed(embedding_model, input=text).embeddings


def my_search(query_vector: List[List[float]], top_k: int, chunks: List[str], index) -> List[List[str]]:

    distances, indices = index.search(np.array(query_vector), top_k)
    
    return np.array(chunks)[indices]


def my_generate(prompts: List[str], generation_config):
    generations = []
    for prompt in tqdm(prompts):
        answer = ollama.generate("deepseek-r1:7b-qwen-distill-q4_K_M", prompt=prompt, options=generation_config).response
        generations.append(answer)
    return generations


def my_answer(queries: List[str], retrieved_chunks: List[List[str]], prompt_template: str, generation_config, generation_function):
    """
    This function encapsulates preparing your input prompt, generating an output, and exracting the choice (A,B,C,D) from the output.
    """
    def _my_form_prompt(prompt_template: str, queries: List[str], retrieved_chunks: List[List[str]]): # get your prompt ready for inputting
        prompts = []
        for query, top_k_chunks in tqdm(zip(queries, retrieved_chunks)):
            contexts_input = "\n"
            for i, context in enumerate(top_k_chunks):
                contexts_input += f"Context{i+1}: {context}\n"
    
            prompt = prompt_template.format(question=query, contexts=contexts_input)
            prompts.append(prompt)
        return prompts
    
    def _my_parse_answer(llm_outputs: List[str]): # extract what the model choose among A,B,C,D from the generated output
        outputs = []
        for output in tqdm(llm_outputs):
            outputs.append(output.split("My Final Answer:")[1].strip())
        return outputs

    prompts = _my_form_prompt(prompt_template, queries, retrieved_chunks) # apply your prompt template
    outputs = generation_function(prompts, generation_config)
    answers = _my_parse_answer(outputs)
    return outputs, answers

In [25]:
%%time
ollama.pull("deepseek-r1:7b-qwen-distill-q4_K_M")

CPU times: user 3.24 ms, sys: 2.08 ms, total: 5.32 ms
Wall time: 583 ms


ProgressResponse(status='success', completed=None, total=None, digest=None)

In [26]:
ollama.list()

ListResponse(models=[Model(model='deepseek-r1:7b-qwen-distill-q4_K_M', modified_at=datetime.datetime(2025, 3, 27, 18, 34, 27, 822840, tzinfo=TzInfo(+02:00)), digest='0a8c266910232fd3291e71e5ba1e058cc5af9d411192cf88b6d30e92b6e73163', size=4683075271, details=ModelDetails(parent_model='', format='gguf', family='qwen2', families=['qwen2'], parameter_size='7.6B', quantization_level='Q4_K_M')), Model(model='gemma2:2b', modified_at=datetime.datetime(2025, 3, 27, 18, 33, 17, 97538, tzinfo=TzInfo(+02:00)), digest='8ccf136fdd5298f3ffe2d69862750ea7fb56555fa4d5b18c04e3fa4d82ee09d7', size=1629518495, details=ModelDetails(parent_model='', format='gguf', family='gemma2', families=['gemma2'], parameter_size='2.6B', quantization_level='Q4_0')), Model(model='bge-m3:latest', modified_at=datetime.datetime(2025, 3, 27, 18, 32, 18, 161976, tzinfo=TzInfo(+02:00)), digest='7907646426070047a77226ac3e684fbbe8410524f7b4a74d02837e43f2146bab', size=1157672605, details=ModelDetails(parent_model='', format='gguf', 

In [27]:
my_prompt_template = """You are a state of the art question answering model. 
You will be given a multiple choice question. And relevant contexts that will help you to select the correct choice.
Select you choice by making absolutely sure the last thing you say is your choice. In this format: My Final Answer: <the letter>
The contexts and question are as follows:
Question: {question}
Contexts: {contexts}
"""


my_experiment_config = {
    "experiment_id": "bigger_llm_model",
    "chunking_function": my_chunking,
    "max_token_length": 256,
    "embedding_function": my_embedding,
    "embedding_model": "bge-m3",
    "search_function": my_search,
    "top_k": 5,
    "search_metadata": {
        "reranker": None,
        "augmentation": None
    },
    "answer_function": my_answer,
    "generation_function": my_generate,
    "generation_model": "deepseek-r1:7b-qwen-distill-q4_K_M",
    "prompt_template": my_prompt_template,
    "generation_params": {
        "temperature": 0.7
    }
}

In [28]:
%%time
results = run_trial(my_experiment_config)

Token indices sequence length is longer than the specified maximum sequence length for this model (8179 > 512). Running this sequence through the model will result in indexing errors


avrupa-yakasi/Avrupa Yakası (dizi) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (3. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası bölümleri listesi - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (4. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (1. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (6. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası oyuncu listesi - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (5. sezon) - Vikipedi.txt
avrupa-yakasi/Avrupa Yakası (2. sezon) - Vikipedi.txt
chunks are ready
chunk embeddings are ready
search index is ready
retrieval is completed


50it [00:00, 19599.55it/s]
 10%|████                                     | 5/50 [10:39<1:35:57, 127.95s/it]


KeyboardInterrupt: 

In [None]:
record_trial(results, my_experiment_config)

## check results

In [83]:
pd.read_csv("experiments/experiments_results.csv", sep="|", index_col=0)

Unnamed: 0,full_question_text,answer,example_entry_outputs,example_entry_answers,no_retriever_outputs,no_retriever_answers,tfidf_as_retriever_outputs,tfidf_as_retriever_answers
0,"When did the first season of ""Avrupa Yakası"" air? \n A) 2002-2003 \n B) 2003-2004 \n C) 2004-2005 \n D) 2005-2006 \n",C,My Final Answer: C \n,C,"Ah, Avrupa Yakası! A true masterpiece. \n\nThe first season of ""Avrupa Yakası"" aired in **2005**. \n\nMy Final Answer: b \n",b,My Final Answer: c \n,c
1,"How many episodes does the second season of ""Avrupa Yakası"" contain? \n A) 19 \n B) 32 \n C) 39 \n D) 28 \n",C,My Final Answer: c \n,c,The second season of Avrupa Yakası has 10 episodes. \n\nMy Final Answer: A \n,A,My Final Answer: c \n,c
2,Which season introduced the character Şahika? \n A) Third Season \n B) Fourth Season \n C) Fifth Season \n D) Sixth Season \n,C,My Final Answer: B \n,B,Şahika was introduced in season **2** of Avrupa Yakası. \n\nMy Final Answer: b \n,b,My Final Answer: C \n,C
3,"Which character did Binnur Kaya play in the sixth season of ""Avrupa Yakası""? \n A) İffet Sütçüoğlu \n B) Dilber Hala \n C) Aslı \n D) Makbule \n",B,My Final Answer: C \n,C,"Binnur Kaya played the character 'Ayşe' in the sixth season of ""Avrupa Yakası"". \n\nMy Final Answer: b \n",b,My Final Answer: c \n,c
4,"How many total episodes does the ""Avrupa Yakası"" series have? \n A) 132 \n B) 150 \n C) 190 \n D) 200 \n",C,My Final Answer: B \n,B,"That's a great start! I'm ready for some Avrupa Yakası trivia. Let's get those questions flowing. \n\n**Question:** ('How many total episodes does the ""Avrupa Yakası"" series have? ') \n\nMy Final Answer: <b> 103 </b> \n",<b> 103 </b>,My Final Answer: 6 \n,6
5,"Who was the director for the majority of the ""Avrupa Yakası"" series? \n A) Sinan Çetin \n B) Gülse Birsel \n C) Jale Atabey Özberk \n D) Hakan Algül \n",C,My Final Answer: b \n,b,The director who helmed many episodes of Avrupa Yakası is **Gülşah Şahin.** \n\nMy Final Answer: b \n,b,My Final Answer: D \n,D
6,Which actor played the character of Cem throughout the series? \n A) Ata Demirer \n B) Levent Üzümcü \n C) Engin Günaydın \n D) Yıldırım Öcek \n,B,My Final Answer: b \n,b,"The actor who played the character of Cem throughout the series ""Avrupa Yakası"" is **Cengiz Bozkurt**. \n\nMy Final Answer: C\n",C,My Final Answer: A \n,A
7,"In which season did Engin Günaydın join the cast of ""Avrupa Yakası""? \n A) First \n B) Second \n C) Third \n D) Fourth \n",C,My Final Answer: a \n,a,"Engin Günaydın joined the cast of ""Avrupa Yakası"" in **Season 5**. \n\nMy Final Answer: a \n",a,My Final Answer: A \n,A
8,"What was the main setting of ""Avrupa Yakası""? \n A) An office building \n B) A suburban neighborhood \n C) A family residence \n D) A magazine office \n",C,My Final Answer: D \n,D,"The main setting of ""Avrupa Yakası"" was a **Istanbul apartment building**. \n\nMy Final Answer: b \n",b,My Final Answer: A \n,A
9,Which character tried to marry Aslı in the first season? \n A) Cem \n B) Volkan \n C) Tacettin \n D) Sertaç \n,C,My Final Answer: d \n,d,"Ah, a classic! Let's dive into the world of Avrupa Yakası. \n\n**Answer:** The character who attempted to wed Aslı in the very first season was Orhan. \n\nMy Final Answer: **C** \n",**C**,My Final Answer: C \n,C


In [31]:
pd.read_csv("experiments/experiment_leaderboard.csv", sep="|", index_col=0)

Unnamed: 0,experiment_id,chunking_function,max_token_length,embedding_function,embedding_model,search_function,top_k,search_metadata,answer_function,generation_function,generation_model,prompt_template,generation_params,rights,wrongs,score
0,example_entry,<function my_chunking at 0x116e10d30>,256,<function my_embedding at 0x11768ff40>,bge-m3,<function my_search at 0x116b512d0>,5,"{'reranker': None, 'augmentation': None}",<function my_answer at 0x11683d990>,<function my_generate at 0x11683f370>,gemma2:2b,You are a state of the art question answering model. \nYou will be given a multiple choice question. And relevant contexes that will help you to select the correct choice.\nSelect you choice by making absolutely sure the last thing you say is your choice. In this format: My Final Answer: <the letter>\nThe contexes and question are as follows:\nQuestion: {question}\nContexes: {contexes}\n,{'temperature': 0.7},"Index([0, 1, 6, 10, 19, 23, 29, 31, 38, 48, 49], dtype='int64')","Index([ 2, 3, 4, 5, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22,\n 24, 25, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 39, 40, 41, 42, 43, 44,\n 45, 46, 47],\n dtype='int64')",0.22
1,no_retriever,<function fake_chunking at 0x115b563b0>,256,<function fake_embedding at 0x115b553f0>,none,<function fake_search at 0x115b56560>,0,"{'reranker': None, 'augmentation': None}",<function answer_without_context at 0x115b56320>,<function my_generate at 0x115437250>,gemma2:2b,You are an expert on Turkish Tv shows from before 2010. You have incredible knowledge especially about Avrupa Yakası by Gülse Birsel.\nYou can easily answer any question about Avrupa Yakası.\nYou will be given a multiple choice question. And you are asked to select the correct choice.\nGive your answers by making absolutely sure the last thing you say is your choice in this format: My Final Answer: <the letter>\nHere is your first question:\nQuestion: {question}\n\n,{'temperature': 0.7},"Index([3, 10, 17, 20, 23, 27, 30, 39], dtype='int64')","Index([ 0, 1, 2, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 18, 19, 21,\n 22, 24, 25, 26, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43,\n 44, 45, 46, 47, 48, 49],\n dtype='int64')",0.16
2,tfidf_as_retriever,<function my_chunking at 0x11cf4ac20>,256,<function tfidf_embedding at 0x11d59f370>,bge-m3,<function my_search at 0x11d59ecb0>,5,"{'reranker': None, 'augmentation': None}",<function my_answer at 0x11d51f7f0>,<function my_generate at 0x11d59e9e0>,gemma2:2b,You are a state of the art question answering model. \nYou will be given a multiple choice question. And relevant contexts that will help you to select the correct choice.\nSelect you choice by making absolutely sure the last thing you say is your choice. In this format: My Final Answer: <the letter>\nThe contexts and question are as follows:\nQuestion: {question}\nContexts: {contexts}\n,{'temperature': 0.7},"Index([0, 1, 2, 9, 12, 14, 17, 18, 21, 23, 26, 29, 34, 37, 40, 48, 49], dtype='int64')","Index([ 3, 4, 5, 6, 7, 8, 10, 11, 13, 15, 16, 19, 20, 22, 24, 25, 27, 28,\n 30, 31, 32, 33, 35, 36, 38, 39, 41, 42, 43, 44, 45, 46, 47],\n dtype='int64')",0.34
3,tfidf_as_retriever_k10,<function my_chunking at 0x109534d30>,256,<function tfidf_embedding at 0x109534820>,tfidf,<function my_search at 0x109534b80>,10,"{'reranker': None, 'augmentation': None}",<function my_answer at 0x109534790>,<function my_generate at 0x109535000>,gemma2:2b,You are a state of the art question answering model. \nYou will be given a multiple choice question. And relevant contexts that will help you to select the correct choice.\nSelect you choice by making absolutely sure the last thing you say is your choice. In this format: My Final Answer: <the letter>\nThe contexts and question are as follows:\nQuestion: {question}\nContexts: {contexts}\n,{'temperature': 0.7},"Index([1, 3, 4, 5, 13, 17, 26, 31, 34, 38, 39, 43, 47, 48], dtype='int64')","Index([ 0, 2, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 18, 19, 20, 21, 22, 23,\n 24, 25, 27, 28, 29, 30, 32, 33, 35, 36, 37, 40, 41, 42, 44, 45, 46, 49],\n dtype='int64')",0.28
