# Libraries to Install

In [1]:
!pip install gradio llama-index-core llama-index-llms-ollama llama-index-embeddings-ollama



# Sefaria

[Sefaria](https://www.sefaria.org/) is a free, open-source library of Jewish texts, including the **Tanakh**, **Talmud**, **Midrash**, **Halakha**, **Kabbalah**, and thousands of commentaries and modern works.  
It provides full access in **Hebrew and English** (with many other translations), cross-referenced links between texts, and powerful study tools.

Key features:
- 📚 Comprehensive collection of classical Jewish sources
- 🌐 Free online access with an open API
- 🔗 Interlinked texts (e.g., verses → Talmud → commentaries)
- 🌍 Supports collaborative translation and commentary projects
- 🖥️ Easy to integrate into apps, research, or personal study

Sefaria is widely used by students, scholars, educators, and anyone interested in exploring Jewish texts digitally.


### Download the 5 Books of Moses
* 'Genesis'
* 'Exodus'
* 'Leviticus'
* 'Numbers'
* 'Deuteronomy'

The download code passes a `params` dict to `requests.get`. With the optional `context` and `commentary` flags spelled out as well, it looks like this:

```python
params={'context': 0, 'commentary': 0, 'pad': 0, 'lang': 'en'}
```

The resulting request URL sent to the Sefaria API looks like:

```
https://www.sefaria.org/api/texts/Genesis?context=0&commentary=0&pad=0&lang=en
```

The same parameters apply to every book name, and they mean:

* **`context=0`** → Don’t include surrounding context verses/lines.
* **`commentary=0`** → Don’t include commentary in the response.
* **`pad=0`** → Don’t pad the reference down to its first section, so a bare book name returns the whole book rather than only its first chapter.
* **`lang=en`** → Request the text in English.

So for `"Exodus"`, for example, the request URL would be:

```
https://www.sefaria.org/api/texts/Exodus?context=0&commentary=0&pad=0&lang=en
```

The response is JSON, and `data.get("text")` returns a list of chapters, each of which is a list of English verse strings.
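As a quick offline check of how those parameters end up in the URL, here is a minimal sketch that uses only the standard library. The helper name `build_texts_url` is hypothetical and exists only for illustration; `requests` does the equivalent query-string encoding when it is given `params`.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.sefaria.org/api/texts"

def build_texts_url(book_name, params):
    """Return the full request URL for one book (hypothetical helper)."""
    return f"{BASE_URL}/{book_name}?{urlencode(params)}"

params = {"context": 0, "commentary": 0, "pad": 0, "lang": "en"}
url = build_texts_url("Exodus", params)
print(url)
```

Printing the URL before sending a request makes it easy to paste into a browser and inspect the raw JSON by hand.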



## 📊 Fetch Torah Text
1. Request each book from the API.
2. Flatten the nested chapter/verse lists into a DataFrame with one row per verse.

In [24]:
import pandas as pd
import requests

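rows = []
for book_name in ['Genesis', 'Exodus', 'Leviticus', 'Numbers', 'Deuteronomy']:
    url = f'https://www.sefaria.org/api/texts/{book_name}'
    response = requests.get(url, params={'pad': 0, 'lang': 'en'}, timeout=30)
    response.raise_for_status()  # fail early on HTTP errors
    data = response.json()
    text = data.get("text") or []  # list of chapters, each a list of verse strings
    # Indices are zero-based: chapter 0 is chapter 1, verse 0 is verse 1
    for chapter_idx, chapter in enumerate(text):
        for verse_idx, verse in enumerate(chapter):
            rows.append((book_name, chapter_idx, verse_idx, verse))

df_bible = pd.DataFrame(rows, columns=['book', 'chapter', 'verse', 'text'])
# Keep only the text before the first HTML tag (drops footnote markup and anything after it)
df_bible['text'] = df_bible['text'].str.split("<").str[0]
df_bible.dropna(inplace=True)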

In [25]:
df_bible

| | book | chapter | verse | text |
|---|---|---|---|---|
| 0 | Genesis | 0 | 0 | When God began to create |
| 1 | Genesis | 0 | 1 | the earth being unformed and void, with darkne... |
| 2 | Genesis | 0 | 2 | God said, “Let there be light”; and there was ... |
| 3 | Genesis | 0 | 3 | God saw that the light was good, and God separ... |
| 4 | Genesis | 0 | 4 | God called the light Day and called the darkne... |
| ... | ... | ... | ... | ... |
| 5841 | Deuteronomy | 33 | 7 | And the Israelites bewailed Moses in the stepp... |
| 5842 | Deuteronomy | 33 | 8 | Now Joshua son of Nun was filled with the spir... |
| 5843 | Deuteronomy | 33 | 9 | Never again did there arise in Israel a prophe... |
| 5844 | Deuteronomy | 33 | 10 | for the various signs and portents that יהוה s... |
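The `chapter` and `verse` columns above are zero-based because they come from `enumerate`. Here is a minimal sketch of turning them into conventional references, using a tiny made-up payload in place of a real API response. The `to_ref` helper and the sample strings are illustrative assumptions, not part of the notebook.

```python
def to_ref(book, chapter_idx, verse_idx):
    """Convert zero-based indices into a human-readable reference."""
    return f"{book} {chapter_idx + 1}:{verse_idx + 1}"

# Stand-in for data["text"]: a list of chapters, each a list of verse strings.
sample_text = [
    ["first verse of chapter one", "second verse of chapter one"],
    ["first verse of chapter two"],
]

# Flatten the nested lists into (reference, verse) pairs.
rows = [
    (to_ref("Genesis", c, v), verse)
    for c, chapter in enumerate(sample_text)
    for v, verse in enumerate(chapter)
]
for ref, verse in rows:
    print(ref, "->", verse)
```

Storing a readable reference next to each verse can make retrieved passages easier to cite later on.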


# LlamaIndex

**LlamaIndex** (formerly **GPT Index**) is a Python library that connects **LLMs** (like GPT-4 or LLaMA) to your own data for building **retrieval-augmented generation (RAG)** apps.

### 🔍 How It Works

LLMs can’t access your PDFs, databases, or APIs directly. LlamaIndex bridges the gap by:

1. **Loading** data (PDFs, Notion, SQL, APIs, etc.).
2. **Indexing** it into structures (vectors, keywords, lists).
3. **Retrieving** relevant chunks for a query.
4. **Providing context** to the LLM for accurate answers.
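The four steps above can be sketched without any LLM at all. The toy example below is not LlamaIndex code: it swaps the embedding model for a simple bag-of-words count and uses made-up chunks, purely to show the shape of load → index → retrieve → provide context. All function names here are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (not a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Load: made-up chunks standing in for real documents.
chunks = [
    "You shall not eat any abominable thing",
    "Honor your father and your mother",
    "Remember the sabbath day and keep it holy",
]

# 2. Index: store one vector per chunk.
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query, top_k=1):
    # 3. Retrieve: rank chunks by similarity to the query.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

def build_prompt(query, top_k=1):
    # 4. Provide context: prepend the retrieved chunks to the question.
    context = "\n".join(retrieve(query, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What should I remember about the sabbath day?"))
```

A real pipeline replaces `embed` with a learned embedding model and sends the prompt to an LLM, but the flow of data stays the same.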

### 🧱 Core Pieces

* **Data Connectors**: Import from files, sites, or databases.
* **Indices**: Store and organize data for retrieval.
* **Retrievers**: Find the right chunks per query.
* **Engines**: Pair retrievers + LLMs for apps like chatbots and document Q\&A.

### 📖 Example

For the Bible, LlamaIndex can:

* Load and chunk the text,
* Build a vector index,
* Retrieve verses about “kosher” or “Exodus 20,”
* Feed them to GPT-4/LLaMA-3 to answer naturally, e.g.:
  *“The Ten Commandments appear in Exodus 20.”*

---



# 🧠 Define Ollama LLM and Embedding Models

We’ll use `nomic-embed-text` for embeddings and Qwen’s `qwq` for answering questions.

**Note:** In my experience, a reasoning model gives noticeably better answers here.

* Pull both models in case they are not installed locally yet.

## Prerequisite

* Install Ollama: `curl -fsSL https://ollama.com/install.sh | sh`



In [26]:
!ollama pull nomic-embed-text
!ollama pull qwq

[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest [K
pulling 970aa74c0a90: 100% ▕██████████████████▏ 274 MB                         [K
pulling c71d239df917: 100% ▕██████████████████▏  11 KB                         [K
pulling ce4a164fc046: 100% ▕██████████████████▏   17 B                         [K
pulling 31df23ea7daa: 100% ▕██████████████████▏  420 B                         [K
verifying sha256 digest [K
writing manifest [K
success [K[?25h[?2026l
[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest [K
pulling 7ccc6415b2c7: 100% ▕██████████████████▏  19 GB                         [K
pulling 41190096a061: 100% ▕██████████████████▏ 1.2 KB                         [K
pulling d18a5cc71b84: 100% ▕██████████████████▏  11 KB                         [K
pulling e5229acc2492: 100% ▕██████████████████▏  1

In [27]:
!ollama list

NAME                       ID              SIZE      MODIFIED      
qwq:latest                 009cb3f08d74    19 GB     7 seconds ago    
nomic-embed-text:latest    0a109f422b47    274 MB    8 seconds ago    
deepseek-r1:latest         6995872bfe4c    5.2 GB    5 weeks ago      
qwen3:latest               500a1f067a9f    5.2 GB    6 weeks ago      
qwen:latest                d53d04290064    2.3 GB    6 weeks ago      


### Set the LLM + Embedding Model to Ollama Models

In [29]:
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.core import Settings

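# Set the embedding model
Settings.embed_model = OllamaEmbedding(model_name='nomic-embed-text:latest')

# Set the LLM. Streaming is chosen per call (e.g. stream_chat), not in the constructor;
# the generous timeout allows for a large local model that is slow to respond.
Settings.llm = Ollama(model='qwq:latest', request_timeout=300.0)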

### Parse the Torah DataFrame into Documents for LlamaIndex

In [31]:
from llama_index.core import Document

documents = []

for (book, chapter), group in df_bible.groupby(['book', 'chapter']):
    chapter_text = "\n".join(group['text'])
    metadata = {
        "book": book,
        "chapter": int(chapter),
        "verse_start": int(group['verse'].min()),
        "verse_end": int(group['verse'].max()),
    }
    documents.append(Document(text=chapter_text, metadata=metadata))

### Index Them (i.e., Compute Embeddings)

In [33]:
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents,show_progress=True)

Parsing nodes:   0%|          | 0/187 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/244 [00:00<?, ?it/s]

### Create Query Engine + Chat Engine
* Note: I used `similarity_top_k=10`. I recommend a fairly high value here, since passing more candidate chapters to the model tends to improve accuracy.

In [36]:
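from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.chat_engine import CondenseQuestionChatEngine

# Retrieve the 10 most similar chunks per query; streaming=True lets stream_chat yield tokens
query_engine = index.as_query_engine(similarity_top_k=10, streaming=True)
chat_memory = ChatMemoryBuffer.from_defaults(token_limit=4000)

# Chat engine that condenses each follow-up into a standalone question before querying
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine,
    memory=chat_memory,
)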

# Display the Gradio ChatBot
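
Reasoning models such as `qwq` typically wrap their intermediate reasoning in `<think>...</think>` tags before giving the final answer. Here is a minimal sketch of a generator that accumulates streamed tokens while hiding that block. It assumes each tag arrives as a single whole token, which a real stream may not guarantee; `visible_stream` and the sample tokens are hypothetical.

```python
def visible_stream(tokens):
    """Yield the growing visible answer, skipping text inside <think>...</think>."""
    so_far = ""
    thinking = False
    for token in tokens:
        if token == "<think>":
            thinking = True          # start hiding tokens
        elif token == "</think>":
            thinking = False         # resume showing tokens
        elif not thinking:
            so_far += token
            yield so_far

# Made-up token stream standing in for a model's streamed output.
sample_tokens = ["<think>", "check", " Exodus", "</think>", "Exodus", " 20"]
updates = list(visible_stream(sample_tokens))
print(updates)
```

Yielding the accumulated string, rather than each token, matches how a Gradio streaming handler expects to receive partial replies.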

In [None]:
import gradio as gr

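def chat_interfact(message, history):
    # Stream the reply; each yield hands Gradio the text accumulated so far
    response = chat_engine.stream_chat(message)
    so_far = ''
    for token in response.response_gen:
        if token != '<think>':  # skip the reasoning model's opening <think> marker
            so_far += str(token)
            yield so_far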

gr.ChatInterface(chat_engin