In [1]:
import os

from rag_workshop.config import settings
from rag_workshop.generation import create_rag_chain, get_documents_for_query

In [2]:
os.environ["OPENAI_API_KEY"] = settings.OPENAI_API_KEY

In [14]:
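def format_results(results: list[str], max_chars: int = 500) -> None:
    """Print a preview of each retrieved document, truncated to max_chars."""
    for i, result in enumerate(results):
        print(f"\nDocument {i}:")
        print("-" * 100)
        print(result[:max_chars])
        # Only signal truncation when text was actually cut off.
        if len(result) > max_chars:
            print("...")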

In [3]:
rag_chain = create_rag_chain()

2025-03-19 16:04:28.281 | INFO     | rag_workshop.retrievers:get_retriever:28 - Getting retriever using 'sentence-transformers/all-MiniLM-L6-v2' on 'cpu' with 3 top results
2025-03-19 16:04:32.619 | INFO     | rag_workshop.splitters:get_splitter:20 - Getting splitter with chunk size: 200 and overlap: 30
2025-03-19 16:04:32.711 | INFO     | rag_workshop.splitters:get_splitter:20 - Getting splitter with chunk size: 800 and overlap: 120
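
The log lines above report two splitters, one with chunk size 200 and overlap 30 and one with chunk size 800 and overlap 120. The workshop's own splitter is not shown in this notebook, so the following is only a minimal sketch of what chunking with overlap means. It assumes sizes are counted in characters (the logs do not say), and `chunk_text` is a hypothetical helper, not part of `rag_workshop`.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character windows sharing `overlap` characters.

    Hypothetical illustration only; the real splitter may count tokens
    or split on separators instead of raw character offsets.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "x" * 500
small = chunk_text(sample, chunk_size=200, overlap=30)
large = chunk_text(sample, chunk_size=800, overlap=120)
print(len(small), len(large))
```

Smaller chunks give the retriever more precise targets, while larger chunks carry more surrounding context; the overlap keeps a sentence that straddles a boundary visible in both neighbours.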


### Example 1

In [19]:
results = get_documents_for_query("How does BERT work?")
format_results(results)

2025-03-19 16:09:59.537 | INFO     | rag_workshop.retrievers:get_retriever:28 - Getting retriever using 'sentence-transformers/all-MiniLM-L6-v2' on 'cpu' with 3 top results
2025-03-19 16:10:01.210 | INFO     | rag_workshop.splitters:get_splitter:20 - Getting splitter with chunk size: 200 and overlap: 30
2025-03-19 16:10:01.211 | INFO     | rag_workshop.splitters:get_splitter:20 - Getting splitter with chunk size: 800 and overlap: 120



Document 0:
----------------------------------------------------------------------------------------------------
## What Makes BERT Different?

  * BERT builds upon recent work in pre-training contextual representations — including [Semi-supervised Sequence Learning](https://arxiv.org/abs/1511.01432), [Generative Pre-Training](https://blog.openai.com/language-unsupervised/), [ELMo](https://allennlp.org/elmo), and [ULMFit](http://nlp.fast.ai/classification/2018/05/15/introducting-ulmfit.html).
  * However, unlike these previous models, BERT is the first deeply bidirectional, unsupervised language representa
...

Document 1:
----------------------------------------------------------------------------------------------------
* ELMo came up with the concept of contextualized embeddings by grouping together the hidden states of the LSTM-based model (and the initial non-contextualized embedding) in a certain way (concatenation followed by weighted summation).



## BERT: an Overview

  * At

In [20]:
rag_chain.invoke("How does BERT work?")

'BERT (Bidirectional Encoder Representations from Transformers) works by using a transformer-based architecture to generate deeply bidirectional contextual representations of text. It is pre-trained using two main objectives: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP).\n\n1. **Masked Language Modeling (MLM):** BERT masks a portion of the input tokens and trains the model to predict these masked tokens based on their context. This allows BERT to learn bidirectional representations, as it considers both the left and right context of a word.\n\n2. **Next Sentence Prediction (NSP):** BERT is also trained to understand the relationship between sentences. It does this by predicting whether a given sentence B follows sentence A in the original text, which helps the model handle tasks involving multiple sentences.\n\nBERT uses WordPiece tokenization, which breaks words into smaller sub-word units, and incorporates special tokens like `[CLS]` for classification tasks and 

### Example 2

In [23]:
results = get_documents_for_query("How are similarity scores normalized?")
format_results(results)


2025-03-19 16:11:26.474 | INFO     | rag_workshop.retrievers:get_retriever:28 - Getting retriever using 'sentence-transformers/all-MiniLM-L6-v2' on 'cpu' with 3 top results
2025-03-19 16:11:28.536 | INFO     | rag_workshop.splitters:get_splitter:20 - Getting splitter with chunk size: 200 and overlap: 30
2025-03-19 16:11:28.537 | INFO     | rag_workshop.splitters:get_splitter:20 - Getting splitter with chunk size: 800 and overlap: 120



Document 0:
----------------------------------------------------------------------------------------------------
* For each token, the model computes the similarity of its Q vector with every other token’s K vector in the sequence. This similarity score is then normalized (typically through softmax), resulting in attention weights.
    * These weights tell us the degree to which each token should “attend to” (or incorporate information from) other tokens.
  * **Weighted Summation of Values (Producing Contextual Embeddings)** :

    * Using the attention weights, each token creates a weighted sum over the
...

Document 1:
----------------------------------------------------------------------------------------------------
* This Jupyter notebook by Jay Alammar offers a great intro to using a pre-trained BERT model to carry out sentiment classification using the Stanford Sentiment Treebank (SST2) dataset.



[![](../../../images/read/bert_first.jpg)](https://github.com/jalammar/jalamma

In [24]:
rag_chain.invoke("How are similarity scores normalized?")

'The similarity scores are normalized typically through softmax.'
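
The retrieved passage says each token's query vector is compared with every key vector and the resulting scores are normalized with softmax. A minimal pure-Python sketch of that step follows. The function names are hypothetical, and the division by the square root of the vector dimension comes from the standard scaled dot-product attention formulation rather than from the retrieved text.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: list[float], keys: list[list[float]]) -> list[float]:
    """Dot-product similarity of one query against each key, then softmax."""
    scale = math.sqrt(len(query))  # assumption: standard 1/sqrt(d) scaling
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights = attention_weights(query, keys)
print(weights)
```

The key pointing in the same direction as the query receives the largest weight and the opposite one the smallest, which is the sense in which a token "attends" more to some tokens than to others.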

### Example 3

In [25]:
results = get_documents_for_query("What is the difference between dynamic and continuous batching?")
format_results(results)


2025-03-19 16:11:46.223 | INFO     | rag_workshop.retrievers:get_retriever:28 - Getting retriever using 'sentence-transformers/all-MiniLM-L6-v2' on 'cpu' with 3 top results
2025-03-19 16:11:48.076 | INFO     | rag_workshop.splitters:get_splitter:20 - Getting splitter with chunk size: 200 and overlap: 30
2025-03-19 16:11:48.077 | INFO     | rag_workshop.splitters:get_splitter:20 - Getting splitter with chunk size: 800 and overlap: 120



Document 0:
----------------------------------------------------------------------------------------------------
[more](#)

Cookie duration resets each session. 

[View details](#) | [Storage details](#) | [Privacy policy](#)

Consent

## Localsensor B.V.

Doesn't use cookies.

Data collected and processed: IP addresses, Device characteristics, Device identifiers, Non-precise location data, Precise location data, Privacy choices

[more](#)

Uses other forms of storage.

[View details](#) | [Privacy policy](#)

Consent

## Online Solution

Cookie duration: 365 (days).

Data collected and processed: IP addre
...

Document 1:
----------------------------------------------------------------------------------------------------
[more](#)

Cookie duration resets each session. Uses other forms of storage.

[View details](#) | [Privacy policy](#)

Consent

## HUMAN

Doesn't use cookies.

Data collected and processed: IP addresses, Device characteristics, Device identifiers, Probabilistic ident

In [26]:
rag_chain.invoke("What is the difference between dynamic and continuous batching?")

"I DON'T KNOW"