
<div align="center">
  <h1></h1>
  <h1>Stylized Retrieval-Augmented Generation</h1>
</div>

In this notebook, I will build and implement a Retrieval-Augmented Generation (RAG) pipeline tailored for a text style transfer application.


### Table of Contents
- [1. Access to Hugging Face](#1-access-to-hugging-face)
- [2. Packages](#2-packages)
- [3. Problem Statement](#3-problem-statement)
- [4. Fetch and Parse](#4-fetch-and-parse)
- [5. Calculate Word Stats](#5-calculate-word-stats)
- [6. Set Up LLM](#6-set-up-llm)
- [7. BM25 Retriever](#7-bm25-retriever)
- [8. Build Chroma](#8-build-chroma)
- [9. Ensemble Retriever](#9-ensemble-retriever)
- [10. Format Documents](#10-format-documents)
- [11. RAG Chain](#11-rag-chain)
- [12. Final Response](#12-final-response)


# 1. Access to Hugging face
Execute the following cell to connect to your Hugging Face account.

In [1]:
import getpass
import os

# Prompt user for Hugging Face API token if not already set
if "HUGGINGFACEHUB_API_TOKEN" not in os.environ:
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass.getpass("Enter your Huggingfacehub API token: ")


Enter your Huggingfacehub API token: ········


# 2. Packages
Execute the following code cells for installing the packages needed for creating your Stylized RAG.

note: If there are package conflics you can use pip-tools to automatically find and install the compatible versions.

In [2]:

!pip install -q langchain
!pip install -q langchain-community
!pip install -q langchain-chroma
!pip install -q langchain-huggingface
!pip install -q bs4
!pip install -q rank_bm25
!pip install -q huggingface_hub
!pip install -q requests

[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
[0m[31mERROR: Cannot uninstall protobuf 4.25.3, RECORD file not found. You might be able to recover from this via: 'pip install --force-reinstall --no-deps protobuf==4.25.3'.[0m[31m
[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[

# 3. Problem Statement
I will implement **Text Style Transfer**, a technique that modifies text style while preserving its content. I will build an **ensemble retriever** combining **BM25** for keyword-based retrieval and **Chroma** for semantic search to retrieve relevant documents, which will be used as input for the style transfer process. This project integrates classical retrieval methods with modern neural embeddings for practical NLP applications.

**what is text style transfer?**

**Text Style Transfer** is a natural language processing (NLP) technique that modifies the style of a given text while preserving its original content. It allows for the transformation of linguistic expressions to convey different tones, emotions, or writing styles without altering the underlying meaning. For example, it can rephrase formal text into a casual tone, adapt neutral statements into an emotional tone, or convert modern language into a Shakespearean style. This technique has applications in personalized communication, creative writing, sentiment adjustment, and even domain adaptation, making it a powerful tool for generating diverse textual outputs tailored to specific needs.

### Example of Text Style Transfer:

#### **Input (Neutral Tone):**
"I am excited about the opportunity to work on this project."

#### **Output (Formal Tone):**
"I am genuinely enthusiastic about the prospect of contributing to this project."

#### **Output (Casual Tone):**
"I'm super pumped to get started on this project!"

#### **Output (Shakespearean Style):**
"Verily, I am thrilled by the chance to partake in this noble endeavor."


# 4. Fetch and Parse

*    Fetching and parsing web content: Write a function that fetches the HTML content of a webpage and processes it to extract clean, readable text.
*    Splitting text into smaller chunks: Implement a function to split the text into overlapping chunks, ensuring that each chunk is manageable for downstream tasks.

In [3]:
import os
import requests
import numpy as np
from bs4 import BeautifulSoup
from langchain.schema import Document
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

def fetch_and_parse(url: str) -> str:
    """
    Fetch the webpage content at `url` and return a cleaned string of text.
    """
    try:
        # Step 1: Fetch webpage content
        response = requests.get(url)
        response.raise_for_status()  # Raise an exception for bad status codes
        
        # Step 2: Parse HTML content
        soup = BeautifulSoup(response.text, 'html.parser')
        
        # Step 3: Remove unwanted elements (scripts, styles, etc.)
        for element in soup(['script', 'style', 'head', 'header', 'footer', 'nav']):
            element.decompose()
            
        # Step 4: Extract and clean text content
        text = soup.get_text()
        # Clean up whitespace
        lines = (line.strip() for line in text.splitlines())
        chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
        text = ' '.join(chunk for chunk in chunks if chunk)
        
        return text
        
    except requests.RequestException as e:
        print(f"Error fetching URL {url}: {str(e)}")
        return ""

def split_text_into_documents(text: str, chunk_size: int = 1000, overlap: int = 100):
    """
    Split a long text into overlapping chunks and return them as Documents.
    """
    # Initialize empty list for documents
    docs = []
    
    # If text is shorter than chunk_size, return it as a single document
    if len(text) <= chunk_size:
        return [Document(page_content=text)]
    
    # Split text into overlapping chunks
    start = 0
    while start < len(text):
        # Get chunk of text
        end = start + chunk_size
        chunk = text[start:end]
        
        # If we're not at the last chunk, try to find a good breaking point
        if end < len(text):
            # Try to find the last period or space before the end
            last_period = chunk.rfind('.')
            last_space = chunk.rfind(' ')
            break_point = max(last_period, last_space)
            
            if break_point != -1:
                chunk = chunk[:break_point + 1]
                end = start + break_point + 1
        
        # Create Document object and add to list
        docs.append(Document(page_content=chunk))
        
        # Move start position, accounting for overlap
        start = end - overlap
    
    return docs

USER_AGENT environment variable not set, consider setting it to identify your requests.


# 5. Calculate Word Stats

1. Calculate the total number of words and characters across all documents.
2. Compute the average number of words and characters per document.
3. Print the average statistics in a human-readable format.

In [4]:
def calculate_word_stats(texts):
    """
    Calculate and display average word and character statistics for a list of documents.

    Parameters:
    - texts (list): A list of Document objects, where each Document contains a `page_content` attribute.

    Returns:
    - None: Prints the average word and character counts per document.
    """

    # Step 1: Initialize variables to keep track of total words and total characters.
    total_words, total_characters = 0, 0
    
    #Handle empty input
    if not texts:
        print("No documents provided")
        return
    

    # Step 2: Iterate through each document in the `texts` list.
    for doc in texts:
        content = doc.page_content
        
        words = content.split()
        total_words  += len(words)
        
        total_characters += len(content)
        # Hint: `doc.page_content` contains the text of the document.

    # Step 3: Calculate the average words and characters per document.
    # - Avoid division by zero by checking if the `texts` list is not empty.
    num_docs = len(texts)
    avg_words = total_words / num_docs
    avg_characters = total_characters / num_docs

    # Step 4: Print the calculated averages in a readable format.
    # Example: "Average words per document: 123.45"
    print(f"Average words per document: {avg_words}")
    print(f"Average characters per document: {avg_characters}")


In [5]:
# Execute this cell to test your calculate_word_stats function.
# Create sample Document objects with text content for testing your code above.
sample_docs = [
    Document(page_content="This is the first test document."),
    Document(page_content="Here is another example document for testing."),
    Document(page_content="Short text."),
    Document(page_content="This document has more content. It's longer and has more words in it for testing purposes."),
]

# Call the function with the sample documents to calculate word statistics.
calculate_word_stats(sample_docs)


Average words per document: 7.75
Average characters per document: 44.5


# 6. Set Up LLM
This function will:

1. Initialize and connect to a pre-trained model available on Hugging Face.
2. Allow customization of parameters like the model repository ID and generation temperature.
3. Return the configured LLM object, which will be used later for text generation tasks in the RAG pipeline.

In [6]:
from langchain_huggingface import HuggingFaceEndpoint

def setup_llm(repo_id="mistralai/Mistral-7B-Instruct-v0.3"):
    """
    Set up and return a Hugging Face LLM using the specified model repository ID and generation parameters.

    Parameters:
    - repo_id (str): The repository ID of the Hugging Face model to use (default: "mistralai/Mistral-7B-Instruct-v0.3").
    - temperature (float): The generation temperature to control creativity in outputs (default: 1.0).

    Returns:
    - HuggingFaceEndpoint: A configured LLM object ready for text generation.
    """

    # Step 1: Import the HuggingFaceEndpoint class.
    llm = HuggingFaceEndpoint(
        repo_id = repo_id,
        temperature = 0.7)
    
    return llm


# 7. BM25 Retriever

1. Initialize the BM25 retriever with a set of documents.
2. Implement a method to retrieve the top k most relevant documents for a given query.
3. Use efficient tokenization and scoring to ensure accurate and fast results.
This component will enable the pipeline to fetch relevant information from a corpus, which is then passed to the LLM for further processing.

In [43]:
from rank_bm25 import BM25Okapi
from langchain_core.runnables import RunnablePassthrough

class BM25Retriever:
    """
    A class to implement BM25-based document retrieval.
    """
    def __init__(self, documents):
        """
        Initialize the BM25 retriever with the given documents.
        """
        # Store the original documents
        self.documents = documents
        
        # Extract text content from documents and create corpus
        self.corpus = [doc.page_content for doc in documents]
        
        # Tokenize the corpus - split each document into words
        self.tokenized_corpus = [doc.split() for doc in self.corpus]
        
        # Initialize BM25 with tokenized corpus
        self.bm25 = BM25Okapi(self.tokenized_corpus)

    def retrieve(self, query, k=5):
        """
        Retrieve the top k most relevant documents for a given query.
        """
        # Tokenize the query
        tokenized_query = query.split()
        
        # Get document scores using BM25
        scores = self.bm25.get_scores(tokenized_query)
        
        # Get indices of top k scoring documents
        top_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        
        # Return the original documents in order of relevance
        return [self.corpus[i] for i in top_indices]

Execute the following code to test the implementation.

In [44]:
from langchain.schema import Document

# Create sample Document objects.
sample_docs = [
    Document(page_content="Machine learning is a method of data analysis that automates analytical model building."),
    Document(page_content="Deep learning is a subset of machine learning that uses neural networks with three or more layers."),
    Document(page_content="Artificial intelligence encompasses a wide range of technologies, including machine learning and deep learning."),
    Document(page_content="Natural language processing is a field of AI focused on the interaction between computers and human language."),
]

# Initialize the retriever with the sample documents.
retriever = BM25Retriever(sample_docs)

# Test the retriever with a query.
query = "What is machine learning?"
top_docs = retriever.retrieve(query, k=2)

# Print the results.
print("Top Relevant Documents:")
for idx, doc in enumerate(top_docs, 1):
    print(f"{idx}. {doc}")


Top Relevant Documents:
1. Machine learning is a method of data analysis that automates analytical model building.
2. Deep learning is a subset of machine learning that uses neural networks with three or more layers.


Expected output:

Top Relevant Documents:
1. Machine learning is a method of data analysis that automates analytical model building.
2. Artificial intelligence encompasses a wide range of technologies, including machine learning and deep learning.


# 8. Build Chroma

1. Initializing a vector store (Chroma) with Hugging Face embeddings.
2. Adding a list of documents to the vector store.
3. Returning the vector store for later use in the retrieval and generation pipeline.

This function sets up the semantic retrieval system, allowing for more meaningful and context-aware results than keyword-based retrieval.

In [11]:
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.schema import Document

def build_chroma(documents: list[Document]) -> Chroma:
    """
    Build a Chroma vector store using Hugging Face embeddings.
    """
    # Initialize Hugging Face embeddings
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-mpnet-base-v2"
    )
    
    # Initialize Chroma vector store
    vector_store = Chroma(
        collection_name="EngGenAI",
        embedding_function=embeddings
    )
    
    # Add documents to the vector store
    # This will automatically compute embeddings for all documents
    vector_store.add_documents(documents)
    
    return vector_store

Execute the following code to test the implementation.

In [12]:
from langchain.schema import Document

# Create sample Document objects.
sample_docs = [
    Document(page_content="Machine learning is a method of data analysis that automates analytical model building."),
    Document(page_content="Deep learning is a subset of machine learning that uses neural networks with three or more layers."),
    Document(page_content="Artificial intelligence encompasses a wide range of technologies, including machine learning and deep learning."),
    Document(page_content="Natural language processing is a field of AI focused on the interaction between computers and human language."),
]

# Call the function to build the Chroma vector store.
vector_store = build_chroma(sample_docs)

# Test retrieval (optional, if supported).
print("Vector store built successfully!")
print(vector_store)  # Print the vector store object to verify.


  vector_store = Chroma(


Vector store built successfully!
<langchain_community.vectorstores.chroma.Chroma object at 0x35981f910>


Expected output:

Vector store built successfully!

<langchain.vectorstores.Chroma object at 0x7f8c1a4b3f10>

# 9. Ensemble Retriever

1. Retrieve documents from both Chroma (semantic search) and BM25 (lexical search).
2. Combine the results from both retrievers while deduplicating overlapping results.
3. Return the top k most relevant and unique documents.

This function plays a vital role in the RAG pipeline by ensuring that the retrieved documents are relevant and diverse, combining semantic understanding with precise keyword matching.

In [13]:
class EnsembleRetriever:
    """
    Merges results from Chroma similarity search and BM25 lexical search.
    """
    
    def __init__(self, chroma_store, bm25_retriever):
        self.chroma_store = chroma_store
        self.bm25_retriever = bm25_retriever
    
    def get_relevant_documents(self, query: str, k: int = 5):
        """
        Retrieve relevant documents by combining results from Chroma and BM25.
        """
        # Get documents from Chroma (semantic search)
        chroma_docs = self.chroma_store.similarity_search(query, k=k)
        
        # Get documents from BM25 (lexical search)
        bm25_docs = self.bm25_retriever.retrieve(query, k=k)
        
        # Convert BM25 results to Document objects if they aren't already
        bm25_docs = [
            doc if isinstance(doc, Document) else Document(page_content=doc)
            for doc in bm25_docs
        ]
        
        # Combine results from both retrievers
        combined = chroma_docs + bm25_docs
        
        # Deduplicate results
        seen = set()
        unique_docs = []
        
        for doc in combined:
            # Use first 60 chars as deduplication key
            content = doc.page_content
            key = content[:60]
            
            if key not in seen:
                unique_docs.append(doc)
                seen.add(key)
                
                # Break if we have enough documents
                if len(unique_docs) >= k:
                    break
        
        return unique_docs[:k]

Run the following code to test the implementation.

In [14]:
from langchain.schema import Document

# Sample documents
sample_docs = [
    Document(page_content="Machine learning automates model building using data."),
    Document(page_content="Deep learning is a type of machine learning using neural networks."),
    Document(page_content="AI includes technologies like machine learning and deep learning."),
    Document(page_content="Natural language processing focuses on human-computer language interaction."),
]

# Sample Chroma and BM25 retrievers (mock behavior)
class MockChroma:
    def similarity_search(self, query, k):
        return [Document(page_content="Machine learning automates model building using data.")]

class MockBM25:
    def retrieve(self, query, k):
        return ["Deep learning is a type of machine learning using neural networks."]

# Initialize mock retrievers
chroma = MockChroma()
bm25 = MockBM25()

# Initialize EnsembleRetriever
ensemble_retriever = EnsembleRetriever(chroma, bm25)

# Test the retriever with a query
query = "What is machine learning?"
results = ensemble_retriever.get_relevant_documents(query, k=3)

# Print the results
print("Ensemble Retrieval Results:")
for idx, doc in enumerate(results, 1):
    print(f"{idx}. {doc.page_content}")


Ensemble Retrieval Results:
1. Machine learning automates model building using data.
2. Deep learning is a type of machine learning using neural networks.


Ensemble Retrieval Results:
1. Machine learning automates model building using data.
2. Deep learning is a type of machine learning using neural networks.


In [15]:
from langchain_core.output_parsers import BaseOutputParser

class StrOutputParser(BaseOutputParser):
    def parse(self, text: str):
        return text

# 10. Format Documents

format_docs(docs):

This function takes a list of documents (docs) and formats them into a readable, numbered list. If no documents are provided, it returns a default message indicating the absence of context.

style_prompt:

This is a prompt template that prepares the input for a neural style transfer task. It asks an AI model to rewrite a given text (original_text) in a specified style, optionally using a contextual snippet (context) from the retrieved documents.

In [17]:
from langchain.prompts import PromptTemplate

def format_docs(docs):
    """
    Format a list of documents into a numbered, readable string.
    """
    if not docs:
        return "No relevant context found."
    
    # Format each document with a number and cleaned text
    snippet_list = []
    for i, doc in enumerate(docs, 1):
        # Clean the text: remove extra whitespace and normalize spacing
        content = ' '.join(doc.page_content.split())
        snippet_list.append(f"{i}. {content}")
    
    # Join all formatted snippets with newlines
    return '\n'.join(snippet_list)

# Define the style transfer prompt template
style_prompt = PromptTemplate(
    input_variables=["style", "context", "original_text"],
    template="""
Use the following context as reference to rewrite the original text in the specified style.

Context:
{context}

Original Text: {original_text}

Please rewrite the text in the following style: {style}

Keep the core meaning and factual content but adapt the writing style and tone.
Be creative but maintain accuracy and clarity.

Rewritten Text:
"""
)

Execute the following code to test the implementation.

In [18]:
from langchain.schema import Document
from langchain_huggingface import HuggingFaceEndpoint  # Or the specific LLM library you're using

# Example setup for LLM (ensure this is compatible with your LLM)
def setup_llm():
    return HuggingFaceEndpoint(
        repo_id="mistralai/Mistral-7B-Instruct-v0.3",  # Replace with the appropriate model
        temperature=0.7
    )

# Sample documents
sample_docs = [
    Document(page_content="Machine learning automates data analysis."),
    Document(page_content="Deep learning uses neural networks to learn patterns."),
    Document(page_content="Artificial intelligence includes various technologies."),
]

# Test the format_docs function
formatted_docs = format_docs(sample_docs)
print("Formatted Documents:\n")
print(formatted_docs)

# Test the style_prompt with sample inputs
style = "poetic"
context = formatted_docs
original_text = "Artificial intelligence is transforming the world."

styled_prompt = style_prompt.format(
    style=style,
    context=context,
    original_text=original_text,
)

print("\nGenerated Prompt for Style Transfer:\n")
print(styled_prompt)

# Pass the prompt to the LLM
llm = setup_llm()  # Initialize the LLM
styled_output = llm(styled_prompt)  # Generate the styled text

print("\n--- Rewritten (Styled) Text ---")
print(styled_output)


Formatted Documents:

1. Machine learning automates data analysis.
2. Deep learning uses neural networks to learn patterns.
3. Artificial intelligence includes various technologies.

Generated Prompt for Style Transfer:


Use the following context as reference to rewrite the original text in the specified style.

Context:
1. Machine learning automates data analysis.
2. Deep learning uses neural networks to learn patterns.
3. Artificial intelligence includes various technologies.

Original Text: Artificial intelligence is transforming the world.

Please rewrite the text in the following style: poetic

Keep the core meaning and factual content but adapt the writing style and tone.
Be creative but maintain accuracy and clarity.

Rewritten Text:



  styled_output = llm(styled_prompt)  # Generate the styled text



--- Rewritten (Styled) Text ---
In the realm of the digital, where silicon hearts beat,
Artificial minds weave patterns that mortal minds can't meet.
A dance of algorithms, a symphony so sweet,
In the world of the artificial, our dreams take flight, so fleet.

Machine learning, the analyst of data, so swift and so keen,
Deep learning, the weaver of patterns, in the neural networks seen.
A tapestry of insights, woven in the deepest machine,
In the world of the artificial, knowledge grows like a green scene.

Artificial intelligence, the transformer of worlds, so grand,
Guiding us through the digital, with a hand held firm and strong,
In the world of the artificial, we dance to the beat of a song.

A dance of the digital, a waltz through the data,
Artificial intelligence, our guide, in the world of the future, so data.





# 11. RAG chain

Use the EnsembleRetriever to retrieve relevant documents from Chroma and BM25.
Format the retrieved documents into a readable context.
Generate a prompt for neural style transfer using the retrieved context and the input query.
Pass the prompt to the LLM and parse the model's response to return the final styled output.

In [19]:
from langchain_core.runnables import RunnablePassthrough
from langchain.prompts import PromptTemplate

def build_rag_chain(llm, chroma_store, bm25_retriever):
    """
    Build a RAG chain using an ensemble retriever with Chroma and BM25.
    """
    # Create the ensemble retriever
    ensemble_retriever = EnsembleRetriever(chroma_store, bm25_retriever)
    
    def retrieve_and_format_context(query, k=5):
        """Helper function to retrieve and format context"""
        context_docs = ensemble_retriever.get_relevant_documents(query, k=k)
        return format_docs(context_docs)
    
    def rag_chain(inputs):
        """Process inputs through the RAG pipeline"""
        # Get query and retrieve context
        query = inputs["question"]
        context = retrieve_and_format_context(query)
        
        # Generate the prompt using the style template
        prompt = style_prompt.format(
            style=inputs["style"],
            context=context,
            original_text=inputs["original_text"]
        )
        
        # Generate styled text using the LLM
        llm_output = llm(prompt)
        
        # Parse the output
        parser = StrOutputParser()
        result = parser.parse(llm_output)
        
        return result
    
    return rag_chain

# 12. Final response

1. Scrape content from specified URLs, process the raw text, and split it into smaller, retrievable chunks.
2. Build the retrievers: Create a Chroma vector store and a BM25 retriever using the processed documents.
3. Build the RAG chain: Set up a pipeline that integrates the retrievers, context formatting, and an LLM to perform neural style transfer.
4. Run the application: Accept a user query and a target style, then process the input through the RAG chain to produce styled output.

In [49]:


if __name__ == "__main__":
    """
    Main script for scraping, building retrievers, setting up the RAG chain,
    and running a neural style transfer demo.
    """

    # Step 1: Scrape content and split into documents
    print("Step 1: Scraping content and splitting into documents...")
    example_urls = [
        "https://en.wikipedia.org/wiki/Artificial_intelligence",
        "https://en.wikipedia.org/wiki/Machine_learning"
    ]

    # Step 1A: Initialize an empty list to store all documents
    all_docs = []

    # Step 1B: Iterate through the URLs to fetch and process content
    for url in example_urls:
        print(f"Scraping content from: {url}")

        # Step 1B.1: Fetch and parse the raw text from the URL
        raw_text = fetch_and_parse(url)  # Replace with your implementation

        # Step 1B.2: Split the raw text into chunks (documents)
        splits = split_text_into_documents(raw_text)  # Replace with your implementation

        # Step 1B.3: Add the chunks to the list of documents
        all_docs.extend(splits)

    print(f"Total number of documents: {len(all_docs)}")

    # Step 2: Build Chroma and BM25 retrievers
    print("Step 2: Building Chroma vector store and BM25 retriever...")

    # Step 2A: Build the Chroma vector store
    chroma_store = build_chroma(all_docs)  # Replace with your implementation

    # Step 2B: Build the BM25 retriever
    bm25_retriever = BM25Retriever(all_docs)  # Replace with your implementation

    # Step 3: Build the RAG chain
    print("Step 3: Building RAG chain...")

    # Step 3A: Set up the LLM
    llm = setup_llm()  # Replace with your implementation

    # Step 3B: Build the RAG chain
    rag_chain = build_rag_chain(llm,chroma_store, bm25_retriever)  # Replace with your implementation

    # Step 4: Neural Style Transfer Demo
    print("\nStep 4: Neural Style Transfer Demo...")

    # Step 4A: Define the user query and target style
    user_text = "Explain machine learning."
    target_style = "as if it were a recipe for cooking"
    inputs = {"question": user_text, "style": target_style, "original_text": user_text}

    print("\n============================================")
    print("        Neural Style Transfer Demo          ")
    print("============================================")
    print(f"Original Text : {user_text}")
    print(f"Desired Style : {target_style}")

    # Step 5: Run the RAG chain
    print("\nStep 5: Running the RAG chain...")

    # Hint: Pass `inputs` through the RAG chain to generate styled output.
    styled_result = rag_chain(inputs)  # Replace with your implementation

    print("\n--- Styled Output ---")
    print(styled_result)


Step 1: Scraping content and splitting into documents...
Scraping content from: https://en.wikipedia.org/wiki/Artificial_intelligence
Scraping content from: https://en.wikipedia.org/wiki/Machine_learning
Total number of documents: 354
Step 2: Building Chroma vector store and BM25 retriever...
Step 3: Building RAG chain...

Step 4: Neural Style Transfer Demo...

        Neural Style Transfer Demo          
Original Text : Explain machine learning.
Desired Style : as if it were a recipe for cooking

Step 5: Running the RAG chain...

--- Styled Output ---

Title: A Simple Recipe for Machine Learning

Ingredients:
1. A generous helping of data
2. A pinch of statistical analysis
3. A dash of mathematical optimization
4. A dollop of deep learning (optional but highly recommended)

Instructions:
1. Gather your data and ensure it's well-prepared. This is the foundation for your dish.
2. Sprinkle a pinch of statistical analysis over your data to help you understand its characteristics and patte