#### Product recommendations in natural language using RAG and LLM
 - Embed the user query with the same embedding model used to build the vector database (Sentence-Transformers).
 - Retrieve similar products with a similarity search, either by query text or by embedding (ChromaDB).
 - Generate a natural language response using an LLM.
    - Open-source models: LLaMA-2/Mistral/Gemma/Phi-2
    - OpenAI GPT-4 via ChatOpenAI
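
The three steps above can be sketched end to end with toy stand-ins. This is a minimal illustration only: the bag-of-words `embed`, the in-memory `catalog`, and the template-based `generate` are assumptions introduced here in place of Sentence-Transformers, ChromaDB, and the LLM, and they are not the components used in this notebook.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for a sentence-embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, catalog, top_k=2):
    """Rank catalog entries by similarity to the embedded query."""
    q = embed(query)
    ranked = sorted(catalog, key=lambda text: cosine(q, embed(text)), reverse=True)
    return ranked[:top_k]

def generate(query, products):
    """Stand-in for the LLM call: only builds the text that would be sent."""
    listing = "\n".join(f"- {p}" for p in products)
    return f'Customer query: "{query}"\nCandidate products:\n{listing}'

# Hypothetical three-item catalog for illustration
catalog = [
    "black sneakers for men with white soles",
    "a red candle is lit on a white background",
    "a bottle of dish liquid",
]
top = retrieve("men sneakers in black", catalog, top_k=1)
print(generate("men sneakers in black", top))
```

The key point the sketch preserves is that the query and the catalog entries go through the same `embed` function, which is why the real pipeline reuses one embedding model for both indexing and querying.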

#### Load Pretrained Embedding Model

In [1]:
import pandas as pd
import os

In [18]:
# Define the storage path
PERSIST_DIRECTORY = "chromadb_vectorstore"

In [None]:
# Run this cell only if you want to delete the existing ChromaDB and rebuild it from scratch.
# Restart the kernel before recreating the DB.
import shutil

# Delete the existing directory
if os.path.exists(PERSIST_DIRECTORY):
    shutil.rmtree(PERSIST_DIRECTORY)
    print(f"Deleted existing ChromaDB at: {PERSIST_DIRECTORY}")

# After deleting, manually recreate PERSIST_DIRECTORY and restart the kernel before rebuilding the ChromaDB.
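
The delete-then-recreate step could also be folded into one helper so the directory does not have to be recreated by hand. The sketch below is an assumption-laden illustration: `reset_directory` is a hypothetical helper, and it is demonstrated on a throwaway temp directory rather than the real vector store. Restarting the kernel before rebuilding may still be needed, as noted above.

```python
import os
import shutil
import tempfile

def reset_directory(path):
    """Delete `path` if it exists, then recreate it as an empty directory."""
    if os.path.exists(path):
        shutil.rmtree(path)
    os.makedirs(path)
    return path

# Demonstrate on a temporary location with one stale file in it
demo_dir = os.path.join(tempfile.mkdtemp(), "chromadb_vectorstore_demo")
os.makedirs(demo_dir)
with open(os.path.join(demo_dir, "stale.bin"), "w") as fh:
    fh.write("old data")

reset_directory(demo_dir)
print(os.listdir(demo_dir))  # prints []
```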

In [4]:
df_sample = pd.read_csv("sample_20k.csv")
df_sample.head(5)

Unnamed: 0,item_id,item_name,enhanced_product_desc,image_path,image_caption,complete_product_description
0,B07BDX9RLR,Amazon Brand - Symbol Men's Black Sneakers - 1...,"Given Product description: , Care Instructions...",0a/0a9b5866.jpg,a pair of black and white sneakers,a pair of black and white sneakers Given Produ...
1,B07913ZLB3,Amazon Brand - Symbol Men's Navy Polyester Sne...,"Given Product description: , Care Instructions...",f8/f8aa278f.jpg,men ' s sneakers - navy,men ' s sneakers - navy Given Product descript...
2,B07TBV6ZJT,Amazon Brand - Solimo Designer Candle Light 3D...,"Given Product description: , None, brand: Amaz...",ea/ea7a5dae.jpg,a red candle is lit on a white background,a red candle is lit on a white background Give...
3,B081HP8DWN,Amazon Brand - Solimo Designer Heart Design 3D...,"Given Product description: , None, brand: Amaz...",14/14177ded.jpg,a red phone case with hearts on it,a red phone case with hearts on it Given Produ...
4,B074H6PJKM,"365 Everyday Value, Dish Soap, Fig & Pear, 25 ...","Given Product description: , Brought to you by...",08/0888185d.jpg,a bottle of dish liquid,a bottle of dish liquid Given Product descript...


In [None]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma


PERSIST_DIRECTORY = "chromadb_vectorstore"
# Initialize the HuggingFace embedding model and the ChromaDB vector store
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Chroma(embedding_function=embeddings, persist_directory=PERSIST_DIRECTORY)  # Pass embeddings to Chroma

# Store product embeddings; each document's metadata keeps the DataFrame row index
# as 'id' so that search results can be mapped back to df_sample
docs = [{'id': i, 'text': row['complete_product_description']} for i, row in df_sample.iterrows()]
vectorstore.add_texts(texts=[doc['text'] for doc in docs], metadatas=docs)

print("Embeddings generated and vector store saved to:", PERSIST_DIRECTORY)


Embeddings generated and vector store saved to: chromadb_vectorstore


In [5]:
df_sample.iloc[1597]

item_id                                                                B07DJ8WC8G
item_name                       Amazon Brand - Symbol Men's Black/Blue Canvas ...
enhanced_product_desc           Given Product description: , Care Instructions...
image_path                                                        3f/3f38d885.jpg
image_caption                   men ' s sneakers in black and grey with red st...
complete_product_description    men ' s sneakers in black and grey with red st...
Name: 1597, dtype: object

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# Reload the persisted ChromaDB store with the same HuggingFace embedding model
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Chroma(embedding_function=embeddings, persist_directory=PERSIST_DIRECTORY)  # Pass embeddings to Chroma

#### Retrieve Similar Products

In [12]:
# Retrieve similar products
def retrieve_similar_products(query, top_k=10):
    """Retrieve the top-k similar products for the query."""
    results = vectorstore.similarity_search(query, k=top_k)  # Get the top_k closest matches

    recommended_products = []
    for doc in results:
        # 'id' is the DataFrame row index stored in the metadata when the store was built
        product_id = doc.metadata.get("id")
        image_path = df_sample.iloc[product_id]['image_path']
        if not image_path or pd.isna(image_path):
            image_path = "No image available"
        recommended_products.append({
            "product_description": doc.page_content,  # Retrieved text
            "product_image": image_path,  # Relative image path
        })
        print(f"Doc: {doc}")
        print(f"metadata: {doc.metadata}")
        print(f"image_url: {image_path}")
        print("--------")

    return recommended_products

# Example user query
query = "Find me Men's Sneaker in black color"
recommended_products = retrieve_similar_products(query)

Doc: page_content='men ' s sneakers in black and grey with red stripes Given Product description: , Care Instructions: Allow your pair of shoes to air and de-odorize at regular basis; Using a Shoe-horn to wear your shoes will avoid damage to the back of your shoes; Use Shoe bags to prevent any stains or mildew., brand: Amazon Brand - Symbol, weight: , color: Black/Blue, height: 1176.0, width: 2560.0, model year: , shape: , style: AZ-SH-05D_Black/Blue_11, material: Canvas, product_type: SHOES' metadata={'id': 1597, 'text': "men ' s sneakers in black and grey with red stripes Given Product description: , Care Instructions: Allow your pair of shoes to air and de-odorize at regular basis; Using a Shoe-horn to wear your shoes will avoid damage to the back of your shoes; Use Shoe bags to prevent any stains or mildew., brand: Amazon Brand - Symbol, weight: , color: Black/Blue, height: 1176.0, width: 2560.0, model year: , shape: , style: AZ-SH-05D_Black/Blue_11, material: Canvas, product_type:

In [13]:
df_sample.iloc[1597]['image_path']

'3f/3f38d885.jpg'

In [13]:
# Print the recommended products
for i, result in enumerate(recommended_products):
    print(f"Result {i+1}:")
    print(f"Product Description: {result['product_description']}")
    print(f"Product Image: {result['product_image']}")
    print("---")

Result 1:
Product Description: men ' s sneakers in black and grey with red stripes Given Product description: , Care Instructions: Allow your pair of shoes to air and de-odorize at regular basis; Using a Shoe-horn to wear your shoes will avoid damage to the back of your shoes; Use Shoe bags to prevent any stains or mildew., brand: Amazon Brand - Symbol, weight: , color: Black/Blue, height: 1176.0, width: 2560.0, model year: , shape: , style: AZ-SH-05D_Black/Blue_11, material: Canvas, product_type: SHOES
Product Image: 3f/3f38d885.jpg
---
Result 2:
Product Description: a black sneaker with white soles Given Product description: , Designed in Europe - please refer to size chart for specific measurements to achieve the perfect fit, brand: find., weight: , color: Black (Black), height: 585.0, width: 1530.0, model year: 2017.0, shape: , style: Men's Retro Trainer Sneakers, material: , product_type: SHOES
Product Image: ad/adba4e0c.jpg
---
Result 3:
Product Description: a picture of a black 
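
The retrieval function maps each hit's metadata `id` back to a DataFrame row to find its image path. A lookup keyed by the stored id is one way to make that mapping independent of row order. The sketch below uses plain dictionaries as hypothetical stand-ins: `rows` mimics a few `df_sample` records (the entry with id 42 is invented for illustration) and `hits` mimics the ids returned by a search.

```python
# Hypothetical stand-ins for df_sample rows and for the ids returned by a search
rows = [
    {"id": 1597, "image_path": "3f/3f38d885.jpg"},
    {"id": 42, "image_path": None},  # invented record with a missing image
]
hits = [1597, 42, 99999]  # 99999 is an id that is not in the table

# Build the id -> image path lookup once
image_by_id = {row["id"]: row["image_path"] for row in rows}

def image_for(product_id, fallback="No image available"):
    """Return the image path for an id, or the fallback if missing or not a string."""
    path = image_by_id.get(product_id)
    # The isinstance check also rejects float NaN values, which are truthy
    return path if isinstance(path, str) and path else fallback

resolved = [image_for(h) for h in hits]
print(resolved)
```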

#### Generate Natural Language Response using LLM

In [14]:
!pip install python-dotenv
!pip install openai

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
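
The warning above suggests setting `TOKENIZERS_PARALLELISM` explicitly. One way to do that from inside the notebook, assuming it runs before the tokenizers library is first used in the process, is sketched here. Whether `false` or `true` is the better value depends on the workload; `false` is used only as an example.

```python
import os

# Set this early, before the tokenizers library is first used in this process
os.environ["TOKENIZERS_PARALLELISM"] = "false"
print(os.environ["TOKENIZERS_PARALLELISM"])
```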




In [15]:
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Check if the API key is loaded correctly
openai_api_key = os.getenv("OPENAI_API_KEY")
if not openai_api_key:
    raise ValueError("OpenAI API key not found. Please set the OPENAI_API_KEY environment variable.")

In [16]:
from langchain.chat_models import ChatOpenAI

# Load an LLM (GPT-4 here; an open-source model could be substituted)
llm = ChatOpenAI(model_name="gpt-4", temperature=0.7)

def generate_natural_language_response(query, products):
    """Generate a response based on retrieved products using the LLM."""
    prompt = f"""
    A customer is looking for a product based on this query: "{query}"
    Here are the recommended products:
    {products}
    
    Generate a natural language response listing the products in a friendly tone.
    """

    # invoke() replaces the deprecated predict(); it returns a message whose text is in .content
    response = llm.invoke(prompt)
    return response.content

# Generate response
response_text = generate_natural_language_response(query, recommended_products)
print(response_text)


Sure, I found several great options for men's sneakers in black color for you:

1. [Amazon Brand - Symbol Men's Sneakers](3f/3f38d885.jpg) in black and blue color. They have red stripes and are made of canvas. They also come with care instructions to help maintain their look.

2. [find. Men's Retro Trainer Sneakers](ad/adba4e0c.jpg) in black color. These were designed in Europe in 2017 and feature white soles.

3. [Amazon Brand - Inkast Denim Co. Sneaker](d0/d0eb62ae.jpg) is a black leather sneaker. It comes with care instructions to ensure longevity.

4. [Amazon Brand - Symbol Sneaker](80/803d8cfd.jpg) is a black and white sneaker. It's a low-top style made of synthetic material.

5. [Amazon Brand - Symbol Men's Shoes](41/412be8e0.jpg) are another great option. They are black and made of synthetic material.

Please note that some of these sneakers have specific care instructions to maintain their look and longevity. Feel free to click on the links to view the images and let me know if
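
The prompt used for this response interpolates the product list as a raw Python list of dictionaries. A possible refinement is to render the products as a numbered list before inserting them into the prompt. The sketch below is an assumption, not the notebook's method: `format_products` and `build_prompt` are hypothetical helpers, and the sample entry reuses one of the retrieved results.

```python
def format_products(products):
    """Render retrieved products as a numbered list for the prompt."""
    lines = []
    for i, p in enumerate(products, start=1):
        lines.append(f"{i}. {p['product_description']} (image: {p['product_image']})")
    return "\n".join(lines)

def build_prompt(query, products):
    """Assemble the same instructions as the notebook prompt, with a formatted product list."""
    return (
        f'A customer is looking for a product based on this query: "{query}"\n'
        "Here are the recommended products:\n"
        f"{format_products(products)}\n\n"
        "Generate a natural language response listing the products in a friendly tone."
    )

sample = [
    {"product_description": "a black sneaker with white soles",
     "product_image": "ad/adba4e0c.jpg"},
]
prompt = build_prompt("Find me Men's Sneaker in black color", sample)
print(prompt)
```

The resulting string could be passed to the LLM in place of the f-string prompt. Whether it improves the responses has not been evaluated here.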

#### Todo:
 - Fine-tune LLaMA-2 using LoRA and QLoRA?
 - Evaluate inference and accuracy.
 - Compare different open-source LLMs.