<a href="https://colab.research.google.com/github/yongsa-nut/SF251_67_2/blob/main/SF251_LLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Cohere API and LLM Basics

In [None]:
!pip install cohere

In [None]:
from google.colab import userdata
import cohere

# Get your free API key: https://dashboard.cohere.com/api-keys
co = cohere.ClientV2(api_key=userdata.get('cohere'))

Basic API Usage

In [None]:
# Add the user message
message = "Hello!"

# Add the messages
messages = [
    {"role": "user", "content": message}
]

# Generate the response
response = co.chat(model="command-a-03-2025",
                   messages=messages)

print(response.message.content[0].text)

Another example of an LLM API provider: https://openrouter.ai/

Conversation Loop:
- User -> Assistant -> User -> Assistant -> ...

In [None]:
# Add the user message
message = "I'm joining a new startup called Co1t today. Could you help me write a short introduction message to my teammates."

# Create a custom system message
system_message = """## Task and Context
Generate concise responses, with maximum one-sentence."""

# Add the messages
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": message},
]

# Generate the response
response = co.chat(model="command-a-03-2025", messages=messages)

print(response.message.content[0].text)

Next, append the assistant's response to `messages`, then append the user's follow-up message.

In [None]:
# Append the previous response
messages.append(
    {"role": "assistant", "content": response.message.content[0].text}
)
# Add the user message
message = "Make it more upbeat and conversational."
# Append the user message
messages.append({"role": "user", "content": message})
# Generate the response with the current chat history as the context
response = co.chat(model="command-a-03-2025", messages=messages)
print(response.message.content[0].text)

Then we can keep going.
Below is the full loop.

In [None]:
system_message = "You are a helpful assistant." # << Change this to see the effect

messages = [{"role": "system", "content": system_message}]

# Coding this together!



In [None]:
print(messages)
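
The full loop described above (user, assistant, user, ...) can be sketched as follows. This is a minimal sketch, not the version written in class: `run_chat_loop`, `chat_fn`, and `echo_chat` are hypothetical names, and `chat_fn` stands in for any function that takes the message list and returns the assistant's text, so the loop can be tried without an API key.

```python
def run_chat_loop(chat_fn, system_message, user_inputs):
    """Feed user turns one by one and return the full message history.

    chat_fn is a hypothetical callable: it takes the list of messages
    and returns the assistant's reply as a string.
    """
    messages = [{"role": "system", "content": system_message}]
    for user_text in user_inputs:
        # 1) Append the user's turn
        messages.append({"role": "user", "content": user_text})
        # 2) Ask the model, passing the whole history as context
        reply = chat_fn(messages)
        # 3) Append the assistant's turn so the next call can see it
        messages.append({"role": "assistant", "content": reply})
    return messages


def echo_chat(messages):
    # Stand-in model for offline testing: repeats the last user message
    return "You said: " + messages[-1]["content"]


history = run_chat_loop(echo_chat, "You are a helpful assistant.", ["Hello!", "Bye!"])
for m in history:
    print(m["role"], "->", m["content"])
```

With the client from the first cells, `chat_fn` could be `lambda msgs: co.chat(model="command-a-03-2025", messages=msgs).message.content[0].text`, and the fixed list of inputs could be replaced by `input()` calls inside a `while` loop that stops on a keyword such as `quit`.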

# RAG

- Note: Cohere has its own RAG API ([documentation](https://docs.cohere.com/docs/rag-with-cohere))

<br>

---

## Demo 1: Keyword Matching

In [None]:
def get_response(query):
  response = co.chat(model="command-a-03-2025",
                     messages=[{'role':'user','content':query}])
  return response.message.content[0].text

get_response('Hello test test')

In [None]:
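# The data: building name -> location text
# 'ตึกวิจัย' = research building, 'ตึกอำนวยการ' = administration building, 'ตำแหน่ง' = location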
lorebook = { 'ตึกวิจัย' : 'ตำแหน่ง latitude = 14.0694899, longitude = 100.6050282',
             'ตึกอำนวยการ' : 'ตำแหน่ง latitude = 14.0690024, longitude = 100.6061611'
}

In [None]:
def keyword_generate(query, docs):
  # Retrieve matching entries and wrap them in <context> tags
  context = '<context>'
  for k in docs:
    if k in query:
      context += f'{k} = {docs[k]}\n'
  context += '</context>'

  # context already carries its own <context> tags, so do not wrap it again
  prompt = f'''<question>{query}</question>
  Please use context in <context> tags to answer the question.
  {context}'''
  response = get_response(prompt)
  # printing out
  print('Query:', query)
  print('Retrieved documents:', context)
  print('Response:', response)

query = 'ตึกวิจัยอยู่ไหน?'  # "Where is the research building?"

keyword_generate(query, lorebook)

Query: ตึกวิจัยอยู่ไหน?
Retrieved documents: <context>ตึกวิจัย = ตำแหน่ง latitude = 14.0694899, longitude = 100.6050282
</context>
Response: ตึกวิจัยอยู่ที่ตำแหน่ง latitude 14.0694899, longitude 100.6050282
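
The retrieval step in this demo is plain substring matching: an entry is used only if its key appears verbatim in the query. The sketch below isolates that step so its limits are easy to see. `keyword_retrieve` and `toy_lorebook` are hypothetical names, and the English entries are made-up illustration data, not part of the course data.

```python
def keyword_retrieve(query, docs):
    """Return the (keyword, text) pairs whose keyword occurs in the query."""
    return [(k, v) for k, v in docs.items() if k in query]


# Made-up English data, for illustration only
toy_lorebook = {
    "research building": "next to the north gate",
    "library": "open 8:00-20:00",
}

# Exact substring match: the entry is retrieved
print(keyword_retrieve("Where is the research building?", toy_lorebook))
# Different capitalization: nothing is retrieved
print(keyword_retrieve("Where is the Research Building?", toy_lorebook))
# No keyword in the query: nothing is retrieved
print(keyword_retrieve("Where can I study?", toy_lorebook))
```

When nothing matches, the model receives an empty context and has to answer from its own knowledge, which is one motivation for the scoring-based retrieval in the next demos.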



<br>

---

## Demo 2: BM25

In [None]:
# Install necessary libraries
!pip install rank_bm25

In [None]:
from rank_bm25 import BM25Okapi

In [None]:
# Sample document collection
documents = [
    "The quick brown fox jumps over the lazy dog.",
    "A journey of a thousand miles begins with a single step.",
    "To be or not to be, that is the question.",
    "All that glitters is not gold.",
    "Where there's smoke, there's fire.",
    "The early bird catches the worm.",
    "Actions speak louder than words.",
    "Knowledge is power, but enthusiasm pulls the switch.",
    "The pen is mightier than the sword.",
    "When life gives you lemons, make lemonade."
]

# Simple tokenization using string splitting
tokenized_docs = [doc.lower().split() for doc in documents]

# Create BM25 object
bm25 = BM25Okapi(tokenized_docs)

In [None]:
def retrieve(query, top_k=2):
  # Tokenize the query the same way as the documents (lowercase, split on whitespace)
  tokenized_query = query.lower().split()
  # Pass the list of words into bm25 to get one score per document
  doc_scores = bm25.get_scores(tokenized_query)
  # Sort documents by score (highest first) and keep the top_k
  top_docs = sorted(enumerate(doc_scores), key=lambda x: x[1], reverse=True)[:top_k]

  return [documents[i] for i, _ in top_docs]
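
To see roughly what `get_scores` computes, here is a minimal pure-Python sketch of BM25 scoring. It is an illustration under stated assumptions, not the library's code: `bm25_scores` is a hypothetical helper, the IDF uses the common `log(1 + (N - n + 0.5) / (n + 0.5))` form, and `k1=1.5`, `b=0.75` are conventional values. `rank_bm25`'s `BM25Okapi` may use a different IDF variant and defaults, so the numbers are not expected to match it exactly.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against the tokenized query."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs  # average document length

    # Document frequency: in how many documents each term appears
    df = Counter()
    for d in docs_tokens:
        df.update(set(d))

    scores = []
    for d in docs_tokens:
        tf = Counter(d)  # term frequency inside this document
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Rare terms get a higher weight than common ones
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents are penalized via b
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


toy_docs = [
    "the quick brown fox jumps over the lazy dog",
    "the early bird catches the worm",
    "actions speak louder than words",
]
toy_tokens = [doc.split() for doc in toy_docs]
scores = bm25_scores("what jumps over the dog".split(), toy_tokens)
print(scores)
```

The first document scores highest because it shares several rare terms with the query, the second gets a small score from the common word "the", and the third shares nothing and scores zero. Note that with whitespace tokenization, as used in this demo, `dog?` and `dog.` count as different tokens, so punctuation can hide a match.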

In [None]:
def bm25_response(query):
  retrieved_docs = retrieve(query)
  context = "\n".join(retrieved_docs)

  prompt = f'''<question>{query}</question>
  Please use context in <context> tags to answer the question.
  <context>{context}</context>'''
  response = get_response(prompt)
  # printing out
  print('Query:', query)
  print('Retrieved documents:', retrieved_docs)
  print('Response:', response)

query = "What jumps over the dog?"
bm25_response(query)

Query: What jumps over the dog?
Retrieved documents: ['The quick brown fox jumps over the lazy dog.', 'A journey of a thousand miles begins with a single step.']
Response: The quick brown fox jumps over the dog.



<br>

---

## Demo 3: RAG without an actual database

In [None]:
!pip install sentence_transformers datasets

In [None]:
import torch
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
# Same document collection
documents = [
    "The quick brown fox jumps over the lazy dog.",
    "A journey of a thousand miles begins with a single step.",
    "To be or not to be, that is the question.",
    "All that glitters is not gold.",
    "Where there's smoke, there's fire.",
    "The early bird catches the worm.",
    "Actions speak louder than words.",
    "Knowledge is power, but enthusiasm pulls the switch.",
    "The pen is mightier than the sword.",
    "When life gives you lemons, make lemonade."
]

# Initialize sentence transformer model
embed_model = SentenceTransformer('all-MiniLM-L6-v2')

# Embed documents
doc_embeddings = embed_model.encode(documents)

In [None]:
doc_embeddings

In [None]:
len(doc_embeddings[0])

384

In [None]:
def retrieve_RAG_docs(query, top_k=1):
    # Embed the query
    query_embedding = embed_model.encode([query])

    # Calculate cosine similarity
    similarities = cosine_similarity(query_embedding, doc_embeddings)[0]

    # Get top-k relevant documents
    top_indices = similarities.argsort()[-top_k:][::-1]
    return [documents[i] for i in top_indices]
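
The retrieval above ranks documents by cosine similarity between the query embedding and each document embedding. A minimal sketch of that computation in plain Python, using made-up 3-dimensional vectors instead of the 384-dimensional model output (`cosine`, `top_k_indices`, and the toy vectors are hypothetical names and data):

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_indices(query_vec, doc_vecs, top_k=1):
    """Return the indices of the top_k most similar documents, best first."""
    sims = [cosine(query_vec, d) for d in doc_vecs]
    order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return order[:top_k]


# Made-up embeddings, for illustration only
toy_doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
toy_query_vec = [0.9, 0.1, 0.0]

print(top_k_indices(toy_query_vec, toy_doc_vecs, top_k=2))  # [0, 2]
```

The query vector points almost in the same direction as the first document, so that one ranks first; the third document lies between the two axes and ranks second.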

In [None]:
def RAG_response(query, top_k=3):
  retrieved_docs = retrieve_RAG_docs(query, top_k)
  context = "\n".join(retrieved_docs)

  prompt = f'''<question>{query}</question>
  Please use context in <context> tags to answer the question.
  <context>{context}</context>'''
  response = get_response(prompt)
  # printing out
  print('Query:', query)
  print('Retrieved documents:', retrieved_docs)
  print('Response:', response)

query = "What jumps over the dog?"
RAG_response(query)
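
The demos so far differ only in how documents are retrieved; the prompt sent to the model has the same shape each time. The sketch below factors that shared step into one function so the exact text the model sees can be inspected. `build_rag_prompt` is a hypothetical helper, not part of the original notebook.

```python
def build_rag_prompt(query, retrieved_docs):
    """Assemble the question and the retrieved documents into one prompt."""
    context = "\n".join(retrieved_docs)
    return (
        f"<question>{query}</question>\n"
        "Please use context in <context> tags to answer the question.\n"
        f"<context>{context}</context>"
    )


prompt = build_rag_prompt(
    "What jumps over the dog?",
    ["The quick brown fox jumps over the lazy dog.", "The early bird catches the worm."],
)
print(prompt)
```

Building the string from separate lines also avoids carrying the code's indentation into the prompt, which happens with an indented triple-quoted f-string.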

<br>

---

## Demo 4: RAG with Pinecone

In [None]:
!pip3 install pinecone

In [None]:
from pinecone import Pinecone, ServerlessSpec

In [None]:
# Use the GPU when available; otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
if device != 'cuda':
    print('CUDA is not available; running on CPU.')
model = SentenceTransformer('all-MiniLM-L6-v2', device=device)

In [None]:
query = 'What jumps over the dog?'
xq = model.encode(query)
xq.shape

In [None]:
# Same document collection
documents = [
    "The quick brown fox jumps over the lazy dog.",
    "A journey of a thousand miles begins with a single step.",
    "To be or not to be, that is the question.",
    "All that glitters is not gold.",
    "Where there's smoke, there's fire.",
    "The early bird catches the worm.",
    "Actions speak louder than words.",
    "Knowledge is power, but enthusiasm pulls the switch.",
    "The pen is mightier than the sword.",
    "When life gives you lemons, make lemonade."
]

# Embed with the same model that is used for the query and the index dimension
doc_embeddings = model.encode(documents)

### Setup Pinecone

In [None]:
# Get secret key
from google.colab import userdata

In [None]:
pinecone = Pinecone(api_key=userdata.get('pinecone_key'))
INDEX_NAME = 'test1-2-16-2025'

# Cleaning up the index
if INDEX_NAME in [index.name for index in pinecone.list_indexes()]:
    pinecone.delete_index(INDEX_NAME)
print(INDEX_NAME)

# Creating a serverless index
pinecone.create_index(
    name = INDEX_NAME,
    dimension = model.get_sentence_embedding_dimension(),
    metric = 'cosine',
    spec = ServerlessSpec(cloud='aws', region='us-east-1')) #

index = pinecone.Index(INDEX_NAME)
print(index)

### Upsert to Pinecone

- Format: a list of dicts
  - `{'id': xx, 'values': embedding, 'metadata': dict}`
- Documentation: https://docs.pinecone.io/reference/api/2024-07/data-plane/upsert

In [None]:
# Create one string id per document
ids = [str(x) for x in range(len(documents))]
# Store the original text as metadata so it can be returned at query time
metadatas = [{'text': text} for text in documents]
# Zip into (id, values, metadata) tuples; convert embeddings to plain lists
records = list(zip(ids, doc_embeddings.tolist(), metadatas))
# There is a limit on how much you can upsert at a time.
# See https://docs.pinecone.io/guides/data/upsert-data
index.upsert(vectors=records)

{'upserted_count': 10}
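
Because there is a limit on how much can be upserted in one request, larger collections are usually sent in batches. The sketch below shows one way to split records into batches; `batched` is a hypothetical helper, the batch size of 100 is an assumed value (check the Pinecone guide linked above for the actual limits), and the toy records use made-up 3-dimensional vectors.

```python
def batched(records, batch_size=100):
    """Yield consecutive slices of at most batch_size records."""
    records = list(records)
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# Made-up records in the dict format: id, values, metadata
toy_records = [
    {"id": str(i), "values": [0.0, 0.0, 0.0], "metadata": {"text": f"doc {i}"}}
    for i in range(7)
]

batches = list(batched(toy_records, batch_size=3))
print([len(b) for b in batches])  # [3, 3, 1]
```

With the index from this notebook, the batches could be sent with `for batch in batched(records): index.upsert(vectors=batch)`.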

In [None]:
index.describe_index_stats()

{'dimension': 384,
 'index_fullness': 0.0,
 'metric': 'cosine',
 'namespaces': {},
 'total_vector_count': 0,
 'vector_type': 'dense'}
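
The stats above report `total_vector_count: 0` right after an upsert of 10 records. A likely cause (an assumption, not verified here) is that the index had not yet made the new vectors visible when the cell ran, since freshly upserted data can take a moment to become queryable. A minimal sketch of a polling helper that waits until the expected count appears; `wait_for_count` and its parameters are hypothetical names, and a fake counter replaces the real index so the code runs offline.

```python
import time


def wait_for_count(get_count, expected, timeout_s=30.0, poll_s=1.0):
    """Poll get_count() until it reaches expected or the timeout passes.

    Returns True if the count was reached, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_count() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Fake index for illustration: reports 0 twice, then 10
fake_counts = iter([0, 0, 10])
print(wait_for_count(lambda: next(fake_counts), expected=10, poll_s=0.01))
```

With the real index, `get_count` could be `lambda: index.describe_index_stats()['total_vector_count']`, assuming the stats object allows dictionary-style access as the query response does in this notebook.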

### Retrieving Documents




In [None]:
query = 'What jumps over the dog?'

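# 1) Embed the query with the same model used for the documents
embed_query = model.encode(query).tolist()
# 2) Ask the index for the nearest vector, including its metadata (the text)
retrieved_docs = index.query(vector=embed_query, top_k=1, include_metadata=True)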
print(retrieved_docs)

{'matches': [], 'namespace': '', 'usage': {'read_units': 1}}


In [None]:
text = [r['metadata']['text'] for r in retrieved_docs['matches']]
print(text)

[]


### Generate with retrieved documents

In [None]:
def RAG_pinecone_response(query, top_k=1):
  # First, embed the query
  embed_query = model.encode(query).tolist()
  # Then retrieve the document
  retrieved_docs =  index.query(vector=embed_query,
                                top_k=top_k,
                                include_metadata=True)
  # Then get the actual text
  text = [r['metadata']['text'] for r in retrieved_docs['matches']]
  # Finally join them together
  context = "\n".join(text)

  prompt = f'''<question>{query}</question>
  Please use context in <context> tags to answer the question.
  <context>{context}</context>'''
  response = get_response(prompt)
  # printing out
  print('Query:', query)
  print('Retrieved documents:', text)
  print('Response:', response)

query = 'What jumps over the dog?'
RAG_pinecone_response(query)