## Library

In [None]:
pip install langchain openai google-search-results chromadb pypdf sentence_transformers

In [None]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI as l_OpenAI

## Enter Keys

In [None]:
SERPAPI_API_KEY = "xxx"
OPENAI_API_KEY = "sk-xxx"

In [None]:
import openai

In [None]:
from typing import List, Dict, Any

In [None]:
openai_client = openai.OpenAI(api_key=OPENAI_API_KEY)

## Approach 1

Fine-tuning a large language model (LLM) such as GPT (Generative Pre-trained Transformer) on OpenAI means training it further on a specific dataset. This adjusts the model's parameters so that it gets better at generating responses or completing tasks similar to the examples in that dataset. The objective is to align the model's responses with the domain-specific language, style, or requirements of the dataset.

### High-Level Process

The `model_finetune` function in this section is a Python function that calls a fine-tuned model hosted on OpenAI's API. Here's a breakdown of its components:

1. **Function Definition**: `def model_finetune(query: str) -> str` defines a function named `model_finetune` that takes a string `query` as input and returns a string. The input `query` is the prompt that you want to send to the fine-tuned model.

2. **Completion Request**: Within the function, `openai_client.completions.create()` is called. This function is part of the OpenAI API client library. It requests the API to generate a completion (or response) using the specified model and prompt.

    - `model="ft:davinci-002:personal::8JEsV0S6"` specifies the model to use for the completion. The identifier is colon-separated: the `ft` prefix stands for fine-tuned, `davinci-002` is the base model, `personal` is the organization, the empty field is an unused custom suffix, and `8JEsV0S6` uniquely identifies this fine-tuned model. The fine-tuning was performed using 12 CSV files, so the model has been trained to generate responses based on the data in those CSVs.

3. **Prompt**: The `prompt` parameter in the function call is set to the `query` variable, which means the text you provide as input to the `model_finetune` function is sent to the model as the prompt for generating a completion.

4. **Return Value**: The model's response is accessed via `completion.choices[0].text`. The `choices` list contains possible completions generated by the model, and `[0]` selects the first (and typically only) completion. The `.text` property extracts the textual content of the completion. This text is then returned by the `model_finetune` function.

In essence, when you call `model_finetune` with a specific query, it sends this query to your fine-tuned version of the Davinci model on OpenAI's API. The model generates a response based on how it was fine-tuned with your specific dataset (the 12 CSVs), and this response is returned by the function. This allows you to leverage the power of a large language model while ensuring the responses are tailored to the nuances of your specific data and use case.
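
The structure of the model identifier described above can be unpacked programmatically. The following is a minimal sketch, assuming the colon-separated layout `ft:<base model>:<organization>:<suffix>:<id>`. `parse_finetuned_model_id` is a hypothetical helper written for illustration and is not part of the OpenAI client.

```python
from typing import Dict


def parse_finetuned_model_id(model_id: str) -> Dict[str, str]:
    """Split a fine-tuned model name into its colon-separated fields.

    Hypothetical helper for illustration only; it assumes the layout
    ft:<base model>:<organization>:<suffix>:<id>.
    """
    parts = model_id.split(":")
    if len(parts) != 5 or parts[0] != "ft":
        raise ValueError(f"Not a fine-tuned model id: {model_id!r}")
    _, base_model, organization, suffix, unique_id = parts
    return {
        "base_model": base_model,
        "organization": organization,
        "suffix": suffix,  # empty when no custom suffix was given
        "id": unique_id,
    }


info = parse_finetuned_model_id("ft:davinci-002:personal::8JEsV0S6")
print(info["base_model"], info["organization"], info["id"])
```

A check like this can catch a mistyped model name before any API call is made.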


This is the API call that queries the fine-tuned model.

In [None]:
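def model_finetune(query: str) -> str:
    """Send `query` to the fine-tuned model and return the generated text."""
    # No max_tokens is passed, so the endpoint's default completion length applies.
    completion = openai_client.completions.create(
        model="ft:davinci-002:personal::8JEsV0S6",  # fine-tuned model using customized data
        prompt=query,
    )

    # Return the text of the first (and here only) completion.
    return completion.choices[0].text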

## Approach 2

The `call_chatgpt` function in this section calls `openai_client.chat.completions.create()`, which interacts with OpenAI's GPT (Generative Pre-trained Transformer) models in a chat-like format. This API is designed to produce conversational responses, making it well-suited for applications that require interactive dialogue. Here's a high-level explanation of the process and components involved in the function:



### High-Level Process

1. **Function Invocation**: The function `call_chatgpt` is called with a `query` (the user's input) and an optional `model` parameter (the specific version of GPT you want to use, defaulting to "gpt-3.5-turbo").

2. **Conversation Context Preparation**: Before making the API call, the function prepares a conversation context. This context is represented as a list of messages, each with a `role` (either "system" or "user") and `content`. The "system" message sets the stage for the type of assistant the model should emulate (in this case, "You are a helpful assistant."), and the "user" message contains the query prefaced by "Question: ". This structured format helps the model understand the nature of the interaction and respond appropriately.

3. **API Call**: With the context prepared, the function then calls `openai_client.chat.completions.create()`, passing the selected `model` and the prepared `messages` as parameters. This API endpoint is specifically designed for generating responses in a conversational context, leveraging the model's ability to understand and continue dialogues.

    - `model`: Specifies which version of the GPT model to use for generating the response. Different versions may have different capabilities, performance characteristics, or costs associated with them.
    - `messages`: Provides the conversational context to the model, including any system instructions and the user's query.

4. **Response Extraction**: The API returns a response object that includes one or more "choices", each containing a message. The function extracts the `content` of the message from the first choice, which is the model's response to the user's query within the given context.

5. **Return Value**: The extracted content, which is the model-generated response to the query, is returned to the caller.

### Summary

In summary, calling `openai_client.chat.completions.create()` through the `call_chatgpt` function allows you to engage with an OpenAI GPT model in a conversational manner. By providing a structured conversational context and specifying a model, you can generate responses that are tailored to the input query, simulating a chat with a human-like assistant. This capability is particularly useful for building chatbots, virtual assistants, and other applications where interactive, natural language understanding and generation are required.
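
The conversation context from step 2 can be built and inspected without calling the API. The following is a small sketch: `build_messages` is a hypothetical helper (not part of the OpenAI client) that mirrors the message list used by `call_chatgpt`, with the system prompt exposed as a parameter.

```python
from typing import Dict, List


def build_messages(
    query: str, system_prompt: str = "You are a helpful assistant."
) -> List[Dict[str, str]]:
    """Return a system + user message list in the shape the chat endpoint expects."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Question: {query}."},
    ]


messages = build_messages("Define overfitting")
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

Separating message construction from the API call makes it easy to try a different system prompt without touching the request code.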


This is the API call that asks ChatGPT directly.

In [None]:
def call_chatgpt(query: str, model: str = "gpt-3.5-turbo") -> str:
    """
    Generates a response to a query using the specified language model.

    Args:
        query (str): The user's query that needs to be processed.
        model (str, optional): The language model to be used. Defaults to "gpt-3.5-turbo".

    Returns:
        str: The generated response to the query.
    """

    # Prepare the conversation context with system and user messages.
    messages: List[Dict[str, str]] = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": f"Question: {query}."},
    ]

    # Use the OpenAI client to generate a response based on the model and the conversation context.
    response: Any = openai_client.chat.completions.create(
        model=model,
        messages=messages,
    )

    # Extract the content of the response from the first choice.
    content: str = response.choices[0].message.content

    # Return the generated content.
    return content


## Approach 3


The function `call_langchain` uses LangChain, a framework for building applications with large language models (LLMs) and integrating tools such as external data retrieval and processing. This function is set up to run web searches through SerpAPI (a service that interfaces with Google Search) and to perform mathematical calculations with an LLM-backed math tool, all within a conversational agent. Here's a breakdown of the high-level operations performed by the function:

1. **Initialize the LLM with OpenAI**: The function starts by initializing a language model from OpenAI with the `temperature` parameter set to 0, which makes responses as deterministic as possible, although small variations between calls can still occur. The `OPENAI_API_KEY` is used to authenticate and access OpenAI's API services.

2. **Load Required Tools**: It then loads a set of tools using `load_tools`, specifically "serpapi" for performing searches via Google Search API, and "llm-math" for carrying out mathematical operations. These tools are loaded with their respective API keys (`SERPAPI_API_KEY` for SerpAPI) and the previously initialized language model (`llm`).

3. **Initialize Agent**: With the tools and the language model ready, the function initializes an agent using `initialize_agent`. The agent uses the provided tools and language model to respond to prompts. `AgentType.ZERO_SHOT_REACT_DESCRIPTION` selects a ReAct-style agent that decides which tool to call based solely on each tool's description, without task-specific examples or training (hence "zero-shot"). `verbose=True` prints the agent's intermediate steps, which is useful for debugging.

4. **Run the Agent**: The agent is then run with the user-provided `prompt`. This could be any query or instruction that the user wishes to process, such as a question that requires searching the internet via SerpAPI or a mathematical problem that the "llm-math" tool can help solve.

5. **Return Output**: Finally, the output from the agent's processing of the prompt is returned. This output could include answers to queries, results of internet searches, solutions to mathematical problems, or any other information the agent is configured to provide based on the prompt and the available tools.

In essence, the `call_langchain` function demonstrates a sophisticated integration of AI and external APIs to provide a versatile conversational agent capable of performing specific tasks like web searches and mathematical calculations, leveraging the capabilities of language models for understanding and generating human-like responses.
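
To make the agent loop less abstract, here is a toy sketch of a single ReAct-style step: parse the tool name and input out of the model's text, then dispatch to that tool. This is an illustration only and not LangChain's implementation. The `Action:` / `Action Input:` text format, the `TOOLS` table, and the hard-coded model output are all assumptions made for the example.

```python
import re
from typing import Callable, Dict, Tuple

# Toy stand-ins for real tools; each maps an input string to an output string.
TOOLS: Dict[str, Callable[[str], str]] = {
    # Adds numbers separated by "+", e.g. "2 + 3.5".
    "Calculator": lambda expr: str(sum(float(term) for term in expr.split("+"))),
    # Placeholder: a real search tool would call an external API here.
    "Search": lambda q: f"(search results for {q!r} would go here)",
}

ACTION_PATTERN = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)


def parse_action(llm_output: str) -> Tuple[str, str]:
    """Extract (tool name, tool input) from ReAct-style model output."""
    match = ACTION_PATTERN.search(llm_output)
    if match is None:
        raise ValueError("No action found in model output")
    return match.group(1).strip(), match.group(2).strip()


def run_step(llm_output: str) -> str:
    """Run the tool the model asked for and return its observation."""
    tool_name, tool_input = parse_action(llm_output)
    return TOOLS[tool_name](tool_input)


# Hard-coded text standing in for what a model might emit.
fake_llm_output = (
    "Thought: I need to add two numbers.\n"
    "Action: Calculator\n"
    "Action Input: 2 + 3.5"
)
observation = run_step(fake_llm_output)
print(f"Observation: {observation}")
```

In a real agent, the observation is appended to the prompt and the model is called again, repeating until it produces a final answer.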


This uses `langchain` to give the model internet access.

In [None]:
def call_langchain(prompt: str) -> str:
    llm = l_OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
    tools = load_tools(["serpapi", "llm-math"], llm=llm, serpapi_api_key=SERPAPI_API_KEY)
    agent = initialize_agent(
        tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
    )
    output = agent.run(prompt)

    return output

## Approach 4


The code in this section implements Retrieval-Augmented Generation (RAG): a large language model (LLM) generates responses based on information retrieved from a specialized database. RAG combines information retrieval with generative language models to improve the accuracy and relevance of responses to queries. Here's a step-by-step breakdown of the process and how the code operates at a high level:

### 1. **Setup and Embedding Preparation**
- **PDF to Chroma DB Conversion**: First, a PDF document is transformed into a format that can be queried effectively. This involves extracting text from the PDF and storing it in a Chroma collection. Chroma is a vector database, structured to retrieve documents efficiently based on semantic similarity.

- **Vector DB Creation**: As the text is stored, it is converted into vector representations by an embedding function, here `SentenceTransformerEmbeddingFunction`. Each document or snippet of text from the original PDF is represented as a vector (a list of numbers) in a high-dimensional space. The vector representation allows semantic similarity between the query and the documents to be measured.

### 2. **Query Processing**
- **Load Chroma Collection**: The `load_chroma` function initializes the Chroma database with the extracted text from the PDF (`pdf_path`) and prepares it for querying. The `embedding_function` is used to ensure that the documents are stored in their vectorized form.

- **Document Retrieval**: When the `rag` function is called with a user query, `chroma_collection.query` retrieves the documents most semantically similar to the query. It returns the requested number of results (`n_results=10`), including the documents and their embeddings. Results are grouped per query text, so `results['documents'][0]` is the list of all 10 retrieved documents for the single query, not just the top match.

### 3. **Augmentation and Response Generation**
- **Query Augmentation**: The retrieved documents are then used to augment the original query, framing a new prompt that includes both the user's query and the retrieved context. This augmented query is more informative and lets the language model generate a response that is grounded in the specific content of the relevant documents, not only in the query.

- **Response Generation**: The augmented query is passed to a chat model (e.g., GPT-3.5-turbo via the `call_chatgpt` function), which generates a response. This step uses the generative capabilities of the LLM, combining the original query with the retrieved context to provide a detailed and contextually relevant answer.

### 4. **Output**
- The function returns the generated response, which ideally combines the generative power of the LLM with the specific, relevant knowledge extracted from the Chroma database.

### Summary
This RAG implementation enhances the capability of a language model to provide accurate and contextually relevant answers by augmenting its responses with information retrieved from a specialized database. It demonstrates a powerful approach to integrating knowledge retrieval with generative AI, making it particularly useful for applications requiring detailed, accurate, and context-aware responses.
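
The query augmentation step can be sketched on its own, without a database or an API call. In the sketch below, `build_rag_prompt` is a hypothetical helper and the two document strings are made-up placeholders. It joins the retrieved chunks into plain text before inserting them into the prompt, which is one possible design choice, not necessarily what the `rag` function in this notebook does.

```python
from typing import List


def build_rag_prompt(query: str, retrieved_docs: List[str]) -> str:
    """Combine the user's query with retrieved context into one prompt."""
    # Join chunks with blank lines so the model sees clean text, not a Python list.
    context = "\n\n".join(retrieved_docs)
    return (
        f"Answer the question: {query}\n"
        f"Based on the document provided:\n{context}"
    )


# Placeholder chunks standing in for what a vector database would return.
docs = ["Chroma stores embeddings.", "Embeddings are vectors."]
prompt = build_rag_prompt("What does Chroma store?", docs)
print(prompt)
```

The resulting string is what would be passed to the chat model as the user message.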

### Vector DB Query Search (from scratch)

![image](https://raw.githubusercontent.com/yiqiao-yin/WYNAssociates/main/figs/vector%20db%20from%20scratch.png)

The figure above walks through what the **similarity search** does when we send a `query` into a vector database.
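
The same idea can be sketched in a few lines of plain Python. This is a minimal illustration, assuming cosine similarity as the similarity measure (a vector database may be configured with a different distance). The three-dimensional vectors are made-up stand-ins for real embeddings, and `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]], k: int = 2
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 3-dimensional "embeddings"; real ones have hundreds of dimensions.
doc_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
query_vector = [1.0, 0.0, 0.0]

for index, score in top_k(query_vector, doc_vectors, k=2):
    print(index, round(score, 3))
```

A real vector database does the same ranking but uses index structures to avoid comparing the query against every stored vector.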

In [None]:
pdf_path = "/path/to/file/file_name.pdf"

In [None]:
import os

In [None]:
os.chdir("/content/drive/MyDrive/Colab Notebooks/AI Research/Students/xxx/lectures/2024")

In [None]:
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction
from helper_utils import load_chroma, word_wrap, project_embeddings

In [None]:
pdf_path.split('/')[-1].split('.')[0]

In [None]:
%%time
embedding_function = SentenceTransformerEmbeddingFunction()

nom = pdf_path.split('/')[-1].split('.')[0]
chroma_collection = load_chroma(filename=pdf_path, collection_name=f'{nom}', embedding_function=embedding_function)
chroma_collection.count()

In [None]:
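def rag(query: str) -> str:
    # Retrieve the 10 chunks most similar to the query.
    results = chroma_collection.query(query_texts=[query], n_results=10, include=['documents', 'embeddings'])
    # Results are grouped per query text; [0] is the list of chunks for our single query.
    retrieved_docs = results['documents'][0]

    # Augment the query with the retrieved context.
    updated_query = f"""
        Answer the question: {query}
        Based on the document provided: {retrieved_docs}
    """
    response = call_chatgpt(updated_query)
    return response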

## Get Data

This assumes one `.csv` file per topic, with `questions` and `answers` columns.

In [None]:
print(nom)

In [None]:
path_of_csv = f"file/path/{nom}.csv"

In [None]:
import pandas as pd

In [None]:
current_data = pd.read_csv(path_of_csv)

In [None]:
current_data.head(2)

## Test

In [None]:
query = current_data.questions[0]
true_ans = current_data.answers[0]

In [None]:
ans_finetune = model_finetune(query)
ans_finetune

In [None]:
ans_langchain = call_langchain(query)

In [None]:
ans_langchain

In [None]:
ans_chatgpt = call_chatgpt(query)
ans_chatgpt

In [None]:
ans_rag = rag(query)
ans_rag

## Measure it

In [None]:
import numpy as np
from scipy.spatial.distance import cosine

In [None]:
def get_embedding(text, model="text-embedding-3-small"):
    text = text.replace("\n", " ")
    return openai_client.embeddings.create(input=[text], model=model).data[0].embedding

In [None]:
def calculate_sts_openai_score(sentence1: str, sentence2: str) -> float:
    # Compute sentence embeddings (each is a flat list of floats)
    embedding1 = get_embedding(sentence1)
    embedding2 = get_embedding(sentence2)

    # Convert to array
    embedding1 = np.asarray(embedding1)
    embedding2 = np.asarray(embedding2)

    # scipy's `cosine` returns the cosine distance, so similarity = 1 - distance
    similarity_score = 1 - cosine(embedding1, embedding2)

    return similarity_score

In [None]:
print(calculate_sts_openai_score(ans_finetune, true_ans))
print(calculate_sts_openai_score(ans_langchain, true_ans))
print(calculate_sts_openai_score(ans_chatgpt, true_ans))
print(calculate_sts_openai_score(ans_rag, true_ans))

## Test on Entire `.csv`

In [None]:
from tqdm import tqdm

In [None]:
current_ans = []

for i in tqdm(range(len(current_data))):
    query = current_data.questions[i]

    # Approach #1: model_finetune
    pred = model_finetune(query)
    current_ans.append(pred)

current_data['approach_1'] = current_ans

In [None]:
current_ans = []

for i in tqdm(range(len(current_data))):
    query = current_data.questions[i]

    # Approach #2: call_chatgpt
    pred = call_chatgpt(query)
    current_ans.append(pred)

current_data['approach_2'] = current_ans

In [None]:
current_ans = []

for i in tqdm(range(len(current_data))):
    query = current_data.questions[i]

    # Approach #3: call_langchain
    try:
        pred = call_langchain(query)
    except Exception as e:
        # The agent can fail on some prompts; record an empty answer and continue.
        pred = ""
        print(f"Error: {e}")
    current_ans.append(pred)

current_data['approach_3'] = current_ans

In [None]:
current_ans = []

for i in tqdm(range(len(current_data))):
    query = current_data.questions[i]

    # Approach #4: rag
    pred = rag(query)
    current_ans.append(pred)

current_data['approach_4'] = current_ans

In [None]:
%%time

current_data['score_approach_1'] = current_data.apply(lambda x: calculate_sts_openai_score(x['approach_1'], x['answers']), axis=1)
current_data['score_approach_2'] = current_data.apply(lambda x: calculate_sts_openai_score(x['approach_2'], x['answers']), axis=1)
current_data['score_approach_3'] = current_data.apply(lambda x: calculate_sts_openai_score(x['approach_3'], x['answers']), axis=1)
current_data['score_approach_4'] = current_data.apply(lambda x: calculate_sts_openai_score(x['approach_4'], x['answers']), axis=1)

In [None]:
current_data.to_csv(f"/content/drive/MyDrive/Colab Notebooks/AI Research/Students/xxx/data/final_score_{nom}.csv")